From: Liang He windhl@126.com
[ Upstream commit 7c32919a378782c95c72bc028b5c30dfe8c11f82 ]
In omap4_sram_init(), of_find_compatible_node() returns a node pointer with its refcount incremented. We should call of_node_put() on it once it is no longer needed.
Signed-off-by: Liang He windhl@126.com Message-Id: 20220628112939.160737-1-windhl@126.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mach-omap2/omap4-common.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index 6d1eb4eefefe5..d9ed2a5dcd5ef 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -140,6 +140,7 @@ static int __init omap4_sram_init(void)
 			__func__);
 	else
 		sram_sync = (void __iomem *)gen_pool_alloc(sram_pool, PAGE_SIZE);
+	of_node_put(np);
 
 	return 0;
 }
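For reference, the of_find_compatible_node()/of_node_put() pairing restored by the fix follows the usual devicetree refcounting pattern; a minimal sketch (the function name and compatible string here are illustrative, not the actual omap4_sram_init() body):

#include <linux/of.h>

static int __init example_init(void)
{
	struct device_node *np;

	/* of_find_compatible_node() returns the node with a reference held */
	np = of_find_compatible_node(NULL, NULL, "vendor,example-sram");
	if (!np)
		return -ENODEV;

	/* ... use np while the reference is held ... */

	of_node_put(np);	/* drop the reference once np is no longer needed */
	return 0;
}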
From: Konrad Dybcio konrad.dybcio@linaro.org
[ Upstream commit 67fb53745e0b38275fa0b422b6a3c6c1c028c9a2 ]
On eMMC devices, the UFS clocks aren't started by the bootloader (or at least they should not be, as that would just leak power), which results in platform reboots when the unclocked UFS hardware is accessed. Unfortunately that happens on each and every boot, as interconnect calls sync_state and goes over each and every path.
Signed-off-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org #db820c Signed-off-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20221210200353.418391-6-konrad.dybcio@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/msm8996.dtsi | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index d31464204f696..e51a75d15cb29 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -830,9 +830,11 @@ a2noc: interconnect@583000 {
 			compatible = "qcom,msm8996-a2noc";
 			reg = <0x00583000 0x7000>;
 			#interconnect-cells = <1>;
-			clock-names = "bus", "bus_a";
+			clock-names = "bus", "bus_a", "aggre2_ufs_axi", "ufs_axi";
 			clocks = <&rpmcc RPM_SMD_AGGR2_NOC_CLK>,
-				 <&rpmcc RPM_SMD_AGGR2_NOC_A_CLK>;
+				 <&rpmcc RPM_SMD_AGGR2_NOC_A_CLK>,
+				 <&gcc GCC_AGGRE2_UFS_AXI_CLK>,
+				 <&gcc GCC_UFS_AXI_CLK>;
 		};
 
 		mnoc: interconnect@5a4000 {
From: Jan Kara jack@suse.cz
[ Upstream commit 3d2d7e61553dbcc8ba45201d8ae4f383742c8202 ]
Similarly to other filesystems, define an EFSCORRUPTED error code for reporting internal filesystem corruption.
Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/udf/udf_sb.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index 291b56dd011ee..6bccff3c70f54 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -55,6 +55,8 @@
 #define MF_DUPLICATE_MD		0x01
 #define MF_MIRROR_FE_LOADED	0x02
 
+#define EFSCORRUPTED EUCLEAN
+
 struct udf_meta_data {
 	__u32	s_meta_file_loc;
 	__u32	s_mirror_file_loc;
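With this alias in place, UDF corruption checks can return -EFSCORRUPTED like other filesystems do; a hypothetical call site (illustrative only, not part of this patch):

	if (udf_sb_metadata_looks_corrupt(sb)) {	/* hypothetical check */
		udf_err(sb, "metadata corruption detected\n");
		return -EFSCORRUPTED;	/* reaches userspace as EUCLEAN */
	}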
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 0e26e1de0032779e43929174339429c16307a299 ]
Low level noinstr context-tracking code is calling out to instrumented code on KASAN:
vmlinux.o: warning: objtool: __ct_user_enter+0x72: call to __kasan_check_write() leaves .noinstr.text section
vmlinux.o: warning: objtool: __ct_user_exit+0x47: call to __kasan_check_write() leaves .noinstr.text section
Use even lower level atomic methods to avoid the instrumentation.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20230112195542.458034262@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/context_tracking.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 77978e3723771..a09f1c19336ae 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -510,7 +510,7 @@ void noinstr __ct_user_enter(enum ctx_state state) * In this we case we don't care about any concurrency/ordering. */ if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) - atomic_set(&ct->state, state); + arch_atomic_set(&ct->state, state); } else { /* * Even if context tracking is disabled on this CPU, because it's outside @@ -527,7 +527,7 @@ void noinstr __ct_user_enter(enum ctx_state state) */ if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) { /* Tracking for vtime only, no concurrent RCU EQS accounting */ - atomic_set(&ct->state, state); + arch_atomic_set(&ct->state, state); } else { /* * Tracking for vtime and RCU EQS. Make sure we don't race @@ -535,7 +535,7 @@ void noinstr __ct_user_enter(enum ctx_state state) * RCU only requires RCU_DYNTICKS_IDX increments to be fully * ordered. */ - atomic_add(state, &ct->state); + arch_atomic_add(state, &ct->state); } } } @@ -630,12 +630,12 @@ void noinstr __ct_user_exit(enum ctx_state state) * In this we case we don't care about any concurrency/ordering. */ if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) - atomic_set(&ct->state, CONTEXT_KERNEL); + arch_atomic_set(&ct->state, CONTEXT_KERNEL);
} else { if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) { /* Tracking for vtime only, no concurrent RCU EQS accounting */ - atomic_set(&ct->state, CONTEXT_KERNEL); + arch_atomic_set(&ct->state, CONTEXT_KERNEL); } else { /* * Tracking for vtime and RCU EQS. Make sure we don't race @@ -643,7 +643,7 @@ void noinstr __ct_user_exit(enum ctx_state state) * RCU only requires RCU_DYNTICKS_IDX increments to be fully * ordered. */ - atomic_sub(state, &ct->state); + arch_atomic_sub(state, &ct->state); } } }
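For context, the generic atomic ops are thin instrumented wrappers around their arch_ counterparts, conceptually like this (condensed from include/linux/atomic/atomic-instrumented.h):

static __always_inline void
atomic_set(atomic_t *v, int i)
{
	instrument_atomic_write(v, sizeof(*v));	/* ends up in __kasan_check_write() */
	arch_atomic_set(v, i);			/* the raw, uninstrumented store */
}

Calling arch_atomic_set()/arch_atomic_add() directly skips the instrumentation hook, which is exactly what code in .noinstr.text needs.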
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit 001c28e57187570e4b5aa4492c7a957fb6d65d7b ]
If a task oopses with irqs disabled, this can cause various cascading problems in the oops path such as sleep-from-invalid warnings, and potentially worse.
Since commit 0258b5fd7c712 ("coredump: Limit coredumps to a single thread group"), the unconditional irq enable in coredump_task_exit() will "fix" the irq state to be enabled early in do_exit(), so currently this may not be triggerable, but that is coincidental and fragile.
Detect and fix the irqs_disabled() condition in the oops path before calling do_exit(), similarly to the way in_atomic() is handled.
Reported-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: "Eric W. Biederman" ebiederm@xmission.com Link: https://lore.kernel.org/lkml/20221004094401.708299-1-npiggin@gmail.com/ Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/exit.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/kernel/exit.c b/kernel/exit.c
index 15dc2ec80c467..bccfa4218356e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -807,6 +807,8 @@ void __noreturn do_exit(long code)
 	struct task_struct *tsk = current;
 	int group_dead;
 
+	WARN_ON(irqs_disabled());
+
 	synchronize_group_exit(tsk, code);
 
 	WARN_ON(tsk->plug);
@@ -938,6 +940,11 @@ void __noreturn make_task_dead(int signr)
 	if (unlikely(!tsk->pid))
 		panic("Attempted to kill the idle task!");
 
+	if (unlikely(irqs_disabled())) {
+		pr_info("note: %s[%d] exited with irqs disabled\n",
+			current->comm, task_pid_nr(current));
+		local_irq_enable();
+	}
 	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
 			current->comm, task_pid_nr(current),
From: Markuss Broks markuss.broks@gmail.com
[ Upstream commit 5d5aa219a790d61cad2c38e1aa32058f16ad2f0b ]
For some reason, the support for the Exynos5420 MIPI phy that was added to the driver back in 2016 was never used on Exynos5420, which caused a kernel panic. Add the proper compatible for it.
Signed-off-by: Markuss Broks markuss.broks@gmail.com Link: https://lore.kernel.org/r/20230121201844.46872-2-markuss.broks@gmail.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/exynos5420.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi
index 9f2523a873d9d..62263eb91b3cc 100644
--- a/arch/arm/boot/dts/exynos5420.dtsi
+++ b/arch/arm/boot/dts/exynos5420.dtsi
@@ -592,7 +592,7 @@ dp_phy: dp-video-phy {
 		};
 
 		mipi_phy: mipi-video-phy {
-			compatible = "samsung,s5pv210-mipi-video-phy";
+			compatible = "samsung,exynos5420-mipi-video-phy";
 			syscon = <&pmu_system_controller>;
 			#phy-cells = <1>;
 		};
From: Jann Horn jannh@google.com
[ Upstream commit 47d586913f2abec4d240bae33417f537fda987ec ]
Currently, filp_close() and generic_shutdown_super() use printk() to log messages when bugs are detected. This is problematic because infrastructure like syzkaller has no idea that this message indicates a bug. In addition, some people explicitly want their kernels to BUG() when kernel data corruption has been detected (CONFIG_BUG_ON_DATA_CORRUPTION). And finally, when generic_shutdown_super() detects remaining inodes on a system without CONFIG_BUG_ON_DATA_CORRUPTION, it would be nice if later accesses to a busy inode would at least crash somewhat cleanly rather than walking through freed memory.
To address all three, use CHECK_DATA_CORRUPTION() when kernel bugs are detected.
Signed-off-by: Jann Horn jannh@google.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Christian Brauner (Microsoft) brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/open.c | 5 +++-- fs/super.c | 21 +++++++++++++++++---- include/linux/poison.h | 3 +++ 3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 82c1a28b33089..ceb88ac0ca3b2 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1411,8 +1411,9 @@ int filp_close(struct file *filp, fl_owner_t id)
 {
 	int retval = 0;
 
-	if (!file_count(filp)) {
-		printk(KERN_ERR "VFS: Close: file count is 0\n");
+	if (CHECK_DATA_CORRUPTION(file_count(filp) == 0,
+			"VFS: Close: file count is 0 (f_op=%ps)",
+			filp->f_op)) {
 		return 0;
 	}
 
diff --git a/fs/super.c b/fs/super.c
index 12c08cb20405d..cf737ec2bd05c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -491,10 +491,23 @@ void generic_shutdown_super(struct super_block *sb)
 		if (sop->put_super)
 			sop->put_super(sb);
 
-		if (!list_empty(&sb->s_inodes)) {
-			printk("VFS: Busy inodes after unmount of %s. "
-			   "Self-destruct in 5 seconds.  Have a nice day...\n",
-			   sb->s_id);
+		if (CHECK_DATA_CORRUPTION(!list_empty(&sb->s_inodes),
+				"VFS: Busy inodes after unmount of %s (%s)",
+				sb->s_id, sb->s_type->name)) {
+			/*
+			 * Adding a proper bailout path here would be hard, but
+			 * we can at least make it more likely that a later
+			 * iput_final() or such crashes cleanly.
+			 */
+			struct inode *inode;
+
+			spin_lock(&sb->s_inode_list_lock);
+			list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+				inode->i_op = VFS_PTR_POISON;
+				inode->i_sb = VFS_PTR_POISON;
+				inode->i_mapping = VFS_PTR_POISON;
+			}
+			spin_unlock(&sb->s_inode_list_lock);
 		}
 	}
 	spin_lock(&sb_lock);
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 2d3249eb0e62d..0e8a1f2ceb2f1 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -84,4 +84,7 @@
 /********** kernel/bpf/ **********/
 #define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA))
 
+/********** VFS **********/
+#define VFS_PTR_POISON ((void *)(0xF5 + POISON_POINTER_DELTA))
+
 #endif
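CHECK_DATA_CORRUPTION() is what makes the report machine-recognizable and CONFIG_BUG_ON_DATA_CORRUPTION-aware; its shape is roughly (condensed from include/linux/bug.h):

#define CHECK_DATA_CORRUPTION(condition, fmt, ...)			 \
	check_data_corruption(({					 \
		bool corruption = unlikely(condition);			 \
		if (corruption) {					 \
			if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION)) { \
				pr_err(fmt, ##__VA_ARGS__);		 \
				BUG();					 \
			} else						 \
				WARN(1, fmt, ##__VA_ARGS__);		 \
		}							 \
		corruption;						 \
	}))

On non-BUG kernels it returns true when the condition fired, which is why the super.c hunk above can go on to poison the busy inodes.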
From: Li Nan linan122@huawei.com
[ Upstream commit 984af1e66b4126cf145153661cc24c213e2ec231 ]
Echoing the max value of u64 into cost.model can cause a divide-by-0 error.
# echo 8:0 rbps=18446744073709551615 > /sys/fs/cgroup/io.cost.model
divide error: 0000 [#1] PREEMPT SMP
RIP: 0010:calc_lcoefs+0x4c/0xc0
Call Trace:
 <TASK>
 ioc_refresh_params+0x2b3/0x4f0
 ioc_cost_model_write+0x3cb/0x4c0
 ? _copy_from_iter+0x6d/0x6c0
 ? kernfs_fop_write_iter+0xfc/0x270
 cgroup_file_write+0xa0/0x200
 kernfs_fop_write_iter+0x17d/0x270
 vfs_write+0x414/0x620
 ksys_write+0x73/0x160
 __x64_sys_write+0x1e/0x30
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
calc_lcoefs() feeds the input value of cost.model into DIV_ROUND_UP_ULL(); if bps plus IOC_PAGE_SIZE is greater than ULLONG_MAX, the addition overflows, the rounded-up quotient becomes 0, and the subsequent division by it faults.
Fix the problem by setting basecost
Signed-off-by: Li Nan linan122@huawei.com Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20230117070806.3857142-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-iocost.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 6955605629e4f..ec7219caea165 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -866,9 +866,14 @@ static void calc_lcoefs(u64 bps, u64 seqiops, u64 randiops,
 
 	*page = *seqio = *randio = 0;
 
-	if (bps)
-		*page = DIV64_U64_ROUND_UP(VTIME_PER_SEC,
-					   DIV_ROUND_UP_ULL(bps, IOC_PAGE_SIZE));
+	if (bps) {
+		u64 bps_pages = DIV_ROUND_UP_ULL(bps, IOC_PAGE_SIZE);
+
+		if (bps_pages)
+			*page = DIV64_U64_ROUND_UP(VTIME_PER_SEC, bps_pages);
+		else
+			*page = 1;
+	}
 
 	if (seqiops) {
 		v = DIV64_U64_ROUND_UP(VTIME_PER_SEC, seqiops);
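A minimal userspace sketch of the wraparound (assuming IOC_PAGE_SIZE is 4096 and the 64-bit behaviour of the kernel's DIV_ROUND_UP_ULL(); both are assumptions for illustration):

#include <stdio.h>
#include <stdint.h>

#define IOC_PAGE_SIZE 4096ULL
/* 64-bit behaviour of the kernel macro: ((x) + (d) - 1) / (d) */
#define DIV_ROUND_UP_ULL(x, d) (((x) + (d) - 1) / (d))

int main(void)
{
	uint64_t bps = UINT64_MAX;	/* the value echoed into cost.model */

	/* bps + 4095 wraps around to 4094, so the quotient becomes 0 ... */
	uint64_t pages = DIV_ROUND_UP_ULL(bps, IOC_PAGE_SIZE);

	/* ... and the kernel's DIV64_U64_ROUND_UP(VTIME_PER_SEC, pages) faults */
	printf("pages = %llu\n", (unsigned long long)pages);
	return 0;
}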
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit c7241babf0855d8a6180cd1743ff0ec34de40b4e ]
Some cgroup policies will access parent pd through child pd even after pd_offline_fn() is done. If pd_free_fn() for parent is called before child, then UAF can be triggered. Hence it's better to guarantee the order of pd_free_fn().
Currently refcount of parent blkg is dropped in __blkg_release(), which is before pd_free_fn() is called in blkg_free_work_fn() while blkg_free_work_fn() is called asynchronously.
This patch makes sure the pd_free_fn() calls triggered by cgroup removal are ordered, by delaying the drop of the parent refcount until after pd_free_fn() has been called for the child.
BTW, pd_free_fn() will also be called from blkcg_deactivate_policy() when deleting a device; the following patches will guarantee the order for that path as well.
Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20230119110350.2287325-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9ac1efb053e08..aa890e3e4e509 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -124,6 +124,8 @@ static void blkg_free_workfn(struct work_struct *work)
 		if (blkg->pd[i])
 			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
 
+	if (blkg->parent)
+		blkg_put(blkg->parent);
 	if (blkg->q)
 		blk_put_queue(blkg->q);
 	free_percpu(blkg->iostat_cpu);
@@ -158,8 +160,6 @@ static void __blkg_release(struct rcu_head *rcu)
 
 	/* release the blkcg and parent blkg refs this blkg has been holding */
 	css_put(&blkg->blkcg->css);
-	if (blkg->parent)
-		blkg_put(blkg->parent);
 	blkg_free(blkg);
 }
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit f1c006f1c6850c14040f8337753a63119bba39b9 ]
Currently parent pd can be freed before child pd:
t1: remove cgroup C1
blkcg_destroy_blkgs
 blkg_destroy
  list_del_init(&blkg->q_node)
  // remove blkg from queue list
  percpu_ref_kill(&blkg->refcnt)
   blkg_release
    call_rcu

t2: from t1
__blkg_release
 blkg_free
  schedule_work
                        t4: deactivate policy
                        blkcg_deactivate_policy
                         pd_free_fn
                         // parent of C1 is freed first
t3: from t2
blkg_free_workfn
 pd_free_fn
If a policy (for example, ioc_timer_fn() from iocost) accesses the parent pd through the child pd after pd_offline_fn(), then a UAF can be triggered.
Fix the problem by delaying 'list_del_init(&blkg->q_node)' from blkg_destroy() to blkg_free_workfn(), and using a new disk level mutex to synchronize blkg_free_workfn() and blkcg_deactivate_policy().
Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Tejun Heo tj@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20230119110350.2287325-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 35 +++++++++++++++++++++++++++++------ include/linux/blkdev.h | 1 + 2 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index aa890e3e4e509..45881f8c79130 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -118,16 +118,32 @@ static void blkg_free_workfn(struct work_struct *work) { struct blkcg_gq *blkg = container_of(work, struct blkcg_gq, free_work); + struct request_queue *q = blkg->q; int i;
+ /* + * pd_free_fn() can also be called from blkcg_deactivate_policy(), + * in order to make sure pd_free_fn() is called in order, the deletion + * of the list blkg->q_node is delayed to here from blkg_destroy(), and + * blkcg_mutex is used to synchronize blkg_free_workfn() and + * blkcg_deactivate_policy(). + */ + if (q) + mutex_lock(&q->blkcg_mutex); + for (i = 0; i < BLKCG_MAX_POLS; i++) if (blkg->pd[i]) blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
if (blkg->parent) blkg_put(blkg->parent); - if (blkg->q) - blk_put_queue(blkg->q); + + if (q) { + list_del_init(&blkg->q_node); + mutex_unlock(&q->blkcg_mutex); + blk_put_queue(q); + } + free_percpu(blkg->iostat_cpu); percpu_ref_exit(&blkg->refcnt); kfree(blkg); @@ -458,9 +474,14 @@ static void blkg_destroy(struct blkcg_gq *blkg) lockdep_assert_held(&blkg->q->queue_lock); lockdep_assert_held(&blkcg->lock);
- /* Something wrong if we are trying to remove same group twice */ - WARN_ON_ONCE(list_empty(&blkg->q_node)); - WARN_ON_ONCE(hlist_unhashed(&blkg->blkcg_node)); + /* + * blkg stays on the queue list until blkg_free_workfn(), see details in + * blkg_free_workfn(), hence this function can be called from + * blkcg_destroy_blkgs() first and again from blkg_destroy_all() before + * blkg_free_workfn(). + */ + if (hlist_unhashed(&blkg->blkcg_node)) + return;
for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkcg_policy *pol = blkcg_policy[i]; @@ -472,7 +493,6 @@ static void blkg_destroy(struct blkcg_gq *blkg) blkg->online = false;
radix_tree_delete(&blkcg->blkg_tree, blkg->q->id); - list_del_init(&blkg->q_node); hlist_del_init_rcu(&blkg->blkcg_node);
/* @@ -1273,6 +1293,7 @@ int blkcg_init_disk(struct gendisk *disk) int ret;
INIT_LIST_HEAD(&q->blkg_list); + mutex_init(&q->blkcg_mutex);
new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL); if (!new_blkg) @@ -1510,6 +1531,7 @@ void blkcg_deactivate_policy(struct request_queue *q, if (queue_is_mq(q)) blk_mq_freeze_queue(q);
+ mutex_lock(&q->blkcg_mutex); spin_lock_irq(&q->queue_lock);
__clear_bit(pol->plid, q->blkcg_pols); @@ -1528,6 +1550,7 @@ void blkcg_deactivate_policy(struct request_queue *q, }
spin_unlock_irq(&q->queue_lock); + mutex_unlock(&q->blkcg_mutex);
if (queue_is_mq(q)) blk_mq_unfreeze_queue(q); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 43d4e073b1115..10ee92db680c9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -484,6 +484,7 @@ struct request_queue { DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS); struct blkcg_gq *root_blkg; struct list_head blkg_list; + struct mutex blkcg_mutex; #endif
struct queue_limits limits;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
[ Upstream commit 83e8864fee26f63a7435e941b7c36a20fd6fe93e ]
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once.
Cc: Jens Axboe axboe@kernel.dk Cc: Steven Rostedt rostedt@goodmis.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20230202141956.2299521-1-gregkh@linuxfoundation.or... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/blktrace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 918a7d12df8ff..5743be5594153 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -320,8 +320,8 @@ static void blk_trace_free(struct request_queue *q, struct blk_trace *bt)
 	 * under 'q->debugfs_dir', thus lookup and remove them.
 	 */
 	if (!bt->dir) {
-		debugfs_remove(debugfs_lookup("dropped", q->debugfs_dir));
-		debugfs_remove(debugfs_lookup("msg", q->debugfs_dir));
+		debugfs_lookup_and_remove("dropped", q->debugfs_dir);
+		debugfs_lookup_and_remove("msg", q->debugfs_dir);
 	} else {
 		debugfs_remove(bt->dir);
 	}
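The difference between the two idioms, sketched below; debugfs_lookup_and_remove() is essentially the open-coded sequence in the second half:

/* Old pattern: debugfs_lookup() takes a dentry reference that
 * debugfs_remove() does not drop, so this leaks on every call. */
debugfs_remove(debugfs_lookup("dropped", q->debugfs_dir));

/* What debugfs_lookup_and_remove() does internally (roughly): */
dentry = debugfs_lookup("dropped", q->debugfs_dir);
if (dentry) {
	debugfs_remove(dentry);
	dput(dentry);	/* drop the reference taken by the lookup */
}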
From: Eric Biggers ebiggers@google.com
[ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
Now that the key associated with the "test_dummy_encryption" mount option is added on-demand when it's needed, rather than immediately when the filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be called from __put_super() to avoid a memory leak on mount failure.
Remove this call, which was causing confusion because it appeared to be a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/super.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index cf737ec2bd05c..8e531174e7c28 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -291,7 +291,6 @@ static void __put_super(struct super_block *s)
 		WARN_ON(s->s_inode_lru.node);
 		WARN_ON(!list_empty(&s->s_mounts));
 		security_sb_free(s);
-		fscrypt_destroy_keyring(s);
 		put_user_ns(s->s_user_ns);
 		kfree(s->s_subtype);
 		call_rcu(&s->rcu, destroy_super_rcu);
From: Zhang Qiao zhangqiao22@huawei.com
[ Upstream commit 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ]
When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards.
However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can invert the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu.

To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic scheduler time scale.
[rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao zhangqiao22@huawei.com Co-developed-by: Roman Kagan rkagan@amazon.de Signed-off-by: Roman Kagan rkagan@amazon.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f87369914274..717c3ca970e15 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4656,6 +4656,7 @@ static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
+	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime -= thresh;
 	}
 
-	/* ensure we never gain time by being placed backwards. */
-	se->vruntime = max_vruntime(se->vruntime, vruntime);
+	/*
+	 * Pull vruntime of the entity being placed to the base level of
+	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
+	 * slept for a long time, don't even try to compare its vruntime with
+	 * the base as it may be too far off and the comparison may get
+	 * inversed due to s64 overflow.
+	 */
+	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+		se->vruntime = vruntime;
+	else
+		se->vruntime = max_vruntime(se->vruntime, vruntime);
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
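The inversion happens because fair.c compares vruntimes via the wrapped s64 difference; max_vruntime() is essentially this (a sketch of the existing helper in kernel/sched/fair.c):

static inline u64 max_vruntime(u64 max_vruntime, u64 vruntime)
{
	s64 delta = (s64)(vruntime - max_vruntime);

	if (delta > 0)
		max_vruntime = vruntime;

	return max_vruntime;
}

/*
 * If se->vruntime fell more than 2^63 behind the base vruntime, the
 * u64 subtraction wraps, delta comes out negative, and the stale
 * se->vruntime "wins" -- placing the entity far in the future.
 */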
On 2023/2/26 11:41, Sasha Levin wrote:
From: Zhang Qiao zhangqiao22@huawei.com
[ Upstream commit 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ]
Hi, this patch has a significant impact on hackbench.throughput [1]. Please don't backport this patch.
[1] https://lore.kernel.org/lkml/202302211553.9738f304-yujie.liu@intel.com/T/#u
Thanks. Zhang Qiao.
From: Qu Wenruo wqu@suse.com
[ Upstream commit 28232909ba43561887508a6ef46d7f33a648f375 ]
[BUG] While debugging a scrub-related metadata error, it turned out that our metadata error reporting is not ideal.
The only 3 error messages are:
- BTRFS error (device dm-2): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
  Showing we have metadata generation mismatch errors.
- BTRFS error (device dm-2): unable to fixup (regular) error at logical 7110656 on dev /dev/mapper/test-scratch1
  Showing which tree blocks are corrupted.
- BTRFS warning (device dm-2): checksum/header error at logical 24772608 on dev /dev/mapper/test-scratch2, physical 3801088: metadata node (level 1) in tree 5
  Showing which physical range the corrupted metadata is at.
We have to combine the above 3 to know we have corrupted metadata with a generation mismatch.
And this is already the better case; if we have other problems, like an fsid mismatch, we cannot even tell the cause.
[CAUSE] The problem is caused by the fact that scrub_checksum_tree_block() never outputs any error message.
It just returns two bits for scrub: sblock->header_error and sblock->generation_error.
And later we report the error in scrub_print_warning(), but unfortunately with only two bits there is not really much we can do to print any detailed errors.
[FIX] This patch will do the following to enhance the error reporting of metadata scrub:
- Add an extra warning (ratelimited) for every error we hit.
  This can help us distinguish the different types of errors; some
  errors, like a bytenr mismatch, can tell us immediately what is
  going wrong.

- Re-order the checks.
  Currently we check bytenr first, then immediately generation.
  This can lead to false generation-mismatch reports when it is
  actually the fsid that mismatches.
Here is the new output for the bug I'm debugging (we forgot to writeback tree blocks for commit roots):
BTRFS warning (device dm-2): tree block 24117248 mirror 1 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
BTRFS warning (device dm-2): tree block 24117248 mirror 0 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
Now we can immediately see that some tree blocks didn't even get written back, rather than getting the original, confusing generation-mismatch report.
Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/scrub.c | 49 +++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 52b346795f660..a5d026041be45 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2053,20 +2053,33 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock) * a) don't have an extent buffer and * b) the page is already kmapped */ - if (sblock->logical != btrfs_stack_header_bytenr(h)) + if (sblock->logical != btrfs_stack_header_bytenr(h)) { sblock->header_error = 1; - - if (sector->generation != btrfs_stack_header_generation(h)) { - sblock->header_error = 1; - sblock->generation_error = 1; + btrfs_warn_rl(fs_info, + "tree block %llu mirror %u has bad bytenr, has %llu want %llu", + sblock->logical, sblock->mirror_num, + btrfs_stack_header_bytenr(h), + sblock->logical); + goto out; }
- if (!scrub_check_fsid(h->fsid, sector)) + if (!scrub_check_fsid(h->fsid, sector)) { sblock->header_error = 1; + btrfs_warn_rl(fs_info, + "tree block %llu mirror %u has bad fsid, has %pU want %pU", + sblock->logical, sblock->mirror_num, + h->fsid, sblock->dev->fs_devices->fsid); + goto out; + }
- if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid, - BTRFS_UUID_SIZE)) + if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid, BTRFS_UUID_SIZE)) { sblock->header_error = 1; + btrfs_warn_rl(fs_info, + "tree block %llu mirror %u has bad chunk tree uuid, has %pU want %pU", + sblock->logical, sblock->mirror_num, + h->chunk_tree_uuid, fs_info->chunk_tree_uuid); + goto out; + }
shash->tfm = fs_info->csum_shash; crypto_shash_init(shash); @@ -2079,9 +2092,27 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock) }
crypto_shash_final(shash, calculated_csum); - if (memcmp(calculated_csum, on_disk_csum, sctx->fs_info->csum_size)) + if (memcmp(calculated_csum, on_disk_csum, sctx->fs_info->csum_size)) { sblock->checksum_error = 1; + btrfs_warn_rl(fs_info, + "tree block %llu mirror %u has bad csum, has " CSUM_FMT " want " CSUM_FMT, + sblock->logical, sblock->mirror_num, + CSUM_FMT_VALUE(fs_info->csum_size, on_disk_csum), + CSUM_FMT_VALUE(fs_info->csum_size, calculated_csum)); + goto out; + } + + if (sector->generation != btrfs_stack_header_generation(h)) { + sblock->header_error = 1; + sblock->generation_error = 1; + btrfs_warn_rl(fs_info, + "tree block %llu mirror %u has bad generation, has %llu want %llu", + sblock->logical, sblock->mirror_num, + btrfs_stack_header_generation(h), + sector->generation); + }
+out: return sblock->header_error || sblock->checksum_error; }
From: Michael Grzeschik m.grzeschik@pengutronix.de
[ Upstream commit 32405e532d358a2f9d4befae928b9883c8597616 ]
Since we need to support legacy PHYs with the dwc3 controller, we enable this quirk on the ZynqMP platforms.
Signed-off-by: Michael Grzeschik m.grzeschik@pengutronix.de Link: https://lore.kernel.org/r/20221023215649.221726-1-m.grzeschik@pengutronix.de Signed-off-by: Michal Simek michal.simek@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 4325cb8526edc..f92df478f0eea 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -858,6 +858,7 @@ dwc3_0: usb@fe200000 {
 				clock-names = "bus_early", "ref";
 				iommus = <&smmu 0x860>;
 				snps,quirk-frame-length-adjustment = <0x20>;
+				snps,resume-hs-terminations;
 				/* dma-coherent; */
 			};
 		};
@@ -884,6 +885,7 @@ dwc3_1: usb@fe300000 {
 				clock-names = "bus_early", "ref";
 				iommus = <&smmu 0x861>;
 				snps,quirk-frame-length-adjustment = <0x20>;
+				snps,resume-hs-terminations;
 				/* dma-coherent; */
 			};
 		};
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 821ad23d0eaff73ef599ece39ecc77482df20a8c ]
Fix instrumentation bugs objtool found:
vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Tested-by: Tony Lindgren tony@atomide.com Tested-by: Ulf Hansson ulf.hansson@linaro.org Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Frederic Weisbecker frederic@kernel.org Link: https://lore.kernel.org/r/20230112195540.494977795@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/fpu/xcr.h | 4 ++-- arch/x86/include/asm/special_insns.h | 2 +- arch/x86/kernel/fpu/core.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xcr.h b/arch/x86/include/asm/fpu/xcr.h index 9656a5bc6feae..9a710c0604457 100644 --- a/arch/x86/include/asm/fpu/xcr.h +++ b/arch/x86/include/asm/fpu/xcr.h @@ -5,7 +5,7 @@ #define XCR_XFEATURE_ENABLED_MASK 0x00000000 #define XCR_XFEATURE_IN_USE_MASK 0x00000001
-static inline u64 xgetbv(u32 index) +static __always_inline u64 xgetbv(u32 index) { u32 eax, edx;
@@ -27,7 +27,7 @@ static inline void xsetbv(u32 index, u64 value) * * Callers should check X86_FEATURE_XGETBV1. */ -static inline u64 xfeatures_in_use(void) +static __always_inline u64 xfeatures_in_use(void) { return xgetbv(XCR_XFEATURE_IN_USE_MASK); } diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 35f709f619fb4..c2e322189f853 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -295,7 +295,7 @@ static inline int enqcmds(void __iomem *dst, const void *src) return 0; }
-static inline void tile_release(void) +static __always_inline void tile_release(void) { /* * Instruction opcode for TILERELEASE; supported in binutils diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 9baa89a8877d0..dccce58201b7c 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -853,12 +853,12 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr) * Initialize register state that may prevent from entering low-power idle. * This function will be invoked from the cpuidle driver only when needed. */ -void fpu_idle_fpregs(void) +noinstr void fpu_idle_fpregs(void) { /* Note: AMX_TILE being enabled implies XGETBV1 support */ if (cpu_feature_enabled(X86_FEATURE_AMX_TILE) && (xfeatures_in_use() & XFEATURE_MASK_XTILE)) { tile_release(); - fpregs_deactivate(¤t->thread.fpu); + __this_cpu_write(fpu_fpregs_owner_ctx, NULL); } }
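The recurring pattern in this patch and the PSCI patch later in the series: anything a noinstr function calls must itself be noinstr or be forced inline, so that no out-of-line call to instrumented code is ever emitted. A hedged sketch with hypothetical function names:

/* __always_inline: the body is folded into the caller, so no call
 * instruction leaves .noinstr.text */
static __always_inline u64 read_feature_word(void)	/* hypothetical */
{
	return xfeatures_in_use();	/* itself __always_inline after this patch */
}

noinstr void low_power_prepare(void)	/* hypothetical */
{
	if (read_feature_word() & XFEATURE_MASK_XTILE)
		tile_release();		/* also __always_inline */
}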
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 69d4c0d3218692ffa56b0e1b9c76c50c699d7044 ]
KASAN cannot just hijack the mem*() functions, it needs to emit __asan_mem*() variants if it wants instrumentation (other sanitizers already do this).
vmlinux.o: warning: objtool: sync_regs+0x24: call to memcpy() leaves .noinstr.text section
vmlinux.o: warning: objtool: vc_switch_off_ist+0xbe: call to memcpy() leaves .noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret+0x36: call to memset() leaves .noinstr.text section
vmlinux.o: warning: objtool: __sev_get_ghcb+0xa0: call to memcpy() leaves .noinstr.text section
vmlinux.o: warning: objtool: __sev_put_ghcb+0x35: call to memcpy() leaves .noinstr.text section
Remove the weak aliases to ensure nobody hijacks these functions and add them to the noinstr section.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Tested-by: Tony Lindgren tony@atomide.com Tested-by: Ulf Hansson ulf.hansson@linaro.org Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Frederic Weisbecker frederic@kernel.org Link: https://lore.kernel.org/r/20230112195542.028523143@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/lib/memcpy_64.S | 5 ++--- arch/x86/lib/memmove_64.S | 4 +++- arch/x86/lib/memset_64.S | 4 +++- mm/kasan/kasan.h | 4 ++++ mm/kasan/shadow.c | 38 ++++++++++++++++++++++++++++++++++++++ tools/objtool/check.c | 3 +++ 6 files changed, 53 insertions(+), 5 deletions(-)
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S index dd8cd8831251f..a64017602010e 100644 --- a/arch/x86/lib/memcpy_64.S +++ b/arch/x86/lib/memcpy_64.S @@ -8,7 +8,7 @@ #include <asm/alternative.h> #include <asm/export.h>
-.pushsection .noinstr.text, "ax" +.section .noinstr.text, "ax"
/* * We build a jump to memcpy_orig by default which gets NOPped out on @@ -43,7 +43,7 @@ SYM_TYPED_FUNC_START(__memcpy) SYM_FUNC_END(__memcpy) EXPORT_SYMBOL(__memcpy)
-SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy) +SYM_FUNC_ALIAS(memcpy, __memcpy) EXPORT_SYMBOL(memcpy)
/* @@ -184,4 +184,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig) RET SYM_FUNC_END(memcpy_orig)
-.popsection diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S index 724bbf83eb5b0..02661861e5dd9 100644 --- a/arch/x86/lib/memmove_64.S +++ b/arch/x86/lib/memmove_64.S @@ -13,6 +13,8 @@
#undef memmove
+.section .noinstr.text, "ax" + /* * Implement memmove(). This can handle overlap between src and dst. * @@ -213,5 +215,5 @@ SYM_FUNC_START(__memmove) SYM_FUNC_END(__memmove) EXPORT_SYMBOL(__memmove)
-SYM_FUNC_ALIAS_WEAK(memmove, __memmove) +SYM_FUNC_ALIAS(memmove, __memmove) EXPORT_SYMBOL(memmove) diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S index fc9ffd3ff3b21..6143b1a6fa2ca 100644 --- a/arch/x86/lib/memset_64.S +++ b/arch/x86/lib/memset_64.S @@ -6,6 +6,8 @@ #include <asm/alternative.h> #include <asm/export.h>
+.section .noinstr.text, "ax" + /* * ISO C memset - set a memory block to a byte value. This function uses fast * string to get better performance than the original function. The code is @@ -43,7 +45,7 @@ SYM_FUNC_START(__memset) SYM_FUNC_END(__memset) EXPORT_SYMBOL(__memset)
-SYM_FUNC_ALIAS_WEAK(memset, __memset) +SYM_FUNC_ALIAS(memset, __memset) EXPORT_SYMBOL(memset)
/* diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index ea8cf1310b1e8..71c15438afcfc 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -618,6 +618,10 @@ void __asan_set_shadow_f3(const void *addr, size_t size); void __asan_set_shadow_f5(const void *addr, size_t size); void __asan_set_shadow_f8(const void *addr, size_t size);
+void *__asan_memset(void *addr, int c, size_t len); +void *__asan_memmove(void *dest, const void *src, size_t len); +void *__asan_memcpy(void *dest, const void *src, size_t len); + void __hwasan_load1_noabort(unsigned long addr); void __hwasan_store1_noabort(unsigned long addr); void __hwasan_load2_noabort(unsigned long addr); diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 15cfb34d16a13..3703983a8e556 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -38,6 +38,12 @@ bool __kasan_check_write(const volatile void *p, unsigned int size) } EXPORT_SYMBOL(__kasan_check_write);
+#ifndef CONFIG_GENERIC_ENTRY +/* + * CONFIG_GENERIC_ENTRY relies on compiler emitted mem*() calls to not be + * instrumented. KASAN enabled toolchains should emit __asan_mem*() functions + * for the sites they want to instrument. + */ #undef memset void *memset(void *addr, int c, size_t len) { @@ -68,6 +74,38 @@ void *memcpy(void *dest, const void *src, size_t len)
return __memcpy(dest, src, len); } +#endif + +void *__asan_memset(void *addr, int c, size_t len) +{ + if (!kasan_check_range((unsigned long)addr, len, true, _RET_IP_)) + return NULL; + + return __memset(addr, c, len); +} +EXPORT_SYMBOL(__asan_memset); + +#ifdef __HAVE_ARCH_MEMMOVE +void *__asan_memmove(void *dest, const void *src, size_t len) +{ + if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) || + !kasan_check_range((unsigned long)dest, len, true, _RET_IP_)) + return NULL; + + return __memmove(dest, src, len); +} +EXPORT_SYMBOL(__asan_memmove); +#endif + +void *__asan_memcpy(void *dest, const void *src, size_t len) +{ + if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) || + !kasan_check_range((unsigned long)dest, len, true, _RET_IP_)) + return NULL; + + return __memcpy(dest, src, len); +} +EXPORT_SYMBOL(__asan_memcpy);
void kasan_poison(const void *addr, size_t size, u8 value, bool init) { diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 4b7c8b33069e5..3bd5bbfb4dee0 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1082,6 +1082,9 @@ static const char *uaccess_safe_builtin[] = { "__asan_store16_noabort", "__kasan_check_read", "__kasan_check_write", + "__asan_memset", + "__asan_memmove", + "__asan_memcpy", /* KASAN in-line */ "__asan_report_load_n_noabort", "__asan_report_load1_noabort",
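Conceptually, under KASAN the compiler rewrites a call in instrumented code such as

	memcpy(dst, src, n);

into

	__asan_memcpy(dst, src, n);

which range-checks both buffers before delegating to the real __memcpy(), while code in .noinstr.text keeps calling the now non-hijackable mem*() assembly symbols directly.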
From: Jens Axboe axboe@kernel.dk
[ Upstream commit cb3ea4b7671b7cfbac3ee609976b790aebd0bbda ]
We don't set it on PF_KTHREAD threads as they never return to userspace, and PF_IO_WORKER threads are identical in that regard. As they keep running in the kernel until they die, skip setting the FPU flag on them.
More of a cosmetic thing that was found while debugging an issue and pondering why the FPU flag was set on these threads.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/560c844c-f128-555b-40c6-31baff27537f@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/fpu/sched.h | 2 +- arch/x86/kernel/fpu/context.h | 2 +- arch/x86/kernel/fpu/core.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index b2486b2cbc6e0..c2d6cd78ed0c2 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -39,7 +39,7 @@ extern void fpu_flush_thread(void);
 static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
 	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
-	    !(current->flags & PF_KTHREAD)) {
+	    !(current->flags & (PF_KTHREAD | PF_IO_WORKER))) {
 		save_fpregs_to_fpstate(old_fpu);
 		/*
 		 * The save operation preserved register state, so the
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 958accf2ccf07..9fcfa5c4dad79 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -57,7 +57,7 @@ static inline void fpregs_restore_userregs(void)
 	struct fpu *fpu = &current->thread.fpu;
 	int cpu = smp_processor_id();
 
-	if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
+	if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_IO_WORKER)))
 		return;
 
 	if (!fpregs_state_valid(fpu, cpu)) {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index dccce58201b7c..caf33486dc5ee 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -426,7 +426,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 
 	this_cpu_write(in_kernel_fpu, true);
 
-	if (!(current->flags & PF_KTHREAD) &&
+	if (!(current->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
 	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		set_thread_flag(TIF_NEED_FPU_LOAD);
 		save_fpregs_to_fpstate(&current->thread.fpu);
From: Mark Rutland mark.rutland@arm.com
[ Upstream commit 393e2ea30aec634b37004d401863428e120d5e1b ]
The PSCI suspend code is currently instrumentable, which is not safe as instrumentation (e.g. ftrace) may try to make use of RCU during idle periods when RCU is not watching.
To fix this we need to ensure that psci_suspend_finisher() and anything it calls are not instrumented. We can do this fairly simply by marking psci_suspend_finisher() and the psci*_cpu_suspend() functions as noinstr, and the underlying helper functions as __always_inline.
When CONFIG_DEBUG_VIRTUAL=y, __pa_symbol() can expand to an out-of-line instrumented function, so we must use __pa_symbol_nodebug() within psci_suspend_finisher().
The raw SMCCC invocation functions are written in assembly, and are not subject to compiler instrumentation.
Signed-off-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20230126151323.349423061@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/psci/psci.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index 447ee4ea5c903..f78249fe2512a 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -108,9 +108,10 @@ bool psci_power_state_is_valid(u32 state) return !(state & ~valid_mask); }
-static unsigned long __invoke_psci_fn_hvc(unsigned long function_id, - unsigned long arg0, unsigned long arg1, - unsigned long arg2) +static __always_inline unsigned long +__invoke_psci_fn_hvc(unsigned long function_id, + unsigned long arg0, unsigned long arg1, + unsigned long arg2) { struct arm_smccc_res res;
@@ -118,9 +119,10 @@ static unsigned long __invoke_psci_fn_hvc(unsigned long function_id, return res.a0; }
-static unsigned long __invoke_psci_fn_smc(unsigned long function_id, - unsigned long arg0, unsigned long arg1, - unsigned long arg2) +static __always_inline unsigned long +__invoke_psci_fn_smc(unsigned long function_id, + unsigned long arg0, unsigned long arg1, + unsigned long arg2) { struct arm_smccc_res res;
@@ -128,7 +130,7 @@ static unsigned long __invoke_psci_fn_smc(unsigned long function_id, return res.a0; }
-static int psci_to_linux_errno(int errno) +static __always_inline int psci_to_linux_errno(int errno) { switch (errno) { case PSCI_RET_SUCCESS: @@ -169,7 +171,8 @@ int psci_set_osi_mode(bool enable) return psci_to_linux_errno(err); }
-static int __psci_cpu_suspend(u32 fn, u32 state, unsigned long entry_point) +static __always_inline int +__psci_cpu_suspend(u32 fn, u32 state, unsigned long entry_point) { int err;
@@ -177,13 +180,15 @@ static int __psci_cpu_suspend(u32 fn, u32 state, unsigned long entry_point) return psci_to_linux_errno(err); }
-static int psci_0_1_cpu_suspend(u32 state, unsigned long entry_point) +static __always_inline int +psci_0_1_cpu_suspend(u32 state, unsigned long entry_point) { return __psci_cpu_suspend(psci_0_1_function_ids.cpu_suspend, state, entry_point); }
-static int psci_0_2_cpu_suspend(u32 state, unsigned long entry_point) +static __always_inline int +psci_0_2_cpu_suspend(u32 state, unsigned long entry_point) { return __psci_cpu_suspend(PSCI_FN_NATIVE(0_2, CPU_SUSPEND), state, entry_point); @@ -450,10 +455,12 @@ late_initcall(psci_debugfs_init) #endif
#ifdef CONFIG_CPU_IDLE -static int psci_suspend_finisher(unsigned long state) +static noinstr int psci_suspend_finisher(unsigned long state) { u32 power_state = state; - phys_addr_t pa_cpu_resume = __pa_symbol(cpu_resume); + phys_addr_t pa_cpu_resume; + + pa_cpu_resume = __pa_symbol_nodebug((unsigned long)cpu_resume);
return psci_ops.cpu_suspend(power_state, pa_cpu_resume); }
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 5a5d7e9badd2cb8065db171961bd30bd3595e4b6 ]
In order to avoid WARN/BUG from generating nested or even recursive warnings, force rcu_is_watching() true during WARN/lockdep_rcu_suspicious().
Notably things like unwinding the stack can trigger rcu_dereference() warnings, which then triggers more unwinding which then triggers more warnings etc..
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20230126151323.408156109@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/context_tracking.h | 27 +++++++++++++++++++++++++++ kernel/locking/lockdep.c | 3 +++ kernel/panic.c | 5 +++++ lib/bug.c | 15 ++++++++++++++- 4 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index dcef4a9e4d63e..d4afa8508a806 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -130,9 +130,36 @@ static __always_inline unsigned long ct_state_inc(int incby) return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state)); }
+static __always_inline bool warn_rcu_enter(void) +{ + bool ret = false; + + /* + * Horrible hack to shut up recursive RCU isn't watching fail since + * lots of the actual reporting also relies on RCU. + */ + preempt_disable_notrace(); + if (rcu_dynticks_curr_cpu_in_eqs()) { + ret = true; + ct_state_inc(RCU_DYNTICKS_IDX); + } + + return ret; +} + +static __always_inline void warn_rcu_exit(bool rcu) +{ + if (rcu) + ct_state_inc(RCU_DYNTICKS_IDX); + preempt_enable_notrace(); +} + #else static inline void ct_idle_enter(void) { } static inline void ct_idle_exit(void) { } + +static __always_inline bool warn_rcu_enter(void) { return false; } +static __always_inline void warn_rcu_exit(bool rcu) { } #endif /* !CONFIG_CONTEXT_TRACKING_IDLE */
#endif diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index e3375bc40dadc..50d4863974e7a 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -55,6 +55,7 @@ #include <linux/rcupdate.h> #include <linux/kprobes.h> #include <linux/lockdep.h> +#include <linux/context_tracking.h>
#include <asm/sections.h>
@@ -6555,6 +6556,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) { struct task_struct *curr = current; int dl = READ_ONCE(debug_locks); + bool rcu = warn_rcu_enter();
/* Note: the following can be executed concurrently, so be careful. */ pr_warn("\n"); @@ -6595,5 +6597,6 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) lockdep_print_held_locks(curr); pr_warn("\nstack backtrace:\n"); dump_stack(); + warn_rcu_exit(rcu); } EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious); diff --git a/kernel/panic.c b/kernel/panic.c index 463c9295bc28a..487f5b03bf835 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -34,6 +34,7 @@ #include <linux/ratelimit.h> #include <linux/debugfs.h> #include <linux/sysfs.h> +#include <linux/context_tracking.h> #include <trace/events/error_report.h> #include <asm/sections.h>
@@ -679,6 +680,7 @@ void __warn(const char *file, int line, void *caller, unsigned taint, void warn_slowpath_fmt(const char *file, int line, unsigned taint, const char *fmt, ...) { + bool rcu = warn_rcu_enter(); struct warn_args args;
pr_warn(CUT_HERE); @@ -693,11 +695,13 @@ void warn_slowpath_fmt(const char *file, int line, unsigned taint, va_start(args.args, fmt); __warn(file, line, __builtin_return_address(0), taint, NULL, &args); va_end(args.args); + warn_rcu_exit(rcu); } EXPORT_SYMBOL(warn_slowpath_fmt); #else void __warn_printk(const char *fmt, ...) { + bool rcu = warn_rcu_enter(); va_list args;
pr_warn(CUT_HERE); @@ -705,6 +709,7 @@ void __warn_printk(const char *fmt, ...) va_start(args, fmt); vprintk(fmt, args); va_end(args); + warn_rcu_exit(rcu); } EXPORT_SYMBOL(__warn_printk); #endif diff --git a/lib/bug.c b/lib/bug.c index c223a2575b721..e0ff219899902 100644 --- a/lib/bug.c +++ b/lib/bug.c @@ -47,6 +47,7 @@ #include <linux/sched.h> #include <linux/rculist.h> #include <linux/ftrace.h> +#include <linux/context_tracking.h>
extern struct bug_entry __start___bug_table[], __stop___bug_table[];
@@ -153,7 +154,7 @@ struct bug_entry *find_bug(unsigned long bugaddr) return module_find_bug(bugaddr); }
-enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs) +static enum bug_trap_type __report_bug(unsigned long bugaddr, struct pt_regs *regs) { struct bug_entry *bug; const char *file; @@ -209,6 +210,18 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs) return BUG_TRAP_TYPE_BUG; }
+enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs) +{ + enum bug_trap_type ret; + bool rcu = false; + + rcu = warn_rcu_enter(); + ret = __report_bug(bugaddr, regs); + warn_rcu_exit(rcu); + + return ret; +} + static void clear_once_table(struct bug_entry *start, struct bug_entry *end) { struct bug_entry *bug;
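Usage mirrors the report_bug() wrapper above: bracket any path that may warn while RCU is in an extended quiescent state (a sketch with a hypothetical call site):

	bool rcu = warn_rcu_enter();	/* force RCU to watch if it wasn't */

	WARN_ON_ONCE(broken_condition);	/* hypothetical; unwinding may use rcu_dereference() */

	warn_rcu_exit(rcu);		/* drop back to the previous EQS state */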
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit c828441f21ddc819a28b5723a72e3c840e9de1c6 ]
The uncore subsystem for Meteor Lake is similar to the previous Alder Lake. The main difference is that MTL provides PMU support for different tiles, while ADL only provides PMU support for the whole package. On ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu, which provide a fixed counter for clockticks. Also, new MSR addresses are introduced on MTL.
The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for Meteor Lake.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/intel/uncore.c | 7 ++ arch/x86/events/intel/uncore.h | 1 + arch/x86/events/intel/uncore_snb.c | 161 +++++++++++++++++++++++++++++ 3 files changed, 169 insertions(+)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index 459b1aafd4d4a..27b34f5b87600 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -1765,6 +1765,11 @@ static const struct intel_uncore_init_fun adl_uncore_init __initconst = { .mmio_init = adl_uncore_mmio_init, };
+static const struct intel_uncore_init_fun mtl_uncore_init __initconst = { + .cpu_init = mtl_uncore_cpu_init, + .mmio_init = adl_uncore_mmio_init, +}; + static const struct intel_uncore_init_fun icx_uncore_init __initconst = { .cpu_init = icx_uncore_cpu_init, .pci_init = icx_uncore_pci_init, @@ -1832,6 +1837,8 @@ static const struct x86_cpu_id intel_uncore_match[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, &adl_uncore_init), X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, &adl_uncore_init), X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, &adl_uncore_init), + X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE, &mtl_uncore_init), + X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE_L, &mtl_uncore_init), X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &spr_uncore_init), X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, &spr_uncore_init), X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, &snr_uncore_init), diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h index e278e2e7c051a..305a54d88beee 100644 --- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -602,6 +602,7 @@ void skl_uncore_cpu_init(void); void icl_uncore_cpu_init(void); void tgl_uncore_cpu_init(void); void adl_uncore_cpu_init(void); +void mtl_uncore_cpu_init(void); void tgl_uncore_mmio_init(void); void tgl_l_uncore_mmio_init(void); void adl_uncore_mmio_init(void); diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c index 1f4869227efb9..7fd4334e12a17 100644 --- a/arch/x86/events/intel/uncore_snb.c +++ b/arch/x86/events/intel/uncore_snb.c @@ -109,6 +109,19 @@ #define PCI_DEVICE_ID_INTEL_RPL_23_IMC 0xA728 #define PCI_DEVICE_ID_INTEL_RPL_24_IMC 0xA729 #define PCI_DEVICE_ID_INTEL_RPL_25_IMC 0xA72A +#define PCI_DEVICE_ID_INTEL_MTL_1_IMC 0x7d00 +#define PCI_DEVICE_ID_INTEL_MTL_2_IMC 0x7d01 +#define PCI_DEVICE_ID_INTEL_MTL_3_IMC 0x7d02 +#define PCI_DEVICE_ID_INTEL_MTL_4_IMC 0x7d05 +#define PCI_DEVICE_ID_INTEL_MTL_5_IMC 0x7d10 +#define PCI_DEVICE_ID_INTEL_MTL_6_IMC 0x7d14 +#define PCI_DEVICE_ID_INTEL_MTL_7_IMC 0x7d15 +#define PCI_DEVICE_ID_INTEL_MTL_8_IMC 0x7d16 +#define PCI_DEVICE_ID_INTEL_MTL_9_IMC 0x7d21 +#define PCI_DEVICE_ID_INTEL_MTL_10_IMC 0x7d22 +#define PCI_DEVICE_ID_INTEL_MTL_11_IMC 0x7d23 +#define PCI_DEVICE_ID_INTEL_MTL_12_IMC 0x7d24 +#define PCI_DEVICE_ID_INTEL_MTL_13_IMC 0x7d28
#define IMC_UNCORE_DEV(a) \ @@ -205,6 +218,32 @@ #define ADL_UNC_ARB_PERFEVTSEL0 0x2FD0 #define ADL_UNC_ARB_MSR_OFFSET 0x8
+/* MTL Cbo register */ +#define MTL_UNC_CBO_0_PER_CTR0 0x2448 +#define MTL_UNC_CBO_0_PERFEVTSEL0 0x2442 + +/* MTL HAC_ARB register */ +#define MTL_UNC_HAC_ARB_CTR 0x2018 +#define MTL_UNC_HAC_ARB_CTRL 0x2012 + +/* MTL ARB register */ +#define MTL_UNC_ARB_CTR 0x2418 +#define MTL_UNC_ARB_CTRL 0x2412 + +/* MTL cNCU register */ +#define MTL_UNC_CNCU_FIXED_CTR 0x2408 +#define MTL_UNC_CNCU_FIXED_CTRL 0x2402 +#define MTL_UNC_CNCU_BOX_CTL 0x240e + +/* MTL sNCU register */ +#define MTL_UNC_SNCU_FIXED_CTR 0x2008 +#define MTL_UNC_SNCU_FIXED_CTRL 0x2002 +#define MTL_UNC_SNCU_BOX_CTL 0x200e + +/* MTL HAC_CBO register */ +#define MTL_UNC_HBO_CTR 0x2048 +#define MTL_UNC_HBO_CTRL 0x2042 + DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7"); DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15"); DEFINE_UNCORE_FORMAT_ATTR(chmask, chmask, "config:8-11"); @@ -598,6 +637,115 @@ void adl_uncore_cpu_init(void) uncore_msr_uncores = adl_msr_uncores; }
+static struct intel_uncore_type mtl_uncore_cbox = { + .name = "cbox", + .num_counters = 2, + .perf_ctr_bits = 48, + .perf_ctr = MTL_UNC_CBO_0_PER_CTR0, + .event_ctl = MTL_UNC_CBO_0_PERFEVTSEL0, + .event_mask = ADL_UNC_RAW_EVENT_MASK, + .msr_offset = SNB_UNC_CBO_MSR_OFFSET, + .ops = &icl_uncore_msr_ops, + .format_group = &adl_uncore_format_group, +}; + +static struct intel_uncore_type mtl_uncore_hac_arb = { + .name = "hac_arb", + .num_counters = 2, + .num_boxes = 2, + .perf_ctr_bits = 48, + .perf_ctr = MTL_UNC_HAC_ARB_CTR, + .event_ctl = MTL_UNC_HAC_ARB_CTRL, + .event_mask = ADL_UNC_RAW_EVENT_MASK, + .msr_offset = SNB_UNC_CBO_MSR_OFFSET, + .ops = &icl_uncore_msr_ops, + .format_group = &adl_uncore_format_group, +}; + +static struct intel_uncore_type mtl_uncore_arb = { + .name = "arb", + .num_counters = 2, + .num_boxes = 2, + .perf_ctr_bits = 48, + .perf_ctr = MTL_UNC_ARB_CTR, + .event_ctl = MTL_UNC_ARB_CTRL, + .event_mask = ADL_UNC_RAW_EVENT_MASK, + .msr_offset = SNB_UNC_CBO_MSR_OFFSET, + .ops = &icl_uncore_msr_ops, + .format_group = &adl_uncore_format_group, +}; + +static struct intel_uncore_type mtl_uncore_hac_cbox = { + .name = "hac_cbox", + .num_counters = 2, + .num_boxes = 2, + .perf_ctr_bits = 48, + .perf_ctr = MTL_UNC_HBO_CTR, + .event_ctl = MTL_UNC_HBO_CTRL, + .event_mask = ADL_UNC_RAW_EVENT_MASK, + .msr_offset = SNB_UNC_CBO_MSR_OFFSET, + .ops = &icl_uncore_msr_ops, + .format_group = &adl_uncore_format_group, +}; + +static void mtl_uncore_msr_init_box(struct intel_uncore_box *box) +{ + wrmsrl(uncore_msr_box_ctl(box), SNB_UNC_GLOBAL_CTL_EN); +} + +static struct intel_uncore_ops mtl_uncore_msr_ops = { + .init_box = mtl_uncore_msr_init_box, + .disable_event = snb_uncore_msr_disable_event, + .enable_event = snb_uncore_msr_enable_event, + .read_counter = uncore_msr_read_counter, +}; + +static struct intel_uncore_type mtl_uncore_cncu = { + .name = "cncu", + .num_counters = 1, + .num_boxes = 1, + .box_ctl = MTL_UNC_CNCU_BOX_CTL, + .fixed_ctr_bits = 48, + .fixed_ctr = MTL_UNC_CNCU_FIXED_CTR, + .fixed_ctl = MTL_UNC_CNCU_FIXED_CTRL, + .single_fixed = 1, + .event_mask = SNB_UNC_CTL_EV_SEL_MASK, + .format_group = &icl_uncore_clock_format_group, + .ops = &mtl_uncore_msr_ops, + .event_descs = icl_uncore_events, +}; + +static struct intel_uncore_type mtl_uncore_sncu = { + .name = "sncu", + .num_counters = 1, + .num_boxes = 1, + .box_ctl = MTL_UNC_SNCU_BOX_CTL, + .fixed_ctr_bits = 48, + .fixed_ctr = MTL_UNC_SNCU_FIXED_CTR, + .fixed_ctl = MTL_UNC_SNCU_FIXED_CTRL, + .single_fixed = 1, + .event_mask = SNB_UNC_CTL_EV_SEL_MASK, + .format_group = &icl_uncore_clock_format_group, + .ops = &mtl_uncore_msr_ops, + .event_descs = icl_uncore_events, +}; + +static struct intel_uncore_type *mtl_msr_uncores[] = { + &mtl_uncore_cbox, + &mtl_uncore_hac_arb, + &mtl_uncore_arb, + &mtl_uncore_hac_cbox, + &mtl_uncore_cncu, + &mtl_uncore_sncu, + NULL +}; + +void mtl_uncore_cpu_init(void) +{ + mtl_uncore_cbox.num_boxes = icl_get_cbox_num(); + uncore_msr_uncores = mtl_msr_uncores; +} + enum { SNB_PCI_UNCORE_IMC, }; @@ -1264,6 +1412,19 @@ static const struct pci_device_id tgl_uncore_pci_ids[] = { IMC_UNCORE_DEV(RPL_23), IMC_UNCORE_DEV(RPL_24), IMC_UNCORE_DEV(RPL_25), + IMC_UNCORE_DEV(MTL_1), + IMC_UNCORE_DEV(MTL_2), + IMC_UNCORE_DEV(MTL_3), + IMC_UNCORE_DEV(MTL_4), + IMC_UNCORE_DEV(MTL_5), + IMC_UNCORE_DEV(MTL_6), + IMC_UNCORE_DEV(MTL_7), + IMC_UNCORE_DEV(MTL_8), + IMC_UNCORE_DEV(MTL_9), + IMC_UNCORE_DEV(MTL_10), + IMC_UNCORE_DEV(MTL_11), + IMC_UNCORE_DEV(MTL_12), + IMC_UNCORE_DEV(MTL_13), { /* end: all zeroes */ } };