(second try, sending with mailx...)
After 3 days of successfully running 5.4.143 with this patch attached
and no issues, on a production workload (host + vms) of a busy
webserver and mysql database, I request queueing this for a future 5.4
stable, like the 5.10 one requested by Borislav; copying his mail text
in the hope that this is best form.
please queue for 5.4 stable
See https://bugzilla.kernel.org/show_bug.cgi?id=214159 for more info.
---
Commit 3a7956e25e1d7b3c148569e78895e1f3178122a9 upstream.
The kthread_is_per_cpu() construct relies on only being called on
PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the
following usage pattern:
if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
However, as reported by syzcaller, this is broken. The scenario is:
CPU0 CPU1 (running p)
(p->flags & PF_KTHREAD) // true
begin_new_exec()
me->flags &= ~(PF_KTHREAD|...);
kthread_is_per_cpu(p)
to_kthread(p)
WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT*
Introduce __to_kthread() that omits the WARN and is sure to check both
values.
Use this to remove the problematic pattern for kthread_is_per_cpu()
and fix a number of other kthread_*() functions that have similar
issues but are currently not used in ways that would expose the
problem.
Notably kthread_func() is only ever called on 'current', while
kthread_probe_data() is only used for PF_WQ_WORKER, which implies the
task is from kthread_create*().
Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Patrick Schaaf <bof(a)bof.de>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b2bac5d929d2..22750a8af83e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -76,6 +76,25 @@ static inline struct kthread *to_kthread(struct task_struct *k)
return (__force void *)k->set_child_tid;
}
+/*
+ * Variant of to_kthread() that doesn't assume @p is a kthread.
+ *
+ * Per construction; when:
+ *
+ * (p->flags & PF_KTHREAD) && p->set_child_tid
+ *
+ * the task is both a kthread and struct kthread is persistent. However
+ * PF_KTHREAD on it's own is not, kernel_thread() can exec() (See umh.c and
+ * begin_new_exec()).
+ */
+static inline struct kthread *__to_kthread(struct task_struct *p)
+{
+ void *kthread = (__force void *)p->set_child_tid;
+ if (kthread && !(p->flags & PF_KTHREAD))
+ kthread = NULL;
+ return kthread;
+}
+
void free_kthread_struct(struct task_struct *k)
{
struct kthread *kthread;
@@ -176,10 +195,11 @@ void *kthread_data(struct task_struct *task)
*/
void *kthread_probe_data(struct task_struct *task)
{
- struct kthread *kthread = to_kthread(task);
+ struct kthread *kthread = __to_kthread(task);
void *data = NULL;
- probe_kernel_read(&data, &kthread->data, sizeof(data));
+ if (kthread)
+ probe_kernel_read(&data, &kthread->data, sizeof(data));
return data;
}
@@ -490,9 +510,9 @@ void kthread_set_per_cpu(struct task_struct *k, int cpu)
set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
}
-bool kthread_is_per_cpu(struct task_struct *k)
+bool kthread_is_per_cpu(struct task_struct *p)
{
- struct kthread *kthread = to_kthread(k);
+ struct kthread *kthread = __to_kthread(p);
if (!kthread)
return false;
@@ -1272,11 +1292,9 @@ EXPORT_SYMBOL(kthread_destroy_worker);
*/
void kthread_associate_blkcg(struct cgroup_subsys_state *css)
{
- struct kthread *kthread;
+ struct kthread *kthread = __to_kthread(current);
+
- if (!(current->flags & PF_KTHREAD))
- return;
- kthread = to_kthread(current);
if (!kthread)
return;
@@ -1298,13 +1316,10 @@ EXPORT_SYMBOL(kthread_associate_blkcg);
*/
struct cgroup_subsys_state *kthread_blkcg(void)
{
- struct kthread *kthread;
+ struct kthread *kthread = __to_kthread(current);
- if (current->flags & PF_KTHREAD) {
- kthread = to_kthread(current);
- if (kthread)
- return kthread->blkcg_css;
- }
+ if (kthread)
+ return kthread->blkcg_css;
return NULL;
}
EXPORT_SYMBOL(kthread_blkcg);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74cb20f32f72..87d9fad9d01d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7301,7 +7301,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
return 0;
/* Disregard pcpu kthreads; they are where they need to be. */
- if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+ if (kthread_is_per_cpu(p))
return 0;
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
[ Upstream commit a49145acfb975d921464b84fe00279f99827d816 ]
A fb_ioctl() FBIOPUT_VSCREENINFO call with invalid xres setting
or yres setting in struct fb_var_screeninfo will result in a
KASAN: vmalloc-out-of-bounds failure in bitfill_aligned() as
the margins are being cleared. The margins are cleared in
chunks and if the xres setting or yres setting is a value of
zero upto the chunk size, the failure will occur.
Add a margin check to validate xres and yres settings.
Note that, this patch needs special handling to backport it to linux
kernel 4.19, 4.14, 4.9, 4.4.
Signed-off-by: George Kennedy <george.kennedy(a)oracle.com>
Reported-by: syzbot+e5fd3e65515b48c02a30(a)syzkaller.appspotmail.com
Reviewed-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: Dhaval Giani <dhaval.giani(a)oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie(a)samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1594149963-13801-1-git-send-e…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/video/fbdev/core/fbmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 84845275dbef..de04c097d67c 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -991,6 +991,10 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo *var)
goto done;
}
+ /* bitfill_aligned() assumes that it's at least 8x8 */
+ if (var->xres < 8 || var->yres < 8)
+ return -EINVAL;
+
ret = info->fbops->fb_check_var(var, info);
if (ret)
--
2.25.1
of_parse_thermal_zones() parses the thermal-zones node and registers a
thermal_zone device for each subnode. However, if a thermal zone is
consuming a thermal sensor and that thermal sensor device hasn't probed
yet, an attempt to set trip_point_*_temp for that thermal zone device
can cause a NULL pointer dereference. Fix it.
console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp
...
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
Call trace:
of_thermal_set_trip_temp+0x40/0xc4
trip_point_temp_store+0xc0/0x1dc
dev_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc
do_el0_svc+0x28/0xa0
el0_svc+0x14/0x24
el0_sync_handler+0x88/0xec
el0_sync+0x1c0/0x200
Cc: stable(a)vger.kernel.org
Suggested-by: David Collins <quic_collinsd(a)quicinc.com>
Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram(a)quicinc.com>
---
drivers/thermal/thermal_of.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 6379f26..ba53252 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -301,7 +301,7 @@ static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
if (trip >= data->ntrips || trip < 0)
return -EDOM;
- if (data->ops->set_trip_temp) {
+ if (data->ops && data->ops->set_trip_temp) {
int ret;
ret = data->ops->set_trip_temp(data->sensor_data, trip, temp);
--
2.7.4
The patch titled
Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
has been removed from the -mm tree. Its filename was
hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
Guillaume Morin reported hitting the following WARNING followed by GPF or
NULL pointer deference either in cgroups_destroy or in the kill_css path.:
percpu ref (css_release) <= 0 (-1) after switching to atomic
WARNING: CPU: 23 PID: 130 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x127/0x130
CPU: 23 PID: 130 Comm: ksoftirqd/23 Kdump: loaded Tainted: G O 5.10.60 #1
RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x127/0x130
Call Trace:
rcu_core+0x30f/0x530
rcu_core_si+0xe/0x10
__do_softirq+0x103/0x2a2
? sort_range+0x30/0x30
run_ksoftirqd+0x2b/0x40
smpboot_thread_fn+0x11a/0x170
kthread+0x10a/0x140
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x22/0x30
Upon further examination, it was discovered that the css structure was
associated with hugetlb reservations.
For private hugetlb mappings the vma points to a reserve map that contains
a pointer to the css. At mmap time, reservations are set up and a
reference to the css is taken. This reference is dropped in the vma close
operation; hugetlb_vm_op_close. However, if a vma is split no additional
reference to the css is taken yet hugetlb_vm_op_close will be called twice
for the split vma resulting in an underflow.
Fix by taking another reference in hugetlb_vm_op_open. Note that the
reference is only taken for the owner of the reserve map. In the more
common fork case, the pointer to the reserve map is cleared for non-owning
vmas.
Link: https://lkml.kernel.org/r/20210830215015.155224-1-mike.kravetz@oracle.com
Fixes: e9fe92ae0cd2 ("hugetlb_cgroup: add reservation accounting for
private mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Guillaume Morin <guillaume(a)morinfr.org>
Suggested-by: Guillaume Morin <guillaume(a)morinfr.org>
Tested-by: Guillaume Morin <guillaume(a)morinfr.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb_cgroup.h | 12 ++++++++++++
mm/hugetlb.c | 4 +++-
2 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb_cgroup.h~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/include/linux/hugetlb_cgroup.h
@@ -121,6 +121,13 @@ static inline void hugetlb_cgroup_put_rs
css_put(&h_cg->css);
}
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+ if (resv_map->css)
+ css_get(resv_map->css);
+}
+
extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr);
extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@@ -199,6 +206,11 @@ static inline void hugetlb_cgroup_put_rs
{
}
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+}
+
static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr)
{
--- a/mm/hugetlb.c~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/mm/hugetlb.c
@@ -4106,8 +4106,10 @@ static void hugetlb_vm_op_open(struct vm
* after this open call completes. It is therefore safe to take a
* new reference here without additional locking.
*/
- if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+ if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+ resv_map_dup_hugetlb_cgroup_uncharge_info(resv);
kref_get(&resv->refs);
+ }
}
static void hugetlb_vm_op_close(struct vm_area_struct *vma)
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: mm: fix panic caused by __page_handle_poison()
has been removed from the -mm tree. Its filename was
mm-fix-panic-caused-by-__page_handle_poison.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michael Wang <yun.wang(a)linux.alibaba.com>
Subject: mm: fix panic caused by __page_handle_poison()
In commit 510d25c92ec4 ("mm/hwpoison: disable pcp for
page_handle_poison()"), __page_handle_poison() was introduced, and if we
mark:
RET_A = dissolve_free_huge_page();
RET_B = take_page_off_buddy();
then __page_handle_poison was supposed to return TRUE When RET_A == 0 &&
RET_B == TRUE
But since it failed to take care the case when RET_A is -EBUSY or -ENOMEM,
and just return the ret as a bool which actually become TRUE, it break the
original logic.
The following result is a huge page in freelist but was
referenced as poisoned, and lead into the final panic:
kernel BUG at mm/internal.h:95!
invalid opcode: 0000 [#1] SMP PTI
skip...
RIP: 0010:set_page_refcounted mm/internal.h:95 [inline]
RIP: 0010:remove_hugetlb_page+0x23c/0x240 mm/hugetlb.c:1371
skip...
Call Trace:
remove_pool_huge_page+0xe4/0x110 mm/hugetlb.c:1892
return_unused_surplus_pages+0x8d/0x150 mm/hugetlb.c:2272
hugetlb_acct_memory.part.91+0x524/0x690 mm/hugetlb.c:4017
This patch replaces 'bool' with 'int' to handle RET_A correctly.
Link: https://lkml.kernel.org/r/61782ac6-1e8a-4f6f-35e6-e94fce3b37f5@linux.alibab…
Fixes: 510d25c92ec4 ("mm/hwpoison: disable pcp for page_handle_poison()")
Signed-off-by: Michael Wang <yun.wang(a)linux.alibaba.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Abaci <abaci(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory-failure.c~mm-fix-panic-caused-by-__page_handle_poison
+++ a/mm/memory-failure.c
@@ -68,7 +68,7 @@ atomic_long_t num_poisoned_pages __read_
static bool __page_handle_poison(struct page *page)
{
- bool ret;
+ int ret;
zone_pcp_disable(page_zone(page));
ret = dissolve_free_huge_page(page);
@@ -76,7 +76,7 @@ static bool __page_handle_poison(struct
ret = take_page_off_buddy(page);
zone_pcp_enable(page_zone(page));
- return ret;
+ return ret > 0;
}
static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
_
Patches currently in -mm which might be from yun.wang(a)linux.alibaba.com are