From: Mathias Krause <minipli(a)googlemail.com>
[ Upstream commit 1bd845bcb41d5b7f83745e0cb99273eb376f2ec5 ]
The parallel queue per-cpu data structure gets initialized only for CPUs
in the 'pcpu' CPU mask set. This is not sufficient as the reorder timer
may run on a different CPU and might wrongly decide it's the target CPU
for the next reorder item as per-cpu memory gets memset(0) and we might
be waiting for the first CPU in cpumask.pcpu, i.e. cpu_index 0.
Make the '__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index'
compare in padata_get_next() fail in this case by initializing the
cpu_index member of all per-cpu parallel queues. Use -1 for unused ones.
Signed-off-by: Mathias Krause <minipli(a)googlemail.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Daniel Jordan <daniel.m.jordan(a)oracle.com>
---
kernel/padata.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/padata.c b/kernel/padata.c
index 40a0ebb8ea51..858e82179744 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -462,8 +462,14 @@ static void padata_init_pqueues(struct parallel_data *pd)
struct padata_parallel_queue *pqueue;
cpu_index = 0;
- for_each_cpu(cpu, pd->cpumask.pcpu) {
+ for_each_possible_cpu(cpu) {
pqueue = per_cpu_ptr(pd->pqueue, cpu);
+
+ if (!cpumask_test_cpu(cpu, pd->cpumask.pcpu)) {
+ pqueue->cpu_index = -1;
+ continue;
+ }
+
pqueue->pd = pd;
pqueue->cpu_index = cpu_index;
cpu_index++;
--
2.26.2
The patch titled
Subject: mm: ptdump: expand type of 'val' in note_page()
has been added to the -mm tree. Its filename is
mm-ptdump-expand-type-of-val-in-note_page.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-ptdump-expand-type-of-val-in-no…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-ptdump-expand-type-of-val-in-no…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Steven Price <steven.price(a)arm.com>
Subject: mm: ptdump: expand type of 'val' in note_page()
The page table entry is passed in the 'val' argument to note_page(),
however this was previously an "unsigned long" which is fine on 64-bit
platforms. But for 32 bit x86 it is not always big enough to contain a
page table entry which may be 64 bits.
Change the type to u64 to ensure that it is always big enough.
Link: http://lkml.kernel.org/r/20200521152308.33096-3-steven.price@arm.com
Signed-off-by: Steven Price <steven.price(a)arm.com>
Reported-by: Jan Beulich <jbeulich(a)suse.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm64/mm/dump.c | 2 +-
arch/x86/mm/dump_pagetables.c | 2 +-
include/linux/ptdump.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm64/mm/dump.c~mm-ptdump-expand-type-of-val-in-note_page
+++ a/arch/arm64/mm/dump.c
@@ -247,7 +247,7 @@ static void note_prot_wx(struct pg_state
}
static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
- unsigned long val)
+ u64 val)
{
struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
static const char units[] = "KMGTPE";
--- a/arch/x86/mm/dump_pagetables.c~mm-ptdump-expand-type-of-val-in-note_page
+++ a/arch/x86/mm/dump_pagetables.c
@@ -273,7 +273,7 @@ static void effective_prot(struct ptdump
* print what we collected so far.
*/
static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
- unsigned long val)
+ u64 val)
{
struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
pgprotval_t new_prot, new_eff;
--- a/include/linux/ptdump.h~mm-ptdump-expand-type-of-val-in-note_page
+++ a/include/linux/ptdump.h
@@ -13,7 +13,7 @@ struct ptdump_range {
struct ptdump_state {
/* level is 0:PGD to 4:PTE, or -1 if unknown */
void (*note_page)(struct ptdump_state *st, unsigned long addr,
- int level, unsigned long val);
+ int level, u64 val);
void (*effective_prot)(struct ptdump_state *st, int level, u64 val);
const struct ptdump_range *range;
};
_
Patches currently in -mm which might be from steven.price(a)arm.com are
x86-mm-ptdump-calculate-effective-permissions-correctly.patch
mm-ptdump-expand-type-of-val-in-note_page.patch
The patch titled
Subject: x86: mm: ptdump: calculate effective permissions correctly
has been added to the -mm tree. Its filename is
x86-mm-ptdump-calculate-effective-permissions-correctly.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-ptdump-calculate-effective-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-ptdump-calculate-effective-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Steven Price <steven.price(a)arm.com>
Subject: x86: mm: ptdump: calculate effective permissions correctly
Patch series "Fix W+X debug feature on x86"
Jan alerted me[1] that the W+X detection debug feature was broken in x86
by my change[2] to switch x86 to use the generic ptdump infrastructure.
Fundamentally the approach of trying to move the calculation of effective
permissions into note_page() was broken because note_page() is only called
for 'leaf' entries and the effective permissions are passed down via the
internal nodes of the page tree. The solution I've taken here is to
create a new (optional) callback which is called for all nodes of the page
tree and therefore can calculate the effective permissions.
Secondly on some configurations (32 bit with PAE) "unsigned long" is not
large enough to store the table entries. The fix here is simple - let's
just use a u64.
[1] https://lore.kernel.org/lkml/d573dc7e-e742-84de-473d-f971142fa319@suse.com/
[2] 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
This patch (of 2):
By switching the x86 page table dump code to use the generic code the
effective permissions are no longer calculated correctly because the
note_page() function is only called for *leaf* entries. To calculate the
actual effective permissions it is necessary to observe the full hierarchy
of the page tree.
Introduce a new callback for ptdump which is called for every entry and
can therefore update the prot_levels array correctly. note_page() can
then simply access the appropriate element in the array.
Link: http://lkml.kernel.org/r/20200521152308.33096-1-steven.price@arm.com
Link: http://lkml.kernel.org/r/20200521152308.33096-2-steven.price@arm.com
Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
Signed-off-by: Steven Price <steven.price(a)arm.com>
Reported-by: Jan Beulich <jbeulich(a)suse.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/mm/dump_pagetables.c | 31 +++++++++++++++++++------------
include/linux/ptdump.h | 1 +
mm/ptdump.c | 17 ++++++++++++++++-
3 files changed, 36 insertions(+), 13 deletions(-)
--- a/arch/x86/mm/dump_pagetables.c~x86-mm-ptdump-calculate-effective-permissions-correctly
+++ a/arch/x86/mm/dump_pagetables.c
@@ -249,10 +249,22 @@ static void note_wx(struct pg_state *st,
(void *)st->start_address);
}
-static inline pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
+static void effective_prot(struct ptdump_state *pt_st, int level, u64 val)
{
- return (prot1 & prot2 & (_PAGE_USER | _PAGE_RW)) |
- ((prot1 | prot2) & _PAGE_NX);
+ struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
+ pgprotval_t prot = val & PTE_FLAGS_MASK;
+ pgprotval_t effective;
+
+ if (level > 0) {
+ pgprotval_t higher_prot = st->prot_levels[level - 1];
+
+ effective = (higher_prot & prot & (_PAGE_USER | _PAGE_RW)) |
+ ((higher_prot | prot) & _PAGE_NX);
+ } else {
+ effective = prot;
+ }
+
+ st->prot_levels[level] = effective;
}
/*
@@ -270,16 +282,10 @@ static void note_page(struct ptdump_stat
struct seq_file *m = st->seq;
new_prot = val & PTE_FLAGS_MASK;
+ new_eff = st->prot_levels[level];
- if (level > 0) {
- new_eff = effective_prot(st->prot_levels[level - 1],
- new_prot);
- } else {
- new_eff = new_prot;
- }
-
- if (level >= 0)
- st->prot_levels[level] = new_eff;
+ if (!val)
+ new_eff = 0;
/*
* If we have a "break" in the series, we need to flush the state that
@@ -374,6 +380,7 @@ static void ptdump_walk_pgd_level_core(s
struct pg_state st = {
.ptdump = {
.note_page = note_page,
+ .effective_prot = effective_prot,
.range = ptdump_ranges
},
.level = -1,
--- a/include/linux/ptdump.h~x86-mm-ptdump-calculate-effective-permissions-correctly
+++ a/include/linux/ptdump.h
@@ -14,6 +14,7 @@ struct ptdump_state {
/* level is 0:PGD to 4:PTE, or -1 if unknown */
void (*note_page)(struct ptdump_state *st, unsigned long addr,
int level, unsigned long val);
+ void (*effective_prot)(struct ptdump_state *st, int level, u64 val);
const struct ptdump_range *range;
};
--- a/mm/ptdump.c~x86-mm-ptdump-calculate-effective-permissions-correctly
+++ a/mm/ptdump.c
@@ -36,6 +36,9 @@ static int ptdump_pgd_entry(pgd_t *pgd,
return note_kasan_page_table(walk, addr);
#endif
+ if (st->effective_prot)
+ st->effective_prot(st, 0, pgd_val(val));
+
if (pgd_leaf(val))
st->note_page(st, addr, 0, pgd_val(val));
@@ -53,6 +56,9 @@ static int ptdump_p4d_entry(p4d_t *p4d,
return note_kasan_page_table(walk, addr);
#endif
+ if (st->effective_prot)
+ st->effective_prot(st, 1, p4d_val(val));
+
if (p4d_leaf(val))
st->note_page(st, addr, 1, p4d_val(val));
@@ -70,6 +76,9 @@ static int ptdump_pud_entry(pud_t *pud,
return note_kasan_page_table(walk, addr);
#endif
+ if (st->effective_prot)
+ st->effective_prot(st, 2, pud_val(val));
+
if (pud_leaf(val))
st->note_page(st, addr, 2, pud_val(val));
@@ -87,6 +96,8 @@ static int ptdump_pmd_entry(pmd_t *pmd,
return note_kasan_page_table(walk, addr);
#endif
+ if (st->effective_prot)
+ st->effective_prot(st, 3, pmd_val(val));
if (pmd_leaf(val))
st->note_page(st, addr, 3, pmd_val(val));
@@ -97,8 +108,12 @@ static int ptdump_pte_entry(pte_t *pte,
unsigned long next, struct mm_walk *walk)
{
struct ptdump_state *st = walk->private;
+ pte_t val = READ_ONCE(*pte);
+
+ if (st->effective_prot)
+ st->effective_prot(st, 4, pte_val(val));
- st->note_page(st, addr, 4, pte_val(READ_ONCE(*pte)));
+ st->note_page(st, addr, 4, pte_val(val));
return 0;
}
_
Patches currently in -mm which might be from steven.price(a)arm.com are
x86-mm-ptdump-calculate-effective-permissions-correctly.patch
mm-ptdump-expand-type-of-val-in-note_page.patch
I hope you are doing great?
This is Felix from Toronto-Canada. I have a lucrative business
offer that will benefit us both immensely within a very short
period of time. However, I need your initial approval of interest
prior to further and complete details regarding the deal.
Thanks,
Felix.
The ABI is broken and we cannot support it properly. Turn it off.
If this causes a meaningful performance regression for someone, KVM
can introduce an improved ABI that is supportable.
Cc: stable(a)vger.kernel.org
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
arch/x86/kernel/kvm.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 93ab0cbd304e..e6f2aefa298b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -318,11 +318,26 @@ static void kvm_guest_cpu_init(void)
pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
-#ifdef CONFIG_PREEMPTION
- pa |= KVM_ASYNC_PF_SEND_ALWAYS;
-#endif
pa |= KVM_ASYNC_PF_ENABLED;
+ /*
+ * We do not set KVM_ASYNC_PF_SEND_ALWAYS. With the current
+ * KVM paravirt ABI, the following scenario is possible:
+ *
+ * #PF: async page fault (KVM_PV_REASON_PAGE_NOT_PRESENT)
+ * NMI before CR2 or KVM_PF_REASON_PAGE_NOT_PRESENT
+ * NMI accesses user memory, e.g. due to perf
+ * #PF: normal page fault
+ * #PF reads CR2 and apf_reason -- apf_reason should be 0
+ *
+ * outer #PF reads CR2 and apf_reason -- apf_reason should be
+ * KVM_PV_REASON_PAGE_NOT_PRESENT
+ *
+ * There is no possible way that both reads of CR2 and
+ * apf_reason get the correct values. Fixing this would
+ * require paravirt ABI changes.
+ */
+
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
--
2.24.1
I noticed that commit 07928d9bfc81 "padata: Remove broken queue
flushing" has been backported to most stable branches, but commit
6fc4dbcf0276 "padata: Replace delayed timer with immediate workqueue in
padata_reorder" has not.
Is this correct? What prevents the parallel_data ref-count from
dropping to 0 while the timer is scheduled?
Ben.
--
Ben Hutchings
Larkinson's Law: All laws are basically false.
This is a note to let you know that I've just added the patch titled
kobject: Make sure the parent does not get released before its
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 4ef12f7198023c09ad6d25b652bd8748c965c7fa Mon Sep 17 00:00:00 2001
From: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Date: Wed, 13 May 2020 18:18:40 +0300
Subject: kobject: Make sure the parent does not get released before its
children
In the function kobject_cleanup(), kobject_del(kobj) is
called before the kobj->release(). That makes it possible to
release the parent of the kobject before the kobject itself.
To fix that, adding function __kboject_del() that does
everything that kobject_del() does except release the parent
reference. kobject_cleanup() then calls __kobject_del()
instead of kobject_del(), and separately decrements the
reference count of the parent kobject after kobj->release()
has been called.
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Reported-by: kernel test robot <rong.a.chen(a)intel.com>
Fixes: 7589238a8cf3 ("Revert "software node: Simplify software_node_release() function"")
Suggested-by: "Rafael J. Wysocki" <rafael(a)kernel.org>
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
Tested-by: Brendan Higgins <brendanhiggins(a)google.com>
Acked-by: Randy Dunlap <rdunlap(a)infradead.org>
Link: https://lore.kernel.org/r/20200513151840.36400-1-heikki.krogerus@linux.inte…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/kobject.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index 83198cb37d8d..2bd631460e18 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -599,14 +599,7 @@ int kobject_move(struct kobject *kobj, struct kobject *new_parent)
}
EXPORT_SYMBOL_GPL(kobject_move);
-/**
- * kobject_del() - Unlink kobject from hierarchy.
- * @kobj: object.
- *
- * This is the function that should be called to delete an object
- * successfully added via kobject_add().
- */
-void kobject_del(struct kobject *kobj)
+static void __kobject_del(struct kobject *kobj)
{
struct kernfs_node *sd;
const struct kobj_type *ktype;
@@ -625,9 +618,23 @@ void kobject_del(struct kobject *kobj)
kobj->state_in_sysfs = 0;
kobj_kset_leave(kobj);
- kobject_put(kobj->parent);
kobj->parent = NULL;
}
+
+/**
+ * kobject_del() - Unlink kobject from hierarchy.
+ * @kobj: object.
+ *
+ * This is the function that should be called to delete an object
+ * successfully added via kobject_add().
+ */
+void kobject_del(struct kobject *kobj)
+{
+ struct kobject *parent = kobj->parent;
+
+ __kobject_del(kobj);
+ kobject_put(parent);
+}
EXPORT_SYMBOL(kobject_del);
/**
@@ -663,6 +670,7 @@ EXPORT_SYMBOL(kobject_get_unless_zero);
*/
static void kobject_cleanup(struct kobject *kobj)
{
+ struct kobject *parent = kobj->parent;
struct kobj_type *t = get_ktype(kobj);
const char *name = kobj->name;
@@ -684,7 +692,7 @@ static void kobject_cleanup(struct kobject *kobj)
if (kobj->state_in_sysfs) {
pr_debug("kobject: '%s' (%p): auto cleanup kobject_del\n",
kobject_name(kobj), kobj);
- kobject_del(kobj);
+ __kobject_del(kobj);
}
if (t && t->release) {
@@ -698,6 +706,8 @@ static void kobject_cleanup(struct kobject *kobj)
pr_debug("kobject: '%s': free name\n", name);
kfree_const(name);
}
+
+ kobject_put(parent);
}
#ifdef CONFIG_DEBUG_KOBJECT_RELEASE
--
2.26.2
This is a note to let you know that I've just added the patch titled
driver core: Fix handling of SYNC_STATE_ONLY + STATELESS device links
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 44e960490ddf868fc9135151c4a658936e771dc2 Mon Sep 17 00:00:00 2001
From: Saravana Kannan <saravanak(a)google.com>
Date: Tue, 19 May 2020 21:36:26 -0700
Subject: driver core: Fix handling of SYNC_STATE_ONLY + STATELESS device links
Commit 21c27f06587d ("driver core: Fix SYNC_STATE_ONLY device link
implementation") didn't completely fix STATELESS + SYNC_STATE_ONLY
handling.
What looks like an optimization in that commit is actually a bug that
causes an if condition to always take the else path. This prevents
reordering of devices in the dpm_list when a DL_FLAG_STATELESS device
link is create on top of an existing DL_FLAG_SYNC_STATE_ONLY device
link.
Fixes: 21c27f06587d ("driver core: Fix SYNC_STATE_ONLY device link implementation")
Signed-off-by: Saravana Kannan <saravanak(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Link: https://lore.kernel.org/r/20200520043626.181820-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4e0e430315d9..0cad34f1eede 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -360,12 +360,14 @@ struct device_link *device_link_add(struct device *consumer,
if (flags & DL_FLAG_STATELESS) {
kref_get(&link->kref);
- link->flags |= DL_FLAG_STATELESS;
if (link->flags & DL_FLAG_SYNC_STATE_ONLY &&
- !(link->flags & DL_FLAG_STATELESS))
+ !(link->flags & DL_FLAG_STATELESS)) {
+ link->flags |= DL_FLAG_STATELESS;
goto reorder;
- else
+ } else {
+ link->flags |= DL_FLAG_STATELESS;
goto out;
+ }
}
/*
--
2.26.2