From: Tony Lindgren <tony(a)atomide.com>
[ Upstream commit 7078a5ba7a58e5db07583b176f8a03e0b8714731 ]
We have rst_map_012 used for various accelerators like dsp, ipu and iva.
For these use cases, we have rstctrl bit 2 control the subsystem module
reset, and have and bits 0 and 1 control the accelerator specific
features.
If the bootloader, or kexec boot, has left any accelerator specific
reset bits deasserted, deasserting bit 2 reset will potentially enable
an accelerator with unconfigured MMU and no firmware. And we may get
spammed with a lot by warnings on boot with "Data Access in User mode
during Functional access", or depending on the accelerator, the system
can also just hang.
This issue can be quite easily reproduced by setting a rst_map_012 type
rstctrl register to 0 or 4 in the bootloader, and booting the system.
Let's just assert all reset bits for rst_map_012 type resets. So far
it looks like the other rstctrl types don't need this. If it turns out
that the other type rstctrl bits also need reset on init, we need to
add an instance specific reset mask for the bits to avoid resetting
unwanted bits.
Reported-by: Carl Philipp Klemm <philipp(a)uvos.xyz>
Cc: Philipp Zabel <p.zabel(a)pengutronix.de>
Cc: Santosh Shilimkar <ssantosh(a)kernel.org>
Cc: Suman Anna <s-anna(a)ti.com>
Cc: Tero Kristo <t-kristo(a)ti.com>
Tested-by: Carl Philipp Klemm <philipp(a)uvos.xyz>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/soc/ti/omap_prm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
index 4d41dc3cdce1f..c8b14b3a171f7 100644
--- a/drivers/soc/ti/omap_prm.c
+++ b/drivers/soc/ti/omap_prm.c
@@ -552,6 +552,7 @@ static int omap_prm_reset_init(struct platform_device *pdev,
const struct omap_rst_map *map;
struct ti_prm_platform_data *pdata = dev_get_platdata(&pdev->dev);
char buf[32];
+ u32 v;
/*
* Check if we have controllable resets. If either rstctrl is non-zero
@@ -599,6 +600,16 @@ static int omap_prm_reset_init(struct platform_device *pdev,
map++;
}
+ /* Quirk handling to assert rst_map_012 bits on reset and avoid errors */
+ if (prm->data->rstmap == rst_map_012) {
+ v = readl_relaxed(reset->prm->base + reset->prm->data->rstctrl);
+ if ((v & reset->mask) != reset->mask) {
+ dev_dbg(&pdev->dev, "Asserting all resets: %08x\n", v);
+ writel_relaxed(reset->mask, reset->prm->base +
+ reset->prm->data->rstctrl);
+ }
+ }
+
return devm_reset_controller_register(&pdev->dev, &reset->rcdev);
}
--
2.27.0
During allocation the allocator will try to allocate an extent using
cluster policy. Once the current cluster is exhausted it will remove the
its entry under btrfs_free_cluster::lock and subsequently acquire
btrfs_free_space_ctl::tree_lock to dispose of the already-deleted
entry and adjust btrfs_free_space_ctl::total_bitmap. This poses a
problem because there exists a race condition between removing the
entry under one lock and doing the necessary accounting holding a
different lock since extent freeing only uses the 2nd lock. This can
result in the following situation:
T1: T2:
btrfs_alloc_from_cluster insert_into_bitmap <holds tree_lock>
if (entry->bytes == 0) if (block_group && !list_empty(&block_group->cluster_list)) {
rb_erase(entry)
spin_unlock(&cluster->lock);
(total_bitmaps is still 4) spin_lock(&cluster->lock);
<doesn't find entry in cluster->root>
spin_lock(&ctl->tree_lock); <goes to new_bitmap label, adds
<blocked since T2 holds tree_lock> <a new entry and calls add_new_bitmap>
recalculate_thresholds <crashes,
due to total_bitmaps
becoming 5 and triggering
an ASSERT>
To fix this ensure that once depleted, the cluster entry is deleted when
both cluster lock and tree locks are held in the allocator (T1), this
ensures that even if there is a race with a concurrent
insert_into_bitmap call it will correctly find the entry in the cluster
and add the new space to it.
Signed-off-by: Nikolay Borisov <nborisov(a)suse.com>
Cc: <stable(a)vger.kernel.org>
---
fs/btrfs/free-space-cache.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0d6dcb5ff963..e386c468feaf 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3031,8 +3031,6 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
entry->bytes -= bytes;
}
- if (entry->bytes == 0)
- rb_erase(&entry->offset_index, &cluster->root);
break;
}
out:
@@ -3049,7 +3047,9 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
ctl->free_space -= bytes;
if (!entry->bitmap && !btrfs_free_space_trimmed(entry))
ctl->discardable_bytes[BTRFS_STAT_CURR] -= bytes;
+ spin_lock(&cluster->lock);
if (entry->bytes == 0) {
+ rb_erase(&entry->offset_index, &cluster->root);
ctl->free_extents--;
if (entry->bitmap) {
kmem_cache_free(btrfs_free_space_bitmap_cachep,
@@ -3062,6 +3062,7 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
kmem_cache_free(btrfs_free_space_cachep, entry);
}
+ spin_unlock(&cluster->lock);
spin_unlock(&ctl->tree_lock);
return ret;
--
2.25.1
On Wed, 2021-02-10 at 17:06 +0000, Julien Grall wrote:
> From: Julien Grall <jgrall(a)amazon.com>
>
> After Commit 3499ba8198cad ("xen: Fix event channel callback via
> INTX/GSI"), xenbus_probe() will be called too early on Arm. This will
> recent to a guest hang during boot.
>
> If there hang wasn't there, we would have ended up to call
> xenbus_probe() twice (the second time is in xenbus_probe_initcall()).
>
> We don't need to initialize xenbus_probe() early for Arm guest.
> Therefore, the call in xen_guest_init() is now removed.
>
> After this change, there is no more external caller for xenbus_probe().
> So the function is turned to a static one. Interestingly there were two
> prototypes for it.
>
> Fixes: 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI")
> Reported-by: Ian Jackson <iwj(a)xenproject.org>
> Signed-off-by: Julien Grall <jgrall(a)amazon.com>
Reviewed-by: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: stable(a)vger.kernel.org
Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 70245f86c109e0eafb92ea9653184c0e44b4b35c
Gitweb: https://git.kernel.org/tip/70245f86c109e0eafb92ea9653184c0e44b4b35c
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 10 Feb 2021 16:27:41 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 10 Feb 2021 22:06:47 +01:00
x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init()
Invoking x86_init.irqs.create_pci_msi_domain() before
x86_init.pci.arch_init() breaks XEN PV.
The XEN_PV specific pci.arch_init() function overrides the default
create_pci_msi_domain() which is obviously too late.
As a consequence the XEN PV PCI/MSI allocation goes through the native
path which runs out of vectors and causes malfunction.
Invoke it after x86_init.pci.arch_init().
Fixes: 6b15ffa07dc3 ("x86/irq: Initialize PCI/MSI domain at PCI init time")
Reported-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Juergen Gross <jgross(a)suse.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87pn18djte.fsf@nanos.tec.linutronix.de
---
arch/x86/pci/init.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c
index 00bfa1e..0bb3b8b 100644
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -9,16 +9,23 @@
in the right sequence from here. */
static __init int pci_arch_init(void)
{
- int type;
-
- x86_create_pci_msi_domain();
+ int type, pcbios = 1;
type = pci_direct_probe();
if (!(pci_probe & PCI_PROBE_NOEARLY))
pci_mmcfg_early_init();
- if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
+ if (x86_init.pci.arch_init)
+ pcbios = x86_init.pci.arch_init();
+
+ /*
+ * Must happen after x86_init.pci.arch_init(). Xen sets up the
+ * x86_init.irqs.create_pci_msi_domain there.
+ */
+ x86_create_pci_msi_domain();
+
+ if (!pcbios)
return 0;
pci_pcbios_init();
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, slub: better heuristic for number of cpus when calculating slab order
When creating a new kmem cache, SLUB determines how large the slab pages will
based on number of inputs, including the number of CPUs in the system. Larger
slab pages mean that more objects can be allocated/free from per-cpu slabs
before accessing shared structures, but also potentially more memory can be
wasted due to low slab usage and fragmentation.
The rough idea of using number of CPUs is that larger systems will be more
likely to benefit from reduced contention, and also should have enough memory
to spare.
Number of CPUs used to be determined as nr_cpu_ids, which is number of possible
cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b
("mm/slub: let number of online CPUs determine the slub page order") changed it
to nr_online_cpus(). However, for kmem caches created early before CPUs are
onlined, this may lead to permamently low slab page sizes.
Vincent reports a regression [1] of hackbench on arm64 systems:
> I'm facing significant performances regression on a large arm64 server
> system (224 CPUs). Regressions is also present on small arm64 system
> (8 CPUs) but in a far smaller order of magnitude
> On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
> v5.11-rc4 : 9.135sec (+/- 0.45%)
> v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> v5.10: 3.136sec (+/- 0.40%)
Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:
> i.e. the patch incurs a 7% to 32% performance penalty. This bisected
> cleanly yesterday when I was looking for the regression and then found
> the thread.
> Numerous caches change size. For example, kmalloc-512 goes from order-0
> (vanilla) to order-2 with the revert.
> So mostly this is down to the number of times SLUB calls into the page
> allocator which only caches order-0 pages on a per-cpu basis.
Clearly num_online_cpus() doesn't work too early in bootup. We could change
the order dynamically in a memory hotplug callback, but runtime order changing
for existing kmem caches has been already shown as dangerous, and removed in
32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be
resurrected in a safe manner with some effort, but to fix the regression we
need something simpler.
We could use num_present_cpus() that should be the number of physically
present CPUs even before they are onlined. That would work for PowerPC
[3], which triggered the original commit, but that still doesn't work on
arm64 [4] as explained in [5].
So this patch tries to determine the best available value without specific
arch knowledge.
- num_present_cpus() if the number is larger than 1, as that means the
arch is likely setting it properly
- nr_cpu_ids otherwise
This should fix the reported regressions while also keeping the effect of
045ab8c9487b for PowerPC systems. It's possible there are configurations
where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same
time bloated, so these (if they exist) would keep the large orders based
on nr_cpu_ids as was before 045ab8c9487b.
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj…
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03Ys…
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Reported-by: Mel Gorman <mgorman(a)techsingularity.net>
Tested-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Bharata B Rao <bharata(a)linux.ibm.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
--- a/mm/slub.c~mm-slub-better-heuristic-for-number-of-cpus-when-calculating-slab-order
+++ a/mm/slub.c
@@ -3423,6 +3423,7 @@ static inline int calculate_order(unsign
unsigned int order;
unsigned int min_objects;
unsigned int max_objects;
+ unsigned int nr_cpus;
/*
* Attempt to find best configuration for a slab. This
@@ -3433,8 +3434,21 @@ static inline int calculate_order(unsign
* we reduce the minimum objects required in a slab.
*/
min_objects = slub_min_objects;
- if (!min_objects)
- min_objects = 4 * (fls(num_online_cpus()) + 1);
+ if (!min_objects) {
+ /*
+ * Some architectures will only update present cpus when
+ * onlining them, so don't trust the number if it's just 1. But
+ * we also don't want to use nr_cpu_ids always, as on some other
+ * architectures, there can be many possible cpus, but never
+ * onlined. Here we compromise between trying to avoid too high
+ * order on systems that appear larger than they are, and too
+ * low order on systems that appear smaller than they are.
+ */
+ nr_cpus = num_present_cpus();
+ if (nr_cpus <= 1)
+ nr_cpus = nr_cpu_ids;
+ min_objects = 4 * (fls(nr_cpus) + 1);
+ }
max_objects = order_objects(slub_max_order, size);
min_objects = min(min_objects, max_objects);
_
Hi,
While reconciling the lttng-modules writeback instrumentation with its counterpart
within the upstream Linux kernel, I notice that the following commit introduced in
5.6 is present in stable branches 5.4 and 5.5, but is missing from LTS stable branches
for 4.4, 4.9, 4.14, 4.19:
commit 68f23b89067fdf187763e75a56087550624fdbee
("memcg: fix a crash in wb_workfn when a device disappears")
Considering that this fix was CC'd to the stable mailing list, is there any
reason why it has not been integrated into those LTS branches ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
The recent rework of probe_kernel_address() and its conversion to
get_kernel_nofault() inadvertently broke is_prefetch(). Before this change,
probe_kernel_address() was used as a sloppy "read user or kernel memory"
helper, but it doesn't do that any more. The new get_kernel_nofault()
reads *kernel* memory only, which completely broke is_prefetch() for user
access.
Adjust the code to the the correct accessor based on access mode. The
manual address bounds check is no longer necessary, since the accessor
helpers (get_user() / get_kernel_nofault()) do the right thing all by
themselves. As a bonus, by using the correct accessor, we don't need the
open-coded address bounds check.
Fixes: eab0c6089b68 ("maccess: unify the probe kernel arch hooks")
Cc: stable(a)vger.kernel.org
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
arch/x86/mm/fault.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..441c3e9b8971 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -54,7 +54,7 @@ kmmio_fault(struct pt_regs *regs, unsigned long addr)
* 32-bit mode:
*
* Sometimes AMD Athlon/Opteron CPUs report invalid exceptions on prefetch.
- * Check that here and ignore it.
+ * Check that here and ignore it. This is AMD erratum #91.
*
* 64-bit mode:
*
@@ -83,11 +83,7 @@ check_prefetch_opcode(struct pt_regs *regs, unsigned char *instr,
#ifdef CONFIG_X86_64
case 0x40:
/*
- * In AMD64 long mode 0x40..0x4F are valid REX prefixes
- * Need to figure out under what instruction mode the
- * instruction was issued. Could check the LDT for lm,
- * but for now it's good enough to assume that long
- * mode only uses well known segments or kernel.
+ * In 64-bit mode 0x40..0x4F are valid REX prefixes
*/
return (!user_mode(regs) || user_64bit_mode(regs));
#endif
@@ -127,20 +123,31 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
instr = (void *)convert_ip_to_linear(current, regs);
max_instr = instr + 15;
- if (user_mode(regs) && instr >= (unsigned char *)TASK_SIZE_MAX)
- return 0;
+ /*
+ * This code has historically always bailed out if IP points to a
+ * not-present page (e.g. due to a race). No one has ever
+ * complained about this.
+ */
+ pagefault_disable();
while (instr < max_instr) {
unsigned char opcode;
- if (get_kernel_nofault(opcode, instr))
- break;
+ if (user_mode(regs)) {
+ if (get_user(opcode, instr))
+ break;
+ } else {
+ if (get_kernel_nofault(opcode, instr))
+ break;
+ }
instr++;
if (!check_prefetch_opcode(regs, instr, opcode, &prefetch))
break;
}
+
+ pagefault_enable();
return prefetch;
}
--
2.29.2
printk_safe_flush_on_panic() caused the following deadlock on our
server:
CPU0: CPU1:
panic rcu_dump_cpu_stacks
kdump_nmi_shootdown_cpus nmi_trigger_cpumask_backtrace
register_nmi_handler(crash_nmi_callback) printk_safe_flush
__printk_safe_flush
raw_spin_lock_irqsave(&read_lock)
// send NMI to other processors
apic_send_IPI_allbutself(NMI_VECTOR)
// NMI interrupt, dead loop
crash_nmi_callback
printk_safe_flush_on_panic
printk_safe_flush
__printk_safe_flush
// deadlock
raw_spin_lock_irqsave(&read_lock)
DEADLOCK: read_lock is taken on CPU1 and will never get released.
It happens when panic() stops a CPU by NMI while it has been in
the middle of printk_safe_flush().
Handle the lock the same way as logbuf_lock. The printk_safe buffers
are flushed only when both locks can be safely taken. It can avoid
the deadlock _in this particular case_ at expense of losing contents
of printk_safe buffers.
Note: It would actually be safe to re-init the locks when all CPUs were
stopped by NMI. But it would require passing this information
from arch-specific code. It is not worth the complexity.
Especially because logbuf_lock and printk_safe buffers have been
obsoleted by the lockless ring buffer.
Fixes: cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Cc: <stable(a)vger.kernel.org>
---
kernel/printk/printk_safe.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index a0e6f746de6c..2e9e3ed7d63e 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -45,6 +45,8 @@ struct printk_safe_seq_buf {
static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
static DEFINE_PER_CPU(int, printk_context);
+static DEFINE_RAW_SPINLOCK(safe_read_lock);
+
#ifdef CONFIG_PRINTK_NMI
static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
#endif
@@ -180,8 +182,6 @@ static void report_message_lost(struct printk_safe_seq_buf *s)
*/
static void __printk_safe_flush(struct irq_work *work)
{
- static raw_spinlock_t read_lock =
- __RAW_SPIN_LOCK_INITIALIZER(read_lock);
struct printk_safe_seq_buf *s =
container_of(work, struct printk_safe_seq_buf, work);
unsigned long flags;
@@ -195,7 +195,7 @@ static void __printk_safe_flush(struct irq_work *work)
* different CPUs. This is especially important when printing
* a backtrace.
*/
- raw_spin_lock_irqsave(&read_lock, flags);
+ raw_spin_lock_irqsave(&safe_read_lock, flags);
i = 0;
more:
@@ -232,7 +232,7 @@ static void __printk_safe_flush(struct irq_work *work)
out:
report_message_lost(s);
- raw_spin_unlock_irqrestore(&read_lock, flags);
+ raw_spin_unlock_irqrestore(&safe_read_lock, flags);
}
/**
@@ -278,6 +278,14 @@ void printk_safe_flush_on_panic(void)
raw_spin_lock_init(&logbuf_lock);
}
+ if (raw_spin_is_locked(&safe_read_lock)) {
+ if (num_online_cpus() > 1)
+ return;
+
+ debug_locks_off();
+ raw_spin_lock_init(&safe_read_lock);
+ }
+
printk_safe_flush();
}
--
2.11.0
This is a note to let you know that I've just added the patch titled
usb: quirks: add quirk to start video capture on ELMO L-12F document
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 1ebe718bb48278105816ba03a0408ecc2d6cf47f Mon Sep 17 00:00:00 2001
From: Stefan Ursella <stefan.ursella(a)wolfvision.net>
Date: Wed, 10 Feb 2021 15:07:11 +0100
Subject: usb: quirks: add quirk to start video capture on ELMO L-12F document
camera reliable
Without this quirk starting a video capture from the device often fails with
kernel: uvcvideo: Failed to set UVC probe control : -110 (exp. 34).
Signed-off-by: Stefan Ursella <stefan.ursella(a)wolfvision.net>
Link: https://lore.kernel.org/r/20210210140713.18711-1-stefan.ursella@wolfvision.…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/core/quirks.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 66a0dc618dfc..6ade3daf7858 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -391,6 +391,9 @@ static const struct usb_device_id usb_quirk_list[] = {
/* X-Rite/Gretag-Macbeth Eye-One Pro display colorimeter */
{ USB_DEVICE(0x0971, 0x2000), .driver_info = USB_QUIRK_NO_SET_INTF },
+ /* ELMO L-12F document camera */
+ { USB_DEVICE(0x09a1, 0x0028), .driver_info = USB_QUIRK_DELAY_CTRL_MSG },
+
/* Broadcom BCM92035DGROM BT dongle */
{ USB_DEVICE(0x0a5c, 0x2021), .driver_info = USB_QUIRK_RESET_RESUME },
--
2.30.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e013f455d95add874f310dc47c608e8c70692ae5 Mon Sep 17 00:00:00 2001
From: Sibi Sankar <sibis(a)codeaurora.org>
Date: Thu, 23 Jul 2020 01:40:45 +0530
Subject: [PATCH] remoteproc: qcom_q6v5_mss: Validate MBA firmware size before
load
The following mem abort is observed when the mba firmware size exceeds
the allocated mba region. MBA firmware size is restricted to a maximum
size of 1M and remaining memory region is used by modem debug policy
firmware when available. Hence verify whether the MBA firmware size lies
within the allocated memory region and is not greater than 1M before
loading.
Err Logs:
Unable to handle kernel paging request at virtual address
Mem abort info:
...
Call trace:
__memcpy+0x110/0x180
rproc_start+0x40/0x218
rproc_boot+0x5b4/0x608
state_store+0x54/0xf8
dev_attr_store+0x44/0x60
sysfs_kf_write+0x58/0x80
kernfs_fop_write+0x140/0x230
vfs_write+0xc4/0x208
ksys_write+0x74/0xf8
__arm64_sys_write+0x24/0x30
...
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Fixes: 051fb70fd4ea4 ("remoteproc: qcom: Driver for the self-authenticating Hexagon v5")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sibi Sankar <sibis(a)codeaurora.org>
Link: https://lore.kernel.org/r/20200722201047.12975-2-sibis@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c
index 03d7f3d702b3..7826f229957d 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -411,6 +411,12 @@ static int q6v5_load(struct rproc *rproc, const struct firmware *fw)
{
struct q6v5 *qproc = rproc->priv;
+ /* MBA is restricted to a maximum size of 1M */
+ if (fw->size > qproc->mba_size || fw->size > SZ_1M) {
+ dev_err(qproc->dev, "MBA firmware load failed\n");
+ return -EINVAL;
+ }
+
memcpy(qproc->mba_region, fw->data, fw->size);
return 0;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 29 Jan 2021 10:13:53 -0500
Subject: [PATCH] fgraph: Initialize tracing_graph_pause at task creation
On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
will disable or pause function graph tracing, as there's some paths in
bringing down the CPU that can have issues with its return address being
modified. The task_struct structure has a "tracing_graph_pause" atomic
counter, that when set to something other than zero, the function graph
tracer will not modify the return address.
The problem is that the tracing_graph_pause counter is initialized when the
function graph tracer is enabled. This can corrupt the counter for the idle
task if it is suspended in these architectures.
CPU 1 CPU 2
----- -----
do_idle()
cpu_suspend()
pause_graph_tracing()
task_struct->tracing_graph_pause++ (0 -> 1)
start_graph_tracing()
for_each_online_cpu(cpu) {
ftrace_graph_init_idle_task(cpu)
task-struct->tracing_graph_pause = 0 (1 -> 0)
unpause_graph_tracing()
task_struct->tracing_graph_pause-- (0 -> -1)
The above should have gone from 1 to zero, and enabled function graph
tracing again. But instead, it is set to -1, which keeps it disabled.
There's no reason that the field tracing_graph_pause on the task_struct can
not be initialized at boot up.
Cc: stable(a)vger.kernel.org
Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by: pierre.gondois(a)arm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/init/init_task.c b/init/init_task.c
index 8a992d73e6fb..3711cdaafed2 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -198,7 +198,8 @@ struct task_struct init_task
.lockdep_recursion = 0,
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- .ret_stack = NULL,
+ .ret_stack = NULL,
+ .tracing_graph_pause = ATOMIC_INIT(0),
#endif
#if defined(CONFIG_TRACING) && defined(CONFIG_PREEMPTION)
.trace_recursion = 0,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 73edb9e4f354..29a6ebeebc9e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -394,7 +394,6 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
}
if (t->ret_stack == NULL) {
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->curr_ret_stack = -1;
t->curr_ret_depth = -1;
@@ -489,7 +488,6 @@ static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
static void
graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
{
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->ftrace_timestamp = 0;
/* make curr_ret_stack visible before we add the ret_stack */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 03a58ea5905fdbd93ff9e52e670d802600ba38cd Mon Sep 17 00:00:00 2001
From: Kent Gibson <warthog618(a)gmail.com>
Date: Thu, 21 Jan 2021 22:10:38 +0800
Subject: [PATCH] gpiolib: cdev: clear debounce period if line set to output
When set_config changes a line from input to output debounce is
implicitly disabled, as debounce makes no sense for outputs, but the
debounce period is not being cleared and is still reported in the
line info.
So clear the debounce period when the debouncer is stopped in
edge_detector_stop().
Fixes: 65cff7046406 ("gpiolib: cdev: support setting debounce")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 1a7b51163528..1631727bf0da 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -776,6 +776,8 @@ static void edge_detector_stop(struct line *line)
cancel_delayed_work_sync(&line->work);
WRITE_ONCE(line->sw_debounced, 0);
WRITE_ONCE(line->eflags, 0);
+ if (line->desc)
+ WRITE_ONCE(line->desc->debounce_period_us, 0);
/* do not change line->level - see comment in debounced_value() */
}
A bit more than expected because apart from 9 failed-to-apply patches
there are lots of dependencies to them, but for the most part
automatically merged.
Hao Xu (1):
io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE
Jens Axboe (2):
io_uring: account io_uring internal files as REQ_F_INFLIGHT
io_uring: if we see flush on exit, cancel related tasks
Pavel Begunkov (13):
io_uring: simplify io_task_match()
io_uring: add a {task,files} pair matching helper
io_uring: don't iterate io_uring_cancel_files()
io_uring: pass files into kill timeouts/poll
io_uring: always batch cancel in *cancel_files()
io_uring: fix files cancellation
io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE
io_uring: replace inflight_wait with tctx->wait
io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE
io_uring: fix list corruption for splice file_get
io_uring: fix sqo ownership false positive warning
io_uring: reinforce cancel on flush during exit
io_uring: drop mm/files between task_work_submit
fs/io-wq.c | 10 --
fs/io-wq.h | 1 -
fs/io_uring.c | 360 ++++++++++++++++++++------------------------------
3 files changed, 141 insertions(+), 230 deletions(-)
--
2.24.0
When creating a new kmem cache, SLUB determines how large the slab pages will
based on number of inputs, including the number of CPUs in the system. Larger
slab pages mean that more objects can be allocated/free from per-cpu slabs
before accessing shared structures, but also potentially more memory can be
wasted due to low slab usage and fragmentation.
The rough idea of using number of CPUs is that larger systems will be more
likely to benefit from reduced contention, and also should have enough memory
to spare.
Number of CPUs used to be determined as nr_cpu_ids, which is number of possible
cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b
("mm/slub: let number of online CPUs determine the slub page order") changed it
to nr_online_cpus(). However, for kmem caches created early before CPUs are
onlined, this may lead to permamently low slab page sizes.
Vincent reports a regression [1] of hackbench on arm64 systems:
> I'm facing significant performances regression on a large arm64 server
> system (224 CPUs). Regressions is also present on small arm64 system
> (8 CPUs) but in a far smaller order of magnitude
> On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
> v5.11-rc4 : 9.135sec (+/- 0.45%)
> v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> v5.10: 3.136sec (+/- 0.40%)
Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:
> i.e. the patch incurs a 7% to 32% performance penalty. This bisected
> cleanly yesterday when I was looking for the regression and then found
> the thread.
> Numerous caches change size. For example, kmalloc-512 goes from order-0
> (vanilla) to order-2 with the revert.
> So mostly this is down to the number of times SLUB calls into the page
> allocator which only caches order-0 pages on a per-cpu basis.
Clearly num_online_cpus() doesn't work too early in bootup. We could change
the order dynamically in a memory hotplug callback, but runtime order changing
for existing kmem caches has been already shown as dangerous, and removed in
32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be
resurrected in a safe manner with some effort, but to fix the regression we
need something simpler.
We could use num_present_cpus() that should be the number of physically present
CPUs even before they are onlined. That would for for PowerPC [3], which
triggered the original commit, but that still doesn't work on arm64 [4] as
explained in [5].
So this patch tries to determine the best available value without specific arch
knowledge.
- num_present_cpus() if the number is larger than 1, as that means the arch is
likely setting it properly
- nr_cpu_ids otherwise
This should fix the reported regressions while also keeping the effect of
045ab8c9487b for PowerPC systems. It's possible there are configurations where
num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time
bloated, so these (if they exist) would keep the large orders based on
nr_cpu_ids as was before 045ab8c9487b.
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj…
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03Ys…
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
Reported-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Reported-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
---
OK, this is a 5.11 regression, so we should try to it by 5.12. I've also
Cc'd stable for that reason although it's not a crash fix.
We can still try later to replace this with a safe order update in hotplug
callbacks, but that's infeasible for 5.12.
mm/slub.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 176b1cb0d006..8fc9190e6cb3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3454,6 +3454,7 @@ static inline int calculate_order(unsigned int size)
unsigned int order;
unsigned int min_objects;
unsigned int max_objects;
+ unsigned int nr_cpus;
/*
* Attempt to find best configuration for a slab. This
@@ -3464,8 +3465,21 @@ static inline int calculate_order(unsigned int size)
* we reduce the minimum objects required in a slab.
*/
min_objects = slub_min_objects;
- if (!min_objects)
- min_objects = 4 * (fls(num_online_cpus()) + 1);
+ if (!min_objects) {
+ /*
+ * Some architectures will only update present cpus when
+ * onlining them, so don't trust the number if it's just 1. But
+ * we also don't want to use nr_cpu_ids always, as on some other
+ * architectures, there can be many possible cpus, but never
+ * onlined. Here we compromise between trying to avoid too high
+ * order on systems that appear larger than they are, and too
+ * low order on systems that appear smaller than they are.
+ */
+ nr_cpus = num_present_cpus();
+ if (nr_cpus <= 1)
+ nr_cpus = nr_cpu_ids;
+ min_objects = 4 * (fls(nr_cpus) + 1);
+ }
max_objects = order_objects(slub_max_order, size);
min_objects = min(min_objects, max_objects);
--
2.30.0
From: Johannes Weiner <hannes(a)cmpxchg.org>
commit 739f79fc9db1b38f96b5a5109b247a650fbebf6d upstream
Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:
BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
IP: test_clear_page_writeback+0x12e/0x2c0
[...]
RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
Call Trace:
<IRQ>
end_page_writeback+0x47/0x70
f2fs_write_end_io+0x76/0x180 [f2fs]
bio_endio+0x9f/0x120
blk_update_request+0xa8/0x2f0
scsi_end_request+0x39/0x1d0
scsi_io_completion+0x211/0x690
scsi_finish_command+0xd9/0x120
scsi_softirq_done+0x127/0x150
__blk_mq_complete_request_remote+0x13/0x20
flush_smp_call_function_queue+0x56/0x110
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x27/0x40
call_function_single_interrupt+0x89/0x90
RIP: 0010:native_safe_halt+0x6/0x10
(gdb) l *(test_clear_page_writeback+0x12e)
0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
614 mod_node_page_state(page_pgdat(page), idx, val);
615 if (mem_cgroup_disabled() || !page->mem_cgroup)
616 return;
617 mod_memcg_state(page->mem_cgroup, idx, val);
618 pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
619 this_cpu_add(pn->lruvec_stat->count[idx], val);
620 }
621
622 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
623 gfp_t gfp_mask,
The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback(). The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles. But page->mem_cgroup is not, and it will
get cleared if we race with truncation or migration.
It appears this race window has been around for a while, but less likely
to trigger when the memcg stats were updated first thing after
PG_writeback is cleared. Recent changes reshuffled this code to update
the global node stats before the memcg ones, though, stretching the race
window out to an extent where people can reproduce the problem.
Update test_clear_page_writeback() to look up and pin page->mem_cgroup
before clearing PG_writeback, then not use that pointer afterward. It
is a partial revert of 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
but leaves the pageref-holding callsites that aren't affected alone.
Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Tested-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Reported-by: Bradley Bolen <bradleybolen(a)gmail.com>
Tested-by: Brad Bolen <bradleybolen(a)gmail.com>
Cc: Vladimir Davydov <vdavydov(a)virtuozzo.com>
Cc: Michal Hocko <mhocko(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[guptap(a)codeaurora.org: Resolved merge conflicts]
Signed-off-by: Prakash Gupta <guptap(a)codeaurora.org>
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
This patch is present in a downstream Android tree:
https://source.mcwhirter.io/craige/bluecross/commit/d4a742865c6b69ef9316947…
and I happened to have stumbled across the same problem too.
Johannes can you review it for correctness with respect to the 4.9
kernel? Thanks!
include/linux/memcontrol.h | 33 ++++++++++++++++++++++++-----
mm/memcontrol.c | 43 +++++++++++++++++++++++++++-----------
mm/page-writeback.c | 14 ++++++++++---
3 files changed, 70 insertions(+), 20 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8b35bdbdc214..fd77f8303ab9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -490,9 +490,21 @@ bool mem_cgroup_oom_synchronize(bool wait);
extern int do_swap_account;
#endif
-void lock_page_memcg(struct page *page);
+struct mem_cgroup *lock_page_memcg(struct page *page);
+void __unlock_page_memcg(struct mem_cgroup *memcg);
void unlock_page_memcg(struct page *page);
+static inline void __mem_cgroup_update_page_stat(struct page *page,
+ struct mem_cgroup *memcg,
+ enum mem_cgroup_stat_index idx,
+ int val)
+{
+ VM_BUG_ON(!(rcu_read_lock_held() || PageLocked(page)));
+
+ if (memcg && memcg->stat)
+ this_cpu_add(memcg->stat->count[idx], val);
+}
+
/**
* mem_cgroup_update_page_stat - update page state statistics
* @page: the page
@@ -508,13 +520,12 @@ void unlock_page_memcg(struct page *page);
* mem_cgroup_update_page_stat(page, state, -1);
* unlock_page(page) or unlock_page_memcg(page)
*/
+
static inline void mem_cgroup_update_page_stat(struct page *page,
enum mem_cgroup_stat_index idx, int val)
{
- VM_BUG_ON(!(rcu_read_lock_held() || PageLocked(page)));
- if (page->mem_cgroup)
- this_cpu_add(page->mem_cgroup->stat->count[idx], val);
+ __mem_cgroup_update_page_stat(page, page->mem_cgroup, idx, val);
}
static inline void mem_cgroup_inc_page_stat(struct page *page,
@@ -709,7 +720,12 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
{
}
-static inline void lock_page_memcg(struct page *page)
+static inline struct mem_cgroup *lock_page_memcg(struct page *page)
+{
+ return NULL;
+}
+
+static inline void __unlock_page_memcg(struct mem_cgroup *memcg)
{
}
@@ -745,6 +761,13 @@ static inline void mem_cgroup_update_page_stat(struct page *page,
{
}
+static inline void __mem_cgroup_update_page_stat(struct page *page,
+ struct mem_cgroup *memcg,
+ enum mem_cgroup_stat_index idx,
+ int nr)
+{
+}
+
static inline void mem_cgroup_inc_page_stat(struct page *page,
enum mem_cgroup_stat_index idx)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d4232744c59f..27b0b4f03fcd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1638,9 +1638,13 @@ bool mem_cgroup_oom_synchronize(bool handle)
* @page: the page
*
* This function protects unlocked LRU pages from being moved to
- * another cgroup and stabilizes their page->mem_cgroup binding.
+ * another cgroup.
+ *
+ * It ensures lifetime of the returned memcg. Caller is responsible
+ * for the lifetime of the page; __unlock_page_memcg() is available
+ * when @page might get freed inside the locked section.
*/
-void lock_page_memcg(struct page *page)
+struct mem_cgroup *lock_page_memcg(struct page *page)
{
struct mem_cgroup *memcg;
unsigned long flags;
@@ -1649,18 +1653,24 @@ void lock_page_memcg(struct page *page)
* The RCU lock is held throughout the transaction. The fast
* path can get away without acquiring the memcg->move_lock
* because page moving starts with an RCU grace period.
- */
+ *
+ * The RCU lock also protects the memcg from being freed when
+ * the page state that is going to change is the only thing
+ * preventing the page itself from being freed. E.g. writeback
+ * doesn't hold a page reference and relies on PG_writeback to
+ * keep off truncation, migration and so forth.
+ */
rcu_read_lock();
if (mem_cgroup_disabled())
- return;
+ return NULL;
again:
memcg = page->mem_cgroup;
if (unlikely(!memcg))
- return;
+ return NULL;
if (atomic_read(&memcg->moving_account) <= 0)
- return;
+ return memcg;
spin_lock_irqsave(&memcg->move_lock, flags);
if (memcg != page->mem_cgroup) {
@@ -1676,18 +1686,18 @@ void lock_page_memcg(struct page *page)
memcg->move_lock_task = current;
memcg->move_lock_flags = flags;
- return;
+ return memcg;
}
EXPORT_SYMBOL(lock_page_memcg);
/**
- * unlock_page_memcg - unlock a page->mem_cgroup binding
- * @page: the page
+ * __unlock_page_memcg - unlock and unpin a memcg
+ * @memcg: the memcg
+ *
+ * Unlock and unpin a memcg returned by lock_page_memcg().
*/
-void unlock_page_memcg(struct page *page)
+void __unlock_page_memcg(struct mem_cgroup *memcg)
{
- struct mem_cgroup *memcg = page->mem_cgroup;
-
if (memcg && memcg->move_lock_task == current) {
unsigned long flags = memcg->move_lock_flags;
@@ -1699,6 +1709,15 @@ void unlock_page_memcg(struct page *page)
rcu_read_unlock();
}
+
+/**
+ * unlock_page_memcg - unlock a page->mem_cgroup binding
+ * @page: the page
+ */
+void unlock_page_memcg(struct page *page)
+{
+ __unlock_page_memcg(page->mem_cgroup);
+}
EXPORT_SYMBOL(unlock_page_memcg);
/*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 462c778b9fb5..498c924f2fcd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2717,9 +2717,10 @@ EXPORT_SYMBOL(clear_page_dirty_for_io);
int test_clear_page_writeback(struct page *page)
{
struct address_space *mapping = page_mapping(page);
+ struct mem_cgroup *memcg;
int ret;
- lock_page_memcg(page);
+ memcg = lock_page_memcg(page);
if (mapping && mapping_use_writeback_tags(mapping)) {
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -2747,13 +2748,20 @@ int test_clear_page_writeback(struct page *page)
} else {
ret = TestClearPageWriteback(page);
}
+ /*
+ * NOTE: Page might be free now! Writeback doesn't hold a page
+ * reference on its own, it relies on truncation to wait for
+ * the clearing of PG_writeback. The below can only access
+ * page state that is static across allocation cycles.
+ */
if (ret) {
- mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_WRITEBACK);
+ __mem_cgroup_update_page_stat(page, memcg,
+ MEM_CGROUP_STAT_WRITEBACK, -1);
dec_node_page_state(page, NR_WRITEBACK);
dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
inc_node_page_state(page, NR_WRITTEN);
}
- unlock_page_memcg(page);
+ __unlock_page_memcg(memcg);
return ret;
}
--
2.25.1
commit 97c753e62e6c31a404183898d950d8c08d752dbd upstream.
Fix kprobe_on_func_entry() returns error code instead of false so that
register_kretprobe() can return an appropriate error code.
append_trace_kprobe() expects the kprobe registration returns -ENOENT
when the target symbol is not found, and it checks whether the target
module is unloaded or not. If the target module doesn't exist, it
defers to probe the target symbol until the module is loaded.
However, since register_kretprobe() returns -EINVAL instead of -ENOENT
in that case, it always fail on putting the kretprobe event on unloaded
modules. e.g.
Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.
Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
Command: r xfs:xfs_end_io
^
To fix this bug, change kprobe_on_func_entry() to detect symbol lookup
failure and return -ENOENT in that case. Otherwise it returns -EINVAL
or 0 (succeeded, given address is on the entry).
Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@de…
Cc: stable(a)vger.kernel.org
Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by: Jianlin Lv <Jianlin.Lv(a)arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
include/linux/kprobes.h | 2 +-
kernel/kprobes.c | 34 +++++++++++++++++++++++++---------
kernel/trace/trace_kprobe.c | 4 ++--
3 files changed, 28 insertions(+), 12 deletions(-)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9f22652d69bb..c28204e22b54 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -245,7 +245,7 @@ extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
extern bool arch_kprobe_on_func_entry(unsigned long offset);
-extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
+extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
extern bool within_kprobe_blacklist(unsigned long addr);
extern int kprobe_add_ksym_blacklist(unsigned long entry);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2161f519d481..ebbd4320143d 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1921,29 +1921,45 @@ bool __weak arch_kprobe_on_func_entry(unsigned long offset)
return !offset;
}
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+/**
+ * kprobe_on_func_entry() -- check whether given address is function entry
+ * @addr: Target address
+ * @sym: Target symbol name
+ * @offset: The offset from the symbol or the address
+ *
+ * This checks whether the given @addr+@offset or @sym+@offset is on the
+ * function entry address or not.
+ * This returns 0 if it is the function entry, or -EINVAL if it is not.
+ * And also it returns -ENOENT if it fails the symbol or address lookup.
+ * Caller must pass @addr or @sym (either one must be NULL), or this
+ * returns -EINVAL.
+ */
+int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
if (IS_ERR(kp_addr))
- return false;
+ return PTR_ERR(kp_addr);
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
+ return -ENOENT;
- return true;
+ if (!arch_kprobe_on_func_entry(offset))
+ return -EINVAL;
+
+ return 0;
}
int register_kretprobe(struct kretprobe *rp)
{
- int ret = 0;
+ int ret;
struct kretprobe_instance *inst;
int i;
void *addr;
- if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
- return -EINVAL;
+ ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset);
+ if (ret)
+ return ret;
if (kretprobe_blacklist_size) {
addr = kprobe_addr(&rp->kp);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5c17f70c7f2d..61eff45653f5 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -112,9 +112,9 @@ bool trace_kprobe_on_func_entry(struct trace_event_call *call)
{
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
- return kprobe_on_func_entry(tk->rp.kp.addr,
+ return (kprobe_on_func_entry(tk->rp.kp.addr,
tk->rp.kp.addr ? NULL : tk->rp.kp.symbol_name,
- tk->rp.kp.addr ? 0 : tk->rp.kp.offset);
+ tk->rp.kp.addr ? 0 : tk->rp.kp.offset) == 0);
}
bool trace_kprobe_error_injectable(struct trace_event_call *call)
This is the start of the stable review cycle for the 5.10.15 release.
There are 120 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.15-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.15-rc1
Alexander Ovechkin <ovov(a)yandex-team.ru>
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
DENG Qingfang <dqfext(a)gmail.com>
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Dongseok Yi <dseok.yi(a)samsung.com>
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
Vadim Fedorenko <vfedorenko(a)novek.ru>
net: ip_tunnel: fix mtu calculation
Chinmay Agarwal <chinagar(a)codeaurora.org>
neighbour: Prevent a dead entry from updating gc_list
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
igc: Report speed and duplex as unknown when device is runtime suspended
Xiao Ni <xni(a)redhat.com>
md: Set prev_flush_start and flush_bio in an atomic way
Marek Vasut <marex(a)denx.de>
Input: ili210x - implement pressure reporting for ILI251x
Benjamin Valentin <benpicco(a)googlemail.com>
Input: xpad - sync supported devices with fork on GitHub
AngeloGioacchino Del Regno <angelogioacchino.delregno(a)somainline.org>
Input: goodix - add support for Goodix GT9286 chip
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/apic: Add extra serialization for non-serializing MSRs
Lai Jiangshan <laijs(a)linux.alibaba.com>
x86/debug: Prevent data breakpoints on cpu_dr7
Lai Jiangshan <laijs(a)linux.alibaba.com>
x86/debug: Prevent data breakpoints on __per_cpu_offset
Peter Zijlstra <peterz(a)infradead.org>
x86/debug: Fix DR6 handling
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/build: Disable CET instrumentation in the kernel
Waiman Long <longman(a)redhat.com>
mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
Hugh Dickins <hughd(a)google.com>
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Rick Edgecombe <rick.p.edgecombe(a)intel.com>
mm/vmalloc: separate put pages and flush VM flags
Rokudo Yan <wu-yan(a)tcl.com>
mm, compaction: move high_pfn to the for loop scope
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between isolating and freeing page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between freeing and dissolving the page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Dmitry Osipenko <digetx(a)gmail.com>
ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: footbridge: fix dc21285 PCI configuration accessors
H. Nikolaus Schaller <hns(a)goldelico.com>
ARM: dts; gta04: SPI panel chip select is active low
H. Nikolaus Schaller <hns(a)goldelico.com>
DTS: ARM: gta04: remove legacy spi-cs-high to make display work again
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode
Michael Roth <michael.roth(a)amd.com>
KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off
Ben Gardon <bgardon(a)google.com>
KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Treat SVM as unsupported when running as an SEV guest
Thorsten Leemhuis <linux(a)leemhuis.info>
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Xiaoguang Wang <xiaoguang.wang(a)linux.alibaba.com>
io_uring: don't modify identity's files uncess identity is cowed
Stylon Wang <stylon.wang(a)amd.com>
drm/amd/display: Revert "Fix EDID parsing after resume from suspend"
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/i915: Power up combo PHY lanes for for HDMI as well
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/i915: Extract intel_ddi_power_up_lanes()
Andres Calderon Jaramillo <andrescj(a)chromium.org>
drm/i915/display: Prevent double YUV range correction on HDR planes
Chris Wilson <chris(a)chris-wilson.co.uk>
drm/i915/gt: Close race between enable_breadcrumbs and cancel_breadcrumbs
Chris Wilson <chris(a)chris-wilson.co.uk>
drm/i915/gem: Drop lru bumping on display unpinning
Imre Deak <imre.deak(a)intel.com>
drm/i915: Fix the MST PBN divider calculation
Imre Deak <imre.deak(a)intel.com>
drm/dp/mst: Export drm_dp_get_vc_payload_bw()
Peter Gonda <pgonda(a)google.com>
Fix unsynchronized access to sev members through svm_register_enc_region
Fengnan Chang <fengnanchang(a)gmail.com>
mmc: core: Limit retries when analyse of SDIO tuples fails
Ulf Hansson <ulf.hansson(a)linaro.org>
mmc: sdhci-pltfm: Fix linking err for sdhci-brcmstb
Pavel Shilovsky <pshilov(a)microsoft.com>
smb3: fix crediting for compounding when only one request in flight
Gustavo A. R. Silva <gustavoars(a)kernel.org>
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Joerg Roedel <jroedel(a)suse.de>
iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing it
Aurelien Aptel <aaptel(a)suse.com>
cifs: report error instead of invalid when revalidating a dentry fails
Atish Patra <atish.patra(a)wdc.com>
RISC-V: Define MAXPHYSMEM_1GB only for RV32
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: fix bounce buffer usage for non-sg list case
Rolf Eike Beer <eb(a)emlix.com>
scripts: use pkg-config to locate libcrypto
Marc Zyngier <maz(a)kernel.org>
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Hans de Goede <hdegoede(a)redhat.com>
genirq: Prevent [devm_]irq_alloc_desc from returning irq 0
Dan Williams <dan.j.williams(a)intel.com>
libnvdimm/dimm: Avoid race between probe and available_slots_show()
Dan Williams <dan.j.williams(a)intel.com>
libnvdimm/namespace: Fix visibility of namespace resource attribute
Alexey Kardashevskiy <aik(a)ozlabs.ru>
tracepoint: Fix race between tracing and removing tracepoint
Viktor Rosendahl <Viktor.Rosendahl(a)bmw.de>
tracing: Use pause-on-trace with the latency tracers
Wang ShaoBo <bobo.shaobowang(a)huawei.com>
kretprobe: Avoid re-registration of the same kretprobe earlier
Masami Hiramatsu <mhiramat(a)kernel.org>
tracing/kprobe: Fix to support kretprobe events on unloaded modules
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
fgraph: Initialize tracing_graph_pause at task creation
Quanyang Wang <quanyang.wang(a)windriver.com>
gpiolib: free device name on error path to fix kmemleak
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix station rate table updates on assoc
Sargun Dhillon <sargun(a)sargun.me>
ovl: implement volatile-specific fsync error behaviour
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: avoid deadlock on directory ioctl
Liangyan <liangyan.peng(a)linux.alibaba.com>
ovl: fix dentry leak in ovl_get_redirect
Mario Limonciello <mario.limonciello(a)dell.com>
thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link()
Masahiro Yamada <masahiroy(a)kernel.org>
kbuild: fix duplicated flags in DEBUG_CFLAGS
Roman Gushchin <guro(a)fb.com>
memblock: do not start bottom-up allocations with kernel_end
Eli Cohen <elic(a)nvidia.com>
vdpa/mlx5: Restore the hardware used index after change map
Sagi Grimberg <sagi(a)grimberg.me>
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
Hermann Lauer <Hermann.Lauer(a)uni-heidelberg.de>
ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode
Dan Carpenter <dan.carpenter(a)oracle.com>
net: ipa: pass correct dma_handle to dma_free_coherent()
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: TCAM entry enable should be written after SRAM data
Xie He <xie.he.0141(a)gmail.com>
net: lapb: Copy the skb before sending a packet
Maor Dickman <maord(a)nvidia.com>
net/mlx5e: Release skb in case of failure in tc update skb
Maxim Mikityanskiy <maximmi(a)mellanox.com>
net/mlx5e: Update max_opened_tc also when channels are closed
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Fix leak upon failure of rule creation
Daniel Jurgens <danielj(a)nvidia.com>
net/mlx5: Fix function calculation for page trees
Lijun Pan <ljp(a)linux.ibm.com>
ibmvnic: device remove has higher precedence over reset
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
Kevin Lo <kevlo(a)kevlo.org>
igc: check return value of ret_val in igc_config_fc_after_link_up
Kevin Lo <kevlo(a)kevlo.org>
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Fix NFS READs that start at non-page-aligned offsets
Zyta Szpak <zr(a)semihalf.com>
arm64: dts: ls1046a: fix dcfg address range
David Howells <dhowells(a)redhat.com>
rxrpc: Fix deadlock around release of dst cached on udp tunnel
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: work around RTL8125 UDP hw bug
Marek Szyprowski <m.szyprowski(a)samsung.com>
arm64: dts: meson: switch TFLASH_VDD_EN pin to open drain on Odroid-C4
Quentin Monnet <quentin(a)isovalent.com>
bpf, preload: Fix build when $(O) points to a relative path
Johannes Berg <johannes.berg(a)intel.com>
um: virtio: free vu_dev only with the contained struct device
Pan Bian <bianpan2016(a)163.com>
bpf, inode_storage: Put file handler if no storage was found
Loris Reiff <loris.reiff(a)liblor.ch>
bpf, cgroup: Fix problematic bounds check
Loris Reiff <loris.reiff(a)liblor.ch>
bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
Eli Cohen <elic(a)nvidia.com>
vdpa/mlx5: Fix memory key MTT population
Marek Vasut <marex(a)denx.de>
ARM: dts: stm32: Fix GPIO hog flags on DHCOM DRC02
Marek Vasut <marex(a)denx.de>
ARM: dts: stm32: Disable optional TSC2004 on DRC02 board
Marek Vasut <marex(a)denx.de>
ARM: dts: stm32: Disable WP on DHCOM uSD slot
Marek Vasut <marex(a)denx.de>
ARM: dts: stm32: Connect card-detect signal on DHCOM
Marek Vasut <marex(a)denx.de>
ARM: dts: stm32: Fix polarity of the DH DRC02 uSD card detect
Simon South <simon(a)simonsouth.net>
arm64: dts: rockchip: Use only supported PCIe link speed on Pinebook Pro
Sandy Huang <hjc(a)rock-chips.com>
arm64: dts: rockchip: fix vopl iommu irq on px30
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
arm64: dts: amlogic: meson-g12: Set FL-adj property value
Alexey Dobriyan <adobriyan(a)gmail.com>
Input: i8042 - unbreak Pegatron C15B
Shawn Guo <shawn.guo(a)linaro.org>
arm64: dts: qcom: c630: keep both touchpad devices enabled
Linus Walleij <linus.walleij(a)linaro.org>
ARM: OMAP1: OSK: fix ohci-omap breakage
Chunfeng Yun <chunfeng.yun(a)mediatek.com>
usb: xhci-mtk: break loop when find the endpoint to drop
Chunfeng Yun <chunfeng.yun(a)mediatek.com>
usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints
Ikjoon Jang <ikjn(a)chromium.org>
usb: xhci-mtk: fix unreleased bandwidth data
Gary Bisson <gary.bisson(a)boundarydevices.com>
usb: dwc3: fix clock issue during resume in OTG mode
Heiko Stuebner <heiko.stuebner(a)theobroma-systems.com>
usb: dwc2: Fix endpoint direction check in ep_from_windex
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
Jeremy Figgins <kernel(a)jeremyfiggins.com>
USB: usblp: don't call usb_set_interface if there's a single alt
kernel test robot <lkp(a)intel.com>
usb: gadget: aspeed: add missing of_node_put
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: gadget: legacy: fix an error code in eth_bind()
Pali Rohár <pali(a)kernel.org>
usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720
Christoph Schemmel <christoph.schemmel(a)gmail.com>
USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin <bg4akv(a)hotmail.com>
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran <Pho.Tran(a)silabs.com>
USB: serial: cp210x: add pid/vid for WSDA-200-USB
-------------
Diffstat:
Documentation/filesystems/overlayfs.rst | 8 ++
Makefile | 14 +--
arch/arm/boot/dts/omap3-gta04.dtsi | 3 +-
arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi | 12 +-
arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi | 3 +-
arch/arm/boot/dts/sun7i-a20-bananapro.dts | 2 +-
arch/arm/include/debug/tegra.S | 54 ++++-----
arch/arm/mach-footbridge/dc21285.c | 12 +-
arch/arm/mach-omap1/board-osk.c | 2 +
arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 2 +-
.../arm64/boot/dts/amlogic/meson-sm1-odroid-c4.dts | 2 +-
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 +-
.../boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 10 +-
arch/arm64/boot/dts/rockchip/px30.dtsi | 2 +-
.../boot/dts/rockchip/rk3399-pinebook-pro.dts | 1 -
arch/riscv/Kconfig | 2 +
arch/um/drivers/virtio_uml.c | 3 +-
arch/x86/Makefile | 3 +
arch/x86/include/asm/apic.h | 10 --
arch/x86/include/asm/barrier.h | 18 +++
arch/x86/kernel/apic/apic.c | 4 +
arch/x86/kernel/apic/x2apic_cluster.c | 6 +-
arch/x86/kernel/apic/x2apic_phys.c | 9 +-
arch/x86/kernel/hw_breakpoint.c | 61 ++++++----
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/emulate.c | 2 +
arch/x86/kvm/mmu/tdp_mmu.c | 6 +-
arch/x86/kvm/svm/sev.c | 17 +--
arch/x86/kvm/svm/svm.c | 5 +
arch/x86/kvm/vmx/vmx.c | 17 ++-
arch/x86/kvm/x86.c | 27 +++--
arch/x86/mm/mem_encrypt.c | 1 +
drivers/gpio/gpiolib.c | 10 +-
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 -
drivers/gpu/drm/drm_dp_mst_topology.c | 24 +++-
drivers/gpu/drm/i915/display/intel_ddi.c | 37 +++---
drivers/gpu/drm/i915/display/intel_display.c | 9 +-
drivers/gpu/drm/i915/display/intel_dp_mst.c | 4 +-
drivers/gpu/drm/i915/display/intel_overlay.c | 4 +-
drivers/gpu/drm/i915/display/intel_sprite.c | 65 ++---------
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 45 -------
drivers/gpu/drm/i915/gem/i915_gem_object.h | 1 -
drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 6 +-
drivers/input/joystick/xpad.c | 17 ++-
drivers/input/serio/i8042-x86ia64io.h | 2 +
drivers/input/touchscreen/goodix.c | 2 +
drivers/input/touchscreen/ili210x.c | 26 +++--
drivers/md/md.c | 2 +
drivers/mmc/core/sdio_cis.c | 6 +
drivers/mmc/host/sdhci-pltfm.h | 7 +-
drivers/net/dsa/mv88e6xxx/chip.c | 6 +-
drivers/net/ethernet/ibm/ibmvnic.c | 5 -
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 +--
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1 -
drivers/net/ethernet/intel/igc/igc_ethtool.c | 3 +-
drivers/net/ethernet/intel/igc/igc_i225.c | 3 +-
drivers/net/ethernet/intel/igc/igc_mac.c | 2 +-
drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 10 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 16 ++-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 5 +
.../net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
drivers/net/ethernet/realtek/r8169_main.c | 75 ++++++++++--
drivers/net/ipa/gsi.c | 2 +-
drivers/nvdimm/dimm_devs.c | 18 ++-
drivers/nvdimm/namespace_devs.c | 10 +-
drivers/nvme/host/pci.c | 2 +
drivers/nvme/target/tcp.c | 3 +-
drivers/thunderbolt/acpi.c | 2 +-
drivers/usb/class/usblp.c | 19 +--
drivers/usb/dwc2/gadget.c | 8 +-
drivers/usb/dwc3/core.c | 2 +-
drivers/usb/gadget/legacy/ether.c | 4 +-
drivers/usb/gadget/udc/aspeed-vhub/hub.c | 4 +-
drivers/usb/host/xhci-mtk-sch.c | 130 +++++++++++++++------
drivers/usb/host/xhci-mtk.c | 2 +
drivers/usb/host/xhci-mtk.h | 15 +++
drivers/usb/host/xhci-mvebu.c | 42 +++++++
drivers/usb/host/xhci-mvebu.h | 6 +
drivers/usb/host/xhci-plat.c | 20 +++-
drivers/usb/host/xhci-plat.h | 1 +
drivers/usb/host/xhci-ring.c | 31 +++--
drivers/usb/host/xhci.c | 8 +-
drivers/usb/host/xhci.h | 4 +
drivers/usb/renesas_usbhs/fifo.c | 1 +
drivers/usb/serial/cp210x.c | 2 +
drivers/usb/serial/option.c | 6 +
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
drivers/vdpa/mlx5/core/mr.c | 28 ++---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 +++
fs/afs/main.c | 6 +-
fs/cifs/dir.c | 22 +++-
fs/cifs/smb2pdu.h | 2 +-
fs/cifs/transport.c | 18 ++-
fs/hugetlbfs/inode.c | 3 +-
fs/io_uring.c | 6 -
fs/overlayfs/dir.c | 2 +-
fs/overlayfs/file.c | 5 +-
fs/overlayfs/overlayfs.h | 1 +
fs/overlayfs/ovl_entry.h | 2 +
fs/overlayfs/readdir.c | 28 ++---
fs/overlayfs/super.c | 34 ++++--
fs/overlayfs/util.c | 27 +++++
include/drm/drm_dp_mst_helper.h | 1 +
include/linux/hugetlb.h | 2 +
include/linux/iommu.h | 5 +-
include/linux/irq.h | 4 +-
include/linux/kprobes.h | 2 +-
include/linux/msi.h | 6 +
include/linux/tracepoint.h | 12 +-
include/linux/vmalloc.h | 9 +-
include/net/sch_generic.h | 2 +-
include/net/udp.h | 2 +-
init/init_task.c | 3 +-
kernel/bpf/bpf_inode_storage.c | 6 +-
kernel/bpf/cgroup.c | 7 +-
kernel/bpf/preload/Makefile | 5 +-
kernel/irq/msi.c | 44 ++++---
kernel/kprobes.c | 36 ++++--
kernel/trace/fgraph.c | 2 -
kernel/trace/trace_irqsoff.c | 4 +
kernel/trace/trace_kprobe.c | 10 +-
mm/compaction.c | 3 +-
mm/filemap.c | 4 +
mm/huge_memory.c | 37 +++---
mm/hugetlb.c | 48 +++++++-
mm/memblock.c | 49 +-------
net/core/neighbour.c | 7 +-
net/ipv4/ip_tunnel.c | 16 ++-
net/ipv4/udp_offload.c | 69 ++++++++++-
net/ipv6/udp_offload.c | 2 +-
net/lapb/lapb_out.c | 3 +-
net/mac80211/driver-ops.c | 5 +-
net/mac80211/rate.c | 3 +-
net/rxrpc/af_rxrpc.c | 6 +-
net/sunrpc/svcsock.c | 7 +-
scripts/Makefile | 8 +-
137 files changed, 1106 insertions(+), 606 deletions(-)
We park SQPOLL task before going into io_uring_cancel_files(), so the
task won't run task_works including those that might be important for
the cancellation passes. In this case it's io_poll_remove_one(), which
frees requests via io_put_req_deferred().
Unpark it for while waiting, it's ok as we disable submissions
beforehand, so no new will be generated.
INFO: task syz-executor893:8493 blocked for more than 143 seconds.
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
io_uring_cancel_files fs/io_uring.c:8912 [inline]
io_uring_cancel_task_requests+0xe70/0x11a0 fs/io_uring.c:8979
__io_uring_files_cancel+0x110/0x1b0 fs/io_uring.c:9067
io_uring_files_cancel include/linux/io_uring.h:51 [inline]
do_exit+0x2fe/0x2ae0 kernel/exit.c:780
do_group_exit+0x125/0x310 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Cc: stable(a)vger.kernel.org # 5.5+
Reported-by: syzbot+695b03d82fa8e4901b06(a)syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
fs/io_uring.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6b73e38aa1a9..1e803a9afc8e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9056,11 +9056,16 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
break;
io_uring_try_cancel_requests(ctx, task, files);
+
+ if (ctx->sq_data)
+ io_sq_thread_unpark(ctx->sq_data);
prepare_to_wait(&task->io_uring->wait, &wait,
TASK_UNINTERRUPTIBLE);
if (inflight == io_uring_count_inflight(ctx, task, files))
schedule();
finish_wait(&task->io_uring->wait, &wait);
+ if (ctx->sq_data)
+ io_sq_thread_park(ctx->sq_data);
}
}
--
2.24.0
Fix kprobe_on_func_entry() returns error code instead of false so that
register_kretprobe() can return an appropriate error code.
append_trace_kprobe() expects the kprobe registration returns -ENOENT
when the target symbol is not found, and it checks whether the target
module is unloaded or not. If the target module doesn't exist, it
defers to probe the target symbol until the module is loaded.
However, since register_kretprobe() returns -EINVAL instead of -ENOENT
in that case, it always fail on putting the kretprobe event on unloaded
modules. e.g.
Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.
Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
Command: r xfs:xfs_end_io
^
To fix this bug, change kprobe_on_func_entry() to detect symbol lookup
failure and return -ENOENT in that case. Otherwise it returns -EINVAL
or 0 (succeeded, given address is on the entry).
Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@de…
Cc: stable(a)vger.kernel.org
Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by: Jianlin Lv <Jianlin.Lv(a)arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
include/linux/kprobes.h | 2 +-
kernel/kprobes.c | 34 +++++++++++++++++++++++++---------
kernel/trace/trace_kprobe.c | 4 ++--
3 files changed, 28 insertions(+), 12 deletions(-)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9f22652d69bb..c28204e22b54 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -245,7 +245,7 @@ extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
extern bool arch_kprobe_on_func_entry(unsigned long offset);
-extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
+extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
extern bool within_kprobe_blacklist(unsigned long addr);
extern int kprobe_add_ksym_blacklist(unsigned long entry);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2161f519d481..ebbd4320143d 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1921,29 +1921,45 @@ bool __weak arch_kprobe_on_func_entry(unsigned long offset)
return !offset;
}
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+/**
+ * kprobe_on_func_entry() -- check whether given address is function entry
+ * @addr: Target address
+ * @sym: Target symbol name
+ * @offset: The offset from the symbol or the address
+ *
+ * This checks whether the given @addr+@offset or @sym+@offset is on the
+ * function entry address or not.
+ * This returns 0 if it is the function entry, or -EINVAL if it is not.
+ * And also it returns -ENOENT if it fails the symbol or address lookup.
+ * Caller must pass @addr or @sym (either one must be NULL), or this
+ * returns -EINVAL.
+ */
+int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
if (IS_ERR(kp_addr))
- return false;
+ return PTR_ERR(kp_addr);
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
+ return -ENOENT;
- return true;
+ if (!arch_kprobe_on_func_entry(offset))
+ return -EINVAL;
+
+ return 0;
}
int register_kretprobe(struct kretprobe *rp)
{
- int ret = 0;
+ int ret;
struct kretprobe_instance *inst;
int i;
void *addr;
- if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
- return -EINVAL;
+ ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset);
+ if (ret)
+ return ret;
if (kretprobe_blacklist_size) {
addr = kprobe_addr(&rp->kp);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5c17f70c7f2d..61eff45653f5 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -112,9 +112,9 @@ bool trace_kprobe_on_func_entry(struct trace_event_call *call)
{
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
- return kprobe_on_func_entry(tk->rp.kp.addr,
+ return (kprobe_on_func_entry(tk->rp.kp.addr,
tk->rp.kp.addr ? NULL : tk->rp.kp.symbol_name,
- tk->rp.kp.addr ? 0 : tk->rp.kp.offset);
+ tk->rp.kp.addr ? 0 : tk->rp.kp.offset) == 0);
}
bool trace_kprobe_error_injectable(struct trace_event_call *call)
commit 97c753e62e6c31a404183898d950d8c08d752dbd upstream.
Fix kprobe_on_func_entry() returns error code instead of false so that
register_kretprobe() can return an appropriate error code.
append_trace_kprobe() expects the kprobe registration returns -ENOENT
when the target symbol is not found, and it checks whether the target
module is unloaded or not. If the target module doesn't exist, it
defers to probe the target symbol until the module is loaded.
However, since register_kretprobe() returns -EINVAL instead of -ENOENT
in that case, it always fail on putting the kretprobe event on unloaded
modules. e.g.
Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.
Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
Command: r xfs:xfs_end_io
^
To fix this bug, change kprobe_on_func_entry() to detect symbol lookup
failure and return -ENOENT in that case. Otherwise it returns -EINVAL
or 0 (succeeded, given address is on the entry).
Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@de…
Cc: stable(a)vger.kernel.org
Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by: Jianlin Lv <Jianlin.Lv(a)arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
include/linux/kprobes.h | 2 +-
kernel/kprobes.c | 34 +++++++++++++++++++++++++---------
kernel/trace/trace_kprobe.c | 10 ++++++----
3 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index a60488867dd0..a121fd8e7c3a 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -232,7 +232,7 @@ extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
extern bool arch_kprobe_on_func_entry(unsigned long offset);
-extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
+extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
extern bool within_kprobe_blacklist(unsigned long addr);
extern int kprobe_add_ksym_blacklist(unsigned long entry);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 283c8b01ce78..8f9fbc74021d 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1948,29 +1948,45 @@ bool __weak arch_kprobe_on_func_entry(unsigned long offset)
return !offset;
}
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+/**
+ * kprobe_on_func_entry() -- check whether given address is function entry
+ * @addr: Target address
+ * @sym: Target symbol name
+ * @offset: The offset from the symbol or the address
+ *
+ * This checks whether the given @addr+@offset or @sym+@offset is on the
+ * function entry address or not.
+ * This returns 0 if it is the function entry, or -EINVAL if it is not.
+ * And also it returns -ENOENT if it fails the symbol or address lookup.
+ * Caller must pass @addr or @sym (either one must be NULL), or this
+ * returns -EINVAL.
+ */
+int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
if (IS_ERR(kp_addr))
- return false;
+ return PTR_ERR(kp_addr);
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
+ return -ENOENT;
- return true;
+ if (!arch_kprobe_on_func_entry(offset))
+ return -EINVAL;
+
+ return 0;
}
int register_kretprobe(struct kretprobe *rp)
{
- int ret = 0;
+ int ret;
struct kretprobe_instance *inst;
int i;
void *addr;
- if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
- return -EINVAL;
+ ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset);
+ if (ret)
+ return ret;
if (kretprobe_blacklist_size) {
addr = kprobe_addr(&rp->kp);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1074a69beff3..233322c77b76 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -220,9 +220,9 @@ bool trace_kprobe_on_func_entry(struct trace_event_call *call)
{
struct trace_kprobe *tk = trace_kprobe_primary_from_call(call);
- return tk ? kprobe_on_func_entry(tk->rp.kp.addr,
+ return tk ? (kprobe_on_func_entry(tk->rp.kp.addr,
tk->rp.kp.addr ? NULL : tk->rp.kp.symbol_name,
- tk->rp.kp.addr ? 0 : tk->rp.kp.offset) : false;
+ tk->rp.kp.addr ? 0 : tk->rp.kp.offset) == 0) : false;
}
bool trace_kprobe_error_injectable(struct trace_event_call *call)
@@ -811,9 +811,11 @@ static int trace_kprobe_create(int argc, const char *argv[])
trace_probe_log_err(0, BAD_PROBE_ADDR);
goto parse_error;
}
- if (kprobe_on_func_entry(NULL, symbol, offset))
+ ret = kprobe_on_func_entry(NULL, symbol, offset);
+ if (ret == 0)
flags |= TPARG_FL_FENTRY;
- if (offset && is_return && !(flags & TPARG_FL_FENTRY)) {
+ /* Defer the ENOENT case until register kprobe */
+ if (ret == -EINVAL && is_return) {
trace_probe_log_err(0, BAD_RETPROBE);
goto parse_error;
}
From: Dave Hansen <dave.hansen(a)linux.intel.com>
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly checks the RECLAIM_ZONE bit. The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'.
The RECLAIM_ZONE #define was removed in a cleanup. That, by itself
is fine.
But, when the bit was removed (bit 0) the _other_ bit locations also
got changed. That's not OK because the bit values are documented to
mean one specific thing and users surely rely on them meaning that one
thing and not changing from kernel to kernel. The end result is that
if someone had a script that did:
sysctl vm.zone_reclaim_mode=1
This script would have gone from enalbing node reclaim for clean
unmapped pages to writing out pages during node reclaim after the
commit in question. That's not great.
Put the bits back the way they were and add a comment so something
like this is a bit harder to do again. Update the documentation to
make it clear that the first bit is ignored.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky(a)intel.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Daniel Wagner <dwagner(a)suse.de>
Cc: "Tobin C. Harding" <tobin(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Daniel Wagner <dwagner(a)suse.de>
Cc: osalvador <osalvador(a)suse.de>
Cc: stable(a)vger.kernel.org
--
Changes from v2:
* Update description to indicate that bit0 was used for clean
unmapped page node reclaim.
---
b/Documentation/admin-guide/sysctl/vm.rst | 10 +++++-----
b/mm/vmscan.c | 9 +++++++--
2 files changed, 12 insertions(+), 7 deletions(-)
diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi 2021-01-25 16:23:06.048866718 -0800
+++ b/Documentation/admin-guide/sysctl/vm.rst 2021-01-25 16:23:06.056866718 -0800
@@ -978,11 +978,11 @@ that benefit from having their data cach
left disabled as the caching effect is likely to be more important than
data locality.
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi 2021-01-25 16:23:06.052866718 -0800
+++ b/mm/vmscan.c 2021-01-25 16:23:06.057866718 -0800
@@ -4086,8 +4086,13 @@ module_init(kswapd_init)
*/
int node_reclaim_mode __read_mostly;
-#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI. New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
/*
* Priority for NODE_RECLAIM. This determines the fraction of pages
_
This is a note to let you know that I've just added the patch titled
phy: lantiq: rcu-usb2: wait after clock enable
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 36acd5e24e3000691fb8d1ee31cf959cb1582d35 Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 7 Jan 2021 23:49:01 +0100
Subject: phy: lantiq: rcu-usb2: wait after clock enable
Commit 65dc2e725286 ("usb: dwc2: Update Core Reset programming flow.")
revealed that the phy isn't ready immediately after enabling it's
clocks. The dwc2_check_core_version() fails and the dwc2 usb driver
errors out.
Add a short delay to let the phy get up and running. There isn't any
documentation how much time is required, the value was chosen based on
tests.
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: Hauke Mehrtens <hauke(a)hauke-m.de>
Acked-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Cc: <stable(a)vger.kernel.org> # v5.7+
Link: https://lore.kernel.org/r/20210107224901.2102479-1-dev@kresin.me
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
---
drivers/phy/lantiq/phy-lantiq-rcu-usb2.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/lantiq/phy-lantiq-rcu-usb2.c b/drivers/phy/lantiq/phy-lantiq-rcu-usb2.c
index a7d126192cf1..29d246ea24b4 100644
--- a/drivers/phy/lantiq/phy-lantiq-rcu-usb2.c
+++ b/drivers/phy/lantiq/phy-lantiq-rcu-usb2.c
@@ -124,8 +124,16 @@ static int ltq_rcu_usb2_phy_power_on(struct phy *phy)
reset_control_deassert(priv->phy_reset);
ret = clk_prepare_enable(priv->phy_gate_clk);
- if (ret)
+ if (ret) {
dev_err(dev, "failed to enable PHY gate\n");
+ return ret;
+ }
+
+ /*
+ * at least the xrx200 usb2 phy requires some extra time to be
+ * operational after enabling the clock
+ */
+ usleep_range(100, 200);
return ret;
}
--
2.30.1
ftw() has been obsolete for about 12 years now.
Fixes: bb1c15b60b98 ("perf stat: Support regex pattern in --for-each-cgroup")
CC: stable(a)vger.kernel.org
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
Notes:
NOTE: Not runtime-tested, I have no idea what I need to do in perf
to test this. But at least it compiles now with my uClibc-based
toolchain.
tools/perf/util/cgroup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 5dff7e489921..f24ab4585553 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -161,7 +161,7 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup)
/* helper function for ftw() in match_cgroups and list_cgroups */
static int add_cgroup_name(const char *fpath, const struct stat *sb __maybe_unused,
- int typeflag)
+ int typeflag, struct FTW *ftwbuf __maybe_unused)
{
struct cgroup_name *cn;
@@ -209,12 +209,12 @@ static int list_cgroups(const char *str)
if (!s)
return -1;
/* pretend if it's added by ftw() */
- ret = add_cgroup_name(s, NULL, FTW_D);
+ ret = add_cgroup_name(s, NULL, FTW_D, NULL);
free(s);
if (ret)
return -1;
} else {
- if (add_cgroup_name("", NULL, FTW_D) < 0)
+ if (add_cgroup_name("", NULL, FTW_D, NULL) < 0)
return -1;
}
@@ -247,7 +247,7 @@ static int match_cgroups(const char *str)
prefix_len = strlen(mnt);
/* collect all cgroups in the cgroup_list */
- if (ftw(mnt, add_cgroup_name, 20) < 0)
+ if (nftw(mnt, add_cgroup_name, 20, 0) < 0)
return -1;
for (;;) {
--
2.30.0
I'm announcing the release of the 5.4.97 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 8 -
arch/arm/boot/dts/sun7i-a20-bananapro.dts | 2
arch/arm/mach-footbridge/dc21285.c | 12 -
arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 2
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2
arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 10 -
arch/arm64/boot/dts/rockchip/px30.dtsi | 2
arch/um/drivers/virtio_uml.c | 3
arch/x86/Makefile | 3
arch/x86/include/asm/apic.h | 10 -
arch/x86/include/asm/barrier.h | 18 ++
arch/x86/kernel/apic/apic.c | 4
arch/x86/kernel/apic/x2apic_cluster.c | 6
arch/x86/kernel/apic/x2apic_phys.c | 9 -
arch/x86/kvm/emulate.c | 2
arch/x86/kvm/svm.c | 5
arch/x86/mm/mem_encrypt.c | 1
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2
drivers/input/joystick/xpad.c | 17 ++
drivers/input/serio/i8042-x86ia64io.h | 2
drivers/iommu/intel-iommu.c | 6
drivers/md/md.c | 2
drivers/mmc/core/sdio_cis.c | 6
drivers/net/dsa/mv88e6xxx/chip.c | 6
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 -
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1
drivers/net/ethernet/intel/igc/igc_ethtool.c | 3
drivers/net/ethernet/intel/igc/igc_i225.c | 3
drivers/net/ethernet/intel/igc/igc_mac.c | 2
drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 10 -
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 5
drivers/net/ethernet/realtek/r8169_main.c | 4
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 9 +
drivers/nvdimm/dimm_devs.c | 18 ++
drivers/nvme/host/pci.c | 2
drivers/nvme/target/tcp.c | 3
drivers/usb/class/usblp.c | 19 +-
drivers/usb/dwc2/gadget.c | 8 -
drivers/usb/dwc3/core.c | 2
drivers/usb/gadget/legacy/ether.c | 4
drivers/usb/host/xhci-mtk-sch.c | 130 +++++++++++++------
drivers/usb/host/xhci-mtk.c | 2
drivers/usb/host/xhci-mtk.h | 15 ++
drivers/usb/host/xhci-mvebu.c | 42 ++++++
drivers/usb/host/xhci-mvebu.h | 6
drivers/usb/host/xhci-plat.c | 26 +++
drivers/usb/host/xhci-plat.h | 1
drivers/usb/host/xhci-ring.c | 31 ++--
drivers/usb/host/xhci.c | 8 -
drivers/usb/host/xhci.h | 5
drivers/usb/renesas_usbhs/fifo.c | 1
drivers/usb/serial/cp210x.c | 2
drivers/usb/serial/option.c | 6
fs/afs/main.c | 6
fs/cifs/dir.c | 22 ++-
fs/cifs/smb2pdu.h | 2
fs/cifs/transport.c | 18 ++
fs/hugetlbfs/inode.c | 3
fs/overlayfs/dir.c | 2
include/linux/hugetlb.h | 2
include/linux/msi.h | 6
include/net/sch_generic.h | 2
init/init_task.c | 3
kernel/bpf/cgroup.c | 7 -
kernel/irq/msi.c | 44 ++----
kernel/kprobes.c | 4
kernel/trace/fgraph.c | 2
mm/compaction.c | 3
mm/huge_memory.c | 37 +++--
mm/hugetlb.c | 48 ++++++-
mm/memblock.c | 49 -------
net/core/neighbour.c | 7 -
net/ipv4/ip_tunnel.c | 16 +-
net/lapb/lapb_out.c | 3
net/mac80211/driver-ops.c | 5
net/mac80211/rate.c | 3
net/rxrpc/af_rxrpc.c | 6
77 files changed, 557 insertions(+), 264 deletions(-)
Aleksandr Loktionov (1):
i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
Alexander Ovechkin (1):
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
Alexey Dobriyan (1):
Input: i8042 - unbreak Pegatron C15B
Aurelien Aptel (1):
cifs: report error instead of invalid when revalidating a dentry fails
Benjamin Valentin (1):
Input: xpad - sync supported devices with fork on GitHub
Chenxin Jin (1):
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Chinmay Agarwal (1):
neighbour: Prevent a dead entry from updating gc_list
Christoph Schemmel (1):
USB: serial: option: Adding support for Cinterion MV31
Chunfeng Yun (2):
usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints
usb: xhci-mtk: break loop when find the endpoint to drop
DENG Qingfang (1):
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Dan Carpenter (1):
USB: gadget: legacy: fix an error code in eth_bind()
Dan Williams (1):
libnvdimm/dimm: Avoid race between probe and available_slots_show()
Dave Hansen (1):
x86/apic: Add extra serialization for non-serializing MSRs
David Howells (1):
rxrpc: Fix deadlock around release of dst cached on udp tunnel
Felix Fietkau (1):
mac80211: fix station rate table updates on assoc
Fengnan Chang (1):
mmc: core: Limit retries when analyse of SDIO tuples fails
Gary Bisson (1):
usb: dwc3: fix clock issue during resume in OTG mode
Greg Kroah-Hartman (1):
Linux 5.4.97
Gustavo A. R. Silva (1):
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Heiko Stuebner (1):
usb: dwc2: Fix endpoint direction check in ep_from_windex
Heiner Kallweit (1):
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
Hermann Lauer (1):
ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode
Hugh Dickins (1):
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Ikjoon Jang (1):
usb: xhci-mtk: fix unreleased bandwidth data
Jeremy Figgins (1):
USB: usblp: don't call usb_set_interface if there's a single alt
Johannes Berg (1):
um: virtio: free vu_dev only with the contained struct device
Josh Poimboeuf (1):
x86/build: Disable CET instrumentation in the kernel
Kai-Heng Feng (1):
igc: Report speed and duplex as unknown when device is runtime suspended
Kevin Lo (2):
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
igc: check return value of ret_val in igc_config_fc_after_link_up
Liangyan (1):
ovl: fix dentry leak in ovl_get_redirect
Loris Reiff (2):
bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
bpf, cgroup: Fix problematic bounds check
Luca Coelho (1):
iwlwifi: mvm: don't send RFH_QUEUE_CONFIG_CMD with no queues
Maor Gottlieb (1):
net/mlx5: Fix leak upon failure of rule creation
Marc Zyngier (1):
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Mathias Nyman (1):
xhci: fix bounce buffer usage for non-sg list case
Muchun Song (4):
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
mm: hugetlb: fix a race between freeing and dissolving the page
mm: hugetlb: fix a race between isolating and freeing page
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Nadav Amit (1):
iommu/vt-d: Do not use flush-queue when caching-mode is on
Pali Rohár (1):
usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720
Pavel Shilovsky (1):
smb3: fix crediting for compounding when only one request in flight
Peter Chen (1):
usb: host: xhci-plat: add priv quirk for skip PHY initialization
Pho Tran (1):
USB: serial: cp210x: add pid/vid for WSDA-200-USB
Rokudo Yan (1):
mm, compaction: move high_pfn to the for loop scope
Roman Gushchin (1):
memblock: do not start bottom-up allocations with kernel_end
Russell King (1):
ARM: footbridge: fix dc21285 PCI configuration accessors
Sagi Grimberg (1):
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
Sandy Huang (1):
arm64: dts: rockchip: fix vopl iommu irq on px30
Sean Christopherson (2):
KVM: SVM: Treat SVM as unsupported when running as an SEV guest
KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode
Serge Semin (1):
arm64: dts: amlogic: meson-g12: Set FL-adj property value
Shawn Guo (1):
arm64: dts: qcom: c630: keep both touchpad devices enabled
Stefan Chulski (1):
net: mvpp2: TCAM entry enable should be written after SRAM data
Steven Rostedt (VMware) (1):
fgraph: Initialize tracing_graph_pause at task creation
Stylon Wang (1):
drm/amd/display: Revert "Fix EDID parsing after resume from suspend"
Thorsten Leemhuis (1):
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Vadim Fedorenko (1):
net: ip_tunnel: fix mtu calculation
Wang ShaoBo (1):
kretprobe: Avoid re-registration of the same kretprobe earlier
Xiao Ni (1):
md: Set prev_flush_start and flush_bio in an atomic way
Xie He (1):
net: lapb: Copy the skb before sending a packet
Yoshihiro Shimoda (1):
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
Zyta Szpak (1):
arm64: dts: ls1046a: fix dcfg address range
I'm announcing the release of the 4.19.175 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 8 ----
arch/arm/mach-footbridge/dc21285.c | 12 +++---
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 -
arch/x86/Makefile | 3 +
arch/x86/include/asm/apic.h | 10 -----
arch/x86/include/asm/barrier.h | 18 +++++++++
arch/x86/kernel/apic/apic.c | 4 ++
arch/x86/kernel/apic/x2apic_cluster.c | 6 ++-
arch/x86/kernel/apic/x2apic_phys.c | 6 ++-
arch/x86/kvm/svm.c | 5 ++
drivers/input/joystick/xpad.c | 17 ++++++++
drivers/input/serio/i8042-x86ia64io.h | 2 +
drivers/iommu/intel-iommu.c | 6 +++
drivers/md/md.c | 2 +
drivers/mmc/core/sdio_cis.c | 6 +++
drivers/net/dsa/mv88e6xxx/chip.c | 6 ++-
drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 10 ++---
drivers/nvme/host/pci.c | 2 +
drivers/usb/class/usblp.c | 19 +++++----
drivers/usb/dwc2/gadget.c | 8 ----
drivers/usb/dwc3/core.c | 2 -
drivers/usb/gadget/legacy/ether.c | 4 +-
drivers/usb/host/xhci-ring.c | 31 ++++++++++-----
drivers/usb/renesas_usbhs/fifo.c | 1
drivers/usb/serial/cp210x.c | 2 +
drivers/usb/serial/option.c | 6 +++
fs/afs/main.c | 6 +--
fs/cifs/dir.c | 22 ++++++++++-
fs/cifs/smb2pdu.h | 2 -
fs/hugetlbfs/inode.c | 3 +
fs/overlayfs/dir.c | 2 -
include/linux/elfcore.h | 22 +++++++++++
include/linux/hugetlb.h | 3 +
include/linux/msi.h | 6 +++
kernel/Makefile | 1
kernel/elfcore.c | 26 -------------
kernel/irq/msi.c | 44 ++++++++++------------
kernel/kprobes.c | 4 ++
mm/huge_memory.c | 37 +++++++++++-------
mm/hugetlb.c | 48 +++++++++++++++++++++---
mm/memblock.c | 49 +++----------------------
net/ipv4/ip_tunnel.c | 16 +++-----
net/lapb/lapb_out.c | 3 +
net/mac80211/driver-ops.c | 5 ++
net/mac80211/rate.c | 3 +
net/rxrpc/af_rxrpc.c | 6 +--
46 files changed, 307 insertions(+), 199 deletions(-)
Alexey Dobriyan (1):
Input: i8042 - unbreak Pegatron C15B
Arnd Bergmann (1):
elfcore: fix building with clang
Aurelien Aptel (1):
cifs: report error instead of invalid when revalidating a dentry fails
Benjamin Valentin (1):
Input: xpad - sync supported devices with fork on GitHub
Chenxin Jin (1):
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Christoph Schemmel (1):
USB: serial: option: Adding support for Cinterion MV31
DENG Qingfang (1):
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Dan Carpenter (1):
USB: gadget: legacy: fix an error code in eth_bind()
Dave Hansen (1):
x86/apic: Add extra serialization for non-serializing MSRs
David Howells (1):
rxrpc: Fix deadlock around release of dst cached on udp tunnel
Felix Fietkau (1):
mac80211: fix station rate table updates on assoc
Fengnan Chang (1):
mmc: core: Limit retries when analyse of SDIO tuples fails
Gary Bisson (1):
usb: dwc3: fix clock issue during resume in OTG mode
Greg Kroah-Hartman (1):
Linux 4.19.175
Gustavo A. R. Silva (1):
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Heiko Stuebner (1):
usb: dwc2: Fix endpoint direction check in ep_from_windex
Hugh Dickins (1):
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Jeremy Figgins (1):
USB: usblp: don't call usb_set_interface if there's a single alt
Josh Poimboeuf (1):
x86/build: Disable CET instrumentation in the kernel
Liangyan (1):
ovl: fix dentry leak in ovl_get_redirect
Marc Zyngier (1):
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Mathias Nyman (1):
xhci: fix bounce buffer usage for non-sg list case
Muchun Song (4):
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
mm: hugetlb: fix a race between freeing and dissolving the page
mm: hugetlb: fix a race between isolating and freeing page
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Nadav Amit (1):
iommu/vt-d: Do not use flush-queue when caching-mode is on
Pho Tran (1):
USB: serial: cp210x: add pid/vid for WSDA-200-USB
Roman Gushchin (1):
memblock: do not start bottom-up allocations with kernel_end
Russell King (1):
ARM: footbridge: fix dc21285 PCI configuration accessors
Sean Christopherson (1):
KVM: SVM: Treat SVM as unsupported when running as an SEV guest
Stefan Chulski (1):
net: mvpp2: TCAM entry enable should be written after SRAM data
Thorsten Leemhuis (1):
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Vadim Fedorenko (1):
net: ip_tunnel: fix mtu calculation
Wang ShaoBo (1):
kretprobe: Avoid re-registration of the same kretprobe earlier
Xiao Ni (1):
md: Set prev_flush_start and flush_bio in an atomic way
Xie He (1):
net: lapb: Copy the skb before sending a packet
Yoshihiro Shimoda (1):
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
Zyta Szpak (1):
arm64: dts: ls1046a: fix dcfg address range
When starting an iomap write, gfs2_quota_lock_check -> gfs2_quota_lock
-> gfs2_quota_hold is called from gfs2_iomap_begin. At the end of the
write, before unlocking the quotas, punch_hole -> gfs2_quota_hold can be
called again in gfs2_iomap_end, which is incorrect and leads to a failed
assertion. Instead, move the call to gfs2_quota_unlock before the call
to punch_hole to fix that.
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Cc: stable(a)vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
---
fs/gfs2/bmap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index cf6ccdd00587..7a358ae05185 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1230,6 +1230,9 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
gfs2_inplace_release(ip);
+ if (ip->i_qadata && ip->i_qadata->qa_qd_num)
+ gfs2_quota_unlock(ip);
+
if (length != written && (iomap->flags & IOMAP_F_NEW)) {
/* Deallocate blocks that were just allocated. */
loff_t blockmask = i_blocksize(inode) - 1;
@@ -1242,9 +1245,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
}
}
- if (ip->i_qadata && ip->i_qadata->qa_qd_num)
- gfs2_quota_unlock(ip);
-
if (unlikely(!written))
goto out_unlock;
--
2.26.2
I'm announcing the release of the 4.9.257 kernel.
All users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 10 -
arch/arm/mach-footbridge/dc21285.c | 12 -
arch/x86/Makefile | 3
arch/x86/include/asm/apic.h | 10 -
arch/x86/include/asm/barrier.h | 18 ++
arch/x86/kernel/apic/apic.c | 4
arch/x86/kernel/apic/x2apic_cluster.c | 6
arch/x86/kernel/apic/x2apic_phys.c | 6
drivers/acpi/thermal.c | 55 ++++--
drivers/input/joystick/xpad.c | 17 +-
drivers/input/serio/i8042-x86ia64io.h | 2
drivers/iommu/intel-iommu.c | 6
drivers/mmc/core/sdio_cis.c | 6
drivers/net/dsa/bcm_sf2.c | 8
drivers/net/ethernet/ibm/ibmvnic.c | 6
drivers/scsi/ibmvscsi/ibmvfc.c | 4
drivers/scsi/libfc/fc_exch.c | 16 +
drivers/usb/class/usblp.c | 19 +-
drivers/usb/dwc2/gadget.c | 8
drivers/usb/gadget/legacy/ether.c | 4
drivers/usb/host/xhci-ring.c | 31 ++-
drivers/usb/serial/cp210x.c | 2
drivers/usb/serial/option.c | 6
fs/cifs/dir.c | 22 ++
fs/hugetlbfs/inode.c | 3
include/linux/elfcore.h | 22 ++
include/linux/hugetlb.h | 3
kernel/Makefile | 1
kernel/elfcore.c | 25 ---
kernel/futex.c | 276 +++++++++++++++++++---------------
kernel/kprobes.c | 4
kernel/locking/rtmutex-debug.c | 9 -
kernel/locking/rtmutex-debug.h | 3
kernel/locking/rtmutex.c | 127 +++++++++------
kernel/locking/rtmutex.h | 2
kernel/locking/rtmutex_common.h | 12 -
mm/huge_memory.c | 37 ++--
mm/hugetlb.c | 9 -
net/lapb/lapb_out.c | 3
net/mac80211/driver-ops.c | 5
net/mac80211/rate.c | 3
net/mac80211/rx.c | 2
net/sched/sch_api.c | 3
sound/pci/hda/patch_realtek.c | 2
tools/objtool/elf.c | 7
45 files changed, 520 insertions(+), 319 deletions(-)
Alexey Dobriyan (1):
Input: i8042 - unbreak Pegatron C15B
Arnd Bergmann (1):
elfcore: fix building with clang
Aurelien Aptel (1):
cifs: report error instead of invalid when revalidating a dentry fails
Benjamin Valentin (1):
Input: xpad - sync supported devices with fork on GitHub
Brian King (1):
scsi: ibmvfc: Set default timeout to avoid crash during migration
Chenxin Jin (1):
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Christoph Schemmel (1):
USB: serial: option: Adding support for Cinterion MV31
Dan Carpenter (1):
USB: gadget: legacy: fix an error code in eth_bind()
Dave Hansen (1):
x86/apic: Add extra serialization for non-serializing MSRs
Eric Dumazet (1):
net_sched: reject silly cell_log in qdisc_get_rtab()
Felix Fietkau (2):
mac80211: fix fast-rx encryption check
mac80211: fix station rate table updates on assoc
Fengnan Chang (1):
mmc: core: Limit retries when analyse of SDIO tuples fails
Greg Kroah-Hartman (1):
Linux 4.9.257
Heiko Stuebner (1):
usb: dwc2: Fix endpoint direction check in ep_from_windex
Hugh Dickins (1):
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Javed Hasan (1):
scsi: libfc: Avoid invoking response handler twice if ep is already completed
Jeremy Figgins (1):
USB: usblp: don't call usb_set_interface if there's a single alt
Josh Poimboeuf (2):
objtool: Don't fail on missing symbol table
x86/build: Disable CET instrumentation in the kernel
Lijun Pan (1):
ibmvnic: Ensure that CRQ entry read are correctly ordered
Mathias Nyman (1):
xhci: fix bounce buffer usage for non-sg list case
Muchun Song (3):
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
mm: hugetlb: fix a race between isolating and freeing page
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Nadav Amit (1):
iommu/vt-d: Do not use flush-queue when caching-mode is on
Pan Bian (1):
net: dsa: bcm_sf2: put device node before return
Peter Zijlstra (4):
futex,rt_mutex: Provide futex specific rt_mutex API
futex: Remove rt_mutex_deadlock_account_*()
futex: Rework inconsistent rt_mutex/futex_q state
futex: Avoid violating the 10th rule of futex
Pho Tran (1):
USB: serial: cp210x: add pid/vid for WSDA-200-USB
Rafael J. Wysocki (1):
ACPI: thermal: Do not call acpi_thermal_check() directly
Russell King (1):
ARM: footbridge: fix dc21285 PCI configuration accessors
Sasha Levin (1):
stable: clamp SUBLEVEL in 4.4 and 4.9
Shih-Yuan Lee (FourDollars) (1):
ALSA: hda/realtek - Fix typo of pincfg for Dell quirk
Thomas Gleixner (6):
futex: Replace pointless printk in fixup_owner()
futex: Provide and use pi_state_update_owner()
rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
futex: Use pi_state_update_owner() in put_pi_state()
futex: Simplify fixup_pi_state_owner()
futex: Handle faults correctly for PI futexes
Wang ShaoBo (1):
kretprobe: Avoid re-registration of the same kretprobe earlier
Xie He (1):
net: lapb: Copy the skb before sending a packet
I'm announcing the release of the 4.4.257 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 10 -
arch/arm/mach-footbridge/dc21285.c | 12 -
arch/mips/Kconfig | 1
arch/x86/Makefile | 3
arch/x86/include/asm/apic.h | 10 -
arch/x86/include/asm/barrier.h | 18 ++
arch/x86/kernel/apic/apic.c | 4
arch/x86/kernel/apic/x2apic_cluster.c | 3
arch/x86/kernel/apic/x2apic_phys.c | 3
drivers/acpi/thermal.c | 54 ++++--
drivers/input/joystick/xpad.c | 17 +-
drivers/input/serio/i8042-x86ia64io.h | 2
drivers/mmc/core/sdio_cis.c | 6
drivers/scsi/ibmvscsi/ibmvfc.c | 4
drivers/scsi/libfc/fc_exch.c | 16 +
drivers/usb/class/usblp.c | 19 +-
drivers/usb/dwc2/gadget.c | 8
drivers/usb/gadget/legacy/ether.c | 4
drivers/usb/gadget/udc/udc-core.c | 13 +
drivers/usb/serial/cp210x.c | 2
drivers/usb/serial/option.c | 6
fs/Kconfig.binfmt | 8
fs/cifs/dir.c | 22 ++
fs/hugetlbfs/inode.c | 3
include/linux/elfcore.h | 22 ++
include/linux/hugetlb.h | 3
kernel/Makefile | 3
kernel/elfcore.c | 25 ---
kernel/futex.c | 278 +++++++++++++++++++---------------
kernel/kprobes.c | 4
kernel/locking/rtmutex-debug.c | 9 -
kernel/locking/rtmutex-debug.h | 3
kernel/locking/rtmutex.c | 127 +++++++++------
kernel/locking/rtmutex.h | 2
kernel/locking/rtmutex_common.h | 12 -
mm/hugetlb.c | 9 -
net/lapb/lapb_out.c | 3
net/mac80211/driver-ops.c | 5
net/mac80211/rate.c | 3
net/sched/sch_api.c | 3
sound/pci/hda/patch_realtek.c | 2
41 files changed, 468 insertions(+), 293 deletions(-)
Alexey Dobriyan (1):
Input: i8042 - unbreak Pegatron C15B
Arnd Bergmann (1):
elfcore: fix building with clang
Aurelien Aptel (1):
cifs: report error instead of invalid when revalidating a dentry fails
Benjamin Valentin (1):
Input: xpad - sync supported devices with fork on GitHub
Brian King (1):
scsi: ibmvfc: Set default timeout to avoid crash during migration
Chenxin Jin (1):
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Christoph Schemmel (1):
USB: serial: option: Adding support for Cinterion MV31
Dan Carpenter (1):
USB: gadget: legacy: fix an error code in eth_bind()
Dave Hansen (1):
x86/apic: Add extra serialization for non-serializing MSRs
Eric Dumazet (1):
net_sched: reject silly cell_log in qdisc_get_rtab()
Felix Fietkau (1):
mac80211: fix station rate table updates on assoc
Fengnan Chang (1):
mmc: core: Limit retries when analyse of SDIO tuples fails
Greg Kroah-Hartman (1):
Linux 4.4.257
Heiko Stuebner (1):
usb: dwc2: Fix endpoint direction check in ep_from_windex
Javed Hasan (1):
scsi: libfc: Avoid invoking response handler twice if ep is already completed
Jeremy Figgins (1):
USB: usblp: don't call usb_set_interface if there's a single alt
Josh Poimboeuf (1):
x86/build: Disable CET instrumentation in the kernel
Lee Jones (10):
futex,rt_mutex: Provide futex specific rt_mutex API
futex: Remove rt_mutex_deadlock_account_*()
futex: Rework inconsistent rt_mutex/futex_q state
futex: Avoid violating the 10th rule of futex
futex: Replace pointless printk in fixup_owner()
futex: Provide and use pi_state_update_owner()
rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
futex: Use pi_state_update_owner() in put_pi_state()
futex: Simplify fixup_pi_state_owner()
futex: Handle faults correctly for PI futexes
Muchun Song (3):
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
mm: hugetlb: fix a race between isolating and freeing page
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Pho Tran (1):
USB: serial: cp210x: add pid/vid for WSDA-200-USB
Rafael J. Wysocki (1):
ACPI: thermal: Do not call acpi_thermal_check() directly
Ralf Baechle (1):
ELF/MIPS build fix
Russell King (1):
ARM: footbridge: fix dc21285 PCI configuration accessors
Sasha Levin (1):
stable: clamp SUBLEVEL in 4.4 and 4.9
Shih-Yuan Lee (FourDollars) (1):
ALSA: hda/realtek - Fix typo of pincfg for Dell quirk
Thinh Nguyen (1):
usb: udc: core: Use lock when write to soft_connect
Wang ShaoBo (1):
kretprobe: Avoid re-registration of the same kretprobe earlier
Xie He (1):
net: lapb: Copy the skb before sending a packet
This is the start of the stable review cycle for the 5.4.97 release.
There are 65 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.97-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.97-rc1
Pali Rohár <pali(a)kernel.org>
usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720
Alexander Ovechkin <ovov(a)yandex-team.ru>
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
DENG Qingfang <dqfext(a)gmail.com>
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Vadim Fedorenko <vfedorenko(a)novek.ru>
net: ip_tunnel: fix mtu calculation
Chinmay Agarwal <chinagar(a)codeaurora.org>
neighbour: Prevent a dead entry from updating gc_list
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
igc: Report speed and duplex as unknown when device is runtime suspended
Xiao Ni <xni(a)redhat.com>
md: Set prev_flush_start and flush_bio in an atomic way
Nadav Amit <namit(a)vmware.com>
iommu/vt-d: Do not use flush-queue when caching-mode is on
Benjamin Valentin <benpicco(a)googlemail.com>
Input: xpad - sync supported devices with fork on GitHub
Luca Coelho <luciano.coelho(a)intel.com>
iwlwifi: mvm: don't send RFH_QUEUE_CONFIG_CMD with no queues
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/apic: Add extra serialization for non-serializing MSRs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/build: Disable CET instrumentation in the kernel
Hugh Dickins <hughd(a)google.com>
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Rokudo Yan <wu-yan(a)tcl.com>
mm, compaction: move high_pfn to the for loop scope
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between isolating and freeing page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between freeing and dissolving the page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: footbridge: fix dc21285 PCI configuration accessors
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Treat SVM as unsupported when running as an SEV guest
Thorsten Leemhuis <linux(a)leemhuis.info>
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Stylon Wang <stylon.wang(a)amd.com>
drm/amd/display: Revert "Fix EDID parsing after resume from suspend"
Fengnan Chang <fengnanchang(a)gmail.com>
mmc: core: Limit retries when analyse of SDIO tuples fails
Pavel Shilovsky <pshilov(a)microsoft.com>
smb3: fix crediting for compounding when only one request in flight
Gustavo A. R. Silva <gustavoars(a)kernel.org>
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Aurelien Aptel <aaptel(a)suse.com>
cifs: report error instead of invalid when revalidating a dentry fails
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: fix bounce buffer usage for non-sg list case
Marc Zyngier <maz(a)kernel.org>
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Dan Williams <dan.j.williams(a)intel.com>
libnvdimm/dimm: Avoid race between probe and available_slots_show()
Wang ShaoBo <bobo.shaobowang(a)huawei.com>
kretprobe: Avoid re-registration of the same kretprobe earlier
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
fgraph: Initialize tracing_graph_pause at task creation
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix station rate table updates on assoc
Liangyan <liangyan.peng(a)linux.alibaba.com>
ovl: fix dentry leak in ovl_get_redirect
Peter Chen <peter.chen(a)nxp.com>
usb: host: xhci-plat: add priv quirk for skip PHY initialization
Chunfeng Yun <chunfeng.yun(a)mediatek.com>
usb: xhci-mtk: break loop when find the endpoint to drop
Chunfeng Yun <chunfeng.yun(a)mediatek.com>
usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints
Ikjoon Jang <ikjn(a)chromium.org>
usb: xhci-mtk: fix unreleased bandwidth data
Gary Bisson <gary.bisson(a)boundarydevices.com>
usb: dwc3: fix clock issue during resume in OTG mode
Heiko Stuebner <heiko.stuebner(a)theobroma-systems.com>
usb: dwc2: Fix endpoint direction check in ep_from_windex
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
Jeremy Figgins <kernel(a)jeremyfiggins.com>
USB: usblp: don't call usb_set_interface if there's a single alt
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: gadget: legacy: fix an error code in eth_bind()
Roman Gushchin <guro(a)fb.com>
memblock: do not start bottom-up allocations with kernel_end
Sagi Grimberg <sagi(a)grimberg.me>
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
Hermann Lauer <Hermann.Lauer(a)uni-heidelberg.de>
ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: TCAM entry enable should be written after SRAM data
Xie He <xie.he.0141(a)gmail.com>
net: lapb: Copy the skb before sending a packet
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Fix leak upon failure of rule creation
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
Kevin Lo <kevlo(a)kevlo.org>
igc: check return value of ret_val in igc_config_fc_after_link_up
Kevin Lo <kevlo(a)kevlo.org>
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
Zyta Szpak <zr(a)semihalf.com>
arm64: dts: ls1046a: fix dcfg address range
David Howells <dhowells(a)redhat.com>
rxrpc: Fix deadlock around release of dst cached on udp tunnel
Johannes Berg <johannes.berg(a)intel.com>
um: virtio: free vu_dev only with the contained struct device
Loris Reiff <loris.reiff(a)liblor.ch>
bpf, cgroup: Fix problematic bounds check
Loris Reiff <loris.reiff(a)liblor.ch>
bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
Sandy Huang <hjc(a)rock-chips.com>
arm64: dts: rockchip: fix vopl iommu irq on px30
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
arm64: dts: amlogic: meson-g12: Set FL-adj property value
Alexey Dobriyan <adobriyan(a)gmail.com>
Input: i8042 - unbreak Pegatron C15B
Shawn Guo <shawn.guo(a)linaro.org>
arm64: dts: qcom: c630: keep both touchpad devices enabled
Christoph Schemmel <christoph.schemmel(a)gmail.com>
USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin <bg4akv(a)hotmail.com>
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran <Pho.Tran(a)silabs.com>
USB: serial: cp210x: add pid/vid for WSDA-200-USB
-------------
Diffstat:
Makefile | 10 +-
arch/arm/boot/dts/sun7i-a20-bananapro.dts | 2 +-
arch/arm/mach-footbridge/dc21285.c | 12 +-
arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 2 +-
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 +-
.../boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 10 +-
arch/arm64/boot/dts/rockchip/px30.dtsi | 2 +-
arch/um/drivers/virtio_uml.c | 3 +-
arch/x86/Makefile | 3 +
arch/x86/include/asm/apic.h | 10 --
arch/x86/include/asm/barrier.h | 18 +++
arch/x86/kernel/apic/apic.c | 4 +
arch/x86/kernel/apic/x2apic_cluster.c | 6 +-
arch/x86/kernel/apic/x2apic_phys.c | 9 +-
arch/x86/kvm/emulate.c | 2 +
arch/x86/kvm/svm.c | 5 +
arch/x86/mm/mem_encrypt.c | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 -
drivers/input/joystick/xpad.c | 17 ++-
drivers/input/serio/i8042-x86ia64io.h | 2 +
drivers/iommu/intel-iommu.c | 6 +
drivers/md/md.c | 2 +
drivers/mmc/core/sdio_cis.c | 6 +
drivers/net/dsa/mv88e6xxx/chip.c | 6 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 +--
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1 -
drivers/net/ethernet/intel/igc/igc_ethtool.c | 3 +-
drivers/net/ethernet/intel/igc/igc_i225.c | 3 +-
drivers/net/ethernet/intel/igc/igc_mac.c | 2 +-
drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 10 +-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 5 +
drivers/net/ethernet/realtek/r8169_main.c | 4 +-
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 9 +-
drivers/nvdimm/dimm_devs.c | 18 ++-
drivers/nvme/host/pci.c | 2 +
drivers/nvme/target/tcp.c | 3 +-
drivers/usb/class/usblp.c | 19 +--
drivers/usb/dwc2/gadget.c | 8 +-
drivers/usb/dwc3/core.c | 2 +-
drivers/usb/gadget/legacy/ether.c | 4 +-
drivers/usb/host/xhci-mtk-sch.c | 130 +++++++++++++++------
drivers/usb/host/xhci-mtk.c | 2 +
drivers/usb/host/xhci-mtk.h | 15 +++
drivers/usb/host/xhci-mvebu.c | 42 +++++++
drivers/usb/host/xhci-mvebu.h | 6 +
drivers/usb/host/xhci-plat.c | 26 ++++-
drivers/usb/host/xhci-plat.h | 1 +
drivers/usb/host/xhci-ring.c | 31 +++--
drivers/usb/host/xhci.c | 8 +-
drivers/usb/host/xhci.h | 5 +
drivers/usb/renesas_usbhs/fifo.c | 1 +
drivers/usb/serial/cp210x.c | 2 +
drivers/usb/serial/option.c | 6 +
fs/afs/main.c | 6 +-
fs/cifs/dir.c | 22 +++-
fs/cifs/smb2pdu.h | 2 +-
fs/cifs/transport.c | 18 ++-
fs/hugetlbfs/inode.c | 3 +-
fs/overlayfs/dir.c | 2 +-
include/linux/hugetlb.h | 2 +
include/linux/msi.h | 6 +
include/net/sch_generic.h | 2 +-
init/init_task.c | 3 +-
kernel/bpf/cgroup.c | 7 +-
kernel/irq/msi.c | 44 ++++---
kernel/kprobes.c | 4 +
kernel/trace/fgraph.c | 2 -
mm/compaction.c | 3 +-
mm/huge_memory.c | 37 +++---
mm/hugetlb.c | 48 +++++++-
mm/memblock.c | 49 +-------
net/core/neighbour.c | 7 +-
net/ipv4/ip_tunnel.c | 16 ++-
net/lapb/lapb_out.c | 3 +-
net/mac80211/driver-ops.c | 5 +-
net/mac80211/rate.c | 3 +-
net/rxrpc/af_rxrpc.c | 6 +-
77 files changed, 558 insertions(+), 265 deletions(-)
From: Kai Krakow <kai(a)kaishome.de>
This is potentially long running and not latency sensitive, let's get
it out of the way of other latency sensitive events.
As observed in the previous commit, the `system_wq` comes easily
congested by bcache, and this fixes a few more stalls I was observing
every once in a while.
Let's not make this `WQ_MEM_RECLAIM` as it showed to reduce performance
of boot and file system operations in my tests. Also, without
`WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches the
previous behavior as `system_wq` also does no memory reclaim:
> // workqueue.c:
> system_wq = alloc_workqueue("events", 0, 0);
Cc: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai(a)kaishome.de>
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/bcache.h | 1 +
drivers/md/bcache/journal.c | 4 ++--
drivers/md/bcache/super.c | 16 ++++++++++++++++
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 2b8c7dd2cfae..848dd4db1659 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1005,6 +1005,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
extern struct workqueue_struct *bcache_wq;
extern struct workqueue_struct *bch_journal_wq;
+extern struct workqueue_struct *bch_flush_wq;
extern struct mutex bch_register_lock;
extern struct list_head bch_cache_sets;
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003b..c6613e817333 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -932,8 +932,8 @@ atomic_t *bch_journal(struct cache_set *c,
journal_try_write(c);
} else if (!w->dirty) {
w->dirty = true;
- schedule_delayed_work(&c->journal.work,
- msecs_to_jiffies(c->journal_delay_ms));
+ queue_delayed_work(bch_flush_wq, &c->journal.work,
+ msecs_to_jiffies(c->journal_delay_ms));
spin_unlock(&c->journal.lock);
} else {
spin_unlock(&c->journal.lock);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 85a44a0cffe0..0228ccb293fc 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -49,6 +49,7 @@ static int bcache_major;
static DEFINE_IDA(bcache_device_idx);
static wait_queue_head_t unregister_wait;
struct workqueue_struct *bcache_wq;
+struct workqueue_struct *bch_flush_wq;
struct workqueue_struct *bch_journal_wq;
@@ -2821,6 +2822,8 @@ static void bcache_exit(void)
destroy_workqueue(bcache_wq);
if (bch_journal_wq)
destroy_workqueue(bch_journal_wq);
+ if (bch_flush_wq)
+ destroy_workqueue(bch_flush_wq);
bch_btree_exit();
if (bcache_major)
@@ -2884,6 +2887,19 @@ static int __init bcache_init(void)
if (!bcache_wq)
goto err;
+ /*
+ * Let's not make this `WQ_MEM_RECLAIM` for the following reasons:
+ *
+ * 1. It used `system_wq` before which also does no memory reclaim.
+ * 2. With `WQ_MEM_RECLAIM` desktop stalls, increased boot times, and
+ * reduced throughput can be observed.
+ *
+ * We still want to user our own queue to not congest the `system_wq`.
+ */
+ bch_flush_wq = alloc_workqueue("bch_flush", 0, 0);
+ if (!bch_flush_wq)
+ goto err;
+
bch_journal_wq = alloc_workqueue("bch_journal", WQ_MEM_RECLAIM, 0);
if (!bch_journal_wq)
goto err;
--
2.26.2
From: Kai Krakow <kai(a)kaishome.de>
Before killing `btree_io_wq`, the queue was allocated using
`create_singlethread_workqueue()` which has `WQ_MEM_RECLAIM`. After
killing it, it no longer had this property but `system_wq` is not
single threaded.
Let's combine both worlds and make it multi threaded but able to
reclaim memory.
Cc: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai(a)kaishome.de>
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/btree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 952f022db5a5..fe6dce125aba 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2775,7 +2775,7 @@ void bch_btree_exit(void)
int __init bch_btree_init(void)
{
- btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+ btree_io_wq = alloc_workqueue("bch_btree_io", WQ_MEM_RECLAIM, 0);
if (!btree_io_wq)
return -ENOMEM;
--
2.26.2
From: Kai Krakow <kai(a)kaishome.de>
This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.
With the btree using the `system_wq`, I seem to see a lot more desktop
latency than I should.
After some more investigation, it looks like the original assumption
of 56b3077 no longer is true, and bcache has a very high potential of
congesting the `system_wq`. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and input events may be
delayed.
So let's revert this. It's important to note that the semantics of
using `system_wq` previously mean that `btree_io_wq` should be created
before and destroyed after other bcache wqs to keep the same
assumptions.
Cc: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai(a)kaishome.de>
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/bcache.h | 2 ++
drivers/md/bcache/btree.c | 21 +++++++++++++++++++--
drivers/md/bcache/super.c | 4 ++++
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index d7a84327b7f1..2b8c7dd2cfae 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1046,5 +1046,7 @@ void bch_debug_exit(void);
void bch_debug_init(void);
void bch_request_exit(void);
int bch_request_init(void);
+void bch_btree_exit(void);
+int bch_btree_init(void);
#endif /* _BCACHE_H */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 910df242c83d..952f022db5a5 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -99,6 +99,8 @@
#define PTR_HASH(c, k) \
(((k)->ptr[0] >> c->bucket_bits) | PTR_GEN(k, 0))
+static struct workqueue_struct *btree_io_wq;
+
#define insert_lock(s, b) ((b)->level <= (s)->lock)
@@ -308,7 +310,7 @@ static void __btree_node_write_done(struct closure *cl)
btree_complete_write(b, w);
if (btree_node_dirty(b))
- schedule_delayed_work(&b->work, 30 * HZ);
+ queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
closure_return_with_destructor(cl, btree_node_write_unlock);
}
@@ -481,7 +483,7 @@ static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
BUG_ON(!i->keys);
if (!btree_node_dirty(b))
- schedule_delayed_work(&b->work, 30 * HZ);
+ queue_delayed_work(btree_io_wq, &b->work, 30 * HZ);
set_btree_node_dirty(b);
@@ -2764,3 +2766,18 @@ void bch_keybuf_init(struct keybuf *buf)
spin_lock_init(&buf->lock);
array_allocator_init(&buf->freelist);
}
+
+void bch_btree_exit(void)
+{
+ if (btree_io_wq)
+ destroy_workqueue(btree_io_wq);
+}
+
+int __init bch_btree_init(void)
+{
+ btree_io_wq = create_singlethread_workqueue("bch_btree_io");
+ if (!btree_io_wq)
+ return -ENOMEM;
+
+ return 0;
+}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e7d1b52c5cc8..85a44a0cffe0 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2821,6 +2821,7 @@ static void bcache_exit(void)
destroy_workqueue(bcache_wq);
if (bch_journal_wq)
destroy_workqueue(bch_journal_wq);
+ bch_btree_exit();
if (bcache_major)
unregister_blkdev(bcache_major, "bcache");
@@ -2876,6 +2877,9 @@ static int __init bcache_init(void)
return bcache_major;
}
+ if (bch_btree_init())
+ goto err;
+
bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0);
if (!bcache_wq)
goto err;
--
2.26.2
This is the start of the stable review cycle for the 4.4.257 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.257-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.257-rc1
Shih-Yuan Lee (FourDollars) <sylee(a)canonical.com>
ALSA: hda/realtek - Fix typo of pincfg for Dell quirk
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: thermal: Do not call acpi_thermal_check() directly
Benjamin Valentin <benpicco(a)googlemail.com>
Input: xpad - sync supported devices with fork on GitHub
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/apic: Add extra serialization for non-serializing MSRs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/build: Disable CET instrumentation in the kernel
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between isolating and freeing page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: footbridge: fix dc21285 PCI configuration accessors
Fengnan Chang <fengnanchang(a)gmail.com>
mmc: core: Limit retries when analyse of SDIO tuples fails
Aurelien Aptel <aaptel(a)suse.com>
cifs: report error instead of invalid when revalidating a dentry fails
Wang ShaoBo <bobo.shaobowang(a)huawei.com>
kretprobe: Avoid re-registration of the same kretprobe earlier
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix station rate table updates on assoc
Heiko Stuebner <heiko.stuebner(a)theobroma-systems.com>
usb: dwc2: Fix endpoint direction check in ep_from_windex
Jeremy Figgins <kernel(a)jeremyfiggins.com>
USB: usblp: don't call usb_set_interface if there's a single alt
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: gadget: legacy: fix an error code in eth_bind()
Arnd Bergmann <arnd(a)arndb.de>
elfcore: fix building with clang
Ralf Baechle <ralf(a)linux-mips.org>
ELF/MIPS build fix
Xie He <xie.he.0141(a)gmail.com>
net: lapb: Copy the skb before sending a packet
Alexey Dobriyan <adobriyan(a)gmail.com>
Input: i8042 - unbreak Pegatron C15B
Christoph Schemmel <christoph.schemmel(a)gmail.com>
USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin <bg4akv(a)hotmail.com>
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran <Pho.Tran(a)silabs.com>
USB: serial: cp210x: add pid/vid for WSDA-200-USB
Sasha Levin <sashal(a)kernel.org>
stable: clamp SUBLEVEL in 4.4 and 4.9
Brian King <brking(a)linux.vnet.ibm.com>
scsi: ibmvfc: Set default timeout to avoid crash during migration
Javed Hasan <jhasan(a)marvell.com>
scsi: libfc: Avoid invoking response handler twice if ep is already completed
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: udc: core: Use lock when write to soft_connect
Lee Jones <lee.jones(a)linaro.org>
futex: Handle faults correctly for PI futexes
Lee Jones <lee.jones(a)linaro.org>
futex: Simplify fixup_pi_state_owner()
Lee Jones <lee.jones(a)linaro.org>
futex: Use pi_state_update_owner() in put_pi_state()
Lee Jones <lee.jones(a)linaro.org>
rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
Lee Jones <lee.jones(a)linaro.org>
futex: Provide and use pi_state_update_owner()
Lee Jones <lee.jones(a)linaro.org>
futex: Replace pointless printk in fixup_owner()
Lee Jones <lee.jones(a)linaro.org>
futex: Avoid violating the 10th rule of futex
Lee Jones <lee.jones(a)linaro.org>
futex: Rework inconsistent rt_mutex/futex_q state
Lee Jones <lee.jones(a)linaro.org>
futex: Remove rt_mutex_deadlock_account_*()
Lee Jones <lee.jones(a)linaro.org>
futex,rt_mutex: Provide futex specific rt_mutex API
Eric Dumazet <edumazet(a)google.com>
net_sched: reject silly cell_log in qdisc_get_rtab()
-------------
Diffstat:
Makefile | 12 +-
arch/arm/mach-footbridge/dc21285.c | 12 +-
arch/mips/Kconfig | 1 +
arch/x86/Makefile | 3 +
arch/x86/include/asm/apic.h | 10 --
arch/x86/include/asm/barrier.h | 18 +++
arch/x86/kernel/apic/apic.c | 4 +
arch/x86/kernel/apic/x2apic_cluster.c | 3 +-
arch/x86/kernel/apic/x2apic_phys.c | 3 +-
drivers/acpi/thermal.c | 54 +++++--
drivers/input/joystick/xpad.c | 17 ++-
drivers/input/serio/i8042-x86ia64io.h | 2 +
drivers/mmc/core/sdio_cis.c | 6 +
drivers/scsi/ibmvscsi/ibmvfc.c | 4 +-
drivers/scsi/libfc/fc_exch.c | 16 +-
drivers/usb/class/usblp.c | 19 ++-
drivers/usb/dwc2/gadget.c | 8 +-
drivers/usb/gadget/legacy/ether.c | 4 +-
drivers/usb/gadget/udc/udc-core.c | 13 +-
drivers/usb/serial/cp210x.c | 2 +
drivers/usb/serial/option.c | 6 +
fs/Kconfig.binfmt | 8 +
fs/cifs/dir.c | 22 ++-
fs/hugetlbfs/inode.c | 3 +-
include/linux/elfcore.h | 22 +++
include/linux/hugetlb.h | 3 +
kernel/Makefile | 3 -
kernel/elfcore.c | 25 ---
kernel/futex.c | 278 +++++++++++++++++++---------------
kernel/kprobes.c | 4 +
kernel/locking/rtmutex-debug.c | 9 --
kernel/locking/rtmutex-debug.h | 3 -
kernel/locking/rtmutex.c | 127 ++++++++++------
kernel/locking/rtmutex.h | 2 -
kernel/locking/rtmutex_common.h | 12 +-
mm/hugetlb.c | 9 +-
net/lapb/lapb_out.c | 3 +-
net/mac80211/driver-ops.c | 5 +-
net/mac80211/rate.c | 3 +-
net/sched/sch_api.c | 3 +-
sound/pci/hda/patch_realtek.c | 2 +-
41 files changed, 469 insertions(+), 294 deletions(-)
This is the start of the stable review cycle for the 4.19.175 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.175-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.175-rc1
DENG Qingfang <dqfext(a)gmail.com>
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Vadim Fedorenko <vfedorenko(a)novek.ru>
net: ip_tunnel: fix mtu calculation
Xiao Ni <xni(a)redhat.com>
md: Set prev_flush_start and flush_bio in an atomic way
Nadav Amit <namit(a)vmware.com>
iommu/vt-d: Do not use flush-queue when caching-mode is on
Benjamin Valentin <benpicco(a)googlemail.com>
Input: xpad - sync supported devices with fork on GitHub
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/apic: Add extra serialization for non-serializing MSRs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/build: Disable CET instrumentation in the kernel
Hugh Dickins <hughd(a)google.com>
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between isolating and freeing page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between freeing and dissolving the page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: footbridge: fix dc21285 PCI configuration accessors
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Treat SVM as unsupported when running as an SEV guest
Thorsten Leemhuis <linux(a)leemhuis.info>
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Fengnan Chang <fengnanchang(a)gmail.com>
mmc: core: Limit retries when analyse of SDIO tuples fails
Gustavo A. R. Silva <gustavoars(a)kernel.org>
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Aurelien Aptel <aaptel(a)suse.com>
cifs: report error instead of invalid when revalidating a dentry fails
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: fix bounce buffer usage for non-sg list case
Marc Zyngier <maz(a)kernel.org>
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Wang ShaoBo <bobo.shaobowang(a)huawei.com>
kretprobe: Avoid re-registration of the same kretprobe earlier
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix station rate table updates on assoc
Liangyan <liangyan.peng(a)linux.alibaba.com>
ovl: fix dentry leak in ovl_get_redirect
Gary Bisson <gary.bisson(a)boundarydevices.com>
usb: dwc3: fix clock issue during resume in OTG mode
Heiko Stuebner <heiko.stuebner(a)theobroma-systems.com>
usb: dwc2: Fix endpoint direction check in ep_from_windex
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
Jeremy Figgins <kernel(a)jeremyfiggins.com>
USB: usblp: don't call usb_set_interface if there's a single alt
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: gadget: legacy: fix an error code in eth_bind()
Roman Gushchin <guro(a)fb.com>
memblock: do not start bottom-up allocations with kernel_end
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: TCAM entry enable should be written after SRAM data
Xie He <xie.he.0141(a)gmail.com>
net: lapb: Copy the skb before sending a packet
Zyta Szpak <zr(a)semihalf.com>
arm64: dts: ls1046a: fix dcfg address range
David Howells <dhowells(a)redhat.com>
rxrpc: Fix deadlock around release of dst cached on udp tunnel
Alexey Dobriyan <adobriyan(a)gmail.com>
Input: i8042 - unbreak Pegatron C15B
Arnd Bergmann <arnd(a)arndb.de>
elfcore: fix building with clang
Christoph Schemmel <christoph.schemmel(a)gmail.com>
USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin <bg4akv(a)hotmail.com>
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran <Pho.Tran(a)silabs.com>
USB: serial: cp210x: add pid/vid for WSDA-200-USB
-------------
Diffstat:
Makefile | 10 ++----
arch/arm/mach-footbridge/dc21285.c | 12 +++----
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 +-
arch/x86/Makefile | 3 ++
arch/x86/include/asm/apic.h | 10 ------
arch/x86/include/asm/barrier.h | 18 ++++++++++
arch/x86/kernel/apic/apic.c | 4 +++
arch/x86/kernel/apic/x2apic_cluster.c | 6 ++--
arch/x86/kernel/apic/x2apic_phys.c | 6 ++--
arch/x86/kvm/svm.c | 5 +++
drivers/input/joystick/xpad.c | 17 ++++++++-
drivers/input/serio/i8042-x86ia64io.h | 2 ++
drivers/iommu/intel-iommu.c | 6 ++++
drivers/md/md.c | 2 ++
drivers/mmc/core/sdio_cis.c | 6 ++++
drivers/net/dsa/mv88e6xxx/chip.c | 6 +++-
drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 10 +++---
drivers/nvme/host/pci.c | 2 ++
drivers/usb/class/usblp.c | 19 +++++-----
drivers/usb/dwc2/gadget.c | 8 +----
drivers/usb/dwc3/core.c | 2 +-
drivers/usb/gadget/legacy/ether.c | 4 ++-
drivers/usb/host/xhci-ring.c | 31 ++++++++++------
drivers/usb/renesas_usbhs/fifo.c | 1 +
drivers/usb/serial/cp210x.c | 2 ++
drivers/usb/serial/option.c | 6 ++++
fs/afs/main.c | 6 ++--
fs/cifs/dir.c | 22 ++++++++++--
fs/cifs/smb2pdu.h | 2 +-
fs/hugetlbfs/inode.c | 3 +-
fs/overlayfs/dir.c | 2 +-
include/linux/elfcore.h | 22 ++++++++++++
include/linux/hugetlb.h | 3 ++
include/linux/msi.h | 6 ++++
kernel/Makefile | 1 -
kernel/elfcore.c | 26 --------------
kernel/irq/msi.c | 44 +++++++++++------------
kernel/kprobes.c | 4 +++
mm/huge_memory.c | 37 +++++++++++--------
mm/hugetlb.c | 48 ++++++++++++++++++++++---
mm/memblock.c | 49 ++++----------------------
net/ipv4/ip_tunnel.c | 16 ++++-----
net/lapb/lapb_out.c | 3 +-
net/mac80211/driver-ops.c | 5 ++-
net/mac80211/rate.c | 3 +-
net/rxrpc/af_rxrpc.c | 6 ++--
46 files changed, 308 insertions(+), 200 deletions(-)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 97c753e62e6c31a404183898d950d8c08d752dbd Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Thu, 28 Jan 2021 00:37:51 +0900
Subject: [PATCH] tracing/kprobe: Fix to support kretprobe events on unloaded
modules
Fix kprobe_on_func_entry() returns error code instead of false so that
register_kretprobe() can return an appropriate error code.
append_trace_kprobe() expects the kprobe registration returns -ENOENT
when the target symbol is not found, and it checks whether the target
module is unloaded or not. If the target module doesn't exist, it
defers to probe the target symbol until the module is loaded.
However, since register_kretprobe() returns -EINVAL instead of -ENOENT
in that case, it always fail on putting the kretprobe event on unloaded
modules. e.g.
Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.
Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
Command: r xfs:xfs_end_io
^
To fix this bug, change kprobe_on_func_entry() to detect symbol lookup
failure and return -ENOENT in that case. Otherwise it returns -EINVAL
or 0 (succeeded, given address is on the entry).
Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@de…
Cc: stable(a)vger.kernel.org
Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by: Jianlin Lv <Jianlin.Lv(a)arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index b3a36b0cfc81..1883a4a9f16a 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -266,7 +266,7 @@ extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
extern bool arch_kprobe_on_func_entry(unsigned long offset);
-extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
+extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
extern bool within_kprobe_blacklist(unsigned long addr);
extern int kprobe_add_ksym_blacklist(unsigned long entry);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f7fb5d135930..1a5bc321e0a5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1954,29 +1954,45 @@ bool __weak arch_kprobe_on_func_entry(unsigned long offset)
return !offset;
}
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+/**
+ * kprobe_on_func_entry() -- check whether given address is function entry
+ * @addr: Target address
+ * @sym: Target symbol name
+ * @offset: The offset from the symbol or the address
+ *
+ * This checks whether the given @addr+@offset or @sym+@offset is on the
+ * function entry address or not.
+ * This returns 0 if it is the function entry, or -EINVAL if it is not.
+ * And also it returns -ENOENT if it fails the symbol or address lookup.
+ * Caller must pass @addr or @sym (either one must be NULL), or this
+ * returns -EINVAL.
+ */
+int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
if (IS_ERR(kp_addr))
- return false;
+ return PTR_ERR(kp_addr);
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
+ return -ENOENT;
- return true;
+ if (!arch_kprobe_on_func_entry(offset))
+ return -EINVAL;
+
+ return 0;
}
int register_kretprobe(struct kretprobe *rp)
{
- int ret = 0;
+ int ret;
struct kretprobe_instance *inst;
int i;
void *addr;
- if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
- return -EINVAL;
+ ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset);
+ if (ret)
+ return ret;
if (kretprobe_blacklist_size) {
addr = kprobe_addr(&rp->kp);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index e6fba1798771..56c7fbff7bd7 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -221,9 +221,9 @@ bool trace_kprobe_on_func_entry(struct trace_event_call *call)
{
struct trace_kprobe *tk = trace_kprobe_primary_from_call(call);
- return tk ? kprobe_on_func_entry(tk->rp.kp.addr,
+ return tk ? (kprobe_on_func_entry(tk->rp.kp.addr,
tk->rp.kp.addr ? NULL : tk->rp.kp.symbol_name,
- tk->rp.kp.addr ? 0 : tk->rp.kp.offset) : false;
+ tk->rp.kp.addr ? 0 : tk->rp.kp.offset) == 0) : false;
}
bool trace_kprobe_error_injectable(struct trace_event_call *call)
@@ -828,9 +828,11 @@ static int trace_kprobe_create(int argc, const char *argv[])
}
if (is_return)
flags |= TPARG_FL_RETURN;
- if (kprobe_on_func_entry(NULL, symbol, offset))
+ ret = kprobe_on_func_entry(NULL, symbol, offset);
+ if (ret == 0)
flags |= TPARG_FL_FENTRY;
- if (offset && is_return && !(flags & TPARG_FL_FENTRY)) {
+ /* Defer the ENOENT case until register kprobe */
+ if (ret == -EINVAL && is_return) {
trace_probe_log_err(0, BAD_RETPROBE);
goto parse_error;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 135b9e8d1cd8ba5ac9ad9bcf24b464b7b052e5b8 Mon Sep 17 00:00:00 2001
From: Sibi Sankar <sibis(a)codeaurora.org>
Date: Thu, 23 Jul 2020 01:40:46 +0530
Subject: [PATCH] remoteproc: qcom_q6v5_mss: Validate modem blob firmware size
before load
The following mem abort is observed when one of the modem blob firmware
size exceeds the allocated mpss region. Fix this by restricting the copy
size to segment size using request_firmware_into_buf before load.
Err Logs:
Unable to handle kernel paging request at virtual address
Mem abort info:
...
Call trace:
__memcpy+0x110/0x180
rproc_start+0xd0/0x190
rproc_boot+0x404/0x550
state_store+0x54/0xf8
dev_attr_store+0x44/0x60
sysfs_kf_write+0x58/0x80
kernfs_fop_write+0x140/0x230
vfs_write+0xc4/0x208
ksys_write+0x74/0xf8
...
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Fixes: 051fb70fd4ea4 ("remoteproc: qcom: Driver for the self-authenticating Hexagon v5")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sibi Sankar <sibis(a)codeaurora.org>
Link: https://lore.kernel.org/r/20200722201047.12975-3-sibis@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c
index 7826f229957d..8199d9f59209 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -1173,15 +1173,14 @@ static int q6v5_mpss_load(struct q6v5 *qproc)
} else if (phdr->p_filesz) {
/* Replace "xxx.xxx" with "xxx.bxx" */
sprintf(fw_name + fw_name_len - 3, "b%02d", i);
- ret = request_firmware(&seg_fw, fw_name, qproc->dev);
+ ret = request_firmware_into_buf(&seg_fw, fw_name, qproc->dev,
+ ptr, phdr->p_filesz);
if (ret) {
dev_err(qproc->dev, "failed to load %s\n", fw_name);
iounmap(ptr);
goto release_firmware;
}
- memcpy(ptr, seg_fw->data, seg_fw->size);
-
release_firmware(seg_fw);
}
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.
Before the patch, a write to memory.high would first put the new limit in
place for the workload, and then reclaim the requested delta. After the
patch, the kernel tries to reclaim the delta before putting the new limit
into place, in order to not overwhelm the workload with a sudden, large
excess over the limit. However, if reclaim is actively racing with new
allocations from the uncurbed workload, it can keep the write() working
inside the kernel indefinitely.
This is causing problems in Facebook production. A privileged
system-level daemon that adjusts memory.high for various workloads running
on a host can get unexpectedly stuck in the kernel and essentially turn
into a sort of involuntary kswapd for one of the workloads. We've
observed that daemon busy-spin in a write() for minutes at a time,
neglecting its other duties on the system, and expending privileged system
resources on behalf of a workload.
To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged to
the new limit or not - and bound the write() call this way. However, the
root cause that inspired the sequence change in the first place has been
fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.
The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is meant
to throttle excessive growth from below. Allocating threads could end up
sleeping long after the write() had already reclaimed the delta for which
they were being punished.
However, erroneous throttling also caused problems in other scenarios at
around the same time. This resulted in commit b3ff92916af3 ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included in
the same release as the offending commit. When allocating threads now
encounter large excess caused by a racing write() to memory.high, instead
of entering punitive sleeps, they will simply be tasked with helping
reclaim down the excess, and will be held no longer than it takes to
accomplish that. This is in line with regular limit enforcement - i.e.
if the workload allocates up against or over an otherwise unchanged limit
from below.
With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.
Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Tejun Heo <tj(a)kernel.org>
Acked-by: Chris Down <chris(a)chrisdown.name>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Koutný <mkoutny(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/mm/memcontrol.c~revert-mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh
+++ a/mm/memcontrol.c
@@ -6271,6 +6271,8 @@ static ssize_t memory_high_write(struct
if (err)
return err;
+ page_counter_set_high(&memcg->memory, high);
+
for (;;) {
unsigned long nr_pages = page_counter_read(&memcg->memory);
unsigned long reclaimed;
@@ -6294,10 +6296,7 @@ static ssize_t memory_high_write(struct
break;
}
- page_counter_set_high(&memcg->memory, high);
-
memcg_wb_domain_size_changed(memcg);
-
return nbytes;
}
_
From: Seth Forshee <seth.forshee(a)canonical.com>
Subject: tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
Currently there is an assumption in tmpfs that 64-bit architectures also
have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t.
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, but passing the "inode64" mount
option will fail. This leads to the following behavior:
# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.
As mount sees "inode64" in the mount options and thus passes it in the
options for the remount.
So prevent CONFIG_TMPFS_INODE64 from being selected on s390.
Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee(a)canonical.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: Chris Down <chris(a)chrisdown.name>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Amir Goldstein <amir73il(a)gmail.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/Kconfig~tmpfs-disallow-config_tmpfs_inode64-on-s390
+++ a/fs/Kconfig
@@ -203,7 +203,7 @@ config TMPFS_XATTR
config TMPFS_INODE64
bool "Use 64-bit ino_t by default in tmpfs"
- depends on TMPFS && 64BIT
+ depends on TMPFS && 64BIT && !S390
default n
help
tmpfs has historically used only inode numbers as wide as an unsigned
_
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: add more sanity checks in xattr id lookup
Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit. This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.
This patch adds a number of additional sanity checks to detect this
corruption and others.
1. It checks for a corrupted xattr index read from the inode. This could
be because the metadata block is uncompressed, or because the
"compression" bit has been corrupted (turning a compressed block
into an uncompressed block). This would cause an out of bounds read.
2. It checks against corruption of the xattr_ids count. This can either
lead to the above kmalloc failure, or a smaller than expected
table to be read.
3. It checks the contents of the index table for corruption.
[phillip(a)squashfs.org.uk: fix checkpatch issue]
Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co…
Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: syzbot+2ccea6339d368360800d(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/xattr_id.c | 66 +++++++++++++++++++++++++++++++++------
1 file changed, 57 insertions(+), 9 deletions(-)
--- a/fs/squashfs/xattr_id.c~squashfs-add-more-sanity-checks-in-xattr-id-lookup
+++ a/fs/squashfs/xattr_id.c
@@ -31,10 +31,15 @@ int squashfs_xattr_lookup(struct super_b
struct squashfs_sb_info *msblk = sb->s_fs_info;
int block = SQUASHFS_XATTR_BLOCK(index);
int offset = SQUASHFS_XATTR_BLOCK_OFFSET(index);
- u64 start_block = le64_to_cpu(msblk->xattr_id_table[block]);
+ u64 start_block;
struct squashfs_xattr_id id;
int err;
+ if (index >= msblk->xattr_ids)
+ return -EINVAL;
+
+ start_block = le64_to_cpu(msblk->xattr_id_table[block]);
+
err = squashfs_read_metadata(sb, &id, &start_block, &offset,
sizeof(id));
if (err < 0)
@@ -50,13 +55,17 @@ int squashfs_xattr_lookup(struct super_b
/*
* Read uncompressed xattr id lookup table indexes from disk into memory
*/
-__le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 start,
+__le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 table_start,
u64 *xattr_table_start, int *xattr_ids)
{
- unsigned int len;
+ struct squashfs_sb_info *msblk = sb->s_fs_info;
+ unsigned int len, indexes;
struct squashfs_xattr_id_table *id_table;
+ __le64 *table;
+ u64 start, end;
+ int n;
- id_table = squashfs_read_table(sb, start, sizeof(*id_table));
+ id_table = squashfs_read_table(sb, table_start, sizeof(*id_table));
if (IS_ERR(id_table))
return (__le64 *) id_table;
@@ -70,13 +79,52 @@ __le64 *squashfs_read_xattr_id_table(str
if (*xattr_ids == 0)
return ERR_PTR(-EINVAL);
- /* xattr_table should be less than start */
- if (*xattr_table_start >= start)
+ len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
+ indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
+
+ /*
+ * The computed size of the index table (len bytes) should exactly
+ * match the table start and end points
+ */
+ start = table_start + sizeof(*id_table);
+ end = msblk->bytes_used;
+
+ if (len != (end - start))
return ERR_PTR(-EINVAL);
- len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
+ table = squashfs_read_table(sb, start, len);
+ if (IS_ERR(table))
+ return table;
+
+ /* table[0], table[1], ... table[indexes - 1] store the locations
+ * of the compressed xattr id blocks. Each entry should be less than
+ * the next (i.e. table[0] < table[1]), and the difference between them
+ * should be SQUASHFS_METADATA_SIZE or less. table[indexes - 1]
+ * should be less than table_start, and again the difference
+ * shouls be SQUASHFS_METADATA_SIZE or less.
+ *
+ * Finally xattr_table_start should be less than table[0].
+ */
+ for (n = 0; n < (indexes - 1); n++) {
+ start = le64_to_cpu(table[n]);
+ end = le64_to_cpu(table[n + 1]);
+
+ if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+ kfree(table);
+ return ERR_PTR(-EINVAL);
+ }
+ }
+
+ start = le64_to_cpu(table[indexes - 1]);
+ if (start >= table_start || (table_start - start) > SQUASHFS_METADATA_SIZE) {
+ kfree(table);
+ return ERR_PTR(-EINVAL);
+ }
- TRACE("In read_xattr_index_table, length %d\n", len);
+ if (*xattr_table_start >= le64_to_cpu(table[0])) {
+ kfree(table);
+ return ERR_PTR(-EINVAL);
+ }
- return squashfs_read_table(sb, start + sizeof(*id_table), len);
+ return table;
}
_
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: add more sanity checks in inode lookup
Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode. This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the following
corruption.
1. It checks against corruption of the inodes count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large inodes count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
[phillip(a)squashfs.org.uk: fix checkpatch issue]
Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co…
Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: syzbot+04419e3ff19d2970ea28(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/export.c | 41 +++++++++++++++++++++++++++++++++--------
1 file changed, 33 insertions(+), 8 deletions(-)
--- a/fs/squashfs/export.c~squashfs-add-more-sanity-checks-in-inode-lookup
+++ a/fs/squashfs/export.c
@@ -41,12 +41,17 @@ static long long squashfs_inode_lookup(s
struct squashfs_sb_info *msblk = sb->s_fs_info;
int blk = SQUASHFS_LOOKUP_BLOCK(ino_num - 1);
int offset = SQUASHFS_LOOKUP_BLOCK_OFFSET(ino_num - 1);
- u64 start = le64_to_cpu(msblk->inode_lookup_table[blk]);
+ u64 start;
__le64 ino;
int err;
TRACE("Entered squashfs_inode_lookup, inode_number = %d\n", ino_num);
+ if (ino_num == 0 || (ino_num - 1) >= msblk->inodes)
+ return -EINVAL;
+
+ start = le64_to_cpu(msblk->inode_lookup_table[blk]);
+
err = squashfs_read_metadata(sb, &ino, &start, &offset, sizeof(ino));
if (err < 0)
return err;
@@ -111,7 +116,10 @@ __le64 *squashfs_read_inode_lookup_table
u64 lookup_table_start, u64 next_table, unsigned int inodes)
{
unsigned int length = SQUASHFS_LOOKUP_BLOCK_BYTES(inodes);
+ unsigned int indexes = SQUASHFS_LOOKUP_BLOCKS(inodes);
+ int n;
__le64 *table;
+ u64 start, end;
TRACE("In read_inode_lookup_table, length %d\n", length);
@@ -121,20 +129,37 @@ __le64 *squashfs_read_inode_lookup_table
if (inodes == 0)
return ERR_PTR(-EINVAL);
- /* length bytes should not extend into the next table - this check
- * also traps instances where lookup_table_start is incorrectly larger
- * than the next table start
+ /*
+ * The computed size of the lookup table (length bytes) should exactly
+ * match the table start and end points
*/
- if (lookup_table_start + length > next_table)
+ if (length != (next_table - lookup_table_start))
return ERR_PTR(-EINVAL);
table = squashfs_read_table(sb, lookup_table_start, length);
+ if (IS_ERR(table))
+ return table;
/*
- * table[0] points to the first inode lookup table metadata block,
- * this should be less than lookup_table_start
+ * table0], table[1], ... table[indexes - 1] store the locations
+ * of the compressed inode lookup blocks. Each entry should be
+ * less than the next (i.e. table[0] < table[1]), and the difference
+ * between them should be SQUASHFS_METADATA_SIZE or less.
+ * table[indexes - 1] should be less than lookup_table_start, and
+ * again the difference should be SQUASHFS_METADATA_SIZE or less
*/
- if (!IS_ERR(table) && le64_to_cpu(table[0]) >= lookup_table_start) {
+ for (n = 0; n < (indexes - 1); n++) {
+ start = le64_to_cpu(table[n]);
+ end = le64_to_cpu(table[n + 1]);
+
+ if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+ kfree(table);
+ return ERR_PTR(-EINVAL);
+ }
+ }
+
+ start = le64_to_cpu(table[indexes - 1]);
+ if (start >= lookup_table_start || (lookup_table_start - start) > SQUASHFS_METADATA_SIZE) {
kfree(table);
return ERR_PTR(-EINVAL);
}
_
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: add more sanity checks in id lookup
Sysbot has reported a number of "slab-out-of-bounds reads" and
"use-after-free read" errors which has been identified as being caused by
a corrupted index value read from the inode. This could be because the
metadata block is uncompressed, or because the "compression" bit has been
corrupted (turning a compressed block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the ids count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large ids count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: syzbot+b06d57ba83f604522af2(a)syzkaller.appspotmail.com
Reported-by: syzbot+c021ba012da41ee9807c(a)syzkaller.appspotmail.com
Reported-by: syzbot+5024636e8b5fd19f0f19(a)syzkaller.appspotmail.com
Reported-by: syzbot+bcbc661df46657d0fa4f(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/id.c | 40 ++++++++++++++++++++++++++-------
fs/squashfs/squashfs_fs_sb.h | 1
fs/squashfs/super.c | 6 ++--
fs/squashfs/xattr.h | 10 +++++++-
4 files changed, 45 insertions(+), 12 deletions(-)
--- a/fs/squashfs/id.c~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/id.c
@@ -35,10 +35,15 @@ int squashfs_get_id(struct super_block *
struct squashfs_sb_info *msblk = sb->s_fs_info;
int block = SQUASHFS_ID_BLOCK(index);
int offset = SQUASHFS_ID_BLOCK_OFFSET(index);
- u64 start_block = le64_to_cpu(msblk->id_table[block]);
+ u64 start_block;
__le32 disk_id;
int err;
+ if (index >= msblk->ids)
+ return -EINVAL;
+
+ start_block = le64_to_cpu(msblk->id_table[block]);
+
err = squashfs_read_metadata(sb, &disk_id, &start_block, &offset,
sizeof(disk_id));
if (err < 0)
@@ -56,7 +61,10 @@ __le64 *squashfs_read_id_index_table(str
u64 id_table_start, u64 next_table, unsigned short no_ids)
{
unsigned int length = SQUASHFS_ID_BLOCK_BYTES(no_ids);
+ unsigned int indexes = SQUASHFS_ID_BLOCKS(no_ids);
+ int n;
__le64 *table;
+ u64 start, end;
TRACE("In read_id_index_table, length %d\n", length);
@@ -67,20 +75,36 @@ __le64 *squashfs_read_id_index_table(str
return ERR_PTR(-EINVAL);
/*
- * length bytes should not extend into the next table - this check
- * also traps instances where id_table_start is incorrectly larger
- * than the next table start
+ * The computed size of the index table (length bytes) should exactly
+ * match the table start and end points
*/
- if (id_table_start + length > next_table)
+ if (length != (next_table - id_table_start))
return ERR_PTR(-EINVAL);
table = squashfs_read_table(sb, id_table_start, length);
+ if (IS_ERR(table))
+ return table;
/*
- * table[0] points to the first id lookup table metadata block, this
- * should be less than id_table_start
+ * table[0], table[1], ... table[indexes - 1] store the locations
+ * of the compressed id blocks. Each entry should be less than
+ * the next (i.e. table[0] < table[1]), and the difference between them
+ * should be SQUASHFS_METADATA_SIZE or less. table[indexes - 1]
+ * should be less than id_table_start, and again the difference
+ * should be SQUASHFS_METADATA_SIZE or less
*/
- if (!IS_ERR(table) && le64_to_cpu(table[0]) >= id_table_start) {
+ for (n = 0; n < (indexes - 1); n++) {
+ start = le64_to_cpu(table[n]);
+ end = le64_to_cpu(table[n + 1]);
+
+ if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+ kfree(table);
+ return ERR_PTR(-EINVAL);
+ }
+ }
+
+ start = le64_to_cpu(table[indexes - 1]);
+ if (start >= id_table_start || (id_table_start - start) > SQUASHFS_METADATA_SIZE) {
kfree(table);
return ERR_PTR(-EINVAL);
}
--- a/fs/squashfs/squashfs_fs_sb.h~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/squashfs_fs_sb.h
@@ -64,5 +64,6 @@ struct squashfs_sb_info {
unsigned int inodes;
unsigned int fragments;
int xattr_ids;
+ unsigned int ids;
};
#endif
--- a/fs/squashfs/super.c~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/super.c
@@ -166,6 +166,7 @@ static int squashfs_fill_super(struct su
msblk->directory_table = le64_to_cpu(sblk->directory_table_start);
msblk->inodes = le32_to_cpu(sblk->inodes);
msblk->fragments = le32_to_cpu(sblk->fragments);
+ msblk->ids = le16_to_cpu(sblk->no_ids);
flags = le16_to_cpu(sblk->flags);
TRACE("Found valid superblock on %pg\n", sb->s_bdev);
@@ -177,7 +178,7 @@ static int squashfs_fill_super(struct su
TRACE("Block size %d\n", msblk->block_size);
TRACE("Number of inodes %d\n", msblk->inodes);
TRACE("Number of fragments %d\n", msblk->fragments);
- TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids));
+ TRACE("Number of ids %d\n", msblk->ids);
TRACE("sblk->inode_table_start %llx\n", msblk->inode_table);
TRACE("sblk->directory_table_start %llx\n", msblk->directory_table);
TRACE("sblk->fragment_table_start %llx\n",
@@ -236,8 +237,7 @@ static int squashfs_fill_super(struct su
allocate_id_index_table:
/* Allocate and read id index table */
msblk->id_table = squashfs_read_id_index_table(sb,
- le64_to_cpu(sblk->id_table_start), next_table,
- le16_to_cpu(sblk->no_ids));
+ le64_to_cpu(sblk->id_table_start), next_table, msblk->ids);
if (IS_ERR(msblk->id_table)) {
errorf(fc, "unable to read id index table");
err = PTR_ERR(msblk->id_table);
--- a/fs/squashfs/xattr.h~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/xattr.h
@@ -17,8 +17,16 @@ extern int squashfs_xattr_lookup(struct
static inline __le64 *squashfs_read_xattr_id_table(struct super_block *sb,
u64 start, u64 *xattr_table_start, int *xattr_ids)
{
+ struct squashfs_xattr_id_table *id_table;
+
+ id_table = squashfs_read_table(sb, start, sizeof(*id_table));
+ if (IS_ERR(id_table))
+ return (__le64 *) id_table;
+
+ *xattr_table_start = le64_to_cpu(id_table->xattr_table_start);
+ kfree(id_table);
+
ERROR("Xattrs in filesystem, these will be ignored\n");
- *xattr_table_start = start;
return ERR_PTR(-ENOTSUPP);
}
_
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: avoid out of bounds writes in decompressors
Patch series "Squashfs: fix BIO migration regression and add sanity checks".
Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block
usage to BIO" patch, which has produced a number of Sysbot/Syzkaller
reports.
Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced Sysbot reports in the id, inode and xattr
lookup code.
Each patch has been tested against the Sysbot reproducers using the given
kernel configuration. They have the appropriate "Reported-by:" lines
added.
Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.
Additional testing with other configurations and architectures (32bit, big
endian), and normal filesystems has also been done to trap any inadvertent
regressions caused by the additional sanity checks.
This patch (of 4):
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".
Sysbot/Syskaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above patch.
Specifically, the patch removed the following sanity check
if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
This check did two things:
1. It ensured any reads were not beyond the end of the filesystem
2. It ensured that the "length" field read from the filesystem
was within the expected maximum length. Without this any
corrupted values can over-run allocated buffers.
Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk
Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: syzbot+6fba78f99b9afd4b5634(a)syzkaller.appspotmail.com
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Cc: Philippe Liard <pliard(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/block.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/squashfs/block.c~squashfs-avoid-out-of-bounds-writes-in-decompressors
+++ a/fs/squashfs/block.c
@@ -196,9 +196,15 @@ int squashfs_read_data(struct super_bloc
length = SQUASHFS_COMPRESSED_SIZE(length);
index += 2;
- TRACE("Block @ 0x%llx, %scompressed size %d\n", index,
+ TRACE("Block @ 0x%llx, %scompressed size %d\n", index - 2,
compressed ? "" : "un", length);
}
+ if (length < 0 || length > output->length ||
+ (index + length) > msblk->bytes_used) {
+ res = -EIO;
+ goto out;
+ }
+
if (next_index)
*next_index = index + length;
_
This is the start of the stable review cycle for the 4.9.257 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.257-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.257-rc1
Shih-Yuan Lee (FourDollars) <sylee(a)canonical.com>
ALSA: hda/realtek - Fix typo of pincfg for Dell quirk
Nadav Amit <namit(a)vmware.com>
iommu/vt-d: Do not use flush-queue when caching-mode is on
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: thermal: Do not call acpi_thermal_check() directly
Benjamin Valentin <benpicco(a)googlemail.com>
Input: xpad - sync supported devices with fork on GitHub
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/apic: Add extra serialization for non-serializing MSRs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/build: Disable CET instrumentation in the kernel
Hugh Dickins <hughd(a)google.com>
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between isolating and freeing page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: footbridge: fix dc21285 PCI configuration accessors
Fengnan Chang <fengnanchang(a)gmail.com>
mmc: core: Limit retries when analyse of SDIO tuples fails
Aurelien Aptel <aaptel(a)suse.com>
cifs: report error instead of invalid when revalidating a dentry fails
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: fix bounce buffer usage for non-sg list case
Wang ShaoBo <bobo.shaobowang(a)huawei.com>
kretprobe: Avoid re-registration of the same kretprobe earlier
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix station rate table updates on assoc
Heiko Stuebner <heiko.stuebner(a)theobroma-systems.com>
usb: dwc2: Fix endpoint direction check in ep_from_windex
Jeremy Figgins <kernel(a)jeremyfiggins.com>
USB: usblp: don't call usb_set_interface if there's a single alt
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: gadget: legacy: fix an error code in eth_bind()
Arnd Bergmann <arnd(a)arndb.de>
elfcore: fix building with clang
Xie He <xie.he.0141(a)gmail.com>
net: lapb: Copy the skb before sending a packet
Alexey Dobriyan <adobriyan(a)gmail.com>
Input: i8042 - unbreak Pegatron C15B
Christoph Schemmel <christoph.schemmel(a)gmail.com>
USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin <bg4akv(a)hotmail.com>
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran <Pho.Tran(a)silabs.com>
USB: serial: cp210x: add pid/vid for WSDA-200-USB
Sasha Levin <sashal(a)kernel.org>
stable: clamp SUBLEVEL in 4.4 and 4.9
Josh Poimboeuf <jpoimboe(a)redhat.com>
objtool: Don't fail on missing symbol table
Brian King <brking(a)linux.vnet.ibm.com>
scsi: ibmvfc: Set default timeout to avoid crash during migration
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix fast-rx encryption check
Javed Hasan <jhasan(a)marvell.com>
scsi: libfc: Avoid invoking response handler twice if ep is already completed
Thomas Gleixner <tglx(a)linutronix.de>
futex: Handle faults correctly for PI futexes
Thomas Gleixner <tglx(a)linutronix.de>
futex: Simplify fixup_pi_state_owner()
Thomas Gleixner <tglx(a)linutronix.de>
futex: Use pi_state_update_owner() in put_pi_state()
Thomas Gleixner <tglx(a)linutronix.de>
rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
Thomas Gleixner <tglx(a)linutronix.de>
futex: Provide and use pi_state_update_owner()
Thomas Gleixner <tglx(a)linutronix.de>
futex: Replace pointless printk in fixup_owner()
Peter Zijlstra <peterz(a)infradead.org>
futex: Avoid violating the 10th rule of futex
Peter Zijlstra <peterz(a)infradead.org>
futex: Rework inconsistent rt_mutex/futex_q state
Peter Zijlstra <peterz(a)infradead.org>
futex: Remove rt_mutex_deadlock_account_*()
Peter Zijlstra <peterz(a)infradead.org>
futex,rt_mutex: Provide futex specific rt_mutex API
Eric Dumazet <edumazet(a)google.com>
net_sched: reject silly cell_log in qdisc_get_rtab()
Lijun Pan <ljp(a)linux.ibm.com>
ibmvnic: Ensure that CRQ entry read are correctly ordered
Pan Bian <bianpan2016(a)163.com>
net: dsa: bcm_sf2: put device node before return
-------------
Diffstat:
Makefile | 12 +-
arch/arm/mach-footbridge/dc21285.c | 12 +-
arch/x86/Makefile | 3 +
arch/x86/include/asm/apic.h | 10 --
arch/x86/include/asm/barrier.h | 18 +++
arch/x86/kernel/apic/apic.c | 4 +
arch/x86/kernel/apic/x2apic_cluster.c | 6 +-
arch/x86/kernel/apic/x2apic_phys.c | 6 +-
drivers/acpi/thermal.c | 55 ++++---
drivers/input/joystick/xpad.c | 17 ++-
drivers/input/serio/i8042-x86ia64io.h | 2 +
drivers/iommu/intel-iommu.c | 6 +
drivers/mmc/core/sdio_cis.c | 6 +
drivers/net/dsa/bcm_sf2.c | 8 +-
drivers/net/ethernet/ibm/ibmvnic.c | 6 +
drivers/scsi/ibmvscsi/ibmvfc.c | 4 +-
drivers/scsi/libfc/fc_exch.c | 16 +-
drivers/usb/class/usblp.c | 19 ++-
drivers/usb/dwc2/gadget.c | 8 +-
drivers/usb/gadget/legacy/ether.c | 4 +-
drivers/usb/host/xhci-ring.c | 31 ++--
drivers/usb/serial/cp210x.c | 2 +
drivers/usb/serial/option.c | 6 +
fs/cifs/dir.c | 22 ++-
fs/hugetlbfs/inode.c | 3 +-
include/linux/elfcore.h | 22 +++
include/linux/hugetlb.h | 3 +
kernel/Makefile | 1 -
kernel/elfcore.c | 25 ---
kernel/futex.c | 276 +++++++++++++++++++---------------
kernel/kprobes.c | 4 +
kernel/locking/rtmutex-debug.c | 9 --
kernel/locking/rtmutex-debug.h | 3 -
kernel/locking/rtmutex.c | 127 ++++++++++------
kernel/locking/rtmutex.h | 2 -
kernel/locking/rtmutex_common.h | 12 +-
mm/huge_memory.c | 37 +++--
mm/hugetlb.c | 9 +-
net/lapb/lapb_out.c | 3 +-
net/mac80211/driver-ops.c | 5 +-
net/mac80211/rate.c | 3 +-
net/mac80211/rx.c | 2 +
net/sched/sch_api.c | 3 +-
sound/pci/hda/patch_realtek.c | 2 +-
tools/objtool/elf.c | 7 +-
45 files changed, 521 insertions(+), 320 deletions(-)
This is the start of the stable review cycle for the 4.14.220 release.
There are 15 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 07 Feb 2021 14:06:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.220-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.220-rc1
Peter Zijlstra <peterz(a)infradead.org>
kthread: Extract KTHREAD_IS_PER_CPU
Josh Poimboeuf <jpoimboe(a)redhat.com>
objtool: Don't fail on missing symbol table
Brian King <brking(a)linux.vnet.ibm.com>
scsi: ibmvfc: Set default timeout to avoid crash during migration
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix fast-rx encryption check
Javed Hasan <jhasan(a)marvell.com>
scsi: libfc: Avoid invoking response handler twice if ep is already completed
Martin Wilck <mwilck(a)suse.com>
scsi: scsi_transport_srp: Don't block target in failfast state
Peter Zijlstra <peterz(a)infradead.org>
x86: __always_inline __{rd,wr}msr()
Tony Lindgren <tony(a)atomide.com>
phy: cpcap-usb: Fix warning for missing regulator_disable
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
driver core: Extend device_is_dependent()
Benjamin Gaignard <benjamin.gaignard(a)linaro.org>
base: core: Remove WARN_ON from link dependencies check
Eric Dumazet <edumazet(a)google.com>
net_sched: gen_estimator: support large ewma log
Eric Dumazet <edumazet(a)google.com>
net_sched: reject silly cell_log in qdisc_get_rtab()
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: thermal: Do not call acpi_thermal_check() directly
Lijun Pan <ljp(a)linux.ibm.com>
ibmvnic: Ensure that CRQ entry read are correctly ordered
Pan Bian <bianpan2016(a)163.com>
net: dsa: bcm_sf2: put device node before return
-------------
Diffstat:
Makefile | 4 +--
arch/x86/include/asm/msr.h | 4 +--
drivers/acpi/thermal.c | 55 +++++++++++++++++++++++++-----------
drivers/base/core.c | 19 +++++++++++--
drivers/net/dsa/bcm_sf2.c | 8 ++++--
drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++
drivers/phy/motorola/phy-cpcap-usb.c | 19 +++++++++----
drivers/scsi/ibmvscsi/ibmvfc.c | 4 ++-
drivers/scsi/libfc/fc_exch.c | 16 +++++++++--
drivers/scsi/scsi_transport_srp.c | 9 +++++-
include/linux/kthread.h | 3 ++
kernel/kthread.c | 27 +++++++++++++++++-
kernel/smpboot.c | 1 +
net/core/gen_estimator.c | 11 +++++---
net/mac80211/rx.c | 2 ++
net/sched/sch_api.c | 3 +-
tools/objtool/elf.c | 7 +++--
17 files changed, 155 insertions(+), 43 deletions(-)
This is the start of the stable review cycle for the 4.14.221 release.
There are 30 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.221-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.221-rc1
DENG Qingfang <dqfext(a)gmail.com>
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Nadav Amit <namit(a)vmware.com>
iommu/vt-d: Do not use flush-queue when caching-mode is on
Benjamin Valentin <benpicco(a)googlemail.com>
Input: xpad - sync supported devices with fork on GitHub
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/apic: Add extra serialization for non-serializing MSRs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/build: Disable CET instrumentation in the kernel
Hugh Dickins <hughd(a)google.com>
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlb: fix a race between isolating and freeing page
Muchun Song <songmuchun(a)bytedance.com>
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: footbridge: fix dc21285 PCI configuration accessors
Thorsten Leemhuis <linux(a)leemhuis.info>
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Fengnan Chang <fengnanchang(a)gmail.com>
mmc: core: Limit retries when analyse of SDIO tuples fails
Gustavo A. R. Silva <gustavoars(a)kernel.org>
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Aurelien Aptel <aaptel(a)suse.com>
cifs: report error instead of invalid when revalidating a dentry fails
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: fix bounce buffer usage for non-sg list case
Wang ShaoBo <bobo.shaobowang(a)huawei.com>
kretprobe: Avoid re-registration of the same kretprobe earlier
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix station rate table updates on assoc
Liangyan <liangyan.peng(a)linux.alibaba.com>
ovl: fix dentry leak in ovl_get_redirect
Heiko Stuebner <heiko.stuebner(a)theobroma-systems.com>
usb: dwc2: Fix endpoint direction check in ep_from_windex
Jeremy Figgins <kernel(a)jeremyfiggins.com>
USB: usblp: don't call usb_set_interface if there's a single alt
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: gadget: legacy: fix an error code in eth_bind()
Wei Wang <weiwan(a)google.com>
ipv4: fix race condition between route lookup and invalidation
Arnd Bergmann <arnd(a)arndb.de>
elfcore: fix building with clang
Josh Poimboeuf <jpoimboe(a)redhat.com>
objtool: Support Clang non-section symbols in ORC generation
Xie He <xie.he.0141(a)gmail.com>
net: lapb: Copy the skb before sending a packet
Zyta Szpak <zr(a)semihalf.com>
arm64: dts: ls1046a: fix dcfg address range
Alexey Dobriyan <adobriyan(a)gmail.com>
Input: i8042 - unbreak Pegatron C15B
Christoph Schemmel <christoph.schemmel(a)gmail.com>
USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin <bg4akv(a)hotmail.com>
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran <Pho.Tran(a)silabs.com>
USB: serial: cp210x: add pid/vid for WSDA-200-USB
-------------
Diffstat:
Makefile | 10 ++-----
arch/arm/mach-footbridge/dc21285.c | 12 ++++----
arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 +-
arch/x86/Makefile | 3 ++
arch/x86/include/asm/apic.h | 10 -------
arch/x86/include/asm/barrier.h | 18 ++++++++++++
arch/x86/kernel/apic/apic.c | 4 +++
arch/x86/kernel/apic/x2apic_cluster.c | 6 ++--
arch/x86/kernel/apic/x2apic_phys.c | 6 ++--
drivers/input/joystick/xpad.c | 17 +++++++++++-
drivers/input/serio/i8042-x86ia64io.h | 2 ++
drivers/iommu/intel-iommu.c | 6 ++++
drivers/mmc/core/sdio_cis.c | 6 ++++
drivers/net/dsa/mv88e6xxx/chip.c | 6 +++-
drivers/nvme/host/pci.c | 2 ++
drivers/usb/class/usblp.c | 19 +++++++------
drivers/usb/dwc2/gadget.c | 8 +-----
drivers/usb/gadget/legacy/ether.c | 4 ++-
drivers/usb/host/xhci-ring.c | 31 +++++++++++++--------
drivers/usb/serial/cp210x.c | 2 ++
drivers/usb/serial/option.c | 6 ++++
fs/cifs/dir.c | 22 +++++++++++++--
fs/cifs/smb2pdu.h | 2 +-
fs/hugetlbfs/inode.c | 3 +-
fs/overlayfs/dir.c | 2 +-
include/linux/elfcore.h | 22 +++++++++++++++
include/linux/hugetlb.h | 3 ++
kernel/Makefile | 1 -
kernel/elfcore.c | 26 ------------------
kernel/kprobes.c | 4 +++
mm/huge_memory.c | 37 +++++++++++++++----------
mm/hugetlb.c | 9 +++---
net/ipv4/route.c | 38 +++++++++++++-------------
net/lapb/lapb_out.c | 3 +-
net/mac80211/driver-ops.c | 5 +++-
net/mac80211/rate.c | 3 +-
tools/objtool/orc_gen.c | 33 +++++++++++++++++-----
37 files changed, 255 insertions(+), 138 deletions(-)
The patch titled
Subject: net: fix iteration for sctp transport seq_files
has been removed from the -mm tree. Its filename was
net-fix-iteration-for-sctp-transport-seq_files.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: NeilBrown <neilb(a)suse.de>
Subject: net: fix iteration for sctp transport seq_files
The sctp transport seq_file iterators take a reference to the transport in
the ->start and ->next functions and releases the reference in the ->show
function. The preferred handling for such resources is to release them in
the subsequent ->next or ->stop function call.
Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak references.
So move the sctp_transport_put() call to ->next and ->stop.
Link: https://lkml.kernel.org/r/161248539022.21478.17038123892954492263.stgit@nob…
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reported-by: Xin Long <lucien.xin(a)gmail.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Vlad Yasevich <vyasevich(a)gmail.com>
Cc: Neil Horman <nhorman(a)tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
net/sctp/proc.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--- a/net/sctp/proc.c~net-fix-iteration-for-sctp-transport-seq_files
+++ a/net/sctp/proc.c
@@ -215,6 +215,12 @@ static void sctp_transport_seq_stop(stru
{
struct sctp_ht_iter *iter = seq->private;
+ if (v && v != SEQ_START_TOKEN) {
+ struct sctp_transport *transport = v;
+
+ sctp_transport_put(transport);
+ }
+
sctp_transport_walk_stop(&iter->hti);
}
@@ -222,6 +228,12 @@ static void *sctp_transport_seq_next(str
{
struct sctp_ht_iter *iter = seq->private;
+ if (v && v != SEQ_START_TOKEN) {
+ struct sctp_transport *transport = v;
+
+ sctp_transport_put(transport);
+ }
+
++*pos;
return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
@@ -277,8 +289,6 @@ static int sctp_assocs_seq_show(struct s
sk->sk_rcvbuf);
seq_printf(seq, "\n");
- sctp_transport_put(transport);
-
return 0;
}
@@ -354,8 +364,6 @@ static int sctp_remaddr_seq_show(struct
seq_printf(seq, "\n");
}
- sctp_transport_put(transport);
-
return 0;
}
_
Patches currently in -mm which might be from neilb(a)suse.de are
seq_file-document-how-per-entry-resources-are-managed.patch
x86-fix-seq_file-iteration-for-pat-memtypec.patch
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 4b049f55ed95cd889bcdb3034fd75e1f01852b38 Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Mon, 8 Feb 2021 13:53:16 -0800
Subject: usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
The dep->interval captures the number of frames/microframes per interval
from bInterval. Fullspeed interrupt endpoint bInterval is the number of
frames per interval and not 2^(bInterval - 1). So fix it here. This
change is only for debugging purpose and should not affect the interrupt
endpoint operation.
Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/1263b563dedc4ab8b0fb854fba06ce4bc56bd495.16128209…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index d0f8d3ec855f..aebcf8ec0716 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -615,8 +615,13 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
if (dwc->gadget->speed == USB_SPEED_FULL)
bInterval_m1 = 0;
+ if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
+ dwc->gadget->speed == USB_SPEED_FULL)
+ dep->interval = desc->bInterval;
+ else
+ dep->interval = 1 << (desc->bInterval - 1);
+
params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(bInterval_m1);
- dep->interval = 1 << (desc->bInterval - 1);
}
return dwc3_send_gadget_ep_cmd(dep, DWC3_DEPCMD_SETEPCONFIG, ¶ms);
--
2.30.0
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From a1679af85b2ae35a2b78ad04c18bb069c37330cc Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Mon, 8 Feb 2021 13:53:10 -0800
Subject: usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it must be set
to 0 when the controller operates in full-speed. See the programming
guide for DEPCFG command section 3.2.2.1 (v3.30a).
Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/3f57026f993c0ce71498dbb06e49b3a47c4d0265.16128209…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 97d707b4f384..d0f8d3ec855f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -605,7 +605,17 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
params.param0 |= DWC3_DEPCFG_FIFO_NUMBER(dep->number >> 1);
if (desc->bInterval) {
- params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(desc->bInterval - 1);
+ u8 bInterval_m1;
+
+ /*
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
+ * must be set to 0 when the controller operates in full-speed.
+ */
+ bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
+ if (dwc->gadget->speed == USB_SPEED_FULL)
+ bInterval_m1 = 0;
+
+ params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(bInterval_m1);
dep->interval = 1 << (desc->bInterval - 1);
}
--
2.30.0
This is a note to let you know that I've just added the patch titled
drivers/misc/vmw_vmci: restrict too big queue size in
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 2fd10bcf0310b9525b2af9e1f7aa9ddd87c3772e Mon Sep 17 00:00:00 2001
From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Date: Tue, 9 Feb 2021 16:26:12 +0600
Subject: drivers/misc/vmw_vmci: restrict too big queue size in
qp_host_alloc_queue
syzbot found WARNING in qp_broker_alloc[1] in qp_host_alloc_queue()
when num_pages is 0x100001, giving queue_size + queue_page_size
bigger than KMALLOC_MAX_SIZE for kzalloc(), resulting order >= MAX_ORDER
condition.
queue_size + queue_page_size=0x8000d8, where KMALLOC_MAX_SIZE=0x400000.
[1]
Call Trace:
alloc_pages include/linux/gfp.h:547 [inline]
kmalloc_order+0x40/0x130 mm/slab_common.c:837
kmalloc_order_trace+0x15/0x70 mm/slab_common.c:853
kmalloc_large include/linux/slab.h:481 [inline]
__kmalloc+0x257/0x330 mm/slub.c:3959
kmalloc include/linux/slab.h:557 [inline]
kzalloc include/linux/slab.h:682 [inline]
qp_host_alloc_queue drivers/misc/vmw_vmci/vmci_queue_pair.c:540 [inline]
qp_broker_create drivers/misc/vmw_vmci/vmci_queue_pair.c:1351 [inline]
qp_broker_alloc+0x936/0x2740 drivers/misc/vmw_vmci/vmci_queue_pair.c:1739
Reported-by: syzbot+15ec7391f3d6a1a7cc7d(a)syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Link: https://lore.kernel.org/r/20210209102612.2112247-1-snovitoll@gmail.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/vmw_vmci/vmci_queue_pair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index d787ddecee77..880c33ab9f47 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -539,6 +539,9 @@ static struct vmci_queue *qp_host_alloc_queue(u64 size)
queue_page_size = num_pages * sizeof(*queue->kernel_if->u.h.page);
+ if (queue_size + queue_page_size > KMALLOC_MAX_SIZE)
+ return NULL;
+
queue = kzalloc(queue_size + queue_page_size, GFP_KERNEL);
if (queue) {
queue->q_header = NULL;
--
2.30.0
This is a note to let you know that I've just added the patch titled
mei: bus: block send with vtag on non-conformat FW
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From b398d53cd421454d64850f8b1f6d609ede9042d9 Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Mon, 8 Feb 2021 17:06:48 +0200
Subject: mei: bus: block send with vtag on non-conformat FW
Block data send with vtag if either transport layer or
FW client are not supporting vtags.
Cc: <stable(a)vger.kernel.org> # v5.10+
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Link: https://lore.kernel.org/r/20210208150649.141358-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/bus.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index 580074e32599..935acc6bbf3c 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -61,6 +61,13 @@ ssize_t __mei_cl_send(struct mei_cl *cl, u8 *buf, size_t length, u8 vtag,
goto out;
}
+ if (vtag) {
+ /* Check if vtag is supported by client */
+ rets = mei_cl_vt_support_check(cl);
+ if (rets)
+ goto out;
+ }
+
if (length > mei_cl_mtu(cl)) {
rets = -EFBIG;
goto out;
--
2.30.0
Hi Christoph, Greg,
Currently we are observing an incorrect address translation
corresponding to DMA direct mapping methods on 5.4 stable kernel while
sharing dmabuf from one device to another where both devices have
their own coherent DMA memory pools.
I am able to root cause this issue which is caused by incorrect virt
to phys translation for addresses belonging to vmalloc space using
virt_to_page(). But while looking at the mainline kernel, this patch
[1] changes address translation from virt->to->phys to dma->to->phys
which fixes the issue observed on 5.4 stable kernel as well (minimal
fix [2]).
So I would like to seek your suggestion for backport to stable kernels
(5.4 or earlier) as to whether we should backport the complete
mainline commit [1] or we should just apply the minimal fix [2]?
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
[2] minimal fix required for 5.4 stable kernel:
commit bb0b3ff6e54d78370b6b0c04426f0d9192f31795
Author: Sumit Garg <sumit.garg(a)linaro.org>
Date: Wed Feb 3 13:08:37 2021 +0530
dma-mapping: Fix common get_sgtable and mmap methods
Currently common get_sgtable and mmap methods can only handle normal
kernel addresses leading to incorrect handling of vmalloc addresses which
is common means for DMA coherent memory mapping.
So instead of cpu_addr, directly decode the physical address from
dma_addr and
hence decode corresponding page and pfn values. In this way we can handle
normal kernel addresses as well as vmalloc addresses.
This fix is inspired from following mainline commit:
34dc0ea6bc96 ("dma-direct: provide mmap and get_sgtable method overrides")
This fixes an issue observed during dmabuf sharing from one device to
another where both devices have their own coherent DMA memory pools.
Signed-off-by: Sumit Garg <sumit.garg(a)linaro.org>
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 8682a53..034bbae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -127,7 +127,7 @@ int dma_common_get_sgtable(struct device *dev,
struct sg_table *sgt,
return -ENXIO;
page = pfn_to_page(pfn);
} else {
- page = virt_to_page(cpu_addr);
+ page = pfn_to_page(PHYS_PFN(dma_to_phys(dev, dma_addr)));
}
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
@@ -214,7 +214,7 @@ int dma_common_mmap(struct device *dev, struct
vm_area_struct *vma,
if (!pfn_valid(pfn))
return -ENXIO;
} else {
- pfn = page_to_pfn(virt_to_page(cpu_addr));
+ pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));
}
return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 0fcc7c20d2e2a65fb5b80d42841084e8509d085d
Gitweb: https://git.kernel.org/tip/0fcc7c20d2e2a65fb5b80d42841084e8509d085d
Author: Mikael Beckius <mikael.beckius(a)windriver.com>
AuthorDate: Thu, 28 Jan 2021 15:02:08 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 09 Feb 2021 16:18:42 +01:00
hrtimer: Update softirq_expires_next correctly in hrtimer_force_reprogram()
hrtimer_force_reprogram() invokes __hrtimer_get_next_event() to find the
earliest expiry time of all hrtimer bases. __hrtimer_get_next_event() does
not update cpu_base::[softirq_]_expires_next. That needs to be done at the
callsites.
hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when
the first expiring timer is a softirq timer and the soft interrupt is not
activated. That's wrong because cpu_base::softirq_expires_next is left
stale when the first expiring timer of all bases is a timer which expires
in hard interrupt context.
That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and
the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting
CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that
timer before the stale cpu_base::softirq_expires_next.
cpu_base::softirq_expires_next is cached to make the check for raising the
soft interrupt fast. In the above case the soft interrupt won't be raised
until clock monotonic reaches the stale cpu_base::softirq_expires_next
value. That's incorrect, but what's worse it that if the softirq timer
becomes the first expiring timer of all clock bases after the hard expiry
timer has been handled the reprogramming of the clockevent from
hrtimer_interrupt() will result in an interrupt storm. That happens because
the reprogramming does not use cpu_base::softirq_expires_next, it uses
__hrtimer_get_next_event() which returns the actual expiry time. Once clock
MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is
raised and the storm subsides.
Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard
bases seperately, update softirq_expires_next and handle the case when a
soft expiring timer is the first of all bases by comparing the expiry times
and updating the required cpu base fields.
[ tglx: Modified it to avoid the double evaluation ]
Fixes:5da70160462e ("hrtimer: Implement support for softirq based hrtimers")
Signed-off-by: Mikael Beckius <mikael.beckius(a)windriver.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210128140208.30264-1-mikael.beckius@windriver.c…
---
kernel/time/hrtimer.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 743c852..88a0145 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -626,24 +626,30 @@ static inline int hrtimer_hres_active(void)
static void
hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
{
- ktime_t expires_next;
+ ktime_t expires_next, soft = KTIME_MAX;
/*
- * Find the current next expiration time.
+ * If the soft interrupt has already been activated, ignore the
+ * soft bases. They will be handled in the already raised soft
+ * interrupt.
*/
- expires_next = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_ALL);
-
- if (cpu_base->next_timer && cpu_base->next_timer->is_soft) {
+ if (!cpu_base->softirq_activated) {
+ soft = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_SOFT);
/*
- * When the softirq is activated, hrtimer has to be
- * programmed with the first hard hrtimer because soft
- * timer interrupt could occur too late.
+ * Update the soft expiry time. clock_settime() might have
+ * affected it.
*/
- if (cpu_base->softirq_activated)
- expires_next = __hrtimer_get_next_event(cpu_base,
- HRTIMER_ACTIVE_HARD);
- else
- cpu_base->softirq_expires_next = expires_next;
+ cpu_base->softirq_expires_next = soft;
+ }
+
+ expires_next = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD);
+ /*
+ * If a softirq timer is expiring first, update cpu_base->next_timer
+ * and program the hardware with the soft expiry time.
+ */
+ if (expires_next > soft) {
+ cpu_base->next_timer = cpu_base->softirq_next_timer;
+ expires_next = soft;
}
if (skip_equal && expires_next == cpu_base->expires_next)
Commit 1e35918ad9d1 ("MIPS: Enable Undefined Behavior Sanitizer
UBSAN") added a possibility to build the entire kernel with UBSAN
instrumentation for MIPS, with the exception for VDSO.
However, self-extracting head wasn't been added to exceptions, so
this occurs:
mips-alpine-linux-musl-ld: arch/mips/boot/compressed/decompress.o:
in function `FSE_buildDTable_wksp':
decompress.c:(.text.FSE_buildDTable_wksp+0x278): undefined reference
to `__ubsan_handle_shift_out_of_bounds'
mips-alpine-linux-musl-ld: decompress.c:(.text.FSE_buildDTable_wksp+0x2a8):
undefined reference to `__ubsan_handle_shift_out_of_bounds'
mips-alpine-linux-musl-ld: decompress.c:(.text.FSE_buildDTable_wksp+0x2c4):
undefined reference to `__ubsan_handle_shift_out_of_bounds'
mips-alpine-linux-musl-ld: arch/mips/boot/compressed/decompress.o:
decompress.c:(.text.FSE_buildDTable_raw+0x9c): more undefined references
to `__ubsan_handle_shift_out_of_bounds' follow
Add UBSAN_SANITIZE := n to mips/boot/compressed/Makefile to exclude
it from instrumentation scope and fix this issue.
Fixes: 1e35918ad9d1 ("MIPS: Enable Undefined Behavior Sanitizer UBSAN")
Cc: stable(a)vger.kernel.org # 5.0+
Signed-off-by: Alexander Lobakin <alobakin(a)pm.me>
---
arch/mips/boot/compressed/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454a..f93f72bcba97 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+UBSAN_SANITIZE := n
# decompressor objects (linked with vmlinuz)
vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
--
2.30.0
This is a note to let you know that I've just added the patch titled
drivers/misc/vmw_vmci: restrict too big queue size in
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 2fd10bcf0310b9525b2af9e1f7aa9ddd87c3772e Mon Sep 17 00:00:00 2001
From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Date: Tue, 9 Feb 2021 16:26:12 +0600
Subject: drivers/misc/vmw_vmci: restrict too big queue size in
qp_host_alloc_queue
syzbot found WARNING in qp_broker_alloc[1] in qp_host_alloc_queue()
when num_pages is 0x100001, giving queue_size + queue_page_size
bigger than KMALLOC_MAX_SIZE for kzalloc(), resulting order >= MAX_ORDER
condition.
queue_size + queue_page_size=0x8000d8, where KMALLOC_MAX_SIZE=0x400000.
[1]
Call Trace:
alloc_pages include/linux/gfp.h:547 [inline]
kmalloc_order+0x40/0x130 mm/slab_common.c:837
kmalloc_order_trace+0x15/0x70 mm/slab_common.c:853
kmalloc_large include/linux/slab.h:481 [inline]
__kmalloc+0x257/0x330 mm/slub.c:3959
kmalloc include/linux/slab.h:557 [inline]
kzalloc include/linux/slab.h:682 [inline]
qp_host_alloc_queue drivers/misc/vmw_vmci/vmci_queue_pair.c:540 [inline]
qp_broker_create drivers/misc/vmw_vmci/vmci_queue_pair.c:1351 [inline]
qp_broker_alloc+0x936/0x2740 drivers/misc/vmw_vmci/vmci_queue_pair.c:1739
Reported-by: syzbot+15ec7391f3d6a1a7cc7d(a)syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Link: https://lore.kernel.org/r/20210209102612.2112247-1-snovitoll@gmail.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/vmw_vmci/vmci_queue_pair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index d787ddecee77..880c33ab9f47 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -539,6 +539,9 @@ static struct vmci_queue *qp_host_alloc_queue(u64 size)
queue_page_size = num_pages * sizeof(*queue->kernel_if->u.h.page);
+ if (queue_size + queue_page_size > KMALLOC_MAX_SIZE)
+ return NULL;
+
queue = kzalloc(queue_size + queue_page_size, GFP_KERNEL);
if (queue) {
queue->q_header = NULL;
--
2.30.0
From: Richard Gong <richard.gong(a)intel.com>
Clean up COMMAND_RECONFIG_FLAG_PARTIAL flag by resetting it to 0, which
aligns with the firmware settings.
Cc: <stable(a)vger.kernel.org> # 5.9+
Fixes: 36847f9e3e56 ("firmware: correct reconfig flag and timeout values")
Signed-off-by: Richard Gong <richard.gong(a)intel.com>
---
v2: add tag Cc: <stable(a)vger.kernel.org> # 5.9+
add 'Fixes: ... ' line in the comment
---
include/linux/firmware/intel/stratix10-svc-client.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h
index a93d859..f843c6a 100644
--- a/include/linux/firmware/intel/stratix10-svc-client.h
+++ b/include/linux/firmware/intel/stratix10-svc-client.h
@@ -56,7 +56,7 @@
* COMMAND_RECONFIG_FLAG_PARTIAL:
* Set to FPGA configuration type (full or partial).
*/
-#define COMMAND_RECONFIG_FLAG_PARTIAL 1
+#define COMMAND_RECONFIG_FLAG_PARTIAL 0
/**
* Timeout settings for service clients:
--
2.7.4
Currently we use bitmap_alloc() to allocate msi bitmap which should be
initialized with zero. This is obviously wrong but it works because msi
can fallback to legacy interrupt mode. So use bitmap_zalloc() instead.
Fixes: 632dcc2c75ef6de3272aa ("irqchip: Add Loongson PCH MSI controller")
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
drivers/irqchip/irq-loongson-pch-msi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-loongson-pch-msi.c b/drivers/irqchip/irq-loongson-pch-msi.c
index 12aeeab43289..32562b7e681b 100644
--- a/drivers/irqchip/irq-loongson-pch-msi.c
+++ b/drivers/irqchip/irq-loongson-pch-msi.c
@@ -225,7 +225,7 @@ static int pch_msi_init(struct device_node *node,
goto err_priv;
}
- priv->msi_map = bitmap_alloc(priv->num_irqs, GFP_KERNEL);
+ priv->msi_map = bitmap_zalloc(priv->num_irqs, GFP_KERNEL);
if (!priv->msi_map) {
ret = -ENOMEM;
goto err_priv;
--
2.27.0
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
We don't have a persistent fb holding a reference to the frontbuffer
object, so every time we do the get+put we throw the frontbuffer object
immediately away. And so the next time around we get a pristine
frontbuffer object with bits==0 even for the old vma. This confuses
the frontbuffer tracking code which understandably expects the old
frontbuffer to have the overlay's bit set.
Fix this by hanging on to the frontbuffer reference until the next
flip. And just to make this a bit more clear let's track the frontbuffer
explicitly instead of just grabbing it via the old vma.
Cc: stable(a)vger.kernel.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1136
Fixes: da42104f589d ("drm/i915: Hold reference to intel_frontbuffer as we track activity")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_overlay.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c
index 9c0113f15b58..ef8f44f5e751 100644
--- a/drivers/gpu/drm/i915/display/intel_overlay.c
+++ b/drivers/gpu/drm/i915/display/intel_overlay.c
@@ -183,6 +183,7 @@ struct intel_overlay {
struct intel_crtc *crtc;
struct i915_vma *vma;
struct i915_vma *old_vma;
+ struct intel_frontbuffer *frontbuffer;
bool active;
bool pfit_active;
u32 pfit_vscale_ratio; /* shifted-point number, (1<<12) == 1.0 */
@@ -283,21 +284,19 @@ static void intel_overlay_flip_prepare(struct intel_overlay *overlay,
struct i915_vma *vma)
{
enum pipe pipe = overlay->crtc->pipe;
- struct intel_frontbuffer *from = NULL, *to = NULL;
+ struct intel_frontbuffer *frontbuffer = NULL;
drm_WARN_ON(&overlay->i915->drm, overlay->old_vma);
- if (overlay->vma)
- from = intel_frontbuffer_get(overlay->vma->obj);
if (vma)
- to = intel_frontbuffer_get(vma->obj);
+ frontbuffer = intel_frontbuffer_get(vma->obj);
- intel_frontbuffer_track(from, to, INTEL_FRONTBUFFER_OVERLAY(pipe));
+ intel_frontbuffer_track(overlay->frontbuffer, frontbuffer,
+ INTEL_FRONTBUFFER_OVERLAY(pipe));
- if (to)
- intel_frontbuffer_put(to);
- if (from)
- intel_frontbuffer_put(from);
+ if (overlay->frontbuffer)
+ intel_frontbuffer_put(overlay->frontbuffer);
+ overlay->frontbuffer = frontbuffer;
intel_frontbuffer_flip_prepare(overlay->i915,
INTEL_FRONTBUFFER_OVERLAY(pipe));
--
2.26.2
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 4b049f55ed95cd889bcdb3034fd75e1f01852b38 Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Mon, 8 Feb 2021 13:53:16 -0800
Subject: usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
The dep->interval captures the number of frames/microframes per interval
from bInterval. Fullspeed interrupt endpoint bInterval is the number of
frames per interval and not 2^(bInterval - 1). So fix it here. This
change is only for debugging purpose and should not affect the interrupt
endpoint operation.
Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/1263b563dedc4ab8b0fb854fba06ce4bc56bd495.16128209…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index d0f8d3ec855f..aebcf8ec0716 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -615,8 +615,13 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
if (dwc->gadget->speed == USB_SPEED_FULL)
bInterval_m1 = 0;
+ if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
+ dwc->gadget->speed == USB_SPEED_FULL)
+ dep->interval = desc->bInterval;
+ else
+ dep->interval = 1 << (desc->bInterval - 1);
+
params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(bInterval_m1);
- dep->interval = 1 << (desc->bInterval - 1);
}
return dwc3_send_gadget_ep_cmd(dep, DWC3_DEPCMD_SETEPCONFIG, ¶ms);
--
2.30.0
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From a1679af85b2ae35a2b78ad04c18bb069c37330cc Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Mon, 8 Feb 2021 13:53:10 -0800
Subject: usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it must be set
to 0 when the controller operates in full-speed. See the programming
guide for DEPCFG command section 3.2.2.1 (v3.30a).
Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/3f57026f993c0ce71498dbb06e49b3a47c4d0265.16128209…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 97d707b4f384..d0f8d3ec855f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -605,7 +605,17 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
params.param0 |= DWC3_DEPCFG_FIFO_NUMBER(dep->number >> 1);
if (desc->bInterval) {
- params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(desc->bInterval - 1);
+ u8 bInterval_m1;
+
+ /*
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
+ * must be set to 0 when the controller operates in full-speed.
+ */
+ bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
+ if (dwc->gadget->speed == USB_SPEED_FULL)
+ bInterval_m1 = 0;
+
+ params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(bInterval_m1);
dep->interval = 1 << (desc->bInterval - 1);
}
--
2.30.0
This is a note to let you know that I've just added the patch titled
mei: bus: block send with vtag on non-conformat FW
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From b398d53cd421454d64850f8b1f6d609ede9042d9 Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Mon, 8 Feb 2021 17:06:48 +0200
Subject: mei: bus: block send with vtag on non-conformat FW
Block data send with vtag if either transport layer or
FW client are not supporting vtags.
Cc: <stable(a)vger.kernel.org> # v5.10+
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Link: https://lore.kernel.org/r/20210208150649.141358-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/bus.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index 580074e32599..935acc6bbf3c 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -61,6 +61,13 @@ ssize_t __mei_cl_send(struct mei_cl *cl, u8 *buf, size_t length, u8 vtag,
goto out;
}
+ if (vtag) {
+ /* Check if vtag is supported by client */
+ rets = mei_cl_vt_support_check(cl);
+ if (rets)
+ goto out;
+ }
+
if (length > mei_cl_mtu(cl)) {
rets = -EFBIG;
goto out;
--
2.30.0
As with s390, alpha is a 64-bit architecture with a 32-bit ino_t.
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode
numbers and display "inode64" in the mount options, whereas
passing "inode64" in the mount options will fail. This leads to
erroneous behaviours such as this:
# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.
Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Seth Forshee <seth.forshee(a)canonical.com>
---
fs/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 3347ec7bd837..da524c4d7b7e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -203,7 +203,7 @@ config TMPFS_XATTR
config TMPFS_INODE64
bool "Use 64-bit ino_t by default in tmpfs"
- depends on TMPFS && 64BIT && !S390
+ depends on TMPFS && 64BIT && !(S390 || ALPHA)
default n
help
tmpfs has historically used only inode numbers as wide as an unsigned
--
2.29.2
The patch titled
Subject: tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
has been added to the -mm tree. Its filename is
tmpfs-disallow-config_tmpfs_inode64-on-alpha.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/tmpfs-disallow-config_tmpfs_inode…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/tmpfs-disallow-config_tmpfs_inode…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Seth Forshee <seth.forshee(a)canonical.com>
Subject: tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With
CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, whereas passing "inode64" in the
mount options will fail. This leads to erroneous behaviours such as this:
# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.
Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.
Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee(a)canonical.com>
Cc: Chris Down <chris(a)chrisdown.name>
Cc: Amir Goldstein <amir73il(a)gmail.com>
Cc: Richard Henderson <rth(a)twiddle.net>
Cc: Ivan Kokshaysky <ink(a)jurassic.park.msu.ru>
Cc: Matt Turner <mattst88(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/Kconfig~tmpfs-disallow-config_tmpfs_inode64-on-alpha
+++ a/fs/Kconfig
@@ -203,7 +203,7 @@ config TMPFS_XATTR
config TMPFS_INODE64
bool "Use 64-bit ino_t by default in tmpfs"
- depends on TMPFS && 64BIT && !S390
+ depends on TMPFS && 64BIT && !(S390 || ALPHA)
default n
help
tmpfs has historically used only inode numbers as wide as an unsigned
_
Patches currently in -mm which might be from seth.forshee(a)canonical.com are
tmpfs-disallow-config_tmpfs_inode64-on-s390.patch
tmpfs-disallow-config_tmpfs_inode64-on-alpha.patch
Currently, when handling the SPMI summary interrupt, the hw_irq
number is calculated based on SID, Peripheral ID, IRQ index and
APID. This is then passed to irq_find_mapping() to see if a
mapping exists for this hw_irq and if available, invoke the
interrupt handler. Since the IRQ index uses an "int" type, hw_irq
which is of unsigned long data type can take a large value when
SID has its MSB set to 1 and the type conversion happens. Because
of this, irq_find_mapping() returns 0 as there is no mapping
for this hw_irq. This ends up invoking cleanup_irq() as if
the interrupt is spurious whereas it is actually a valid
interrupt. Fix this by using the proper data type (u32) for id.
Cc: stable(a)vger.kernel.org
Signed-off-by: Subbaraman Narayanamurthy <subbaram(a)codeaurora.org>
---
drivers/spmi/spmi-pmic-arb.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
index de844b4..bbbd311 100644
--- a/drivers/spmi/spmi-pmic-arb.c
+++ b/drivers/spmi/spmi-pmic-arb.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights reserved.
*/
#include <linux/bitmap.h>
#include <linux/delay.h>
@@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 apid, int id)
static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid)
{
unsigned int irq;
- u32 status;
- int id;
+ u32 status, id;
u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF;
u8 per = pmic_arb->apid_data[apid].ppid & 0xFF;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
The dwc3 driver did not account for operating in fullspeed when setting
DEPCFG.bInterval_m1. This series fixes it.
Note that for some bInterval, some IP versions may not exhibit invalid behavior
from the invalid DEPCFG.bInterval_m1 setting, which may mask this issue.
Thinh Nguyen (2):
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
drivers/usb/dwc3/gadget.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
base-commit: d8c849037d9398abe6a5f5d065eafc777eb3bdaf
--
2.28.0
The patch titled
Subject: nilfs2: make splice write available again
has been added to the -mm tree. Its filename is
nilfs2-make-splice-write-available-again.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/nilfs2-make-splice-write-availabl…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/nilfs2-make-splice-write-availabl…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Joachim Henke <joachim.henke(a)t-systems.com>
Subject: nilfs2: make splice write available again
Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was
caused by commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops").
This patch initializes the splice_write field in file_operations, like
most file systems do, to restore the functionality.
Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke…
Signed-off-by: Joachim Henke <joachim.henke(a)t-systems.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/file.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/nilfs2/file.c~nilfs2-make-splice-write-available-again
+++ a/fs/nilfs2/file.c
@@ -141,6 +141,7 @@ const struct file_operations nilfs_file_
/* .release = nilfs_release_file, */
.fsync = nilfs_sync_file,
.splice_read = generic_file_splice_read,
+ .splice_write = iter_file_splice_write,
};
const struct inode_operations nilfs_file_inode_operations = {
_
Patches currently in -mm which might be from joachim.henke(a)t-systems.com are
nilfs2-make-splice-write-available-again.patch
The patch titled
Subject: mm, slub: better heuristic for number of cpus when calculating slab order
has been added to the -mm tree. Its filename is
mm-slub-better-heuristic-for-number-of-cpus-when-calculating-slab-order.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-slub-better-heuristic-for-numb…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-better-heuristic-for-numb…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, slub: better heuristic for number of cpus when calculating slab order
When creating a new kmem cache, SLUB determines how large the slab pages will
based on number of inputs, including the number of CPUs in the system. Larger
slab pages mean that more objects can be allocated/free from per-cpu slabs
before accessing shared structures, but also potentially more memory can be
wasted due to low slab usage and fragmentation.
The rough idea of using number of CPUs is that larger systems will be more
likely to benefit from reduced contention, and also should have enough memory
to spare.
Number of CPUs used to be determined as nr_cpu_ids, which is number of possible
cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b
("mm/slub: let number of online CPUs determine the slub page order") changed it
to nr_online_cpus(). However, for kmem caches created early before CPUs are
onlined, this may lead to permamently low slab page sizes.
Vincent reports a regression [1] of hackbench on arm64 systems:
> I'm facing significant performances regression on a large arm64 server
> system (224 CPUs). Regressions is also present on small arm64 system
> (8 CPUs) but in a far smaller order of magnitude
> On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
> v5.11-rc4 : 9.135sec (+/- 0.45%)
> v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> v5.10: 3.136sec (+/- 0.40%)
Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:
> i.e. the patch incurs a 7% to 32% performance penalty. This bisected
> cleanly yesterday when I was looking for the regression and then found
> the thread.
> Numerous caches change size. For example, kmalloc-512 goes from order-0
> (vanilla) to order-2 with the revert.
> So mostly this is down to the number of times SLUB calls into the page
> allocator which only caches order-0 pages on a per-cpu basis.
Clearly num_online_cpus() doesn't work too early in bootup. We could change
the order dynamically in a memory hotplug callback, but runtime order changing
for existing kmem caches has been already shown as dangerous, and removed in
32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be
resurrected in a safe manner with some effort, but to fix the regression we
need something simpler.
We could use num_present_cpus() that should be the number of physically
present CPUs even before they are onlined. That would work for PowerPC
[3], which triggered the original commit, but that still doesn't work on
arm64 [4] as explained in [5].
So this patch tries to determine the best available value without specific
arch knowledge.
- num_present_cpus() if the number is larger than 1, as that means the
arch is likely setting it properly
- nr_cpu_ids otherwise
This should fix the reported regressions while also keeping the effect of
045ab8c9487b for PowerPC systems. It's possible there are configurations
where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same
time bloated, so these (if they exist) would keep the large orders based
on nr_cpu_ids as was before 045ab8c9487b.
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj…
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03Ys…
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Reported-by: Mel Gorman <mgorman(a)techsingularity.net>
Tested-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Bharata B Rao <bharata(a)linux.ibm.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
--- a/mm/slub.c~mm-slub-better-heuristic-for-number-of-cpus-when-calculating-slab-order
+++ a/mm/slub.c
@@ -3423,6 +3423,7 @@ static inline int calculate_order(unsign
unsigned int order;
unsigned int min_objects;
unsigned int max_objects;
+ unsigned int nr_cpus;
/*
* Attempt to find best configuration for a slab. This
@@ -3433,8 +3434,21 @@ static inline int calculate_order(unsign
* we reduce the minimum objects required in a slab.
*/
min_objects = slub_min_objects;
- if (!min_objects)
- min_objects = 4 * (fls(num_online_cpus()) + 1);
+ if (!min_objects) {
+ /*
+ * Some architectures will only update present cpus when
+ * onlining them, so don't trust the number if it's just 1. But
+ * we also don't want to use nr_cpu_ids always, as on some other
+ * architectures, there can be many possible cpus, but never
+ * onlined. Here we compromise between trying to avoid too high
+ * order on systems that appear larger than they are, and too
+ * low order on systems that appear smaller than they are.
+ */
+ nr_cpus = num_present_cpus();
+ if (nr_cpus <= 1)
+ nr_cpus = nr_cpu_ids;
+ min_objects = 4 * (fls(nr_cpus) + 1);
+ }
max_objects = order_objects(slub_max_order, size);
min_objects = min(min_objects, max_objects);
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-slub-better-heuristic-for-number-of-cpus-when-calculating-slab-order.patch
mm-slub-stop-freeing-kmem_cache_node-structures-on-node-offline.patch
mm-slab-slub-stop-taking-memory-hotplug-lock.patch
mm-slab-slub-stop-taking-cpu-hotplug-lock.patch
mm-slub-splice-cpu-and-page-freelists-in-deactivate_slab.patch
mm-slub-remove-slub_memcg_sysfs-boot-param-and-config_slub_memcg_sysfs_on.patch
From: Marc Zyngier <maz(a)kernel.org>
[ Upstream commit 43f20b1c6140896916f4e91aacc166830a7ba849 ]
It recently became apparent that the lack of a 'device_type = "pci"'
in the PCIe root complex node for rk3399 is a violation of the PCI
binding, as documented in IEEE Std 1275-1994. Changes to the kernel's
parsing of the DT made such violation fatal, as drivers cannot
probe the controller anymore.
Add the missing property makes the PCIe node compliant. While we
are at it, drop the pointless linux,pci-domain property, which only
makes sense when there are multiple host bridges.
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20200815125112.462652-3-maz@kernel.org
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3399.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 82747048381fa..721f4b6b262f1 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -231,6 +231,7 @@ pcie0: pcie@f8000000 {
reg = <0x0 0xf8000000 0x0 0x2000000>,
<0x0 0xfd000000 0x0 0x1000000>;
reg-names = "axi-base", "apb-base";
+ device_type = "pci";
#address-cells = <3>;
#size-cells = <2>;
#interrupt-cells = <1>;
@@ -249,7 +250,6 @@ pcie0: pcie@f8000000 {
<0 0 0 2 &pcie0_intc 1>,
<0 0 0 3 &pcie0_intc 2>,
<0 0 0 4 &pcie0_intc 3>;
- linux,pci-domain = <0>;
max-link-speed = <1>;
msi-map = <0x0 &its 0x0 0x1000>;
phys = <&pcie_phy 0>, <&pcie_phy 1>,
--
2.27.0
From: Marc Zyngier <maz(a)kernel.org>
[ Upstream commit 43f20b1c6140896916f4e91aacc166830a7ba849 ]
It recently became apparent that the lack of a 'device_type = "pci"'
in the PCIe root complex node for rk3399 is a violation of the PCI
binding, as documented in IEEE Std 1275-1994. Changes to the kernel's
parsing of the DT made such violation fatal, as drivers cannot
probe the controller anymore.
Add the missing property makes the PCIe node compliant. While we
are at it, drop the pointless linux,pci-domain property, which only
makes sense when there are multiple host bridges.
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20200815125112.462652-3-maz@kernel.org
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3399.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index f4ee7c4f83b8b..b1c1a88a1c20c 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -198,6 +198,7 @@ pcie0: pcie@f8000000 {
reg = <0x0 0xf8000000 0x0 0x2000000>,
<0x0 0xfd000000 0x0 0x1000000>;
reg-names = "axi-base", "apb-base";
+ device_type = "pci";
#address-cells = <3>;
#size-cells = <2>;
#interrupt-cells = <1>;
@@ -216,7 +217,6 @@ pcie0: pcie@f8000000 {
<0 0 0 2 &pcie0_intc 1>,
<0 0 0 3 &pcie0_intc 2>,
<0 0 0 4 &pcie0_intc 3>;
- linux,pci-domain = <0>;
max-link-speed = <1>;
msi-map = <0x0 &its 0x0 0x1000>;
phys = <&pcie_phy 0>, <&pcie_phy 1>,
--
2.27.0
From: Marc Zyngier <maz(a)kernel.org>
[ Upstream commit 43f20b1c6140896916f4e91aacc166830a7ba849 ]
It recently became apparent that the lack of a 'device_type = "pci"'
in the PCIe root complex node for rk3399 is a violation of the PCI
binding, as documented in IEEE Std 1275-1994. Changes to the kernel's
parsing of the DT made such violation fatal, as drivers cannot
probe the controller anymore.
Add the missing property makes the PCIe node compliant. While we
are at it, drop the pointless linux,pci-domain property, which only
makes sense when there are multiple host bridges.
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20200815125112.462652-3-maz@kernel.org
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3399.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index bb7d0aac6b9db..9d6ed8cda2c86 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -232,6 +232,7 @@ pcie0: pcie@f8000000 {
reg = <0x0 0xf8000000 0x0 0x2000000>,
<0x0 0xfd000000 0x0 0x1000000>;
reg-names = "axi-base", "apb-base";
+ device_type = "pci";
#address-cells = <3>;
#size-cells = <2>;
#interrupt-cells = <1>;
@@ -250,7 +251,6 @@ pcie0: pcie@f8000000 {
<0 0 0 2 &pcie0_intc 1>,
<0 0 0 3 &pcie0_intc 2>,
<0 0 0 4 &pcie0_intc 3>;
- linux,pci-domain = <0>;
max-link-speed = <1>;
msi-map = <0x0 &its 0x0 0x1000>;
phys = <&pcie_phy 0>, <&pcie_phy 1>,
--
2.27.0
The patch titled
Subject: mm: hugetlb: fix missing put_page in gather_surplus_pages()
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-missing-put_page-in-gather_surplus_pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix missing put_page in gather_surplus_pages()
The VM_BUG_ON_PAGE avoids the generation of any code, even if that
expression has side-effects when !CONFIG_DEBUG_VM.
Link: https://lkml.kernel.org/r/20210126031009.96266-1-songmuchun@bytedance.com
Fixes: e5dfacebe4a4 ("mm/hugetlb.c: just use put_page_testzero() instead of page_count()")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-missing-put_page-in-gather_surplus_pages
+++ a/mm/hugetlb.c
@@ -2047,13 +2047,16 @@ retry:
/* Free the needed pages to the hugetlb pool */
list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
+ int zeroed;
+
if ((--needed) < 0)
break;
/*
* This page is now managed by the hugetlb allocator and has
* no users -- drop the buddy allocator's reference.
*/
- VM_BUG_ON_PAGE(!put_page_testzero(page), page);
+ zeroed = put_page_testzero(page);
+ VM_BUG_ON_PAGE(!zeroed, page);
enqueue_huge_page(h, page);
}
free:
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-memcontrol-replace-the-loop-with-a-list_for_each_entry.patch
hugetlb-convert-page_huge_active-hpagemigratable-flag-fix.patch
The patch titled
Subject: memblock: do not start bottom-up allocations with kernel_end
has been removed from the -mm tree. Its filename was
memblock-do-not-start-bottom-up-allocations-with-kernel_end.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: memblock: do not start bottom-up allocations with kernel_end
With kaslr the kernel image is placed at a random place, so starting the
bottom-up allocation with the kernel_end can result in an allocation
failure and a warning like this one:
[ 0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[ 0.002921] ------------[ cut here ]------------
[ 0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
[ 0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
[ 0.002937] Modules linked in:
[ 0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
[ 0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
[ 0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
[ 0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
[ 0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
[ 0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
[ 0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
[ 0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
[ 0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
[ 0.002952] FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
[ 0.002953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
[ 0.002956] Call Trace:
[ 0.002961] ? memblock_alloc_range_nid+0x8d/0x11e
[ 0.002963] ? cma_declare_contiguous_nid+0x2c4/0x38c
[ 0.002964] ? hugetlb_cma_reserve+0xdc/0x128
[ 0.002968] ? flush_tlb_one_kernel+0xc/0x20
[ 0.002969] ? native_set_fixmap+0x82/0xd0
[ 0.002971] ? flat_get_apic_id+0x5/0x10
[ 0.002973] ? register_lapic_address+0x8e/0x97
[ 0.002975] ? setup_arch+0x8a5/0xc3f
[ 0.002978] ? start_kernel+0x66/0x547
[ 0.002980] ? load_ucode_bsp+0x4c/0xcd
[ 0.002982] ? secondary_startup_64_no_verify+0xb0/0xbb
[ 0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
[ 0.002988] ---[ end trace f151227d0b39be70 ]---
At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chances to success as a top-down allocation, so
there is no reason to fallback in the case of a failure. All together it
simplifies the logic.
Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memblock.c | 49 +++++-------------------------------------------
1 file changed, 6 insertions(+), 43 deletions(-)
--- a/mm/memblock.c~memblock-do-not-start-bottom-up-allocations-with-kernel_end
+++ a/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr
*
* Find @size free area aligned to @align in the specified range and node.
*
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
* Return:
* Found address on success, 0 on failure.
*/
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock membl
phys_addr_t end, int nid,
enum memblock_flags flags)
{
- phys_addr_t kernel_end, ret;
-
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock membl
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
- kernel_end = __pa_symbol(_end);
-
- /*
- * try bottom-up allocation only when bottom-up mode
- * is set and @end is above the kernel image.
- */
- if (memblock_bottom_up() && end > kernel_end) {
- phys_addr_t bottom_up_start;
-
- /* make sure we will allocate above the kernel */
- bottom_up_start = max(start, kernel_end);
-
- /* ok, try bottom-up allocation first */
- ret = __memblock_find_range_bottom_up(bottom_up_start, end,
- size, align, nid, flags);
- if (ret)
- return ret;
-
- /*
- * we always limit bottom-up allocation above the kernel,
- * but top-down allocation doesn't have the limit, so
- * retrying top-down allocation may succeed when bottom-up
- * allocation failed.
- *
- * bottom-up allocation is expected to be fail very rarely,
- * so we use WARN_ONCE() here to see the stack trace if
- * fail happens.
- */
- WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
- "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
- }
- return __memblock_find_range_top_down(start, end, size, align, nid,
- flags);
+ if (memblock_bottom_up())
+ return __memblock_find_range_bottom_up(start, end, size, align,
+ nid, flags);
+ else
+ return __memblock_find_range_top_down(start, end, size, align,
+ nid, flags);
}
/**
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch
mm-kmem-make-__memcg_kmem_uncharge-static.patch
mm-cma-allocate-cma-areas-bottom-up.patch
mm-cma-allocate-cma-areas-bottom-up-fix.patch
mm-cma-allocate-cma-areas-bottom-up-fix-2.patch
mm-cma-allocate-cma-areas-bottom-up-fix-3.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
The patch titled
Subject: mm: thp: fix MADV_REMOVE deadlock on shmem THP
has been removed from the -mm tree. Its filename was
mm-thp-fix-madv_remove-deadlock-on-shmem-thp.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: thp: fix MADV_REMOVE deadlock on shmem THP
Sergey reported deadlock between kswapd correctly doing its usual
lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and
madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).
This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock
could occur when partially truncating a mapped huge tmpfs file, or using
fallocate(FALLOC_FL_PUNCH_HOLE) on it.
__split_huge_pmd()'s page lock was added in 5.8, to make sure that any
concurrent use of reuse_swap_page() (holding page lock) could not catch
the anon THP's mapcounts and swapcounts while they were being split.
Fortunately, reuse_swap_page() is never applied to a shmem or file THP
(not even by khugepaged, which checks PageSwapCache before calling), and
anonymous THPs are never created in shmem or file areas: so that
__split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
on which there is no risk of deadlock with i_mmap_rwsem.
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils
Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 37 +++++++++++++++++++++++--------------
1 file changed, 23 insertions(+), 14 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-madv_remove-deadlock-on-shmem-thp
+++ a/mm/huge_memory.c
@@ -2202,7 +2202,7 @@ void __split_huge_pmd(struct vm_area_str
{
spinlock_t *ptl;
struct mmu_notifier_range range;
- bool was_locked = false;
+ bool do_unlock_page = false;
pmd_t _pmd;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
@@ -2218,7 +2218,6 @@ void __split_huge_pmd(struct vm_area_str
VM_BUG_ON(freeze && !page);
if (page) {
VM_WARN_ON_ONCE(!PageLocked(page));
- was_locked = true;
if (page != pmd_page(*pmd))
goto out;
}
@@ -2227,19 +2226,29 @@ repeat:
if (pmd_trans_huge(*pmd)) {
if (!page) {
page = pmd_page(*pmd);
- if (unlikely(!trylock_page(page))) {
- get_page(page);
- _pmd = *pmd;
- spin_unlock(ptl);
- lock_page(page);
- spin_lock(ptl);
- if (unlikely(!pmd_same(*pmd, _pmd))) {
- unlock_page(page);
+ /*
+ * An anonymous page must be locked, to ensure that a
+ * concurrent reuse_swap_page() sees stable mapcount;
+ * but reuse_swap_page() is not used on shmem or file,
+ * and page lock must not be taken when zap_pmd_range()
+ * calls __split_huge_pmd() while i_mmap_lock is held.
+ */
+ if (PageAnon(page)) {
+ if (unlikely(!trylock_page(page))) {
+ get_page(page);
+ _pmd = *pmd;
+ spin_unlock(ptl);
+ lock_page(page);
+ spin_lock(ptl);
+ if (unlikely(!pmd_same(*pmd, _pmd))) {
+ unlock_page(page);
+ put_page(page);
+ page = NULL;
+ goto repeat;
+ }
put_page(page);
- page = NULL;
- goto repeat;
}
- put_page(page);
+ do_unlock_page = true;
}
}
if (PageMlocked(page))
@@ -2249,7 +2258,7 @@ repeat:
__split_huge_pmd_locked(vma, pmd, range.start, freeze);
out:
spin_unlock(ptl);
- if (!was_locked && page)
+ if (do_unlock_page)
unlock_page(page);
/*
* No need to double call mmu_notifier->invalidate_range() callback.
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: mm/vmalloc: separate put pages and flush VM flags
has been removed from the -mm tree. Its filename was
mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Subject: mm/vmalloc: separate put pages and flush VM flags
When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big
functional problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.
Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things
so flags are less likely to be missed in the future.
Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Suggested-by: Matthew Wilcox <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Daniel Axtens <dja(a)axtens.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/vmalloc.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
--- a/include/linux/vmalloc.h~mm-vmalloc-separate-put-pages-and-flush-vm-flags
+++ a/include/linux/vmalloc.h
@@ -24,7 +24,8 @@ struct notifier_block; /* in notifier.h
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
-#define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */
+#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
+#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
/*
* VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
@@ -37,12 +38,6 @@ struct notifier_block; /* in notifier.h
* determine which allocations need the module shadow freed.
*/
-/*
- * Memory with VM_FLUSH_RESET_PERMS cannot be freed in an interrupt or with
- * vfree_atomic().
- */
-#define VM_FLUSH_RESET_PERMS 0x00000100 /* Reset direct map and flush TLB on unmap */
-
/* bits [20..32] reserved for arch specific ioremap internals */
/*
_
Patches currently in -mm which might be from rick.p.edgecombe(a)intel.com are
I'm announcing the release of the 4.4.256 kernel.
This, and the 4.9.256 release are a little bit "different" than normal.
This contains only 1 patch, just the version bump from .255 to .256 which ends
up causing the userspace-visable LINUX_VERSION_CODE to behave a bit differently
than normal due to the "overflow".
With this release, KERNEL_VERSION(4, 4, 256) is the same as KERNEL_VERSION(4, 5, 0).
Nothing in the kernel build itself breaks with this change, but given that this
is a userspace visible change, and some crazy tools (like glibc and gcc) have
logic that checks the kernel version for different reasons, I wanted to do this
release as an "empty" release to ensure that everything still works properly.
So, this is a YOU MUST UPGRADE requirement of a release. If you rely on the
4.4.y kernel, please throw this release into your test builds and rebuild the
world and let us know if anything breaks, or if all is well.
Go forth and do full system rebuilds! Yocto and Gentoo are great for this, as
will systems that use buildroot.
I'll try to hold off on doing a "real" 4.4.y release for a week to give
everyone a chance to test this out and get back to me. The pending patches in
the 4.4.y queue are pretty serious, so I am loath to wait longer than that,
consider yourself warned...
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Greg Kroah-Hartman (1):
Linux 4.4.256
The patch titled
Subject: mm, compaction: move high_pfn to the for loop scope
has been removed from the -mm tree. Its filename was
mm-compaction-move-high_pfn-to-the-for-loop-scope.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Rokudo Yan <wu-yan(a)tcl.com>
Subject: mm, compaction: move high_pfn to the for loop scope
In fast_isolate_freepages, high_pfn will be used if a prefered one(PFN >=
low_fn) not found. But the high_pfn is not reset before searching an free
area, so when it was used as freepage, it may from another free area
searched before. And move_freelist_head(freelist, freepage) will have
unexpected behavior(eg. corrupt the MOVABLE freelist)
Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
ESR = 0x96000044
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000044
CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges
-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
---|end of frame
The issue was reported on an smart phone product with 6GB ram and 3GB zram
as swap device.
This patch fixes the issue by reset high_pfn before searching each free
area, which ensure freepage and freelist match when call
move_freelist_head in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Rokudo Yan <wu-yan(a)tcl.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/compaction.c~mm-compaction-move-high_pfn-to-the-for-loop-scope
+++ a/mm/compaction.c
@@ -1342,7 +1342,7 @@ fast_isolate_freepages(struct compact_co
{
unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
unsigned int nr_scanned = 0;
- unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+ unsigned long low_pfn, min_pfn, highest = 0;
unsigned long nr_isolated = 0;
unsigned long distance;
struct page *page = NULL;
@@ -1387,6 +1387,7 @@ fast_isolate_freepages(struct compact_co
struct page *freepage;
unsigned long flags;
unsigned int order_scanned = 0;
+ unsigned long high_pfn = 0;
if (!area->nr_free)
continue;
_
Patches currently in -mm which might be from wu-yan(a)tcl.com are
zsmalloc-account-the-number-of-compacted-pages-correctly.patch
The patch titled
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
has been removed from the -mm tree. Its filename was
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
The page_huge_active() can be called from scan_movable_pages() which do
not hold a reference count to the HugeTLB page. So when we call
page_huge_active() from scan_movable_pages(), the HugeTLB page can be
freed parallel. Then we will trigger a BUG_ON which is in the
page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
VM_BUG_ON_PAGE.
Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com
Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active
+++ a/mm/hugetlb.c
@@ -1361,8 +1361,7 @@ struct hstate *size_to_hstate(unsigned l
*/
bool page_huge_active(struct page *page)
{
- VM_BUG_ON_PAGE(!PageHuge(page), page);
- return PageHead(page) && PagePrivate(&page[1]);
+ return PageHeadHuge(page) && PagePrivate(&page[1]);
}
/* never called for tail page */
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-memcontrol-replace-the-loop-with-a-list_for_each_entry.patch
hugetlb-convert-page_huge_active-hpagemigratable-flag-fix.patch
The patch titled
Subject: mm: hugetlb: fix a race between isolating and freeing page
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix a race between isolating and freeing page
There is a race between isolate_huge_page() and __free_huge_page().
CPU0: CPU1:
if (PageHuge(page))
put_page(page)
__free_huge_page(page)
spin_lock(&hugetlb_lock)
update_and_free_page(page)
set_compound_page_dtor(page,
NULL_COMPOUND_DTOR)
spin_unlock(&hugetlb_lock)
isolate_huge_page(page)
// trigger BUG_ON
VM_BUG_ON_PAGE(!PageHead(page), page)
spin_lock(&hugetlb_lock)
page_huge_active(page)
// trigger BUG_ON
VM_BUG_ON_PAGE(!PageHuge(page), page)
spin_unlock(&hugetlb_lock)
When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the
buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0. Because
it is already freed to the buddy allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-isolating-and-freeing-page
+++ a/mm/hugetlb.c
@@ -5594,9 +5594,9 @@ bool isolate_huge_page(struct page *page
{
bool ret = true;
- VM_BUG_ON_PAGE(!PageHead(page), page);
spin_lock(&hugetlb_lock);
- if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+ if (!PageHeadHuge(page) || !page_huge_active(page) ||
+ !get_page_unless_zero(page)) {
ret = false;
goto unlock;
}
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-memcontrol-replace-the-loop-with-a-list_for_each_entry.patch
hugetlb-convert-page_huge_active-hpagemigratable-flag-fix.patch
The patch titled
Subject: mm: hugetlb: fix a race between freeing and dissolving the page
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix a race between freeing and dissolving the page
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race windows is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list
when it is dissolved.
As a result __free_huge_page would corrupt page(s) already in the buddy
allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page
+++ a/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hst
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_no
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hs
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1755,6 +1773,7 @@ int dissolve_free_huge_page(struct page
{
int rc = -EBUSY;
+retry:
/* Not to disrupt normal path by vainly holding hugetlb_lock */
if (!PageHuge(page))
return 0;
@@ -1771,6 +1790,26 @@ int dissolve_free_huge_page(struct page
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head))) {
+ spin_unlock(&hugetlb_lock);
+ cond_resched();
+
+ /*
+ * Theoretically, we should return -EBUSY when we
+ * encounter this race. In fact, we have a chance
+ * to successfully dissolve the page if we do a
+ * retry. Because the race window is quite small.
+ * If we seize this opportunity, it is an optimization
+ * for increasing the success rate of dissolving page.
+ */
+ goto retry;
+ }
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-memcontrol-replace-the-loop-with-a-list_for_each_entry.patch
hugetlb-convert-page_huge_active-hpagemigratable-flag-fix.patch
The patch titled
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
has been removed from the -mm tree. Its filename was
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
If a new hugetlb page is allocated during fallocate it will not be marked
as active (set_page_huge_active) which will result in a later
isolate_huge_page failure when the page migration code would like to move
that page. Such a failure would be unexpected and wrong.
Only export set_page_huge_active, just leave clear_page_huge_active as
static. Because there are no external users.
Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com
Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 3 ++-
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct f
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ set_page_huge_active(page);
/*
* unlock_page because locked by add_to_page_cache()
- * page_put due to reference from alloc_huge_page()
+ * put_page() due to reference from alloc_huge_page()
*/
unlock_page(page);
put_page(page);
--- a/include/linux/hugetlb.h~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot
}
#endif
+void set_page_huge_active(struct page *page);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
--- a/mm/hugetlb.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/mm/hugetlb.c
@@ -1349,7 +1349,7 @@ bool page_huge_active(struct page *page)
}
/* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
{
VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
SetPagePrivate(&page[1]);
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-memcontrol-replace-the-loop-with-a-list_for_each_entry.patch
hugetlb-convert-page_huge_active-hpagemigratable-flag-fix.patch
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 29 Jan 2021 10:13:53 -0500
Subject: [PATCH] fgraph: Initialize tracing_graph_pause at task creation
On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
will disable or pause function graph tracing, as there's some paths in
bringing down the CPU that can have issues with its return address being
modified. The task_struct structure has a "tracing_graph_pause" atomic
counter, that when set to something other than zero, the function graph
tracer will not modify the return address.
The problem is that the tracing_graph_pause counter is initialized when the
function graph tracer is enabled. This can corrupt the counter for the idle
task if it is suspended in these architectures.
CPU 1 CPU 2
----- -----
do_idle()
cpu_suspend()
pause_graph_tracing()
task_struct->tracing_graph_pause++ (0 -> 1)
start_graph_tracing()
for_each_online_cpu(cpu) {
ftrace_graph_init_idle_task(cpu)
task-struct->tracing_graph_pause = 0 (1 -> 0)
unpause_graph_tracing()
task_struct->tracing_graph_pause-- (0 -> -1)
The above should have gone from 1 to zero, and enabled function graph
tracing again. But instead, it is set to -1, which keeps it disabled.
There's no reason that the field tracing_graph_pause on the task_struct can
not be initialized at boot up.
Cc: stable(a)vger.kernel.org
Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by: pierre.gondois(a)arm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/init/init_task.c b/init/init_task.c
index 8a992d73e6fb..3711cdaafed2 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -198,7 +198,8 @@ struct task_struct init_task
.lockdep_recursion = 0,
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- .ret_stack = NULL,
+ .ret_stack = NULL,
+ .tracing_graph_pause = ATOMIC_INIT(0),
#endif
#if defined(CONFIG_TRACING) && defined(CONFIG_PREEMPTION)
.trace_recursion = 0,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 73edb9e4f354..29a6ebeebc9e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -394,7 +394,6 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
}
if (t->ret_stack == NULL) {
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->curr_ret_stack = -1;
t->curr_ret_depth = -1;
@@ -489,7 +488,6 @@ static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
static void
graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
{
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->ftrace_timestamp = 0;
/* make curr_ret_stack visible before we add the ret_stack */
The TypeC FIA can be powered down if the TC-COLD power state is allowed,
so block the TC-COLD state when initializing the FIA.
Note that this isn't needed on ICL where the FIA is never modular and
which has no generic way to block TC-COLD (except for platforms with a
legacy TypeC port and on those too only via these legacy ports, not via
a DP-alt/TBT port).
Cc: <stable(a)vger.kernel.org> # v5.10+
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Reported-by: Paul Menzel <pmenzel(a)molgen.mpg.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3027
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_tc.c | 67 ++++++++++++++-----------
1 file changed, 37 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_tc.c b/drivers/gpu/drm/i915/display/intel_tc.c
index 27dc2dad6809c..2cefc13535a0f 100644
--- a/drivers/gpu/drm/i915/display/intel_tc.c
+++ b/drivers/gpu/drm/i915/display/intel_tc.c
@@ -23,36 +23,6 @@ static const char *tc_port_mode_name(enum tc_port_mode mode)
return names[mode];
}
-static void
-tc_port_load_fia_params(struct drm_i915_private *i915,
- struct intel_digital_port *dig_port)
-{
- enum port port = dig_port->base.port;
- enum tc_port tc_port = intel_port_to_tc(i915, port);
- u32 modular_fia;
-
- if (INTEL_INFO(i915)->display.has_modular_fia) {
- modular_fia = intel_uncore_read(&i915->uncore,
- PORT_TX_DFLEXDPSP(FIA1));
- drm_WARN_ON(&i915->drm, modular_fia == 0xffffffff);
- modular_fia &= MODULAR_FIA_MASK;
- } else {
- modular_fia = 0;
- }
-
- /*
- * Each Modular FIA instance houses 2 TC ports. In SOC that has more
- * than two TC ports, there are multiple instances of Modular FIA.
- */
- if (modular_fia) {
- dig_port->tc_phy_fia = tc_port / 2;
- dig_port->tc_phy_fia_idx = tc_port % 2;
- } else {
- dig_port->tc_phy_fia = FIA1;
- dig_port->tc_phy_fia_idx = tc_port;
- }
-}
-
static enum intel_display_power_domain
tc_cold_get_power_domain(struct intel_digital_port *dig_port)
{
@@ -646,6 +616,43 @@ void intel_tc_port_put_link(struct intel_digital_port *dig_port)
mutex_unlock(&dig_port->tc_lock);
}
+static bool
+tc_has_modular_fia(struct drm_i915_private *i915, struct intel_digital_port *dig_port)
+{
+ intel_wakeref_t wakeref;
+ u32 val;
+
+ if (!INTEL_INFO(i915)->display.has_modular_fia)
+ return false;
+
+ wakeref = tc_cold_block(dig_port);
+ val = intel_uncore_read(&i915->uncore, PORT_TX_DFLEXDPSP(FIA1));
+ tc_cold_unblock(dig_port, wakeref);
+
+ drm_WARN_ON(&i915->drm, val == 0xffffffff);
+
+ return val & MODULAR_FIA_MASK;
+}
+
+static void
+tc_port_load_fia_params(struct drm_i915_private *i915, struct intel_digital_port *dig_port)
+{
+ enum port port = dig_port->base.port;
+ enum tc_port tc_port = intel_port_to_tc(i915, port);
+
+ /*
+ * Each Modular FIA instance houses 2 TC ports. In SOC that has more
+ * than two TC ports, there are multiple instances of Modular FIA.
+ */
+ if (tc_has_modular_fia(i915, dig_port)) {
+ dig_port->tc_phy_fia = tc_port / 2;
+ dig_port->tc_phy_fia_idx = tc_port % 2;
+ } else {
+ dig_port->tc_phy_fia = FIA1;
+ dig_port->tc_phy_fia_idx = tc_port;
+ }
+}
+
void intel_tc_port_init(struct intel_digital_port *dig_port, bool is_legacy)
{
struct drm_i915_private *i915 = to_i915(dig_port->base.base.dev);
--
2.25.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3241929b67d28c83945d3191c6816a3271fd6b85 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Mon, 1 Feb 2021 16:08:03 +0100
Subject: [PATCH] usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada
3720
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Older ATF does not provide SMC call for USB 3.0 phy power on functionality
and therefore initialization of xhci-hcd is failing when older version of
ATF is used. In this case phy_power_on() function returns -EOPNOTSUPP.
[ 3.108467] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware
[ 3.117250] phy phy-d0018300.phy.0: phy poweron failed --> -95
[ 3.123465] xhci-hcd: probe of d0058000.usb failed with error -95
This patch introduces a new plat_setup callback for xhci platform drivers
which is called prior calling usb_add_hcd() function. This function at its
beginning skips PHY init if hcd->skip_phy_initialization is set.
Current init_quirk callback for xhci platform drivers is called from
xhci_plat_setup() function which is called after chip reset completes.
It happens in the middle of the usb_add_hcd() function and therefore this
callback cannot be used for setting if PHY init should be skipped or not.
For Armada 3720 this patch introduce a new xhci_mvebu_a3700_plat_setup()
function configured as a xhci platform plat_setup callback. This new
function calls phy_power_on() and in case it returns -EOPNOTSUPP then
XHCI_SKIP_PHY_INIT quirk is set to instruct xhci-plat to skip PHY
initialization.
This patch fixes above failure by ignoring 'not supported' error in
xhci-hcd driver. In this case it is expected that phy is already power on.
It fixes initialization of xhci-hcd on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for USB 3.0 phy power.
This is regression introduced in commit bd3d25b07342 ("arm64: dts: marvell:
armada-37xx: link USB hosts with their PHYs") where USB 3.0 phy was defined
and therefore xhci-hcd on Espressobin with older ATF started failing.
Fixes: bd3d25b07342 ("arm64: dts: marvell: armada-37xx: link USB hosts with their PHYs")
Cc: <stable(a)vger.kernel.org> # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Cc: <stable(a)vger.kernel.org> # 5.1+: f768e718911e: usb: host: xhci-plat: add priv quirk for skip PHY initialization
Tested-by: Tomasz Maciej Nowak <tmn505(a)gmail.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com> # On R-Car
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com> # xhci-plat
Acked-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Link: https://lore.kernel.org/r/20210201150803.7305-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mvebu.c b/drivers/usb/host/xhci-mvebu.c
index 60651a50770f..8ca1a235d164 100644
--- a/drivers/usb/host/xhci-mvebu.c
+++ b/drivers/usb/host/xhci-mvebu.c
@@ -8,6 +8,7 @@
#include <linux/mbus.h>
#include <linux/of.h>
#include <linux/platform_device.h>
+#include <linux/phy/phy.h>
#include <linux/usb.h>
#include <linux/usb/hcd.h>
@@ -74,6 +75,47 @@ int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd)
return 0;
}
+int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd)
+{
+ struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+ struct device *dev = hcd->self.controller;
+ struct phy *phy;
+ int ret;
+
+ /* Old bindings miss the PHY handle */
+ phy = of_phy_get(dev->of_node, "usb3-phy");
+ if (IS_ERR(phy) && PTR_ERR(phy) == -EPROBE_DEFER)
+ return -EPROBE_DEFER;
+ else if (IS_ERR(phy))
+ goto phy_out;
+
+ ret = phy_init(phy);
+ if (ret)
+ goto phy_put;
+
+ ret = phy_set_mode(phy, PHY_MODE_USB_HOST_SS);
+ if (ret)
+ goto phy_exit;
+
+ ret = phy_power_on(phy);
+ if (ret == -EOPNOTSUPP) {
+ /* Skip initializatin of XHCI PHY when it is unsupported by firmware */
+ dev_warn(dev, "PHY unsupported by firmware\n");
+ xhci->quirks |= XHCI_SKIP_PHY_INIT;
+ }
+ if (ret)
+ goto phy_exit;
+
+ phy_power_off(phy);
+phy_exit:
+ phy_exit(phy);
+phy_put:
+ of_phy_put(phy);
+phy_out:
+
+ return 0;
+}
+
int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd)
{
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
diff --git a/drivers/usb/host/xhci-mvebu.h b/drivers/usb/host/xhci-mvebu.h
index 3be021793cc8..01bf3fcb3eca 100644
--- a/drivers/usb/host/xhci-mvebu.h
+++ b/drivers/usb/host/xhci-mvebu.h
@@ -12,6 +12,7 @@ struct usb_hcd;
#if IS_ENABLED(CONFIG_USB_XHCI_MVEBU)
int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd);
+int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd);
int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd);
#else
static inline int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd)
@@ -19,6 +20,11 @@ static inline int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd)
return 0;
}
+static inline int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd)
+{
+ return 0;
+}
+
static inline int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd)
{
return 0;
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 4d34f6005381..c1edcc9b13ce 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -44,6 +44,16 @@ static void xhci_priv_plat_start(struct usb_hcd *hcd)
priv->plat_start(hcd);
}
+static int xhci_priv_plat_setup(struct usb_hcd *hcd)
+{
+ struct xhci_plat_priv *priv = hcd_to_xhci_priv(hcd);
+
+ if (!priv->plat_setup)
+ return 0;
+
+ return priv->plat_setup(hcd);
+}
+
static int xhci_priv_init_quirk(struct usb_hcd *hcd)
{
struct xhci_plat_priv *priv = hcd_to_xhci_priv(hcd);
@@ -111,6 +121,7 @@ static const struct xhci_plat_priv xhci_plat_marvell_armada = {
};
static const struct xhci_plat_priv xhci_plat_marvell_armada3700 = {
+ .plat_setup = xhci_mvebu_a3700_plat_setup,
.init_quirk = xhci_mvebu_a3700_init_quirk,
};
@@ -330,7 +341,14 @@ static int xhci_plat_probe(struct platform_device *pdev)
hcd->tpl_support = of_usb_host_tpl_support(sysdev->of_node);
xhci->shared_hcd->tpl_support = hcd->tpl_support;
- if (priv && (priv->quirks & XHCI_SKIP_PHY_INIT))
+
+ if (priv) {
+ ret = xhci_priv_plat_setup(hcd);
+ if (ret)
+ goto disable_usb_phy;
+ }
+
+ if ((xhci->quirks & XHCI_SKIP_PHY_INIT) || (priv && (priv->quirks & XHCI_SKIP_PHY_INIT)))
hcd->skip_phy_initialization = 1;
if (priv && (priv->quirks & XHCI_SG_TRB_CACHE_SIZE_QUIRK))
diff --git a/drivers/usb/host/xhci-plat.h b/drivers/usb/host/xhci-plat.h
index 1fb149d1fbce..561d0b7bce09 100644
--- a/drivers/usb/host/xhci-plat.h
+++ b/drivers/usb/host/xhci-plat.h
@@ -13,6 +13,7 @@
struct xhci_plat_priv {
const char *firmware_name;
unsigned long long quirks;
+ int (*plat_setup)(struct usb_hcd *);
void (*plat_start)(struct usb_hcd *);
int (*init_quirk)(struct usb_hcd *);
int (*suspend_quirk)(struct usb_hcd *);
Dear Friend:
I am sorry to encroach into your privacy in this manner, I find
it appropriate to offer you my partnership in business, I only
pray at this time that your email address is still valid. I am
contacting you because my status would not permit me to do this
alone.
I want to solicit your attention to receive money on my behalf.
Can I trust you to do business?
I will send you the full details when I receive your reply.
Best wishes.
The erratum 1024718 affects Cortex-A55 r0p0 to r2p0. However
we apply the work around for r0p0 - r1p0. Unfortunately this
won't be fixed for the future revisions for the CPU. Thus
extend the work around for all versions of A55, to cover
for r2p0 and any future revisions.
Cc: stable(a)vger.kernel.org
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: James Morse <james.morse(a)arm.com>
Cc: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
arch/arm64/kernel/cpufeature.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e99eddec0a46..db400ca77427 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1455,7 +1455,7 @@ static bool cpu_has_broken_dbm(void)
/* List of CPUs which have broken DBM support. */
static const struct midr_range cpus[] = {
#ifdef CONFIG_ARM64_ERRATUM_1024718
- MIDR_RANGE(MIDR_CORTEX_A55, 0, 0, 1, 0), // A55 r0p0 -r1p0
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A55),
/* Kryo4xx Silver (rdpe => r1p0) */
MIDR_REV(MIDR_QCOM_KRYO_4XX_SILVER, 0xd, 0xe),
#endif
--
2.24.1
The first three patches are fixes for XSA-332. The avoid WARN splats
and a performance issue with interdomain events.
Patches 4 and 5 are some additions to event handling in order to add
some per pv-device statistics to sysfs and the ability to have a per
backend device spurious event delay control.
Patches 6 and 7 are minor fixes I had lying around.
Juergen Gross (7):
xen/events: reset affinity of 2-level event initially
xen/events: don't unmask an event channel when an eoi is pending
xen/events: fix lateeoi irq acknowledgement
xen/events: link interdomain events to associated xenbus device
xen/events: add per-xenbus device event statistics and settings
xen/evtch: use smp barriers for user event ring
xen/evtchn: read producer index only once
drivers/block/xen-blkback/xenbus.c | 2 +-
drivers/net/xen-netback/interface.c | 16 ++--
drivers/xen/events/events_2l.c | 20 +++++
drivers/xen/events/events_base.c | 133 ++++++++++++++++++++++------
drivers/xen/evtchn.c | 6 +-
drivers/xen/pvcalls-back.c | 4 +-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 2 +-
drivers/xen/xenbus/xenbus_probe.c | 66 ++++++++++++++
include/xen/events.h | 7 +-
include/xen/xenbus.h | 7 ++
11 files changed, 217 insertions(+), 48 deletions(-)
--
2.26.2
From: Xiao Ni <xni(a)redhat.com>
One customer reports a crash problem which causes by flush request. It
triggers a warning before crash.
/* new request after previous flush is completed */
if (ktime_after(req_start, mddev->prev_flush_start)) {
WARN_ON(mddev->flush_bio);
mddev->flush_bio = bio;
bio = NULL;
}
The WARN_ON is triggered. We use spin lock to protect prev_flush_start and
flush_bio in md_flush_request. But there is no lock protection in
md_submit_flush_data. It can set flush_bio to NULL first because of
compiler reordering write instructions.
For example, flush bio1 sets flush bio to NULL first in
md_submit_flush_data. An interrupt or vmware causing an extended stall
happen between updating flush_bio and prev_flush_start. Because flush_bio
is NULL, flush bio2 can get the lock and submit to underlayer disks. Then
flush bio1 updates prev_flush_start after the interrupt or extended stall.
Then flush bio3 enters in md_flush_request. The start time req_start is
behind prev_flush_start. The flush_bio is not NULL(flush bio2 hasn't
finished). So it can trigger the WARN_ON now. Then it calls INIT_WORK
again. INIT_WORK() will re-initialize the list pointers in the
work_struct, which then can result in a corrupted work list and the
work_struct queued a second time. With the work list corrupted, it can
lead in invalid work items being used and cause a crash in
process_one_work.
We need to make sure only one flush bio can be handled at one same time.
So add spin lock in md_submit_flush_data to protect prev_flush_start and
flush_bio in an atomic way.
Reviewed-by: David Jeffery <djeffery(a)redhat.com>
Signed-off-by: Xiao Ni <xni(a)redhat.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
[jwang: backport dc5d17a3c39b06aef866afca19245a9cfb533a79 to 4.19]
Signed-off-by: Jack Wang <jinpu.wang(a)cloud.ionos.com>
---
drivers/md/md.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ea139d0c0bc3..2bd60bd9e2ca 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -639,8 +639,10 @@ static void md_submit_flush_data(struct work_struct *ws)
* could wait for this and below md_handle_request could wait for those
* bios because of suspend check
*/
+ spin_lock_irq(&mddev->lock);
mddev->last_flush = mddev->start_flush;
mddev->flush_bio = NULL;
+ spin_unlock_irq(&mddev->lock);
wake_up(&mddev->sb_wait);
if (bio->bi_iter.bi_size == 0) {
--
2.25.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f295c8cfec833c2707ff1512da10d65386dde7af Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied(a)redhat.com>
Date: Mon, 1 Feb 2021 10:56:32 +1000
Subject: [PATCH] drm/nouveau: fix dma syncing warning with debugging on.
Since I wrote the below patch if you run a debug kernel you can a
dma debug warning like:
nouveau 0000:1f:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000016e012000] [size=4096 bytes]
The old nouveau code wasn't consolidate the pages like the ttm code,
but the dma-debug expects the sync code to give it the same base/range
pairs as the allocator.
Fix the nouveau sync code to consolidate pages before calling the
sync code.
Fixes: bd549d35b4be0 ("nouveau: use ttm populate mapping functions. (v2)")
Reported-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Ben Skeggs <bskeggs(a)redhat.com>
Signed-off-by: Dave Airlie <airlied(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/417588/
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index c85b1af06b7b..7ea367a5444d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -547,7 +547,7 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
{
struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
struct ttm_tt *ttm_dma = (struct ttm_tt *)nvbo->bo.ttm;
- int i;
+ int i, j;
if (!ttm_dma)
return;
@@ -556,10 +556,21 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
if (nvbo->force_coherent)
return;
- for (i = 0; i < ttm_dma->num_pages; i++)
+ for (i = 0; i < ttm_dma->num_pages; ++i) {
+ struct page *p = ttm_dma->pages[i];
+ size_t num_pages = 1;
+
+ for (j = i + 1; j < ttm_dma->num_pages; ++j) {
+ if (++p != ttm_dma->pages[j])
+ break;
+
+ ++num_pages;
+ }
dma_sync_single_for_device(drm->dev->dev,
ttm_dma->dma_address[i],
- PAGE_SIZE, DMA_TO_DEVICE);
+ num_pages * PAGE_SIZE, DMA_TO_DEVICE);
+ i += num_pages;
+ }
}
void
@@ -567,7 +578,7 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
{
struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
struct ttm_tt *ttm_dma = (struct ttm_tt *)nvbo->bo.ttm;
- int i;
+ int i, j;
if (!ttm_dma)
return;
@@ -576,9 +587,21 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
if (nvbo->force_coherent)
return;
- for (i = 0; i < ttm_dma->num_pages; i++)
+ for (i = 0; i < ttm_dma->num_pages; ++i) {
+ struct page *p = ttm_dma->pages[i];
+ size_t num_pages = 1;
+
+ for (j = i + 1; j < ttm_dma->num_pages; ++j) {
+ if (++p != ttm_dma->pages[j])
+ break;
+
+ ++num_pages;
+ }
+
dma_sync_single_for_cpu(drm->dev->dev, ttm_dma->dma_address[i],
- PAGE_SIZE, DMA_FROM_DEVICE);
+ num_pages * PAGE_SIZE, DMA_FROM_DEVICE);
+ i += num_pages;
+ }
}
void nouveau_bo_add_io_reserve_lru(struct ttm_buffer_object *bo)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 29b32839725f8c89a41cb6ee054c85f3116ea8b5 Mon Sep 17 00:00:00 2001
From: Nadav Amit <namit(a)vmware.com>
Date: Wed, 27 Jan 2021 09:53:17 -0800
Subject: [PATCH] iommu/vt-d: Do not use flush-queue when caching-mode is on
When an Intel IOMMU is virtualized, and a physical device is
passed-through to the VM, changes of the virtual IOMMU need to be
propagated to the physical IOMMU. The hypervisor therefore needs to
monitor PTE mappings in the IOMMU page-tables. Intel specifications
provide "caching-mode" capability that a virtual IOMMU uses to report
that the IOMMU is virtualized and a TLB flush is needed after mapping to
allow the hypervisor to propagate virtual IOMMU mappings to the physical
IOMMU. To the best of my knowledge no real physical IOMMU reports
"caching-mode" as turned on.
Synchronizing the virtual and the physical IOMMU tables is expensive if
the hypervisor is unaware which PTEs have changed, as the hypervisor is
required to walk all the virtualized tables and look for changes.
Consequently, domain flushes are much more expensive than page-specific
flushes on virtualized IOMMUs with passthrough devices. The kernel
therefore exploited the "caching-mode" indication to avoid domain
flushing and use page-specific flushing in virtualized environments. See
commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching
mode.")
This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use
of iova deferred flushing"). Now, when batched TLB flushing is used (the
default), full TLB domain flushes are performed frequently, requiring
the hypervisor to perform expensive synchronization between the virtual
TLB and the physical one.
Getting batched TLB flushes to use page-specific invalidations again in
such circumstances is not easy, since the TLB invalidation scheme
assumes that "full" domain TLB flushes are performed for scalability.
Disable batched TLB flushes when caching-mode is on, as the performance
benefit from using batched TLB invalidations is likely to be much
smaller than the overhead of the virtual-to-physical IOMMU page-tables
synchronization.
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: stable(a)vger.kernel.org
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210127175317.1600473-1-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f665322a0991..06b00b5363d8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5440,6 +5440,36 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
return ret;
}
+static bool domain_use_flush_queue(void)
+{
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+ bool r = true;
+
+ if (intel_iommu_strict)
+ return false;
+
+ /*
+ * The flush queue implementation does not perform page-selective
+ * invalidations that are required for efficient TLB flushes in virtual
+ * environments. The benefit of batching is likely to be much lower than
+ * the overhead of synchronizing the virtual and physical IOMMU
+ * page-tables.
+ */
+ rcu_read_lock();
+ for_each_active_iommu(iommu, drhd) {
+ if (!cap_caching_mode(iommu->cap))
+ continue;
+
+ pr_warn_once("IOMMU batching is disabled due to virtualization");
+ r = false;
+ break;
+ }
+ rcu_read_unlock();
+
+ return r;
+}
+
static int
intel_iommu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
@@ -5450,7 +5480,7 @@ intel_iommu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
- *(int *)data = !intel_iommu_strict;
+ *(int *)data = domain_use_flush_queue();
return 0;
default:
return -ENODEV;
Upstream commit 81b704d3e4674e09781d331df73d76675d5ad8cb
Applies to 4.4-stable tree
----------->8---------------
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Thu, 14 Jan 2021 19:34:22 +0100
Subject: [PATCH] ACPI: thermal: Do not call acpi_thermal_check() directly
Calling acpi_thermal_check() from acpi_thermal_notify() directly
is problematic if _TMP triggers Notify () on the thermal zone for
which it has been evaluated (which happens on some systems), because
it causes a new acpi_thermal_notify() invocation to be queued up
every time and if that takes place too often, an indefinite number of
pending work items may accumulate in kacpi_notify_wq over time.
Besides, it is not really useful to queue up a new invocation of
acpi_thermal_check() if one of them is pending already.
For these reasons, rework acpi_thermal_notify() to queue up a thermal
check instead of calling acpi_thermal_check() directly and only allow
one thermal check to be pending at a time. Moreover, only allow one
acpi_thermal_check_fn() instance at a time to run
thermal_zone_device_update() for one thermal zone and make it return
early if it sees other instances running for the same thermal zone.
While at it, fold acpi_thermal_check() into acpi_thermal_check_fn(),
as it is only called from there after the other changes made here.
[This issue appears to have been exposed by commit 6d25be5782e4
("sched/core, workqueues: Distangle worker accounting from rq
lock"), but it is unclear why it was not visible earlier.]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208877
Reported-by: Stephen Berman <stephen.berman(a)gmx.net>
Diagnosed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Tested-by: Stephen Berman <stephen.berman(a)gmx.net>
Cc: All applicable <stable(a)vger.kernel.org>
[bigeasy: Backported to v4.4.y, use atomic_t instead of refcount_t]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
drivers/acpi/thermal.c | 54 +++++++++++++++++++++++++++++-------------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 82707f9824cae..b4826335ad0b3 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -188,6 +188,8 @@ struct acpi_thermal {
int tz_enabled;
int kelvin_offset;
struct work_struct thermal_check_work;
+ struct mutex thermal_check_lock;
+ atomic_t thermal_check_count;
};
/* --------------------------------------------------------------------------
@@ -513,16 +515,6 @@ static int acpi_thermal_get_trip_points(struct acpi_thermal *tz)
return 0;
}
-static void acpi_thermal_check(void *data)
-{
- struct acpi_thermal *tz = data;
-
- if (!tz->tz_enabled)
- return;
-
- thermal_zone_device_update(tz->thermal_zone);
-}
-
/* sys I/F for generic thermal sysfs support */
static int thermal_get_temp(struct thermal_zone_device *thermal, int *temp)
@@ -556,6 +548,8 @@ static int thermal_get_mode(struct thermal_zone_device *thermal,
return 0;
}
+static void acpi_thermal_check_fn(struct work_struct *work);
+
static int thermal_set_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode mode)
{
@@ -581,7 +575,7 @@ static int thermal_set_mode(struct thermal_zone_device *thermal,
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"%s kernel ACPI thermal control\n",
tz->tz_enabled ? "Enable" : "Disable"));
- acpi_thermal_check(tz);
+ acpi_thermal_check_fn(&tz->thermal_check_work);
}
return 0;
}
@@ -950,6 +944,12 @@ static void acpi_thermal_unregister_thermal_zone(struct acpi_thermal *tz)
Driver Interface
-------------------------------------------------------------------------- */
+static void acpi_queue_thermal_check(struct acpi_thermal *tz)
+{
+ if (!work_pending(&tz->thermal_check_work))
+ queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work);
+}
+
static void acpi_thermal_notify(struct acpi_device *device, u32 event)
{
struct acpi_thermal *tz = acpi_driver_data(device);
@@ -960,17 +960,17 @@ static void acpi_thermal_notify(struct acpi_device *device, u32 event)
switch (event) {
case ACPI_THERMAL_NOTIFY_TEMPERATURE:
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
break;
case ACPI_THERMAL_NOTIFY_THRESHOLDS:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS);
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break;
case ACPI_THERMAL_NOTIFY_DEVICES:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_DEVICES);
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break;
@@ -1070,7 +1070,27 @@ static void acpi_thermal_check_fn(struct work_struct *work)
{
struct acpi_thermal *tz = container_of(work, struct acpi_thermal,
thermal_check_work);
- acpi_thermal_check(tz);
+
+ if (!tz->tz_enabled)
+ return;
+ /*
+ * In general, it is not sufficient to check the pending bit, because
+ * subsequent instances of this function may be queued after one of them
+ * has started running (e.g. if _TMP sleeps). Avoid bailing out if just
+ * one of them is running, though, because it may have done the actual
+ * check some time ago, so allow at least one of them to block on the
+ * mutex while another one is running the update.
+ */
+ if (!atomic_add_unless(&tz->thermal_check_count, -1, 1))
+ return;
+
+ mutex_lock(&tz->thermal_check_lock);
+
+ thermal_zone_device_update(tz->thermal_zone);
+
+ atomic_inc(&tz->thermal_check_count);
+
+ mutex_unlock(&tz->thermal_check_lock);
}
static int acpi_thermal_add(struct acpi_device *device)
@@ -1102,6 +1122,8 @@ static int acpi_thermal_add(struct acpi_device *device)
if (result)
goto free_memory;
+ atomic_set(&tz->thermal_check_count, 3);
+ mutex_init(&tz->thermal_check_lock);
INIT_WORK(&tz->thermal_check_work, acpi_thermal_check_fn);
pr_info(PREFIX "%s [%s] (%ld C)\n", acpi_device_name(device),
@@ -1167,7 +1189,7 @@ static int acpi_thermal_resume(struct device *dev)
tz->state.active |= tz->trips.active[i].flags.enabled;
}
- queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work);
+ acpi_queue_thermal_check(tz);
return AE_OK;
}
--
2.30.0
This is the start of the stable review cycle for the 5.10.14 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 07 Feb 2021 14:06:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.14-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.14-rc1
Peter Zijlstra <peterz(a)infradead.org>
workqueue: Restrict affinity change to rescuer
Peter Zijlstra <peterz(a)infradead.org>
kthread: Extract KTHREAD_IS_PER_CPU
Gayatri Kammela <gayatri.kammela(a)intel.com>
x86/cpu: Add another Alder Lake CPU to the Intel family
Josh Poimboeuf <jpoimboe(a)redhat.com>
objtool: Don't fail the kernel build on fatal errors
Oded Gabbay <ogabbay(a)kernel.org>
habanalabs: disable FW events on device removal
Oded Gabbay <ogabbay(a)kernel.org>
habanalabs: fix backward compatibility of idle check
Ofir Bitton <obitton(a)habana.ai>
habanalabs: zero pci counters packet before submit to FW
Vladimir Stempen <vladimir.stempen(a)amd.com>
drm/amd/display: Fixed corruptions on HPDRX link loss restore
Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
drm/amd/display: Use hardware sequencer functions for PG control
Bing Guo <bing.guo(a)amd.com>
drm/amd/display: Change function decide_dp_link_settings to avoid infinite looping
Aric Cyr <aric.cyr(a)amd.com>
drm/amd/display: Allow PSTATE chnage when no displays are enabled
Jake Wang <haonan.wang2(a)amd.com>
drm/amd/display: Update dram_clock_change_latency for DCN2.1
Michael Ellerman <mpe(a)ellerman.id.au>
selftests/powerpc: Only test lwm/stmw on big endian
Jeannie Stevenson <jeanniestevenson(a)protonmail.com>
platform/x86: thinkpad_acpi: Add P53/73 firmware to fan_quirk_table for dual fan control
Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
nvmet: set right status on error in id-ns handler
Klaus Jensen <k.jensen(a)samsung.com>
nvme-pci: allow use of cmb on v1.4 controllers
Chao Leng <lengchao(a)huawei.com>
nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
Chao Leng <lengchao(a)huawei.com>
nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
Revanth Rajashekar <revanth.rajashekar(a)intel.com>
nvme: check the PRINFO bit before deciding the host buffer length
lianzhi chang <changlianzhi(a)uniontech.com>
udf: fix the problem that the disc content is not displayed
Sowjanya Komatineni <skomatineni(a)nvidia.com>
i2c: tegra: Create i2c_writesl_vi() to use with VI I2C for filling TX FIFO
Kai-Chuan Hsieh <kaichuan.hsieh(a)canonical.com>
ALSA: hda: Add Cometlake-R PCI ID
Brian King <brking(a)linux.vnet.ibm.com>
scsi: ibmvfc: Set default timeout to avoid crash during migration
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix encryption key selection for 802.3 xmit
Felix Fietkau <nbd(a)nbd.name>
mac80211: fix fast-rx encryption check
Shayne Chen <shayne.chen(a)mediatek.com>
mac80211: fix incorrect strlen of .write in debugfs
Josh Poimboeuf <jpoimboe(a)redhat.com>
objtool: Don't add empty symbols to the rbtree
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ALSA: hda: Add AlderLake-P PCI ID and HDMI codec vid
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
ASoC: SOF: Intel: hda: Resume codec to do jack detection
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
scsi: fnic: Fix memleak in vnic_dev_init_devcmd2
Javed Hasan <jhasan(a)marvell.com>
scsi: libfc: Avoid invoking response handler twice if ep is already completed
Martin Wilck <mwilck(a)suse.com>
scsi: scsi_transport_srp: Don't block target in failfast state
Peter Zijlstra <peterz(a)infradead.org>
x86: __always_inline __{rd,wr}msr()
Peter Zijlstra <peterz(a)infradead.org>
locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
Oded Gabbay <ogabbay(a)kernel.org>
habanalabs: fix dma_addr passed to dma_mmap_coherent
Arnold Gozum <arngozum(a)gmail.com>
platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet
Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
tools/power/x86/intel-speed-select: Set higher of cpuinfo_max_freq or base_frequency
Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
tools/power/x86/intel-speed-select: Set scaling_max_freq to base_frequency
Tony Lindgren <tony(a)atomide.com>
phy: cpcap-usb: Fix warning for missing regulator_disable
Nadav Amit <namit(a)vmware.com>
iommu/vt-d: Do not use flush-queue when caching-mode is on
Nick Desaulniers <ndesaulniers(a)google.com>
ARM: 9025/1: Kconfig: CPU_BIG_ENDIAN depends on !LD_IS_LLD
Mike Rapoport <rppt(a)kernel.org>
Revert "x86/setup: don't remove E820_TYPE_RAM for pfn 0"
Catalin Marinas <catalin.marinas(a)arm.com>
arm64: Do not pass tagged addresses to __is_lm_address()
Vincenzo Frascino <vincenzo.frascino(a)arm.com>
arm64: Fix kernel address detection of __is_lm_address()
Robin Murphy <robin.murphy(a)arm.com>
arm64: dts: meson: Describe G12b GPU as coherent
Robin Murphy <robin.murphy(a)arm.com>
drm/panfrost: Support cache-coherent integrations
Robin Murphy <robin.murphy(a)arm.com>
iommu/io-pgtable-arm: Support coherency for Mali LPAE
Lijun Pan <ljp(a)linux.ibm.com>
ibmvnic: Ensure that CRQ entry read are correctly ordered
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP
Pan Bian <bianpan2016(a)163.com>
net: dsa: bcm_sf2: put device node before return
Ido Schimmel <idosch(a)nvidia.com>
mlxsw: spectrum_span: Do not overwrite policer configuration
Voon Weifeng <weifeng.voon(a)intel.com>
stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing
Kevin Hao <haokexin(a)gmail.com>
net: octeontx2: Make sure the buffer is 128 byte aligned
Pan Bian <bianpan2016(a)163.com>
net: fec: put child node on error path
Pan Bian <bianpan2016(a)163.com>
net: stmmac: dwmac-intel-plat: remove config data on error
Marek Vasut <marex(a)denx.de>
net: dsa: microchip: Adjust reset release timing to match reference reset circuit
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mm/Kconfig | 1 +
arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++
arch/arm64/include/asm/memory.h | 10 ++---
arch/arm64/mm/physaddr.c | 2 +-
arch/x86/include/asm/intel-family.h | 1 +
arch/x86/include/asm/msr.h | 4 +-
arch/x86/kernel/setup.c | 20 +++++-----
.../amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c | 6 ++-
drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 7 +++-
.../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 18 +++++++--
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 9 ++++-
.../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 2 +-
drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +
drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +
drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 +
drivers/i2c/busses/i2c-tegra.c | 22 ++++++++++-
drivers/iommu/intel/iommu.c | 5 +++
drivers/iommu/io-pgtable-arm.c | 11 +++++-
drivers/misc/habanalabs/common/device.c | 9 +++++
drivers/misc/habanalabs/common/firmware_if.c | 5 +++
drivers/misc/habanalabs/common/habanalabs_ioctl.c | 2 +
drivers/misc/habanalabs/gaudi/gaudi.c | 3 +-
drivers/misc/habanalabs/goya/goya.c | 3 +-
drivers/net/dsa/bcm_sf2.c | 8 +++-
drivers/net/dsa/microchip/ksz_common.c | 2 +-
drivers/net/ethernet/freescale/fec_main.c | 3 +-
drivers/net/ethernet/ibm/ibmvnic.c | 6 +++
.../ethernet/marvell/octeontx2/nic/otx2_common.c | 3 +-
.../net/ethernet/mellanox/mlxsw/spectrum_span.c | 6 +++
.../net/ethernet/mellanox/mlxsw/spectrum_span.h | 1 +
.../net/ethernet/stmicro/stmmac/dwmac-intel-plat.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 2 +
drivers/nvme/host/core.c | 17 ++++++++-
drivers/nvme/host/pci.c | 14 +++++++
drivers/nvme/host/rdma.c | 15 ++++++--
drivers/nvme/host/tcp.c | 14 +++++--
drivers/nvme/target/admin-cmd.c | 8 +++-
drivers/phy/motorola/phy-cpcap-usb.c | 19 +++++++---
drivers/platform/x86/intel-vbtn.c | 6 +++
drivers/platform/x86/thinkpad_acpi.c | 1 +
drivers/platform/x86/touchscreen_dmi.c | 18 +++++++++
drivers/scsi/fnic/vnic_dev.c | 8 ++--
drivers/scsi/ibmvscsi/ibmvfc.c | 4 +-
drivers/scsi/libfc/fc_exch.c | 16 +++++++-
drivers/scsi/scsi_transport_srp.c | 9 ++++-
fs/udf/super.c | 7 ++--
include/linux/kthread.h | 3 ++
include/linux/nvme.h | 6 +++
kernel/kthread.c | 27 ++++++++++++-
kernel/locking/lockdep.c | 7 +++-
kernel/smpboot.c | 1 +
kernel/workqueue.c | 9 ++---
net/mac80211/debugfs.c | 44 ++++++++++------------
net/mac80211/rx.c | 2 +
net/mac80211/tx.c | 27 +++++++------
net/switchdev/switchdev.c | 23 ++++++-----
sound/pci/hda/hda_intel.c | 6 +++
sound/pci/hda/patch_hdmi.c | 1 +
sound/soc/sof/intel/hda-codec.c | 3 +-
tools/objtool/check.c | 14 +++----
tools/objtool/elf.c | 7 ++++
tools/power/x86/intel-speed-select/isst-config.c | 32 ++++++++++++++++
.../powerpc/alignment/alignment_handler.c | 5 ++-
65 files changed, 427 insertions(+), 135 deletions(-)
Hi,
commit 64f55156f7adedb1ac5bb9cdbcbc9ac05ff5a724 upstream
The requested patch allows the iwlwifi driver to work with newer AX200
wifi cards when MSI-X interrupts are not available. Without it,
bringing an interface "up" fails with a Microcode SW error. Xen PCI
passthrough with a linux stubdom doesn't enable MSI-X which triggers
this condition.
I think it's only applicable to 5.4 because it's in 5.10 and earlier
kernels don't have AX200 support.
I'm making this request to stable and CC-ing netdev since I saw a
message [1] on netdev saying:
"""
We're actually experimenting with letting Greg take networking patches
into stable like he does for every other tree. If the patch doesn't
appear in the next stable release please poke stable@ directly.
"""
Thanks,
Jason
[1] https://lore.kernel.org/netdev/20210129193030.46ef3b17@kicinski-fedora-pc1c…
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2dcb3964544177c51853a210b6ad400de78ef17d Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Thu, 4 Feb 2021 18:32:36 -0800
Subject: [PATCH] memblock: do not start bottom-up allocations with kernel_end
With kaslr the kernel image is placed at a random place, so starting the
bottom-up allocation with the kernel_end can result in an allocation
failure and a warning like this one:
hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
------------[ cut here ]------------
memblock: bottom-up allocation failed, memory hotremove may be affected
WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:memblock_find_in_range_node+0x178/0x25a
Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
Call Trace:
memblock_alloc_range_nid+0x8d/0x11e
cma_declare_contiguous_nid+0x2c4/0x38c
hugetlb_cma_reserve+0xdc/0x128
flush_tlb_one_kernel+0xc/0x20
native_set_fixmap+0x82/0xd0
flat_get_apic_id+0x5/0x10
register_lapic_address+0x8e/0x97
setup_arch+0x8a5/0xc3f
start_kernel+0x66/0x547
load_ucode_bsp+0x4c/0xcd
secondary_startup_64_no_verify+0xb0/0xbb
random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
---[ end trace f151227d0b39be70 ]---
At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chances to success as a top-down allocation, so
there is no reason to fallback in the case of a failure. All together it
simplifies the logic.
Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memblock.c b/mm/memblock.c
index 1eaaec1e7687..8d9b5f1e7040 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
*
* Find @size free area aligned to @align in the specified range and node.
*
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
* Return:
* Found address on success, 0 on failure.
*/
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
phys_addr_t end, int nid,
enum memblock_flags flags)
{
- phys_addr_t kernel_end, ret;
-
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
- kernel_end = __pa_symbol(_end);
-
- /*
- * try bottom-up allocation only when bottom-up mode
- * is set and @end is above the kernel image.
- */
- if (memblock_bottom_up() && end > kernel_end) {
- phys_addr_t bottom_up_start;
-
- /* make sure we will allocate above the kernel */
- bottom_up_start = max(start, kernel_end);
- /* ok, try bottom-up allocation first */
- ret = __memblock_find_range_bottom_up(bottom_up_start, end,
- size, align, nid, flags);
- if (ret)
- return ret;
-
- /*
- * we always limit bottom-up allocation above the kernel,
- * but top-down allocation doesn't have the limit, so
- * retrying top-down allocation may succeed when bottom-up
- * allocation failed.
- *
- * bottom-up allocation is expected to be fail very rarely,
- * so we use WARN_ONCE() here to see the stack trace if
- * fail happens.
- */
- WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
- "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
- }
-
- return __memblock_find_range_top_down(start, end, size, align, nid,
- flags);
+ if (memblock_bottom_up())
+ return __memblock_find_range_bottom_up(start, end, size, align,
+ nid, flags);
+ else
+ return __memblock_find_range_top_down(start, end, size, align,
+ nid, flags);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2dcb3964544177c51853a210b6ad400de78ef17d Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Thu, 4 Feb 2021 18:32:36 -0800
Subject: [PATCH] memblock: do not start bottom-up allocations with kernel_end
With kaslr the kernel image is placed at a random place, so starting the
bottom-up allocation with the kernel_end can result in an allocation
failure and a warning like this one:
hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
------------[ cut here ]------------
memblock: bottom-up allocation failed, memory hotremove may be affected
WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:memblock_find_in_range_node+0x178/0x25a
Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
Call Trace:
memblock_alloc_range_nid+0x8d/0x11e
cma_declare_contiguous_nid+0x2c4/0x38c
hugetlb_cma_reserve+0xdc/0x128
flush_tlb_one_kernel+0xc/0x20
native_set_fixmap+0x82/0xd0
flat_get_apic_id+0x5/0x10
register_lapic_address+0x8e/0x97
setup_arch+0x8a5/0xc3f
start_kernel+0x66/0x547
load_ucode_bsp+0x4c/0xcd
secondary_startup_64_no_verify+0xb0/0xbb
random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
---[ end trace f151227d0b39be70 ]---
At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chances to success as a top-down allocation, so
there is no reason to fallback in the case of a failure. All together it
simplifies the logic.
Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memblock.c b/mm/memblock.c
index 1eaaec1e7687..8d9b5f1e7040 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
*
* Find @size free area aligned to @align in the specified range and node.
*
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
* Return:
* Found address on success, 0 on failure.
*/
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
phys_addr_t end, int nid,
enum memblock_flags flags)
{
- phys_addr_t kernel_end, ret;
-
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
- kernel_end = __pa_symbol(_end);
-
- /*
- * try bottom-up allocation only when bottom-up mode
- * is set and @end is above the kernel image.
- */
- if (memblock_bottom_up() && end > kernel_end) {
- phys_addr_t bottom_up_start;
-
- /* make sure we will allocate above the kernel */
- bottom_up_start = max(start, kernel_end);
- /* ok, try bottom-up allocation first */
- ret = __memblock_find_range_bottom_up(bottom_up_start, end,
- size, align, nid, flags);
- if (ret)
- return ret;
-
- /*
- * we always limit bottom-up allocation above the kernel,
- * but top-down allocation doesn't have the limit, so
- * retrying top-down allocation may succeed when bottom-up
- * allocation failed.
- *
- * bottom-up allocation is expected to be fail very rarely,
- * so we use WARN_ONCE() here to see the stack trace if
- * fail happens.
- */
- WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
- "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
- }
-
- return __memblock_find_range_top_down(start, end, size, align, nid,
- flags);
+ if (memblock_bottom_up())
+ return __memblock_find_range_bottom_up(start, end, size, align,
+ nid, flags);
+ else
+ return __memblock_find_range_top_down(start, end, size, align,
+ nid, flags);
}
/**
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2dcb3964544177c51853a210b6ad400de78ef17d Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Thu, 4 Feb 2021 18:32:36 -0800
Subject: [PATCH] memblock: do not start bottom-up allocations with kernel_end
With kaslr the kernel image is placed at a random place, so starting the
bottom-up allocation with the kernel_end can result in an allocation
failure and a warning like this one:
hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
------------[ cut here ]------------
memblock: bottom-up allocation failed, memory hotremove may be affected
WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:memblock_find_in_range_node+0x178/0x25a
Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
Call Trace:
memblock_alloc_range_nid+0x8d/0x11e
cma_declare_contiguous_nid+0x2c4/0x38c
hugetlb_cma_reserve+0xdc/0x128
flush_tlb_one_kernel+0xc/0x20
native_set_fixmap+0x82/0xd0
flat_get_apic_id+0x5/0x10
register_lapic_address+0x8e/0x97
setup_arch+0x8a5/0xc3f
start_kernel+0x66/0x547
load_ucode_bsp+0x4c/0xcd
secondary_startup_64_no_verify+0xb0/0xbb
random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
---[ end trace f151227d0b39be70 ]---
At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chances to success as a top-down allocation, so
there is no reason to fallback in the case of a failure. All together it
simplifies the logic.
Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memblock.c b/mm/memblock.c
index 1eaaec1e7687..8d9b5f1e7040 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
*
* Find @size free area aligned to @align in the specified range and node.
*
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
* Return:
* Found address on success, 0 on failure.
*/
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
phys_addr_t end, int nid,
enum memblock_flags flags)
{
- phys_addr_t kernel_end, ret;
-
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
- kernel_end = __pa_symbol(_end);
-
- /*
- * try bottom-up allocation only when bottom-up mode
- * is set and @end is above the kernel image.
- */
- if (memblock_bottom_up() && end > kernel_end) {
- phys_addr_t bottom_up_start;
-
- /* make sure we will allocate above the kernel */
- bottom_up_start = max(start, kernel_end);
- /* ok, try bottom-up allocation first */
- ret = __memblock_find_range_bottom_up(bottom_up_start, end,
- size, align, nid, flags);
- if (ret)
- return ret;
-
- /*
- * we always limit bottom-up allocation above the kernel,
- * but top-down allocation doesn't have the limit, so
- * retrying top-down allocation may succeed when bottom-up
- * allocation failed.
- *
- * bottom-up allocation is expected to be fail very rarely,
- * so we use WARN_ONCE() here to see the stack trace if
- * fail happens.
- */
- WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
- "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
- }
-
- return __memblock_find_range_top_down(start, end, size, align, nid,
- flags);
+ if (memblock_bottom_up())
+ return __memblock_find_range_bottom_up(start, end, size, align,
+ nid, flags);
+ else
+ return __memblock_find_range_top_down(start, end, size, align,
+ nid, flags);
}
/**
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 24321ac668e452a4942598533d267805f291fdc9 Mon Sep 17 00:00:00 2001
From: Raoni Fassina Firmino <raoni(a)linux.ibm.com>
Date: Mon, 1 Feb 2021 17:05:05 -0300
Subject: [PATCH] powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64()
semantics
Commit 0138ba5783ae ("powerpc/64/signal: Balance return predictor
stack in signal trampoline") changed __kernel_sigtramp_rt64() VDSO and
trampoline code, and introduced a regression in the way glibc's
backtrace()[1] detects the signal-handler stack frame. Apart from the
practical implications, __kernel_sigtramp_rt64() was a VDSO function
with the semantics that it is a function you can call from userspace
to end a signal handling. Now this semantics are no longer valid.
I believe the aforementioned change affects all releases since 5.9.
This patch tries to fix both the semantics and practical aspect of
__kernel_sigtramp_rt64() returning it to the previous code, whilst
keeping the intended behaviour of 0138ba5783ae by adding a new symbol
to serve as the jump target from the kernel to the trampoline. Now the
trampoline has two parts, a new entry point and the old return point.
[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html
Fixes: 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline")
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Raoni Fassina Firmino <raoni(a)linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin(a)gmail.com>
[mpe: Minor tweaks to change log formatting, add stable tag]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210201200505.iz46ubcizipnkcxe@work-tp
diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S
index bbf68cd01088..2d4067561293 100644
--- a/arch/powerpc/kernel/vdso64/sigtramp.S
+++ b/arch/powerpc/kernel/vdso64/sigtramp.S
@@ -15,11 +15,20 @@
.text
+/*
+ * __kernel_start_sigtramp_rt64 and __kernel_sigtramp_rt64 together
+ * are one function split in two parts. The kernel jumps to the former
+ * and the signal handler indirectly (by blr) returns to the latter.
+ * __kernel_sigtramp_rt64 needs to point to the return address so
+ * glibc can correctly identify the trampoline stack frame.
+ */
.balign 8
.balign IFETCH_ALIGN_BYTES
-V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
+V_FUNCTION_BEGIN(__kernel_start_sigtramp_rt64)
.Lsigrt_start:
bctrl /* call the handler */
+V_FUNCTION_END(__kernel_start_sigtramp_rt64)
+V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
addi r1, r1, __SIGNAL_FRAMESIZE
li r0,__NR_rt_sigreturn
sc
diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S
index 6164d1a1ba11..2f3c359cacd3 100644
--- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
@@ -131,4 +131,4 @@ VERSION
/*
* Make the sigreturn code visible to the kernel.
*/
-VDSO_sigtramp_rt64 = __kernel_sigtramp_rt64;
+VDSO_sigtramp_rt64 = __kernel_start_sigtramp_rt64;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1c35cf78bfab31b8cb455259524395c9e4c7cd6 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 13 Nov 2020 08:30:38 -0500
Subject: [PATCH] KVM: x86: cleanup CR3 reserved bits checks
If not in long mode, the low bits of CR3 are reserved but not enforced to
be zero, so remove those checks. If in long mode, however, the MBZ bits
extend down to the highest physical address bit of the guest, excluding
the encryption bit.
Make the checks consistent with the above, and match them between
nested_vmcb_checks and KVM_SET_SREGS.
Cc: stable(a)vger.kernel.org
Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7a605ad8254d..db30670dd8c4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -231,6 +231,7 @@ static bool nested_vmcb_check_controls(struct vmcb_control_area *control)
static bool nested_vmcb_checks(struct vcpu_svm *svm, struct vmcb *vmcb12)
{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
bool vmcb12_lma;
if ((vmcb12->save.efer & EFER_SVME) == 0)
@@ -244,18 +245,10 @@ static bool nested_vmcb_checks(struct vcpu_svm *svm, struct vmcb *vmcb12)
vmcb12_lma = (vmcb12->save.efer & EFER_LME) && (vmcb12->save.cr0 & X86_CR0_PG);
- if (!vmcb12_lma) {
- if (vmcb12->save.cr4 & X86_CR4_PAE) {
- if (vmcb12->save.cr3 & MSR_CR3_LEGACY_PAE_RESERVED_MASK)
- return false;
- } else {
- if (vmcb12->save.cr3 & MSR_CR3_LEGACY_RESERVED_MASK)
- return false;
- }
- } else {
+ if (vmcb12_lma) {
if (!(vmcb12->save.cr4 & X86_CR4_PAE) ||
!(vmcb12->save.cr0 & X86_CR0_PE) ||
- (vmcb12->save.cr3 & MSR_CR3_LONG_MBZ_MASK))
+ (vmcb12->save.cr3 & vcpu->arch.cr3_lm_rsvd_bits))
return false;
}
if (!kvm_is_valid_cr4(&svm->vcpu, vmcb12->save.cr4))
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0fe874ae5498..6e7d070f8b86 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -403,9 +403,6 @@ static inline bool gif_set(struct vcpu_svm *svm)
}
/* svm.c */
-#define MSR_CR3_LEGACY_RESERVED_MASK 0xfe7U
-#define MSR_CR3_LEGACY_PAE_RESERVED_MASK 0x7U
-#define MSR_CR3_LONG_MBZ_MASK 0xfff0000000000000U
#define MSR_INVALID 0xffffffffU
extern int sev;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42b28d0f0311..c1650e26715b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9624,6 +9624,8 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
*/
if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
return false;
+ if (sregs->cr3 & vcpu->arch.cr3_lm_rsvd_bits)
+ return false;
} else {
/*
* Not in 64-bit mode: EFER.LMA is clear and the code
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7131636e7ea5b50ca910f8953f6365ef2d1f741c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 28 Jan 2021 11:45:00 -0500
Subject: [PATCH] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if
tsx=off
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.
To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
Cc: stable(a)vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..eb69fef57485 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
switch (index) {
case MSR_IA32_TSX_CTRL:
/*
- * No need to pass TSX_CTRL_CPUID_CLEAR through, so
- * let's avoid changing CPUID bits under the host
- * kernel's feet.
+ * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
+ * interception. Keep the host value unchanged to avoid
+ * changing CPUID bits under the host kernel's feet.
+ *
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some
+ * combinations of new kernel and old userspace. If
+ * those guests run on a tsx=off host, do allow guests
+ * to use TSX_CTRL, but do not change the value on the
+ * host so that TSX remains always disabled.
*/
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..b05a1fe9dae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1394,16 +1394,24 @@ static u64 kvm_get_arch_capabilities(void)
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
- /*
- * On TAA affected systems:
- * - nothing to do if TSX is disabled on the host.
- * - we emulate TSX_CTRL if present on the host.
- * This lets the guest use VERW to clear CPU buffers.
- */
- if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
- else if (!boot_cpu_has_bug(X86_BUG_TAA))
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
return data;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7131636e7ea5b50ca910f8953f6365ef2d1f741c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 28 Jan 2021 11:45:00 -0500
Subject: [PATCH] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if
tsx=off
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.
To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
Cc: stable(a)vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..eb69fef57485 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
switch (index) {
case MSR_IA32_TSX_CTRL:
/*
- * No need to pass TSX_CTRL_CPUID_CLEAR through, so
- * let's avoid changing CPUID bits under the host
- * kernel's feet.
+ * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
+ * interception. Keep the host value unchanged to avoid
+ * changing CPUID bits under the host kernel's feet.
+ *
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some
+ * combinations of new kernel and old userspace. If
+ * those guests run on a tsx=off host, do allow guests
+ * to use TSX_CTRL, but do not change the value on the
+ * host so that TSX remains always disabled.
*/
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..b05a1fe9dae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1394,16 +1394,24 @@ static u64 kvm_get_arch_capabilities(void)
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
- /*
- * On TAA affected systems:
- * - nothing to do if TSX is disabled on the host.
- * - we emulate TSX_CTRL if present on the host.
- * This lets the guest use VERW to clear CPU buffers.
- */
- if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
- else if (!boot_cpu_has_bug(X86_BUG_TAA))
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
return data;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7131636e7ea5b50ca910f8953f6365ef2d1f741c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 28 Jan 2021 11:45:00 -0500
Subject: [PATCH] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if
tsx=off
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.
To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
Cc: stable(a)vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..eb69fef57485 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
switch (index) {
case MSR_IA32_TSX_CTRL:
/*
- * No need to pass TSX_CTRL_CPUID_CLEAR through, so
- * let's avoid changing CPUID bits under the host
- * kernel's feet.
+ * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
+ * interception. Keep the host value unchanged to avoid
+ * changing CPUID bits under the host kernel's feet.
+ *
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some
+ * combinations of new kernel and old userspace. If
+ * those guests run on a tsx=off host, do allow guests
+ * to use TSX_CTRL, but do not change the value on the
+ * host so that TSX remains always disabled.
*/
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..b05a1fe9dae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1394,16 +1394,24 @@ static u64 kvm_get_arch_capabilities(void)
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
- /*
- * On TAA affected systems:
- * - nothing to do if TSX is disabled on the host.
- * - we emulate TSX_CTRL if present on the host.
- * This lets the guest use VERW to clear CPU buffers.
- */
- if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
- else if (!boot_cpu_has_bug(X86_BUG_TAA))
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
return data;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7131636e7ea5b50ca910f8953f6365ef2d1f741c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 28 Jan 2021 11:45:00 -0500
Subject: [PATCH] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if
tsx=off
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.
To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
Cc: stable(a)vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..eb69fef57485 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
switch (index) {
case MSR_IA32_TSX_CTRL:
/*
- * No need to pass TSX_CTRL_CPUID_CLEAR through, so
- * let's avoid changing CPUID bits under the host
- * kernel's feet.
+ * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
+ * interception. Keep the host value unchanged to avoid
+ * changing CPUID bits under the host kernel's feet.
+ *
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some
+ * combinations of new kernel and old userspace. If
+ * those guests run on a tsx=off host, do allow guests
+ * to use TSX_CTRL, but do not change the value on the
+ * host so that TSX remains always disabled.
*/
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..b05a1fe9dae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1394,16 +1394,24 @@ static u64 kvm_get_arch_capabilities(void)
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
- /*
- * On TAA affected systems:
- * - nothing to do if TSX is disabled on the host.
- * - we emulate TSX_CTRL if present on the host.
- * This lets the guest use VERW to clear CPU buffers.
- */
- if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
- else if (!boot_cpu_has_bug(X86_BUG_TAA))
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
return data;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7131636e7ea5b50ca910f8953f6365ef2d1f741c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 28 Jan 2021 11:45:00 -0500
Subject: [PATCH] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if
tsx=off
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.
To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
Cc: stable(a)vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..eb69fef57485 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
switch (index) {
case MSR_IA32_TSX_CTRL:
/*
- * No need to pass TSX_CTRL_CPUID_CLEAR through, so
- * let's avoid changing CPUID bits under the host
- * kernel's feet.
+ * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
+ * interception. Keep the host value unchanged to avoid
+ * changing CPUID bits under the host kernel's feet.
+ *
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some
+ * combinations of new kernel and old userspace. If
+ * those guests run on a tsx=off host, do allow guests
+ * to use TSX_CTRL, but do not change the value on the
+ * host so that TSX remains always disabled.
*/
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..b05a1fe9dae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1394,16 +1394,24 @@ static u64 kvm_get_arch_capabilities(void)
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
- /*
- * On TAA affected systems:
- * - nothing to do if TSX is disabled on the host.
- * - we emulate TSX_CTRL if present on the host.
- * This lets the guest use VERW to clear CPU buffers.
- */
- if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
- else if (!boot_cpu_has_bug(X86_BUG_TAA))
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
return data;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From aec18a57edad562d620f7d19016de1fc0cc2208c Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 4 Feb 2021 19:22:46 +0000
Subject: [PATCH] io_uring: drop mm/files between task_work_submit
Since SQPOLL task can be shared and so task_work entries can be a mix of
them, we need to drop mm and files before trying to issue next request.
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5d3348d66f06..1f68105a41ed 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2205,6 +2205,9 @@ static void __io_req_task_submit(struct io_kiocb *req)
else
__io_req_task_cancel(req, -EFAULT);
mutex_unlock(&ctx->uring_lock);
+
+ if (ctx->flags & IORING_SETUP_SQPOLL)
+ io_sq_thread_drop_mm_files();
}
static void io_req_task_submit(struct callback_head *cb)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 89fa15ecdca7eb46a711476b961f70a74765bbe4 Mon Sep 17 00:00:00 2001
From: Huang Rui <ray.huang(a)amd.com>
Date: Sat, 30 Jan 2021 17:14:30 +0800
Subject: [PATCH] drm/amdgpu: fix the issue that retry constantly once the
buffer is oversize
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We cannot modify initial_domain every time while the retry starts. That
will cause the busy waiting that unable to switch to GTT while the vram
is not enough.
Fixes: f8aab60422c3 ("drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs")
Signed-off-by: Huang Rui <ray.huang(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d0a1fee1f5f6..174a73eb23f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -269,8 +269,8 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
resv = vm->root.base.bo->tbo.base.resv;
}
-retry:
initial_domain = (u32)(0xffffffff & args->in.domains);
+retry:
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
initial_domain,
flags, ttm_bo_type_device, resv, &gobj);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 19a23da53932bc8011220bd8c410cb76012de004 Mon Sep 17 00:00:00 2001
From: Peter Gonda <pgonda(a)google.com>
Date: Wed, 27 Jan 2021 08:15:24 -0800
Subject: [PATCH] Fix unsynchronized access to sev members through
svm_register_enc_region
Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.
Add a lockdep assertion to help prevent future regressions.
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: x86(a)kernel.org
Cc: kvm(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active")
Signed-off-by: Peter Gonda <pgonda(a)google.com>
V2
- Fix up patch description
- Correct file paths svm.c -> sev.c
- Add unlock of kvm->lock on sev_pin_memory error
V1
- https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/
Message-Id: <20210127161524.2832400-1-pgonda(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ac652bc476ae..48017fef1cd9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -342,6 +342,8 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
unsigned long first, last;
int ret;
+ lockdep_assert_held(&kvm->lock);
+
if (ulen == 0 || uaddr + ulen < uaddr)
return ERR_PTR(-EINVAL);
@@ -1119,12 +1121,20 @@ int svm_register_enc_region(struct kvm *kvm,
if (!region)
return -ENOMEM;
+ mutex_lock(&kvm->lock);
region->pages = sev_pin_memory(kvm, range->addr, range->size, ®ion->npages, 1);
if (IS_ERR(region->pages)) {
ret = PTR_ERR(region->pages);
+ mutex_unlock(&kvm->lock);
goto e_free;
}
+ region->uaddr = range->addr;
+ region->size = range->size;
+
+ list_add_tail(®ion->list, &sev->regions_list);
+ mutex_unlock(&kvm->lock);
+
/*
* The guest may change the memory encryption attribute from C=0 -> C=1
* or vice versa for this memory range. Lets make sure caches are
@@ -1133,13 +1143,6 @@ int svm_register_enc_region(struct kvm *kvm,
*/
sev_clflush_pages(region->pages, region->npages);
- region->uaddr = range->addr;
- region->size = range->size;
-
- mutex_lock(&kvm->lock);
- list_add_tail(®ion->list, &sev->regions_list);
- mutex_unlock(&kvm->lock);
-
return ret;
e_free:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 19a23da53932bc8011220bd8c410cb76012de004 Mon Sep 17 00:00:00 2001
From: Peter Gonda <pgonda(a)google.com>
Date: Wed, 27 Jan 2021 08:15:24 -0800
Subject: [PATCH] Fix unsynchronized access to sev members through
svm_register_enc_region
Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.
Add a lockdep assertion to help prevent future regressions.
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: x86(a)kernel.org
Cc: kvm(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active")
Signed-off-by: Peter Gonda <pgonda(a)google.com>
V2
- Fix up patch description
- Correct file paths svm.c -> sev.c
- Add unlock of kvm->lock on sev_pin_memory error
V1
- https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/
Message-Id: <20210127161524.2832400-1-pgonda(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ac652bc476ae..48017fef1cd9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -342,6 +342,8 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
unsigned long first, last;
int ret;
+ lockdep_assert_held(&kvm->lock);
+
if (ulen == 0 || uaddr + ulen < uaddr)
return ERR_PTR(-EINVAL);
@@ -1119,12 +1121,20 @@ int svm_register_enc_region(struct kvm *kvm,
if (!region)
return -ENOMEM;
+ mutex_lock(&kvm->lock);
region->pages = sev_pin_memory(kvm, range->addr, range->size, ®ion->npages, 1);
if (IS_ERR(region->pages)) {
ret = PTR_ERR(region->pages);
+ mutex_unlock(&kvm->lock);
goto e_free;
}
+ region->uaddr = range->addr;
+ region->size = range->size;
+
+ list_add_tail(®ion->list, &sev->regions_list);
+ mutex_unlock(&kvm->lock);
+
/*
* The guest may change the memory encryption attribute from C=0 -> C=1
* or vice versa for this memory range. Lets make sure caches are
@@ -1133,13 +1143,6 @@ int svm_register_enc_region(struct kvm *kvm,
*/
sev_clflush_pages(region->pages, region->npages);
- region->uaddr = range->addr;
- region->size = range->size;
-
- mutex_lock(&kvm->lock);
- list_add_tail(®ion->list, &sev->regions_list);
- mutex_unlock(&kvm->lock);
-
return ret;
e_free:
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7018c897c2f243d4b5f1b94bc6b4831a7eab80fb Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 1 Feb 2021 16:20:40 -0800
Subject: [PATCH] libnvdimm/dimm: Avoid race between probe and
available_slots_show()
Richard reports that the following test:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
...fails with a crash signature like:
divide error: 0000 [#1] SMP KASAN PTI
RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
Call Trace:
available_slots_show+0x4e/0x120 [libnvdimm]
dev_attr_show+0x42/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x218/0x410
The root cause is that available_slots_show() consults driver-data, but
fails to synchronize against device-unbind setting up a TOCTOU race to
access uninitialized memory.
Validate driver-data under the device-lock.
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: <stable(a)vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Coly Li <colyli(a)suse.com>
Reported-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Acked-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b59032e0859b..9d208570d059 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -335,16 +335,16 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(state);
-static ssize_t available_slots_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t __available_slots_show(struct nvdimm_drvdata *ndd, char *buf)
{
- struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+ struct device *dev;
ssize_t rc;
u32 nfree;
if (!ndd)
return -ENXIO;
+ dev = ndd->dev;
nvdimm_bus_lock(dev);
nfree = nd_label_nfree(ndd);
if (nfree - 1 > nfree) {
@@ -356,6 +356,18 @@ static ssize_t available_slots_show(struct device *dev,
nvdimm_bus_unlock(dev);
return rc;
}
+
+static ssize_t available_slots_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t rc;
+
+ nd_device_lock(dev);
+ rc = __available_slots_show(dev_get_drvdata(dev), buf);
+ nd_device_unlock(dev);
+
+ return rc;
+}
static DEVICE_ATTR_RO(available_slots);
__weak ssize_t security_show(struct device *dev,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7018c897c2f243d4b5f1b94bc6b4831a7eab80fb Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 1 Feb 2021 16:20:40 -0800
Subject: [PATCH] libnvdimm/dimm: Avoid race between probe and
available_slots_show()
Richard reports that the following test:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
...fails with a crash signature like:
divide error: 0000 [#1] SMP KASAN PTI
RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
Call Trace:
available_slots_show+0x4e/0x120 [libnvdimm]
dev_attr_show+0x42/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x218/0x410
The root cause is that available_slots_show() consults driver-data, but
fails to synchronize against device-unbind setting up a TOCTOU race to
access uninitialized memory.
Validate driver-data under the device-lock.
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: <stable(a)vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Coly Li <colyli(a)suse.com>
Reported-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Acked-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b59032e0859b..9d208570d059 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -335,16 +335,16 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(state);
-static ssize_t available_slots_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t __available_slots_show(struct nvdimm_drvdata *ndd, char *buf)
{
- struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+ struct device *dev;
ssize_t rc;
u32 nfree;
if (!ndd)
return -ENXIO;
+ dev = ndd->dev;
nvdimm_bus_lock(dev);
nfree = nd_label_nfree(ndd);
if (nfree - 1 > nfree) {
@@ -356,6 +356,18 @@ static ssize_t available_slots_show(struct device *dev,
nvdimm_bus_unlock(dev);
return rc;
}
+
+static ssize_t available_slots_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t rc;
+
+ nd_device_lock(dev);
+ rc = __available_slots_show(dev_get_drvdata(dev), buf);
+ nd_device_unlock(dev);
+
+ return rc;
+}
static DEVICE_ATTR_RO(available_slots);
__weak ssize_t security_show(struct device *dev,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7018c897c2f243d4b5f1b94bc6b4831a7eab80fb Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 1 Feb 2021 16:20:40 -0800
Subject: [PATCH] libnvdimm/dimm: Avoid race between probe and
available_slots_show()
Richard reports that the following test:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
...fails with a crash signature like:
divide error: 0000 [#1] SMP KASAN PTI
RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
Call Trace:
available_slots_show+0x4e/0x120 [libnvdimm]
dev_attr_show+0x42/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x218/0x410
The root cause is that available_slots_show() consults driver-data, but
fails to synchronize against device-unbind setting up a TOCTOU race to
access uninitialized memory.
Validate driver-data under the device-lock.
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: <stable(a)vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Coly Li <colyli(a)suse.com>
Reported-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Acked-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b59032e0859b..9d208570d059 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -335,16 +335,16 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(state);
-static ssize_t available_slots_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t __available_slots_show(struct nvdimm_drvdata *ndd, char *buf)
{
- struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+ struct device *dev;
ssize_t rc;
u32 nfree;
if (!ndd)
return -ENXIO;
+ dev = ndd->dev;
nvdimm_bus_lock(dev);
nfree = nd_label_nfree(ndd);
if (nfree - 1 > nfree) {
@@ -356,6 +356,18 @@ static ssize_t available_slots_show(struct device *dev,
nvdimm_bus_unlock(dev);
return rc;
}
+
+static ssize_t available_slots_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t rc;
+
+ nd_device_lock(dev);
+ rc = __available_slots_show(dev_get_drvdata(dev), buf);
+ nd_device_unlock(dev);
+
+ return rc;
+}
static DEVICE_ATTR_RO(available_slots);
__weak ssize_t security_show(struct device *dev,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4c457e8cb75eda91906a4f89fc39bde3f9a43922 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Sat, 23 Jan 2021 12:27:59 +0000
Subject: [PATCH] genirq/msi: Activate Multi-MSI early when
MSI_FLAG_ACTIVATE_EARLY is set
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
__msi_domain_alloc_irqs() performs the activation of the interrupt (which
in the case of PCI results in the endpoint being programmed) as soon as the
interrupt is allocated.
But it appears that this is only done for the first vector, introducing an
inconsistent behaviour for PCI Multi-MSI.
Fix it by iterating over the number of vectors allocated to each MSI
descriptor. This is easily achieved by introducing a new
"for_each_msi_vector" iterator, together with a tiny bit of refactoring.
Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 360a0a7e7341..aef35fd1cf11 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -178,6 +178,12 @@ struct msi_desc {
list_for_each_entry((desc), dev_to_msi_list((dev)), list)
#define for_each_msi_entry_safe(desc, tmp, dev) \
list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list)
+#define for_each_msi_vector(desc, __irq, dev) \
+ for_each_msi_entry((desc), (dev)) \
+ if ((desc)->irq) \
+ for (__irq = (desc)->irq; \
+ __irq < ((desc)->irq + (desc)->nvec_used); \
+ __irq++)
#ifdef CONFIG_IRQ_MSI_IOMMU
static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index dc0e2d7fbdfd..b338d622f26e 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
can_reserve = msi_check_reservation_mode(domain, info, dev);
- for_each_msi_entry(desc, dev) {
- virq = desc->irq;
- if (desc->nvec_used == 1)
- dev_dbg(dev, "irq %d for MSI\n", virq);
- else
+ /*
+ * This flag is set by the PCI layer as we need to activate
+ * the MSI entries before the PCI layer enables MSI in the
+ * card. Otherwise the card latches a random msi message.
+ */
+ if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
+ goto skip_activate;
+
+ for_each_msi_vector(desc, i, dev) {
+ if (desc->irq == i) {
+ virq = desc->irq;
dev_dbg(dev, "irq [%d-%d] for MSI\n",
virq, virq + desc->nvec_used - 1);
- /*
- * This flag is set by the PCI layer as we need to activate
- * the MSI entries before the PCI layer enables MSI in the
- * card. Otherwise the card latches a random msi message.
- */
- if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
- continue;
+ }
- irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ irq_data = irq_domain_get_irq_data(domain, i);
if (!can_reserve) {
irqd_clr_can_reserve(irq_data);
if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
@@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
goto cleanup;
}
+skip_activate:
/*
* If these interrupts use reservation mode, clear the activated bit
* so request_irq() will assign the final vector.
*/
if (can_reserve) {
- for_each_msi_entry(desc, dev) {
- irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ for_each_msi_vector(desc, i, dev) {
+ irq_data = irq_domain_get_irq_data(domain, i);
irqd_clr_activated(irq_data);
}
}
return 0;
cleanup:
- for_each_msi_entry(desc, dev) {
- struct irq_data *irqd;
-
- if (desc->irq == virq)
- break;
-
- irqd = irq_domain_get_irq_data(domain, desc->irq);
- if (irqd_is_activated(irqd))
- irq_domain_deactivate_irq(irqd);
+ for_each_msi_vector(desc, i, dev) {
+ irq_data = irq_domain_get_irq_data(domain, i);
+ if (irqd_is_activated(irq_data))
+ irq_domain_deactivate_irq(irq_data);
}
msi_domain_free_irqs(domain, dev);
return ret;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4c457e8cb75eda91906a4f89fc39bde3f9a43922 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Sat, 23 Jan 2021 12:27:59 +0000
Subject: [PATCH] genirq/msi: Activate Multi-MSI early when
MSI_FLAG_ACTIVATE_EARLY is set
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
__msi_domain_alloc_irqs() performs the activation of the interrupt (which
in the case of PCI results in the endpoint being programmed) as soon as the
interrupt is allocated.
But it appears that this is only done for the first vector, introducing an
inconsistent behaviour for PCI Multi-MSI.
Fix it by iterating over the number of vectors allocated to each MSI
descriptor. This is easily achieved by introducing a new
"for_each_msi_vector" iterator, together with a tiny bit of refactoring.
Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 360a0a7e7341..aef35fd1cf11 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -178,6 +178,12 @@ struct msi_desc {
list_for_each_entry((desc), dev_to_msi_list((dev)), list)
#define for_each_msi_entry_safe(desc, tmp, dev) \
list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list)
+#define for_each_msi_vector(desc, __irq, dev) \
+ for_each_msi_entry((desc), (dev)) \
+ if ((desc)->irq) \
+ for (__irq = (desc)->irq; \
+ __irq < ((desc)->irq + (desc)->nvec_used); \
+ __irq++)
#ifdef CONFIG_IRQ_MSI_IOMMU
static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index dc0e2d7fbdfd..b338d622f26e 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
can_reserve = msi_check_reservation_mode(domain, info, dev);
- for_each_msi_entry(desc, dev) {
- virq = desc->irq;
- if (desc->nvec_used == 1)
- dev_dbg(dev, "irq %d for MSI\n", virq);
- else
+ /*
+ * This flag is set by the PCI layer as we need to activate
+ * the MSI entries before the PCI layer enables MSI in the
+ * card. Otherwise the card latches a random msi message.
+ */
+ if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
+ goto skip_activate;
+
+ for_each_msi_vector(desc, i, dev) {
+ if (desc->irq == i) {
+ virq = desc->irq;
dev_dbg(dev, "irq [%d-%d] for MSI\n",
virq, virq + desc->nvec_used - 1);
- /*
- * This flag is set by the PCI layer as we need to activate
- * the MSI entries before the PCI layer enables MSI in the
- * card. Otherwise the card latches a random msi message.
- */
- if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
- continue;
+ }
- irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ irq_data = irq_domain_get_irq_data(domain, i);
if (!can_reserve) {
irqd_clr_can_reserve(irq_data);
if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
@@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
goto cleanup;
}
+skip_activate:
/*
* If these interrupts use reservation mode, clear the activated bit
* so request_irq() will assign the final vector.
*/
if (can_reserve) {
- for_each_msi_entry(desc, dev) {
- irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ for_each_msi_vector(desc, i, dev) {
+ irq_data = irq_domain_get_irq_data(domain, i);
irqd_clr_activated(irq_data);
}
}
return 0;
cleanup:
- for_each_msi_entry(desc, dev) {
- struct irq_data *irqd;
-
- if (desc->irq == virq)
- break;
-
- irqd = irq_domain_get_irq_data(domain, desc->irq);
- if (irqd_is_activated(irqd))
- irq_domain_deactivate_irq(irqd);
+ for_each_msi_vector(desc, i, dev) {
+ irq_data = irq_domain_get_irq_data(domain, i);
+ if (irqd_is_activated(irq_data))
+ irq_domain_deactivate_irq(irq_data);
}
msi_domain_free_irqs(domain, dev);
return ret;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4c457e8cb75eda91906a4f89fc39bde3f9a43922 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Sat, 23 Jan 2021 12:27:59 +0000
Subject: [PATCH] genirq/msi: Activate Multi-MSI early when
MSI_FLAG_ACTIVATE_EARLY is set
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
__msi_domain_alloc_irqs() performs the activation of the interrupt (which
in the case of PCI results in the endpoint being programmed) as soon as the
interrupt is allocated.
But it appears that this is only done for the first vector, introducing an
inconsistent behaviour for PCI Multi-MSI.
Fix it by iterating over the number of vectors allocated to each MSI
descriptor. This is easily achieved by introducing a new
"for_each_msi_vector" iterator, together with a tiny bit of refactoring.
Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi(a)huawei.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 360a0a7e7341..aef35fd1cf11 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -178,6 +178,12 @@ struct msi_desc {
list_for_each_entry((desc), dev_to_msi_list((dev)), list)
#define for_each_msi_entry_safe(desc, tmp, dev) \
list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list)
+#define for_each_msi_vector(desc, __irq, dev) \
+ for_each_msi_entry((desc), (dev)) \
+ if ((desc)->irq) \
+ for (__irq = (desc)->irq; \
+ __irq < ((desc)->irq + (desc)->nvec_used); \
+ __irq++)
#ifdef CONFIG_IRQ_MSI_IOMMU
static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index dc0e2d7fbdfd..b338d622f26e 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
can_reserve = msi_check_reservation_mode(domain, info, dev);
- for_each_msi_entry(desc, dev) {
- virq = desc->irq;
- if (desc->nvec_used == 1)
- dev_dbg(dev, "irq %d for MSI\n", virq);
- else
+ /*
+ * This flag is set by the PCI layer as we need to activate
+ * the MSI entries before the PCI layer enables MSI in the
+ * card. Otherwise the card latches a random msi message.
+ */
+ if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
+ goto skip_activate;
+
+ for_each_msi_vector(desc, i, dev) {
+ if (desc->irq == i) {
+ virq = desc->irq;
dev_dbg(dev, "irq [%d-%d] for MSI\n",
virq, virq + desc->nvec_used - 1);
- /*
- * This flag is set by the PCI layer as we need to activate
- * the MSI entries before the PCI layer enables MSI in the
- * card. Otherwise the card latches a random msi message.
- */
- if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
- continue;
+ }
- irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ irq_data = irq_domain_get_irq_data(domain, i);
if (!can_reserve) {
irqd_clr_can_reserve(irq_data);
if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
@@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
goto cleanup;
}
+skip_activate:
/*
* If these interrupts use reservation mode, clear the activated bit
* so request_irq() will assign the final vector.
*/
if (can_reserve) {
- for_each_msi_entry(desc, dev) {
- irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ for_each_msi_vector(desc, i, dev) {
+ irq_data = irq_domain_get_irq_data(domain, i);
irqd_clr_activated(irq_data);
}
}
return 0;
cleanup:
- for_each_msi_entry(desc, dev) {
- struct irq_data *irqd;
-
- if (desc->irq == virq)
- break;
-
- irqd = irq_domain_get_irq_data(domain, desc->irq);
- if (irqd_is_activated(irqd))
- irq_domain_deactivate_irq(irqd);
+ for_each_msi_vector(desc, i, dev) {
+ irq_data = irq_domain_get_irq_data(domain, i);
+ if (irqd_is_activated(irq_data))
+ irq_domain_deactivate_irq(irq_data);
}
msi_domain_free_irqs(domain, dev);
return ret;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 97c753e62e6c31a404183898d950d8c08d752dbd Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Thu, 28 Jan 2021 00:37:51 +0900
Subject: [PATCH] tracing/kprobe: Fix to support kretprobe events on unloaded
modules
Fix kprobe_on_func_entry() returns error code instead of false so that
register_kretprobe() can return an appropriate error code.
append_trace_kprobe() expects the kprobe registration returns -ENOENT
when the target symbol is not found, and it checks whether the target
module is unloaded or not. If the target module doesn't exist, it
defers to probe the target symbol until the module is loaded.
However, since register_kretprobe() returns -EINVAL instead of -ENOENT
in that case, it always fail on putting the kretprobe event on unloaded
modules. e.g.
Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.
Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
Command: r xfs:xfs_end_io
^
To fix this bug, change kprobe_on_func_entry() to detect symbol lookup
failure and return -ENOENT in that case. Otherwise it returns -EINVAL
or 0 (succeeded, given address is on the entry).
Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@de…
Cc: stable(a)vger.kernel.org
Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by: Jianlin Lv <Jianlin.Lv(a)arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index b3a36b0cfc81..1883a4a9f16a 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -266,7 +266,7 @@ extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
extern bool arch_kprobe_on_func_entry(unsigned long offset);
-extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
+extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
extern bool within_kprobe_blacklist(unsigned long addr);
extern int kprobe_add_ksym_blacklist(unsigned long entry);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f7fb5d135930..1a5bc321e0a5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1954,29 +1954,45 @@ bool __weak arch_kprobe_on_func_entry(unsigned long offset)
return !offset;
}
-bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
+/**
+ * kprobe_on_func_entry() -- check whether given address is function entry
+ * @addr: Target address
+ * @sym: Target symbol name
+ * @offset: The offset from the symbol or the address
+ *
+ * This checks whether the given @addr+@offset or @sym+@offset is on the
+ * function entry address or not.
+ * This returns 0 if it is the function entry, or -EINVAL if it is not.
+ * And also it returns -ENOENT if it fails the symbol or address lookup.
+ * Caller must pass @addr or @sym (either one must be NULL), or this
+ * returns -EINVAL.
+ */
+int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
if (IS_ERR(kp_addr))
- return false;
+ return PTR_ERR(kp_addr);
- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) ||
- !arch_kprobe_on_func_entry(offset))
- return false;
+ if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
+ return -ENOENT;
- return true;
+ if (!arch_kprobe_on_func_entry(offset))
+ return -EINVAL;
+
+ return 0;
}
int register_kretprobe(struct kretprobe *rp)
{
- int ret = 0;
+ int ret;
struct kretprobe_instance *inst;
int i;
void *addr;
- if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
- return -EINVAL;
+ ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset);
+ if (ret)
+ return ret;
if (kretprobe_blacklist_size) {
addr = kprobe_addr(&rp->kp);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index e6fba1798771..56c7fbff7bd7 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -221,9 +221,9 @@ bool trace_kprobe_on_func_entry(struct trace_event_call *call)
{
struct trace_kprobe *tk = trace_kprobe_primary_from_call(call);
- return tk ? kprobe_on_func_entry(tk->rp.kp.addr,
+ return tk ? (kprobe_on_func_entry(tk->rp.kp.addr,
tk->rp.kp.addr ? NULL : tk->rp.kp.symbol_name,
- tk->rp.kp.addr ? 0 : tk->rp.kp.offset) : false;
+ tk->rp.kp.addr ? 0 : tk->rp.kp.offset) == 0) : false;
}
bool trace_kprobe_error_injectable(struct trace_event_call *call)
@@ -828,9 +828,11 @@ static int trace_kprobe_create(int argc, const char *argv[])
}
if (is_return)
flags |= TPARG_FL_RETURN;
- if (kprobe_on_func_entry(NULL, symbol, offset))
+ ret = kprobe_on_func_entry(NULL, symbol, offset);
+ if (ret == 0)
flags |= TPARG_FL_FENTRY;
- if (offset && is_return && !(flags & TPARG_FL_FENTRY)) {
+ /* Defer the ENOENT case until register kprobe */
+ if (ret == -EINVAL && is_return) {
trace_probe_log_err(0, BAD_RETPROBE);
goto parse_error;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 29 Jan 2021 10:13:53 -0500
Subject: [PATCH] fgraph: Initialize tracing_graph_pause at task creation
On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
will disable or pause function graph tracing, as there's some paths in
bringing down the CPU that can have issues with its return address being
modified. The task_struct structure has a "tracing_graph_pause" atomic
counter, that when set to something other than zero, the function graph
tracer will not modify the return address.
The problem is that the tracing_graph_pause counter is initialized when the
function graph tracer is enabled. This can corrupt the counter for the idle
task if it is suspended in these architectures.
CPU 1 CPU 2
----- -----
do_idle()
cpu_suspend()
pause_graph_tracing()
task_struct->tracing_graph_pause++ (0 -> 1)
start_graph_tracing()
for_each_online_cpu(cpu) {
ftrace_graph_init_idle_task(cpu)
task-struct->tracing_graph_pause = 0 (1 -> 0)
unpause_graph_tracing()
task_struct->tracing_graph_pause-- (0 -> -1)
The above should have gone from 1 to zero, and enabled function graph
tracing again. But instead, it is set to -1, which keeps it disabled.
There's no reason that the field tracing_graph_pause on the task_struct can
not be initialized at boot up.
Cc: stable(a)vger.kernel.org
Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by: pierre.gondois(a)arm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/init/init_task.c b/init/init_task.c
index 8a992d73e6fb..3711cdaafed2 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -198,7 +198,8 @@ struct task_struct init_task
.lockdep_recursion = 0,
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- .ret_stack = NULL,
+ .ret_stack = NULL,
+ .tracing_graph_pause = ATOMIC_INIT(0),
#endif
#if defined(CONFIG_TRACING) && defined(CONFIG_PREEMPTION)
.trace_recursion = 0,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 73edb9e4f354..29a6ebeebc9e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -394,7 +394,6 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
}
if (t->ret_stack == NULL) {
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->curr_ret_stack = -1;
t->curr_ret_depth = -1;
@@ -489,7 +488,6 @@ static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
static void
graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
{
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->ftrace_timestamp = 0;
/* make curr_ret_stack visible before we add the ret_stack */
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 29 Jan 2021 10:13:53 -0500
Subject: [PATCH] fgraph: Initialize tracing_graph_pause at task creation
On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
will disable or pause function graph tracing, as there's some paths in
bringing down the CPU that can have issues with its return address being
modified. The task_struct structure has a "tracing_graph_pause" atomic
counter, that when set to something other than zero, the function graph
tracer will not modify the return address.
The problem is that the tracing_graph_pause counter is initialized when the
function graph tracer is enabled. This can corrupt the counter for the idle
task if it is suspended in these architectures.
CPU 1 CPU 2
----- -----
do_idle()
cpu_suspend()
pause_graph_tracing()
task_struct->tracing_graph_pause++ (0 -> 1)
start_graph_tracing()
for_each_online_cpu(cpu) {
ftrace_graph_init_idle_task(cpu)
task-struct->tracing_graph_pause = 0 (1 -> 0)
unpause_graph_tracing()
task_struct->tracing_graph_pause-- (0 -> -1)
The above should have gone from 1 to zero, and enabled function graph
tracing again. But instead, it is set to -1, which keeps it disabled.
There's no reason that the field tracing_graph_pause on the task_struct can
not be initialized at boot up.
Cc: stable(a)vger.kernel.org
Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by: pierre.gondois(a)arm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/init/init_task.c b/init/init_task.c
index 8a992d73e6fb..3711cdaafed2 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -198,7 +198,8 @@ struct task_struct init_task
.lockdep_recursion = 0,
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- .ret_stack = NULL,
+ .ret_stack = NULL,
+ .tracing_graph_pause = ATOMIC_INIT(0),
#endif
#if defined(CONFIG_TRACING) && defined(CONFIG_PREEMPTION)
.trace_recursion = 0,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 73edb9e4f354..29a6ebeebc9e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -394,7 +394,6 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
}
if (t->ret_stack == NULL) {
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->curr_ret_stack = -1;
t->curr_ret_depth = -1;
@@ -489,7 +488,6 @@ static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
static void
graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
{
- atomic_set(&t->tracing_graph_pause, 0);
atomic_set(&t->trace_overrun, 0);
t->ftrace_timestamp = 0;
/* make curr_ret_stack visible before we add the ret_stack */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9917f0e3cdba7b9f1a23f70e3f70b1a106be54a8 Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Mon, 1 Feb 2021 21:47:20 +0900
Subject: [PATCH] usb: renesas_usbhs: Clear pipe running flag in
usbhs_pkt_pop()
Should clear the pipe running flag in usbhs_pkt_pop(). Otherwise,
we cannot use this pipe after dequeue was called while the pipe was
running.
Fixes: 8355b2b3082d ("usb: renesas_usbhs: fix the behavior of some usbhs_pkt_handle")
Reported-by: Tho Vu <tho.vu.wh(a)renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/1612183640-8898-1-git-send-email-yoshihiro.shimod…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/renesas_usbhs/fifo.c b/drivers/usb/renesas_usbhs/fifo.c
index ac9a81ae8216..e6fa13701808 100644
--- a/drivers/usb/renesas_usbhs/fifo.c
+++ b/drivers/usb/renesas_usbhs/fifo.c
@@ -126,6 +126,7 @@ struct usbhs_pkt *usbhs_pkt_pop(struct usbhs_pipe *pipe, struct usbhs_pkt *pkt)
}
usbhs_pipe_clear_without_sequence(pipe, 0, 0);
+ usbhs_pipe_running(pipe, 0);
__usbhsf_pkt_del(pkt);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9917f0e3cdba7b9f1a23f70e3f70b1a106be54a8 Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Mon, 1 Feb 2021 21:47:20 +0900
Subject: [PATCH] usb: renesas_usbhs: Clear pipe running flag in
usbhs_pkt_pop()
Should clear the pipe running flag in usbhs_pkt_pop(). Otherwise,
we cannot use this pipe after dequeue was called while the pipe was
running.
Fixes: 8355b2b3082d ("usb: renesas_usbhs: fix the behavior of some usbhs_pkt_handle")
Reported-by: Tho Vu <tho.vu.wh(a)renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/1612183640-8898-1-git-send-email-yoshihiro.shimod…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/renesas_usbhs/fifo.c b/drivers/usb/renesas_usbhs/fifo.c
index ac9a81ae8216..e6fa13701808 100644
--- a/drivers/usb/renesas_usbhs/fifo.c
+++ b/drivers/usb/renesas_usbhs/fifo.c
@@ -126,6 +126,7 @@ struct usbhs_pkt *usbhs_pkt_pop(struct usbhs_pipe *pipe, struct usbhs_pkt *pkt)
}
usbhs_pipe_clear_without_sequence(pipe, 0, 0);
+ usbhs_pipe_running(pipe, 0);
__usbhsf_pkt_del(pkt);
}