In htvec_reset() only the first group of interrupt cause registers is
cleared. This sometimes causes spurious interrupts, so let's clear all
groups.
BTW, commit c47e388cfc648421bd821f ("irqchip/loongson-htvec: Support 8
groups of HT vectors") increased the number of interrupt lines from 4 to
8, so update the comment as well.
Cc: stable(a)vger.kernel.org
Fixes: 818e915fbac518e8c78e1877 ("irqchip: Add Loongson HyperTransport Vector support")
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
drivers/irqchip/irq-loongson-htvec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/irqchip/irq-loongson-htvec.c b/drivers/irqchip/irq-loongson-htvec.c
index 13e6016..6392aaf 100644
--- a/drivers/irqchip/irq-loongson-htvec.c
+++ b/drivers/irqchip/irq-loongson-htvec.c
@@ -151,7 +151,7 @@ static void htvec_reset(struct htvec *priv)
/* Clear IRQ cause registers, mask all interrupts */
for (idx = 0; idx < priv->num_parents; idx++) {
writel_relaxed(0x0, priv->base + HTVEC_EN_OFF + 4 * idx);
- writel_relaxed(0xFFFFFFFF, priv->base);
+ writel_relaxed(0xFFFFFFFF, priv->base + 4 * idx);
}
}
@@ -172,7 +172,7 @@ static int htvec_of_init(struct device_node *node,
goto free_priv;
}
- /* Interrupt may come from any of the 4 interrupt line */
+ /* Interrupt may come from any of the 8 interrupt lines */
for (i = 0; i < HTVEC_MAX_PARENT_IRQ; i++) {
parent_irq[i] = irq_of_parse_and_map(node, i);
if (parent_irq[i] <= 0)
--
2.7.0
On Cherry Trail devices there are 2 possible ACPI OpRegions for
accessing GPIOs: the standard GeneralPurposeIo OpRegion and the Cherry
Trail specific UserDefined 0x9X OpRegions.
Having 2 different types of OpRegions leads to potential issues with
checks for OpRegion availability, or in other words checks if _REG has
been called for the OpRegion which the ACPI code wants to use.
The ACPICA core does not call _REG on an ACPI node which does not
define an OpRegion matching the type being registered; and the reference
design DSDT, from which most Cherry Trail DSDTs are derived, does not
define GeneralPurposeIo, nor UserDefined(0x93) OpRegions for the GPO2
(UID 3) device, because no pins were assigned ACPI controlled functions
in the reference design.
Together this leads to the perfect storm, at least on the Cherry Trail
based Medion Akayo E1239T. This design does use a GPO2 pin from its ACPI
code and has added the Cherry Trail specific UserDefined(0x93) opregion
to its GPO2 ACPI node to access this pin.
But it uses a "has _REG been called" availability check for the standard
GeneralPurposeIo OpRegion. This clearly is a bug in the DSDT, but it
does work under Windows. This issue leads to the intel_vbtn driver
reporting the device as always being in tablet mode at boot, even if it
is in laptop mode, which in turn causes userspace to ignore touchpad
events. So, in other words, this issue causes the touchpad to not work
at boot.
Since the bug in the DSDT stems from the confusion of having 2 different
OpRegion types for accessing GPIOs on Cherry Trail devices, I believe
that this is best fixed inside the Cherryview pinctrl driver.
This commit adds a workaround to the Cherryview pinctrl driver so
that the DSDT's expectations of _REG always getting called for the
GeneralPurposeIo OpRegion are met.
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Changes in v2:
- Drop unnecessary if (acpi_has_method(adev->handle, "_REG")) check
- Fix Cherryview spelling in the commit message
---
drivers/pinctrl/intel/pinctrl-cherryview.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/pinctrl/intel/pinctrl-cherryview.c b/drivers/pinctrl/intel/pinctrl-cherryview.c
index 4c74fdde576d..4817aec114d6 100644
--- a/drivers/pinctrl/intel/pinctrl-cherryview.c
+++ b/drivers/pinctrl/intel/pinctrl-cherryview.c
@@ -1693,6 +1693,8 @@ static acpi_status chv_pinctrl_mmio_access_handler(u32 function,
static int chv_pinctrl_probe(struct platform_device *pdev)
{
+ struct acpi_object_list input;
+ union acpi_object params[2];
struct chv_pinctrl *pctrl;
struct acpi_device *adev;
acpi_status status;
@@ -1755,6 +1757,22 @@ static int chv_pinctrl_probe(struct platform_device *pdev)
if (ACPI_FAILURE(status))
dev_err(&pdev->dev, "failed to install ACPI addr space handler\n");
+ /*
+ * Some DSDT-s use the chv_pinctrl_mmio_access_handler while checking
+ * for the regular GeneralPurposeIo OpRegion availability, mixed with
+ * the DSDT not defining a GeneralPurposeIo OpRegion at all. In this
+ * case the ACPICA code will not call _REG to signal availability of
+ * the GeneralPurposeIo OpRegion. Manually call _REG here so that
+ * the DSDT-s GeneralPurposeIo availability checks will succeed.
+ */
+ params[0].type = ACPI_TYPE_INTEGER;
+ params[0].integer.value = ACPI_ADR_SPACE_GPIO;
+ params[1].type = ACPI_TYPE_INTEGER;
+ params[1].integer.value = 1;
+ input.count = 2;
+ input.pointer = params;
+ acpi_evaluate_object(adev->handle, "_REG", &input, NULL);
+
platform_set_drvdata(pdev, pctrl);
return 0;
--
2.26.0
--Andy
> On Apr 18, 2020, at 12:42 PM, Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
>>> On Fri, Apr 17, 2020 at 5:12 PM Dan Williams <dan.j.williams(a)intel.com> wrote:
>>>
>>> @@ -106,12 +108,10 @@ static __always_inline __must_check unsigned long
>>> memcpy_mcsafe(void *dst, const void *src, size_t cnt)
>>> {
>>> #ifdef CONFIG_X86_MCE
>>> - if (static_branch_unlikely(&mcsafe_key))
>>> - return __memcpy_mcsafe(dst, src, cnt);
>>> - else
>>> + if (static_branch_unlikely(&mcsafe_slow_key))
>>> + return memcpy_mcsafe_slow(dst, src, cnt);
>>> #endif
>>> - memcpy(dst, src, cnt);
>>> - return 0;
>>> + return memcpy_mcsafe_fast(dst, src, cnt);
>>> }
>
> It strikes me that I see no advantages to making this an inline function at all.
>
> Even for the good case - where it turns into just a memcpy because MCE
> is entirely disabled - it doesn't seem to matter.
>
> The only case that really helps is when the memcpy can be turned into
> a single access. Which - and I checked - does exist, with people doing
>
> r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t));
>
> to read a single 64-bit field which looks aligned to me.
>
> But that code is incredible garbage anyway, since even on a broken
> machine, there's no actual reason to use the slow variant for that
> whole access that I can tell. The mcsafe copy routines do not do
> anything worthwhile for a single access.
Maybe I’m missing something obvious, but what’s the alternative? The _mcsafe variants don’t just avoid the REP mess — they also tell the kernel that this particular access is recoverable via extable. With a regular memory access, the CPU may not explode, but do_machine_check() will, at very best, OOPS, and even that requires a certain degree of optimism. A panic is more likely.
The below race can occur if trace_open and a resize of the cpu buffer
are running in parallel on different CPUs:
CPUX                                   CPUY
ring_buffer_resize
atomic_read(&buffer->resize_disabled)
                                       tracing_open
                                       tracing_reset_online_cpus
                                       ring_buffer_reset_cpu
                                       rb_reset_cpu
rb_update_pages
 remove/insert pages
                                       resetting pointer
This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages() and rb_insert_pages() while checking pages
for sanity.
Signed-off-by: Gaurav Kohli <gkohli(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
---
Changes since v0:
-Addressed Steven's review comments.
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 93ef0ab..15bf28b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4866,6 +4866,9 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return;
+ /* prevent another thread from changing buffer sizes */
+ mutex_lock(&buffer->mutex);
+
atomic_inc(&cpu_buffer->resize_disabled);
atomic_inc(&cpu_buffer->record_disabled);
@@ -4876,6 +4879,8 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&cpu_buffer->resize_disabled);
+
+ mutex_unlock(&buffer->mutex);
}
EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
@@ -4889,6 +4894,9 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
struct ring_buffer_per_cpu *cpu_buffer;
int cpu;
+ /* prevent another thread from changing buffer sizes */
+ mutex_lock(&buffer->mutex);
+
for_each_online_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
@@ -4907,6 +4915,8 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&cpu_buffer->resize_disabled);
}
+
+ mutex_unlock(&buffer->mutex);
}
/**
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Backport commit 38adf94e166e3cb4eb89683458ca578051e8218d and its
dependencies to linux-stable 5.4.y.
Dependent commits:
314d48dd224897e35ddcaf5a1d7d133b5adddeb7
e08f2ae850929d40e66268ee47e443e7ea56eeb7
When running test cases to stress an NVMe device, a race condition /
deadlock is seen every couple of days or so where multiple threads are
trying to acquire ctrl->subsystem->lock or ctrl->scan_lock.
The test cases send a lot of nvme-cli requests to do Sanitize, Format, FW Download,
FW Activate, Flush, Get Log, Identify, and reset requests to two controllers
that share a namespace. Some of those commands target a namespace, some target
a controller. The commands are sent in random order and random mix to the two
controllers.
The test cases do not wait for nvme-cli requests to finish before sending more.
So for example, there could be multiple reset requests, multiple format requests,
outstanding at the same time as a sanitize, on both paths at the same time, etc.
Many of these test cases include combos that don't really make sense in the
context of NVMe; however, they are used to create as much stress as possible.
This patchset fixes this issue.
A similar issue with a detailed call trace/log was discussed on the LKML:
Link: https://lore.kernel.org/linux-nvme/04580CD6-7652-459D-ABDD-732947B4A6DF@jav…
Revanth Rajashekar (3):
nvme: Cleanup and rename nvme_block_nr()
nvme: Introduce nvme_lba_to_sect()
nvme: consolidate chunk_sectors settings
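For quick reference, the two dependency commits boil down to the helpers
below (paraphrased from the upstream drivers/nvme/host/nvme.h, not the
exact backported hunks): nvme_block_nr() is renamed to nvme_sect_to_lba(),
and nvme_lba_to_sect() is introduced as its inverse.

static inline u64 nvme_sect_to_lba(struct nvme_ns *ns, sector_t sector)
{
	/* 512-byte linux sector to device LBA */
	return sector >> (ns->lba_shift - SECTOR_SHIFT);
}

static inline sector_t nvme_lba_to_sect(struct nvme_ns *ns, u64 lba)
{
	/* device LBA to 512-byte linux sector */
	return lba << (ns->lba_shift - SECTOR_SHIFT);
}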
drivers/nvme/host/core.c | 40 +++++++++++++++++++---------------------
drivers/nvme/host/nvme.h | 16 +++++++++++++---
2 files changed, 32 insertions(+), 24 deletions(-)
--
2.17.1
I recently tracked down a problem I observed when booting a v5.4 kernel
on a sparsemem UMA arm platform which includes a no-map reserved-memory
region in the middle of its HighMem zone.
When memmap_init_zone() is invoked, the PFNs that correspond to the
no-map region fail the early_pfn_valid() check and the struct page
structures are not initialized, creating a "hole" in the memmap. Later
in my boot sequence the sock_init() initcall leads to a bpf_prog_alloc()
which ends up stealing a page from the block containing the no-map
region, which then leads to a call to move_freepages_block() to
reclassify the migratetype of the entire block.
The function move_freepages() includes a check of pfn_valid_within for
each page in the range, but since the arm architecture doesn't include
HOLES_IN_ZONE this check is optimized out and the uninitialized struct
page is accessed. Specifically, PageLRU() calls compound_head() on the
page and if the page->compound_head value is odd the value is used as a
pointer to the head struct page. For uninitialized memory there is a
high chance that a random value of compound head will be odd and contain
an invalid pointer value that causes the kernel to abort and panic.
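For reference, the head lookup in question looks roughly like this
(paraphrased from include/linux/page-flags.h, not quoted exactly); with
uninitialized memory the low bit of compound_head is set about half the
time, and the resulting bogus head pointer is what gets dereferenced:

static inline struct page *compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	/* An odd value marks a tail page; the remaining bits are a
	 * pointer to the head page. */
	if (unlikely(head & 1))
		return (struct page *)(head - 1);
	return page;
}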
As you might imagine, specifying HOLES_IN_ZONE for the arm build allows
pfn_valid_within() to protect against accessing the uninitialized struct
page. However, the performance penalty this incurs seems unnecessary.
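For completeness, pfn_valid_within() is defined roughly as below
(paraphrased from include/linux/mmzone.h), which is why the per-pfn
check compiles away entirely on arm builds without HOLES_IN_ZONE:

#ifdef CONFIG_HOLES_IN_ZONE
#define pfn_valid_within(pfn)	pfn_valid(pfn)
#else
#define pfn_valid_within(pfn)	(1)
#endif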
Commit 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations") as
part of the "sparse_init rewrite" series introduced in v4.19 changed the
way sparsemem memmaps are initialized. Prior to this patch the sparsemem
memmaps were initialized to all 0's. I observed that on older kernels the
"uninitialized" struct page access also occurs, but the zero
page->compound_head value indicates no compound head and the page pointer
is therefore not corrupted. The other logic ends up causing the page to be
skipped and everything "happens to work".
While considering solutions to this issue I observed that the problem
does not occur in the current upstream as a result of a combination of
other commits. The following commits provided functionality to
initialize struct page structures for pages that are unavailable like
the no-map region in my system:
commit a4a3ede2132a ("mm: zero reserved and unavailable struct pages")
commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages")
commit ec393a0f014e ("mm: return zero_resv_unavail optimization")
commit e822969cab48 ("mm/page_alloc.c: fix uninitialized memmaps on a
partially populated last section")
commit 4b094b7851bf ("mm/page_alloc.c: initialize memmap of unavailable
memory directly")
However, those commits added the functionality to the free_area_init()
and free_area_init_nodes() functions and the non-NUMA arm architecture
did not begin calling free_area_init() until the following commit in v5.8:
commit a32c1c61212d ("arm: simplify detection of memory zone boundaries")
Prior to that commit the non-NUMA arm architecture called
free_area_init_node() directly at the end of zone_sizes_init().
So while the problem appears to be fixed upstream by commit a32c1c61212d
("arm: simplify detection of memory zone boundaries") it is still
present in stable branches between v4.19.y and v5.7.y inclusive and
probably for architectures other than arm as well that didn't call
free_area_init(). This upstream commit is not easily/safely backportable
to stable branches, but if we focus on the sliver of functionality that
adds the initialization code from free_area_init() to the
zone_sizes_init() function used by non-NUMA arm kernels I believe a
simple patch could be developed for each relevant stable branch to
resolve the issue I am observing. Similar patches could also be applied
for other architectures that now call free_area_init() upstream but not
in one of these stable branches, but I am not in a position to test
those architectures.
For the linux-5.4.y branch such a patch might look like this:
From 671c341b5cdb8360349c33ade43115e28ca56a8a Mon Sep 17 00:00:00 2001
From: Doug Berger <opendmb(a)gmail.com>
Date: Tue, 25 Aug 2020 14:39:43 -0700
Subject: [PATCH] ARM: mm: sync zone_sizes_init with free_area_init
The arm architecture does not invoke the common function
free_area_init(). Instead for non-NUMA builds it invokes
free_area_init_node() directly from zone_sizes_init().
As a result recent changes in free_area_init() are not
picked up by arm architecture builds.
This commit adds the updates to the zone_sizes_init()
function to achieve parity with the free_area_init()
functionality.
Fixes: 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations")
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
arch/arm/mm/init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 6f19ba53fd1f..4f171d834c60 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -169,6 +169,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
arm_dma_zone_size >> PAGE_SHIFT);
#endif
+ zero_resv_unavail();
free_area_init_node(0, zone_size, min, zhole_size);
}
--
2.7.4
I am unclear on the mechanics of submitting such a stable patch when it
represents a perhaps less than obvious sliver of the upstream commit
that fixes the issue, so I am soliciting guidance with this email.
Thank you for taking the time to read this far, and please let me know
how I can improve the situation,
Doug
[BUG]
There are quite a few bug reports of btrfs falling into an ENOSPC trap,
where btrfs can't even start a transaction to add new devices.
[CAUSE]
Most of the reports involve multi-device profiles, like
RAID1/RAID10/RAID5/RAID6, and the involved disks have very unbalanced
sizes.
It turns out that the overcommit calculation in btrfs_can_overcommit()
is just a factor-based calculation, which can't check whether the devices
can really fulfill the requirement for the desired profile. For example,
with RAID1 over a 1 TiB device and a 100 GiB device, only around 100 GiB
can actually be mirrored, yet a factor-based estimate derived from the
total free space is far larger.
This makes btrfs_can_overcommit() always over-confident about
usable space, and when we can't allocate any new metadata chunk but
still allow new metadata operations, we fall into the ENOSPC trap and
have no way to exit it.
[WORKAROUND]
The real fix needs a device-layout-aware, chunk-allocator-like available
space calculation.
There used to be such a patchset submitted to the mailing list, but the
extra failure modes are tricky to handle for chunk allocation, so that
patchset needs more time to mature.
Meanwhile, to prevent such problems from reaching more users, work around
the problem by:
- Halving the reported over-commit available space
  So that we won't always be that over-confident.
  But this won't really help if we have extremely unbalanced disk sizes.
- Not over-committing if the space info is already full
  This may already be too late, but it is still better than doing nothing
  and believing the over-commit values.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/space-info.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 475968ccbd1d..e8133ec7e34a 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -339,6 +339,18 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
avail >>= 3;
else
avail >>= 1;
+ /*
+ * Since current over-commit calculation is doomed already for
+ * RAID0/RAID1/RAID10/RAID5/6, we halve the available space to reduce
+ * over-commit amount.
+ *
+ * This is just a workaround before the device layout aware
+ * available space calculation arrives.
+ */
+ if ((BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1_MASK |
+ BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID56_MASK) &
+ profile)
+ avail >>= 1;
return avail;
}
@@ -353,6 +365,14 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
if (space_info->flags & BTRFS_BLOCK_GROUP_DATA)
return 0;
+ /*
+ * If we can't allocate new space already, no overcommit is allowed.
+ *
+ * This check may be already late, but still better than nothing.
+ */
+ if (space_info->full)
+ return 0;
+
used = btrfs_space_info_used(space_info, true);
avail = calc_available_free_space(fs_info, space_info, flush);
--
2.28.0