During my rwsem testing, it was found that after a down_read(), the
reader count may occasionally become 0 or even negative. Consequently,
a writer may steal the lock at that time and execute with the reader
in parallel thus breaking the mutual exclusion guarantee of the write
lock. In other words, both readers and writer can become rwsem owners
simultaneously.
The current reader wakeup code does it in one pass to clear waiter->task
and put them into wake_q before fully incrementing the reader count.
Once waiter->task is cleared, the corresponding reader may see it,
finish the critical section and do unlock to decrement the count before
the count is incremented. This is not a problem if there is only one
reader to wake up as the count has been pre-incremented by 1. It is
a problem if there are more than one readers to be woken up and writer
can steal the lock.
The wakeup was actually done in 2 passes before the v4.9 commit
70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only
once"). To fix this problem, the wakeup is now done in two passes
again. In the first pass, we collect the readers and count them. The
reader count is then fully incremented. In the second pass, the
waiter->task is then cleared and they are put into wake_q to be woken
up later.
Fixes: 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/locking/rwsem-xadd.c | 46 +++++++++++++++++++++++++------------
1 file changed, 31 insertions(+), 15 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 6b3ee9948bf1..0b1f77957240 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -130,6 +130,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
{
struct rwsem_waiter *waiter, *tmp;
long oldcount, woken = 0, adjustment = 0;
+ struct list_head wlist;
/*
* Take a peek at the queue head waiter such that we can determine
@@ -188,18 +189,43 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
* of the queue. We know that woken will be at least 1 as we accounted
* for above. Note we increment the 'active part' of the count by the
* number of readers before waking any processes up.
+ *
+ * We have to do wakeup in 2 passes to prevent the possibility that
+ * the reader count may be decremented before it is incremented. It
+ * is because the to-be-woken waiter may not have slept yet. So it
+ * may see waiter->task got cleared, finish its critical section and
+ * do an unlock before the reader count increment.
+ *
+ * 1) Collect the read-waiters in a separate list, count them and
+ * fully increment the reader count in rwsem.
+ * 2) For each waiters in the new list, clear waiter->task and
+ * put them into wake_q to be woken up later.
*/
- list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
- struct task_struct *tsk;
-
+ list_for_each_entry(waiter, &sem->wait_list, list) {
if (waiter->type == RWSEM_WAITING_FOR_WRITE)
break;
woken++;
- tsk = waiter->task;
+ }
+ list_cut_before(&wlist, &sem->wait_list, &waiter->list);
+
+ adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+ lockevent_cond_inc(rwsem_wake_reader, woken);
+ if (list_empty(&sem->wait_list)) {
+ /* hit end of list above */
+ adjustment -= RWSEM_WAITING_BIAS;
+ }
+
+ if (adjustment)
+ atomic_long_add(adjustment, &sem->count);
+
+ /* 2nd pass */
+ list_for_each_entry_safe(waiter, tmp, &wlist, list) {
+ struct task_struct *tsk;
+ tsk = waiter->task;
get_task_struct(tsk);
- list_del(&waiter->list);
+
/*
* Ensure calling get_task_struct() before setting the reader
* waiter to nil such that rwsem_down_read_failed() cannot
@@ -213,16 +239,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
*/
wake_q_add_safe(wake_q, tsk);
}
-
- adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
- lockevent_cond_inc(rwsem_wake_reader, woken);
- if (list_empty(&sem->wait_list)) {
- /* hit end of list above */
- adjustment -= RWSEM_WAITING_BIAS;
- }
-
- if (adjustment)
- atomic_long_add(adjustment, &sem->count);
}
/*
--
2.18.1
Changes since v5 [1]:
- Rebase on next-20190416 and the new 'struct mhp_restrictions'
infrastructure.
- Extend mhp_restrictions to the 'remove' case so the sub-section policy
can be clarified with respect to the memblock-api in a symmetric
manner with the 'add' case.
- Kill is_dev_zone() since cleanups have now made it moot
[1]: https://lwn.net/Articles/783808/
---
The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and can not be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem uses
devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
'struct page' memmap for pmem. However, it does not use the 'bottom
half' of memory hotplug, i.e. never marks pmem pages online and never
exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.
To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes this physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.
It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug and with a bit more
infrastructure sub-section arch_add_memory() support can be added for
kernel internal usages like devm_memremap_pages(). Here is an analysis
of the current design assumptions in the current code and how they are
addressed in the new implementation:
Current design assumptions:
- Sections that describe boot memory (early sections) are never
unplugged / removed.
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
valid_section() check
- __add_pages() and helper routines assume all operations occur in
PAGES_PER_SECTION units.
- The memblock sysfs interface only comprehends full sections
New design assumptions:
- Sections are instrumented with a sub-section bitmask to track (on x86)
individual 2MB sub-divisions of a 128MB section.
- Partially populated early sections can be extended with additional
sub-sections, and those sub-sections can be removed with
arch_remove_memory(). With this in place we no longer lose usable memory
capacity to padding.
- pfn_valid() is updated to look deeper than valid_section() to also check the
active-sub-section mask. This indication is in the same cacheline as
the valid_section() so the performance impact is expected to be
negligible. So far the lkp robot has not reported any regressions.
- Outside of the core vmemmap population routines which are replaced,
other helper routines like shrink_{zone,pgdat}_span() are updated to
handle the smaller granularity. Core memory hotplug routines that deal
with online memory are not touched.
- The existing memblock sysfs user api guarantees / assumptions are
not touched since this capability is limited to !online
!memblock-sysfs-accessible sections.
Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them. The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem
ranges with other pmem ranges by default [3]. In short,
devm_memremap_pages() has pushed the venerable section-size constraint
past the breaking point, and the simplicity of section-aligned
arch_add_memory() is no longer tenable.
These patches are exposed to the kbuild robot on my libnvdimm-pending
branch [4], and a preview of the unit test for this functionality is
available on the 'subsection-pending' branch of ndctl [5].
[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwi…
[3]: https://github.com/pmem/ndctl/issues/76
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libn…
[5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
---
Dan Williams (12):
mm/sparsemem: Introduce struct mem_section_usage
mm/sparsemem: Introduce common definitions for the size and mask of a section
mm/sparsemem: Add helpers track active portions of a section at boot
mm/hotplug: Prepare shrink_{zone,pgdat}_span for sub-section removal
mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
mm/hotplug: Add mem-hotplug restrictions for remove_memory()
mm: Kill is_dev_zone() helper
mm/sparsemem: Prepare for sub-section ranges
mm/sparsemem: Support sub-section hotplug
mm/devm_memremap_pages: Enable sub-section remap
libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
libnvdimm/pfn: Stop padding pmem namespaces to section alignment
arch/ia64/mm/init.c | 4
arch/powerpc/mm/mem.c | 5 -
arch/s390/mm/init.c | 2
arch/sh/mm/init.c | 4
arch/x86/mm/init_32.c | 4
arch/x86/mm/init_64.c | 9 +
drivers/nvdimm/dax_devs.c | 2
drivers/nvdimm/pfn.h | 12 -
drivers/nvdimm/pfn_devs.c | 93 +++-------
include/linux/memory_hotplug.h | 12 +
include/linux/mm.h | 4
include/linux/mmzone.h | 72 ++++++--
kernel/memremap.c | 70 +++-----
mm/hmm.c | 2
mm/memory_hotplug.c | 148 +++++++++-------
mm/page_alloc.c | 8 +
mm/sparse-vmemmap.c | 21 ++
mm/sparse.c | 371 +++++++++++++++++++++++++++-------------
18 files changed, 503 insertions(+), 340 deletions(-)
Currently in sst_dsp_new() if we get an error return from sst_dma_new()
we just print an error message and then still complete the function
successfully. This means that we are trying to run without sst->dma
properly set up, which will result in NULL pointer dereference when
sst->dma is later used. This was happening for me in
sst_dsp_dma_get_channel():
struct sst_dma *dma = dsp->dma;
...
dma->ch = dma_request_channel(mask, dma_chan_filter, dsp);
This resulted in:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: sst_dsp_dma_get_channel+0x4f/0x125 [snd_soc_sst_firmware]
Fix this by adding proper error handling for the case where we fail to
set up DMA.
Signed-off-by: Ross Zwisler <zwisler(a)google.com>
Cc: stable(a)vger.kernel.org
---
sound/soc/intel/common/sst-firmware.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/sound/soc/intel/common/sst-firmware.c b/sound/soc/intel/common/sst-firmware.c
index 1e067504b6043..9be3a793a55e3 100644
--- a/sound/soc/intel/common/sst-firmware.c
+++ b/sound/soc/intel/common/sst-firmware.c
@@ -1251,11 +1251,15 @@ struct sst_dsp *sst_dsp_new(struct device *dev,
goto irq_err;
err = sst_dma_new(sst);
- if (err)
+ if (err) {
dev_warn(dev, "sst_dma_new failed %d\n", err);
+ goto dma_err;
+ }
return sst;
+dma_err:
+ free_irq(sst->irq, sst);
irq_err:
if (sst->ops->free)
sst->ops->free(sst);
--
2.21.0.593.g511ec345e18-goog
Commit 8efd972ef96a ("crypto: testmgr - support checking skcipher output IV")
caused the crypto4xx driver to produce the following error:
| ctr-aes-ppc4xx encryption test failed (wrong output IV)
| on test vector 0, cfg="in-place"
This patch fixes this by reworking the crypto4xx_setkey_aes()
function to:
- not save the iv for ECB (as per 18.2.38 CRYP0_SA_CMD_0:
"This bit mut be cleared for DES ECB mode or AES ECB mode,
when no IV is used.")
- instruct the hardware to save the generated IV for all
other modes of operations that have IV and then supply
it back to the callee in pretty much the same way as we
do it for cbc-aes already.
- make it clear that the DIR_(IN|OUT)BOUND is the important
bit that tells the hardware to encrypt or decrypt the data.
(this is cosmetic - but it hopefully prevents me from
getting confused again).
- don't load any bogus hash when we don't use any hash
operation to begin with.
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Lamparter <chunkeey(a)gmail.com>
---
drivers/crypto/amcc/crypto4xx_alg.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index 4092c2aad8e2..3458c5a085d9 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -141,9 +141,10 @@ static int crypto4xx_setkey_aes(struct crypto_skcipher *cipher,
/* Setup SA */
sa = ctx->sa_in;
- set_dynamic_sa_command_0(sa, SA_NOT_SAVE_HASH, (cm == CRYPTO_MODE_CBC ?
- SA_SAVE_IV : SA_NOT_SAVE_IV),
- SA_LOAD_HASH_FROM_SA, SA_LOAD_IV_FROM_STATE,
+ set_dynamic_sa_command_0(sa, SA_NOT_SAVE_HASH, (cm == CRYPTO_MODE_ECB ?
+ SA_NOT_SAVE_IV : SA_SAVE_IV),
+ SA_NOT_LOAD_HASH, (cm == CRYPTO_MODE_ECB ?
+ SA_LOAD_IV_FROM_SA : SA_LOAD_IV_FROM_STATE),
SA_NO_HEADER_PROC, SA_HASH_ALG_NULL,
SA_CIPHER_ALG_AES, SA_PAD_TYPE_ZERO,
SA_OP_GROUP_BASIC, SA_OPCODE_DECRYPT,
@@ -162,6 +163,11 @@ static int crypto4xx_setkey_aes(struct crypto_skcipher *cipher,
memcpy(ctx->sa_out, ctx->sa_in, ctx->sa_len * 4);
sa = ctx->sa_out;
sa->sa_command_0.bf.dir = DIR_OUTBOUND;
+ /*
+ * SA_OPCODE_ENCRYPT is the same value as SA_OPCODE_DECRYPT.
+ * it's the DIR_(IN|OUT)BOUND that matters
+ */
+ sa->sa_command_0.bf.opcode = SA_OPCODE_ENCRYPT;
return 0;
}
--
2.20.1
Multiple users have reported their Synaptics touchpad has stopped
working between v4.20.1 and v4.20.2 when using SMBus interface.
The culprit for this appeared to be commit c5eb1190074c ("PCI / PM: Allow
runtime PM without callback functions") that fixed the runtime PM for
i2c-i801 SMBus adapter. Those Synaptics touchpad are using i2c-i801
for SMBus communication and testing showed they are able to get back
working by preventing the runtime suspend of adapter.
Normally when i2c-i801 SMBus adapter transmits with the client it resumes
before operation and autosuspends after.
However, if client requires SMBus Host Notify protocol, what those
Synaptics touchpads do, then the host adapter must not go to runtime
suspend since then it cannot process incoming SMBus Host Notify commands
the client may send.
Fix this by keeping I2C/SMBus adapter active in case client requires
Host Notify.
Reported-by: Keijo Vaara <ferdasyn(a)rocketmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203297
Fixes: c5eb1190074c ("PCI / PM: Allow runtime PM without callback functions")
Cc: stable(a)vger.kernel.org # v4.20+
Signed-off-by: Jarkko Nikula <jarkko.nikula(a)linux.intel.com>
---
Keijo: could you test this does it fix the issue you reported? This is
practically the same diff I sent earlier what you probably haven't tested yet.
I wanted to send a commitable fix in case it works since I'll be out of
office in a few coming days.
---
drivers/i2c/i2c-core-base.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index 38af18645133..8149c9e32b69 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -327,6 +327,8 @@ static int i2c_device_probe(struct device *dev)
if (client->flags & I2C_CLIENT_HOST_NOTIFY) {
dev_dbg(dev, "Using Host Notify IRQ\n");
+ /* Keep adapter active when Host Notify is required */
+ pm_runtime_get_sync(&client->adapter->dev);
irq = i2c_smbus_host_notify_to_irq(client);
} else if (dev->of_node) {
irq = of_irq_get_byname(dev->of_node, "irq");
@@ -431,6 +433,8 @@ static int i2c_device_remove(struct device *dev)
device_init_wakeup(&client->dev, false);
client->irq = client->init_irq;
+ if (client->flags & I2C_CLIENT_HOST_NOTIFY)
+ pm_runtime_put(&client->adapter->dev);
return status;
}
--
2.20.1
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5aae7832d1b4ec614996ea0f4fafc4d9855ec0b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Tue, 26 Mar 2019 16:49:02 +0200
Subject: [PATCH] drm/i915: Do not enable FEC without DSC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently we enable FEC even when DSC is no used. While that is
theoretically valid supposedly there isn't much of a benefit from
this. But more importantly we do not account for the FEC link
bandwidth overhead (2.4%) in the non-DSC link bandwidth computations.
So the code may think we have enough bandwidth when we in fact
do not.
Cc: stable(a)vger.kernel.org
Cc: Anusha Srivatsa <anusha.srivatsa(a)intel.com>
Cc: Manasi Navare <manasi.d.navare(a)intel.com>
Fixes: 240999cf339f ("i915/dp/fec: Add fec_enable to the crtc state.")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326144903.6617-1-ville.s…
Reviewed-by: Manasi Navare <manasi.d.navare(a)intel.com>
(cherry picked from commit 6fd3134ae3551d4802a04669c0f39f2f5c56f77d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 8891f29a8c7f..48da4a969a0a 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1886,6 +1886,9 @@ static int intel_dp_dsc_compute_config(struct intel_dp *intel_dp,
int pipe_bpp;
int ret;
+ pipe_config->fec_enable = !intel_dp_is_edp(intel_dp) &&
+ intel_dp_supports_fec(intel_dp, pipe_config);
+
if (!intel_dp_supports_dsc(intel_dp, pipe_config))
return -EINVAL;
@@ -2116,9 +2119,6 @@ intel_dp_compute_config(struct intel_encoder *encoder,
if (adjusted_mode->flags & DRM_MODE_FLAG_DBLCLK)
return -EINVAL;
- pipe_config->fec_enable = !intel_dp_is_edp(intel_dp) &&
- intel_dp_supports_fec(intel_dp, pipe_config);
-
ret = intel_dp_compute_link_config(encoder, pipe_config, conn_state);
if (ret < 0)
return ret;