This is a note to let you know that I've just added the patch titled
crypto: mcryptd - protect the per-CPU queue with a lock
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-mcryptd-protect-the-per-cpu-queue-with-a-lock.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9abffc6f2efe46c3564c04312e52e07622d40e51 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Date: Thu, 30 Nov 2017 13:39:27 +0100
Subject: crypto: mcryptd - protect the per-CPU queue with a lock
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
commit 9abffc6f2efe46c3564c04312e52e07622d40e51 upstream.
mcryptd_enqueue_request() grabs the per-CPU queue struct and protects
access to it with disabled preemption. Then it schedules a worker on the
same CPU. The worker in mcryptd_queue_worker() guards access to the same
per-CPU variable with disabled preemption.
If we take CPU-hotplug into account then it is possible that between
queue_work_on() and the actual invocation of the worker the CPU goes
down and the worker will be scheduled on _another_ CPU. And here the
preempt_disable() protection does not work anymore. The easiest thing is
to add a spin_lock() to guard access to the list.
Another detail: mcryptd_queue_worker() is not processing more than
MCRYPTD_BATCH invocation in a row. If there are still items left, then
it will invoke queue_work() to proceed with more later. *I* would
suggest to simply drop that check because it does not use a system
workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if
preemption is required then the scheduler should do it.
However if queue_work() is used then the work item is marked as CPU
unbound. That means it will try to run on the local CPU but it may run
on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y.
Again, the preempt_disable() won't work here but lock which was
introduced will help.
In order to keep work-item on the local CPU (and avoid RR) I changed it
to queue_work_on().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/mcryptd.c | 23 ++++++++++-------------
include/crypto/mcryptd.h | 1 +
2 files changed, 11 insertions(+), 13 deletions(-)
--- a/crypto/mcryptd.c
+++ b/crypto/mcryptd.c
@@ -80,6 +80,7 @@ static int mcryptd_init_queue(struct mcr
pr_debug("cpu_queue #%d %p\n", cpu, queue->cpu_queue);
crypto_init_queue(&cpu_queue->queue, max_cpu_qlen);
INIT_WORK(&cpu_queue->work, mcryptd_queue_worker);
+ spin_lock_init(&cpu_queue->q_lock);
}
return 0;
}
@@ -103,15 +104,16 @@ static int mcryptd_enqueue_request(struc
int cpu, err;
struct mcryptd_cpu_queue *cpu_queue;
- cpu = get_cpu();
- cpu_queue = this_cpu_ptr(queue->cpu_queue);
- rctx->tag.cpu = cpu;
+ cpu_queue = raw_cpu_ptr(queue->cpu_queue);
+ spin_lock(&cpu_queue->q_lock);
+ cpu = smp_processor_id();
+ rctx->tag.cpu = smp_processor_id();
err = crypto_enqueue_request(&cpu_queue->queue, request);
pr_debug("enqueue request: cpu %d cpu_queue %p request %p\n",
cpu, cpu_queue, request);
+ spin_unlock(&cpu_queue->q_lock);
queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
- put_cpu();
return err;
}
@@ -164,16 +166,11 @@ static void mcryptd_queue_worker(struct
cpu_queue = container_of(work, struct mcryptd_cpu_queue, work);
i = 0;
while (i < MCRYPTD_BATCH || single_task_running()) {
- /*
- * preempt_disable/enable is used to prevent
- * being preempted by mcryptd_enqueue_request()
- */
- local_bh_disable();
- preempt_disable();
+
+ spin_lock_bh(&cpu_queue->q_lock);
backlog = crypto_get_backlog(&cpu_queue->queue);
req = crypto_dequeue_request(&cpu_queue->queue);
- preempt_enable();
- local_bh_enable();
+ spin_unlock_bh(&cpu_queue->q_lock);
if (!req) {
mcryptd_opportunistic_flush();
@@ -188,7 +185,7 @@ static void mcryptd_queue_worker(struct
++i;
}
if (cpu_queue->queue.qlen)
- queue_work(kcrypto_wq, &cpu_queue->work);
+ queue_work_on(smp_processor_id(), kcrypto_wq, &cpu_queue->work);
}
void mcryptd_flusher(struct work_struct *__work)
--- a/include/crypto/mcryptd.h
+++ b/include/crypto/mcryptd.h
@@ -26,6 +26,7 @@ static inline struct mcryptd_ahash *__mc
struct mcryptd_cpu_queue {
struct crypto_queue queue;
+ spinlock_t q_lock;
struct work_struct work;
};
Patches currently in stable-queue which might be from bigeasy(a)linutronix.de are
queue-4.4/crypto-mcryptd-protect-the-per-cpu-queue-with-a-lock.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-fix-the-missing-ctl-name-suffix-at-parsing-su.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5a15f289ee87eaf33f13f08a4909ec99d837ec5f Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Mon, 18 Dec 2017 23:36:57 +0100
Subject: ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
From: Takashi Iwai <tiwai(a)suse.de>
commit 5a15f289ee87eaf33f13f08a4909ec99d837ec5f upstream.
The commit 89b89d121ffc ("ALSA: usb-audio: Add check return value for
usb_string()") added the check of the return value from
snd_usb_copy_string_desc(), which is correct per se, but it introduced
a regression. In the original code, either the "Clock Source",
"Playback Source" or "Capture Source" suffix is added after the
terminal string, while the commit changed it to add the suffix only
when get_term_name() is failing. It ended up with an incorrect ctl
name like "PCM" instead of "PCM Capture Source".
Also, even the original code has a similar bug: when the ctl name is
generated from snd_usb_copy_string_desc() for the given iSelector, it
also doesn't put the suffix.
This patch addresses these issues: the suffix is added always when no
static mapping is found. Also the patch tries to put more comments
and cleans up the if/else block for better readability in order to
avoid the same pitfall again.
Fixes: 89b89d121ffc ("ALSA: usb-audio: Add check return value for usb_string()")
Reported-and-tested-by: Mauro Santos <registo.mailling(a)gmail.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/mixer.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -2101,20 +2101,25 @@ static int parse_audio_selector_unit(str
kctl->private_value = (unsigned long)namelist;
kctl->private_free = usb_mixer_selector_elem_free;
- nameid = uac_selector_unit_iSelector(desc);
+ /* check the static mapping table at first */
len = check_mapped_name(map, kctl->id.name, sizeof(kctl->id.name));
- if (len)
- ;
- else if (nameid)
- len = snd_usb_copy_string_desc(state, nameid, kctl->id.name,
- sizeof(kctl->id.name));
- else
- len = get_term_name(state, &state->oterm,
- kctl->id.name, sizeof(kctl->id.name), 0);
-
if (!len) {
- strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name));
+ /* no mapping ? */
+ /* if iSelector is given, use it */
+ nameid = uac_selector_unit_iSelector(desc);
+ if (nameid)
+ len = snd_usb_copy_string_desc(state, nameid,
+ kctl->id.name,
+ sizeof(kctl->id.name));
+ /* ... or pick up the terminal name at next */
+ if (!len)
+ len = get_term_name(state, &state->oterm,
+ kctl->id.name, sizeof(kctl->id.name), 0);
+ /* ... or use the fixed string "USB" as the last resort */
+ if (!len)
+ strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name));
+ /* and add the proper suffix */
if (desc->bDescriptorSubtype == UAC2_CLOCK_SELECTOR)
append_ctl_name(kctl, " Clock Source");
else if ((state->oterm.type & 0xff00) == 0x0100)
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.4/alsa-usb-audio-fix-the-missing-ctl-name-suffix-at-parsing-su.patch
queue-4.4/alsa-rawmidi-avoid-racy-info-ioctl-via-ctl-device.patch
queue-4.4/acpi-apei-erst-fix-missing-error-handling-in-erst_reader.patch
This is a note to let you know that I've just added the patch titled
ALSA: rawmidi: Avoid racy info ioctl via ctl device
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-rawmidi-avoid-racy-info-ioctl-via-ctl-device.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c1cfd9025cc394fd137a01159d74335c5ac978ce Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Thu, 14 Dec 2017 16:44:12 +0100
Subject: ALSA: rawmidi: Avoid racy info ioctl via ctl device
From: Takashi Iwai <tiwai(a)suse.de>
commit c1cfd9025cc394fd137a01159d74335c5ac978ce upstream.
The rawmidi also allows to obtaining the information via ioctl of ctl
API. It means that user can issue an ioctl to the rawmidi device even
when it's being removed as long as the control device is present.
Although the code has some protection via the global register_mutex,
its range is limited to the search of the corresponding rawmidi
object, and the mutex is already unlocked at accessing the rawmidi
object. This may lead to a use-after-free.
For avoiding it, this patch widens the application of register_mutex
to the whole snd_rawmidi_info_select() function. We have another
mutex per rawmidi object, but this operation isn't very hot path, so
it shouldn't matter from the performance POV.
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/core/rawmidi.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -579,15 +579,14 @@ static int snd_rawmidi_info_user(struct
return 0;
}
-int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info)
+static int __snd_rawmidi_info_select(struct snd_card *card,
+ struct snd_rawmidi_info *info)
{
struct snd_rawmidi *rmidi;
struct snd_rawmidi_str *pstr;
struct snd_rawmidi_substream *substream;
- mutex_lock(®ister_mutex);
rmidi = snd_rawmidi_search(card, info->device);
- mutex_unlock(®ister_mutex);
if (!rmidi)
return -ENXIO;
if (info->stream < 0 || info->stream > 1)
@@ -603,6 +602,16 @@ int snd_rawmidi_info_select(struct snd_c
}
return -ENXIO;
}
+
+int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info)
+{
+ int ret;
+
+ mutex_lock(®ister_mutex);
+ ret = __snd_rawmidi_info_select(card, info);
+ mutex_unlock(®ister_mutex);
+ return ret;
+}
EXPORT_SYMBOL(snd_rawmidi_info_select);
static int snd_rawmidi_info_select_user(struct snd_card *card,
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.4/alsa-usb-audio-fix-the-missing-ctl-name-suffix-at-parsing-su.patch
queue-4.4/alsa-rawmidi-avoid-racy-info-ioctl-via-ctl-device.patch
queue-4.4/acpi-apei-erst-fix-missing-error-handling-in-erst_reader.patch
This is a note to let you know that I've just added the patch titled
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-apei-erst-fix-missing-error-handling-in-erst_reader.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From bb82e0b4a7e96494f0c1004ce50cec3d7b5fb3d1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Thu, 14 Dec 2017 13:31:16 +0100
Subject: ACPI: APEI / ERST: Fix missing error handling in erst_reader()
From: Takashi Iwai <tiwai(a)suse.de>
commit bb82e0b4a7e96494f0c1004ce50cec3d7b5fb3d1 upstream.
The commit f6f828513290 ("pstore: pass allocated memory region back to
caller") changed the check of the return value from erst_read() in
erst_reader() in the following way:
if (len == -ENOENT)
goto skip;
- else if (len < 0) {
- rc = -1;
+ else if (len < sizeof(*rcd)) {
+ rc = -EIO;
goto out;
This introduced another bug: since the comparison with sizeof() is
cast to unsigned, a negative len value doesn't hit any longer.
As a result, when an error is returned from erst_read(), the code
falls through, and it may eventually lead to some weird thing like
memory corruption.
This patch adds the negative error value check more explicitly for
addressing the issue.
Fixes: f6f828513290 (pstore: pass allocated memory region back to caller)
Tested-by: Jerry Tang <jtang(a)suse.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Acked-by: Kees Cook <keescook(a)chromium.org>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/acpi/apei/erst.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -1020,7 +1020,7 @@ skip:
/* The record may be cleared by others, try read next record */
if (len == -ENOENT)
goto skip;
- else if (len < sizeof(*rcd)) {
+ else if (len < 0 || len < sizeof(*rcd)) {
rc = -EIO;
goto out;
}
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.4/alsa-usb-audio-fix-the-missing-ctl-name-suffix-at-parsing-su.patch
queue-4.4/alsa-rawmidi-avoid-racy-info-ioctl-via-ctl-device.patch
queue-4.4/acpi-apei-erst-fix-missing-error-handling-in-erst_reader.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2797c4a11f373b2545c2398ccb02e362ee66a142 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon, 4 Dec 2017 13:25:13 +0000
Subject: [PATCH] drm/i915: Flush pending GTT writes before unbinding
>From the shrinker paths, we want to relinquish the GPU and GGTT access to
the object, releasing the backing storage back to the system for
swapout. As a part of that process we would unpin the pages, marking
them for access by the CPU (for the swapout/swapin). However, if that
process was interrupted after unbind the vma, we missed a flush of the
inflight GGTT writes before we made that GTT space available again for
reuse, with the prospect that we would redirect them to another page.
The bug dates back to the introduction of multiple GGTT vma, but the
code itself dates to commit 02bef8f98d26 ("drm/i915: Unbind closed vma
for i915_gem_object_unbind()").
Fixes: 02bef8f98d26 ("drm/i915: Unbind closed vma for i915_gem_object_unbind()")
Fixes: c5ad54cf7dd8 ("drm/i915: Use partial view in mmap fault handler")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204132513.7303-1-chris@c…
(cherry picked from commit 5888fc9eac3c2ff96e76aeeb865fdb46ab2d711e)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ad4050f7ab3b..18de6569d04a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -330,17 +330,10 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
* must wait for all rendering to complete to the object (as unbinding
* must anyway), and retire the requests.
*/
- ret = i915_gem_object_wait(obj,
- I915_WAIT_INTERRUPTIBLE |
- I915_WAIT_LOCKED |
- I915_WAIT_ALL,
- MAX_SCHEDULE_TIMEOUT,
- NULL);
+ ret = i915_gem_object_set_to_cpu_domain(obj, false);
if (ret)
return ret;
- i915_gem_retire_requests(to_i915(obj->base.dev));
-
while ((vma = list_first_entry_or_null(&obj->vma_list,
struct i915_vma,
obj_link))) {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 111be883981748acc9a56e855c8336404a8e787c Mon Sep 17 00:00:00 2001
From: Shaohua Li <shli(a)fb.com>
Date: Wed, 20 Dec 2017 11:10:17 -0700
Subject: [PATCH] block-throttle: avoid double charge
If a bio is throttled and split after throttling, the bio could be
resubmited and enters the throttling again. This will cause part of the
bio to be charged multiple times. If the cgroup has an IO limit, the
double charge will significantly harm the performance. The bio split
becomes quite common after arbitrary bio size change.
To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
If the bio is cloned/split, we copy the flag to new bio too to avoid a
double charge. However, cloned bio could be directed to a new disk,
keeping the flag be a problem. The observation is we always set new disk
for the bio in this case, so we can clear the flag in bio_set_dev().
This issue exists for a long time, arbitrary bio size change just makes
it worse, so this should go into stable at least since v4.2.
V1-> V2: Not add extra field in bio based on discussion with Tejun
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: stable(a)vger.kernel.org
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Shaohua Li <shli(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bio.c b/block/bio.c
index 8bfdea58159b..9ef6cf3addb3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -599,6 +599,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_disk = bio_src->bi_disk;
bio->bi_partno = bio_src->bi_partno;
bio_set_flag(bio, BIO_CLONED);
+ if (bio_flagged(bio_src, BIO_THROTTLED))
+ bio_set_flag(bio, BIO_THROTTLED);
bio->bi_opf = bio_src->bi_opf;
bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter = bio_src->bi_iter;
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 825bc29767e6..d19f416d6101 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2226,13 +2226,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
out_unlock:
spin_unlock_irq(q->queue_lock);
out:
- /*
- * As multiple blk-throtls may stack in the same issue path, we
- * don't want bios to leave with the flag set. Clear the flag if
- * being issued.
- */
- if (!throttled)
- bio_clear_flag(bio, BIO_THROTTLED);
+ bio_set_flag(bio, BIO_THROTTLED);
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
if (throttled || !td->track_bio_latency)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 82f0c8fd7be8..23d29b39f71e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -492,6 +492,8 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
#define bio_set_dev(bio, bdev) \
do { \
+ if ((bio)->bi_disk != (bdev)->bd_disk) \
+ bio_clear_flag(bio, BIO_THROTTLED);\
(bio)->bi_disk = (bdev)->bd_disk; \
(bio)->bi_partno = (bdev)->bd_partno; \
} while (0)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a1e628e032da..9e7d8bd776d2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -50,8 +50,6 @@ struct blk_issue_stat {
struct bio {
struct bio *bi_next; /* request queue link */
struct gendisk *bi_disk;
- u8 bi_partno;
- blk_status_t bi_status;
unsigned int bi_opf; /* bottom bits req flags,
* top bits REQ_OP. Use
* accessors.
@@ -59,8 +57,8 @@ struct bio {
unsigned short bi_flags; /* status, etc and bvec pool number */
unsigned short bi_ioprio;
unsigned short bi_write_hint;
-
- struct bvec_iter bi_iter;
+ blk_status_t bi_status;
+ u8 bi_partno;
/* Number of segments in this BIO after
* physical address coalescing is performed.
@@ -74,8 +72,9 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;
- atomic_t __bi_remaining;
+ struct bvec_iter bi_iter;
+ atomic_t __bi_remaining;
bio_end_io_t *bi_end_io;
void *bi_private;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 111be883981748acc9a56e855c8336404a8e787c Mon Sep 17 00:00:00 2001
From: Shaohua Li <shli(a)fb.com>
Date: Wed, 20 Dec 2017 11:10:17 -0700
Subject: [PATCH] block-throttle: avoid double charge
If a bio is throttled and split after throttling, the bio could be
resubmited and enters the throttling again. This will cause part of the
bio to be charged multiple times. If the cgroup has an IO limit, the
double charge will significantly harm the performance. The bio split
becomes quite common after arbitrary bio size change.
To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
If the bio is cloned/split, we copy the flag to new bio too to avoid a
double charge. However, cloned bio could be directed to a new disk,
keeping the flag be a problem. The observation is we always set new disk
for the bio in this case, so we can clear the flag in bio_set_dev().
This issue exists for a long time, arbitrary bio size change just makes
it worse, so this should go into stable at least since v4.2.
V1-> V2: Not add extra field in bio based on discussion with Tejun
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: stable(a)vger.kernel.org
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Shaohua Li <shli(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bio.c b/block/bio.c
index 8bfdea58159b..9ef6cf3addb3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -599,6 +599,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_disk = bio_src->bi_disk;
bio->bi_partno = bio_src->bi_partno;
bio_set_flag(bio, BIO_CLONED);
+ if (bio_flagged(bio_src, BIO_THROTTLED))
+ bio_set_flag(bio, BIO_THROTTLED);
bio->bi_opf = bio_src->bi_opf;
bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter = bio_src->bi_iter;
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 825bc29767e6..d19f416d6101 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2226,13 +2226,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
out_unlock:
spin_unlock_irq(q->queue_lock);
out:
- /*
- * As multiple blk-throtls may stack in the same issue path, we
- * don't want bios to leave with the flag set. Clear the flag if
- * being issued.
- */
- if (!throttled)
- bio_clear_flag(bio, BIO_THROTTLED);
+ bio_set_flag(bio, BIO_THROTTLED);
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
if (throttled || !td->track_bio_latency)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 82f0c8fd7be8..23d29b39f71e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -492,6 +492,8 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
#define bio_set_dev(bio, bdev) \
do { \
+ if ((bio)->bi_disk != (bdev)->bd_disk) \
+ bio_clear_flag(bio, BIO_THROTTLED);\
(bio)->bi_disk = (bdev)->bd_disk; \
(bio)->bi_partno = (bdev)->bd_partno; \
} while (0)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a1e628e032da..9e7d8bd776d2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -50,8 +50,6 @@ struct blk_issue_stat {
struct bio {
struct bio *bi_next; /* request queue link */
struct gendisk *bi_disk;
- u8 bi_partno;
- blk_status_t bi_status;
unsigned int bi_opf; /* bottom bits req flags,
* top bits REQ_OP. Use
* accessors.
@@ -59,8 +57,8 @@ struct bio {
unsigned short bi_flags; /* status, etc and bvec pool number */
unsigned short bi_ioprio;
unsigned short bi_write_hint;
-
- struct bvec_iter bi_iter;
+ blk_status_t bi_status;
+ u8 bi_partno;
/* Number of segments in this BIO after
* physical address coalescing is performed.
@@ -74,8 +72,9 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;
- atomic_t __bi_remaining;
+ struct bvec_iter bi_iter;
+ atomic_t __bi_remaining;
bio_end_io_t *bi_end_io;
void *bi_private;
This is a note to let you know that I've just added the patch titled
x86/insn-eval: Add utility functions to get segment selector
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b0caa8c8c6bbc422bc3c32b64852d6d618f32b49 Mon Sep 17 00:00:00 2001
From: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
Date: Fri, 27 Oct 2017 13:25:40 -0700
Subject: x86/insn-eval: Add utility functions to get segment selector
From: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
commit 32d0b95300db03c2b23b2ea2c94769a4a138e79d upstream.
[note, only the inat.h portion, to get objtool back in sync - gregkh]
b0caa8c8c6bbc422bc3c32b64852d6d618f32b49 Mon Sep 17 00:00:00 2001
When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most of
the cases, the segment base address will be zero as in USER_DS/USER32_DS.
However, it may be possible that a user space program defines its own
segments via a local descriptor table. In such a case, the segment base
address may not be zero. Thus, the segment base address is needed to
calculate correctly the linear address.
If running in protected mode, the segment selector to be used when
computing a linear address is determined by either any of segment override
prefixes in the instruction or inferred from the registers involved in the
computation of the effective address; in that order. Also, there are cases
when the segment override prefixes shall be ignored (i.e., code segments
are always selected by the CS segment register; string instructions always
use the ES segment register when using rDI register as operand). In long
mode, segment registers are ignored, except for FS and GS. In these two
cases, base addresses are obtained from the respective MSRs.
For clarity, this process can be split into four steps (and an equal
number of functions): determine if segment prefixes overrides can be used;
parse the segment override prefixes, and use them if found; if not found
or cannot be used, use the default segment registers associated with the
operand registers. Once the segment register to use has been identified,
read its value to obtain the segment selector.
The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into a pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
the respective segment registers.
In order to identify the segment registers, a new set of #defines is
introduced. It also includes two special identifiers. One of them
indicates when the default segment register associated with instruction
operands shall be used. Another one indicates that the contents of the
segment register shall be ignored; this identifier is used when in long
mode.
Improvements-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: ricardo.neri(a)intel.com
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Paul Gortmaker <paul.gortmaker(a)windriver.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Qiaowei Ren <qiaowei.ren(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Jiri Slaby <jslaby(a)suse.cz>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar(a)intel.com>
Cc: Chris Metcalf <cmetcalf(a)mellanox.com>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Colin Ian King <colin.king(a)canonical.com>
Cc: Chen Yucong <slaoub(a)gmail.com>
Cc: Adam Buchbinder <adam.buchbinder(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Thomas Garnier <thgarnie(a)google.com>
Link: https://lkml.kernel.org/r/1509135945-13762-14-git-send-email-ricardo.neri-c…
Cc: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/inat.h | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -97,6 +97,16 @@
#define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM)
#define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS)
+/* Identifiers for segment registers */
+#define INAT_SEG_REG_IGNORE 0
+#define INAT_SEG_REG_DEFAULT 1
+#define INAT_SEG_REG_CS 2
+#define INAT_SEG_REG_SS 3
+#define INAT_SEG_REG_DS 4
+#define INAT_SEG_REG_ES 5
+#define INAT_SEG_REG_FS 6
+#define INAT_SEG_REG_GS 7
+
/* Attribute search APIs */
extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
Patches currently in stable-queue which might be from ricardo.neri-calderon(a)linux.intel.com are
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
This is a note to let you know that I've just added the patch titled
usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From da99706689481717998d1d48edd389f339eea979 Mon Sep 17 00:00:00 2001
From: Daniel Thompson <daniel.thompson(a)linaro.org>
Date: Thu, 21 Dec 2017 15:06:15 +0200
Subject: usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
When plugging in a USB webcam I see the following message:
xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs
XHCI_TRUST_TX_LENGTH quirk?
handle_tx_event: 913 callbacks suppressed
All is quiet again with this patch (and I've done a fair but of soak
testing with the camera since).
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Daniel Thompson <daniel.thompson(a)linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-pci.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 7ef1274ef7f7..1aad89b8aba0 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -177,6 +177,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
xhci->quirks |= XHCI_TRUST_TX_LENGTH;
xhci->quirks |= XHCI_BROKEN_STREAMS;
}
+ if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
+ pdev->device == 0x0014)
+ xhci->quirks |= XHCI_TRUST_TX_LENGTH;
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0015)
xhci->quirks |= XHCI_RESET_ON_RESUME;
--
2.15.1
This is a note to let you know that I've just added the patch titled
USB: serial: ftdi_sio: add id for Airbus DS P8GR
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From c6a36ad383559a60a249aa6016cebf3cb8b6c485 Mon Sep 17 00:00:00 2001
From: Max Schulze <max.schulze(a)posteo.de>
Date: Wed, 20 Dec 2017 20:47:44 +0100
Subject: USB: serial: ftdi_sio: add id for Airbus DS P8GR
Add AIRBUS_DS_P8GR device IDs to ftdi_sio driver.
Signed-off-by: Max Schulze <max.schulze(a)posteo.de>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 1aba9105b369..fc68952c994a 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1013,6 +1013,7 @@ static const struct usb_device_id id_table_combined[] = {
.driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },
{ USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_BT_USB_PID) },
{ USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_WL_USB_PID) },
+ { USB_DEVICE(AIRBUS_DS_VID, AIRBUS_DS_P8GR) },
{ } /* Terminating entry */
};
diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
index 4faa09fe308c..8b4ecd2bd297 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -914,6 +914,12 @@
#define ICPDAS_I7561U_PID 0x0104
#define ICPDAS_I7563U_PID 0x0105
+/*
+ * Airbus Defence and Space
+ */
+#define AIRBUS_DS_VID 0x1e8e /* Vendor ID */
+#define AIRBUS_DS_P8GR 0x6001 /* Tetra P8GR */
+
/*
* RT Systems programming cables for various ham radios
*/
--
2.15.1