This is a note to let you know that I've just added the patch titled
dm: fix race between dm_get_from_kobject() and __dm_destroy()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-fix-race-between-dm_get_from_kobject-and-__dm_destroy.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b9a41d21dceadf8104812626ef85dc56ee8a60ed Mon Sep 17 00:00:00 2001
From: Hou Tao <houtao1(a)huawei.com>
Date: Wed, 1 Nov 2017 15:42:36 +0800
Subject: dm: fix race between dm_get_from_kobject() and __dm_destroy()
From: Hou Tao <houtao1(a)huawei.com>
commit b9a41d21dceadf8104812626ef85dc56ee8a60ed upstream.
The following BUG_ON was hit when testing repeat creation and removal of
DM devices:
kernel BUG at drivers/md/dm.c:2919!
CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
Call Trace:
[<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
[<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
[<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
[<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
[<ffffffff811de257>] kernfs_seq_show+0x23/0x25
[<ffffffff81199118>] seq_read+0x16f/0x325
[<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
[<ffffffff8117b625>] __vfs_read+0x26/0x9d
[<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
[<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
[<ffffffff8117be9d>] vfs_read+0x8f/0xcf
[<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
[<ffffffff8117c686>] SyS_read+0x4b/0x76
[<ffffffff817b606e>] system_call_fastpath+0x12/0x71
The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING & DMF_DELETING and dm_get() in
dm_get_from_kobject().
To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.
The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invoke it after increasing
md->open_count.
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2709,11 +2709,15 @@ struct mapped_device *dm_get_from_kobjec
md = container_of(kobj, struct mapped_device, kobj_holder.kobj);
- if (test_bit(DMF_FREEING, &md->flags) ||
- dm_deleting_md(md))
- return NULL;
-
+ spin_lock(&_minor_lock);
+ if (test_bit(DMF_FREEING, &md->flags) || dm_deleting_md(md)) {
+ md = NULL;
+ goto out;
+ }
dm_get(md);
+out:
+ spin_unlock(&_minor_lock);
+
return md;
}
Patches currently in stable-queue which might be from houtao1(a)huawei.com are
queue-4.14/dm-fix-race-between-dm_get_from_kobject-and-__dm_destroy.patch
This is a note to let you know that I've just added the patch titled
dm: discard support requires all targets in a table support discards
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-discard-support-requires-all-targets-in-a-table-support-discards.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8a74d29d541cd86569139c6f3f44b2d210458071 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Tue, 14 Nov 2017 15:40:52 -0500
Subject: dm: discard support requires all targets in a table support discards
From: Mike Snitzer <snitzer(a)redhat.com>
commit 8a74d29d541cd86569139c6f3f44b2d210458071 upstream.
A DM device with a mix of discard capabilities (due to some underlying
devices not having discard support) _should_ just return -EOPNOTSUPP for
the region of the device that doesn't support discards (even if only by
way of the underlying driver formally not supporting discards). BUT,
that does ask the underlying driver to handle something that it never
advertised support for. In doing so we're exposing users to the
potential for a underlying disk driver hanging if/when a discard is
issued a the device that is incapable and never claimed to support
discards.
Fix this by requiring that each DM target in a DM table provide discard
support as a prereq for a DM device to advertise support for discards.
This may cause some configurations that were happily supporting discards
(even in the face of a mix of discard support) to stop supporting
discards -- but the risk of users hitting driver hangs, and forced
reboots, outweighs supporting those fringe mixed discard
configurations.
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-table.c | 33 ++++++++++++++-------------------
1 file changed, 14 insertions(+), 19 deletions(-)
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1758,13 +1758,12 @@ static bool dm_table_supports_write_zero
return true;
}
-
-static int device_discard_capable(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
+static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
{
struct request_queue *q = bdev_get_queue(dev->bdev);
- return q && blk_queue_discard(q);
+ return q && !blk_queue_discard(q);
}
static bool dm_table_supports_discards(struct dm_table *t)
@@ -1772,28 +1771,24 @@ static bool dm_table_supports_discards(s
struct dm_target *ti;
unsigned i;
- /*
- * Unless any target used by the table set discards_supported,
- * require at least one underlying device to support discards.
- * t->devices includes internal dm devices such as mirror logs
- * so we need to use iterate_devices here, which targets
- * supporting discard selectively must provide.
- */
for (i = 0; i < dm_table_get_num_targets(t); i++) {
ti = dm_table_get_target(t, i);
if (!ti->num_discard_bios)
- continue;
-
- if (ti->discards_supported)
- return true;
+ return false;
- if (ti->type->iterate_devices &&
- ti->type->iterate_devices(ti, device_discard_capable, NULL))
- return true;
+ /*
+ * Either the target provides discard support (as implied by setting
+ * 'discards_supported') or it relies on _all_ data devices having
+ * discard support.
+ */
+ if (!ti->discards_supported &&
+ (!ti->type->iterate_devices ||
+ ti->type->iterate_devices(ti, device_not_discard_capable, NULL)))
+ return false;
}
- return false;
+ return true;
}
void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
Patches currently in stable-queue which might be from snitzer(a)redhat.com are
queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch
queue-4.14/dm-zoned-ignore-last-smaller-runt-zone.patch
queue-4.14/dm-discard-support-requires-all-targets-in-a-table-support-discards.patch
queue-4.14/dm-fix-race-between-dm_get_from_kobject-and-__dm_destroy.patch
queue-4.14/dm-cache-fix-race-condition-in-the-writeback-mode-overwrite_bio-optimisation.patch
queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch
queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch
queue-4.14/dm-mpath-remove-annoying-message-of-blk_get_request-returned-11.patch
queue-4.14/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
This is a note to let you know that I've just added the patch titled
btrfs: change how we decide to commit transactions during flushing
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-change-how-we-decide-to-commit-transactions-during-flushing.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 996478ca9c460886ac147eb0d00e99841b71d31b Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Tue, 22 Aug 2017 16:00:39 -0400
Subject: btrfs: change how we decide to commit transactions during flushing
From: Josef Bacik <jbacik(a)fb.com>
commit 996478ca9c460886ac147eb0d00e99841b71d31b upstream.
Nikolay reported that generic/273 was failing currently with ENOSPC.
Turns out this is because we get to the point where the outstanding
reservations are greater than the pinned space on the fs. This is a
mistake, previously we used the current reservation amount in
may_commit_transaction, not the entire outstanding reservation amount.
Fix this to find the minimum byte size needed to make progress in
flushing, and pass that into may_commit_transaction. From there we can
make a smarter decision on whether to commit the transaction or not.
This fixes the failure in generic/273.
>From Nikolai, IOW: when we go to the final stage of deciding whether to
do trans commit, instead of passing all the reservations from all
tickets we just pass the reservation for the current ticket. Otherwise,
in case all reservations exceed pinned space, then we don't commit
transaction and fail prematurely. Before we passed num_bytes from
flush_space, where num_bytes was the sum of all pending reserverations,
but now all we do is take the first ticket and commit the trans if we
can satisfy that.
Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Reported-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Josef Bacik <jbacik(a)fb.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Tested-by: Nikolay Borisov <nborisov(a)suse.com>
[ added Nikolai's comment ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/extent-tree.c | 42 ++++++++++++++++++++++++++++--------------
1 file changed, 28 insertions(+), 14 deletions(-)
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4919,6 +4919,13 @@ skip_async:
}
}
+struct reserve_ticket {
+ u64 bytes;
+ int error;
+ struct list_head list;
+ wait_queue_head_t wait;
+};
+
/**
* maybe_commit_transaction - possibly commit the transaction if its ok to
* @root - the root we're allocating for
@@ -4930,18 +4937,29 @@ skip_async:
* will return -ENOSPC.
*/
static int may_commit_transaction(struct btrfs_fs_info *fs_info,
- struct btrfs_space_info *space_info,
- u64 bytes, int force)
+ struct btrfs_space_info *space_info)
{
+ struct reserve_ticket *ticket = NULL;
struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
struct btrfs_trans_handle *trans;
+ u64 bytes;
trans = (struct btrfs_trans_handle *)current->journal_info;
if (trans)
return -EAGAIN;
- if (force)
- goto commit;
+ spin_lock(&space_info->lock);
+ if (!list_empty(&space_info->priority_tickets))
+ ticket = list_first_entry(&space_info->priority_tickets,
+ struct reserve_ticket, list);
+ else if (!list_empty(&space_info->tickets))
+ ticket = list_first_entry(&space_info->tickets,
+ struct reserve_ticket, list);
+ bytes = (ticket) ? ticket->bytes : 0;
+ spin_unlock(&space_info->lock);
+
+ if (!bytes)
+ return 0;
/* See if there is enough pinned space to make this reservation */
if (percpu_counter_compare(&space_info->total_bytes_pinned,
@@ -4956,8 +4974,12 @@ static int may_commit_transaction(struct
return -ENOSPC;
spin_lock(&delayed_rsv->lock);
+ if (delayed_rsv->size > bytes)
+ bytes = 0;
+ else
+ bytes -= delayed_rsv->size;
if (percpu_counter_compare(&space_info->total_bytes_pinned,
- bytes - delayed_rsv->size) < 0) {
+ bytes) < 0) {
spin_unlock(&delayed_rsv->lock);
return -ENOSPC;
}
@@ -4971,13 +4993,6 @@ commit:
return btrfs_commit_transaction(trans);
}
-struct reserve_ticket {
- u64 bytes;
- int error;
- struct list_head list;
- wait_queue_head_t wait;
-};
-
/*
* Try to flush some data based on policy set by @state. This is only advisory
* and may fail for various reasons. The caller is supposed to examine the
@@ -5027,8 +5042,7 @@ static void flush_space(struct btrfs_fs_
ret = 0;
break;
case COMMIT_TRANS:
- ret = may_commit_transaction(fs_info, space_info,
- num_bytes, 0);
+ ret = may_commit_transaction(fs_info, space_info);
break;
default:
ret = -ENOSPC;
Patches currently in stable-queue which might be from jbacik(a)fb.com are
queue-4.14/nbd-wait-uninterruptible-for-the-dead-timeout.patch
queue-4.14/btrfs-change-how-we-decide-to-commit-transactions-during-flushing.patch
queue-4.14/nbd-don-t-start-req-until-after-the-dead-connection-logic.patch
This is a note to let you know that I've just added the patch titled
Bluetooth: btqcomsmd: Add support for BD address setup
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bluetooth-btqcomsmd-add-support-for-bd-address-setup.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6e518111060c2290427d79c43d4add9600ad852b Mon Sep 17 00:00:00 2001
From: Loic Poulain <loic.poulain(a)linaro.org>
Date: Tue, 5 Sep 2017 12:26:03 +0200
Subject: Bluetooth: btqcomsmd: Add support for BD address setup
From: Loic Poulain <loic.poulain(a)linaro.org>
commit 6e518111060c2290427d79c43d4add9600ad852b upstream.
This patch implements the hdev setup function since wcnss-bt does not have
persistent memory to store an allocated BD address. The device is therefore
marked as unconfigured if no BD address has been previously retrieved.
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Signed-off-by: Marcel Holtmann <marcel(a)holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/bluetooth/btqcomsmd.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
--- a/drivers/bluetooth/btqcomsmd.c
+++ b/drivers/bluetooth/btqcomsmd.c
@@ -26,6 +26,7 @@
struct btqcomsmd {
struct hci_dev *hdev;
+ bdaddr_t bdaddr;
struct rpmsg_endpoint *acl_channel;
struct rpmsg_endpoint *cmd_channel;
};
@@ -100,6 +101,38 @@ static int btqcomsmd_close(struct hci_de
return 0;
}
+static int btqcomsmd_setup(struct hci_dev *hdev)
+{
+ struct btqcomsmd *btq = hci_get_drvdata(hdev);
+ struct sk_buff *skb;
+ int err;
+
+ skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
+ if (IS_ERR(skb))
+ return PTR_ERR(skb);
+ kfree_skb(skb);
+
+ /* Devices do not have persistent storage for BD address. If no
+ * BD address has been retrieved during probe, mark the device
+ * as having an invalid BD address.
+ */
+ if (!bacmp(&btq->bdaddr, BDADDR_ANY)) {
+ set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
+ return 0;
+ }
+
+ /* When setting a configured BD address fails, mark the device
+ * as having an invalid BD address.
+ */
+ err = qca_set_bdaddr_rome(hdev, &btq->bdaddr);
+ if (err) {
+ set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
+ return 0;
+ }
+
+ return 0;
+}
+
static int btqcomsmd_probe(struct platform_device *pdev)
{
struct btqcomsmd *btq;
@@ -135,6 +168,7 @@ static int btqcomsmd_probe(struct platfo
hdev->open = btqcomsmd_open;
hdev->close = btqcomsmd_close;
hdev->send = btqcomsmd_send;
+ hdev->setup = btqcomsmd_setup;
hdev->set_bdaddr = qca_set_bdaddr_rome;
ret = hci_register_dev(hdev);
Patches currently in stable-queue which might be from loic.poulain(a)linaro.org are
queue-4.14/bluetooth-btqcomsmd-add-support-for-bd-address-setup.patch
This is a note to let you know that I've just added the patch titled
block: Fix a race between blk_cleanup_queue() and timeout handling
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
block-fix-a-race-between-blk_cleanup_queue-and-timeout-handling.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4e9b6f20828ac880dbc1fa2fdbafae779473d1af Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche(a)wdc.com>
Date: Thu, 19 Oct 2017 10:00:48 -0700
Subject: block: Fix a race between blk_cleanup_queue() and timeout handling
From: Bart Van Assche <bart.vanassche(a)wdc.com>
commit 4e9b6f20828ac880dbc1fa2fdbafae779473d1af upstream.
Make sure that if the timeout timer fires after a queue has been
marked "dying" that the affected requests are finished.
Reported-by: chenxiang (M) <chenxiang66(a)hisilicon.com>
Fixes: commit 287922eb0b18 ("block: defer timeouts to a workqueue")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Tested-by: chenxiang (M) <chenxiang66(a)hisilicon.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Keith Busch <keith.busch(a)intel.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
block/blk-core.c | 2 ++
block/blk-timeout.c | 3 ---
2 files changed, 2 insertions(+), 3 deletions(-)
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -333,6 +333,7 @@ EXPORT_SYMBOL(blk_stop_queue);
void blk_sync_queue(struct request_queue *q)
{
del_timer_sync(&q->timeout);
+ cancel_work_sync(&q->timeout_work);
if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx;
@@ -844,6 +845,7 @@ struct request_queue *blk_alloc_queue_no
setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
+ INIT_WORK(&q->timeout_work, NULL);
INIT_LIST_HEAD(&q->queue_head);
INIT_LIST_HEAD(&q->timeout_list);
INIT_LIST_HEAD(&q->icq_list);
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -134,8 +134,6 @@ void blk_timeout_work(struct work_struct
struct request *rq, *tmp;
int next_set = 0;
- if (blk_queue_enter(q, true))
- return;
spin_lock_irqsave(q->queue_lock, flags);
list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
@@ -145,7 +143,6 @@ void blk_timeout_work(struct work_struct
mod_timer(&q->timeout, round_jiffies_up(next));
spin_unlock_irqrestore(q->queue_lock, flags);
- blk_queue_exit(q);
}
/**
Patches currently in stable-queue which might be from bart.vanassche(a)wdc.com are
queue-4.14/block-fix-a-race-between-blk_cleanup_queue-and-timeout-handling.patch
queue-4.14/scsi-qla2xxx-suppress-a-kernel-complaint-in-qla_init_base_qpair.patch
This is a note to let you know that I've just added the patch titled
bcache: only permit to recovery read error when cache device is clean
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d59b23795933678c9638fd20c942d2b4f3cd6185 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Mon, 30 Oct 2017 14:46:31 -0700
Subject: bcache: only permit to recovery read error when cache device is clean
From: Coly Li <colyli(a)suse.de>
commit d59b23795933678c9638fd20c942d2b4f3cd6185 upstream.
When bcache does read I/Os, for example in writeback or writethrough mode,
if a read request on cache device is failed, bcache will try to recovery
the request by reading from cached device. If the data on cached device is
not synced with cache device, then requester will get a stale data.
For critical storage system like database, providing stale data from
recovery may result an application level data corruption, which is
unacceptible.
With this patch, for a failed read request in writeback or writethrough
mode, recovery a recoverable read request only happens when cache device
is clean. That is to say, all data on cached device is up to update.
For other cache modes in bcache, read request will never hit
cached_dev_read_error(), they don't need this patch.
Please note, because cache mode can be switched arbitrarily in run time, a
writethrough mode might be switched from a writeback mode. Therefore
checking dc->has_data in writethrough mode still makes sense.
Changelog:
V4: Fix parens error pointed by Michael Lyle.
v3: By response from Kent Oversteet, he thinks recovering stale data is a
bug to fix, and option to permit it is unnecessary. So this version
the sysfs file is removed.
v2: rename sysfs entry from allow_stale_data_on_failure to
allow_stale_data_on_failure, and fix the confusing commit log.
v1: initial patch posted.
[small change to patch comment spelling by mlyle]
Signed-off-by: Coly Li <colyli(a)suse.de>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
Reported-by: Arne Wolf <awolf(a)lenovo.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Cc: Nix <nix(a)esperi.org.uk>
Cc: Kai Krakow <hurikhan77(a)gmail.com>
Cc: Eric Wheeler <bcache(a)lists.ewheeler.net>
Cc: Junhui Tang <tang.junhui(a)zte.com.cn>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/request.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -698,8 +698,16 @@ static void cached_dev_read_error(struct
{
struct search *s = container_of(cl, struct search, cl);
struct bio *bio = &s->bio.bio;
+ struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
- if (s->recoverable) {
+ /*
+ * If cache device is dirty (dc->has_dirty is non-zero), then
+ * recovery a failed read request from cached device may get a
+ * stale data back. So read failure recovery is only permitted
+ * when cache device is clean.
+ */
+ if (s->recoverable &&
+ (dc && !atomic_read(&dc->has_dirty))) {
/* Retry from the backing device: */
trace_bcache_read_retry(s->orig_bio);
Patches currently in stable-queue which might be from colyli(a)suse.de are
queue-4.14/bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
queue-4.14/raid1-prevent-freeze_array-wait_all_barriers-deadlock.patch
queue-4.14/bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
This is a note to let you know that I've just added the patch titled
bcache: check ca->alloc_thread initialized before wake up it
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 13 Oct 2017 16:35:29 -0700
Subject: bcache: check ca->alloc_thread initialized before wake up it
From: Coly Li <colyli(a)suse.de>
commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream.
In bcache code, sysfs entries are created before all resources get
allocated, e.g. allocation thread of a cache set.
There is posibility for NULL pointer deference if a resource is accessed
but which is not initialized yet. Indeed Jorg Bornschein catches one on
cache set allocation thread and gets a kernel oops.
The reason for this bug is, when bch_bucket_alloc() is called during
cache set registration and attaching, ca->alloc_thread is not properly
allocated and initialized yet, call wake_up_process() on ca->alloc_thread
triggers NULL pointer deference failure. A simple and fast fix is, before
waking up ca->alloc_thread, checking whether it is allocated, and only
wake up ca->alloc_thread when it is not NULL.
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-by: Jorg Bornschein <jb(a)capsec.org>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -407,7 +407,8 @@ long bch_bucket_alloc(struct cache *ca,
finish_wait(&ca->set->bucket_wait, &w);
out:
- wake_up_process(ca->alloc_thread);
+ if (ca->alloc_thread)
+ wake_up_process(ca->alloc_thread);
trace_bcache_alloc(ca, reserve);
Patches currently in stable-queue which might be from colyli(a)suse.de are
queue-4.14/bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
queue-4.14/raid1-prevent-freeze_array-wait_all_barriers-deadlock.patch
queue-4.14/bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
This is a note to let you know that I've just added the patch titled
autofs: don't fail mount for transient error
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
autofs-don-t-fail-mount-for-transient-error.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ecc0c469f27765ed1e2b967be0aa17cee1a60b76 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Fri, 17 Nov 2017 15:29:13 -0800
Subject: autofs: don't fail mount for transient error
From: NeilBrown <neilb(a)suse.com>
commit ecc0c469f27765ed1e2b967be0aa17cee1a60b76 upstream.
Currently if the autofs kernel module gets an error when writing to the
pipe which links to the daemon, then it marks the whole moutpoint as
catatonic, and it will stop working.
It is possible that the error is transient. This can happen if the
daemon is slow and more than 16 requests queue up. If a subsequent
process tries to queue a request, and is then signalled, the write to
the pipe will return -ERESTARTSYS and autofs will take that as total
failure.
So change the code to assess -ERESTARTSYS and -ENOMEM as transient
failures which only abort the current request, not the whole mountpoint.
It isn't a crash or a data corruption, but having autofs mountpoints
suddenly stop working is rather inconvenient.
Ian said:
: And given the problems with a half dozen (or so) user space applications
: consuming large amounts of CPU under heavy mount and umount activity this
: could happen more easily than we expect.
Link: http://lkml.kernel.org/r/87y3norvgp.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb(a)suse.com>
Acked-by: Ian Kent <raven(a)themaw.net>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/autofs4/waitq.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_s
spin_unlock_irqrestore(¤t->sighand->siglock, flags);
}
- return (bytes > 0);
+ /* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
+ return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
}
static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
@@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct
} pkt;
struct file *pipe = NULL;
size_t pktsz;
+ int ret;
pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
(unsigned long) wq->wait_queue_token,
@@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct
mutex_unlock(&sbi->wq_mutex);
if (autofs4_write(sbi, pipe, &pkt, pktsz))
+ switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
+ case 0:
+ break;
+ case -ENOMEM:
+ case -ERESTARTSYS:
+ /* Just fail this one */
+ autofs4_wait_release(sbi, wq->wait_queue_token, ret);
+ break;
+ default:
autofs4_catatonic_mode(sbi);
+ break;
+ }
fput(pipe);
}
Patches currently in stable-queue which might be from neilb(a)suse.com are
queue-4.14/md-fix-deadlock-error-in-recent-patch.patch
queue-4.14/autofs-don-t-fail-mount-for-transient-error.patch
queue-4.14/md-bitmap-revert-a-patch.patch
queue-4.14/nfs-revalidate-.-etc-correctly-on-open.patch
This is a note to let you know that I've just added the patch titled
ata: fixes kernel crash while tracing ata_eh_link_autopsy event
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ata-fixes-kernel-crash-while-tracing-ata_eh_link_autopsy-event.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f1601113ddc0339a745e702f4fb1ca37d4875e65 Mon Sep 17 00:00:00 2001
From: Rameshwar Prasad Sahu <rsahu(a)apm.com>
Date: Thu, 2 Nov 2017 16:31:07 +0530
Subject: ata: fixes kernel crash while tracing ata_eh_link_autopsy event
From: Rameshwar Prasad Sahu <rsahu(a)apm.com>
commit f1601113ddc0339a745e702f4fb1ca37d4875e65 upstream.
When tracing ata link error event, the kernel crashes when the disk is
removed due to NULL pointer access by trace_ata_eh_link_autopsy API.
This occurs as the dev is NULL when the disk disappeared. This patch
fixes this crash by calling trace_ata_eh_link_autopsy only if "dev"
is not NULL.
v2 changes:
Removed direct passing "link" pointer instead of "dev" in trace API.
Signed-off-by: Rameshwar Prasad Sahu <rsahu(a)apm.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Fixes: 255c03d15a29 ("libata: Add tracepoints")
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/ata/libata-eh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2264,8 +2264,8 @@ static void ata_eh_link_autopsy(struct a
if (dev->flags & ATA_DFLAG_DUBIOUS_XFER)
eflags |= ATA_EFLAG_DUBIOUS_XFER;
ehc->i.action |= ata_eh_speed_down(dev, eflags, all_err_mask);
+ trace_ata_eh_link_autopsy(dev, ehc->i.action, all_err_mask);
}
- trace_ata_eh_link_autopsy(dev, ehc->i.action, all_err_mask);
DPRINTK("EXIT\n");
}
Patches currently in stable-queue which might be from rsahu(a)apm.com are
queue-4.14/ata-fixes-kernel-crash-while-tracing-ata_eh_link_autopsy-event.patch
This is a note to let you know that I've just added the patch titled
ASoC: sun8i-codec: Set the BCLK divider
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-sun8i-codec-set-the-bclk-divider.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 316b7758c998fb13371d14bb6c9e45ab129c19a7 Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Date: Thu, 9 Nov 2017 10:39:24 +0100
Subject: ASoC: sun8i-codec: Set the BCLK divider
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
commit 316b7758c998fb13371d14bb6c9e45ab129c19a7 upstream.
While the current code was reporting to be able to work in master mode, it
failed to do so because the BCLK divider wasn't programmed, meaning that
the BCLK would run at the PLL's frequency no matter the sample rate.
It was obviously a bit too fast.
Add support to retrieve the divider to use, and set it. Since our PLL is
not always able to generate a perfect multiple of the sample rate, we'll
have to choose the closest divider that matches our setup.
Fixes: 36c684936fae ("ASoC: Add sun8i digital audio codec")
Reviewed-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/sunxi/sun8i-codec.c | 51 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
--- a/sound/soc/sunxi/sun8i-codec.c
+++ b/sound/soc/sunxi/sun8i-codec.c
@@ -73,6 +73,7 @@
#define SUN8I_SYS_SR_CTRL_AIF2_FS_MASK GENMASK(11, 8)
#define SUN8I_AIF1CLK_CTRL_AIF1_WORD_SIZ_MASK GENMASK(5, 4)
#define SUN8I_AIF1CLK_CTRL_AIF1_LRCK_DIV_MASK GENMASK(8, 6)
+#define SUN8I_AIF1CLK_CTRL_AIF1_BCLK_DIV_MASK GENMASK(12, 9)
struct sun8i_codec {
struct device *dev;
@@ -226,12 +227,57 @@ static int sun8i_set_fmt(struct snd_soc_
return 0;
}
+struct sun8i_codec_clk_div {
+ u8 div;
+ u8 val;
+};
+
+static const struct sun8i_codec_clk_div sun8i_codec_bclk_div[] = {
+ { .div = 1, .val = 0 },
+ { .div = 2, .val = 1 },
+ { .div = 4, .val = 2 },
+ { .div = 6, .val = 3 },
+ { .div = 8, .val = 4 },
+ { .div = 12, .val = 5 },
+ { .div = 16, .val = 6 },
+ { .div = 24, .val = 7 },
+ { .div = 32, .val = 8 },
+ { .div = 48, .val = 9 },
+ { .div = 64, .val = 10 },
+ { .div = 96, .val = 11 },
+ { .div = 128, .val = 12 },
+ { .div = 192, .val = 13 },
+};
+
+static u8 sun8i_codec_get_bclk_div(struct sun8i_codec *scodec,
+ unsigned int rate,
+ unsigned int word_size)
+{
+ unsigned long clk_rate = clk_get_rate(scodec->clk_module);
+ unsigned int div = clk_rate / rate / word_size / 2;
+ unsigned int best_val = 0, best_diff = ~0;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(sun8i_codec_bclk_div); i++) {
+ const struct sun8i_codec_clk_div *bdiv = &sun8i_codec_bclk_div[i];
+ unsigned int diff = abs(bdiv->div - div);
+
+ if (diff < best_diff) {
+ best_diff = diff;
+ best_val = bdiv->val;
+ }
+ }
+
+ return best_val;
+}
+
static int sun8i_codec_hw_params(struct snd_pcm_substream *substream,
struct snd_pcm_hw_params *params,
struct snd_soc_dai *dai)
{
struct sun8i_codec *scodec = snd_soc_codec_get_drvdata(dai->codec);
int sample_rate;
+ u8 bclk_div;
/*
* The CPU DAI handles only a sample of 16 bits. Configure the
@@ -241,6 +287,11 @@ static int sun8i_codec_hw_params(struct
SUN8I_AIF1CLK_CTRL_AIF1_WORD_SIZ_MASK,
SUN8I_AIF1CLK_CTRL_AIF1_WORD_SIZ_16);
+ bclk_div = sun8i_codec_get_bclk_div(scodec, params_rate(params), 16);
+ regmap_update_bits(scodec->regmap, SUN8I_AIF1CLK_CTRL,
+ SUN8I_AIF1CLK_CTRL_AIF1_BCLK_DIV_MASK,
+ bclk_div << SUN8I_AIF1CLK_CTRL_AIF1_BCLK_DIV);
+
regmap_update_bits(scodec->regmap, SUN8I_AIF1CLK_CTRL,
SUN8I_AIF1CLK_CTRL_AIF1_LRCK_DIV_MASK,
SUN8I_AIF1CLK_CTRL_AIF1_LRCK_DIV_16);
Patches currently in stable-queue which might be from maxime.ripard(a)free-electrons.com are
queue-4.14/asoc-sun8i-codec-set-the-bclk-divider.patch
queue-4.14/asoc-sun8i-codec-fix-left-and-right-channels-inversion.patch
queue-4.14/asoc-sun8i-codec-invert-master-slave-condition.patch
This is a note to let you know that I've just added the patch titled
ASoC: sun8i-codec: Invert Master / Slave condition
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-sun8i-codec-invert-master-slave-condition.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 560bfe774f058e97596f30ff71cffdac52b72914 Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Date: Wed, 8 Nov 2017 16:47:08 +0100
Subject: ASoC: sun8i-codec: Invert Master / Slave condition
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
commit 560bfe774f058e97596f30ff71cffdac52b72914 upstream.
The current code had the condition backward when checking if the codec
should be running in slave or master mode.
Fix it, and make the comment a bit more readable.
Fixes: 36c684936fae ("ASoC: Add sun8i digital audio codec")
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Reviewed-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/sunxi/sun8i-codec.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/sound/soc/sunxi/sun8i-codec.c
+++ b/sound/soc/sunxi/sun8i-codec.c
@@ -170,11 +170,11 @@ static int sun8i_set_fmt(struct snd_soc_
/* clock masters */
switch (fmt & SND_SOC_DAIFMT_MASTER_MASK) {
- case SND_SOC_DAIFMT_CBS_CFS: /* DAI Slave */
- value = 0x0; /* Codec Master */
+ case SND_SOC_DAIFMT_CBS_CFS: /* Codec slave, DAI master */
+ value = 0x1;
break;
- case SND_SOC_DAIFMT_CBM_CFM: /* DAI Master */
- value = 0x1; /* Codec Slave */
+ case SND_SOC_DAIFMT_CBM_CFM: /* Codec Master, DAI slave */
+ value = 0x0;
break;
default:
return -EINVAL;
Patches currently in stable-queue which might be from maxime.ripard(a)free-electrons.com are
queue-4.14/asoc-sun8i-codec-set-the-bclk-divider.patch
queue-4.14/asoc-sun8i-codec-fix-left-and-right-channels-inversion.patch
queue-4.14/asoc-sun8i-codec-invert-master-slave-condition.patch
This is a note to let you know that I've just added the patch titled
ASoC: sun8i-codec: Fix left and right channels inversion
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-sun8i-codec-fix-left-and-right-channels-inversion.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 18c1bf35c1c09bca05cf70bc984a4764e0b0372b Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Date: Wed, 8 Nov 2017 16:47:10 +0100
Subject: ASoC: sun8i-codec: Fix left and right channels inversion
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
commit 18c1bf35c1c09bca05cf70bc984a4764e0b0372b upstream.
Since its introduction, the codec had an inversion of the left and right
channels. It turned out to be pretty simple as it appears that the codec
doesn't have the same polarity on the LRCK signal than the I2S block.
Fix this by inverting our bit value for the LRCK inversion.
Fixes: 36c684936fae ("ASoC: Add sun8i digital audio codec")
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Reviewed-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/sunxi/sun8i-codec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/sunxi/sun8i-codec.c
+++ b/sound/soc/sunxi/sun8i-codec.c
@@ -199,7 +199,7 @@ static int sun8i_set_fmt(struct snd_soc_
value << SUN8I_AIF1CLK_CTRL_AIF1_BCLK_INV);
regmap_update_bits(scodec->regmap, SUN8I_AIF1CLK_CTRL,
BIT(SUN8I_AIF1CLK_CTRL_AIF1_LRCK_INV),
- value << SUN8I_AIF1CLK_CTRL_AIF1_LRCK_INV);
+ !value << SUN8I_AIF1CLK_CTRL_AIF1_LRCK_INV);
/* DAI format */
switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) {
Patches currently in stable-queue which might be from maxime.ripard(a)free-electrons.com are
queue-4.14/asoc-sun8i-codec-set-the-bclk-divider.patch
queue-4.14/asoc-sun8i-codec-fix-left-and-right-channels-inversion.patch
queue-4.14/asoc-sun8i-codec-invert-master-slave-condition.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Fix potential zero-division at parsing FU
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8428a8ebde2db1e988e41a58497a28beb7ce1705 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 17:07:43 +0100
Subject: ALSA: usb-audio: Fix potential zero-division at parsing FU
From: Takashi Iwai <tiwai(a)suse.de>
commit 8428a8ebde2db1e988e41a58497a28beb7ce1705 upstream.
parse_audio_feature_unit() contains a code dividing potentially with
zero when a malformed FU descriptor is passed. Although there is
already a sanity check, it checks only the value zero, hence it can
still lead to a zero-division when a value 1 is passed there.
Fix it by correcting the sanity check (and the error message
thereof).
Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/mixer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -1476,9 +1476,9 @@ static int parse_audio_feature_unit(stru
return -EINVAL;
}
csize = hdr->bControlSize;
- if (!csize) {
+ if (csize <= 1) {
usb_audio_dbg(state->chip,
- "unit %u: invalid bControlSize == 0\n",
+ "unit %u: invalid bControlSize <= 1\n",
unitid);
return -EINVAL;
}
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f658f17b5e0e339935dca23e77e0f3cad591926b Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 17:00:32 +0100
Subject: ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
From: Takashi Iwai <tiwai(a)suse.de>
commit f658f17b5e0e339935dca23e77e0f3cad591926b upstream.
The usb-audio driver may trigger an out-of-bound access at parsing a
malformed selector unit, as it checks the header length only after
evaluating bNrInPins field, which can be already above the given
length. Fix it by adding the length check beforehand.
Fixes: 99fc86450c43 ("ALSA: usb-mixer: parse descriptors with structs")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/mixer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -2098,7 +2098,8 @@ static int parse_audio_selector_unit(str
const struct usbmix_name_map *map;
char **namelist;
- if (!desc->bNrInPins || desc->bLength < 5 + desc->bNrInPins) {
+ if (desc->bLength < 5 || !desc->bNrInPins ||
+ desc->bLength < 5 + desc->bNrInPins) {
usb_audio_err(state->chip,
"invalid SELECTOR UNIT descriptor %d\n", unitid);
return -EINVAL;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Add sanity checks to FE parser
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d937cd6790a2bef2d07b500487646bd794c039bb Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 16:55:51 +0100
Subject: ALSA: usb-audio: Add sanity checks to FE parser
From: Takashi Iwai <tiwai(a)suse.de>
commit d937cd6790a2bef2d07b500487646bd794c039bb upstream.
When the usb-audio descriptor contains the malformed feature unit
description with a too short length, the driver may access
out-of-bounds. Add a sanity check of the header size at the beginning
of parse_audio_feature_unit().
Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Reported-by: Andrey Konovalov <andreyknvl(a)google.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/mixer.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -1469,6 +1469,12 @@ static int parse_audio_feature_unit(stru
__u8 *bmaControls;
if (state->mixer->protocol == UAC_VERSION_1) {
+ if (hdr->bLength < 7) {
+ usb_audio_err(state->chip,
+ "unit %u: invalid UAC_FEATURE_UNIT descriptor\n",
+ unitid);
+ return -EINVAL;
+ }
csize = hdr->bControlSize;
if (!csize) {
usb_audio_dbg(state->chip,
@@ -1486,6 +1492,12 @@ static int parse_audio_feature_unit(stru
}
} else {
struct uac2_feature_unit_descriptor *ftr = _ftr;
+ if (hdr->bLength < 6) {
+ usb_audio_err(state->chip,
+ "unit %u: invalid UAC_FEATURE_UNIT descriptor\n",
+ unitid);
+ return -EINVAL;
+ }
csize = 4;
channels = (hdr->bLength - 6) / 4 - 1;
bmaControls = ftr->bmaControls;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Add sanity checks in v2 clock parsers
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0a62d6c966956d77397c32836a5bbfe3af786fc1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 17:28:06 +0100
Subject: ALSA: usb-audio: Add sanity checks in v2 clock parsers
From: Takashi Iwai <tiwai(a)suse.de>
commit 0a62d6c966956d77397c32836a5bbfe3af786fc1 upstream.
The helper functions to parse and look for the clock source, selector
and multiplier unit may return the descriptor with a too short length
than required, while there is no sanity check in the caller side.
Add some sanity checks in the parsers, at least, to guarantee the
given descriptor size, for avoiding the potential crashes.
Fixes: 79f920fbff56 ("ALSA: usb-audio: parse clock topology of UAC2 devices")
Reported-by: Andrey Konovalov <andreyknvl(a)google.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/clock.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/sound/usb/clock.c
+++ b/sound/usb/clock.c
@@ -43,7 +43,7 @@ static struct uac_clock_source_descripto
while ((cs = snd_usb_find_csint_desc(ctrl_iface->extra,
ctrl_iface->extralen,
cs, UAC2_CLOCK_SOURCE))) {
- if (cs->bClockID == clock_id)
+ if (cs->bLength >= sizeof(*cs) && cs->bClockID == clock_id)
return cs;
}
@@ -59,8 +59,11 @@ static struct uac_clock_selector_descrip
while ((cs = snd_usb_find_csint_desc(ctrl_iface->extra,
ctrl_iface->extralen,
cs, UAC2_CLOCK_SELECTOR))) {
- if (cs->bClockID == clock_id)
+ if (cs->bLength >= sizeof(*cs) && cs->bClockID == clock_id) {
+ if (cs->bLength < 5 + cs->bNrInPins)
+ return NULL;
return cs;
+ }
}
return NULL;
@@ -75,7 +78,7 @@ static struct uac_clock_multiplier_descr
while ((cs = snd_usb_find_csint_desc(ctrl_iface->extra,
ctrl_iface->extralen,
cs, UAC2_CLOCK_MULTIPLIER))) {
- if (cs->bClockID == clock_id)
+ if (cs->bLength >= sizeof(*cs) && cs->bClockID == clock_id)
return cs;
}
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: timer: Remove kernel warning at compat ioctl error paths
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3d4e8303f2c747c8540a0a0126d0151514f6468b Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 16:36:11 +0100
Subject: ALSA: timer: Remove kernel warning at compat ioctl error paths
From: Takashi Iwai <tiwai(a)suse.de>
commit 3d4e8303f2c747c8540a0a0126d0151514f6468b upstream.
Some timer compat ioctls have NULL checks of timer instance with
snd_BUG_ON() that bring up WARN_ON() when the debug option is set.
Actually the condition can be met in the normal situation and it's
confusing and bad to spew kernel warnings with stack trace there.
Let's remove snd_BUG_ON() invocation and replace with the simple
checks. Also, correct the error code to EBADFD to follow the native
ioctl error handling.
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/core/timer_compat.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/sound/core/timer_compat.c
+++ b/sound/core/timer_compat.c
@@ -66,11 +66,11 @@ static int snd_timer_user_info_compat(st
struct snd_timer *t;
tu = file->private_data;
- if (snd_BUG_ON(!tu->timeri))
- return -ENXIO;
+ if (!tu->timeri)
+ return -EBADFD;
t = tu->timeri->timer;
- if (snd_BUG_ON(!t))
- return -ENXIO;
+ if (!t)
+ return -EBADFD;
memset(&info, 0, sizeof(info));
info.card = t->card ? t->card->number : -1;
if (t->hw.flags & SNDRV_TIMER_HW_SLAVE)
@@ -99,8 +99,8 @@ static int snd_timer_user_status_compat(
struct snd_timer_status32 status;
tu = file->private_data;
- if (snd_BUG_ON(!tu->timeri))
- return -ENXIO;
+ if (!tu->timeri)
+ return -EBADFD;
memset(&status, 0, sizeof(status));
status.tstamp.tv_sec = tu->tstamp.tv_sec;
status.tstamp.tv_nsec = tu->tstamp.tv_nsec;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: pcm: update tstamp only if audio_tstamp changed
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 Mon Sep 17 00:00:00 2001
From: Henrik Eriksson <henrik.eriksson(a)axis.com>
Date: Tue, 21 Nov 2017 09:29:28 +0100
Subject: ALSA: pcm: update tstamp only if audio_tstamp changed
From: Henrik Eriksson <henrik.eriksson(a)axis.com>
commit 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 upstream.
commit 3179f6200188 ("ALSA: core: add .get_time_info") had a side effect
of changing the behaviour of the PCM runtime tstamp. Prior to this
change tstamp was not updated by snd_pcm_update_hw_ptr0() unless the
hw_ptr had moved, after this change tstamp was always updated.
For an application using alsa-lib, doing snd_pcm_readi() followed by
snd_pcm_status() to estimate the age of the read samples by subtracting
status->avail * [sample rate] from status->tstamp this change degraded
the accuracy of the estimate on devices where the pcm hw does not
provide a granular hw_ptr, e.g., devices using
soc-generic-dmaengine-pcm.c and a dma-engine with residue_granularity
DMA_RESIDUE_GRANULARITY_DESCRIPTOR. The accuracy of the estimate
depended on the latency between the PCM hw completing a period and the
driver called snd_pcm_period_elapsed() to notify ALSA core, typically
determined by interrupt handling latency. After the change the accuracy
of the estimate depended on the latency between the PCM hw completing a
period and the application calling snd_pcm_status(), determined by the
scheduling of the application process. The maximum error of the
estimate is one period length in both cases, but the error average and
variance is smaller when it depends on interrupt latency.
Instead of always updating tstamp, update it only if audio_tstamp
changed.
Fixes: 3179f6200188 ("ALSA: core: add .get_time_info")
Suggested-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Signed-off-by: Henrik Eriksson <henrik.eriksson(a)axis.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/core/pcm_lib.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -248,8 +248,10 @@ static void update_audio_tstamp(struct s
runtime->rate);
*audio_tstamp = ns_to_timespec(audio_nsecs);
}
- runtime->status->audio_tstamp = *audio_tstamp;
- runtime->status->tstamp = *curr_tstamp;
+ if (!timespec_equal(&runtime->status->audio_tstamp, audio_tstamp)) {
+ runtime->status->audio_tstamp = *audio_tstamp;
+ runtime->status->tstamp = *curr_tstamp;
+ }
/*
* re-take a driver timestamp to let apps detect if the reference tstamp
Patches currently in stable-queue which might be from henrik.eriksson(a)axis.com are
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda/realtek - Fix ALC700 family no sound issue
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2d7fe6185722b0817bb345f62ab06b76a7b26542 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Wed, 22 Nov 2017 15:21:32 +0800
Subject: ALSA: hda/realtek - Fix ALC700 family no sound issue
From: Kailang Yang <kailang(a)realtek.com>
commit 2d7fe6185722b0817bb345f62ab06b76a7b26542 upstream.
It maybe the typo for ALC700 support patch.
To fix the bit value on this patch.
Fixes: 6fbae35a3170 ("ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6866,7 +6866,7 @@ static int patch_alc269(struct hda_codec
case 0x10ec0703:
spec->codec_variant = ALC269_TYPE_ALC700;
spec->gen.mixer_nid = 0; /* ALC700 does not have any loopback mixer path */
- alc_update_coef_idx(codec, 0x4a, 0, 1 << 15); /* Combo jack auto trigger control */
+ alc_update_coef_idx(codec, 0x4a, 1 << 15, 0); /* Combo jack auto trigger control */
break;
}
Patches currently in stable-queue which might be from kailang(a)realtek.com are
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda/realtek - Fix ALC275 no sound issue
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-realtek-fix-alc275-no-sound-issue.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3aabf94c2d95fe465d5fa8590113d1c1f7d8333d Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Wed, 8 Nov 2017 15:28:33 +0800
Subject: ALSA: hda/realtek - Fix ALC275 no sound issue
From: Kailang Yang <kailang(a)realtek.com>
commit 3aabf94c2d95fe465d5fa8590113d1c1f7d8333d upstream.
Sound works after a cold boot but not after a reboot from windows.
This patch will solve this issue. This is relation with Class-D power control.
[ The bug was reported in Bugzilla below for Sony VAIO SVS13A1C5E
-- tiwai]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197737
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 3 +++
1 file changed, 3 insertions(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -341,6 +341,9 @@ static void alc_fill_eapd_coef(struct hd
case 0x10ec0299:
alc_update_coef_idx(codec, 0x10, 1<<9, 0);
break;
+ case 0x10ec0275:
+ alc_update_coef_idx(codec, 0xe, 0, 1<<0);
+ break;
case 0x10ec0293:
alc_update_coef_idx(codec, 0xa, 1<<13, 0);
break;
Patches currently in stable-queue which might be from kailang(a)realtek.com are
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Wed, 22 Nov 2017 12:34:56 +0100
Subject: ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization
From: Takashi Iwai <tiwai(a)suse.de>
commit d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb upstream.
The previous fix for addressing the breakage in vmaster slave
initialization, commit a91d66129fb9 ("ALSA: hda - Fix incorrect TLV
callback check introduced during set_fs() removal"), introduced a new
helper to process over each slave kctl. However, this helper passes
only the original kctl, not the virtual slave kctl. As a result,
HD-audio driver (which is the only user so far) couldn't initialize
the slave correctly because it's trying to update the value directly
with the original kctl, not with the mapped kctl.
This patch fixes the situation again by passing both the mapped slaved
and original slave kctls to the function. Luckily there is a single
caller as of now, so changing the call signature is no big matter.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197959
Fixes: a91d66129fb9 ("ALSA: hda - Fix incorrect TLV callback check introduced during set_fs() removal")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/sound/control.h | 4 +++-
sound/core/vmaster.c | 6 ++++--
sound/pci/hda/hda_codec.c | 10 +++++++---
3 files changed, 14 insertions(+), 6 deletions(-)
--- a/include/sound/control.h
+++ b/include/sound/control.h
@@ -249,7 +249,9 @@ int snd_ctl_add_vmaster_hook(struct snd_
void snd_ctl_sync_vmaster(struct snd_kcontrol *kctl, bool hook_only);
#define snd_ctl_sync_vmaster_hook(kctl) snd_ctl_sync_vmaster(kctl, true)
int snd_ctl_apply_vmaster_slaves(struct snd_kcontrol *kctl,
- int (*func)(struct snd_kcontrol *, void *),
+ int (*func)(struct snd_kcontrol *vslave,
+ struct snd_kcontrol *slave,
+ void *arg),
void *arg);
/*
--- a/sound/core/vmaster.c
+++ b/sound/core/vmaster.c
@@ -495,7 +495,9 @@ EXPORT_SYMBOL_GPL(snd_ctl_sync_vmaster);
* Returns 0 if successful, or a negative error code.
*/
int snd_ctl_apply_vmaster_slaves(struct snd_kcontrol *kctl,
- int (*func)(struct snd_kcontrol *, void *),
+ int (*func)(struct snd_kcontrol *vslave,
+ struct snd_kcontrol *slave,
+ void *arg),
void *arg)
{
struct link_master *master;
@@ -507,7 +509,7 @@ int snd_ctl_apply_vmaster_slaves(struct
if (err < 0)
return err;
list_for_each_entry(slave, &master->slaves, list) {
- err = func(&slave->slave, arg);
+ err = func(slave->kctl, &slave->slave, arg);
if (err < 0)
return err;
}
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -1823,7 +1823,9 @@ struct slave_init_arg {
};
/* initialize the slave volume with 0dB via snd_ctl_apply_vmaster_slaves() */
-static int init_slave_0dB(struct snd_kcontrol *kctl, void *_arg)
+static int init_slave_0dB(struct snd_kcontrol *slave,
+ struct snd_kcontrol *kctl,
+ void *_arg)
{
struct slave_init_arg *arg = _arg;
int _tlv[4];
@@ -1860,7 +1862,7 @@ static int init_slave_0dB(struct snd_kco
arg->step = step;
val = -tlv[2] / step;
if (val > 0) {
- put_kctl_with_value(kctl, val);
+ put_kctl_with_value(slave, val);
return val;
}
@@ -1868,7 +1870,9 @@ static int init_slave_0dB(struct snd_kco
}
/* unmute the slave via snd_ctl_apply_vmaster_slaves() */
-static int init_slave_unmute(struct snd_kcontrol *slave, void *_arg)
+static int init_slave_unmute(struct snd_kcontrol *slave,
+ struct snd_kcontrol *kctl,
+ void *_arg)
{
return put_kctl_with_value(slave, 1);
}
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Fix too short HDMI/DP chmap reporting
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c2432466f583cb719b35a41e757da587d9ab1d00 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Fri, 17 Nov 2017 12:08:40 +0100
Subject: ALSA: hda: Fix too short HDMI/DP chmap reporting
From: Takashi Iwai <tiwai(a)suse.de>
commit c2432466f583cb719b35a41e757da587d9ab1d00 upstream.
We got a regression report about the HD-audio HDMI chmap, where some
surround channels are reported as UNKNOWN. The git bisection pointed
the culprit at the commit 9b3dc8aa3fb1 ("ALSA: hda - Register chmap
obj as priv data instead of codec"). The story behind scene is like
this:
- While moving the code out of the legacy HDA to the HDA common place,
the patch modifies the code to obtain the chmap array indirectly in
a byte array, and it expands it to kctl value array.
- At the latter operation, the size of the array is wrongly passed by
sizeof() to the pointer.
- It can be 4 on 32bit arch, thus too short for 6+ channels.
(And that's the reason why it didn't hit other persons; it's 8 on
64bit arch, thus it's usually enough.)
The code was further changed meanwhile, but the problem persisted.
Let's fix it by correctly evaluating the array size.
Fixes: 9b3dc8aa3fb1 ("ALSA: hda - Register chmap obj as priv data instead of codec")
Reported-by: VDR User <user.vdr(a)gmail.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/hda/hdmi_chmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/hda/hdmi_chmap.c
+++ b/sound/hda/hdmi_chmap.c
@@ -746,7 +746,7 @@ static int hdmi_chmap_ctl_get(struct snd
memset(pcm_chmap, 0, sizeof(pcm_chmap));
chmap->ops.get_chmap(chmap->hdac, pcm_idx, pcm_chmap);
- for (i = 0; i < sizeof(chmap); i++)
+ for (i = 0; i < ARRAY_SIZE(pcm_chmap); i++)
ucontrol->value.integer.value[i] = pcm_chmap[i];
return 0;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-4.14/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-4.14/alsa-hda-add-raven-pci-id.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-4.14/alsa-hda-fix-too-short-hdmi-dp-chmap-reporting.patch
queue-4.14/alsa-hda-realtek-fix-alc700-family-no-sound-issue.patch
queue-4.14/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-4.14/alsa-pcm-update-tstamp-only-if-audio_tstamp-changed.patch
queue-4.14/alsa-hda-fix-yet-remaining-issue-with-vmaster-0db-initialization.patch
queue-4.14/alsa-hda-realtek-fix-alc275-no-sound-issue.patch
queue-4.14/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
9p: Fix missing commas in mount options
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
9p-fix-missing-commas-in-mount-options.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 61b272c3aa170b3e461b8df636407b29f35f98eb Mon Sep 17 00:00:00 2001
From: Tuomas Tynkkynen <tuomas(a)tuxera.com>
Date: Sun, 19 Nov 2017 11:28:43 +0200
Subject: 9p: Fix missing commas in mount options
From: Tuomas Tynkkynen <tuomas(a)tuxera.com>
commit 61b272c3aa170b3e461b8df636407b29f35f98eb upstream.
Since commit c4fac9100456 ("9p: Implement show_options"), the mount
options of 9p filesystems are printed out with some missing commas
between the individual options:
p9-scratch on /mnt/scratch type 9p (rw,dirsync,loose,access=clienttrans=virtio)
Add them back.
Fixes: c4fac9100456 ("9p: Implement show_options")
Signed-off-by: Tuomas Tynkkynen <tuomas(a)tuxera.com>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/9p/client.c | 2 +-
net/9p/trans_fd.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -82,7 +82,7 @@ int p9_show_client_options(struct seq_fi
{
if (clnt->msize != 8192)
seq_printf(m, ",msize=%u", clnt->msize);
- seq_printf(m, "trans=%s", clnt->trans_mod->name);
+ seq_printf(m, ",trans=%s", clnt->trans_mod->name);
switch (clnt->proto_version) {
case p9_proto_legacy:
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -724,12 +724,12 @@ static int p9_fd_show_options(struct seq
{
if (clnt->trans_mod == &p9_tcp_trans) {
if (clnt->trans_opts.tcp.port != P9_PORT)
- seq_printf(m, "port=%u", clnt->trans_opts.tcp.port);
+ seq_printf(m, ",port=%u", clnt->trans_opts.tcp.port);
} else if (clnt->trans_mod == &p9_fd_trans) {
if (clnt->trans_opts.fd.rfd != ~0)
- seq_printf(m, "rfd=%u", clnt->trans_opts.fd.rfd);
+ seq_printf(m, ",rfd=%u", clnt->trans_opts.fd.rfd);
if (clnt->trans_opts.fd.wfd != ~0)
- seq_printf(m, "wfd=%u", clnt->trans_opts.fd.wfd);
+ seq_printf(m, ",wfd=%u", clnt->trans_opts.fd.wfd);
}
return 0;
}
Patches currently in stable-queue which might be from tuomas(a)tuxera.com are
queue-4.14/net-9p-switch-to-wait_event_killable.patch
queue-4.14/9p-fix-missing-commas-in-mount-options.patch
queue-4.14/fs-9p-compare-qid.path-in-v9fs_test_inode.patch
This is a note to let you know that I've just added the patch titled
nilfs2: fix race condition that causes file system corruption
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nilfs2-fix-race-condition-that-causes-file-system-corruption.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 Mon Sep 17 00:00:00 2001
From: Andreas Rohner <andreas.rohner(a)gmx.net>
Date: Fri, 17 Nov 2017 15:29:35 -0800
Subject: nilfs2: fix race condition that causes file system corruption
From: Andreas Rohner <andreas.rohner(a)gmx.net>
commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream.
There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().
When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode. It calls __nilfs_mark_inode_dirty() in a
separate transaction. __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.
After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.
Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field.
If there is a cached buffer_head in i_bh it is not marked as dirty
again.
Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in-between the two. If this happens the inode is not
on the ns_dirty_files list, but its ifile block is still marked as dirty
and written out.
In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree. Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.
As a result the bmap update can be lost, which leads to file system
corruption. Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.
The error can remain undetected for a long time. A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.
This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.
Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@l…
Signed-off-by: Andreas Rohner <andreas.rohner(a)gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner(a)gmx.net>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/nilfs2/segment.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1884,8 +1884,6 @@ static int nilfs_segctor_collect_dirty_f
"failed to get inode block.\n");
return err;
}
- mark_buffer_dirty(ibh);
- nilfs_mdt_mark_dirty(ifile);
spin_lock(&nilfs->ns_inode_lock);
if (likely(!ii->i_bh))
ii->i_bh = ibh;
@@ -1894,6 +1892,10 @@ static int nilfs_segctor_collect_dirty_f
goto retry;
}
+ // Always redirty the buffer to avoid race condition
+ mark_buffer_dirty(ii->i_bh);
+ nilfs_mdt_mark_dirty(ifile);
+
clear_bit(NILFS_I_QUEUED, &ii->i_state);
set_bit(NILFS_I_BUSY, &ii->i_state);
list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
Patches currently in stable-queue which might be from andreas.rohner(a)gmx.net are
queue-3.18/nilfs2-fix-race-condition-that-causes-file-system-corruption.patch
This is a note to let you know that I've just added the patch titled
nfsd: deal with revoked delegations appropriately
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nfsd-deal-with-revoked-delegations-appropriately.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 95da1b3a5aded124dd1bda1e3cdb876184813140 Mon Sep 17 00:00:00 2001
From: Andrew Elble <aweits(a)rit.edu>
Date: Fri, 3 Nov 2017 14:06:31 -0400
Subject: nfsd: deal with revoked delegations appropriately
From: Andrew Elble <aweits(a)rit.edu>
commit 95da1b3a5aded124dd1bda1e3cdb876184813140 upstream.
If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1
case, and NFS4ERR_BAD_STATEID otherwise.
The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply. That can cause the Linux 4.1 client to loop in its
stage manager.
Signed-off-by: Andrew Elble <aweits(a)rit.edu>
Reviewed-by: Trond Myklebust <trond.myklebust(a)primarydata.com>
Signed-off-by: J. Bruce Fields <bfields(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/nfsd/nfs4state.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3602,7 +3602,8 @@ static struct nfs4_delegation *find_dele
{
struct nfs4_stid *ret;
- ret = find_stateid_by_type(cl, s, NFS4_DELEG_STID);
+ ret = find_stateid_by_type(cl, s,
+ NFS4_DELEG_STID|NFS4_REVOKED_DELEG_STID);
if (!ret)
return NULL;
return delegstateid(ret);
@@ -3625,6 +3626,12 @@ nfs4_check_deleg(struct nfs4_client *cl,
deleg = find_deleg_stateid(cl, &open->op_delegate_stateid);
if (deleg == NULL)
goto out;
+ if (deleg->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID) {
+ nfs4_put_stid(&deleg->dl_stid);
+ if (cl->cl_minorversion)
+ status = nfserr_deleg_revoked;
+ goto out;
+ }
flags = share_access_to_flags(open->op_share_access);
status = nfs4_check_delegmode(deleg, flags);
if (status) {
@@ -4451,6 +4458,16 @@ nfsd4_lookup_stateid(struct nfsd4_compou
struct nfs4_stid **s, struct nfsd_net *nn)
{
__be32 status;
+ bool return_revoked = false;
+
+ /*
+ * only return revoked delegations if explicitly asked.
+ * otherwise we report revoked or bad_stateid status.
+ */
+ if (typemask & NFS4_REVOKED_DELEG_STID)
+ return_revoked = true;
+ else if (typemask & NFS4_DELEG_STID)
+ typemask |= NFS4_REVOKED_DELEG_STID;
if (ZERO_STATEID(stateid) || ONE_STATEID(stateid))
return nfserr_bad_stateid;
@@ -4465,6 +4482,12 @@ nfsd4_lookup_stateid(struct nfsd4_compou
*s = find_stateid_by_type(cstate->clp, stateid, typemask);
if (!*s)
return nfserr_bad_stateid;
+ if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) {
+ nfs4_put_stid(*s);
+ if (cstate->minorversion)
+ return nfserr_deleg_revoked;
+ return nfserr_bad_stateid;
+ }
return nfs_ok;
}
Patches currently in stable-queue which might be from aweits(a)rit.edu are
queue-3.18/nfsd-deal-with-revoked-delegations-appropriately.patch
This is a note to let you know that I've just added the patch titled
nfs: Fix ugly referral attributes
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nfs-fix-ugly-referral-attributes.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c05cefcc72416a37eba5a2b35f0704ed758a9145 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Sun, 5 Nov 2017 15:45:22 -0500
Subject: nfs: Fix ugly referral attributes
From: Chuck Lever <chuck.lever(a)oracle.com>
commit c05cefcc72416a37eba5a2b35f0704ed758a9145 upstream.
Before traversing a referral and performing a mount, the mounted-on
directory looks strange:
dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31 1969 dir.0
nfs4_get_referral is wiping out any cached attributes with what was
returned via GETATTR(fs_locations), but the bit mask for that
operation does not request any file attributes.
Retrieve owner and timestamp information so that the memcpy in
nfs4_get_referral fills in more attributes.
Changes since v1:
- Don't request attributes that the client unconditionally replaces
- Request only MOUNTED_ON_FILEID or FILEID attribute, not both
- encode_fs_locations() doesn't use the third bitmask word
Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral")
Suggested-by: Pradeep Thomas <pradeepthomas(a)gmail.com>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/nfs/nfs4proc.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -243,15 +243,12 @@ const u32 nfs4_fsinfo_bitmap[3] = { FATT
};
const u32 nfs4_fs_locations_bitmap[3] = {
- FATTR4_WORD0_TYPE
- | FATTR4_WORD0_CHANGE
+ FATTR4_WORD0_CHANGE
| FATTR4_WORD0_SIZE
| FATTR4_WORD0_FSID
| FATTR4_WORD0_FILEID
| FATTR4_WORD0_FS_LOCATIONS,
- FATTR4_WORD1_MODE
- | FATTR4_WORD1_NUMLINKS
- | FATTR4_WORD1_OWNER
+ FATTR4_WORD1_OWNER
| FATTR4_WORD1_OWNER_GROUP
| FATTR4_WORD1_RAWDEV
| FATTR4_WORD1_SPACE_USED
@@ -6143,9 +6140,7 @@ static int _nfs4_proc_fs_locations(struc
struct page *page)
{
struct nfs_server *server = NFS_SERVER(dir);
- u32 bitmask[3] = {
- [0] = FATTR4_WORD0_FSID | FATTR4_WORD0_FS_LOCATIONS,
- };
+ u32 bitmask[3];
struct nfs4_fs_locations_arg args = {
.dir_fh = NFS_FH(dir),
.name = name,
@@ -6164,12 +6159,15 @@ static int _nfs4_proc_fs_locations(struc
dprintk("%s: start\n", __func__);
+ bitmask[0] = nfs4_fattr_bitmap[0] | FATTR4_WORD0_FS_LOCATIONS;
+ bitmask[1] = nfs4_fattr_bitmap[1];
+
/* Ask for the fileid of the absent filesystem if mounted_on_fileid
* is not supported */
if (NFS_SERVER(dir)->attr_bitmask[1] & FATTR4_WORD1_MOUNTED_ON_FILEID)
- bitmask[1] |= FATTR4_WORD1_MOUNTED_ON_FILEID;
+ bitmask[0] &= ~FATTR4_WORD0_FILEID;
else
- bitmask[0] |= FATTR4_WORD0_FILEID;
+ bitmask[1] &= ~FATTR4_WORD1_MOUNTED_ON_FILEID;
nfs_fattr_init(&fs_locations->fattr);
fs_locations->server = server;
Patches currently in stable-queue which might be from chuck.lever(a)oracle.com are
queue-3.18/nfs-fix-ugly-referral-attributes.patch
This is a note to let you know that I've just added the patch titled
NFS: Fix typo in nomigration mount option
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nfs-fix-typo-in-nomigration-mount-option.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f02fee227e5f21981152850744a6084ff3fa94ee Mon Sep 17 00:00:00 2001
From: Joshua Watt <jpewhacker(a)gmail.com>
Date: Tue, 7 Nov 2017 16:25:47 -0600
Subject: NFS: Fix typo in nomigration mount option
From: Joshua Watt <jpewhacker(a)gmail.com>
commit f02fee227e5f21981152850744a6084ff3fa94ee upstream.
The option was incorrectly masking off all other options.
Signed-off-by: Joshua Watt <JPEWhacker(a)gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/nfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1321,7 +1321,7 @@ static int nfs_parse_mount_options(char
mnt->options |= NFS_OPTION_MIGRATION;
break;
case Opt_nomigration:
- mnt->options &= NFS_OPTION_MIGRATION;
+ mnt->options &= ~NFS_OPTION_MIGRATION;
break;
/*
Patches currently in stable-queue which might be from jpewhacker(a)gmail.com are
queue-3.18/nfs-fix-typo-in-nomigration-mount-option.patch
This is a note to let you know that I've just added the patch titled
MIPS: Fix an n32 core file generation regset support regression
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-fix-an-n32-core-file-generation-regset-support-regression.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 547da673173de51f73887377eb275304775064ad Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)mips.com>
Date: Tue, 7 Nov 2017 19:09:20 +0000
Subject: MIPS: Fix an n32 core file generation regset support regression
From: Maciej W. Rozycki <macro(a)mips.com>
commit 547da673173de51f73887377eb275304775064ad upstream.
Fix a commit 7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
regression, then activated by commit 6a9c001b7ec3 ("MIPS: Switch ELF
core dumper to use regsets.)", that caused n32 processes to dump o32
core files by failing to set the EF_MIPS_ABI2 flag in the ELF core file
header's `e_flags' member:
$ file tls-core
tls-core: ELF 32-bit MSB executable, MIPS, N32 MIPS64 rel2 version 1 (SYSV), [...]
$ ./tls-core
Aborted (core dumped)
$ file core
core: ELF 32-bit MSB core file MIPS, MIPS-I version 1 (SYSV), SVR4-style
$
Previously the flag was set as the result of a:
statement placed in arch/mips/kernel/binfmt_elfn32.c, however in the
regset case, i.e. when CORE_DUMP_USE_REGSET is set, ELF_CORE_EFLAGS is
no longer used by `fill_note_info' in fs/binfmt_elf.c, and instead the
`->e_flags' member of the regset view chosen is. We have the views
defined in arch/mips/kernel/ptrace.c, however only an o32 and an n64
one, and the latter is used for n32 as well. Consequently an o32 core
file is incorrectly dumped from n32 processes (the ELF32 vs ELF64 class
is chosen elsewhere, and the 32-bit one is correctly selected for n32).
Correct the issue then by defining an n32 regset view and using it as
appropriate. Issue discovered in GDB testing.
Fixes: 7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
Signed-off-by: Maciej W. Rozycki <macro(a)mips.com>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Djordje Todorovic <djordje.todorovic(a)rt-rk.com>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17617/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/kernel/ptrace.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -522,6 +522,19 @@ static const struct user_regset_view use
.n = ARRAY_SIZE(mips64_regsets),
};
+#ifdef CONFIG_MIPS32_N32
+
+static const struct user_regset_view user_mipsn32_view = {
+ .name = "mipsn32",
+ .e_flags = EF_MIPS_ABI2,
+ .e_machine = ELF_ARCH,
+ .ei_osabi = ELF_OSABI,
+ .regsets = mips64_regsets,
+ .n = ARRAY_SIZE(mips64_regsets),
+};
+
+#endif /* CONFIG_MIPS32_N32 */
+
#endif /* CONFIG_64BIT */
const struct user_regset_view *task_user_regset_view(struct task_struct *task)
@@ -533,6 +546,10 @@ const struct user_regset_view *task_user
if (test_tsk_thread_flag(task, TIF_32BIT_REGS))
return &user_mips_view;
#endif
+#ifdef CONFIG_MIPS32_N32
+ if (test_tsk_thread_flag(task, TIF_32BIT_ADDR))
+ return &user_mipsn32_view;
+#endif
return &user_mips64_view;
#endif
}
Patches currently in stable-queue which might be from macro(a)mips.com are
queue-3.18/mips-fix-an-n32-core-file-generation-regset-support-regression.patch
This is a note to let you know that I've just added the patch titled
MIPS: BCM47XX: Fix LED inversion for WRT54GSv1
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-bcm47xx-fix-led-inversion-for-wrt54gsv1.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 56a46acf62af5ba44fca2f3f1c7c25a2d5385b19 Mon Sep 17 00:00:00 2001
From: Mirko Parthey <mirko.parthey(a)web.de>
Date: Thu, 18 May 2017 21:30:03 +0200
Subject: MIPS: BCM47XX: Fix LED inversion for WRT54GSv1
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Mirko Parthey <mirko.parthey(a)web.de>
commit 56a46acf62af5ba44fca2f3f1c7c25a2d5385b19 upstream.
The WLAN LED on the Linksys WRT54GSv1 is active low, but the software
treats it as active high. Fix the inverted logic.
Fixes: 7bb26b169116 ("MIPS: BCM47xx: Fix LEDs on WRT54GS V1.0")
Signed-off-by: Mirko Parthey <mirko.parthey(a)web.de>
Looks-ok-by: Rafał Miłecki <zajec5(a)gmail.com>
Cc: Hauke Mehrtens <hauke(a)hauke-m.de>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16071/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/bcm47xx/leds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/bcm47xx/leds.c
+++ b/arch/mips/bcm47xx/leds.c
@@ -323,7 +323,7 @@ bcm47xx_leds_linksys_wrt54g3gv2[] __init
/* Verified on: WRT54GS V1.0 */
static const struct gpio_led
bcm47xx_leds_linksys_wrt54g_type_0101[] __initconst = {
- BCM47XX_GPIO_LED(0, "green", "wlan", 0, LEDS_GPIO_DEFSTATE_OFF),
+ BCM47XX_GPIO_LED(0, "green", "wlan", 1, LEDS_GPIO_DEFSTATE_OFF),
BCM47XX_GPIO_LED(1, "green", "power", 0, LEDS_GPIO_DEFSTATE_ON),
BCM47XX_GPIO_LED(7, "green", "dmz", 1, LEDS_GPIO_DEFSTATE_OFF),
};
Patches currently in stable-queue which might be from mirko.parthey(a)web.de are
queue-3.18/mips-bcm47xx-fix-led-inversion-for-wrt54gsv1.patch
This is a note to let you know that I've just added the patch titled
isofs: fix timestamps beyond 2027
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
isofs-fix-timestamps-beyond-2027.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 34be4dbf87fc3e474a842305394534216d428f5d Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Thu, 19 Oct 2017 16:47:48 +0200
Subject: isofs: fix timestamps beyond 2027
From: Arnd Bergmann <arnd(a)arndb.de>
commit 34be4dbf87fc3e474a842305394534216d428f5d upstream.
isofs uses a 'char' variable to load the number of years since
1900 for an inode timestamp. On architectures that use a signed
char type by default, this results in an invalid date for
anything beyond 2027.
This changes the function argument to a 'u8' array, which
is defined the same way on all architectures, and unambiguously
lets us use years until 2155.
This should be backported to all kernels that might still be
in use by that date.
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/isofs/isofs.h | 2 +-
fs/isofs/rock.h | 2 +-
fs/isofs/util.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -103,7 +103,7 @@ static inline unsigned int isonum_733(ch
/* Ignore bigendian datum due to broken mastering programs */
return get_unaligned_le32(p);
}
-extern int iso_date(char *, int);
+extern int iso_date(u8 *, int);
struct inode; /* To make gcc happy */
--- a/fs/isofs/rock.h
+++ b/fs/isofs/rock.h
@@ -65,7 +65,7 @@ struct RR_PL_s {
};
struct stamp {
- char time[7];
+ __u8 time[7]; /* actually 6 unsigned, 1 signed */
} __attribute__ ((packed));
struct RR_TF_s {
--- a/fs/isofs/util.c
+++ b/fs/isofs/util.c
@@ -14,7 +14,7 @@
* to GMT. Thus we should always be correct.
*/
-int iso_date(char * p, int flag)
+int iso_date(u8 *p, int flag)
{
int year, month, day, hour, minute, second, tz;
int crtime, days, i;
Patches currently in stable-queue which might be from arnd(a)arndb.de are
queue-3.18/isofs-fix-timestamps-beyond-2027.patch
This is a note to let you know that I've just added the patch titled
iscsi-target: Fix non-immediate TMR reference leak
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iscsi-target-fix-non-immediate-tmr-reference-leak.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3fc9fb13a4b2576aeab86c62fd64eb29ab68659c Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 20:52:56 -0700
Subject: iscsi-target: Fix non-immediate TMR reference leak
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
commit 3fc9fb13a4b2576aeab86c62fd64eb29ab68659c upstream.
This patch fixes a se_cmd->cmd_kref reference leak that can
occur when a non immediate TMR is proceeded our of command
sequence number order, and CMDSN_LOWER_THAN_EXP is returned
by iscsit_sequence_cmd().
To address this bug, call target_put_sess_cmd() during this
special case following what iscsit_process_scsi_cmd() does
upon CMDSN_LOWER_THAN_EXP.
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/target/iscsi/iscsi_target.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1915,12 +1915,14 @@ attach:
if (!(hdr->opcode & ISCSI_OP_IMMEDIATE)) {
int cmdsn_ret = iscsit_sequence_cmd(conn, cmd, buf, hdr->cmdsn);
- if (cmdsn_ret == CMDSN_HIGHER_THAN_EXP)
+ if (cmdsn_ret == CMDSN_HIGHER_THAN_EXP) {
out_of_order_cmdsn = 1;
- else if (cmdsn_ret == CMDSN_LOWER_THAN_EXP)
+ } else if (cmdsn_ret == CMDSN_LOWER_THAN_EXP) {
+ target_put_sess_cmd(&cmd->se_cmd);
return 0;
- else if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER)
+ } else if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) {
return -1;
+ }
}
iscsit_ack_from_expstatsn(conn, be32_to_cpu(hdr->exp_statsn));
Patches currently in stable-queue which might be from nab(a)linux-iscsi.org are
queue-3.18/iscsi-target-fix-non-immediate-tmr-reference-leak.patch
This is a note to let you know that I've just added the patch titled
fs/9p: Compare qid.path in v9fs_test_inode
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fs-9p-compare-qid.path-in-v9fs_test_inode.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8ee031631546cf2f7859cc69593bd60bbdd70b46 Mon Sep 17 00:00:00 2001
From: Tuomas Tynkkynen <tuomas(a)tuxera.com>
Date: Wed, 6 Sep 2017 17:59:07 +0300
Subject: fs/9p: Compare qid.path in v9fs_test_inode
From: Tuomas Tynkkynen <tuomas(a)tuxera.com>
commit 8ee031631546cf2f7859cc69593bd60bbdd70b46 upstream.
Commit fd2421f54423 ("fs/9p: When doing inode lookup compare qid details
and inode mode bits.") transformed v9fs_qid_iget() to use iget5_locked()
instead of iget_locked(). However, the test() callback is not checking
fid.path at all, which means that a lookup in the inode cache can now
accidentally locate a completely wrong inode from the same inode hash
bucket if the other fields (qid.type and qid.version) match.
Fixes: fd2421f54423 ("fs/9p: When doing inode lookup compare qid details and inode mode bits.")
Reviewed-by: Latchesar Ionkov <lucho(a)ionkov.net>
Signed-off-by: Tuomas Tynkkynen <tuomas(a)tuxera.com>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/9p/vfs_inode.c | 3 +++
fs/9p/vfs_inode_dotl.c | 3 +++
2 files changed, 6 insertions(+)
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -483,6 +483,9 @@ static int v9fs_test_inode(struct inode
if (v9inode->qid.type != st->qid.type)
return 0;
+
+ if (v9inode->qid.path != st->qid.path)
+ return 0;
return 1;
}
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -87,6 +87,9 @@ static int v9fs_test_inode_dotl(struct i
if (v9inode->qid.type != st->qid.type)
return 0;
+
+ if (v9inode->qid.path != st->qid.path)
+ return 0;
return 1;
}
Patches currently in stable-queue which might be from tuomas(a)tuxera.com are
queue-3.18/fs-9p-compare-qid.path-in-v9fs_test_inode.patch
This is a note to let you know that I've just added the patch titled
ext4: fix interaction between i_size, fallocate, and delalloc after a crash
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ext4-fix-interaction-between-i_size-fallocate-and-delalloc-after-a-crash.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 6 Oct 2017 23:09:55 -0400
Subject: ext4: fix interaction between i_size, fallocate, and delalloc after a crash
From: Theodore Ts'o <tytso(a)mit.edu>
commit 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 upstream.
If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).
If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:
Inode 12, end of extent exceeds allowed value
(logical block 33, physical block 33441, len 7)
This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size. Since this situation is
quite rare, and the window in which the crash must take place is
typically < 30 seconds, in practice this condition will rarely happen.
Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Reported-by: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/ext4/extents.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4807,7 +4807,8 @@ static long ext4_zero_range(struct file
}
if (!(mode & FALLOC_FL_KEEP_SIZE) &&
- offset + len > i_size_read(inode)) {
+ (offset + len > i_size_read(inode) ||
+ offset + len > EXT4_I(inode)->i_disksize)) {
new_size = offset + len;
ret = inode_newsize_ok(inode, new_size);
if (ret)
@@ -4951,7 +4952,8 @@ long ext4_fallocate(struct file *file, i
}
if (!(mode & FALLOC_FL_KEEP_SIZE) &&
- offset + len > i_size_read(inode)) {
+ (offset + len > i_size_read(inode) ||
+ offset + len > EXT4_I(inode)->i_disksize)) {
new_size = offset + len;
ret = inode_newsize_ok(inode, new_size);
if (ret)
Patches currently in stable-queue which might be from tytso(a)mit.edu are
queue-3.18/ext4-fix-interaction-between-i_size-fallocate-and-delalloc-after-a-crash.patch
This is a note to let you know that I've just added the patch titled
eCryptfs: use after free in ecryptfs_release_messaging()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ecryptfs-use-after-free-in-ecryptfs_release_messaging.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From db86be3a12d0b6e5c5b51c2ab2a48f06329cb590 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Tue, 22 Aug 2017 23:41:28 +0300
Subject: eCryptfs: use after free in ecryptfs_release_messaging()
From: Dan Carpenter <dan.carpenter(a)oracle.com>
commit db86be3a12d0b6e5c5b51c2ab2a48f06329cb590 upstream.
We're freeing the list iterator so we should be using the _safe()
version of hlist_for_each_entry().
Fixes: 88b4a07e6610 ("[PATCH] eCryptfs: Public key transport mechanism")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Tyler Hicks <tyhicks(a)canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/ecryptfs/messaging.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/ecryptfs/messaging.c
+++ b/fs/ecryptfs/messaging.c
@@ -442,15 +442,16 @@ void ecryptfs_release_messaging(void)
}
if (ecryptfs_daemon_hash) {
struct ecryptfs_daemon *daemon;
+ struct hlist_node *n;
int i;
mutex_lock(&ecryptfs_daemon_hash_mux);
for (i = 0; i < (1 << ecryptfs_hash_bits); i++) {
int rc;
- hlist_for_each_entry(daemon,
- &ecryptfs_daemon_hash[i],
- euid_chain) {
+ hlist_for_each_entry_safe(daemon, n,
+ &ecryptfs_daemon_hash[i],
+ euid_chain) {
rc = ecryptfs_exorcise_daemon(daemon);
if (rc)
printk(KERN_ERR "%s: Error whilst "
Patches currently in stable-queue which might be from dan.carpenter(a)oracle.com are
queue-3.18/ecryptfs-use-after-free-in-ecryptfs_release_messaging.patch
This is a note to let you know that I've just added the patch titled
dm: fix race between dm_get_from_kobject() and __dm_destroy()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-fix-race-between-dm_get_from_kobject-and-__dm_destroy.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b9a41d21dceadf8104812626ef85dc56ee8a60ed Mon Sep 17 00:00:00 2001
From: Hou Tao <houtao1(a)huawei.com>
Date: Wed, 1 Nov 2017 15:42:36 +0800
Subject: dm: fix race between dm_get_from_kobject() and __dm_destroy()
From: Hou Tao <houtao1(a)huawei.com>
commit b9a41d21dceadf8104812626ef85dc56ee8a60ed upstream.
The following BUG_ON was hit when testing repeat creation and removal of
DM devices:
kernel BUG at drivers/md/dm.c:2919!
CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
Call Trace:
[<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
[<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
[<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
[<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
[<ffffffff811de257>] kernfs_seq_show+0x23/0x25
[<ffffffff81199118>] seq_read+0x16f/0x325
[<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
[<ffffffff8117b625>] __vfs_read+0x26/0x9d
[<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
[<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
[<ffffffff8117be9d>] vfs_read+0x8f/0xcf
[<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
[<ffffffff8117c686>] SyS_read+0x4b/0x76
[<ffffffff817b606e>] system_call_fastpath+0x12/0x71
The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING & DMF_DELETING and dm_get() in
dm_get_from_kobject().
To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.
The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invoke it after increasing
md->open_count.
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3033,11 +3033,15 @@ struct mapped_device *dm_get_from_kobjec
md = container_of(kobj, struct mapped_device, kobj_holder.kobj);
- if (test_bit(DMF_FREEING, &md->flags) ||
- dm_deleting_md(md))
- return NULL;
-
+ spin_lock(&_minor_lock);
+ if (test_bit(DMF_FREEING, &md->flags) || dm_deleting_md(md)) {
+ md = NULL;
+ goto out;
+ }
dm_get(md);
+out:
+ spin_unlock(&_minor_lock);
+
return md;
}
Patches currently in stable-queue which might be from houtao1(a)huawei.com are
queue-3.18/dm-fix-race-between-dm_get_from_kobject-and-__dm_destroy.patch
This is a note to let you know that I've just added the patch titled
bcache: only permit to recovery read error when cache device is clean
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d59b23795933678c9638fd20c942d2b4f3cd6185 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Mon, 30 Oct 2017 14:46:31 -0700
Subject: bcache: only permit to recovery read error when cache device is clean
From: Coly Li <colyli(a)suse.de>
commit d59b23795933678c9638fd20c942d2b4f3cd6185 upstream.
When bcache does read I/Os, for example in writeback or writethrough mode,
if a read request on cache device is failed, bcache will try to recovery
the request by reading from cached device. If the data on cached device is
not synced with cache device, then requester will get a stale data.
For critical storage system like database, providing stale data from
recovery may result an application level data corruption, which is
unacceptible.
With this patch, for a failed read request in writeback or writethrough
mode, recovery a recoverable read request only happens when cache device
is clean. That is to say, all data on cached device is up to update.
For other cache modes in bcache, read request will never hit
cached_dev_read_error(), they don't need this patch.
Please note, because cache mode can be switched arbitrarily in run time, a
writethrough mode might be switched from a writeback mode. Therefore
checking dc->has_data in writethrough mode still makes sense.
Changelog:
V4: Fix parens error pointed by Michael Lyle.
v3: By response from Kent Oversteet, he thinks recovering stale data is a
bug to fix, and option to permit it is unnecessary. So this version
the sysfs file is removed.
v2: rename sysfs entry from allow_stale_data_on_failure to
allow_stale_data_on_failure, and fix the confusing commit log.
v1: initial patch posted.
[small change to patch comment spelling by mlyle]
Signed-off-by: Coly Li <colyli(a)suse.de>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
Reported-by: Arne Wolf <awolf(a)lenovo.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Cc: Nix <nix(a)esperi.org.uk>
Cc: Kai Krakow <hurikhan77(a)gmail.com>
Cc: Eric Wheeler <bcache(a)lists.ewheeler.net>
Cc: Junhui Tang <tang.junhui(a)zte.com.cn>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/request.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -705,8 +705,16 @@ static void cached_dev_read_error(struct
{
struct search *s = container_of(cl, struct search, cl);
struct bio *bio = &s->bio.bio;
+ struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
- if (s->recoverable) {
+ /*
+ * If cache device is dirty (dc->has_dirty is non-zero), then
+ * recovery a failed read request from cached device may get a
+ * stale data back. So read failure recovery is only permitted
+ * when cache device is clean.
+ */
+ if (s->recoverable &&
+ (dc && !atomic_read(&dc->has_dirty))) {
/* Retry from the backing device: */
trace_bcache_read_retry(s->orig_bio);
Patches currently in stable-queue which might be from colyli(a)suse.de are
queue-3.18/bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
queue-3.18/bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
This is a note to let you know that I've just added the patch titled
bcache: check ca->alloc_thread initialized before wake up it
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 13 Oct 2017 16:35:29 -0700
Subject: bcache: check ca->alloc_thread initialized before wake up it
From: Coly Li <colyli(a)suse.de>
commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream.
In bcache code, sysfs entries are created before all resources get
allocated, e.g. allocation thread of a cache set.
There is posibility for NULL pointer deference if a resource is accessed
but which is not initialized yet. Indeed Jorg Bornschein catches one on
cache set allocation thread and gets a kernel oops.
The reason for this bug is, when bch_bucket_alloc() is called during
cache set registration and attaching, ca->alloc_thread is not properly
allocated and initialized yet, call wake_up_process() on ca->alloc_thread
triggers NULL pointer deference failure. A simple and fast fix is, before
waking up ca->alloc_thread, checking whether it is allocated, and only
wake up ca->alloc_thread when it is not NULL.
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-by: Jorg Bornschein <jb(a)capsec.org>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -406,7 +406,8 @@ long bch_bucket_alloc(struct cache *ca,
finish_wait(&ca->set->bucket_wait, &w);
out:
- wake_up_process(ca->alloc_thread);
+ if (ca->alloc_thread)
+ wake_up_process(ca->alloc_thread);
trace_bcache_alloc(ca, reserve);
Patches currently in stable-queue which might be from colyli(a)suse.de are
queue-3.18/bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
queue-3.18/bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
This is a note to let you know that I've just added the patch titled
autofs: don't fail mount for transient error
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
autofs-don-t-fail-mount-for-transient-error.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ecc0c469f27765ed1e2b967be0aa17cee1a60b76 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Fri, 17 Nov 2017 15:29:13 -0800
Subject: autofs: don't fail mount for transient error
From: NeilBrown <neilb(a)suse.com>
commit ecc0c469f27765ed1e2b967be0aa17cee1a60b76 upstream.
Currently if the autofs kernel module gets an error when writing to the
pipe which links to the daemon, then it marks the whole moutpoint as
catatonic, and it will stop working.
It is possible that the error is transient. This can happen if the
daemon is slow and more than 16 requests queue up. If a subsequent
process tries to queue a request, and is then signalled, the write to
the pipe will return -ERESTARTSYS and autofs will take that as total
failure.
So change the code to assess -ERESTARTSYS and -ENOMEM as transient
failures which only abort the current request, not the whole mountpoint.
It isn't a crash or a data corruption, but having autofs mountpoints
suddenly stop working is rather inconvenient.
Ian said:
: And given the problems with a half dozen (or so) user space applications
: consuming large amounts of CPU under heavy mount and umount activity this
: could happen more easily than we expect.
Link: http://lkml.kernel.org/r/87y3norvgp.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb(a)suse.com>
Acked-by: Ian Kent <raven(a)themaw.net>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/autofs4/waitq.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -87,7 +87,8 @@ static int autofs4_write(struct autofs_s
spin_unlock_irqrestore(¤t->sighand->siglock, flags);
}
- return (bytes > 0);
+ /* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
+ return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
}
static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
@@ -101,6 +102,7 @@ static void autofs4_notify_daemon(struct
} pkt;
struct file *pipe = NULL;
size_t pktsz;
+ int ret;
DPRINTK("wait id = 0x%08lx, name = %.*s, type=%d",
(unsigned long) wq->wait_queue_token, wq->name.len, wq->name.name, type);
@@ -173,7 +175,18 @@ static void autofs4_notify_daemon(struct
mutex_unlock(&sbi->wq_mutex);
if (autofs4_write(sbi, pipe, &pkt, pktsz))
+ switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
+ case 0:
+ break;
+ case -ENOMEM:
+ case -ERESTARTSYS:
+ /* Just fail this one */
+ autofs4_wait_release(sbi, wq->wait_queue_token, ret);
+ break;
+ default:
autofs4_catatonic_mode(sbi);
+ break;
+ }
fput(pipe);
}
Patches currently in stable-queue which might be from neilb(a)suse.com are
queue-3.18/autofs-don-t-fail-mount-for-transient-error.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f658f17b5e0e339935dca23e77e0f3cad591926b Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 17:00:32 +0100
Subject: ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
From: Takashi Iwai <tiwai(a)suse.de>
commit f658f17b5e0e339935dca23e77e0f3cad591926b upstream.
The usb-audio driver may trigger an out-of-bound access at parsing a
malformed selector unit, as it checks the header length only after
evaluating bNrInPins field, which can be already above the given
length. Fix it by adding the length check beforehand.
Fixes: 99fc86450c43 ("ALSA: usb-mixer: parse descriptors with structs")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/mixer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -2018,7 +2018,8 @@ static int parse_audio_selector_unit(str
const struct usbmix_name_map *map;
char **namelist;
- if (!desc->bNrInPins || desc->bLength < 5 + desc->bNrInPins) {
+ if (desc->bLength < 5 || !desc->bNrInPins ||
+ desc->bLength < 5 + desc->bNrInPins) {
usb_audio_err(state->chip,
"invalid SELECTOR UNIT descriptor %d\n", unitid);
return -EINVAL;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-3.18/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-3.18/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-3.18/alsa-hda-add-raven-pci-id.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-3.18/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Add sanity checks to FE parser
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d937cd6790a2bef2d07b500487646bd794c039bb Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 16:55:51 +0100
Subject: ALSA: usb-audio: Add sanity checks to FE parser
From: Takashi Iwai <tiwai(a)suse.de>
commit d937cd6790a2bef2d07b500487646bd794c039bb upstream.
When the usb-audio descriptor contains the malformed feature unit
description with a too short length, the driver may access
out-of-bounds. Add a sanity check of the header size at the beginning
of parse_audio_feature_unit().
Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Reported-by: Andrey Konovalov <andreyknvl(a)google.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/mixer.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -1373,6 +1373,12 @@ static int parse_audio_feature_unit(stru
__u8 *bmaControls;
if (state->mixer->protocol == UAC_VERSION_1) {
+ if (hdr->bLength < 7) {
+ usb_audio_err(state->chip,
+ "unit %u: invalid UAC_FEATURE_UNIT descriptor\n",
+ unitid);
+ return -EINVAL;
+ }
csize = hdr->bControlSize;
if (!csize) {
usb_audio_dbg(state->chip,
@@ -1390,6 +1396,12 @@ static int parse_audio_feature_unit(stru
}
} else {
struct uac2_feature_unit_descriptor *ftr = _ftr;
+ if (hdr->bLength < 6) {
+ usb_audio_err(state->chip,
+ "unit %u: invalid UAC_FEATURE_UNIT descriptor\n",
+ unitid);
+ return -EINVAL;
+ }
csize = 4;
channels = (hdr->bLength - 6) / 4 - 1;
bmaControls = ftr->bmaControls;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-3.18/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-3.18/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-3.18/alsa-hda-add-raven-pci-id.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-3.18/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: usb-audio: Add sanity checks in v2 clock parsers
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0a62d6c966956d77397c32836a5bbfe3af786fc1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 17:28:06 +0100
Subject: ALSA: usb-audio: Add sanity checks in v2 clock parsers
From: Takashi Iwai <tiwai(a)suse.de>
commit 0a62d6c966956d77397c32836a5bbfe3af786fc1 upstream.
The helper functions to parse and look for the clock source, selector
and multiplier unit may return the descriptor with a too short length
than required, while there is no sanity check in the caller side.
Add some sanity checks in the parsers, at least, to guarantee the
given descriptor size, for avoiding the potential crashes.
Fixes: 79f920fbff56 ("ALSA: usb-audio: parse clock topology of UAC2 devices")
Reported-by: Andrey Konovalov <andreyknvl(a)google.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/usb/clock.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/sound/usb/clock.c
+++ b/sound/usb/clock.c
@@ -43,7 +43,7 @@ static struct uac_clock_source_descripto
while ((cs = snd_usb_find_csint_desc(ctrl_iface->extra,
ctrl_iface->extralen,
cs, UAC2_CLOCK_SOURCE))) {
- if (cs->bClockID == clock_id)
+ if (cs->bLength >= sizeof(*cs) && cs->bClockID == clock_id)
return cs;
}
@@ -59,8 +59,11 @@ static struct uac_clock_selector_descrip
while ((cs = snd_usb_find_csint_desc(ctrl_iface->extra,
ctrl_iface->extralen,
cs, UAC2_CLOCK_SELECTOR))) {
- if (cs->bClockID == clock_id)
+ if (cs->bLength >= sizeof(*cs) && cs->bClockID == clock_id) {
+ if (cs->bLength < 5 + cs->bNrInPins)
+ return NULL;
return cs;
+ }
}
return NULL;
@@ -75,7 +78,7 @@ static struct uac_clock_multiplier_descr
while ((cs = snd_usb_find_csint_desc(ctrl_iface->extra,
ctrl_iface->extralen,
cs, UAC2_CLOCK_MULTIPLIER))) {
- if (cs->bClockID == clock_id)
+ if (cs->bLength >= sizeof(*cs) && cs->bClockID == clock_id)
return cs;
}
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-3.18/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-3.18/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-3.18/alsa-hda-add-raven-pci-id.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-3.18/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
This is a note to let you know that I've just added the patch titled
ALSA: timer: Remove kernel warning at compat ioctl error paths
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3d4e8303f2c747c8540a0a0126d0151514f6468b Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 21 Nov 2017 16:36:11 +0100
Subject: ALSA: timer: Remove kernel warning at compat ioctl error paths
From: Takashi Iwai <tiwai(a)suse.de>
commit 3d4e8303f2c747c8540a0a0126d0151514f6468b upstream.
Some timer compat ioctls have NULL checks of timer instance with
snd_BUG_ON() that bring up WARN_ON() when the debug option is set.
Actually the condition can be met in the normal situation and it's
confusing and bad to spew kernel warnings with stack trace there.
Let's remove snd_BUG_ON() invocation and replace with the simple
checks. Also, correct the error code to EBADFD to follow the native
ioctl error handling.
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/core/timer_compat.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/sound/core/timer_compat.c
+++ b/sound/core/timer_compat.c
@@ -40,11 +40,11 @@ static int snd_timer_user_info_compat(st
struct snd_timer *t;
tu = file->private_data;
- if (snd_BUG_ON(!tu->timeri))
- return -ENXIO;
+ if (!tu->timeri)
+ return -EBADFD;
t = tu->timeri->timer;
- if (snd_BUG_ON(!t))
- return -ENXIO;
+ if (!t)
+ return -EBADFD;
memset(&info, 0, sizeof(info));
info.card = t->card ? t->card->number : -1;
if (t->hw.flags & SNDRV_TIMER_HW_SLAVE)
@@ -73,8 +73,8 @@ static int snd_timer_user_status_compat(
struct snd_timer_status32 status;
tu = file->private_data;
- if (snd_BUG_ON(!tu->timeri))
- return -ENXIO;
+ if (!tu->timeri)
+ return -EBADFD;
memset(&status, 0, sizeof(status));
status.tstamp.tv_sec = tu->tstamp.tv_sec;
status.tstamp.tv_nsec = tu->tstamp.tv_nsec;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-3.18/alsa-usb-audio-fix-potential-zero-division-at-parsing-fu.patch
queue-3.18/alsa-timer-remove-kernel-warning-at-compat-ioctl-error-paths.patch
queue-3.18/alsa-hda-add-raven-pci-id.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-in-v2-clock-parsers.patch
queue-3.18/alsa-usb-audio-fix-potential-out-of-bound-access-at-parsing-su.patch
queue-3.18/alsa-usb-audio-add-sanity-checks-to-fe-parser.patch
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 30863e38ebeb500a31cecee8096fb5002677dd9b Mon Sep 17 00:00:00 2001
From: Brent Taylor <motobud(a)gmail.com>
Date: Mon, 30 Oct 2017 22:32:45 -0500
Subject: [PATCH] mtd: nand: Fix writing mtdoops to nand flash.
When mtdoops calls mtd_panic_write(), it eventually calls
panic_nand_write() in nand_base.c. In order to properly wait for the
nand chip to be ready in panic_nand_wait(), the chip must first be
selected.
When using the atmel nand flash controller, a panic would occur due to
a NULL pointer exception.
Fixes: 2af7c6539931 ("mtd: Add panic_write for NAND flashes")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brent Taylor <motobud(a)gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 4c867115c53b..e48bf8260f54 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -2799,15 +2799,18 @@ static int panic_nand_write(struct mtd_info *mtd, loff_t to, size_t len,
size_t *retlen, const uint8_t *buf)
{
struct nand_chip *chip = mtd_to_nand(mtd);
+ int chipnr = (int)(to >> chip->chip_shift);
struct mtd_oob_ops ops;
int ret;
- /* Wait for the device to get ready */
- panic_nand_wait(mtd, chip, 400);
-
/* Grab the device */
panic_nand_get_device(chip, mtd, FL_WRITING);
+ chip->select_chip(mtd, chipnr);
+
+ /* Wait for the device to get ready */
+ panic_nand_wait(mtd, chip, 400);
+
memset(&ops, 0, sizeof(ops));
ops.len = len;
ops.datbuf = (uint8_t *)buf;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 30863e38ebeb500a31cecee8096fb5002677dd9b Mon Sep 17 00:00:00 2001
From: Brent Taylor <motobud(a)gmail.com>
Date: Mon, 30 Oct 2017 22:32:45 -0500
Subject: [PATCH] mtd: nand: Fix writing mtdoops to nand flash.
When mtdoops calls mtd_panic_write(), it eventually calls
panic_nand_write() in nand_base.c. In order to properly wait for the
nand chip to be ready in panic_nand_wait(), the chip must first be
selected.
When using the atmel nand flash controller, a panic would occur due to
a NULL pointer exception.
Fixes: 2af7c6539931 ("mtd: Add panic_write for NAND flashes")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brent Taylor <motobud(a)gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 4c867115c53b..e48bf8260f54 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -2799,15 +2799,18 @@ static int panic_nand_write(struct mtd_info *mtd, loff_t to, size_t len,
size_t *retlen, const uint8_t *buf)
{
struct nand_chip *chip = mtd_to_nand(mtd);
+ int chipnr = (int)(to >> chip->chip_shift);
struct mtd_oob_ops ops;
int ret;
- /* Wait for the device to get ready */
- panic_nand_wait(mtd, chip, 400);
-
/* Grab the device */
panic_nand_get_device(chip, mtd, FL_WRITING);
+ chip->select_chip(mtd, chipnr);
+
+ /* Wait for the device to get ready */
+ panic_nand_wait(mtd, chip, 400);
+
memset(&ops, 0, sizeof(ops));
ops.len = len;
ops.datbuf = (uint8_t *)buf;
The commit 4c7f16d14a33 ("drm/sun4i: Fix TCON clock and regmap
initialization sequence") moved a bunch of logic around, but forgot to
update the gotos after the introduction of the err_free_dotclock label.
It means that if we fail later that the one introduced in that commit,
we'll just to the old label which isn't free the clock we created. This
will result in a breakage as soon as someone tries to do something with
that clock, since its resources will have been long reclaimed.
Cc: <stable(a)vger.kernel.org>
Fixes: 4c7f16d14a33 ("drm/sun4i: Fix TCON clock and regmap initialization sequence")
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
---
drivers/gpu/drm/sun4i/sun4i_tcon.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/sun4i/sun4i_tcon.c b/drivers/gpu/drm/sun4i/sun4i_tcon.c
index e122f5b2a395..f4284b51bdca 100644
--- a/drivers/gpu/drm/sun4i/sun4i_tcon.c
+++ b/drivers/gpu/drm/sun4i/sun4i_tcon.c
@@ -724,12 +724,12 @@ static int sun4i_tcon_bind(struct device *dev, struct device *master,
if (IS_ERR(tcon->crtc)) {
dev_err(dev, "Couldn't create our CRTC\n");
ret = PTR_ERR(tcon->crtc);
- goto err_free_clocks;
+ goto err_free_dotclock;
}
ret = sun4i_rgb_init(drm, tcon);
if (ret < 0)
- goto err_free_clocks;
+ goto err_free_dotclock;
if (tcon->quirks->needs_de_be_mux) {
/*
--
git-series 0.9.1
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1c21a48055a67ceb693e9c2587824a8de60a217c Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 22:19:26 -0800
Subject: [PATCH] target: Avoid early CMD_T_PRE_EXECUTE failures during
ABORT_TASK
This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.
Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.
This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed. As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.
To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.
Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.
Reported-by: Donald White <dew(a)datera.io>
Cc: Donald White <dew(a)datera.io>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 61909b23e959..9c7bc1ca341a 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -133,6 +133,15 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
spin_unlock(&se_cmd->t_state_lock);
return false;
}
+ if (se_cmd->transport_state & CMD_T_PRE_EXECUTE) {
+ if (se_cmd->scsi_status) {
+ pr_debug("Attempted to abort io tag: %llu early failure"
+ " status: 0x%02x\n", se_cmd->tag,
+ se_cmd->scsi_status);
+ spin_unlock(&se_cmd->t_state_lock);
+ return false;
+ }
+ }
if (sess->sess_tearing_down || se_cmd->cmd_wait_set) {
pr_debug("Attempted to abort io tag: %llu already shutdown,"
" skipping\n", se_cmd->tag);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 0e89db84b200..58caacd54a3b 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1975,6 +1975,7 @@ void target_execute_cmd(struct se_cmd *cmd)
}
cmd->t_state = TRANSPORT_PROCESSING;
+ cmd->transport_state &= ~CMD_T_PRE_EXECUTE;
cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
spin_unlock_irq(&cmd->t_state_lock);
@@ -2667,6 +2668,7 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
ret = -ESHUTDOWN;
goto out;
}
+ se_cmd->transport_state |= CMD_T_PRE_EXECUTE;
list_add_tail(&se_cmd->se_cmd_list, &se_sess->sess_cmd_list);
out:
spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index d3139a95ea77..ccf501b8359c 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -490,6 +490,7 @@ struct se_cmd {
#define CMD_T_STOP (1 << 5)
#define CMD_T_TAS (1 << 10)
#define CMD_T_FABRIC_STOP (1 << 11)
+#define CMD_T_PRE_EXECUTE (1 << 12)
spinlock_t t_state_lock;
struct kref cmd_kref;
struct completion t_transport_stop_comp;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1c21a48055a67ceb693e9c2587824a8de60a217c Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 22:19:26 -0800
Subject: [PATCH] target: Avoid early CMD_T_PRE_EXECUTE failures during
ABORT_TASK
This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.
Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.
This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed. As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.
To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.
Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.
Reported-by: Donald White <dew(a)datera.io>
Cc: Donald White <dew(a)datera.io>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 61909b23e959..9c7bc1ca341a 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -133,6 +133,15 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
spin_unlock(&se_cmd->t_state_lock);
return false;
}
+ if (se_cmd->transport_state & CMD_T_PRE_EXECUTE) {
+ if (se_cmd->scsi_status) {
+ pr_debug("Attempted to abort io tag: %llu early failure"
+ " status: 0x%02x\n", se_cmd->tag,
+ se_cmd->scsi_status);
+ spin_unlock(&se_cmd->t_state_lock);
+ return false;
+ }
+ }
if (sess->sess_tearing_down || se_cmd->cmd_wait_set) {
pr_debug("Attempted to abort io tag: %llu already shutdown,"
" skipping\n", se_cmd->tag);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 0e89db84b200..58caacd54a3b 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1975,6 +1975,7 @@ void target_execute_cmd(struct se_cmd *cmd)
}
cmd->t_state = TRANSPORT_PROCESSING;
+ cmd->transport_state &= ~CMD_T_PRE_EXECUTE;
cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
spin_unlock_irq(&cmd->t_state_lock);
@@ -2667,6 +2668,7 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
ret = -ESHUTDOWN;
goto out;
}
+ se_cmd->transport_state |= CMD_T_PRE_EXECUTE;
list_add_tail(&se_cmd->se_cmd_list, &se_sess->sess_cmd_list);
out:
spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index d3139a95ea77..ccf501b8359c 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -490,6 +490,7 @@ struct se_cmd {
#define CMD_T_STOP (1 << 5)
#define CMD_T_TAS (1 << 10)
#define CMD_T_FABRIC_STOP (1 << 11)
+#define CMD_T_PRE_EXECUTE (1 << 12)
spinlock_t t_state_lock;
struct kref cmd_kref;
struct completion t_transport_stop_comp;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1c21a48055a67ceb693e9c2587824a8de60a217c Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 22:19:26 -0800
Subject: [PATCH] target: Avoid early CMD_T_PRE_EXECUTE failures during
ABORT_TASK
This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.
Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.
This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed. As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.
To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.
Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.
Reported-by: Donald White <dew(a)datera.io>
Cc: Donald White <dew(a)datera.io>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 61909b23e959..9c7bc1ca341a 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -133,6 +133,15 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
spin_unlock(&se_cmd->t_state_lock);
return false;
}
+ if (se_cmd->transport_state & CMD_T_PRE_EXECUTE) {
+ if (se_cmd->scsi_status) {
+ pr_debug("Attempted to abort io tag: %llu early failure"
+ " status: 0x%02x\n", se_cmd->tag,
+ se_cmd->scsi_status);
+ spin_unlock(&se_cmd->t_state_lock);
+ return false;
+ }
+ }
if (sess->sess_tearing_down || se_cmd->cmd_wait_set) {
pr_debug("Attempted to abort io tag: %llu already shutdown,"
" skipping\n", se_cmd->tag);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 0e89db84b200..58caacd54a3b 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1975,6 +1975,7 @@ void target_execute_cmd(struct se_cmd *cmd)
}
cmd->t_state = TRANSPORT_PROCESSING;
+ cmd->transport_state &= ~CMD_T_PRE_EXECUTE;
cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
spin_unlock_irq(&cmd->t_state_lock);
@@ -2667,6 +2668,7 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
ret = -ESHUTDOWN;
goto out;
}
+ se_cmd->transport_state |= CMD_T_PRE_EXECUTE;
list_add_tail(&se_cmd->se_cmd_list, &se_sess->sess_cmd_list);
out:
spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index d3139a95ea77..ccf501b8359c 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -490,6 +490,7 @@ struct se_cmd {
#define CMD_T_STOP (1 << 5)
#define CMD_T_TAS (1 << 10)
#define CMD_T_FABRIC_STOP (1 << 11)
+#define CMD_T_PRE_EXECUTE (1 << 12)
spinlock_t t_state_lock;
struct kref cmd_kref;
struct completion t_transport_stop_comp;
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ae072726f6109bb1c94841d6fb3a82dde298ea85 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 12:32:59 -0700
Subject: [PATCH] iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
Since commit 59b6986dbf fixed a potential NULL pointer dereference
by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the
se_tmr_req is currently leaked by iscsit_free_cmd() because no
iscsi_cmd->se_cmd.se_tfo was associated.
To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other
TMR and call transport_init_se_cmd() + target_get_sess_cmd() to
setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2.
This will ensure normal release operation once se_cmd->cmd_kref
reaches zero and target_release_cmd_kref() is invoked, se_tmr_req
will be released via existing target_free_cmd_mem() and
core_tmr_release_req() code.
Reported-by: Donald White <dew(a)datera.io>
Cc: Donald White <dew(a)datera.io>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 541f66a875fc..048d4227327c 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1955,7 +1955,6 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct iscsi_tmr_req *tmr_req;
struct iscsi_tm *hdr;
int out_of_order_cmdsn = 0, ret;
- bool sess_ref = false;
u8 function, tcm_function = TMR_UNKNOWN;
hdr = (struct iscsi_tm *) buf;
@@ -1988,22 +1987,23 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
cmd->data_direction = DMA_NONE;
cmd->tmr_req = kzalloc(sizeof(*cmd->tmr_req), GFP_KERNEL);
- if (!cmd->tmr_req)
+ if (!cmd->tmr_req) {
return iscsit_add_reject_cmd(cmd,
ISCSI_REASON_BOOKMARK_NO_RESOURCES,
buf);
+ }
+
+ transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
+ conn->sess->se_sess, 0, DMA_NONE,
+ TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
+
+ target_get_sess_cmd(&cmd->se_cmd, true);
/*
* TASK_REASSIGN for ERL=2 / connection stays inside of
* LIO-Target $FABRIC_MOD
*/
if (function != ISCSI_TM_FUNC_TASK_REASSIGN) {
- transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
- conn->sess->se_sess, 0, DMA_NONE,
- TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
-
- target_get_sess_cmd(&cmd->se_cmd, true);
- sess_ref = true;
tcm_function = iscsit_convert_tmf(function);
if (tcm_function == TMR_UNKNOWN) {
pr_err("Unknown iSCSI TMR Function:"
@@ -2119,12 +2119,8 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
* For connection recovery, this is also the default action for
* TMR TASK_REASSIGN.
*/
- if (sess_ref) {
- pr_debug("Handle TMR, using sess_ref=true check\n");
- target_put_sess_cmd(&cmd->se_cmd);
- }
-
iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
+ target_put_sess_cmd(&cmd->se_cmd);
return 0;
}
EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ae072726f6109bb1c94841d6fb3a82dde298ea85 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 12:32:59 -0700
Subject: [PATCH] iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
Since commit 59b6986dbf fixed a potential NULL pointer dereference
by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the
se_tmr_req is currently leaked by iscsit_free_cmd() because no
iscsi_cmd->se_cmd.se_tfo was associated.
To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other
TMR and call transport_init_se_cmd() + target_get_sess_cmd() to
setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2.
This will ensure normal release operation once se_cmd->cmd_kref
reaches zero and target_release_cmd_kref() is invoked, se_tmr_req
will be released via existing target_free_cmd_mem() and
core_tmr_release_req() code.
Reported-by: Donald White <dew(a)datera.io>
Cc: Donald White <dew(a)datera.io>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 541f66a875fc..048d4227327c 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1955,7 +1955,6 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct iscsi_tmr_req *tmr_req;
struct iscsi_tm *hdr;
int out_of_order_cmdsn = 0, ret;
- bool sess_ref = false;
u8 function, tcm_function = TMR_UNKNOWN;
hdr = (struct iscsi_tm *) buf;
@@ -1988,22 +1987,23 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
cmd->data_direction = DMA_NONE;
cmd->tmr_req = kzalloc(sizeof(*cmd->tmr_req), GFP_KERNEL);
- if (!cmd->tmr_req)
+ if (!cmd->tmr_req) {
return iscsit_add_reject_cmd(cmd,
ISCSI_REASON_BOOKMARK_NO_RESOURCES,
buf);
+ }
+
+ transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
+ conn->sess->se_sess, 0, DMA_NONE,
+ TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
+
+ target_get_sess_cmd(&cmd->se_cmd, true);
/*
* TASK_REASSIGN for ERL=2 / connection stays inside of
* LIO-Target $FABRIC_MOD
*/
if (function != ISCSI_TM_FUNC_TASK_REASSIGN) {
- transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
- conn->sess->se_sess, 0, DMA_NONE,
- TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
-
- target_get_sess_cmd(&cmd->se_cmd, true);
- sess_ref = true;
tcm_function = iscsit_convert_tmf(function);
if (tcm_function == TMR_UNKNOWN) {
pr_err("Unknown iSCSI TMR Function:"
@@ -2119,12 +2119,8 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
* For connection recovery, this is also the default action for
* TMR TASK_REASSIGN.
*/
- if (sess_ref) {
- pr_debug("Handle TMR, using sess_ref=true check\n");
- target_put_sess_cmd(&cmd->se_cmd);
- }
-
iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
+ target_put_sess_cmd(&cmd->se_cmd);
return 0;
}
EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ae072726f6109bb1c94841d6fb3a82dde298ea85 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab(a)linux-iscsi.org>
Date: Fri, 27 Oct 2017 12:32:59 -0700
Subject: [PATCH] iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
Since commit 59b6986dbf fixed a potential NULL pointer dereference
by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the
se_tmr_req is currently leaked by iscsit_free_cmd() because no
iscsi_cmd->se_cmd.se_tfo was associated.
To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other
TMR and call transport_init_se_cmd() + target_get_sess_cmd() to
setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2.
This will ensure normal release operation once se_cmd->cmd_kref
reaches zero and target_release_cmd_kref() is invoked, se_tmr_req
will be released via existing target_free_cmd_mem() and
core_tmr_release_req() code.
Reported-by: Donald White <dew(a)datera.io>
Cc: Donald White <dew(a)datera.io>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab(a)linux-iscsi.org>
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 541f66a875fc..048d4227327c 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1955,7 +1955,6 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
struct iscsi_tmr_req *tmr_req;
struct iscsi_tm *hdr;
int out_of_order_cmdsn = 0, ret;
- bool sess_ref = false;
u8 function, tcm_function = TMR_UNKNOWN;
hdr = (struct iscsi_tm *) buf;
@@ -1988,22 +1987,23 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
cmd->data_direction = DMA_NONE;
cmd->tmr_req = kzalloc(sizeof(*cmd->tmr_req), GFP_KERNEL);
- if (!cmd->tmr_req)
+ if (!cmd->tmr_req) {
return iscsit_add_reject_cmd(cmd,
ISCSI_REASON_BOOKMARK_NO_RESOURCES,
buf);
+ }
+
+ transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
+ conn->sess->se_sess, 0, DMA_NONE,
+ TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
+
+ target_get_sess_cmd(&cmd->se_cmd, true);
/*
* TASK_REASSIGN for ERL=2 / connection stays inside of
* LIO-Target $FABRIC_MOD
*/
if (function != ISCSI_TM_FUNC_TASK_REASSIGN) {
- transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
- conn->sess->se_sess, 0, DMA_NONE,
- TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
-
- target_get_sess_cmd(&cmd->se_cmd, true);
- sess_ref = true;
tcm_function = iscsit_convert_tmf(function);
if (tcm_function == TMR_UNKNOWN) {
pr_err("Unknown iSCSI TMR Function:"
@@ -2119,12 +2119,8 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
* For connection recovery, this is also the default action for
* TMR TASK_REASSIGN.
*/
- if (sess_ref) {
- pr_debug("Handle TMR, using sess_ref=true check\n");
- target_put_sess_cmd(&cmd->se_cmd);
- }
-
iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
+ target_put_sess_cmd(&cmd->se_cmd);
return 0;
}
EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 559db4c6d784ceedc2a5418ced4d357cb843e221 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Date: Thu, 12 Oct 2017 11:52:34 -0400
Subject: [PATCH] ext4: prevent data corruption with inline data + DAX
If an inode has inline data it is currently prevented from using DAX by a
check in ext4_set_inode_flags(). When the inode grows inline data via
ext4_create_inline_data() or removes its inline data via
ext4_destroy_inline_data_nolock(), the value of S_DAX can change.
Currently these changes are unsafe because we don't hold off page faults
and I/O, write back dirty radix tree entries and invalidate all mappings.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:
https://www.spinics.net/lists/linux-xfs/msg09859.html
The unsafe transition of S_DAX can reliably cause data corruption, as shown
by the following fstest:
https://patchwork.kernel.org/patch/9948381/
Fix this issue by preventing the DAX mount option from being used on
filesystems that were created to support inline data. Inline data is an
option given to mkfs.ext4.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Reviewed-by: Jan Kara <jack(a)suse.cz>
CC: stable(a)vger.kernel.org
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index f0bbc8cb6555..1367553c43bb 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -303,11 +303,6 @@ static int ext4_create_inline_data(handle_t *handle,
EXT4_I(inode)->i_inline_size = len + EXT4_MIN_INLINE_DATA_SIZE;
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
- /*
- * Propagate changes to inode->i_flags as well - e.g. S_DAX may
- * get cleared
- */
- ext4_set_inode_flags(inode);
get_bh(is.iloc.bh);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
@@ -452,11 +447,6 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle,
}
}
ext4_clear_inode_flag(inode, EXT4_INODE_INLINE_DATA);
- /*
- * Propagate changes to inode->i_flags as well - e.g. S_DAX may
- * get set.
- */
- ext4_set_inode_flags(inode);
get_bh(is.iloc.bh);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b104096fce9e..986475c0d552 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3708,6 +3708,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
}
if (sbi->s_mount_opt & EXT4_MOUNT_DAX) {
+ if (ext4_has_feature_inline_data(sb)) {
+ ext4_msg(sb, KERN_ERR, "Cannot use DAX on a filesystem"
+ " that may contain inline data");
+ goto failed_mount;
+ }
err = bdev_dax_supported(sb, blocksize);
if (err)
goto failed_mount;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 559db4c6d784ceedc2a5418ced4d357cb843e221 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Date: Thu, 12 Oct 2017 11:52:34 -0400
Subject: [PATCH] ext4: prevent data corruption with inline data + DAX
If an inode has inline data it is currently prevented from using DAX by a
check in ext4_set_inode_flags(). When the inode grows inline data via
ext4_create_inline_data() or removes its inline data via
ext4_destroy_inline_data_nolock(), the value of S_DAX can change.
Currently these changes are unsafe because we don't hold off page faults
and I/O, write back dirty radix tree entries and invalidate all mappings.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:
https://www.spinics.net/lists/linux-xfs/msg09859.html
The unsafe transition of S_DAX can reliably cause data corruption, as shown
by the following fstest:
https://patchwork.kernel.org/patch/9948381/
Fix this issue by preventing the DAX mount option from being used on
filesystems that were created to support inline data. Inline data is an
option given to mkfs.ext4.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Reviewed-by: Jan Kara <jack(a)suse.cz>
CC: stable(a)vger.kernel.org
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index f0bbc8cb6555..1367553c43bb 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -303,11 +303,6 @@ static int ext4_create_inline_data(handle_t *handle,
EXT4_I(inode)->i_inline_size = len + EXT4_MIN_INLINE_DATA_SIZE;
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
- /*
- * Propagate changes to inode->i_flags as well - e.g. S_DAX may
- * get cleared
- */
- ext4_set_inode_flags(inode);
get_bh(is.iloc.bh);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
@@ -452,11 +447,6 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle,
}
}
ext4_clear_inode_flag(inode, EXT4_INODE_INLINE_DATA);
- /*
- * Propagate changes to inode->i_flags as well - e.g. S_DAX may
- * get set.
- */
- ext4_set_inode_flags(inode);
get_bh(is.iloc.bh);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b104096fce9e..986475c0d552 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3708,6 +3708,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
}
if (sbi->s_mount_opt & EXT4_MOUNT_DAX) {
+ if (ext4_has_feature_inline_data(sb)) {
+ ext4_msg(sb, KERN_ERR, "Cannot use DAX on a filesystem"
+ " that may contain inline data");
+ goto failed_mount;
+ }
err = bdev_dax_supported(sb, blocksize);
if (err)
goto failed_mount;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e9072d859df3e0f2c3ba450f0d1739595c2d5d13 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Date: Thu, 12 Oct 2017 11:54:08 -0400
Subject: [PATCH] ext4: prevent data corruption with journaling + DAX
The current code has the potential for data corruption when changing an
inode's journaling mode, as that can result in a subsequent unsafe change
in S_DAX.
I've captured an instance of this data corruption in the following fstest:
https://patchwork.kernel.org/patch/9948377/
Prevent this data corruption from happening by disallowing changes to the
journaling mode if the '-o dax' mount option was used. This means that for
a given filesystem we could have a mix of inodes using either DAX or
data journaling, but whatever state the inodes are in will be held for the
duration of the mount.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: stable(a)vger.kernel.org
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index edfe95f81274..350e0910ed32 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6004,11 +6004,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
}
ext4_set_aops(inode);
- /*
- * Update inode->i_flags after EXT4_INODE_JOURNAL_DATA was updated.
- * E.g. S_DAX may get cleared / set.
- */
- ext4_set_inode_flags(inode);
jbd2_journal_unlock_updates(journal);
percpu_up_write(&sbi->s_journal_flag_rwsem);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index afb66d4ab5cf..b0b754b37c36 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -290,10 +290,20 @@ static int ext4_ioctl_setflags(struct inode *inode,
if (err)
goto flags_out;
- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
+ /*
+ * Changes to the journaling mode can cause unsafe changes to
+ * S_DAX if we are using the DAX mount option.
+ */
+ if (test_opt(inode->i_sb, DAX)) {
+ err = -EBUSY;
+ goto flags_out;
+ }
+
err = ext4_change_inode_journal_flag(inode, jflag);
- if (err)
- goto flags_out;
+ if (err)
+ goto flags_out;
+ }
if (migrate) {
if (flags & EXT4_EXTENTS_FL)
err = ext4_ext_migrate(inode);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e9072d859df3e0f2c3ba450f0d1739595c2d5d13 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Date: Thu, 12 Oct 2017 11:54:08 -0400
Subject: [PATCH] ext4: prevent data corruption with journaling + DAX
The current code has the potential for data corruption when changing an
inode's journaling mode, as that can result in a subsequent unsafe change
in S_DAX.
I've captured an instance of this data corruption in the following fstest:
https://patchwork.kernel.org/patch/9948377/
Prevent this data corruption from happening by disallowing changes to the
journaling mode if the '-o dax' mount option was used. This means that for
a given filesystem we could have a mix of inodes using either DAX or
data journaling, but whatever state the inodes are in will be held for the
duration of the mount.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: stable(a)vger.kernel.org
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index edfe95f81274..350e0910ed32 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6004,11 +6004,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
}
ext4_set_aops(inode);
- /*
- * Update inode->i_flags after EXT4_INODE_JOURNAL_DATA was updated.
- * E.g. S_DAX may get cleared / set.
- */
- ext4_set_inode_flags(inode);
jbd2_journal_unlock_updates(journal);
percpu_up_write(&sbi->s_journal_flag_rwsem);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index afb66d4ab5cf..b0b754b37c36 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -290,10 +290,20 @@ static int ext4_ioctl_setflags(struct inode *inode,
if (err)
goto flags_out;
- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
+ /*
+ * Changes to the journaling mode can cause unsafe changes to
+ * S_DAX if we are using the DAX mount option.
+ */
+ if (test_opt(inode->i_sb, DAX)) {
+ err = -EBUSY;
+ goto flags_out;
+ }
+
err = ext4_change_inode_journal_flag(inode, jflag);
- if (err)
- goto flags_out;
+ if (err)
+ goto flags_out;
+ }
if (migrate) {
if (flags & EXT4_EXTENTS_FL)
err = ext4_ext_migrate(inode);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 230b55fa8d64007339319539f8f8e68114d08529 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Tue, 17 Oct 2017 14:24:09 +1100
Subject: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
Having both a bitmap and a journal is pointless.
Attempting to do so can corrupt the bitmap if the journal
replay happens before the bitmap is initialized.
Rather than try to avoid this corruption, simply
refuse to allow arrays with both a bitmap and a journal.
So:
- if raid5_run sees both are present, fail.
- if adding a bitmap finds a journal is present, fail
- if adding a journal finds a bitmap is present, fail.
Cc: stable(a)vger.kernel.org (4.10+)
Signed-off-by: NeilBrown <neilb(a)suse.com>
Tested-by: Joshua Kinard <kumba(a)gentoo.org>
Acked-by: Joshua Kinard <kumba(a)gentoo.org>
Signed-off-by: Shaohua Li <shli(a)fb.com>
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b843b53b0f65..d1b3b60669ea 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot)
BUG_ON(file && mddev->bitmap_info.offset);
+ if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
+ pr_notice("md/raid:%s: array with journal cannot have bitmap\n",
+ mdname(mddev));
+ return ERR_PTR(-EBUSY);
+ }
+
bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
if (!bitmap)
return ERR_PTR(-ENOMEM);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 97afb28c6f51..6f25e3f1a1cf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6362,7 +6362,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
break;
}
}
- if (has_journal) {
+ if (has_journal || mddev->bitmap) {
export_rdev(rdev);
return -EBUSY;
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a21dbd22a2fb..a8732955f130 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7159,6 +7159,13 @@ static int raid5_run(struct mddev *mddev)
min_offset_diff = diff;
}
+ if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
+ (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
+ pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
+ mdname(mddev));
+ return -EINVAL;
+ }
+
if (mddev->reshape_position != MaxSector) {
/* Check that we can continue the reshape.
* Difficulties arise if the stripe we would write to
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b688741cb06695312f18b730653d6611e1bad28d Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Fri, 25 Aug 2017 17:34:41 +1000
Subject: [PATCH] NFS: revalidate "." etc correctly on "open".
For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.
Since commit ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).
Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.
This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate(). This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.
The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.
Fixes: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable(a)vger.kernel.org (3.9+)
Signed-off-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index db482d4c15d5..c583093a066b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1241,8 +1241,7 @@ static int nfs_weak_revalidate(struct dentry *dentry, unsigned int flags)
return 0;
}
- if (nfs_mapping_need_revalidate_inode(inode))
- error = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ error = nfs_lookup_verify_inode(inode, flags);
dfprintk(LOOKUPCACHE, "NFS: %s: inode %lu is %s\n",
__func__, inode->i_ino, error ? "invalid" : "valid");
return !error;
@@ -1393,6 +1392,7 @@ static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
const struct dentry_operations nfs4_dentry_operations = {
.d_revalidate = nfs4_lookup_revalidate,
+ .d_weak_revalidate = nfs_weak_revalidate,
.d_delete = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
.d_automount = nfs_d_automount,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b688741cb06695312f18b730653d6611e1bad28d Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Fri, 25 Aug 2017 17:34:41 +1000
Subject: [PATCH] NFS: revalidate "." etc correctly on "open".
For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.
Since commit ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).
Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.
This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate(). This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.
The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.
Fixes: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable(a)vger.kernel.org (3.9+)
Signed-off-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index db482d4c15d5..c583093a066b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1241,8 +1241,7 @@ static int nfs_weak_revalidate(struct dentry *dentry, unsigned int flags)
return 0;
}
- if (nfs_mapping_need_revalidate_inode(inode))
- error = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ error = nfs_lookup_verify_inode(inode, flags);
dfprintk(LOOKUPCACHE, "NFS: %s: inode %lu is %s\n",
__func__, inode->i_ino, error ? "invalid" : "valid");
return !error;
@@ -1393,6 +1392,7 @@ static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
const struct dentry_operations nfs4_dentry_operations = {
.d_revalidate = nfs4_lookup_revalidate,
+ .d_weak_revalidate = nfs_weak_revalidate,
.d_delete = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
.d_automount = nfs_d_automount,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b688741cb06695312f18b730653d6611e1bad28d Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Fri, 25 Aug 2017 17:34:41 +1000
Subject: [PATCH] NFS: revalidate "." etc correctly on "open".
For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.
Since commit ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).
Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.
This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate(). This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.
The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.
Fixes: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable(a)vger.kernel.org (3.9+)
Signed-off-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index db482d4c15d5..c583093a066b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1241,8 +1241,7 @@ static int nfs_weak_revalidate(struct dentry *dentry, unsigned int flags)
return 0;
}
- if (nfs_mapping_need_revalidate_inode(inode))
- error = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ error = nfs_lookup_verify_inode(inode, flags);
dfprintk(LOOKUPCACHE, "NFS: %s: inode %lu is %s\n",
__func__, inode->i_ino, error ? "invalid" : "valid");
return !error;
@@ -1393,6 +1392,7 @@ static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
const struct dentry_operations nfs4_dentry_operations = {
.d_revalidate = nfs4_lookup_revalidate,
+ .d_weak_revalidate = nfs_weak_revalidate,
.d_delete = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
.d_automount = nfs_d_automount,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 996478ca9c460886ac147eb0d00e99841b71d31b Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Tue, 22 Aug 2017 16:00:39 -0400
Subject: [PATCH] btrfs: change how we decide to commit transactions during
flushing
Nikolay reported that generic/273 was failing currently with ENOSPC.
Turns out this is because we get to the point where the outstanding
reservations are greater than the pinned space on the fs. This is a
mistake, previously we used the current reservation amount in
may_commit_transaction, not the entire outstanding reservation amount.
Fix this to find the minimum byte size needed to make progress in
flushing, and pass that into may_commit_transaction. From there we can
make a smarter decision on whether to commit the transaction or not.
This fixes the failure in generic/273.
>From Nikolai, IOW: when we go to the final stage of deciding whether to
do trans commit, instead of passing all the reservations from all
tickets we just pass the reservation for the current ticket. Otherwise,
in case all reservations exceed pinned space, then we don't commit
transaction and fail prematurely. Before we passed num_bytes from
flush_space, where num_bytes was the sum of all pending reserverations,
but now all we do is take the first ticket and commit the trans if we
can satisfy that.
Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Cc: stable(a)vger.kernel.org # 4.8
Reported-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Josef Bacik <jbacik(a)fb.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Tested-by: Nikolay Borisov <nborisov(a)suse.com>
[ added Nikolai's comment ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index eccaac17258d..1a6aced00a19 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4907,6 +4907,13 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
}
}
+struct reserve_ticket {
+ u64 bytes;
+ int error;
+ struct list_head list;
+ wait_queue_head_t wait;
+};
+
/**
* maybe_commit_transaction - possibly commit the transaction if its ok to
* @root - the root we're allocating for
@@ -4918,18 +4925,29 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
* will return -ENOSPC.
*/
static int may_commit_transaction(struct btrfs_fs_info *fs_info,
- struct btrfs_space_info *space_info,
- u64 bytes, int force)
+ struct btrfs_space_info *space_info)
{
+ struct reserve_ticket *ticket = NULL;
struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
struct btrfs_trans_handle *trans;
+ u64 bytes;
trans = (struct btrfs_trans_handle *)current->journal_info;
if (trans)
return -EAGAIN;
- if (force)
- goto commit;
+ spin_lock(&space_info->lock);
+ if (!list_empty(&space_info->priority_tickets))
+ ticket = list_first_entry(&space_info->priority_tickets,
+ struct reserve_ticket, list);
+ else if (!list_empty(&space_info->tickets))
+ ticket = list_first_entry(&space_info->tickets,
+ struct reserve_ticket, list);
+ bytes = (ticket) ? ticket->bytes : 0;
+ spin_unlock(&space_info->lock);
+
+ if (!bytes)
+ return 0;
/* See if there is enough pinned space to make this reservation */
if (percpu_counter_compare(&space_info->total_bytes_pinned,
@@ -4944,8 +4962,12 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
return -ENOSPC;
spin_lock(&delayed_rsv->lock);
+ if (delayed_rsv->size > bytes)
+ bytes = 0;
+ else
+ bytes -= delayed_rsv->size;
if (percpu_counter_compare(&space_info->total_bytes_pinned,
- bytes - delayed_rsv->size) < 0) {
+ bytes) < 0) {
spin_unlock(&delayed_rsv->lock);
return -ENOSPC;
}
@@ -4959,13 +4981,6 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
return btrfs_commit_transaction(trans);
}
-struct reserve_ticket {
- u64 bytes;
- int error;
- struct list_head list;
- wait_queue_head_t wait;
-};
-
/*
* Try to flush some data based on policy set by @state. This is only advisory
* and may fail for various reasons. The caller is supposed to examine the
@@ -5015,8 +5030,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
ret = 0;
break;
case COMMIT_TRANS:
- ret = may_commit_transaction(fs_info, space_info,
- num_bytes, 0);
+ ret = may_commit_transaction(fs_info, space_info);
break;
default:
ret = -ENOSPC;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0b3bc855374c50b5ea85273553485af48caf2f7 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sun, 29 Oct 2017 06:30:19 -0400
Subject: [PATCH] fscrypt: lock mutex before checking for bounce page pool
fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex. However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized. This is a
classic bug with "double-checked locking".
While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen. This was changed only incidentally by the large
refactor to use fs/crypto/.
Solve both problems in a trivial way that can easily be backported: just
always take the mutex. It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.
Later I'd like to make this use a helper macro like DO_ONCE(). However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.
Cc: stable(a)vger.kernel.org # v4.1+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 608f6bbe0f31..472326737717 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -410,11 +410,8 @@ int fscrypt_initialize(unsigned int cop_flags)
{
int i, res = -ENOMEM;
- /*
- * No need to allocate a bounce page pool if there already is one or
- * this FS won't use it.
- */
- if (cop_flags & FS_CFLG_OWN_PAGES || fscrypt_bounce_page_pool)
+ /* No need to allocate a bounce page pool if this FS won't use it. */
+ if (cop_flags & FS_CFLG_OWN_PAGES)
return 0;
mutex_lock(&fscrypt_init_mutex);
The patch below does not apply to the 4.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0b3bc855374c50b5ea85273553485af48caf2f7 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sun, 29 Oct 2017 06:30:19 -0400
Subject: [PATCH] fscrypt: lock mutex before checking for bounce page pool
fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex. However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized. This is a
classic bug with "double-checked locking".
While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen. This was changed only incidentally by the large
refactor to use fs/crypto/.
Solve both problems in a trivial way that can easily be backported: just
always take the mutex. It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.
Later I'd like to make this use a helper macro like DO_ONCE(). However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.
Cc: stable(a)vger.kernel.org # v4.1+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 608f6bbe0f31..472326737717 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -410,11 +410,8 @@ int fscrypt_initialize(unsigned int cop_flags)
{
int i, res = -ENOMEM;
- /*
- * No need to allocate a bounce page pool if there already is one or
- * this FS won't use it.
- */
- if (cop_flags & FS_CFLG_OWN_PAGES || fscrypt_bounce_page_pool)
+ /* No need to allocate a bounce page pool if this FS won't use it. */
+ if (cop_flags & FS_CFLG_OWN_PAGES)
return 0;
mutex_lock(&fscrypt_init_mutex);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0b3bc855374c50b5ea85273553485af48caf2f7 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sun, 29 Oct 2017 06:30:19 -0400
Subject: [PATCH] fscrypt: lock mutex before checking for bounce page pool
fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex. However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized. This is a
classic bug with "double-checked locking".
While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen. This was changed only incidentally by the large
refactor to use fs/crypto/.
Solve both problems in a trivial way that can easily be backported: just
always take the mutex. It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.
Later I'd like to make this use a helper macro like DO_ONCE(). However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.
Cc: stable(a)vger.kernel.org # v4.1+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 608f6bbe0f31..472326737717 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -410,11 +410,8 @@ int fscrypt_initialize(unsigned int cop_flags)
{
int i, res = -ENOMEM;
- /*
- * No need to allocate a bounce page pool if there already is one or
- * this FS won't use it.
- */
- if (cop_flags & FS_CFLG_OWN_PAGES || fscrypt_bounce_page_pool)
+ /* No need to allocate a bounce page pool if this FS won't use it. */
+ if (cop_flags & FS_CFLG_OWN_PAGES)
return 0;
mutex_lock(&fscrypt_init_mutex);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5d03a6613957785e94af7a4a6212ad4af66aa5c2 Mon Sep 17 00:00:00 2001
From: Vitaly Wool <vitalywool(a)gmail.com>
Date: Fri, 17 Nov 2017 15:26:16 -0800
Subject: [PATCH] mm/z3fold.c: use kref to prevent page free/compact race
There is a race in the current z3fold implementation between
do_compact() called in a work queue context and the page release
procedure when page's kref goes to 0.
do_compact() may be waiting for page lock, which is released by
release_z3fold_page_locked right before putting the page onto the
"stale" list, and then the page may be freed as do_compact() modifies
its contents.
The mechanism currently implemented to handle that (checking the
PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref
counter to guarantee that the page is not released if its compaction is
scheduled. It then becomes compaction function's responsibility to
decrease the counter and quit immediately if the page was actually
freed.
Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com
Signed-off-by: Vitaly Wool <vitaly.wool(a)sonymobile.com>
Cc: <Oleksiy.Avramchenko(a)sony.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/z3fold.c b/mm/z3fold.c
index b2ba2ba585f3..39e19125d6a0 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -404,8 +404,7 @@ static void do_compact_page(struct z3fold_header *zhdr, bool locked)
WARN_ON(z3fold_page_trylock(zhdr));
else
z3fold_page_lock(zhdr);
- if (test_bit(PAGE_STALE, &page->private) ||
- !test_and_clear_bit(NEEDS_COMPACTING, &page->private)) {
+ if (WARN_ON(!test_and_clear_bit(NEEDS_COMPACTING, &page->private))) {
z3fold_page_unlock(zhdr);
return;
}
@@ -413,6 +412,11 @@ static void do_compact_page(struct z3fold_header *zhdr, bool locked)
list_del_init(&zhdr->buddy);
spin_unlock(&pool->lock);
+ if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
+ atomic64_dec(&pool->pages_nr);
+ return;
+ }
+
z3fold_compact_page(zhdr);
unbuddied = get_cpu_ptr(pool->unbuddied);
fchunks = num_free_chunks(zhdr);
@@ -753,9 +757,11 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
list_del_init(&zhdr->buddy);
spin_unlock(&pool->lock);
zhdr->cpu = -1;
+ kref_get(&zhdr->refcount);
do_compact_page(zhdr, true);
return;
}
+ kref_get(&zhdr->refcount);
queue_work_on(zhdr->cpu, pool->compact_wq, &zhdr->work);
z3fold_page_unlock(zhdr);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c7fd89a6407ea3a44a2a2fa12d290162c42499c4 Mon Sep 17 00:00:00 2001
From: James Hogan <jhogan(a)kernel.org>
Date: Fri, 10 Nov 2017 11:46:54 +0000
Subject: [PATCH] MIPS: Fix odd fp register warnings with MIPS64r2
Building 32-bit MIPS64r2 kernels produces warnings like the following
on certain toolchains (such as GNU assembler 2.24.90, but not GNU
assembler 2.28.51) since commit 22b8ba765a72 ("MIPS: Fix MIPS64 FP
save/restore on 32-bit kernels"), due to the exposure of fpu_save_16odd
from fpu_save_double and fpu_restore_16odd from fpu_restore_double:
arch/mips/kernel/r4k_fpu.S:47: Warning: float register should be even, was 1
...
arch/mips/kernel/r4k_fpu.S:59: Warning: float register should be even, was 1
...
This appears to be because .set mips64r2 does not change the FPU ABI to
64-bit when -march=mips64r2 (or e.g. -march=xlp) is provided on the
command line on that toolchain, from the default FPU ABI of 32-bit due
to the -mabi=32. This makes access to the odd FPU registers invalid.
Fix by explicitly changing the FPU ABI with .set fp=64 directives in
fpu_save_16odd and fpu_restore_16odd, and moving the undefine of fp up
in asmmacro.h so fp doesn't turn into $30.
Fixes: 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels")
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.0+: 22b8ba765a72: MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
Cc: <stable(a)vger.kernel.org> # 4.0+
Patchwork: https://patchwork.linux-mips.org/patch/17656/
diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index b815d7b3bd27..feb069cbf44e 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -19,6 +19,9 @@
#include <asm/asmmacro-64.h>
#endif
+/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */
+#undef fp
+
/*
* Helper macros for generating raw instruction encodings.
*/
@@ -105,6 +108,7 @@
.macro fpu_save_16odd thread
.set push
.set mips64r2
+ .set fp=64
SET_HARDFLOAT
sdc1 $f1, THREAD_FPR1(\thread)
sdc1 $f3, THREAD_FPR3(\thread)
@@ -163,6 +167,7 @@
.macro fpu_restore_16odd thread
.set push
.set mips64r2
+ .set fp=64
SET_HARDFLOAT
ldc1 $f1, THREAD_FPR1(\thread)
ldc1 $f3, THREAD_FPR3(\thread)
@@ -234,9 +239,6 @@
.endm
#ifdef TOOLCHAIN_SUPPORTS_MSA
-/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */
-#undef fp
-
.macro _cfcmsa rd, cs
.set push
.set mips32r2
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 22b8ba765a726d90e9830ff6134c32b04f12c10f Mon Sep 17 00:00:00 2001
From: James Hogan <jhogan(a)kernel.org>
Date: Mon, 3 Jul 2017 23:41:47 +0100
Subject: [PATCH] MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
32-bit kernels can be configured to support MIPS64, in which case
neither CONFIG_64BIT or CONFIG_CPU_MIPS32_R* will be set. This causes
the CP0_Status.FR checks at the point of floating point register save
and restore to be compiled out, which results in odd FP registers not
being saved or restored to the task or signal context even when
CP0_Status.FR is set.
Fix the ifdefs to use CONFIG_CPU_MIPSR2 and CONFIG_CPU_MIPSR6, which are
enabled for the relevant revisions of either MIPS32 or MIPS64, along
with some other CPUs such as Octeon (r2), Loongson1 (r2), XLP (r2),
Loongson 3A R2.
The suspect code originates from commit 597ce1723e0f ("MIPS: Support for
64-bit FP with O32 binaries") in v3.14, however the code in
__enable_fpu() was consistent and refused to set FR=1, falling back to
software FPU emulation. This was suboptimal but should be functionally
correct.
Commit fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6
CPU") in v4.2 (and stable tagged back to 4.0) later introduced the bug
by updating __enable_fpu() to set FR=1 but failing to update the other
similar ifdefs to enable FR=1 state handling.
Fixes: fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU")
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.0+
Patchwork: https://patchwork.linux-mips.org/patch/16739/
diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index 83054f79f72a..b815d7b3bd27 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -126,8 +126,8 @@
.endm
.macro fpu_save_double thread status tmp
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
sll \tmp, \status, 5
bgez \tmp, 10f
fpu_save_16odd \thread
@@ -184,8 +184,8 @@
.endm
.macro fpu_restore_double thread status tmp
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
sll \tmp, \status, 5
bgez \tmp, 10f # 16 register mode?
diff --git a/arch/mips/kernel/r4k_fpu.S b/arch/mips/kernel/r4k_fpu.S
index 0a83b1708b3c..8e3a6020c613 100644
--- a/arch/mips/kernel/r4k_fpu.S
+++ b/arch/mips/kernel/r4k_fpu.S
@@ -40,8 +40,8 @@
*/
LEAF(_save_fp)
EXPORT_SYMBOL(_save_fp)
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
mfc0 t0, CP0_STATUS
#endif
fpu_save_double a0 t0 t1 # clobbers t1
@@ -52,8 +52,8 @@ EXPORT_SYMBOL(_save_fp)
* Restore a thread's fp context.
*/
LEAF(_restore_fp)
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
mfc0 t0, CP0_STATUS
#endif
fpu_restore_double a0 t0 t1 # clobbers t1
@@ -246,11 +246,11 @@ LEAF(_save_fp_context)
cfc1 t1, fcr31
.set pop
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
.set push
SET_HARDFLOAT
-#ifdef CONFIG_CPU_MIPS32_R2
+#ifdef CONFIG_CPU_MIPSR2
.set mips32r2
.set fp=64
mfc0 t0, CP0_STATUS
@@ -314,11 +314,11 @@ LEAF(_save_fp_context)
LEAF(_restore_fp_context)
EX lw t1, 0(a1)
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
.set push
SET_HARDFLOAT
-#ifdef CONFIG_CPU_MIPS32_R2
+#ifdef CONFIG_CPU_MIPSR2
.set mips32r2
.set fp=64
mfc0 t0, CP0_STATUS
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 22b8ba765a726d90e9830ff6134c32b04f12c10f Mon Sep 17 00:00:00 2001
From: James Hogan <jhogan(a)kernel.org>
Date: Mon, 3 Jul 2017 23:41:47 +0100
Subject: [PATCH] MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
32-bit kernels can be configured to support MIPS64, in which case
neither CONFIG_64BIT or CONFIG_CPU_MIPS32_R* will be set. This causes
the CP0_Status.FR checks at the point of floating point register save
and restore to be compiled out, which results in odd FP registers not
being saved or restored to the task or signal context even when
CP0_Status.FR is set.
Fix the ifdefs to use CONFIG_CPU_MIPSR2 and CONFIG_CPU_MIPSR6, which are
enabled for the relevant revisions of either MIPS32 or MIPS64, along
with some other CPUs such as Octeon (r2), Loongson1 (r2), XLP (r2),
Loongson 3A R2.
The suspect code originates from commit 597ce1723e0f ("MIPS: Support for
64-bit FP with O32 binaries") in v3.14, however the code in
__enable_fpu() was consistent and refused to set FR=1, falling back to
software FPU emulation. This was suboptimal but should be functionally
correct.
Commit fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6
CPU") in v4.2 (and stable tagged back to 4.0) later introduced the bug
by updating __enable_fpu() to set FR=1 but failing to update the other
similar ifdefs to enable FR=1 state handling.
Fixes: fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU")
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.0+
Patchwork: https://patchwork.linux-mips.org/patch/16739/
diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index 83054f79f72a..b815d7b3bd27 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -126,8 +126,8 @@
.endm
.macro fpu_save_double thread status tmp
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
sll \tmp, \status, 5
bgez \tmp, 10f
fpu_save_16odd \thread
@@ -184,8 +184,8 @@
.endm
.macro fpu_restore_double thread status tmp
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
sll \tmp, \status, 5
bgez \tmp, 10f # 16 register mode?
diff --git a/arch/mips/kernel/r4k_fpu.S b/arch/mips/kernel/r4k_fpu.S
index 0a83b1708b3c..8e3a6020c613 100644
--- a/arch/mips/kernel/r4k_fpu.S
+++ b/arch/mips/kernel/r4k_fpu.S
@@ -40,8 +40,8 @@
*/
LEAF(_save_fp)
EXPORT_SYMBOL(_save_fp)
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
mfc0 t0, CP0_STATUS
#endif
fpu_save_double a0 t0 t1 # clobbers t1
@@ -52,8 +52,8 @@ EXPORT_SYMBOL(_save_fp)
* Restore a thread's fp context.
*/
LEAF(_restore_fp)
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
mfc0 t0, CP0_STATUS
#endif
fpu_restore_double a0 t0 t1 # clobbers t1
@@ -246,11 +246,11 @@ LEAF(_save_fp_context)
cfc1 t1, fcr31
.set pop
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
.set push
SET_HARDFLOAT
-#ifdef CONFIG_CPU_MIPS32_R2
+#ifdef CONFIG_CPU_MIPSR2
.set mips32r2
.set fp=64
mfc0 t0, CP0_STATUS
@@ -314,11 +314,11 @@ LEAF(_save_fp_context)
LEAF(_restore_fp_context)
EX lw t1, 0(a1)
-#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \
- defined(CONFIG_CPU_MIPS32_R6)
+#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
+ defined(CONFIG_CPU_MIPSR6)
.set push
SET_HARDFLOAT
-#ifdef CONFIG_CPU_MIPS32_R2
+#ifdef CONFIG_CPU_MIPSR2
.set mips32r2
.set fp=64
mfc0 t0, CP0_STATUS
On Mon, Nov 27, 2017 at 02:13:00PM +0000, James Hogan wrote:
> On Mon, Nov 27, 2017 at 01:56:49PM +0100, Greg KH wrote:
> > On Mon, Nov 27, 2017 at 12:40:36PM +0000, James Hogan wrote:
> > > Hi Greg,
> > >
> > > On Mon, Nov 27, 2017 at 01:35:46PM +0100, gregkh(a)linuxfoundation.org wrote:
> > > > The patch below was submitted to be applied to the 4.9-stable tree.
> > > >
> > > > I fail to see how this patch meets the stable kernel rules as found at
> > > > Documentation/process/stable-kernel-rules.rst.
> > > >
> > > > I could be totally wrong, and if so, please respond to
> > > > <stable(a)vger.kernel.org> and let me know why this patch should be
> > > > applied. Otherwise, it is now dropped from my patch queues, never to be
> > > > seen again.
> > >
> > > I should have adjusted the commit message. KERN_WARN doesn't exist so it
> > > actually fixes a build error as well as switching to pr_warn().
> >
> > What build error? I have not heard of this breaking the build on 4.9
> > for the past year, is it in some config that no one uses? :)
>
> The LEDE project has been carrying the patch [1] since February when
> they added 4.9 support (their 4.4 support had a slightly earlier version
> of the driver added with just a plain printk, no KERN_WARN).
>
> They have both CONFIG_SOC_MT7620 and CONFIG_PCI=y in their ralink mt7620
> config [2], and they are keeping up to date with stable releases [3], so
> I have no doubt they would appreciate having the patch applied to
> upstream stable to reduce their delta.
>
> The only defconfigs in mainline which enable this platform
> (CONFIG_SOC_MT7620) are omega2p_defconfig and vocore2_defconfig, which
> were added in August by Harvey to help widen our internal continuous
> build & boot test coverage. Neither defconfig enables CONFIG_PCI yet
> which is required to see the build failure below, but regardless it is a
> valid configuration which LEDE is actively using.
>
> arch/mips/pci/pci-mt7620.c: In function ‘wait_pciephy_busy’:
> arch/mips/pci/pci-mt7620.c:123:11: error: ‘KERN_WARN’ undeclared (first use in this function)
> printk(KERN_WARN "PCIE-PHY retry failed.\n");
> ^~~~~~~~~
>
> John: I'm not familiar with the hardware, but would it be appropriate to
> add CONFIG_PCI=y to either of those 2 defconfigs (omega2p_defconfig and
> vocore2_defconfig) so this driver gets some upstream build[/boot]
> testing?
>
> Anyway, hopefully that helps allay stable backport concerns.
Yes, thanks, that explains it a lot better, now queued up.
greg k-h
hi Greg,
Please include the commit below in -stable, once it gets upstream.
Maybe also add:
Fixes: b6366f048e0c: ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
because it fixes a pretty old bug.
Thanks,
Ingo
----- Forwarded message from "tip-bot for Steven Rostedt (Red Hat)" <tipbot(a)zytor.com> -----
Date: Tue, 10 Oct 2017 04:02:17 -0700
From: "tip-bot for Steven Rostedt (Red Hat)" <tipbot(a)zytor.com>
To: linux-tip-commits(a)vger.kernel.org
Cc: rostedt(a)goodmis.org, williams(a)redhat.com, mingo(a)kernel.org, tglx(a)linutronix.de, peterz(a)infradead.org, linux-kernel(a)vger.kernel.org, jkacur(a)redhat.com, bristot(a)redhat.com,
torvalds(a)linux-foundation.org, efault(a)gmx.de, hpa(a)zytor.com, swood(a)redhat.com
Subject: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Commit-ID: 4bdced5c9a2922521e325896a7bbbf0132c94e56
Gitweb: https://git.kernel.org/tip/4bdced5c9a2922521e325896a7bbbf0132c94e56
Author: Steven Rostedt (Red Hat) <rostedt(a)goodmis.org>
AuthorDate: Fri, 6 Oct 2017 14:05:04 -0400
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:40 +0200
sched/rt: Simplify the IPI based RT balancing logic
When a CPU lowers its priority (schedules out a high priority task for a
lower priority one), a check is made to see if any other CPU has overloaded
RT tasks (more than one). It checks the rto_mask to determine this and if so
it will request to pull one of those tasks to itself if the non running RT
task is of higher priority than the new priority of the next task to run on
the current CPU.
When we deal with large number of CPUs, the original pull logic suffered
from large lock contention on a single CPU run queue, which caused a huge
latency across all CPUs. This was caused by only having one CPU having
overloaded RT tasks and a bunch of other CPUs lowering their priority. To
solve this issue, commit:
b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
changed the way to request a pull. Instead of grabbing the lock of the
overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.
Although the IPI logic worked very well in removing the large latency build
up, it still could suffer from a large number of IPIs being sent to a single
CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
when I tested this on a 120 CPU box, with a stress test that had lots of
RT tasks scheduling on all CPUs, it actually triggered the hard lockup
detector! One CPU had so many IPIs sent to it, and due to the restart
mechanism that is triggered when the source run queue has a priority status
change, the CPU spent minutes! processing the IPIs.
Thinking about this further, I realized there's no reason for each run queue
to send its own IPI. As all CPUs with overloaded tasks must be scanned
regardless if there's one or many CPUs lowering their priority, because
there's no current way to find the CPU with the highest priority task that
can schedule to one of these CPUs, there really only needs to be one IPI
being sent around at a time.
This greatly simplifies the code!
The new approach is to have each root domain have its own irq work, as the
rto_mask is per root domain. The root domain has the following fields
attached to it:
rto_push_work - the irq work to process each CPU set in rto_mask
rto_lock - the lock to protect some of the other rto fields
rto_loop_start - an atomic that keeps contention down on rto_lock
the first CPU scheduling in a lower priority task
is the one to kick off the process.
rto_loop_next - an atomic that gets incremented for each CPU that
schedules in a lower priority task.
rto_loop - a variable protected by rto_lock that is used to
compare against rto_loop_next
rto_cpu - The cpu to send the next IPI to, also protected by
the rto_lock.
When a CPU schedules in a lower priority task and wants to make sure
overloaded CPUs know about it. It increments the rto_loop_next. Then it
atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
then it is done, as another CPU is kicking off the IPI loop. If the old
value is "0", then it will take the rto_lock to synchronize with a possible
IPI being sent around to the overloaded CPUs.
If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
IPI being sent around, or one is about to finish. Then rto_cpu is set to the
first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
in rto_mask, then there's nothing to be done.
When the CPU receives the IPI, it will first try to push any RT tasks that is
queued on the CPU but can't run because a higher priority RT task is
currently running on that CPU.
Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
finds one, it simply sends an IPI to that CPU and the process continues.
If there's no more CPUs in the rto_mask, then rto_loop is compared with
rto_loop_next. If they match, everything is done and the process is over. If
they do not match, then a CPU scheduled in a lower priority task as the IPI
was being passed around, and the process needs to start again. The first CPU
in rto_mask is sent the IPI.
This change removes this duplication of work in the IPI logic, and greatly
lowers the latency caused by the IPIs. This removed the lockup happening on
the 120 CPU machine. It also simplifies the code tremendously. What else
could anyone ask for?
Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
supplying me with the rto_start_trylock() and rto_start_unlock() helper
functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mike Galbraith <efault(a)gmx.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Scott Wood <swood(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
kernel/sched/rt.c | 316 ++++++++++++++++++------------------------------
kernel/sched/sched.h | 24 ++--
kernel/sched/topology.c | 6 +
3 files changed, 138 insertions(+), 208 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0af5ca9..fda2799 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -73,10 +73,6 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
-#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI)
-static void push_irq_work_func(struct irq_work *work);
-#endif
-
void init_rt_rq(struct rt_rq *rt_rq)
{
struct rt_prio_array *array;
@@ -96,13 +92,6 @@ void init_rt_rq(struct rt_rq *rt_rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
-
-#ifdef HAVE_RT_PUSH_IPI
- rt_rq->push_flags = 0;
- rt_rq->push_cpu = nr_cpu_ids;
- raw_spin_lock_init(&rt_rq->push_lock);
- init_irq_work(&rt_rq->push_work, push_irq_work_func);
-#endif
#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1875,241 +1864,166 @@ static void push_rt_tasks(struct rq *rq)
}
#ifdef HAVE_RT_PUSH_IPI
+
/*
- * The search for the next cpu always starts at rq->cpu and ends
- * when we reach rq->cpu again. It will never return rq->cpu.
- * This returns the next cpu to check, or nr_cpu_ids if the loop
- * is complete.
+ * When a high priority task schedules out from a CPU and a lower priority
+ * task is scheduled in, a check is made to see if there's any RT tasks
+ * on other CPUs that are waiting to run because a higher priority RT task
+ * is currently running on its CPU. In this case, the CPU with multiple RT
+ * tasks queued on it (overloaded) needs to be notified that a CPU has opened
+ * up that may be able to run one of its non-running queued RT tasks.
+ *
+ * All CPUs with overloaded RT tasks need to be notified as there is currently
+ * no way to know which of these CPUs have the highest priority task waiting
+ * to run. Instead of trying to take a spinlock on each of these CPUs,
+ * which has shown to cause large latency when done on machines with many
+ * CPUs, sending an IPI to the CPUs to have them push off the overloaded
+ * RT tasks waiting to run.
+ *
+ * Just sending an IPI to each of the CPUs is also an issue, as on large
+ * count CPU machines, this can cause an IPI storm on a CPU, especially
+ * if its the only CPU with multiple RT tasks queued, and a large number
+ * of CPUs scheduling a lower priority task at the same time.
+ *
+ * Each root domain has its own irq work function that can iterate over
+ * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT
+ * tassk must be checked if there's one or many CPUs that are lowering
+ * their priority, there's a single irq work iterator that will try to
+ * push off RT tasks that are waiting to run.
+ *
+ * When a CPU schedules a lower priority task, it will kick off the
+ * irq work iterator that will jump to each CPU with overloaded RT tasks.
+ * As it only takes the first CPU that schedules a lower priority task
+ * to start the process, the rto_start variable is incremented and if
+ * the atomic result is one, then that CPU will try to take the rto_lock.
+ * This prevents high contention on the lock as the process handles all
+ * CPUs scheduling lower priority tasks.
+ *
+ * All CPUs that are scheduling a lower priority task will increment the
+ * rt_loop_next variable. This will make sure that the irq work iterator
+ * checks all RT overloaded CPUs whenever a CPU schedules a new lower
+ * priority task, even if the iterator is in the middle of a scan. Incrementing
+ * the rt_loop_next will cause the iterator to perform another scan.
*
- * rq->rt.push_cpu holds the last cpu returned by this function,
- * or if this is the first instance, it must hold rq->cpu.
*/
static int rto_next_cpu(struct rq *rq)
{
- int prev_cpu = rq->rt.push_cpu;
+ struct root_domain *rd = rq->rd;
+ int next;
int cpu;
- cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
-
/*
- * If the previous cpu is less than the rq's CPU, then it already
- * passed the end of the mask, and has started from the beginning.
- * We end if the next CPU is greater or equal to rq's CPU.
+ * When starting the IPI RT pushing, the rto_cpu is set to -1,
+ * rt_next_cpu() will simply return the first CPU found in
+ * the rto_mask.
+ *
+ * If rto_next_cpu() is called with rto_cpu is a valid cpu, it
+ * will return the next CPU found in the rto_mask.
+ *
+ * If there are no more CPUs left in the rto_mask, then a check is made
+ * against rto_loop and rto_loop_next. rto_loop is only updated with
+ * the rto_lock held, but any CPU may increment the rto_loop_next
+ * without any locking.
*/
- if (prev_cpu < rq->cpu) {
- if (cpu >= rq->cpu)
- return nr_cpu_ids;
+ for (;;) {
- } else if (cpu >= nr_cpu_ids) {
- /*
- * We passed the end of the mask, start at the beginning.
- * If the result is greater or equal to the rq's CPU, then
- * the loop is finished.
- */
- cpu = cpumask_first(rq->rd->rto_mask);
- if (cpu >= rq->cpu)
- return nr_cpu_ids;
- }
- rq->rt.push_cpu = cpu;
+ /* When rto_cpu is -1 this acts like cpumask_first() */
+ cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
- /* Return cpu to let the caller know if the loop is finished or not */
- return cpu;
-}
+ rd->rto_cpu = cpu;
-static int find_next_push_cpu(struct rq *rq)
-{
- struct rq *next_rq;
- int cpu;
+ if (cpu < nr_cpu_ids)
+ return cpu;
- while (1) {
- cpu = rto_next_cpu(rq);
- if (cpu >= nr_cpu_ids)
- break;
- next_rq = cpu_rq(cpu);
+ rd->rto_cpu = -1;
+
+ /*
+ * ACQUIRE ensures we see the @rto_mask changes
+ * made prior to the @next value observed.
+ *
+ * Matches WMB in rt_set_overload().
+ */
+ next = atomic_read_acquire(&rd->rto_loop_next);
- /* Make sure the next rq can push to this rq */
- if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ if (rd->rto_loop == next)
break;
+
+ rd->rto_loop = next;
}
- return cpu;
+ return -1;
}
-#define RT_PUSH_IPI_EXECUTING 1
-#define RT_PUSH_IPI_RESTART 2
+static inline bool rto_start_trylock(atomic_t *v)
+{
+ return !atomic_cmpxchg_acquire(v, 0, 1);
+}
-/*
- * When a high priority task schedules out from a CPU and a lower priority
- * task is scheduled in, a check is made to see if there's any RT tasks
- * on other CPUs that are waiting to run because a higher priority RT task
- * is currently running on its CPU. In this case, the CPU with multiple RT
- * tasks queued on it (overloaded) needs to be notified that a CPU has opened
- * up that may be able to run one of its non-running queued RT tasks.
- *
- * On large CPU boxes, there's the case that several CPUs could schedule
- * a lower priority task at the same time, in which case it will look for
- * any overloaded CPUs that it could pull a task from. To do this, the runqueue
- * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting
- * for a single overloaded CPU's runqueue lock can produce a large latency.
- * (This has actually been observed on large boxes running cyclictest).
- * Instead of taking the runqueue lock of the overloaded CPU, each of the
- * CPUs that scheduled a lower priority task simply sends an IPI to the
- * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with
- * lots of contention. The overloaded CPU will look to push its non-running
- * RT task off, and if it does, it can then ignore the other IPIs coming
- * in, and just pass those IPIs off to any other overloaded CPU.
- *
- * When a CPU schedules a lower priority task, it only sends an IPI to
- * the "next" CPU that has overloaded RT tasks. This prevents IPI storms,
- * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with
- * RT overloaded tasks, would cause 100 IPIs to go out at once.
- *
- * The overloaded RT CPU, when receiving an IPI, will try to push off its
- * overloaded RT tasks and then send an IPI to the next CPU that has
- * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks
- * have completed. Just because a CPU may have pushed off its own overloaded
- * RT task does not mean it should stop sending the IPI around to other
- * overloaded CPUs. There may be another RT task waiting to run on one of
- * those CPUs that are of higher priority than the one that was just
- * pushed.
- *
- * An optimization that could possibly be made is to make a CPU array similar
- * to the cpupri array mask of all running RT tasks, but for the overloaded
- * case, then the IPI could be sent to only the CPU with the highest priority
- * RT task waiting, and that CPU could send off further IPIs to the CPU with
- * the next highest waiting task. Since the overloaded case is much less likely
- * to happen, the complexity of this implementation may not be worth it.
- * Instead, just send an IPI around to all overloaded CPUs.
- *
- * The rq->rt.push_flags holds the status of the IPI that is going around.
- * A run queue can only send out a single IPI at a time. The possible flags
- * for rq->rt.push_flags are:
- *
- * (None or zero): No IPI is going around for the current rq
- * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around
- * RT_PUSH_IPI_RESTART: The priority of the running task for the rq
- * has changed, and the IPI should restart
- * circulating the overloaded CPUs again.
- *
- * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated
- * before sending to the next CPU.
- *
- * Instead of having all CPUs that schedule a lower priority task send
- * an IPI to the same "first" CPU in the RT overload mask, they send it
- * to the next overloaded CPU after their own CPU. This helps distribute
- * the work when there's more than one overloaded CPU and multiple CPUs
- * scheduling in lower priority tasks.
- *
- * When a rq schedules a lower priority task than what was currently
- * running, the next CPU with overloaded RT tasks is examined first.
- * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower
- * priority task, it will send an IPI first to CPU 5, then CPU 5 will
- * send to CPU 1 if it is still overloaded. CPU 1 will clear the
- * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set.
- *
- * The first CPU to notice IPI_RESTART is set, will clear that flag and then
- * send an IPI to the next overloaded CPU after the rq->cpu and not the next
- * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3
- * schedules a lower priority task, and the IPI_RESTART gets set while the
- * handling is being done on CPU 5, it will clear the flag and send it back to
- * CPU 4 instead of CPU 1.
- *
- * Note, the above logic can be disabled by turning off the sched_feature
- * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be
- * taken by the CPU requesting a pull and the waiting RT task will be pulled
- * by that CPU. This may be fine for machines with few CPUs.
- */
-static void tell_cpu_to_push(struct rq *rq)
+static inline void rto_start_unlock(atomic_t *v)
{
- int cpu;
+ atomic_set_release(v, 0);
+}
- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
- raw_spin_lock(&rq->rt.push_lock);
- /* Make sure it's still executing */
- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
- /*
- * Tell the IPI to restart the loop as things have
- * changed since it started.
- */
- rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
- raw_spin_unlock(&rq->rt.push_lock);
- return;
- }
- raw_spin_unlock(&rq->rt.push_lock);
- }
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu = -1;
- /* When here, there's no IPI going around */
+ /* Keep the loop going if the IPI is currently active */
+ atomic_inc(&rq->rd->rto_loop_next);
- rq->rt.push_cpu = rq->cpu;
- cpu = find_next_push_cpu(rq);
- if (cpu >= nr_cpu_ids)
+ /* Only one CPU can initiate a loop at a time */
+ if (!rto_start_trylock(&rq->rd->rto_loop_start))
return;
- rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+ raw_spin_lock(&rq->rd->rto_lock);
+
+ /*
+ * The rto_cpu is updated under the lock, if it has a valid cpu
+ * then the IPI is still running and will continue due to the
+ * update to loop_next, and nothing needs to be done here.
+ * Otherwise it is finishing up and an ipi needs to be sent.
+ */
+ if (rq->rd->rto_cpu < 0)
+ cpu = rto_next_cpu(rq);
- irq_work_queue_on(&rq->rt.push_work, cpu);
+ raw_spin_unlock(&rq->rd->rto_lock);
+
+ rto_start_unlock(&rq->rd->rto_loop_start);
+
+ if (cpu >= 0)
+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}
/* Called from hardirq context */
-static void try_to_push_tasks(void *arg)
+void rto_push_irq_work_func(struct irq_work *work)
{
- struct rt_rq *rt_rq = arg;
- struct rq *rq, *src_rq;
- int this_cpu;
+ struct rq *rq;
int cpu;
- this_cpu = rt_rq->push_cpu;
+ rq = this_rq();
- /* Paranoid check */
- BUG_ON(this_cpu != smp_processor_id());
-
- rq = cpu_rq(this_cpu);
- src_rq = rq_of_rt_rq(rt_rq);
-
-again:
+ /*
+ * We do not need to grab the lock to check for has_pushable_tasks.
+ * When it gets updated, a check is made if a push is possible.
+ */
if (has_pushable_tasks(rq)) {
raw_spin_lock(&rq->lock);
- push_rt_task(rq);
+ push_rt_tasks(rq);
raw_spin_unlock(&rq->lock);
}
- /* Pass the IPI to the next rt overloaded queue */
- raw_spin_lock(&rt_rq->push_lock);
- /*
- * If the source queue changed since the IPI went out,
- * we need to restart the search from that CPU again.
- */
- if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
- rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
- rt_rq->push_cpu = src_rq->cpu;
- }
+ raw_spin_lock(&rq->rd->rto_lock);
- cpu = find_next_push_cpu(src_rq);
+ /* Pass the IPI to the next rt overloaded queue */
+ cpu = rto_next_cpu(rq);
- if (cpu >= nr_cpu_ids)
- rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
- raw_spin_unlock(&rt_rq->push_lock);
+ raw_spin_unlock(&rq->rd->rto_lock);
- if (cpu >= nr_cpu_ids)
+ if (cpu < 0)
return;
- /*
- * It is possible that a restart caused this CPU to be
- * chosen again. Don't bother with an IPI, just see if we
- * have more to push.
- */
- if (unlikely(cpu == rq->cpu))
- goto again;
-
/* Try the next RT overloaded CPU */
- irq_work_queue_on(&rt_rq->push_work, cpu);
-}
-
-static void push_irq_work_func(struct irq_work *work)
-{
- struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
-
- try_to_push_tasks(rt_rq);
+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}
#endif /* HAVE_RT_PUSH_IPI */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a81c978..8aa24b4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -505,7 +505,7 @@ static inline int rt_bandwidth_enabled(void)
}
/* RT IPI pull logic requires IRQ_WORK */
-#ifdef CONFIG_IRQ_WORK
+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP)
# define HAVE_RT_PUSH_IPI
#endif
@@ -527,12 +527,6 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
-#ifdef HAVE_RT_PUSH_IPI
- int push_flags;
- int push_cpu;
- struct irq_work push_work;
- raw_spinlock_t push_lock;
-#endif
#endif /* CONFIG_SMP */
int rt_queued;
@@ -641,6 +635,19 @@ struct root_domain {
struct dl_bw dl_bw;
struct cpudl cpudl;
+#ifdef HAVE_RT_PUSH_IPI
+ /*
+ * For IPI pull requests, loop across the rto_mask.
+ */
+ struct irq_work rto_push_work;
+ raw_spinlock_t rto_lock;
+ /* These are only updated and read within rto_lock */
+ int rto_loop;
+ int rto_cpu;
+ /* These atomics are updated outside of a lock */
+ atomic_t rto_loop_next;
+ atomic_t rto_loop_start;
+#endif
/*
* The "RT overload" flag: it gets set if a CPU has more than
* one runnable RT task.
@@ -658,6 +665,9 @@ extern void init_defrootdomain(void);
extern int sched_init_domains(const struct cpumask *cpu_map);
extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
+#ifdef HAVE_RT_PUSH_IPI
+extern void rto_push_irq_work_func(struct irq_work *work);
+#endif
#endif /* CONFIG_SMP */
/*
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f51d123..e50450c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -268,6 +268,12 @@ static int init_rootdomain(struct root_domain *rd)
if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
goto free_dlo_mask;
+#ifdef HAVE_RT_PUSH_IPI
+ rd->rto_cpu = -1;
+ raw_spin_lock_init(&rd->rto_lock);
+ init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);
+#endif
+
init_dl_bw(&rd->dl_bw);
if (cpudl_init(&rd->cpudl) != 0)
goto free_rto_mask;
----- End forwarded message -----
This is a note to let you know that I've just added the patch titled
sched/rt: Simplify the IPI based RT balancing logic
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sched-rt-simplify-the-ipi-based-rt-balancing-logic.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4bdced5c9a2922521e325896a7bbbf0132c94e56 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt(a)goodmis.org>
Date: Fri, 6 Oct 2017 14:05:04 -0400
Subject: sched/rt: Simplify the IPI based RT balancing logic
From: Steven Rostedt (Red Hat) <rostedt(a)goodmis.org>
commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream.
When a CPU lowers its priority (schedules out a high priority task for a
lower priority one), a check is made to see if any other CPU has overloaded
RT tasks (more than one). It checks the rto_mask to determine this and if so
it will request to pull one of those tasks to itself if the non running RT
task is of higher priority than the new priority of the next task to run on
the current CPU.
When we deal with large number of CPUs, the original pull logic suffered
from large lock contention on a single CPU run queue, which caused a huge
latency across all CPUs. This was caused by only having one CPU having
overloaded RT tasks and a bunch of other CPUs lowering their priority. To
solve this issue, commit:
b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
changed the way to request a pull. Instead of grabbing the lock of the
overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.
Although the IPI logic worked very well in removing the large latency build
up, it still could suffer from a large number of IPIs being sent to a single
CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
when I tested this on a 120 CPU box, with a stress test that had lots of
RT tasks scheduling on all CPUs, it actually triggered the hard lockup
detector! One CPU had so many IPIs sent to it, and due to the restart
mechanism that is triggered when the source run queue has a priority status
change, the CPU spent minutes! processing the IPIs.
Thinking about this further, I realized there's no reason for each run queue
to send its own IPI. As all CPUs with overloaded tasks must be scanned
regardless if there's one or many CPUs lowering their priority, because
there's no current way to find the CPU with the highest priority task that
can schedule to one of these CPUs, there really only needs to be one IPI
being sent around at a time.
This greatly simplifies the code!
The new approach is to have each root domain have its own irq work, as the
rto_mask is per root domain. The root domain has the following fields
attached to it:
rto_push_work - the irq work to process each CPU set in rto_mask
rto_lock - the lock to protect some of the other rto fields
rto_loop_start - an atomic that keeps contention down on rto_lock
the first CPU scheduling in a lower priority task
is the one to kick off the process.
rto_loop_next - an atomic that gets incremented for each CPU that
schedules in a lower priority task.
rto_loop - a variable protected by rto_lock that is used to
compare against rto_loop_next
rto_cpu - The cpu to send the next IPI to, also protected by
the rto_lock.
When a CPU schedules in a lower priority task and wants to make sure
overloaded CPUs know about it. It increments the rto_loop_next. Then it
atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
then it is done, as another CPU is kicking off the IPI loop. If the old
value is "0", then it will take the rto_lock to synchronize with a possible
IPI being sent around to the overloaded CPUs.
If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
IPI being sent around, or one is about to finish. Then rto_cpu is set to the
first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
in rto_mask, then there's nothing to be done.
When the CPU receives the IPI, it will first try to push any RT tasks that is
queued on the CPU but can't run because a higher priority RT task is
currently running on that CPU.
Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
finds one, it simply sends an IPI to that CPU and the process continues.
If there's no more CPUs in the rto_mask, then rto_loop is compared with
rto_loop_next. If they match, everything is done and the process is over. If
they do not match, then a CPU scheduled in a lower priority task as the IPI
was being passed around, and the process needs to start again. The first CPU
in rto_mask is sent the IPI.
This change removes this duplication of work in the IPI logic, and greatly
lowers the latency caused by the IPIs. This removed the lockup happening on
the 120 CPU machine. It also simplifies the code tremendously. What else
could anyone ask for?
Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
supplying me with the rto_start_trylock() and rto_start_unlock() helper
functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mike Galbraith <efault(a)gmx.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Scott Wood <swood(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/sched/rt.c | 316 +++++++++++++++++-------------------------------
kernel/sched/sched.h | 24 ++-
kernel/sched/topology.c | 6
3 files changed, 138 insertions(+), 208 deletions(-)
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -74,10 +74,6 @@ static void start_rt_bandwidth(struct rt
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
-#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI)
-static void push_irq_work_func(struct irq_work *work);
-#endif
-
void init_rt_rq(struct rt_rq *rt_rq)
{
struct rt_prio_array *array;
@@ -97,13 +93,6 @@ void init_rt_rq(struct rt_rq *rt_rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
-
-#ifdef HAVE_RT_PUSH_IPI
- rt_rq->push_flags = 0;
- rt_rq->push_cpu = nr_cpu_ids;
- raw_spin_lock_init(&rt_rq->push_lock);
- init_irq_work(&rt_rq->push_work, push_irq_work_func);
-#endif
#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1876,241 +1865,166 @@ static void push_rt_tasks(struct rq *rq)
}
#ifdef HAVE_RT_PUSH_IPI
+
/*
- * The search for the next cpu always starts at rq->cpu and ends
- * when we reach rq->cpu again. It will never return rq->cpu.
- * This returns the next cpu to check, or nr_cpu_ids if the loop
- * is complete.
+ * When a high priority task schedules out from a CPU and a lower priority
+ * task is scheduled in, a check is made to see if there's any RT tasks
+ * on other CPUs that are waiting to run because a higher priority RT task
+ * is currently running on its CPU. In this case, the CPU with multiple RT
+ * tasks queued on it (overloaded) needs to be notified that a CPU has opened
+ * up that may be able to run one of its non-running queued RT tasks.
+ *
+ * All CPUs with overloaded RT tasks need to be notified as there is currently
+ * no way to know which of these CPUs have the highest priority task waiting
+ * to run. Instead of trying to take a spinlock on each of these CPUs,
+ * which has shown to cause large latency when done on machines with many
+ * CPUs, sending an IPI to the CPUs to have them push off the overloaded
+ * RT tasks waiting to run.
+ *
+ * Just sending an IPI to each of the CPUs is also an issue, as on large
+ * count CPU machines, this can cause an IPI storm on a CPU, especially
+ * if its the only CPU with multiple RT tasks queued, and a large number
+ * of CPUs scheduling a lower priority task at the same time.
+ *
+ * Each root domain has its own irq work function that can iterate over
+ * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT
+ * tassk must be checked if there's one or many CPUs that are lowering
+ * their priority, there's a single irq work iterator that will try to
+ * push off RT tasks that are waiting to run.
+ *
+ * When a CPU schedules a lower priority task, it will kick off the
+ * irq work iterator that will jump to each CPU with overloaded RT tasks.
+ * As it only takes the first CPU that schedules a lower priority task
+ * to start the process, the rto_start variable is incremented and if
+ * the atomic result is one, then that CPU will try to take the rto_lock.
+ * This prevents high contention on the lock as the process handles all
+ * CPUs scheduling lower priority tasks.
+ *
+ * All CPUs that are scheduling a lower priority task will increment the
+ * rt_loop_next variable. This will make sure that the irq work iterator
+ * checks all RT overloaded CPUs whenever a CPU schedules a new lower
+ * priority task, even if the iterator is in the middle of a scan. Incrementing
+ * the rt_loop_next will cause the iterator to perform another scan.
*
- * rq->rt.push_cpu holds the last cpu returned by this function,
- * or if this is the first instance, it must hold rq->cpu.
*/
static int rto_next_cpu(struct rq *rq)
{
- int prev_cpu = rq->rt.push_cpu;
+ struct root_domain *rd = rq->rd;
+ int next;
int cpu;
- cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
-
/*
- * If the previous cpu is less than the rq's CPU, then it already
- * passed the end of the mask, and has started from the beginning.
- * We end if the next CPU is greater or equal to rq's CPU.
+ * When starting the IPI RT pushing, the rto_cpu is set to -1,
+ * rt_next_cpu() will simply return the first CPU found in
+ * the rto_mask.
+ *
+ * If rto_next_cpu() is called with rto_cpu is a valid cpu, it
+ * will return the next CPU found in the rto_mask.
+ *
+ * If there are no more CPUs left in the rto_mask, then a check is made
+ * against rto_loop and rto_loop_next. rto_loop is only updated with
+ * the rto_lock held, but any CPU may increment the rto_loop_next
+ * without any locking.
*/
- if (prev_cpu < rq->cpu) {
- if (cpu >= rq->cpu)
- return nr_cpu_ids;
+ for (;;) {
- } else if (cpu >= nr_cpu_ids) {
- /*
- * We passed the end of the mask, start at the beginning.
- * If the result is greater or equal to the rq's CPU, then
- * the loop is finished.
- */
- cpu = cpumask_first(rq->rd->rto_mask);
- if (cpu >= rq->cpu)
- return nr_cpu_ids;
- }
- rq->rt.push_cpu = cpu;
+ /* When rto_cpu is -1 this acts like cpumask_first() */
+ cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
- /* Return cpu to let the caller know if the loop is finished or not */
- return cpu;
-}
+ rd->rto_cpu = cpu;
-static int find_next_push_cpu(struct rq *rq)
-{
- struct rq *next_rq;
- int cpu;
+ if (cpu < nr_cpu_ids)
+ return cpu;
- while (1) {
- cpu = rto_next_cpu(rq);
- if (cpu >= nr_cpu_ids)
- break;
- next_rq = cpu_rq(cpu);
+ rd->rto_cpu = -1;
+
+ /*
+ * ACQUIRE ensures we see the @rto_mask changes
+ * made prior to the @next value observed.
+ *
+ * Matches WMB in rt_set_overload().
+ */
+ next = atomic_read_acquire(&rd->rto_loop_next);
- /* Make sure the next rq can push to this rq */
- if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ if (rd->rto_loop == next)
break;
+
+ rd->rto_loop = next;
}
- return cpu;
+ return -1;
}
-#define RT_PUSH_IPI_EXECUTING 1
-#define RT_PUSH_IPI_RESTART 2
+static inline bool rto_start_trylock(atomic_t *v)
+{
+ return !atomic_cmpxchg_acquire(v, 0, 1);
+}
-/*
- * When a high priority task schedules out from a CPU and a lower priority
- * task is scheduled in, a check is made to see if there's any RT tasks
- * on other CPUs that are waiting to run because a higher priority RT task
- * is currently running on its CPU. In this case, the CPU with multiple RT
- * tasks queued on it (overloaded) needs to be notified that a CPU has opened
- * up that may be able to run one of its non-running queued RT tasks.
- *
- * On large CPU boxes, there's the case that several CPUs could schedule
- * a lower priority task at the same time, in which case it will look for
- * any overloaded CPUs that it could pull a task from. To do this, the runqueue
- * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting
- * for a single overloaded CPU's runqueue lock can produce a large latency.
- * (This has actually been observed on large boxes running cyclictest).
- * Instead of taking the runqueue lock of the overloaded CPU, each of the
- * CPUs that scheduled a lower priority task simply sends an IPI to the
- * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with
- * lots of contention. The overloaded CPU will look to push its non-running
- * RT task off, and if it does, it can then ignore the other IPIs coming
- * in, and just pass those IPIs off to any other overloaded CPU.
- *
- * When a CPU schedules a lower priority task, it only sends an IPI to
- * the "next" CPU that has overloaded RT tasks. This prevents IPI storms,
- * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with
- * RT overloaded tasks, would cause 100 IPIs to go out at once.
- *
- * The overloaded RT CPU, when receiving an IPI, will try to push off its
- * overloaded RT tasks and then send an IPI to the next CPU that has
- * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks
- * have completed. Just because a CPU may have pushed off its own overloaded
- * RT task does not mean it should stop sending the IPI around to other
- * overloaded CPUs. There may be another RT task waiting to run on one of
- * those CPUs that are of higher priority than the one that was just
- * pushed.
- *
- * An optimization that could possibly be made is to make a CPU array similar
- * to the cpupri array mask of all running RT tasks, but for the overloaded
- * case, then the IPI could be sent to only the CPU with the highest priority
- * RT task waiting, and that CPU could send off further IPIs to the CPU with
- * the next highest waiting task. Since the overloaded case is much less likely
- * to happen, the complexity of this implementation may not be worth it.
- * Instead, just send an IPI around to all overloaded CPUs.
- *
- * The rq->rt.push_flags holds the status of the IPI that is going around.
- * A run queue can only send out a single IPI at a time. The possible flags
- * for rq->rt.push_flags are:
- *
- * (None or zero): No IPI is going around for the current rq
- * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around
- * RT_PUSH_IPI_RESTART: The priority of the running task for the rq
- * has changed, and the IPI should restart
- * circulating the overloaded CPUs again.
- *
- * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated
- * before sending to the next CPU.
- *
- * Instead of having all CPUs that schedule a lower priority task send
- * an IPI to the same "first" CPU in the RT overload mask, they send it
- * to the next overloaded CPU after their own CPU. This helps distribute
- * the work when there's more than one overloaded CPU and multiple CPUs
- * scheduling in lower priority tasks.
- *
- * When a rq schedules a lower priority task than what was currently
- * running, the next CPU with overloaded RT tasks is examined first.
- * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower
- * priority task, it will send an IPI first to CPU 5, then CPU 5 will
- * send to CPU 1 if it is still overloaded. CPU 1 will clear the
- * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set.
- *
- * The first CPU to notice IPI_RESTART is set, will clear that flag and then
- * send an IPI to the next overloaded CPU after the rq->cpu and not the next
- * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3
- * schedules a lower priority task, and the IPI_RESTART gets set while the
- * handling is being done on CPU 5, it will clear the flag and send it back to
- * CPU 4 instead of CPU 1.
- *
- * Note, the above logic can be disabled by turning off the sched_feature
- * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be
- * taken by the CPU requesting a pull and the waiting RT task will be pulled
- * by that CPU. This may be fine for machines with few CPUs.
- */
-static void tell_cpu_to_push(struct rq *rq)
+static inline void rto_start_unlock(atomic_t *v)
{
- int cpu;
+ atomic_set_release(v, 0);
+}
- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
- raw_spin_lock(&rq->rt.push_lock);
- /* Make sure it's still executing */
- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
- /*
- * Tell the IPI to restart the loop as things have
- * changed since it started.
- */
- rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
- raw_spin_unlock(&rq->rt.push_lock);
- return;
- }
- raw_spin_unlock(&rq->rt.push_lock);
- }
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu = -1;
- /* When here, there's no IPI going around */
+ /* Keep the loop going if the IPI is currently active */
+ atomic_inc(&rq->rd->rto_loop_next);
- rq->rt.push_cpu = rq->cpu;
- cpu = find_next_push_cpu(rq);
- if (cpu >= nr_cpu_ids)
+ /* Only one CPU can initiate a loop at a time */
+ if (!rto_start_trylock(&rq->rd->rto_loop_start))
return;
- rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+ raw_spin_lock(&rq->rd->rto_lock);
- irq_work_queue_on(&rq->rt.push_work, cpu);
+ /*
+ * The rto_cpu is updated under the lock, if it has a valid cpu
+ * then the IPI is still running and will continue due to the
+ * update to loop_next, and nothing needs to be done here.
+ * Otherwise it is finishing up and an ipi needs to be sent.
+ */
+ if (rq->rd->rto_cpu < 0)
+ cpu = rto_next_cpu(rq);
+
+ raw_spin_unlock(&rq->rd->rto_lock);
+
+ rto_start_unlock(&rq->rd->rto_loop_start);
+
+ if (cpu >= 0)
+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}
/* Called from hardirq context */
-static void try_to_push_tasks(void *arg)
+void rto_push_irq_work_func(struct irq_work *work)
{
- struct rt_rq *rt_rq = arg;
- struct rq *rq, *src_rq;
- int this_cpu;
+ struct rq *rq;
int cpu;
- this_cpu = rt_rq->push_cpu;
+ rq = this_rq();
- /* Paranoid check */
- BUG_ON(this_cpu != smp_processor_id());
-
- rq = cpu_rq(this_cpu);
- src_rq = rq_of_rt_rq(rt_rq);
-
-again:
+ /*
+ * We do not need to grab the lock to check for has_pushable_tasks.
+ * When it gets updated, a check is made if a push is possible.
+ */
if (has_pushable_tasks(rq)) {
raw_spin_lock(&rq->lock);
- push_rt_task(rq);
+ push_rt_tasks(rq);
raw_spin_unlock(&rq->lock);
}
- /* Pass the IPI to the next rt overloaded queue */
- raw_spin_lock(&rt_rq->push_lock);
- /*
- * If the source queue changed since the IPI went out,
- * we need to restart the search from that CPU again.
- */
- if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
- rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
- rt_rq->push_cpu = src_rq->cpu;
- }
+ raw_spin_lock(&rq->rd->rto_lock);
- cpu = find_next_push_cpu(src_rq);
+ /* Pass the IPI to the next rt overloaded queue */
+ cpu = rto_next_cpu(rq);
- if (cpu >= nr_cpu_ids)
- rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
- raw_spin_unlock(&rt_rq->push_lock);
+ raw_spin_unlock(&rq->rd->rto_lock);
- if (cpu >= nr_cpu_ids)
+ if (cpu < 0)
return;
- /*
- * It is possible that a restart caused this CPU to be
- * chosen again. Don't bother with an IPI, just see if we
- * have more to push.
- */
- if (unlikely(cpu == rq->cpu))
- goto again;
-
/* Try the next RT overloaded CPU */
- irq_work_queue_on(&rt_rq->push_work, cpu);
-}
-
-static void push_irq_work_func(struct irq_work *work)
-{
- struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
-
- try_to_push_tasks(rt_rq);
+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}
#endif /* HAVE_RT_PUSH_IPI */
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -502,7 +502,7 @@ static inline int rt_bandwidth_enabled(v
}
/* RT IPI pull logic requires IRQ_WORK */
-#ifdef CONFIG_IRQ_WORK
+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP)
# define HAVE_RT_PUSH_IPI
#endif
@@ -524,12 +524,6 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
-#ifdef HAVE_RT_PUSH_IPI
- int push_flags;
- int push_cpu;
- struct irq_work push_work;
- raw_spinlock_t push_lock;
-#endif
#endif /* CONFIG_SMP */
int rt_queued;
@@ -638,6 +632,19 @@ struct root_domain {
struct dl_bw dl_bw;
struct cpudl cpudl;
+#ifdef HAVE_RT_PUSH_IPI
+ /*
+ * For IPI pull requests, loop across the rto_mask.
+ */
+ struct irq_work rto_push_work;
+ raw_spinlock_t rto_lock;
+ /* These are only updated and read within rto_lock */
+ int rto_loop;
+ int rto_cpu;
+ /* These atomics are updated outside of a lock */
+ atomic_t rto_loop_next;
+ atomic_t rto_loop_start;
+#endif
/*
* The "RT overload" flag: it gets set if a CPU has more than
* one runnable RT task.
@@ -655,6 +662,9 @@ extern void init_defrootdomain(void);
extern int sched_init_domains(const struct cpumask *cpu_map);
extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
+#ifdef HAVE_RT_PUSH_IPI
+extern void rto_push_irq_work_func(struct irq_work *work);
+#endif
#endif /* CONFIG_SMP */
/*
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -269,6 +269,12 @@ static int init_rootdomain(struct root_d
if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
goto free_dlo_mask;
+#ifdef HAVE_RT_PUSH_IPI
+ rd->rto_cpu = -1;
+ raw_spin_lock_init(&rd->rto_lock);
+ init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);
+#endif
+
init_dl_bw(&rd->dl_bw);
if (cpudl_init(&rd->cpudl) != 0)
goto free_rto_mask;
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.14/sched-make-resched_cpu-unconditional.patch
queue-4.14/sched-rt-simplify-the-ipi-based-rt-balancing-logic.patch
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4bdced5c9a2922521e325896a7bbbf0132c94e56 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt(a)goodmis.org>
Date: Fri, 6 Oct 2017 14:05:04 -0400
Subject: [PATCH] sched/rt: Simplify the IPI based RT balancing logic
When a CPU lowers its priority (schedules out a high priority task for a
lower priority one), a check is made to see if any other CPU has overloaded
RT tasks (more than one). It checks the rto_mask to determine this and if so
it will request to pull one of those tasks to itself if the non running RT
task is of higher priority than the new priority of the next task to run on
the current CPU.
When we deal with large number of CPUs, the original pull logic suffered
from large lock contention on a single CPU run queue, which caused a huge
latency across all CPUs. This was caused by only having one CPU having
overloaded RT tasks and a bunch of other CPUs lowering their priority. To
solve this issue, commit:
b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
changed the way to request a pull. Instead of grabbing the lock of the
overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.
Although the IPI logic worked very well in removing the large latency build
up, it still could suffer from a large number of IPIs being sent to a single
CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
when I tested this on a 120 CPU box, with a stress test that had lots of
RT tasks scheduling on all CPUs, it actually triggered the hard lockup
detector! One CPU had so many IPIs sent to it, and due to the restart
mechanism that is triggered when the source run queue has a priority status
change, the CPU spent minutes! processing the IPIs.
Thinking about this further, I realized there's no reason for each run queue
to send its own IPI. As all CPUs with overloaded tasks must be scanned
regardless if there's one or many CPUs lowering their priority, because
there's no current way to find the CPU with the highest priority task that
can schedule to one of these CPUs, there really only needs to be one IPI
being sent around at a time.
This greatly simplifies the code!
The new approach is to have each root domain have its own irq work, as the
rto_mask is per root domain. The root domain has the following fields
attached to it:
rto_push_work - the irq work to process each CPU set in rto_mask
rto_lock - the lock to protect some of the other rto fields
rto_loop_start - an atomic that keeps contention down on rto_lock
the first CPU scheduling in a lower priority task
is the one to kick off the process.
rto_loop_next - an atomic that gets incremented for each CPU that
schedules in a lower priority task.
rto_loop - a variable protected by rto_lock that is used to
compare against rto_loop_next
rto_cpu - The cpu to send the next IPI to, also protected by
the rto_lock.
When a CPU schedules in a lower priority task and wants to make sure
overloaded CPUs know about it. It increments the rto_loop_next. Then it
atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
then it is done, as another CPU is kicking off the IPI loop. If the old
value is "0", then it will take the rto_lock to synchronize with a possible
IPI being sent around to the overloaded CPUs.
If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
IPI being sent around, or one is about to finish. Then rto_cpu is set to the
first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
in rto_mask, then there's nothing to be done.
When the CPU receives the IPI, it will first try to push any RT tasks that is
queued on the CPU but can't run because a higher priority RT task is
currently running on that CPU.
Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
finds one, it simply sends an IPI to that CPU and the process continues.
If there's no more CPUs in the rto_mask, then rto_loop is compared with
rto_loop_next. If they match, everything is done and the process is over. If
they do not match, then a CPU scheduled in a lower priority task as the IPI
was being passed around, and the process needs to start again. The first CPU
in rto_mask is sent the IPI.
This change removes this duplication of work in the IPI logic, and greatly
lowers the latency caused by the IPIs. This removed the lockup happening on
the 120 CPU machine. It also simplifies the code tremendously. What else
could anyone ask for?
Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
supplying me with the rto_start_trylock() and rto_start_unlock() helper
functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mike Galbraith <efault(a)gmx.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Scott Wood <swood(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0af5ca9e3e3f..fda27991699a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -73,10 +73,6 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
-#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI)
-static void push_irq_work_func(struct irq_work *work);
-#endif
-
void init_rt_rq(struct rt_rq *rt_rq)
{
struct rt_prio_array *array;
@@ -96,13 +92,6 @@ void init_rt_rq(struct rt_rq *rt_rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
-
-#ifdef HAVE_RT_PUSH_IPI
- rt_rq->push_flags = 0;
- rt_rq->push_cpu = nr_cpu_ids;
- raw_spin_lock_init(&rt_rq->push_lock);
- init_irq_work(&rt_rq->push_work, push_irq_work_func);
-#endif
#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1875,241 +1864,166 @@ static void push_rt_tasks(struct rq *rq)
}
#ifdef HAVE_RT_PUSH_IPI
+
/*
- * The search for the next cpu always starts at rq->cpu and ends
- * when we reach rq->cpu again. It will never return rq->cpu.
- * This returns the next cpu to check, or nr_cpu_ids if the loop
- * is complete.
+ * When a high priority task schedules out from a CPU and a lower priority
+ * task is scheduled in, a check is made to see if there's any RT tasks
+ * on other CPUs that are waiting to run because a higher priority RT task
+ * is currently running on its CPU. In this case, the CPU with multiple RT
+ * tasks queued on it (overloaded) needs to be notified that a CPU has opened
+ * up that may be able to run one of its non-running queued RT tasks.
+ *
+ * All CPUs with overloaded RT tasks need to be notified as there is currently
+ * no way to know which of these CPUs have the highest priority task waiting
+ * to run. Instead of trying to take a spinlock on each of these CPUs,
+ * which has shown to cause large latency when done on machines with many
+ * CPUs, sending an IPI to the CPUs to have them push off the overloaded
+ * RT tasks waiting to run.
+ *
+ * Just sending an IPI to each of the CPUs is also an issue, as on large
+ * count CPU machines, this can cause an IPI storm on a CPU, especially
+ * if its the only CPU with multiple RT tasks queued, and a large number
+ * of CPUs scheduling a lower priority task at the same time.
+ *
+ * Each root domain has its own irq work function that can iterate over
+ * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT
+ * tassk must be checked if there's one or many CPUs that are lowering
+ * their priority, there's a single irq work iterator that will try to
+ * push off RT tasks that are waiting to run.
+ *
+ * When a CPU schedules a lower priority task, it will kick off the
+ * irq work iterator that will jump to each CPU with overloaded RT tasks.
+ * As it only takes the first CPU that schedules a lower priority task
+ * to start the process, the rto_start variable is incremented and if
+ * the atomic result is one, then that CPU will try to take the rto_lock.
+ * This prevents high contention on the lock as the process handles all
+ * CPUs scheduling lower priority tasks.
+ *
+ * All CPUs that are scheduling a lower priority task will increment the
+ * rt_loop_next variable. This will make sure that the irq work iterator
+ * checks all RT overloaded CPUs whenever a CPU schedules a new lower
+ * priority task, even if the iterator is in the middle of a scan. Incrementing
+ * the rt_loop_next will cause the iterator to perform another scan.
*
- * rq->rt.push_cpu holds the last cpu returned by this function,
- * or if this is the first instance, it must hold rq->cpu.
*/
static int rto_next_cpu(struct rq *rq)
{
- int prev_cpu = rq->rt.push_cpu;
+ struct root_domain *rd = rq->rd;
+ int next;
int cpu;
- cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
-
/*
- * If the previous cpu is less than the rq's CPU, then it already
- * passed the end of the mask, and has started from the beginning.
- * We end if the next CPU is greater or equal to rq's CPU.
+ * When starting the IPI RT pushing, the rto_cpu is set to -1,
+ * rt_next_cpu() will simply return the first CPU found in
+ * the rto_mask.
+ *
+ * If rto_next_cpu() is called with rto_cpu is a valid cpu, it
+ * will return the next CPU found in the rto_mask.
+ *
+ * If there are no more CPUs left in the rto_mask, then a check is made
+ * against rto_loop and rto_loop_next. rto_loop is only updated with
+ * the rto_lock held, but any CPU may increment the rto_loop_next
+ * without any locking.
*/
- if (prev_cpu < rq->cpu) {
- if (cpu >= rq->cpu)
- return nr_cpu_ids;
+ for (;;) {
- } else if (cpu >= nr_cpu_ids) {
- /*
- * We passed the end of the mask, start at the beginning.
- * If the result is greater or equal to the rq's CPU, then
- * the loop is finished.
- */
- cpu = cpumask_first(rq->rd->rto_mask);
- if (cpu >= rq->cpu)
- return nr_cpu_ids;
- }
- rq->rt.push_cpu = cpu;
+ /* When rto_cpu is -1 this acts like cpumask_first() */
+ cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
- /* Return cpu to let the caller know if the loop is finished or not */
- return cpu;
-}
+ rd->rto_cpu = cpu;
-static int find_next_push_cpu(struct rq *rq)
-{
- struct rq *next_rq;
- int cpu;
+ if (cpu < nr_cpu_ids)
+ return cpu;
- while (1) {
- cpu = rto_next_cpu(rq);
- if (cpu >= nr_cpu_ids)
- break;
- next_rq = cpu_rq(cpu);
+ rd->rto_cpu = -1;
+
+ /*
+ * ACQUIRE ensures we see the @rto_mask changes
+ * made prior to the @next value observed.
+ *
+ * Matches WMB in rt_set_overload().
+ */
+ next = atomic_read_acquire(&rd->rto_loop_next);
- /* Make sure the next rq can push to this rq */
- if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ if (rd->rto_loop == next)
break;
+
+ rd->rto_loop = next;
}
- return cpu;
+ return -1;
}
-#define RT_PUSH_IPI_EXECUTING 1
-#define RT_PUSH_IPI_RESTART 2
+static inline bool rto_start_trylock(atomic_t *v)
+{
+ return !atomic_cmpxchg_acquire(v, 0, 1);
+}
-/*
- * When a high priority task schedules out from a CPU and a lower priority
- * task is scheduled in, a check is made to see if there's any RT tasks
- * on other CPUs that are waiting to run because a higher priority RT task
- * is currently running on its CPU. In this case, the CPU with multiple RT
- * tasks queued on it (overloaded) needs to be notified that a CPU has opened
- * up that may be able to run one of its non-running queued RT tasks.
- *
- * On large CPU boxes, there's the case that several CPUs could schedule
- * a lower priority task at the same time, in which case it will look for
- * any overloaded CPUs that it could pull a task from. To do this, the runqueue
- * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting
- * for a single overloaded CPU's runqueue lock can produce a large latency.
- * (This has actually been observed on large boxes running cyclictest).
- * Instead of taking the runqueue lock of the overloaded CPU, each of the
- * CPUs that scheduled a lower priority task simply sends an IPI to the
- * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with
- * lots of contention. The overloaded CPU will look to push its non-running
- * RT task off, and if it does, it can then ignore the other IPIs coming
- * in, and just pass those IPIs off to any other overloaded CPU.
- *
- * When a CPU schedules a lower priority task, it only sends an IPI to
- * the "next" CPU that has overloaded RT tasks. This prevents IPI storms,
- * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with
- * RT overloaded tasks, would cause 100 IPIs to go out at once.
- *
- * The overloaded RT CPU, when receiving an IPI, will try to push off its
- * overloaded RT tasks and then send an IPI to the next CPU that has
- * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks
- * have completed. Just because a CPU may have pushed off its own overloaded
- * RT task does not mean it should stop sending the IPI around to other
- * overloaded CPUs. There may be another RT task waiting to run on one of
- * those CPUs that are of higher priority than the one that was just
- * pushed.
- *
- * An optimization that could possibly be made is to make a CPU array similar
- * to the cpupri array mask of all running RT tasks, but for the overloaded
- * case, then the IPI could be sent to only the CPU with the highest priority
- * RT task waiting, and that CPU could send off further IPIs to the CPU with
- * the next highest waiting task. Since the overloaded case is much less likely
- * to happen, the complexity of this implementation may not be worth it.
- * Instead, just send an IPI around to all overloaded CPUs.
- *
- * The rq->rt.push_flags holds the status of the IPI that is going around.
- * A run queue can only send out a single IPI at a time. The possible flags
- * for rq->rt.push_flags are:
- *
- * (None or zero): No IPI is going around for the current rq
- * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around
- * RT_PUSH_IPI_RESTART: The priority of the running task for the rq
- * has changed, and the IPI should restart
- * circulating the overloaded CPUs again.
- *
- * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated
- * before sending to the next CPU.
- *
- * Instead of having all CPUs that schedule a lower priority task send
- * an IPI to the same "first" CPU in the RT overload mask, they send it
- * to the next overloaded CPU after their own CPU. This helps distribute
- * the work when there's more than one overloaded CPU and multiple CPUs
- * scheduling in lower priority tasks.
- *
- * When a rq schedules a lower priority task than what was currently
- * running, the next CPU with overloaded RT tasks is examined first.
- * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower
- * priority task, it will send an IPI first to CPU 5, then CPU 5 will
- * send to CPU 1 if it is still overloaded. CPU 1 will clear the
- * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set.
- *
- * The first CPU to notice IPI_RESTART is set, will clear that flag and then
- * send an IPI to the next overloaded CPU after the rq->cpu and not the next
- * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3
- * schedules a lower priority task, and the IPI_RESTART gets set while the
- * handling is being done on CPU 5, it will clear the flag and send it back to
- * CPU 4 instead of CPU 1.
- *
- * Note, the above logic can be disabled by turning off the sched_feature
- * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be
- * taken by the CPU requesting a pull and the waiting RT task will be pulled
- * by that CPU. This may be fine for machines with few CPUs.
- */
-static void tell_cpu_to_push(struct rq *rq)
+static inline void rto_start_unlock(atomic_t *v)
{
- int cpu;
+ atomic_set_release(v, 0);
+}
- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
- raw_spin_lock(&rq->rt.push_lock);
- /* Make sure it's still executing */
- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
- /*
- * Tell the IPI to restart the loop as things have
- * changed since it started.
- */
- rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
- raw_spin_unlock(&rq->rt.push_lock);
- return;
- }
- raw_spin_unlock(&rq->rt.push_lock);
- }
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu = -1;
- /* When here, there's no IPI going around */
+ /* Keep the loop going if the IPI is currently active */
+ atomic_inc(&rq->rd->rto_loop_next);
- rq->rt.push_cpu = rq->cpu;
- cpu = find_next_push_cpu(rq);
- if (cpu >= nr_cpu_ids)
+ /* Only one CPU can initiate a loop at a time */
+ if (!rto_start_trylock(&rq->rd->rto_loop_start))
return;
- rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+ raw_spin_lock(&rq->rd->rto_lock);
+
+ /*
+ * The rto_cpu is updated under the lock, if it has a valid cpu
+ * then the IPI is still running and will continue due to the
+ * update to loop_next, and nothing needs to be done here.
+ * Otherwise it is finishing up and an ipi needs to be sent.
+ */
+ if (rq->rd->rto_cpu < 0)
+ cpu = rto_next_cpu(rq);
- irq_work_queue_on(&rq->rt.push_work, cpu);
+ raw_spin_unlock(&rq->rd->rto_lock);
+
+ rto_start_unlock(&rq->rd->rto_loop_start);
+
+ if (cpu >= 0)
+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}
/* Called from hardirq context */
-static void try_to_push_tasks(void *arg)
+void rto_push_irq_work_func(struct irq_work *work)
{
- struct rt_rq *rt_rq = arg;
- struct rq *rq, *src_rq;
- int this_cpu;
+ struct rq *rq;
int cpu;
- this_cpu = rt_rq->push_cpu;
+ rq = this_rq();
- /* Paranoid check */
- BUG_ON(this_cpu != smp_processor_id());
-
- rq = cpu_rq(this_cpu);
- src_rq = rq_of_rt_rq(rt_rq);
-
-again:
+ /*
+ * We do not need to grab the lock to check for has_pushable_tasks.
+ * When it gets updated, a check is made if a push is possible.
+ */
if (has_pushable_tasks(rq)) {
raw_spin_lock(&rq->lock);
- push_rt_task(rq);
+ push_rt_tasks(rq);
raw_spin_unlock(&rq->lock);
}
- /* Pass the IPI to the next rt overloaded queue */
- raw_spin_lock(&rt_rq->push_lock);
- /*
- * If the source queue changed since the IPI went out,
- * we need to restart the search from that CPU again.
- */
- if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
- rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
- rt_rq->push_cpu = src_rq->cpu;
- }
+ raw_spin_lock(&rq->rd->rto_lock);
- cpu = find_next_push_cpu(src_rq);
+ /* Pass the IPI to the next rt overloaded queue */
+ cpu = rto_next_cpu(rq);
- if (cpu >= nr_cpu_ids)
- rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
- raw_spin_unlock(&rt_rq->push_lock);
+ raw_spin_unlock(&rq->rd->rto_lock);
- if (cpu >= nr_cpu_ids)
+ if (cpu < 0)
return;
- /*
- * It is possible that a restart caused this CPU to be
- * chosen again. Don't bother with an IPI, just see if we
- * have more to push.
- */
- if (unlikely(cpu == rq->cpu))
- goto again;
-
/* Try the next RT overloaded CPU */
- irq_work_queue_on(&rt_rq->push_work, cpu);
-}
-
-static void push_irq_work_func(struct irq_work *work)
-{
- struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
-
- try_to_push_tasks(rt_rq);
+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}
#endif /* HAVE_RT_PUSH_IPI */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a81c9782e98c..8aa24b41f652 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -505,7 +505,7 @@ static inline int rt_bandwidth_enabled(void)
}
/* RT IPI pull logic requires IRQ_WORK */
-#ifdef CONFIG_IRQ_WORK
+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP)
# define HAVE_RT_PUSH_IPI
#endif
@@ -527,12 +527,6 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
-#ifdef HAVE_RT_PUSH_IPI
- int push_flags;
- int push_cpu;
- struct irq_work push_work;
- raw_spinlock_t push_lock;
-#endif
#endif /* CONFIG_SMP */
int rt_queued;
@@ -641,6 +635,19 @@ struct root_domain {
struct dl_bw dl_bw;
struct cpudl cpudl;
+#ifdef HAVE_RT_PUSH_IPI
+ /*
+ * For IPI pull requests, loop across the rto_mask.
+ */
+ struct irq_work rto_push_work;
+ raw_spinlock_t rto_lock;
+ /* These are only updated and read within rto_lock */
+ int rto_loop;
+ int rto_cpu;
+ /* These atomics are updated outside of a lock */
+ atomic_t rto_loop_next;
+ atomic_t rto_loop_start;
+#endif
/*
* The "RT overload" flag: it gets set if a CPU has more than
* one runnable RT task.
@@ -658,6 +665,9 @@ extern void init_defrootdomain(void);
extern int sched_init_domains(const struct cpumask *cpu_map);
extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
+#ifdef HAVE_RT_PUSH_IPI
+extern void rto_push_irq_work_func(struct irq_work *work);
+#endif
#endif /* CONFIG_SMP */
/*
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f51d123f9fe1..e50450c2fed8 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -268,6 +268,12 @@ static int init_rootdomain(struct root_domain *rd)
if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
goto free_dlo_mask;
+#ifdef HAVE_RT_PUSH_IPI
+ rd->rto_cpu = -1;
+ raw_spin_lock_init(&rd->rto_lock);
+ init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);
+#endif
+
init_dl_bw(&rd->dl_bw);
if (cpudl_init(&rd->cpudl) != 0)
goto free_rto_mask;
The patch below was submitted to be applied to the 4.14-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8a7a8e1eab929eb3a5b735a788a23b9731139046 Mon Sep 17 00:00:00 2001
From: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
Date: Mon, 13 Nov 2017 13:49:04 +0800
Subject: [PATCH] timekeeping: Eliminate the stale declaration of
ktime_get_raw_and_real_ts64()
Commit ba26621e63ce got rid of ktime_get_raw_and_real_ts64(), but left its
declaration behind.
Remove it.
Fixes: ba26621e63ce ("time: Remove duplicated code in ktime_get_raw_and_real()")
Signed-off-by: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Christopher S. Hall <christopher.s.hall(a)intel.com>
Cc: joelaf(a)google.com
Cc: arnd(a)arndb.de
Cc: gregkh(a)linuxfoundation.org
Cc: john.stultz(a)linaro.org
Cc: deepa.kernel(a)gmail.com
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/1510552144-20831-1-git-send-email-douly.fnst@cn.f…
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 0021575fe871..51293e1aa4da 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -272,12 +272,6 @@ extern bool timekeeping_rtc_skipresume(void);
extern void timekeeping_inject_sleeptime64(struct timespec64 *delta);
-/*
- * PPS accessor
- */
-extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw,
- struct timespec64 *ts_real);
-
/*
* struct system_time_snapshot - simultaneous raw/real time capture with
* counter value
This is a note to let you know that I've just added the patch titled
dm bufio: fix integer overflow when limiting maximum cache size
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 15 Nov 2017 16:38:09 -0800
Subject: dm bufio: fix integer overflow when limiting maximum cache size
From: Eric Biggers <ebiggers(a)google.com>
commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream.
The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression
(VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
overflows, causing the wrong result to be computed. For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem). This causes
severe performance problems for dm-verity users on affected systems.
Fix this by using mult_frac() to correctly multiply by a percentage. Do
this for all places in dm-bufio that multiply by a percentage. Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.
Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset")
Fixes: 95d402f057f2 ("dm: add bufio")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-bufio.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -937,7 +937,8 @@ static void __get_memory_limit(struct dm
buffers = c->minimum_buffers;
*limit_buffers = buffers;
- *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100;
+ *threshold_buffers = mult_frac(buffers,
+ DM_BUFIO_WRITEBACK_PERCENT, 100);
}
/*
@@ -1856,19 +1857,15 @@ static int __init dm_bufio_init(void)
memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches);
memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names);
- mem = (__u64)((totalram_pages - totalhigh_pages) *
- DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT;
+ mem = (__u64)mult_frac(totalram_pages - totalhigh_pages,
+ DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT;
if (mem > ULONG_MAX)
mem = ULONG_MAX;
#ifdef CONFIG_MMU
- /*
- * Get the size of vmalloc space the same way as VMALLOC_TOTAL
- * in fs/proc/internal.h
- */
- if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100)
- mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;
+ if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100))
+ mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);
#endif
dm_bufio_default_cache_size = mem;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.9/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
queue-4.9/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
This is a note to let you know that I've just added the patch titled
dm: allocate struct mapped_device with kvzalloc
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-allocate-struct-mapped_device-with-kvzalloc.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 856eb0916d181da6d043cc33e03f54d5c5bbe54a Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 31 Oct 2017 19:33:02 -0400
Subject: dm: allocate struct mapped_device with kvzalloc
From: Mikulas Patocka <mpatocka(a)redhat.com>
commit 856eb0916d181da6d043cc33e03f54d5c5bbe54a upstream.
The structure srcu_struct can be very big, its size is proportional to the
value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
io_barrier in the struct mapped_device has 84kB in the debugging kernel
and 50kB in the non-debugging kernel. The large size may result in failure
of the function kzalloc_node.
In order to avoid the allocation failure, we use the function
kvzalloc_node, this function falls back to vmalloc if a large contiguous
chunk of memory is not available. This patch also moves the field
io_barrier to the last position of struct mapped_device - the reason is
that on many processor architectures, short memory offsets result in
smaller code than long memory offsets - on x86-64 it reduces code size by
320 bytes.
Note to stable kernel maintainers - the kernels 4.11 and older don't have
the function kvzalloc_node, you can use the function vzalloc_node instead.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-core.h | 3 ++-
drivers/md/dm.c | 7 ++++---
2 files changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -29,7 +29,6 @@ struct dm_kobject_holder {
* DM targets must _not_ deference a mapped_device to directly access its members!
*/
struct mapped_device {
- struct srcu_struct io_barrier;
struct mutex suspend_lock;
/*
@@ -127,6 +126,8 @@ struct mapped_device {
struct blk_mq_tag_set *tag_set;
bool use_blk_mq:1;
bool init_tio_pdu:1;
+
+ struct srcu_struct io_barrier;
};
void dm_init_md_queue(struct mapped_device *md);
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -21,6 +21,7 @@
#include <linux/delay.h>
#include <linux/wait.h>
#include <linux/pr.h>
+#include <linux/vmalloc.h>
#define DM_MSG_PREFIX "core"
@@ -1511,7 +1512,7 @@ static struct mapped_device *alloc_dev(i
struct mapped_device *md;
void *old_md;
- md = kzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id);
+ md = vzalloc_node(sizeof(*md), numa_node_id);
if (!md) {
DMWARN("unable to allocate device, out of memory.");
return NULL;
@@ -1605,7 +1606,7 @@ bad_io_barrier:
bad_minor:
module_put(THIS_MODULE);
bad_module_get:
- kfree(md);
+ kvfree(md);
return NULL;
}
@@ -1624,7 +1625,7 @@ static void free_dev(struct mapped_devic
free_minor(minor);
module_put(THIS_MODULE);
- kfree(md);
+ kvfree(md);
}
static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
Patches currently in stable-queue which might be from mpatocka(a)redhat.com are
queue-4.9/dm-allocate-struct-mapped_device-with-kvzalloc.patch
This is a note to let you know that I've just added the patch titled
dm bufio: fix integer overflow when limiting maximum cache size
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 15 Nov 2017 16:38:09 -0800
Subject: dm bufio: fix integer overflow when limiting maximum cache size
From: Eric Biggers <ebiggers(a)google.com>
commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream.
The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression
(VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
overflows, causing the wrong result to be computed. For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem). This causes
severe performance problems for dm-verity users on affected systems.
Fix this by using mult_frac() to correctly multiply by a percentage. Do
this for all places in dm-bufio that multiply by a percentage. Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.
Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset")
Fixes: 95d402f057f2 ("dm: add bufio")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-bufio.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -928,7 +928,8 @@ static void __get_memory_limit(struct dm
buffers = c->minimum_buffers;
*limit_buffers = buffers;
- *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100;
+ *threshold_buffers = mult_frac(buffers,
+ DM_BUFIO_WRITEBACK_PERCENT, 100);
}
/*
@@ -1829,19 +1830,15 @@ static int __init dm_bufio_init(void)
memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches);
memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names);
- mem = (__u64)((totalram_pages - totalhigh_pages) *
- DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT;
+ mem = (__u64)mult_frac(totalram_pages - totalhigh_pages,
+ DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT;
if (mem > ULONG_MAX)
mem = ULONG_MAX;
#ifdef CONFIG_MMU
- /*
- * Get the size of vmalloc space the same way as VMALLOC_TOTAL
- * in fs/proc/internal.h
- */
- if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100)
- mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;
+ if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100))
+ mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);
#endif
dm_bufio_default_cache_size = mem;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.4/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
queue-4.4/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
This is a note to let you know that I've just added the patch titled
ovl: Put upperdentry if ovl_check_origin() fails
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ovl-put-upperdentry-if-ovl_check_origin-fails.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5455f92b54e516995a9ca45bbf790d3629c27a93 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal(a)redhat.com>
Date: Wed, 1 Nov 2017 15:37:22 -0400
Subject: ovl: Put upperdentry if ovl_check_origin() fails
From: Vivek Goyal <vgoyal(a)redhat.com>
commit 5455f92b54e516995a9ca45bbf790d3629c27a93 upstream.
If ovl_check_origin() fails, we should put upperdentry. We have a reference
on it by now. So goto out_put_upper instead of out.
Fixes: a9d019573e88 ("ovl: lookup non-dir copy-up-origin by file handle")
Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/overlayfs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -630,7 +630,7 @@ struct dentry *ovl_lookup(struct inode *
err = ovl_check_origin(upperdentry, roe->lowerstack,
roe->numlower, &stack, &ctr);
if (err)
- goto out;
+ goto out_put_upper;
}
if (d.redirect) {
Patches currently in stable-queue which might be from vgoyal(a)redhat.com are
queue-4.14/ovl-put-upperdentry-if-ovl_check_origin-fails.patch
This is a note to let you know that I've just added the patch titled
dm zoned: ignore last smaller runt zone
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-zoned-ignore-last-smaller-runt-zone.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 114e025968b5990ad0b57bf60697ea64ee206aac Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Sat, 28 Oct 2017 16:39:34 +0900
Subject: dm zoned: ignore last smaller runt zone
From: Damien Le Moal <damien.lemoal(a)wdc.com>
commit 114e025968b5990ad0b57bf60697ea64ee206aac upstream.
The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.
Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.
While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().
Reported-by: Peter Desnoyers <pjd(a)ccs.neu.edu>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-zoned-target.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -660,6 +660,7 @@ static int dmz_get_zoned_device(struct d
struct dmz_target *dmz = ti->private;
struct request_queue *q;
struct dmz_dev *dev;
+ sector_t aligned_capacity;
int ret;
/* Get the target device */
@@ -685,15 +686,17 @@ static int dmz_get_zoned_device(struct d
goto err;
}
+ q = bdev_get_queue(dev->bdev);
dev->capacity = i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT;
- if (ti->begin || (ti->len != dev->capacity)) {
+ aligned_capacity = dev->capacity & ~(blk_queue_zone_sectors(q) - 1);
+ if (ti->begin ||
+ ((ti->len != dev->capacity) && (ti->len != aligned_capacity))) {
ti->error = "Partial mapping not supported";
ret = -EINVAL;
goto err;
}
- q = bdev_get_queue(dev->bdev);
- dev->zone_nr_sectors = q->limits.chunk_sectors;
+ dev->zone_nr_sectors = blk_queue_zone_sectors(q);
dev->zone_nr_sectors_shift = ilog2(dev->zone_nr_sectors);
dev->zone_nr_blocks = dmz_sect2blk(dev->zone_nr_sectors);
@@ -929,8 +932,10 @@ static int dmz_iterate_devices(struct dm
iterate_devices_callout_fn fn, void *data)
{
struct dmz_target *dmz = ti->private;
+ struct dmz_dev *dev = dmz->dev;
+ sector_t capacity = dev->capacity & ~(dev->zone_nr_sectors - 1);
- return fn(ti, dmz->ddev, 0, dmz->dev->capacity, data);
+ return fn(ti, dmz->ddev, 0, capacity, data);
}
static struct target_type dmz_type = {
Patches currently in stable-queue which might be from damien.lemoal(a)wdc.com are
queue-4.14/dm-zoned-ignore-last-smaller-runt-zone.patch
This is a note to let you know that I've just added the patch titled
dm mpath: remove annoying message of 'blk_get_request() returned -11'
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-mpath-remove-annoying-message-of-blk_get_request-returned-11.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9dc112e2daf87b40607fd8d357e2d7de32290d45 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei(a)redhat.com>
Date: Sat, 30 Sep 2017 19:46:48 +0800
Subject: dm mpath: remove annoying message of 'blk_get_request() returned -11'
From: Ming Lei <ming.lei(a)redhat.com>
commit 9dc112e2daf87b40607fd8d357e2d7de32290d45 upstream.
It is very normal to see allocation failure, especially with blk-mq
request_queues, so it's unnecessary to report this error and annoy
people.
In practice this 'blk_get_request() returned -11' error gets logged
quite frequently when a blk-mq DM multipath device sees heavy IO.
This change is marked for stable@ because the annoying message in
question was included in stable@ commit 7083abbbf.
Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-mpath.c | 2 --
1 file changed, 2 deletions(-)
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -499,8 +499,6 @@ static int multipath_clone_and_map(struc
if (IS_ERR(clone)) {
/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
bool queue_dying = blk_queue_dying(q);
- DMERR_LIMIT("blk_get_request() returned %ld%s - requeuing",
- PTR_ERR(clone), queue_dying ? " (path offline)" : "");
if (queue_dying) {
atomic_inc(&m->pg_init_in_progress);
activate_or_offline_path(pgpath);
Patches currently in stable-queue which might be from ming.lei(a)redhat.com are
queue-4.14/dm-mpath-remove-annoying-message-of-blk_get_request-returned-11.patch
This is a note to let you know that I've just added the patch titled
dm integrity: allow unaligned bv_offset
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-integrity-allow-unaligned-bv_offset.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 95b1369a9638cfa322ad1c0cde8efbe524059884 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 7 Nov 2017 10:40:40 -0500
Subject: dm integrity: allow unaligned bv_offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Mikulas Patocka <mpatocka(a)redhat.com>
commit 95b1369a9638cfa322ad1c0cde8efbe524059884 upstream.
When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).
dm-integrity checks if bv_offset is aligned on page size and this check
fail with slub_debug and XFS.
Fix this bug by removing the bv_offset check, leaving only the check for
bv_len.
Fixes: 7eada909bfd7 ("dm: add integrity target")
Reported-by: Bruno Prémont <bonbons(a)sysophe.eu>
Reviewed-by: Milan Broz <gmazyland(a)gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-integrity.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -1376,7 +1376,7 @@ static int dm_integrity_map(struct dm_ta
struct bvec_iter iter;
struct bio_vec bv;
bio_for_each_segment(bv, bio, iter) {
- if (unlikely((bv.bv_offset | bv.bv_len) & ((ic->sectors_per_block << SECTOR_SHIFT) - 1))) {
+ if (unlikely(bv.bv_len & ((ic->sectors_per_block << SECTOR_SHIFT) - 1))) {
DMERR("Bio vector (%u,%u) is not aligned on %u-sector boundary",
bv.bv_offset, bv.bv_len, ic->sectors_per_block);
return DM_MAPIO_KILL;
Patches currently in stable-queue which might be from mpatocka(a)redhat.com are
queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch
queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch
queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch
This is a note to let you know that I've just added the patch titled
dm crypt: allow unaligned bv_offset
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-crypt-allow-unaligned-bv_offset.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0440d5c0ca9744b92a07aeb6df0a9a75db6f4280 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 7 Nov 2017 10:35:57 -0500
Subject: dm crypt: allow unaligned bv_offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Mikulas Patocka <mpatocka(a)redhat.com>
commit 0440d5c0ca9744b92a07aeb6df0a9a75db6f4280 upstream.
When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).
dm-crypt checks if bv_offset is aligned on page size and these checks
fail with slub_debug and XFS.
Fix this bug by removing the bv_offset checks. Switch to checking if
bv_len is aligned instead of bv_offset (this check should be sufficient
to prevent overruns if a bio with too small bv_len is received).
Fixes: 8f0009a22517 ("dm crypt: optionally support larger encryption sector size")
Reported-by: Bruno Prémont <bonbons(a)sysophe.eu>
Tested-by: Bruno Prémont <bonbons(a)sysophe.eu>
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Reviewed-by: Milan Broz <gmazyland(a)gmail.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-crypt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1075,7 +1075,7 @@ static int crypt_convert_block_aead(stru
BUG_ON(cc->integrity_iv_size && cc->integrity_iv_size != cc->iv_size);
/* Reject unexpected unaligned bio. */
- if (unlikely(bv_in.bv_offset & (cc->sector_size - 1)))
+ if (unlikely(bv_in.bv_len & (cc->sector_size - 1)))
return -EIO;
dmreq = dmreq_of_req(cc, req);
@@ -1168,7 +1168,7 @@ static int crypt_convert_block_skcipher(
int r = 0;
/* Reject unexpected unaligned bio. */
- if (unlikely(bv_in.bv_offset & (cc->sector_size - 1)))
+ if (unlikely(bv_in.bv_len & (cc->sector_size - 1)))
return -EIO;
dmreq = dmreq_of_req(cc, req);
Patches currently in stable-queue which might be from mpatocka(a)redhat.com are
queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch
queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch
queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch
This is a note to let you know that I've just added the patch titled
dm cache: fix race condition in the writeback mode overwrite_bio optimisation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-cache-fix-race-condition-in-the-writeback-mode-overwrite_bio-optimisation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d1260e2a3f85f4c1010510a15f89597001318b1b Mon Sep 17 00:00:00 2001
From: Joe Thornber <ejt(a)redhat.com>
Date: Fri, 10 Nov 2017 07:53:31 -0500
Subject: dm cache: fix race condition in the writeback mode overwrite_bio optimisation
From: Joe Thornber <ejt(a)redhat.com>
commit d1260e2a3f85f4c1010510a15f89597001318b1b upstream.
When a DM cache in writeback mode moves data between the slow and fast
device it can often avoid a copy if the triggering bio either:
i) covers the whole block (no point copying if we're about to overwrite it)
ii) the migration is a promotion and the origin block is currently discarded
Prior to this fix there was a race with case (ii). The discard status
was checked with a shared lock held (rather than exclusive). This meant
another bio could run in parallel and write data to the origin, removing
the discard state. After the promotion the parallel write would have
been lost.
With this fix the discard status is re-checked once the exclusive lock
has been aquired. If the block is no longer discarded it falls back to
the slower full copy path.
Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2")
Signed-off-by: Joe Thornber <ejt(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-cache-target.c | 86 ++++++++++++++++++++++++++-----------------
1 file changed, 53 insertions(+), 33 deletions(-)
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1201,6 +1201,18 @@ static void background_work_end(struct c
/*----------------------------------------------------------------*/
+static bool bio_writes_complete_block(struct cache *cache, struct bio *bio)
+{
+ return (bio_data_dir(bio) == WRITE) &&
+ (bio->bi_iter.bi_size == (cache->sectors_per_block << SECTOR_SHIFT));
+}
+
+static bool optimisable_bio(struct cache *cache, struct bio *bio, dm_oblock_t block)
+{
+ return writeback_mode(&cache->features) &&
+ (is_discarded_oblock(cache, block) || bio_writes_complete_block(cache, bio));
+}
+
static void quiesce(struct dm_cache_migration *mg,
void (*continuation)(struct work_struct *))
{
@@ -1474,13 +1486,51 @@ static void mg_upgrade_lock(struct work_
}
}
+static void mg_full_copy(struct work_struct *ws)
+{
+ struct dm_cache_migration *mg = ws_to_mg(ws);
+ struct cache *cache = mg->cache;
+ struct policy_work *op = mg->op;
+ bool is_policy_promote = (op->op == POLICY_PROMOTE);
+
+ if ((!is_policy_promote && !is_dirty(cache, op->cblock)) ||
+ is_discarded_oblock(cache, op->oblock)) {
+ mg_upgrade_lock(ws);
+ return;
+ }
+
+ init_continuation(&mg->k, mg_upgrade_lock);
+
+ if (copy(mg, is_policy_promote)) {
+ DMERR_LIMIT("%s: migration copy failed", cache_device_name(cache));
+ mg->k.input = BLK_STS_IOERR;
+ mg_complete(mg, false);
+ }
+}
+
static void mg_copy(struct work_struct *ws)
{
- int r;
struct dm_cache_migration *mg = ws_to_mg(ws);
if (mg->overwrite_bio) {
/*
+ * No exclusive lock was held when we last checked if the bio
+ * was optimisable. So we have to check again in case things
+ * have changed (eg, the block may no longer be discarded).
+ */
+ if (!optimisable_bio(mg->cache, mg->overwrite_bio, mg->op->oblock)) {
+ /*
+ * Fallback to a real full copy after doing some tidying up.
+ */
+ bool rb = bio_detain_shared(mg->cache, mg->op->oblock, mg->overwrite_bio);
+ BUG_ON(rb); /* An exclussive lock must _not_ be held for this block */
+ mg->overwrite_bio = NULL;
+ inc_io_migrations(mg->cache);
+ mg_full_copy(ws);
+ return;
+ }
+
+ /*
* It's safe to do this here, even though it's new data
* because all IO has been locked out of the block.
*
@@ -1489,26 +1539,8 @@ static void mg_copy(struct work_struct *
*/
overwrite(mg, mg_update_metadata_after_copy);
- } else {
- struct cache *cache = mg->cache;
- struct policy_work *op = mg->op;
- bool is_policy_promote = (op->op == POLICY_PROMOTE);
-
- if ((!is_policy_promote && !is_dirty(cache, op->cblock)) ||
- is_discarded_oblock(cache, op->oblock)) {
- mg_upgrade_lock(ws);
- return;
- }
-
- init_continuation(&mg->k, mg_upgrade_lock);
-
- r = copy(mg, is_policy_promote);
- if (r) {
- DMERR_LIMIT("%s: migration copy failed", cache_device_name(cache));
- mg->k.input = BLK_STS_IOERR;
- mg_complete(mg, false);
- }
- }
+ } else
+ mg_full_copy(ws);
}
static int mg_lock_writes(struct dm_cache_migration *mg)
@@ -1748,18 +1780,6 @@ static void inc_miss_counter(struct cach
/*----------------------------------------------------------------*/
-static bool bio_writes_complete_block(struct cache *cache, struct bio *bio)
-{
- return (bio_data_dir(bio) == WRITE) &&
- (bio->bi_iter.bi_size == (cache->sectors_per_block << SECTOR_SHIFT));
-}
-
-static bool optimisable_bio(struct cache *cache, struct bio *bio, dm_oblock_t block)
-{
- return writeback_mode(&cache->features) &&
- (is_discarded_oblock(cache, block) || bio_writes_complete_block(cache, bio));
-}
-
static int map_bio(struct cache *cache, struct bio *bio, dm_oblock_t block,
bool *commit_needed)
{
Patches currently in stable-queue which might be from ejt(a)redhat.com are
queue-4.14/dm-cache-fix-race-condition-in-the-writeback-mode-overwrite_bio-optimisation.patch
This is a note to let you know that I've just added the patch titled
dm bufio: fix integer overflow when limiting maximum cache size
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 15 Nov 2017 16:38:09 -0800
Subject: dm bufio: fix integer overflow when limiting maximum cache size
From: Eric Biggers <ebiggers(a)google.com>
commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream.
The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression
(VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
overflows, causing the wrong result to be computed. For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem). This causes
severe performance problems for dm-verity users on affected systems.
Fix this by using mult_frac() to correctly multiply by a percentage. Do
this for all places in dm-bufio that multiply by a percentage. Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.
Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset")
Fixes: 95d402f057f2 ("dm: add bufio")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-bufio.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -974,7 +974,8 @@ static void __get_memory_limit(struct dm
buffers = c->minimum_buffers;
*limit_buffers = buffers;
- *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100;
+ *threshold_buffers = mult_frac(buffers,
+ DM_BUFIO_WRITEBACK_PERCENT, 100);
}
/*
@@ -1910,19 +1911,15 @@ static int __init dm_bufio_init(void)
memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches);
memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names);
- mem = (__u64)((totalram_pages - totalhigh_pages) *
- DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT;
+ mem = (__u64)mult_frac(totalram_pages - totalhigh_pages,
+ DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT;
if (mem > ULONG_MAX)
mem = ULONG_MAX;
#ifdef CONFIG_MMU
- /*
- * Get the size of vmalloc space the same way as VMALLOC_TOTAL
- * in fs/proc/internal.h
- */
- if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100)
- mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;
+ if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100))
+ mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);
#endif
dm_bufio_default_cache_size = mem;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.14/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
queue-4.14/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
This is a note to let you know that I've just added the patch titled
dm: allocate struct mapped_device with kvzalloc
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-allocate-struct-mapped_device-with-kvzalloc.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 856eb0916d181da6d043cc33e03f54d5c5bbe54a Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 31 Oct 2017 19:33:02 -0400
Subject: dm: allocate struct mapped_device with kvzalloc
From: Mikulas Patocka <mpatocka(a)redhat.com>
commit 856eb0916d181da6d043cc33e03f54d5c5bbe54a upstream.
The structure srcu_struct can be very big, its size is proportional to the
value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
io_barrier in the struct mapped_device has 84kB in the debugging kernel
and 50kB in the non-debugging kernel. The large size may result in failure
of the function kzalloc_node.
In order to avoid the allocation failure, we use the function
kvzalloc_node, this function falls back to vmalloc if a large contiguous
chunk of memory is not available. This patch also moves the field
io_barrier to the last position of struct mapped_device - the reason is
that on many processor architectures, short memory offsets result in
smaller code than long memory offsets - on x86-64 it reduces code size by
320 bytes.
Note to stable kernel maintainers - the kernels 4.11 and older don't have
the function kvzalloc_node, you can use the function vzalloc_node instead.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-core.h | 3 ++-
drivers/md/dm.c | 6 +++---
2 files changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -29,7 +29,6 @@ struct dm_kobject_holder {
* DM targets must _not_ deference a mapped_device to directly access its members!
*/
struct mapped_device {
- struct srcu_struct io_barrier;
struct mutex suspend_lock;
/*
@@ -127,6 +126,8 @@ struct mapped_device {
struct blk_mq_tag_set *tag_set;
bool use_blk_mq:1;
bool init_tio_pdu:1;
+
+ struct srcu_struct io_barrier;
};
void dm_init_md_queue(struct mapped_device *md);
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1695,7 +1695,7 @@ static struct mapped_device *alloc_dev(i
struct mapped_device *md;
void *old_md;
- md = kzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id);
+ md = kvzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id);
if (!md) {
DMWARN("unable to allocate device, out of memory.");
return NULL;
@@ -1795,7 +1795,7 @@ bad_io_barrier:
bad_minor:
module_put(THIS_MODULE);
bad_module_get:
- kfree(md);
+ kvfree(md);
return NULL;
}
@@ -1814,7 +1814,7 @@ static void free_dev(struct mapped_devic
free_minor(minor);
module_put(THIS_MODULE);
- kfree(md);
+ kvfree(md);
}
static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
Patches currently in stable-queue which might be from mpatocka(a)redhat.com are
queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch
queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch
queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch
This is a note to let you know that I've just added the patch titled
dm bufio: fix integer overflow when limiting maximum cache size
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 15 Nov 2017 16:38:09 -0800
Subject: dm bufio: fix integer overflow when limiting maximum cache size
From: Eric Biggers <ebiggers(a)google.com>
commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream.
The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression
(VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
overflows, causing the wrong result to be computed. For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem). This causes
severe performance problems for dm-verity users on affected systems.
Fix this by using mult_frac() to correctly multiply by a percentage. Do
this for all places in dm-bufio that multiply by a percentage. Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.
Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset")
Fixes: 95d402f057f2 ("dm: add bufio")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-bufio.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -876,7 +876,8 @@ static void __get_memory_limit(struct dm
buffers = c->minimum_buffers;
*limit_buffers = buffers;
- *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100;
+ *threshold_buffers = mult_frac(buffers,
+ DM_BUFIO_WRITEBACK_PERCENT, 100);
}
/*
@@ -1764,19 +1765,15 @@ static int __init dm_bufio_init(void)
memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches);
memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names);
- mem = (__u64)((totalram_pages - totalhigh_pages) *
- DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT;
+ mem = (__u64)mult_frac(totalram_pages - totalhigh_pages,
+ DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT;
if (mem > ULONG_MAX)
mem = ULONG_MAX;
#ifdef CONFIG_MMU
- /*
- * Get the size of vmalloc space the same way as VMALLOC_TOTAL
- * in fs/proc/internal.h
- */
- if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100)
- mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;
+ if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100))
+ mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);
#endif
dm_bufio_default_cache_size = mem;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-3.18/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
queue-3.18/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch
This is a note to let you know that I've just added the patch titled
PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7f342678634f16795892677204366e835e450dda Mon Sep 17 00:00:00 2001
From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
Date: Tue, 17 Oct 2017 05:47:38 -0700
Subject: PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
commit 7f342678634f16795892677204366e835e450dda upstream.
The Cavium ThunderX (CN8XXX) family of PCIe Root Ports does not advertise
an ACS capability. However, the RTL internally implements similar
protection as if ACS had Request Redirection, Completion Redirection,
Source Validation, and Upstream Forwarding features enabled.
Change Cavium ACS capabilities quirk flags accordingly.
Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices")
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
[bhelgaas: tidy changelog, comment, stable tag]
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/quirks.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4088,12 +4088,14 @@ static int pci_quirk_amd_sb_acs(struct p
static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
{
/*
- * Cavium devices matching this quirk do not perform peer-to-peer
- * with other functions, allowing masking out these bits as if they
- * were unimplemented in the ACS capability.
+ * Cavium root ports don't advertise an ACS capability. However,
+ * the RTL internally implements similar protection as if ACS had
+ * Request Redirection, Completion Redirection, Source Validation,
+ * and Upstream Forwarding features enabled. Assert that the
+ * hardware implements and enables equivalent ACS functionality for
+ * these flags.
*/
- acs_flags &= ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR |
- PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
+ acs_flags &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_SV | PCI_ACS_UF);
return acs_flags ? 0 : 1;
}
Patches currently in stable-queue which might be from Vadim.Lomovtsev(a)cavium.com are
queue-4.9/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Add Raven PCI ID
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-add-raven-pci-id.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Date: Thu, 23 Nov 2017 20:07:00 +0530
Subject: ALSA: hda: Add Raven PCI ID
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream.
This commit adds PCI ID for Raven platform
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/hda_intel.c | 3 +++
1 file changed, 3 insertions(+)
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2305,6 +2305,9 @@ static const struct pci_device_id azx_id
/* AMD Hudson */
{ PCI_DEVICE(0x1022, 0x780d),
.driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
+ /* AMD Raven */
+ { PCI_DEVICE(0x1022, 0x15e3),
+ .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
/* ATI HDMI */
{ PCI_DEVICE(0x1002, 0x0002),
.driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS },
Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are
queue-4.9/alsa-hda-add-raven-pci-id.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Add Raven PCI ID
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-add-raven-pci-id.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Date: Thu, 23 Nov 2017 20:07:00 +0530
Subject: ALSA: hda: Add Raven PCI ID
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream.
This commit adds PCI ID for Raven platform
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/hda_intel.c | 3 +++
1 file changed, 3 insertions(+)
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2316,6 +2316,9 @@ static const struct pci_device_id azx_id
/* AMD Hudson */
{ PCI_DEVICE(0x1022, 0x780d),
.driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
+ /* AMD Raven */
+ { PCI_DEVICE(0x1022, 0x15e3),
+ .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
/* ATI HDMI */
{ PCI_DEVICE(0x1002, 0x0002),
.driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS },
Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are
queue-4.4/alsa-hda-add-raven-pci-id.patch
This is a note to let you know that I've just added the patch titled
PM / OPP: Add missing of_node_put(np)
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pm-opp-add-missing-of_node_put-np.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001
From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
Date: Wed, 4 Oct 2017 11:35:03 +0530
Subject: PM / OPP: Add missing of_node_put(np)
From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
commit 7978db344719dab1e56d05e6fc04aaaddcde0a5e upstream.
The for_each_available_child_of_node() loop in _of_add_opp_table_v2()
doesn't drop the reference to "np" on errors. Fix that.
Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings)
Signed-off-by: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
[ VK: Improved commit log. ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Reviewed-by: Stephen Boyd <sboyd(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/power/opp/of.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/base/power/opp/of.c
+++ b/drivers/base/power/opp/of.c
@@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct d
dev_err(dev, "%s: Failed to add OPP, %d\n", __func__,
ret);
_dev_pm_opp_remove_table(opp_table, dev, false);
+ of_node_put(np);
goto put_opp_table;
}
}
Patches currently in stable-queue which might be from Tobias.Jordan(a)elektrobit.com are
queue-4.14/pm-opp-add-missing-of_node_put-np.patch
This is a note to let you know that I've just added the patch titled
PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7f342678634f16795892677204366e835e450dda Mon Sep 17 00:00:00 2001
From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
Date: Tue, 17 Oct 2017 05:47:38 -0700
Subject: PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
commit 7f342678634f16795892677204366e835e450dda upstream.
The Cavium ThunderX (CN8XXX) family of PCIe Root Ports does not advertise
an ACS capability. However, the RTL internally implements similar
protection as if ACS had Request Redirection, Completion Redirection,
Source Validation, and Upstream Forwarding features enabled.
Change Cavium ACS capabilities quirk flags accordingly.
Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices")
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
[bhelgaas: tidy changelog, comment, stable tag]
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/quirks.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4215,12 +4215,14 @@ static int pci_quirk_amd_sb_acs(struct p
static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
{
/*
- * Cavium devices matching this quirk do not perform peer-to-peer
- * with other functions, allowing masking out these bits as if they
- * were unimplemented in the ACS capability.
+ * Cavium root ports don't advertise an ACS capability. However,
+ * the RTL internally implements similar protection as if ACS had
+ * Request Redirection, Completion Redirection, Source Validation,
+ * and Upstream Forwarding features enabled. Assert that the
+ * hardware implements and enables equivalent ACS functionality for
+ * these flags.
*/
- acs_flags &= ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR |
- PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
+ acs_flags &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_SV | PCI_ACS_UF);
if (!((dev->device >= 0xa000) && (dev->device <= 0xa0ff)))
return -ENOTTY;
Patches currently in stable-queue which might be from Vadim.Lomovtsev(a)cavium.com are
queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch
This is a note to let you know that I've just added the patch titled
PCI: hv: Use effective affinity mask
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-hv-use-effective-affinity-mask.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 79aa801e899417a56863d6713f76c4e108856000 Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 1 Nov 2017 20:30:53 +0000
Subject: PCI: hv: Use effective affinity mask
From: Dexuan Cui <decui(a)microsoft.com>
commit 79aa801e899417a56863d6713f76c4e108856000 upstream.
The effective_affinity_mask is always set when an interrupt is assigned in
__assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic
apic_physflat: -> default_cpu_mask_to_apicid() ->
irq_data_update_effective_affinity(), but it looks d->common->affinity
remains all-1's before the user space or the kernel changes it later.
In the early allocation/initialization phase of an IRQ, we should use the
effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to
the expected CPU. Without the patch, if we assign 7 Mellanox ConnectX-3
VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts.
Tested-by: Adrian Suhov <v-adsuho(a)microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Jake Oshins <jakeo(a)microsoft.com>
Cc: Jork Loeser <jloeser(a)microsoft.com>
Cc: Stephen Hemminger <sthemmin(a)microsoft.com>
Cc: K. Y. Srinivasan <kys(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/host/pci-hyperv.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -879,7 +879,7 @@ static void hv_irq_unmask(struct irq_dat
int cpu;
u64 res;
- dest = irq_data_get_affinity_mask(data);
+ dest = irq_data_get_effective_affinity_mask(data);
pdev = msi_desc_to_pci_dev(msi_desc);
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
@@ -1042,6 +1042,7 @@ static void hv_compose_msi_msg(struct ir
struct hv_pci_dev *hpdev;
struct pci_bus *pbus;
struct pci_dev *pdev;
+ struct cpumask *dest;
struct compose_comp_ctxt comp;
struct tran_int_desc *int_desc;
struct {
@@ -1056,6 +1057,7 @@ static void hv_compose_msi_msg(struct ir
int ret;
pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data));
+ dest = irq_data_get_effective_affinity_mask(data);
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
@@ -1081,14 +1083,14 @@ static void hv_compose_msi_msg(struct ir
switch (pci_protocol_version) {
case PCI_PROTOCOL_VERSION_1_1:
size = hv_compose_msi_req_v1(&ctxt.int_pkts.v1,
- irq_data_get_affinity_mask(data),
+ dest,
hpdev->desc.win_slot.slot,
cfg->vector);
break;
case PCI_PROTOCOL_VERSION_1_2:
size = hv_compose_msi_req_v2(&ctxt.int_pkts.v2,
- irq_data_get_affinity_mask(data),
+ dest,
hpdev->desc.win_slot.slot,
cfg->vector);
break;
Patches currently in stable-queue which might be from decui(a)microsoft.com are
queue-4.14/pci-hv-use-effective-affinity-mask.patch
This is a note to let you know that I've just added the patch titled
PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-aspm-use-correct-capability-pointer-to-program-ltr_l1.2_threshold.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c00054f540bf81e592e1fee709b0bdbf20f478b5 Mon Sep 17 00:00:00 2001
From: Bjorn Helgaas <bhelgaas(a)google.com>
Date: Mon, 13 Nov 2017 15:05:50 -0600
Subject: PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
From: Bjorn Helgaas <bhelgaas(a)google.com>
commit c00054f540bf81e592e1fee709b0bdbf20f478b5 upstream.
Previously we programmed the LTR_L1.2_THRESHOLD in the parent (upstream)
device using the capability pointer of the *child* (downstream) device,
which corrupted some random word of the parent's config space.
Use the parent's L1 SS capability pointer to program its
LTR_L1.2_THRESHOLD.
Fixes: aeda9adebab8 ("PCI/ASPM: Configure L1 substate settings")
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Vidya Sagar <vidyas(a)nvidia.com>
CC: Rajat Jain <rajatja(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/pcie/aspm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -658,7 +658,7 @@ static void pcie_config_aspm_l1ss(struct
0xFF00, link->l1ss.ctl1);
/* Program LTR L1.2 threshold in both ports */
- pci_clear_and_set_dword(parent, dw_cap_ptr + PCI_L1SS_CTL1,
+ pci_clear_and_set_dword(parent, up_cap_ptr + PCI_L1SS_CTL1,
0xE3FF0000, link->l1ss.ctl1);
pci_clear_and_set_dword(child, dw_cap_ptr + PCI_L1SS_CTL1,
0xE3FF0000, link->l1ss.ctl1);
Patches currently in stable-queue which might be from bhelgaas(a)google.com are
queue-4.14/pci-hv-use-effective-affinity-mask.patch
queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
queue-4.14/pci-aspm-account-for-downstream-device-s-port-common_mode_restore_time.patch
queue-4.14/pci-aspm-use-correct-capability-pointer-to-program-ltr_l1.2_threshold.patch
queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch
This is a note to let you know that I've just added the patch titled
PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-aspm-account-for-downstream-device-s-port-common_mode_restore_time.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 94ac327e043ee40d7fc57b54541da50507ef4e99 Mon Sep 17 00:00:00 2001
From: Bjorn Helgaas <bhelgaas(a)google.com>
Date: Mon, 13 Nov 2017 08:50:30 -0600
Subject: PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
From: Bjorn Helgaas <bhelgaas(a)google.com>
commit 94ac327e043ee40d7fc57b54541da50507ef4e99 upstream.
Every Port that supports the L1.2 substate advertises its Port
Common_Mode_Restore_Time, i.e., the time the Port requires to re-establish
common mode when exiting L1.2 (see PCIe r3.1, sec 7.33.2).
Per sec 5.5.3.3.1, when exiting L1.2, the Downstream Port (the device at
the upstream end of the link) must send TS1 training sequences for at least
T(COMMONMODE) after it detects electrical idle exit on the Link. We want
this to be long enough for both ends of the Link, so we should set it to
the maximum of the Port Common_Mode_Restore_Time for the upstream and
downstream components on the Link.
Previously we only looked at the Port Common_Mode_Restore_Time of the
upstream device, so if the downstream device required more time, we didn't
program the upstream device's T(COMMONMODE) correctly.
Fixes: f1f0366dd6be ("PCI/ASPM: Calculate and save the L1.2 timing parameters")
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Vidya Sagar <vidyas(a)nvidia.com>
Acked-by: Rajat Jain <rajatja(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/pcie/aspm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -453,7 +453,7 @@ static void aspm_calc_l1ss_info(struct p
/* Choose the greater of the two T_cmn_mode_rstr_time */
val1 = (upreg->l1ss_cap >> 8) & 0xFF;
- val2 = (upreg->l1ss_cap >> 8) & 0xFF;
+ val2 = (dwreg->l1ss_cap >> 8) & 0xFF;
if (val1 > val2)
link->l1ss.ctl1 |= val1 << 8;
else
Patches currently in stable-queue which might be from bhelgaas(a)google.com are
queue-4.14/pci-hv-use-effective-affinity-mask.patch
queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
queue-4.14/pci-aspm-account-for-downstream-device-s-port-common_mode_restore_time.patch
queue-4.14/pci-aspm-use-correct-capability-pointer-to-program-ltr_l1.2_threshold.patch
queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch
This is a note to let you know that I've just added the patch titled
PCI: Apply Cavium ThunderX ACS quirk to more Root Ports
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f2ddaf8dfd4a5071ad09074d2f95ab85d35c8a1e Mon Sep 17 00:00:00 2001
From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
Date: Tue, 17 Oct 2017 05:47:39 -0700
Subject: PCI: Apply Cavium ThunderX ACS quirk to more Root Ports
From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
commit f2ddaf8dfd4a5071ad09074d2f95ab85d35c8a1e upstream.
Extend the Cavium ThunderX ACS quirk to cover more device IDs and restrict
it to only Root Ports.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com>
[bhelgaas: changelog, stable tag]
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/quirks.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4212,6 +4212,19 @@ static int pci_quirk_amd_sb_acs(struct p
#endif
}
+static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
+{
+ /*
+ * Effectively selects all downstream ports for whole ThunderX 1
+ * family by 0xf800 mask (which represents 8 SoCs), while the lower
+ * bits of device ID are used to indicate which subdevice is used
+ * within the SoC.
+ */
+ return (pci_is_pcie(dev) &&
+ (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
+ ((dev->device & 0xf800) == 0xa000));
+}
+
static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
{
/*
@@ -4224,7 +4237,7 @@ static int pci_quirk_cavium_acs(struct p
*/
acs_flags &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_SV | PCI_ACS_UF);
- if (!((dev->device >= 0xa000) && (dev->device <= 0xa0ff)))
+ if (!pci_quirk_cavium_acs_match(dev))
return -ENOTTY;
return acs_flags ? 0 : 1;
Patches currently in stable-queue which might be from Vadim.Lomovtsev(a)cavium.com are
queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch
queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch
This is a note to let you know that I've just added the patch titled
net: mvneta: fix handling of the Tx descriptor counter
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mvneta-fix-handling-of-the-tx-descriptor-counter.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0d63785c6b94b5d2f095f90755825f90eea791f5 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot(a)sequanux.org>
Date: Mon, 13 Nov 2017 16:27:02 +0100
Subject: net: mvneta: fix handling of the Tx descriptor counter
From: Simon Guinot <simon.guinot(a)sequanux.org>
commit 0d63785c6b94b5d2f095f90755825f90eea791f5 upstream.
The mvneta controller provides a 8-bit register to update the pending
Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
added at once. In the current code the mvneta_txq_pend_desc_add function
assumes the caller takes care of this limit. But it is not the case. In
some situations (xmit_more flag), more than 255 descriptors are added.
When this happens, the Tx descriptor counter register is updated with a
wrong value, which breaks the whole Tx queue management.
This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
function to process more than 255 Tx descriptors.
Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
Signed-off-by: Simon Guinot <simon.guinot(a)sequanux.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/marvell/mvneta.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(str
{
u32 val;
- /* Only 255 descriptors can be added at once ; Assume caller
- * process TX desriptors in quanta less than 256
- */
- val = pend_desc + txq->pending;
- mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+ pend_desc += txq->pending;
+
+ /* Only 255 Tx descriptors can be added at once */
+ do {
+ val = min(pend_desc, 255);
+ mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+ pend_desc -= val;
+ } while (pend_desc > 0);
txq->pending = 0;
}
Patches currently in stable-queue which might be from simon.guinot(a)sequanux.org are
queue-4.14/net-mvneta-fix-handling-of-the-tx-descriptor-counter.patch
This is a note to let you know that I've just added the patch titled
nbd: wait uninterruptible for the dead timeout
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nbd-wait-uninterruptible-for-the-dead-timeout.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ff57dc94faec023abc267cdc45766fccff497557 Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Mon, 6 Nov 2017 16:11:57 -0500
Subject: nbd: wait uninterruptible for the dead timeout
From: Josef Bacik <jbacik(a)fb.com>
commit ff57dc94faec023abc267cdc45766fccff497557 upstream.
If we have a pending signal or the user kills their application then
it'll bring down the whole device, which is less than awesome. Instead
wait uninterruptible for the dead timeout so we're sure we gave it our
best shot.
Fixes: 560bc4b39952 ("nbd: handle dead connections")
Signed-off-by: Josef Bacik <jbacik(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/block/nbd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -723,9 +723,9 @@ static int wait_for_reconnect(struct nbd
return 0;
if (test_bit(NBD_DISCONNECTED, &config->runtime_flags))
return 0;
- wait_event_interruptible_timeout(config->conn_wait,
- atomic_read(&config->live_connections),
- config->dead_conn_timeout);
+ wait_event_timeout(config->conn_wait,
+ atomic_read(&config->live_connections),
+ config->dead_conn_timeout);
return atomic_read(&config->live_connections);
}
Patches currently in stable-queue which might be from jbacik(a)fb.com are
queue-4.14/nbd-wait-uninterruptible-for-the-dead-timeout.patch
queue-4.14/nbd-don-t-start-req-until-after-the-dead-connection-logic.patch
This is a note to let you know that I've just added the patch titled
nbd: don't start req until after the dead connection logic
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nbd-don-t-start-req-until-after-the-dead-connection-logic.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6a468d5990ecd1c2d07dd85f8633bbdd0ba61c40 Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Mon, 6 Nov 2017 16:11:58 -0500
Subject: nbd: don't start req until after the dead connection logic
From: Josef Bacik <jbacik(a)fb.com>
commit 6a468d5990ecd1c2d07dd85f8633bbdd0ba61c40 upstream.
We can end up sleeping for a while waiting for the dead timeout, which
means we could get the per request timer to fire. We did handle this
case, but if the dead timeout happened right after we submitted we'd
either tear down the connection or possibly requeue as we're handling an
error and race with the endio which can lead to panics and other
hilarity.
Fixes: 560bc4b39952 ("nbd: handle dead connections")
Signed-off-by: Josef Bacik <jbacik(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/block/nbd.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -288,15 +288,6 @@ static enum blk_eh_timer_return nbd_xmit
cmd->status = BLK_STS_TIMEOUT;
return BLK_EH_HANDLED;
}
-
- /* If we are waiting on our dead timer then we could get timeout
- * callbacks for our request. For this we just want to reset the timer
- * and let the queue side take care of everything.
- */
- if (!completion_done(&cmd->send_complete)) {
- nbd_config_put(nbd);
- return BLK_EH_RESET_TIMER;
- }
config = nbd->config;
if (config->num_connections > 1) {
@@ -740,6 +731,7 @@ static int nbd_handle_cmd(struct nbd_cmd
if (!refcount_inc_not_zero(&nbd->config_refs)) {
dev_err_ratelimited(disk_to_dev(nbd->disk),
"Socks array is empty\n");
+ blk_mq_start_request(req);
return -EINVAL;
}
config = nbd->config;
@@ -748,6 +740,7 @@ static int nbd_handle_cmd(struct nbd_cmd
dev_err_ratelimited(disk_to_dev(nbd->disk),
"Attempted send on invalid socket\n");
nbd_config_put(nbd);
+ blk_mq_start_request(req);
return -EINVAL;
}
cmd->status = BLK_STS_OK;
@@ -771,6 +764,7 @@ again:
*/
sock_shutdown(nbd);
nbd_config_put(nbd);
+ blk_mq_start_request(req);
return -EIO;
}
goto again;
@@ -781,6 +775,7 @@ again:
* here so that it gets put _after_ the request that is already on the
* dispatch list.
*/
+ blk_mq_start_request(req);
if (unlikely(nsock->pending && nsock->pending != req)) {
blk_mq_requeue_request(req, true);
ret = 0;
@@ -793,10 +788,10 @@ again:
ret = nbd_send_cmd(nbd, cmd, index);
if (ret == -EAGAIN) {
dev_err_ratelimited(disk_to_dev(nbd->disk),
- "Request send failed trying another connection\n");
+ "Request send failed, requeueing\n");
nbd_mark_nsock_dead(nbd, nsock, 1);
- mutex_unlock(&nsock->tx_lock);
- goto again;
+ blk_mq_requeue_request(req, true);
+ ret = 0;
}
out:
mutex_unlock(&nsock->tx_lock);
@@ -820,7 +815,6 @@ static blk_status_t nbd_queue_rq(struct
* done sending everything over the wire.
*/
init_completion(&cmd->send_complete);
- blk_mq_start_request(bd->rq);
/* We can be called directly from the user space process, which means we
* could possibly have signals pending so our sendmsg will fail. In
Patches currently in stable-queue which might be from jbacik(a)fb.com are
queue-4.14/nbd-wait-uninterruptible-for-the-dead-timeout.patch
queue-4.14/nbd-don-t-start-req-until-after-the-dead-connection-logic.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Add Raven PCI ID
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-add-raven-pci-id.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Date: Thu, 23 Nov 2017 20:07:00 +0530
Subject: ALSA: hda: Add Raven PCI ID
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream.
This commit adds PCI ID for Raven platform
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/hda_intel.c | 3 +++
1 file changed, 3 insertions(+)
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2463,6 +2463,9 @@ static const struct pci_device_id azx_id
/* AMD Hudson */
{ PCI_DEVICE(0x1022, 0x780d),
.driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
+ /* AMD Raven */
+ { PCI_DEVICE(0x1022, 0x15e3),
+ .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
/* ATI HDMI */
{ PCI_DEVICE(0x1002, 0x0002),
.driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS },
Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are
queue-4.14/alsa-hda-add-raven-pci-id.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Add Raven PCI ID
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-add-raven-pci-id.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Date: Thu, 23 Nov 2017 20:07:00 +0530
Subject: ALSA: hda: Add Raven PCI ID
From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream.
This commit adds PCI ID for Raven platform
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/hda_intel.c | 3 +++
1 file changed, 3 insertions(+)
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2092,6 +2092,9 @@ static const struct pci_device_id azx_id
/* AMD Hudson */
{ PCI_DEVICE(0x1022, 0x780d),
.driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
+ /* AMD Raven */
+ { PCI_DEVICE(0x1022, 0x15e3),
+ .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
/* ATI HDMI */
{ PCI_DEVICE(0x1002, 0x0002),
.driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS },
Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are
queue-3.18/alsa-hda-add-raven-pci-id.patch
On Mon, Nov 27, 2017 at 12:40:36PM +0000, James Hogan wrote:
> Hi Greg,
>
> On Mon, Nov 27, 2017 at 01:35:46PM +0100, gregkh(a)linuxfoundation.org wrote:
> > The patch below was submitted to be applied to the 4.9-stable tree.
> >
> > I fail to see how this patch meets the stable kernel rules as found at
> > Documentation/process/stable-kernel-rules.rst.
> >
> > I could be totally wrong, and if so, please respond to
> > <stable(a)vger.kernel.org> and let me know why this patch should be
> > applied. Otherwise, it is now dropped from my patch queues, never to be
> > seen again.
>
> I should have adjusted the commit message. KERN_WARN doesn't exist so it
> actually fixes a build error as well as switching to pr_warn().
What build error? I have not heard of this breaking the build on 4.9
for the past year, is it in some config that no one uses? :)
thanks,
greg k-h
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001
From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
Date: Wed, 4 Oct 2017 11:35:03 +0530
Subject: [PATCH] PM / OPP: Add missing of_node_put(np)
The for_each_available_child_of_node() loop in _of_add_opp_table_v2()
doesn't drop the reference to "np" on errors. Fix that.
Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings)
Cc: 4.3+ <stable(a)vger.kernel.org> # 4.3+
Signed-off-by: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
[ VK: Improved commit log. ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Reviewed-by: Stephen Boyd <sboyd(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 0b718886479b..87509cb69f79 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct device *dev, struct device_node *opp_np)
dev_err(dev, "%s: Failed to add OPP, %d\n", __func__,
ret);
_dev_pm_opp_remove_table(opp_table, dev, false);
+ of_node_put(np);
goto put_opp_table;
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001
From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
Date: Wed, 4 Oct 2017 11:35:03 +0530
Subject: [PATCH] PM / OPP: Add missing of_node_put(np)
The for_each_available_child_of_node() loop in _of_add_opp_table_v2()
doesn't drop the reference to "np" on errors. Fix that.
Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings)
Cc: 4.3+ <stable(a)vger.kernel.org> # 4.3+
Signed-off-by: Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
[ VK: Improved commit log. ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Reviewed-by: Stephen Boyd <sboyd(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 0b718886479b..87509cb69f79 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct device *dev, struct device_node *opp_np)
dev_err(dev, "%s: Failed to add OPP, %d\n", __func__,
ret);
_dev_pm_opp_remove_table(opp_table, dev, false);
+ of_node_put(np);
goto put_opp_table;
}
}
This is a note to let you know that I've just added the patch titled
x86/decoder: Add new TEST instruction pattern
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-decoder-add-new-test-instruction-pattern.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Fri, 24 Nov 2017 13:56:30 +0900
Subject: x86/decoder: Add new TEST instruction pattern
From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream.
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -896,7 +896,7 @@ EndTable
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
Patches currently in stable-queue which might be from mhiramat(a)kernel.org are
queue-4.9/x86-decoder-add-new-test-instruction-pattern.patch
This is a note to let you know that I've just added the patch titled
MIPS: ralink: Fix typo in mt7628 pinmux function
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 11 May 2017 08:11:15 +0200
Subject: MIPS: ralink: Fix typo in mt7628 pinmux function
From: Mathias Kresin <dev(a)kresin.me>
commit 05a67cc258e75ac9758e6f13d26337b8be51162a upstream.
There is a typo inside the pinmux setup code. The function is called
refclk and not reclk.
Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support")
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16047/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/ralink/mt7620.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -141,7 +141,7 @@ static struct rt2880_pmx_func i2c_grp_mt
FUNC("i2c", 0, 4, 2),
};
-static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) };
+static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) };
static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) };
static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) };
static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };
Patches currently in stable-queue which might be from dev(a)kresin.me are
queue-4.9/mips-ralink-fix-mt7628-pinmux.patch
queue-4.9/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
This is a note to let you know that I've just added the patch titled
arm64: Implement arch-specific pte_access_permitted()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm64-implement-arch-specific-pte_access_permitted.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6218f96c58dbf44a06aeaf767aab1f54fc397838 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas(a)arm.com>
Date: Thu, 26 Oct 2017 18:36:47 +0100
Subject: arm64: Implement arch-specific pte_access_permitted()
From: Catalin Marinas <catalin.marinas(a)arm.com>
commit 6218f96c58dbf44a06aeaf767aab1f54fc397838 upstream.
The generic pte_access_permitted() implementation only checks for
pte_present() (together with the write permission where applicable).
However, for both kernel ptes and PROT_NONE mappings pte_present() also
returns true on arm64 even though such mappings are not user accessible.
Additionally, arm64 now supports execute-only user permission
(PROT_EXEC) which is implemented by clearing the PTE_USER bit.
With this patch the arm64 implementation of pte_access_permitted()
checks for the PTE_VALID and PTE_USER bits together with writable access
if applicable.
Reported-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -91,6 +91,8 @@ extern unsigned long empty_zero_page[PAG
((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN))
#define pte_valid_young(pte) \
((pte_val(pte) & (PTE_VALID | PTE_AF)) == (PTE_VALID | PTE_AF))
+#define pte_valid_user(pte) \
+ ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER))
/*
* Could the pte be present in the TLB? We must check mm_tlb_flush_pending
@@ -100,6 +102,18 @@ extern unsigned long empty_zero_page[PAG
#define pte_accessible(mm, pte) \
(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
+/*
+ * p??_access_permitted() is true for valid user mappings (subject to the
+ * write permission check) other than user execute-only which do not have the
+ * PTE_USER bit set. PROT_NONE mappings do not have the PTE_VALID bit set.
+ */
+#define pte_access_permitted(pte, write) \
+ (pte_valid_user(pte) && (!(write) || pte_write(pte)))
+#define pmd_access_permitted(pmd, write) \
+ (pte_access_permitted(pmd_pte(pmd), (write)))
+#define pud_access_permitted(pud, write) \
+ (pte_access_permitted(pud_pte(pud), (write)))
+
static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
{
pte_val(pte) &= ~pgprot_val(prot);
Patches currently in stable-queue which might be from catalin.marinas(a)arm.com are
queue-4.9/arm64-implement-arch-specific-pte_access_permitted.patch
This is a note to let you know that I've just added the patch titled
MIPS: ralink: Fix MT7628 pinmux
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-ralink-fix-mt7628-pinmux.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8ef4b43cd3794d63052d85898e42424fd3b14d24 Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 11 May 2017 08:11:14 +0200
Subject: MIPS: ralink: Fix MT7628 pinmux
From: Mathias Kresin <dev(a)kresin.me>
commit 8ef4b43cd3794d63052d85898e42424fd3b14d24 upstream.
According to the datasheet the REFCLK pin is shared with GPIO#37 and
the PERST pin is shared with GPIO#36.
Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support")
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16046/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/ralink/mt7620.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -141,8 +141,8 @@ static struct rt2880_pmx_func i2c_grp_mt
FUNC("i2c", 0, 4, 2),
};
-static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 36, 1) };
-static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 37, 1) };
+static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) };
+static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) };
static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) };
static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };
Patches currently in stable-queue which might be from dev(a)kresin.me are
queue-4.9/mips-ralink-fix-mt7628-pinmux.patch
queue-4.9/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
This is a note to let you know that I've just added the patch titled
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:26 +0100
Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream.
When CONFIG_ARM_LPAE is set, the PMD dump relies on the software
read-only bit to determine whether a page is writable. This
concealed a bug which left the kernel text section writable
(AP2=0) while marked read-only in the software bit.
In a kernel with the AP2 bug, the dump looks like this:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M ro x SHD
0xc0600000-0xc0800000 2M ro NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
The fix is to check that the software and hardware bits are both
set before displaying "ro". The dump then shows the true perms:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M RW x SHD
0xc0600000-0xc0800000 2M RW NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/dump.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/dump.c
+++ b/arch/arm/mm/dump.c
@@ -126,8 +126,8 @@ static const struct prot_bits section_bi
.val = PMD_SECT_USER,
.set = "USR",
}, {
- .mask = L_PMD_SECT_RDONLY,
- .val = L_PMD_SECT_RDONLY,
+ .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
+ .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
.set = "ro",
.clear = "RW",
#elif __LINUX_ARM_ARCH__ >= 6
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-4.9/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
queue-4.9/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
This is a note to let you know that I've just added the patch titled
ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:25 +0100
Subject: ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c upstream.
Currently, for ARM kernels with CONFIG_ARM_LPAE and
CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the
kernel code and rodata are writable. They are marked read-only in
a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit
is not set (PMD_SECT_AP2).
For user mappings, the logic that propagates the software bit
to the hardware bit is in set_pmd_at(); but for the kernel,
section_update() writes the PMDs directly, skipping this logic.
The fix is to set PMD_SECT_AP2 for read-only sections in
section_update(), at the same time as L_PMD_SECT_RDONLY.
Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Reported-by: Neil Dick <neil(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Tested-by: Laura Abbott <labbott(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -619,8 +619,8 @@ static struct section_perm ro_perms[] =
.start = (unsigned long)_stext,
.end = (unsigned long)__init_begin,
#ifdef CONFIG_ARM_LPAE
- .mask = ~L_PMD_SECT_RDONLY,
- .prot = L_PMD_SECT_RDONLY,
+ .mask = ~(L_PMD_SECT_RDONLY | PMD_SECT_AP2),
+ .prot = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
#else
.mask = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE),
.prot = PMD_SECT_APX | PMD_SECT_AP_WRITE,
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-4.9/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
queue-4.9/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
This is a note to let you know that I've just added the patch titled
x86/decoder: Add new TEST instruction pattern
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-decoder-add-new-test-instruction-pattern.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Fri, 24 Nov 2017 13:56:30 +0900
Subject: x86/decoder: Add new TEST instruction pattern
From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream.
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -833,7 +833,7 @@ EndTable
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
Patches currently in stable-queue which might be from mhiramat(a)kernel.org are
queue-4.4/x86-decoder-add-new-test-instruction-pattern.patch
This is a note to let you know that I've just added the patch titled
MIPS: ralink: Fix typo in mt7628 pinmux function
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 11 May 2017 08:11:15 +0200
Subject: MIPS: ralink: Fix typo in mt7628 pinmux function
From: Mathias Kresin <dev(a)kresin.me>
commit 05a67cc258e75ac9758e6f13d26337b8be51162a upstream.
There is a typo inside the pinmux setup code. The function is called
refclk and not reclk.
Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support")
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16047/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/ralink/mt7620.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -141,7 +141,7 @@ static struct rt2880_pmx_func i2c_grp_mt
FUNC("i2c", 0, 4, 2),
};
-static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) };
+static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) };
static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) };
static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) };
static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };
Patches currently in stable-queue which might be from dev(a)kresin.me are
queue-4.4/mips-ralink-fix-mt7628-pinmux.patch
queue-4.4/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
This is a note to let you know that I've just added the patch titled
MIPS: ralink: Fix MT7628 pinmux
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-ralink-fix-mt7628-pinmux.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8ef4b43cd3794d63052d85898e42424fd3b14d24 Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 11 May 2017 08:11:14 +0200
Subject: MIPS: ralink: Fix MT7628 pinmux
From: Mathias Kresin <dev(a)kresin.me>
commit 8ef4b43cd3794d63052d85898e42424fd3b14d24 upstream.
According to the datasheet the REFCLK pin is shared with GPIO#37 and
the PERST pin is shared with GPIO#36.
Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support")
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16046/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/ralink/mt7620.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -141,8 +141,8 @@ static struct rt2880_pmx_func i2c_grp_mt
FUNC("i2c", 0, 4, 2),
};
-static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 36, 1) };
-static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 37, 1) };
+static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) };
+static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) };
static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) };
static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };
Patches currently in stable-queue which might be from dev(a)kresin.me are
queue-4.4/mips-ralink-fix-mt7628-pinmux.patch
queue-4.4/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
This is a note to let you know that I've just added the patch titled
ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:25 +0100
Subject: ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c upstream.
Currently, for ARM kernels with CONFIG_ARM_LPAE and
CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the
kernel code and rodata are writable. They are marked read-only in
a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit
is not set (PMD_SECT_AP2).
For user mappings, the logic that propagates the software bit
to the hardware bit is in set_pmd_at(); but for the kernel,
section_update() writes the PMDs directly, skipping this logic.
The fix is to set PMD_SECT_AP2 for read-only sections in
section_update(), at the same time as L_PMD_SECT_RDONLY.
Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Reported-by: Neil Dick <neil(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Tested-by: Laura Abbott <labbott(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -611,8 +611,8 @@ static struct section_perm ro_perms[] =
.start = (unsigned long)_stext,
.end = (unsigned long)__init_begin,
#ifdef CONFIG_ARM_LPAE
- .mask = ~L_PMD_SECT_RDONLY,
- .prot = L_PMD_SECT_RDONLY,
+ .mask = ~(L_PMD_SECT_RDONLY | PMD_SECT_AP2),
+ .prot = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
#else
.mask = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE),
.prot = PMD_SECT_APX | PMD_SECT_AP_WRITE,
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-4.4/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
queue-4.4/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
This is a note to let you know that I've just added the patch titled
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:26 +0100
Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream.
When CONFIG_ARM_LPAE is set, the PMD dump relies on the software
read-only bit to determine whether a page is writable. This
concealed a bug which left the kernel text section writable
(AP2=0) while marked read-only in the software bit.
In a kernel with the AP2 bug, the dump looks like this:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M ro x SHD
0xc0600000-0xc0800000 2M ro NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
The fix is to check that the software and hardware bits are both
set before displaying "ro". The dump then shows the true perms:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M RW x SHD
0xc0600000-0xc0800000 2M RW NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/dump.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/dump.c
+++ b/arch/arm/mm/dump.c
@@ -126,8 +126,8 @@ static const struct prot_bits section_bi
.val = PMD_SECT_USER,
.set = "USR",
}, {
- .mask = L_PMD_SECT_RDONLY,
- .val = L_PMD_SECT_RDONLY,
+ .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
+ .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
.set = "ro",
.clear = "RW",
#elif __LINUX_ARM_ARCH__ >= 6
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-4.4/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
queue-4.4/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
This is a note to let you know that I've just added the patch titled
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-entry-64-add-missing-irqflags-tracing-to-native_load_gs_index.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Wed, 22 Nov 2017 20:39:16 -0800
Subject: x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
From: Andy Lutomirski <luto(a)kernel.org>
commit ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1 upstream.
Running this code with IRQs enabled (where dummy_lock is a spinlock):
static void check_load_gs_index(void)
{
/* This will fail. */
load_gs_index(0xffff);
spin_lock(&dummy_lock);
spin_unlock(&dummy_lock);
}
Will generate a lockdep warning. The issue is that the actual write
to %gs would cause an exception with IRQs disabled, and the exception
handler would, as an inadvertent side effect, update irqflag tracing
to reflect the IRQs-off status. native_load_gs_index() would then
turn IRQs back on and return with irqflag tracing still thinking that
IRQs were off. The dummy lock-and-unlock causes lockdep to notice the
error and warn.
Fix it by adding the missing tracing.
Apparently nothing did this in a context where it mattered. I haven't
tried to find a code path that would actually exhibit the warning if
appropriately nasty user code were running.
I suspect that the security impact of this bug is very, very low --
production systems don't run with lockdep enabled, and the warning is
mostly harmless anyway.
Found during a quick audit of the entry code to try to track down an
unrelated bug that Ingo found in some still-in-development code.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.151141191…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -51,15 +51,19 @@ ENTRY(native_usergs_sysret64)
END(native_usergs_sysret64)
#endif /* CONFIG_PARAVIRT */
-.macro TRACE_IRQS_IRETQ
+.macro TRACE_IRQS_FLAGS flags:req
#ifdef CONFIG_TRACE_IRQFLAGS
- bt $9, EFLAGS(%rsp) /* interrupts off? */
+ bt $9, \flags /* interrupts off? */
jnc 1f
TRACE_IRQS_ON
1:
#endif
.endm
+.macro TRACE_IRQS_IRETQ
+ TRACE_IRQS_FLAGS EFLAGS(%rsp)
+.endm
+
/*
* When dynamic function tracer is enabled it will add a breakpoint
* to all locations that it is about to modify, sync CPUs, update
@@ -923,11 +927,13 @@ ENTRY(native_load_gs_index)
FRAME_BEGIN
pushfq
DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
+ TRACE_IRQS_OFF
SWAPGS
.Lgs_change:
movl %edi, %gs
2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
SWAPGS
+ TRACE_IRQS_FLAGS (%rsp)
popfq
FRAME_END
ret
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-64-fix-entry_syscall_64_after_hwframe-irq-tracing.patch
queue-4.14/x86-entry-64-add-missing-irqflags-tracing-to-native_load_gs_index.patch
This is a note to let you know that I've just added the patch titled
x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-entry-64-fix-entry_syscall_64_after_hwframe-irq-tracing.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 548c3050ea8d16997ae27f9e080a8338a606fc93 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Tue, 21 Nov 2017 20:43:56 -0800
Subject: x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
From: Andy Lutomirski <luto(a)kernel.org>
commit 548c3050ea8d16997ae27f9e080a8338a606fc93 upstream.
When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
before it. This means that users of entry_SYSCALL_64_after_hwframe()
were responsible for invoking TRACE_IRQS_OFF, and the one and only
user (Xen, added in the same commit) got it wrong.
I think this would manifest as a warning if a Xen PV guest with
CONFIG_DEBUG_LOCKDEP=y were used with context tracking. (The
context tracking bit is to cause lockdep to get invoked before we
turn IRQs back on.) I haven't tested that for real yet because I
can't get a kernel configured like that to boot at all on Xen PV.
Move TRACE_IRQS_OFF below the label.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries")
Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.151132544…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -148,8 +148,6 @@ ENTRY(entry_SYSCALL_64)
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
- TRACE_IRQS_OFF
-
/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
@@ -170,6 +168,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
UNWIND_HINT_REGS extra=0
+ TRACE_IRQS_OFF
+
/*
* If we need to do entry work or if we guess we'll need to do
* exit work, go straight to the slow path.
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-64-fix-entry_syscall_64_after_hwframe-irq-tracing.patch
queue-4.14/x86-entry-64-add-missing-irqflags-tracing-to-native_load_gs_index.patch
This is a note to let you know that I've just added the patch titled
x86/decoder: Add new TEST instruction pattern
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-decoder-add-new-test-instruction-pattern.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Fri, 24 Nov 2017 13:56:30 +0900
Subject: x86/decoder: Add new TEST instruction pattern
From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream.
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -896,7 +896,7 @@ EndTable
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
Patches currently in stable-queue which might be from mhiramat(a)kernel.org are
queue-4.14/x86-decoder-add-new-test-instruction-pattern.patch
This is a note to let you know that I've just added the patch titled
x86/boot: Fix boot failure when SMP MP-table is based at 0
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-boot-fix-boot-failure-when-smp-mp-table-is-based-at-0.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ac5292e9a294618cecb31109d1ba265e3d027ba2 Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Mon, 6 Nov 2017 14:17:53 -0600
Subject: x86/boot: Fix boot failure when SMP MP-table is based at 0
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit ac5292e9a294618cecb31109d1ba265e3d027ba2 upstream.
When crosvm is used to boot a kernel as a VM, the SMP MP-table is found
at physical address 0x0. This causes mpf_base to be set to 0 and a
subsequent "if (!mpf_base)" check in default_get_smp_config() results in
the MP-table not being parsed. Further into the boot this results in an
oops when attempting a read_apic_id().
Add a boolean variable that is set to true when the MP-table is found.
Use this variable for testing if the MP-table was found so that even a
value of 0 for mpf_base will result in continued parsing of the MP-table.
Fixes: 5997efb96756 ("x86/boot: Use memremap() to map the MPF and MPC data")
Reported-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net>
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: regression(a)leemhuis.info
Link: https://lkml.kernel.org/r/20171106201753.23059.86674.stgit@tlendack-t1.amdo…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/mpparse.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -431,6 +431,7 @@ static inline void __init construct_defa
}
static unsigned long mpf_base;
+static bool mpf_found;
static unsigned long __init get_mpc_size(unsigned long physptr)
{
@@ -504,7 +505,7 @@ void __init default_get_smp_config(unsig
if (!smp_found_config)
return;
- if (!mpf_base)
+ if (!mpf_found)
return;
if (acpi_lapic && early)
@@ -593,6 +594,7 @@ static int __init smp_scan_config(unsign
smp_found_config = 1;
#endif
mpf_base = base;
+ mpf_found = true;
pr_info("found SMP MP-table at [mem %#010lx-%#010lx] mapped at [%p]\n",
base, base + sizeof(*mpf) - 1, mpf);
@@ -858,7 +860,7 @@ static int __init update_mp_table(void)
if (!enable_update_mptable)
return 0;
- if (!mpf_base)
+ if (!mpf_found)
return 0;
mpf = early_memremap(mpf_base, sizeof(*mpf));
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.14/x86-boot-fix-boot-failure-when-smp-mp-table-is-based-at-0.patch
This is a note to let you know that I've just added the patch titled
uapi: fix linux/tls.h userspace compilation error
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
uapi-fix-linux-tls.h-userspace-compilation-error.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b9f3eb499d84f8d4adcb2f9212ec655700b28228 Mon Sep 17 00:00:00 2001
From: "Dmitry V. Levin" <ldv(a)altlinux.org>
Date: Tue, 14 Nov 2017 06:30:11 +0300
Subject: uapi: fix linux/tls.h userspace compilation error
From: Dmitry V. Levin <ldv(a)altlinux.org>
commit b9f3eb499d84f8d4adcb2f9212ec655700b28228 upstream.
Move inclusion of a private kernel header <net/tcp.h>
from uapi/linux/tls.h to its only user - net/tls.h,
to fix the following linux/tls.h userspace compilation error:
/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory
As to this point uapi/linux/tls.h was totaly unusuable for userspace,
cleanup this header file further by moving other redundant includes
to net/tls.h.
Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Dmitry V. Levin <ldv(a)altlinux.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/net/tls.h | 4 ++++
include/uapi/linux/tls.h | 4 ----
2 files changed, 4 insertions(+), 4 deletions(-)
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -35,6 +35,10 @@
#define _TLS_OFFLOAD_H
#include <linux/types.h>
+#include <asm/byteorder.h>
+#include <linux/socket.h>
+#include <linux/tcp.h>
+#include <net/tcp.h>
#include <uapi/linux/tls.h>
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -35,10 +35,6 @@
#define _UAPI_LINUX_TLS_H
#include <linux/types.h>
-#include <asm/byteorder.h>
-#include <linux/socket.h>
-#include <linux/tcp.h>
-#include <net/tcp.h>
/* TLS socket options */
#define TLS_TX 1 /* Set transmit parameters */
Patches currently in stable-queue which might be from ldv(a)altlinux.org are
queue-4.14/uapi-fix-linux-tls.h-userspace-compilation-error.patch
queue-4.14/uapi-fix-linux-rxrpc.h-userspace-compilation-errors.patch
This is a note to let you know that I've just added the patch titled
uapi: fix linux/rxrpc.h userspace compilation errors
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
uapi-fix-linux-rxrpc.h-userspace-compilation-errors.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0eef304bc9f7d079a1165e8cd2f24b078e9e1f2a Mon Sep 17 00:00:00 2001
From: "Dmitry V. Levin" <ldv(a)altlinux.org>
Date: Mon, 13 Nov 2017 03:37:06 +0300
Subject: uapi: fix linux/rxrpc.h userspace compilation errors
From: Dmitry V. Levin <ldv(a)altlinux.org>
commit 0eef304bc9f7d079a1165e8cd2f24b078e9e1f2a upstream.
Consistently use types provided by <linux/types.h> to fix the following
linux/rxrpc.h userspace compilation errors:
/usr/include/linux/rxrpc.h:24:2: error: unknown type name 'u16'
u16 srx_service; /* service desired */
/usr/include/linux/rxrpc.h:25:2: error: unknown type name 'u16'
u16 transport_type; /* type of transport socket (SOCK_DGRAM) */
/usr/include/linux/rxrpc.h:26:2: error: unknown type name 'u16'
u16 transport_len; /* length of transport address */
Use __kernel_sa_family_t instead of sa_family_t the same way
as uapi/linux/in.h does, to fix the following
linux/rxrpc.h userspace compilation errors:
/usr/include/linux/rxrpc.h:23:2: error: unknown type name 'sa_family_t'
sa_family_t srx_family; /* address family */
/usr/include/linux/rxrpc.h:28:3: error: unknown type name 'sa_family_t'
sa_family_t family; /* transport address family */
Fixes: 727f8914477e ("rxrpc: Expose UAPI definitions to userspace")
Signed-off-by: Dmitry V. Levin <ldv(a)altlinux.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/uapi/linux/rxrpc.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/include/uapi/linux/rxrpc.h
+++ b/include/uapi/linux/rxrpc.h
@@ -20,12 +20,12 @@
* RxRPC socket address
*/
struct sockaddr_rxrpc {
- sa_family_t srx_family; /* address family */
- u16 srx_service; /* service desired */
- u16 transport_type; /* type of transport socket (SOCK_DGRAM) */
- u16 transport_len; /* length of transport address */
+ __kernel_sa_family_t srx_family; /* address family */
+ __u16 srx_service; /* service desired */
+ __u16 transport_type; /* type of transport socket (SOCK_DGRAM) */
+ __u16 transport_len; /* length of transport address */
union {
- sa_family_t family; /* transport address family */
+ __kernel_sa_family_t family; /* transport address family */
struct sockaddr_in sin; /* IPv4 transport address */
struct sockaddr_in6 sin6; /* IPv6 transport address */
} transport;
Patches currently in stable-queue which might be from ldv(a)altlinux.org are
queue-4.14/uapi-fix-linux-tls.h-userspace-compilation-error.patch
queue-4.14/uapi-fix-linux-rxrpc.h-userspace-compilation-errors.patch
This is a note to let you know that I've just added the patch titled
perf/x86/intel: Hide TSX events when RTM is not supported
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
perf-x86-intel-hide-tsx-events-when-rtm-is-not-supported.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 58ba4d5a25579e5c7e312bd359c95f3a9a0a242c Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak(a)linux.intel.com>
Date: Wed, 8 Nov 2017 16:07:18 -0800
Subject: perf/x86/intel: Hide TSX events when RTM is not supported
From: Andi Kleen <ak(a)linux.intel.com>
commit 58ba4d5a25579e5c7e312bd359c95f3a9a0a242c upstream.
0day testing reported a perf test regression on Haswell systems without
RTM. Commit a5df70c35 hides the in_tx/in_tx_cp attributes when RTM is not
available, but the TSX events are still available in sysfs. Due to the
missing attributes the event parser fails on those files.
Don't show the TSX events in sysfs when RTM is not available on
Haswell/Broadwell/Skylake.
Fixes: a5df70c354c2 (perf/x86: Only show format attributes when supported)
Reported-by: kernel test robot <xiaolong.ye(a)intel.com>
Tested-by: Jin Yao <yao.jin(a)linux.intel.com>
Signed-off-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Peter Zijlstra <peterz(a)infradead.org>
Link: https://lkml.kernel.org/r/20171109000718.14137-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/events/intel/core.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3730,6 +3730,19 @@ EVENT_ATTR_STR(cycles-t, cycles_t, "even
EVENT_ATTR_STR(cycles-ct, cycles_ct, "event=0x3c,in_tx=1,in_tx_cp=1");
static struct attribute *hsw_events_attrs[] = {
+ EVENT_PTR(mem_ld_hsw),
+ EVENT_PTR(mem_st_hsw),
+ EVENT_PTR(td_slots_issued),
+ EVENT_PTR(td_slots_retired),
+ EVENT_PTR(td_fetch_bubbles),
+ EVENT_PTR(td_total_slots),
+ EVENT_PTR(td_total_slots_scale),
+ EVENT_PTR(td_recovery_bubbles),
+ EVENT_PTR(td_recovery_bubbles_scale),
+ NULL
+};
+
+static struct attribute *hsw_tsx_events_attrs[] = {
EVENT_PTR(tx_start),
EVENT_PTR(tx_commit),
EVENT_PTR(tx_abort),
@@ -3742,18 +3755,16 @@ static struct attribute *hsw_events_attr
EVENT_PTR(el_conflict),
EVENT_PTR(cycles_t),
EVENT_PTR(cycles_ct),
- EVENT_PTR(mem_ld_hsw),
- EVENT_PTR(mem_st_hsw),
- EVENT_PTR(td_slots_issued),
- EVENT_PTR(td_slots_retired),
- EVENT_PTR(td_fetch_bubbles),
- EVENT_PTR(td_total_slots),
- EVENT_PTR(td_total_slots_scale),
- EVENT_PTR(td_recovery_bubbles),
- EVENT_PTR(td_recovery_bubbles_scale),
NULL
};
+static __init struct attribute **get_hsw_events_attrs(void)
+{
+ return boot_cpu_has(X86_FEATURE_RTM) ?
+ merge_attr(hsw_events_attrs, hsw_tsx_events_attrs) :
+ hsw_events_attrs;
+}
+
static ssize_t freeze_on_smi_show(struct device *cdev,
struct device_attribute *attr,
char *buf)
@@ -4182,7 +4193,7 @@ __init int intel_pmu_init(void)
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
- x86_pmu.cpu_events = hsw_events_attrs;
+ x86_pmu.cpu_events = get_hsw_events_attrs();
x86_pmu.lbr_double_abort = true;
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
@@ -4221,7 +4232,7 @@ __init int intel_pmu_init(void)
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
- x86_pmu.cpu_events = hsw_events_attrs;
+ x86_pmu.cpu_events = get_hsw_events_attrs();
x86_pmu.limit_period = bdw_limit_period;
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
@@ -4279,7 +4290,7 @@ __init int intel_pmu_init(void)
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
extra_attr = merge_attr(extra_attr, skl_format_attr);
- x86_pmu.cpu_events = hsw_events_attrs;
+ x86_pmu.cpu_events = get_hsw_events_attrs();
intel_pmu_pebs_data_source_skl(
boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
pr_cont("Skylake events, ");
Patches currently in stable-queue which might be from ak(a)linux.intel.com are
queue-4.14/perf-x86-intel-hide-tsx-events-when-rtm-is-not-supported.patch
This is a note to let you know that I've just added the patch titled
MIPS: ralink: Fix typo in mt7628 pinmux function
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 11 May 2017 08:11:15 +0200
Subject: MIPS: ralink: Fix typo in mt7628 pinmux function
From: Mathias Kresin <dev(a)kresin.me>
commit 05a67cc258e75ac9758e6f13d26337b8be51162a upstream.
There is a typo inside the pinmux setup code. The function is called
refclk and not reclk.
Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support")
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16047/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/ralink/mt7620.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -145,7 +145,7 @@ static struct rt2880_pmx_func i2c_grp_mt
FUNC("i2c", 0, 4, 2),
};
-static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) };
+static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) };
static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) };
static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) };
static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };
Patches currently in stable-queue which might be from dev(a)kresin.me are
queue-4.14/mips-ralink-fix-mt7628-pinmux.patch
queue-4.14/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
This is a note to let you know that I've just added the patch titled
MIPS: ralink: Fix MT7628 pinmux
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-ralink-fix-mt7628-pinmux.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8ef4b43cd3794d63052d85898e42424fd3b14d24 Mon Sep 17 00:00:00 2001
From: Mathias Kresin <dev(a)kresin.me>
Date: Thu, 11 May 2017 08:11:14 +0200
Subject: MIPS: ralink: Fix MT7628 pinmux
From: Mathias Kresin <dev(a)kresin.me>
commit 8ef4b43cd3794d63052d85898e42424fd3b14d24 upstream.
According to the datasheet the REFCLK pin is shared with GPIO#37 and
the PERST pin is shared with GPIO#36.
Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support")
Signed-off-by: Mathias Kresin <dev(a)kresin.me>
Acked-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16046/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/ralink/mt7620.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -145,8 +145,8 @@ static struct rt2880_pmx_func i2c_grp_mt
FUNC("i2c", 0, 4, 2),
};
-static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 36, 1) };
-static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 37, 1) };
+static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) };
+static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) };
static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) };
static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };
Patches currently in stable-queue which might be from dev(a)kresin.me are
queue-4.14/mips-ralink-fix-mt7628-pinmux.patch
queue-4.14/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch
This is a note to let you know that I've just added the patch titled
MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mips-cmpxchg64-and-have_virt_cpu_accounting_gen-don-t-work-for-32-bit-smp.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a3f143106596d739e7fbc4b84c96b1475247d876 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben(a)decadent.org.uk>
Date: Wed, 4 Oct 2017 03:46:14 +0100
Subject: MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP
From: Ben Hutchings <ben(a)decadent.org.uk>
commit a3f143106596d739e7fbc4b84c96b1475247d876 upstream.
__cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts
on the same CPU (that's what the 'local' means). We can't use it to
implement cmpxchg64() in SMP configurations.
So, for 32-bit SMP configurations:
- Don't define cmpxchg64()
- Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it
Fixes: e2093c7b03c1 ("MIPS: Fall back to generic implementation of ...")
Fixes: bb877e96bea1 ("MIPS: Add support for full dynticks CPU time accounting")
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Deng-Cheng Zhu <dengcheng.zhu(a)mips.com>
Cc: linux-mips(a)linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17413/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/mips/Kconfig | 2 +-
arch/mips/include/asm/cmpxchg.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -65,7 +65,7 @@ config MIPS
select HAVE_PERF_EVENTS
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_SYSCALL_TRACEPOINTS
- select HAVE_VIRT_CPU_ACCOUNTING_GEN
+ select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_RELA if MODULES && 64BIT
select MODULES_USE_ELF_REL if MODULES
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -204,8 +204,10 @@ static inline unsigned long __cmpxchg(vo
#else
#include <asm-generic/cmpxchg-local.h>
#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
+#ifndef CONFIG_SMP
#define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n))
#endif
+#endif
#undef __scbeqz
Patches currently in stable-queue which might be from ben(a)decadent.org.uk are
queue-4.14/mips-cmpxchg64-and-have_virt_cpu_accounting_gen-don-t-work-for-32-bit-smp.patch
This is a note to let you know that I've just added the patch titled
arm64: Implement arch-specific pte_access_permitted()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm64-implement-arch-specific-pte_access_permitted.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6218f96c58dbf44a06aeaf767aab1f54fc397838 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas(a)arm.com>
Date: Thu, 26 Oct 2017 18:36:47 +0100
Subject: arm64: Implement arch-specific pte_access_permitted()
From: Catalin Marinas <catalin.marinas(a)arm.com>
commit 6218f96c58dbf44a06aeaf767aab1f54fc397838 upstream.
The generic pte_access_permitted() implementation only checks for
pte_present() (together with the write permission where applicable).
However, for both kernel ptes and PROT_NONE mappings pte_present() also
returns true on arm64 even though such mappings are not user accessible.
Additionally, arm64 now supports execute-only user permission
(PROT_EXEC) which is implemented by clearing the PTE_USER bit.
With this patch the arm64 implementation of pte_access_permitted()
checks for the PTE_VALID and PTE_USER bits together with writable access
if applicable.
Reported-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -98,6 +98,8 @@ extern unsigned long empty_zero_page[PAG
((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN))
#define pte_valid_young(pte) \
((pte_val(pte) & (PTE_VALID | PTE_AF)) == (PTE_VALID | PTE_AF))
+#define pte_valid_user(pte) \
+ ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER))
/*
* Could the pte be present in the TLB? We must check mm_tlb_flush_pending
@@ -107,6 +109,18 @@ extern unsigned long empty_zero_page[PAG
#define pte_accessible(mm, pte) \
(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
+/*
+ * p??_access_permitted() is true for valid user mappings (subject to the
+ * write permission check) other than user execute-only which do not have the
+ * PTE_USER bit set. PROT_NONE mappings do not have the PTE_VALID bit set.
+ */
+#define pte_access_permitted(pte, write) \
+ (pte_valid_user(pte) && (!(write) || pte_write(pte)))
+#define pmd_access_permitted(pmd, write) \
+ (pte_access_permitted(pmd_pte(pmd), (write)))
+#define pud_access_permitted(pud, write) \
+ (pte_access_permitted(pud_pte(pud), (write)))
+
static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
{
pte_val(pte) &= ~pgprot_val(prot);
Patches currently in stable-queue which might be from catalin.marinas(a)arm.com are
queue-4.14/arm64-implement-arch-specific-pte_access_permitted.patch
This is a note to let you know that I've just added the patch titled
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:26 +0100
Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream.
When CONFIG_ARM_LPAE is set, the PMD dump relies on the software
read-only bit to determine whether a page is writable. This
concealed a bug which left the kernel text section writable
(AP2=0) while marked read-only in the software bit.
In a kernel with the AP2 bug, the dump looks like this:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M ro x SHD
0xc0600000-0xc0800000 2M ro NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
The fix is to check that the software and hardware bits are both
set before displaying "ro". The dump then shows the true perms:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M RW x SHD
0xc0600000-0xc0800000 2M RW NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/dump.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/dump.c
+++ b/arch/arm/mm/dump.c
@@ -129,8 +129,8 @@ static const struct prot_bits section_bi
.val = PMD_SECT_USER,
.set = "USR",
}, {
- .mask = L_PMD_SECT_RDONLY,
- .val = L_PMD_SECT_RDONLY,
+ .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
+ .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
.set = "ro",
.clear = "RW",
#elif __LINUX_ARM_ARCH__ >= 6
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-4.14/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
queue-4.14/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
This is a note to let you know that I've just added the patch titled
ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:25 +0100
Subject: ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c upstream.
Currently, for ARM kernels with CONFIG_ARM_LPAE and
CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the
kernel code and rodata are writable. They are marked read-only in
a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit
is not set (PMD_SECT_AP2).
For user mappings, the logic that propagates the software bit
to the hardware bit is in set_pmd_at(); but for the kernel,
section_update() writes the PMDs directly, skipping this logic.
The fix is to set PMD_SECT_AP2 for read-only sections in
section_update(), at the same time as L_PMD_SECT_RDONLY.
Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Reported-by: Neil Dick <neil(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Tested-by: Laura Abbott <labbott(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -639,8 +639,8 @@ static struct section_perm ro_perms[] =
.start = (unsigned long)_stext,
.end = (unsigned long)__init_begin,
#ifdef CONFIG_ARM_LPAE
- .mask = ~L_PMD_SECT_RDONLY,
- .prot = L_PMD_SECT_RDONLY,
+ .mask = ~(L_PMD_SECT_RDONLY | PMD_SECT_AP2),
+ .prot = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
#else
.mask = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE),
.prot = PMD_SECT_APX | PMD_SECT_AP_WRITE,
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-4.14/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch
queue-4.14/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
This is a note to let you know that I've just added the patch titled
x86/decoder: Add new TEST instruction pattern
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-decoder-add-new-test-instruction-pattern.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Fri, 24 Nov 2017 13:56:30 +0900
Subject: x86/decoder: Add new TEST instruction pattern
From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream.
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -814,7 +814,7 @@ EndTable
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
Patches currently in stable-queue which might be from mhiramat(a)kernel.org are
queue-3.18/x86-decoder-add-new-test-instruction-pattern.patch
This is a note to let you know that I've just added the patch titled
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001
From: Philip Derrin <philip(a)cog.systems>
Date: Tue, 14 Nov 2017 00:55:26 +0100
Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
From: Philip Derrin <philip(a)cog.systems>
commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream.
When CONFIG_ARM_LPAE is set, the PMD dump relies on the software
read-only bit to determine whether a page is writable. This
concealed a bug which left the kernel text section writable
(AP2=0) while marked read-only in the software bit.
In a kernel with the AP2 bug, the dump looks like this:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M ro x SHD
0xc0600000-0xc0800000 2M ro NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
The fix is to check that the software and hardware bits are both
set before displaying "ro". The dump then shows the true perms:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M RW x SHD
0xc0600000-0xc0800000 2M RW NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE")
Signed-off-by: Philip Derrin <philip(a)cog.systems>
Tested-by: Neil Dick <neil(a)cog.systems>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/mm/dump.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mm/dump.c
+++ b/arch/arm/mm/dump.c
@@ -126,8 +126,8 @@ static const struct prot_bits section_bi
.val = PMD_SECT_USER,
.set = "USR",
}, {
- .mask = L_PMD_SECT_RDONLY,
- .val = L_PMD_SECT_RDONLY,
+ .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
+ .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2,
.set = "ro",
.clear = "RW",
#elif __LINUX_ARM_ARCH__ >= 6
Patches currently in stable-queue which might be from philip(a)cog.systems are
queue-3.18/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch
From: James Hogan <jhogan(a)kernel.org>
Fix uninitialized variable warning in the Octeon EDAC driver, as seen in
MIPS cavium_octeon_defconfig builds since v4.14 with Codescape GNU Tools
2016.05-03:
drivers/edac/octeon_edac-lmc.c In function ‘octeon_lmc_edac_poll_o2’:
drivers/edac/octeon_edac-lmc.c:87:24: warning: ‘((long unsigned int*)&int_reg)[1]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (int_reg.s.sec_err || int_reg.s.ded_err) {
^
This was introduced in commit 1bc021e81565 ("EDAC: Octeon: Add error
injection support"), and is fixed by initialising the whole int_reg
variable to zero before the conditional assignments in the error
injection case.
Fixes: 1bc021e81565 ("EDAC: Octeon: Add error injection support")
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: David Daney <david.daney(a)cavium.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Daniel Walker <dwalker(a)fifo99.com>
Cc: Steven J. Hill <steven.hill(a)cavium.com>
Cc: linux-edac(a)vger.kernel.org
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 3.15+
---
Comments appreciated. Is this correct?
I've added the stable tag on the assumption that this might matter. If
not it can be changed. It'd be nice to have it in 4.14 though to silence
the warning since the driver was added to cavium_octeon_defconfig in
commit f922bc0ad08b ("MIPS: Octeon: cavium_octeon_defconfig: Enable more
drivers").
---
drivers/edac/octeon_edac-lmc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/edac/octeon_edac-lmc.c b/drivers/edac/octeon_edac-lmc.c
index 9c1ffe3e912b..aeb222ca3ed1 100644
--- a/drivers/edac/octeon_edac-lmc.c
+++ b/drivers/edac/octeon_edac-lmc.c
@@ -78,6 +78,7 @@ static void octeon_lmc_edac_poll_o2(struct mem_ctl_info *mci)
if (!pvt->inject)
int_reg.u64 = cvmx_read_csr(CVMX_LMCX_INT(mci->mc_idx));
else {
+ int_reg.u64 = 0;
if (pvt->error_type == 1)
int_reg.s.sec_err = 1;
if (pvt->error_type == 2)
--
2.14.1
The patch below was submitted to be applied to the 4.9-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8593b18ad348733b5d5ddfa0c79dcabf51dff308 Mon Sep 17 00:00:00 2001
From: John Crispin <john(a)phrozen.org>
Date: Mon, 20 Feb 2017 10:29:43 +0100
Subject: [PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver
Switch the printk() call to the prefered pr_warn() api.
Fixes: 7e5873d3755c ("MIPS: pci: Add MT7620a PCIE driver")
Signed-off-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.5+
Patchwork: https://patchwork.linux-mips.org/patch/15321/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
diff --git a/arch/mips/pci/pci-mt7620.c b/arch/mips/pci/pci-mt7620.c
index 22e4c22e750b..1a0b80a1cc4a 100644
--- a/arch/mips/pci/pci-mt7620.c
+++ b/arch/mips/pci/pci-mt7620.c
@@ -120,7 +120,7 @@ static int wait_pciephy_busy(void)
else
break;
if (retry++ > WAITRETRY_MAX) {
- printk(KERN_WARN "PCIE-PHY retry failed.\n");
+ pr_warn("PCIE-PHY retry failed.\n");
return -1;
}
}
The patch below was submitted to be applied to the 4.14-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8593b18ad348733b5d5ddfa0c79dcabf51dff308 Mon Sep 17 00:00:00 2001
From: John Crispin <john(a)phrozen.org>
Date: Mon, 20 Feb 2017 10:29:43 +0100
Subject: [PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver
Switch the printk() call to the prefered pr_warn() api.
Fixes: 7e5873d3755c ("MIPS: pci: Add MT7620a PCIE driver")
Signed-off-by: John Crispin <john(a)phrozen.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.5+
Patchwork: https://patchwork.linux-mips.org/patch/15321/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
diff --git a/arch/mips/pci/pci-mt7620.c b/arch/mips/pci/pci-mt7620.c
index 22e4c22e750b..1a0b80a1cc4a 100644
--- a/arch/mips/pci/pci-mt7620.c
+++ b/arch/mips/pci/pci-mt7620.c
@@ -120,7 +120,7 @@ static int wait_pciephy_busy(void)
else
break;
if (retry++ > WAITRETRY_MAX) {
- printk(KERN_WARN "PCIE-PHY retry failed.\n");
+ pr_warn("PCIE-PHY retry failed.\n");
return -1;
}
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a3f143106596d739e7fbc4b84c96b1475247d876 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben(a)decadent.org.uk>
Date: Wed, 4 Oct 2017 03:46:14 +0100
Subject: [PATCH] MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work
for 32-bit SMP
__cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts
on the same CPU (that's what the 'local' means). We can't use it to
implement cmpxchg64() in SMP configurations.
So, for 32-bit SMP configurations:
- Don't define cmpxchg64()
- Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it
Fixes: e2093c7b03c1 ("MIPS: Fall back to generic implementation of ...")
Fixes: bb877e96bea1 ("MIPS: Add support for full dynticks CPU time accounting")
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Deng-Cheng Zhu <dengcheng.zhu(a)mips.com>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.1+
Patchwork: https://patchwork.linux-mips.org/patch/17413/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 43411118dd4f..b09b644cd57c 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -64,7 +64,7 @@ config MIPS
select HAVE_PERF_EVENTS
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_SYSCALL_TRACEPOINTS
- select HAVE_VIRT_CPU_ACCOUNTING_GEN
+ select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_RELA if MODULES && 64BIT
select MODULES_USE_ELF_REL if MODULES
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 903f3bf48419..ae2b4583b486 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -202,8 +202,10 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
#else
#include <asm-generic/cmpxchg-local.h>
#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
+#ifndef CONFIG_SMP
#define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n))
#endif
+#endif
#undef __scbeqz
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a3f143106596d739e7fbc4b84c96b1475247d876 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben(a)decadent.org.uk>
Date: Wed, 4 Oct 2017 03:46:14 +0100
Subject: [PATCH] MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work
for 32-bit SMP
__cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts
on the same CPU (that's what the 'local' means). We can't use it to
implement cmpxchg64() in SMP configurations.
So, for 32-bit SMP configurations:
- Don't define cmpxchg64()
- Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it
Fixes: e2093c7b03c1 ("MIPS: Fall back to generic implementation of ...")
Fixes: bb877e96bea1 ("MIPS: Add support for full dynticks CPU time accounting")
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Deng-Cheng Zhu <dengcheng.zhu(a)mips.com>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.1+
Patchwork: https://patchwork.linux-mips.org/patch/17413/
Signed-off-by: James Hogan <jhogan(a)kernel.org>
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 43411118dd4f..b09b644cd57c 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -64,7 +64,7 @@ config MIPS
select HAVE_PERF_EVENTS
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_SYSCALL_TRACEPOINTS
- select HAVE_VIRT_CPU_ACCOUNTING_GEN
+ select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_RELA if MODULES && 64BIT
select MODULES_USE_ELF_REL if MODULES
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 903f3bf48419..ae2b4583b486 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -202,8 +202,10 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
#else
#include <asm-generic/cmpxchg-local.h>
#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
+#ifndef CONFIG_SMP
#define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n))
#endif
+#endif
#undef __scbeqz
The patch below was submitted to be applied to the 4.14-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 18b4b276b490a8b9f86c512de8a6054c27bb87c2 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:26 +0000
Subject: [PATCH] arm64: mm: Remove arch_apei_flush_tlb_one()
Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on
__set_fixmap() to do the invalidation. Remove it.
Move the IPI-considered-harmful comment to __set_fixmap().
Signed-off-by: James Morse <james.morse(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 59cca1d6ec54..32f465a80e4e 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -126,18 +126,6 @@ static inline const char *acpi_get_enable_method(int cpu)
*/
#define acpi_disable_cmcff 1
pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
-
-/*
- * Despite its name, this function must still broadcast the TLB
- * invalidation in order to ensure other CPUs don't end up with junk
- * entries as a result of speculation. Unusually, its also called in
- * IRQ context (ghes_iounmap_irq) so if we ever need to use IPIs for
- * TLB broadcasting, then we're in trouble here.
- */
-static inline void arch_apei_flush_tlb_one(unsigned long addr)
-{
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-}
#endif /* CONFIG_ACPI_APEI */
#ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f1eb15e0e864..267d2b79d52d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -778,6 +778,10 @@ void __init early_fixmap_init(void)
}
}
+/*
+ * Unusually, this is also called in IRQ context (ghes_iounmap_irq) so if we
+ * ever need to use IPIs for TLB broadcasting, then we're in trouble here.
+ */
void __set_fixmap(enum fixed_addresses idx,
phys_addr_t phys, pgprot_t flags)
{
This is a note to let you know that I've just added the patch titled
sched: Make resched_cpu() unconditional
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sched-make-resched_cpu-unconditional.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com>
Date: Mon, 18 Sep 2017 08:54:40 -0700
Subject: sched: Make resched_cpu() unconditional
From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream.
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not. This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):
o CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
IPI sent to CPU5
synchronize_sched_expedited_wait
ret = swait_event_timeout(rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root),
jiffies_stall);
expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
o CPU5 handles IPI and fails to acquire rq lock.
Handles IPI
sync_sched_exp_handler
resched_cpu
returns while failing to try lock acquire rq->lock
need_resched is not set
o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
idle (schedule() is not called).
o CPU 1 reports RCU stall.
Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.
Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/sched/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -507,8 +507,7 @@ void resched_cpu(int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
- if (!raw_spin_trylock_irqsave(&rq->lock, flags))
- return;
+ raw_spin_lock_irqsave(&rq->lock, flags);
resched_curr(rq);
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are
queue-4.9/sched-make-resched_cpu-unconditional.patch
This is a note to let you know that I've just added the patch titled
lib/mpi: call cond_resched() from mpi_powm() loop
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 7 Nov 2017 14:15:27 -0800
Subject: lib/mpi: call cond_resched() from mpi_powm() loop
From: Eric Biggers <ebiggers(a)google.com>
commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream.
On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling. If all
threads do it, it locks up the system. Moreover, it can cause
rcu_sched-stall warnings.
Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed. It's still noninterruptible, but at
least it's preemptible now.
Do the cond_resched() once per bit rather than once per MPI limb because
each limb might still easily take 100+ milliseconds on slow CPUs.
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/mpi/mpi-pow.c | 2 ++
1 file changed, 2 insertions(+)
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
* however I decided to publish this code under the plain GPL.
*/
+#include <linux/sched.h>
#include <linux/string.h>
#include "mpi-internal.h"
#include "longlong.h"
@@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp,
}
e <<= 1;
c--;
+ cond_resched();
}
i--;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.9/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
This is a note to let you know that I've just added the patch titled
sched: Make resched_cpu() unconditional
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sched-make-resched_cpu-unconditional.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com>
Date: Mon, 18 Sep 2017 08:54:40 -0700
Subject: sched: Make resched_cpu() unconditional
From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream.
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not. This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):
o CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
IPI sent to CPU5
synchronize_sched_expedited_wait
ret = swait_event_timeout(rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root),
jiffies_stall);
expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
o CPU5 handles IPI and fails to acquire rq lock.
Handles IPI
sync_sched_exp_handler
resched_cpu
returns while failing to try lock acquire rq->lock
need_resched is not set
o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
idle (schedule() is not called).
o CPU 1 reports RCU stall.
Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.
Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/sched/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -600,8 +600,7 @@ void resched_cpu(int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
- if (!raw_spin_trylock_irqsave(&rq->lock, flags))
- return;
+ raw_spin_lock_irqsave(&rq->lock, flags);
resched_curr(rq);
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are
queue-4.4/sched-make-resched_cpu-unconditional.patch
This is a note to let you know that I've just added the patch titled
lib/mpi: call cond_resched() from mpi_powm() loop
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 7 Nov 2017 14:15:27 -0800
Subject: lib/mpi: call cond_resched() from mpi_powm() loop
From: Eric Biggers <ebiggers(a)google.com>
commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream.
On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling. If all
threads do it, it locks up the system. Moreover, it can cause
rcu_sched-stall warnings.
Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed. It's still noninterruptible, but at
least it's preemptible now.
Do the cond_resched() once per bit rather than once per MPI limb because
each limb might still easily take 100+ milliseconds on slow CPUs.
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/mpi/mpi-pow.c | 2 ++
1 file changed, 2 insertions(+)
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
* however I decided to publish this code under the plain GPL.
*/
+#include <linux/sched.h>
#include <linux/string.h>
#include "mpi-internal.h"
#include "longlong.h"
@@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp,
}
e <<= 1;
c--;
+ cond_resched();
}
i--;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.4/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
This is a note to let you know that I've just added the patch titled
serdev: fix registration of second slave
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
serdev-fix-registration-of-second-slave.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 08fcee289f341786eb3b44e5f2d1dc850943238e Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 10 Oct 2017 18:09:49 +0200
Subject: serdev: fix registration of second slave
From: Johan Hovold <johan(a)kernel.org>
commit 08fcee289f341786eb3b44e5f2d1dc850943238e upstream.
Serdev currently only supports a single slave device, but the required
sanity checks to prevent further registration attempts were missing.
If a serial-port node has two child nodes with compatible properties,
the OF code would try to register two slave devices using the same id
and name. Driver core will not allow this (and there will be loud
complaints), but the controller's slave pointer would already have been
set to address of the soon to be deallocated second struct
serdev_device. As the first slave device remains registered, this can
lead to later use-after-free issues when the slave callbacks are
accessed.
Note that while the serdev registration helpers are exported, they are
typically only called by serdev core. Any other (out-of-tree) callers
must serialise registration and deregistration themselves.
Fixes: cd6484e1830b ("serdev: Introduce new bus for serial attached devices")
Cc: Rob Herring <robh(a)kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serdev/core.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/drivers/tty/serdev/core.c
+++ b/drivers/tty/serdev/core.c
@@ -65,21 +65,32 @@ static int serdev_uevent(struct device *
*/
int serdev_device_add(struct serdev_device *serdev)
{
+ struct serdev_controller *ctrl = serdev->ctrl;
struct device *parent = serdev->dev.parent;
int err;
dev_set_name(&serdev->dev, "%s-%d", dev_name(parent), serdev->nr);
+ /* Only a single slave device is currently supported. */
+ if (ctrl->serdev) {
+ dev_err(&serdev->dev, "controller busy\n");
+ return -EBUSY;
+ }
+ ctrl->serdev = serdev;
+
err = device_add(&serdev->dev);
if (err < 0) {
dev_err(&serdev->dev, "Can't add %s, status %d\n",
dev_name(&serdev->dev), err);
- goto err_device_add;
+ goto err_clear_serdev;
}
dev_dbg(&serdev->dev, "device %s registered\n", dev_name(&serdev->dev));
-err_device_add:
+ return 0;
+
+err_clear_serdev:
+ ctrl->serdev = NULL;
return err;
}
EXPORT_SYMBOL_GPL(serdev_device_add);
@@ -90,7 +101,10 @@ EXPORT_SYMBOL_GPL(serdev_device_add);
*/
void serdev_device_remove(struct serdev_device *serdev)
{
+ struct serdev_controller *ctrl = serdev->ctrl;
+
device_unregister(&serdev->dev);
+ ctrl->serdev = NULL;
}
EXPORT_SYMBOL_GPL(serdev_device_remove);
@@ -295,7 +309,6 @@ struct serdev_device *serdev_device_allo
return NULL;
serdev->ctrl = ctrl;
- ctrl->serdev = serdev;
device_initialize(&serdev->dev);
serdev->dev.parent = &ctrl->dev;
serdev->dev.bus = &serdev_bus_type;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.14/serdev-fix-registration-of-second-slave.patch
This is a note to let you know that I've just added the patch titled
sched: Make resched_cpu() unconditional
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sched-make-resched_cpu-unconditional.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com>
Date: Mon, 18 Sep 2017 08:54:40 -0700
Subject: sched: Make resched_cpu() unconditional
From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream.
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not. This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):
o CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
IPI sent to CPU5
synchronize_sched_expedited_wait
ret = swait_event_timeout(rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root),
jiffies_stall);
expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
o CPU5 handles IPI and fails to acquire rq lock.
Handles IPI
sync_sched_exp_handler
resched_cpu
returns while failing to try lock acquire rq->lock
need_resched is not set
o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
idle (schedule() is not called).
o CPU 1 reports RCU stall.
Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.
Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/sched/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
- if (!raw_spin_trylock_irqsave(&rq->lock, flags))
- return;
+ raw_spin_lock_irqsave(&rq->lock, flags);
resched_curr(rq);
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are
queue-4.14/sched-make-resched_cpu-unconditional.patch
This is a note to let you know that I've just added the patch titled
lib/mpi: call cond_resched() from mpi_powm() loop
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 7 Nov 2017 14:15:27 -0800
Subject: lib/mpi: call cond_resched() from mpi_powm() loop
From: Eric Biggers <ebiggers(a)google.com>
commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream.
On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling. If all
threads do it, it locks up the system. Moreover, it can cause
rcu_sched-stall warnings.
Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed. It's still noninterruptible, but at
least it's preemptible now.
Do the cond_resched() once per bit rather than once per MPI limb because
each limb might still easily take 100+ milliseconds on slow CPUs.
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/mpi/mpi-pow.c | 2 ++
1 file changed, 2 insertions(+)
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
* however I decided to publish this code under the plain GPL.
*/
+#include <linux/sched.h>
#include <linux/string.h>
#include "mpi-internal.h"
#include "longlong.h"
@@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp,
}
e <<= 1;
c--;
+ cond_resched();
}
i--;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.14/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
This is a note to let you know that I've just added the patch titled
cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
cpufreq-schedutil-reset-cached_raw_freq-when-not-in-sync-with-next_freq.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 07458f6a5171d97511dfbdf6ce549ed2ca0280c7 Mon Sep 17 00:00:00 2001
From: Viresh Kumar <viresh.kumar(a)linaro.org>
Date: Wed, 8 Nov 2017 20:23:55 +0530
Subject: cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
From: Viresh Kumar <viresh.kumar(a)linaro.org>
commit 07458f6a5171d97511dfbdf6ce549ed2ca0280c7 upstream.
'cached_raw_freq' is used to get the next frequency quickly but should
always be in sync with sg_policy->next_freq. There is a case where it is
not and in such cases it should be reset to avoid switching to incorrect
frequencies.
Consider this case for example:
- policy->cur is 1.2 GHz (Max)
- New request comes for 780 MHz and we store that in cached_raw_freq.
- Based on 780 MHz, we calculate the effective frequency as 800 MHz.
- We then see the CPU wasn't idle recently and choose to keep the next
freq as 1.2 GHz.
- Now we have cached_raw_freq is 780 MHz and sg_policy->next_freq is
1.2 GHz.
- Now if the utilization doesn't change in then next request, then the
next target frequency will still be 780 MHz and it will match with
cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here.
Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely)
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/sched/cpufreq_schedutil.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -282,8 +282,12 @@ static void sugov_update_single(struct u
* Do not reduce the frequency if the CPU has not been idle
* recently, as the reduction is likely to be premature then.
*/
- if (busy && next_f < sg_policy->next_freq)
+ if (busy && next_f < sg_policy->next_freq) {
next_f = sg_policy->next_freq;
+
+ /* Reset cached freq as next_freq has changed */
+ sg_policy->cached_raw_freq = 0;
+ }
}
sugov_update_commit(sg_policy, time, next_f);
}
Patches currently in stable-queue which might be from viresh.kumar(a)linaro.org are
queue-4.14/cpufreq-schedutil-reset-cached_raw_freq-when-not-in-sync-with-next_freq.patch
This is a note to let you know that I've just added the patch titled
sched: Make resched_cpu() unconditional
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sched-make-resched_cpu-unconditional.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com>
Date: Mon, 18 Sep 2017 08:54:40 -0700
Subject: sched: Make resched_cpu() unconditional
From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream.
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not. This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):
o CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
IPI sent to CPU5
synchronize_sched_expedited_wait
ret = swait_event_timeout(rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root),
jiffies_stall);
expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
o CPU5 handles IPI and fails to acquire rq lock.
Handles IPI
sync_sched_exp_handler
resched_cpu
returns while failing to try lock acquire rq->lock
need_resched is not set
o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to
idle (schedule() is not called).
o CPU 1 reports RCU stall.
Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.
Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/sched/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -632,8 +632,7 @@ void resched_cpu(int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
- if (!raw_spin_trylock_irqsave(&rq->lock, flags))
- return;
+ raw_spin_lock_irqsave(&rq->lock, flags);
resched_curr(rq);
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are
queue-3.18/sched-make-resched_cpu-unconditional.patch
This is a note to let you know that I've just added the patch titled
lib/mpi: call cond_resched() from mpi_powm() loop
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 7 Nov 2017 14:15:27 -0800
Subject: lib/mpi: call cond_resched() from mpi_powm() loop
From: Eric Biggers <ebiggers(a)google.com>
commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream.
On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling. If all
threads do it, it locks up the system. Moreover, it can cause
rcu_sched-stall warnings.
Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed. It's still noninterruptible, but at
least it's preemptible now.
Do the cond_resched() once per bit rather than once per MPI limb because
each limb might still easily take 100+ milliseconds on slow CPUs.
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/mpi/mpi-pow.c | 2 ++
1 file changed, 2 insertions(+)
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
* however I decided to publish this code under the plain GPL.
*/
+#include <linux/sched.h>
#include <linux/string.h>
#include "mpi-internal.h"
#include "longlong.h"
@@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp,
}
e <<= 1;
c--;
+ cond_resched();
}
i--;
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-3.18/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch
Include the OF-based modalias in the uevent sent when registering devices
on the sunxi RSB bus, so that user space has a chance to autoload the
kernel module for the device.
Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64:
pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb module for
the AXP803 PMIC is built as a module, it is not loaded and the system
ends up with an disfunctional MMC controller.
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
---
drivers/bus/sunxi-rsb.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/bus/sunxi-rsb.c b/drivers/bus/sunxi-rsb.c
index 328ca93781cf..37cb57244cbe 100644
--- a/drivers/bus/sunxi-rsb.c
+++ b/drivers/bus/sunxi-rsb.c
@@ -173,11 +173,24 @@ static int sunxi_rsb_device_remove(struct device *dev)
return drv->remove(to_sunxi_rsb_device(dev));
}
+static int sunxi_rsb_device_uevent(struct device *dev,
+ struct kobj_uevent_env *env)
+{
+ int ret;
+
+ ret = of_device_uevent_modalias(dev, env);
+ if (ret != -ENODEV)
+ return ret;
+
+ return 0;
+}
+
static struct bus_type sunxi_rsb_bus = {
.name = RSB_CTRL_NAME,
.match = sunxi_rsb_device_match,
.probe = sunxi_rsb_device_probe,
.remove = sunxi_rsb_device_remove,
+ .uevent = sunxi_rsb_device_uevent,
};
static void sunxi_rsb_dev_release(struct device *dev)
--
2.15.0
This is a note to let you know that I've just added the patch titled
ACPI / APEI: Remove arch_apei_flush_tlb_one()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-apei-remove-arch_apei_flush_tlb_one.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:27 +0000
Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one()
From: James Morse <james.morse(a)arm.com>
commit 4a75aeacda3c2455954596593d89187df5420d0a upstream.
Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on
__set_pte_vaddr() to do the invalidation when called from clear_fixmap()
Remove arch_apei_flush_tlb_one().
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/acpi/apei.c | 5 -----
include/acpi/apei.h | 1 -
2 files changed, 6 deletions(-)
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -55,8 +55,3 @@ void arch_apei_report_mem_error(int sev,
apei_mce_report_mem_error(sev, mem_err);
#endif
}
-
-void arch_apei_flush_tlb_one(unsigned long addr)
-{
- __flush_tlb_one(addr);
-}
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -44,7 +44,6 @@ int erst_clear(u64 record_id);
int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-void arch_apei_flush_tlb_one(unsigned long addr);
#endif
#endif
Patches currently in stable-queue which might be from james.morse(a)arm.com are
queue-4.9/acpi-apei-remove-arch_apei_flush_tlb_one.patch
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f5fe9fef5b2da06b6319fab8123056da5217c3f Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Wed, 22 Nov 2017 13:05:48 +0100
Subject: [PATCH] sched/debug: Fix task state recording/printout
The recent conversion of the task state recording to use task_state_index()
broke the sched_switch tracepoint task state output.
task_state_index() returns surprisingly an index (0-7) which is then
printed with __print_flags() applying bitmasks. Not really working and
resulting in weird states like 'prev_state=t' instead of 'prev_state=I'.
Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a
bitmask from the return value of task_state_index() and store it in
entry->prev_state, which makes __print_flags() work as expected.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: stable(a)vger.kernel.org
Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 306b31de5194..bc01e06bc716 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -116,9 +116,9 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct *
* RUNNING (we will not have dequeued if state != RUNNING).
*/
if (preempt)
- return TASK_STATE_MAX;
+ return TASK_REPORT_MAX;
- return task_state_index(p);
+ return 1 << task_state_index(p);
}
#endif /* CREATE_TRACE_POINTS */
@@ -164,7 +164,7 @@ TRACE_EVENT(sched_switch,
{ 0x40, "P" }, { 0x80, "I" }) :
"R",
- __entry->prev_state & TASK_STATE_MAX ? "+" : "",
+ __entry->prev_state & TASK_REPORT_MAX ? "+" : "",
__entry->next_comm, __entry->next_pid, __entry->next_prio)
);
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Wed, 18 Mar 2015 14:49:46 -0400
Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead
of pulling
When debugging the latencies on a 40 core box, where we hit 300 to
500 microsecond latencies, I found there was a huge contention on the
runqueue locks.
Investigating it further, running ftrace, I found that it was due to
the pulling of RT tasks.
The test that was run was the following:
cyclictest --numa -p95 -m -d0 -i100
This created a thread on each CPU, that would set its wakeup in iterations
of 100 microseconds. The -d0 means that all the threads had the same
interval (100us). Each thread sleeps for 100us and wakes up and measures
its latencies.
cyclictest is maintained at:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
What happened was another RT task would be scheduled on one of the CPUs
that was running our test, when the other CPU tests went to sleep and
scheduled idle. This caused the "pull" operation to execute on all
these CPUs. Each one of these saw the RT task that was overloaded on
the CPU of the test that was still running, and each one tried
to grab that task in a thundering herd way.
To grab the task, each thread would do a double rq lock grab, grabbing
its own lock as well as the rq of the overloaded CPU. As the sched
domains on this box was rather flat for its size, I saw up to 12 CPUs
block on this lock at once. This caused a ripple affect with the
rq locks especially since the taking was done via a double rq lock, which
means that several of the CPUs had their own rq locks held while trying
to take this rq lock. As these locks were blocked, any wakeups or load
balanceing on these CPUs would also block on these locks, and the wait
time escalated.
I've tried various methods to lessen the load, but things like an
atomic counter to only let one CPU grab the task wont work, because
the task may have a limited affinity, and we may pick the wrong
CPU to take that lock and do the pull, to only find out that the
CPU we picked isn't in the task's affinity.
Instead of doing the PULL, I now have the CPUs that want the pull to
send over an IPI to the overloaded CPU, and let that CPU pick what
CPU to push the task to. No more need to grab the rq lock, and the
push/pull algorithm still works fine.
With this patch, the latency dropped to just 150us over a 20 hour run.
Without the patch, the huge latencies would trigger in seconds.
I've created a new sched feature called RT_PUSH_IPI, which is enabled
by default.
When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks
and having the pulling CPU do the work is implemented. When RT_PUSH_IPI
is enabled, the IPI is sent to the overloaded CPU to do a push.
To enabled or disable this at run time:
# mount -t debugfs nodev /sys/kernel/debug
# echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
or
# echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features
Update: This original patch would send an IPI to all CPUs in the RT overload
list. But that could theoretically cause the reverse issue. That is, there
could be lots of overloaded RT queues and one CPU lowers its priority. It would
then send an IPI to all the overloaded RT queues and they could then all try
to grab the rq lock of the CPU lowering its priority, and then we have the
same problem.
The latest design sends out only one IPI to the first overloaded CPU. It tries to
push any tasks that it can, and then looks for the next overloaded CPU that can
push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable
tasks that have priorities greater than the source CPU are covered. In case the
source CPU lowers its priority again, a flag is set to tell the IPI traversal to
restart with the first RT overloaded CPU after the source CPU.
Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Joern Engel <joern(a)purestorage.com>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com>
Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 90284d117fe6..91e33cd485f6 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
*/
SCHED_FEAT(TTWU_QUEUE, true)
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * In order to avoid a thundering herd attack of CPUs that are
+ * lowering their priorities at the same time, and there being
+ * a single CPU that has an RT task that can migrate and is waiting
+ * to run, where the other CPUs will try to take that CPUs
+ * rq lock and possibly create a large contention, sending an
+ * IPI to that CPU and let that CPU push the RT task to where
+ * it should go may be a better scenario.
+ */
+SCHED_FEAT(RT_PUSH_IPI, true)
+#endif
+
SCHED_FEAT(FORCE_SD_OVERLAP, false)
SCHED_FEAT(RT_RUNTIME_SHARE, true)
SCHED_FEAT(LB_MIN, false)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f4d4b077eba0..ad0241561c3e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
#include "sched.h"
#include <linux/slab.h>
+#include <linux/irq_work.h>
int sched_rr_timeslice = RR_TIMESLICE;
@@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
+#ifdef CONFIG_SMP
+static void push_irq_work_func(struct irq_work *work);
+#endif
+
void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
{
struct rt_prio_array *array;
@@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
+
+#ifdef HAVE_RT_PUSH_IPI
+ rt_rq->push_flags = 0;
+ rt_rq->push_cpu = nr_cpu_ids;
+ raw_spin_lock_init(&rt_rq->push_lock);
+ init_irq_work(&rt_rq->push_work, push_irq_work_func);
#endif
+#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq)
;
}
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * The search for the next cpu always starts at rq->cpu and ends
+ * when we reach rq->cpu again. It will never return rq->cpu.
+ * This returns the next cpu to check, or nr_cpu_ids if the loop
+ * is complete.
+ *
+ * rq->rt.push_cpu holds the last cpu returned by this function,
+ * or if this is the first instance, it must hold rq->cpu.
+ */
+static int rto_next_cpu(struct rq *rq)
+{
+ int prev_cpu = rq->rt.push_cpu;
+ int cpu;
+
+ cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
+
+ /*
+ * If the previous cpu is less than the rq's CPU, then it already
+ * passed the end of the mask, and has started from the beginning.
+ * We end if the next CPU is greater or equal to rq's CPU.
+ */
+ if (prev_cpu < rq->cpu) {
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+
+ } else if (cpu >= nr_cpu_ids) {
+ /*
+ * We passed the end of the mask, start at the beginning.
+ * If the result is greater or equal to the rq's CPU, then
+ * the loop is finished.
+ */
+ cpu = cpumask_first(rq->rd->rto_mask);
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+ }
+ rq->rt.push_cpu = cpu;
+
+ /* Return cpu to let the caller know if the loop is finished or not */
+ return cpu;
+}
+
+static int find_next_push_cpu(struct rq *rq)
+{
+ struct rq *next_rq;
+ int cpu;
+
+ while (1) {
+ cpu = rto_next_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ break;
+ next_rq = cpu_rq(cpu);
+
+ /* Make sure the next rq can push to this rq */
+ if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ break;
+ }
+
+ return cpu;
+}
+
+#define RT_PUSH_IPI_EXECUTING 1
+#define RT_PUSH_IPI_RESTART 2
+
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu;
+
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ raw_spin_lock(&rq->rt.push_lock);
+ /* Make sure it's still executing */
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ /*
+ * Tell the IPI to restart the loop as things have
+ * changed since it started.
+ */
+ rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
+ raw_spin_unlock(&rq->rt.push_lock);
+ return;
+ }
+ raw_spin_unlock(&rq->rt.push_lock);
+ }
+
+ /* When here, there's no IPI going around */
+
+ rq->rt.push_cpu = rq->cpu;
+ cpu = find_next_push_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+
+ irq_work_queue_on(&rq->rt.push_work, cpu);
+}
+
+/* Called from hardirq context */
+static void try_to_push_tasks(void *arg)
+{
+ struct rt_rq *rt_rq = arg;
+ struct rq *rq, *src_rq;
+ int this_cpu;
+ int cpu;
+
+ this_cpu = rt_rq->push_cpu;
+
+ /* Paranoid check */
+ BUG_ON(this_cpu != smp_processor_id());
+
+ rq = cpu_rq(this_cpu);
+ src_rq = rq_of_rt_rq(rt_rq);
+
+again:
+ if (has_pushable_tasks(rq)) {
+ raw_spin_lock(&rq->lock);
+ push_rt_task(rq);
+ raw_spin_unlock(&rq->lock);
+ }
+
+ /* Pass the IPI to the next rt overloaded queue */
+ raw_spin_lock(&rt_rq->push_lock);
+ /*
+ * If the source queue changed since the IPI went out,
+ * we need to restart the search from that CPU again.
+ */
+ if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
+ rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
+ rt_rq->push_cpu = src_rq->cpu;
+ }
+
+ cpu = find_next_push_cpu(src_rq);
+
+ if (cpu >= nr_cpu_ids)
+ rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
+ raw_spin_unlock(&rt_rq->push_lock);
+
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ /*
+ * It is possible that a restart caused this CPU to be
+ * chosen again. Don't bother with an IPI, just see if we
+ * have more to push.
+ */
+ if (unlikely(cpu == rq->cpu))
+ goto again;
+
+ /* Try the next RT overloaded CPU */
+ irq_work_queue_on(&rt_rq->push_work, cpu);
+}
+
+static void push_irq_work_func(struct irq_work *work)
+{
+ struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
+
+ try_to_push_tasks(rt_rq);
+}
+#endif /* HAVE_RT_PUSH_IPI */
+
static int pull_rt_task(struct rq *this_rq)
{
int this_cpu = this_rq->cpu, ret = 0, cpu;
@@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq)
*/
smp_rmb();
+#ifdef HAVE_RT_PUSH_IPI
+ if (sched_feat(RT_PUSH_IPI)) {
+ tell_cpu_to_push(this_rq);
+ return 0;
+ }
+#endif
+
for_each_cpu(cpu, this_rq->rd->rto_mask) {
if (this_cpu == cpu)
continue;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435a2779..c2c0d7bd5027 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -6,6 +6,7 @@
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/stop_machine.h>
+#include <linux/irq_work.h>
#include <linux/tick.h>
#include <linux/slab.h>
@@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void)
return sysctl_sched_rt_runtime >= 0;
}
+/* RT IPI pull logic requires IRQ_WORK */
+#ifdef CONFIG_IRQ_WORK
+# define HAVE_RT_PUSH_IPI
+#endif
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -435,7 +441,13 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
+#ifdef HAVE_RT_PUSH_IPI
+ int push_flags;
+ int push_cpu;
+ struct irq_work push_work;
+ raw_spinlock_t push_lock;
#endif
+#endif /* CONFIG_SMP */
int rt_queued;
int rt_throttled;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Wed, 18 Mar 2015 14:49:46 -0400
Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead
of pulling
When debugging the latencies on a 40 core box, where we hit 300 to
500 microsecond latencies, I found there was a huge contention on the
runqueue locks.
Investigating it further, running ftrace, I found that it was due to
the pulling of RT tasks.
The test that was run was the following:
cyclictest --numa -p95 -m -d0 -i100
This created a thread on each CPU, that would set its wakeup in iterations
of 100 microseconds. The -d0 means that all the threads had the same
interval (100us). Each thread sleeps for 100us and wakes up and measures
its latencies.
cyclictest is maintained at:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
What happened was another RT task would be scheduled on one of the CPUs
that was running our test, when the other CPU tests went to sleep and
scheduled idle. This caused the "pull" operation to execute on all
these CPUs. Each one of these saw the RT task that was overloaded on
the CPU of the test that was still running, and each one tried
to grab that task in a thundering herd way.
To grab the task, each thread would do a double rq lock grab, grabbing
its own lock as well as the rq of the overloaded CPU. As the sched
domains on this box was rather flat for its size, I saw up to 12 CPUs
block on this lock at once. This caused a ripple affect with the
rq locks especially since the taking was done via a double rq lock, which
means that several of the CPUs had their own rq locks held while trying
to take this rq lock. As these locks were blocked, any wakeups or load
balanceing on these CPUs would also block on these locks, and the wait
time escalated.
I've tried various methods to lessen the load, but things like an
atomic counter to only let one CPU grab the task wont work, because
the task may have a limited affinity, and we may pick the wrong
CPU to take that lock and do the pull, to only find out that the
CPU we picked isn't in the task's affinity.
Instead of doing the PULL, I now have the CPUs that want the pull to
send over an IPI to the overloaded CPU, and let that CPU pick what
CPU to push the task to. No more need to grab the rq lock, and the
push/pull algorithm still works fine.
With this patch, the latency dropped to just 150us over a 20 hour run.
Without the patch, the huge latencies would trigger in seconds.
I've created a new sched feature called RT_PUSH_IPI, which is enabled
by default.
When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks
and having the pulling CPU do the work is implemented. When RT_PUSH_IPI
is enabled, the IPI is sent to the overloaded CPU to do a push.
To enabled or disable this at run time:
# mount -t debugfs nodev /sys/kernel/debug
# echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
or
# echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features
Update: This original patch would send an IPI to all CPUs in the RT overload
list. But that could theoretically cause the reverse issue. That is, there
could be lots of overloaded RT queues and one CPU lowers its priority. It would
then send an IPI to all the overloaded RT queues and they could then all try
to grab the rq lock of the CPU lowering its priority, and then we have the
same problem.
The latest design sends out only one IPI to the first overloaded CPU. It tries to
push any tasks that it can, and then looks for the next overloaded CPU that can
push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable
tasks that have priorities greater than the source CPU are covered. In case the
source CPU lowers its priority again, a flag is set to tell the IPI traversal to
restart with the first RT overloaded CPU after the source CPU.
Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Joern Engel <joern(a)purestorage.com>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com>
Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 90284d117fe6..91e33cd485f6 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
*/
SCHED_FEAT(TTWU_QUEUE, true)
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * In order to avoid a thundering herd attack of CPUs that are
+ * lowering their priorities at the same time, and there being
+ * a single CPU that has an RT task that can migrate and is waiting
+ * to run, where the other CPUs will try to take that CPUs
+ * rq lock and possibly create a large contention, sending an
+ * IPI to that CPU and let that CPU push the RT task to where
+ * it should go may be a better scenario.
+ */
+SCHED_FEAT(RT_PUSH_IPI, true)
+#endif
+
SCHED_FEAT(FORCE_SD_OVERLAP, false)
SCHED_FEAT(RT_RUNTIME_SHARE, true)
SCHED_FEAT(LB_MIN, false)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f4d4b077eba0..ad0241561c3e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
#include "sched.h"
#include <linux/slab.h>
+#include <linux/irq_work.h>
int sched_rr_timeslice = RR_TIMESLICE;
@@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
+#ifdef CONFIG_SMP
+static void push_irq_work_func(struct irq_work *work);
+#endif
+
void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
{
struct rt_prio_array *array;
@@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
+
+#ifdef HAVE_RT_PUSH_IPI
+ rt_rq->push_flags = 0;
+ rt_rq->push_cpu = nr_cpu_ids;
+ raw_spin_lock_init(&rt_rq->push_lock);
+ init_irq_work(&rt_rq->push_work, push_irq_work_func);
#endif
+#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq)
;
}
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * The search for the next cpu always starts at rq->cpu and ends
+ * when we reach rq->cpu again. It will never return rq->cpu.
+ * This returns the next cpu to check, or nr_cpu_ids if the loop
+ * is complete.
+ *
+ * rq->rt.push_cpu holds the last cpu returned by this function,
+ * or if this is the first instance, it must hold rq->cpu.
+ */
+static int rto_next_cpu(struct rq *rq)
+{
+ int prev_cpu = rq->rt.push_cpu;
+ int cpu;
+
+ cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
+
+ /*
+ * If the previous cpu is less than the rq's CPU, then it already
+ * passed the end of the mask, and has started from the beginning.
+ * We end if the next CPU is greater or equal to rq's CPU.
+ */
+ if (prev_cpu < rq->cpu) {
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+
+ } else if (cpu >= nr_cpu_ids) {
+ /*
+ * We passed the end of the mask, start at the beginning.
+ * If the result is greater or equal to the rq's CPU, then
+ * the loop is finished.
+ */
+ cpu = cpumask_first(rq->rd->rto_mask);
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+ }
+ rq->rt.push_cpu = cpu;
+
+ /* Return cpu to let the caller know if the loop is finished or not */
+ return cpu;
+}
+
+static int find_next_push_cpu(struct rq *rq)
+{
+ struct rq *next_rq;
+ int cpu;
+
+ while (1) {
+ cpu = rto_next_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ break;
+ next_rq = cpu_rq(cpu);
+
+ /* Make sure the next rq can push to this rq */
+ if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ break;
+ }
+
+ return cpu;
+}
+
+#define RT_PUSH_IPI_EXECUTING 1
+#define RT_PUSH_IPI_RESTART 2
+
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu;
+
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ raw_spin_lock(&rq->rt.push_lock);
+ /* Make sure it's still executing */
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ /*
+ * Tell the IPI to restart the loop as things have
+ * changed since it started.
+ */
+ rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
+ raw_spin_unlock(&rq->rt.push_lock);
+ return;
+ }
+ raw_spin_unlock(&rq->rt.push_lock);
+ }
+
+ /* When here, there's no IPI going around */
+
+ rq->rt.push_cpu = rq->cpu;
+ cpu = find_next_push_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+
+ irq_work_queue_on(&rq->rt.push_work, cpu);
+}
+
+/* Called from hardirq context */
+static void try_to_push_tasks(void *arg)
+{
+ struct rt_rq *rt_rq = arg;
+ struct rq *rq, *src_rq;
+ int this_cpu;
+ int cpu;
+
+ this_cpu = rt_rq->push_cpu;
+
+ /* Paranoid check */
+ BUG_ON(this_cpu != smp_processor_id());
+
+ rq = cpu_rq(this_cpu);
+ src_rq = rq_of_rt_rq(rt_rq);
+
+again:
+ if (has_pushable_tasks(rq)) {
+ raw_spin_lock(&rq->lock);
+ push_rt_task(rq);
+ raw_spin_unlock(&rq->lock);
+ }
+
+ /* Pass the IPI to the next rt overloaded queue */
+ raw_spin_lock(&rt_rq->push_lock);
+ /*
+ * If the source queue changed since the IPI went out,
+ * we need to restart the search from that CPU again.
+ */
+ if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
+ rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
+ rt_rq->push_cpu = src_rq->cpu;
+ }
+
+ cpu = find_next_push_cpu(src_rq);
+
+ if (cpu >= nr_cpu_ids)
+ rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
+ raw_spin_unlock(&rt_rq->push_lock);
+
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ /*
+ * It is possible that a restart caused this CPU to be
+ * chosen again. Don't bother with an IPI, just see if we
+ * have more to push.
+ */
+ if (unlikely(cpu == rq->cpu))
+ goto again;
+
+ /* Try the next RT overloaded CPU */
+ irq_work_queue_on(&rt_rq->push_work, cpu);
+}
+
+static void push_irq_work_func(struct irq_work *work)
+{
+ struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
+
+ try_to_push_tasks(rt_rq);
+}
+#endif /* HAVE_RT_PUSH_IPI */
+
static int pull_rt_task(struct rq *this_rq)
{
int this_cpu = this_rq->cpu, ret = 0, cpu;
@@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq)
*/
smp_rmb();
+#ifdef HAVE_RT_PUSH_IPI
+ if (sched_feat(RT_PUSH_IPI)) {
+ tell_cpu_to_push(this_rq);
+ return 0;
+ }
+#endif
+
for_each_cpu(cpu, this_rq->rd->rto_mask) {
if (this_cpu == cpu)
continue;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435a2779..c2c0d7bd5027 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -6,6 +6,7 @@
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/stop_machine.h>
+#include <linux/irq_work.h>
#include <linux/tick.h>
#include <linux/slab.h>
@@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void)
return sysctl_sched_rt_runtime >= 0;
}
+/* RT IPI pull logic requires IRQ_WORK */
+#ifdef CONFIG_IRQ_WORK
+# define HAVE_RT_PUSH_IPI
+#endif
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -435,7 +441,13 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
+#ifdef HAVE_RT_PUSH_IPI
+ int push_flags;
+ int push_cpu;
+ struct irq_work push_work;
+ raw_spinlock_t push_lock;
#endif
+#endif /* CONFIG_SMP */
int rt_queued;
int rt_throttled;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Wed, 18 Mar 2015 14:49:46 -0400
Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead
of pulling
When debugging the latencies on a 40 core box, where we hit 300 to
500 microsecond latencies, I found there was a huge contention on the
runqueue locks.
Investigating it further, running ftrace, I found that it was due to
the pulling of RT tasks.
The test that was run was the following:
cyclictest --numa -p95 -m -d0 -i100
This created a thread on each CPU, that would set its wakeup in iterations
of 100 microseconds. The -d0 means that all the threads had the same
interval (100us). Each thread sleeps for 100us and wakes up and measures
its latencies.
cyclictest is maintained at:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
What happened was another RT task would be scheduled on one of the CPUs
that was running our test, when the other CPU tests went to sleep and
scheduled idle. This caused the "pull" operation to execute on all
these CPUs. Each one of these saw the RT task that was overloaded on
the CPU of the test that was still running, and each one tried
to grab that task in a thundering herd way.
To grab the task, each thread would do a double rq lock grab, grabbing
its own lock as well as the rq of the overloaded CPU. As the sched
domains on this box was rather flat for its size, I saw up to 12 CPUs
block on this lock at once. This caused a ripple affect with the
rq locks especially since the taking was done via a double rq lock, which
means that several of the CPUs had their own rq locks held while trying
to take this rq lock. As these locks were blocked, any wakeups or load
balanceing on these CPUs would also block on these locks, and the wait
time escalated.
I've tried various methods to lessen the load, but things like an
atomic counter to only let one CPU grab the task wont work, because
the task may have a limited affinity, and we may pick the wrong
CPU to take that lock and do the pull, to only find out that the
CPU we picked isn't in the task's affinity.
Instead of doing the PULL, I now have the CPUs that want the pull to
send over an IPI to the overloaded CPU, and let that CPU pick what
CPU to push the task to. No more need to grab the rq lock, and the
push/pull algorithm still works fine.
With this patch, the latency dropped to just 150us over a 20 hour run.
Without the patch, the huge latencies would trigger in seconds.
I've created a new sched feature called RT_PUSH_IPI, which is enabled
by default.
When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks
and having the pulling CPU do the work is implemented. When RT_PUSH_IPI
is enabled, the IPI is sent to the overloaded CPU to do a push.
To enabled or disable this at run time:
# mount -t debugfs nodev /sys/kernel/debug
# echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
or
# echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features
Update: This original patch would send an IPI to all CPUs in the RT overload
list. But that could theoretically cause the reverse issue. That is, there
could be lots of overloaded RT queues and one CPU lowers its priority. It would
then send an IPI to all the overloaded RT queues and they could then all try
to grab the rq lock of the CPU lowering its priority, and then we have the
same problem.
The latest design sends out only one IPI to the first overloaded CPU. It tries to
push any tasks that it can, and then looks for the next overloaded CPU that can
push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable
tasks that have priorities greater than the source CPU are covered. In case the
source CPU lowers its priority again, a flag is set to tell the IPI traversal to
restart with the first RT overloaded CPU after the source CPU.
Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Joern Engel <joern(a)purestorage.com>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com>
Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 90284d117fe6..91e33cd485f6 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
*/
SCHED_FEAT(TTWU_QUEUE, true)
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * In order to avoid a thundering herd attack of CPUs that are
+ * lowering their priorities at the same time, and there being
+ * a single CPU that has an RT task that can migrate and is waiting
+ * to run, where the other CPUs will try to take that CPUs
+ * rq lock and possibly create a large contention, sending an
+ * IPI to that CPU and let that CPU push the RT task to where
+ * it should go may be a better scenario.
+ */
+SCHED_FEAT(RT_PUSH_IPI, true)
+#endif
+
SCHED_FEAT(FORCE_SD_OVERLAP, false)
SCHED_FEAT(RT_RUNTIME_SHARE, true)
SCHED_FEAT(LB_MIN, false)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f4d4b077eba0..ad0241561c3e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
#include "sched.h"
#include <linux/slab.h>
+#include <linux/irq_work.h>
int sched_rr_timeslice = RR_TIMESLICE;
@@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
+#ifdef CONFIG_SMP
+static void push_irq_work_func(struct irq_work *work);
+#endif
+
void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
{
struct rt_prio_array *array;
@@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
+
+#ifdef HAVE_RT_PUSH_IPI
+ rt_rq->push_flags = 0;
+ rt_rq->push_cpu = nr_cpu_ids;
+ raw_spin_lock_init(&rt_rq->push_lock);
+ init_irq_work(&rt_rq->push_work, push_irq_work_func);
#endif
+#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq)
;
}
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * The search for the next cpu always starts at rq->cpu and ends
+ * when we reach rq->cpu again. It will never return rq->cpu.
+ * This returns the next cpu to check, or nr_cpu_ids if the loop
+ * is complete.
+ *
+ * rq->rt.push_cpu holds the last cpu returned by this function,
+ * or if this is the first instance, it must hold rq->cpu.
+ */
+static int rto_next_cpu(struct rq *rq)
+{
+ int prev_cpu = rq->rt.push_cpu;
+ int cpu;
+
+ cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
+
+ /*
+ * If the previous cpu is less than the rq's CPU, then it already
+ * passed the end of the mask, and has started from the beginning.
+ * We end if the next CPU is greater or equal to rq's CPU.
+ */
+ if (prev_cpu < rq->cpu) {
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+
+ } else if (cpu >= nr_cpu_ids) {
+ /*
+ * We passed the end of the mask, start at the beginning.
+ * If the result is greater or equal to the rq's CPU, then
+ * the loop is finished.
+ */
+ cpu = cpumask_first(rq->rd->rto_mask);
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+ }
+ rq->rt.push_cpu = cpu;
+
+ /* Return cpu to let the caller know if the loop is finished or not */
+ return cpu;
+}
+
+static int find_next_push_cpu(struct rq *rq)
+{
+ struct rq *next_rq;
+ int cpu;
+
+ while (1) {
+ cpu = rto_next_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ break;
+ next_rq = cpu_rq(cpu);
+
+ /* Make sure the next rq can push to this rq */
+ if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ break;
+ }
+
+ return cpu;
+}
+
+#define RT_PUSH_IPI_EXECUTING 1
+#define RT_PUSH_IPI_RESTART 2
+
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu;
+
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ raw_spin_lock(&rq->rt.push_lock);
+ /* Make sure it's still executing */
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ /*
+ * Tell the IPI to restart the loop as things have
+ * changed since it started.
+ */
+ rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
+ raw_spin_unlock(&rq->rt.push_lock);
+ return;
+ }
+ raw_spin_unlock(&rq->rt.push_lock);
+ }
+
+ /* When here, there's no IPI going around */
+
+ rq->rt.push_cpu = rq->cpu;
+ cpu = find_next_push_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+
+ irq_work_queue_on(&rq->rt.push_work, cpu);
+}
+
+/* Called from hardirq context */
+static void try_to_push_tasks(void *arg)
+{
+ struct rt_rq *rt_rq = arg;
+ struct rq *rq, *src_rq;
+ int this_cpu;
+ int cpu;
+
+ this_cpu = rt_rq->push_cpu;
+
+ /* Paranoid check */
+ BUG_ON(this_cpu != smp_processor_id());
+
+ rq = cpu_rq(this_cpu);
+ src_rq = rq_of_rt_rq(rt_rq);
+
+again:
+ if (has_pushable_tasks(rq)) {
+ raw_spin_lock(&rq->lock);
+ push_rt_task(rq);
+ raw_spin_unlock(&rq->lock);
+ }
+
+ /* Pass the IPI to the next rt overloaded queue */
+ raw_spin_lock(&rt_rq->push_lock);
+ /*
+ * If the source queue changed since the IPI went out,
+ * we need to restart the search from that CPU again.
+ */
+ if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
+ rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
+ rt_rq->push_cpu = src_rq->cpu;
+ }
+
+ cpu = find_next_push_cpu(src_rq);
+
+ if (cpu >= nr_cpu_ids)
+ rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
+ raw_spin_unlock(&rt_rq->push_lock);
+
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ /*
+ * It is possible that a restart caused this CPU to be
+ * chosen again. Don't bother with an IPI, just see if we
+ * have more to push.
+ */
+ if (unlikely(cpu == rq->cpu))
+ goto again;
+
+ /* Try the next RT overloaded CPU */
+ irq_work_queue_on(&rt_rq->push_work, cpu);
+}
+
+static void push_irq_work_func(struct irq_work *work)
+{
+ struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
+
+ try_to_push_tasks(rt_rq);
+}
+#endif /* HAVE_RT_PUSH_IPI */
+
static int pull_rt_task(struct rq *this_rq)
{
int this_cpu = this_rq->cpu, ret = 0, cpu;
@@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq)
*/
smp_rmb();
+#ifdef HAVE_RT_PUSH_IPI
+ if (sched_feat(RT_PUSH_IPI)) {
+ tell_cpu_to_push(this_rq);
+ return 0;
+ }
+#endif
+
for_each_cpu(cpu, this_rq->rd->rto_mask) {
if (this_cpu == cpu)
continue;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435a2779..c2c0d7bd5027 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -6,6 +6,7 @@
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/stop_machine.h>
+#include <linux/irq_work.h>
#include <linux/tick.h>
#include <linux/slab.h>
@@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void)
return sysctl_sched_rt_runtime >= 0;
}
+/* RT IPI pull logic requires IRQ_WORK */
+#ifdef CONFIG_IRQ_WORK
+# define HAVE_RT_PUSH_IPI
+#endif
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -435,7 +441,13 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
+#ifdef HAVE_RT_PUSH_IPI
+ int push_flags;
+ int push_cpu;
+ struct irq_work push_work;
+ raw_spinlock_t push_lock;
#endif
+#endif /* CONFIG_SMP */
int rt_queued;
int rt_throttled;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Wed, 18 Mar 2015 14:49:46 -0400
Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead
of pulling
When debugging the latencies on a 40 core box, where we hit 300 to
500 microsecond latencies, I found there was a huge contention on the
runqueue locks.
Investigating it further, running ftrace, I found that it was due to
the pulling of RT tasks.
The test that was run was the following:
cyclictest --numa -p95 -m -d0 -i100
This created a thread on each CPU, that would set its wakeup in iterations
of 100 microseconds. The -d0 means that all the threads had the same
interval (100us). Each thread sleeps for 100us and wakes up and measures
its latencies.
cyclictest is maintained at:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
What happened was another RT task would be scheduled on one of the CPUs
that was running our test, when the other CPU tests went to sleep and
scheduled idle. This caused the "pull" operation to execute on all
these CPUs. Each one of these saw the RT task that was overloaded on
the CPU of the test that was still running, and each one tried
to grab that task in a thundering herd way.
To grab the task, each thread would do a double rq lock grab, grabbing
its own lock as well as the rq of the overloaded CPU. As the sched
domains on this box was rather flat for its size, I saw up to 12 CPUs
block on this lock at once. This caused a ripple affect with the
rq locks especially since the taking was done via a double rq lock, which
means that several of the CPUs had their own rq locks held while trying
to take this rq lock. As these locks were blocked, any wakeups or load
balanceing on these CPUs would also block on these locks, and the wait
time escalated.
I've tried various methods to lessen the load, but things like an
atomic counter to only let one CPU grab the task wont work, because
the task may have a limited affinity, and we may pick the wrong
CPU to take that lock and do the pull, to only find out that the
CPU we picked isn't in the task's affinity.
Instead of doing the PULL, I now have the CPUs that want the pull to
send over an IPI to the overloaded CPU, and let that CPU pick what
CPU to push the task to. No more need to grab the rq lock, and the
push/pull algorithm still works fine.
With this patch, the latency dropped to just 150us over a 20 hour run.
Without the patch, the huge latencies would trigger in seconds.
I've created a new sched feature called RT_PUSH_IPI, which is enabled
by default.
When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks
and having the pulling CPU do the work is implemented. When RT_PUSH_IPI
is enabled, the IPI is sent to the overloaded CPU to do a push.
To enabled or disable this at run time:
# mount -t debugfs nodev /sys/kernel/debug
# echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
or
# echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features
Update: This original patch would send an IPI to all CPUs in the RT overload
list. But that could theoretically cause the reverse issue. That is, there
could be lots of overloaded RT queues and one CPU lowers its priority. It would
then send an IPI to all the overloaded RT queues and they could then all try
to grab the rq lock of the CPU lowering its priority, and then we have the
same problem.
The latest design sends out only one IPI to the first overloaded CPU. It tries to
push any tasks that it can, and then looks for the next overloaded CPU that can
push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable
tasks that have priorities greater than the source CPU are covered. In case the
source CPU lowers its priority again, a flag is set to tell the IPI traversal to
restart with the first RT overloaded CPU after the source CPU.
Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Joern Engel <joern(a)purestorage.com>
Cc: Clark Williams <williams(a)redhat.com>
Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com>
Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 90284d117fe6..91e33cd485f6 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
*/
SCHED_FEAT(TTWU_QUEUE, true)
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * In order to avoid a thundering herd attack of CPUs that are
+ * lowering their priorities at the same time, and there being
+ * a single CPU that has an RT task that can migrate and is waiting
+ * to run, where the other CPUs will try to take that CPUs
+ * rq lock and possibly create a large contention, sending an
+ * IPI to that CPU and let that CPU push the RT task to where
+ * it should go may be a better scenario.
+ */
+SCHED_FEAT(RT_PUSH_IPI, true)
+#endif
+
SCHED_FEAT(FORCE_SD_OVERLAP, false)
SCHED_FEAT(RT_RUNTIME_SHARE, true)
SCHED_FEAT(LB_MIN, false)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f4d4b077eba0..ad0241561c3e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
#include "sched.h"
#include <linux/slab.h>
+#include <linux/irq_work.h>
int sched_rr_timeslice = RR_TIMESLICE;
@@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
+#ifdef CONFIG_SMP
+static void push_irq_work_func(struct irq_work *work);
+#endif
+
void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
{
struct rt_prio_array *array;
@@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
rt_rq->rt_nr_migratory = 0;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
+
+#ifdef HAVE_RT_PUSH_IPI
+ rt_rq->push_flags = 0;
+ rt_rq->push_cpu = nr_cpu_ids;
+ raw_spin_lock_init(&rt_rq->push_lock);
+ init_irq_work(&rt_rq->push_work, push_irq_work_func);
#endif
+#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq)
;
}
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * The search for the next cpu always starts at rq->cpu and ends
+ * when we reach rq->cpu again. It will never return rq->cpu.
+ * This returns the next cpu to check, or nr_cpu_ids if the loop
+ * is complete.
+ *
+ * rq->rt.push_cpu holds the last cpu returned by this function,
+ * or if this is the first instance, it must hold rq->cpu.
+ */
+static int rto_next_cpu(struct rq *rq)
+{
+ int prev_cpu = rq->rt.push_cpu;
+ int cpu;
+
+ cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
+
+ /*
+ * If the previous cpu is less than the rq's CPU, then it already
+ * passed the end of the mask, and has started from the beginning.
+ * We end if the next CPU is greater or equal to rq's CPU.
+ */
+ if (prev_cpu < rq->cpu) {
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+
+ } else if (cpu >= nr_cpu_ids) {
+ /*
+ * We passed the end of the mask, start at the beginning.
+ * If the result is greater or equal to the rq's CPU, then
+ * the loop is finished.
+ */
+ cpu = cpumask_first(rq->rd->rto_mask);
+ if (cpu >= rq->cpu)
+ return nr_cpu_ids;
+ }
+ rq->rt.push_cpu = cpu;
+
+ /* Return cpu to let the caller know if the loop is finished or not */
+ return cpu;
+}
+
+static int find_next_push_cpu(struct rq *rq)
+{
+ struct rq *next_rq;
+ int cpu;
+
+ while (1) {
+ cpu = rto_next_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ break;
+ next_rq = cpu_rq(cpu);
+
+ /* Make sure the next rq can push to this rq */
+ if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+ break;
+ }
+
+ return cpu;
+}
+
+#define RT_PUSH_IPI_EXECUTING 1
+#define RT_PUSH_IPI_RESTART 2
+
+static void tell_cpu_to_push(struct rq *rq)
+{
+ int cpu;
+
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ raw_spin_lock(&rq->rt.push_lock);
+ /* Make sure it's still executing */
+ if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+ /*
+ * Tell the IPI to restart the loop as things have
+ * changed since it started.
+ */
+ rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
+ raw_spin_unlock(&rq->rt.push_lock);
+ return;
+ }
+ raw_spin_unlock(&rq->rt.push_lock);
+ }
+
+ /* When here, there's no IPI going around */
+
+ rq->rt.push_cpu = rq->cpu;
+ cpu = find_next_push_cpu(rq);
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+
+ irq_work_queue_on(&rq->rt.push_work, cpu);
+}
+
+/* Called from hardirq context */
+static void try_to_push_tasks(void *arg)
+{
+ struct rt_rq *rt_rq = arg;
+ struct rq *rq, *src_rq;
+ int this_cpu;
+ int cpu;
+
+ this_cpu = rt_rq->push_cpu;
+
+ /* Paranoid check */
+ BUG_ON(this_cpu != smp_processor_id());
+
+ rq = cpu_rq(this_cpu);
+ src_rq = rq_of_rt_rq(rt_rq);
+
+again:
+ if (has_pushable_tasks(rq)) {
+ raw_spin_lock(&rq->lock);
+ push_rt_task(rq);
+ raw_spin_unlock(&rq->lock);
+ }
+
+ /* Pass the IPI to the next rt overloaded queue */
+ raw_spin_lock(&rt_rq->push_lock);
+ /*
+ * If the source queue changed since the IPI went out,
+ * we need to restart the search from that CPU again.
+ */
+ if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
+ rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
+ rt_rq->push_cpu = src_rq->cpu;
+ }
+
+ cpu = find_next_push_cpu(src_rq);
+
+ if (cpu >= nr_cpu_ids)
+ rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
+ raw_spin_unlock(&rt_rq->push_lock);
+
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ /*
+ * It is possible that a restart caused this CPU to be
+ * chosen again. Don't bother with an IPI, just see if we
+ * have more to push.
+ */
+ if (unlikely(cpu == rq->cpu))
+ goto again;
+
+ /* Try the next RT overloaded CPU */
+ irq_work_queue_on(&rt_rq->push_work, cpu);
+}
+
+static void push_irq_work_func(struct irq_work *work)
+{
+ struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
+
+ try_to_push_tasks(rt_rq);
+}
+#endif /* HAVE_RT_PUSH_IPI */
+
static int pull_rt_task(struct rq *this_rq)
{
int this_cpu = this_rq->cpu, ret = 0, cpu;
@@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq)
*/
smp_rmb();
+#ifdef HAVE_RT_PUSH_IPI
+ if (sched_feat(RT_PUSH_IPI)) {
+ tell_cpu_to_push(this_rq);
+ return 0;
+ }
+#endif
+
for_each_cpu(cpu, this_rq->rd->rto_mask) {
if (this_cpu == cpu)
continue;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435a2779..c2c0d7bd5027 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -6,6 +6,7 @@
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/stop_machine.h>
+#include <linux/irq_work.h>
#include <linux/tick.h>
#include <linux/slab.h>
@@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void)
return sysctl_sched_rt_runtime >= 0;
}
+/* RT IPI pull logic requires IRQ_WORK */
+#ifdef CONFIG_IRQ_WORK
+# define HAVE_RT_PUSH_IPI
+#endif
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -435,7 +441,13 @@ struct rt_rq {
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
+#ifdef HAVE_RT_PUSH_IPI
+ int push_flags;
+ int push_cpu;
+ struct irq_work push_work;
+ raw_spinlock_t push_lock;
#endif
+#endif /* CONFIG_SMP */
int rt_queued;
int rt_throttled;
This is a note to let you know that I've just added the patch titled
vsock: use new wait API for vsock_stream_sendmsg()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 499fde662f1957e3cb8d192a94a099ebe19c714b Mon Sep 17 00:00:00 2001
From: WANG Cong <xiyou.wangcong(a)gmail.com>
Date: Fri, 19 May 2017 11:21:59 -0700
Subject: vsock: use new wait API for vsock_stream_sendmsg()
From: WANG Cong <xiyou.wangcong(a)gmail.com>
commit 499fde662f1957e3cb8d192a94a099ebe19c714b upstream.
As reported by Michal, vsock_stream_sendmsg() could still
sleep at vsock_stream_has_space() after prepare_to_wait():
vsock_stream_has_space
vmci_transport_stream_has_space
vmci_qpair_produce_free_space
qp_lock
qp_acquire_queue_mutex
mutex_lock
Just switch to the new wait API like we did for commit
d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()").
Reported-by: Michal Kubecek <mkubecek(a)suse.cz>
Cc: Stefan Hajnoczi <stefanha(a)redhat.com>
Cc: Jorgen Hansen <jhansen(a)vmware.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: "Jorgen S. Hansen" <jhansen(a)vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/vmw_vsock/af_vsock.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1524,8 +1524,7 @@ static int vsock_stream_sendmsg(struct s
long timeout;
int err;
struct vsock_transport_send_notify_data send_data;
-
- DEFINE_WAIT(wait);
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
sk = sock->sk;
vsk = vsock_sk(sk);
@@ -1568,11 +1567,10 @@ static int vsock_stream_sendmsg(struct s
if (err < 0)
goto out;
-
while (total_written < len) {
ssize_t written;
- prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+ add_wait_queue(sk_sleep(sk), &wait);
while (vsock_stream_has_space(vsk) == 0 &&
sk->sk_err == 0 &&
!(sk->sk_shutdown & SEND_SHUTDOWN) &&
@@ -1581,33 +1579,30 @@ static int vsock_stream_sendmsg(struct s
/* Don't wait for non-blocking sockets. */
if (timeout == 0) {
err = -EAGAIN;
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
}
err = transport->notify_send_pre_block(vsk, &send_data);
if (err < 0) {
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
}
release_sock(sk);
- timeout = schedule_timeout(timeout);
+ timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);
lock_sock(sk);
if (signal_pending(current)) {
err = sock_intr_errno(timeout);
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
} else if (timeout == 0) {
err = -EAGAIN;
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
}
-
- prepare_to_wait(sk_sleep(sk), &wait,
- TASK_INTERRUPTIBLE);
}
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
/* These checks occur both as part of and after the loop
* conditional since we need to check before and after
Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are
queue-4.9/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch
queue-4.9/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
This is a note to let you know that I've just added the patch titled
ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 76da0704507bbc51875013f6557877ab308cfd0a Mon Sep 17 00:00:00 2001
From: WANG Cong <xiyou.wangcong(a)gmail.com>
Date: Tue, 20 Jun 2017 11:42:27 -0700
Subject: ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
From: WANG Cong <xiyou.wangcong(a)gmail.com>
commit 76da0704507bbc51875013f6557877ab308cfd0a upstream.
In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
unfortunately, as reported by jeffy, netdev_wait_allrefs()
could rebroadcast NETDEV_UNREGISTER event until all refs are
gone.
We have to add an additional check to avoid this corner case.
For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
for dev_change_net_namespace(), dev->reg_state is
NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
Reported-by: jeffy <jeffy.chen(a)rock-chips.com>
Cc: David Ahern <dsahern(a)gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Acked-by: David Ahern <dsahern(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/route.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3495,7 +3495,11 @@ static int ip6_route_dev_notify(struct n
net->ipv6.ip6_blk_hole_entry->dst.dev = dev;
net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev);
#endif
- } else if (event == NETDEV_UNREGISTER) {
+ } else if (event == NETDEV_UNREGISTER &&
+ dev->reg_state != NETREG_UNREGISTERED) {
+ /* NETDEV_UNREGISTER could be fired for multiple times by
+ * netdev_wait_allrefs(). Make sure we only call this once.
+ */
in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are
queue-4.9/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch
queue-4.9/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
This is a note to let you know that I've just added the patch titled
vsock: use new wait API for vsock_stream_sendmsg()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 499fde662f1957e3cb8d192a94a099ebe19c714b Mon Sep 17 00:00:00 2001
From: WANG Cong <xiyou.wangcong(a)gmail.com>
Date: Fri, 19 May 2017 11:21:59 -0700
Subject: vsock: use new wait API for vsock_stream_sendmsg()
From: WANG Cong <xiyou.wangcong(a)gmail.com>
commit 499fde662f1957e3cb8d192a94a099ebe19c714b upstream.
As reported by Michal, vsock_stream_sendmsg() could still
sleep at vsock_stream_has_space() after prepare_to_wait():
vsock_stream_has_space
vmci_transport_stream_has_space
vmci_qpair_produce_free_space
qp_lock
qp_acquire_queue_mutex
mutex_lock
Just switch to the new wait API like we did for commit
d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()").
Reported-by: Michal Kubecek <mkubecek(a)suse.cz>
Cc: Stefan Hajnoczi <stefanha(a)redhat.com>
Cc: Jorgen Hansen <jhansen(a)vmware.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: "Jorgen S. Hansen" <jhansen(a)vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/vmw_vsock/af_vsock.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1512,8 +1512,7 @@ static int vsock_stream_sendmsg(struct s
long timeout;
int err;
struct vsock_transport_send_notify_data send_data;
-
- DEFINE_WAIT(wait);
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
sk = sock->sk;
vsk = vsock_sk(sk);
@@ -1556,11 +1555,10 @@ static int vsock_stream_sendmsg(struct s
if (err < 0)
goto out;
-
while (total_written < len) {
ssize_t written;
- prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+ add_wait_queue(sk_sleep(sk), &wait);
while (vsock_stream_has_space(vsk) == 0 &&
sk->sk_err == 0 &&
!(sk->sk_shutdown & SEND_SHUTDOWN) &&
@@ -1569,33 +1567,30 @@ static int vsock_stream_sendmsg(struct s
/* Don't wait for non-blocking sockets. */
if (timeout == 0) {
err = -EAGAIN;
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
}
err = transport->notify_send_pre_block(vsk, &send_data);
if (err < 0) {
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
}
release_sock(sk);
- timeout = schedule_timeout(timeout);
+ timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);
lock_sock(sk);
if (signal_pending(current)) {
err = sock_intr_errno(timeout);
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
} else if (timeout == 0) {
err = -EAGAIN;
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
goto out_err;
}
-
- prepare_to_wait(sk_sleep(sk), &wait,
- TASK_INTERRUPTIBLE);
}
- finish_wait(sk_sleep(sk), &wait);
+ remove_wait_queue(sk_sleep(sk), &wait);
/* These checks occur both as part of and after the loop
* conditional since we need to check before and after
Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are
queue-4.4/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch
queue-4.4/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
This is a note to let you know that I've just added the patch titled
ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 76da0704507bbc51875013f6557877ab308cfd0a Mon Sep 17 00:00:00 2001
From: WANG Cong <xiyou.wangcong(a)gmail.com>
Date: Tue, 20 Jun 2017 11:42:27 -0700
Subject: ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
From: WANG Cong <xiyou.wangcong(a)gmail.com>
commit 76da0704507bbc51875013f6557877ab308cfd0a upstream.
In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
unfortunately, as reported by jeffy, netdev_wait_allrefs()
could rebroadcast NETDEV_UNREGISTER event until all refs are
gone.
We have to add an additional check to avoid this corner case.
For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
for dev_change_net_namespace(), dev->reg_state is
NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
Reported-by: jeffy <jeffy.chen(a)rock-chips.com>
Cc: David Ahern <dsahern(a)gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Acked-by: David Ahern <dsahern(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/route.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3378,7 +3378,11 @@ static int ip6_route_dev_notify(struct n
net->ipv6.ip6_blk_hole_entry->dst.dev = dev;
net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev);
#endif
- } else if (event == NETDEV_UNREGISTER) {
+ } else if (event == NETDEV_UNREGISTER &&
+ dev->reg_state != NETREG_UNREGISTERED) {
+ /* NETDEV_UNREGISTER could be fired for multiple times by
+ * netdev_wait_allrefs(). Make sure we only call this once.
+ */
in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are
queue-4.4/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch
queue-4.4/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
This is a note to let you know that I've just added the patch titled
ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 76da0704507bbc51875013f6557877ab308cfd0a Mon Sep 17 00:00:00 2001
From: WANG Cong <xiyou.wangcong(a)gmail.com>
Date: Tue, 20 Jun 2017 11:42:27 -0700
Subject: ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
From: WANG Cong <xiyou.wangcong(a)gmail.com>
commit 76da0704507bbc51875013f6557877ab308cfd0a upstream.
In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
unfortunately, as reported by jeffy, netdev_wait_allrefs()
could rebroadcast NETDEV_UNREGISTER event until all refs are
gone.
We have to add an additional check to avoid this corner case.
For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
for dev_change_net_namespace(), dev->reg_state is
NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
Reported-by: jeffy <jeffy.chen(a)rock-chips.com>
Cc: David Ahern <dsahern(a)gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Acked-by: David Ahern <dsahern(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/route.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2827,7 +2827,11 @@ static int ip6_route_dev_notify(struct n
net->ipv6.ip6_blk_hole_entry->dst.dev = dev;
net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev);
#endif
- } else if (event == NETDEV_UNREGISTER) {
+ } else if (event == NETDEV_UNREGISTER &&
+ dev->reg_state != NETREG_UNREGISTERED) {
+ /* NETDEV_UNREGISTER could be fired for multiple times by
+ * netdev_wait_allrefs(). Make sure we only call this once.
+ */
in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are
queue-3.18/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch
Hi,
Customers running VMware Workstation on 4.4 kernels have been hitting a couple of bugs where net/vmw_vsock/af_vsock.c is calling functions that may sleep when not allowed to. These issues have already been fixed in later kernels, and we would like to request these fixes backported to 4.4 and 4.9 LTS branches.
To resolve the above issue, we would like the following commit backported to 4.4 (it applies when using 3-way merge):
commit 265563fc8f123641d006d65f26d18d0b24d3022d "AF_VSOCK: Shrink the area influenced by prepare_to_wait"
and the following commit (that needs to be applied on top of the above) backported to both 4.4 and 4.9:
commit 359669f3b8eab6cbfb83eb1a46ec6ba089d47d18 "vsock: use new wait API for vsock_stream_sendmsg()”
The backports have been tested with kernel 4.4.100 and 4.9.64.
Thanks,
Jorgen
At least 3.18, 4.4 and 4.9 already have
242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
which has bug fixed in:
76da0704507b ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")
This is a note to let you know that I've just added the patch titled
ACPI / APEI: Remove arch_apei_flush_tlb_one()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-apei-remove-arch_apei_flush_tlb_one.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:27 +0000
Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one()
From: James Morse <james.morse(a)arm.com>
commit 4a75aeacda3c2455954596593d89187df5420d0a upstream.
Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on
__set_pte_vaddr() to do the invalidation when called from clear_fixmap()
Remove arch_apei_flush_tlb_one().
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/acpi/apei.c | 5 -----
include/acpi/apei.h | 1 -
2 files changed, 6 deletions(-)
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -52,8 +52,3 @@ void arch_apei_report_mem_error(int sev,
apei_mce_report_mem_error(sev, mem_err);
#endif
}
-
-void arch_apei_flush_tlb_one(unsigned long addr)
-{
- __flush_tlb_one(addr);
-}
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -51,7 +51,6 @@ int erst_clear(u64 record_id);
int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-void arch_apei_flush_tlb_one(unsigned long addr);
#endif
#endif
Patches currently in stable-queue which might be from james.morse(a)arm.com are
queue-4.14/acpi-apei-remove-arch_apei_flush_tlb_one.patch
This is a note to let you know that I've just added the patch titled
ACPI / APEI: Remove arch_apei_flush_tlb_one()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-apei-remove-arch_apei_flush_tlb_one.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:27 +0000
Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one()
From: James Morse <james.morse(a)arm.com>
commit 4a75aeacda3c2455954596593d89187df5420d0a upstream.
Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on
__set_pte_vaddr() to do the invalidation when called from clear_fixmap()
Remove arch_apei_flush_tlb_one().
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/acpi/apei.c | 5 -----
include/acpi/apei.h | 1 -
2 files changed, 6 deletions(-)
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -55,8 +55,3 @@ void arch_apei_report_mem_error(int sev,
apei_mce_report_mem_error(sev, mem_err);
#endif
}
-
-void arch_apei_flush_tlb_one(unsigned long addr)
-{
- __flush_tlb_one(addr);
-}
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -44,7 +44,6 @@ int erst_clear(u64 record_id);
int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-void arch_apei_flush_tlb_one(unsigned long addr);
#endif
#endif
Patches currently in stable-queue which might be from james.morse(a)arm.com are
queue-4.4/acpi-apei-remove-arch_apei_flush_tlb_one.patch
On Mon, Nov 27, 2017 at 10:39:25AM +0100, Tomas Charvat wrote:
> Ok under heavy load patch
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?i…
>
> cause following error once in few minutes, it doesn't happen instantly
This patch simply reverts to the old behaviour, this bug does not seem
to be related to the revert.
Is the commit that you referring the head commit of your branch,
or did you picked this one to some other branch?
Can you give some details on your setup and send your .config?
The snd_usb_copy_string_desc() retrieves the usb string corresponding to
the index number through the usb_string(). The problem is that the
usb_string() returns the length of the string (>= 0) when successful, but
it can also return a negative value about the error case or status of
usb_control_msg().
If iClockSource is '0' as shown below, usb_string() will returns -EINVAL.
This will result in '0' being inserted into buf[-22], and the following
KASAN out-of-bound error message will be output.
AudioControl Interface Descriptor:
bLength 8
bDescriptorType 36
bDescriptorSubtype 10 (CLOCK_SOURCE)
bClockID 1
bmAttributes 0x07 Internal programmable Clock (synced to SOF)
bmControls 0x07
Clock Frequency Control (read/write)
Clock Validity Control (read-only)
bAssocTerminal 0
iClockSource 0
To fix it, check usb_string() return value and bail out.
==================================================================
BUG: KASAN: stack-out-of-bounds in parse_audio_unit+0x1327/0x1960 [snd_usb_audio]
Write of size 1 at addr ffff88007e66735a by task systemd-udevd/18376
CPU: 0 PID: 18376 Comm: systemd-udevd Not tainted 4.13.0+ #3
Hardware name: LG Electronics 15N540-RFLGL/White Tip Mountain, BIOS 15N5
Call Trace:
dump_stack+0x63/0x8d
print_address_description+0x70/0x290
? parse_audio_unit+0x1327/0x1960 [snd_usb_audio]
kasan_report+0x265/0x350
__asan_store1+0x4a/0x50
parse_audio_unit+0x1327/0x1960 [snd_usb_audio]
? save_stack+0xb5/0xd0
? save_stack_trace+0x1b/0x20
? save_stack+0x46/0xd0
? kasan_kmalloc+0xad/0xe0
? kmem_cache_alloc_trace+0xff/0x230
? snd_usb_create_mixer+0xb0/0x4b0 [snd_usb_audio]
? usb_audio_probe+0x4de/0xf40 [snd_usb_audio]
? usb_probe_interface+0x1f5/0x440
? driver_probe_device+0x3ed/0x660
? build_feature_ctl+0xb10/0xb10 [snd_usb_audio]
? save_stack_trace+0x1b/0x20
? init_object+0x69/0xa0
? snd_usb_find_csint_desc+0xa8/0xf0 [snd_usb_audio]
snd_usb_mixer_controls+0x1dc/0x370 [snd_usb_audio]
? build_audio_procunit+0x890/0x890 [snd_usb_audio]
? snd_usb_create_mixer+0xb0/0x4b0 [snd_usb_audio]
? kmem_cache_alloc_trace+0xff/0x230
? usb_ifnum_to_if+0xbd/0xf0
snd_usb_create_mixer+0x25b/0x4b0 [snd_usb_audio]
? snd_usb_create_stream+0x255/0x2c0 [snd_usb_audio]
usb_audio_probe+0x4de/0xf40 [snd_usb_audio]
? snd_usb_autosuspend.part.7+0x30/0x30 [snd_usb_audio]
? __pm_runtime_idle+0x90/0x90
? kernfs_activate+0xa6/0xc0
? usb_match_one_id_intf+0xdc/0x130
? __pm_runtime_set_status+0x2d4/0x450
usb_probe_interface+0x1f5/0x440
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jaejoong Kim <climbbb.kim(a)gmail.com>
---
sound/usb/mixer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index e630813..da7cbe7 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -204,6 +204,10 @@ static int snd_usb_copy_string_desc(struct mixer_build *state,
int index, char *buf, int maxlen)
{
int len = usb_string(state->chip->dev, index, buf, maxlen - 1);
+
+ if (len < 0)
+ return len;
+
buf[len] = 0;
return len;
}
--
2.7.4
Commit cb0631fd3cf9 ("x86/mm: fix use-after-free of vma during userfaultfd
fault") went into mainline without Cc: stable. It appears to be a
use-after-free reachable by unprivileged users -- at least with
CONFIG_USERFAULTFD=y. Can it please be applied to 4.9-stable?
Eric
This is a note to let you know that I've just added the patch titled
x86/mm: fix use-after-free of vma during userfaultfd fault
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-fix-use-after-free-of-vma-during-userfaultfd-fault.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From cb0631fd3cf9e989cd48293fe631cbc402aec9a9 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka(a)suse.cz>
Date: Wed, 1 Nov 2017 08:21:25 +0100
Subject: x86/mm: fix use-after-free of vma during userfaultfd fault
From: Vlastimil Babka <vbabka(a)suse.cz>
commit cb0631fd3cf9e989cd48293fe631cbc402aec9a9 upstream.
Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
__do_page_fault() with the following reproducer:
mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
r0 = userfaultfd(0x0)
ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
r1 = gettid()
syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
tkill(r1, 0x7)
The vma should be pinned by mmap_sem, but handle_userfault() might (in a
return to userspace scenario) release it and then acquire again, so when
we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
the vma might be gone.
Specifically, per Andrea the scenario is
"A return to userland to repeat the page fault later with a
VM_FAULT_NOPAGE retval (potentially after handling any pending signal
during the return to userland). The return to userland is identified
whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
vmf->flags"
However, since commit a3c4fb7c9c2e ("x86/mm: Fix fault error path using
unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
that point, which can thus become use-after-free. Fix this by moving
the read before calling handle_mm_fault().
Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6(a)syzkaller.appspotmail.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Suggested-by: Kirill A. Shutemov <kirill(a)shutemov.name>
Fixes: 3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer")
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Eric Biggers <ebiggers3(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/fault.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1393,7 +1393,17 @@ good_area:
* make sure we exit gracefully rather than endlessly redo
* the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if
* we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
+ *
+ * Note that handle_userfault() may also release and reacquire mmap_sem
+ * (and not return with VM_FAULT_RETRY), when returning to userland to
+ * repeat the page fault later with a VM_FAULT_NOPAGE retval
+ * (potentially after handling any pending signal during the return to
+ * userland). The return to userland is identified whenever
+ * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
+ * Thus we have to be careful about not touching vma after handling the
+ * fault, so we read the pkey beforehand.
*/
+ pkey = vma_pkey(vma);
fault = handle_mm_fault(vma, address, flags);
major |= fault & VM_FAULT_MAJOR;
@@ -1420,7 +1430,6 @@ good_area:
return;
}
- pkey = vma_pkey(vma);
up_read(&mm->mmap_sem);
if (unlikely(fault & VM_FAULT_ERROR)) {
mm_fault_error(regs, error_code, address, &pkey, fault);
Patches currently in stable-queue which might be from vbabka(a)suse.cz are
queue-4.9/x86-mm-fix-use-after-free-of-vma-during-userfaultfd-fault.patch
From: Eric Biggers <ebiggers(a)google.com>
Adding a specially crafted X.509 certificate whose subjectPublicKey
ASN.1 value is zero-length caused x509_extract_key_data() to set the
public key size to SIZE_MAX, as it subtracted the nonexistent BIT STRING
metadata byte. Then, x509_cert_parse() called kmemdup() with that bogus
size, triggering the WARN_ON_ONCE() in kmalloc_slab().
This appears to be harmless, but it still must be fixed since WARNs are
never supposed to be user-triggerable.
Fix it by updating x509_cert_parse() to validate that the value has a
BIT STRING metadata byte, and that the byte is 0 which indicates that
the number of bits in the bitstring is a multiple of 8.
It would be nice to handle the metadata byte in asn1_ber_decoder()
instead. But that would be tricky because in the general case a BIT
STRING could be implicitly tagged, and/or could legitimately have a
length that is not a whole number of bytes.
Here was the WARN (cleaned up slightly):
WARNING: CPU: 1 PID: 202 at mm/slab_common.c:971 kmalloc_slab+0x5d/0x70 mm/slab_common.c:971
Modules linked in:
CPU: 1 PID: 202 Comm: keyctl Tainted: G B 4.14.0-09238-g1d3b78bbc6e9 #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
task: ffff880033014180 task.stack: ffff8800305c8000
Call Trace:
__do_kmalloc mm/slab.c:3706 [inline]
__kmalloc_track_caller+0x22/0x2e0 mm/slab.c:3726
kmemdup+0x17/0x40 mm/util.c:118
kmemdup include/linux/string.h:414 [inline]
x509_cert_parse+0x2cb/0x620 crypto/asymmetric_keys/x509_cert_parser.c:106
x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
SYSC_add_key security/keys/keyctl.c:122 [inline]
SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Cc: <stable(a)vger.kernel.org> # v3.7+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
crypto/asymmetric_keys/x509_cert_parser.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index dd03fead1ca3..ce2df8c9c583 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -409,6 +409,8 @@ int x509_extract_key_data(void *context, size_t hdrlen,
ctx->cert->pub->pkey_algo = "rsa";
/* Discard the BIT STRING metadata */
+ if (vlen < 1 || *(const u8 *)value != 0)
+ return -EBADMSG;
ctx->key = value + 1;
ctx->key_size = vlen - 1;
return 0;
--
2.15.0
From: Eric Biggers <ebiggers(a)google.com>
asn1_ber_decoder() was ignoring errors from actions associated with the
opcodes ASN1_OP_END_SEQ_ACT, ASN1_OP_END_SET_ACT,
ASN1_OP_END_SEQ_OF_ACT, and ASN1_OP_END_SET_OF_ACT. In practice, this
meant the pkcs7_note_signed_info() action (since that was the only user
of those opcodes). Fix it by checking for the error, just like the
decoder does for actions associated with the other opcodes.
This bug allowed users to leak slab memory by repeatedly trying to add a
specially crafted "pkcs7_test" key (requires CONFIG_PKCS7_TEST_KEY).
In theory, this bug could also be used to bypass module signature
verification, by providing a PKCS#7 message that is misparsed such that
a signature's ->authattrs do not contain its ->msgdigest. But it
doesn't seem practical in normal cases, due to restrictions on the
format of the ->authattrs.
Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Cc: <stable(a)vger.kernel.org> # v3.7+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
lib/asn1_decoder.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c
index d77cdfc4b554..dc14beae2c9a 100644
--- a/lib/asn1_decoder.c
+++ b/lib/asn1_decoder.c
@@ -439,6 +439,8 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder,
else
act = machine[pc + 1];
ret = actions[act](context, hdr, 0, data + tdp, len);
+ if (ret < 0)
+ return ret;
}
pc += asn1_op_lengths[op];
goto next_op;
--
2.15.0
From: Eric Biggers <ebiggers(a)google.com>
In asn1_ber_decoder(), indefinitely-sized ASN.1 items were being passed
to the action functions before their lengths had been computed, using
the bogus length of 0x80 (ASN1_INDEFINITE_LENGTH). This resulted in
reading data past the end of the input buffer, when given a specially
crafted message.
Fix it by rearranging the code so that the indefinite length is resolved
before the action is called.
This bug was originally found by fuzzing the X.509 parser in userspace
using libFuzzer from the LLVM project.
KASAN report (cleaned up slightly):
BUG: KASAN: slab-out-of-bounds in memcpy ./include/linux/string.h:341 [inline]
BUG: KASAN: slab-out-of-bounds in x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
Read of size 128 at addr ffff880035dd9eaf by task keyctl/195
CPU: 1 PID: 195 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0xd1/0x175 lib/dump_stack.c:53
print_address_description+0x78/0x260 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x23f/0x350 mm/kasan/report.c:409
memcpy+0x1f/0x50 mm/kasan/kasan.c:302
memcpy ./include/linux/string.h:341 [inline]
x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
asn1_ber_decoder+0xb4a/0x1fd0 lib/asn1_decoder.c:447
x509_cert_parse+0x1c7/0x620 crypto/asymmetric_keys/x509_cert_parser.c:89
x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
SYSC_add_key security/keys/keyctl.c:122 [inline]
SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Allocated by task 195:
__do_kmalloc_node mm/slab.c:3675 [inline]
__kmalloc_node+0x47/0x60 mm/slab.c:3682
kvmalloc ./include/linux/mm.h:540 [inline]
SYSC_add_key security/keys/keyctl.c:104 [inline]
SyS_add_key+0x19e/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Reported-by: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org> # v3.7+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
lib/asn1_decoder.c | 47 ++++++++++++++++++++++++++---------------------
1 file changed, 26 insertions(+), 21 deletions(-)
diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c
index 1ef0cec38d78..d77cdfc4b554 100644
--- a/lib/asn1_decoder.c
+++ b/lib/asn1_decoder.c
@@ -313,42 +313,47 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder,
/* Decide how to handle the operation */
switch (op) {
- case ASN1_OP_MATCH_ANY_ACT:
- case ASN1_OP_MATCH_ANY_ACT_OR_SKIP:
- case ASN1_OP_COND_MATCH_ANY_ACT:
- case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP:
- ret = actions[machine[pc + 1]](context, hdr, tag, data + dp, len);
- if (ret < 0)
- return ret;
- goto skip_data;
-
- case ASN1_OP_MATCH_ACT:
- case ASN1_OP_MATCH_ACT_OR_SKIP:
- case ASN1_OP_COND_MATCH_ACT_OR_SKIP:
- ret = actions[machine[pc + 2]](context, hdr, tag, data + dp, len);
- if (ret < 0)
- return ret;
- goto skip_data;
-
case ASN1_OP_MATCH:
case ASN1_OP_MATCH_OR_SKIP:
+ case ASN1_OP_MATCH_ACT:
+ case ASN1_OP_MATCH_ACT_OR_SKIP:
case ASN1_OP_MATCH_ANY:
case ASN1_OP_MATCH_ANY_OR_SKIP:
+ case ASN1_OP_MATCH_ANY_ACT:
+ case ASN1_OP_MATCH_ANY_ACT_OR_SKIP:
case ASN1_OP_COND_MATCH_OR_SKIP:
+ case ASN1_OP_COND_MATCH_ACT_OR_SKIP:
case ASN1_OP_COND_MATCH_ANY:
case ASN1_OP_COND_MATCH_ANY_OR_SKIP:
- skip_data:
+ case ASN1_OP_COND_MATCH_ANY_ACT:
+ case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP:
+
if (!(flags & FLAG_CONS)) {
if (flags & FLAG_INDEFINITE_LENGTH) {
+ size_t tmp = dp;
+
ret = asn1_find_indefinite_length(
- data, datalen, &dp, &len, &errmsg);
+ data, datalen, &tmp, &len, &errmsg);
if (ret < 0)
goto error;
- } else {
- dp += len;
}
pr_debug("- LEAF: %zu\n", len);
}
+
+ if (op & ASN1_OP_MATCH__ACT) {
+ unsigned char act;
+
+ if (op & ASN1_OP_MATCH__ANY)
+ act = machine[pc + 1];
+ else
+ act = machine[pc + 2];
+ ret = actions[act](context, hdr, tag, data + dp, len);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (!(flags & FLAG_CONS))
+ dp += len;
pc += asn1_op_lengths[op];
goto next_op;
--
2.15.0
From: Daniel Jurgens <danielj(a)mellanox.com>
For now the only LSM security enforcement mechanism available is
specific to InfiniBand. Bypass enforcement for non-IB link types.
This fixes a regression where modify_qp fails for iWARP because
querying the PKEY returns -EINVAL.
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: stable(a)vger.kernel.org
Reported-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs")
Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com>
Reviewed-by: Parav Pandit <parav(a)mellanox.com>
Tested-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
---
drivers/infiniband/core/security.c | 43 ++++++++++++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 23278ed5be45..4b7fd68e1174 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -417,8 +417,17 @@ void ib_close_shared_qp_security(struct ib_qp_security *sec)
int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
{
+ u8 i = rdma_start_port(dev);
+ bool is_ib = false;
int ret;
+ while (i <= rdma_end_port(dev) && !is_ib)
+ is_ib = rdma_protocol_ib(dev, i++);
+
+ /* If this isn't an IB device don't create the security context */
+ if (!is_ib)
+ return 0;
+
qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL);
if (!qp->qp_sec)
return -ENOMEM;
@@ -441,6 +450,10 @@ EXPORT_SYMBOL(ib_create_qp_security);
void ib_destroy_qp_security_begin(struct ib_qp_security *sec)
{
+ /* Return if not IB */
+ if (!sec)
+ return;
+
mutex_lock(&sec->mutex);
/* Remove the QP from the lists so it won't get added to
@@ -470,6 +483,10 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec)
int ret;
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is in progress this
* QP security could be marked for an error state
* transition. Wait for this to complete.
@@ -505,6 +522,10 @@ void ib_destroy_qp_security_end(struct ib_qp_security *sec)
{
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is occurring we must
* wait until this QP security structure is processed
* in the QP to error flow before destroying it because
@@ -565,13 +586,19 @@ int ib_security_modify_qp(struct ib_qp *qp,
bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) ||
(qp_attr_mask & IB_QP_ALT_PATH));
+ WARN_ONCE((qp_attr_mask & IB_QP_PORT &&
+ rdma_protocol_ib(real_qp->device, qp_attr->port_num) &&
+ !real_qp->qp_sec),
+ "%s: QP security is not initialized for IB QP: %d\n",
+ __func__, real_qp->qp_num);
+
/* The port/pkey settings are maintained only for the real QP. Open
* handles on the real QP will be in the shared_qp_list. When
* enforcing security on the real QP all the shared QPs will be
* checked as well.
*/
- if (pps_change && !special_qp) {
+ if (pps_change && !special_qp` && real_qp->qp_sec) {
mutex_lock(&real_qp->qp_sec->mutex);
new_pps = get_new_pps(real_qp,
qp_attr,
@@ -600,7 +627,7 @@ int ib_security_modify_qp(struct ib_qp *qp,
qp_attr_mask,
udata);
- if (pps_change && !special_qp) {
+ if (pps_change && !special_qpp && real_qp->qp_sec) {
/* Clean up the lists and free the appropriate
* ports_pkeys structure.
*/
@@ -631,6 +658,9 @@ int ib_security_pkey_access(struct ib_device *dev,
u16 pkey;
int ret;
+ if (!rdma_protocol_ib(dev, port_num))
+ return 0;
+
ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey);
if (ret)
return ret;
@@ -665,6 +695,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
{
int ret;
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return 0;
+
ret = security_ib_alloc_security(&agent->security);
if (ret)
return ret;
@@ -690,6 +723,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
{
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return;
+
security_ib_free_security(agent->security);
if (agent->lsm_nb_reg)
unregister_lsm_notifier(&agent->lsm_nb);
@@ -697,6 +733,9 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index)
{
+ if (!rdma_protocol_ib(map->agent.device, map->agent.port_num))
+ return 0;
+
if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed)
return -EACCES;
--
2.15.0
Tree/Branch: v3.2.96
Git describe: v3.2.96
Commit: 07a40fa222 Linux 3.2.96
Build Time: 0 min 3 sec
Passed: 0 / 4 ( 0.00 %)
Failed: 4 / 4 (100.00 %)
Errors: 6
Warnings: 20
Section Mismatches: 0
Failed defconfigs:
x86_64-allnoconfig
arm-allmodconfig
arm-allnoconfig
x86_64-defconfig
Errors:
x86_64-allnoconfig
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
arm-allnoconfig
/home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
x86_64-defconfig
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
2 warnings 0 mismatches : arm-allmodconfig
19 warnings 0 mismatches : arm-allnoconfig
-------------------------------------------------------------------------------
Errors summary: 6
2 /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
2 /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
1 /home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
Warnings Summary: 20
2 .config:27:warning: symbol value '' invalid for PHYS_OFFSET
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
x86_64-allnoconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
arm-allmodconfig : FAIL, 0 errors, 2 warnings, 0 section mismatches
Warnings:
.config:27:warning: symbol value '' invalid for PHYS_OFFSET
.config:27:warning: symbol value '' invalid for PHYS_OFFSET
-------------------------------------------------------------------------------
arm-allnoconfig : FAIL, 4 errors, 19 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
Warnings:
/home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
-------------------------------------------------------------------------------
x86_64-defconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
Tree/Branch: v3.16.51
Git describe: v3.16.51
Commit: c45c05f42d Linux 3.16.51
Build Time: 37 min 49 sec
Passed: 9 / 9 (100.00 %)
Failed: 0 / 9 ( 0.00 %)
Errors: 0
Warnings: 7
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
1 warnings 0 mismatches : arm64-allnoconfig
2 warnings 0 mismatches : arm64-allmodconfig
1 warnings 0 mismatches : arm-multi_v5_defconfig
1 warnings 0 mismatches : arm-multi_v7_defconfig
1 warnings 0 mismatches : x86_64-defconfig
4 warnings 0 mismatches : arm-allmodconfig
1 warnings 0 mismatches : arm-allnoconfig
1 warnings 0 mismatches : x86_64-allnoconfig
3 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 7
7 ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
2 ../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
2 ../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
1 ../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function]
1 ../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label]
1 ../arch/x86/kernel/cpu/common.c:961:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
-------------------------------------------------------------------------------
arm-multi_v5_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm-multi_v7_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
x86_64-defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label]
../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
-------------------------------------------------------------------------------
arm-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../arch/x86/kernel/cpu/common.c:961:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function]
-------------------------------------------------------------------------------
arm64-defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ee70bc1e7b63ac8023c9ff9475d8741e397316e7 Mon Sep 17 00:00:00 2001
From: Alexander Steffen <Alexander.Steffen(a)infineon.com>
Date: Fri, 8 Sep 2017 17:21:32 +0200
Subject: [PATCH] tpm-dev-common: Reject too short writes
tpm_transmit() does not offer an explicit interface to indicate the number
of valid bytes in the communication buffer. Instead, it relies on the
commandSize field in the TPM header that is encoded within the buffer.
Therefore, ensure that a) enough data has been written to the buffer, so
that the commandSize field is present and b) the commandSize field does not
announce more data than has been written to the buffer.
This should have been fixed with CVE-2011-1161 long ago, but apparently
a correct version of that patch never made it into the kernel.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Steffen <Alexander.Steffen(a)infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
index 610638a80383..461bf0b8a094 100644
--- a/drivers/char/tpm/tpm-dev-common.c
+++ b/drivers/char/tpm/tpm-dev-common.c
@@ -110,6 +110,12 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf,
return -EFAULT;
}
+ if (in_size < 6 ||
+ in_size < be32_to_cpu(*((__be32 *) (priv->data_buffer + 2)))) {
+ mutex_unlock(&priv->buffer_mutex);
+ return -EINVAL;
+ }
+
/* atomic tpm command send and result receive. We only hold the ops
* lock during this period so that the tpm can be unregistered even if
* the char dev is held open.
The patch below was submitted to be applied to the 4.14-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0bbc931a074a741cf8e6279e8045cf7118586780 Mon Sep 17 00:00:00 2001
From: Colin Ian King <colin.king(a)canonical.com>
Date: Fri, 25 Aug 2017 17:45:05 +0100
Subject: [PATCH] tpm_tis: make array cmd_getticks static const to shrink
object code size
Don't populate array cmd_getticks on the stack, instead make it static
const. Makes the object code smaller by over 160 bytes:
Before:
text data bss dec hex filename
18813 3152 128 22093 564d drivers/char/tpm/tpm_tis_core.o
After:
text data bss dec hex filename
18554 3248 128 21930 55aa drivers/char/tpm/tpm_tis_core.o
Cc: stable(a)vger.kernel.org
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 63bc6c3b949e..1e957e923d21 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -445,7 +445,7 @@ static int probe_itpm(struct tpm_chip *chip)
{
struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
int rc = 0;
- u8 cmd_getticks[] = {
+ static const u8 cmd_getticks[] = {
0x00, 0xc1, 0x00, 0x00, 0x00, 0x0a,
0x00, 0x00, 0x00, 0xf1
};
On Sun, Nov 26, 2017 at 11:58:43AM +0000, Chris Wilson wrote:
> Quoting Lukas Wunner (2017-11-26 11:49:19)
> > Hm, the race at hand would be solved by the intel_fbdev_sync() below,
> > or am I missing something? Still wondering why it's necessary to
> > leave the fbdev around...
>
> The race is solved, but if we do free ifbdev, we can't dereference
> ifbdev prior to the sync; and we store the async cookie inside ifbdev.
> Bleugh. Catch 22.
Right. Oh dear god! We could move the cookie into dev_priv, the
fbdev_suspend_work is also there, outside of struct intel_fbdev.
> What we might do then is just pull the struct into dev_priv under
> ifdef FBDEV.
I vaguely remember something that dev_priv deliberately only contains
a pointer to struct intel_fbdev, that it *was* embedded in dev_priv
in the past but moved out for some reason.
> > However the "if (ifbdev->vma)" looks a bit fishy, ifbdev could be NULL
> > (e.g. if BIOS fb was too small but intelfb_alloc() failed) so I think
> > this might lead to a null pointer deref. Does it make a difference
> > if we check for ifbdev versus ifbdev->vma? I also notice that you
> > added a check for ifbdev->vma with 15727ed0d944 but Daniel later
> > removed it with 88be58be886f.
>
> We know that ifbdev is non-NULL and can't become NULL until fini. So
> after the sync point, we want to ask the question of whether the config
> was successful, for that I used to use ->fb which now replaced by ->vma.
Yes if the fbdev is kept around then obviously it's fine to deref it.
Thanks,
Lukas
This is a note to let you know that I've just added the patch titled
ACPI / APEI: Remove arch_apei_flush_tlb_one()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-apei-remove-arch_apei_flush_tlb_one.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:27 +0000
Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one()
From: James Morse <james.morse(a)arm.com>
commit 4a75aeacda3c2455954596593d89187df5420d0a upstream.
Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on
__set_pte_vaddr() to do the invalidation when called from clear_fixmap()
Remove arch_apei_flush_tlb_one().
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/acpi/apei.c | 5 -----
include/acpi/apei.h | 1 -
2 files changed, 6 deletions(-)
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -55,8 +55,3 @@ void arch_apei_report_mem_error(int sev,
apei_mce_report_mem_error(sev, mem_err);
#endif
}
-
-void arch_apei_flush_tlb_one(unsigned long addr)
-{
- __flush_tlb_one(addr);
-}
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -44,7 +44,6 @@ int erst_clear(u64 record_id);
int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
-void arch_apei_flush_tlb_one(unsigned long addr);
#endif
#endif
Patches currently in stable-queue which might be from james.morse(a)arm.com are
queue-3.18/acpi-apei-remove-arch_apei_flush_tlb_one.patch
From: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
When we act as an AP, new firmware versions handle
internally the power saving clients and the driver doesn't
know that the peers went to sleep. It is, hence, possible
that a peer goes to sleep for a long time and stop pulling
frames. This will cause its transmit queue to hang which is
a condition that triggers the recovery flow in the driver.
While this client is certainly buggy (it should have pulled
the frame based on the TIM IE in the beacon), we can't blow
up because of a buggy client.
Change the current implementation to not enable the
transmit queue hang detection on queues that serve peers
when we act as an AP / GO.
We can still enable this mechanism using the debug
configuration which can come in handy when we want to
debug why the client doesn't wake up.
Cc: stable(a)vger.kernel.org # v4.13
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/mvm/utils.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
index d46115e2d69e..19c1d1f76e15 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
@@ -1134,9 +1134,18 @@ unsigned int iwl_mvm_get_wd_timeout(struct iwl_mvm *mvm,
unsigned int default_timeout =
cmd_q ? IWL_DEF_WD_TIMEOUT : mvm->cfg->base_params->wd_timeout;
- if (!iwl_fw_dbg_trigger_enabled(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS))
+ if (!iwl_fw_dbg_trigger_enabled(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS)) {
+ /*
+ * We can't know when the station is asleep or awake, so we
+ * must disable the queue hang detection.
+ */
+ if (fw_has_capa(&mvm->fw->ucode_capa,
+ IWL_UCODE_TLV_CAPA_STA_PM_NOTIF) &&
+ vif && vif->type == NL80211_IFTYPE_AP)
+ return IWL_WATCHDOG_DISABLED;
return iwlmvm_mod_params.tfd_q_hang_detect ?
default_timeout : IWL_WATCHDOG_DISABLED;
+ }
trigger = iwl_fw_dbg_get_trigger(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS);
txq_timer = (void *)trigger->data;
--
2.15.0
This is a note to let you know that I've just added the patch titled
s390/runtime instrumention: fix possible memory corruption
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-runtime-instrumention-fix-possible-memory-corruption.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Mon, 11 Sep 2017 11:24:22 +0200
Subject: s390/runtime instrumention: fix possible memory corruption
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit d6e646ad7cfa7034d280459b2b2546288f247144 upstream.
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation")
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kernel/runtime_instr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/runtime_instr.c
+++ b/arch/s390/kernel/runtime_instr.c
@@ -47,11 +47,13 @@ void exit_thread_runtime_instr(void)
{
struct task_struct *task = current;
+ preempt_disable();
if (!task->thread.ri_cb)
return;
disable_runtime_instr();
kfree(task->thread.ri_cb);
task->thread.ri_cb = NULL;
+ preempt_enable();
}
SYSCALL_DEFINE1(s390_runtime_instr, int, command)
@@ -62,9 +64,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int,
return -EOPNOTSUPP;
if (command == S390_RUNTIME_INSTR_STOP) {
- preempt_disable();
exit_thread_runtime_instr();
- preempt_enable();
return 0;
}
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.9/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.9/s390-fix-transactional-execution-control-register-handling.patch
queue-4.9/s390-runtime-instrumention-fix-possible-memory-corruption.patch
This is a note to let you know that I've just added the patch titled
s390: fix transactional execution control register handling
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-fix-transactional-execution-control-register-handling.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Thu, 9 Nov 2017 12:29:34 +0100
Subject: s390: fix transactional execution control register handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 upstream.
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan(a)danny.cz>
Fixes: d35339a42dd1 ("s390: add support for transactional memory")
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/include/asm/switch_to.h | 2 +-
arch/s390/kernel/early.c | 4 +++-
arch/s390/kernel/process.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -34,8 +34,8 @@ static inline void restore_access_regs(u
save_access_regs(&prev->thread.acrs[0]); \
save_ri_cb(prev->thread.ri_cb); \
} \
+ update_cr_regs(next); \
if (next->mm) { \
- update_cr_regs(next); \
set_cpu_flag(CIF_FPU); \
restore_access_regs(&next->thread.acrs[0]); \
restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -345,8 +345,10 @@ static __init void detect_machine_facili
S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE;
if (test_facility(40))
S390_lowcore.machine_flags |= MACHINE_FLAG_LPP;
- if (test_facility(50) && test_facility(73))
+ if (test_facility(50) && test_facility(73)) {
S390_lowcore.machine_flags |= MACHINE_FLAG_TE;
+ __ctl_set_bit(0, 55);
+ }
if (test_facility(51))
S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC;
if (test_facility(129)) {
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -120,6 +120,7 @@ int copy_thread(unsigned long clone_flag
memset(&p->thread.per_user, 0, sizeof(p->thread.per_user));
memset(&p->thread.per_event, 0, sizeof(p->thread.per_event));
clear_tsk_thread_flag(p, TIF_SINGLE_STEP);
+ p->thread.per_flags = 0;
/* Initialize per thread user and system timer values */
ti = task_thread_info(p);
ti->user_timer = 0;
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.9/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.9/s390-fix-transactional-execution-control-register-handling.patch
queue-4.9/s390-runtime-instrumention-fix-possible-memory-corruption.patch
This is a note to let you know that I've just added the patch titled
s390/disassembler: add missing end marker for e7 table
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-disassembler-add-missing-end-marker-for-e7-table.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Tue, 26 Sep 2017 09:16:48 +0200
Subject: s390/disassembler: add missing end marker for e7 table
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit 5c50538752af7968f53924b22dede8ed4ce4cb3b upstream.
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kernel/dis.c | 1 +
1 file changed, 1 insertion(+)
--- a/arch/s390/kernel/dis.c
+++ b/arch/s390/kernel/dis.c
@@ -1548,6 +1548,7 @@ static struct s390_insn opcode_e7[] = {
{ "vfsq", 0xce, INSTR_VRR_VV000MM },
{ "vfs", 0xe2, INSTR_VRR_VVV00MM },
{ "vftci", 0x4a, INSTR_VRI_VVIMM },
+ { "", 0, INSTR_INVALID }
};
static struct s390_insn opcode_eb[] = {
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.9/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.9/s390-fix-transactional-execution-control-register-handling.patch
queue-4.9/s390-runtime-instrumention-fix-possible-memory-corruption.patch
This is a note to let you know that I've just added the patch titled
ACPI / EC: Fix regression related to triggering source of EC event handling
ACPI / EC: Fix an issue that SCI_EVT cannot be detected
ACPI / EC: Remove old CLEAR_ON_RESUME quirk
Revert "ACPI / EC: Enable event freeze mode..." to fix
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb Mon Sep 17 00:00:00 2001
From: Lv Zheng <lv.zheng(a)intel.com>
Date: Tue, 26 Sep 2017 16:54:03 +0800
Subject: ACPI / EC: Fix regression related to triggering source of EC event handling
From: Lv Zheng <lv.zheng(a)intel.com>
commit 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb upstream.
Originally the Samsung quirks removed by commit 4c237371 can be covered
by commit e923e8e7 and ec_freeze_events=Y mode. But commit 9c40f956
changed ec_freeze_events=Y back to N, making this problem re-surface.
Actually, if commit e923e8e7 is robust enough, we can freely change
ec_freeze_events mode, so this patch fixes the issue by improving
commit e923e8e7.
Related commits listed in the merged order:
Commit: e923e8e79e18fd6be9162f1be6b99a002e9df2cb
Subject: ACPI / EC: Fix an issue that SCI_EVT cannot be detected
after event is enabled
Commit: 4c237371f290d1ed3b2071dd43554362137b1cce
Subject: ACPI / EC: Remove old CLEAR_ON_RESUME quirk
Commit: 9c40f956ce9b331493347d1b3cb7e384f7dc0581
Subject: Revert "ACPI / EC: Enable event freeze mode..." to fix
a regression
This patch not only fixes the reported post-resume EC event triggering
source issue, but also fixes an unreported similar issue related to the
driver bind by adding EC event triggering source in ec_install_handlers().
Fixes: e923e8e79e18 (ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled)
Fixes: 4c237371f290 (ACPI / EC: Remove old CLEAR_ON_RESUME quirk)
Fixes: 9c40f956ce9b (Revert "ACPI / EC: Enable event freeze mode..." to fix a regression)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196833
Signed-off-by: Lv Zheng <lv.zheng(a)intel.com>
Reported-by: Alistair Hamilton <ahpatent(a)gmail.com>
Tested-by: Alistair Hamilton <ahpatent(a)gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/acpi/ec.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -482,8 +482,11 @@ static inline void __acpi_ec_enable_even
{
if (!test_and_set_bit(EC_FLAGS_QUERY_ENABLED, &ec->flags))
ec_log_drv("event unblocked");
- if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
- advance_transaction(ec);
+ /*
+ * Unconditionally invoke this once after enabling the event
+ * handling mechanism to detect the pending events.
+ */
+ advance_transaction(ec);
}
static inline void __acpi_ec_disable_event(struct acpi_ec *ec)
@@ -1458,11 +1461,10 @@ static int ec_install_handlers(struct ac
if (test_bit(EC_FLAGS_STARTED, &ec->flags) &&
ec->reference_count >= 1)
acpi_ec_enable_gpe(ec, true);
-
- /* EC is fully operational, allow queries */
- acpi_ec_enable_event(ec);
}
}
+ /* EC is fully operational, allow queries */
+ acpi_ec_enable_event(ec);
return 0;
}
Patches currently in stable-queue which might be from lv.zheng(a)intel.com are
queue-4.9/acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch
This is a note to let you know that I've just added the patch titled
s390: fix transactional execution control register handling
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-fix-transactional-execution-control-register-handling.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Thu, 9 Nov 2017 12:29:34 +0100
Subject: s390: fix transactional execution control register handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 upstream.
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan(a)danny.cz>
Fixes: d35339a42dd1 ("s390: add support for transactional memory")
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/include/asm/switch_to.h | 2 +-
arch/s390/kernel/early.c | 4 +++-
arch/s390/kernel/process.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -34,8 +34,8 @@ static inline void restore_access_regs(u
save_access_regs(&prev->thread.acrs[0]); \
save_ri_cb(prev->thread.ri_cb); \
} \
+ update_cr_regs(next); \
if (next->mm) { \
- update_cr_regs(next); \
set_cpu_flag(CIF_FPU); \
restore_access_regs(&next->thread.acrs[0]); \
restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -325,8 +325,10 @@ static __init void detect_machine_facili
S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE;
if (test_facility(40))
S390_lowcore.machine_flags |= MACHINE_FLAG_LPP;
- if (test_facility(50) && test_facility(73))
+ if (test_facility(50) && test_facility(73)) {
S390_lowcore.machine_flags |= MACHINE_FLAG_TE;
+ __ctl_set_bit(0, 55);
+ }
if (test_facility(51))
S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC;
if (test_facility(129)) {
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -137,6 +137,7 @@ int copy_thread(unsigned long clone_flag
memset(&p->thread.per_user, 0, sizeof(p->thread.per_user));
memset(&p->thread.per_event, 0, sizeof(p->thread.per_event));
clear_tsk_thread_flag(p, TIF_SINGLE_STEP);
+ p->thread.per_flags = 0;
/* Initialize per thread user and system timer values */
ti = task_thread_info(p);
ti->user_timer = 0;
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.4/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.4/s390-fix-transactional-execution-control-register-handling.patch
queue-4.4/s390-runtime-instrumention-fix-possible-memory-corruption.patch
This is a note to let you know that I've just added the patch titled
s390/runtime instrumention: fix possible memory corruption
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-runtime-instrumention-fix-possible-memory-corruption.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Mon, 11 Sep 2017 11:24:22 +0200
Subject: s390/runtime instrumention: fix possible memory corruption
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit d6e646ad7cfa7034d280459b2b2546288f247144 upstream.
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation")
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kernel/runtime_instr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/runtime_instr.c
+++ b/arch/s390/kernel/runtime_instr.c
@@ -47,11 +47,13 @@ void exit_thread_runtime_instr(void)
{
struct task_struct *task = current;
+ preempt_disable();
if (!task->thread.ri_cb)
return;
disable_runtime_instr();
kfree(task->thread.ri_cb);
task->thread.ri_cb = NULL;
+ preempt_enable();
}
SYSCALL_DEFINE1(s390_runtime_instr, int, command)
@@ -62,9 +64,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int,
return -EOPNOTSUPP;
if (command == S390_RUNTIME_INSTR_STOP) {
- preempt_disable();
exit_thread_runtime_instr();
- preempt_enable();
return 0;
}
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.4/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.4/s390-fix-transactional-execution-control-register-handling.patch
queue-4.4/s390-runtime-instrumention-fix-possible-memory-corruption.patch