The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 89705e92700170888236555fe91b45e4c1bb0985 Mon Sep 17 00:00:00 2001
From: Danit Goldberg <danitg(a)mellanox.com>
Date: Fri, 5 Jul 2019 19:21:57 +0300
Subject: [PATCH] IB/mlx5: Report correctly tag matching rendezvous capability
Userspace expects the IB_TM_CAP_RC bit to indicate that the device
supports RC transport tag matching with rendezvous offload. However, the
firmware splits this into two capabilities for eager and rendezvous tag
matching.
Only if the FW supports both modes should userspace be told the tag
matching capability is available.
Cc: <stable(a)vger.kernel.org> # 4.13
Fixes: eb761894351d ("IB/mlx5: Fill XRQ capabilities")
Signed-off-by: Danit Goldberg <danitg(a)mellanox.com>
Reviewed-by: Yishai Hadas <yishaih(a)mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 7581571bd9cd..56d4b1e9dd23 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1046,15 +1046,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
if (MLX5_CAP_GEN(mdev, tag_matching)) {
- props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
props->tm_caps.max_num_tags =
(1 << MLX5_CAP_GEN(mdev, log_tag_matching_list_sz)) - 1;
- props->tm_caps.flags = IB_TM_CAP_RC;
props->tm_caps.max_ops =
1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
props->tm_caps.max_sge = MLX5_TM_MAX_SGE;
}
+ if (MLX5_CAP_GEN(mdev, tag_matching) &&
+ MLX5_CAP_GEN(mdev, rndv_offload_rc)) {
+ props->tm_caps.flags = IB_TM_CAP_RNDV_RC;
+ props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
+ }
+
if (MLX5_CAP_GEN(dev->mdev, cq_moderation)) {
props->cq_caps.max_cq_moderation_count =
MLX5_MAX_CQ_COUNT;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 50806bef9f20..4053be51b7fa 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -307,8 +307,8 @@ struct ib_rss_caps {
};
enum ib_tm_cap_flags {
- /* Support tag matching on RC transport */
- IB_TM_CAP_RC = 1 << 0,
+ /* Support tag matching with rendezvous offload for RC transport */
+ IB_TM_CAP_RNDV_RC = 1 << 0,
};
struct ib_tm_caps {
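The capability logic described in the commit message above can be sketched in plain userspace C. This is only an illustration: struct fw_caps, struct tm_caps_model, TM_CAP_RNDV_RC and fill_tm_caps() are hypothetical stand-ins for the firmware bits read through MLX5_CAP_GEN() and for props->tm_caps, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the firmware capability bits. */
struct fw_caps {
	bool tag_matching;                      /* eager tag matching */
	bool rndv_offload_rc;                   /* rendezvous offload on RC */
	unsigned int log_tag_matching_list_sz;  /* log2 of the tag list size */
};

/* Mirrors the role of IB_TM_CAP_RNDV_RC in the patch. */
enum { TM_CAP_RNDV_RC = 1 << 0 };

/* Hypothetical stand-in for the fields of props->tm_caps used here. */
struct tm_caps_model {
	uint32_t flags;
	uint32_t max_num_tags;
};

static struct tm_caps_model fill_tm_caps(const struct fw_caps *fw)
{
	struct tm_caps_model caps = { 0, 0 };

	assert(fw != NULL);

	/* Limits are reported whenever tag matching exists at all. */
	if (fw->tag_matching)
		caps.max_num_tags = (1u << fw->log_tag_matching_list_sz) - 1;

	/* The flag is reported only if both firmware bits are set. */
	if (fw->tag_matching && fw->rndv_offload_rc)
		caps.flags = TM_CAP_RNDV_RC;

	return caps;
}
```

With eager-only firmware the sketch still reports the tag limits but leaves flags at zero, which is the behaviour the patch is after.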
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 803f0f64d17769071d7287d9e3e3b79a3e1ae937 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 19 Jun 2019 13:05:39 +0100
Subject: [PATCH] Btrfs: fix fsync not persisting dentry deletions due to inode
evictions
In order to avoid searches on a log tree when unlinking an inode, we check
if the inode being unlinked was logged in the current transaction, as well
as the inode of its parent directory. When either of the inodes was logged,
we proceed to delete directory items and inode reference items from the
log. This ensures that a subsequent fsync of only the inode being unlinked,
or of only the parent directory, does not result in the entry still
existing after a power failure.
That check, however, is not reliable when one of the inodes involved (the
one being unlinked or its parent directory's inode) is evicted, since the
logged_trans field is transient: it is not stored on disk, so its value is
lost when the inode is evicted, and it is set to zero when the inode is
loaded into memory again. As a consequence, the checks currently being done
by btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() are
always true if the inode was evicted before, regardless of whether the
inode was logged before in the current transaction, so both functions
return early without touching the log. This results in the unlinked dentry
still existing after a log replay if, after the unlink operation, only one
of the inodes involved is fsync'ed.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ xfs_io -c fsync /mnt/dir/foo
# Keep an open file descriptor on our directory while we evict inodes.
# We just want to evict the file's inode, the directory's inode must not
# be evicted.
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time to background process to chdir to our test
# directory.
$ sleep 0.5
# Trigger eviction of the file's inode.
$ echo 2 > /proc/sys/vm/drop_caches
# Unlink our file and fsync the parent directory. After a power failure
# we don't expect to see the file anymore, since we fsync'ed the parent
# directory.
$ rm -f /mnt/dir/foo
$ xfs_io -c fsync /mnt/dir
<power failure>
$ mount /dev/sdb /mnt
$ ls /mnt/dir
foo
$
--> file still there, unlink not persisted despite explicit fsync on dir
Fix this by checking if the inode has the full_sync bit set in its runtime
flags as well, since that bit is set every time an inode is loaded from
disk, or for other less common cases such as after a shrinking truncate
or failure to allocate extent maps for holes, and gets cleared after the
first fsync. Also consider the inode as possibly logged only if it was
last modified in the current transaction (besides having the full_sync
flag set).
Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4a04659fded7..6c8297bcfeb7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3322,6 +3322,30 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
return 0;
}
+/*
+ * Check if an inode was logged in the current transaction. We can't always rely
+ * on an inode's logged_trans value, because it's an in-memory only field and
+ * therefore not persisted. This means that its value is lost if the inode gets
+ * evicted and loaded again from disk (in which case it has a value of 0, and
+ * certainly it is smaller than any possible transaction ID), when that happens
+ * the full_sync flag is set in the inode's runtime flags, so on that case we
+ * assume eviction happened and ignore the logged_trans value, assuming the
+ * worst case, that the inode was logged before in the current transaction.
+ */
+static bool inode_logged(struct btrfs_trans_handle *trans,
+ struct btrfs_inode *inode)
+{
+ if (inode->logged_trans == trans->transid)
+ return true;
+
+ if (inode->last_trans == trans->transid &&
+ test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
+ !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
+ return true;
+
+ return false;
+}
+
/*
* If both a file and directory are logged, and unlinks or renames are
* mixed in, we have a few interesting corners:
@@ -3356,7 +3380,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
int bytes_del = 0;
u64 dir_ino = btrfs_ino(dir);
- if (dir->logged_trans < trans->transid)
+ if (!inode_logged(trans, dir))
return 0;
ret = join_running_log_trans(root);
@@ -3460,7 +3484,7 @@ int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
u64 index;
int ret;
- if (inode->logged_trans < trans->transid)
+ if (!inode_logged(trans, inode))
return 0;
ret = join_running_log_trans(root);
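To compare the old and new checks side by side, here is a small userspace model. It is a sketch under assumptions: struct model_inode and struct model_trans are hypothetical reductions of struct btrfs_inode and struct btrfs_trans_handle, and the two runtime bits are modelled as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of struct btrfs_inode. */
struct model_inode {
	uint64_t logged_trans;  /* in-memory only, 0 after evict + reload */
	uint64_t last_trans;    /* last transaction that modified the inode */
	bool needs_full_sync;   /* models BTRFS_INODE_NEEDS_FULL_SYNC */
};

/* Hypothetical reduction of struct btrfs_trans_handle. */
struct model_trans {
	uint64_t transid;
	bool log_recovering;    /* models BTRFS_FS_LOG_RECOVERING */
};

/* Old test: the callers returned early when logged_trans < transid. */
static bool inode_logged_old(const struct model_trans *trans,
			     const struct model_inode *inode)
{
	assert(trans != NULL && inode != NULL);
	return inode->logged_trans >= trans->transid;
}

/* New test, following inode_logged() from the patch. */
static bool inode_logged_new(const struct model_trans *trans,
			     const struct model_inode *inode)
{
	assert(trans != NULL && inode != NULL);

	if (inode->logged_trans == trans->transid)
		return true;

	/* Possibly logged before an eviction wiped logged_trans. */
	if (inode->last_trans == trans->transid &&
	    inode->needs_full_sync && !trans->log_recovering)
		return true;

	return false;
}
```

For an inode that was evicted and reloaded within the transaction (logged_trans of 0, last_trans equal to the transaction ID, full sync bit set), the old test says "not logged" while the new one conservatively says "logged".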
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 803f0f64d17769071d7287d9e3e3b79a3e1ae937 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 19 Jun 2019 13:05:39 +0100
Subject: [PATCH] Btrfs: fix fsync not persisting dentry deletions due to inode
evictions
In order to avoid searches on a log tree when unlinking an inode, we check
if the inode being unlinked was logged in the current transaction, as well
as the inode of its parent directory. When either of the inodes was logged,
we proceed to delete directory items and inode reference items from the
log. This ensures that a subsequent fsync of only the inode being unlinked,
or of only the parent directory, does not result in the entry still
existing after a power failure.
That check, however, is not reliable when one of the inodes involved (the
one being unlinked or its parent directory's inode) is evicted, since the
logged_trans field is transient: it is not stored on disk, so its value is
lost when the inode is evicted, and it is set to zero when the inode is
loaded into memory again. As a consequence, the checks currently being done
by btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() are
always true if the inode was evicted before, regardless of whether the
inode was logged before in the current transaction, so both functions
return early without touching the log. This results in the unlinked dentry
still existing after a log replay if, after the unlink operation, only one
of the inodes involved is fsync'ed.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ xfs_io -c fsync /mnt/dir/foo
# Keep an open file descriptor on our directory while we evict inodes.
# We just want to evict the file's inode, the directory's inode must not
# be evicted.
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time to background process to chdir to our test
# directory.
$ sleep 0.5
# Trigger eviction of the file's inode.
$ echo 2 > /proc/sys/vm/drop_caches
# Unlink our file and fsync the parent directory. After a power failure
# we don't expect to see the file anymore, since we fsync'ed the parent
# directory.
$ rm -f /mnt/dir/foo
$ xfs_io -c fsync /mnt/dir
<power failure>
$ mount /dev/sdb /mnt
$ ls /mnt/dir
foo
$
--> file still there, unlink not persisted despite explicit fsync on dir
Fix this by checking if the inode has the full_sync bit set in its runtime
flags as well, since that bit is set every time an inode is loaded from
disk, or for other less common cases such as after a shrinking truncate
or failure to allocate extent maps for holes, and gets cleared after the
first fsync. Also consider the inode as possibly logged only if it was
last modified in the current transaction (besides having the full_sync
flag set).
Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4a04659fded7..6c8297bcfeb7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3322,6 +3322,30 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
return 0;
}
+/*
+ * Check if an inode was logged in the current transaction. We can't always rely
+ * on an inode's logged_trans value, because it's an in-memory only field and
+ * therefore not persisted. This means that its value is lost if the inode gets
+ * evicted and loaded again from disk (in which case it has a value of 0, and
+ * certainly it is smaller than any possible transaction ID), when that happens
+ * the full_sync flag is set in the inode's runtime flags, so on that case we
+ * assume eviction happened and ignore the logged_trans value, assuming the
+ * worst case, that the inode was logged before in the current transaction.
+ */
+static bool inode_logged(struct btrfs_trans_handle *trans,
+ struct btrfs_inode *inode)
+{
+ if (inode->logged_trans == trans->transid)
+ return true;
+
+ if (inode->last_trans == trans->transid &&
+ test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
+ !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
+ return true;
+
+ return false;
+}
+
/*
* If both a file and directory are logged, and unlinks or renames are
* mixed in, we have a few interesting corners:
@@ -3356,7 +3380,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
int bytes_del = 0;
u64 dir_ino = btrfs_ino(dir);
- if (dir->logged_trans < trans->transid)
+ if (!inode_logged(trans, dir))
return 0;
ret = join_running_log_trans(root);
@@ -3460,7 +3484,7 @@ int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
u64 index;
int ret;
- if (inode->logged_trans < trans->transid)
+ if (!inode_logged(trans, inode))
return 0;
ret = join_running_log_trans(root);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1d832a0b51dd9570429bb4b81b2a6c1759e681a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Jun 2019 11:25:24 +0100
Subject: [PATCH] Btrfs: fix data loss after inode eviction, renaming it, and
fsync it
When we log an inode, regardless of logging it completely or only that it
exists, we always update it as logged (logged_trans and last_log_commit
fields of the inode are updated). This is generally fine and avoids future
attempts to log it from having to do repeated work that brings no value.
However, if we write data to a file, then evict its inode after all the
delalloc was flushed (and ordered extents completed), rename the file and
fsync it, we end up not logging the new extents, since the rename may
result in logging that the inode exists in case the parent directory was
logged before. The following reproducer shows and explains how this can
happen:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ touch /mnt/dir/bar
# Do a direct IO write instead of a buffered write because with a
# buffered write we would need to make sure delalloc gets flushed and
# completes before we do the inode eviction later, and we cannot do that
# from user space with calls to things such as sync(2) since that results
# in a transaction commit as well.
$ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar
# Keep the directory dir in use while we evict inodes. We want our file
# bar's inode to be evicted but we don't want our directory's inode to
# be evicted (if it were evicted too, we would not be able to reproduce
# the issue since the first fsync below, of file foo, would result in a
# transaction commit).
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time for the background process to chdir.
$ sleep 0.1
# Evict all inodes, except the inode for the directory dir because it is
# currently in use by our background process.
$ echo 2 > /proc/sys/vm/drop_caches
# fsync file foo, which ends up persisting information about the parent
# directory because it is a new inode.
$ xfs_io -c fsync /mnt/dir/foo
# Rename bar, this results in logging that this inode exists (inode item,
# names, xattrs) because the parent directory is in the log.
$ mv /mnt/dir/bar /mnt/dir/baz
# Now fsync baz, which ends up doing absolutely nothing because of the
# rename operation which logged that the inode exists only.
$ xfs_io -c fsync /mnt/dir/baz
<power failure>
$ mount /dev/sdb /mnt
$ od -t x1 -A d /mnt/dir/baz
0000000
--> Empty file, data we wrote is missing.
Fix this by not updating last_log_commit of an inode when we are logging
only that it exists and the inode was not yet logged since it was loaded
from disk (full_sync bit set). This is enough to make btrfs_inode_in_log()
return false for this scenario and make us log the inode. The logged_trans
of the inode is still always set, since that alone is used to track if names
need to be deleted as part of unlink operations.
Fixes: 257c62e1bce03e ("Btrfs: avoid tree log commit when there are no changes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3fc8d854d7fb..4a04659fded7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5420,9 +5420,19 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
}
}
+ /*
+ * Don't update last_log_commit if we logged that an inode exists after
+ * it was loaded to memory (full_sync bit set).
+ * This is to prevent data loss when we do a write to the inode, then
+ * the inode gets evicted after all delalloc was flushed, then we log
+ * it exists (due to a rename for example) and then fsync it. This last
+ * fsync would do nothing (not logging the extents previously written).
+ */
spin_lock(&inode->lock);
inode->logged_trans = trans->transid;
- inode->last_log_commit = inode->last_sub_trans;
+ if (inode_only != LOG_INODE_EXISTS ||
+ !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
+ inode->last_log_commit = inode->last_sub_trans;
spin_unlock(&inode->lock);
out_unlock:
mutex_unlock(&inode->log_mutex);
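The update rule in the hunk above can be tried out in a small userspace model. Treat it as a rough sketch: struct model_inode is a hypothetical reduction of struct btrfs_inode, inode_in_log() is an assumed, simplified form of the btrfs_inode_in_log() test (the real helper checks more than this), and the starting values are chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum log_mode { LOG_INODE_ALL, LOG_INODE_EXISTS };

/* Hypothetical reduction of struct btrfs_inode. */
struct model_inode {
	uint64_t logged_trans;
	int last_sub_trans;
	int last_log_commit;
	bool needs_full_sync;   /* models BTRFS_INODE_NEEDS_FULL_SYNC */
};

/* Update rule at the end of btrfs_log_inode(), as changed by the patch. */
static void mark_logged(struct model_inode *inode, uint64_t transid,
			enum log_mode mode)
{
	assert(inode != NULL);

	inode->logged_trans = transid;
	if (mode != LOG_INODE_EXISTS || !inode->needs_full_sync)
		inode->last_log_commit = inode->last_sub_trans;
}

/* Assumed simplification of the btrfs_inode_in_log() condition. */
static bool inode_in_log(const struct model_inode *inode, uint64_t transid)
{
	assert(inode != NULL);

	return inode->logged_trans == transid &&
	       inode->last_sub_trans <= inode->last_log_commit;
}
```

In this model, logging "exists only" for a freshly loaded inode still records logged_trans but leaves last_log_commit behind, so a later fsync does not treat the inode as already logged.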
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1d832a0b51dd9570429bb4b81b2a6c1759e681a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Jun 2019 11:25:24 +0100
Subject: [PATCH] Btrfs: fix data loss after inode eviction, renaming it, and
fsync it
When we log an inode, regardless of logging it completely or only that it
exists, we always update it as logged (logged_trans and last_log_commit
fields of the inode are updated). This is generally fine and avoids future
attempts to log it from having to do repeated work that brings no value.
However, if we write data to a file, then evict its inode after all the
delalloc was flushed (and ordered extents completed), rename the file and
fsync it, we end up not logging the new extents, since the rename may
result in logging that the inode exists in case the parent directory was
logged before. The following reproducer shows and explains how this can
happen:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ touch /mnt/dir/bar
# Do a direct IO write instead of a buffered write because with a
# buffered write we would need to make sure delalloc gets flushed and
# completes before we do the inode eviction later, and we cannot do that
# from user space with calls to things such as sync(2) since that results
# in a transaction commit as well.
$ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar
# Keep the directory dir in use while we evict inodes. We want our file
# bar's inode to be evicted but we don't want our directory's inode to
# be evicted (if it were evicted too, we would not be able to reproduce
# the issue since the first fsync below, of file foo, would result in a
# transaction commit).
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time for the background process to chdir.
$ sleep 0.1
# Evict all inodes, except the inode for the directory dir because it is
# currently in use by our background process.
$ echo 2 > /proc/sys/vm/drop_caches
# fsync file foo, which ends up persisting information about the parent
# directory because it is a new inode.
$ xfs_io -c fsync /mnt/dir/foo
# Rename bar, this results in logging that this inode exists (inode item,
# names, xattrs) because the parent directory is in the log.
$ mv /mnt/dir/bar /mnt/dir/baz
# Now fsync baz, which ends up doing absolutely nothing because of the
# rename operation which logged that the inode exists only.
$ xfs_io -c fsync /mnt/dir/baz
<power failure>
$ mount /dev/sdb /mnt
$ od -t x1 -A d /mnt/dir/baz
0000000
--> Empty file, data we wrote is missing.
Fix this by not updating last_log_commit of an inode when we are logging
only that it exists and the inode was not yet logged since it was loaded
from disk (full_sync bit set). This is enough to make btrfs_inode_in_log()
return false for this scenario and make us log the inode. The logged_trans
of the inode is still always set, since that alone is used to track if names
need to be deleted as part of unlink operations.
Fixes: 257c62e1bce03e ("Btrfs: avoid tree log commit when there are no changes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3fc8d854d7fb..4a04659fded7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5420,9 +5420,19 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
}
}
+ /*
+ * Don't update last_log_commit if we logged that an inode exists after
+ * it was loaded to memory (full_sync bit set).
+ * This is to prevent data loss when we do a write to the inode, then
+ * the inode gets evicted after all delalloc was flushed, then we log
+ * it exists (due to a rename for example) and then fsync it. This last
+ * fsync would do nothing (not logging the extents previously written).
+ */
spin_lock(&inode->lock);
inode->logged_trans = trans->transid;
- inode->last_log_commit = inode->last_sub_trans;
+ if (inode_only != LOG_INODE_EXISTS ||
+ !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
+ inode->last_log_commit = inode->last_sub_trans;
spin_unlock(&inode->lock);
out_unlock:
mutex_unlock(&inode->log_mutex);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 64adde31c8e996a6db6f7a1a4131180e363aa9f2 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)linaro.org>
Date: Wed, 29 May 2019 11:43:52 +0200
Subject: [PATCH] PCI: qcom: Ensure that PERST is asserted for at least 100 ms
Currently, there is only a 1 ms sleep after asserting PERST.
Reading the datasheets for different endpoints, some require PERST to be
asserted for 10 ms in order for the endpoint to perform a reset, others
require it to be asserted for 50 ms.
Several SoCs using this driver use PCIe Mini Card, where we don't know
what endpoint will be plugged in.
The PCI Express Card Electromechanical Specification r2.0, section
2.2, "PERST# Signal" specifies:
"On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from
the power rails achieving specified operating limits."
Add a sleep of 100 ms before deasserting PERST, in order to ensure that
we are compliant with the spec.
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Niklas Cassel <niklas.cassel(a)linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5+
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index da5dd3639a49..7e581748ee9f 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -178,6 +178,8 @@ static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
{
+ /* Ensure that PERST has been asserted for at least 100 ms */
+ msleep(100);
gpiod_set_value_cansleep(pcie->reset, 0);
usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 64adde31c8e996a6db6f7a1a4131180e363aa9f2 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)linaro.org>
Date: Wed, 29 May 2019 11:43:52 +0200
Subject: [PATCH] PCI: qcom: Ensure that PERST is asserted for at least 100 ms
Currently, there is only a 1 ms sleep after asserting PERST.
Reading the datasheets for different endpoints, some require PERST to be
asserted for 10 ms in order for the endpoint to perform a reset, others
require it to be asserted for 50 ms.
Several SoCs using this driver use PCIe Mini Card, where we don't know
what endpoint will be plugged in.
The PCI Express Card Electromechanical Specification r2.0, section
2.2, "PERST# Signal" specifies:
"On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from
the power rails achieving specified operating limits."
Add a sleep of 100 ms before deasserting PERST, in order to ensure that
we are compliant with the spec.
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Niklas Cassel <niklas.cassel(a)linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5+
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index da5dd3639a49..7e581748ee9f 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -178,6 +178,8 @@ static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
{
+ /* Ensure that PERST has been asserted for at least 100 ms */
+ msleep(100);
gpiod_set_value_cansleep(pcie->reset, 0);
usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
}
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7608bf40cf2480057ec0da31456cc428791c32ef Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)mellanox.com>
Date: Tue, 11 Jun 2019 13:09:51 -0300
Subject: [PATCH] RDMA/odp: Fix missed unlock in non-blocking invalidate_start
If invalidate_start returns with EAGAIN then the umem_rwsem needs to be
unlocked as no invalidate_end will be called.
Cc: <stable(a)vger.kernel.org>
Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count")
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Reviewed-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 9001cc10770a..eb9939d52818 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -149,6 +149,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
{
struct ib_ucontext_per_mm *per_mm =
container_of(mn, struct ib_ucontext_per_mm, mn);
+ int rc;
if (mmu_notifier_range_blockable(range))
down_read(&per_mm->umem_rwsem);
@@ -165,11 +166,14 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
return 0;
}
- return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
- range->end,
- invalidate_range_start_trampoline,
- mmu_notifier_range_blockable(range),
- NULL);
+ rc = rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
+ range->end,
+ invalidate_range_start_trampoline,
+ mmu_notifier_range_blockable(range),
+ NULL);
+ if (rc)
+ up_read(&per_mm->umem_rwsem);
+ return rc;
}
static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
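The locking rule from the commit message (release the read lock when start fails, because no end callback will follow) can be modelled in userspace as below. This is a sketch only: struct model_rwsem is a hypothetical reader counter standing in for umem_rwsem, and walk_rc stands for whatever the range walk returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the rw_semaphore: counts read-side holders. */
struct model_rwsem {
	int readers;
};

static void model_down_read(struct model_rwsem *sem)
{
	assert(sem != NULL);
	sem->readers++;
}

static void model_up_read(struct model_rwsem *sem)
{
	assert(sem != NULL && sem->readers > 0);
	sem->readers--;
}

/* Models invalidate_range_start: on success the lock stays held. */
static int model_invalidate_start(struct model_rwsem *sem, int walk_rc)
{
	int rc;

	model_down_read(sem);
	rc = walk_rc;
	if (rc)
		model_up_read(sem);  /* no invalidate_end will be called */
	return rc;
}

/* Models invalidate_range_end: drops the lock taken by a successful start. */
static void model_invalidate_end(struct model_rwsem *sem)
{
	model_up_read(sem);
}
```

Without the unlock on the error path, the reader count in this model would stay at one after a failed start, which corresponds to the leaked lock the patch fixes.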
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7608bf40cf2480057ec0da31456cc428791c32ef Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)mellanox.com>
Date: Tue, 11 Jun 2019 13:09:51 -0300
Subject: [PATCH] RDMA/odp: Fix missed unlock in non-blocking invalidate_start
If invalidate_start returns with EAGAIN then the umem_rwsem needs to be
unlocked as no invalidate_end will be called.
Cc: <stable(a)vger.kernel.org>
Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count")
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Reviewed-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 9001cc10770a..eb9939d52818 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -149,6 +149,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
{
struct ib_ucontext_per_mm *per_mm =
container_of(mn, struct ib_ucontext_per_mm, mn);
+ int rc;
if (mmu_notifier_range_blockable(range))
down_read(&per_mm->umem_rwsem);
@@ -165,11 +166,14 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
return 0;
}
- return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
- range->end,
- invalidate_range_start_trampoline,
- mmu_notifier_range_blockable(range),
- NULL);
+ rc = rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
+ range->end,
+ invalidate_range_start_trampoline,
+ mmu_notifier_range_blockable(range),
+ NULL);
+ if (rc)
+ up_read(&per_mm->umem_rwsem);
+ return rc;
}
static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bcef5b7215681250c4bf8961dfe15e9e4fef97d0 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bvanassche(a)acm.org>
Date: Wed, 29 May 2019 09:38:31 -0700
Subject: [PATCH] RDMA/srp: Accept again source addresses that do not have a
port number
The function srp_parse_in() is used both for parsing source address
specifications and for target address specifications. Target addresses
must have a port number. Having to specify a port number for source
addresses is inconvenient. Make sure that srp_parse_in() again supports
parsing addresses with no port number.
Cc: <stable(a)vger.kernel.org>
Fixes: c62adb7def71 ("IB/srp: Fix IPv6 address parsing")
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index be9ddcad8f28..87848faa7502 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3481,13 +3481,14 @@ static const match_table_t srp_opt_tokens = {
* @net: [in] Network namespace.
* @sa: [out] Address family, IP address and port number.
* @addr_port_str: [in] IP address and port number.
+ * @has_port: [out] Whether or not @addr_port_str includes a port number.
*
* Parse the following address formats:
* - IPv4: <ip_address>:<port>, e.g. 1.2.3.4:5.
* - IPv6: \[<ipv6_address>\]:<port>, e.g. [1::2:3%4]:5.
*/
static int srp_parse_in(struct net *net, struct sockaddr_storage *sa,
- const char *addr_port_str)
+ const char *addr_port_str, bool *has_port)
{
char *addr_end, *addr = kstrdup(addr_port_str, GFP_KERNEL);
char *port_str;
@@ -3496,9 +3497,12 @@ static int srp_parse_in(struct net *net, struct sockaddr_storage *sa,
if (!addr)
return -ENOMEM;
port_str = strrchr(addr, ':');
- if (!port_str)
- return -EINVAL;
- *port_str++ = '\0';
+ if (port_str && strchr(port_str, ']'))
+ port_str = NULL;
+ if (port_str)
+ *port_str++ = '\0';
+ if (has_port)
+ *has_port = port_str != NULL;
ret = inet_pton_with_scope(net, AF_INET, addr, port_str, sa);
if (ret && addr[0]) {
addr_end = addr + strlen(addr) - 1;
@@ -3520,6 +3524,7 @@ static int srp_parse_options(struct net *net, const char *buf,
char *p;
substring_t args[MAX_OPT_ARGS];
unsigned long long ull;
+ bool has_port;
int opt_mask = 0;
int token;
int ret = -EINVAL;
@@ -3618,7 +3623,8 @@ static int srp_parse_options(struct net *net, const char *buf,
ret = -ENOMEM;
goto out;
}
- ret = srp_parse_in(net, &target->rdma_cm.src.ss, p);
+ ret = srp_parse_in(net, &target->rdma_cm.src.ss, p,
+ NULL);
if (ret < 0) {
pr_warn("bad source parameter '%s'\n", p);
kfree(p);
@@ -3634,7 +3640,10 @@ static int srp_parse_options(struct net *net, const char *buf,
ret = -ENOMEM;
goto out;
}
- ret = srp_parse_in(net, &target->rdma_cm.dst.ss, p);
+ ret = srp_parse_in(net, &target->rdma_cm.dst.ss, p,
+ &has_port);
+ if (!has_port)
+ ret = -EINVAL;
if (ret < 0) {
pr_warn("bad dest parameter '%s'\n", p);
kfree(p);
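The port detection added to srp_parse_in() above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: split_addr_port() is a hypothetical name, it is not the kernel function, and it leaves out the inet_pton_with_scope() parsing entirely. It shows only how the last ':' is treated as a port separator unless a ']' follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not the kernel's srp_parse_in()):
 * split "addr[:port]" in place and report whether a port was present.
 * Returns a pointer to the port string, or NULL if there is none.
 */
char *split_addr_port(char *addr, bool *has_port)
{
	char *port_str;

	assert(addr != NULL);

	/* The last ':' separates the port, unless it sits inside "[...]". */
	port_str = strrchr(addr, ':');
	if (port_str && strchr(port_str, ']'))
		port_str = NULL;	/* ':' belonged to a bracketed IPv6 address */
	if (port_str)
		*port_str++ = '\0';	/* terminate the address, step to the port */
	if (has_port)
		*has_port = port_str != NULL;
	return port_str;
}
```

As in the patch, has_port may be NULL when the caller does not care (the source address case), while the destination case passes a pointer and rejects input without a port. An unbracketed IPv6 address would still be split at its last ':' by this sketch.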
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 26202928fafad8bda8b478edb7e62c885be623d7 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Mon, 1 Jul 2019 14:09:18 +0900
Subject: [PATCH] block: Limit zone array allocation size
Limit the size of the struct blk_zone array used in
blk_revalidate_disk_zones() to avoid memory allocation failures leading
to disk revalidation failure. Also further reduce the likelihood of
such failures by using kvcalloc() (that is vmalloc()) instead of
allocating contiguous pages with alloc_pages().
Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 58ced170b424..6c503824ba3f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -14,6 +14,8 @@
#include <linux/rbtree.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include "blk.h"
@@ -371,22 +373,25 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node,
* Allocate an array of struct blk_zone to get nr_zones zone information.
* The allocated array may be smaller than nr_zones.
*/
-static struct blk_zone *blk_alloc_zones(int node, unsigned int *nr_zones)
+static struct blk_zone *blk_alloc_zones(unsigned int *nr_zones)
{
- size_t size = *nr_zones * sizeof(struct blk_zone);
- struct page *page;
- int order;
-
- for (order = get_order(size); order >= 0; order--) {
- page = alloc_pages_node(node, GFP_NOIO | __GFP_ZERO, order);
- if (page) {
- *nr_zones = min_t(unsigned int, *nr_zones,
- (PAGE_SIZE << order) / sizeof(struct blk_zone));
- return page_address(page);
- }
+ struct blk_zone *zones;
+ size_t nrz = min(*nr_zones, BLK_ZONED_REPORT_MAX_ZONES);
+
+ /*
+ * GFP_KERNEL here is meaningless as the caller task context has
+ * the PF_MEMALLOC_NOIO flag set in blk_revalidate_disk_zones()
+ * with memalloc_noio_save().
+ */
+ zones = kvcalloc(nrz, sizeof(struct blk_zone), GFP_KERNEL);
+ if (!zones) {
+ *nr_zones = 0;
+ return NULL;
}
- return NULL;
+ *nr_zones = nrz;
+
+ return zones;
}
void blk_queue_free_zone_bitmaps(struct request_queue *q)
@@ -448,7 +453,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
/* Get zone information and initialize seq_zones_bitmap */
rep_nr_zones = nr_zones;
- zones = blk_alloc_zones(q->node, &rep_nr_zones);
+ zones = blk_alloc_zones(&rep_nr_zones);
if (!zones)
goto out;
@@ -487,8 +492,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
out:
memalloc_noio_restore(noio_flag);
- free_pages((unsigned long)zones,
- get_order(rep_nr_zones * sizeof(struct blk_zone)));
+ kvfree(zones);
kfree(seq_zones_wlock);
kfree(seq_zones_bitmap);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 05036e3e3458..1ef375dafb1c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -344,6 +344,11 @@ struct queue_limits {
#ifdef CONFIG_BLK_DEV_ZONED
+/*
+ * Maximum number of zones to report with a single report zones command.
+ */
+#define BLK_ZONED_REPORT_MAX_ZONES 8192U
+
extern unsigned int blkdev_nr_zones(struct block_device *bdev);
extern int blkdev_report_zones(struct block_device *bdev,
sector_t sector, struct blk_zone *zones,
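The clamping done by the new blk_alloc_zones() above can be illustrated with a small user-space analogue. This is a sketch under assumptions, not the kernel code: calloc() stands in for kvcalloc(), and struct zone_info, REPORT_MAX_ZONES and alloc_zones() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define REPORT_MAX_ZONES 8192U	/* mirrors BLK_ZONED_REPORT_MAX_ZONES */

struct zone_info {	/* hypothetical stand-in for struct blk_zone */
	unsigned long long start;
	unsigned long long len;
};

/*
 * Allocate a zeroed array for at most REPORT_MAX_ZONES entries.
 * On return, *nr_zones holds the number of entries actually allocated
 * (0 on failure), which may be smaller than what was requested.
 */
struct zone_info *alloc_zones(unsigned int *nr_zones)
{
	struct zone_info *zones;
	unsigned int nrz;

	assert(nr_zones != NULL);
	nrz = *nr_zones < REPORT_MAX_ZONES ? *nr_zones : REPORT_MAX_ZONES;

	zones = calloc(nrz, sizeof(*zones));
	if (!zones) {
		*nr_zones = 0;
		return NULL;
	}
	*nr_zones = nrz;
	return zones;
}
```

Because the array is bounded, a caller has to be prepared to report zones in several batches, which the comment above blk_alloc_zones() already allows for ("The allocated array may be smaller than nr_zones").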
commit 8e2442a5f86e1f77b86401fce274a7f622740bc4 upstream.
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably similar to the situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol to drivers/usb/gadget/legacy/Kconfig, then touch
around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before. However, this
has been broken for a long time, and it still is.
We should take a closer look to fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
---
scripts/kconfig/confdata.c | 7 +++----
scripts/kconfig/expr.h | 1 +
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 91d0a5c014ac..fd99ae90a618 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -834,11 +834,12 @@ int conf_write(const char *name)
"#\n"
"# %s\n"
"#\n", str);
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1024,8 +1025,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_split_config())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 7c329e179007..43a87f8ea738 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
--
2.17.1
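The SYMBOL_WRITTEN idea from the patch above can be sketched outside of kconfig as follows. This is a simplified illustration under assumptions: struct sym, SYM_WRITE, SYM_WRITTEN and write_symbols() are hypothetical stand-ins, and the actual output step is reduced to a counter. The point is that a separate "already written" flag prevents duplicate output without clearing the "should be written" flag.

```c
#include <assert.h>
#include <stddef.h>

#define SYM_WRITE   0x0200	/* symbol should be written */
#define SYM_WRITTEN 0x0800	/* symbol has already been written */

struct sym {	/* hypothetical, far smaller than kconfig's struct symbol */
	const char *name;
	unsigned int flags;
};

/*
 * Walk all menu entries. A symbol declared in several places shows up
 * several times in entries[]. Returns how many symbols were emitted.
 */
size_t write_symbols(struct sym **entries, size_t n)
{
	size_t i, written = 0;

	assert(entries != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct sym *s = entries[i];

		if (s->flags & SYM_WRITTEN)
			continue;	/* already emitted via another declaration */
		if (!(s->flags & SYM_WRITE))
			continue;	/* not meant for the output file */
		s->flags |= SYM_WRITTEN;	/* SYM_WRITE stays set for later passes */
		written++;
	}
	return written;
}
```

Leaving SYM_WRITE untouched is what lets a later pass (conf_write_autoconf() in the patch) still see which symbols belong in the output without recalculating every symbol.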
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two known places where a synchronous SIGKILL is generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two known places where a synchronous SIGKILL is generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two known places where a synchronous SIGKILL is generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_uid based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
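The layout-based test introduced by the patch above can be modelled in user space roughly like this. It is a reduced sketch under assumptions: enum sig_layout, struct sig_info, layout_has_pid_and_uid() and fixup_sender_pid() are hypothetical names, only the si_pid half of the fixup is shown, and the user-namespace uid translation is left out because it depends on kernel-internal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's siginfo layouts. */
enum sig_layout {
	LAYOUT_KILL, LAYOUT_CHLD, LAYOUT_RT,	/* carry si_pid and si_uid */
	LAYOUT_TIMER, LAYOUT_POLL, LAYOUT_FAULT, LAYOUT_SYS,	/* do not */
};

struct sig_info {
	enum sig_layout layout;
	int pid;	/* only meaningful as a pid for the first three layouts */
};

bool layout_has_pid_and_uid(enum sig_layout layout)
{
	switch (layout) {
	case LAYOUT_KILL:
	case LAYOUT_CHLD:
	case LAYOUT_RT:
		return true;
	default:
		return false;
	}
}

/*
 * Zero the sender pid when the sender is not visible from the target's
 * pid namespace, but only for layouts that really have a pid field.
 */
void fixup_sender_pid(struct sig_info *info, bool sender_visible)
{
	assert(info != NULL);
	if (layout_has_pid_and_uid(info->layout) && !sender_visible)
		info->pid = 0;
}
```

Keying the decision on the layout rather than on whether the si_code came from user space is what keeps a timer or poll signal from having unrelated union members overwritten.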
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_uid based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_uid based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_uid based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
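The big endian failure described above can be reproduced with plain byte arithmetic on any host. The following is a minimal sketch, not kernel code: the helper names (store_be64, store_be32, load_be32, seen_via_pointer_store, seen_via_int_store) are hypothetical, and an 8-byte buffer stands in for the start of the siginfo union where si_pid, si_uid and si_addr overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Write v in big endian byte order, the way a 64bit big endian kernel
 * lays out a pointer-sized field such as si_addr. */
static void store_be64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Write v as a big endian 32bit value into the first four bytes. */
static void store_be32(unsigned char out[4], uint32_t v)
{
	for (int i = 0; i < 4; i++)
		out[i] = (unsigned char)(v >> (24 - 8 * i));
}

/* Read the first four bytes as a big endian 32bit value, the way a
 * 32bit process reads a 4-byte field at the same offset. */
static uint32_t load_be32(const unsigned char in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Old behaviour: the 32bit user address is stored as a 64bit pointer.
 * Its significant bits land in the second half of the field, so the
 * first four bytes, which the 32bit process reads, are all zero. */
static uint32_t seen_via_pointer_store(uint32_t user_addr)
{
	unsigned char field[8];

	store_be64(field, (uint64_t)user_addr);
	return load_be32(field);
}

/* New behaviour: the address is stored as a 32bit integer at the
 * start of the field (the sival_int case), so it reads back intact. */
static uint32_t seen_via_int_store(uint32_t user_addr)
{
	unsigned char field[8] = { 0 };

	store_be32(field, user_addr);
	return load_be32(field);
}
```

On a little endian kernel the low 32 bits of the pointer happen to sit in the first four bytes, which is presumably why the problem only shows up on big endian machines such as sparc64.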
This issue cannot be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Instead of a siginfo_t,
which is difficult and error prone to fill in, have kill_pid_usb_asyncio
take 3 arguments: a signal number, an errno value, and an address
encoded as a sigval_t. The encoding of the address as a sigval_t allows
the code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
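The same layout assumptions can be spot-checked from userspace. The sketch below assumes a Linux C library whose siginfo_t mirrors the kernel layout (glibc and musl do); it uses C11 static_assert in place of the kernel's BUILD_BUG_ON, and the FIELD_SIZE macro and the si_pid_overlays_si_addr helper are hypothetical names introduced only for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* expose siginfo_t under strict -std modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: size of one member of a struct type. */
#define FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* si_pid and si_addr must start at the same offset in the union. */
static_assert(offsetof(siginfo_t, si_pid) == offsetof(siginfo_t, si_addr),
	      "si_pid and si_addr must overlap");

/* With 32bit pointers si_pid alone covers a pointer; with 64bit
 * pointers si_pid followed by si_uid must cover it exactly. */
static_assert(sizeof(int) == sizeof(void *)
	      ? FIELD_SIZE(siginfo_t, si_pid) == sizeof(void *)
	      : FIELD_SIZE(siginfo_t, si_pid) +
		FIELD_SIZE(siginfo_t, si_uid) == sizeof(void *),
	      "si_pid (plus si_uid) must be pointer sized");

/* si_uid must follow si_pid with no padding in between. */
static_assert(offsetof(siginfo_t, si_pid) + FIELD_SIZE(siginfo_t, si_pid) ==
	      offsetof(siginfo_t, si_uid),
	      "si_uid must directly follow si_pid");

/* Runtime view of the first check, for callers that want a value. */
static int si_pid_overlays_si_addr(void)
{
	return offsetof(siginfo_t, si_pid) == offsetof(siginfo_t, si_addr);
}
```

If any of these assumptions did not hold on a given platform the file would fail to compile, which is the same early failure mode the BUILD_BUG_ONs provide in the kernel build.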
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue cannot be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Instead of a siginfo_t,
which is difficult and error prone to fill in, have kill_pid_usb_asyncio
take 3 arguments: a signal number, an errno value, and an address
encoded as a sigval_t. The encoding of the address as a sigval_t allows
the code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
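The failure can be modelled in plain userspace C without access to a big
endian machine. This is only a sketch: store_be64(), load_be32() and
addr_seen_by_compat_task() are hypothetical helpers, not kernel functions.
They mimic how a 64bit big endian kernel lays out a pointer in memory and
how a 32bit process reads the first four bytes of that same slot.

```c
#include <stdint.h>

/* Hypothetical helper: write v into buf the way a 64bit big endian
 * CPU stores an 8 byte pointer (most significant byte first). */
void store_be64(unsigned char buf[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Hypothetical helper: read the first 4 bytes of buf the way a 32bit
 * big endian process reads a 4 byte pointer at the same offset. */
uint32_t load_be32(const unsigned char buf[8])
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Model of the bug: what a 32bit process sees when the kernel stored
 * its 32bit address as a zero extended 64bit pointer. */
uint32_t addr_seen_by_compat_task(uint32_t user_addr)
{
	unsigned char slot[8];

	store_be64(slot, (uint64_t)user_addr);
	/* The first 4 bytes are the upper half of the pointer. */
	return load_be32(slot);
}
```

Whatever 32bit address is passed in, the model returns 0, which matches
the NULL si_addr described above.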
This issue cannot be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Have kill_pid_usb_asyncio
take, instead of a siginfo_t (which is difficult and error prone), 3
arguments: a signal number, an errno value, and an address encoded as
a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
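The two encodings can be sketched in userspace as follows. This is an
illustration under assumptions, not the kernel code: union demo_sigval
stands in for sigval_t, and encode_native(), encode_compat() and
first_word() are hypothetical names. The point it shows is that a value
stored through sival_int always occupies the first 4 bytes of the union,
whatever the byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sigval_t (union sigval), which has the
 * members sival_int and sival_ptr. */
union demo_sigval {
	int   sival_int;
	void *sival_ptr;
};

/* Native caller: store the full pointer. */
union demo_sigval encode_native(void *uaddr)
{
	union demo_sigval v;

	memset(&v, 0, sizeof(v));
	v.sival_ptr = uaddr;
	return v;
}

/* Compat caller: store the 32bit address through sival_int, so it
 * lands in the first 4 bytes of the union on either endianness. */
union demo_sigval encode_compat(uint32_t uaddr32)
{
	union demo_sigval v;

	memset(&v, 0, sizeof(v));
	v.sival_int = (int)uaddr32;
	return v;
}

/* Read back the first 4 bytes, which is the part a 32bit process
 * would see as its pointer. */
uint32_t first_word(union demo_sigval v)
{
	uint32_t w;

	memcpy(&w, &v, sizeof(w));
	return w;
}
```

With the compat encoding, first_word() returns the original 32bit address
on both little endian and big endian hosts.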
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
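The same layout assumptions can be checked from userspace against the C
library's siginfo_t. This is a rough analogue of the native checks only,
assuming a Linux libc whose siginfo_t mirrors the kernel layout;
FIELD_SIZE() and siginfo_layout_ok() are hypothetical names introduced
for the sketch.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>

/* Hypothetical userspace analogue of the kernel's sizeof_field(). */
#define FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* Returns 1 if a pointer written at si_pid shows up in si_addr. */
int siginfo_layout_ok(void)
{
	/* si_pid and si_addr must start at the same offset. */
	if (offsetof(siginfo_t, si_pid) != offsetof(siginfo_t, si_addr))
		return 0;

	/* 32bit native: the pointer fits exactly in si_pid. */
	if (sizeof(int) == sizeof(void *))
		return FIELD_SIZE(siginfo_t, si_pid) == sizeof(void *);

	/* 64bit native: si_pid and si_uid together hold the pointer,
	 * and si_uid follows si_pid without padding. */
	return FIELD_SIZE(siginfo_t, si_pid) +
	       FIELD_SIZE(siginfo_t, si_uid) == sizeof(void *) &&
	       offsetof(siginfo_t, si_pid) +
	       FIELD_SIZE(siginfo_t, si_pid) == offsetof(siginfo_t, si_uid);
}
```

Unlike BUILD_BUG_ON this runs at run time, so a layout mismatch gives a
0 return value instead of a build failure.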
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace, as the 32bit address will be encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at a higher
+ * address than they appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for its completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situation takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_ptr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue cannot be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Have kill_pid_usb_asyncio
take, instead of a siginfo_t (which is difficult and error prone), 3
arguments: a signal number, an errno value, and an address encoded as
a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace, as the 32bit address will be encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at a higher
+ * address than they appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for its completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situation takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_ptr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd82d4bd21880b7c4d5f5756be435095d6ae07b5 Mon Sep 17 00:00:00 2001
From: Julien Thierry <julien.thierry(a)arm.com>
Date: Tue, 11 Jun 2019 10:38:10 +0100
Subject: [PATCH] arm64: Fix incorrect irqflag restore for priority masking
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired locations due to the
state of PSR.I upon flag saving [1].
In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_PSR_I_SET) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence over the status of interrupt masking via PMR.
GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value> as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
require PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.
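The "does not mask more interrupts" property can be checked with a small
userspace model. This is a sketch under assumptions: the DEMO_* values
below are illustrative and are not taken from this patch, and the helper
names are hypothetical. The only GIC behaviour relied on is that an
interrupt is signaled when its priority value is numerically lower than
PMR, so setting an extra bit in PMR can only raise the threshold.

```c
#include <stdint.h>

/* Illustrative values, assumed for this sketch only. */
#define DEMO_PRIO_IRQON		0xc0u
#define DEMO_PRIO_IRQOFF	(DEMO_PRIO_IRQON & ~0x80u)
#define DEMO_PRIO_PSR_I_SET	(1u << 4)

/* GIC rule: an interrupt is signaled only if its priority value is
 * numerically lower than the value held in PMR. */
int demo_irq_signaled(uint8_t irq_prio, uint8_t pmr)
{
	return irq_prio < pmr;
}

/* Returns 1 if OR-ing the flag bit into pmr masks no interrupt that
 * pmr alone would have let through, checked over all priorities. */
int demo_flag_masks_nothing_extra(uint8_t pmr)
{
	uint8_t flagged = (uint8_t)(pmr | DEMO_PRIO_PSR_I_SET);

	for (unsigned int prio = 0; prio < 256; prio++) {
		if (demo_irq_signaled((uint8_t)prio, pmr) &&
		    !demo_irq_signaled((uint8_t)prio, flagged))
			return 0;
	}
	return 1;
}
```

The check holds for any pmr value, since (pmr | bit) is never smaller
than pmr.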
[1] https://www.spinics.net/lists/arm-kernel/msg716956.html
Fixes: 4a503217ce37 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Cc: <stable(a)vger.kernel.org> # 5.1.x-
Reported-by: Zenghui Yu <yuzenghui(a)huawei.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Wei Li <liwei391(a)huawei.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Christoffer Dall <christoffer.dall(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose(a)arm.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Julien Thierry <julien.thierry(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 14b41ddc68ba..9e991b628706 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -163,7 +163,9 @@ static inline bool gic_prio_masking_enabled(void)
static inline void gic_pmr_mask_irqs(void)
{
- BUILD_BUG_ON(GICD_INT_DEF_PRI <= GIC_PRIO_IRQOFF);
+ BUILD_BUG_ON(GICD_INT_DEF_PRI < (GIC_PRIO_IRQOFF |
+ GIC_PRIO_PSR_I_SET));
+ BUILD_BUG_ON(GICD_INT_DEF_PRI >= GIC_PRIO_IRQON);
gic_write_pmr(GIC_PRIO_IRQOFF);
}
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index db452aa9e651..f93204f319da 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -18,6 +18,7 @@
#include <linux/irqflags.h>
+#include <asm/arch_gicv3.h>
#include <asm/cpufeature.h>
#define DAIF_PROCCTX 0
@@ -32,6 +33,11 @@ static inline void local_daif_mask(void)
:
:
: "memory");
+
+ /* Don't really care for a dsb here, we don't intend to enable IRQs */
+ if (system_uses_irq_prio_masking())
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+
trace_hardirqs_off();
}
@@ -43,7 +49,7 @@ static inline unsigned long local_daif_save(void)
if (system_uses_irq_prio_masking()) {
/* If IRQs are masked with PMR, reflect it in the flags */
- if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
+ if (read_sysreg_s(SYS_ICC_PMR_EL1) != GIC_PRIO_IRQON)
flags |= PSR_I_BIT;
}
@@ -59,36 +65,44 @@ static inline void local_daif_restore(unsigned long flags)
if (!irq_disabled) {
trace_hardirqs_on();
- if (system_uses_irq_prio_masking())
- arch_local_irq_enable();
- } else if (!(flags & PSR_A_BIT)) {
- /*
- * If interrupts are disabled but we can take
- * asynchronous errors, we can take NMIs
- */
if (system_uses_irq_prio_masking()) {
- flags &= ~PSR_I_BIT;
+ gic_write_pmr(GIC_PRIO_IRQON);
+ dsb(sy);
+ }
+ } else if (system_uses_irq_prio_masking()) {
+ u64 pmr;
+
+ if (!(flags & PSR_A_BIT)) {
/*
- * There has been concern that the write to daif
- * might be reordered before this write to PMR.
- * From the ARM ARM DDI 0487D.a, section D1.7.1
- * "Accessing PSTATE fields":
- * Writes to the PSTATE fields have side-effects on
- * various aspects of the PE operation. All of these
- * side-effects are guaranteed:
- * - Not to be visible to earlier instructions in
- * the execution stream.
- * - To be visible to later instructions in the
- * execution stream
- *
- * Also, writes to PMR are self-synchronizing, so no
- * interrupts with a lower priority than PMR is signaled
- * to the PE after the write.
- *
- * So we don't need additional synchronization here.
+ * If interrupts are disabled but we can take
+ * asynchronous errors, we can take NMIs
*/
- arch_local_irq_disable();
+ flags &= ~PSR_I_BIT;
+ pmr = GIC_PRIO_IRQOFF;
+ } else {
+ pmr = GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET;
}
+
+ /*
+ * There has been concern that the write to daif
+ * might be reordered before this write to PMR.
+ * From the ARM ARM DDI 0487D.a, section D1.7.1
+ * "Accessing PSTATE fields":
+ * Writes to the PSTATE fields have side-effects on
+ * various aspects of the PE operation. All of these
+ * side-effects are guaranteed:
+ * - Not to be visible to earlier instructions in
+ * the execution stream.
+ * - To be visible to later instructions in the
+ * execution stream
+ *
+ * Also, writes to PMR are self-synchronizing, so no
+ * interrupts with a lower priority than PMR is signaled
+ * to the PE after the write.
+ *
+ * So we don't need additional synchronization here.
+ */
+ gic_write_pmr(pmr);
}
write_sysreg(flags, daif);
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index fbe1aba6ffb3..a1372722f12e 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -67,43 +67,46 @@ static inline void arch_local_irq_disable(void)
*/
static inline unsigned long arch_local_save_flags(void)
{
- unsigned long daif_bits;
unsigned long flags;
- daif_bits = read_sysreg(daif);
-
- /*
- * The asm is logically equivalent to:
- *
- * if (system_uses_irq_prio_masking())
- * flags = (daif_bits & PSR_I_BIT) ?
- * GIC_PRIO_IRQOFF :
- * read_sysreg_s(SYS_ICC_PMR_EL1);
- * else
- * flags = daif_bits;
- */
asm volatile(ALTERNATIVE(
- "mov %0, %1\n"
- "nop\n"
- "nop",
- __mrs_s("%0", SYS_ICC_PMR_EL1)
- "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
- "csel %0, %0, %2, eq",
- ARM64_HAS_IRQ_PRIO_MASKING)
- : "=&r" (flags), "+r" (daif_bits)
- : "r" ((unsigned long) GIC_PRIO_IRQOFF)
- : "cc", "memory");
+ "mrs %0, daif",
+ __mrs_s("%0", SYS_ICC_PMR_EL1),
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (flags)
+ :
+ : "memory");
return flags;
}
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+ int res;
+
+ asm volatile(ALTERNATIVE(
+ "and %w0, %w1, #" __stringify(PSR_I_BIT),
+ "eor %w0, %w1, #" __stringify(GIC_PRIO_IRQON),
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (res)
+ : "r" ((int) flags)
+ : "memory");
+
+ return res;
+}
+
static inline unsigned long arch_local_irq_save(void)
{
unsigned long flags;
flags = arch_local_save_flags();
- arch_local_irq_disable();
+ /*
+ * There are too many states with IRQs disabled, just keep the current
+ * state if interrupts are already disabled/masked.
+ */
+ if (!arch_irqs_disabled_flags(flags))
+ arch_local_irq_disable();
return flags;
}
@@ -124,21 +127,5 @@ static inline void arch_local_irq_restore(unsigned long flags)
: "memory");
}
-static inline int arch_irqs_disabled_flags(unsigned long flags)
-{
- int res;
-
- asm volatile(ALTERNATIVE(
- "and %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
- "nop",
- "cmp %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
- "cset %w0, ls",
- ARM64_HAS_IRQ_PRIO_MASKING)
- : "=&r" (res)
- : "r" ((int) flags)
- : "cc", "memory");
-
- return res;
-}
#endif
#endif
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4bcd9c1291d5..33410635b015 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -608,11 +608,12 @@ static inline void kvm_arm_vhe_guest_enter(void)
* will not signal the CPU of interrupts of lower priority, and the
* only way to get out will be via guest exceptions.
* Naturally, we want to avoid this.
+ *
+ * local_daif_mask() already sets GIC_PRIO_PSR_I_SET, we just need a
+ * dsb to ensure the redistributor is forwards EL2 IRQs to the CPU.
*/
- if (system_uses_irq_prio_masking()) {
- gic_write_pmr(GIC_PRIO_IRQON);
+ if (system_uses_irq_prio_masking())
dsb(sy);
- }
}
static inline void kvm_arm_vhe_guest_exit(void)
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index b2de32939ada..da2242248466 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -35,9 +35,15 @@
* means masking more IRQs (or at least that the same IRQs remain masked).
*
* To mask interrupts, we clear the most significant bit of PMR.
+ *
+ * Some code sections either automatically switch back to PSR.I or explicitly
+ * require to not use priority masking. If bit GIC_PRIO_PSR_I_SET is included
+ * in the the priority mask, it indicates that PSR.I should be set and
+ * interrupt disabling temporarily does not rely on IRQ priorities.
*/
-#define GIC_PRIO_IRQON 0xf0
-#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+#define GIC_PRIO_IRQON 0xc0
+#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+#define GIC_PRIO_PSR_I_SET (1 << 4)
/* Additional SPSR bits not exposed in the UABI */
#define PSR_IL_BIT (1 << 20)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d5966346710..165da78815c5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -258,6 +258,7 @@ alternative_else_nop_endif
/*
* Registers that may be useful after this macro is invoked:
*
+ * x20 - ICC_PMR_EL1
* x21 - aborted SP
* x22 - aborted PC
* x23 - aborted PSTATE
@@ -449,6 +450,24 @@ alternative_endif
.endm
#endif
+ .macro gic_prio_kentry_setup, tmp:req
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ mov \tmp, #(GIC_PRIO_PSR_I_SET | GIC_PRIO_IRQON)
+ msr_s SYS_ICC_PMR_EL1, \tmp
+ alternative_else_nop_endif
+#endif
+ .endm
+
+ .macro gic_prio_irq_setup, pmr:req, tmp:req
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ orr \tmp, \pmr, #GIC_PRIO_PSR_I_SET
+ msr_s SYS_ICC_PMR_EL1, \tmp
+ alternative_else_nop_endif
+#endif
+ .endm
+
.text
/*
@@ -627,6 +646,7 @@ el1_dbg:
cmp x24, #ESR_ELx_EC_BRK64 // if BRK64
cinc x24, x24, eq // set bit '0'
tbz x24, #0, el1_inv // EL1 only
+ gic_prio_kentry_setup tmp=x3
mrs x0, far_el1
mov x2, sp // struct pt_regs
bl do_debug_exception
@@ -644,12 +664,10 @@ ENDPROC(el1_sync)
.align 6
el1_irq:
kernel_entry 1
+ gic_prio_irq_setup pmr=x20, tmp=x1
enable_da_f
#ifdef CONFIG_ARM64_PSEUDO_NMI
-alternative_if ARM64_HAS_IRQ_PRIO_MASKING
- ldr x20, [sp, #S_PMR_SAVE]
-alternative_else_nop_endif
test_irqs_unmasked res=x0, pmr=x20
cbz x0, 1f
bl asm_nmi_enter
@@ -679,8 +697,9 @@ alternative_else_nop_endif
#ifdef CONFIG_ARM64_PSEUDO_NMI
/*
- * if IRQs were disabled when we received the interrupt, we have an NMI
- * and we are not re-enabling interrupt upon eret. Skip tracing.
+ * When using IRQ priority masking, we can get spurious interrupts while
+ * PMR is set to GIC_PRIO_IRQOFF. An NMI might also have occurred in a
+ * section with interrupts disabled. Skip tracing in those cases.
*/
test_irqs_unmasked res=x0, pmr=x20
cbz x0, 1f
@@ -809,6 +828,7 @@ el0_ia:
* Instruction abort handling
*/
mrs x26, far_el1
+ gic_prio_kentry_setup tmp=x0
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -854,6 +874,7 @@ el0_sp_pc:
* Stack or PC alignment exception handling
*/
mrs x26, far_el1
+ gic_prio_kentry_setup tmp=x0
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -888,6 +909,7 @@ el0_dbg:
* Debug exception handling
*/
tbnz x24, #0, el0_inv // EL0 only
+ gic_prio_kentry_setup tmp=x3
mrs x0, far_el1
mov x1, x25
mov x2, sp
@@ -909,7 +931,9 @@ ENDPROC(el0_sync)
el0_irq:
kernel_entry 0
el0_irq_naked:
+ gic_prio_irq_setup pmr=x20, tmp=x0
enable_da_f
+
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
@@ -931,6 +955,7 @@ ENDPROC(el0_irq)
el1_error:
kernel_entry 1
mrs x1, esr_el1
+ gic_prio_kentry_setup tmp=x2
enable_dbg
mov x0, sp
bl do_serror
@@ -941,6 +966,7 @@ el0_error:
kernel_entry 0
el0_error_naked:
mrs x1, esr_el1
+ gic_prio_kentry_setup tmp=x2
enable_dbg
mov x0, sp
bl do_serror
@@ -965,6 +991,7 @@ work_pending:
*/
ret_to_user:
disable_daif
+ gic_prio_kentry_setup tmp=x3
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
@@ -981,6 +1008,7 @@ ENDPROC(ret_to_user)
*/
.align 6
el0_svc:
+ gic_prio_kentry_setup tmp=x1
mov x0, sp
bl el0_svc_handler
b ret_to_user
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 3767fb21a5b8..58efc3727778 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -94,7 +94,7 @@ static void __cpu_do_idle_irqprio(void)
* be raised.
*/
pmr = gic_read_pmr();
- gic_write_pmr(GIC_PRIO_IRQON);
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
__cpu_do_idle();
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index bb4b3f07761a..4deaee3c2a33 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -192,11 +192,13 @@ static void init_gic_priority_masking(void)
WARN_ON(!(cpuflags & PSR_I_BIT));
- gic_write_pmr(GIC_PRIO_IRQOFF);
-
/* We can only unmask PSR.I if we can take aborts */
- if (!(cpuflags & PSR_A_BIT))
+ if (!(cpuflags & PSR_A_BIT)) {
+ gic_write_pmr(GIC_PRIO_IRQOFF);
write_sysreg(cpuflags & ~PSR_I_BIT, daif);
+ } else {
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+ }
}
/*
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8799e0c267d4..b89fcf0173b7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -615,7 +615,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
* Naturally, we want to avoid this.
*/
if (system_uses_irq_prio_masking()) {
- gic_write_pmr(GIC_PRIO_IRQON);
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
dsb(sy);
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 19 Jul 2019 18:41:10 +0200
Subject: [PATCH] KVM: nVMX: do not use dangling shadow VMCS after guest reset
If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01. However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS, which has since been freed.
This causes a vmptrld of a NULL pointer on my machine, but Jan reports
that the host hangs altogether. Let's see how much this trivial patch fixes.
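The failure mode can be pictured with a heavily reduced user-space
model. Everything below is a hypothetical sketch: struct nested_model
and its helpers are stand-ins invented for illustration, not KVM code.
It only shows why clearing the pending-sync flag inside the "disable
shadow VMCS" step keeps a later run from touching a freed shadow VMCS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the nested state in struct vcpu_vmx. */
struct nested_model {
	void *shadow_vmcs;                /* NULL once freed */
	bool shadow_enabled;              /* models the execution control bit */
	bool need_vmcs12_to_shadow_sync;  /* pending vmcs12 -> shadow copy */
};

/* Models vmx_disable_shadow_vmcs() after this patch: drop the pending sync too. */
static void disable_shadow_vmcs(struct nested_model *n)
{
	n->shadow_enabled = false;
	n->need_vmcs12_to_shadow_sync = false;
}

/* Models a guest reset: shadowing is disabled and the shadow VMCS goes away. */
static void reset_nested(struct nested_model *n)
{
	disable_shadow_vmcs(n);
	n->shadow_vmcs = NULL;
}

/* Models the decision taken on the next run: would a copy be attempted? */
static bool would_sync_on_run(const struct nested_model *n)
{
	return n->need_vmcs12_to_shadow_sync;
}
```

The WARN_ON(!shadow_vmcs) checks added by the patch act as a second
line of defence in case another path leaves the flag set.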
Reported-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Liran Alon <liran.alon(a)oracle.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4f23e34f628b..0f1378789bd0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER, -1ull);
+ vmx->nested.need_vmcs12_to_shadow_sync = false;
}
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
@@ -1341,6 +1342,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
unsigned long val;
int i;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
preempt_disable();
vmcs_load(shadow_vmcs);
@@ -1373,6 +1377,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
unsigned long val;
int i, q;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
vmcs_load(shadow_vmcs);
for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -4436,7 +4443,6 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
- vmx->nested.need_vmcs12_to_shadow_sync = false;
vmx_disable_shadow_vmcs(vmx);
}
vmx->nested.posted_intr_nv = -1;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
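The condition added to both MSR handlers can be read as a single
predicate. The sketch below flattens it into plain booleans; struct
xss_access and xss_access_rejected() are hypothetical names, and the
fields stand in for vmx_xsaves_supported(), msr_info->host_initiated
and the two guest_cpuid_has() calls in the diff.

```c
#include <stdbool.h>

/* Hypothetical flattened inputs to the IA32_XSS access check. */
struct xss_access {
	bool host_xsaves_supported;  /* vmx_xsaves_supported() */
	bool host_initiated;         /* msr_info->host_initiated */
	bool guest_has_xsave;        /* guest CPUID: X86_FEATURE_XSAVE */
	bool guest_has_xsaves;       /* guest CPUID: X86_FEATURE_XSAVES */
};

/* Returns true when the handler would return 1, i.e. reject the access. */
static bool xss_access_rejected(const struct xss_access *a)
{
	return !a->host_xsaves_supported ||
	       (!a->host_initiated &&
		!(a->guest_has_xsave && a->guest_has_xsaves));
}
```

Host-initiated accesses bypass the guest CPUID check, so userspace can
still save and restore the MSR regardless of the guest's CPUID bits.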
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From beb8d93b3e423043e079ef3dda19dad7b28467a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Fri, 19 Apr 2019 22:50:55 -0700
Subject: [PATCH] KVM: VMX: Fix handling of #MC that occurs during VM-Entry
A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertently blocked KVM's handling of machine checks
that occur during VM-Entry.
Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognized. As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.
If a machine-check event occurs during a VM entry, one of the following occurs:
- The machine-check event is handled as if it occurred before the VM entry:
...
- The machine-check event is handled after VM entry completes:
...
- A VM-entry failure occurs as described in Section 26.7. The basic
exit reason is 41, for "VM-entry failure due to machine-check event".
Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
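The reason the patch compares (u16)vmx->exit_reason rather than the
full 32-bit value can be seen in a short sketch. The basic exit reason
41 and the use of bit 31 come from the SDM text quoted above; the
helper names is_mce_during_vmentry() and is_failed_vmentry() are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_MCE_DURING_VMENTRY  41           /* basic exit reason */
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u  /* bit 31 */

/* Compare only the low 16 bits, which hold the basic exit reason. */
static bool is_mce_during_vmentry(uint32_t exit_reason)
{
	return (uint16_t)exit_reason == EXIT_REASON_MCE_DURING_VMENTRY;
}

/* Bit 31 flags a VM-Entry failure. */
static bool is_failed_vmentry(uint32_t exit_reason)
{
	return (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) != 0;
}
```

An exit reason of 0x80000029 therefore matches both helpers, whereas a
plain comparison of the full value against 41 would miss it. This is
why the machine check has to be handled before the early return on
failed VM-Entry in vmx_vcpu_run().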
Fixes: b060ca3b2e9e7 ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d903f8909d1..1b3ca0582a0c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,28 +6107,21 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
{
- u32 exit_intr_info = 0;
- u16 basic_exit_reason = (u16)vmx->exit_reason;
-
- if (!(basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
- || basic_exit_reason == EXIT_REASON_EXCEPTION_NMI))
+ if (vmx->exit_reason != EXIT_REASON_EXCEPTION_NMI)
return;
- if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- vmx->exit_intr_info = exit_intr_info;
+ vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* if exit due to PF check for async PF */
- if (is_page_fault(exit_intr_info))
+ if (is_page_fault(vmx->exit_intr_info))
vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
/* Handle machine checks before interrupts are enabled */
- if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
- is_machine_check(exit_intr_info))
+ if (is_machine_check(vmx->exit_intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- if (is_nmi(exit_intr_info)) {
+ if (is_nmi(vmx->exit_intr_info)) {
kvm_before_interrupt(&vmx->vcpu);
asm("int $2");
kvm_after_interrupt(&vmx->vcpu);
@@ -6535,6 +6528,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+ if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
+ kvm_machine_check();
+
if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
return;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From beb8d93b3e423043e079ef3dda19dad7b28467a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Fri, 19 Apr 2019 22:50:55 -0700
Subject: [PATCH] KVM: VMX: Fix handling of #MC that occurs during VM-Entry
A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertently blocked KVM's handling of machine checks
that occur during VM-Entry.
Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognized. As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.
If a machine-check event occurs during a VM entry, one of the following occurs:
- The machine-check event is handled as if it occurred before the VM entry:
...
- The machine-check event is handled after VM entry completes:
...
- A VM-entry failure occurs as described in Section 26.7. The basic
exit reason is 41, for "VM-entry failure due to machine-check event".
Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
Fixes: b060ca3b2e9e7 ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d903f8909d1..1b3ca0582a0c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,28 +6107,21 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
{
- u32 exit_intr_info = 0;
- u16 basic_exit_reason = (u16)vmx->exit_reason;
-
- if (!(basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
- || basic_exit_reason == EXIT_REASON_EXCEPTION_NMI))
+ if (vmx->exit_reason != EXIT_REASON_EXCEPTION_NMI)
return;
- if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- vmx->exit_intr_info = exit_intr_info;
+ vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* if exit due to PF check for async PF */
- if (is_page_fault(exit_intr_info))
+ if (is_page_fault(vmx->exit_intr_info))
vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
/* Handle machine checks before interrupts are enabled */
- if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
- is_machine_check(exit_intr_info))
+ if (is_machine_check(vmx->exit_intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- if (is_nmi(exit_intr_info)) {
+ if (is_nmi(vmx->exit_intr_info)) {
kvm_before_interrupt(&vmx->vcpu);
asm("int $2");
kvm_after_interrupt(&vmx->vcpu);
@@ -6535,6 +6528,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+ if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
+ kvm_machine_check();
+
if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
return;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3b013a2972d5bc344d6eaa8f24fdfe268211e45f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:28 -0700
Subject: [PATCH] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from
vmcs01
If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must
be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS
when MPX is supported. Because the value effectively comes from vmcs01,
vmcs02 must be updated even if vmcs12 is clean.
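The choice of where GUEST_BNDCFGS comes from can be sketched as a
two-input decision. The enum and pick_bndcfgs() below are hypothetical
names; the two booleans stand in for vmx->nested.nested_run_pending
and the VM_ENTRY_LOAD_BNDCFGS bit in vmcs12->vm_entry_controls, and
MPX support is assumed to be present.

```c
#include <stdbool.h>

/* Hypothetical labels for the two possible sources of GUEST_BNDCFGS. */
enum bndcfgs_source {
	BNDCFGS_FROM_VMCS12,  /* value supplied by L1 for L2 */
	BNDCFGS_FROM_VMCS01,  /* L1's own value, saved at nested entry */
};

/* Assumes kvm_mpx_supported() is true; otherwise nothing is written. */
static enum bndcfgs_source pick_bndcfgs(bool nested_run_pending,
					bool l1_sets_load_bndcfgs)
{
	if (nested_run_pending && l1_sets_load_bndcfgs)
		return BNDCFGS_FROM_VMCS12;
	return BNDCFGS_FROM_VMCS01;
}
```

The patch keeps the vmcs12 case in prepare_vmcs02_rare(), which may be
skipped when vmcs12 is clean, and moves the vmcs01 case into
prepare_vmcs02() so that it is written on every nested entry.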
Fixes: 62cf9bd8118c4 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
Cc: stable(a)vger.kernel.org
Cc: Liran Alon <liran.alon(a)oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6e82bbca2fe1..c4c0a45245b2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2228,13 +2228,9 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
set_cr4_guest_host_mask(vmx);
- if (kvm_mpx_supported()) {
- if (vmx->nested.nested_run_pending &&
- (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
- vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
- else
- vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
- }
+ if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
+ (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
/*
@@ -2266,6 +2262,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.vmcs01_debugctl);
}
+ if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
+ vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
vmx_set_rflags(vcpu, vmcs12->guest_rflags);
/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
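As a rough userspace sketch of the ordering the patch establishes: validate first, then decide whether the value is consumed. The validity check below follows the architectural rule that each PAT byte must hold memory type 0, 1, 4, 5, 6 or 7; struct pat_vcpu_sketch, set_pat_sketch() and the hw_loads_pat field are hypothetical names, not KVM's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each of the eight PAT entries occupies one byte. */
static bool pat_valid_sketch(uint64_t data)
{
	/* Bits 3..7 of every byte are reserved and must be zero. */
	if (data & 0xF8F8F8F8F8F8F8F8ull)
		return false;
	/* Types 2 and 3 are reserved: bit 1 set requires bit 2 set. */
	return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}

/* Hypothetical stand-in for the vCPU state touched by the write. */
struct pat_vcpu_sketch {
	bool hw_loads_pat;        /* stands in for VM_ENTRY_LOAD_IA32_PAT */
	uint64_t pat;             /* software copy of the guest PAT */
	uint64_t vmcs_guest_pat;  /* value the hardware would load */
};

/* Returns 1 when the caller should signal #GP, 0 on success. */
static int set_pat_sketch(struct pat_vcpu_sketch *v, uint64_t data)
{
	/* Validation does not depend on whether the value is consumed. */
	if (!pat_valid_sketch(data))
		return 1;

	if (v->hw_loads_pat)
		v->vmcs_guest_pat = data;
	v->pat = data;
	return 0;
}
```

With this ordering a bad value is rejected the same way on hosts with and without the PAT-load control, which is the point the commit message makes.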
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
The pin configuration was assigned to the wrong fixup model, so the headphone mic does not work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
The pin configuration was assigned to the wrong fixup model, so the headphone mic does not work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
The pin configuration was assigned to the wrong fixup model, so the headphone mic does not work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8e2442a5f86e1f77b86401fce274a7f622740bc4 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Date: Fri, 12 Jul 2019 15:07:09 +0900
Subject: [PATCH] kconfig: fix missing choice values in auto.conf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably similar to the situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
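The double-write guard described in this paragraph can be sketched as follows. The two flag values are taken from the expr.h hunk of this patch; struct sym_sketch, write_config_sketch() and the "menu" array of repeated declarations are hypothetical simplifications, not the real Kconfig data structures.

```c
#include <stdio.h>

#define SYMBOL_WRITE   0x0200  /* symbol should be written to the file */
#define SYMBOL_WRITTEN 0x0800  /* symbol has already been written */

/* Hypothetical, much reduced stand-in for struct symbol. */
struct sym_sketch {
	const char *name;
	unsigned int flags;
};

/*
 * Walk a menu in which the same symbol may be declared more than once.
 * Returns the number of lines written.
 */
static int write_config_sketch(FILE *out, struct sym_sketch **menu, int n)
{
	int written = 0;

	for (int i = 0; i < n; i++) {
		struct sym_sketch *sym = menu[i];

		/* Second and later declarations of a symbol are skipped. */
		if (sym->flags & SYMBOL_WRITTEN)
			continue;
		if (!(sym->flags & SYMBOL_WRITE))
			continue;

		/* Record the write without destroying SYMBOL_WRITE. */
		sym->flags |= SYMBOL_WRITTEN;
		fprintf(out, "CONFIG_%s=y\n", sym->name);
		written++;
	}
	return written;
}
```

Because SYMBOL_WRITE is left intact, a later pass (such as the auto.conf writer) can still see which symbols are meant to be written without recalculating them.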
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI. To reproduce it, just
add a new symbol to drivers/usb/gadget/legacy/Kconfig, then touch
unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since the problem is more fatal than before. However,
it has been broken for a long time, and still is.
We should take a closer look to fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 501fdcc5e999..1134892599da 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -895,7 +895,8 @@ int conf_write(const char *name)
"# %s\n"
"#\n", str);
need_newline = false;
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
@@ -903,7 +904,7 @@ int conf_write(const char *name)
fprintf(out, "\n");
need_newline = false;
}
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1063,8 +1064,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_touch_deps())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 8dde65bc3165..017843c9a4f4 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8e2442a5f86e1f77b86401fce274a7f622740bc4 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Date: Fri, 12 Jul 2019 15:07:09 +0900
Subject: [PATCH] kconfig: fix missing choice values in auto.conf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably similar to the situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI. To reproduce it, just
add a new symbol to drivers/usb/gadget/legacy/Kconfig, then touch
unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since the problem is more fatal than before. However,
it has been broken for a long time, and still is.
We should take a closer look to fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 501fdcc5e999..1134892599da 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -895,7 +895,8 @@ int conf_write(const char *name)
"# %s\n"
"#\n", str);
need_newline = false;
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
@@ -903,7 +904,7 @@ int conf_write(const char *name)
fprintf(out, "\n");
need_newline = false;
}
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1063,8 +1064,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_touch_deps())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 8dde65bc3165..017843c9a4f4 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d9771f5ec46c282d518b453c793635dbdc3a2a94 Mon Sep 17 00:00:00 2001
From: Xiao Ni <xni(a)redhat.com>
Date: Fri, 14 Jun 2019 15:41:05 -0700
Subject: [PATCH] raid5-cache: Need to do start() part job after adding journal
device
Commit d5d885fd514f ("md: introduce new personality funciton start()")
splits the init job into two parts. The first part, run(), does the jobs that
do not require the md threads. The second part, start(), does the jobs that
require the md threads.
Currently, adding a new journal device only does the run() part. It needs to
do the second part, start(), too.
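A minimal sketch of the two-phase pattern, assuming a journal that must be initialised and then started, with either step able to fail. All names here (log_sketch, log_init_sketch(), log_start_sketch(), add_journal_sketch()) and the injected error arguments are hypothetical; only the "init, then start, propagate either error" shape reflects the patch.

```c
/* Hypothetical stand-in for the journal log state. */
struct log_sketch {
	int init_done;
	int started;
};

/* Phase one: setup that does not need worker threads. */
static int log_init_sketch(struct log_sketch *log, int injected_err)
{
	if (injected_err)
		return injected_err;
	log->init_done = 1;
	return 0;
}

/* Phase two: work that does need worker threads. */
static int log_start_sketch(struct log_sketch *log, int injected_err)
{
	if (injected_err)
		return injected_err;
	log->started = 1;
	return 0;
}

/* Adding a journal device must run both phases and report failures. */
static int add_journal_sketch(struct log_sketch *log,
			      int init_err, int start_err)
{
	int ret = log_init_sketch(log, init_err);

	if (ret)
		return ret;
	return log_start_sketch(log, start_err);
}
```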
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Cc: stable(a)vger.kernel.org #v4.9+
Reported-by: Michal Soltys <soltys(a)ziu.info>
Signed-off-by: Xiao Ni <xni(a)redhat.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b83bce2beb66..da94cbaa1a9e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7672,7 +7672,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- int err = -EEXIST;
+ int ret, err = -EEXIST;
int disk;
struct disk_info *p;
int first = 0;
@@ -7687,7 +7687,14 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
* The array is in readonly mode if journal is missing, so no
* write requests running. We should be safe
*/
- log_init(conf, rdev, false);
+ ret = log_init(conf, rdev, false);
+ if (ret)
+ return ret;
+
+ ret = r5l_start(conf->log);
+ if (ret)
+ return ret;
+
return 0;
}
if (mddev->recovery_disabled == conf->recovery_disabled)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2c928c93173f220955030e8440517b87ec7df92 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Fri, 21 Jun 2019 12:33:56 +0100
Subject: [PATCH] ASoC: core: Adapt for debugfs API change
Back in ff9fb72bc07705c (debugfs: return error values, not NULL) the
debugfs APIs were changed to return error pointers rather than NULL
pointers on error, breaking the error checking in ASoC. Update the
code to use IS_ERR() and log the codes that are returned as part of
the error messages.
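To illustrate why a NULL check misses the new failure mode, here is a userspace re-creation of the error-pointer convention. The three helpers mimic the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() but are reimplemented here for illustration; create_dir_sketch() is a hypothetical stand-in for a debugfs call.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the range reserved for error codes. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical creator that fails by returning an error pointer. */
static void *create_dir_sketch(int fail)
{
	static int dir;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&dir;
}
```

A failed call returns a non-NULL pointer, so an `if (!ptr)` test treats it as success; only IS_ERR() catches it, and PTR_ERR() yields the code to log.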
Fixes: ff9fb72bc07705c (debugfs: return error values, not NULL)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 9138fcb15cd3..6aeba0d66ec5 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -158,9 +158,10 @@ static void soc_init_component_debugfs(struct snd_soc_component *component)
component->card->debugfs_card_root);
}
- if (!component->debugfs_root) {
+ if (IS_ERR(component->debugfs_root)) {
dev_warn(component->dev,
- "ASoC: Failed to create component debugfs directory\n");
+ "ASoC: Failed to create component debugfs directory: %ld\n",
+ PTR_ERR(component->debugfs_root));
return;
}
@@ -212,18 +213,21 @@ static void soc_init_card_debugfs(struct snd_soc_card *card)
card->debugfs_card_root = debugfs_create_dir(card->name,
snd_soc_debugfs_root);
- if (!card->debugfs_card_root) {
+ if (IS_ERR(card->debugfs_card_root)) {
dev_warn(card->dev,
- "ASoC: Failed to create card debugfs directory\n");
+ "ASoC: Failed to create card debugfs directory: %ld\n",
+ PTR_ERR(card->debugfs_card_root));
+ card->debugfs_card_root = NULL;
return;
}
card->debugfs_pop_time = debugfs_create_u32("dapm_pop_time", 0644,
card->debugfs_card_root,
&card->pop_time);
- if (!card->debugfs_pop_time)
+ if (IS_ERR(card->debugfs_pop_time))
dev_warn(card->dev,
- "ASoC: Failed to create pop time debugfs file\n");
+ "ASoC: Failed to create pop time debugfs file: %ld\n",
+ PTR_ERR(card->debugfs_pop_time));
}
static void soc_cleanup_card_debugfs(struct snd_soc_card *card)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2c928c93173f220955030e8440517b87ec7df92 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Fri, 21 Jun 2019 12:33:56 +0100
Subject: [PATCH] ASoC: core: Adapt for debugfs API change
Back in ff9fb72bc07705c (debugfs: return error values, not NULL) the
debugfs APIs were changed to return error pointers rather than NULL
pointers on error, breaking the error checking in ASoC. Update the
code to use IS_ERR() and log the codes that are returned as part of
the error messages.
Fixes: ff9fb72bc07705c (debugfs: return error values, not NULL)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 9138fcb15cd3..6aeba0d66ec5 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -158,9 +158,10 @@ static void soc_init_component_debugfs(struct snd_soc_component *component)
component->card->debugfs_card_root);
}
- if (!component->debugfs_root) {
+ if (IS_ERR(component->debugfs_root)) {
dev_warn(component->dev,
- "ASoC: Failed to create component debugfs directory\n");
+ "ASoC: Failed to create component debugfs directory: %ld\n",
+ PTR_ERR(component->debugfs_root));
return;
}
@@ -212,18 +213,21 @@ static void soc_init_card_debugfs(struct snd_soc_card *card)
card->debugfs_card_root = debugfs_create_dir(card->name,
snd_soc_debugfs_root);
- if (!card->debugfs_card_root) {
+ if (IS_ERR(card->debugfs_card_root)) {
dev_warn(card->dev,
- "ASoC: Failed to create card debugfs directory\n");
+ "ASoC: Failed to create card debugfs directory: %ld\n",
+ PTR_ERR(card->debugfs_card_root));
+ card->debugfs_card_root = NULL;
return;
}
card->debugfs_pop_time = debugfs_create_u32("dapm_pop_time", 0644,
card->debugfs_card_root,
&card->pop_time);
- if (!card->debugfs_pop_time)
+ if (IS_ERR(card->debugfs_pop_time))
dev_warn(card->dev,
- "ASoC: Failed to create pop time debugfs file\n");
+ "ASoC: Failed to create pop time debugfs file: %ld\n",
+ PTR_ERR(card->debugfs_pop_time));
}
static void soc_cleanup_card_debugfs(struct snd_soc_card *card)
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: d03360aaf5cc pNFS: Ensure we return the error if someone kills a waiting layoutget.
The bot has tested the following trees: v5.2.1, v5.1.18, v4.19.59.
v5.2.1: Build OK!
v5.1.18: Build OK!
v4.19.59: Failed to apply! Possible dependencies:
400417b05f3e ("pNFS: Fix a typo in pnfs_update_layout")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal code
btree_flush_write(), which results in very rare and strange deadlocks or
panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the cache
device; this select-and-flush is a two-step operation. Between these two
steps, the following may happen inside the race window:
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it first
takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree node
whose write_lock is already held, a deadlock will very probably
happen here. A worse case is that the memory of the selected btree node has
been released; then any reference to this btree node (e.g. b->write_lock)
will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, configured with 1 NVMe SSD as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed about once in every 10,000 calls to
btree_flush_write(). Both deadlock and kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, such a race is avoided.
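For readers who want to see the shape of the flag protocol outside the
kernel, here is a minimal user-space sketch. It is an illustration under
stated assumptions, not the bcache code: struct node, node_init(),
select_for_flush(), flush_selected() and try_reap() are hypothetical
names, a pthread mutex stands in for b->write_lock and c->bucket_lock,
and an atomic_bool stands in for the BTREE_NODE_journal_flush bit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct btree; not the kernel type. */
struct node {
	pthread_mutex_t write_lock;	/* plays the role of b->write_lock */
	atomic_bool journal_flush;	/* plays the role of BTREE_NODE_journal_flush */
	bool dirty;
	bool freed;
};

static void node_init(struct node *n, bool dirty)
{
	pthread_mutex_init(&n->write_lock, NULL);
	atomic_init(&n->journal_flush, false);
	n->dirty = dirty;
	n->freed = false;
}

/* Selector: mark the node while the list lock is still held. */
static void select_for_flush(pthread_mutex_t *list_lock, struct node *n)
{
	pthread_mutex_lock(list_lock);
	atomic_store(&n->journal_flush, true);
	pthread_mutex_unlock(list_lock);
}

/* Flusher: write the node out, then clear the mark under write_lock. */
static void flush_selected(struct node *n)
{
	pthread_mutex_lock(&n->write_lock);
	n->dirty = false;		/* stands in for the real write */
	atomic_store(&n->journal_flush, false);
	pthread_mutex_unlock(&n->write_lock);
}

/* Reaper: one attempt. Returns false if the node is marked and must be left alone. */
static bool try_reap(struct node *n)
{
	bool ok;

	pthread_mutex_lock(&n->write_lock);
	ok = !atomic_load(&n->journal_flush);
	if (ok)
		n->freed = true;	/* stands in for releasing the memory */
	pthread_mutex_unlock(&n->write_lock);
	return ok;
}
```

In this sketch a caller that gets false from try_reap() decides what to
do next. In the diff below, both mca_reap() and btree_node_free() drop
the lock, udelay(1) and retry until the bit is clear.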
One corner case should be noted, that is btree_node_free(). It might
be called in some error handling code paths. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set,
just delay for 1 us and retry. In this case the btree node won't be
skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
and then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code btree_flush_write(), which results in very rare and strange
deadlocks or panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the
cache device; select-and-flush is a two-step operation. Between these
two steps, either of the following may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock very probably happens
here. A worse case is that the memory of the selected btree node has
been released; then any reference to this btree node (e.g.
b->write_lock) will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, configured with 1 NVMe SSD as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed about once in every 10,000 calls to
btree_flush_write(). Both deadlock and kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, such a race is avoided.
One corner case should be noted, that is btree_node_free(). It might
be called in some error handling code paths. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set,
just delay for 1 us and retry. In this case the btree node won't be
skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
and then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code btree_flush_write(), which results in very rare and strange
deadlocks or panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the
cache device; select-and-flush is a two-step operation. Between these
two steps, either of the following may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock very probably happens
here. A worse case is that the memory of the selected btree node has
been released; then any reference to this btree node (e.g.
b->write_lock) will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, configured with 1 NVMe SSD as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed about once in every 10,000 calls to
btree_flush_write(). Both deadlock and kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, such a race is avoided.
One corner case should be noted, that is btree_node_free(). It might
be called in some error handling code paths. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set,
just delay for 1 us and retry. In this case the btree node won't be
skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
and then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code btree_flush_write(), which results in very rare and strange
deadlocks or panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the
cache device; select-and-flush is a two-step operation. Between these
two steps, either of the following may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock very probably happens
here. A worse case is that the memory of the selected btree node has
been released; then any reference to this btree node (e.g.
b->write_lock) will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, configured with 1 NVMe SSD as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed about once in every 10,000 calls to
btree_flush_write(). Both deadlock and kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, such a race is avoided.
One corner case should be noted, that is btree_node_free(). It might
be called in some error handling code paths. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set,
just delay for 1 us and retry. In this case the btree node won't be
skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
and then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code btree_flush_write(), which results in very rare and strange
deadlocks or panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the
cache device; select-and-flush is a two-step operation. Between these
two steps, either of the following may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock very probably happens
here. A worse case is that the memory of the selected btree node has
been released; then any reference to this btree node (e.g.
b->write_lock) will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, configured with 1 NVMe SSD as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed about once in every 10,000 calls to
btree_flush_write(). Both deadlock and kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, such a race is avoided.
One corner case should be noted, that is btree_node_free(). It might
be called in some error handling code paths. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set,
just delay for 1 us and retry. In this case the btree node won't be
skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
and then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal code
btree_flush_write(), which results in very rare and strange deadlocks or
panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() the btree
node with the oldest journal pin is selected, then it is flushed to the
cache device; select-and-flush is a two-step operation. Between these two
steps, the following may happen inside the race window:
- The selected btree node was reaped by mca_reap() and allocated to
another requester for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it first
takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree node
whose write_lock is already held, a deadlock very probably
happens here. A worse case is that the memory of the selected btree node is
released; then any reference to this btree node (e.g. b->write_lock)
will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one over a quite long time period.
The race was not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, with 1 NVMe SSD configured as the cache device and an MD raid0
device assembled from 3 NVMe SSDs as the backing device, this race can be
observed about once every 10,000 calls to btree_flush_write(). Both
deadlock and kernel panic happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noted, namely btree_node_free(). It might
be called in some error handling code paths. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry. In this case the btree node won't
be skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
and then free the btree node memory.
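Editorial illustration, not part of the original commit: the flag protocol
described above can be sketched in plain userspace C. This is a minimal
model under stated assumptions. The names struct node, flush_select(),
flush_finish() and node_free_once() are hypothetical, the locking
(c->bucket_lock, b->write_lock) is left out, and a single check stands in
for one pass of the delay-and-retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct btree; only the fields the sketch needs. */
struct node {
	atomic_bool journal_flush;	/* models BTREE_NODE_journal_flush */
	bool dirty;
	bool freed;
};

void node_init(struct node *n)
{
	assert(n != NULL);
	atomic_init(&n->journal_flush, false);
	n->dirty = true;
	n->freed = false;
}

/* Step 1 of select-and-flush: mark the node as owned by the flusher. */
void flush_select(struct node *n)
{
	atomic_store(&n->journal_flush, true);
}

/* Step 2: write the node out, then release the mark. */
void flush_finish(struct node *n)
{
	n->dirty = false;
	atomic_store(&n->journal_flush, false);
}

/*
 * One pass of the free path. Returns false while the flusher still owns
 * the node; the caller is expected to delay briefly and call again.
 */
bool node_free_once(struct node *n)
{
	if (atomic_load(&n->journal_flush))
		return false;
	n->freed = true;
	return true;
}
```

A caller would loop on node_free_once() with a short delay between
attempts, which is the role udelay(1) plus "goto retry" plays in the patch.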
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
an error pointer for dc->writeback_thread, so that a memory leak is avoided.
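Editorial illustration, not part of the original commit: the error-path
rule behind this fix (release what was already acquired when a later step
fails) can be sketched in userspace C. Every name here is hypothetical:
queue_create() and queue_destroy() stand in for the workqueue calls,
thread_create() stands in for kthread_create(), and live_queues is only a
counter that makes a leak visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_queues;	/* number of queues currently allocated */

struct queue { int unused; };

struct dev {
	struct queue *write_wq;
	bool running;
};

static struct queue *queue_create(void)
{
	struct queue *q = malloc(sizeof(*q));

	if (q)
		live_queues++;
	return q;
}

static void queue_destroy(struct queue *q)
{
	if (q) {
		free(q);
		live_queues--;
	}
}

/* Hypothetical thread creation that fails on request. */
static int thread_create(bool fail)
{
	return fail ? -ENOMEM : 0;
}

int writeback_start(struct dev *d, bool fail_thread)
{
	int ret;

	assert(d != NULL);
	d->write_wq = queue_create();
	if (!d->write_wq)
		return -ENOMEM;

	ret = thread_create(fail_thread);
	if (ret) {
		/* Without this call the queue would leak on the error path. */
		queue_destroy(d->write_wq);
		d->write_wq = NULL;
		return ret;
	}

	d->running = true;
	return 0;
}
```

Clearing the pointer after the destroy call is a choice made for the
sketch; the patch itself only adds the destroy_workqueue() call.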
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
an error pointer for dc->writeback_thread, so that a memory leak is avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
an error pointer for dc->writeback_thread, so that a memory leak is avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
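Editorial illustration, not part of the original commit: the behaviour the
patch settles on (copy the last ciphertext block into the IV buffer only
in CBC mode, leave the IV alone otherwise) can be sketched on flat
buffers. This is a simplified model under stated assumptions. enum mode,
IV_SIZE and update_output_iv() are hypothetical; the driver itself tests
ctx->cdata.algtype against OP_ALG_AAI_MASK and copies out of a scatterlist
with scatterwalk_map_and_copy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode selector; the driver derives this from algtype. */
enum mode { MODE_CBC, MODE_CTR };

#define IV_SIZE 16	/* assumed block size for the sketch (AES) */

/*
 * ct is the ciphertext: the destination when encrypting, the source
 * when decrypting. For any mode other than CBC the IV is not touched.
 */
void update_output_iv(enum mode m, unsigned char *iv,
		      const unsigned char *ct, size_t len)
{
	if (m != MODE_CBC)
		return;

	assert(len >= IV_SIZE);
	memcpy(iv, ct + len - IV_SIZE, IV_SIZE);
}
```

Skipping the write for non-CBC modes is what keeps the IV buffer from
being modified while a neighbouring buffer is mapped for inbound DMA.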
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
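Editorial illustration, not part of the original commit: the pattern of
caching a field before ownership of the object is handed off can be
sketched in userspace C. All names are hypothetical. request_send() frees
the request on success to model that the object may be gone as soon as the
send returns, and trace_id() stands in for the trace call.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct request {
	unsigned long req_id;
};

static unsigned long last_traced_id;	/* records what was traced */

static void trace_id(unsigned long id)
{
	last_traced_id = id;
}

/*
 * Hypothetical send. On success the request belongs to the other side
 * and may be freed at once; the free() here models that.
 */
static int request_send(struct request *req, int fail)
{
	assert(req != NULL);
	if (fail)
		return -EIO;
	free(req);
	return 0;
}

int open_port(unsigned long id, int fail_send)
{
	struct request *req = malloc(sizeof(*req));
	unsigned long req_id;
	int ret;

	if (!req)
		return -ENOMEM;
	req->req_id = id;

	/* Cache the ID while the request is still ours. */
	req_id = req->req_id;

	ret = request_send(req, fail_send);
	if (ret) {
		free(req);	/* send failed, so we still own the request */
		return ret;
	}

	/* Use only the cached copy; req must not be touched here. */
	trace_id(req_id);
	return 0;
}
```

Reading req->req_id after a successful request_send() would be the
use-after-free that the patch removes from the trace calls.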
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
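The caching pattern described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: demo_req, demo_send(), demo_trace() and demo_open_port() are hypothetical names, not zfcp functions, and demo_send() frees the request immediately to stand in for a completion that may run before the caller continues.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a request object; not the zfcp structure. */
struct demo_req {
	unsigned long req_id;
};

/* Stand-in trace sink: remembers the last request ID that was traced. */
static unsigned long last_traced_id;

static void demo_trace(unsigned long req_id)
{
	last_traced_id = req_id;
}

/*
 * Simulates a successful send: ownership passes to the "channel", which
 * may complete and free the request before the caller runs again.
 */
static int demo_send(struct demo_req *req)
{
	free(req);
	return 0;
}

/* Returns 0 on success and traces the ID cached before the hand-off. */
int demo_open_port(unsigned long id)
{
	struct demo_req *req = malloc(sizeof(*req));
	unsigned long req_id;
	int retval;

	if (!req)
		return -1;
	req->req_id = id;

	req_id = req->req_id;		/* cache while the request is still ours */
	retval = demo_send(req);	/* req must not be touched after this */
	if (!retval)
		demo_trace(req_id);
	return retval;
}
```

Reading req->req_id after demo_send() in this sketch would dereference freed memory, which is the class of bug the patch avoids.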
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
This reverts commit 240c35a3783ab9b3a0afaba0dde7291295680a6b
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.
Fixes: 240c35a3783a ("kvm: x86: Use task structs fpu field for user")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/include/asm/kvm_host.h | 7 ++++---
arch/x86/kvm/x86.c | 4 ++--
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cc5b611a113..b2f1ffb937af 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -607,15 +607,16 @@ struct kvm_vcpu_arch {
/*
* QEMU userspace and the guest each have their own FPU state.
- * In vcpu_run, we switch between the user, maintained in the
- * task_struct struct, and guest FPU contexts. While running a VCPU,
- * the VCPU thread will have the guest FPU context.
+ * In vcpu_run, we switch between the user and guest FPU contexts.
+ * While running a VCPU, the VCPU thread will have the guest FPU
+ * context.
*
* Note that while the PKRU state lives inside the fpu registers,
* it is switched out separately at VMENTER and VMEXIT time. The
* "guest_fpu" state here contains the guest FPU context, with the
* host PRKU bits.
*/
+ struct fpu user_fpu;
struct fpu *guest_fpu;
u64 xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58305cf81182..cf2afdf8facf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8270,7 +8270,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
fpregs_lock();
- copy_fpregs_to_fpstate(&current->thread.fpu);
+ copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
~XFEATURE_MASK_PKRU);
@@ -8287,7 +8287,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
fpregs_lock();
copy_fpregs_to_fpstate(vcpu->arch.guest_fpu);
- copy_kernel_to_fpregs(&current->thread.fpu.state);
+ copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
fpregs_mark_activate();
fpregs_unlock();
--
1.8.3.1
The patch titled
Subject: mm/migrate.c: initialize pud_entry in migrate_vma()
has been added to the -mm tree. Its filename is
mm-migrate-initialize-pud_entry-in-migrate_vma.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-initialize-pud_entry-in…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-initialize-pud_entry-in…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/migrate.c: initialize pud_entry in migrate_vma()
When CONFIG_MIGRATE_VMA_HELPER is enabled, migrate_vma() calls
migrate_vma_collect(), which initializes a struct mm_walk but does not
initialize mm_walk.pud_entry (found by code inspection). Use a C
structure initializer to make sure it is set to NULL.
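As a minimal sketch of why the initializer form helps: in C, members that a designated initializer does not name are zero-initialized, so a callback that is never mentioned ends up NULL. The struct and function names below are hypothetical and far smaller than the kernel's struct mm_walk.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down callback table; not the kernel's struct mm_walk. */
struct demo_walk {
	int (*pud_entry)(unsigned long addr);
	int (*pmd_entry)(unsigned long addr);
	int (*pte_hole)(unsigned long addr);
	void *private;
};

static int demo_collect_pmd(unsigned long addr)
{
	(void)addr;
	return 0;
}

static int demo_collect_hole(unsigned long addr)
{
	(void)addr;
	return 0;
}

struct demo_walk demo_make_walk(void *private)
{
	/* Members not named here, such as pud_entry, are zero-initialized. */
	struct demo_walk walk = {
		.pmd_entry = demo_collect_pmd,
		.pte_hole = demo_collect_hole,
		.private = private,
	};

	return walk;
}
```

With member-by-member assignment, by contrast, any field that is not assigned keeps whatever happened to be on the stack.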
Link: http://lkml.kernel.org/r/20190719233225.12243-1-rcampbell@nvidia.com
Fixes: 8763cb45ab967 ("mm/migrate: new memory migration helper for use with
device memory")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
--- a/mm/migrate.c~mm-migrate-initialize-pud_entry-in-migrate_vma
+++ a/mm/migrate.c
@@ -2340,16 +2340,13 @@ next:
static void migrate_vma_collect(struct migrate_vma *migrate)
{
struct mmu_notifier_range range;
- struct mm_walk mm_walk;
-
- mm_walk.pmd_entry = migrate_vma_collect_pmd;
- mm_walk.pte_entry = NULL;
- mm_walk.pte_hole = migrate_vma_collect_hole;
- mm_walk.hugetlb_entry = NULL;
- mm_walk.test_walk = NULL;
- mm_walk.vma = migrate->vma;
- mm_walk.mm = migrate->vma->vm_mm;
- mm_walk.private = migrate;
+ struct mm_walk mm_walk = {
+ .pmd_entry = migrate_vma_collect_pmd,
+ .pte_hole = migrate_vma_collect_hole,
+ .vma = migrate->vma,
+ .mm = migrate->vma->vm_mm,
+ .private = migrate,
+ };
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm_walk.mm,
migrate->start,
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-document-zone-device-struct-page-field-usage.patch
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
mm-migrate-initialize-pud_entry-in-migrate_vma.patch
Currently, handling of the MADV_WILLNEED hint calls directly into readahead
code. Handle it by calling vfs_fadvise() instead so that the filesystem can
use its ->fadvise() callback to acquire necessary locks or otherwise
prepare for the request.
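Because vfs_fadvise() takes a byte offset rather than a page index, the address range has to be converted differently than for the readahead call. A small userspace sketch of that conversion follows, assuming 4 KiB pages; demo_file_offset() and DEMO_PAGE_SHIFT are hypothetical names used only for illustration.

```c
#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Byte offset into the file for a user address inside a mapping.
 * vm_start is the mapping's first address and vm_pgoff is the file
 * offset of the mapping in pages. The casts happen before the shift so
 * the result does not overflow where unsigned long is 32 bits wide.
 */
long long demo_file_offset(unsigned long start, unsigned long vm_start,
			   unsigned long vm_pgoff)
{
	return (long long)(start - vm_start) +
	       ((long long)vm_pgoff << DEMO_PAGE_SHIFT);
}
```

For example, an address 0x2000 bytes into a mapping that starts at file page 3 corresponds to byte offset 0x5000 in the file.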
Suggested-by: Amir Goldstein <amir73il(a)gmail.com>
CC: stable(a)vger.kernel.org # Needed by "xfs: Fix stale data exposure
when readahead races with hole punch"
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
mm/madvise.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 628022e674a7..ae56d0ef337d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -14,6 +14,7 @@
#include <linux/userfaultfd_k.h>
#include <linux/hugetlb.h>
#include <linux/falloc.h>
+#include <linux/fadvise.h>
#include <linux/sched.h>
#include <linux/ksm.h>
#include <linux/fs.h>
@@ -275,6 +276,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
struct file *file = vma->vm_file;
+ loff_t offset;
*prev = vma;
#ifdef CONFIG_SWAP
@@ -298,12 +300,20 @@ static long madvise_willneed(struct vm_area_struct *vma,
return 0;
}
- start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
- if (end > vma->vm_end)
- end = vma->vm_end;
- end = ((end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-
- force_page_cache_readahead(file->f_mapping, file, start, end - start);
+ /*
+ * Filesystem's fadvise may need to take various locks. We need to
+ * explicitly grab a reference because the vma (and hence the
+ * vma's reference to the file) can go away as soon as we drop
+ * mmap_sem.
+ */
+ *prev = NULL; /* tell sys_madvise we drop mmap_sem */
+ get_file(file);
+ up_read(&current->mm->mmap_sem);
+ offset = (loff_t)(start - vma->vm_start)
+ + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
+ fput(file);
+ down_read(&current->mm->mmap_sem);
return 0;
}
--
2.16.4
The patch titled
Subject: mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
has been added to the -mm tree. Its filename is
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-clear-total_migratef…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-clear-total_migratef…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
total_{migrate,free}_scanned will be added to COMPACTMIGRATE_SCANNED and
COMPACTFREE_SCANNED in compact_zone(). We should clear them before
scanning a new zone. In the proc-triggered compaction path, we forgot to
clear them.
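The effect of a stale per-zone tally can be sketched in userspace C. All names here are hypothetical stand-ins: demo_stat plays the role of the global event counter and demo_compact_zone() adds the whole running tally to it, as the description above says compact_zone() does.

```c
#include <stddef.h>

struct demo_control {
	unsigned long total_scanned;	/* per-zone tally */
};

static unsigned long demo_stat;	/* stands in for a global event counter */

static void demo_compact_zone(struct demo_control *cc, unsigned long pages)
{
	cc->total_scanned += pages;
	demo_stat += cc->total_scanned;	/* the whole tally is added each time */
}

/* Returns the value the global counter ends up with after all zones. */
unsigned long demo_compact_node(const unsigned long *zone_pages, size_t nr,
				int reset_per_zone)
{
	struct demo_control cc = { .total_scanned = 0 };
	size_t i;

	demo_stat = 0;
	for (i = 0; i < nr; i++) {
		if (reset_per_zone)
			cc.total_scanned = 0;	/* what the patch adds */
		demo_compact_zone(&cc, zone_pages[i]);
	}
	return demo_stat;
}
```

With zones of 10, 20 and 30 scanned pages, resetting per zone yields 60, while initializing the tally only once yields 100 because earlier zones are counted again.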
Link: http://lkml.kernel.org/r/1563789275-9639-1-git-send-email-laoar.shao@gmail.…
Fixes: 7f354a548d1c ("mm, compaction: add vmstats for kcompactd work")
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yafang Shao <shaoyafang(a)didiglobal.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/compaction.c~mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone
+++ a/mm/compaction.c
@@ -2408,8 +2408,6 @@ static void compact_node(int nid)
struct zone *zone;
struct compact_control cc = {
.order = -1,
- .total_migrate_scanned = 0,
- .total_free_scanned = 0,
.mode = MIGRATE_SYNC,
.ignore_skip_hint = true,
.whole_zone = true,
@@ -2425,6 +2423,8 @@ static void compact_node(int nid)
cc.nr_freepages = 0;
cc.nr_migratepages = 0;
+ cc.total_migrate_scanned = 0;
+ cc.total_free_scanned = 0;
cc.zone = zone;
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
mm-vmscan-expose-cgroup_ino-for-memcg-reclaim-tracepoints.patch
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone.patch
The patch titled
Subject: ubsan: build ubsan.c more conservatively
has been added to the -mm tree. Its filename is
ubsan-build-ubsanc-more-conservatively.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ubsan-build-ubsanc-more-conservati…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ubsan-build-ubsanc-more-conservati…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: ubsan: build ubsan.c more conservatively
objtool points out several conditions that it does not like, depending on
the combination with other configuration options and compiler variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file as
we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is mixed
with the gcc-specific -fno-conserve-stack option. According to Andrey
Ryabinin, that option is not even needed, so dropping it here fixes the
stack protector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/Makefile~ubsan-build-ubsanc-more-conservatively
+++ a/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kasan-remove-clang-version-check-for-kasan_stack.patch
ubsan-build-ubsanc-more-conservatively.patch
mm-sparse-fix-memory-leak-of-sparsemap_buf-in-aliged-memory-fix.patch
This patch series includes the following:
1. Adding compiler options to not use XMM registers in the purgatory code.
2. Reuse the implementation of memcpy and memset instead of relying on
__builtin_memcpy and __builtin_memset as it causes infinite recursion
in clang.
Nick Desaulniers (1):
x86/purgatory: do not use __builtin_memcpy and __builtin_memset.
Vaibhav Rustagi (1):
x86/purgatory: add -mno-sse, -mno-mmx, -mno-sse2 to Makefile
arch/x86/purgatory/Makefile | 4 ++++
arch/x86/purgatory/purgatory.c | 6 ++++++
arch/x86/purgatory/string.c | 23 -----------------------
3 files changed, 10 insertions(+), 23 deletions(-)
delete mode 100644 arch/x86/purgatory/string.c
--
2.22.0.510.g264f2c817a-goog
The patch titled
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
has been removed from the -mm tree. Its filename was
libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be
zero" fields of a 'pfn' info-block to be filled with indeterminate data.
While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like 'flags' and
'padding', this potentially means that future implementations cannot rely
on those fields being zero.
In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly initialized
to be guaranteed zero. Bump the minor version to indicate it is safe to
assume the 'padding' and 'flags' are zero. Otherwise, this corruption is
expected to be benign since all other critical fields are explicitly
initialized.
Note: the cc: stable is about spreading this new policy to as many kernels
as possible, not fixing an issue in those kernels. It is not until the
change titled "libnvdimm/pfn: Stop padding pmem namespaces to section
alignment" where this improper initialization becomes a problem. So if
someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces
to section alignment" (which is not tagged for stable), make sure this
pre-requisite is flagged.
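The ordering problem described above (a zeroed buffer that is overwritten by validation before it is initialized) can be sketched in userspace C. The names and the tiny structure are hypothetical; demo_validate() always reports "no info-block" and fills the buffer with a fixed byte pattern to stand in for whatever is on the media.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature info-block; not the layout of struct nd_pfn_sb. */
struct demo_sb {
	unsigned short version_minor;
	unsigned int flags;
	unsigned char padding[16];
};

/* Pretends to read the on-media block: fills the buffer, finds no signature. */
static int demo_validate(struct demo_sb *sb)
{
	memset(sb, 0xa5, sizeof(*sb));	/* whatever happened to be on media */
	return -ENODEV;
}

/* Returns a freshly initialized block, or NULL on failure (simplified). */
struct demo_sb *demo_init(void)
{
	struct demo_sb *sb = malloc(sizeof(*sb));

	if (!sb)
		return NULL;
	if (demo_validate(sb) != -ENODEV) {
		free(sb);
		return NULL;
	}
	/* No info-block found: clear stale bytes before filling fields in. */
	memset(sb, 0, sizeof(*sb));
	sb->version_minor = 3;
	return sb;
}
```

Zeroing at allocation time would not help in this sketch, since demo_validate() overwrites the buffer afterwards; the clear has to happen after validation and before the explicit field assignments.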
Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwil…
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com> [ppc64]
Cc: <stable(a)vger.kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/nvdimm/dax_devs.c | 2 +-
drivers/nvdimm/pfn.h | 1 +
drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
3 files changed, 17 insertions(+), 4 deletions(-)
--- a/drivers/nvdimm/dax_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/dax_devs.c
@@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!dax_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
--- a/drivers/nvdimm/pfn_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn_devs.c
@@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(st
return 0;
}
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
@@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!pfn_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -693,7 +702,7 @@ static int nd_pfn_init(struct nd_pfn *nd
u64 checksum;
int rc;
- pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
@@ -702,11 +711,14 @@ static int nd_pfn_init(struct nd_pfn *nd
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
/* no info block, do init */;
+ memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(&nd_pfn->dev,
@@ -759,7 +771,7 @@ static int nd_pfn_init(struct nd_pfn *nd
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
- pfn_sb->version_minor = cpu_to_le16(2);
+ pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
--- a/drivers/nvdimm/pfn.h~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn.h
@@ -28,6 +28,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+ /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
};
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: resource: fix locking in find_next_iomem_res()
has been removed from the -mm tree. Its filename was
resource-fix-locking-in-find_next_iomem_res.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Nadav Amit <namit(a)vmware.com>
Subject: resource: fix locking in find_next_iomem_res()
Since resources can be removed, locking should ensure that the resource is
not removed while accessing it. However, find_next_iomem_res() does not
hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[akpm(a)linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
--- a/kernel/resource.c~resource-fix-locking-in-find_next_iomem_res
+++ a/kernel/resource.c
@@ -326,7 +326,7 @@ EXPORT_SYMBOL(release_resource);
*
* If a resource is found, returns 0 and @*res is overwritten with the part
* of the resource that's within [@start..@end]; if none is found, returns
- * -1 or -EINVAL for other invalid parameters.
+ * -ENODEV. Returns -EINVAL for invalid parameters.
*
* This function walks the whole tree and not just first level children
* unless @first_lvl is true.
@@ -365,16 +365,16 @@ static int find_next_iomem_res(resource_
break;
}
- read_unlock(&resource_lock);
- if (!p)
- return -1;
+ if (p) {
+ /* copy data */
+ res->start = max(start, p->start);
+ res->end = min(end, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ }
- /* copy data */
- res->start = max(start, p->start);
- res->end = min(end, p->end);
- res->flags = p->flags;
- res->desc = p->desc;
- return 0;
+ read_unlock(&resource_lock);
+ return p ? 0 : -ENODEV;
}
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
_
Patches currently in -mm which might be from namit(a)vmware.com are
Please apply commit a1078e821b605813 ("xen: let
alloc_xenballooned_pages() fail if not enough memory free")
to the stable kernel tree.
It is mitigating a security bug related to Xen (XSA-300).
Juergen
From: Brian Foster <bfoster(a)redhat.com>
commit 6958d11f77d45db80f7e22a21a74d4d5f44dc667 upstream.
We've had rather rare reports of bmap btree block corruption where
the bmap root block has a level count of zero. The root cause of the
corruption is so far unknown. We do have verifier checks to detect
this form of on-disk corruption, but this doesn't cover a memory
corruption variant of the problem. The latter is a reasonable
possibility because the root block is part of the inode fork and can
reside in-core for some time before inode extents are read.
If this occurs, it leads to a system crash such as the following:
BUG: unable to handle kernel paging request at ffffffff00000221
PF error: [normal kernel read fault]
...
RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs]
...
Call Trace:
xfs_iread_extents+0x379/0x540 [xfs]
xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs]
? xfs_attr_get+0xd1/0x120 [xfs]
? iomap_write_begin.constprop.40+0x2d0/0x2d0
xfs_file_iomap_begin+0x4c4/0x6d0 [xfs]
? __vfs_getxattr+0x53/0x70
? iomap_write_begin.constprop.40+0x2d0/0x2d0
iomap_apply+0x63/0x130
? iomap_write_begin.constprop.40+0x2d0/0x2d0
iomap_file_buffered_write+0x62/0x90
? iomap_write_begin.constprop.40+0x2d0/0x2d0
xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs]
__vfs_write+0x150/0x1b0
vfs_write+0xba/0x1c0
ksys_pwrite64+0x64/0xa0
do_syscall_64+0x5a/0x1d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The crash occurs because xfs_iread_extents() attempts to release an
uninitialized buffer pointer as the level == 0 value prevented the
buffer from ever being allocated or read. Change the level > 0
assert to an explicit error check in xfs_iread_extents() to avoid
crashing the kernel in the event of localized, in-core inode
corruption.
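As a rough userspace illustration of trading an assertion for an explicit check: an assertion may be compiled out of production builds, whereas a returned error is always seen by the caller. DEMO_EFSCORRUPTED and demo_read_extents() are hypothetical names, and the loop merely counts levels instead of reading btree blocks.

```c
/* Hypothetical error code for this sketch only. */
#define DEMO_EFSCORRUPTED 117

/*
 * Walks down "level" levels from the root and returns how many were
 * walked. A level of zero means the in-core root is corrupt, so fail
 * before any buffer would be used or released.
 */
int demo_read_extents(unsigned short level)
{
	int walked = 0;

	if (level == 0)
		return -DEMO_EFSCORRUPTED;

	while (level-- > 0)
		walked++;
	return walked;
}
```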
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
[mcgrof: fixes kz#204223 ]
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
fs/xfs/libxfs/xfs_bmap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3a496ffe6551..ab2465bc413a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1178,7 +1178,10 @@ xfs_iread_extents(
* Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out.
*/
level = be16_to_cpu(block->bb_level);
- ASSERT(level > 0);
+ if (unlikely(level == 0)) {
+ XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+ return -EFSCORRUPTED;
+ }
pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes);
bno = be64_to_cpu(*pp);
--
2.18.0
When a pin is active-low, the requested trigger edge should be inverted so
that the interrupt fires on the same logical event.
For example, a button push that triggers a falling edge in the ACTIVE_HIGH
case triggers a rising edge in the ACTIVE_LOW case. User space should not
need to change its IRQ request beyond setting
GPIOHANDLE_REQUEST_ACTIVE_LOW.
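The inversion described above can be sketched as a small mapping function. The flag values and names below are hypothetical and chosen for this sketch only; they are not the GPIO uapi or IRQ constants.

```c
/* Hypothetical flag values for this sketch; not the uapi or IRQ constants. */
#define DEMO_REQ_RISING   (1u << 0)
#define DEMO_REQ_FALLING  (1u << 1)
#define DEMO_TRIG_RISING  (1u << 0)
#define DEMO_TRIG_FALLING (1u << 1)

/*
 * Translate the logical edges a caller asked for into physical trigger
 * edges. On an active-low line the two are swapped.
 */
unsigned int demo_irq_flags(unsigned int eflags, int active_low)
{
	unsigned int irqflags = 0;

	if (eflags & DEMO_REQ_RISING)
		irqflags |= active_low ? DEMO_TRIG_FALLING : DEMO_TRIG_RISING;
	if (eflags & DEMO_REQ_FALLING)
		irqflags |= active_low ? DEMO_TRIG_RISING : DEMO_TRIG_FALLING;
	return irqflags;
}
```

Requesting both edges gives the same result regardless of polarity, so only single-edge requests are affected by the swap.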
Suppose we want to catch the event when the button is pushed. The button
on the original board drives the level low when it is pushed and high when
it is released.
In user space we can do:
req.handleflags = GPIOHANDLE_REQUEST_INPUT;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
while (1) {
read(fd, &dat, sizeof(dat));
if (dat.id == GPIOEVENT_EVENT_FALLING_EDGE)
printf("button pushed\n");
}
Now run the same logic on another board where the polarity of the button
is inverted: it drives the level high when pushed and low when released.
For this inversion we add the flag GPIOHANDLE_REQUEST_ACTIVE_LOW:
req.handleflags = GPIOHANDLE_REQUEST_INPUT |
GPIOHANDLE_REQUEST_ACTIVE_LOW;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
As a result, no events are caught when the button is pushed. Instead,
releasing the button emits a "falling" event, so the "falling" event
arrives at an unexpected time.
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Wu <michael.wu(a)vatics.com>
---
Changes from v1:
- Correct undeclared 'IRQ_TRIGGER_RISING'
- Add an example to describe the issue
---
drivers/gpio/gpiolib.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index e013d417a936..9c9597f929d7 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -956,9 +956,11 @@ static int lineevent_create(struct gpio_device *gdev, void __user *ip)
}
if (eflags & GPIOEVENT_REQUEST_RISING_EDGE)
- irqflags |= IRQF_TRIGGER_RISING;
+ irqflags |= test_bit(FLAG_ACTIVE_LOW, &desc->flags) ?
+ IRQF_TRIGGER_FALLING : IRQF_TRIGGER_RISING;
if (eflags & GPIOEVENT_REQUEST_FALLING_EDGE)
- irqflags |= IRQF_TRIGGER_FALLING;
+ irqflags |= test_bit(FLAG_ACTIVE_LOW, &desc->flags) ?
+ IRQF_TRIGGER_RISING : IRQF_TRIGGER_FALLING;
irqflags |= IRQF_ONESHOT;
INIT_KFIFO(le->events);
--
2.17.1
objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file
as we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc-specific -fno-conserve-stack option. According
to Andrey Ryabinin, that option is not even needed, so dropping it here
fixes the stack protector issue.
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
v2:
- drop -fno-conserve-stack
- fix the Fixes: line
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Makefile b/lib/Makefile
index 095601ce371d..29c02a924973 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
--
2.20.0
objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file
as we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc-specific -fno-conserve-stack option, so we need to
test them separately.
Fixes: 42440c1f9911 ("lib/ubsan: add type mismatch handler for new GCC/Clang")
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Makefile b/lib/Makefile
index 095601ce371d..320e3b632dd3 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack) $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
--
2.20.0
Shakeel Butt reported a premature OOM on kernels booted with
"cgroup_disable=memory", because mem_cgroup_is_root() returns false even
though memcg is actually NULL. drop_caches is broken as well.
This is because commit aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab()
calls in shrink_node()") removed the !memcg check before
!mem_cgroup_is_root(). Surprisingly, the root memcg is allocated even
though the memory cgroup is disabled by the kernel boot parameter.
Add a mem_cgroup_disabled() check to make the reclaimer work as expected.
Fixes: aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: stable(a)vger.kernel.org 4.19+
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f8e3dcd..c10dc02 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -684,7 +684,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
unsigned long ret, freed = 0;
struct shrinker *shrinker;
- if (!mem_cgroup_is_root(memcg))
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
if (!down_read_trylock(&shrinker_rwsem))
--
1.8.3.1
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
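The failure mode can be modelled in a few lines of userspace C. This is
only a sketch: struct page_sketch, PAGE_MAPPING_ANON_SKETCH and the
*_sketch helpers are hypothetical stand-ins, and the helper returns a
bool purely so the skipped update is observable (the kernel function
returns void).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the low bit of ->mapping marks an anonymous page. */
#define PAGE_MAPPING_ANON_SKETCH 0x1UL

/* Hypothetical stand-in for struct page; only ->mapping is modelled. */
struct page_sketch {
	uintptr_t mapping;
};

static bool page_anon_sketch(const struct page_sketch *page)
{
	return (page->mapping & PAGE_MAPPING_ANON_SKETCH) != 0;
}

/*
 * Models the early return in __page_set_anon_rmap(): if the page already
 * looks anonymous, the mapping is left untouched.
 */
static bool set_anon_rmap_sketch(struct page_sketch *page, uintptr_t anon_vma)
{
	if (page_anon_sketch(page))
		return false;	/* stale value wins, nothing is updated */
	page->mapping = anon_vma | PAGE_MAPPING_ANON_SKETCH;
	return true;
}

/* Models the fix: clear ->mapping when the device private page is freed. */
static void free_device_private_sketch(struct page_sketch *page)
{
	page->mapping = 0;
}
```

With a stale anon-tagged value left in ->mapping, set_anon_rmap_sketch()
does nothing; after free_device_private_sketch() the same call installs
the new mapping.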
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index bea6f887adad..98d04466dcde 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -408,6 +408,30 @@ void __put_devmap_managed_page(struct page *page)
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1
From: Wanpeng Li <wanpengli(a)tencent.com>
The idea before commit 240c35a37 was that we have the following FPU states:
   userspace (QEMU)                        guest
---------------------------------------------------------------------------
   processor                               vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
   vcpu->arch.user_fpu                     processor
>>> preempt out
   vcpu->arch.user_fpu                     current->thread.fpu
>>> preempt in
   vcpu->arch.user_fpu                     processor
>>> back to userspace
>>> kvm_put_guest_fpu
   processor                               vcpu->arch.guest_fpu
---------------------------------------------------------------------------
With the new lazy model we want to get the state back to the processor
from current->thread.fpu when scheduling in.
Reported-by: Thomas Lambertz <mail(a)thomaslambertz.de>
Reported-by: anthony <antdev66(a)gmail.com>
Tested-by: anthony <antdev66(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Thomas Lambertz <mail(a)thomaslambertz.de>
Cc: anthony <antdev66(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
---
arch/x86/kvm/x86.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf2afdf..bdcd250 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_x86_ops->vcpu_load(vcpu, cpu);
+ fpregs_assert_state_consistent();
+ if (test_thread_flag(TIF_NEED_FPU_LOAD))
+ switch_fpu_return();
+
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,9 +7994,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
trace_kvm_entry(vcpu->vcpu_id);
guest_enter_irqoff();
- fpregs_assert_state_consistent();
- if (test_thread_flag(TIF_NEED_FPU_LOAD))
- switch_fpu_return();
+ WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
if (unlikely(vcpu->arch.switch_db_regs)) {
set_debugreg(0, 7);
--
2.7.4
From: "Gautham R. Shenoy" <ego(a)linux.vnet.ibm.com>
xive_find_target_in_mask() has the following for(;;) loop which has a
bug when @first == cpumask_first(@mask) and condition 1 fails to hold
for every CPU in @mask. In this case we loop forever in the for-loop.
first = cpu;
for (;;) {
if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
return cpu;
cpu = cpumask_next(cpu, mask);
if (cpu == first) // condition 2
break;
if (cpu >= nr_cpu_ids) // condition 3
cpu = cpumask_first(mask);
}
This is because, when @first == cpumask_first(@mask), we never hit
condition 2 (cpu == first): prior to this check, we have already
executed "cpu = cpumask_next(cpu, mask)", which sets @cpu to a value
greater than @first or to nr_cpu_ids. Coupled with the fact that
condition 1 is never met, we never exit this loop.
This was discovered by the hard-lockup detector while running LTP test
concurrently with SMT switch tests.
watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
watchdog: CPU 68 Hard LOCKUP
watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le)
MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000
CFAR: c0000000006f558c IRQMASK: 1
GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
NIP [c0000000006f5578] find_next_bit+0x38/0x90
LR [c000000000cba9ec] cpumask_next+0x2c/0x50
Call Trace:
[c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
[c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
[c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
[c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
[c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
[c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
[c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
[c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
[c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
[c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
[c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
[c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
[c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
[c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
[c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
[c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
[c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
Instruction dump:
78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
To fix this, move the check for condition 2 after the check for
condition 3, so that we are able to break out of the loop soon after
iterating through all the CPUs in the @mask in the problem case. Use
do..while() to achieve this.
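The reordered loop can be exercised in userspace with a small model.
This is only a sketch: NR_CPU_IDS_SKETCH, the bitmask helpers and the
"pickable" mask (standing in for cpu_online() && xive_try_pick_target())
are assumptions made for illustration, and the iteration counter exists
only to show that the loop terminates.

```c
#include <assert.h>

#define NR_CPU_IDS_SKETCH 8	/* hypothetical small system */

/* Next set bit in mask strictly after cpu, or NR_CPU_IDS_SKETCH if none. */
static int mask_next_sketch(int cpu, unsigned int mask)
{
	for (cpu++; cpu < NR_CPU_IDS_SKETCH; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPU_IDS_SKETCH;
}

static int mask_first_sketch(unsigned int mask)
{
	return mask_next_sketch(-1, mask);
}

/*
 * Models the fixed loop: wrap around first (condition 3), then compare
 * against the starting CPU (condition 2). Expects a non-empty mask that
 * contains first. Returns the picked CPU, or -1 if none is pickable.
 */
static int find_target_sketch(unsigned int mask, unsigned int pickable,
			      int first, int *iterations)
{
	int cpu = first;

	assert(mask & (1u << first));
	*iterations = 0;
	do {
		(*iterations)++;
		if (pickable & (1u << cpu))	/* condition 1 */
			return cpu;
		cpu = mask_next_sketch(cpu, mask);
		/* Wrap around */
		if (cpu >= NR_CPU_IDS_SKETCH)
			cpu = mask_first_sketch(mask);
	} while (cpu != first);

	return -1;
}
```

For a mask holding CPUs 2, 3 and 5 with first == 2 and no pickable CPU,
the loop visits each CPU once and returns -1, which is exactly the case
that used to spin forever.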
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: <stable(a)vger.kernel.org> # 4.12+
Reported-by: Indira P. Joga <indira.priya(a)in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego(a)linux.vnet.ibm.com>
---
arch/powerpc/sysdev/xive/common.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 082c7e1..1cdb395 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -479,7 +479,7 @@ static int xive_find_target_in_mask(const struct cpumask *mask,
* Now go through the entire mask until we find a valid
* target.
*/
- for (;;) {
+ do {
/*
* We re-check online as the fallback case passes us
* an untested affinity mask
@@ -487,12 +487,11 @@ static int xive_find_target_in_mask(const struct cpumask *mask,
if (cpu_online(cpu) && xive_try_pick_target(cpu))
return cpu;
cpu = cpumask_next(cpu, mask);
- if (cpu == first)
- break;
/* Wrap around */
if (cpu >= nr_cpu_ids)
cpu = cpumask_first(mask);
- }
+ } while (cpu != first);
+
return -1;
}
--
1.9.4
From: Yingying Tang <yintang(a)codeaurora.org>
[ Upstream commit 9e7251fa38978b85108c44743e1436d48e8d0d76 ]
tx_stats is freed and set to NULL before the debugfs_sta node is
removed during station disconnection. So if the debugfs_sta node is
read, a NULL pointer dereference may occur. Add a check for tx_stats
before using it to resolve this issue.
Signed-off-by: Yingying Tang <yintang(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/debugfs_sta.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/debugfs_sta.c b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
index c704ae371c4d..42931a669b02 100644
--- a/drivers/net/wireless/ath/ath10k/debugfs_sta.c
+++ b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
@@ -663,6 +663,13 @@ static ssize_t ath10k_dbg_sta_dump_tx_stats(struct file *file,
mutex_lock(&ar->conf_mutex);
+ if (!arsta->tx_stats) {
+ ath10k_warn(ar, "failed to get tx stats");
+ mutex_unlock(&ar->conf_mutex);
+ kfree(buf);
+ return 0;
+ }
+
spin_lock_bh(&ar->data_lock);
for (k = 0; k < ATH10K_STATS_TYPE_MAX; k++) {
for (j = 0; j < ATH10K_COUNTER_TYPE_MAX; j++) {
--
2.20.1
Hmm. I just realized when I saw Sasha's autoselect patches flying by
that the floppy ioctl fixes didn't get marked for stable, but they
probably should be.
There's four commits:
da99466ac243 floppy: fix out-of-bounds read in copy_buffer
9b04609b7840 floppy: fix invalid pointer dereference in drive_name
5635f897ed83 floppy: fix out-of-bounds read in next_valid_format
f3554aeb9912 floppy: fix div-by-zero in setup_format_params
that look like stable material - even if I sincerely hope that the
floppy driver isn't critical for anybody.
I leave it to the stable people to decide if they care. I don't think
the hardware matters any more, but I could imagine that people still
use it for some virtual images and have a floppy device inside a VM
for that reason.
Linus
Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen
but advertise a landscape resolution and pitch, resulting in a messed
up display if we try to show anything on the efifb (because of the wrong
pitch).
This commit fixes this by adding a new DMI match table for devices which
need to have their width and height swapped.
At first I tried to use the existing table for overriding some of the
efifb parameters, but some of the affected devices have variants with
different LCD resolutions which will not work with hardcoded override
values.
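As a rough illustration of why a swap works where hardcoded overrides
do not, the quirk can be modelled as below. This is only a sketch:
struct fb_sketch is a hypothetical stand-in for the relevant
screen_info fields, and the factor of 4 assumes 4 bytes per pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the lfb_* fields of screen_info. */
struct fb_sketch {
	uint16_t width;
	uint16_t height;
	uint16_t linelength;
};

/*
 * Swap the advertised width and height and recompute the pitch from the
 * new width, so the same quirk works for every panel resolution.
 */
static void swap_width_height_sketch(struct fb_sketch *fb)
{
	uint16_t temp = fb->width;

	assert(fb->width != 0 && fb->height != 0);
	fb->width = fb->height;
	fb->height = temp;
	fb->linelength = 4 * fb->width;	/* assumes 4 bytes per pixel */
}
```

A panel advertised as 1280x800 becomes 800x1280 with a pitch of 3200
bytes, and one advertised as 1920x1200 becomes 1200x1920 with a pitch
of 4800 bytes, without any per-resolution table entry.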
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1730783
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
arch/x86/kernel/sysfb_efi.c | 45 +++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/arch/x86/kernel/sysfb_efi.c b/arch/x86/kernel/sysfb_efi.c
index 8eb67a670b10..80d5b6720a87 100644
--- a/arch/x86/kernel/sysfb_efi.c
+++ b/arch/x86/kernel/sysfb_efi.c
@@ -230,9 +230,54 @@ static const struct dmi_system_id efifb_dmi_system_table[] __initconst = {
{},
};
+/*
+ * Some devices have a portrait LCD but advertise a landscape resolution (and
+ * pitch). We simply swap width and height for these devices so that we can
+ * correctly deal with some of them coming with multiple resolutions.
+ */
+static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = {
+ {
+ /*
+ * Lenovo MIIX310-10ICR, only some batches have the troublesome
+ * 800x1280 portrait screen. Luckily the portrait version has
+ * its own BIOS version, so we match on that.
+ */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "MIIX 310-10ICR"),
+ DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1HCN44WW"),
+ },
+ },
+ {
+ /* Lenovo MIIX 320-10ICR with 800x1280 portrait screen */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_VERSION,
+ "Lenovo MIIX 320-10ICR"),
+ },
+ },
+ {
+ /* Lenovo D330 with 800x1280 or 1200x1920 portrait screen */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_VERSION,
+ "Lenovo ideapad D330-10IGM"),
+ },
+ },
+ {},
+};
+
__init void sysfb_apply_efi_quirks(void)
{
if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI ||
!(screen_info.capabilities & VIDEO_CAPABILITY_SKIP_QUIRKS))
dmi_check_system(efifb_dmi_system_table);
+
+ if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI &&
+ dmi_check_system(efifb_dmi_swap_width_height)) {
+ u16 temp = screen_info.lfb_width;
+ screen_info.lfb_width = screen_info.lfb_height;
+ screen_info.lfb_height = temp;
+ screen_info.lfb_linelength = 4 * screen_info.lfb_width;
+ }
}
--
2.21.0
In the Resizable BAR control register, bits [8:12] represent the size of
the BAR. As per the PCIe specification, the register bits encode the
actual BAR size as follows:
Bits BAR size
0 1 MB
1 2 MB
2 4 MB
3 8 MB
--
For a 1 MB BAR size, the BAR size bits should be set to 0, but they are
incorrectly set to "1f".
The latest megaraid_sas and mpt3sas adapters, which support Resizable BAR
with a 1 MB BAR size, fail to initialize during system resume from S3 sleep.
Fix: Correctly set the BAR size bits to "0" for a 1 MB BAR size.
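The arithmetic can be checked in userspace. This is only a sketch:
order_base_2_sketch() is a hypothetical helper that returns
ceil(log2(n)) for n >= 1, which is assumed to match what the kernel
macro yields for these inputs, and the 0x1f mask models the 5-bit size
field.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(n)) for n >= 1; assumed stand-in for order_base_2(). */
static int order_base_2_sketch(uint64_t n)
{
	int order = 0;

	assert(n >= 1);
	while (((uint64_t)1 << order) < n)
		order++;
	return order;
}

/* Old computation: yields -1 for a 1 MB BAR. */
static int rebar_size_old_sketch(uint64_t bytes)
{
	return order_base_2_sketch((bytes >> 20) | 1) - 1;
}

/* Fixed computation: clamps the 1 MB case to encoding 0. */
static int rebar_size_fixed_sketch(uint64_t bytes)
{
	int order = order_base_2_sketch((bytes >> 20) | 1);

	return order ? order - 1 : 0;
}
```

For a 1 MB BAR the old computation gives -1, whose low five bits are
0x1f, while the fixed one gives 0; for 2, 4 and 8 MB both give 1, 2
and 3, matching the table.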
CC: stable(a)vger.kernel.org # v4.16+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203939
Fixes: d3252ace0bc652a1a244455556b6a549f969bf99 ("PCI: Restore resized BAR state on resume")
Signed-off-by: Sumit Saxena <sumit.saxena(a)broadcom.com>
---
drivers/pci/pci.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843..b651f32 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1417,12 +1417,13 @@ static void pci_restore_rebar_state(struct pci_dev *pdev)
for (i = 0; i < nbars; i++, pos += 8) {
struct resource *res;
- int bar_idx, size;
+ int bar_idx, size, order;
pci_read_config_dword(pdev, pos + PCI_REBAR_CTRL, &ctrl);
bar_idx = ctrl & PCI_REBAR_CTRL_BAR_IDX;
res = pdev->resource + bar_idx;
- size = order_base_2((resource_size(res) >> 20) | 1) - 1;
+ order = order_base_2((resource_size(res) >> 20) | 1);
+ size = order ? order - 1 : 0;
ctrl &= ~PCI_REBAR_CTRL_BAR_SIZE;
ctrl |= size << PCI_REBAR_CTRL_BAR_SHIFT;
pci_write_config_dword(pdev, pos + PCI_REBAR_CTRL, ctrl);
--
1.8.3.1
From: Luca Coelho <luciano.coelho(a)intel.com>
Firmware versions before 41 don't support the GEO_TX_POWER_LIMIT
command, and sending it to the firmware will cause a firmware crash.
We allow this via debugfs, so we need to return an error value in case
it's not supported.
This had already been fixed during init, when we send the command if
the ACPI WGDS table is present. Fix it also for the other,
userspace-triggered case.
Cc: stable(a)vger.kernel.org
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 22 ++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 1d608e9e9101..a837cf40afde 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -880,6 +880,17 @@ int iwl_mvm_sar_select_profile(struct iwl_mvm *mvm, int prof_a, int prof_b)
return iwl_mvm_send_cmd_pdu(mvm, REDUCE_TX_POWER_CMD, 0, len, &cmd);
}
+static bool iwl_mvm_sar_geo_support(struct iwl_mvm *mvm)
+{
+ /*
+ * The GEO_TX_POWER_LIMIT command is not supported on earlier
+ * firmware versions. Unfortunately, we don't have a TLV API
+ * flag to rely on, so rely on the major version which is in
+ * the first byte of ucode_ver.
+ */
+ return IWL_UCODE_SERIAL(mvm->fw->ucode_ver) >= 41;
+}
+
int iwl_mvm_get_sar_geo_profile(struct iwl_mvm *mvm)
{
struct iwl_geo_tx_power_profiles_resp *resp;
@@ -909,6 +920,9 @@ int iwl_mvm_get_sar_geo_profile(struct iwl_mvm *mvm)
.data = { data },
};
+ if (!iwl_mvm_sar_geo_support(mvm))
+ return -EOPNOTSUPP;
+
ret = iwl_mvm_send_cmd(mvm, &cmd);
if (ret) {
IWL_ERR(mvm, "Failed to get geographic profile info %d\n", ret);
@@ -934,13 +948,7 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
int ret, i, j;
u16 cmd_wide_id = WIDE_ID(PHY_OPS_GROUP, GEO_TX_POWER_LIMIT);
- /*
- * This command is not supported on earlier firmware versions.
- * Unfortunately, we don't have a TLV API flag to rely on, so
- * rely on the major version which is in the first byte of
- * ucode_ver.
- */
- if (IWL_UCODE_SERIAL(mvm->fw->ucode_ver) < 41)
+ if (!iwl_mvm_sar_geo_support(mvm))
return 0;
ret = iwl_mvm_sar_get_wgds_table(mvm);
--
2.20.1
This is the start of the stable review cycle for the 4.19.60 release.
There are 47 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.60-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.60-rc1
Jiri Slaby <jslaby(a)suse.cz>
x86/entry/32: Fix ENDPROC of common_spurious
Dave Airlie <airlied(a)redhat.com>
drm/udl: move to embedding drm device inside udl device.
Thomas Zimmermann <tzimmermann(a)suse.de>
drm/udl: Replace drm_dev_unref with drm_dev_put
Dave Airlie <airlied(a)redhat.com>
drm/udl: introduce a macro to convert dev to udl.
Mark Zhang <markz(a)nvidia.com>
regmap-irq: do not write mask register if mask_base is zero
Haren Myneni <haren(a)linux.vnet.ibm.com>
crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - fix hash on SEC1.
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - move struct talitos_edesc into talitos.h
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Seperate unused system vectors from spurious entry again
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Handle spurious interrupt after shutdown gracefully
Thomas Gleixner <tglx(a)linutronix.de>
x86/ioapic: Implement irq_get_irqchip_state() callback
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Add optional hardware synchronization for shutdown
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Fix misleading synchronize_irq() documentation
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Delay deactivation in free_irq()
Vinod Koul <vkoul(a)kernel.org>
linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
Nicolas Boichat <drinkcat(a)chromium.org>
pinctrl: mediatek: Update cur_mask in mask/mask ops
Eiichi Tsukata <devel(a)etsukata.com>
cpu/hotplug: Fix out-of-bounds read when setting fail state
Nicolas Boichat <drinkcat(a)chromium.org>
pinctrl: mediatek: Ignore interrupts that are wake only during resume
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
HID: multitouch: Add pointstick support for ALPS Touchpad
Oleksandr Natalenko <oleksandr(a)redhat.com>
HID: chicony: add another quirk for PixArt mouse
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Add missing fixup_pointer() for next_early_pgt access
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Fix crash if kernel image crosses page table boundary
Milan Broz <gmazyland(a)gmail.com>
dm verity: use message limit for data block corruption message
Jerome Marchand <jmarchan(a)redhat.com>
dm table: don't copy from a NULL pointer in realloc_argv()
Phil Reid <preid(a)electromag.com.au>
pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call order
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
ARM: dts: imx6ul: fix PWM[1-4] interrupts
Sergej Benilov <sergej.benilov(a)googlemail.com>
sis900: fix TX completion
Takashi Iwai <tiwai(a)suse.de>
ppp: mppe: Add softdep to arc4
Petr Oros <poros(a)redhat.com>
be2net: fix link failure after ethtool offline test
Colin Ian King <colin.king(a)canonical.com>
x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz
David Howells <dhowells(a)redhat.com>
afs: Fix uninitialised spinlock afs_volume::cb_break_lock
Arnd Bergmann <arnd(a)arndb.de>
ARM: omap2: remove incorrect __init annotation
Linus Walleij <linus.walleij(a)linaro.org>
ARM: dts: gemini Fix up DNS-313 compatible string
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix perf_sample_regs_user() mm check
Hans de Goede <hdegoede(a)redhat.com>
efi/bgrt: Drop BGRT status field reserved bits check
Tony Lindgren <tony(a)atomide.com>
clk: ti: clkctrl: Fix returning uninitialized data
Heyi Guo <guoheyi(a)huawei.com>
irqchip/gic-v3-its: Fix command queue pointer comparison bug
Sven Van Asbroeck <thesven73(a)gmail.com>
firmware: improve LSM/IMA security behaviour
James Morse <james.morse(a)arm.com>
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
Masahiro Yamada <yamada.masahiro(a)socionext.com>
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
Cole Rogers <colerogers(a)disroot.org>
Input: synaptics - enable SMBUS on T480 thinkpad trackpad
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Revert "e1000e: fix cyclic resets at link up with active tx"
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 ++-
arch/arm/boot/dts/gemini-dlink-dns-313.dts | 2 +-
arch/arm/boot/dts/imx6ul.dtsi | 8 +--
arch/arm/mach-omap2/prm3xxx.c | 2 +-
arch/s390/include/asm/facility.h | 21 ++++--
arch/x86/entry/entry_32.S | 24 +++++++
arch/x86/entry/entry_64.S | 30 ++++++--
arch/x86/include/asm/hw_irq.h | 5 +-
arch/x86/kernel/apic/apic.c | 36 ++++++----
arch/x86/kernel/apic/io_apic.c | 46 ++++++++++++
arch/x86/kernel/apic/vector.c | 4 +-
arch/x86/kernel/head64.c | 20 +++---
arch/x86/kernel/idt.c | 3 +-
arch/x86/kernel/irq.c | 2 +-
drivers/base/cacheinfo.c | 3 +-
drivers/base/firmware_loader/fallback.c | 2 +-
drivers/base/regmap/regmap-irq.c | 6 ++
drivers/clk/ti/clkctrl.c | 7 +-
drivers/crypto/nx/nx-842-powernv.c | 8 ++-
drivers/crypto/talitos.c | 99 +++++++++++---------------
drivers/crypto/talitos.h | 30 ++++++++
drivers/firmware/efi/efi-bgrt.c | 5 --
drivers/gpu/drm/udl/udl_drv.c | 56 ++++++++++++---
drivers/gpu/drm/udl/udl_drv.h | 9 +--
drivers/gpu/drm/udl/udl_fb.c | 12 ++--
drivers/gpu/drm/udl/udl_gem.c | 2 +-
drivers/gpu/drm/udl/udl_main.c | 35 +++------
drivers/hid/hid-ids.h | 2 +
drivers/hid/hid-multitouch.c | 4 ++
drivers/hid/hid-quirks.c | 1 +
drivers/input/mouse/synaptics.c | 1 +
drivers/irqchip/irq-gic-v3-its.c | 35 ++++++---
drivers/md/dm-table.c | 2 +-
drivers/md/dm-verity-target.c | 4 +-
drivers/net/ethernet/emulex/benet/be_ethtool.c | 28 ++++++--
drivers/net/ethernet/intel/e1000e/netdev.c | 21 +++---
drivers/net/ethernet/sis/sis900.c | 16 ++---
drivers/net/ppp/ppp_mppe.c | 1 +
drivers/pinctrl/mediatek/mtk-eint.c | 34 +++++----
drivers/pinctrl/pinctrl-mcp23s08.c | 8 +--
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
fs/afs/callback.c | 4 +-
fs/afs/internal.h | 2 +-
fs/afs/volume.c | 1 +
include/linux/cpuhotplug.h | 1 +
include/linux/kernel.h | 3 +-
include/uapi/linux/nilfs2_ondisk.h | 24 +++----
kernel/cpu.c | 3 +
kernel/events/core.c | 2 +-
kernel/irq/autoprobe.c | 6 +-
kernel/irq/chip.c | 6 ++
kernel/irq/cpuhotplug.c | 2 +-
kernel/irq/internals.h | 5 ++
kernel/irq/manage.c | 88 +++++++++++++++++------
56 files changed, 534 insertions(+), 267 deletions(-)
If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01. However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS which has since been freed.
This causes a vmptrld of a NULL pointer on my machine, but Jan reports
that the host hangs altogether. Let's see how much this trivial patch fixes.
Reported-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Liran Alon <liran.alon(a)oracle.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/nested.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4f23e34f628b..0f1378789bd0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER, -1ull);
+ vmx->nested.need_vmcs12_to_shadow_sync = false;
}
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
@@ -1341,6 +1342,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
unsigned long val;
int i;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
preempt_disable();
vmcs_load(shadow_vmcs);
@@ -1373,6 +1377,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
unsigned long val;
int i, q;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
vmcs_load(shadow_vmcs);
for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -4436,7 +4443,6 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
- vmx->nested.need_vmcs12_to_shadow_sync = false;
vmx_disable_shadow_vmcs(vmx);
}
vmx->nested.posted_intr_nv = -1;
--
1.8.3.1
The livelock can be triggered by the following pattern:
while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE),
indices)) {
...
for (i = 0; i < pagevec_count(&pvec); i++) {
index = indices[i];
...
}
index++; /* BUG */
}
Multi-order exceptional entries are not handled specially in
invalidate_inode_pages2_range(), which ends up in a livelock: lookups at
both index 0 and index 1 find the same pmd entry, but that entry is bound
to index 0, so index is set back to 0 every time.
This introduces a helper that takes the pmd entry's length into account when
deciding the next index.
Note that there are other users of the above pattern which don't need
fixing:
- dax_layout_busy_page
It's been fixed in commit d7782145e1ad
("filesystem-dax: Fix dax_layout_busy_page() livelock")
- truncate_inode_pages_range
This won't loop forever since the exceptional entries are immediately
removed from radix tree after the search.
Fixes: 642261a ("dax: add struct iomap based DAX PMD support")
Cc: <stable(a)vger.kernel.org> since 4.9 to 4.19
Signed-off-by: Liu Bo <bo.liu(a)linux.alibaba.com>
---
The problem is gone after commit f280bf092d48 ("page cache: Convert
find_get_entries to XArray"), but since xarray seems too new to backport
to 4.19, I made this fix based on radix tree implementation.
fs/dax.c | 19 +++++++++++++++++++
include/linux/dax.h | 8 ++++++++
mm/truncate.c | 26 ++++++++++++++++++++++++--
3 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index ac334bc..cd05337 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -764,6 +764,25 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_mapping_entry(mapping, index, false);
}
+pgoff_t dax_get_multi_order(struct address_space *mapping, pgoff_t index,
+ void *entry)
+{
+ struct radix_tree_root *pages = &mapping->i_pages;
+ pgoff_t nr_pages = 1;
+
+ if (!dax_mapping(mapping))
+ return nr_pages;
+
+ xa_lock_irq(pages);
+ entry = get_unlocked_mapping_entry(mapping, index, NULL);
+ if (entry)
+ nr_pages = 1UL << dax_radix_order(entry);
+ put_unlocked_mapping_entry(mapping, index, entry);
+ xa_unlock_irq(pages);
+
+ return nr_pages;
+}
+
static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
sector_t sector, size_t size, struct page *to,
unsigned long vaddr)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a846184..f3c95c6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -91,6 +91,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
struct page *dax_layout_busy_page(struct address_space *mapping);
bool dax_lock_mapping_entry(struct page *page);
void dax_unlock_mapping_entry(struct page *page);
+pgoff_t dax_get_multi_order(struct address_space *mapping, pgoff_t index,
+ void *entry);
#else
static inline bool bdev_dax_supported(struct block_device *bdev,
int blocksize)
@@ -134,6 +136,12 @@ static inline bool dax_lock_mapping_entry(struct page *page)
static inline void dax_unlock_mapping_entry(struct page *page)
{
}
+
+static inline pgoff_t dax_get_multi_order(struct address_space *mapping,
+ pgoff_t index, void *entry)
+{
+ return 1;
+}
#endif
int dax_read_lock(void);
diff --git a/mm/truncate.c b/mm/truncate.c
index 71b65aa..835911f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -557,6 +557,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
indices)) {
+ pgoff_t nr_pages = 1;
+
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -568,6 +570,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
if (radix_tree_exceptional_entry(page)) {
invalidate_exceptional_entry(mapping, index,
page);
+ /*
+ * Account for multi-order entries at
+ * the end of the pagevec.
+ */
+ if (i < pagevec_count(&pvec) - 1)
+ continue;
+
+ nr_pages = dax_get_multi_order(mapping, index,
+ page);
continue;
}
@@ -607,7 +618,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
pagevec_remove_exceptionals(&pvec);
pagevec_release(&pvec);
cond_resched();
- index++;
+ index += nr_pages;
}
return count;
}
@@ -688,6 +699,8 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
indices)) {
+ pgoff_t nr_pages = 1;
+
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -700,6 +713,15 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
if (!invalidate_exceptional_entry2(mapping,
index, page))
ret = -EBUSY;
+ /*
+ * Account for multi-order entries at
+ * the end of the pagevec.
+ */
+ if (i < pagevec_count(&pvec) - 1)
+ continue;
+
+ nr_pages = dax_get_multi_order(mapping, index,
+ page);
continue;
}
@@ -739,7 +761,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
pagevec_remove_exceptionals(&pvec);
pagevec_release(&pvec);
cond_resched();
- index++;
+ index += nr_pages;
}
/*
* For DAX we invalidate page tables after invalidating radix tree. We
--
1.8.3.1
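The livelock described in the dax/truncate patch above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: sketch_lookup() and sketch_walk() are hypothetical stand-ins (not kernel functions), a single PMD-sized entry is assumed to cover indices 0..511, and an iteration cap is added so the buggy variant terminates in the model.

```c
#include <assert.h>

/* Assumption: one PMD-sized entry spans 512 pages (order 9). */
#define SKETCH_PMD_ORDER 9UL

/*
 * Hypothetical stand-in for pagevec_lookup_entries(): a single
 * multi-order entry covers indices [0, 512). A lookup starting anywhere
 * inside that range finds the entry and reports the index it is bound
 * to (0), mirroring the behaviour described in the commit message.
 */
static int sketch_lookup(unsigned long index, unsigned long *found)
{
	if (index < (1UL << SKETCH_PMD_ORDER)) {
		*found = 0;
		return 1;
	}
	return 0;
}

/*
 * Walk [0, end] and return the number of loop iterations, giving up
 * after max_iters so the buggy variant terminates in this model.
 */
static unsigned long sketch_walk(unsigned long end, int use_fix,
				 unsigned long max_iters)
{
	unsigned long index = 0, iters = 0, found;

	assert(max_iters > 0);
	while (index <= end && sketch_lookup(index, &found)) {
		if (++iters >= max_iters)
			break;
		index = found;	/* index = indices[i] */
		/* Buggy: advance by one page; fixed: skip the whole entry. */
		index += use_fix ? (1UL << SKETCH_PMD_ORDER) : 1;
	}
	return iters;
}
```

Calling sketch_walk(1023, 0, 1000) runs into the iteration cap because index keeps bouncing between 0 and 1, while sketch_walk(1023, 1, 1000) finishes after a single pass.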
The patch titled
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
has been added to the -mm tree. Its filename is
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-fix-bad-subpage-pointer-in-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-fix-bad-subpage-pointer-in-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte which computes an invalid page pointer and a kernel panic
results such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".
Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-document-zone-device-struct-page-field-usage.patch
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
The patch titled
Subject: mm/hmm: fix ZONE_DEVICE anon page mapping reuse
has been added to the -mm tree. Its filename is
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-fix-zone_device-anon-page-m…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-fix-zone_device-anon-page-m…
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix ZONE_DEVICE anon page mapping reuse
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table. For
example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Link: http://lkml.kernel.org/r/20190719192955.30462-3-rcampbell@nvidia.com
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/kernel/memremap.c~mm-hmm-fix-zone_device-anon-page-mapping-reuse
+++ a/kernel/memremap.c
@@ -397,6 +397,30 @@ void __put_devmap_managed_page(struct pa
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-document-zone-device-struct-page-field-usage.patch
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
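The stale page->mapping problem described in the "fix ZONE_DEVICE anon page mapping reuse" patch above can be illustrated with a small userspace model. This is a sketch, not kernel code: struct sketch_page and the sketch_* helpers are hypothetical, and the only behaviour taken from the patch description is that the anonymous-page check looks at the low bits of the mapping field and that the rmap setup returns early when that check succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bit of ->mapping marks an anonymous page in this model. */
#define SKETCH_MAPPING_ANON ((uintptr_t)0x1)

/* Hypothetical, cut-down stand-in for struct page. */
struct sketch_page {
	uintptr_t mapping;
};

static bool sketch_page_anon(const struct sketch_page *page)
{
	return (page->mapping & SKETCH_MAPPING_ANON) != 0;
}

/* Models __page_set_anon_rmap(): returns early if already anonymous. */
static void sketch_set_anon_rmap(struct sketch_page *page, uintptr_t anon_vma)
{
	assert((anon_vma & SKETCH_MAPPING_ANON) == 0);
	if (sketch_page_anon(page))
		return;		/* stale value wins, mapping is not updated */
	page->mapping = anon_vma | SKETCH_MAPPING_ANON;
}

/* Models freeing a device private page, with or without the fix. */
static void sketch_free_page(struct sketch_page *page, bool clear_mapping)
{
	if (clear_mapping)
		page->mapping = 0;
}

/* Returns true if reusing a freed page picks up the new anon_vma. */
static bool sketch_reuse_ok(bool clear_mapping)
{
	const uintptr_t old_vma = 0x1000, new_vma = 0x2000;
	struct sketch_page page = { 0 };

	sketch_set_anon_rmap(&page, old_vma);
	sketch_free_page(&page, clear_mapping);
	sketch_set_anon_rmap(&page, new_vma);
	return page.mapping == (new_vma | SKETCH_MAPPING_ANON);
}
```

In this model sketch_reuse_ok(false) reports failure because the second setup call sees the stale anonymous bit and returns early, while sketch_reuse_ok(true) succeeds once the mapping is cleared on free.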
The patch titled
Subject: mm: document zone device struct page field usage
has been added to the -mm tree. Its filename is
mm-document-zone-device-struct-page-field-usage.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-document-zone-device-struct-pag…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-document-zone-device-struct-pag…
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm: document zone device struct page field usage
Struct page for ZONE_DEVICE private pages uses the page->mapping and
page->index fields while the source anonymous pages are migrated to device
private memory. This is so rmap_walk() can find the page when migrating
the ZONE_DEVICE private page back to system memory. ZONE_DEVICE pmem
backed fsdax pages also use the page->mapping and page->index fields when
files are mapped into a process address space.
Restructure struct page and add comments to make this more clear.
Link: http://lkml.kernel.org/r/20190719192955.30462-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm_types.h | 42 +++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 13 deletions(-)
--- a/include/linux/mm_types.h~mm-document-zone-device-struct-page-field-usage
+++ a/include/linux/mm_types.h
@@ -76,13 +76,35 @@ struct page {
* avoid collision and false-positive PageTail().
*/
union {
- struct { /* Page cache and anonymous pages */
- /**
- * @lru: Pageout list, eg. active_list protected by
- * pgdat->lru_lock. Sometimes used as a generic list
- * by the page owner.
- */
- struct list_head lru;
+ struct { /* Page cache, anonymous, ZONE_DEVICE pages */
+ union {
+ /**
+ * @lru: Pageout list, e.g., active_list
+ * protected by pgdat->lru_lock. Sometimes
+ * used as a generic list by the page owner.
+ */
+ struct list_head lru;
+ /**
+ * ZONE_DEVICE pages are never on the lru
+ * list so they reuse the list space.
+ * ZONE_DEVICE private pages are counted as
+ * being mapped so the @mapping and @index
+ * fields are used while the page is migrated
+ * to device private memory.
+ * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
+ * use the @mapping and @index fields when pmem
+ * backed DAX files are mapped.
+ */
+ struct {
+ /**
+ * @pgmap: Points to the hosting
+ * device page map.
+ */
+ struct dev_pagemap *pgmap;
+ /** @zone_device_data: opaque data. */
+ void *zone_device_data;
+ };
+ };
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
pgoff_t index; /* Our offset within mapping. */
@@ -155,12 +177,6 @@ struct page {
spinlock_t ptl;
#endif
};
- struct { /* ZONE_DEVICE pages */
- /** @pgmap: Points to the hosting device page map. */
- struct dev_pagemap *pgmap;
- void *zone_device_data;
- unsigned long _zd_pad_1; /* uses mapping */
- };
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-document-zone-device-struct-page-field-usage.patch
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
The patch titled
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
has been removed from the -mm tree. Its filename was
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating a ZONE device private page from device memory to system
memory, the subpage pointer is initialized from a swap pte which computes
an invalid page pointer. A kernel panic results such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Initialize subpage correctly before calling page_remove_rmap().
Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampbell@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte which computes an invalid page pointer and a kernel panic
results such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..ec1af8b60423 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
--
2.20.1
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
kernel/memremap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index bea6f887adad..238ae5d0ae8a 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -408,6 +408,10 @@ void __put_devmap_managed_page(struct page *page)
mem_cgroup_uncharge(page);
+ /* Clear anonymous page mapping to prevent stale pointers */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1
This is the start of the stable review cycle for the 5.2.2 release.
There are 21 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.2-rc1.…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.2.2-rc1
Jiri Slaby <jslaby(a)suse.cz>
x86/entry/32: Fix ENDPROC of common_spurious
Haren Myneni <haren(a)linux.vnet.ibm.com>
crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - fix hash on SEC1.
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - move struct talitos_edesc into talitos.h
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Philipp Rudo <prudo(a)linux.ibm.com>
s390/ipl: Fix detection of has_secure attribute
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Seperate unused system vectors from spurious entry again
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Handle spurious interrupt after shutdown gracefully
Thomas Gleixner <tglx(a)linutronix.de>
x86/ioapic: Implement irq_get_irqchip_state() callback
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Add optional hardware synchronization for shutdown
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Fix misleading synchronize_irq() documentation
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Delay deactivation in free_irq()
Sven Van Asbroeck <thesven73(a)gmail.com>
firmware: improve LSM/IMA security behaviour
James Morse <james.morse(a)arm.com>
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
Masahiro Yamada <yamada.masahiro(a)socionext.com>
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
Cole Rogers <colerogers(a)disroot.org>
Input: synaptics - enable SMBUS on T480 thinkpad trackpad
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Revert "e1000e: fix cyclic resets at link up with active tx"
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 ++-
arch/s390/include/asm/facility.h | 21 ++++---
arch/s390/include/asm/sclp.h | 1 -
arch/s390/kernel/ipl.c | 7 +--
arch/x86/entry/entry_32.S | 24 ++++++++
arch/x86/entry/entry_64.S | 30 +++++++--
arch/x86/include/asm/hw_irq.h | 5 +-
arch/x86/kernel/apic/apic.c | 33 ++++++----
arch/x86/kernel/apic/io_apic.c | 46 ++++++++++++++
arch/x86/kernel/apic/vector.c | 4 +-
arch/x86/kernel/idt.c | 3 +-
arch/x86/kernel/irq.c | 2 +-
drivers/base/cacheinfo.c | 3 +-
drivers/base/firmware_loader/fallback.c | 2 +-
drivers/crypto/nx/nx-842-powernv.c | 8 ++-
drivers/crypto/talitos.c | 99 +++++++++++++-----------------
drivers/crypto/talitos.h | 30 +++++++++
drivers/input/mouse/synaptics.c | 1 +
drivers/net/ethernet/intel/e1000e/netdev.c | 21 ++++---
drivers/s390/char/sclp_early.c | 1 -
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
include/linux/cpuhotplug.h | 1 +
include/uapi/linux/nilfs2_ondisk.h | 24 ++++----
kernel/irq/autoprobe.c | 6 +-
kernel/irq/chip.c | 6 ++
kernel/irq/cpuhotplug.c | 2 +-
kernel/irq/internals.h | 5 ++
kernel/irq/manage.c | 90 ++++++++++++++++++++-------
30 files changed, 342 insertions(+), 153 deletions(-)
Hi,
When a request is dispatched to the LLD via dm-rq and the result is
BLK_STS_*RESOURCE, dm-rq frees the request. However, the LLD may have
allocated private data for this request, so freeing it this way leaks memory.
Add a .cleanup_rq() callback and implement it in SCSI to fix the issue.
SCSI is the only driver which allocates private data in the .queue_rq()
path.
Another use case of this callback is to free the request and re-submit
bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hu…
Ming Lei (2):
blk-mq: add callback of .cleanup_rq
scsi: implement .cleanup_rq callback
drivers/md/dm-rq.c | 1 +
drivers/scsi/scsi_lib.c | 15 +++++++++++++++
include/linux/blk-mq.h | 13 +++++++++++++
3 files changed, 29 insertions(+)
Cc: Ewan D. Milne <emilne(a)redhat.com>
Cc: Bart Van Assche <bvanassche(a)acm.org>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Mike Snitzer <snitzer(a)redhat.com>
Cc: dm-devel(a)redhat.com
Cc: <stable(a)vger.kernel.org>
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
--
2.20.1
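The optional-callback idea from the .cleanup_rq cover letter above can be sketched in userspace C as follows. All names here (struct sketch_mq_ops, struct sketch_request, the sketch_* functions) are hypothetical and only model the pattern; they are not the blk-mq or SCSI interfaces, and the -1 return value merely stands in for a BLK_STS_*RESOURCE result.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_request;

/* Hypothetical ops table; cleanup_rq is optional, like the new hook. */
struct sketch_mq_ops {
	int  (*queue_rq)(struct sketch_request *rq);
	void (*cleanup_rq)(struct sketch_request *rq);
};

struct sketch_request {
	const struct sketch_mq_ops *ops;
	void *driver_priv;	/* private data allocated in queue_rq */
};

/* Driver side: allocate private data, then report "resource busy". */
static int sketch_queue_rq(struct sketch_request *rq)
{
	rq->driver_priv = malloc(64);
	return -1;	/* stands in for a BLK_STS_*RESOURCE result */
}

/* Driver side: release whatever queue_rq allocated. */
static void sketch_cleanup_rq(struct sketch_request *rq)
{
	free(rq->driver_priv);
	rq->driver_priv = NULL;
}

/* Caller side: give the driver a chance to clean up before freeing. */
static void sketch_release_request(struct sketch_request *rq)
{
	if (rq->ops->cleanup_rq)
		rq->ops->cleanup_rq(rq);
	/* the request itself would be freed here */
}

/* Returns 1 if no private data is left behind after a failed dispatch. */
static int sketch_dispatch_is_clean(const struct sketch_mq_ops *ops)
{
	struct sketch_request rq = { .ops = ops, .driver_priv = NULL };
	int clean;

	assert(ops && ops->queue_rq);
	if (ops->queue_rq(&rq) == 0)
		return 1;
	sketch_release_request(&rq);
	clean = (rq.driver_priv == NULL);
	free(rq.driver_priv);	/* avoid a real leak in the model */
	return clean;
}
```

With an ops table that sets cleanup_rq to sketch_cleanup_rq the model reports a clean release; with cleanup_rq left NULL the private data is still attached when the request is dropped, which corresponds to the leak the cover letter describes.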
VAG power control is improved to fit the manual [1]. This patchset fixes at
least one bug: if the customer muxes Headphone to Line-In right after boot,
the VAG power remains off, which leads to poor sound quality from line-in.
I.e. after boot:
- Connect sound source to Line-In jack;
- Connect headphone to HP jack;
- Run the following commands:
$ amixer set 'Headphone' 80%
$ amixer set 'Headphone Mux' LINE_IN
This series also includes fixes for minor bugs in the sgtl5000 codec
driver.
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Changes in v6:
- Code optimization
Changes in v5:
- Add explicit stable tag
- Improve commit message
Changes in v4:
- CC the patch to kernel-stable
- Code optimization, simplify function signature
(thanks to Cezary Rojewski <cezary.rojewski(a)intel.com> for an idea)
- Add a Fixes tag
Changes in v3:
- Add the reference to NXP SGTL5000 data sheet to commit message
- Fix multi-line comment format
Changes in v2:
- Fix patch formatting
Oleksandr Suvorov (6):
ASoC: Define a set of DAPM pre/post-up events
ASoC: sgtl5000: Improve VAG power and mute control
ASoC: sgtl5000: Fix definition of VAG Ramp Control
ASoC: sgtl5000: add ADC mute control
ASoC: sgtl5000: Fix of unmute outputs on probe
ASoC: sgtl5000: Fix charge pump source assignment
include/sound/soc-dapm.h | 2 +
sound/soc/codecs/sgtl5000.c | 250 ++++++++++++++++++++++++++++++------
sound/soc/codecs/sgtl5000.h | 2 +-
3 files changed, 213 insertions(+), 41 deletions(-)
--
2.20.1
On 7/19/2019 3:45 AM, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 67c2315def06 crypto: caam - add Queue Interface (QI) backend support.
>
> The bot has tested the following trees: v5.2.1, v5.1.18, v4.19.59, v4.14.133.
>
> v5.2.1: Build OK!
> v5.1.18: Build OK!
> v4.19.59: Failed to apply! Possible dependencies:
> 94cebd9da42c ("crypto: caam - add Queue Interface v2 error codes")
>
> v4.14.133: Failed to apply! Possible dependencies:
> 94cebd9da42c ("crypto: caam - add Queue Interface v2 error codes")
>
Indeed, the dependency is correct. Thanks!
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
In the next version we'll remove the # v4.12+ requirement, and
we'll send backports separately once the patch is merged upstream.
Thanks,
Horia
In alps_is_cs19_trackpoint(), we check whether param[1] is in the
0x20~0x2f range, but the check is not correct:
(param[1] & 0x20) is true whenever bit 5 of param[1] is set, so it also
matches values in the ranges 0x30~0x3f, 0x60~0x6f...
Fix it by checking ((param[1] & 0xf0) == 0x20) instead.
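The difference between the two conditions can be checked in isolation. The
following is a minimal user-space sketch, not the driver code: the helper
names old_check() and new_check() are hypothetical, and only the two
expressions are taken from the patch.

```c
#include <stdbool.h>

/*
 * Old condition (hypothetical helper name): true whenever bit 5 is set,
 * regardless of the other bits, so 0x30 or 0x6f also pass.
 */
static bool old_check(unsigned char fw_id)
{
	return (fw_id & 0x20) != 0;
}

/*
 * New condition (hypothetical helper name): true only when the high
 * nibble equals 0x2, i.e. the value lies in the 0x20~0x2f range.
 */
static bool new_check(unsigned char fw_id)
{
	return (fw_id & 0xf0) == 0x20;
}
```

Under these assumptions, a firmware id of 0x30 passes old_check() but is
rejected by new_check(), while 0x20 and 0x2f pass both.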
Fixes: 7e4935ccc323 ("Input: alps - don't handle ALPS cs19 trackpoint-only device")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
drivers/input/mouse/alps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
index 62ffea00902a..34700eda0429 100644
--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -2876,7 +2876,7 @@ static bool alps_is_cs19_trackpoint(struct psmouse *psmouse)
* trackpoint-only devices have their variant_ids equal
* TP_VARIANT_ALPS and their firmware_ids are in 0x20~0x2f range.
*/
- return param[0] == TP_VARIANT_ALPS && (param[1] & 0x20);
+ return param[0] == TP_VARIANT_ALPS && ((param[1] & 0xf0) == 0x20);
}
static int alps_identify(struct psmouse *psmouse, struct alps_data *priv)
--
2.17.1
From: Fei Yang <fei.yang(a)intel.com>
If scatter-gather operation is allowed, a large USB request would be split
into multiple TRBs. These TRBs are chained up by setting DWC3_TRB_CTRL_CHN
bit except the last one which has DWC3_TRB_CTRL_IOC bit set instead.
Since only the last TRB has IOC set, dwc3_gadget_ep_reclaim_completed_trb()
would be called only once for the whole USB request, thus all the TRBs need
to be reclaimed within this single call. However that is not what the current
code does.
This patch addresses the issue by checking each TRB in function
dwc3_gadget_ep_reclaim_trb_sg() and reclaiming the chained ones right there.
Only the last TRB gets passed to dwc3_gadget_ep_reclaim_completed_trb(). This
would guarantee all TRBs are reclaimed and trb_dequeue/num_trbs are updated
properly.
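The reclaim rule described above can be sketched with a simplified model.
This is an illustration under stated assumptions, not the dwc3 code: the
model_* types and functions, the ring size and the bit values are all
hypothetical, and the link TRB of the real ring is ignored.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's control bits (illustrative values). */
#define TRB_CTRL_CHN (1u << 2)
#define TRB_CTRL_IOC (1u << 11)

#define RING_SIZE 8u

struct model_trb {
	uint32_t ctrl;
};

struct model_ep {
	struct model_trb ring[RING_SIZE];
	unsigned int dequeue;	/* index of the next TRB to reclaim */
};

struct model_req {
	unsigned int num_trbs;	/* TRBs still owned by this request */
};

/* Advance the dequeue index, wrapping at the end of the ring. */
static void model_inc_deq(struct model_ep *ep)
{
	ep->dequeue = (ep->dequeue + 1) % RING_SIZE;
}

/*
 * Reclaim every TRB of a request in one call: chained TRBs (no IOC) are
 * released inside the loop, and the walk stops after the TRB that has
 * IOC set. Returns the number of TRBs reclaimed.
 */
static unsigned int model_reclaim_sg(struct model_ep *ep, struct model_req *req)
{
	unsigned int reclaimed = 0;

	while (req->num_trbs > 0) {
		struct model_trb *trb = &ep->ring[ep->dequeue];
		int last = (trb->ctrl & TRB_CTRL_IOC) != 0;

		model_inc_deq(ep);
		req->num_trbs--;
		reclaimed++;
		if (last)
			break;
	}
	return reclaimed;
}
```

In this model a request made of two chained TRBs followed by one IOC TRB
is fully reclaimed by a single call, which is the property the patch aims
for; in the driver, the final TRB is additionally passed to
dwc3_gadget_ep_reclaim_completed_trb().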
Signed-off-by: Fei Yang <fei.yang(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
---
V2: Better solution is to reclaim chained TRBs in dwc3_gadget_ep_reclaim_trb_sg()
and leave the last TRB to the dwc3_gadget_ep_reclaim_completed_trb().
---
drivers/usb/dwc3/gadget.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 173f532..c0662c2 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2404,7 +2404,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
struct dwc3_request *req, const struct dwc3_event_depevt *event,
int status)
{
- struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];
+ struct dwc3_trb *trb;
struct scatterlist *sg = req->sg;
struct scatterlist *s;
unsigned int pending = req->num_pending_sgs;
@@ -2419,7 +2419,15 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
req->sg = sg_next(s);
req->num_pending_sgs--;
+ if (!(trb->ctrl & DWC3_TRB_CTRL_IOC)) {
+ /* reclaim the TRB without calling
+ * dwc3_gadget_ep_reclaim_completed_trb */
+ dwc3_ep_inc_deq(dep);
+ req->num_trbs--;
+ continue;
+ }
+ /* Only the last TRB in the sg list would reach here */
ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
trb, event, status, true);
if (ret)
--
2.7.4
VAG power control is improved to fit the manual [1]. This patchset fixes at
least one bug: if the user muxes Headphone to Line-In right after boot,
VAG power remains off, which leads to poor sound quality from line-in.
To reproduce after boot:
- Connect a sound source to the Line-In jack;
- Connect headphones to the HP jack;
- Run the following commands:
$ amixer set 'Headphone' 80%
$ amixer set 'Headphone Mux' LINE_IN
This series also includes fixes for minor bugs in the sgtl5000 codec
driver.
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Changes in v5:
- Add explicit stable tag
- Improve commit message
Changes in v4:
- CC the patch to kernel-stable
- Code optimization, simplify function signature
(thanks to Cezary Rojewski <cezary.rojewski(a)intel.com> for an idea)
- Add a Fixes tag
Changes in v3:
- Add the reference to NXP SGTL5000 data sheet to commit message
- Fix multi-line comment format
Changes in v2:
- Fix patch formatting
Oleksandr Suvorov (6):
ASoC: Define a set of DAPM pre/post-up events
ASoC: sgtl5000: Improve VAG power and mute control
ASoC: sgtl5000: Fix definition of VAG Ramp Control
ASoC: sgtl5000: add ADC mute control
ASoC: sgtl5000: Fix of unmute outputs on probe
ASoC: sgtl5000: Fix charge pump source assignment
include/sound/soc-dapm.h | 2 +
sound/soc/codecs/sgtl5000.c | 240 ++++++++++++++++++++++++++++++------
sound/soc/codecs/sgtl5000.h | 2 +-
3 files changed, 203 insertions(+), 41 deletions(-)
--
2.20.1
On Fri, Jul 19, 2019 at 12:45:23AM +0000, Sasha Levin wrote:
> v5.1.18: Failed to apply! Possible dependencies:
> Unable to calculate
>
> v4.19.59: Failed to apply! Possible dependencies:
<snip>
> How should we proceed with this patch?
I'll provide a backported version of the patch for 4.19 and 5.1
after it is merged into Linus' tree.
Stanislaw
This is the start of the stable review cycle for the 4.14.134 release.
There are 80 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.134-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.134-rc1
Dave Airlie <airlied(a)redhat.com>
drm/udl: move to embedding drm device inside udl device.
Dave Airlie <airlied(a)redhat.com>
drm/udl: introduce a macro to convert dev to udl.
Haren Myneni <haren(a)linux.vnet.ibm.com>
crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Vinod Koul <vkoul(a)kernel.org>
linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
Eiichi Tsukata <devel(a)etsukata.com>
cpu/hotplug: Fix out-of-bounds read when setting fail state
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Fix crash if kernel image crosses page table boundary
Milan Broz <gmazyland(a)gmail.com>
dm verity: use message limit for data block corruption message
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
ARM: dts: imx6ul: fix PWM[1-4] interrupts
Sergej Benilov <sergej.benilov(a)googlemail.com>
sis900: fix TX completion
Takashi Iwai <tiwai(a)suse.de>
ppp: mppe: Add softdep to arc4
Petr Oros <poros(a)redhat.com>
be2net: fix link failure after ethtool offline test
Arnd Bergmann <arnd(a)arndb.de>
ARM: omap2: remove incorrect __init annotation
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix perf_sample_regs_user() mm check
Hans de Goede <hdegoede(a)redhat.com>
efi/bgrt: Drop BGRT status field reserved bits check
Tony Lindgren <tony(a)atomide.com>
clk: ti: clkctrl: Fix returning uninitialized data
Sean Young <sean(a)mess.org>
MIPS: Remove superfluous check for __linux__
Vishnu DASA <vdasa(a)vmware.com>
VMCI: Fix integer overflow in VMCI handle arrays
Christian Lamparter <chunkeey(a)gmail.com>
carl9170: fix misuse of device driver API
Todd Kjos <tkjos(a)android.com>
binder: fix memory leak in error path
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: amplc_pci230: fix null pointer deref on interrupt
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt282x: fix a null pointer deref on interrupt
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: renesas_usbhs: add a workaround for a race condition of workqueue
Kiruthika Varadarajan <Kiruthika.Varadarajan(a)harman.com>
usb: gadget: ether: Fix race between gether_disconnect and rx_submit
Alan Stern <stern(a)rowland.harvard.edu>
p54usb: Fix race between disconnect and firmware loading
Oliver Barta <o.barta89(a)gmail.com>
Revert "serial: 8250: Don't service RX FIFO if interrupts are disabled"
Jörgen Storvist <jorgen.storvist(a)gmail.com>
USB: serial: option: add support for GosunCn ME3630 RNDIS mode
Andreas Fritiofson <andreas.fritiofson(a)unjo.com>
USB: serial: ftdi_sio: add ID for isodebug v1
Brian Norris <briannorris(a)chromium.org>
mwifiex: Don't abort on small, spec-compliant vendor IEs
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Abort at too short BSS descriptor element
Tim Chen <tim.c.chen(a)linux.intel.com>
Documentation: Add section about CPU vulnerabilities for Spectre
Dianzhang Chen <dianzhangchen0(a)gmail.com>
x86/tls: Fix possible spectre-v1 in do_get_thread_area()
Dianzhang Chen <dianzhangchen0(a)gmail.com>
x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
Douglas Anderson <dianders(a)chromium.org>
block, bfq: NULL out the bic when it's no longer valid
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - Headphone Mic can't record after S3
Steven J. Magnani <steve.magnani(a)digidescorp.com>
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
Hongjie Fang <hongjiefang(a)asrmicro.com>
fscrypt: don't set policy for a dead directory
Lin Yi <teroincn(a)163.com>
net :sunrpc :clnt :Fix xps refcount imbalance on the error path
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge()
yangerkun <yangerkun(a)huawei.com>
quota: fix a problem about transfer quota
Colin Ian King <colin.king(a)canonical.com>
net: lio_core: fix potential sign-extension overflow on large shift
Xin Long <lucien.xin(a)gmail.com>
ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
Dan Carpenter <dan.carpenter(a)oracle.com>
drm: return -EFAULT if copy_to_user() fails
Mauro S. M. Rodrigues <maurosr(a)linux.vnet.ibm.com>
bnx2x: Check if transceiver implements DDM before access
Mariusz Tkaczyk <mariusz.tkaczyk(a)intel.com>
md: fix for divide error in status_resync
Reinhard Speyerer <rspmn(a)arcor.de>
qmi_wwan: extend permitted QMAP mux_id value range
Reinhard Speyerer <rspmn(a)arcor.de>
qmi_wwan: avoid RCU stalls on device disconnect when in QMAP mode
Reinhard Speyerer <rspmn(a)arcor.de>
qmi_wwan: add support for QMAP padding in the RX path
Yibo Zhao <yiboz(a)codeaurora.org>
mac80211: only warn once on chanctx_conf being NULL
Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
ARM: davinci: da850-evm: call regulator_has_full_constraints()
Ido Schimmel <idosch(a)mellanox.com>
mlxsw: spectrum: Disallow prio-tagged packets when PVID is removed
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy
Anson Huang <anson.huang(a)nxp.com>
Input: imx_keypad - make sure keyboard can always wake up system
Teresa Remmet <t.remmet(a)phytec.de>
ARM: dts: am335x phytec boards: Fix cd-gpios active level
Thomas Falcon <tlfalcon(a)linux.ibm.com>
ibmvnic: Refresh device multicast list after reset
YueHaibing <yuehaibing(a)huawei.com>
can: af_can: Fix error path of can_init()
Eugen Hristev <eugen.hristev(a)microchip.com>
can: m_can: implement errata "Needless activation of MRAF irq"
Sean Nyekjaer <sean(a)geanix.com>
can: mcp251x: add support for mcp25625
Sean Nyekjaer <sean(a)geanix.com>
dt-bindings: can: mcp251x: add mcp25625 support
Guillaume Nault <gnault(a)redhat.com>
netfilter: ipv6: nf_defrag: accept duplicate fragments again
Guillaume Nault <gnault(a)redhat.com>
netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
Jia-Ju Bai <baijiaju1990(a)gmail.com>
iwlwifi: Fix double-free problems in iwl_req_fw_callback()
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Fix possible buffer overflows at parsing bss descriptor
Pradeep Kumar Chitrapu <pradeepc(a)codeaurora.org>
mac80211: free peer keys before vif down in mesh
Thomas Pedersen <thomas(a)eero.com>
mac80211: mesh: fix RCU warning
Melissa Wen <melissa.srw(a)gmail.com>
staging:iio:ad7150: fix threshold mode config bit
John Fastabend <john.fastabend(a)gmail.com>
bpf: sockmap, fix use after free from sleep in psock backlog workqueue
Chang-Hsien Tsai <luke.tw(a)gmail.com>
samples, bpf: fix to change the buffer size for read()
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - enable middle button support on 2 ThinkPads
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - rename alternative AEAD algos.
James Morse <james.morse(a)arm.com>
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
Masahiro Yamada <yamada.masahiro(a)socionext.com>
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
Cole Rogers <colerogers(a)disroot.org>
Input: synaptics - enable SMBUS on T480 thinkpad trackpad
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Revert "e1000e: fix cyclic resets at link up with active tx"
-------------
Diffstat:
Documentation/ABI/testing/sysfs-class-net-qmi | 4 +-
Documentation/admin-guide/hw-vuln/index.rst | 1 +
Documentation/admin-guide/hw-vuln/spectre.rst | 697 +++++++++++++++++++++
.../bindings/net/can/microchip,mcp251x.txt | 1 +
Documentation/userspace-api/spec_ctrl.rst | 2 +
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 +-
arch/arm/boot/dts/am335x-pcm-953.dtsi | 2 +-
arch/arm/boot/dts/am335x-wega.dtsi | 2 +-
arch/arm/boot/dts/imx6ul.dtsi | 8 +-
arch/arm/mach-davinci/board-da850-evm.c | 2 +
arch/arm/mach-davinci/devices-da8xx.c | 3 +
arch/arm/mach-omap2/prm3xxx.c | 2 +-
arch/mips/include/uapi/asm/sgidefs.h | 8 -
arch/s390/include/asm/facility.h | 21 +-
arch/x86/kernel/head64.c | 17 +-
arch/x86/kernel/ptrace.c | 5 +-
arch/x86/kernel/tls.c | 9 +-
block/bfq-iosched.c | 1 +
drivers/android/binder.c | 4 +-
drivers/base/cacheinfo.c | 3 +-
drivers/clk/ti/clkctrl.c | 7 +-
drivers/crypto/nx/nx-842-powernv.c | 8 +-
drivers/crypto/talitos.c | 16 +-
drivers/firmware/efi/efi-bgrt.c | 5 -
drivers/gpu/drm/drm_bufs.c | 5 +-
drivers/gpu/drm/drm_ioc32.c | 5 +-
drivers/gpu/drm/udl/udl_drv.c | 56 +-
drivers/gpu/drm/udl/udl_drv.h | 9 +-
drivers/gpu/drm/udl/udl_fb.c | 12 +-
drivers/gpu/drm/udl/udl_main.c | 35 +-
drivers/input/keyboard/imx_keypad.c | 18 +-
drivers/input/mouse/elantech.c | 2 +
drivers/input/mouse/synaptics.c | 1 +
drivers/md/dm-verity-target.c | 4 +-
drivers/md/md.c | 36 +-
drivers/misc/vmw_vmci/vmci_context.c | 80 +--
drivers/misc/vmw_vmci/vmci_handle_array.c | 38 +-
drivers/misc/vmw_vmci/vmci_handle_array.h | 29 +-
drivers/net/can/m_can/m_can.c | 21 +
drivers/net/can/spi/Kconfig | 5 +-
drivers/net/can/spi/mcp251x.c | 25 +-
drivers/net/dsa/mv88e6xxx/global1_vtu.c | 2 +-
.../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 3 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h | 1 +
drivers/net/ethernet/cavium/liquidio/lio_core.c | 2 +-
drivers/net/ethernet/emulex/benet/be_ethtool.c | 28 +-
drivers/net/ethernet/ibm/ibmvnic.c | 3 +
drivers/net/ethernet/intel/e1000e/netdev.c | 21 +-
drivers/net/ethernet/mellanox/mlxsw/reg.h | 2 +-
drivers/net/ethernet/sis/sis900.c | 16 +-
drivers/net/ppp/ppp_mppe.c | 1 +
drivers/net/usb/qmi_wwan.c | 27 +-
drivers/net/wireless/ath/carl9170/usb.c | 39 +-
drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 1 -
drivers/net/wireless/intersil/p54/p54usb.c | 43 +-
drivers/net/wireless/marvell/mwifiex/fw.h | 12 +-
drivers/net/wireless/marvell/mwifiex/ie.c | 45 +-
drivers/net/wireless/marvell/mwifiex/scan.c | 31 +-
drivers/net/wireless/marvell/mwifiex/sta_ioctl.c | 4 +-
drivers/net/wireless/marvell/mwifiex/wmm.c | 2 +-
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
drivers/staging/comedi/drivers/amplc_pci230.c | 3 +-
drivers/staging/comedi/drivers/dt282x.c | 3 +-
drivers/staging/iio/cdc/ad7150.c | 19 +-
drivers/tty/serial/8250/8250_port.c | 3 +-
drivers/usb/gadget/function/u_ether.c | 6 +-
drivers/usb/renesas_usbhs/fifo.c | 34 +-
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 +
drivers/usb/serial/option.c | 1 +
fs/crypto/policy.c | 2 +
fs/quota/dquot.c | 4 +-
fs/udf/inode.c | 93 ++-
include/linux/cpuhotplug.h | 1 +
include/linux/kernel.h | 3 +-
include/linux/vmw_vmci_defs.h | 11 +-
include/net/ip6_tunnel.h | 9 +-
include/uapi/linux/nilfs2_ondisk.h | 24 +-
kernel/cpu.c | 3 +
kernel/events/core.c | 2 +-
net/can/af_can.c | 24 +-
net/core/skbuff.c | 1 +
net/ipv6/netfilter/nf_conntrack_reasm.c | 22 +-
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/mesh.c | 6 +-
net/sunrpc/clnt.c | 1 +
samples/bpf/bpf_load.c | 2 +-
sound/pci/hda/patch_realtek.c | 2 +-
virt/kvm/arm/vgic/vgic-its.c | 1 +
91 files changed, 1393 insertions(+), 408 deletions(-)
This is the start of the stable review cycle for the 5.1.19 release.
There are 54 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.19-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.1.19-rc1
Jiri Slaby <jslaby(a)suse.cz>
x86/entry/32: Fix ENDPROC of common_spurious
Haren Myneni <haren(a)linux.vnet.ibm.com>
crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - fix hash on SEC1.
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - move struct talitos_edesc into talitos.h
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Seperate unused system vectors from spurious entry again
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Handle spurious interrupt after shutdown gracefully
Thomas Gleixner <tglx(a)linutronix.de>
x86/ioapic: Implement irq_get_irqchip_state() callback
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Add optional hardware synchronization for shutdown
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Fix misleading synchronize_irq() documentation
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Delay deactivation in free_irq()
Vinod Koul <vkoul(a)kernel.org>
linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
Andrea Arcangeli <aarcange(a)redhat.com>
fork,memcg: alloc_thread_stack_node needs to set tsk->stack
Yafang Shao <laoar.shao(a)gmail.com>
mm/oom_kill.c: fix uninitialized oc->constraint
Nicolas Boichat <drinkcat(a)chromium.org>
pinctrl: mediatek: Update cur_mask in mask/mask ops
Eiichi Tsukata <devel(a)etsukata.com>
cpu/hotplug: Fix out-of-bounds read when setting fail state
Nicolas Boichat <drinkcat(a)chromium.org>
pinctrl: mediatek: Ignore interrupts that are wake only during resume
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
HID: multitouch: Add pointstick support for ALPS Touchpad
Kyle Godbey <me(a)kyle.ee>
HID: uclogic: Add support for Huion HS64 tablet
Oleksandr Natalenko <oleksandr(a)redhat.com>
HID: chicony: add another quirk for PixArt mouse
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Add missing fixup_pointer() for next_early_pgt access
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Fix crash if kernel image crosses page table boundary
Milan Broz <gmazyland(a)gmail.com>
dm verity: use message limit for data block corruption message
Jerome Marchand <jmarchan(a)redhat.com>
dm table: don't copy from a NULL pointer in realloc_argv()
Alexandre Belloni <alexandre.belloni(a)bootlin.com>
pinctrl: ocelot: fix pinmuxing for pins after 31
Alexandre Belloni <alexandre.belloni(a)bootlin.com>
pinctrl: ocelot: fix gpio direction for pins after 31
Phil Reid <preid(a)electromag.com.au>
pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call order
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
ARM: dts: imx6ul: fix PWM[1-4] interrupts
Sergej Benilov <sergej.benilov(a)googlemail.com>
sis900: fix TX completion
Takashi Iwai <tiwai(a)suse.de>
ppp: mppe: Add softdep to arc4
Petr Oros <poros(a)redhat.com>
be2net: fix link failure after ethtool offline test
Colin Ian King <colin.king(a)canonical.com>
x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz
Qian Cai <cai(a)lca.pw>
x86/efi: fix a -Wtype-limits compilation warning
David Howells <dhowells(a)redhat.com>
afs: Fix uninitialised spinlock afs_volume::cb_break_lock
Arnd Bergmann <arnd(a)arndb.de>
ARM: omap2: remove incorrect __init annotation
Linus Walleij <linus.walleij(a)linaro.org>
ARM: dts: gemini Fix up DNS-313 compatible string
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix perf_sample_regs_user() mm check
Michael Ellerman <mpe(a)ellerman.id.au>
selftests/powerpc: Add test of fork with mapping above 512TB
Ran Wang <ran.wang_1(a)nxp.com>
arm64: dts: ls1028a: Fix CPU idle fail.
Hans de Goede <hdegoede(a)redhat.com>
efi/bgrt: Drop BGRT status field reserved bits check
Tony Lindgren <tony(a)atomide.com>
clk: ti: clkctrl: Fix returning uninitialized data
Heyi Guo <guoheyi(a)huawei.com>
irqchip/gic-v3-its: Fix command queue pointer comparison bug
Guo Ren <ren_guo(a)c-sky.com>
irqchip/irq-csky-mpintc: Support auto irq deliver to all cpus
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
ARM: dts: meson8b: fix the operating voltage of the Mali GPU
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
ARM: dts: meson8: fix GPU interrupts and drop an undocumented property
Sven Van Asbroeck <thesven73(a)gmail.com>
firmware: improve LSM/IMA security behaviour
James Morse <james.morse(a)arm.com>
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
Masahiro Yamada <yamada.masahiro(a)socionext.com>
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
Cole Rogers <colerogers(a)disroot.org>
Input: synaptics - enable SMBUS on T480 thinkpad trackpad
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Revert "e1000e: fix cyclic resets at link up with active tx"
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 +-
arch/arm/boot/dts/gemini-dlink-dns-313.dts | 2 +-
arch/arm/boot/dts/imx6ul.dtsi | 8 +-
arch/arm/boot/dts/meson8.dtsi | 5 +-
arch/arm/boot/dts/meson8b.dtsi | 10 +--
arch/arm/mach-omap2/prm3xxx.c | 2 +-
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 18 ++--
arch/s390/include/asm/facility.h | 21 +++--
arch/x86/entry/entry_32.S | 24 ++++++
arch/x86/entry/entry_64.S | 30 ++++++-
arch/x86/include/asm/hw_irq.h | 5 +-
arch/x86/kernel/apic/apic.c | 36 +++++---
arch/x86/kernel/apic/io_apic.c | 46 ++++++++++
arch/x86/kernel/apic/vector.c | 4 +-
arch/x86/kernel/head64.c | 20 +++--
arch/x86/kernel/idt.c | 3 +-
arch/x86/kernel/irq.c | 2 +-
arch/x86/platform/efi/quirks.c | 2 +-
drivers/base/cacheinfo.c | 3 +-
drivers/base/firmware_loader/fallback.c | 2 +-
drivers/clk/ti/clkctrl.c | 7 +-
drivers/crypto/nx/nx-842-powernv.c | 8 +-
drivers/crypto/talitos.c | 99 +++++++++-------------
drivers/crypto/talitos.h | 30 +++++++
drivers/firmware/efi/efi-bgrt.c | 5 --
drivers/hid/hid-ids.h | 3 +
drivers/hid/hid-multitouch.c | 4 +
drivers/hid/hid-quirks.c | 1 +
drivers/hid/hid-uclogic-core.c | 2 +
drivers/hid/hid-uclogic-params.c | 2 +
drivers/input/mouse/synaptics.c | 1 +
drivers/irqchip/irq-csky-mpintc.c | 15 +++-
drivers/irqchip/irq-gic-v3-its.c | 35 +++++---
drivers/md/dm-table.c | 2 +-
drivers/md/dm-verity-target.c | 4 +-
drivers/net/ethernet/emulex/benet/be_ethtool.c | 28 ++++--
drivers/net/ethernet/intel/e1000e/netdev.c | 21 +++--
drivers/net/ethernet/sis/sis900.c | 16 ++--
drivers/net/ppp/ppp_mppe.c | 1 +
drivers/pinctrl/mediatek/mtk-eint.c | 34 ++++----
drivers/pinctrl/pinctrl-mcp23s08.c | 8 +-
drivers/pinctrl/pinctrl-ocelot.c | 18 ++--
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
fs/afs/callback.c | 4 +-
fs/afs/internal.h | 2 +-
fs/afs/volume.c | 1 +
include/linux/cpuhotplug.h | 1 +
include/linux/kernel.h | 3 +-
include/uapi/linux/nilfs2_ondisk.h | 24 +++---
kernel/cpu.c | 3 +
kernel/events/core.c | 2 +-
kernel/fork.c | 6 +-
kernel/irq/autoprobe.c | 6 +-
kernel/irq/chip.c | 6 ++
kernel/irq/cpuhotplug.c | 2 +-
kernel/irq/internals.h | 5 ++
kernel/irq/manage.c | 90 +++++++++++++++-----
mm/oom_kill.c | 12 ++-
tools/testing/selftests/powerpc/mm/.gitignore | 3 +-
tools/testing/selftests/powerpc/mm/Makefile | 4 +-
.../powerpc/mm/large_vm_fork_separation.c | 87 +++++++++++++++++++
63 files changed, 610 insertions(+), 258 deletions(-)
This is the start of the stable review cycle for the 4.9.186 release.
There are 54 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.186-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.186-rc1
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Milan Broz <gmazyland(a)gmail.com>
dm verity: use message limit for data block corruption message
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
ARM: dts: imx6ul: fix PWM[1-4] interrupts
Sergej Benilov <sergej.benilov(a)googlemail.com>
sis900: fix TX completion
Takashi Iwai <tiwai(a)suse.de>
ppp: mppe: Add softdep to arc4
Petr Oros <poros(a)redhat.com>
be2net: fix link failure after ethtool offline test
Arnd Bergmann <arnd(a)arndb.de>
ARM: omap2: remove incorrect __init annotation
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix perf_sample_regs_user() mm check
Mark Rutland <mark.rutland(a)arm.com>
arm64: crypto: remove accidentally backported files
Masahiro Yamada <yamada.masahiro(a)socionext.com>
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Revert "e1000e: fix cyclic resets at link up with active tx"
Sean Young <sean(a)mess.org>
MIPS: Remove superfluous check for __linux__
Vishnu DASA <vdasa(a)vmware.com>
VMCI: Fix integer overflow in VMCI handle arrays
Christian Lamparter <chunkeey(a)gmail.com>
carl9170: fix misuse of device driver API
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: amplc_pci230: fix null pointer deref on interrupt
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt282x: fix a null pointer deref on interrupt
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: renesas_usbhs: add a workaround for a race condition of workqueue
Kiruthika Varadarajan <Kiruthika.Varadarajan(a)harman.com>
usb: gadget: ether: Fix race between gether_disconnect and rx_submit
Alan Stern <stern(a)rowland.harvard.edu>
p54usb: Fix race between disconnect and firmware loading
Oliver Barta <o.barta89(a)gmail.com>
Revert "serial: 8250: Don't service RX FIFO if interrupts are disabled"
Jörgen Storvist <jorgen.storvist(a)gmail.com>
USB: serial: option: add support for GosunCn ME3630 RNDIS mode
Andreas Fritiofson <andreas.fritiofson(a)unjo.com>
USB: serial: ftdi_sio: add ID for isodebug v1
Brian Norris <briannorris(a)chromium.org>
mwifiex: Don't abort on small, spec-compliant vendor IEs
Hongjie Fang <hongjiefang(a)asrmicro.com>
fscrypt: don't set policy for a dead directory
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Abort at too short BSS descriptor element
Dianzhang Chen <dianzhangchen0(a)gmail.com>
x86/tls: Fix possible spectre-v1 in do_get_thread_area()
Dianzhang Chen <dianzhangchen0(a)gmail.com>
x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
Steven J. Magnani <steve.magnani(a)digidescorp.com>
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
Lin Yi <teroincn(a)163.com>
net :sunrpc :clnt :Fix xps refcount imbalance on the error path
Xin Long <lucien.xin(a)gmail.com>
ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
Mauro S. M. Rodrigues <maurosr(a)linux.vnet.ibm.com>
bnx2x: Check if transceiver implements DDM before access
Mariusz Tkaczyk <mariusz.tkaczyk(a)intel.com>
md: fix for divide error in status_resync
Yibo Zhao <yiboz(a)codeaurora.org>
mac80211: only warn once on chanctx_conf being NULL
Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
ARM: davinci: da850-evm: call regulator_has_full_constraints()
Ido Schimmel <idosch(a)mellanox.com>
mlxsw: spectrum: Disallow prio-tagged packets when PVID is removed
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy
Anson Huang <anson.huang(a)nxp.com>
Input: imx_keypad - make sure keyboard can always wake up system
Sean Nyekjaer <sean(a)geanix.com>
can: mcp251x: add support for mcp25625
Sean Nyekjaer <sean(a)geanix.com>
dt-bindings: can: mcp251x: add mcp25625 support
Guillaume Nault <gnault(a)redhat.com>
netfilter: ipv6: nf_defrag: accept duplicate fragments again
Guillaume Nault <gnault(a)redhat.com>
netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Fix possible buffer overflows at parsing bss descriptor
Pradeep Kumar Chitrapu <pradeepc(a)codeaurora.org>
mac80211: free peer keys before vif down in mesh
Thomas Pedersen <thomas(a)eero.com>
mac80211: mesh: fix RCU warning
Melissa Wen <melissa.srw(a)gmail.com>
staging:iio:ad7150: fix threshold mode config bit
Chang-Hsien Tsai <luke.tw(a)gmail.com>
samples, bpf: fix to change the buffer size for read()
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - enable middle button support on 2 ThinkPads
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - rename alternative AEAD algos.
-------------
Diffstat:
.../bindings/net/can/microchip,mcp251x.txt | 1 +
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 +-
arch/arm/boot/dts/imx6ul.dtsi | 8 +-
arch/arm/mach-davinci/board-da850-evm.c | 2 +
arch/arm/mach-davinci/devices-da8xx.c | 3 +
arch/arm/mach-omap2/prm3xxx.c | 2 +-
arch/arm64/crypto/sha256-core.S | 2061 --------------------
arch/arm64/crypto/sha512-core.S | 1085 -----------
arch/mips/include/uapi/asm/sgidefs.h | 8 -
arch/s390/include/asm/facility.h | 21 +-
arch/x86/kernel/ptrace.c | 5 +-
arch/x86/kernel/tls.c | 9 +-
drivers/crypto/talitos.c | 16 +-
drivers/input/keyboard/imx_keypad.c | 18 +-
drivers/input/mouse/elantech.c | 2 +
drivers/md/dm-verity-target.c | 4 +-
drivers/md/md.c | 36 +-
drivers/misc/vmw_vmci/vmci_context.c | 80 +-
drivers/misc/vmw_vmci/vmci_handle_array.c | 38 +-
drivers/misc/vmw_vmci/vmci_handle_array.h | 29 +-
drivers/net/can/spi/Kconfig | 5 +-
drivers/net/can/spi/mcp251x.c | 25 +-
.../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 3 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h | 1 +
drivers/net/ethernet/emulex/benet/be_ethtool.c | 28 +-
drivers/net/ethernet/intel/e1000e/netdev.c | 21 +-
drivers/net/ethernet/mellanox/mlxsw/reg.h | 2 +-
drivers/net/ethernet/sis/sis900.c | 16 +-
drivers/net/ppp/ppp_mppe.c | 1 +
drivers/net/wireless/ath/carl9170/usb.c | 39 +-
drivers/net/wireless/intersil/p54/p54usb.c | 43 +-
drivers/net/wireless/marvell/mwifiex/fw.h | 12 +-
drivers/net/wireless/marvell/mwifiex/ie.c | 45 +-
drivers/net/wireless/marvell/mwifiex/scan.c | 31 +-
drivers/net/wireless/marvell/mwifiex/sta_ioctl.c | 4 +-
drivers/net/wireless/marvell/mwifiex/wmm.c | 2 +-
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
drivers/staging/comedi/drivers/amplc_pci230.c | 3 +-
drivers/staging/comedi/drivers/dt282x.c | 3 +-
drivers/staging/iio/cdc/ad7150.c | 19 +-
drivers/tty/serial/8250/8250_port.c | 3 +-
drivers/usb/gadget/function/u_ether.c | 6 +-
drivers/usb/renesas_usbhs/fifo.c | 34 +-
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 +
drivers/usb/serial/option.c | 1 +
fs/crypto/policy.c | 2 +
fs/udf/inode.c | 93 +-
include/linux/vmw_vmci_defs.h | 11 +-
include/net/ip6_tunnel.h | 9 +-
include/uapi/linux/nilfs2_ondisk.h | 24 +-
kernel/events/core.c | 2 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 22 +-
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/mesh.c | 6 +-
net/sunrpc/clnt.c | 1 +
samples/bpf/bpf_load.c | 2 +-
virt/kvm/arm/vgic/vgic-its.c | 1 +
60 files changed, 516 insertions(+), 3461 deletions(-)
This is the start of the stable review cycle for the 4.4.186 release.
There are 40 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.186-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.186-rc1
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: x86: protect KVM_CREATE_PIT/KVM_CREATE_PIT2 with kvm->lock
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
Milan Broz <gmazyland(a)gmail.com>
dm verity: use message limit for data block corruption message
Sergej Benilov <sergej.benilov(a)googlemail.com>
sis900: fix TX completion
Takashi Iwai <tiwai(a)suse.de>
ppp: mppe: Add softdep to arc4
Petr Oros <poros(a)redhat.com>
be2net: fix link failure after ethtool offline test
Arnd Bergmann <arnd(a)arndb.de>
ARM: omap2: remove incorrect __init annotation
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix perf_sample_regs_user() mm check
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Sean Young <sean(a)mess.org>
MIPS: Remove superfluous check for __linux__
Vishnu DASA <vdasa(a)vmware.com>
VMCI: Fix integer overflow in VMCI handle arrays
Christian Lamparter <chunkeey(a)gmail.com>
carl9170: fix misuse of device driver API
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: amplc_pci230: fix null pointer deref on interrupt
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt282x: fix a null pointer deref on interrupt
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: renesas_usbhs: add a workaround for a race condition of workqueue
Kiruthika Varadarajan <Kiruthika.Varadarajan(a)harman.com>
usb: gadget: ether: Fix race between gether_disconnect and rx_submit
Jörgen Storvist <jorgen.storvist(a)gmail.com>
USB: serial: option: add support for GosunCn ME3630 RNDIS mode
Andreas Fritiofson <andreas.fritiofson(a)unjo.com>
USB: serial: ftdi_sio: add ID for isodebug v1
Brian Norris <briannorris(a)chromium.org>
mwifiex: Don't abort on small, spec-compliant vendor IEs
Hongjie Fang <hongjiefang(a)asrmicro.com>
fscrypt: don't set policy for a dead directory
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Abort at too short BSS descriptor element
Dianzhang Chen <dianzhangchen0(a)gmail.com>
x86/tls: Fix possible spectre-v1 in do_get_thread_area()
Dianzhang Chen <dianzhangchen0(a)gmail.com>
x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
Steven J. Magnani <steve.magnani(a)digidescorp.com>
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
Mauro S. M. Rodrigues <maurosr(a)linux.vnet.ibm.com>
bnx2x: Check if transceiver implements DDM before access
Mariusz Tkaczyk <mariusz.tkaczyk(a)intel.com>
md: fix for divide error in status_resync
Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
ARM: davinci: da850-evm: call regulator_has_full_constraints()
Anson Huang <anson.huang(a)nxp.com>
Input: imx_keypad - make sure keyboard can always wake up system
Sean Nyekjaer <sean(a)geanix.com>
can: mcp251x: add support for mcp25625
Sean Nyekjaer <sean(a)geanix.com>
dt-bindings: can: mcp251x: add mcp25625 support
Takashi Iwai <tiwai(a)suse.de>
mwifiex: Fix possible buffer overflows at parsing bss descriptor
Thomas Pedersen <thomas(a)eero.com>
mac80211: mesh: fix RCU warning
Chang-Hsien Tsai <luke.tw(a)gmail.com>
samples, bpf: fix to change the buffer size for read()
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - enable middle button support on 2 ThinkPads
-------------
Diffstat:
.../bindings/net/can/microchip,mcp251x.txt | 1 +
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 +--
arch/arm/mach-davinci/board-da850-evm.c | 2 +
arch/arm/mach-davinci/devices-da8xx.c | 3 +
arch/arm/mach-omap2/prm3xxx.c | 2 +-
arch/mips/include/uapi/asm/sgidefs.h | 8 --
arch/s390/include/asm/facility.h | 21 +++--
arch/x86/kernel/ptrace.c | 5 +-
arch/x86/kernel/tls.c | 9 ++-
arch/x86/kvm/i8254.c | 5 +-
arch/x86/kvm/x86.c | 6 +-
drivers/input/keyboard/imx_keypad.c | 18 ++++-
drivers/input/mouse/elantech.c | 2 +
drivers/md/dm-verity.c | 4 +-
drivers/md/md.c | 36 +++++----
drivers/misc/vmw_vmci/vmci_context.c | 80 +++++++++++--------
drivers/misc/vmw_vmci/vmci_handle_array.c | 38 ++++++---
drivers/misc/vmw_vmci/vmci_handle_array.h | 29 ++++---
drivers/net/can/spi/Kconfig | 5 +-
drivers/net/can/spi/mcp251x.c | 25 +++---
.../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 3 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h | 1 +
drivers/net/ethernet/emulex/benet/be_ethtool.c | 28 +++++--
drivers/net/ethernet/intel/e1000e/netdev.c | 6 +-
drivers/net/ethernet/sis/sis900.c | 16 ++--
drivers/net/ppp/ppp_mppe.c | 1 +
drivers/net/wireless/ath/carl9170/usb.c | 39 ++++-----
drivers/net/wireless/mwifiex/fw.h | 12 ++-
drivers/net/wireless/mwifiex/ie.c | 45 +++++++----
drivers/net/wireless/mwifiex/scan.c | 31 +++++++-
drivers/net/wireless/mwifiex/sta_ioctl.c | 4 +-
drivers/net/wireless/mwifiex/wmm.c | 2 +-
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
drivers/staging/comedi/drivers/amplc_pci230.c | 3 +-
drivers/staging/comedi/drivers/dt282x.c | 3 +-
drivers/usb/gadget/function/u_ether.c | 6 +-
drivers/usb/renesas_usbhs/fifo.c | 34 +++++---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 ++
drivers/usb/serial/option.c | 1 +
fs/ext4/crypto_policy.c | 2 +
fs/f2fs/crypto_policy.c | 2 +
fs/udf/inode.c | 93 ++++++++++++++--------
include/linux/vmw_vmci_defs.h | 11 ++-
kernel/events/core.c | 2 +-
net/mac80211/mesh.c | 5 +-
samples/bpf/bpf_load.c | 2 +-
49 files changed, 438 insertions(+), 240 deletions(-)
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be
zero" fields of a 'pfn' info-block to be filled with indeterminate data.
While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like 'flags' and 'padding',
this potentially means that future implementations cannot rely on those
fields being zero.
In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly initialized
to be guaranteed zero. Bump the minor version to indicate it is safe to
assume the 'padding' and 'flags' are zero. Otherwise, this corruption is
expected to be benign since all other critical fields are explicitly
initialized.
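The ordering problem described above can be illustrated with a minimal
user-space sketch. The struct and the toy_validate()/toy_init() helpers
are hypothetical, much-reduced stand-ins for struct nd_pfn_sb,
nd_pfn_validate() and nd_pfn_init(); they are not the kernel code. The
point is only that the buffer is cleared after the failed validation has
overwritten it, and before the new fields are filled in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct nd_pfn_sb. */
struct info_block {
	uint16_t version_major;
	uint16_t version_minor;
	uint32_t flags;		/* expected to be zero */
	uint8_t padding[16];	/* expected to be zero */
};

/*
 * Toy validation: copy the on-media bytes into the buffer first, then
 * decide whether they form a usable info-block. On failure the buffer
 * is left holding whatever was on the media.
 */
static int toy_validate(struct info_block *sb, const uint8_t *media)
{
	memcpy(sb, media, sizeof(*sb));
	return sb->version_major == 1 ? 0 : -1;
}

/*
 * Toy init: returns 0 if an existing block was accepted, 1 if a new
 * one was written. The memset() discards stale media contents so that
 * fields which are never explicitly set are guaranteed to be zero.
 */
static int toy_init(struct info_block *sb, const uint8_t *media)
{
	assert(sb && media);

	if (toy_validate(sb, media) == 0)
		return 0;

	memset(sb, 0, sizeof(*sb));
	sb->version_major = 1;
	sb->version_minor = 3;
	return 1;
}
```

Without the memset(), 'flags' and 'padding' in this sketch would keep
whatever bytes the media held at validation time.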
Note: the cc: stable is about spreading this new policy to as many kernels
as possible, not about fixing an issue in those kernels. It is not until
the change titled "libnvdimm/pfn: Stop padding pmem namespaces to section
alignment" that this improper initialization becomes a problem. So if
someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces
to section alignment" (which is not tagged for stable), make sure this
pre-requisite is flagged.
Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwil…
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com> [ppc64]
Cc: <stable(a)vger.kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/nvdimm/dax_devs.c | 2 +-
drivers/nvdimm/pfn.h | 1 +
drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
3 files changed, 17 insertions(+), 4 deletions(-)
--- a/drivers/nvdimm/dax_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/dax_devs.c
@@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!dax_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
--- a/drivers/nvdimm/pfn_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn_devs.c
@@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(st
return 0;
}
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
@@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!pfn_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -693,7 +702,7 @@ static int nd_pfn_init(struct nd_pfn *nd
u64 checksum;
int rc;
- pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
@@ -702,11 +711,14 @@ static int nd_pfn_init(struct nd_pfn *nd
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
/* no info block, do init */;
+ memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(&nd_pfn->dev,
@@ -759,7 +771,7 @@ static int nd_pfn_init(struct nd_pfn *nd
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
- pfn_sb->version_minor = cpu_to_le16(2);
+ pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
--- a/drivers/nvdimm/pfn.h~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn.h
@@ -28,6 +28,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+ /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
};
_
From: Nadav Amit <namit(a)vmware.com>
Subject: resource: fix locking in find_next_iomem_res()
Since resources can be removed, locking should ensure that the resource is
not removed while accessing it. However, find_next_iomem_res() does not
hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
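A minimal user-space sketch of the pattern, under stated assumptions:
the toy_read_lock()/toy_read_unlock() helpers are a hypothetical
single-threaded stand-in for resource_lock, and find_overlap() walks a
flat array rather than the kernel's resource tree. It only shows the
ordering (copy while the lock is held, unlock afterwards) and the
return values named in the updated comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_res {
	unsigned long start, end, flags;
};

/* Hypothetical stand-in for resource_lock; just tracks nesting. */
static int lock_depth;
static void toy_read_lock(void)   { lock_depth++; }
static void toy_read_unlock(void) { lock_depth--; }

static unsigned long max_ul(unsigned long a, unsigned long b) { return a > b ? a : b; }
static unsigned long min_ul(unsigned long a, unsigned long b) { return a < b ? a : b; }

/*
 * Find the first entry overlapping [start..end] and copy the
 * overlapping part into *out. Returns 0 on success, -ENODEV if nothing
 * overlaps, -EINVAL if the range is invalid.
 */
static int find_overlap(const struct toy_res *table, size_t n,
			unsigned long start, unsigned long end,
			struct toy_res *out)
{
	const struct toy_res *p = NULL;
	size_t i;

	if (start >= end)
		return -EINVAL;

	toy_read_lock();
	for (i = 0; i < n; i++) {
		if (table[i].end >= start && table[i].start <= end) {
			p = &table[i];
			break;
		}
	}

	if (p) {
		/* The entry cannot go away here: the lock is still held. */
		assert(lock_depth > 0);
		out->start = max_ul(start, p->start);
		out->end = min_ul(end, p->end);
		out->flags = p->flags;
	}
	toy_read_unlock();

	return p ? 0 : -ENODEV;
}
```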
[akpm(a)linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
--- a/kernel/resource.c~resource-fix-locking-in-find_next_iomem_res
+++ a/kernel/resource.c
@@ -326,7 +326,7 @@ EXPORT_SYMBOL(release_resource);
*
* If a resource is found, returns 0 and @*res is overwritten with the part
* of the resource that's within [@start..@end]; if none is found, returns
- * -1 or -EINVAL for other invalid parameters.
+ * -ENODEV. Returns -EINVAL for invalid parameters.
*
* This function walks the whole tree and not just first level children
* unless @first_lvl is true.
@@ -365,16 +365,16 @@ static int find_next_iomem_res(resource_
break;
}
- read_unlock(&resource_lock);
- if (!p)
- return -1;
+ if (p) {
+ /* copy data */
+ res->start = max(start, p->start);
+ res->end = min(end, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ }
- /* copy data */
- res->start = max(start, p->start);
- res->end = min(end, p->end);
- res->flags = p->flags;
- res->desc = p->desc;
- return 0;
+ read_unlock(&resource_lock);
+ return p ? 0 : -ENODEV;
}
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
_
Servers can defer destaging any data and updating the mtime until close().
This means that if we do a setinfo to modify the mtime while other handles
are open for write, the server may overwrite our setinfo timestamps when
it flushes the file on close() of the writeable handle.
To solve this we add an explicit flush when the mtime is about to
be updated.
This fixes "cp -p" to preserve mtime when copying a file onto an SMB2 share.
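The error handling this implies can be sketched in isolation. The
struct toy_handle, toy_get_writable() and toy_flush() names below are
hypothetical stand-ins, not the cifs API; the sketch only shows the
decision flow: flush if a writable handle exists, treat "no writable
handle" (-EBADF) as nothing to do, and propagate any other error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an open file handle. */
struct toy_handle {
	int flushed;
};

/* Returns 0 and sets *out if a writable handle exists, else -EBADF. */
static int toy_get_writable(struct toy_handle *open_handle,
			    struct toy_handle **out)
{
	if (!open_handle)
		return -EBADF;
	*out = open_handle;
	return 0;
}

/* Pretend to push dirty data to the server. */
static int toy_flush(struct toy_handle *h)
{
	h->flushed = 1;
	return 0;
}

/* Run before the mtime is changed; returns 0 if it is safe to proceed. */
static int toy_prepare_mtime_update(struct toy_handle *open_handle)
{
	struct toy_handle *wfile;
	int rc = toy_get_writable(open_handle, &wfile);

	if (rc == -EBADF)
		return 0;	/* nothing open for write, nothing to flush */
	if (rc)
		return rc;	/* unexpected error: give up */

	return toy_flush(wfile);
}
```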
CC: Stable <stable(a)vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
---
fs/cifs/inode.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 1bffe029fb66..56ca4b8ccaba 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -2406,6 +2406,8 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
struct inode *inode = d_inode(direntry);
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifsInodeInfo *cifsInode = CIFS_I(inode);
+ struct cifsFileInfo *wfile;
+ struct cifs_tcon *tcon;
char *full_path = NULL;
int rc = -EACCES;
__u32 dosattr = 0;
@@ -2452,6 +2454,20 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
mapping_set_error(inode->i_mapping, rc);
rc = 0;
+ if (attrs->ia_valid & ATTR_MTIME) {
+ rc = cifs_get_writable_file(cifsInode, false, &wfile);
+ if (!rc) {
+ tcon = tlink_tcon(wfile->tlink);
+ rc = tcon->ses->server->ops->flush(xid, tcon, &wfile->fid);
+ cifsFileInfo_put(wfile);
+ if (rc)
+ return rc;
+ } else if (rc != -EBADF)
+ return rc;
+ else
+ rc = 0;
+ }
+
if (attrs->ia_valid & ATTR_SIZE) {
rc = cifs_set_file_size(inode, attrs, xid, full_path);
if (rc != 0)
--
2.13.6
The patch titled
Subject: mm: compaction: avoid 100% CPU usage during compaction when a task is killed
has been added to the -mm tree. Its filename is
mm-compaction-avoid-100%-cpu-usage-during-compaction-when-a-task-is-killed.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-avoid-100%25-cpu-usa…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-avoid-100%25-cpu-usa…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel bugzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes 1000000000 --timeout 10
Tracing indicated a pattern as follows
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting while
making no progress as a fatal signal is pending.
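The difference between "stop scanning here" and "abort entirely" can be
sketched as follows. This is a simplified model, not the kernel
function: TOY_CLUSTER stands in for SWAP_CLUSTER_MAX, fatal_pending for
the pending-signal check, and it assumes the caller treats a return
value of 0 as "abort, do not retry".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CLUSTER 32UL	/* stand-in for SWAP_CLUSTER_MAX */

/*
 * Scan [low_pfn, end_pfn). Returns the pfn where scanning stopped, or
 * 0 if a fatal signal was noticed at a cluster boundary. Returning 0
 * rather than the current pfn is what lets the caller tell "abort"
 * apart from "nothing isolated, try again".
 */
static unsigned long toy_scan_block(unsigned long low_pfn,
				    unsigned long end_pfn,
				    bool fatal_pending)
{
	assert(low_pfn <= end_pfn);

	for (; low_pfn < end_pfn; low_pfn++) {
		if (!(low_pfn % TOY_CLUSTER) && fatal_pending)
			return 0;	/* abort completely */

		/* ... a real scanner would try to isolate this page ... */
	}

	return low_pfn;
}
```

With a plain break at the check, the function would return the same
unaligned-or-aligned pfn it was entered with, and a caller that retries
on "no pages isolated" would loop on the same range.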
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and isolating
a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally (16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000 using Colin King's stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task exited
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
The reporter's real name is unknown. This was caught and repaired due to
their testing and tracing.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reported-by: <howaboutsynergy(a)protonmail.com>
Tested-by: <howaboutsynergy(a)protonmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/compaction.c~mm-compaction-avoid-100%-cpu-usage-during-compaction-when-a-task-is-killed
+++ a/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compac
/*
* Periodically drop the lock (if held) regardless of its
- * contention, to give chance to IRQs. Abort async compaction
- * if contended.
+ * contention, to give chance to IRQs. Abort completely if
+ * a fatal signal is pending.
*/
if (!(low_pfn % SWAP_CLUSTER_MAX)
&& compact_unlock_should_abort(&pgdat->lru_lock,
- flags, &locked, cc))
- break;
+ flags, &locked, cc)) {
+ low_pfn = 0;
+ goto fatal_pending;
+ }
if (!pfn_valid_within(low_pfn))
goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_abort:
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
+fatal_pending:
cc->total_migrate_scanned += nr_scanned;
if (nr_isolated)
count_compact_events(COMPACTISOLATED, nr_isolated);
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-compaction-avoid-100%-cpu-usage-during-compaction-when-a-task-is-killed.patch
The patch titled
Subject: mm: migrate: fix reference check race between __find_get_block() and migration
has been added to the -mm tree. Its filename is
mm-migrate-fix-reference-check-race-between-__find_get_block-and-migration.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-fix-reference-check-rac…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-fix-reference-check-rac…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: fix reference check race between __find_get_block() and migration
buffer_migrate_page_norefs() can race with bh users in the following way:
CPU1                                    CPU2
buffer_migrate_page_norefs()
  buffer_migrate_lock_buffers()
  checks bh refs
  spin_unlock(&mapping->private_lock)
                                        __find_get_block()
                                          spin_lock(&mapping->private_lock)
                                          grab bh ref
                                          spin_unlock(&mapping->private_lock)
  move page                               do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use after free issues for the old page.
This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page. Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary. The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock. Between the page lock and
private_lock, it should be impossible for other references to be acquired
and updates to happen during the migration.
A user had reported data corruption issues on a distribution kernel with a
page migration implementation similar to mainline's. The data corruption
could not be reproduced with this patch applied. A small number of
migration-intensive tests were run and no performance problems were noted.
[mgorman(a)techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes: 89cb0888ca14 "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-reference-check-race-between-__find_get_block-and-migration
+++ a/mm/migrate.c
@@ -768,12 +768,12 @@ recheck_buffers:
}
bh = bh->b_this_page;
} while (bh != head);
- spin_unlock(&mapping->private_lock);
if (busy) {
if (invalidated) {
rc = -EAGAIN;
goto unlock_buffers;
}
+ spin_unlock(&mapping->private_lock);
invalidate_bh_lrus();
invalidated = true;
goto recheck_buffers;
@@ -806,6 +806,8 @@ recheck_buffers:
rc = MIGRATEPAGE_SUCCESS;
unlock_buffers:
+ if (check_refs)
+ spin_unlock(&mapping->private_lock);
bh = head;
do {
unlock_buffer(bh);
_
Patches currently in -mm which might be from jack(a)suse.cz are
mm-migrate-fix-reference-check-race-between-__find_get_block-and-migration.patch
The patch titled
Subject: mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
has been added to the -mm tree. Its filename is
mm-vmscan-check-if-mem-cgroup-is-disabled-or-not-before-calling-memcg-slab-shrinker.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-vmscan-check-if-mem-cgroup-is-d…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-vmscan-check-if-mem-cgroup-is-d…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
Shakeel Butt reported premature OOM on a kernel booted with
"cgroup_disable=memory", since mem_cgroup_is_root() returns false even
though memcg is actually NULL. The drop_caches path is also broken.
This is because aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls
in shrink_node()") removed the !memcg check before !mem_cgroup_is_root().
Surprisingly, the root memcg is allocated even though the memory cgroup is
disabled by the kernel boot parameter.
Add mem_cgroup_disabled() check to make reclaimer work as expected.
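The failure mode can be modelled in a few lines. The types and helpers
below are hypothetical user-space stand-ins (toy_is_root() compares
against a root object that always exists, mirroring the description
above); they only show why a NULL memcg is misrouted without the extra
check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct toy_memcg {
	int id;
};

enum toy_path { TOY_GLOBAL_SHRINK, TOY_MEMCG_SHRINK };

/* The root object exists even when the controller is disabled. */
static const struct toy_memcg toy_root_obj = { 0 };
static const struct toy_memcg *toy_root = &toy_root_obj;

static bool toy_is_root(const struct toy_memcg *memcg)
{
	return memcg == toy_root;
}

/* Old behaviour: NULL is "not root", so the memcg-only path is taken. */
static enum toy_path toy_choose_old(const struct toy_memcg *memcg)
{
	return !toy_is_root(memcg) ? TOY_MEMCG_SHRINK : TOY_GLOBAL_SHRINK;
}

/* New behaviour: a disabled controller always takes the global path. */
static enum toy_path toy_choose_new(bool disabled,
				    const struct toy_memcg *memcg)
{
	if (!disabled && !toy_is_root(memcg))
		return TOY_MEMCG_SHRINK;
	return TOY_GLOBAL_SHRINK;
}
```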
Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.a…
Fixes: aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-check-if-mem-cgroup-is-disabled-or-not-before-calling-memcg-slab-shrinker
+++ a/mm/vmscan.c
@@ -699,7 +699,14 @@ static unsigned long shrink_slab(gfp_t g
unsigned long ret, freed = 0;
struct shrinker *shrinker;
- if (!mem_cgroup_is_root(memcg))
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
if (!down_read_trylock(&shrinker_rwsem))
_
Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are
revert-kmemleak-allow-to-coexist-with-fault-injection.patch
mm-vmscan-check-if-mem-cgroup-is-disabled-or-not-before-calling-memcg-slab-shrinker.patch
mm-mempolicy-make-the-behavior-consistent-when-mpol_mf_move-and-mpol_mf_strict-were-specified.patch
mm-mempolicy-handle-vma-with-unmovable-pages-mapped-correctly-in-mbind.patch
mm-thp-make-transhuge_vma_suitable-available-for-anonymous-thp.patch
mm-thp-make-transhuge_vma_suitable-available-for-anonymous-thp-v4.patch
mm-thp-fix-false-negative-of-shmem-vmas-thp-eligibility.patch
A single 32-bit PSR2 training pattern field follows the sixteen-element
array of PSR table entries in the VBT spec. However, we incorrectly define
this PSR2 field for each of the PSR table entries. As a result, the PSR1
training pattern duration for any panel_type != 0 will be parsed
incorrectly. Secondly, PSR2 training pattern durations for VBTs with bdb
version >= 226 will also be wrong.
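A small sketch of the corrected layout and the per-panel lookup. The
entry fields here are a simplified, hypothetical 6-byte stand-in for
struct psr_table (the real definition lives in intel_vbt_defs.h), and
the packed attribute is the GCC/Clang spelling. What it shows: one
32-bit word after the sixteen entries, holding a 2-bit code per panel.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical per-panel entry (6 bytes when packed). */
struct toy_psr_entry {
	uint8_t flags;
	uint8_t idle_frames;
	uint16_t tp1_wakeup_time;
	uint16_t tp2_tp3_wakeup_time;
} __attribute__((packed));

/* Sixteen entries followed by a single shared 32-bit PSR2 field. */
struct toy_bdb_psr {
	struct toy_psr_entry psr_table[16];
	uint32_t psr2_tp2_tp3_wakeup_time;
} __attribute__((packed));

/* Extract the 2-bit PSR2 wakeup code for one panel from the shared word. */
static unsigned int toy_psr2_wakeup_code(const struct toy_bdb_psr *psr,
					 unsigned int panel_type)
{
	assert(panel_type < 16);
	return (psr->psr2_tp2_tp3_wakeup_time >> (2 * panel_type)) & 0x3;
}
```

If the 32-bit field were instead a member of each entry, every entry
would grow by four bytes and entries 1 through 15 would be read from the
wrong offsets, which matches the symptom described above.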
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Cc: stable(a)vger.kernel.org #v5.2
Fixes: 88a0d9606aff ("drm/i915/vbt: Parse and use the new field with PSR2 TP2/3 wakeup time")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111088
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204183
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza(a)intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Tested-by: François Guerraz <kubrick(a)fgv6.net>
---
drivers/gpu/drm/i915/display/intel_bios.c | 2 +-
drivers/gpu/drm/i915/display/intel_vbt_defs.h | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c
index 21501d565327..b416b394b641 100644
--- a/drivers/gpu/drm/i915/display/intel_bios.c
+++ b/drivers/gpu/drm/i915/display/intel_bios.c
@@ -766,7 +766,7 @@ parse_psr(struct drm_i915_private *dev_priv, const struct bdb_header *bdb)
}
if (bdb->version >= 226) {
- u32 wakeup_time = psr_table->psr2_tp2_tp3_wakeup_time;
+ u32 wakeup_time = psr->psr2_tp2_tp3_wakeup_time;
wakeup_time = (wakeup_time >> (2 * panel_type)) & 0x3;
switch (wakeup_time) {
diff --git a/drivers/gpu/drm/i915/display/intel_vbt_defs.h b/drivers/gpu/drm/i915/display/intel_vbt_defs.h
index 93f5c9d204d6..09cd37fb0b1c 100644
--- a/drivers/gpu/drm/i915/display/intel_vbt_defs.h
+++ b/drivers/gpu/drm/i915/display/intel_vbt_defs.h
@@ -481,13 +481,13 @@ struct psr_table {
/* TP wake up time in multiple of 100 */
u16 tp1_wakeup_time;
u16 tp2_tp3_wakeup_time;
-
- /* PSR2 TP2/TP3 wakeup time for 16 panels */
- u32 psr2_tp2_tp3_wakeup_time;
} __packed;
struct bdb_psr {
struct psr_table psr_table[16];
+
+ /* PSR2 TP2/TP3 wakeup time for 16 panels */
+ u32 psr2_tp2_tp3_wakeup_time;
} __packed;
/*
--
2.17.1
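As an illustration of the PSR2 field handling in the patch above, the single
32-bit field holds one 2-bit code per panel, and the code for a given
panel_type is extracted with a shift and mask. This is a stand-alone sketch;
psr2_wakeup_code() is a hypothetical helper, not a function in the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the 2-bit PSR2 TP2/TP3 wakeup code for one of 16 panels,
 * using the same shift-and-mask as parse_psr() in the patch.
 */
static unsigned int psr2_wakeup_code(uint32_t field, unsigned int panel_type)
{
	assert(panel_type < 16);	/* 16 panels x 2 bits = 32 bits */
	return (field >> (2 * panel_type)) & 0x3;
}
```

Because all 16 codes share one field, it belongs next to the array of PSR
table entries rather than inside each entry, which is what the struct change
in the patch expresses.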
Changes since v1 [1]:
- Fix an ioctl command corruption regression that manifested as an
intermittent failure of the monitor.sh unit test. This is handled in
the patch4 prep patch that makes it safe for nd_ioctl() to be
re-entrant. (Vishal)
- Update the changelog for the driver-core 'lockdep_lock' hack to
indicate Greg's non-NAK.
[1]: https://lore.kernel.org/lkml/156029554317.419799.1324389595953183385.stgit@…
---
The libnvdimm subsystem uses async operations to parallelize device
probing operations and to allow sysfs to trigger device_unregister() on
deleted namespaces. A multithreaded stress test of the libnvdimm sysfs
interface uncovered a case where device_unregister() is triggered
multiple times, and the subsequent investigation uncovered a broken
locking scenario.
The lack of lockdep coverage for device_lock() stymied the debug. That
is, until patch6 "driver-core, libnvdimm: Let device subsystems add
local lockdep coverage" solved that with a shadow lock, with lockdep
coverage, to mirror device_lock() operations. Given the time saved with
the shadow-lock debug hack, patch6 attempts to generalize it into a
device_lock() debug facility that might be carried upstream. Patch6 is
staged at the end of this fix series in case it is contentious and needs
to be dropped.
Patch1 "drivers/base: Introduce kill_device()" could be achieved with
local libnvdimm infrastructure. However, the existing 'dead' flag in
'struct device_private' aims to solve similar async register/unregister
races so the fix in patch2 "libnvdimm/bus: Prevent duplicate
device_unregister() calls" can be implemented with existing driver-core
infrastructure.
Patch3 fixes a rare, intermittent lockdep warning caused by namespaces
racing ahead of the completion of probe of their parent region. It is
not related to the other fixes; it just happened to trigger as a result
of the async stress test.
Patch5 and patch6 address an ABBA deadlock tripped by the stress test.
These patches pass the failing stress test and the existing libnvdimm
unit tests with CONFIG_PROVE_LOCKING=y and the new "dev->lockdep_mutex"
shadow lock with no lockdep warnings.
---
Dan Williams (7):
drivers/base: Introduce kill_device()
libnvdimm/bus: Prevent duplicate device_unregister() calls
libnvdimm/region: Register badblocks before namespaces
libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
driver-core, libnvdimm: Let device subsystems add local lockdep coverage
drivers/acpi/nfit/core.c | 28 +++--
drivers/acpi/nfit/nfit.h | 24 ++++
drivers/base/core.c | 30 ++++--
drivers/nvdimm/btt_devs.c | 16 +--
drivers/nvdimm/bus.c | 210 ++++++++++++++++++++++++++-------------
drivers/nvdimm/core.c | 10 +-
drivers/nvdimm/dimm_devs.c | 4 -
drivers/nvdimm/namespace_devs.c | 36 +++----
drivers/nvdimm/nd-core.h | 71 +++++++++++++
drivers/nvdimm/pfn_devs.c | 24 ++--
drivers/nvdimm/pmem.c | 4 -
drivers/nvdimm/region.c | 24 ++--
drivers/nvdimm/region_devs.c | 12 +-
include/linux/device.h | 6 +
14 files changed, 343 insertions(+), 156 deletions(-)
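The "prevent duplicate device_unregister()" idea in patch1 and patch2 can be
sketched in user space as a flag that only the first caller gets to set. This
is an assumption-laden illustration: struct toy_device and toy_kill_device()
are hypothetical names, and the caller is assumed to hold whatever lock
protects the flag (device_lock() in the kernel case).

```c
#include <stdbool.h>

struct toy_device {
	bool dead;	/* models the 'dead' flag in struct device_private */
};

/*
 * Mark the device dead. Returns true only for the first caller, so
 * only that caller should go on to unregister the device.
 * The caller is assumed to hold the lock protecting dev->dead.
 */
static bool toy_kill_device(struct toy_device *dev)
{
	if (dev->dead)
		return false;
	dev->dead = true;
	return true;
}
```

A caller would unregister only when the helper returns true; a second racing
caller sees false and backs off.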
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
On VLV/CHV there is some kind of linkage between the cdclk frequency
and the DP link frequency. The spec says:
"For DP audio configuration, cdclk frequency shall be set to
meet the following requirements:
DP Link Frequency(MHz) | Cdclk frequency(MHz)
270 | 320 or higher
162 | 200 or higher"
I suspect that would more accurately be expressed as
"cdclk >= DP link clock", and in any case we can express it like
that in the code because of the limited set of cdclk and link
frequencies we support.
Without this we can end up in a situation where the cdclk
is too low and enabling DP audio will kill the pipe. Happens
eg. with 2560x1440 modes where the 266MHz cdclk is sufficient
to pump the pixels (241.5 MHz dotclock) but is too low for
the DP audio due to the link frequency being 270 MHz.
Cc: stable(a)vger.kernel.org
Tested-by: Stefan Gottwald <gottwald(a)igel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111149
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_cdclk.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_cdclk.c b/drivers/gpu/drm/i915/display/intel_cdclk.c
index d0581a1ac243..93b0d190c184 100644
--- a/drivers/gpu/drm/i915/display/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/display/intel_cdclk.c
@@ -2262,6 +2262,17 @@ int intel_crtc_compute_min_cdclk(const struct intel_crtc_state *crtc_state)
if (crtc_state->has_audio && INTEL_GEN(dev_priv) >= 9)
min_cdclk = max(2 * 96000, min_cdclk);
+ /*
+ * "For DP audio configuration, cdclk frequency shall be set to
+ * meet the following requirements:
+ * DP Link Frequency(MHz) | Cdclk frequency(MHz)
+ * 270 | 320 or higher
+ * 162 | 200 or higher"
+ */
+ if ((IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) &&
+ intel_crtc_has_dp_encoder(crtc_state) && crtc_state->has_audio)
+ min_cdclk = max(crtc_state->port_clock, min_cdclk);
+
/*
* On Valleyview some DSI panels lose (v|h)sync when the clock is lower
* than 320000KHz.
--
2.21.0
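The rule added by the cdclk patch above amounts to raising the minimum cdclk
to the DP link clock when DP audio is in use on VLV/CHV. A stand-alone sketch
follows, with all clocks in kHz; min_cdclk_for_dp_audio() and its boolean
parameters are hypothetical and merely model the three conditions in the
patch.

```c
#include <stdbool.h>

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * is_vlv_chv models IS_VALLEYVIEW() || IS_CHERRYVIEW(),
 * has_dp models intel_crtc_has_dp_encoder(),
 * has_audio models crtc_state->has_audio.
 */
static int min_cdclk_for_dp_audio(int min_cdclk, int port_clock,
				  bool is_vlv_chv, bool has_dp, bool has_audio)
{
	if (is_vlv_chv && has_dp && has_audio)
		return max_int(port_clock, min_cdclk);
	return min_cdclk;
}
```

For the 2560x1440 case from the commit message (241.5 MHz dotclock, 270 MHz
link), the sketch raises a 241500 kHz minimum to 270000 kHz, so a 266 MHz
cdclk no longer satisfies it.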
Folks!
There are more and more people worried about these usually harmless
warnings:
do_IRQ: 0.39 No irq handler for vector
It took a while to figure out why that happens and why it is harmless for
most interrupts, but there is also a real issue hidden for level type
IOAPIC interrupts.
The following commits in Linus tree are addressing the issue:
b7107a67f0d1 ("x86/irq: Handle spurious interrupt after shutdown gracefully")
dfe0cf8b51b0 ("x86/ioapic: Implement irq_get_irqchip_state() callback")
62e0468650c3 ("genirq: Add optional hardware synchronization for shutdown")
1d21f2af8571 ("genirq: Fix misleading synchronize_irq() documentation")
4001d8e8762f ("genirq: Delay deactivation in free_irq()")
There is another one which makes sense to be backported:
f8a8fe61fec8 ("x86/irq: Seperate unused system vectors from spurious entry again")
These should go back to 4.19, but not farther.
They apply cleanly to 5.1 and 5.2. A backport to 4.19 is attached.
Thanks,
tglx
A GPU hang was observed during the guest OCL conformance test. It is caused
by the THP GTT feature used during the test.
The same GFN was seen requested from the guest in GVT with different sizes
(4K and 2M). So during the guest page DMA map stage, the page must first be
unmapped with its original size and then remapped with the requested size.
Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Xiaolin Zhang <xiaolin.zhang(a)intel.com>
---
drivers/gpu/drm/i915/gvt/kvmgt.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index a68addf..4a7cf86 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1911,6 +1911,18 @@ static int kvmgt_dma_map_guest_page(unsigned long handle, unsigned long gfn,
ret = __gvt_cache_add(info->vgpu, gfn, *dma_addr, size);
if (ret)
goto err_unmap;
+ } else if (entry->size != size) {
+ /* the same gfn with different size: unmap and re-map */
+ gvt_dma_unmap_page(vgpu, gfn, entry->dma_addr, entry->size);
+ __gvt_cache_remove_entry(vgpu, entry);
+
+ ret = gvt_dma_map_page(vgpu, gfn, dma_addr, size);
+ if (ret)
+ goto err_unlock;
+
+ ret = __gvt_cache_add(info->vgpu, gfn, *dma_addr, size);
+ if (ret)
+ goto err_unmap;
} else {
kref_get(&entry->ref);
*dma_addr = entry->dma_addr;
--
1.8.3.1
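The three branches that kvmgt_dma_map_guest_page() has after the patch above
can be summarised as a small decision function. This is a simplified sketch:
struct toy_entry, enum map_action and pick_map_action() are hypothetical and
leave out the actual DMA mapping, caching and locking.

```c
#include <stddef.h>

enum map_action {
	MAP_NEW,	/* no cache entry: map and add to the cache */
	MAP_REMAP,	/* same gfn, different size: unmap, then map again */
	MAP_REUSE	/* same gfn, same size: take a reference */
};

struct toy_entry {
	unsigned long size;	/* size the gfn is currently mapped with */
};

static enum map_action pick_map_action(const struct toy_entry *entry,
				       unsigned long size)
{
	if (!entry)
		return MAP_NEW;
	if (entry->size != size)
		return MAP_REMAP;
	return MAP_REUSE;
}
```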
From: Josua Mayer <josua(a)solid-run.com>
Armada 8040 needs four clocks to be enabled for MDIO accesses to work.
Update the binding to allow the extra clock to be specified.
Cc: stable(a)vger.kernel.org
Fixes: 6d6a331f44a1 ("dt-bindings: allow up to three clocks for orion-mdio")
Signed-off-by: Josua Mayer <josua(a)solid-run.com>
---
Documentation/devicetree/bindings/net/marvell-orion-mdio.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
index 42cd81090a2c..3f3cfc1d8d4d 100644
--- a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
+++ b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
@@ -16,7 +16,7 @@ Required properties:
Optional properties:
- interrupts: interrupt line number for the SMI error/done interrupt
-- clocks: phandle for up to three required clocks for the MDIO instance
+- clocks: phandle for up to four required clocks for the MDIO instance
The child nodes of the MDIO driver are the individual PHY devices
connected to this MDIO bus. They must have a "reg" property given the
--
2.16.4
VAG power control is improved to fit the manual [1]. This patchset fixes at
least one bug: if the customer muxes Headphone to Line-In right after boot,
the VAG power remains off, which leads to poor sound quality from line-in.
I.e. after boot:
- Connect sound source to Line-In jack;
- Connect headphone to HP jack;
- Run following commands:
$ amixer set 'Headphone' 80%
$ amixer set 'Headphone Mux' LINE_IN
This series also includes fixes for minor bugs in the sgtl5000 codec
driver.
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Changes in v4:
- CC the patch to kernel-stable
- Code optimization, simplify function signature
(thanks to Cezary Rojewski <cezary.rojewski(a)intel.com> for an idea)
- Add a Fixes tag
Changes in v3:
- Add the reference to NXP SGTL5000 data sheet to commit message
- Fix multi-line comment format
Changes in v2:
- Fix patch formatting
Oleksandr Suvorov (6):
ASoC: Define a set of DAPM pre/post-up events
ASoC: sgtl5000: Improve VAG power and mute control
ASoC: sgtl5000: Fix definition of VAG Ramp Control
ASoC: sgtl5000: add ADC mute control
ASoC: sgtl5000: Fix of unmute outputs on probe
ASoC: sgtl5000: Fix charge pump source assignment
include/sound/soc-dapm.h | 2 +
sound/soc/codecs/sgtl5000.c | 240 ++++++++++++++++++++++++++++++------
sound/soc/codecs/sgtl5000.h | 2 +-
3 files changed, 203 insertions(+), 41 deletions(-)
--
2.20.1
When testing with a device which uses the drm/udl driver, KASAN shows
that on hot-remove we have a use-after-free:
==================================================================
BUG: KASAN: use-after-free in do_raw_spin_lock+0x1c/0xd0
Read of size 4 at addr ffff888385e325fc by task kworker/2:2/47
CPU: 2 PID: 47 Comm: kworker/2:2 Tainted: G U 4.14.133 #19
Hardware name: GOOGLE Samus, BIOS Google_Samus.6300.276.0 08/17/2016
Workqueue: events drm_mode_rmfb_work_fn
Call Trace:
dump_stack+0x67/0x92
print_address_description+0x80/0x2d6
? do_raw_spin_lock+0x1c/0xd0
kasan_report+0x255/0x295
do_raw_spin_lock+0x1c/0xd0
_raw_spin_lock_irqsave+0x42/0x4e
? down_timeout+0x19/0x58
down_timeout+0x19/0x58
udl_get_urb+0x3d/0x13b
? drm_helper_encoder_in_use+0xc2/0xe1
udl_crtc_dpms+0x45/0x274
__drm_helper_disable_unused_functions+0xed/0x150
drm_crtc_helper_set_config+0x22d/0xfc2
? lock_acquire+0x1e4/0x21a
? modeset_lock+0x165/0x20e
? __mutex_trylock+0x9/0x11
? debug_lockdep_rcu_enabled+0x2a/0x59
__drm_mode_set_config_internal+0xf3/0x240
drm_crtc_force_disable+0x68/0x83
drm_framebuffer_remove+0x10b/0x1af
drm_mode_rmfb_work_fn+0x8d/0x9b
process_one_work+0x42f/0x7a2
worker_thread+0x3a4/0x483
? flush_delayed_work+0x64/0x64
kthread+0x1e7/0x1f7
? __init_completion+0x2c/0x2c
ret_from_fork+0x3a/0x50
Allocated by task 1959:
save_stack+0x46/0xce
kasan_kmalloc+0x99/0xa8
kmem_cache_alloc_trace+0x10d/0x133
udl_driver_load+0x59/0x7fe
drm_dev_register+0x16b/0x2fd
udl_usb_probe+0x4f/0xa6
usb_probe_interface+0x26a/0x31d
driver_probe_device+0x1d5/0x411
bus_for_each_drv+0xbe/0xe5
__device_attach+0xdd/0x15b
bus_probe_device+0x5a/0x10b
device_add+0x468/0x7fb
usb_set_configuration+0x978/0x9e5
generic_probe+0x45/0x77
driver_probe_device+0x1d5/0x411
bus_for_each_drv+0xbe/0xe5
__device_attach+0xdd/0x15b
bus_probe_device+0x5a/0x10b
device_add+0x468/0x7fb
usb_new_device+0x51d/0x6a1
hub_event+0xee4/0x1639
process_one_work+0x42f/0x7a2
worker_thread+0x31c/0x483
kthread+0x1e7/0x1f7
ret_from_fork+0x3a/0x50
Freed by task 1959:
save_stack+0x46/0xce
kasan_slab_free+0x8a/0xac
slab_free_hook+0x52/0x5c
kfree+0x1a5/0x228
drm_dev_unregister+0xa6/0x16c
drm_dev_unplug+0x12/0x5b
usb_unbind_interface+0xc8/0x2c1
device_release_driver_internal+0x1e4/0x302
bus_remove_device+0x1b9/0x1e4
device_del+0x275/0x42d
usb_disable_device+0x112/0x2cb
usb_disconnect+0xef/0x28e
usb_disconnect+0xe0/0x28e
hub_event+0x7cc/0x1639
process_one_work+0x42f/0x7a2
worker_thread+0x31c/0x483
kthread+0x1e7/0x1f7
ret_from_fork+0x3a/0x50
The buggy address belongs to the object at ffff888385e32588
which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 116 bytes inside of
2048-byte region [ffff888385e32588, ffff888385e32d88)
The buggy address belongs to the page:
page:ffffea000e178c00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x8000000000008100(slab|head)
raw: 8000000000008100 0000000000000000 0000000000000000 00000001000d000d
raw: ffffea000ee71e20 ffffea000ee6d620 ffff88842d00d0c0 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888385e32480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888385e32500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888385e32580: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888385e32600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888385e32680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
This happens 100% of the time and is resolved by the following patch
upstream:
commit 6ecac85eadb9 ("drm/udl: move to embedding drm device inside udl device.")
This patch is the second in this series, and requires the first patch as
a dependency. This series applies cleanly to v4.14.133.
Dave Airlie (2):
drm/udl: introduce a macro to convert dev to udl.
drm/udl: move to embedding drm device inside udl device.
drivers/gpu/drm/udl/udl_drv.c | 56 +++++++++++++++++++++++++++-------
drivers/gpu/drm/udl/udl_drv.h | 9 +++---
drivers/gpu/drm/udl/udl_fb.c | 12 ++++----
drivers/gpu/drm/udl/udl_main.c | 35 ++++++---------------
4 files changed, 65 insertions(+), 47 deletions(-)
--
2.22.0.510.g264f2c817a-goog
The patch titled
Subject: include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
has been removed from the -mm tree. Its filename was
bug-fix-cut-here-for-warn_on-for-__warn_taint-architectures.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Drew Davenport <ddavenport(a)chromium.org>
Subject: include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
For architectures using __WARN_TAINT, the WARN_ON macro did not print out
the "cut here" string. The other WARN_XXX macros would print "cut here"
inside __warn_printk, which is not called for WARN_ON since it doesn't
have a message to print.
Link: http://lkml.kernel.org/r/20190624154831.163888-1-ddavenport@chromium.org
Fixes: a7bed27af194 ("bug: fix "cut here" location for __WARN_TAINT architectures")
Signed-off-by: Drew Davenport <ddavenport(a)chromium.org>
Acked-by: Kees Cook <keescook(a)chromium.org>
Tested-by: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/asm-generic/bug.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/include/asm-generic/bug.h~bug-fix-cut-here-for-warn_on-for-__warn_taint-architectures
+++ a/include/asm-generic/bug.h
@@ -104,8 +104,10 @@ extern void warn_slowpath_null(const cha
warn_slowpath_fmt_taint(__FILE__, __LINE__, taint, arg)
#else
extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
-#define __WARN() __WARN_TAINT(TAINT_WARN)
-#define __WARN_printf(arg...) do { __warn_printk(arg); __WARN(); } while (0)
+#define __WARN() do { \
+ printk(KERN_WARNING CUT_HERE); __WARN_TAINT(TAINT_WARN); \
+} while (0)
+#define __WARN_printf(arg...) __WARN_printf_taint(TAINT_WARN, arg)
#define __WARN_printf_taint(taint, arg...) \
do { __warn_printk(arg); __WARN_TAINT(taint); } while (0)
#endif
_
Patches currently in -mm which might be from ddavenport(a)chromium.org are
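The reworked __WARN() above wraps two statements in do { } while (0) so the
macro still behaves as a single statement. A user-space sketch of that
pattern follows; TOY_WARN(), TOY_WARN_TAINT() and the two counters are
hypothetical stand-ins for the printk() of CUT_HERE and for __WARN_TAINT().

```c
/* Counters standing in for the two side effects of the real macro. */
static int cut_here_printed;
static int warn_raised;

#define TOY_WARN_TAINT()	(warn_raised++)

/*
 * Both steps run together, and the do/while wrapper keeps the macro
 * usable as one statement, e.g. as the body of an unbraced if/else.
 */
#define TOY_WARN() do {					\
	cut_here_printed++; TOY_WARN_TAINT();		\
} while (0)
```

Without the wrapper, an unbraced "if (cond) TOY_WARN(); else ..." would not
compile, because the macro would expand to two statements.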
The patch titled
Subject: coda: pass the host file in vma->vm_file on mmap
has been removed from the -mm tree. Its filename was
coda-pass-the-host-file-in-vma-vm_file-on-mmap.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Subject: coda: pass the host file in vma->vm_file on mmap
Patch series "Coda updates".
The following patch series is a collection of various fixes for Coda, most
of which were collected from linux-fsdevel or linux-kernel but which have
as yet not found their way upstream.
This patch (of 22):
Various file systems expect that vma->vm_file points at their own file
handle, several use file_inode(vma->vm_file) to get at their inode or use
vma->vm_file->private_data. However the way Coda wrapped mmap on a host
file broke this assumption, vm_file was still pointing at the Coda file
and the host file systems would scribble over Coda's inode and private
file data.
This patch fixes the incorrect expectation and wraps vm_ops->open and
vm_ops->close to allow Coda to track when the vm_area_struct is destroyed
so we still release the reference on the Coda file handle at the right
time.
Link: http://lkml.kernel.org/r/0e850c6e59c0b147dc2dcd51a3af004c948c3697.155811738…
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Colin Ian King <colin.king(a)canonical.com>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Fabian Frederick <fabf(a)skynet.be>
Cc: Mikko Rapeli <mikko.rapeli(a)iki.fi>
Cc: Sam Protsenko <semen.protsenko(a)linaro.org>
Cc: Yann Droneaud <ydroneaud(a)opteya.com>
Cc: Zhouyang Jia <jiazhouyang09(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/coda/file.c | 70 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 68 insertions(+), 2 deletions(-)
--- a/fs/coda/file.c~coda-pass-the-host-file-in-vma-vm_file-on-mmap
+++ a/fs/coda/file.c
@@ -27,6 +27,13 @@
#include "coda_linux.h"
#include "coda_int.h"
+struct coda_vm_ops {
+ atomic_t refcnt;
+ struct file *coda_file;
+ const struct vm_operations_struct *host_vm_ops;
+ struct vm_operations_struct vm_ops;
+};
+
static ssize_t
coda_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
@@ -61,6 +68,34 @@ coda_file_write_iter(struct kiocb *iocb,
return ret;
}
+static void
+coda_vm_open(struct vm_area_struct *vma)
+{
+ struct coda_vm_ops *cvm_ops =
+ container_of(vma->vm_ops, struct coda_vm_ops, vm_ops);
+
+ atomic_inc(&cvm_ops->refcnt);
+
+ if (cvm_ops->host_vm_ops && cvm_ops->host_vm_ops->open)
+ cvm_ops->host_vm_ops->open(vma);
+}
+
+static void
+coda_vm_close(struct vm_area_struct *vma)
+{
+ struct coda_vm_ops *cvm_ops =
+ container_of(vma->vm_ops, struct coda_vm_ops, vm_ops);
+
+ if (cvm_ops->host_vm_ops && cvm_ops->host_vm_ops->close)
+ cvm_ops->host_vm_ops->close(vma);
+
+ if (atomic_dec_and_test(&cvm_ops->refcnt)) {
+ vma->vm_ops = cvm_ops->host_vm_ops;
+ fput(cvm_ops->coda_file);
+ kfree(cvm_ops);
+ }
+}
+
static int
coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
{
@@ -68,6 +103,8 @@ coda_file_mmap(struct file *coda_file, s
struct coda_inode_info *cii;
struct file *host_file;
struct inode *coda_inode, *host_inode;
+ struct coda_vm_ops *cvm_ops;
+ int ret;
cfi = CODA_FTOC(coda_file);
BUG_ON(!cfi || cfi->cfi_magic != CODA_MAGIC);
@@ -76,6 +113,13 @@ coda_file_mmap(struct file *coda_file, s
if (!host_file->f_op->mmap)
return -ENODEV;
+ if (WARN_ON(coda_file != vma->vm_file))
+ return -EIO;
+
+ cvm_ops = kmalloc(sizeof(struct coda_vm_ops), GFP_KERNEL);
+ if (!cvm_ops)
+ return -ENOMEM;
+
coda_inode = file_inode(coda_file);
host_inode = file_inode(host_file);
@@ -89,6 +133,7 @@ coda_file_mmap(struct file *coda_file, s
* the container file on us! */
else if (coda_inode->i_mapping != host_inode->i_mapping) {
spin_unlock(&cii->c_lock);
+ kfree(cvm_ops);
return -EBUSY;
}
@@ -97,7 +142,29 @@ coda_file_mmap(struct file *coda_file, s
cfi->cfi_mapcount++;
spin_unlock(&cii->c_lock);
- return call_mmap(host_file, vma);
+ vma->vm_file = get_file(host_file);
+ ret = call_mmap(vma->vm_file, vma);
+
+ if (ret) {
+ /* if call_mmap fails, our caller will put coda_file so we
+ * should drop the reference to the host_file that we got.
+ */
+ fput(host_file);
+ kfree(cvm_ops);
+ } else {
+ /* here we add redirects for the open/close vm_operations */
+ cvm_ops->host_vm_ops = vma->vm_ops;
+ if (vma->vm_ops)
+ cvm_ops->vm_ops = *vma->vm_ops;
+
+ cvm_ops->vm_ops.open = coda_vm_open;
+ cvm_ops->vm_ops.close = coda_vm_close;
+ cvm_ops->coda_file = coda_file;
+ atomic_set(&cvm_ops->refcnt, 1);
+
+ vma->vm_ops = &cvm_ops->vm_ops;
+ }
+ return ret;
}
int coda_open(struct inode *coda_inode, struct file *coda_file)
@@ -207,4 +274,3 @@ const struct file_operations coda_file_o
.fsync = coda_fsync,
.splice_read = generic_file_splice_read,
};
-
_
Patches currently in -mm which might be from jaharkes(a)cs.cmu.edu are
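The wrapping technique in the Coda patch above (embed a copy of the host's
operations in a private struct, recover the private struct with
container_of(), and free it when the last user closes) can be sketched in
user space as follows. All names here are hypothetical, the refcount is a
plain int rather than an atomic_t, and the sketch omits forwarding to the
host's open/close callbacks.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_ops {
	void (*fault)(void);	/* placeholder for the host's callbacks */
};

struct toy_wrapper {
	int refcnt;			/* not atomic in this sketch */
	const struct toy_ops *host_ops;	/* original table, may be NULL */
	struct toy_ops ops;		/* embedded copy handed to users */
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate a wrapper with one reference; returns the embedded table. */
static struct toy_ops *toy_wrap(const struct toy_ops *host_ops)
{
	struct toy_wrapper *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	w->refcnt = 1;
	w->host_ops = host_ops;
	if (host_ops)
		w->ops = *host_ops;
	else
		w->ops.fault = NULL;
	return &w->ops;
}

/* A further user of the same table appeared (e.g. a split mapping). */
static void toy_open(struct toy_ops *ops)
{
	struct toy_wrapper *w = toy_container_of(ops, struct toy_wrapper, ops);

	w->refcnt++;
}

/* Returns 1 when the last reference was dropped and the wrapper freed. */
static int toy_close(struct toy_ops *ops)
{
	struct toy_wrapper *w = toy_container_of(ops, struct toy_wrapper, ops);

	if (--w->refcnt == 0) {
		free(w);
		return 1;
	}
	return 0;
}
```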
The patch titled
Subject: fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
has been removed from the -mm tree. Its filename was
fs-fix-the-default-values-of-i_uid-i_gid-on-proc-sys-inodes.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Radoslaw Burny <rburny(a)google.com>
Subject: fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not correct behavior for proc. Since the sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more sense
to use these values in i_[ug]id of proc inodes. In other words: although
uid/gid in the inode is not read during test_perm, the inode logically
belongs to the root of the namespace. I have confirmed this with Eric
Biederman at LPC and in this thread:
https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com
Consequences
============
Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference. However, it
causes an issue in a setup where:
* a namespace container is created without root user in container -
hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID
* container creator tries to configure it by writing /proc/sys files,
e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit
The kernel does not allow opening an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus to configure the
container.
Using a container with no root mapping is apparently rare, but we do use
this configuration at Google. Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.
History
=======
The invalid uids/gids in inodes first appeared due to 81754357770e (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues. The inability
to write to these "invalid" inodes was only caused by a later commit
0bd23d09b874 (vfs: Don't modify inodes with a uid or gid unknown to the
vfs).
Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid, with the change it's 0. The
overflow uid indicates that the uid in the inode is not correct and thus
it is not possible to open the file for writing.
Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
Fixes: 0bd23d09b874 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
Signed-off-by: Radoslaw Burny <rburny(a)google.com>
Acked-by: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: "Eric W . Biederman" <ebiederm(a)xmission.com>
Cc: Seth Forshee <seth.forshee(a)canonical.com>
Cc: John Sperbeck <jsperbeck(a)google.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/proc_sysctl.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/fs/proc/proc_sysctl.c~fs-fix-the-default-values-of-i_uid-i_gid-on-proc-sys-inodes
+++ a/fs/proc/proc_sysctl.c
@@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode
if (root->set_ownership)
root->set_ownership(head, table, &inode->i_uid, &inode->i_gid);
+ else {
+ inode->i_uid = GLOBAL_ROOT_UID;
+ inode->i_gid = GLOBAL_ROOT_GID;
+ }
return inode;
}
_
Patches currently in -mm which might be from rburny(a)google.com are
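The fallback added to proc_sys_make_inode() above follows a common pattern:
use the optional callback when one is provided, otherwise apply a default
owner. A user-space sketch, in which every name (toy_inode, toy_init_owner(),
TOY_ROOT_ID, TOY_INVALID_ID) is hypothetical:

```c
#include <stddef.h>

typedef unsigned int toy_id_t;

#define TOY_ROOT_ID	0u			/* models GLOBAL_ROOT_[UG]ID */
#define TOY_INVALID_ID	((toy_id_t)-1)		/* models INVALID_[UG]ID */

struct toy_inode {
	toy_id_t uid;
	toy_id_t gid;
};

typedef void (*set_ownership_fn)(toy_id_t *uid, toy_id_t *gid);

/* Let the callback choose the owner if present, else default to root. */
static void toy_init_owner(struct toy_inode *inode,
			   set_ownership_fn set_ownership)
{
	if (set_ownership) {
		set_ownership(&inode->uid, &inode->gid);
	} else {
		inode->uid = TOY_ROOT_ID;
		inode->gid = TOY_ROOT_ID;
	}
}
```

An inode that started out with invalid ids and has no callback ends up owned
by root, which is the case the commit message describes.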
Hello Christian,
> Hi,
>
> I assume you use the 1000 MHz firmware. This does also not work on my Rev 7
> board. But I'm pretty sure this is not a problem of the patches, because if
> I take a newer kernel (4.19.20/27) without the patches it also does not
> work. A kernel 4.19.17 does work for me. My opinion on that is that this is
> another problem which only occurs now because the cpu frequency
> scaling is working with the right frequencies.
I am not sure which firmware I am running. I did all my tests on 5.0.0, and
changing between governors worked fine without the patches.
Regards
/Ilias
>
> Ilias Apalodimas <ilias.apalodimas(a)linaro.org> schrieb am Do., 14. März
> 2019, 13:15:
>
> > Hi Gregory,
> > > The clock parenting was not setup properly when DVFS was enabled. It was
> > > expected that the same clock source was used with and without DVFS which
> > > was not the case.
> > >
> > > This patch fixes this issue, allowing to make the cpufreq support work
> > > when the CPU clocks source are not the default ones.
> > >
> > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > Cc: <stable(a)vger.kernel.org>
> > > Reported-by: Christian Neubert <christian.neubert.86(a)gmail.com>
> > > Reported-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
> > > Signed-off-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
> > > ---
> > > drivers/clk/mvebu/armada-37xx-periph.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/clk/mvebu/armada-37xx-periph.c
> > b/drivers/clk/mvebu/armada-37xx-periph.c
> > > index 1f1cff428d78..26ed3c18a239 100644
> > > --- a/drivers/clk/mvebu/armada-37xx-periph.c
> > > +++ b/drivers/clk/mvebu/armada-37xx-periph.c
> > > @@ -671,6 +671,17 @@ static int armada_3700_add_composite_clk(const
> > struct clk_periph_data *data,
> > > map = syscon_regmap_lookup_by_compatible(
> > > "marvell,armada-3700-nb-pm");
> > > pmcpu_clk->nb_pm_base = map;
> > > +
> > > + /*
> > > + * Use the same parent when DVFS is enabled that the
> > > + * default parent received at boot time. When this
> > > + * function is called, DVFS is not enabled yet, so we
> > > + * get the default parent and we can set the parent
> > > + * for DVFS.
> > > + */
> > > + if (clk_pm_cpu_set_parent(muxrate_hw,
> > > +
> > clk_pm_cpu_get_parent(muxrate_hw)))
> > > + dev_warn(dev, "Failed to setup default parent
> > clock for DVFS\n");
> > > }
> > >
> > > *hw = clk_hw_register_composite(dev, data->name,
> > data->parent_names,
> > > --
> > > 2.20.1
> > >
> > Applied this and selected only
> >
> > CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
> > CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
> > CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> >
> > After changing the governor from 'powersave' to 'performance' the board
> > completely froze (i even lost access to the serial port)
> >
> > Cheers
> > /Ilias
> >
A GPU hang was observed during the guest OCL conformance test. It is caused
by the THP GTT feature used during the test.
The same GFN was seen requested from the guest in GVT with different sizes
(4K and 2M). So during the guest page DMA map stage, the page must first be
unmapped with its original size and then remapped with the requested size.
Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Xiaolin Zhang <xiaolin.zhang(a)intel.com>
---
drivers/gpu/drm/i915/gvt/kvmgt.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index a68addf..4a7cf86 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1911,6 +1911,18 @@ static int kvmgt_dma_map_guest_page(unsigned long handle, unsigned long gfn,
ret = __gvt_cache_add(info->vgpu, gfn, *dma_addr, size);
if (ret)
goto err_unmap;
+ } else if (entry->size != size) {
+ /* the same gfn with different size: unmap and re-map */
+ gvt_dma_unmap_page(vgpu, gfn, entry->dma_addr, entry->size);
+ __gvt_cache_remove_entry(vgpu, entry);
+
+ ret = gvt_dma_map_page(vgpu, gfn, dma_addr, size);
+ if (ret)
+ goto err_unlock;
+
+ ret = __gvt_cache_add(info->vgpu, gfn, *dma_addr, size);
+ if (ret)
+ goto err_unmap;
} else {
kref_get(&entry->ref);
*dma_addr = entry->dma_addr;
--
1.8.3.1
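
For readers who want to see the unmap-then-remap rule from the kvmgt patch
in isolation, here is a minimal userspace sketch. It is not the driver
code: struct dma_cache, cache_find(), cache_add(), cache_remove() and
map_guest_page() are hypothetical stand-ins, the "DMA address" comes from a
fake bump allocator, and, as in the patch, the old entry is dropped without
regard to its reference count.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 8

/* Hypothetical model of one cached gfn -> dma_addr mapping. */
struct dma_entry {
	unsigned long gfn;
	unsigned long size;
	unsigned long dma_addr;
	int refs;
	int used;
};

/* Hypothetical cache; maps/unmaps count the simulated DMA operations. */
struct dma_cache {
	struct dma_entry slot[CACHE_SLOTS];
	unsigned long next_addr;	/* fake bump allocator */
	int maps;
	int unmaps;
};

static struct dma_entry *cache_find(struct dma_cache *c, unsigned long gfn)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++)
		if (c->slot[i].used && c->slot[i].gfn == gfn)
			return &c->slot[i];
	return NULL;
}

/* Simulates "DMA map + add to cache" for a new entry. */
static struct dma_entry *cache_add(struct dma_cache *c, unsigned long gfn,
				   unsigned long size)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		struct dma_entry *e = &c->slot[i];

		if (e->used)
			continue;
		e->gfn = gfn;
		e->size = size;
		e->dma_addr = c->next_addr;
		e->refs = 1;
		e->used = 1;
		c->next_addr += size;
		c->maps++;
		return e;
	}
	return NULL;
}

/* Simulates "DMA unmap + remove from cache". */
static void cache_remove(struct dma_cache *c, struct dma_entry *e)
{
	e->used = 0;
	c->unmaps++;
}

/* Returns 0 on success, -1 if the cache is full. */
int map_guest_page(struct dma_cache *c, unsigned long gfn,
		   unsigned long size, unsigned long *dma_addr)
{
	struct dma_entry *e = cache_find(c, gfn);

	assert(size != 0);

	if (e && e->size != size) {
		/* Same gfn, different size: unmap with the original size. */
		cache_remove(c, e);
		e = NULL;
	}

	if (!e) {
		/* Not cached (or just dropped): map with the requested size. */
		e = cache_add(c, gfn, size);
		if (!e)
			return -1;
	} else {
		/* Same gfn and same size: reuse the existing mapping. */
		e->refs++;
	}

	*dma_addr = e->dma_addr;
	return 0;
}
```

Requesting gfn 42 twice with 4K and then once with 2M should leave a single
2M entry in this model, with one simulated unmap in between.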
From: Mark Zhang <markz(a)nvidia.com>
commit 7151449fe7fa5962c6153355f9779d6be99e8e97 upstream.
If the client has not provided the mask base register, then do not
write to the mask register.
Signed-off-by: Laxman Dewangan <ldewangan(a)nvidia.com>
Signed-off-by: Jinyoung Park <jinyoungp(a)nvidia.com>
Signed-off-by: Venkat Reddy Talla <vreddytalla(a)nvidia.com>
Signed-off-by: Mark Zhang <markz(a)nvidia.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
This commit was found in an nVidia product tree based on
v4.19, and looks like definitive stable material to me.
It should go into v4.19 only as far as I can tell.
---
drivers/base/regmap/regmap-irq.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/base/regmap/regmap-irq.c b/drivers/base/regmap/regmap-irq.c
index 429ca8ed7e51..982c7ac311b8 100644
--- a/drivers/base/regmap/regmap-irq.c
+++ b/drivers/base/regmap/regmap-irq.c
@@ -91,6 +91,9 @@ static void regmap_irq_sync_unlock(struct irq_data *data)
* suppress pointless writes.
*/
for (i = 0; i < d->chip->num_regs; i++) {
+ if (!d->chip->mask_base)
+ continue;
+
reg = d->chip->mask_base +
(i * map->reg_stride * d->irq_reg_stride);
if (d->chip->mask_invert) {
@@ -526,6 +529,9 @@ int regmap_add_irq_chip(struct regmap *map, int irq, int irq_flags,
/* Mask all the interrupts by default */
for (i = 0; i < chip->num_regs; i++) {
d->mask_buf[i] = d->mask_buf_def[i];
+ if (!chip->mask_base)
+ continue;
+
reg = chip->mask_base +
(i * map->reg_stride * d->irq_reg_stride);
if (chip->mask_invert)
--
2.20.1
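
To illustrate why the mask_base check in the regmap-irq patch matters, here
is a small userspace sketch, not the regmap code itself. struct
irq_chip_model, struct reg_bus, bus_write() and sync_masks() are
hypothetical names, and the model assumes that a mask_base of 0 means "no
mask register was provided". Without the guard, the loop would compute
register addresses starting at 0 and write mask values to unrelated
registers.

```c
#include <assert.h>
#include <stddef.h>

#define BUS_REGS 64

/* Hypothetical, simplified view of an IRQ chip description. */
struct irq_chip_model {
	unsigned int mask_base;		/* 0: no mask register provided */
	unsigned int num_regs;
	unsigned int reg_stride;
};

/* Hypothetical register file that records every write. */
struct reg_bus {
	unsigned int value[BUS_REGS];
	int writes;
};

static void bus_write(struct reg_bus *bus, unsigned int reg, unsigned int val)
{
	assert(reg < BUS_REGS);
	bus->value[reg] = val;
	bus->writes++;
}

/* Writes the mask buffers to the bus; returns the number of writes issued. */
int sync_masks(const struct irq_chip_model *chip,
	       const unsigned int *mask_buf, struct reg_bus *bus)
{
	int issued = 0;

	for (unsigned int i = 0; i < chip->num_regs; i++) {
		/* No mask register: skip instead of writing to address 0 + i. */
		if (!chip->mask_base)
			continue;

		bus_write(bus, chip->mask_base + i * chip->reg_stride,
			  mask_buf[i]);
		issued++;
	}
	return issued;
}
```

In this model, a chip with mask_base left at 0 issues no writes, while a
chip with mask_base 0x10 and two registers writes to 0x10 and 0x11 only.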