From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
commit 04bef83a3358946bfc98a5ecebd1b0003d83d882 upstream.
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
net/bridge/br_multicast.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 6a362da211e1..3504c3acd38f 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1791,7 +1791,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_multicast_ipv4_rcv(struct net_bridge *br,
--
2.31.1
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
commit 04bef83a3358946bfc98a5ecebd1b0003d83d882 upstream.
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
net/bridge/br_multicast.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 5015ece7adf7..9e9a584cf993 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -2998,7 +2998,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
--
2.31.1
Patches below in Linus's tree failed to apply to 5.13-stable tree.
Adjust them so we can apply them to 5.13-stable tree.
original git commit id:
* 3769e4c0af5b82c8ea21d037013cb9564dfaa51f
[PATCH] drm/dp_mst: Avoid to mess up payload table by ports in stale topology
* 35d3e8cb35e75450f87f87e3d314e2d418b6954b
[PATCH] drm/dp_mst: Do not set proposed vcpi directly
* 24ff3dc18b99c4b912ab1746e803ddb3be5ced4c
[PATCH] drm/dp_mst: Add missing drm parameters to recently added call to drm_dbg_kms()
José Roberto de Souza (1):
drm/dp_mst: Add missing drm parameters to recently added call to
drm_dbg_kms()
Wayne Lin (2):
drm/dp_mst: Do not set proposed vcpi directly
drm/dp_mst: Avoid to mess up payload table by ports in stale topology
drivers/gpu/drm/drm_dp_mst_topology.c | 68 +++++++++++++++++----------
1 file changed, 42 insertions(+), 26 deletions(-)
--
2.17.1
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
commit 04bef83a3358946bfc98a5ecebd1b0003d83d882 upstream.
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
net/bridge/br_multicast.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 226bb05c3b42..e27fe6e6ecd4 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3087,7 +3087,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
--
2.31.1
commit f18139966d072dab8e4398c95ce955a9742e04f7 upstream.
Trying to start a new PIO transfer by writing value 0 in PIO_START register
when previous transfer has not yet completed (which is indicated by value 1
in PIO_START) causes an External Abort on CPU, which results in kernel
panic:
SError Interrupt on CPU0, code 0xbf000002 -- SError
Kernel panic - not syncing: Asynchronous SError Interrupt
To prevent kernel panic, it is required to reject a new PIO transfer when
previous one has not finished yet.
If previous PIO transfer is not finished yet, the kernel may issue a new
PIO request only if the previous PIO transfer timed out.
In the past the root cause of this issue was incorrectly identified (as it
often happens during link retraining or after link down event) and special
hack was implemented in Trusted Firmware to catch all SError events in EL3,
to ignore errors with code 0xbf000002 and not forwarding any other errors
to kernel and instead throw panic from EL3 Trusted Firmware handler.
Links to discussion and patches about this issue:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7d…https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen…https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
But the real cause was the fact that during link retraining or after link
down event the PIO transfer may take longer time, up to the 1.44s until it
times out. This increased probability that a new PIO transfer would be
issued by kernel while previous one has not finished yet.
After applying this change into the kernel, it is possible to revert the
mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
[pali: Backported to 4.19 version]
---
This patch is backported to 4.19 version. It depends on commit
7fbcb5da811b as presented on Cc: stable line.
---
drivers/pci/controller/pci-aardvark.c | 49 ++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 524e0fb3b062..947f60ba5b75 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -382,7 +382,7 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
udelay(PIO_RETRY_DELAY);
}
- dev_err(dev, "config read/write timed out\n");
+ dev_err(dev, "PIO read/write transfer time out\n");
return -ETIMEDOUT;
}
@@ -395,6 +395,35 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
return true;
}
+static bool advk_pcie_pio_is_running(struct advk_pcie *pcie)
+{
+ struct device *dev = &pcie->pdev->dev;
+
+ /*
+ * Trying to start a new PIO transfer when previous has not completed
+ * cause External Abort on CPU which results in kernel panic:
+ *
+ * SError Interrupt on CPU0, code 0xbf000002 -- SError
+ * Kernel panic - not syncing: Asynchronous SError Interrupt
+ *
+ * Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected
+ * by raw_spin_lock_irqsave() at pci_lock_config() level to prevent
+ * concurrent calls at the same time. But because PIO transfer may take
+ * about 1.5s when link is down or card is disconnected, it means that
+ * advk_pcie_wait_pio() does not always have to wait for completion.
+ *
+ * Some versions of ARM Trusted Firmware handles this External Abort at
+ * EL3 level and mask it to prevent kernel panic. Relevant TF-A commit:
+ * https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7d…
+ */
+ if (advk_readl(pcie, PIO_START)) {
+ dev_err(dev, "Previous PIO read/write transfer is still running\n");
+ return true;
+ }
+
+ return false;
+}
+
static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
int where, int size, u32 *val)
{
@@ -407,9 +436,10 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
return PCIBIOS_DEVICE_NOT_FOUND;
}
- /* Start PIO */
- advk_writel(pcie, 0, PIO_START);
- advk_writel(pcie, 1, PIO_ISR);
+ if (advk_pcie_pio_is_running(pcie)) {
+ *val = 0xffffffff;
+ return PCIBIOS_SET_FAILED;
+ }
/* Program the control register */
reg = advk_readl(pcie, PIO_CTRL);
@@ -428,7 +458,8 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
/* Program the data strobe */
advk_writel(pcie, 0xf, PIO_WR_DATA_STRB);
- /* Start the transfer */
+ /* Clear PIO DONE ISR and start the transfer */
+ advk_writel(pcie, 1, PIO_ISR);
advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie);
@@ -462,9 +493,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
if (where % size)
return PCIBIOS_SET_FAILED;
- /* Start PIO */
- advk_writel(pcie, 0, PIO_START);
- advk_writel(pcie, 1, PIO_ISR);
+ if (advk_pcie_pio_is_running(pcie))
+ return PCIBIOS_SET_FAILED;
/* Program the control register */
reg = advk_readl(pcie, PIO_CTRL);
@@ -491,7 +521,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
/* Program the data strobe */
advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB);
- /* Start the transfer */
+ /* Clear PIO DONE ISR and start the transfer */
+ advk_writel(pcie, 1, PIO_ISR);
advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie);
--
2.20.1
From: Remi Pommarel <repk(a)triplefau.lt>
commit 7fbcb5da811be7d47468417c7795405058abb3da upstream.
advk_pcie_wait_pio() can be called while holding a spinlock (from
pci_bus_read_config_dword()), then depends on jiffies in order to
timeout while polling on PIO state registers. In the case the PIO
transaction failed, the timeout will never happen and will also cause
the cpu to stall.
This decrements a variable and wait instead of using jiffies.
Signed-off-by: Remi Pommarel <repk(a)triplefau.lt>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Andrew Murray <andrew.murray(a)arm.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
---
This is backport to 4.14 kernel. Backport of upstream commit can be done
automatically by git cherry-pick command if merge.renamelimit variable is
set to at least 12711.
---
drivers/pci/host/pci-aardvark.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/host/pci-aardvark.c b/drivers/pci/host/pci-aardvark.c
index c1db09fbbe04..5dbbf3d7de36 100644
--- a/drivers/pci/host/pci-aardvark.c
+++ b/drivers/pci/host/pci-aardvark.c
@@ -185,7 +185,8 @@
(PCIE_CONF_BUS(bus) | PCIE_CONF_DEV(PCI_SLOT(devfn)) | \
PCIE_CONF_FUNC(PCI_FUNC(devfn)) | PCIE_CONF_REG(where))
-#define PIO_TIMEOUT_MS 1
+#define PIO_RETRY_CNT 500
+#define PIO_RETRY_DELAY 2 /* 2 us*/
#define LINK_WAIT_MAX_RETRIES 10
#define LINK_WAIT_USLEEP_MIN 90000
@@ -413,17 +414,16 @@ static void advk_pcie_check_pio_status(struct advk_pcie *pcie)
static int advk_pcie_wait_pio(struct advk_pcie *pcie)
{
struct device *dev = &pcie->pdev->dev;
- unsigned long timeout;
-
- timeout = jiffies + msecs_to_jiffies(PIO_TIMEOUT_MS);
+ int i;
- while (time_before(jiffies, timeout)) {
+ for (i = 0; i < PIO_RETRY_CNT; i++) {
u32 start, isr;
start = advk_readl(pcie, PIO_START);
isr = advk_readl(pcie, PIO_ISR);
if (!start && isr)
return 0;
+ udelay(PIO_RETRY_DELAY);
}
dev_err(dev, "config read/write timed out\n");
--
2.20.1
From: Eric Biggers <ebiggers(a)google.com>
commit 77f30bfcfcf484da7208affd6a9e63406420bf91 upstream.
[Please apply to 4.9-stable.]
When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the
minor_hash to 0 if the (major) hash is 0.
This doesn't make sense because 0 is a valid hash code, so we shouldn't
ignore the filesystem-provided minor_hash in that case. Fix this by
removing the special case for 'hash == 0'.
This is an old bug that appears to have originated when the encryption
code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and
f2fs code passed the hash by pointer instead of by value. So
'if (hash)' actually made sense then, as it was checking whether a
pointer was NULL. But now the hashes are passed by value, and
filesystems just pass 0 for any hashes they don't have. There is no
need to handle this any differently from the hashes actually being 0.
It is difficult to reproduce this bug, as it only made a difference in
the case where a filename's 32-bit major hash happened to be 0.
However, it probably had the largest chance of causing problems on
ubifs, since ubifs uses minor_hash to do lookups of no-key names, in
addition to using it as a readdir cookie. ext4 only uses minor_hash as
a readdir cookie, and f2fs doesn't use minor_hash at all.
Fixes: 0b81d0779072 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto")
Cc: <stable(a)vger.kernel.org> # v4.6+
Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/crypto/fname.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index e14bb7b67e9c..5136ea195934 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -294,12 +294,8 @@ int fscrypt_fname_disk_to_usr(struct inode *inode,
oname->name);
return 0;
}
- if (hash) {
- memcpy(buf, &hash, 4);
- memcpy(buf + 4, &minor_hash, 4);
- } else {
- memset(buf, 0, 8);
- }
+ memcpy(buf, &hash, 4);
+ memcpy(buf + 4, &minor_hash, 4);
memcpy(buf + 8, iname->name + ((iname->len - 17) & ~15), 16);
oname->name[0] = '_';
oname->len = 1 + digest_encode(buf, 24, oname->name + 1);
--
2.32.0
Commit e3d4030498c3 ("mac80211: do not accept/forward invalid EAPOL
frames") uses skb_mac_header() before eth_type_trans() is called
leading to incorrect pointer, the pointer gets written to. This issue
has appeared during backporting to 4.4, 4.9 and 4.14.
Fixes: e3d4030498c3 ("mac80211: do not accept/forward invalid EAPOL frames")
Link: https://lore.kernel.org/r/CAHQn7pKcyC_jYmGyTcPCdk9xxATwW5QPNph=bsZV8d-HPwNs…
Cc: <stable(a)vger.kernel.org> # 4.4.x
Signed-off-by: Davis Mosenkovs <davis(a)mosenkovs.lv>
---
net/mac80211/rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index bde924968cd2..b5848bcc09eb 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2234,7 +2234,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
#endif
if (skb) {
- struct ethhdr *ehdr = (void *)skb_mac_header(skb);
+ struct ethhdr *ehdr = (struct ethhdr *)skb->data;
/* deliver to local stack */
skb->protocol = eth_type_trans(skb, dev);
--
2.25.1
[ Upstream commit 35cba15a504bf4f585bb9d78f47b22b28a1a06b2 ]
Use devm_platform_get_and_ioremap_resource() to simplify
code and avoid a null-ptr-deref by checking 'res' in it.
[yyl: since devm_platform_get_and_ioremap_resource() is introduced
in linux-5.7, so just check the return value after calling
platform_get_resource()]
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/ethernet/moxa/moxart_ether.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
index f70bb81e1ed65..caf7051302725 100644
--- a/drivers/net/ethernet/moxa/moxart_ether.c
+++ b/drivers/net/ethernet/moxa/moxart_ether.c
@@ -481,6 +481,10 @@ static int moxart_mac_probe(struct platform_device *pdev)
priv->pdev = pdev;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ ret = -EINVAL;
+ goto init_fail;
+ }
ndev->base_addr = res->start;
priv->base = devm_ioremap_resource(p_dev, res);
if (IS_ERR(priv->base)) {
--
2.25.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 474dd1c6506411752a9b2f2233eec11f1733a099 Mon Sep 17 00:00:00 2001
From: Lu Baolu <baolu.lu(a)linux.intel.com>
Date: Mon, 12 Jul 2021 15:17:12 +0800
Subject: [PATCH] iommu/vt-d: Fix clearing real DMA device's scalable-mode
context entries
The commit 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
fixes an issue of "sub-device is removed where the context entry is cleared
for all aliases". But this commit didn't consider the PASID entry and PASID
table in VT-d scalable mode. This fix increases the coverage of scalable
mode.
Suggested-by: Sanjay Kumar <sanjay.k.kumar(a)intel.com>
Fixes: 8038bdb855331 ("iommu/vt-d: Only clear real DMA device's context entries")
Fixes: 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Cc: stable(a)vger.kernel.org # v5.6+
Cc: Jon Derrick <jonathan.derrick(a)intel.com>
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 57270290d62b..dd22fc7d5176 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4472,14 +4472,13 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
iommu = info->iommu;
domain = info->domain;
- if (info->dev) {
+ if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
if (dev_is_pci(info->dev) && sm_supported(iommu))
intel_pasid_tear_down_entry(iommu, info->dev,
PASID_RID2PASID, false);
iommu_disable_dev_iotlb(info);
- if (!dev_is_real_dma_subdevice(info->dev))
- domain_context_clear(info);
+ domain_context_clear(info);
intel_pasid_free_table(info->dev);
}
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5f93e776c6734cea989aeb4f2d6c97e521baa683 Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Date: Tue, 29 Jun 2021 03:16:46 +0900
Subject: [PATCH] btrfs: zoned: print unusable percentage when reclaiming block
groups
When we're automatically reclaiming a zone, because its zone_unusable
value is above the reclaim threshold, we're only logging how much
percent of the zone's capacity are used, but not how much of the
capacity is unusable.
Also print the percentage of the unusable space in the block group
before we're reclaiming it.
Example:
BTRFS info (device sdg): reclaiming chunk 230686720 with 13% used 86% unusable
CC: stable(a)vger.kernel.org # 5.13
Signed-off-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index d4c8dc550889..fec7a34b27f3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1501,6 +1501,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
mutex_lock(&fs_info->reclaim_bgs_lock);
spin_lock(&fs_info->unused_bgs_lock);
while (!list_empty(&fs_info->reclaim_bgs)) {
+ u64 zone_unusable;
int ret = 0;
bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1534,13 +1535,22 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
goto next;
}
+ /*
+ * Cache the zone_unusable value before turning the block group
+ * to read only. As soon as the blog group is read only it's
+ * zone_unusable value gets moved to the block group's read-only
+ * bytes and isn't available for calculations anymore.
+ */
+ zone_unusable = bg->zone_unusable;
ret = inc_block_group_ro(bg, 0);
up_write(&space_info->groups_sem);
if (ret < 0)
goto next;
- btrfs_info(fs_info, "reclaiming chunk %llu with %llu%% used",
- bg->start, div64_u64(bg->used * 100, bg->length));
+ btrfs_info(fs_info,
+ "reclaiming chunk %llu with %llu%% used %llu%% unusable",
+ bg->start, div_u64(bg->used * 100, bg->length),
+ div64_u64(zone_unusable * 100, bg->length));
trace_btrfs_reclaim_block_group(bg);
ret = btrfs_relocate_chunk(fs_info, bg->start);
if (ret)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04bef83a3358946bfc98a5ecebd1b0003d83d882 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:28 +0300
Subject: [PATCH] net: bridge: multicast: fix PIM hello router port marking
race
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53c3a9d80d9c..3bbbc6d7b7c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3264,7 +3264,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04bef83a3358946bfc98a5ecebd1b0003d83d882 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:28 +0300
Subject: [PATCH] net: bridge: multicast: fix PIM hello router port marking
race
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53c3a9d80d9c..3bbbc6d7b7c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3264,7 +3264,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04bef83a3358946bfc98a5ecebd1b0003d83d882 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:28 +0300
Subject: [PATCH] net: bridge: multicast: fix PIM hello router port marking
race
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53c3a9d80d9c..3bbbc6d7b7c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3264,7 +3264,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04bef83a3358946bfc98a5ecebd1b0003d83d882 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:28 +0300
Subject: [PATCH] net: bridge: multicast: fix PIM hello router port marking
race
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53c3a9d80d9c..3bbbc6d7b7c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3264,7 +3264,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04bef83a3358946bfc98a5ecebd1b0003d83d882 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:28 +0300
Subject: [PATCH] net: bridge: multicast: fix PIM hello router port marking
race
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53c3a9d80d9c..3bbbc6d7b7c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3264,7 +3264,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04bef83a3358946bfc98a5ecebd1b0003d83d882 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:28 +0300
Subject: [PATCH] net: bridge: multicast: fix PIM hello router port marking
race
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53c3a9d80d9c..3bbbc6d7b7c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3264,7 +3264,9 @@ static void br_multicast_pim(struct net_bridge *br,
pim_hdr_type(pimhdr) != PIM_TYPE_HELLO)
return;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 000b7287b67555fee39d39fff75229dedde0dcbf Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:29 +0300
Subject: [PATCH] net: bridge: multicast: fix MRD advertisement router port
marking race
When an MRD advertisement is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the MRD advertisement case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Cc: linus.luessing(a)c0d3.blue
Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 3bbbc6d7b7c3..d0434dc8c03b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3277,7 +3277,9 @@ static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
igmp_hdr(skb)->type != IGMP_MRDISC_ADV)
return -ENOMSG;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
return 0;
}
@@ -3345,7 +3347,9 @@ static void br_ip6_multicast_mrd_rcv(struct net_bridge *br,
if (icmp6_hdr(skb)->icmp6_type != ICMPV6_MRDISC_ADV)
return;
+ spin_lock(&br->multicast_lock);
br_ip6_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_multicast_ipv6_rcv(struct net_bridge *br,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 000b7287b67555fee39d39fff75229dedde0dcbf Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:29 +0300
Subject: [PATCH] net: bridge: multicast: fix MRD advertisement router port
marking race
When an MRD advertisement is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the MRD advertisement case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Cc: linus.luessing(a)c0d3.blue
Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 3bbbc6d7b7c3..d0434dc8c03b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3277,7 +3277,9 @@ static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
igmp_hdr(skb)->type != IGMP_MRDISC_ADV)
return -ENOMSG;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
return 0;
}
@@ -3345,7 +3347,9 @@ static void br_ip6_multicast_mrd_rcv(struct net_bridge *br,
if (icmp6_hdr(skb)->icmp6_type != ICMPV6_MRDISC_ADV)
return;
+ spin_lock(&br->multicast_lock);
br_ip6_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_multicast_ipv6_rcv(struct net_bridge *br,
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 000b7287b67555fee39d39fff75229dedde0dcbf Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:29 +0300
Subject: [PATCH] net: bridge: multicast: fix MRD advertisement router port
marking race
When an MRD advertisement is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the MRD advertisement case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Cc: linus.luessing(a)c0d3.blue
Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 3bbbc6d7b7c3..d0434dc8c03b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3277,7 +3277,9 @@ static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
igmp_hdr(skb)->type != IGMP_MRDISC_ADV)
return -ENOMSG;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
return 0;
}
@@ -3345,7 +3347,9 @@ static void br_ip6_multicast_mrd_rcv(struct net_bridge *br,
if (icmp6_hdr(skb)->icmp6_type != ICMPV6_MRDISC_ADV)
return;
+ spin_lock(&br->multicast_lock);
br_ip6_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_multicast_ipv6_rcv(struct net_bridge *br,
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 000b7287b67555fee39d39fff75229dedde0dcbf Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Date: Sun, 11 Jul 2021 12:56:29 +0300
Subject: [PATCH] net: bridge: multicast: fix MRD advertisement router port
marking race
When an MRD advertisement is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the MRD advertisement case
leading to a race condition, we need to take it to fix the race.
Cc: stable(a)vger.kernel.org
Cc: linus.luessing(a)c0d3.blue
Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 3bbbc6d7b7c3..d0434dc8c03b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3277,7 +3277,9 @@ static int br_ip4_multicast_mrd_rcv(struct net_bridge *br,
igmp_hdr(skb)->type != IGMP_MRDISC_ADV)
return -ENOMSG;
+ spin_lock(&br->multicast_lock);
br_ip4_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
return 0;
}
@@ -3345,7 +3347,9 @@ static void br_ip6_multicast_mrd_rcv(struct net_bridge *br,
if (icmp6_hdr(skb)->icmp6_type != ICMPV6_MRDISC_ADV)
return;
+ spin_lock(&br->multicast_lock);
br_ip6_multicast_mark_router(br, port);
+ spin_unlock(&br->multicast_lock);
}
static int br_multicast_ipv6_rcv(struct net_bridge *br,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0abb33bfca0fb74df76aac03e90ce685016ef7be Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld(a)intel.com>
Date: Tue, 13 Jul 2021 14:04:31 +0100
Subject: [PATCH] drm/i915/gtt: drop the page table optimisation
We skip filling out the pt with scratch entries if the va range covers
the entire pt, since we later have to fill it with the PTEs for the
object pages anyway. However this might leave open a small window where
the PTEs don't point to anything valid for the HW to consume.
When for example using 2M GTT pages this fill_px() showed up as being
quite significant in perf measurements, and ends up being completely
wasted since we ignore the pt and just use the pde directly.
Anyway, currently we have our PTE construction split between alloc and
insert, which is probably slightly iffy nowadays, since the alloc
doesn't actually allocate anything anymore, instead it just sets up the
page directories and points the PTEs at the scratch page. Later when we
do the insert step we re-program the PTEs again. Better might be to
squash the alloc and insert into a single step, then bringing back this
optimisation(along with some others) should be possible.
Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Jon Bloomfield <jon.bloomfield(a)intel.com>
Cc: Chris Wilson <chris.p.wilson(a)intel.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: <stable(a)vger.kernel.org> # v4.15+
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matt…
(cherry picked from commit 8f88ca76b3942d82e2c1cea8735ec368d89ecc15)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 21c8b7350b7a..da4f5eb43ac2 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
__i915_gem_object_pin_pages(pt->base);
i915_gem_object_make_unshrinkable(pt->base);
- if (lvl ||
- gen8_pt_count(*start, end) < I915_PDES ||
- intel_vgpu_active(vm->i915))
- fill_px(pt, vm->scratch[lvl]->encode);
+ fill_px(pt, vm->scratch[lvl]->encode);
spin_lock(&pd->lock);
if (likely(!pd->entry[idx])) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0abb33bfca0fb74df76aac03e90ce685016ef7be Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld(a)intel.com>
Date: Tue, 13 Jul 2021 14:04:31 +0100
Subject: [PATCH] drm/i915/gtt: drop the page table optimisation
We skip filling out the pt with scratch entries if the va range covers
the entire pt, since we later have to fill it with the PTEs for the
object pages anyway. However this might leave open a small window where
the PTEs don't point to anything valid for the HW to consume.
When for example using 2M GTT pages this fill_px() showed up as being
quite significant in perf measurements, and ends up being completely
wasted since we ignore the pt and just use the pde directly.
Anyway, currently we have our PTE construction split between alloc and
insert, which is probably slightly iffy nowadays, since the alloc
doesn't actually allocate anything anymore, instead it just sets up the
page directories and points the PTEs at the scratch page. Later when we
do the insert step we re-program the PTEs again. Better might be to
squash the alloc and insert into a single step, then bringing back this
optimisation(along with some others) should be possible.
Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Jon Bloomfield <jon.bloomfield(a)intel.com>
Cc: Chris Wilson <chris.p.wilson(a)intel.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: <stable(a)vger.kernel.org> # v4.15+
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matt…
(cherry picked from commit 8f88ca76b3942d82e2c1cea8735ec368d89ecc15)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 21c8b7350b7a..da4f5eb43ac2 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
__i915_gem_object_pin_pages(pt->base);
i915_gem_object_make_unshrinkable(pt->base);
- if (lvl ||
- gen8_pt_count(*start, end) < I915_PDES ||
- intel_vgpu_active(vm->i915))
- fill_px(pt, vm->scratch[lvl]->encode);
+ fill_px(pt, vm->scratch[lvl]->encode);
spin_lock(&pd->lock);
if (likely(!pd->entry[idx])) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 76ff371b67cb12fb635396234468abcf6a466f16 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 24 Jun 2021 19:03:54 -0700
Subject: [PATCH] KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM
to mishandle #NPFs with that collide with the host's C-bit.
Although the APM doesn't explicitly state that the C-bit is not reserved
for non-SEV, Tom Lendacky confirmed that the following snippet about the
effective reduction due to the C-bit does indeed apply only to SEV guests.
Note that because guest physical addresses are always translated
through the nested page tables, the size of the guest physical address
space is not impacted by any physical address space reduction indicated
in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
the guest physical address space is effectively reduced by 1 bit.
And for SEV guests, the APM clearly states that the bit is dropped before
walking the nested page tables.
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Consequently, the hypervisor does not need to be aware of which pages
the guest has chosen to mark private.
Note, the bogus C-bit clearing was removed from legacy #PF handler in
commit 6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
interception").
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: Peter Gonda <pgonda(a)google.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210625020354.431829-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8834822c00cd..ca5614a48b21 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1923,7 +1923,7 @@ static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+ u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(fault_address, error_code);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 76ff371b67cb12fb635396234468abcf6a466f16 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 24 Jun 2021 19:03:54 -0700
Subject: [PATCH] KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM
to mishandle #NPFs with that collide with the host's C-bit.
Although the APM doesn't explicitly state that the C-bit is not reserved
for non-SEV, Tom Lendacky confirmed that the following snippet about the
effective reduction due to the C-bit does indeed apply only to SEV guests.
Note that because guest physical addresses are always translated
through the nested page tables, the size of the guest physical address
space is not impacted by any physical address space reduction indicated
in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
the guest physical address space is effectively reduced by 1 bit.
And for SEV guests, the APM clearly states that the bit is dropped before
walking the nested page tables.
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Consequently, the hypervisor does not need to be aware of which pages
the guest has chosen to mark private.
Note, the bogus C-bit clearing was removed from legacy #PF handler in
commit 6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
interception").
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: Peter Gonda <pgonda(a)google.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210625020354.431829-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8834822c00cd..ca5614a48b21 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1923,7 +1923,7 @@ static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+ u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(fault_address, error_code);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 76ff371b67cb12fb635396234468abcf6a466f16 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 24 Jun 2021 19:03:54 -0700
Subject: [PATCH] KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM
to mishandle #NPFs with that collide with the host's C-bit.
Although the APM doesn't explicitly state that the C-bit is not reserved
for non-SEV, Tom Lendacky confirmed that the following snippet about the
effective reduction due to the C-bit does indeed apply only to SEV guests.
Note that because guest physical addresses are always translated
through the nested page tables, the size of the guest physical address
space is not impacted by any physical address space reduction indicated
in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
the guest physical address space is effectively reduced by 1 bit.
And for SEV guests, the APM clearly states that the bit is dropped before
walking the nested page tables.
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Consequently, the hypervisor does not need to be aware of which pages
the guest has chosen to mark private.
Note, the bogus C-bit clearing was removed from legacy #PF handler in
commit 6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
interception").
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: Peter Gonda <pgonda(a)google.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210625020354.431829-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8834822c00cd..ca5614a48b21 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1923,7 +1923,7 @@ static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+ u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(fault_address, error_code);
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 76ff371b67cb12fb635396234468abcf6a466f16 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 24 Jun 2021 19:03:54 -0700
Subject: [PATCH] KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM
to mishandle #NPFs with that collide with the host's C-bit.
Although the APM doesn't explicitly state that the C-bit is not reserved
for non-SEV, Tom Lendacky confirmed that the following snippet about the
effective reduction due to the C-bit does indeed apply only to SEV guests.
Note that because guest physical addresses are always translated
through the nested page tables, the size of the guest physical address
space is not impacted by any physical address space reduction indicated
in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
the guest physical address space is effectively reduced by 1 bit.
And for SEV guests, the APM clearly states that the bit is dropped before
walking the nested page tables.
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Consequently, the hypervisor does not need to be aware of which pages
the guest has chosen to mark private.
Note, the bogus C-bit clearing was removed from legacy #PF handler in
commit 6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
interception").
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: Peter Gonda <pgonda(a)google.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210625020354.431829-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8834822c00cd..ca5614a48b21 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1923,7 +1923,7 @@ static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+ u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(fault_address, error_code);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fc9bf2e087efcd81bda2e52d09616d2a1bf982a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:49 -0700
Subject: [PATCH] KVM: x86/mmu: Do not apply HPA (memory encryption) mask to
GPAs
Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks. The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".
For non-SEV guests, not dropping bits is the correct behavior. Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits. And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.
For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted. Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM. The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-5-seanjc(a)google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 00732757cc60..b888385d1933 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -53,6 +53,8 @@
#include <asm/kvm_page_track.h>
#include "trace.h"
+#include "paging.h"
+
extern bool itlb_multihit_kvm_mitigation;
int __read_mostly nx_huge_pages = -1;
diff --git a/arch/x86/kvm/mmu/paging.h b/arch/x86/kvm/mmu/paging.h
new file mode 100644
index 000000000000..de8ab323bb70
--- /dev/null
+++ b/arch/x86/kvm/mmu/paging.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Shadow paging constants/helpers that don't need to be #undef'd. */
+#ifndef __KVM_X86_PAGING_H
+#define __KVM_X86_PAGING_H
+
+#define GUEST_PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_LVL_ADDR_MASK(level) \
+ (GUEST_PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
+ * PT64_LEVEL_BITS))) - 1))
+#define PT64_LVL_OFFSET_MASK(level) \
+ (GUEST_PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
+ * PT64_LEVEL_BITS))) - 1))
+#endif /* __KVM_X86_PAGING_H */
+
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 490a028ddabe..ee044d357b5f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -24,7 +24,7 @@
#define pt_element_t u64
#define guest_walker guest_walker64
#define FNAME(name) paging##64_##name
- #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+ #define PT_BASE_ADDR_MASK GUEST_PT64_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
@@ -57,7 +57,7 @@
#define pt_element_t u64
#define guest_walker guest_walkerEPT
#define FNAME(name) ept_##name
- #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+ #define PT_BASE_ADDR_MASK GUEST_PT64_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7a5ce9314107..eb7b227fc6cf 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -38,12 +38,6 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
#else
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
#endif
-#define PT64_LVL_ADDR_MASK(level) \
- (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
- * PT64_LEVEL_BITS))) - 1))
-#define PT64_LVL_OFFSET_MASK(level) \
- (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
- * PT64_LEVEL_BITS))) - 1))
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fc9bf2e087efcd81bda2e52d09616d2a1bf982a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:49 -0700
Subject: [PATCH] KVM: x86/mmu: Do not apply HPA (memory encryption) mask to
GPAs
Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks. The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".
For non-SEV guests, not dropping bits is the correct behavior. Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits. And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.
For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted. Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM. The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-5-seanjc(a)google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 00732757cc60..b888385d1933 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -53,6 +53,8 @@
#include <asm/kvm_page_track.h>
#include "trace.h"
+#include "paging.h"
+
extern bool itlb_multihit_kvm_mitigation;
int __read_mostly nx_huge_pages = -1;
diff --git a/arch/x86/kvm/mmu/paging.h b/arch/x86/kvm/mmu/paging.h
new file mode 100644
index 000000000000..de8ab323bb70
--- /dev/null
+++ b/arch/x86/kvm/mmu/paging.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Shadow paging constants/helpers that don't need to be #undef'd. */
+#ifndef __KVM_X86_PAGING_H
+#define __KVM_X86_PAGING_H
+
+#define GUEST_PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_LVL_ADDR_MASK(level) \
+ (GUEST_PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
+ * PT64_LEVEL_BITS))) - 1))
+#define PT64_LVL_OFFSET_MASK(level) \
+ (GUEST_PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
+ * PT64_LEVEL_BITS))) - 1))
+#endif /* __KVM_X86_PAGING_H */
+
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 490a028ddabe..ee044d357b5f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -24,7 +24,7 @@
#define pt_element_t u64
#define guest_walker guest_walker64
#define FNAME(name) paging##64_##name
- #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+ #define PT_BASE_ADDR_MASK GUEST_PT64_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
@@ -57,7 +57,7 @@
#define pt_element_t u64
#define guest_walker guest_walkerEPT
#define FNAME(name) ept_##name
- #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+ #define PT_BASE_ADDR_MASK GUEST_PT64_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7a5ce9314107..eb7b227fc6cf 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -38,12 +38,6 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
#else
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
#endif
-#define PT64_LVL_ADDR_MASK(level) \
- (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
- * PT64_LEVEL_BITS))) - 1))
-#define PT64_LVL_OFFSET_MASK(level) \
- (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
- * PT64_LEVEL_BITS))) - 1))
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fc9bf2e087efcd81bda2e52d09616d2a1bf982a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:49 -0700
Subject: [PATCH] KVM: x86/mmu: Do not apply HPA (memory encryption) mask to
GPAs
Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks. The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".
For non-SEV guests, not dropping bits is the correct behavior. Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits. And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.
For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted. Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM. The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-5-seanjc(a)google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 00732757cc60..b888385d1933 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -53,6 +53,8 @@
#include <asm/kvm_page_track.h>
#include "trace.h"
+#include "paging.h"
+
extern bool itlb_multihit_kvm_mitigation;
int __read_mostly nx_huge_pages = -1;
diff --git a/arch/x86/kvm/mmu/paging.h b/arch/x86/kvm/mmu/paging.h
new file mode 100644
index 000000000000..de8ab323bb70
--- /dev/null
+++ b/arch/x86/kvm/mmu/paging.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Shadow paging constants/helpers that don't need to be #undef'd. */
+#ifndef __KVM_X86_PAGING_H
+#define __KVM_X86_PAGING_H
+
+#define GUEST_PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_LVL_ADDR_MASK(level) \
+ (GUEST_PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
+ * PT64_LEVEL_BITS))) - 1))
+#define PT64_LVL_OFFSET_MASK(level) \
+ (GUEST_PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
+ * PT64_LEVEL_BITS))) - 1))
+#endif /* __KVM_X86_PAGING_H */
+
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 490a028ddabe..ee044d357b5f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -24,7 +24,7 @@
#define pt_element_t u64
#define guest_walker guest_walker64
#define FNAME(name) paging##64_##name
- #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+ #define PT_BASE_ADDR_MASK GUEST_PT64_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
@@ -57,7 +57,7 @@
#define pt_element_t u64
#define guest_walker guest_walkerEPT
#define FNAME(name) ept_##name
- #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+ #define PT_BASE_ADDR_MASK GUEST_PT64_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7a5ce9314107..eb7b227fc6cf 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -38,12 +38,6 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
#else
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
#endif
-#define PT64_LVL_ADDR_MASK(level) \
- (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
- * PT64_LEVEL_BITS))) - 1))
-#define PT64_LVL_OFFSET_MASK(level) \
- (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
- * PT64_LEVEL_BITS))) - 1))
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e39f00f60ebd2e7b295c37a05e6349df656d3eb8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:47 -0700
Subject: [PATCH] KVM: x86: Use kernel's x86_phys_bits to handle reduced
MAXPHYADDR
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).
When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.
Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.
Fixes: 9af9b94068fb ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a4217b3e185..ca7866d63e98 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -941,11 +941,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
unsigned phys_as = entry->eax & 0xff;
/*
- * Use bare metal's MAXPHADDR if the CPU doesn't report guest
- * MAXPHYADDR separately, or if TDP (NPT) is disabled, as the
- * guest version "applies only to guests using nested paging".
+ * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
+ * the guest operates in the same PA space as the host, i.e.
+ * reductions in MAXPHYADDR for memory encryption affect shadow
+ * paging, too.
+ *
+ * If TDP is enabled but an explicit guest MAXPHYADDR is not
+ * provided, use the raw bare metal MAXPHYADDR as reductions to
+ * the HPAs do not affect GPAs.
*/
- if (!g_phys_as || !tdp_enabled)
+ if (!tdp_enabled)
+ g_phys_as = boot_cpu_data.x86_phys_bits;
+ else if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
@@ -970,12 +977,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
case 0x8000001a:
case 0x8000001e:
break;
- /* Support memory encryption cpuid if host supports it */
case 0x8000001F:
- if (!kvm_cpu_cap_has(X86_FEATURE_SEV))
+ if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
- else
+ } else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
+
+ /*
+ * Enumerate '0' for "PA bits reduction", the adjusted
+ * MAXPHYADDR is enumerated directly (see 0x80000008).
+ */
+ entry->ebx &= ~GENMASK(11, 6);
+ }
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e39f00f60ebd2e7b295c37a05e6349df656d3eb8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:47 -0700
Subject: [PATCH] KVM: x86: Use kernel's x86_phys_bits to handle reduced
MAXPHYADDR
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).
When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.
Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.
Fixes: 9af9b94068fb ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a4217b3e185..ca7866d63e98 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -941,11 +941,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
unsigned phys_as = entry->eax & 0xff;
/*
- * Use bare metal's MAXPHADDR if the CPU doesn't report guest
- * MAXPHYADDR separately, or if TDP (NPT) is disabled, as the
- * guest version "applies only to guests using nested paging".
+ * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
+ * the guest operates in the same PA space as the host, i.e.
+ * reductions in MAXPHYADDR for memory encryption affect shadow
+ * paging, too.
+ *
+ * If TDP is enabled but an explicit guest MAXPHYADDR is not
+ * provided, use the raw bare metal MAXPHYADDR as reductions to
+ * the HPAs do not affect GPAs.
*/
- if (!g_phys_as || !tdp_enabled)
+ if (!tdp_enabled)
+ g_phys_as = boot_cpu_data.x86_phys_bits;
+ else if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
@@ -970,12 +977,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
case 0x8000001a:
case 0x8000001e:
break;
- /* Support memory encryption cpuid if host supports it */
case 0x8000001F:
- if (!kvm_cpu_cap_has(X86_FEATURE_SEV))
+ if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
- else
+ } else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
+
+ /*
+ * Enumerate '0' for "PA bits reduction", the adjusted
+ * MAXPHYADDR is enumerated directly (see 0x80000008).
+ */
+ entry->ebx &= ~GENMASK(11, 6);
+ }
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e39f00f60ebd2e7b295c37a05e6349df656d3eb8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:47 -0700
Subject: [PATCH] KVM: x86: Use kernel's x86_phys_bits to handle reduced
MAXPHYADDR
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).
When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.
Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.
Fixes: 9af9b94068fb ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a4217b3e185..ca7866d63e98 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -941,11 +941,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
unsigned phys_as = entry->eax & 0xff;
/*
- * Use bare metal's MAXPHADDR if the CPU doesn't report guest
- * MAXPHYADDR separately, or if TDP (NPT) is disabled, as the
- * guest version "applies only to guests using nested paging".
+ * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
+ * the guest operates in the same PA space as the host, i.e.
+ * reductions in MAXPHYADDR for memory encryption affect shadow
+ * paging, too.
+ *
+ * If TDP is enabled but an explicit guest MAXPHYADDR is not
+ * provided, use the raw bare metal MAXPHYADDR as reductions to
+ * the HPAs do not affect GPAs.
*/
- if (!g_phys_as || !tdp_enabled)
+ if (!tdp_enabled)
+ g_phys_as = boot_cpu_data.x86_phys_bits;
+ else if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
@@ -970,12 +977,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
case 0x8000001a:
case 0x8000001e:
break;
- /* Support memory encryption cpuid if host supports it */
case 0x8000001F:
- if (!kvm_cpu_cap_has(X86_FEATURE_SEV))
+ if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
- else
+ } else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
+
+ /*
+ * Enumerate '0' for "PA bits reduction", the adjusted
+ * MAXPHYADDR is enumerated directly (see 0x80000008).
+ */
+ entry->ebx &= ~GENMASK(11, 6);
+ }
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e39f00f60ebd2e7b295c37a05e6349df656d3eb8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:47 -0700
Subject: [PATCH] KVM: x86: Use kernel's x86_phys_bits to handle reduced
MAXPHYADDR
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).
When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.
Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.
Fixes: 9af9b94068fb ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a4217b3e185..ca7866d63e98 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -941,11 +941,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
unsigned phys_as = entry->eax & 0xff;
/*
- * Use bare metal's MAXPHADDR if the CPU doesn't report guest
- * MAXPHYADDR separately, or if TDP (NPT) is disabled, as the
- * guest version "applies only to guests using nested paging".
+ * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
+ * the guest operates in the same PA space as the host, i.e.
+ * reductions in MAXPHYADDR for memory encryption affect shadow
+ * paging, too.
+ *
+ * If TDP is enabled but an explicit guest MAXPHYADDR is not
+ * provided, use the raw bare metal MAXPHYADDR as reductions to
+ * the HPAs do not affect GPAs.
*/
- if (!g_phys_as || !tdp_enabled)
+ if (!tdp_enabled)
+ g_phys_as = boot_cpu_data.x86_phys_bits;
+ else if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
@@ -970,12 +977,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
case 0x8000001a:
case 0x8000001e:
break;
- /* Support memory encryption cpuid if host supports it */
case 0x8000001F:
- if (!kvm_cpu_cap_has(X86_FEATURE_SEV))
+ if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
- else
+ } else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
+
+ /*
+ * Enumerate '0' for "PA bits reduction", the adjusted
+ * MAXPHYADDR is enumerated directly (see 0x80000008).
+ */
+ entry->ebx &= ~GENMASK(11, 6);
+ }
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e39f00f60ebd2e7b295c37a05e6349df656d3eb8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 23 Jun 2021 16:05:47 -0700
Subject: [PATCH] KVM: x86: Use kernel's x86_phys_bits to handle reduced
MAXPHYADDR
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to
enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest
version is only relevant to NPT/TDP).
When using shadow paging, any reductions to the host's MAXPHYADDR apply
to KVM and its guests as well, i.e. using the raw CPUID info will cause
KVM to misreport the number of PA bits available to the guest.
Unconditionally zero out the "Physical Address bit reduction" entry.
For !TDP, the adjustment is already done, and for TDP enumerating the
host's reduction is wrong as the reduction does not apply to GPAs.
Fixes: 9af9b94068fb ("x86/cpu/AMD: Handle SME reduction in physical address size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210623230552.4027702-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a4217b3e185..ca7866d63e98 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -941,11 +941,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
unsigned phys_as = entry->eax & 0xff;
/*
- * Use bare metal's MAXPHADDR if the CPU doesn't report guest
- * MAXPHYADDR separately, or if TDP (NPT) is disabled, as the
- * guest version "applies only to guests using nested paging".
+ * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
+ * the guest operates in the same PA space as the host, i.e.
+ * reductions in MAXPHYADDR for memory encryption affect shadow
+ * paging, too.
+ *
+ * If TDP is enabled but an explicit guest MAXPHYADDR is not
+ * provided, use the raw bare metal MAXPHYADDR as reductions to
+ * the HPAs do not affect GPAs.
*/
- if (!g_phys_as || !tdp_enabled)
+ if (!tdp_enabled)
+ g_phys_as = boot_cpu_data.x86_phys_bits;
+ else if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
@@ -970,12 +977,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
case 0x8000001a:
case 0x8000001e:
break;
- /* Support memory encryption cpuid if host supports it */
case 0x8000001F:
- if (!kvm_cpu_cap_has(X86_FEATURE_SEV))
+ if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
- else
+ } else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
+
+ /*
+ * Enumerate '0' for "PA bits reduction", the adjusted
+ * MAXPHYADDR is enumerated directly (see 0x80000008).
+ */
+ entry->ebx &= ~GENMASK(11, 6);
+ }
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
This is a Lenovo ThinkStation machine which uses the codec alc623.
There are 2 issues on this machine, the 1st one is the pop noise in
the lineout, the 2nd one is there are 2 Front Mics and pulseaudio
can't handle them, After applying the fixup of
ALC623_FIXUP_LENOVO_THINKSTATION_P340 to this machine, the 2 issues
are fixed.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 1389cfd5e0db..caaf0e8aac11 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -8626,6 +8626,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x3151, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x17aa, 0x3176, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x17aa, 0x3178, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC),
+ SND_PCI_QUIRK(0x17aa, 0x31af, "ThinkCentre Station", ALC623_FIXUP_LENOVO_THINKSTATION_P340),
SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940", ALC298_FIXUP_LENOVO_SPK_VOLUME),
SND_PCI_QUIRK(0x17aa, 0x3827, "Ideapad S740", ALC285_FIXUP_IDEAPAD_S740_COEF),
SND_PCI_QUIRK(0x17aa, 0x3843, "Yoga 9i", ALC287_FIXUP_IDEAPAD_BASS_SPK_AMP),
--
2.25.1
We are in custody of an inheritance attached to your surname.
Contact Mr Mark Tucker on mark00tucker(a)naver.com with your full
names for validation. Ts & Cs apply
From: Sherry Sun <sherry.sun(a)nxp.com>
[ Upstream commit fcb10ee27fb91b25b68d7745db9817ecea9f1038 ]
We should be very careful about the register values that will be used
for division or modulo operations, althrough the possibility that the
UARTBAUD register value is zero is very low, but we had better to deal
with the "bad data" of hardware in advance to avoid division or modulo
by zero leading to undefined kernel behavior.
Signed-off-by: Sherry Sun <sherry.sun(a)nxp.com>
Link: https://lore.kernel.org/r/20210427021226.27468-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/fsl_lpuart.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 4b9f42269477..deb9d4fa9cb0 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -1992,6 +1992,9 @@ lpuart32_console_get_options(struct lpuart_port *sport, int *baud,
bd = lpuart32_read(&sport->port, UARTBAUD);
bd &= UARTBAUD_SBR_MASK;
+ if (!bd)
+ return;
+
sbr = bd;
uartclk = clk_get_rate(sport->clk);
/*
--
2.30.2
In the rtw_pci_init_rx_ring function the "if (len > TRX_BD_IDX_MASK)"
statement guarantees that len is less than or equal to GENMASK(11, 0) or
in other words that len is less than or equal to 4095. However the
rx_ring->buf has a size of RTK_MAX_RX_DESC_NUM (defined as 512). This
way it is possible an out-of-bounds write in the for statement due to
the i variable can exceed the rx_ring->buff size.
However, this overflow never happens due to the rtw_pci_init_rx_ring is
only ever called with a fixed constant of RTK_MAX_RX_DESC_NUM. But it is
better to be defensive in this case and add a new check to avoid
overflows if this function is called in a future with a value greater
than 512.
Cc: stable(a)vger.kernel.org
Addresses-Coverity-ID: 1461515 ("Out-of-bounds write")
Fixes: e3037485c68ec ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Len Baker <len.baker(a)gmx.com>
---
Changelog v1 -> v2
- Remove the macro ARRAY_SIZE from the for loop (Pkshih, Brian Norris).
- Add a new check for the len variable (Pkshih, Brian Norris).
drivers/net/wireless/realtek/rtw88/pci.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index e7d17ab8f113..53dc90276693 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -273,6 +273,11 @@ static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
return -EINVAL;
}
+ if (len > ARRAY_SIZE(rx_ring->buf)) {
+ rtw_err(rtwdev, "len %d exceeds maximum RX ring buffer\n", len);
+ return -EINVAL;
+ }
+
head = dma_alloc_coherent(&pdev->dev, ring_sz, &dma, GFP_KERNEL);
if (!head) {
rtw_err(rtwdev, "failed to allocate rx ring\n");
--
2.25.1
In v4.4, commit e76511a6fbb5 ("mac80211: properly handle A-MSDUs that
start with an RFC 1042 header") looks like an incomplete backport.
There is no functional changes in the commit, since
__ieee80211_data_to_8023() which defined in net/wireless/util.c is
only called by ieee80211_data_to_8023() and parameter 'is_amsdu' is
always input as false.
By comparing with its upstream, I found that following snippet has not
been backported:
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -2682,7 +2682,7 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset)
> if (ieee80211_data_to_8023_exthdr(skb, ðhdr,
> rx->sdata->vif.addr,
> rx->sdata->vif.type,
> - data_offset))
> + data_offset, true))
> return RX_DROP_UNUSABLE;
I think that's where really causing changes and also needs to be
backported, so I try to do it.
Fixes: e76511a6fbb5 ("mac80211: properly handle A-MSDUs that start with an RFC 1042 header")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
net/wireless/util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c
index 84c0a96b3cb6d..a2b35e6619697 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -664,7 +664,7 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list,
u8 dst[ETH_ALEN], src[ETH_ALEN];
if (has_80211_header) {
- err = ieee80211_data_to_8023(skb, addr, iftype);
+ err = __ieee80211_data_to_8023(skb, addr, iftype, true);
if (err)
goto out;
--
2.31.1
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From df29a7440c4b5c65765c8f60396b3b13063e24e9 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Fri, 25 Jun 2021 15:02:08 +0200
Subject: [PATCH] s390/signal: switch to using vdso for sigreturn and syscall
restart
with generic entry, there's a bug when it comes to restarting of signals.
The failing sequence is:
a) a signal is coming in, and no handler is registered, so the lower
part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c
sets PIF_SYSCALL_RESTART.
b) a second signal gets pending while the kernel is still in the exit
loop, and for that one, a handler exists.
c) The first part of arch_do_signal_or_restart() is called. That part
calls handle_signal(), which sets up stack + registers for handling
the signal.
d) __do_syscall() in arch/s390/kernel/syscall.c checks for
PIF_SYSCALL_RESTART right before leaving to userspace. If it is set,
it restart's the syscall. However, the registers are already setup
for handling a signal from c). The syscall is now restarted with the
wrong arguments.
Change the code to:
- use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because
we cannot rewind and go back to userspace on s390 because the system call
number might be encoded in the svc instruction.
- for all other syscalls we rewind the PSW and return to userspace.
Cc: <stable(a)kernel.org> # v5.12+ d57778feb987: s390/vdso: always enable vdso
Cc: <stable(a)kernel.org> # v5.12+ 686341f2548b: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Cc: <stable(a)kernel.org> # v5.12+ 43e1f76b0b69: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Cc: <stable(a)kernel.org> # v5.12+ 779df2248739: s390/vdso: add minimal compat vdso
Cc: <stable(a)kernel.org> # v5.12+
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
index 1d0e17ec93eb..cca142fbb516 100644
--- a/arch/s390/kernel/compat_signal.c
+++ b/arch/s390/kernel/compat_signal.c
@@ -28,6 +28,7 @@
#include <linux/uaccess.h>
#include <asm/lowcore.h>
#include <asm/switch_to.h>
+#include <asm/vdso.h>
#include "compat_linux.h"
#include "compat_ptrace.h"
#include "entry.h"
@@ -118,7 +119,6 @@ static int restore_sigregs32(struct pt_regs *regs,_sigregs32 __user *sregs)
fpregs_load((_s390_fp_regs *) &user_sregs.fpregs, ¤t->thread.fpu);
clear_pt_regs_flag(regs, PIF_SYSCALL); /* No longer in a system call */
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
return 0;
}
@@ -304,11 +304,7 @@ static int setup_frame32(struct ksignal *ksig, sigset_t *set,
restorer = (unsigned long __force)
ksig->ka.sa.sa_restorer | PSW32_ADDR_AMODE;
} else {
- /* Signal frames without vectors registers are short ! */
- __u16 __user *svc = (void __user *) frame + frame_size - 2;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long __force) svc | PSW32_ADDR_AMODE;
+ restorer = VDSO32_SYMBOL(current, sigreturn);
}
/* Set up registers for signal handler */
@@ -371,10 +367,7 @@ static int setup_rt_frame32(struct ksignal *ksig, sigset_t *set,
restorer = (unsigned long __force)
ksig->ka.sa.sa_restorer | PSW32_ADDR_AMODE;
} else {
- __u16 __user *svc = &frame->svc_insn;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_rt_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long __force) svc | PSW32_ADDR_AMODE;
+ restorer = VDSO32_SYMBOL(current, rt_sigreturn);
}
/* Create siginfo on the signal stack */
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 7ae5dde9c54d..350e94d0cac2 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -166,6 +166,12 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp,
p->thread.acrs[1] = (unsigned int)tls;
}
}
+ /*
+ * s390 stores the svc return address in arch_data when calling
+ * sigreturn()/restart_syscall() via vdso. 1 means no valid address
+ * stored.
+ */
+ p->restart_block.arch_data = 1;
return 0;
}
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index 080e7aed181f..78ef53b29958 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -32,6 +32,7 @@
#include <linux/uaccess.h>
#include <asm/lowcore.h>
#include <asm/switch_to.h>
+#include <asm/vdso.h>
#include "entry.h"
/*
@@ -171,7 +172,6 @@ static int restore_sigregs(struct pt_regs *regs, _sigregs __user *sregs)
fpregs_load(&user_sregs.fpregs, ¤t->thread.fpu);
clear_pt_regs_flag(regs, PIF_SYSCALL); /* No longer in a system call */
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
return 0;
}
@@ -334,15 +334,10 @@ static int setup_frame(int sig, struct k_sigaction *ka,
/* Set up to return from userspace. If provided, use a stub
already in userspace. */
- if (ka->sa.sa_flags & SA_RESTORER) {
+ if (ka->sa.sa_flags & SA_RESTORER)
restorer = (unsigned long) ka->sa.sa_restorer;
- } else {
- /* Signal frame without vector registers are short ! */
- __u16 __user *svc = (void __user *) frame + frame_size - 2;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long) svc;
- }
+ else
+ restorer = VDSO64_SYMBOL(current, sigreturn);
/* Set up registers for signal handler */
regs->gprs[14] = restorer;
@@ -397,14 +392,10 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
/* Set up to return from userspace. If provided, use a stub
already in userspace. */
- if (ksig->ka.sa.sa_flags & SA_RESTORER) {
+ if (ksig->ka.sa.sa_flags & SA_RESTORER)
restorer = (unsigned long) ksig->ka.sa.sa_restorer;
- } else {
- __u16 __user *svc = &frame->svc_insn;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_rt_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long) svc;
- }
+ else
+ restorer = VDSO64_SYMBOL(current, rt_sigreturn);
/* Create siginfo on the signal stack */
if (copy_siginfo_to_user(&frame->info, &ksig->info))
@@ -501,7 +492,7 @@ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)
}
/* No longer in a system call */
clear_pt_regs_flag(regs, PIF_SYSCALL);
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
+
rseq_signal_deliver(&ksig, regs);
if (is_compat_task())
handle_signal32(&ksig, oldset, regs);
@@ -517,14 +508,20 @@ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)
switch (regs->gprs[2]) {
case -ERESTART_RESTARTBLOCK:
/* Restart with sys_restart_syscall */
- regs->int_code = __NR_restart_syscall;
- fallthrough;
+ regs->gprs[2] = regs->orig_gpr2;
+ current->restart_block.arch_data = regs->psw.addr;
+ if (is_compat_task())
+ regs->psw.addr = VDSO32_SYMBOL(current, restart_syscall);
+ else
+ regs->psw.addr = VDSO64_SYMBOL(current, restart_syscall);
+ if (test_thread_flag(TIF_SINGLE_STEP))
+ clear_thread_flag(TIF_PER_TRAP);
+ break;
case -ERESTARTNOHAND:
case -ERESTARTSYS:
case -ERESTARTNOINTR:
- /* Restart system call with magic TIF bit. */
regs->gprs[2] = regs->orig_gpr2;
- set_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
+ regs->psw.addr = __rewind_psw(regs->psw, regs->int_code >> 16);
if (test_thread_flag(TIF_SINGLE_STEP))
clear_thread_flag(TIF_PER_TRAP);
break;
diff --git a/arch/s390/kernel/syscall.c b/arch/s390/kernel/syscall.c
index 76f7916cc30f..c6b99da0738b 100644
--- a/arch/s390/kernel/syscall.c
+++ b/arch/s390/kernel/syscall.c
@@ -121,6 +121,10 @@ void do_syscall(struct pt_regs *regs)
regs->gprs[2] = nr;
+ if (nr == __NR_restart_syscall && !(current->restart_block.arch_data & 1)) {
+ regs->psw.addr = current->restart_block.arch_data;
+ current->restart_block.arch_data = 1;
+ }
nr = syscall_enter_from_user_mode_work(regs, nr);
/*
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From df29a7440c4b5c65765c8f60396b3b13063e24e9 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Fri, 25 Jun 2021 15:02:08 +0200
Subject: [PATCH] s390/signal: switch to using vdso for sigreturn and syscall
restart
with generic entry, there's a bug when it comes to restarting of signals.
The failing sequence is:
a) a signal is coming in, and no handler is registered, so the lower
part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c
sets PIF_SYSCALL_RESTART.
b) a second signal gets pending while the kernel is still in the exit
loop, and for that one, a handler exists.
c) The first part of arch_do_signal_or_restart() is called. That part
calls handle_signal(), which sets up stack + registers for handling
the signal.
d) __do_syscall() in arch/s390/kernel/syscall.c checks for
PIF_SYSCALL_RESTART right before leaving to userspace. If it is set,
it restart's the syscall. However, the registers are already setup
for handling a signal from c). The syscall is now restarted with the
wrong arguments.
Change the code to:
- use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because
we cannot rewind and go back to userspace on s390 because the system call
number might be encoded in the svc instruction.
- for all other syscalls we rewind the PSW and return to userspace.
Cc: <stable(a)kernel.org> # v5.12+ d57778feb987: s390/vdso: always enable vdso
Cc: <stable(a)kernel.org> # v5.12+ 686341f2548b: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Cc: <stable(a)kernel.org> # v5.12+ 43e1f76b0b69: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Cc: <stable(a)kernel.org> # v5.12+ 779df2248739: s390/vdso: add minimal compat vdso
Cc: <stable(a)kernel.org> # v5.12+
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
index 1d0e17ec93eb..cca142fbb516 100644
--- a/arch/s390/kernel/compat_signal.c
+++ b/arch/s390/kernel/compat_signal.c
@@ -28,6 +28,7 @@
#include <linux/uaccess.h>
#include <asm/lowcore.h>
#include <asm/switch_to.h>
+#include <asm/vdso.h>
#include "compat_linux.h"
#include "compat_ptrace.h"
#include "entry.h"
@@ -118,7 +119,6 @@ static int restore_sigregs32(struct pt_regs *regs,_sigregs32 __user *sregs)
fpregs_load((_s390_fp_regs *) &user_sregs.fpregs, ¤t->thread.fpu);
clear_pt_regs_flag(regs, PIF_SYSCALL); /* No longer in a system call */
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
return 0;
}
@@ -304,11 +304,7 @@ static int setup_frame32(struct ksignal *ksig, sigset_t *set,
restorer = (unsigned long __force)
ksig->ka.sa.sa_restorer | PSW32_ADDR_AMODE;
} else {
- /* Signal frames without vectors registers are short ! */
- __u16 __user *svc = (void __user *) frame + frame_size - 2;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long __force) svc | PSW32_ADDR_AMODE;
+ restorer = VDSO32_SYMBOL(current, sigreturn);
}
/* Set up registers for signal handler */
@@ -371,10 +367,7 @@ static int setup_rt_frame32(struct ksignal *ksig, sigset_t *set,
restorer = (unsigned long __force)
ksig->ka.sa.sa_restorer | PSW32_ADDR_AMODE;
} else {
- __u16 __user *svc = &frame->svc_insn;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_rt_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long __force) svc | PSW32_ADDR_AMODE;
+ restorer = VDSO32_SYMBOL(current, rt_sigreturn);
}
/* Create siginfo on the signal stack */
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 7ae5dde9c54d..350e94d0cac2 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -166,6 +166,12 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp,
p->thread.acrs[1] = (unsigned int)tls;
}
}
+ /*
+ * s390 stores the svc return address in arch_data when calling
+ * sigreturn()/restart_syscall() via vdso. 1 means no valid address
+ * stored.
+ */
+ p->restart_block.arch_data = 1;
return 0;
}
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index 080e7aed181f..78ef53b29958 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -32,6 +32,7 @@
#include <linux/uaccess.h>
#include <asm/lowcore.h>
#include <asm/switch_to.h>
+#include <asm/vdso.h>
#include "entry.h"
/*
@@ -171,7 +172,6 @@ static int restore_sigregs(struct pt_regs *regs, _sigregs __user *sregs)
fpregs_load(&user_sregs.fpregs, ¤t->thread.fpu);
clear_pt_regs_flag(regs, PIF_SYSCALL); /* No longer in a system call */
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
return 0;
}
@@ -334,15 +334,10 @@ static int setup_frame(int sig, struct k_sigaction *ka,
/* Set up to return from userspace. If provided, use a stub
already in userspace. */
- if (ka->sa.sa_flags & SA_RESTORER) {
+ if (ka->sa.sa_flags & SA_RESTORER)
restorer = (unsigned long) ka->sa.sa_restorer;
- } else {
- /* Signal frame without vector registers are short ! */
- __u16 __user *svc = (void __user *) frame + frame_size - 2;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long) svc;
- }
+ else
+ restorer = VDSO64_SYMBOL(current, sigreturn);
/* Set up registers for signal handler */
regs->gprs[14] = restorer;
@@ -397,14 +392,10 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
/* Set up to return from userspace. If provided, use a stub
already in userspace. */
- if (ksig->ka.sa.sa_flags & SA_RESTORER) {
+ if (ksig->ka.sa.sa_flags & SA_RESTORER)
restorer = (unsigned long) ksig->ka.sa.sa_restorer;
- } else {
- __u16 __user *svc = &frame->svc_insn;
- if (__put_user(S390_SYSCALL_OPCODE | __NR_rt_sigreturn, svc))
- return -EFAULT;
- restorer = (unsigned long) svc;
- }
+ else
+ restorer = VDSO64_SYMBOL(current, rt_sigreturn);
/* Create siginfo on the signal stack */
if (copy_siginfo_to_user(&frame->info, &ksig->info))
@@ -501,7 +492,7 @@ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)
}
/* No longer in a system call */
clear_pt_regs_flag(regs, PIF_SYSCALL);
- clear_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
+
rseq_signal_deliver(&ksig, regs);
if (is_compat_task())
handle_signal32(&ksig, oldset, regs);
@@ -517,14 +508,20 @@ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)
switch (regs->gprs[2]) {
case -ERESTART_RESTARTBLOCK:
/* Restart with sys_restart_syscall */
- regs->int_code = __NR_restart_syscall;
- fallthrough;
+ regs->gprs[2] = regs->orig_gpr2;
+ current->restart_block.arch_data = regs->psw.addr;
+ if (is_compat_task())
+ regs->psw.addr = VDSO32_SYMBOL(current, restart_syscall);
+ else
+ regs->psw.addr = VDSO64_SYMBOL(current, restart_syscall);
+ if (test_thread_flag(TIF_SINGLE_STEP))
+ clear_thread_flag(TIF_PER_TRAP);
+ break;
case -ERESTARTNOHAND:
case -ERESTARTSYS:
case -ERESTARTNOINTR:
- /* Restart system call with magic TIF bit. */
regs->gprs[2] = regs->orig_gpr2;
- set_pt_regs_flag(regs, PIF_SYSCALL_RESTART);
+ regs->psw.addr = __rewind_psw(regs->psw, regs->int_code >> 16);
if (test_thread_flag(TIF_SINGLE_STEP))
clear_thread_flag(TIF_PER_TRAP);
break;
diff --git a/arch/s390/kernel/syscall.c b/arch/s390/kernel/syscall.c
index 76f7916cc30f..c6b99da0738b 100644
--- a/arch/s390/kernel/syscall.c
+++ b/arch/s390/kernel/syscall.c
@@ -121,6 +121,10 @@ void do_syscall(struct pt_regs *regs)
regs->gprs[2] = nr;
+ if (nr == __NR_restart_syscall && !(current->restart_block.arch_data & 1)) {
+ regs->psw.addr = current->restart_block.arch_data;
+ current->restart_block.arch_data = 1;
+ }
nr = syscall_enter_from_user_mode_work(regs, nr);
/*
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: ee00910f75ff - powerpc/powernv/vas: Release reference to tgid during window close
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: FAILED
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
We attempted to compile the kernel for multiple architectures, but the compile
failed on one or more architectures:
ppc64le: FAILED (see build-ppc64le.log.xz attachment)
x86_64: FAILED (see build-x86_64.log.xz attachment)
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 78ebca321999699f30ea19029726d1a3908b395f Mon Sep 17 00:00:00 2001
From: Wesley Chalmers <Wesley.Chalmers(a)amd.com>
Date: Thu, 6 May 2021 17:43:42 -0400
Subject: [PATCH] drm/amd/display: Cover edge-case when changing DISPCLK
WDIVIDER
[WHY]
When changing the DISPCLK_WDIVIDER value from 126 to 127, the change in
clock rate is too great for the FIFOs to handle. This can cause visible
corruption during clock change.
HW has handed down this register sequence to fix the issue.
[HOW]
The sequence, from HW:
a. 127 -> 126
Read DIG_FIFO_CAL_AVERAGE_LEVEL
FIFO level N = DIG_FIFO_CAL_AVERAGE_LEVEL / 4
Set DCCG_FIFO_ERRDET_OVR_EN = 1
Write 1 to OTGx_DROP_PIXEL for (N-4) times
Set DCCG_FIFO_ERRDET_OVR_EN = 0
Write DENTIST_DISPCLK_RDIVIDER = 126
Because of frequency stepping, sequence a can be executed to change the
divider from 127 to any other divider value.
b. 126 -> 127
Read DIG_FIFO_CAL_AVERAGE_LEVEL
FIFO level N = DIG_FIFO_CAL_AVERAGE_LEVEL / 4
Set DCCG_FIFO_ERRDET_OVR_EN = 1
Write 1 to OTGx_ADD_PIXEL for (12-N) times
Set DCCG_FIFO_ERRDET_OVR_EN = 0
Write DENTIST_DISPCLK_RDIVIDER = 127
Because of frequency stepping, divider must first be set from any other
divider value to 126 before executing sequence b.
Signed-off-by: Wesley Chalmers <Wesley.Chalmers(a)amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin(a)amd.com>
Acked-by: Anson Jacob <Anson.Jacob(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
index 59d17195bc22..9d1db74de36d 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
@@ -123,7 +123,7 @@ void dcn20_update_clocks_update_dpp_dto(struct clk_mgr_internal *clk_mgr,
}
}
-void dcn20_update_clocks_update_dentist(struct clk_mgr_internal *clk_mgr)
+void dcn20_update_clocks_update_dentist(struct clk_mgr_internal *clk_mgr, struct dc_state *context)
{
int dpp_divider = DENTIST_DIVIDER_RANGE_SCALE_FACTOR
* clk_mgr->base.dentist_vco_freq_khz / clk_mgr->base.clks.dppclk_khz;
@@ -132,6 +132,68 @@ void dcn20_update_clocks_update_dentist(struct clk_mgr_internal *clk_mgr)
uint32_t dppclk_wdivider = dentist_get_did_from_divider(dpp_divider);
uint32_t dispclk_wdivider = dentist_get_did_from_divider(disp_divider);
+ uint32_t current_dispclk_wdivider;
+ uint32_t i;
+
+ REG_GET(DENTIST_DISPCLK_CNTL,
+ DENTIST_DISPCLK_WDIVIDER, ¤t_dispclk_wdivider);
+
+ /* When changing divider to or from 127, some extra programming is required to prevent corruption */
+ if (current_dispclk_wdivider == 127 && dispclk_wdivider != 127) {
+ for (i = 0; i < clk_mgr->base.ctx->dc->res_pool->pipe_count; i++) {
+ struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+ uint32_t fifo_level;
+ struct dccg *dccg = clk_mgr->base.ctx->dc->res_pool->dccg;
+ struct stream_encoder *stream_enc = pipe_ctx->stream_res.stream_enc;
+ int32_t N;
+ int32_t j;
+
+ if (!pipe_ctx->stream)
+ continue;
+ /* Virtual encoders don't have this function */
+ if (!stream_enc->funcs->get_fifo_cal_average_level)
+ continue;
+ fifo_level = stream_enc->funcs->get_fifo_cal_average_level(
+ stream_enc);
+ N = fifo_level / 4;
+ dccg->funcs->set_fifo_errdet_ovr_en(
+ dccg,
+ true);
+ for (j = 0; j < N - 4; j++)
+ dccg->funcs->otg_drop_pixel(
+ dccg,
+ pipe_ctx->stream_res.tg->inst);
+ dccg->funcs->set_fifo_errdet_ovr_en(
+ dccg,
+ false);
+ }
+ } else if (dispclk_wdivider == 127 && current_dispclk_wdivider != 127) {
+ REG_UPDATE(DENTIST_DISPCLK_CNTL,
+ DENTIST_DISPCLK_WDIVIDER, 126);
+ REG_WAIT(DENTIST_DISPCLK_CNTL, DENTIST_DISPCLK_CHG_DONE, 1, 50, 100);
+ for (i = 0; i < clk_mgr->base.ctx->dc->res_pool->pipe_count; i++) {
+ struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+ struct dccg *dccg = clk_mgr->base.ctx->dc->res_pool->dccg;
+ struct stream_encoder *stream_enc = pipe_ctx->stream_res.stream_enc;
+ uint32_t fifo_level;
+ int32_t N;
+ int32_t j;
+
+ if (!pipe_ctx->stream)
+ continue;
+ /* Virtual encoders don't have this function */
+ if (!stream_enc->funcs->get_fifo_cal_average_level)
+ continue;
+ fifo_level = stream_enc->funcs->get_fifo_cal_average_level(
+ stream_enc);
+ N = fifo_level / 4;
+ dccg->funcs->set_fifo_errdet_ovr_en(dccg, true);
+ for (j = 0; j < 12 - N; j++)
+ dccg->funcs->otg_add_pixel(dccg,
+ pipe_ctx->stream_res.tg->inst);
+ dccg->funcs->set_fifo_errdet_ovr_en(dccg, false);
+ }
+ }
REG_UPDATE(DENTIST_DISPCLK_CNTL,
DENTIST_DISPCLK_WDIVIDER, dispclk_wdivider);
@@ -251,11 +313,11 @@ void dcn2_update_clocks(struct clk_mgr *clk_mgr_base,
if (dpp_clock_lowered) {
// if clock is being lowered, increase DTO before lowering refclk
dcn20_update_clocks_update_dpp_dto(clk_mgr, context, safe_to_lower);
- dcn20_update_clocks_update_dentist(clk_mgr);
+ dcn20_update_clocks_update_dentist(clk_mgr, context);
} else {
// if clock is being raised, increase refclk before lowering DTO
if (update_dppclk || update_dispclk)
- dcn20_update_clocks_update_dentist(clk_mgr);
+ dcn20_update_clocks_update_dentist(clk_mgr, context);
// always update dtos unless clock is lowered and not safe to lower
dcn20_update_clocks_update_dpp_dto(clk_mgr, context, safe_to_lower);
}
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.h
index 0b9c045b0c8e..d254d0b6fba1 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.h
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.h
@@ -50,7 +50,8 @@ void dcn2_get_clock(struct clk_mgr *clk_mgr,
enum dc_clock_type clock_type,
struct dc_clock_config *clock_cfg);
-void dcn20_update_clocks_update_dentist(struct clk_mgr_internal *clk_mgr);
+void dcn20_update_clocks_update_dentist(struct clk_mgr_internal *clk_mgr,
+ struct dc_state *context);
void dcn2_read_clocks_from_hw_dentist(struct clk_mgr *clk_mgr_base);
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c
index 652fa89fae5f..513676a6f52b 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c
@@ -334,11 +334,11 @@ static void dcn3_update_clocks(struct clk_mgr *clk_mgr_base,
if (dpp_clock_lowered) {
/* if clock is being lowered, increase DTO before lowering refclk */
dcn20_update_clocks_update_dpp_dto(clk_mgr, context, safe_to_lower);
- dcn20_update_clocks_update_dentist(clk_mgr);
+ dcn20_update_clocks_update_dentist(clk_mgr, context);
} else {
/* if clock is being raised, increase refclk before lowering DTO */
if (update_dppclk || update_dispclk)
- dcn20_update_clocks_update_dentist(clk_mgr);
+ dcn20_update_clocks_update_dentist(clk_mgr, context);
/* There is a check inside dcn20_update_clocks_update_dpp_dto which ensures
* that we do not lower dto when it is not safe to lower. We do not need to
* compare the current and new dppclk before calling this function.*/
In the rtw_pci_init_rx_ring function the "if (len > TRX_BD_IDX_MASK)"
statement guarantees that len is less than or equal to GENMASK(11, 0) or
in other words that len is less than or equal to 4095. However the
rx_ring->buf has a size of RTK_MAX_RX_DESC_NUM (defined as 512). This
way it is possible an out-of-bounds write in the for statement due to
the i variable can exceed the rx_ring->buff size.
Fix it using the ARRAY_SIZE macro.
Cc: stable(a)vger.kernel.org
Addresses-Coverity-ID: 1461515 ("Out-of-bounds write")
Fixes: e3037485c68ec ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Len Baker <len.baker(a)gmx.com>
---
drivers/net/wireless/realtek/rtw88/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index e7d17ab8f113..b9d8c049e776 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -280,7 +280,7 @@ static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
}
rx_ring->r.head = head;
- for (i = 0; i < len; i++) {
+ for (i = 0; i < ARRAY_SIZE(rx_ring->buf); i++) {
skb = dev_alloc_skb(buf_sz);
if (!skb) {
allocated = i;
--
2.25.1
Once an exception has been injected, any side effects related to
the exception (such as setting CR2 or DR6) have been taked place.
Therefore, once KVM sets the VM-entry interruption information
field or the AMD EVENTINJ field, the next VM-entry must deliver that
exception.
Pending interrupts are processed after injected exceptions, so
in theory it would not be a problem to use KVM_INTERRUPT when
an injected exception is present. However, DOSBox is using
run->ready_for_interrupt_injection to detect interrupt windows
and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
interrupt manually. For this to work, the interrupt window
must be delayed after the completion of the previous event
injection.
Cc: stable(a)vger.kernel.org
Reported-by: Stas Sergeev <stsp2(a)yandex.ru>
Tested-by: Stas Sergeev <stsp2(a)yandex.ru>
Fixes: 71cc849b7093 ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/x86.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe3aaf195292..7fbab29b3569 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4342,6 +4342,9 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
{
+ if (kvm_event_needs_reinjection(vcpu))
+ return false;
+
/*
* We can accept userspace's request for interrupt injection
* as long as we have a place to store the interrupt number.
--
2.27.0
commit f18139966d072dab8e4398c95ce955a9742e04f7 upstream.
Trying to start a new PIO transfer by writing value 0 in PIO_START register
when previous transfer has not yet completed (which is indicated by value 1
in PIO_START) causes an External Abort on CPU, which results in kernel
panic:
SError Interrupt on CPU0, code 0xbf000002 -- SError
Kernel panic - not syncing: Asynchronous SError Interrupt
To prevent kernel panic, it is required to reject a new PIO transfer when
previous one has not finished yet.
If previous PIO transfer is not finished yet, the kernel may issue a new
PIO request only if the previous PIO transfer timed out.
In the past the root cause of this issue was incorrectly identified (as it
often happens during link retraining or after link down event) and special
hack was implemented in Trusted Firmware to catch all SError events in EL3,
to ignore errors with code 0xbf000002 and not forwarding any other errors
to kernel and instead throw panic from EL3 Trusted Firmware handler.
Links to discussion and patches about this issue:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7d…https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen…https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
But the real cause was the fact that during link retraining or after link
down event the PIO transfer may take longer time, up to the 1.44s until it
times out. This increased probability that a new PIO transfer would be
issued by kernel while previous one has not finished yet.
After applying this change into the kernel, it is possible to revert the
mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
[pali: Backported to 4.14 version]
---
This patch is backported to 4.14 version. It depends on commit
7fbcb5da811b as presented on Cc: stable line. Backport for commit
7fbcb5da811b to 4.14 was sent in previous email:
https://lore.kernel.org/stable/20210716122033.22568-1-pali@kernel.org/
---
drivers/pci/host/pci-aardvark.c | 49 +++++++++++++++++++++++++++------
1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/host/pci-aardvark.c b/drivers/pci/host/pci-aardvark.c
index 5dbbf3d7de36..a3a28e38430a 100644
--- a/drivers/pci/host/pci-aardvark.c
+++ b/drivers/pci/host/pci-aardvark.c
@@ -426,10 +426,39 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
udelay(PIO_RETRY_DELAY);
}
- dev_err(dev, "config read/write timed out\n");
+ dev_err(dev, "PIO read/write transfer time out\n");
return -ETIMEDOUT;
}
+static bool advk_pcie_pio_is_running(struct advk_pcie *pcie)
+{
+ struct device *dev = &pcie->pdev->dev;
+
+ /*
+ * Trying to start a new PIO transfer when previous has not completed
+ * cause External Abort on CPU which results in kernel panic:
+ *
+ * SError Interrupt on CPU0, code 0xbf000002 -- SError
+ * Kernel panic - not syncing: Asynchronous SError Interrupt
+ *
+ * Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected
+ * by raw_spin_lock_irqsave() at pci_lock_config() level to prevent
+ * concurrent calls at the same time. But because PIO transfer may take
+ * about 1.5s when link is down or card is disconnected, it means that
+ * advk_pcie_wait_pio() does not always have to wait for completion.
+ *
+ * Some versions of ARM Trusted Firmware handles this External Abort at
+ * EL3 level and mask it to prevent kernel panic. Relevant TF-A commit:
+ * https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7d…
+ */
+ if (advk_readl(pcie, PIO_START)) {
+ dev_err(dev, "Previous PIO read/write transfer is still running\n");
+ return true;
+ }
+
+ return false;
+}
+
static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
int where, int size, u32 *val)
{
@@ -442,9 +471,10 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
return PCIBIOS_DEVICE_NOT_FOUND;
}
- /* Start PIO */
- advk_writel(pcie, 0, PIO_START);
- advk_writel(pcie, 1, PIO_ISR);
+ if (advk_pcie_pio_is_running(pcie)) {
+ *val = 0xffffffff;
+ return PCIBIOS_SET_FAILED;
+ }
/* Program the control register */
reg = advk_readl(pcie, PIO_CTRL);
@@ -463,7 +493,8 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
/* Program the data strobe */
advk_writel(pcie, 0xf, PIO_WR_DATA_STRB);
- /* Start the transfer */
+ /* Clear PIO DONE ISR and start the transfer */
+ advk_writel(pcie, 1, PIO_ISR);
advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie);
@@ -497,9 +528,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
if (where % size)
return PCIBIOS_SET_FAILED;
- /* Start PIO */
- advk_writel(pcie, 0, PIO_START);
- advk_writel(pcie, 1, PIO_ISR);
+ if (advk_pcie_pio_is_running(pcie))
+ return PCIBIOS_SET_FAILED;
/* Program the control register */
reg = advk_readl(pcie, PIO_CTRL);
@@ -526,7 +556,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
/* Program the data strobe */
advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB);
- /* Start the transfer */
+ /* Clear PIO DONE ISR and start the transfer */
+ advk_writel(pcie, 1, PIO_ISR);
advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie);
--
2.20.1
kexec_load_file() relies on the memblock infrastructure to avoid
stamping over regions of memory that are essential to the survival
of the system.
However, nobody seems to agree how to flag these regions as reserved,
and (for example) EFI only publishes its reservations in /proc/iomem
for the benefit of the traditional, userspace based kexec tool.
On arm64 platforms with GICv3, this can result in the payload being
placed at the location of the LPI tables. Shock, horror!
Let's augment the EFI reservation code with a memblock_reserve() call,
protecting our dear tables from the secondary kernel invasion.
Reported-by: Moritz Fischer <mdf(a)kernel.org>
Tested-by: Moritz Fischer <mdf(a)kernel.org>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: James Morse <james.morse(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
---
drivers/firmware/efi/efi.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 4b7ee3fa9224..847f33ffc4ae 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -896,6 +896,7 @@ static int __init efi_memreserve_map_root(void)
static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
{
struct resource *res, *parent;
+ int ret;
res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
if (!res)
@@ -908,7 +909,17 @@ static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
/* we expect a conflict with a 'System RAM' region */
parent = request_resource_conflict(&iomem_resource, res);
- return parent ? request_resource(parent, res) : 0;
+ ret = parent ? request_resource(parent, res) : 0;
+
+ /*
+ * Given that efi_mem_reserve_iomem() can be called at any
+ * time, only call memblock_reserve() if the architecture
+ * keeps the infrastructure around.
+ */
+ if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK) && !ret)
+ memblock_reserve(addr, size);
+
+ return ret;
}
int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
--
2.30.2
From: Bhaumik Bhatt <bbhatt(a)codeaurora.org>
MHI reads the channel ID from the event ring element sent by the
device which can be any value between 0 and 255. In order to
prevent any out of bound accesses, add a check against the maximum
number of channels supported by the controller and those channels
not configured yet so as to skip processing of that event ring
element.
Cc: stable(a)vger.kernel.org #5.10
Fixes: 1d3173a3bae7 ("bus: mhi: core: Add support for processing events from client device")
Signed-off-by: Bhaumik Bhatt <bbhatt(a)codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk(a)codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Link: https://lore.kernel.org/r/1624558141-11045-1-git-send-email-bbhatt@codeauro…
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/bus/mhi/core/main.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 22acde118bc3..fc9196f11cb7 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -773,11 +773,18 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl,
cmd_pkt = mhi_to_virtual(mhi_ring, ptr);
chan = MHI_TRE_GET_CMD_CHID(cmd_pkt);
- mhi_chan = &mhi_cntrl->mhi_chan[chan];
- write_lock_bh(&mhi_chan->lock);
- mhi_chan->ccs = MHI_TRE_GET_EV_CODE(tre);
- complete(&mhi_chan->completion);
- write_unlock_bh(&mhi_chan->lock);
+
+ if (chan < mhi_cntrl->max_chan &&
+ mhi_cntrl->mhi_chan[chan].configured) {
+ mhi_chan = &mhi_cntrl->mhi_chan[chan];
+ write_lock_bh(&mhi_chan->lock);
+ mhi_chan->ccs = MHI_TRE_GET_EV_CODE(tre);
+ complete(&mhi_chan->completion);
+ write_unlock_bh(&mhi_chan->lock);
+ } else {
+ dev_err(&mhi_cntrl->mhi_dev->dev,
+ "Completion packet for invalid channel ID: %d\n", chan);
+ }
mhi_del_ring_element(mhi_cntrl, mhi_ring);
}
--
2.25.1
Patches below in Linus's tree failed to apply to 5.12-stable tree.
Adjust them so we can apply them to 5.12-stable tree.
original git commit id:
* 3769e4c0af5b82c8ea21d037013cb9564dfaa51f
[PATCH] drm/dp_mst: Avoid to mess up payload table by ports in stale topology
* 35d3e8cb35e75450f87f87e3d314e2d418b6954b
[PATCH] drm/dp_mst: Do not set proposed vcpi directly
* 24ff3dc18b99c4b912ab1746e803ddb3be5ced4c
[PATCH] drm/dp_mst: Add missing drm parameters to recently added call to drm_dbg_kms()
José Roberto de Souza (1):
drm/dp_mst: Add missing drm parameters to recently added call to
drm_dbg_kms()
Wayne Lin (2):
drm/dp_mst: Do not set proposed vcpi directly
drm/dp_mst: Avoid to mess up payload table by ports in stale topology
drivers/gpu/drm/drm_dp_mst_topology.c | 68 +++++++++++++++++----------
1 file changed, 42 insertions(+), 26 deletions(-)
--
2.17.1
Patches below in Linus's tree failed to apply to 5.10-stable tree.
Adjust them so we can apply them to 5.10-stable tree.
original git commit id:
* 3769e4c0af5b82c8ea21d037013cb9564dfaa51f
[PATCH] drm/dp_mst: Avoid to mess up payload table by ports in stale topology
* 35d3e8cb35e75450f87f87e3d314e2d418b6954b
[PATCH] drm/dp_mst: Do not set proposed vcpi directly
* 24ff3dc18b99c4b912ab1746e803ddb3be5ced4c
[PATCH] drm/dp_mst: Add missing drm parameters to recently added call to drm_dbg_kms()
José Roberto de Souza (1):
drm/dp_mst: Add missing drm parameters to recently added call to
drm_dbg_kms()
Wayne Lin (2):
drm/dp_mst: Do not set proposed vcpi directly
drm/dp_mst: Avoid to mess up payload table by ports in stale topology
drivers/gpu/drm/drm_dp_mst_topology.c | 68 +++++++++++++++++----------
1 file changed, 42 insertions(+), 26 deletions(-)
--
2.17.1
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3769e4c0af5b82c8ea21d037013cb9564dfaa51f Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 16 Jun 2021 11:55:01 +0800
Subject: [PATCH] drm/dp_mst: Avoid to mess up payload table by ports in stale
topology
[Why]
After unplug/hotplug hub from the system, userspace might start to
clear stale payloads gradually. If we call drm_dp_mst_deallocate_vcpi()
to release stale VCPI of those ports which are not relating to current
topology, we have chane to wrongly clear active payload table entry for
current topology.
E.g.
We have allocated VCPI 1 in current payload table and we call
drm_dp_mst_deallocate_vcpi() to clean VCPI 1 in stale topology. In
drm_dp_mst_deallocate_vcpi(), it will call drm_dp_mst_put_payload_id()
tp put VCPI 1 and which means ID 1 is available again. Thereafter, if we
want to allocate a new payload stream, it will find ID 1 is available by
drm_dp_mst_assign_payload_id(). However, ID 1 is being used
[How]
Check target sink is relating to current topology or not before doing
any payload table update.
Searching upward to find the target sink's relevant root branch device.
If the found root branch device is not the same root of current
topology, don't update payload table.
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Cc tag.
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-3-Wayne.L…
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index b41b837db66d..9ac148efd9e4 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -94,6 +94,9 @@ static int drm_dp_mst_register_i2c_bus(struct drm_dp_mst_port *port);
static void drm_dp_mst_unregister_i2c_bus(struct drm_dp_mst_port *port);
static void drm_dp_mst_kick_tx(struct drm_dp_mst_topology_mgr *mgr);
+static bool drm_dp_mst_port_downstream_of_branch(struct drm_dp_mst_port *port,
+ struct drm_dp_mst_branch *branch);
+
#define DBG_PREFIX "[dp_mst]"
#define DP_STR(x) [DP_ ## x] = #x
@@ -3366,6 +3369,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
struct drm_dp_mst_port *port;
int i, j;
int cur_slots = 1;
+ bool skip;
mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -3380,6 +3384,14 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
port = container_of(vcpi, struct drm_dp_mst_port,
vcpi);
+ mutex_lock(&mgr->lock);
+ skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
+ mutex_unlock(&mgr->lock);
+
+ if (skip) {
+ drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
+ continue;
+ }
/* Validated ports don't matter if we're releasing
* VCPI
*/
@@ -3479,6 +3491,7 @@ int drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
struct drm_dp_mst_port *port;
int i;
int ret = 0;
+ bool skip;
mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -3488,6 +3501,13 @@ int drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
+ mutex_lock(&mgr->lock);
+ skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
+ mutex_unlock(&mgr->lock);
+
+ if (skip)
+ continue;
+
drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr->payloads[i].payload_state);
if (mgr->payloads[i].payload_state == DP_PAYLOAD_LOCAL) {
ret = drm_dp_create_payload_step2(mgr, port, mgr->proposed_vcpis[i]->vcpi, &mgr->payloads[i]);
@@ -4574,9 +4594,18 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
struct drm_dp_mst_port *port)
{
+ bool skip;
+
if (!port->vcpi.vcpi)
return;
+ mutex_lock(&mgr->lock);
+ skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
+ mutex_unlock(&mgr->lock);
+
+ if (skip)
+ return;
+
drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
port->vcpi.num_slots = 0;
port->vcpi.pbn = 0;
The patch titled
Subject: memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
has been added to the -mm tree. Its filename is
memblock-make-for_each_mem_range-traverse-memblock_hotplug-regions.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/memblock-make-for_each_mem_range-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/memblock-make-for_each_mem_range-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
Commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with
for_each_mem_range()") didn't take into account that when there is
movable_node parameter in the kernel command line, for_each_mem_range()
would skip ranges marked with MEMBLOCK_HOTPLUG.
The page table setup code in POWER uses for_each_mem_range() to create the
linear mapping of the physical memory and since the regions marked as
MEMORY_HOTPLUG are skipped, they never make it to the linear map.
A later access to the memory in those ranges will fail:
[ 2.271743] BUG: Unable to handle kernel data access on write at 0xc000000400000000
[ 2.271984] Faulting instruction address: 0xc00000000008a3c0
[ 2.272568] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2.272683] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 2.273063] Modules linked in:
[ 2.273435] CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7
[ 2.273832] NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040
[ 2.273918] REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0)
[ 2.274036] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000
[ 2.274454] CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0
[ 2.274454] GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000
[ 2.274454] GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200
[ 2.274454] GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300
[ 2.274454] GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00
[ 2.274454] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.274454] GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000
[ 2.274454] GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0
[ 2.274454] GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680
[ 2.275520] NIP [c00000000008a3c0] clear_user_page+0x50/0x80
[ 2.276333] LR [c0000000003c1ed8] __handle_mm_fault+0xc88/0x1910
[ 2.276688] Call Trace:
[ 2.276839] [c000000008a57a10] [c0000000003c1e94] __handle_mm_fault+0xc44/0x1910 (unreliable)
[ 2.277142] [c000000008a57af0] [c0000000003c2c90] handle_mm_fault+0x130/0x2a0
[ 2.277331] [c000000008a57b40] [c0000000003b5f08] __get_user_pages+0x248/0x610
[ 2.277541] [c000000008a57c40] [c0000000003b848c] __get_user_pages_remote+0x12c/0x3e0
[ 2.277768] [c000000008a57cd0] [c000000000473f24] get_arg_page+0x54/0xf0
[ 2.277959] [c000000008a57d10] [c000000000474a7c] copy_string_kernel+0x11c/0x210
[ 2.278159] [c000000008a57d80] [c00000000047663c] kernel_execve+0x16c/0x220
[ 2.278361] [c000000008a57dd0] [c000000000166270] call_usermodehelper_exec_async+0x1b0/0x2f0
[ 2.278543] [c000000008a57e10] [c00000000000d5ec] ret_from_kernel_thread+0x5c/0x70
[ 2.278870] Instruction dump:
[ 2.279214] 79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050
[ 2.279416] 7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec
[ 2.280193] ---[ end trace 490b8c67e6075e09 ]---
Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the
traversal fixes this issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100
Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org
Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Tested-by: Greg Kurz <groug(a)kaod.org>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memblock.h | 4 ++--
mm/memblock.c | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
--- a/include/linux/memblock.h~memblock-make-for_each_mem_range-traverse-memblock_hotplug-regions
+++ a/include/linux/memblock.h
@@ -209,7 +209,7 @@ static inline void __next_physmem_range(
*/
#define for_each_mem_range(i, p_start, p_end) \
__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, \
- MEMBLOCK_NONE, p_start, p_end, NULL)
+ MEMBLOCK_HOTPLUG, p_start, p_end, NULL)
/**
* for_each_mem_range_rev - reverse iterate through memblock areas from
@@ -220,7 +220,7 @@ static inline void __next_physmem_range(
*/
#define for_each_mem_range_rev(i, p_start, p_end) \
__for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE, \
- MEMBLOCK_NONE, p_start, p_end, NULL)
+ MEMBLOCK_HOTPLUG, p_start, p_end, NULL)
/**
* for_each_reserved_mem_range - iterate over all reserved memblock areas
--- a/mm/memblock.c~memblock-make-for_each_mem_range-traverse-memblock_hotplug-regions
+++ a/mm/memblock.c
@@ -947,7 +947,8 @@ static bool should_skip_region(struct me
return true;
/* skip hotpluggable memory regions if needed */
- if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
+ if (movable_node_is_enabled() && memblock_is_hotpluggable(m) &&
+ !(flags & MEMBLOCK_HOTPLUG))
return true;
/* if we want mirror memory skip non-mirror memory regions */
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
memblock-make-for_each_mem_range-traverse-memblock_hotplug-regions.patch
mm-page_alloc-always-initialize-memory-map-for-the-holes.patch
microblaze-simplify-pte_alloc_one_kernel.patch
mm-introduce-memmap_alloc-to-unify-memory-map-allocation.patch
memblock-stop-poisoning-raw-allocations.patch
mm-remove-pfn_valid_within-and-config_holes_in_zone.patch
mm-memory_hotplug-cleanup-after-removal-of-pfn_valid_within.patch
Hi Greg,
thanks for applying/staging the commit for the stable release. For
kernel <= 5.11 the patch needs a small modification for the debug log.
See below. Should I send an adapted patch to stable(a)vger.kernel.org to
replace this one?
On Thu, 2021-07-15 at 16:43 +0200, gregkh(a)linuxfoundation.org wrote:
> This is a note to let you know that I've just added the patch titled
>
> media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
>
> to the 5.10-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> media-uvcvideo-fix-pixel-format-change-for-elgato-cam-link-4k.patch
> and it can be found in the queue-5.10 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
> From 4c6e0976295add7f0ed94d276c04a3d6f1ea8f83 Mon Sep 17 00:00:00 2001
> From: Benjamin Drung <bdrung(a)posteo.de>
> Date: Sat, 5 Jun 2021 22:15:36 +0200
> Subject: media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
>
> From: Benjamin Drung <bdrung(a)posteo.de>
>
> commit 4c6e0976295add7f0ed94d276c04a3d6f1ea8f83 upstream.
>
> The Elgato Cam Link 4K HDMI video capture card reports to support three
> different pixel formats, where the first format depends on the connected
> HDMI device.
>
> ```
> $ v4l2-ctl -d /dev/video0 --list-formats-ext
> ioctl: VIDIOC_ENUM_FMT
> Type: Video Capture
>
> [0]: 'NV12' (Y/CbCr 4:2:0)
> Size: Discrete 3840x2160
> Interval: Discrete 0.033s (29.970 fps)
> [1]: 'NV12' (Y/CbCr 4:2:0)
> Size: Discrete 3840x2160
> Interval: Discrete 0.033s (29.970 fps)
> [2]: 'YU12' (Planar YUV 4:2:0)
> Size: Discrete 3840x2160
> Interval: Discrete 0.033s (29.970 fps)
> ```
>
> Changing the pixel format to anything besides the first pixel format
> does not work:
>
> ```
> $ v4l2-ctl -d /dev/video0 --try-fmt-video pixelformat=YU12
> Format Video Capture:
> Width/Height : 3840/2160
> Pixel Format : 'NV12' (Y/CbCr 4:2:0)
> Field : None
> Bytes per Line : 3840
> Size Image : 12441600
> Colorspace : sRGB
> Transfer Function : Rec. 709
> YCbCr/HSV Encoding: Rec. 709
> Quantization : Default (maps to Limited Range)
> Flags :
> ```
>
> User space applications like VLC might show an error message on the
> terminal in that case:
>
> ```
> libv4l2: error set_fmt gave us a different result than try_fmt!
> ```
>
> Depending on the error handling of the user space applications, they
> might display a distorted video, because they use the wrong pixel format
> for decoding the stream.
>
> The Elgato Cam Link 4K responds to the USB video probe
> VS_PROBE_CONTROL/VS_COMMIT_CONTROL with a malformed data structure: The
> second byte contains bFormatIndex (instead of being the second byte of
> bmHint). The first byte is always zero. The third byte is always 1.
>
> The firmware bug was reported to Elgato on 2020-12-01 and it was
> forwarded by the support team to the developers as feature request.
> There is no firmware update available since then. The latest firmware
> for Elgato Cam Link 4K as of 2021-03-23 has MCU 20.02.19 and FPGA 67.
>
> Therefore correct the malformed data structure for this device. The
> change was successfully tested with VLC, OBS, and Chromium using
> different pixel formats (YUYV, NV12, YU12), resolutions (3840x2160,
> 1920x1080), and frame rates (29.970 and 59.940 fps).
>
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Benjamin Drung <bdrung(a)posteo.de>
> Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> drivers/media/usb/uvc/uvc_video.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -124,10 +124,37 @@ int uvc_query_ctrl(struct uvc_device *de
> static void uvc_fixup_video_ctrl(struct uvc_streaming *stream,
> struct uvc_streaming_control *ctrl)
> {
> + static const struct usb_device_id elgato_cam_link_4k = {
> + USB_DEVICE(0x0fd9, 0x0066)
> + };
> struct uvc_format *format = NULL;
> struct uvc_frame *frame = NULL;
> unsigned int i;
>
> + /*
> + * The response of the Elgato Cam Link 4K is incorrect: The second byte
> + * contains bFormatIndex (instead of being the second byte of bmHint).
> + * The first byte is always zero. The third byte is always 1.
> + *
> + * The UVC 1.5 class specification defines the first five bits in the
> + * bmHint bitfield. The remaining bits are reserved and should be zero.
> + * Therefore a valid bmHint will be less than 32.
> + *
> + * Latest Elgato Cam Link 4K firmware as of 2021-03-23 needs this fix.
> + * MCU: 20.02.19, FPGA: 67
> + */
> + if (usb_match_one_id(stream->dev->intf, &elgato_cam_link_4k) &&
> + ctrl->bmHint > 255) {
> + u8 corrected_format_index = ctrl->bmHint >> 8;
> +
> + /* uvc_dbg(stream->dev, VIDEO,
Instead of commenting out the debug log, this line needs to be replaced by:
uvc_trace(UVC_TRACE_VIDEO,
> + "Correct USB video probe response from {bmHint: 0x%04x, bFormatIndex: %u} to {bmHint: 0x%04x, bFormatIndex: %u}\n",
> + ctrl->bmHint, ctrl->bFormatIndex,
> + 1, corrected_format_index); */
> + ctrl->bmHint = 1;
> + ctrl->bFormatIndex = corrected_format_index;
> + }
> +
> for (i = 0; i < stream->nformats; ++i) {
> if (stream->format[i].index == ctrl->bFormatIndex) {
> format = &stream->format[i];
>
>
> Patches currently in stable-queue which might be from bdrung(a)posteo.de are
>
> queue-5.10/media-uvcvideo-fix-pixel-format-change-for-elgato-cam-link-4k.patch
Results from Linaro’s test farm.
Regression detected on arm and arm64 due to the following patch
with CONFIG_CORESIGHT=y enabled,
coresight: tmc-etf: Fix global-out-of-bounds in tmc_update_etf_buffer()
commit 5fae8a946ac2df879caf3f79a193d4766d00239b upstream.
Build error:
------------
drivers/hwtracing/coresight/coresight-tmc-etf.c: In function
'tmc_update_etf_buffer':
drivers/hwtracing/coresight/coresight-tmc-etf.c:477:33: error:
'CORESIGHT_BARRIER_PKT_SIZE' undeclared (first use in this function)
477 | if (lost && i < CORESIGHT_BARRIER_PKT_SIZE) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
ref:
https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/142783…https://builds.tuxbuild.com/1vMA3olyV9E6Yr1Ix0LWLlXinkv/
--
Linaro LKFT
https://lkft.linaro.org
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5fae8a946ac2df879caf3f79a193d4766d00239b Mon Sep 17 00:00:00 2001
From: Sai Prakash Ranjan <saiprakash.ranjan(a)codeaurora.org>
Date: Mon, 14 Jun 2021 11:59:00 -0600
Subject: [PATCH] coresight: tmc-etf: Fix global-out-of-bounds in
tmc_update_etf_buffer()
commit 6f755e85c332 ("coresight: Add helper for inserting synchronization
packets") removed trailing '\0' from barrier_pkt array and updated the
call sites like etb_update_buffer() to have proper checks for barrier_pkt
size before read but missed updating tmc_update_etf_buffer() which still
reads barrier_pkt past the array size resulting in KASAN out-of-bounds
bug. Fix this by adding a check for barrier_pkt size before accessing
like it is done in etb_update_buffer().
BUG: KASAN: global-out-of-bounds in tmc_update_etf_buffer+0x4b8/0x698
Read of size 4 at addr ffffffd05b7d1030 by task perf/2629
Call trace:
dump_backtrace+0x0/0x27c
show_stack+0x20/0x2c
dump_stack+0x11c/0x188
print_address_description+0x3c/0x4a4
__kasan_report+0x140/0x164
kasan_report+0x10/0x18
__asan_report_load4_noabort+0x1c/0x24
tmc_update_etf_buffer+0x4b8/0x698
etm_event_stop+0x248/0x2d8
etm_event_del+0x20/0x2c
event_sched_out+0x214/0x6f0
group_sched_out+0xd0/0x270
ctx_sched_out+0x2ec/0x518
__perf_event_task_sched_out+0x4fc/0xe6c
__schedule+0x1094/0x16a0
preempt_schedule_irq+0x88/0x170
arm64_preempt_schedule_irq+0xf0/0x18c
el1_irq+0xe8/0x180
perf_event_exec+0x4d8/0x56c
setup_new_exec+0x204/0x400
load_elf_binary+0x72c/0x18c0
search_binary_handler+0x13c/0x420
load_script+0x500/0x6c4
search_binary_handler+0x13c/0x420
exec_binprm+0x118/0x654
__do_execve_file+0x77c/0xba4
__arm64_compat_sys_execve+0x98/0xac
el0_svc_common+0x1f8/0x5e0
el0_svc_compat_handler+0x84/0xb0
el0_svc_compat+0x10/0x50
The buggy address belongs to the variable:
barrier_pkt+0x10/0x40
Memory state around the buggy address:
ffffffd05b7d0f00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00
ffffffd05b7d0f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffd05b7d1000: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 03
^
ffffffd05b7d1080: fa fa fa fa 00 02 fa fa fa fa fa fa 03 fa fa fa
ffffffd05b7d1100: fa fa fa fa 00 00 00 00 05 fa fa fa fa fa fa fa
==================================================================
Link: https://lore.kernel.org/r/20210505093430.18445-1-saiprakash.ranjan@codeauro…
Fixes: 0c3fc4d5fa26 ("coresight: Add barrier packet for synchronisation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan(a)codeaurora.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20210614175901.532683-6-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 45b85edfc690..cd0fb7bfba68 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -530,7 +530,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
buf_ptr = buf->data_pages[cur] + offset;
*buf_ptr = readl_relaxed(drvdata->base + TMC_RRD);
- if (lost && *barrier) {
+ if (lost && i < CORESIGHT_BARRIER_PKT_SIZE) {
*buf_ptr = *barrier;
barrier++;
}
The patch titled
Subject: kfence: skip all GFP_ZONEMASK allocations
has been added to the -mm tree. Its filename is
kfence-skip-all-gfp_zonemask-allocations.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/kfence-skip-all-gfp_zonemask-allo…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/kfence-skip-all-gfp_zonemask-allo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kfence: skip all GFP_ZONEMASK allocations
Allocation requests outside ZONE_NORMAL (MOVABLE, HIGHMEM or DMA) cannot
be fulfilled by KFENCE, because KFENCE memory pool is located in a zone
different from the requested one.
Because callers of kmem_cache_alloc() may actually rely on the allocation
to reside in the requested zone (e.g. memory allocations done with
__GFP_DMA must be DMAable), skip all allocations done with GFP_ZONEMASK
and/or respective SLAB flags (SLAB_CACHE_DMA and SLAB_CACHE_DMA32).
Link: https://lkml.kernel.org/r/20210714092222.1890268-2-glider@google.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Acked-by: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/kfence/core.c~kfence-skip-all-gfp_zonemask-allocations
+++ a/mm/kfence/core.c
@@ -741,6 +741,15 @@ void *__kfence_alloc(struct kmem_cache *
return NULL;
/*
+ * Skip allocations from non-default zones, including DMA. We cannot
+ * guarantee that pages in the KFENCE pool will have the requested
+ * properties (e.g. reside in DMAable memory).
+ */
+ if ((flags & GFP_ZONEMASK) ||
+ (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32)))
+ return NULL;
+
+ /*
* allocation_gate only needs to become non-zero, so it doesn't make
* sense to continue writing to it and pay the associated contention
* cost, in case we have a large number of concurrent allocations.
_
Patches currently in -mm which might be from glider(a)google.com are
kfence-move-the-size-check-to-the-beginning-of-__kfence_alloc.patch
kfence-skip-all-gfp_zonemask-allocations.patch
On Mon, 12 Jul 2021, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.13.2 release.
> There are 800 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 14 Jul 2021 06:02:46 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.13.2-rc1…
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.13.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
> -------------
> Pseudo-Shortlog of commits:
>
> Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Linux 5.13.2-rc1
Hi Greg,
Sorry to be making waves, but please, what's up with the 5.13.2-rc,
5.12.17-rc, 5.10.50-rc, 5.4.132-rc stable release candidates?
Amongst the 2000+ patches posted today, there are a significant number
of them Signed-off-by Andrew, Signed-off-by Linus, Signed-off-by Sasha:
yet never Cc'ed to stable (nor even posted as AUTOSELs, I think).
Am I out of date? I thought that had been agreed not to happen:
https://lore.kernel.org/linux-mm/20190808000533.7701-1-mike.kravetz@oracle.…
is the thread I found when I looked for confirmation, but I believe the
same has been agreed before and since too.
Andrew goes to a lot of trouble to establish which Fixes from his tree
ought to go to stable. Of course there will be exceptions which we
later decide should go in after all; but it's worrying when there's a
wholesale breach like this, and I think most of them should be dropped.
To pick on just one of many examples (sorry Miaohe!), a patch that
surprises me, but I've not had time to look into so far, and would
not want accelerated into X stable releases, 385/800
> Miaohe Lin <linmiaohe(a)huawei.com>
> mm/shmem: fix shmem_swapin() race with swapoff
Hugh
From: Petr Mladek <pmladek(a)suse.com>
The standard printk() tries to flush the message to the console
immediately. It tries to take the console lock. If the lock is
already taken then the current owner is responsible for flushing
even the new message.
There is a small race window between checking whether a new message is
available and releasing the console lock. It is solved by re-checking
the state after releasing the console lock. If the check is positive
then console_unlock() tries to take the lock again and process the new
message as well.
The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that
console_seq is not longer read atomically. As a result, the re-check might
be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under
the console lock. In the worst case, it will take the lock again only
to realized that the new message has already been proceed. But it
was possible even before.
The variable next_seq is marked as __maybe_unused to call down compiler
warning when CONFIG_PRINTK is not defined.
Fixes: commit 996e966640ddea7b535c ("printk: remove logbuf_lock")
Reported-by: kernel test robot <lkp(a)intel.com> # unused next_seq warning
Cc: stable(a)vger.kernel.org # 5.13
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Reviewed-by: John Ogness <john.ogness(a)linutronix.de>
Link: https://lore.kernel.org/r/20210702150657.26760-1-pmladek@suse.com
---
kernel/printk/printk.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 142a58d124d9..6dad7da8f383 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2545,6 +2545,7 @@ void console_unlock(void)
bool do_cond_resched, retry;
struct printk_info info;
struct printk_record r;
+ u64 __maybe_unused next_seq;
if (console_suspended) {
up_console_sem();
@@ -2654,8 +2655,10 @@ void console_unlock(void)
cond_resched();
}
- console_locked = 0;
+ /* Get consistent value of the next-to-be-used sequence number. */
+ next_seq = console_seq;
+ console_locked = 0;
up_console_sem();
/*
@@ -2664,7 +2667,7 @@ void console_unlock(void)
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- retry = prb_read_valid(prb, console_seq, NULL);
+ retry = prb_read_valid(prb, next_seq, NULL);
printk_safe_exit_irqrestore(flags);
if (retry && console_trylock())
--
2.20.1
There's a small window where a USB 2 remote wake may be left unhandled
due to a race between hub thread and xhci port event interrupt handler.
When the resume event is detected in the xhci interrupt handler it kicks
the hub timer, which should move the port from resume to U0 once resume
has been signalled for long enough.
To keep the hub "thread" running we set a bus_state->resuming_ports flag.
This flag makes sure hub timer function kicks itself.
checking this flag was not properly protected by the spinlock. Flag was
copied to a local variable before lock was taken. The local variable was
then checked later with spinlock held.
If interrupt is handled right after copying the flag to the local variable
we end up stopping the hub thread before it can handle the USB 2 resume.
CPU0 CPU1
(hub thread) (xhci event handler)
xhci_hub_status_data()
status = bus_state->resuming_ports;
<Interrupt>
handle_port_status()
spin_lock()
bus_state->resuming_ports = 1
set_flag(HCD_FLAG_POLL_RH)
spin_unlock()
spin_lock()
if (!status)
clear_flag(HCD_FLAG_POLL_RH)
spin_unlock()
Fix this by taking the lock a bit earlier so that it covers
the resuming_ports flag copy in the hub thread
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-hub.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index e9b18fc17617..151e93c4bd57 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1638,11 +1638,12 @@ int xhci_hub_status_data(struct usb_hcd *hcd, char *buf)
* Inform the usbcore about resume-in-progress by returning
* a non-zero value even if there are no status changes.
*/
+ spin_lock_irqsave(&xhci->lock, flags);
+
status = bus_state->resuming_ports;
mask = PORT_CSC | PORT_PEC | PORT_OCC | PORT_PLC | PORT_WRC | PORT_CEC;
- spin_lock_irqsave(&xhci->lock, flags);
/* For each port, did anything change? If so, set that bit in buf. */
for (i = 0; i < max_ports; i++) {
temp = readl(ports[i]->addr);
--
2.25.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0a7790be182d32b9b332a37cb4206e24fe94b728 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Mon, 14 Jun 2021 12:34:09 +0200
Subject: [PATCH] media: subdev: disallow ioctl for saa6588/davinci
The saa6588_ioctl() function expects to get called from other kernel
functions with a 'saa6588_command' pointer, but I found nothing stops it
from getting called from user space instead, which seems rather dangerous.
The same thing happens in the davinci vpbe driver with its VENC_GET_FLD
command.
As a quick fix, add a separate .command() callback pointer for this
driver and change the two callers over to that. This change can easily
get backported to stable kernels if necessary, but since there are only
two drivers, we may want to eventually replace this with a set of more
specialized callbacks in the long run.
Fixes: c3fda7f835b0 ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/i2c/saa6588.c b/drivers/media/i2c/saa6588.c
index ecb491d5f2ab..d1e0716bdfff 100644
--- a/drivers/media/i2c/saa6588.c
+++ b/drivers/media/i2c/saa6588.c
@@ -380,7 +380,7 @@ static void saa6588_configure(struct saa6588 *s)
/* ---------------------------------------------------------------------- */
-static long saa6588_ioctl(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
+static long saa6588_command(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
{
struct saa6588 *s = to_saa6588(sd);
struct saa6588_command *a = arg;
@@ -433,7 +433,7 @@ static int saa6588_s_tuner(struct v4l2_subdev *sd, const struct v4l2_tuner *vt)
/* ----------------------------------------------------------------------- */
static const struct v4l2_subdev_core_ops saa6588_core_ops = {
- .ioctl = saa6588_ioctl,
+ .command = saa6588_command,
};
static const struct v4l2_subdev_tuner_ops saa6588_tuner_ops = {
diff --git a/drivers/media/pci/bt8xx/bttv-driver.c b/drivers/media/pci/bt8xx/bttv-driver.c
index 1f62a9d8ea1d..0e9df8b35ac6 100644
--- a/drivers/media/pci/bt8xx/bttv-driver.c
+++ b/drivers/media/pci/bt8xx/bttv-driver.c
@@ -3179,7 +3179,7 @@ static int radio_release(struct file *file)
btv->radio_user--;
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_CLOSE, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_CLOSE, &cmd);
if (btv->radio_user == 0)
btv->has_radio_tuner = 0;
@@ -3260,7 +3260,7 @@ static ssize_t radio_read(struct file *file, char __user *data,
cmd.result = -ENODEV;
radio_enable(btv);
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_READ, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_READ, &cmd);
return cmd.result;
}
@@ -3281,7 +3281,7 @@ static __poll_t radio_poll(struct file *file, poll_table *wait)
cmd.instance = file;
cmd.event_list = wait;
cmd.poll_mask = res;
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_POLL, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_POLL, &cmd);
return cmd.poll_mask;
}
diff --git a/drivers/media/pci/saa7134/saa7134-video.c b/drivers/media/pci/saa7134/saa7134-video.c
index 0f9d6b9edb90..374c8e1087de 100644
--- a/drivers/media/pci/saa7134/saa7134-video.c
+++ b/drivers/media/pci/saa7134/saa7134-video.c
@@ -1181,7 +1181,7 @@ static int video_release(struct file *file)
saa_call_all(dev, tuner, standby);
if (vdev->vfl_type == VFL_TYPE_RADIO)
- saa_call_all(dev, core, ioctl, SAA6588_CMD_CLOSE, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_CLOSE, &cmd);
mutex_unlock(&dev->lock);
return 0;
@@ -1200,7 +1200,7 @@ static ssize_t radio_read(struct file *file, char __user *data,
cmd.result = -ENODEV;
mutex_lock(&dev->lock);
- saa_call_all(dev, core, ioctl, SAA6588_CMD_READ, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_READ, &cmd);
mutex_unlock(&dev->lock);
return cmd.result;
@@ -1216,7 +1216,7 @@ static __poll_t radio_poll(struct file *file, poll_table *wait)
cmd.event_list = wait;
cmd.poll_mask = 0;
mutex_lock(&dev->lock);
- saa_call_all(dev, core, ioctl, SAA6588_CMD_POLL, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_POLL, &cmd);
mutex_unlock(&dev->lock);
return rc | cmd.poll_mask;
diff --git a/drivers/media/platform/davinci/vpbe_display.c b/drivers/media/platform/davinci/vpbe_display.c
index d19bad997f30..bf3c3e76b921 100644
--- a/drivers/media/platform/davinci/vpbe_display.c
+++ b/drivers/media/platform/davinci/vpbe_display.c
@@ -47,7 +47,7 @@ static int venc_is_second_field(struct vpbe_display *disp_dev)
ret = v4l2_subdev_call(vpbe_dev->venc,
core,
- ioctl,
+ command,
VENC_GET_FLD,
&val);
if (ret < 0) {
diff --git a/drivers/media/platform/davinci/vpbe_venc.c b/drivers/media/platform/davinci/vpbe_venc.c
index 8caa084e5704..bde241c26d79 100644
--- a/drivers/media/platform/davinci/vpbe_venc.c
+++ b/drivers/media/platform/davinci/vpbe_venc.c
@@ -521,9 +521,7 @@ static int venc_s_routing(struct v4l2_subdev *sd, u32 input, u32 output,
return ret;
}
-static long venc_ioctl(struct v4l2_subdev *sd,
- unsigned int cmd,
- void *arg)
+static long venc_command(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
{
u32 val;
@@ -542,7 +540,7 @@ static long venc_ioctl(struct v4l2_subdev *sd,
}
static const struct v4l2_subdev_core_ops venc_core_ops = {
- .ioctl = venc_ioctl,
+ .command = venc_command,
};
static const struct v4l2_subdev_video_ops venc_video_ops = {
diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h
index 89115ba4c0f2..95f8bfd63273 100644
--- a/include/media/v4l2-subdev.h
+++ b/include/media/v4l2-subdev.h
@@ -162,6 +162,9 @@ struct v4l2_subdev_io_pin_config {
* @s_gpio: set GPIO pins. Very simple right now, might need to be extended with
* a direction argument if needed.
*
+ * @command: called by in-kernel drivers in order to call functions internal
+ * to subdev drivers driver that have a separate callback.
+ *
* @ioctl: called at the end of ioctl() syscall handler at the V4L2 core.
* used to provide support for private ioctls used on the driver.
*
@@ -193,6 +196,7 @@ struct v4l2_subdev_core_ops {
int (*load_fw)(struct v4l2_subdev *sd);
int (*reset)(struct v4l2_subdev *sd, u32 val);
int (*s_gpio)(struct v4l2_subdev *sd, u32 val);
+ long (*command)(struct v4l2_subdev *sd, unsigned int cmd, void *arg);
long (*ioctl)(struct v4l2_subdev *sd, unsigned int cmd, void *arg);
#ifdef CONFIG_COMPAT
long (*compat_ioctl32)(struct v4l2_subdev *sd, unsigned int cmd,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0a7790be182d32b9b332a37cb4206e24fe94b728 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Mon, 14 Jun 2021 12:34:09 +0200
Subject: [PATCH] media: subdev: disallow ioctl for saa6588/davinci
The saa6588_ioctl() function expects to get called from other kernel
functions with a 'saa6588_command' pointer, but I found nothing stops it
from getting called from user space instead, which seems rather dangerous.
The same thing happens in the davinci vpbe driver with its VENC_GET_FLD
command.
As a quick fix, add a separate .command() callback pointer for this
driver and change the two callers over to that. This change can easily
get backported to stable kernels if necessary, but since there are only
two drivers, we may want to eventually replace this with a set of more
specialized callbacks in the long run.
Fixes: c3fda7f835b0 ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/i2c/saa6588.c b/drivers/media/i2c/saa6588.c
index ecb491d5f2ab..d1e0716bdfff 100644
--- a/drivers/media/i2c/saa6588.c
+++ b/drivers/media/i2c/saa6588.c
@@ -380,7 +380,7 @@ static void saa6588_configure(struct saa6588 *s)
/* ---------------------------------------------------------------------- */
-static long saa6588_ioctl(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
+static long saa6588_command(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
{
struct saa6588 *s = to_saa6588(sd);
struct saa6588_command *a = arg;
@@ -433,7 +433,7 @@ static int saa6588_s_tuner(struct v4l2_subdev *sd, const struct v4l2_tuner *vt)
/* ----------------------------------------------------------------------- */
static const struct v4l2_subdev_core_ops saa6588_core_ops = {
- .ioctl = saa6588_ioctl,
+ .command = saa6588_command,
};
static const struct v4l2_subdev_tuner_ops saa6588_tuner_ops = {
diff --git a/drivers/media/pci/bt8xx/bttv-driver.c b/drivers/media/pci/bt8xx/bttv-driver.c
index 1f62a9d8ea1d..0e9df8b35ac6 100644
--- a/drivers/media/pci/bt8xx/bttv-driver.c
+++ b/drivers/media/pci/bt8xx/bttv-driver.c
@@ -3179,7 +3179,7 @@ static int radio_release(struct file *file)
btv->radio_user--;
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_CLOSE, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_CLOSE, &cmd);
if (btv->radio_user == 0)
btv->has_radio_tuner = 0;
@@ -3260,7 +3260,7 @@ static ssize_t radio_read(struct file *file, char __user *data,
cmd.result = -ENODEV;
radio_enable(btv);
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_READ, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_READ, &cmd);
return cmd.result;
}
@@ -3281,7 +3281,7 @@ static __poll_t radio_poll(struct file *file, poll_table *wait)
cmd.instance = file;
cmd.event_list = wait;
cmd.poll_mask = res;
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_POLL, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_POLL, &cmd);
return cmd.poll_mask;
}
diff --git a/drivers/media/pci/saa7134/saa7134-video.c b/drivers/media/pci/saa7134/saa7134-video.c
index 0f9d6b9edb90..374c8e1087de 100644
--- a/drivers/media/pci/saa7134/saa7134-video.c
+++ b/drivers/media/pci/saa7134/saa7134-video.c
@@ -1181,7 +1181,7 @@ static int video_release(struct file *file)
saa_call_all(dev, tuner, standby);
if (vdev->vfl_type == VFL_TYPE_RADIO)
- saa_call_all(dev, core, ioctl, SAA6588_CMD_CLOSE, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_CLOSE, &cmd);
mutex_unlock(&dev->lock);
return 0;
@@ -1200,7 +1200,7 @@ static ssize_t radio_read(struct file *file, char __user *data,
cmd.result = -ENODEV;
mutex_lock(&dev->lock);
- saa_call_all(dev, core, ioctl, SAA6588_CMD_READ, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_READ, &cmd);
mutex_unlock(&dev->lock);
return cmd.result;
@@ -1216,7 +1216,7 @@ static __poll_t radio_poll(struct file *file, poll_table *wait)
cmd.event_list = wait;
cmd.poll_mask = 0;
mutex_lock(&dev->lock);
- saa_call_all(dev, core, ioctl, SAA6588_CMD_POLL, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_POLL, &cmd);
mutex_unlock(&dev->lock);
return rc | cmd.poll_mask;
diff --git a/drivers/media/platform/davinci/vpbe_display.c b/drivers/media/platform/davinci/vpbe_display.c
index d19bad997f30..bf3c3e76b921 100644
--- a/drivers/media/platform/davinci/vpbe_display.c
+++ b/drivers/media/platform/davinci/vpbe_display.c
@@ -47,7 +47,7 @@ static int venc_is_second_field(struct vpbe_display *disp_dev)
ret = v4l2_subdev_call(vpbe_dev->venc,
core,
- ioctl,
+ command,
VENC_GET_FLD,
&val);
if (ret < 0) {
diff --git a/drivers/media/platform/davinci/vpbe_venc.c b/drivers/media/platform/davinci/vpbe_venc.c
index 8caa084e5704..bde241c26d79 100644
--- a/drivers/media/platform/davinci/vpbe_venc.c
+++ b/drivers/media/platform/davinci/vpbe_venc.c
@@ -521,9 +521,7 @@ static int venc_s_routing(struct v4l2_subdev *sd, u32 input, u32 output,
return ret;
}
-static long venc_ioctl(struct v4l2_subdev *sd,
- unsigned int cmd,
- void *arg)
+static long venc_command(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
{
u32 val;
@@ -542,7 +540,7 @@ static long venc_ioctl(struct v4l2_subdev *sd,
}
static const struct v4l2_subdev_core_ops venc_core_ops = {
- .ioctl = venc_ioctl,
+ .command = venc_command,
};
static const struct v4l2_subdev_video_ops venc_video_ops = {
diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h
index 89115ba4c0f2..95f8bfd63273 100644
--- a/include/media/v4l2-subdev.h
+++ b/include/media/v4l2-subdev.h
@@ -162,6 +162,9 @@ struct v4l2_subdev_io_pin_config {
* @s_gpio: set GPIO pins. Very simple right now, might need to be extended with
* a direction argument if needed.
*
+ * @command: called by in-kernel drivers in order to call functions internal
+ * to subdev drivers driver that have a separate callback.
+ *
* @ioctl: called at the end of ioctl() syscall handler at the V4L2 core.
* used to provide support for private ioctls used on the driver.
*
@@ -193,6 +196,7 @@ struct v4l2_subdev_core_ops {
int (*load_fw)(struct v4l2_subdev *sd);
int (*reset)(struct v4l2_subdev *sd, u32 val);
int (*s_gpio)(struct v4l2_subdev *sd, u32 val);
+ long (*command)(struct v4l2_subdev *sd, unsigned int cmd, void *arg);
long (*ioctl)(struct v4l2_subdev *sd, unsigned int cmd, void *arg);
#ifdef CONFIG_COMPAT
long (*compat_ioctl32)(struct v4l2_subdev *sd, unsigned int cmd,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0a7790be182d32b9b332a37cb4206e24fe94b728 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Mon, 14 Jun 2021 12:34:09 +0200
Subject: [PATCH] media: subdev: disallow ioctl for saa6588/davinci
The saa6588_ioctl() function expects to get called from other kernel
functions with a 'saa6588_command' pointer, but I found nothing stops it
from getting called from user space instead, which seems rather dangerous.
The same thing happens in the davinci vpbe driver with its VENC_GET_FLD
command.
As a quick fix, add a separate .command() callback pointer for this
driver and change the two callers over to that. This change can easily
get backported to stable kernels if necessary, but since there are only
two drivers, we may want to eventually replace this with a set of more
specialized callbacks in the long run.
Fixes: c3fda7f835b0 ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/i2c/saa6588.c b/drivers/media/i2c/saa6588.c
index ecb491d5f2ab..d1e0716bdfff 100644
--- a/drivers/media/i2c/saa6588.c
+++ b/drivers/media/i2c/saa6588.c
@@ -380,7 +380,7 @@ static void saa6588_configure(struct saa6588 *s)
/* ---------------------------------------------------------------------- */
-static long saa6588_ioctl(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
+static long saa6588_command(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
{
struct saa6588 *s = to_saa6588(sd);
struct saa6588_command *a = arg;
@@ -433,7 +433,7 @@ static int saa6588_s_tuner(struct v4l2_subdev *sd, const struct v4l2_tuner *vt)
/* ----------------------------------------------------------------------- */
static const struct v4l2_subdev_core_ops saa6588_core_ops = {
- .ioctl = saa6588_ioctl,
+ .command = saa6588_command,
};
static const struct v4l2_subdev_tuner_ops saa6588_tuner_ops = {
diff --git a/drivers/media/pci/bt8xx/bttv-driver.c b/drivers/media/pci/bt8xx/bttv-driver.c
index 1f62a9d8ea1d..0e9df8b35ac6 100644
--- a/drivers/media/pci/bt8xx/bttv-driver.c
+++ b/drivers/media/pci/bt8xx/bttv-driver.c
@@ -3179,7 +3179,7 @@ static int radio_release(struct file *file)
btv->radio_user--;
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_CLOSE, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_CLOSE, &cmd);
if (btv->radio_user == 0)
btv->has_radio_tuner = 0;
@@ -3260,7 +3260,7 @@ static ssize_t radio_read(struct file *file, char __user *data,
cmd.result = -ENODEV;
radio_enable(btv);
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_READ, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_READ, &cmd);
return cmd.result;
}
@@ -3281,7 +3281,7 @@ static __poll_t radio_poll(struct file *file, poll_table *wait)
cmd.instance = file;
cmd.event_list = wait;
cmd.poll_mask = res;
- bttv_call_all(btv, core, ioctl, SAA6588_CMD_POLL, &cmd);
+ bttv_call_all(btv, core, command, SAA6588_CMD_POLL, &cmd);
return cmd.poll_mask;
}
diff --git a/drivers/media/pci/saa7134/saa7134-video.c b/drivers/media/pci/saa7134/saa7134-video.c
index 0f9d6b9edb90..374c8e1087de 100644
--- a/drivers/media/pci/saa7134/saa7134-video.c
+++ b/drivers/media/pci/saa7134/saa7134-video.c
@@ -1181,7 +1181,7 @@ static int video_release(struct file *file)
saa_call_all(dev, tuner, standby);
if (vdev->vfl_type == VFL_TYPE_RADIO)
- saa_call_all(dev, core, ioctl, SAA6588_CMD_CLOSE, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_CLOSE, &cmd);
mutex_unlock(&dev->lock);
return 0;
@@ -1200,7 +1200,7 @@ static ssize_t radio_read(struct file *file, char __user *data,
cmd.result = -ENODEV;
mutex_lock(&dev->lock);
- saa_call_all(dev, core, ioctl, SAA6588_CMD_READ, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_READ, &cmd);
mutex_unlock(&dev->lock);
return cmd.result;
@@ -1216,7 +1216,7 @@ static __poll_t radio_poll(struct file *file, poll_table *wait)
cmd.event_list = wait;
cmd.poll_mask = 0;
mutex_lock(&dev->lock);
- saa_call_all(dev, core, ioctl, SAA6588_CMD_POLL, &cmd);
+ saa_call_all(dev, core, command, SAA6588_CMD_POLL, &cmd);
mutex_unlock(&dev->lock);
return rc | cmd.poll_mask;
diff --git a/drivers/media/platform/davinci/vpbe_display.c b/drivers/media/platform/davinci/vpbe_display.c
index d19bad997f30..bf3c3e76b921 100644
--- a/drivers/media/platform/davinci/vpbe_display.c
+++ b/drivers/media/platform/davinci/vpbe_display.c
@@ -47,7 +47,7 @@ static int venc_is_second_field(struct vpbe_display *disp_dev)
ret = v4l2_subdev_call(vpbe_dev->venc,
core,
- ioctl,
+ command,
VENC_GET_FLD,
&val);
if (ret < 0) {
diff --git a/drivers/media/platform/davinci/vpbe_venc.c b/drivers/media/platform/davinci/vpbe_venc.c
index 8caa084e5704..bde241c26d79 100644
--- a/drivers/media/platform/davinci/vpbe_venc.c
+++ b/drivers/media/platform/davinci/vpbe_venc.c
@@ -521,9 +521,7 @@ static int venc_s_routing(struct v4l2_subdev *sd, u32 input, u32 output,
return ret;
}
-static long venc_ioctl(struct v4l2_subdev *sd,
- unsigned int cmd,
- void *arg)
+static long venc_command(struct v4l2_subdev *sd, unsigned int cmd, void *arg)
{
u32 val;
@@ -542,7 +540,7 @@ static long venc_ioctl(struct v4l2_subdev *sd,
}
static const struct v4l2_subdev_core_ops venc_core_ops = {
- .ioctl = venc_ioctl,
+ .command = venc_command,
};
static const struct v4l2_subdev_video_ops venc_video_ops = {
diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h
index 89115ba4c0f2..95f8bfd63273 100644
--- a/include/media/v4l2-subdev.h
+++ b/include/media/v4l2-subdev.h
@@ -162,6 +162,9 @@ struct v4l2_subdev_io_pin_config {
* @s_gpio: set GPIO pins. Very simple right now, might need to be extended with
* a direction argument if needed.
*
+ * @command: called by in-kernel drivers in order to call functions internal
+ * to subdev drivers driver that have a separate callback.
+ *
* @ioctl: called at the end of ioctl() syscall handler at the V4L2 core.
* used to provide support for private ioctls used on the driver.
*
@@ -193,6 +196,7 @@ struct v4l2_subdev_core_ops {
int (*load_fw)(struct v4l2_subdev *sd);
int (*reset)(struct v4l2_subdev *sd, u32 val);
int (*s_gpio)(struct v4l2_subdev *sd, u32 val);
+ long (*command)(struct v4l2_subdev *sd, unsigned int cmd, void *arg);
long (*ioctl)(struct v4l2_subdev *sd, unsigned int cmd, void *arg);
#ifdef CONFIG_COMPAT
long (*compat_ioctl32)(struct v4l2_subdev *sd, unsigned int cmd,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1df5172c5c251ec24a1bd0f44fe38c841f384330 Mon Sep 17 00:00:00 2001
From: Andreas Noever <andreas.noever(a)gmail.com>
Date: Tue, 3 Jun 2014 22:04:10 +0200
Subject: [PATCH] PCI: Suspend/resume quirks for Apple thunderbolt
Add two quirks to support thunderbolt suspend/resume on Apple systems.
We need to perform two different actions during suspend and resume:
The whole controller has to be powered down before suspend. If this is
not done then the native host interface device will be gone after resume
if a thunderbolt device was plugged in before suspending. The controller
represents itself as multiple PCI devices/bridges. To power it down we
hook into the upstream bridge of the controller and call the magic ACPI
methods. Power will be restored automatically during resume (by the
firmware presumably).
During resume we have to wait for the native host interface to
reestablish all pci tunnels. Since there is no parent-child relationship
between the NHI and the bridges we have to explicitly wait for them
using device_pm_wait_for_dev. We do this in the resume_noirq phase of
the downstream bridges of the controller (which lead into the
thunderbolt tunnels).
Signed-off-by: Andreas Noever <andreas.noever(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 03266af20d5f..ca8a171f9689 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2986,6 +2986,103 @@ DECLARE_PCI_FIXUP_HEADER(0x1814, 0x0601, /* Ralink RT2800 802.11n PCI */
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK, 0x8169,
quirk_broken_intx_masking);
+#ifdef CONFIG_ACPI
+/*
+ * Apple: Shutdown Cactus Ridge Thunderbolt controller.
+ *
+ * On Apple hardware the Cactus Ridge Thunderbolt controller needs to be
+ * shutdown before suspend. Otherwise the native host interface (NHI) will not
+ * be present after resume if a device was plugged in before suspend.
+ *
+ * The thunderbolt controller consists of a pcie switch with downstream
+ * bridges leading to the NHI and to the tunnel pci bridges.
+ *
+ * This quirk cuts power to the whole chip. Therefore we have to apply it
+ * during suspend_noirq of the upstream bridge.
+ *
+ * Power is automagically restored before resume. No action is needed.
+ */
+static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
+{
+ acpi_handle bridge, SXIO, SXFP, SXLV;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+ return;
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
+ return;
+ bridge = ACPI_HANDLE(&dev->dev);
+ if (!bridge)
+ return;
+ /*
+ * SXIO and SXLV are present only on machines requiring this quirk.
+ * TB bridges in external devices might have the same device id as those
+ * on the host, but they will not have the associated ACPI methods. This
+ * implicitly checks that we are at the right bridge.
+ */
+ if (ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXIO", &SXIO))
+ || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXFP", &SXFP))
+ || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXLV", &SXLV)))
+ return;
+ dev_info(&dev->dev, "quirk: cutting power to thunderbolt controller...\n");
+
+ /* magic sequence */
+ acpi_execute_simple_method(SXIO, NULL, 1);
+ acpi_execute_simple_method(SXFP, NULL, 0);
+ msleep(300);
+ acpi_execute_simple_method(SXLV, NULL, 0);
+ acpi_execute_simple_method(SXIO, NULL, 0);
+ acpi_execute_simple_method(SXLV, NULL, 0);
+}
+DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, 0x1547,
+ quirk_apple_poweroff_thunderbolt);
+
+/*
+ * Apple: Wait for the thunderbolt controller to reestablish pci tunnels.
+ *
+ * During suspend the thunderbolt controller is reset and all pci
+ * tunnels are lost. The NHI driver will try to reestablish all tunnels
+ * during resume. We have to manually wait for the NHI since there is
+ * no parent child relationship between the NHI and the tunneled
+ * bridges.
+ */
+static void quirk_apple_wait_for_thunderbolt(struct pci_dev *dev)
+{
+ struct pci_dev *sibling = NULL;
+ struct pci_dev *nhi = NULL;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+ return;
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
+ return;
+ /*
+ * Find the NHI and confirm that we are a bridge on the tb host
+ * controller and not on a tb endpoint.
+ */
+ sibling = pci_get_slot(dev->bus, 0x0);
+ if (sibling == dev)
+ goto out; /* we are the downstream bridge to the NHI */
+ if (!sibling || !sibling->subordinate)
+ goto out;
+ nhi = pci_get_slot(sibling->subordinate, 0x0);
+ if (!nhi)
+ goto out;
+ if (nhi->vendor != PCI_VENDOR_ID_INTEL
+ || (nhi->device != 0x1547 && nhi->device != 0x156c)
+ || nhi->subsystem_vendor != 0x2222
+ || nhi->subsystem_device != 0x1111)
+ goto out;
+ dev_info(&dev->dev, "quirk: wating for thunderbolt to reestablish pci tunnels...\n");
+ device_pm_wait_for_dev(&dev->dev, &nhi->dev);
+out:
+ pci_dev_put(nhi);
+ pci_dev_put(sibling);
+}
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x1547,
+ quirk_apple_wait_for_thunderbolt);
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
+ quirk_apple_wait_for_thunderbolt);
+#endif
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1df5172c5c251ec24a1bd0f44fe38c841f384330 Mon Sep 17 00:00:00 2001
From: Andreas Noever <andreas.noever(a)gmail.com>
Date: Tue, 3 Jun 2014 22:04:10 +0200
Subject: [PATCH] PCI: Suspend/resume quirks for Apple thunderbolt
Add two quirks to support thunderbolt suspend/resume on Apple systems.
We need to perform two different actions during suspend and resume:
The whole controller has to be powered down before suspend. If this is
not done then the native host interface device will be gone after resume
if a thunderbolt device was plugged in before suspending. The controller
represents itself as multiple PCI devices/bridges. To power it down we
hook into the upstream bridge of the controller and call the magic ACPI
methods. Power will be restored automatically during resume (by the
firmware presumably).
During resume we have to wait for the native host interface to
reestablish all pci tunnels. Since there is no parent-child relationship
between the NHI and the bridges we have to explicitly wait for them
using device_pm_wait_for_dev. We do this in the resume_noirq phase of
the downstream bridges of the controller (which lead into the
thunderbolt tunnels).
Signed-off-by: Andreas Noever <andreas.noever(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 03266af20d5f..ca8a171f9689 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2986,6 +2986,103 @@ DECLARE_PCI_FIXUP_HEADER(0x1814, 0x0601, /* Ralink RT2800 802.11n PCI */
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK, 0x8169,
quirk_broken_intx_masking);
+#ifdef CONFIG_ACPI
+/*
+ * Apple: Shutdown Cactus Ridge Thunderbolt controller.
+ *
+ * On Apple hardware the Cactus Ridge Thunderbolt controller needs to be
+ * shutdown before suspend. Otherwise the native host interface (NHI) will not
+ * be present after resume if a device was plugged in before suspend.
+ *
+ * The thunderbolt controller consists of a pcie switch with downstream
+ * bridges leading to the NHI and to the tunnel pci bridges.
+ *
+ * This quirk cuts power to the whole chip. Therefore we have to apply it
+ * during suspend_noirq of the upstream bridge.
+ *
+ * Power is automagically restored before resume. No action is needed.
+ */
+static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
+{
+ acpi_handle bridge, SXIO, SXFP, SXLV;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+ return;
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
+ return;
+ bridge = ACPI_HANDLE(&dev->dev);
+ if (!bridge)
+ return;
+ /*
+ * SXIO and SXLV are present only on machines requiring this quirk.
+ * TB bridges in external devices might have the same device id as those
+ * on the host, but they will not have the associated ACPI methods. This
+ * implicitly checks that we are at the right bridge.
+ */
+ if (ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXIO", &SXIO))
+ || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXFP", &SXFP))
+ || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXLV", &SXLV)))
+ return;
+ dev_info(&dev->dev, "quirk: cutting power to thunderbolt controller...\n");
+
+ /* magic sequence */
+ acpi_execute_simple_method(SXIO, NULL, 1);
+ acpi_execute_simple_method(SXFP, NULL, 0);
+ msleep(300);
+ acpi_execute_simple_method(SXLV, NULL, 0);
+ acpi_execute_simple_method(SXIO, NULL, 0);
+ acpi_execute_simple_method(SXLV, NULL, 0);
+}
+DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, 0x1547,
+ quirk_apple_poweroff_thunderbolt);
+
+/*
+ * Apple: Wait for the thunderbolt controller to reestablish pci tunnels.
+ *
+ * During suspend the thunderbolt controller is reset and all pci
+ * tunnels are lost. The NHI driver will try to reestablish all tunnels
+ * during resume. We have to manually wait for the NHI since there is
+ * no parent child relationship between the NHI and the tunneled
+ * bridges.
+ */
+static void quirk_apple_wait_for_thunderbolt(struct pci_dev *dev)
+{
+ struct pci_dev *sibling = NULL;
+ struct pci_dev *nhi = NULL;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+ return;
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
+ return;
+ /*
+ * Find the NHI and confirm that we are a bridge on the tb host
+ * controller and not on a tb endpoint.
+ */
+ sibling = pci_get_slot(dev->bus, 0x0);
+ if (sibling == dev)
+ goto out; /* we are the downstream bridge to the NHI */
+ if (!sibling || !sibling->subordinate)
+ goto out;
+ nhi = pci_get_slot(sibling->subordinate, 0x0);
+ if (!nhi)
+ goto out;
+ if (nhi->vendor != PCI_VENDOR_ID_INTEL
+ || (nhi->device != 0x1547 && nhi->device != 0x156c)
+ || nhi->subsystem_vendor != 0x2222
+ || nhi->subsystem_device != 0x1111)
+ goto out;
+ dev_info(&dev->dev, "quirk: wating for thunderbolt to reestablish pci tunnels...\n");
+ device_pm_wait_for_dev(&dev->dev, &nhi->dev);
+out:
+ pci_dev_put(nhi);
+ pci_dev_put(sibling);
+}
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x1547,
+ quirk_apple_wait_for_thunderbolt);
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
+ quirk_apple_wait_for_thunderbolt);
+#endif
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1df5172c5c251ec24a1bd0f44fe38c841f384330 Mon Sep 17 00:00:00 2001
From: Andreas Noever <andreas.noever(a)gmail.com>
Date: Tue, 3 Jun 2014 22:04:10 +0200
Subject: [PATCH] PCI: Suspend/resume quirks for Apple thunderbolt
Add two quirks to support thunderbolt suspend/resume on Apple systems.
We need to perform two different actions during suspend and resume:
The whole controller has to be powered down before suspend. If this is
not done then the native host interface device will be gone after resume
if a thunderbolt device was plugged in before suspending. The controller
represents itself as multiple PCI devices/bridges. To power it down we
hook into the upstream bridge of the controller and call the magic ACPI
methods. Power will be restored automatically during resume (by the
firmware presumably).
During resume we have to wait for the native host interface to
reestablish all pci tunnels. Since there is no parent-child relationship
between the NHI and the bridges we have to explicitly wait for them
using device_pm_wait_for_dev. We do this in the resume_noirq phase of
the downstream bridges of the controller (which lead into the
thunderbolt tunnels).
Signed-off-by: Andreas Noever <andreas.noever(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 03266af20d5f..ca8a171f9689 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2986,6 +2986,103 @@ DECLARE_PCI_FIXUP_HEADER(0x1814, 0x0601, /* Ralink RT2800 802.11n PCI */
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK, 0x8169,
quirk_broken_intx_masking);
+#ifdef CONFIG_ACPI
+/*
+ * Apple: Shutdown Cactus Ridge Thunderbolt controller.
+ *
+ * On Apple hardware the Cactus Ridge Thunderbolt controller needs to be
+ * shutdown before suspend. Otherwise the native host interface (NHI) will not
+ * be present after resume if a device was plugged in before suspend.
+ *
+ * The thunderbolt controller consists of a pcie switch with downstream
+ * bridges leading to the NHI and to the tunnel pci bridges.
+ *
+ * This quirk cuts power to the whole chip. Therefore we have to apply it
+ * during suspend_noirq of the upstream bridge.
+ *
+ * Power is automagically restored before resume. No action is needed.
+ */
+static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
+{
+ acpi_handle bridge, SXIO, SXFP, SXLV;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+ return;
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
+ return;
+ bridge = ACPI_HANDLE(&dev->dev);
+ if (!bridge)
+ return;
+ /*
+ * SXIO and SXLV are present only on machines requiring this quirk.
+ * TB bridges in external devices might have the same device id as those
+ * on the host, but they will not have the associated ACPI methods. This
+ * implicitly checks that we are at the right bridge.
+ */
+ if (ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXIO", &SXIO))
+ || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXFP", &SXFP))
+ || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXLV", &SXLV)))
+ return;
+ dev_info(&dev->dev, "quirk: cutting power to thunderbolt controller...\n");
+
+ /* magic sequence */
+ acpi_execute_simple_method(SXIO, NULL, 1);
+ acpi_execute_simple_method(SXFP, NULL, 0);
+ msleep(300);
+ acpi_execute_simple_method(SXLV, NULL, 0);
+ acpi_execute_simple_method(SXIO, NULL, 0);
+ acpi_execute_simple_method(SXLV, NULL, 0);
+}
+DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, 0x1547,
+ quirk_apple_poweroff_thunderbolt);
+
+/*
+ * Apple: Wait for the thunderbolt controller to reestablish pci tunnels.
+ *
+ * During suspend the thunderbolt controller is reset and all pci
+ * tunnels are lost. The NHI driver will try to reestablish all tunnels
+ * during resume. We have to manually wait for the NHI since there is
+ * no parent child relationship between the NHI and the tunneled
+ * bridges.
+ */
+static void quirk_apple_wait_for_thunderbolt(struct pci_dev *dev)
+{
+ struct pci_dev *sibling = NULL;
+ struct pci_dev *nhi = NULL;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+ return;
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
+ return;
+ /*
+ * Find the NHI and confirm that we are a bridge on the tb host
+ * controller and not on a tb endpoint.
+ */
+ sibling = pci_get_slot(dev->bus, 0x0);
+ if (sibling == dev)
+ goto out; /* we are the downstream bridge to the NHI */
+ if (!sibling || !sibling->subordinate)
+ goto out;
+ nhi = pci_get_slot(sibling->subordinate, 0x0);
+ if (!nhi)
+ goto out;
+ if (nhi->vendor != PCI_VENDOR_ID_INTEL
+ || (nhi->device != 0x1547 && nhi->device != 0x156c)
+ || nhi->subsystem_vendor != 0x2222
+ || nhi->subsystem_device != 0x1111)
+ goto out;
+ dev_info(&dev->dev, "quirk: wating for thunderbolt to reestablish pci tunnels...\n");
+ device_pm_wait_for_dev(&dev->dev, &nhi->dev);
+out:
+ pci_dev_put(nhi);
+ pci_dev_put(sibling);
+}
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x1547,
+ quirk_apple_wait_for_thunderbolt);
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
+ quirk_apple_wait_for_thunderbolt);
+#endif
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f123c42bbeff26bfe8bdb08a01307e92d51eec39 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 23 Jun 2021 13:39:33 -0700
Subject: [PATCH] lkdtm: Enable DOUBLE_FAULT on all architectures
Where feasible, I prefer to have all tests visible on all architectures,
but to have them wired to XFAIL. DOUBLE_FAIL was set up to XFAIL, but
wasn't actually being added to the test list.
Fixes: cea23efb4de2 ("lkdtm/bugs: Make double-fault test always available")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20210623203936.3151093-7-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 645b31e98c77..2c89fc18669f 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -178,9 +178,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(STACKLEAK_ERASING),
CRASHTYPE(CFI_FORWARD_PROTO),
CRASHTYPE(FORTIFIED_STRSCPY),
-#ifdef CONFIG_X86_32
CRASHTYPE(DOUBLE_FAULT),
-#endif
#ifdef CONFIG_PPC_BOOK3S_64
CRASHTYPE(PPC_SLB_MULTIHIT),
#endif
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2eb472bbe25b3f360990f23b293b3fbadfa4bc0 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 23 Jun 2021 13:39:29 -0700
Subject: [PATCH] selftests/lkdtm: Fix expected text for CR4 pinning
The error text for CR4 pinning changed. Update the test to match.
Fixes: a13b9d0b9721 ("x86/cpu: Use pinning mask for CR4 bits needing to be 0")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20210623203936.3151093-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 11ef159be0fd..a5fce7fd4520 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -11,7 +11,7 @@ CORRUPT_LIST_ADD list_add corruption
CORRUPT_LIST_DEL list_del corruption
STACK_GUARD_PAGE_LEADING
STACK_GUARD_PAGE_TRAILING
-UNSET_SMEP CR4 bits went missing
+UNSET_SMEP pinned CR4 bits changed:
DOUBLE_FAULT
CORRUPT_PAC
UNALIGNED_LOAD_STORE_WRITE
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d7b0408934c749f546b01f2b33d07421a49b6f3e Mon Sep 17 00:00:00 2001
From: Varad Gautam <varad.gautam(a)suse.com>
Date: Fri, 28 May 2021 18:04:06 +0200
Subject: [PATCH] xfrm: policy: Read seqcount outside of rcu-read side in
xfrm_policy_lookup_bytype
xfrm_policy_lookup_bytype loops on seqcount mutex xfrm_policy_hash_generation
within an RCU read side critical section. Although ill advised, this is fine if
the loop is bounded.
xfrm_policy_hash_generation wraps mutex hash_resize_mutex, which is used to
serialize writers (xfrm_hash_resize, xfrm_hash_rebuild). This is fine too.
On PREEMPT_RT=y, the read_seqcount_begin call within xfrm_policy_lookup_bytype
emits a mutex lock/unlock for hash_resize_mutex. Mutex locking is fine, since
RCU read side critical sections are allowed to sleep with PREEMPT_RT.
xfrm_hash_resize can, however, block on synchronize_rcu while holding
hash_resize_mutex.
This leads to the following situation on PREEMPT_RT, where the writer is
blocked on RCU grace period expiry, while the reader is blocked on a lock held
by the writer:
Thead 1 (xfrm_hash_resize) Thread 2 (xfrm_policy_lookup_bytype)
rcu_read_lock();
mutex_lock(&hash_resize_mutex);
read_seqcount_begin(&xfrm_policy_hash_generation);
mutex_lock(&hash_resize_mutex); // block
xfrm_bydst_resize();
synchronize_rcu(); // block
<RCU stalls in xfrm_policy_lookup_bytype>
Move the read_seqcount_begin call outside of the RCU read side critical section,
and do an rcu_read_unlock/retry if we got stale data within the critical section.
On non-PREEMPT_RT, this shortens the time spent within RCU read side critical
section in case the seqcount needs a retry, and avoids unbounded looping.
Fixes: 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated lock")
Signed-off-by: Varad Gautam <varad.gautam(a)suse.com>
Cc: linux-rt-users <linux-rt-users(a)vger.kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.9
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Florian Westphal <fw(a)strlen.de>
Cc: "Ahmed S. Darwish" <a.darwish(a)linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Acked-by: Ahmed S. Darwish <a.darwish(a)linutronix.de>
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b74f28cabe24..8c56e3e59c3c 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2092,12 +2092,15 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
if (unlikely(!daddr || !saddr))
return NULL;
- rcu_read_lock();
retry:
- do {
- sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
- chain = policy_hash_direct(net, daddr, saddr, family, dir);
- } while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+ sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+ rcu_read_lock();
+
+ chain = policy_hash_direct(net, daddr, saddr, family, dir);
+ if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence)) {
+ rcu_read_unlock();
+ goto retry;
+ }
ret = NULL;
hlist_for_each_entry_rcu(pol, chain, bydst) {
@@ -2128,11 +2131,15 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
}
skip_inexact:
- if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence))
+ if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence)) {
+ rcu_read_unlock();
goto retry;
+ }
- if (ret && !xfrm_pol_hold_rcu(ret))
+ if (ret && !xfrm_pol_hold_rcu(ret)) {
+ rcu_read_unlock();
goto retry;
+ }
fail:
rcu_read_unlock();
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d7b0408934c749f546b01f2b33d07421a49b6f3e Mon Sep 17 00:00:00 2001
From: Varad Gautam <varad.gautam(a)suse.com>
Date: Fri, 28 May 2021 18:04:06 +0200
Subject: [PATCH] xfrm: policy: Read seqcount outside of rcu-read side in
xfrm_policy_lookup_bytype
xfrm_policy_lookup_bytype loops on seqcount mutex xfrm_policy_hash_generation
within an RCU read side critical section. Although ill advised, this is fine if
the loop is bounded.
xfrm_policy_hash_generation wraps mutex hash_resize_mutex, which is used to
serialize writers (xfrm_hash_resize, xfrm_hash_rebuild). This is fine too.
On PREEMPT_RT=y, the read_seqcount_begin call within xfrm_policy_lookup_bytype
emits a mutex lock/unlock for hash_resize_mutex. Mutex locking is fine, since
RCU read side critical sections are allowed to sleep with PREEMPT_RT.
xfrm_hash_resize can, however, block on synchronize_rcu while holding
hash_resize_mutex.
This leads to the following situation on PREEMPT_RT, where the writer is
blocked on RCU grace period expiry, while the reader is blocked on a lock held
by the writer:
Thead 1 (xfrm_hash_resize) Thread 2 (xfrm_policy_lookup_bytype)
rcu_read_lock();
mutex_lock(&hash_resize_mutex);
read_seqcount_begin(&xfrm_policy_hash_generation);
mutex_lock(&hash_resize_mutex); // block
xfrm_bydst_resize();
synchronize_rcu(); // block
<RCU stalls in xfrm_policy_lookup_bytype>
Move the read_seqcount_begin call outside of the RCU read side critical section,
and do an rcu_read_unlock/retry if we got stale data within the critical section.
On non-PREEMPT_RT, this shortens the time spent within RCU read side critical
section in case the seqcount needs a retry, and avoids unbounded looping.
Fixes: 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated lock")
Signed-off-by: Varad Gautam <varad.gautam(a)suse.com>
Cc: linux-rt-users <linux-rt-users(a)vger.kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.9
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Florian Westphal <fw(a)strlen.de>
Cc: "Ahmed S. Darwish" <a.darwish(a)linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Acked-by: Ahmed S. Darwish <a.darwish(a)linutronix.de>
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b74f28cabe24..8c56e3e59c3c 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2092,12 +2092,15 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
if (unlikely(!daddr || !saddr))
return NULL;
- rcu_read_lock();
retry:
- do {
- sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
- chain = policy_hash_direct(net, daddr, saddr, family, dir);
- } while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+ sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+ rcu_read_lock();
+
+ chain = policy_hash_direct(net, daddr, saddr, family, dir);
+ if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence)) {
+ rcu_read_unlock();
+ goto retry;
+ }
ret = NULL;
hlist_for_each_entry_rcu(pol, chain, bydst) {
@@ -2128,11 +2131,15 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
}
skip_inexact:
- if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence))
+ if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence)) {
+ rcu_read_unlock();
goto retry;
+ }
- if (ret && !xfrm_pol_hold_rcu(ret))
+ if (ret && !xfrm_pol_hold_rcu(ret)) {
+ rcu_read_unlock();
goto retry;
+ }
fail:
rcu_read_unlock();
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d7b0408934c749f546b01f2b33d07421a49b6f3e Mon Sep 17 00:00:00 2001
From: Varad Gautam <varad.gautam(a)suse.com>
Date: Fri, 28 May 2021 18:04:06 +0200
Subject: [PATCH] xfrm: policy: Read seqcount outside of rcu-read side in
xfrm_policy_lookup_bytype
xfrm_policy_lookup_bytype loops on seqcount mutex xfrm_policy_hash_generation
within an RCU read side critical section. Although ill advised, this is fine if
the loop is bounded.
xfrm_policy_hash_generation wraps mutex hash_resize_mutex, which is used to
serialize writers (xfrm_hash_resize, xfrm_hash_rebuild). This is fine too.
On PREEMPT_RT=y, the read_seqcount_begin call within xfrm_policy_lookup_bytype
emits a mutex lock/unlock for hash_resize_mutex. Mutex locking is fine, since
RCU read side critical sections are allowed to sleep with PREEMPT_RT.
xfrm_hash_resize can, however, block on synchronize_rcu while holding
hash_resize_mutex.
This leads to the following situation on PREEMPT_RT, where the writer is
blocked on RCU grace period expiry, while the reader is blocked on a lock held
by the writer:
Thead 1 (xfrm_hash_resize) Thread 2 (xfrm_policy_lookup_bytype)
rcu_read_lock();
mutex_lock(&hash_resize_mutex);
read_seqcount_begin(&xfrm_policy_hash_generation);
mutex_lock(&hash_resize_mutex); // block
xfrm_bydst_resize();
synchronize_rcu(); // block
<RCU stalls in xfrm_policy_lookup_bytype>
Move the read_seqcount_begin call outside of the RCU read side critical section,
and do an rcu_read_unlock/retry if we got stale data within the critical section.
On non-PREEMPT_RT, this shortens the time spent within RCU read side critical
section in case the seqcount needs a retry, and avoids unbounded looping.
Fixes: 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated lock")
Signed-off-by: Varad Gautam <varad.gautam(a)suse.com>
Cc: linux-rt-users <linux-rt-users(a)vger.kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.9
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Florian Westphal <fw(a)strlen.de>
Cc: "Ahmed S. Darwish" <a.darwish(a)linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Acked-by: Ahmed S. Darwish <a.darwish(a)linutronix.de>
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b74f28cabe24..8c56e3e59c3c 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2092,12 +2092,15 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
if (unlikely(!daddr || !saddr))
return NULL;
- rcu_read_lock();
retry:
- do {
- sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
- chain = policy_hash_direct(net, daddr, saddr, family, dir);
- } while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+ sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+ rcu_read_lock();
+
+ chain = policy_hash_direct(net, daddr, saddr, family, dir);
+ if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence)) {
+ rcu_read_unlock();
+ goto retry;
+ }
ret = NULL;
hlist_for_each_entry_rcu(pol, chain, bydst) {
@@ -2128,11 +2131,15 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
}
skip_inexact:
- if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence))
+ if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence)) {
+ rcu_read_unlock();
goto retry;
+ }
- if (ret && !xfrm_pol_hold_rcu(ret))
+ if (ret && !xfrm_pol_hold_rcu(ret)) {
+ rcu_read_unlock();
goto retry;
+ }
fail:
rcu_read_unlock();
This is _not_ an upstream commit and just for 5.4.y only.
kernel test robot reported a 5.4.y build issue found by randconfig [1]
after backporting commit 89b158635ad7 ("lib/lz4: explicitly support
in-place decompression"") due to "undefined reference to `memmove'".
However, upstream and 5.10 LTS seem fine. After digging further,
I found commit a510b616131f ("MIPS: Add support for ZSTD-compressed
kernels") introduced memmove() occasionally and it has been included
since v5.10.
This partially cherry-picks the memmove() part of commit a510b616131f
to fix the reported build regression since we don't need the whole
patch for 5.4 LTS at all.
[1] https://lore.kernel.org/r/202107070120.6dOj1kB7-lkp@intel.com/
Fixes: defcc2b5e54a ("lib/lz4: explicitly support in-place decompression") # 5.4.y
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
---
v2:
"just submit the fix and say _why_ this is not an upstream
commit, do not attempt to emulate an upstream commit like
your change did." as Greg suggested...
arch/mips/boot/compressed/string.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/mips/boot/compressed/string.c b/arch/mips/boot/compressed/string.c
index 43beecc3587c..0b593b709228 100644
--- a/arch/mips/boot/compressed/string.c
+++ b/arch/mips/boot/compressed/string.c
@@ -5,6 +5,7 @@
* Very small subset of simple string routines
*/
+#include <linux/compiler_attributes.h>
#include <linux/types.h>
void *memcpy(void *dest, const void *src, size_t n)
@@ -27,3 +28,19 @@ void *memset(void *s, int c, size_t n)
ss[i] = c;
return s;
}
+
+void * __weak memmove(void *dest, const void *src, size_t n)
+{
+ unsigned int i;
+ const char *s = src;
+ char *d = dest;
+
+ if ((uintptr_t)dest < (uintptr_t)src) {
+ for (i = 0; i < n; i++)
+ d[i] = s[i];
+ } else {
+ for (i = n; i > 0; i--)
+ d[i - 1] = s[i - 1];
+ }
+ return dest;
+}
--
2.24.4
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 35d3e8cb35e75450f87f87e3d314e2d418b6954b Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 16 Jun 2021 11:55:00 +0800
Subject: [PATCH] drm/dp_mst: Do not set proposed vcpi directly
[Why]
When we receive CSN message to notify one port is disconnected, we will
implicitly set its corresponding num_slots to 0. Later on, we will
eventually call drm_dp_update_payload_part1() to arrange down streams.
In drm_dp_update_payload_part1(), we iterate over all proposed_vcpis[]
to do the update. Not specific to a target sink only. For example, if we
light up 2 monitors, Monitor_A and Monitor_B, and then we unplug
Monitor_B. Later on, when we call drm_dp_update_payload_part1() to try
to update payload for Monitor_A, we'll also implicitly clean payload for
Monitor_B at the same time. And finally, when we try to call
drm_dp_update_payload_part1() to clean payload for Monitor_B, we will do
nothing at this time since payload for Monitor_B has been cleaned up
previously.
For StarTech 1to3 DP hub, it seems like if we didn't update DPCD payload
ID table then polling for "ACT Handled"(BIT_1 of DPCD 002C0h) will fail
and this polling will last for 3 seconds.
Therefore, guess the best way is we don't set the proposed_vcpi[]
diretly. Let user of these herlper functions to set the proposed_vcpi
directly.
[How]
1. Revert commit 7617e9621bf2 ("drm/dp_mst: clear time slots for ports
invalid")
2. Tackle the issue in previous commit by skipping those trasient
proposed VCPIs. These stale VCPIs shoulde be explicitly cleared by
user later on.
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Fixed & Cc tags
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Fixes: 7617e9621bf2 ("drm/dp_mst: clear time slots for ports invalid")
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.5+
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-2-Wayne.L…
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 32b7f8983b94..b41b837db66d 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2501,7 +2501,7 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
{
struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
struct drm_dp_mst_port *port;
- int old_ddps, old_input, ret, i;
+ int old_ddps, ret;
u8 new_pdt;
bool new_mcs;
bool dowork = false, create_connector = false;
@@ -2533,7 +2533,6 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
}
old_ddps = port->ddps;
- old_input = port->input;
port->input = conn_stat->input_port;
port->ldps = conn_stat->legacy_device_plug_status;
port->ddps = conn_stat->displayport_device_plug_status;
@@ -2555,28 +2554,6 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
dowork = false;
}
- if (!old_input && old_ddps != port->ddps && !port->ddps) {
- for (i = 0; i < mgr->max_payloads; i++) {
- struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
- struct drm_dp_mst_port *port_validated;
-
- if (!vcpi)
- continue;
-
- port_validated =
- container_of(vcpi, struct drm_dp_mst_port, vcpi);
- port_validated =
- drm_dp_mst_topology_get_port_validated(mgr, port_validated);
- if (!port_validated) {
- mutex_lock(&mgr->payload_lock);
- vcpi->num_slots = 0;
- mutex_unlock(&mgr->payload_lock);
- } else {
- drm_dp_mst_topology_put_port(port_validated);
- }
- }
- }
-
if (port->connector)
drm_modeset_unlock(&mgr->base.lock);
else if (create_connector)
@@ -3410,8 +3387,15 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
port = drm_dp_mst_topology_get_port_validated(
mgr, port);
if (!port) {
- mutex_unlock(&mgr->payload_lock);
- return -EINVAL;
+ if (vcpi->num_slots == payload->num_slots) {
+ cur_slots += vcpi->num_slots;
+ payload->start_slot = req_payload.start_slot;
+ continue;
+ } else {
+ drm_dbg_kms("Fail:set payload to invalid sink");
+ mutex_unlock(&mgr->payload_lock);
+ return -EINVAL;
+ }
}
put_port = true;
}
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 35d3e8cb35e75450f87f87e3d314e2d418b6954b Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 16 Jun 2021 11:55:00 +0800
Subject: [PATCH] drm/dp_mst: Do not set proposed vcpi directly
[Why]
When we receive CSN message to notify one port is disconnected, we will
implicitly set its corresponding num_slots to 0. Later on, we will
eventually call drm_dp_update_payload_part1() to arrange down streams.
In drm_dp_update_payload_part1(), we iterate over all proposed_vcpis[]
to do the update. Not specific to a target sink only. For example, if we
light up 2 monitors, Monitor_A and Monitor_B, and then we unplug
Monitor_B. Later on, when we call drm_dp_update_payload_part1() to try
to update payload for Monitor_A, we'll also implicitly clean payload for
Monitor_B at the same time. And finally, when we try to call
drm_dp_update_payload_part1() to clean payload for Monitor_B, we will do
nothing at this time since payload for Monitor_B has been cleaned up
previously.
For StarTech 1to3 DP hub, it seems like if we didn't update DPCD payload
ID table then polling for "ACT Handled"(BIT_1 of DPCD 002C0h) will fail
and this polling will last for 3 seconds.
Therefore, guess the best way is we don't set the proposed_vcpi[]
diretly. Let user of these herlper functions to set the proposed_vcpi
directly.
[How]
1. Revert commit 7617e9621bf2 ("drm/dp_mst: clear time slots for ports
invalid")
2. Tackle the issue in previous commit by skipping those trasient
proposed VCPIs. These stale VCPIs shoulde be explicitly cleared by
user later on.
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Fixed & Cc tags
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Fixes: 7617e9621bf2 ("drm/dp_mst: clear time slots for ports invalid")
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.5+
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-2-Wayne.L…
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 32b7f8983b94..b41b837db66d 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2501,7 +2501,7 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
{
struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
struct drm_dp_mst_port *port;
- int old_ddps, old_input, ret, i;
+ int old_ddps, ret;
u8 new_pdt;
bool new_mcs;
bool dowork = false, create_connector = false;
@@ -2533,7 +2533,6 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
}
old_ddps = port->ddps;
- old_input = port->input;
port->input = conn_stat->input_port;
port->ldps = conn_stat->legacy_device_plug_status;
port->ddps = conn_stat->displayport_device_plug_status;
@@ -2555,28 +2554,6 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
dowork = false;
}
- if (!old_input && old_ddps != port->ddps && !port->ddps) {
- for (i = 0; i < mgr->max_payloads; i++) {
- struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
- struct drm_dp_mst_port *port_validated;
-
- if (!vcpi)
- continue;
-
- port_validated =
- container_of(vcpi, struct drm_dp_mst_port, vcpi);
- port_validated =
- drm_dp_mst_topology_get_port_validated(mgr, port_validated);
- if (!port_validated) {
- mutex_lock(&mgr->payload_lock);
- vcpi->num_slots = 0;
- mutex_unlock(&mgr->payload_lock);
- } else {
- drm_dp_mst_topology_put_port(port_validated);
- }
- }
- }
-
if (port->connector)
drm_modeset_unlock(&mgr->base.lock);
else if (create_connector)
@@ -3410,8 +3387,15 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
port = drm_dp_mst_topology_get_port_validated(
mgr, port);
if (!port) {
- mutex_unlock(&mgr->payload_lock);
- return -EINVAL;
+ if (vcpi->num_slots == payload->num_slots) {
+ cur_slots += vcpi->num_slots;
+ payload->start_slot = req_payload.start_slot;
+ continue;
+ } else {
+ drm_dbg_kms("Fail:set payload to invalid sink");
+ mutex_unlock(&mgr->payload_lock);
+ return -EINVAL;
+ }
}
put_port = true;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 35d3e8cb35e75450f87f87e3d314e2d418b6954b Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 16 Jun 2021 11:55:00 +0800
Subject: [PATCH] drm/dp_mst: Do not set proposed vcpi directly
[Why]
When we receive CSN message to notify one port is disconnected, we will
implicitly set its corresponding num_slots to 0. Later on, we will
eventually call drm_dp_update_payload_part1() to arrange down streams.
In drm_dp_update_payload_part1(), we iterate over all proposed_vcpis[]
to do the update. Not specific to a target sink only. For example, if we
light up 2 monitors, Monitor_A and Monitor_B, and then we unplug
Monitor_B. Later on, when we call drm_dp_update_payload_part1() to try
to update payload for Monitor_A, we'll also implicitly clean payload for
Monitor_B at the same time. And finally, when we try to call
drm_dp_update_payload_part1() to clean payload for Monitor_B, we will do
nothing at this time since payload for Monitor_B has been cleaned up
previously.
For StarTech 1to3 DP hub, it seems like if we didn't update DPCD payload
ID table then polling for "ACT Handled"(BIT_1 of DPCD 002C0h) will fail
and this polling will last for 3 seconds.
Therefore, guess the best way is we don't set the proposed_vcpi[]
diretly. Let user of these herlper functions to set the proposed_vcpi
directly.
[How]
1. Revert commit 7617e9621bf2 ("drm/dp_mst: clear time slots for ports
invalid")
2. Tackle the issue in previous commit by skipping those trasient
proposed VCPIs. These stale VCPIs shoulde be explicitly cleared by
user later on.
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Fixed & Cc tags
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Fixes: 7617e9621bf2 ("drm/dp_mst: clear time slots for ports invalid")
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.5+
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-2-Wayne.L…
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 32b7f8983b94..b41b837db66d 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2501,7 +2501,7 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
{
struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
struct drm_dp_mst_port *port;
- int old_ddps, old_input, ret, i;
+ int old_ddps, ret;
u8 new_pdt;
bool new_mcs;
bool dowork = false, create_connector = false;
@@ -2533,7 +2533,6 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
}
old_ddps = port->ddps;
- old_input = port->input;
port->input = conn_stat->input_port;
port->ldps = conn_stat->legacy_device_plug_status;
port->ddps = conn_stat->displayport_device_plug_status;
@@ -2555,28 +2554,6 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
dowork = false;
}
- if (!old_input && old_ddps != port->ddps && !port->ddps) {
- for (i = 0; i < mgr->max_payloads; i++) {
- struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
- struct drm_dp_mst_port *port_validated;
-
- if (!vcpi)
- continue;
-
- port_validated =
- container_of(vcpi, struct drm_dp_mst_port, vcpi);
- port_validated =
- drm_dp_mst_topology_get_port_validated(mgr, port_validated);
- if (!port_validated) {
- mutex_lock(&mgr->payload_lock);
- vcpi->num_slots = 0;
- mutex_unlock(&mgr->payload_lock);
- } else {
- drm_dp_mst_topology_put_port(port_validated);
- }
- }
- }
-
if (port->connector)
drm_modeset_unlock(&mgr->base.lock);
else if (create_connector)
@@ -3410,8 +3387,15 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
port = drm_dp_mst_topology_get_port_validated(
mgr, port);
if (!port) {
- mutex_unlock(&mgr->payload_lock);
- return -EINVAL;
+ if (vcpi->num_slots == payload->num_slots) {
+ cur_slots += vcpi->num_slots;
+ payload->start_slot = req_payload.start_slot;
+ continue;
+ } else {
+ drm_dbg_kms("Fail:set payload to invalid sink");
+ mutex_unlock(&mgr->payload_lock);
+ return -EINVAL;
+ }
}
put_port = true;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b22afcdf04c96ca58327784e280e10288cfd3303 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sat, 27 Mar 2021 22:01:36 +0100
Subject: [PATCH] cpu/hotplug: Cure the cpusets trainwreck
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov(a)redhat.com>
Reported-by: Joshua Baker <jobaker(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Alexey Klimov <aklimov(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e538518556f4..d2e1692d7bdf 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -32,6 +32,7 @@
#include <linux/relay.h>
#include <linux/slab.h>
#include <linux/percpu-rwsem.h>
+#include <linux/cpuset.h>
#include <trace/events/power.h>
#define CREATE_TRACE_POINTS
@@ -873,6 +874,52 @@ void __init cpuhp_threads_init(void)
kthread_unpark(this_cpu_read(cpuhp_state.thread));
}
+/*
+ *
+ * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
+ * protected region.
+ *
+ * The operation is still serialized against concurrent CPU hotplug via
+ * cpu_add_remove_lock, i.e. CPU map protection. But it is _not_
+ * serialized against other hotplug related activity like adding or
+ * removing of state callbacks and state instances, which invoke either the
+ * startup or the teardown callback of the affected state.
+ *
+ * This is required for subsystems which are unfixable vs. CPU hotplug and
+ * evade lock inversion problems by scheduling work which has to be
+ * completed _before_ cpu_up()/_cpu_down() returns.
+ *
+ * Don't even think about adding anything to this for any new code or even
+ * drivers. It's only purpose is to keep existing lock order trainwrecks
+ * working.
+ *
+ * For cpu_down() there might be valid reasons to finish cleanups which are
+ * not required to be done under cpu_hotplug_lock, but that's a different
+ * story and would be not invoked via this.
+ */
+static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
+{
+ /*
+ * cpusets delegate hotplug operations to a worker to "solve" the
+ * lock order problems. Wait for the worker, but only if tasks are
+ * _not_ frozen (suspend, hibernate) as that would wait forever.
+ *
+ * The wait is required because otherwise the hotplug operation
+ * returns with inconsistent state, which could even be observed in
+ * user space when a new CPU is brought up. The CPU plug uevent
+ * would be delivered and user space reacting on it would fail to
+ * move tasks to the newly plugged CPU up to the point where the
+ * work has finished because up to that point the newly plugged CPU
+ * is not assignable in cpusets/cgroups. On unplug that's not
+ * necessarily a visible issue, but it is still inconsistent state,
+ * which is the real problem which needs to be "fixed". This can't
+ * prevent the transient state between scheduling the work and
+ * returning from waiting for it.
+ */
+ if (!tasks_frozen)
+ cpuset_wait_for_hotplug();
+}
+
#ifdef CONFIG_HOTPLUG_CPU
#ifndef arch_clear_mm_cpumask_cpu
#define arch_clear_mm_cpumask_cpu(cpu, mm) cpumask_clear_cpu(cpu, mm_cpumask(mm))
@@ -1108,6 +1155,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
*/
lockup_detector_cleanup();
arch_smt_update();
+ cpu_up_down_serialize_trainwrecks(tasks_frozen);
return ret;
}
@@ -1302,6 +1350,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
out:
cpus_write_unlock();
arch_smt_update();
+ cpu_up_down_serialize_trainwrecks(tasks_frozen);
return ret;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b22afcdf04c96ca58327784e280e10288cfd3303 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sat, 27 Mar 2021 22:01:36 +0100
Subject: [PATCH] cpu/hotplug: Cure the cpusets trainwreck
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov(a)redhat.com>
Reported-by: Joshua Baker <jobaker(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Alexey Klimov <aklimov(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e538518556f4..d2e1692d7bdf 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -32,6 +32,7 @@
#include <linux/relay.h>
#include <linux/slab.h>
#include <linux/percpu-rwsem.h>
+#include <linux/cpuset.h>
#include <trace/events/power.h>
#define CREATE_TRACE_POINTS
@@ -873,6 +874,52 @@ void __init cpuhp_threads_init(void)
kthread_unpark(this_cpu_read(cpuhp_state.thread));
}
+/*
+ *
+ * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
+ * protected region.
+ *
+ * The operation is still serialized against concurrent CPU hotplug via
+ * cpu_add_remove_lock, i.e. CPU map protection. But it is _not_
+ * serialized against other hotplug related activity like adding or
+ * removing of state callbacks and state instances, which invoke either the
+ * startup or the teardown callback of the affected state.
+ *
+ * This is required for subsystems which are unfixable vs. CPU hotplug and
+ * evade lock inversion problems by scheduling work which has to be
+ * completed _before_ cpu_up()/_cpu_down() returns.
+ *
+ * Don't even think about adding anything to this for any new code or even
+ * drivers. It's only purpose is to keep existing lock order trainwrecks
+ * working.
+ *
+ * For cpu_down() there might be valid reasons to finish cleanups which are
+ * not required to be done under cpu_hotplug_lock, but that's a different
+ * story and would be not invoked via this.
+ */
+static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
+{
+ /*
+ * cpusets delegate hotplug operations to a worker to "solve" the
+ * lock order problems. Wait for the worker, but only if tasks are
+ * _not_ frozen (suspend, hibernate) as that would wait forever.
+ *
+ * The wait is required because otherwise the hotplug operation
+ * returns with inconsistent state, which could even be observed in
+ * user space when a new CPU is brought up. The CPU plug uevent
+ * would be delivered and user space reacting on it would fail to
+ * move tasks to the newly plugged CPU up to the point where the
+ * work has finished because up to that point the newly plugged CPU
+ * is not assignable in cpusets/cgroups. On unplug that's not
+ * necessarily a visible issue, but it is still inconsistent state,
+ * which is the real problem which needs to be "fixed". This can't
+ * prevent the transient state between scheduling the work and
+ * returning from waiting for it.
+ */
+ if (!tasks_frozen)
+ cpuset_wait_for_hotplug();
+}
+
#ifdef CONFIG_HOTPLUG_CPU
#ifndef arch_clear_mm_cpumask_cpu
#define arch_clear_mm_cpumask_cpu(cpu, mm) cpumask_clear_cpu(cpu, mm_cpumask(mm))
@@ -1108,6 +1155,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
*/
lockup_detector_cleanup();
arch_smt_update();
+ cpu_up_down_serialize_trainwrecks(tasks_frozen);
return ret;
}
@@ -1302,6 +1350,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
out:
cpus_write_unlock();
arch_smt_update();
+ cpu_up_down_serialize_trainwrecks(tasks_frozen);
return ret;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 222a28edce38b62074a950fb243df621c602b4d3 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Thu, 17 Jun 2021 15:58:08 -0700
Subject: [PATCH] docs: Makefile: Use CONFIG_SHELL not SHELL
Fix think-o about which variable to find the Kbuild-configured shell.
This has accidentally worked due to most shells setting $SHELL by
default.
Fixes: 51e46c7a4007 ("docs, parallelism: Rearrange how jobserver reservations are made")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20210617225808.3907377-1-keescook@chromium.org
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 9c42dde97671..c3feb657b654 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -76,7 +76,7 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
PYTHONDONTWRITEBYTECODE=1 \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
$(PYTHON3) $(srctree)/scripts/jobserver-exec \
- $(SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
+ $(CONFIG_SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
$(SPHINXBUILD) \
-b $2 \
-c $(abspath $(srctree)/$(src)) \
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 77347eda64ed5c9383961d1de9165f9d0b7d8df6 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Date: Thu, 24 Jun 2021 17:16:14 +0200
Subject: [PATCH] mmc: core: clear flags before allowing to retune
It might be that something goes wrong during tuning so the MMC core will
immediately trigger a retune. In our case it was:
- we sent a tuning block
- there was an error so we need to send an abort cmd to the eMMC
- the abort cmd had a CRC error
- retune was set by the MMC core
This lead to a vicious circle causing a performance regression of 75%.
So, clear retuning flags before we enable retuning to start with a known
cleared state.
Reported-by Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Suggested-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210624151616.38770-2-wsa+renesas@sang-engineeri…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index b039dcff17f8..95fedcf56e4a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -937,11 +937,14 @@ int mmc_execute_tuning(struct mmc_card *card)
err = host->ops->execute_tuning(host, opcode);
- if (err)
+ if (err) {
pr_err("%s: tuning execution failed: %d\n",
mmc_hostname(host), err);
- else
+ } else {
+ host->retune_now = 0;
+ host->need_retune = 0;
mmc_retune_enable(host);
+ }
return err;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cee93c028288b9af02919f3bd8593ba61d1e610d Mon Sep 17 00:00:00 2001
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Date: Tue, 27 Apr 2021 11:20:16 +0200
Subject: [PATCH] drm/nouveau: Don't set allow_fb_modifiers explicitly
Since
commit 890880ddfdbe256083170866e49c87618b706ac7
Author: Paul Kocialkowski <paul.kocialkowski(a)bootlin.com>
Date: Fri Jan 4 09:56:10 2019 +0100
drm: Auto-set allow_fb_modifiers when given modifiers at plane init
this is done automatically as part of plane init, if drivers set the
modifier list correctly. Which is the case here.
Note that this fixes an inconsistency: We've set the cap everywhere,
but only nv50+ supports modifiers. Hence cc stable, but not further
back then the patch from Paul.
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org # v5.1 +
Cc: Pekka Paalanen <pekka.paalanen(a)collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Ben Skeggs <bskeggs(a)redhat.com>
Cc: nouveau(a)lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210427092018.832258-6-danie…
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 14101bd2a0ff..929de41c281f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -697,7 +697,6 @@ nouveau_display_create(struct drm_device *dev)
dev->mode_config.preferred_depth = 24;
dev->mode_config.prefer_shadow = 1;
- dev->mode_config.allow_fb_modifiers = true;
if (drm->client.device.info.chipset < 0x11)
dev->mode_config.async_page_flip = false;
Hi Greg, stable all,
On Fri, Jul 09, 2021 at 01:50:16PM +0800, Gao Xiang wrote:
> On Wed, Jul 07, 2021 at 01:15:28AM +0800, kernel test robot wrote:
> > tree: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> > head: 3909e2374335335c9504467caabc906d3f7487e4
> > commit: defcc2b5e54a4724fb5733f802edf5dd596018b6 [7045/7049] lib/lz4: explicitly support in-place decompression
> > config: mips-randconfig-r036-20210706 (attached as .config)
> > compiler: mipsel-linux-gcc (GCC) 9.3.0
> > reproduce (this is a W=1 build):
> > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
> > git remote add linux-stable-rc https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > git fetch --no-tags linux-stable-rc linux-5.4.y
> > git checkout defcc2b5e54a4724fb5733f802edf5dd596018b6
> > # save the attached .config to linux build tree
> > mkdir build_dir
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross O=build_dir ARCH=mips SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp(a)intel.com>
>
> Which is weird, does the preboot environment miss memmove() on mipsel?
> Just a guess, I may look into that myself later...
>
After manually checking, I found memmove() for the mips preboot environment
was incidentally introduced by commit a510b616131f ("MIPS: Add support for
ZSTD-compressed kernels") which wasn't included in v5.4, but included in
v5.10 as below (so v5.10.y is fine):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/a…
And when I applied the following patch partially from the original
commit, the compile error with the command lines mentioned above was gone:
diff --git a/arch/mips/boot/compressed/string.c b/arch/mips/boot/compressed/string.c
index 43beecc3587c..e9ab7ea592ba 100644
--- a/arch/mips/boot/compressed/string.c
+++ b/arch/mips/boot/compressed/string.c
@@ -27,3 +27,19 @@ void *memset(void *s, int c, size_t n)
ss[i] = c;
return s;
}
+
+void * __weak memmove(void *dest, const void *src, size_t n)
+{
+ unsigned int i;
+ const char *s = src;
+ char *d = dest;
+
+ if ((uintptr_t)dest < (uintptr_t)src) {
+ for (i = 0; i < n; i++)
+ d[i] = s[i];
+ } else {
+ for (i = n; i > 0; i--)
+ d[i - 1] = s[i - 1];
+ }
+ return dest;
+}
How to backport such commit partially to the v5.4.y stable kernel?
... Also, it would be better to check other mips compile combinations
automatically since it's hard for me to check all such combinations
one-by-one...
Thanks,
Gao Xiang
> Thanks,
> Gao Xiang
>
> >
> > All errors (new ones prefixed by >>):
> >
> > mipsel-linux-ld: arch/mips/boot/compressed/decompress.o: in function `LZ4_decompress_safe_withSmallPrefix':
> > decompress.c:(.text+0x220): undefined reference to `memmove'
> > mipsel-linux-ld: arch/mips/boot/compressed/decompress.o: in function `LZ4_decompress_fast_extDict':
> > decompress.c:(.text+0x694): undefined reference to `memmove'
> > >> mipsel-linux-ld: decompress.c:(.text+0x774): undefined reference to `memmove'
> > mipsel-linux-ld: arch/mips/boot/compressed/decompress.o: in function `LZ4_decompress_safe':
> > decompress.c:(.text+0xb88): undefined reference to `memmove'
> > mipsel-linux-ld: arch/mips/boot/compressed/decompress.o: in function `LZ4_decompress_safe_partial':
> > decompress.c:(.text+0x1078): undefined reference to `memmove'
> > mipsel-linux-ld: arch/mips/boot/compressed/decompress.o:decompress.c:(.text+0x12f8): more undefined references to `memmove' follow
> >
> > ---
> > 0-DAY CI Kernel Test Service, Intel Corporation
> > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 60a6b73dd821e98fe958b2a83393ccd724b306b1 Mon Sep 17 00:00:00 2001
From: Paul Cercueil <paul(a)crapouillou.net>
Date: Tue, 23 Mar 2021 14:40:08 +0000
Subject: [PATCH] drm/ingenic: Fix pixclock rate for 24-bit serial panels
When using a 24-bit panel on a 8-bit serial bus, the pixel clock
requested by the panel has to be multiplied by 3, since the subpixels
are shifted sequentially.
The code (in ingenic_drm_encoder_atomic_check) already computed
crtc_state->adjusted_mode->crtc_clock accordingly, but clk_set_rate()
used crtc_state->adjusted_mode->clock instead.
Fixes: 28ab7d35b6e0 ("drm/ingenic: Properly compute timings when using a 3x8-bit panel")
Cc: stable(a)vger.kernel.org # v5.10
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
Tested-by: H. Nikolaus Schaller <hns(a)goldelico.com> # CI20/jz4780 (HDMI) and Alpha400/jz4730 (LCD)
Acked-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323144008.166248-1-paul@…
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 09225b770bb8..389cad59e090 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -342,7 +342,7 @@ static void ingenic_drm_crtc_atomic_flush(struct drm_crtc *crtc,
if (priv->update_clk_rate) {
mutex_lock(&priv->clk_mutex);
clk_set_rate(priv->pix_clk,
- crtc_state->adjusted_mode.clock * 1000);
+ crtc_state->adjusted_mode.crtc_clock * 1000);
priv->update_clk_rate = false;
mutex_unlock(&priv->clk_mutex);
}
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3769e4c0af5b82c8ea21d037013cb9564dfaa51f Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 16 Jun 2021 11:55:01 +0800
Subject: [PATCH] drm/dp_mst: Avoid to mess up payload table by ports in stale
topology
[Why]
After unplug/hotplug hub from the system, userspace might start to
clear stale payloads gradually. If we call drm_dp_mst_deallocate_vcpi()
to release stale VCPI of those ports which are not relating to current
topology, we have chane to wrongly clear active payload table entry for
current topology.
E.g.
We have allocated VCPI 1 in current payload table and we call
drm_dp_mst_deallocate_vcpi() to clean VCPI 1 in stale topology. In
drm_dp_mst_deallocate_vcpi(), it will call drm_dp_mst_put_payload_id()
tp put VCPI 1 and which means ID 1 is available again. Thereafter, if we
want to allocate a new payload stream, it will find ID 1 is available by
drm_dp_mst_assign_payload_id(). However, ID 1 is being used
[How]
Check target sink is relating to current topology or not before doing
any payload table update.
Searching upward to find the target sink's relevant root branch device.
If the found root branch device is not the same root of current
topology, don't update payload table.
Changes since v1:
* Change debug macro to use drm_dbg_kms() instead
* Amend the commit message to add Cc tag.
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616035501.3776-3-Wayne.L…
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index b41b837db66d..9ac148efd9e4 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -94,6 +94,9 @@ static int drm_dp_mst_register_i2c_bus(struct drm_dp_mst_port *port);
static void drm_dp_mst_unregister_i2c_bus(struct drm_dp_mst_port *port);
static void drm_dp_mst_kick_tx(struct drm_dp_mst_topology_mgr *mgr);
+static bool drm_dp_mst_port_downstream_of_branch(struct drm_dp_mst_port *port,
+ struct drm_dp_mst_branch *branch);
+
#define DBG_PREFIX "[dp_mst]"
#define DP_STR(x) [DP_ ## x] = #x
@@ -3366,6 +3369,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
struct drm_dp_mst_port *port;
int i, j;
int cur_slots = 1;
+ bool skip;
mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -3380,6 +3384,14 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
port = container_of(vcpi, struct drm_dp_mst_port,
vcpi);
+ mutex_lock(&mgr->lock);
+ skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
+ mutex_unlock(&mgr->lock);
+
+ if (skip) {
+ drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
+ continue;
+ }
/* Validated ports don't matter if we're releasing
* VCPI
*/
@@ -3479,6 +3491,7 @@ int drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
struct drm_dp_mst_port *port;
int i;
int ret = 0;
+ bool skip;
mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -3488,6 +3501,13 @@ int drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
+ mutex_lock(&mgr->lock);
+ skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
+ mutex_unlock(&mgr->lock);
+
+ if (skip)
+ continue;
+
drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr->payloads[i].payload_state);
if (mgr->payloads[i].payload_state == DP_PAYLOAD_LOCAL) {
ret = drm_dp_create_payload_step2(mgr, port, mgr->proposed_vcpis[i]->vcpi, &mgr->payloads[i]);
@@ -4574,9 +4594,18 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
struct drm_dp_mst_port *port)
{
+ bool skip;
+
if (!port->vcpi.vcpi)
return;
+ mutex_lock(&mgr->lock);
+ skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
+ mutex_unlock(&mgr->lock);
+
+ if (skip)
+ return;
+
drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
port->vcpi.num_slots = 0;
port->vcpi.pbn = 0;
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 511eea5e2ccdfdbf3d626bde0314e551f247dd18 Mon Sep 17 00:00:00 2001
From: "Naveen N. Rao" <naveen.n.rao(a)linux.vnet.ibm.com>
Date: Wed, 23 Jun 2021 05:23:30 +0000
Subject: [PATCH] powerpc/kprobes: Fix Oops by passing ppc_inst as a pointer to
emulate_step() on ppc32
Trying to use a kprobe on ppc32 results in the below splat:
BUG: Unable to handle kernel data access on read at 0x7c0802a6
Faulting instruction address: 0xc002e9f0
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PowerPC 44x Platform
Modules linked in:
CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
NIP: c002e9f0 LR: c0011858 CTR: 00008a47
REGS: c292fd50 TRAP: 0300 Not tainted (5.13.0-rc1-01824-g3a81c0495fdb)
MSR: 00009000 <EE,ME> CR: 24002002 XER: 20000000
DEAR: 7c0802a6 ESR: 00000000
<snip>
NIP [c002e9f0] emulate_step+0x28/0x324
LR [c0011858] optinsn_slot+0x128/0x10000
Call Trace:
opt_pre_handler+0x7c/0xb4 (unreliable)
optinsn_slot+0x128/0x10000
ret_from_syscall+0x0/0x28
The offending instruction is:
81 24 00 00 lwz r9,0(r4)
Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.
Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/5bdc8cbc9a95d0779e27c9ddbf42b40f51f883c0.16244257…
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 8b9f82dc6ece..c79899abcec8 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -228,8 +228,12 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
/*
* 3. load instruction to be emulated into relevant register, and
*/
- temp = ppc_inst_read(p->ainsn.insn);
- patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX);
+ if (IS_ENABLED(CONFIG_PPC64)) {
+ temp = ppc_inst_read(p->ainsn.insn);
+ patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX);
+ } else {
+ patch_imm_load_insns((unsigned long)p->ainsn.insn, 4, buff + TMPL_INSN_IDX);
+ }
/*
* 4. branch back from trampoline
Eugeniu Rosca <erosca(a)de.adit-jv.com> wrote:
>
> Hello,
>
> On Thu, Jun 03, 2021 at 12:15:07PM -0500, Andrew Gabbasov wrote:
> > FunctionFS device structure 'struct ffs_dev' and driver data structure
> > 'struct ffs_data' are bound to each other with cross-reference pointers
> > 'ffs_data->private_data' and 'ffs_dev->ffs_data'. While the first one
> > is supposed to be valid through the whole life of 'struct ffs_data'
> > (and while 'struct ffs_dev' exists non-freed), the second one is cleared
> > in 'ffs_closed()' (called from 'ffs_data_reset()' or the last
> > 'ffs_data_put()'). This can be called several times, alternating in
> > different order with 'ffs_free_inst()', that, if possible, clears
> > the other cross-reference.
> >
[Skip some comment...]
> I confirm there are at least two KASAN use-after-free issues
> consistently/100% reproducible on v5.13-rc4-88-gf88cd3fb9df2:
>
> https://gist.github.com/erosca/b5976a96789e574b319cb9e076938b5c
> https://gist.github.com/erosca/4ded55ed32f0133bc2f4ccfe821c7776
>
> These two can no longer be seen after the patch is applied.
>
> In addition, below static analysis tools did not spot any regressions:
> cppcheck 2.4, smatch v0.5.0-7445-g58776ae33ae8, make W=1, coccicheck
>
> Reviewed-by: Eugeniu Rosca <erosca(a)de.adit-jv.com>
> Tested-by: Eugeniu Rosca <erosca(a)de.adit-jv.com>
>
> --
> Best regards,
> Eugeniu Rosca
It like there is similar issue on kernel-4.14 reported by our customer
(Android).
The back trace are similar.
It looks like this patch has fixed issue existed in earlier kernels.
Could Engeniu and Andrew help to comment if this fix is suggested to be pick to
stable-tree? I've tried to port it onto kernel-4.14, kernel-4.19, and
kernel-5.10.
But it seems there is some revise work to do.
If the origin issue affect multiple LTS kernel versions, then it will
be better to be
cherry-pick to stable-tree after it has been merged.
Thanks!
--
Best regards,
Macpaul Lin
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 511eea5e2ccdfdbf3d626bde0314e551f247dd18 Mon Sep 17 00:00:00 2001
From: "Naveen N. Rao" <naveen.n.rao(a)linux.vnet.ibm.com>
Date: Wed, 23 Jun 2021 05:23:30 +0000
Subject: [PATCH] powerpc/kprobes: Fix Oops by passing ppc_inst as a pointer to
emulate_step() on ppc32
Trying to use a kprobe on ppc32 results in the below splat:
BUG: Unable to handle kernel data access on read at 0x7c0802a6
Faulting instruction address: 0xc002e9f0
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PowerPC 44x Platform
Modules linked in:
CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
NIP: c002e9f0 LR: c0011858 CTR: 00008a47
REGS: c292fd50 TRAP: 0300 Not tainted (5.13.0-rc1-01824-g3a81c0495fdb)
MSR: 00009000 <EE,ME> CR: 24002002 XER: 20000000
DEAR: 7c0802a6 ESR: 00000000
<snip>
NIP [c002e9f0] emulate_step+0x28/0x324
LR [c0011858] optinsn_slot+0x128/0x10000
Call Trace:
opt_pre_handler+0x7c/0xb4 (unreliable)
optinsn_slot+0x128/0x10000
ret_from_syscall+0x0/0x28
The offending instruction is:
81 24 00 00 lwz r9,0(r4)
Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.
Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/5bdc8cbc9a95d0779e27c9ddbf42b40f51f883c0.16244257…
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 8b9f82dc6ece..c79899abcec8 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -228,8 +228,12 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
/*
* 3. load instruction to be emulated into relevant register, and
*/
- temp = ppc_inst_read(p->ainsn.insn);
- patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX);
+ if (IS_ENABLED(CONFIG_PPC64)) {
+ temp = ppc_inst_read(p->ainsn.insn);
+ patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX);
+ } else {
+ patch_imm_load_insns((unsigned long)p->ainsn.insn, 4, buff + TMPL_INSN_IDX);
+ }
/*
* 4. branch back from trampoline
Commit 557bb37533a3 ("mac80211: do not accept/forward invalid EAPOL
frames") uses skb_mac_header() before eth_type_trans() is called
leading to incorrect pointer, the pointer gets written to. This issue
has appeared during backporting to 4.4, 4.9 and 4.14.
Fixes: 557bb37533a3 ("mac80211: do not accept/forward invalid EAPOL frames")
Link: https://lore.kernel.org/r/CAHQn7pKcyC_jYmGyTcPCdk9xxATwW5QPNph=bsZV8d-HPwNs…
Cc: <stable(a)vger.kernel.org> # 4.14.x
Signed-off-by: Davis Mosenkovs <davis(a)mosenkovs.lv>
---
net/mac80211/rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index ac2c52709e1c..87926c6fe0bf 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2404,7 +2404,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
#endif
if (skb) {
- struct ethhdr *ehdr = (void *)skb_mac_header(skb);
+ struct ethhdr *ehdr = (struct ethhdr *)skb->data;
/* deliver to local stack */
skb->protocol = eth_type_trans(skb, dev);
--
2.25.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49221cf86d18bb66fe95d3338cb33bd4b9880ca5 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 22 Jun 2021 09:15:35 +0200
Subject: [PATCH] fuse: reject internal errno
Don't allow userspace to report errors that could be kernel-internal.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko(a)gmail.com>
Fixes: 334f485df85a ("[PATCH] FUSE - device functions")
Cc: <stable(a)vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6e63bcba2a40..b8d58aa08206 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1867,7 +1867,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
}
err = -EINVAL;
- if (oh.error <= -1000 || oh.error > 0)
+ if (oh.error <= -512 || oh.error > 0)
goto copy_finish;
spin_lock(&fpq->lock);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9078204ca5c33ba20443a8623a41a68a9995a70d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Fri, 25 Jun 2021 00:49:00 +0200
Subject: [PATCH] serial: mvebu-uart: fix calculation of clock divisor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The clock divisor should be rounded to the closest value.
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Fixes: 68a0db1d7da2 ("serial: mvebu-uart: add function to change baudrate")
Cc: stable(a)vger.kernel.org # 0e4cf69ede87 ("serial: mvebu-uart: clarify the baud rate derivation")
Link: https://lore.kernel.org/r/20210624224909.6350-2-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c
index 04c41689d81c..f3ecbcf495ee 100644
--- a/drivers/tty/serial/mvebu-uart.c
+++ b/drivers/tty/serial/mvebu-uart.c
@@ -463,7 +463,7 @@ static int mvebu_uart_baud_rate_set(struct uart_port *port, unsigned int baud)
* makes use of D to configure the desired baudrate.
*/
m_divisor = OSAMP_DEFAULT_DIVISOR;
- d_divisor = DIV_ROUND_UP(port->uartclk, baud * m_divisor);
+ d_divisor = DIV_ROUND_CLOSEST(port->uartclk, baud * m_divisor);
brdv = readl(port->membase + UART_BRDV);
brdv &= ~BRDV_BAUD_MASK;
Hi Greg, Sasha,
Patches below are some collections of bugfixes, those fixes are
found when we are using the stable 5.10 kernel, please consider
to apply them, I also Cced the author and maintainer for each
patch to see if any objections.
Patch 2/7 will fix the failure of LTP test case 'move_pages 12',
and patch 3/7 is not a bugfix but a preparation for later bugfixes,
other patches are obvious bugfixes.
Gulam Mohamed (1):
scsi: iscsi: Fix race condition between login and sync thread
Jens Axboe (1):
io_uring: convert io_buffer_idr to XArray
Matthew Wilcox (Oracle) (1):
io_uring: Convert personality_idr to XArray
Mauricio Faria de Oliveira (1):
loop: fix I/O error on fsync() in detached loop devices
Mike Christie (1):
scsi: iscsi: Fix iSCSI cls conn state
Oscar Salvador (1):
mm,hwpoison: return -EBUSY when migration fails
Yejune Deng (1):
io_uring: simplify io_remove_personalities()
drivers/block/loop.c | 3 +
drivers/scsi/libiscsi.c | 26 +-------
drivers/scsi/scsi_transport_iscsi.c | 28 ++++++++-
fs/io_uring.c | 116 +++++++++++++++---------------------
include/scsi/scsi_transport_iscsi.h | 1 +
mm/memory-failure.c | 6 +-
6 files changed, 85 insertions(+), 95 deletions(-)
--
1.7.12.4
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 122e093c1734361dedb64f65c99b93e28e4624f4 Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt(a)kernel.org>
Date: Mon, 28 Jun 2021 19:33:26 -0700
Subject: [PATCH] mm/page_alloc: fix memory map initialization for descending
nodes
On systems with memory nodes sorted in descending order, for instance Dell
Precision WorkStation T5500, the struct pages for higher PFNs and
respectively lower nodes, could be overwritten by the initialization of
struct pages corresponding to the holes in the memory sections.
For example for the below memory layout
[ 0.245624] Early memory node ranges
[ 0.248496] node 1: [mem 0x0000000000001000-0x0000000000090fff]
[ 0.251376] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
[ 0.254256] node 1: [mem 0x0000000100000000-0x0000001423ffffff]
[ 0.257144] node 0: [mem 0x0000001424000000-0x0000002023ffffff]
the range 0x1424000000 - 0x1428000000 in the beginning of node 0 starts in
the middle of a section and will be considered as a hole during the
initialization of the last section in node 1.
The wrong initialization of the memory map causes panic on boot when
CONFIG_DEBUG_VM is enabled.
Reorder loop order of the memory map initialization so that the outer loop
will always iterate over populated memory regions in the ascending order
and the inner loop will select the zone corresponding to the PFN range.
This way initialization of the struct pages for the memory holes will be
always done for the ranges that are actually not populated.
[akpm(a)linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/YNXlMqBbL+tBG7yq@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213073
Link: https://lkml.kernel.org/r/20210624062305.10940-1-rppt@kernel.org
Fixes: 0740a50b9baa ("mm/page_alloc.c: refactor initialization of struct page for holes in memory layout")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Boris Petkov <bp(a)alien8.de>
Cc: Robert Shteynfeld <robert.shteynfeld(a)gmail.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..9afb8998e7e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2474,7 +2474,6 @@ extern void set_dma_reserve(unsigned long new_dma_reserve);
extern void memmap_init_range(unsigned long, int, unsigned long,
unsigned long, unsigned long, enum meminit_context,
struct vmem_altmap *, int migratetype);
-extern void memmap_init_zone(struct zone *zone);
extern void setup_per_zone_wmarks(void);
extern int __meminit init_per_zone_wmark_min(void);
extern void mem_init(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef2265f86b91..5b5c9f5813b9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6400,7 +6400,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
return;
/*
- * The call to memmap_init_zone should have already taken care
+ * The call to memmap_init should have already taken care
* of the pages reserved for the memmap, so we can just jump to
* the end of that region and start processing the device pages.
*/
@@ -6465,7 +6465,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
/*
* Only struct pages that correspond to ranges defined by memblock.memory
* are zeroed and initialized by going through __init_single_page() during
- * memmap_init_zone().
+ * memmap_init_zone_range().
*
* But, there could be struct pages that correspond to holes in
* memblock.memory. This can happen because of the following reasons:
@@ -6484,9 +6484,9 @@ static void __meminit zone_init_free_lists(struct zone *zone)
* zone/node above the hole except for the trailing pages in the last
* section that will be appended to the zone/node below.
*/
-static u64 __meminit init_unavailable_range(unsigned long spfn,
- unsigned long epfn,
- int zone, int node)
+static void __init init_unavailable_range(unsigned long spfn,
+ unsigned long epfn,
+ int zone, int node)
{
unsigned long pfn;
u64 pgcnt = 0;
@@ -6502,56 +6502,77 @@ static u64 __meminit init_unavailable_range(unsigned long spfn,
pgcnt++;
}
- return pgcnt;
+ if (pgcnt)
+ pr_info("On node %d, zone %s: %lld pages in unavailable ranges",
+ node, zone_names[zone], pgcnt);
}
#else
-static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
- int zone, int node)
+static inline void init_unavailable_range(unsigned long spfn,
+ unsigned long epfn,
+ int zone, int node)
{
- return 0;
}
#endif
-void __meminit __weak memmap_init_zone(struct zone *zone)
+static void __init memmap_init_zone_range(struct zone *zone,
+ unsigned long start_pfn,
+ unsigned long end_pfn,
+ unsigned long *hole_pfn)
{
unsigned long zone_start_pfn = zone->zone_start_pfn;
unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
- int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
- static unsigned long hole_pfn;
+ int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+
+ start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
+ end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
+
+ if (start_pfn >= end_pfn)
+ return;
+
+ memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
+ zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+
+ if (*hole_pfn < start_pfn)
+ init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+
+ *hole_pfn = end_pfn;
+}
+
+static void __init memmap_init(void)
+{
unsigned long start_pfn, end_pfn;
- u64 pgcnt = 0;
+ unsigned long hole_pfn = 0;
+ int i, j, zone_id, nid;
- for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
- start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
- end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
+ for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+ struct pglist_data *node = NODE_DATA(nid);
+
+ for (j = 0; j < MAX_NR_ZONES; j++) {
+ struct zone *zone = node->node_zones + j;
- if (end_pfn > start_pfn)
- memmap_init_range(end_pfn - start_pfn, nid,
- zone_id, start_pfn, zone_end_pfn,
- MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+ if (!populated_zone(zone))
+ continue;
- if (hole_pfn < start_pfn)
- pgcnt += init_unavailable_range(hole_pfn, start_pfn,
- zone_id, nid);
- hole_pfn = end_pfn;
+ memmap_init_zone_range(zone, start_pfn, end_pfn,
+ &hole_pfn);
+ zone_id = j;
+ }
}
#ifdef CONFIG_SPARSEMEM
/*
- * Initialize the hole in the range [zone_end_pfn, section_end].
- * If zone boundary falls in the middle of a section, this hole
- * will be re-initialized during the call to this function for the
- * higher zone.
+ * Initialize the memory map for hole in the range [memory_end,
+ * section_end].
+ * Append the pages in this hole to the highest zone in the last
+ * node.
+ * The call to init_unavailable_range() is outside the ifdef to
+ * silence the compiler warining about zone_id set but not used;
+ * for FLATMEM it is a nop anyway
*/
- end_pfn = round_up(zone_end_pfn, PAGES_PER_SECTION);
+ end_pfn = round_up(end_pfn, PAGES_PER_SECTION);
if (hole_pfn < end_pfn)
- pgcnt += init_unavailable_range(hole_pfn, end_pfn,
- zone_id, nid);
#endif
-
- if (pgcnt)
- pr_info(" %s zone: %llu pages in unavailable ranges\n",
- zone->name, pgcnt);
+ init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
}
static int zone_batchsize(struct zone *zone)
@@ -7254,7 +7275,6 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
set_pageblock_order();
setup_usemap(zone);
init_currently_empty_zone(zone, zone->zone_start_pfn, size);
- memmap_init_zone(zone);
}
}
@@ -7780,6 +7800,8 @@ void __init free_area_init(unsigned long *max_zone_pfn)
node_set_state(nid, N_MEMORY);
check_for_memory(pgdat, nid);
}
+
+ memmap_init();
}
static int __init cmdline_parse_core(char *p, unsigned long *core,
Hi,
At some point the USB 3 port broke on Rock64 boards; users report it
working back on 4.4 kernels, but by 5.3 it wasn't any more (eg,
https://forum.armbian.com/topic/12439-rock64-usb-3-broken/).
These two patches add a USB3 node to the device-tree, which allows it
to work again. I've tested a gigabit nic and was able to get
close to 1000mbit speeds over the USB 3 ports. I'd love to see
these two unintrusive patches added to 5.10.
Thanks,
Andres
Cameron Nemo (2):
arm64: dts: rockchip: add rk3328 dwc3 usb controller node
arm64: dts: rockchip: Enable USB3 for rk3328 Rock64
arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 5 +++++
arch/arm64/boot/dts/rockchip/rk3328.dtsi | 19 +++++++++++++++++++
2 files changed, 24 insertions(+)
LKFT have noticed these warnings / errors when we have updated gcc version from
gcc-9 to gcc-11 on stable-rc linux-5.4.y branch. I have provided the steps to
reproduce in this email below.
Following perf builds failed with gcc-11 with linux-5.4.y branch.
- build-arm-gcc-11-perf
- build-arm64-gcc-11-perf
- build-i386-gcc-11-perf
- build-x86-gcc-11-perf
Build error log:
--------------------
find: 'x86_64-linux-gnu-gcc/arch': No such file or directory
error: Found argument '-I' which wasn't expected, or isn't valid in this context
USAGE:
sccache [FLAGS] [OPTIONS] [cmd]...
For more information try --help
error: Found argument '-I' which wasn't expected, or isn't valid in this context
USAGE:
sccache [FLAGS] [OPTIONS] [cmd]...
For more information try --help
In function 'ready',
inlined from 'sender' at bench/sched-messaging.c:87:2:
bench/sched-messaging.c:73:13: error: 'dummy' may be used
uninitialized [-Werror=maybe-uninitialized]
73 | if (write(ready_out, &dummy, 1) != 1)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from bench/sched-messaging.c:22:
bench/sched-messaging.c: In function 'sender':
/usr/x86_64-linux-gnu/include/unistd.h:366:16: note: by argument 2 of
type 'const void *' to 'write' declared here
366 | extern ssize_t write (int __fd, const void *__buf, size_t __n) __wur;
| ^~~~~
bench/sched-messaging.c:69:14: note: 'dummy' declared here
69 | char dummy;
| ^~~~~
cc1: all warnings being treated as errors
ref:
https://builds.tuxbuild.com/1vEIWryaujwVtL4wmodXkz1djUa/https://builds.tuxbuild.com/1vEIX7NTo5OpaN9nrs2UvO74oxB/
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Steps to reproduce:
---------------------------
tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-11
--kconfig defconfig --kconfig-add
https://builds.tuxbuild.com/1vEIWryaujwVtL4wmodXkz1djUa/config headers
kernel modules perf
--
Linaro LKFT
https://lkft.linaro.org
Dear Greg,
Our customers have feedback some similar issues as below link on Android
kernel-4.14 and kernel-4.19.
Link: https://marc.info/?l=linux-kernel&m=138695698516487
They've reported system become abnormal when OTG U-disk has been plugged
out after the system has been suspended.
After debugging, we've found the root cause is the same of the issue
(link) has been reported. We've also tested the patch "bdi: Do not use
freezable workqueue" is worked.
Link: https://lkml.org/lkml/2019/10/4/176
commit id in Linus tree: a2b90f11217790ec0964ba9c93a4abb369758c26
However, we've checked that patch seems hasn't been applied to stable
tree (We've checked 4.14 and 4.19). Would you please help to cherry-pick
this patch to stable trees (and to Android trees)?
Thanks!
Macpaul Lin
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c8671c7dc7d51125ab9f651697866bf4a9132277 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel(a)suse.de>
Date: Mon, 26 Apr 2021 10:17:48 +0200
Subject: [PATCH] crypto: ccp - Annotate SEV Firmware file names
Annotate the firmware files CCP might need using MODULE_FIRMWARE().
This will get them included into an initrd when CCP is also included
there. Otherwise the CCP module will not find its firmware when loaded
before the root-fs is mounted.
This can cause problems when the pre-loaded SEV firmware is too old to
support current SEV and SEV-ES virtualization features.
Fixes: e93720606efd ("crypto: ccp - Allow SEV firmware to be chosen based on Family and Model")
Cc: stable(a)vger.kernel.org # v4.20+
Acked-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 3506b2050fb8..91808402e0bf 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -43,6 +43,10 @@ static int psp_probe_timeout = 5;
module_param(psp_probe_timeout, int, 0644);
MODULE_PARM_DESC(psp_probe_timeout, " default timeout value, in seconds, during PSP device probe");
+MODULE_FIRMWARE("amd/amd_sev_fam17h_model0xh.sbin"); /* 1st gen EPYC */
+MODULE_FIRMWARE("amd/amd_sev_fam17h_model3xh.sbin"); /* 2nd gen EPYC */
+MODULE_FIRMWARE("amd/amd_sev_fam19h_model0xh.sbin"); /* 3rd gen EPYC */
+
static bool psp_dead;
static int psp_timeout;
From: Eric Biggers <ebiggers(a)google.com>
commit 77f30bfcfcf484da7208affd6a9e63406420bf91 upstream.
[Please apply to 5.4-stable, 4.19-stable, and 4.14-stable.]
When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the
minor_hash to 0 if the (major) hash is 0.
This doesn't make sense because 0 is a valid hash code, so we shouldn't
ignore the filesystem-provided minor_hash in that case. Fix this by
removing the special case for 'hash == 0'.
This is an old bug that appears to have originated when the encryption
code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and
f2fs code passed the hash by pointer instead of by value. So
'if (hash)' actually made sense then, as it was checking whether a
pointer was NULL. But now the hashes are passed by value, and
filesystems just pass 0 for any hashes they don't have. There is no
need to handle this any differently from the hashes actually being 0.
It is difficult to reproduce this bug, as it only made a difference in
the case where a filename's 32-bit major hash happened to be 0.
However, it probably had the largest chance of causing problems on
ubifs, since ubifs uses minor_hash to do lookups of no-key names, in
addition to using it as a readdir cookie. ext4 only uses minor_hash as
a readdir cookie, and f2fs doesn't use minor_hash at all.
Fixes: 0b81d0779072 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto")
Cc: <stable(a)vger.kernel.org> # v4.6+
Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/crypto/fname.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 3da3707c10e3..891328f09b3c 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -273,13 +273,8 @@ int fscrypt_fname_disk_to_usr(struct inode *inode,
oname->name);
return 0;
}
- if (hash) {
- digested_name.hash = hash;
- digested_name.minor_hash = minor_hash;
- } else {
- digested_name.hash = 0;
- digested_name.minor_hash = 0;
- }
+ digested_name.hash = hash;
+ digested_name.minor_hash = minor_hash;
memcpy(digested_name.digest,
FSCRYPT_FNAME_DIGEST(iname->name, iname->len),
FSCRYPT_FNAME_DIGEST_SIZE);
--
2.32.0
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1421ec684a43379b2aa3cfda20b03d38282dc990 Mon Sep 17 00:00:00 2001
From: Xiaochen Shen <xiaochen.shen(a)intel.com>
Date: Thu, 27 May 2021 17:31:53 +0800
Subject: [PATCH] selftests/resctrl: Fix incorrect parsing of option "-t"
Resctrl test suite accepts command line argument "-t" to specify the
unit tests to run in the test list (e.g., -t mbm,mba,cmt,cat) as
documented in the help.
When calling strtok() to parse the option, the incorrect delimiters
argument ":\t" is used. As a result, passing "-t mbm,mba,cmt,cat" throws
an invalid option error.
Fix this by using delimiters argument "," instead of ":\t" for parsing
of unit tests list. At the same time, remove the unnecessary "spaces"
between the unit tests in help documentation to prevent confusion.
Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
Fixes: 78941183d1b1 ("selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest")
Fixes: ecdbb911f22d ("selftests/resctrl: Add MBM test")
Fixes: 034c7678dd2c ("selftests/resctrl: Add README for resctrl tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Xiaochen Shen <xiaochen.shen(a)intel.com>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
diff --git a/tools/testing/selftests/resctrl/README b/tools/testing/selftests/resctrl/README
index 4b36b25b6ac0..3d2bbd4fa3aa 100644
--- a/tools/testing/selftests/resctrl/README
+++ b/tools/testing/selftests/resctrl/README
@@ -47,7 +47,7 @@ Parameter '-h' shows usage information.
usage: resctrl_tests [-h] [-b "benchmark_cmd [options]"] [-t test list] [-n no_of_bits]
-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CMT default benchmark is builtin fill_buf
- -t test list: run tests specified in the test list, e.g. -t mbm, mba, cmt, cat
+ -t test list: run tests specified in the test list, e.g. -t mbm,mba,cmt,cat
-n no_of_bits: run cache tests using specified no of bits in cache bit mask
-p cpu_no: specify CPU number to run the test. 1 is default
-h: help
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index f51b5fc066a3..973f09a66e1e 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -40,7 +40,7 @@ static void cmd_help(void)
printf("\t-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CMT\n");
printf("\t default benchmark is builtin fill_buf\n");
printf("\t-t test list: run tests specified in the test list, ");
- printf("e.g. -t mbm, mba, cmt, cat\n");
+ printf("e.g. -t mbm,mba,cmt,cat\n");
printf("\t-n no_of_bits: run cache tests using specified no of bits in cache bit mask\n");
printf("\t-p cpu_no: specify CPU number to run the test. 1 is default\n");
printf("\t-h: help\n");
@@ -173,7 +173,7 @@ int main(int argc, char **argv)
return -1;
}
- token = strtok(NULL, ":\t");
+ token = strtok(NULL, ",");
}
break;
case 'p':
Hi Greg,
Resume of the Tegra194 PCI controller was broken in Linux v5.11 but has
now been fixed for Linux v5.14. Unfortunately, we have missed v5.11 now,
but please can you pull the following commit into Linux v5.12.y and
Linux v5.13.y stable branches?
c4bf1f25c6c1 PCI: tegra194: Fix host initialization during resume
Thanks!
Jon
--
nvpublic
From: Nicolas Saenz Julienne <nsaenzju(a)redhat.com>
31cd0e119d50 ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.
In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward. Ultimately breaking the logic above.
A simplified example:
- Upon entering get_next_timer_interrupt() with:
jiffies = 1
base->clk = 0;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
returns KTIME_MAX.
- 'base->clk' is updated to the jiffies value.
- The next time we enter get_next_timer_interrupt(), taking into account
no timer operations happened:
base->clk = 1;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
returns a valid expire time, which is incorrect.
This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].
So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().
[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
'struct timer_base' any bigger.
Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju(a)redhat.com>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
---
kernel/time/timer.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 3fadb58fc9d7..9eb11c2209e5 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -207,6 +207,7 @@ struct timer_base {
unsigned int cpu;
bool next_expiry_recalc;
bool is_idle;
+ bool timers_pending;
DECLARE_BITMAP(pending_map, WHEEL_SIZE);
struct hlist_head vectors[WHEEL_SIZE];
} ____cacheline_aligned;
@@ -595,6 +596,7 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
* can reevaluate the wheel:
*/
base->next_expiry = bucket_expiry;
+ base->timers_pending = true;
base->next_expiry_recalc = false;
trigger_dyntick_cpu(base, timer);
}
@@ -1582,6 +1584,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
}
base->next_expiry_recalc = false;
+ base->timers_pending = !(next == base->clk + NEXT_TIMER_MAX_DELTA);
return next;
}
@@ -1633,7 +1636,6 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
u64 expires = KTIME_MAX;
unsigned long nextevt;
- bool is_max_delta;
/*
* Pretend that there is no timer pending if the cpu is offline.
@@ -1646,7 +1648,6 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
if (base->next_expiry_recalc)
base->next_expiry = __next_timer_interrupt(base);
nextevt = base->next_expiry;
- is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA);
/*
* We have a fresh next event. Check whether we can forward the
@@ -1664,7 +1665,7 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
expires = basem;
base->is_idle = false;
} else {
- if (!is_max_delta)
+ if (base->timers_pending)
expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
/*
* If we expect to sleep more than a tick, mark the base idle.
@@ -1947,6 +1948,7 @@ int timers_prepare_cpu(unsigned int cpu)
base = per_cpu_ptr(&timer_bases[b], cpu);
base->clk = jiffies;
base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->timers_pending = false;
base->is_idle = false;
}
return 0;
--
2.25.1
Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:
CPU 0 CPU 1
----- -----
timer_settime(TIMER B)
posix_cpu_timer_rearm(TIMER A)
cpu_clock_sample_group()
(pct->timers_active already true)
handle_posix_cpu_timers()
check_process_timers()
stop_process_timers()
pct->timers_active = false
arm_timer(TIMER A)
tick -> run_posix_cpu_timers()
// sees !pct->timers_active, ignore
// our TIMER A
Fix this with simply locking process wide cputime counting start and
timer arm in the same block.
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: stable(a)vger.kernel.org
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
---
kernel/time/posix-cpu-timers.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 29a5e54e6e10..517be7fd175e 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -991,6 +991,11 @@ static void posix_cpu_timer_rearm(struct k_itimer *timer)
if (!p)
goto out;
+ /* Protect timer list r/w in arm_timer() */
+ sighand = lock_task_sighand(p, &flags);
+ if (unlikely(sighand == NULL))
+ goto out;
+
/*
* Fetch the current sample and update the timer's expiry time.
*/
@@ -1001,11 +1006,6 @@ static void posix_cpu_timer_rearm(struct k_itimer *timer)
bump_cpu_timer(timer, now);
- /* Protect timer list r/w in arm_timer() */
- sighand = lock_task_sighand(p, &flags);
- if (unlikely(sighand == NULL))
- goto out;
-
/*
* Now re-arm for the new expiry time.
*/
--
2.25.1
The compatibility issue between linux exfat and exfat of some camera
company was reported from Florian. In their exfat, if the number of files
exceeds any limit, the DataLength in stream entry of the directory is
no longer updated. So some files created from camera does not show in
linux exfat. because linux exfat doesn't allow that cpos becomes larger
than DataLength of stream entry. This patch check DataLength in stream
entry only if the type is ALLOC_NO_FAT_CHAIN and add the check ensure
that dentry offset does not exceed max dentries size(256 MB) to avoid
the circular FAT chain issue.
Fixes: ca06197382bd ("exfat: add directory operations")
Cc: stable(a)vger.kernel.org # v5.9
Reported-by: Florian Cramer <flrncrmr(a)gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon(a)samsung.com>
---
fs/exfat/dir.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index c4523648472a..f4e4d8d9894d 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -63,7 +63,7 @@ static void exfat_get_uniname_from_ext_entry(struct super_block *sb,
static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_entry *dir_entry)
{
int i, dentries_per_clu, dentries_per_clu_bits = 0, num_ext;
- unsigned int type, clu_offset;
+ unsigned int type, clu_offset, max_dentries;
sector_t sector;
struct exfat_chain dir, clu;
struct exfat_uni_name uni_name;
@@ -86,6 +86,8 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
dentries_per_clu = sbi->dentries_per_clu;
dentries_per_clu_bits = ilog2(dentries_per_clu);
+ max_dentries = (unsigned int)min_t(u64, MAX_EXFAT_DENTRIES,
+ (u64)sbi->num_clusters << dentries_per_clu_bits);
clu_offset = dentry >> dentries_per_clu_bits;
exfat_chain_dup(&clu, &dir);
@@ -109,7 +111,7 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
}
}
- while (clu.dir != EXFAT_EOF_CLUSTER) {
+ while (clu.dir != EXFAT_EOF_CLUSTER && dentry < max_dentries) {
i = dentry & (dentries_per_clu - 1);
for ( ; i < dentries_per_clu; i++, dentry++) {
@@ -245,7 +247,7 @@ static int exfat_iterate(struct file *filp, struct dir_context *ctx)
if (err)
goto unlock;
get_new:
- if (cpos >= i_size_read(inode))
+ if (ei->flags == ALLOC_NO_FAT_CHAIN && cpos >= i_size_read(inode))
goto end_of_dir;
err = exfat_readdir(inode, &cpos, &de);
--
2.17.1
From: Tian Tao <tiantao6(a)hisilicon.com>
[ Upstream commit 7d614ab2f20503ed8766363d41f8607337571adf ]
fixed the below warning:
drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c:84:2-8: WARNING: NULL check
before some freeing functions is not needed.
Signed-off-by: Tian Tao <tiantao6(a)hisilicon.com>
Acked-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Lucas Stach <l.stach(a)pengutronix.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
index f21529e635e3..dd814d42a0f9 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
@@ -75,8 +75,7 @@ static void etnaviv_gem_prime_release(struct etnaviv_gem_object *etnaviv_obj)
/* Don't drop the pages for imported dmabuf, as they are not
* ours, just free the array we allocated:
*/
- if (etnaviv_obj->pages)
- kvfree(etnaviv_obj->pages);
+ kvfree(etnaviv_obj->pages);
drm_prime_gem_destroy(&etnaviv_obj->base, etnaviv_obj->sgt);
}
--
2.30.2
From: Joao Martins <joao.m.martins(a)oracle.com>
Subject: mm/hugetlb: fix refs calculation from unaligned @vaddr
commit 82e5d378b0e47 ("mm/hugetlb: refactor subpage recording") refactored
the count of subpages but missed an edge case when @vaddr is not aligned
to PAGE_SIZE e.g. when close to vma->vm_end. It would then errousnly set
@refs to 0 and record_subpages_vmas() wouldn't set the @pages array
element to its value, consequently causing the reported null-deref by
syzbot.
Fix it by aligning down @vaddr by PAGE_SIZE in @refs calculation.
Link: https://lkml.kernel.org/r/20210713152440.28650-1-joao.m.martins@oracle.com
Fixes: 82e5d378b0e47 ("mm/hugetlb: refactor subpage recording")
Reported-by: syzbot+a3fcd59df1b372066f5a(a)syzkaller.appspotmail.com
Signed-off-by: Joao Martins <joao.m.martins(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-refs-calculation-from-unaligned-vaddr
+++ a/mm/hugetlb.c
@@ -5440,8 +5440,9 @@ long follow_hugetlb_page(struct mm_struc
continue;
}
- refs = min3(pages_per_huge_page(h) - pfn_offset,
- (vma->vm_end - vaddr) >> PAGE_SHIFT, remainder);
+ /* vaddr may not be aligned to PAGE_SIZE */
+ refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+ (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
if (pages || vmas)
record_subpages_vmas(mem_map_offset(page, pfn_offset),
_
The patch titled
Subject: mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
has been added to the -mm tree. Its filename is
mm-memory_hotplug-use-unsigned-long-for-pfn-in-zone_for_pfn_range.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-use-unsigned-lo…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-use-unsigned-lo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"
These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.
These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.
[1] https://lkml.kernel.org/r/20210607195430.48228-1-david@redhat.com
This patch (of 4):
Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.
Use "unsigned long" instead, just as we do in other places when handling
PFNs. This can bite us once we have physical addresses in the range of
multiple TB.
Link: https://lkml.kernel.org/r/20210712124052.26491-2-david@redhat.com
Fixes: e5e689302633 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta(a)ionos.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Cc: Wei Yang <richard.weiyang(a)linux.alibaba.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: "Rafael J. Wysocki" <rjw(a)rjwysocki.net>
Cc: Len Brown <lenb(a)kernel.org>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: virtualization(a)lists.linux-foundation.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)c-s.fr>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jia He <justin.he(a)arm.com>
Cc: Joe Perches <joe(a)perches.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Laurent Dufour <ldufour(a)linux.ibm.com>
Cc: Michel Lespinasse <michel(a)lespinasse.org>
Cc: Nathan Lynch <nathanl(a)linux.ibm.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Pierre Morel <pmorel(a)linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Scott Cheloha <cheloha(a)linux.ibm.com>
Cc: Sergei Trofimovich <slyfox(a)gentoo.org>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yoshinori Sato <ysato(a)users.sourceforge.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memory_hotplug.h | 4 ++--
mm/memory_hotplug.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-use-unsigned-long-for-pfn-in-zone_for_pfn_range
+++ a/include/linux/memory_hotplug.h
@@ -339,8 +339,8 @@ extern void sparse_remove_section(struct
unsigned long map_offset, struct vmem_altmap *altmap);
extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
unsigned long pnum);
-extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
- unsigned long nr_pages);
+extern struct zone *zone_for_pfn_range(int online_type, int nid,
+ unsigned long start_pfn, unsigned long nr_pages);
extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
struct mhp_params *params);
void arch_remove_linear_mapping(u64 start, u64 size);
--- a/mm/memory_hotplug.c~mm-memory_hotplug-use-unsigned-long-for-pfn-in-zone_for_pfn_range
+++ a/mm/memory_hotplug.c
@@ -708,8 +708,8 @@ static inline struct zone *default_zone_
return movable_node_enabled ? movable_zone : kernel_zone;
}
-struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
- unsigned long nr_pages)
+struct zone *zone_for_pfn_range(int online_type, int nid,
+ unsigned long start_pfn, unsigned long nr_pages)
{
if (online_type == MMOP_ONLINE_KERNEL)
return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
_
Patches currently in -mm which might be from david(a)redhat.com are
memory-hotplugrst-remove-locking-details-from-admin-guide.patch
memory-hotplugrst-complete-admin-guide-overhaul.patch
mm-memory_hotplug-use-unsigned-long-for-pfn-in-zone_for_pfn_range.patch
mm-memory_hotplug-remove-nid-parameter-from-arch_remove_memory.patch
mm-memory_hotplug-remove-nid-parameter-from-remove_memory-and-friends.patch
acpi-memhotplug-memory-resources-cannot-be-enabled-yet.patch
The patch titled
Subject: mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-page_poison=1-init_on_alloc_default_on-interaction.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Sergei Trofimovich <slyfox(a)gentoo.org>
Subject: mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction
To reproduce the failure we need the following system:
- kernel command: page_poison=1 init_on_free=0 init_on_alloc=0
- kernel config:
* CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
* CONFIG_INIT_ON_FREE_DEFAULT_ON=y
* CONFIG_PAGE_POISONING=y
0000000085629bdd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000000022861832: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000000c597f5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G U O 5.13.1-gentoo-x86_64 #1
Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
Call Trace:
dump_stack+0x64/0x7c
__kernel_unpoison_pages.cold+0x48/0x84
post_alloc_hook+0x60/0xa0
get_page_from_freelist+0xdb8/0x1000
__alloc_pages+0x163/0x2b0
__get_free_pages+0xc/0x30
pgd_alloc+0x2e/0x1a0
? dup_mm+0x37/0x4f0
mm_init+0x185/0x270
dup_mm+0x6b/0x4f0
? __lock_task_sighand+0x35/0x70
copy_process+0x190d/0x1b10
kernel_clone+0xba/0x3b0
__do_sys_clone+0x8f/0xb0
do_syscall_64+0x68/0x80
? do_syscall_64+0x11/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Before the 51cba1eb ("init_on_alloc: Optimize static branches")
init_on_alloc never enabled static branch by default. It could only be
enabed explicitly by init_mem_debugging_and_hardening().
But after the 51cba1eb static branch could already be enabled by default.
There was no code to ever disable it. That caused page_poison=1 /
init_on_free=1 conflict.
This change extends init_mem_debugging_and_hardening() to also disable
static branch disabling.
Link: https://lkml.kernel.org/r/20210712215816.1512739-1-slyfox@gentoo.org
Fixes: 51cba1eb ("init_on_alloc: Optimize static branches")
Signed-off-by: Sergei Trofimovich <slyfox(a)gentoo.org>
Reported-by: <bowsingbetee(a)pm.me>
Reported-by: Mikhail Morfikov
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-page_poison=1-init_on_alloc_default_on-interaction
+++ a/mm/page_alloc.c
@@ -840,18 +840,22 @@ void init_mem_debugging_and_hardening(vo
}
#endif
- if (_init_on_alloc_enabled_early) {
- if (page_poisoning_requested)
+ if (_init_on_alloc_enabled_early ||
+ IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON)) {
+ if (page_poisoning_requested) {
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_alloc\n");
- else
+ static_branch_disable(&init_on_alloc);
+ } else
static_branch_enable(&init_on_alloc);
}
- if (_init_on_free_enabled_early) {
- if (page_poisoning_requested)
+ if (_init_on_free_enabled_early ||
+ IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON)) {
+ if (page_poisoning_requested) {
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_free\n");
- else
+ static_branch_disable(&init_on_free);
+ } else
static_branch_enable(&init_on_free);
}
_
Patches currently in -mm which might be from slyfox(a)gentoo.org are
The patch titled
Subject: mm/hugetlb: fix refs calculation from unaligned @vaddr
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-refs-calculation-from-unaligned-vaddr.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-refs-calculation-f…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-refs-calculation-f…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Joao Martins <joao.m.martins(a)oracle.com>
Subject: mm/hugetlb: fix refs calculation from unaligned @vaddr
commit 82e5d378b0e47 ("mm/hugetlb: refactor subpage recording") refactored
the count of subpages but missed an edge case when @vaddr is not aligned
to PAGE_SIZE e.g. when close to vma->vm_end. It would then errousnly set
@refs to 0 and record_subpages_vmas() wouldn't set the @pages array
element to its value, consequently causing the reported null-deref by
syzbot.
Fix it by aligning down @vaddr by PAGE_SIZE in @refs calculation.
Link: https://lkml.kernel.org/r/20210713152440.28650-1-joao.m.martins@oracle.com
Fixes: 82e5d378b0e47 ("mm/hugetlb: refactor subpage recording")
Reported-by: syzbot+a3fcd59df1b372066f5a(a)syzkaller.appspotmail.com
Signed-off-by: Joao Martins <joao.m.martins(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-refs-calculation-from-unaligned-vaddr
+++ a/mm/hugetlb.c
@@ -5440,8 +5440,9 @@ long follow_hugetlb_page(struct mm_struc
continue;
}
- refs = min3(pages_per_huge_page(h) - pfn_offset,
- (vma->vm_end - vaddr) >> PAGE_SHIFT, remainder);
+ /* vaddr may not be aligned to PAGE_SIZE */
+ refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+ (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
if (pages || vmas)
record_subpages_vmas(mem_map_offset(page, pfn_offset),
_
Patches currently in -mm which might be from joao.m.martins(a)oracle.com are
mm-hugetlb-fix-refs-calculation-from-unaligned-vaddr.patch
The patch titled
Subject: mm/hugetlb: fix refs calculation from unaligned @vaddr
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-refs-calculation-from-unaligned-vaddr.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Joao Martins <joao.m.martins(a)oracle.com>
Subject: mm/hugetlb: fix refs calculation from unaligned @vaddr
commit 82e5d378b0e47 ("mm/hugetlb: refactor subpage recording") refactored
the count of subpages but missed an edge case when @vaddr is not aligned
to PAGE_SIZE e.g. when close to vma->vm_end. It would then errousnly set
@refs to 0 and record_subpages_vmas() wouldn't set the @pages array
element to its value, consequently causing the reported null-deref by
syzbot.
Fix it by aligning down @vaddr by PAGE_SIZE in @refs calculation.
Link: https://lkml.kernel.org/r/20210713152440.28650-1-joao.m.martins@oracle.com
Fixes: 82e5d378b0e47 ("mm/hugetlb: refactor subpage recording")
Reported-by: syzbot+a3fcd59df1b372066f5a(a)syzkaller.appspotmail.com
Signed-off-by: Joao Martins <joao.m.martins(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-refs-calculation-from-unaligned-vaddr
+++ a/mm/hugetlb.c
@@ -5440,8 +5440,9 @@ long follow_hugetlb_page(struct mm_struc
continue;
}
- refs = min3(pages_per_huge_page(h) - pfn_offset,
- (vma->vm_end - vaddr) >> PAGE_SHIFT, remainder);
+ /* vaddr may not be aligned to PAGE_SIZE */
+ refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+ (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
if (pages || vmas)
record_subpages_vmas(mem_map_offset(page, pfn_offset),
_
Patches currently in -mm which might be from joao.m.martins(a)oracle.com are
FPU_STATUS register contains FP exception flags bits which are updated
as side-effect of FP instructions but can also be manually wiggled such
as by glibc C99 functions fe{raise,clear,test}except() etc.
To effect the update, the programming model requires OR'ing FWE
bit(231). This bit is write-only and RAZ, meaning it is effectively
auto-cleared after a write and thus needs to be set everytime which
is how glibc implements this.
However there's another usecase of FPU_STATUS update, at the time of
Linux task switch when incoming task value needs to be programmed into
the register. This was added as part of f45ba2bd6da0dc ("ARCv2:
fpu: preserve userspace fpu state") which however missing the OR'ing
with FWE bit, meaning the new value is not effectively being written at
all, which is what this patch fixes. This was not caught in interm glibc
testing as the race window which relies on a specific exception bit to be
set/clear is really small and will end up causing extremely hard to
reproduce/debug issues.
Fortunately this was caught by glibc's math/test-fenv-tls test which
repeatedly set/clear exception flags in a big loop, concurrently in main
program and also in a thread.
Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/54
Fixes: f45ba2bd6da0dc ("ARCv2: fpu: preserve userspace fpu state")
Cc: stable(a)vger.kernel.org #5.6+
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
---
arch/arc/kernel/fpu.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arc/kernel/fpu.c b/arch/arc/kernel/fpu.c
index c67c0f0f5f77..ec640219d989 100644
--- a/arch/arc/kernel/fpu.c
+++ b/arch/arc/kernel/fpu.c
@@ -57,23 +57,26 @@ void fpu_save_restore(struct task_struct *prev, struct task_struct *next)
void fpu_init_task(struct pt_regs *regs)
{
+ const unsigned int fwe = 0x80000000;
+
/* default rounding mode */
write_aux_reg(ARC_REG_FPU_CTRL, 0x100);
- /* set "Write enable" to allow explicit write to exception flags */
- write_aux_reg(ARC_REG_FPU_STATUS, 0x80000000);
+ /* Initialize to zero: setting requires FWE be set */
+ write_aux_reg(ARC_REG_FPU_STATUS, fwe);
}
void fpu_save_restore(struct task_struct *prev, struct task_struct *next)
{
struct arc_fpu *save = &prev->thread.fpu;
struct arc_fpu *restore = &next->thread.fpu;
+ const unsigned int fwe = 0x80000000;
save->ctrl = read_aux_reg(ARC_REG_FPU_CTRL);
save->status = read_aux_reg(ARC_REG_FPU_STATUS);
write_aux_reg(ARC_REG_FPU_CTRL, restore->ctrl);
- write_aux_reg(ARC_REG_FPU_STATUS, restore->status);
+ write_aux_reg(ARC_REG_FPU_STATUS, (fwe | restore->status));
}
#endif
--
2.25.1
The patch titled
Subject: kfence: move the size check to the beginning of __kfence_alloc()
has been added to the -mm tree. Its filename is
kfence-move-the-size-check-to-the-beginning-of-__kfence_alloc.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/kfence-move-the-size-check-to-the…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/kfence-move-the-size-check-to-the…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kfence: move the size check to the beginning of __kfence_alloc()
Check the allocation size before toggling kfence_allocation_gate. This
way allocations that can't be served by KFENCE will not result in waiting
for another CONFIG_KFENCE_SAMPLE_INTERVAL without allocating anything.
Link: https://lkml.kernel.org/r/20210714092222.1890268-1-glider@google.com
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Suggested-by: Marco Elver <elver(a)google.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/mm/kfence/core.c~kfence-move-the-size-check-to-the-beginning-of-__kfence_alloc
+++ a/mm/kfence/core.c
@@ -734,6 +734,13 @@ void kfence_shutdown_cache(struct kmem_c
void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
/*
+ * Perform size check before switching kfence_allocation_gate, so that
+ * we don't disable KFENCE without making an allocation.
+ */
+ if (size > PAGE_SIZE)
+ return NULL;
+
+ /*
* allocation_gate only needs to become non-zero, so it doesn't make
* sense to continue writing to it and pay the associated contention
* cost, in case we have a large number of concurrent allocations.
@@ -757,9 +764,6 @@ void *__kfence_alloc(struct kmem_cache *
if (!READ_ONCE(kfence_enabled))
return NULL;
- if (size > PAGE_SIZE)
- return NULL;
-
return kfence_guarded_alloc(s, size, flags);
}
_
Patches currently in -mm which might be from glider(a)google.com are
kfence-move-the-size-check-to-the-beginning-of-__kfence_alloc.patch
kfence-skip-all-gfp_zonemask-allocations.patch
The patch titled
Subject: selftest: use mmap instead of posix_memalign to allocate memory
has been added to the -mm tree. Its filename is
selftest-use-mmap-instead-of-posix_memalign-to-allocate-memory.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/selftest-use-mmap-instead-of-posi…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/selftest-use-mmap-instead-of-posi…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Collingbourne <pcc(a)google.com>
Subject: selftest: use mmap instead of posix_memalign to allocate memory
This test passes pointers obtained from anon_allocate_area to the
userfaultfd and mremap APIs. This causes a problem if the system
allocator returns tagged pointers because with the tagged address ABI the
kernel rejects tagged addresses passed to these APIs, which would end up
causing the test to fail. To make this test compatible with such system
allocators, stop using the system allocator to allocate memory in
anon_allocate_area, and instead just use mmap.
Link: https://lkml.kernel.org/r/20210714195437.118982-3-pcc@google.com
Link: https://linux-review.googlesource.com/id/Icac91064fcd923f77a83e8e133f8631c5…
Fixes: c47174fc362a ("userfaultfd: selftest")
Co-developed-by: Lokesh Gidra <lokeshgidra(a)google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra(a)google.com>
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Alistair Delva <adelva(a)google.com>
Cc: William McVicker <willmcvicker(a)google.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: Mitch Phillips <mitchp(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.4]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/vm/userfaultfd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/vm/userfaultfd.c~selftest-use-mmap-instead-of-posix_memalign-to-allocate-memory
+++ a/tools/testing/selftests/vm/userfaultfd.c
@@ -210,8 +210,10 @@ static void anon_release_pages(char *rel
static void anon_allocate_area(void **alloc_area)
{
- if (posix_memalign(alloc_area, page_size, nr_pages * page_size))
- err("posix_memalign() failed");
+ *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+ if (*alloc_area == MAP_FAILED)
+ err("mmap of anonymous memory failed");
}
static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
_
Patches currently in -mm which might be from pcc(a)google.com are
userfaultfd-do-not-untag-user-pointers.patch
selftest-use-mmap-instead-of-posix_memalign-to-allocate-memory.patch
The patch titled
Subject: userfaultfd: do not untag user pointers
has been added to the -mm tree. Its filename is
userfaultfd-do-not-untag-user-pointers.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/userfaultfd-do-not-untag-user-poi…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd-do-not-untag-user-poi…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Collingbourne <pcc(a)google.com>
Subject: userfaultfd: do not untag user pointers
Patch series "userfaultfd: do not untag user pointers", v5.
If a user program uses userfaultfd on ranges of heap memory, it may end up
passing a tagged pointer to the kernel in the range.start field of the
UFFDIO_REGISTER ioctl. This can happen when using an MTE-capable
allocator, or on Android if using the Tagged Pointers feature for MTE
readiness [1].
When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field of struct
uffd_msg. However, from the application's perspective, the tagged address
*is* the memory address, so if the application is unaware of memory tags,
it may get confused by receiving an address that is, from its point of
view, outside of the bounds of the allocation. We observed this behavior
in the kselftest for userfaultfd [2] but other applications could have the
same problem.
Address this by not untagging pointers passed to the userfaultfd ioctls.
Instead, let the system call fail. Also change the kselftest to use mmap
so that it doesn't encounter this problem.
[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c
This patch (of 2):
If a user program uses userfaultfd on ranges of heap memory, it may end up
passing a tagged pointer to the kernel in the range.start field of the
UFFDIO_REGISTER ioctl. This can happen when using an MTE-capable
allocator, or on Android if using the Tagged Pointers feature for MTE
readiness [1].
When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field of struct
uffd_msg. However, from the application's perspective, the tagged address
*is* the memory address, so if the application is unaware of memory tags,
it may get confused by receiving an address that is, from its point of
view, outside of the bounds of the allocation. We observed this behavior
in the kselftest for userfaultfd [2] but other applications could have the
same problem.
Address this by not untagging pointers passed to the userfaultfd ioctls.
Instead, let the system call fail. This will provide an early indication
of problems with tag-unaware userspace code instead of letting the code
get confused later, and is consistent with how we decided to handle
brk/mmap/mremap in commit dcde237319e6 ("mm: Avoid creating virtual
address aliases in brk()/mmap()/mremap()"), as well as being consistent
with the existing tagged address ABI documentation relating to how ioctl
arguments are handled.
The code change is a revert of commit 7d0325749a6c ("userfaultfd: untag
user pointers") plus some fixups to some additional calls to
validate_range that have appeared since then.
[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c
Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com
Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0…
Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Alistair Delva <adelva(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Mitch Phillips <mitchp(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: William McVicker <willmcvicker(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.4]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/arm64/tagged-address-abi.rst | 26 +++++++++++++------
fs/userfaultfd.c | 26 ++++++++-----------
2 files changed, 30 insertions(+), 22 deletions(-)
--- a/Documentation/arm64/tagged-address-abi.rst~userfaultfd-do-not-untag-user-pointers
+++ a/Documentation/arm64/tagged-address-abi.rst
@@ -45,14 +45,24 @@ how the user addresses are used by the k
1. User addresses not accessed by the kernel but used for address space
management (e.g. ``mprotect()``, ``madvise()``). The use of valid
- tagged pointers in this context is allowed with the exception of
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
- ``mremap()`` as these have the potential to alias with existing
- user addresses.
-
- NOTE: This behaviour changed in v5.6 and so some earlier kernels may
- incorrectly accept valid tagged pointers for the ``brk()``,
- ``mmap()`` and ``mremap()`` system calls.
+ tagged pointers in this context is allowed with these exceptions:
+
+ - ``brk()``, ``mmap()`` and the ``new_address`` argument to
+ ``mremap()`` as these have the potential to alias with existing
+ user addresses.
+
+ NOTE: This behaviour changed in v5.6 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for the ``brk()``,
+ ``mmap()`` and ``mremap()`` system calls.
+
+ - The ``range.start``, ``start`` and ``dst`` arguments to the
+ ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from
+ ``userfaultfd()``, as fault addresses subsequently obtained by reading
+ the file descriptor will be untagged, which may otherwise confuse
+ tag-unaware programs.
+
+ NOTE: This behaviour changed in v5.14 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for this system call.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
relaxation is disabled by default and the application thread needs to
--- a/fs/userfaultfd.c~userfaultfd-do-not-untag-user-pointers
+++ a/fs/userfaultfd.c
@@ -1236,23 +1236,21 @@ static __always_inline void wake_userfau
}
static __always_inline int validate_range(struct mm_struct *mm,
- __u64 *start, __u64 len)
+ __u64 start, __u64 len)
{
__u64 task_size = mm->task_size;
- *start = untagged_addr(*start);
-
- if (*start & ~PAGE_MASK)
+ if (start & ~PAGE_MASK)
return -EINVAL;
if (len & ~PAGE_MASK)
return -EINVAL;
if (!len)
return -EINVAL;
- if (*start < mmap_min_addr)
+ if (start < mmap_min_addr)
return -EINVAL;
- if (*start >= task_size)
+ if (start >= task_size)
return -EINVAL;
- if (len > task_size - *start)
+ if (len > task_size - start)
return -EINVAL;
return 0;
}
@@ -1316,7 +1314,7 @@ static int userfaultfd_register(struct u
vm_flags |= VM_UFFD_MINOR;
}
- ret = validate_range(mm, &uffdio_register.range.start,
+ ret = validate_range(mm, uffdio_register.range.start,
uffdio_register.range.len);
if (ret)
goto out;
@@ -1522,7 +1520,7 @@ static int userfaultfd_unregister(struct
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
goto out;
- ret = validate_range(mm, &uffdio_unregister.start,
+ ret = validate_range(mm, uffdio_unregister.start,
uffdio_unregister.len);
if (ret)
goto out;
@@ -1671,7 +1669,7 @@ static int userfaultfd_wake(struct userf
if (copy_from_user(&uffdio_wake, buf, sizeof(uffdio_wake)))
goto out;
- ret = validate_range(ctx->mm, &uffdio_wake.start, uffdio_wake.len);
+ ret = validate_range(ctx->mm, uffdio_wake.start, uffdio_wake.len);
if (ret)
goto out;
@@ -1711,7 +1709,7 @@ static int userfaultfd_copy(struct userf
sizeof(uffdio_copy)-sizeof(__s64)))
goto out;
- ret = validate_range(ctx->mm, &uffdio_copy.dst, uffdio_copy.len);
+ ret = validate_range(ctx->mm, uffdio_copy.dst, uffdio_copy.len);
if (ret)
goto out;
/*
@@ -1768,7 +1766,7 @@ static int userfaultfd_zeropage(struct u
sizeof(uffdio_zeropage)-sizeof(__s64)))
goto out;
- ret = validate_range(ctx->mm, &uffdio_zeropage.range.start,
+ ret = validate_range(ctx->mm, uffdio_zeropage.range.start,
uffdio_zeropage.range.len);
if (ret)
goto out;
@@ -1818,7 +1816,7 @@ static int userfaultfd_writeprotect(stru
sizeof(struct uffdio_writeprotect)))
return -EFAULT;
- ret = validate_range(ctx->mm, &uffdio_wp.range.start,
+ ret = validate_range(ctx->mm, uffdio_wp.range.start,
uffdio_wp.range.len);
if (ret)
return ret;
@@ -1866,7 +1864,7 @@ static int userfaultfd_continue(struct u
sizeof(uffdio_continue) - (sizeof(__s64))))
goto out;
- ret = validate_range(ctx->mm, &uffdio_continue.range.start,
+ ret = validate_range(ctx->mm, uffdio_continue.range.start,
uffdio_continue.range.len);
if (ret)
goto out;
_
Patches currently in -mm which might be from pcc(a)google.com are
userfaultfd-do-not-untag-user-pointers.patch
selftest-use-mmap-instead-of-posix_memalign-to-allocate-memory.patch