The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 111fc9f517cb293c4213673733b980123c3b0209
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091336-family-daffodil-541d@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
111fc9f517cb ("virtio_net: disable premapped mode by default")
dc4547fbba87 ("Revert "virtio_net: rx remove premapped failover code"")
e9f3962441c0 ("virtio_net: xsk: rx: support fill with xsk buffer")
19a5a7710ee1 ("virtio_net: xsk: support wakeup")
09d2b3182c8e ("virtio_net: xsk: bind/unbind xsk for rx")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 111fc9f517cb293c4213673733b980123c3b0209 Mon Sep 17 00:00:00 2001
From: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Date: Fri, 6 Sep 2024 20:31:37 +0800
Subject: [PATCH] virtio_net: disable premapped mode by default
Now, the premapped mode encounters some problem.
http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
So we disable the premapped mode by default.
We can re-enable it in the future.
Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu(a)oracle.com>
Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
Signed-off-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Takero Funaki <flintglass(a)gmail.com>
Link: https://patch.msgid.link/20240906123137.108741-4-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1cf80648f82a..5a1c1ec5a64b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -977,22 +977,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
return buf;
}
-static void virtnet_rq_set_premapped(struct virtnet_info *vi)
-{
- int i;
-
- /* disable for big mode */
- if (!vi->mergeable_rx_bufs && vi->big_packets)
- return;
-
- for (i = 0; i < vi->max_queue_pairs; i++) {
- if (virtqueue_set_dma_premapped(vi->rq[i].vq))
- continue;
-
- vi->rq[i].do_dma = true;
- }
-}
-
static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
{
struct virtnet_info *vi = vq->vdev->priv;
@@ -6105,8 +6089,6 @@ static int init_vqs(struct virtnet_info *vi)
if (ret)
goto err_free;
- virtnet_rq_set_premapped(vi);
-
cpus_read_lock();
virtnet_set_affinity(vi);
cpus_read_unlock();
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 38eef112a8e547b8c207b2a521ad4b077d792100
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091308-deceit-dingy-d575@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
38eef112a8e5 ("Revert "virtio_net: big mode skip the unmap check"")
99c861b44eb1 ("virtio_net: xsk: rx: support recv merge mode")
a4e7ba702701 ("virtio_net: xsk: rx: support recv small mode")
e9f3962441c0 ("virtio_net: xsk: rx: support fill with xsk buffer")
19a5a7710ee1 ("virtio_net: xsk: support wakeup")
09d2b3182c8e ("virtio_net: xsk: bind/unbind xsk for rx")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38eef112a8e547b8c207b2a521ad4b077d792100 Mon Sep 17 00:00:00 2001
From: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Date: Fri, 6 Sep 2024 20:31:36 +0800
Subject: [PATCH] Revert "virtio_net: big mode skip the unmap check"
This reverts commit a377ae542d8d0a20a3173da3bbba72e045bea7a9.
Signed-off-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Takero Funaki <flintglass(a)gmail.com>
Link: https://patch.msgid.link/20240906123137.108741-3-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6fa8aab18484..1cf80648f82a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1006,7 +1006,7 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
return;
}
- if (!vi->big_packets || vi->mergeable_rx_bufs)
+ if (rq->do_dma)
virtnet_rq_unmap(rq, buf, 0);
virtnet_rq_free_buf(vi, rq, buf);
@@ -2716,7 +2716,7 @@ static int virtnet_receive_packets(struct virtnet_info *vi,
}
} else {
while (packets < budget &&
- (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+ (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
receive_buf(vi, rq, buf, len, NULL, xdp_xmit, stats);
packets++;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 33297cef3101d950cec0033a0dce0a2d2bd59999
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091305-rephrase-pastime-be42@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
33297cef3101 ("platform/x86: panasonic-laptop: Allocate 1 entry extra in the sinf array")
f1aaf914654a ("platform/x86: panasonic-laptop: Replace ACPI prints with pr_*() macros")
d5a81d8e864b ("platform/x86: panasonic-laptop: Add support for optical driver power in Y and W series")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 33297cef3101d950cec0033a0dce0a2d2bd59999 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 9 Sep 2024 13:32:26 +0200
Subject: [PATCH] platform/x86: panasonic-laptop: Allocate 1 entry extra in the
sinf array
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Some DSDT-s have an off-by-one bug where the SINF package count is
one higher than the SQTY reported value, allocate 1 entry extra.
Also make the SQTY <-> SINF package count mismatch error more verbose
to help debugging similar issues in the future.
This fixes the panasonic-laptop driver failing to probe() on some
devices with the following errors:
[ 3.958887] SQTY reports bad SINF length SQTY: 37 SINF-pkg-count: 38
[ 3.958892] Couldn't retrieve BIOS data
[ 3.983685] Panasonic Laptop Support - With Macros: probe of MAT0019:00 failed with error -5
Fixes: 709ee531c153 ("panasonic-laptop: add Panasonic Let's Note laptop extras driver v0.94")
Cc: stable(a)vger.kernel.org
Tested-by: James Harmison <jharmison(a)redhat.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240909113227.254470-2-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
diff --git a/drivers/platform/x86/panasonic-laptop.c b/drivers/platform/x86/panasonic-laptop.c
index 39044119d2a6..ebd81846e2d5 100644
--- a/drivers/platform/x86/panasonic-laptop.c
+++ b/drivers/platform/x86/panasonic-laptop.c
@@ -337,7 +337,8 @@ static int acpi_pcc_retrieve_biosdata(struct pcc_acpi *pcc)
}
if (pcc->num_sifr < hkey->package.count) {
- pr_err("SQTY reports bad SINF length\n");
+ pr_err("SQTY reports bad SINF length SQTY: %lu SINF-pkg-count: %u\n",
+ pcc->num_sifr, hkey->package.count);
status = AE_ERROR;
goto end;
}
@@ -994,6 +995,12 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
return -ENODEV;
}
+ /*
+ * Some DSDT-s have an off-by-one bug where the SINF package count is
+ * one higher than the SQTY reported value, allocate 1 entry extra.
+ */
+ num_sifr++;
+
pcc = kzalloc(sizeof(struct pcc_acpi), GFP_KERNEL);
if (!pcc) {
pr_err("Couldn't allocate mem for pcc");
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 33297cef3101d950cec0033a0dce0a2d2bd59999
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091304-unwary-translate-85c3@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
33297cef3101 ("platform/x86: panasonic-laptop: Allocate 1 entry extra in the sinf array")
f1aaf914654a ("platform/x86: panasonic-laptop: Replace ACPI prints with pr_*() macros")
d5a81d8e864b ("platform/x86: panasonic-laptop: Add support for optical driver power in Y and W series")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 33297cef3101d950cec0033a0dce0a2d2bd59999 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 9 Sep 2024 13:32:26 +0200
Subject: [PATCH] platform/x86: panasonic-laptop: Allocate 1 entry extra in the
sinf array
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Some DSDT-s have an off-by-one bug where the SINF package count is
one higher than the SQTY reported value, allocate 1 entry extra.
Also make the SQTY <-> SINF package count mismatch error more verbose
to help debugging similar issues in the future.
This fixes the panasonic-laptop driver failing to probe() on some
devices with the following errors:
[ 3.958887] SQTY reports bad SINF length SQTY: 37 SINF-pkg-count: 38
[ 3.958892] Couldn't retrieve BIOS data
[ 3.983685] Panasonic Laptop Support - With Macros: probe of MAT0019:00 failed with error -5
Fixes: 709ee531c153 ("panasonic-laptop: add Panasonic Let's Note laptop extras driver v0.94")
Cc: stable(a)vger.kernel.org
Tested-by: James Harmison <jharmison(a)redhat.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240909113227.254470-2-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
diff --git a/drivers/platform/x86/panasonic-laptop.c b/drivers/platform/x86/panasonic-laptop.c
index 39044119d2a6..ebd81846e2d5 100644
--- a/drivers/platform/x86/panasonic-laptop.c
+++ b/drivers/platform/x86/panasonic-laptop.c
@@ -337,7 +337,8 @@ static int acpi_pcc_retrieve_biosdata(struct pcc_acpi *pcc)
}
if (pcc->num_sifr < hkey->package.count) {
- pr_err("SQTY reports bad SINF length\n");
+ pr_err("SQTY reports bad SINF length SQTY: %lu SINF-pkg-count: %u\n",
+ pcc->num_sifr, hkey->package.count);
status = AE_ERROR;
goto end;
}
@@ -994,6 +995,12 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
return -ENODEV;
}
+ /*
+ * Some DSDT-s have an off-by-one bug where the SINF package count is
+ * one higher than the SQTY reported value, allocate 1 entry extra.
+ */
+ num_sifr++;
+
pcc = kzalloc(sizeof(struct pcc_acpi), GFP_KERNEL);
if (!pcc) {
pr_err("Couldn't allocate mem for pcc");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 33297cef3101d950cec0033a0dce0a2d2bd59999
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091303-dash-sensitive-4aee@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
33297cef3101 ("platform/x86: panasonic-laptop: Allocate 1 entry extra in the sinf array")
f1aaf914654a ("platform/x86: panasonic-laptop: Replace ACPI prints with pr_*() macros")
d5a81d8e864b ("platform/x86: panasonic-laptop: Add support for optical driver power in Y and W series")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 33297cef3101d950cec0033a0dce0a2d2bd59999 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 9 Sep 2024 13:32:26 +0200
Subject: [PATCH] platform/x86: panasonic-laptop: Allocate 1 entry extra in the
sinf array
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Some DSDT-s have an off-by-one bug where the SINF package count is
one higher than the SQTY reported value, allocate 1 entry extra.
Also make the SQTY <-> SINF package count mismatch error more verbose
to help debugging similar issues in the future.
This fixes the panasonic-laptop driver failing to probe() on some
devices with the following errors:
[ 3.958887] SQTY reports bad SINF length SQTY: 37 SINF-pkg-count: 38
[ 3.958892] Couldn't retrieve BIOS data
[ 3.983685] Panasonic Laptop Support - With Macros: probe of MAT0019:00 failed with error -5
Fixes: 709ee531c153 ("panasonic-laptop: add Panasonic Let's Note laptop extras driver v0.94")
Cc: stable(a)vger.kernel.org
Tested-by: James Harmison <jharmison(a)redhat.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240909113227.254470-2-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
diff --git a/drivers/platform/x86/panasonic-laptop.c b/drivers/platform/x86/panasonic-laptop.c
index 39044119d2a6..ebd81846e2d5 100644
--- a/drivers/platform/x86/panasonic-laptop.c
+++ b/drivers/platform/x86/panasonic-laptop.c
@@ -337,7 +337,8 @@ static int acpi_pcc_retrieve_biosdata(struct pcc_acpi *pcc)
}
if (pcc->num_sifr < hkey->package.count) {
- pr_err("SQTY reports bad SINF length\n");
+ pr_err("SQTY reports bad SINF length SQTY: %lu SINF-pkg-count: %u\n",
+ pcc->num_sifr, hkey->package.count);
status = AE_ERROR;
goto end;
}
@@ -994,6 +995,12 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
return -ENODEV;
}
+ /*
+ * Some DSDT-s have an off-by-one bug where the SINF package count is
+ * one higher than the SQTY reported value, allocate 1 entry extra.
+ */
+ num_sifr++;
+
pcc = kzalloc(sizeof(struct pcc_acpi), GFP_KERNEL);
if (!pcc) {
pr_err("Couldn't allocate mem for pcc");
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f52e98d16e9bd7dd2b3aef8e38db5cbc9899d6a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091346-football-rely-87b8@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f52e98d16e9b ("platform/x86: panasonic-laptop: Fix SINF array out of bounds accesses")
f1fba0860962 ("platform/x86: panasonic-laptop: remove redundant assignment of variable result")
25dd390c6206 ("platform/x86: panasonic-laptop: Add sysfs attributes for firmware brightness registers")
468f96bfa3a0 ("platform/x86: panasonic-laptop: Add support for battery charging threshold (eco mode)")
ed83c9171829 ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug")
e3a9afbbc309 ("platform/x86: panasonic-laptop: Add write support to mute")
008563513348 ("platform/x86: panasonic-laptop: Fix sticky key init bug")
80373ad0edb5 ("platform/x86: panasonic-laptop: Fix naming of platform files for consistency with other modules")
0119fbc0215a ("platform/x86: panasonic-laptop: Split MODULE_AUTHOR() by one author per macro call")
f1aaf914654a ("platform/x86: panasonic-laptop: Replace ACPI prints with pr_*() macros")
d5a81d8e864b ("platform/x86: panasonic-laptop: Add support for optical driver power in Y and W series")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f52e98d16e9bd7dd2b3aef8e38db5cbc9899d6a4 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 9 Sep 2024 13:32:25 +0200
Subject: [PATCH] platform/x86: panasonic-laptop: Fix SINF array out of bounds
accesses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The panasonic laptop code in various places uses the SINF array with index
values of 0 - SINF_CUR_BRIGHT(0x0d) without checking that the SINF array
is big enough.
Not all panasonic laptops have this many SINF array entries, for example
the Toughbook CF-18 model only has 10 SINF array entries. So it only
supports the AC+DC brightness entries and mute.
Check that the SINF array has a minimum size which covers all AC+DC
brightness entries and refuse to load if the SINF array is smaller.
For higher SINF indexes hide the sysfs attributes when the SINF array
does not contain an entry for that attribute, avoiding show()/store()
accessing the array out of bounds and add bounds checking to the probe()
and resume() code accessing these.
Fixes: e424fb8cc4e6 ("panasonic-laptop: avoid overflow in acpi_pcc_hotkey_add()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240909113227.254470-1-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
diff --git a/drivers/platform/x86/panasonic-laptop.c b/drivers/platform/x86/panasonic-laptop.c
index cf845ee1c7b1..39044119d2a6 100644
--- a/drivers/platform/x86/panasonic-laptop.c
+++ b/drivers/platform/x86/panasonic-laptop.c
@@ -773,6 +773,24 @@ static DEVICE_ATTR_RW(dc_brightness);
static DEVICE_ATTR_RW(current_brightness);
static DEVICE_ATTR_RW(cdpower);
+static umode_t pcc_sysfs_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct acpi_device *acpi = to_acpi_device(dev);
+ struct pcc_acpi *pcc = acpi_driver_data(acpi);
+
+ if (attr == &dev_attr_mute.attr)
+ return (pcc->num_sifr > SINF_MUTE) ? attr->mode : 0;
+
+ if (attr == &dev_attr_eco_mode.attr)
+ return (pcc->num_sifr > SINF_ECO_MODE) ? attr->mode : 0;
+
+ if (attr == &dev_attr_current_brightness.attr)
+ return (pcc->num_sifr > SINF_CUR_BRIGHT) ? attr->mode : 0;
+
+ return attr->mode;
+}
+
static struct attribute *pcc_sysfs_entries[] = {
&dev_attr_numbatt.attr,
&dev_attr_lcdtype.attr,
@@ -787,8 +805,9 @@ static struct attribute *pcc_sysfs_entries[] = {
};
static const struct attribute_group pcc_attr_group = {
- .name = NULL, /* put in device directory */
- .attrs = pcc_sysfs_entries,
+ .name = NULL, /* put in device directory */
+ .attrs = pcc_sysfs_entries,
+ .is_visible = pcc_sysfs_is_visible,
};
@@ -941,12 +960,15 @@ static int acpi_pcc_hotkey_resume(struct device *dev)
if (!pcc)
return -EINVAL;
- acpi_pcc_write_sset(pcc, SINF_MUTE, pcc->mute);
- acpi_pcc_write_sset(pcc, SINF_ECO_MODE, pcc->eco_mode);
+ if (pcc->num_sifr > SINF_MUTE)
+ acpi_pcc_write_sset(pcc, SINF_MUTE, pcc->mute);
+ if (pcc->num_sifr > SINF_ECO_MODE)
+ acpi_pcc_write_sset(pcc, SINF_ECO_MODE, pcc->eco_mode);
acpi_pcc_write_sset(pcc, SINF_STICKY_KEY, pcc->sticky_key);
acpi_pcc_write_sset(pcc, SINF_AC_CUR_BRIGHT, pcc->ac_brightness);
acpi_pcc_write_sset(pcc, SINF_DC_CUR_BRIGHT, pcc->dc_brightness);
- acpi_pcc_write_sset(pcc, SINF_CUR_BRIGHT, pcc->current_brightness);
+ if (pcc->num_sifr > SINF_CUR_BRIGHT)
+ acpi_pcc_write_sset(pcc, SINF_CUR_BRIGHT, pcc->current_brightness);
return 0;
}
@@ -963,8 +985,12 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
num_sifr = acpi_pcc_get_sqty(device);
- if (num_sifr < 0 || num_sifr > 255) {
- pr_err("num_sifr out of range");
+ /*
+ * pcc->sinf is expected to at least have the AC+DC brightness entries.
+ * Accesses to higher SINF entries are checked against num_sifr.
+ */
+ if (num_sifr <= SINF_DC_CUR_BRIGHT || num_sifr > 255) {
+ pr_err("num_sifr %d out of range %d - 255\n", num_sifr, SINF_DC_CUR_BRIGHT + 1);
return -ENODEV;
}
@@ -1020,11 +1046,14 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
acpi_pcc_write_sset(pcc, SINF_STICKY_KEY, 0);
pcc->sticky_key = 0;
- pcc->eco_mode = pcc->sinf[SINF_ECO_MODE];
- pcc->mute = pcc->sinf[SINF_MUTE];
pcc->ac_brightness = pcc->sinf[SINF_AC_CUR_BRIGHT];
pcc->dc_brightness = pcc->sinf[SINF_DC_CUR_BRIGHT];
- pcc->current_brightness = pcc->sinf[SINF_CUR_BRIGHT];
+ if (pcc->num_sifr > SINF_MUTE)
+ pcc->mute = pcc->sinf[SINF_MUTE];
+ if (pcc->num_sifr > SINF_ECO_MODE)
+ pcc->eco_mode = pcc->sinf[SINF_ECO_MODE];
+ if (pcc->num_sifr > SINF_CUR_BRIGHT)
+ pcc->current_brightness = pcc->sinf[SINF_CUR_BRIGHT];
/* add sysfs attributes */
result = sysfs_create_group(&device->dev.kobj, &pcc_attr_group);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f52e98d16e9bd7dd2b3aef8e38db5cbc9899d6a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091345-sweep-flashback-a85d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
f52e98d16e9b ("platform/x86: panasonic-laptop: Fix SINF array out of bounds accesses")
f1fba0860962 ("platform/x86: panasonic-laptop: remove redundant assignment of variable result")
25dd390c6206 ("platform/x86: panasonic-laptop: Add sysfs attributes for firmware brightness registers")
468f96bfa3a0 ("platform/x86: panasonic-laptop: Add support for battery charging threshold (eco mode)")
ed83c9171829 ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug")
e3a9afbbc309 ("platform/x86: panasonic-laptop: Add write support to mute")
008563513348 ("platform/x86: panasonic-laptop: Fix sticky key init bug")
80373ad0edb5 ("platform/x86: panasonic-laptop: Fix naming of platform files for consistency with other modules")
0119fbc0215a ("platform/x86: panasonic-laptop: Split MODULE_AUTHOR() by one author per macro call")
f1aaf914654a ("platform/x86: panasonic-laptop: Replace ACPI prints with pr_*() macros")
d5a81d8e864b ("platform/x86: panasonic-laptop: Add support for optical driver power in Y and W series")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f52e98d16e9bd7dd2b3aef8e38db5cbc9899d6a4 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 9 Sep 2024 13:32:25 +0200
Subject: [PATCH] platform/x86: panasonic-laptop: Fix SINF array out of bounds
accesses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The panasonic laptop code in various places uses the SINF array with index
values of 0 - SINF_CUR_BRIGHT(0x0d) without checking that the SINF array
is big enough.
Not all panasonic laptops have this many SINF array entries, for example
the Toughbook CF-18 model only has 10 SINF array entries. So it only
supports the AC+DC brightness entries and mute.
Check that the SINF array has a minimum size which covers all AC+DC
brightness entries and refuse to load if the SINF array is smaller.
For higher SINF indexes hide the sysfs attributes when the SINF array
does not contain an entry for that attribute, avoiding show()/store()
accessing the array out of bounds and add bounds checking to the probe()
and resume() code accessing these.
Fixes: e424fb8cc4e6 ("panasonic-laptop: avoid overflow in acpi_pcc_hotkey_add()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240909113227.254470-1-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
diff --git a/drivers/platform/x86/panasonic-laptop.c b/drivers/platform/x86/panasonic-laptop.c
index cf845ee1c7b1..39044119d2a6 100644
--- a/drivers/platform/x86/panasonic-laptop.c
+++ b/drivers/platform/x86/panasonic-laptop.c
@@ -773,6 +773,24 @@ static DEVICE_ATTR_RW(dc_brightness);
static DEVICE_ATTR_RW(current_brightness);
static DEVICE_ATTR_RW(cdpower);
+static umode_t pcc_sysfs_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct acpi_device *acpi = to_acpi_device(dev);
+ struct pcc_acpi *pcc = acpi_driver_data(acpi);
+
+ if (attr == &dev_attr_mute.attr)
+ return (pcc->num_sifr > SINF_MUTE) ? attr->mode : 0;
+
+ if (attr == &dev_attr_eco_mode.attr)
+ return (pcc->num_sifr > SINF_ECO_MODE) ? attr->mode : 0;
+
+ if (attr == &dev_attr_current_brightness.attr)
+ return (pcc->num_sifr > SINF_CUR_BRIGHT) ? attr->mode : 0;
+
+ return attr->mode;
+}
+
static struct attribute *pcc_sysfs_entries[] = {
&dev_attr_numbatt.attr,
&dev_attr_lcdtype.attr,
@@ -787,8 +805,9 @@ static struct attribute *pcc_sysfs_entries[] = {
};
static const struct attribute_group pcc_attr_group = {
- .name = NULL, /* put in device directory */
- .attrs = pcc_sysfs_entries,
+ .name = NULL, /* put in device directory */
+ .attrs = pcc_sysfs_entries,
+ .is_visible = pcc_sysfs_is_visible,
};
@@ -941,12 +960,15 @@ static int acpi_pcc_hotkey_resume(struct device *dev)
if (!pcc)
return -EINVAL;
- acpi_pcc_write_sset(pcc, SINF_MUTE, pcc->mute);
- acpi_pcc_write_sset(pcc, SINF_ECO_MODE, pcc->eco_mode);
+ if (pcc->num_sifr > SINF_MUTE)
+ acpi_pcc_write_sset(pcc, SINF_MUTE, pcc->mute);
+ if (pcc->num_sifr > SINF_ECO_MODE)
+ acpi_pcc_write_sset(pcc, SINF_ECO_MODE, pcc->eco_mode);
acpi_pcc_write_sset(pcc, SINF_STICKY_KEY, pcc->sticky_key);
acpi_pcc_write_sset(pcc, SINF_AC_CUR_BRIGHT, pcc->ac_brightness);
acpi_pcc_write_sset(pcc, SINF_DC_CUR_BRIGHT, pcc->dc_brightness);
- acpi_pcc_write_sset(pcc, SINF_CUR_BRIGHT, pcc->current_brightness);
+ if (pcc->num_sifr > SINF_CUR_BRIGHT)
+ acpi_pcc_write_sset(pcc, SINF_CUR_BRIGHT, pcc->current_brightness);
return 0;
}
@@ -963,8 +985,12 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
num_sifr = acpi_pcc_get_sqty(device);
- if (num_sifr < 0 || num_sifr > 255) {
- pr_err("num_sifr out of range");
+ /*
+ * pcc->sinf is expected to at least have the AC+DC brightness entries.
+ * Accesses to higher SINF entries are checked against num_sifr.
+ */
+ if (num_sifr <= SINF_DC_CUR_BRIGHT || num_sifr > 255) {
+ pr_err("num_sifr %d out of range %d - 255\n", num_sifr, SINF_DC_CUR_BRIGHT + 1);
return -ENODEV;
}
@@ -1020,11 +1046,14 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
acpi_pcc_write_sset(pcc, SINF_STICKY_KEY, 0);
pcc->sticky_key = 0;
- pcc->eco_mode = pcc->sinf[SINF_ECO_MODE];
- pcc->mute = pcc->sinf[SINF_MUTE];
pcc->ac_brightness = pcc->sinf[SINF_AC_CUR_BRIGHT];
pcc->dc_brightness = pcc->sinf[SINF_DC_CUR_BRIGHT];
- pcc->current_brightness = pcc->sinf[SINF_CUR_BRIGHT];
+ if (pcc->num_sifr > SINF_MUTE)
+ pcc->mute = pcc->sinf[SINF_MUTE];
+ if (pcc->num_sifr > SINF_ECO_MODE)
+ pcc->eco_mode = pcc->sinf[SINF_ECO_MODE];
+ if (pcc->num_sifr > SINF_CUR_BRIGHT)
+ pcc->current_brightness = pcc->sinf[SINF_CUR_BRIGHT];
/* add sysfs attributes */
result = sysfs_create_group(&device->dev.kobj, &pcc_attr_group);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f52e98d16e9bd7dd2b3aef8e38db5cbc9899d6a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091343-editor-sulfite-f306@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
f52e98d16e9b ("platform/x86: panasonic-laptop: Fix SINF array out of bounds accesses")
f1fba0860962 ("platform/x86: panasonic-laptop: remove redundant assignment of variable result")
25dd390c6206 ("platform/x86: panasonic-laptop: Add sysfs attributes for firmware brightness registers")
468f96bfa3a0 ("platform/x86: panasonic-laptop: Add support for battery charging threshold (eco mode)")
ed83c9171829 ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug")
e3a9afbbc309 ("platform/x86: panasonic-laptop: Add write support to mute")
008563513348 ("platform/x86: panasonic-laptop: Fix sticky key init bug")
80373ad0edb5 ("platform/x86: panasonic-laptop: Fix naming of platform files for consistency with other modules")
0119fbc0215a ("platform/x86: panasonic-laptop: Split MODULE_AUTHOR() by one author per macro call")
f1aaf914654a ("platform/x86: panasonic-laptop: Replace ACPI prints with pr_*() macros")
d5a81d8e864b ("platform/x86: panasonic-laptop: Add support for optical driver power in Y and W series")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f52e98d16e9bd7dd2b3aef8e38db5cbc9899d6a4 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 9 Sep 2024 13:32:25 +0200
Subject: [PATCH] platform/x86: panasonic-laptop: Fix SINF array out of bounds
accesses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The panasonic laptop code in various places uses the SINF array with index
values of 0 - SINF_CUR_BRIGHT(0x0d) without checking that the SINF array
is big enough.
Not all panasonic laptops have this many SINF array entries, for example
the Toughbook CF-18 model only has 10 SINF array entries. So it only
supports the AC+DC brightness entries and mute.
Check that the SINF array has a minimum size which covers all AC+DC
brightness entries and refuse to load if the SINF array is smaller.
For higher SINF indexes hide the sysfs attributes when the SINF array
does not contain an entry for that attribute, avoiding show()/store()
accessing the array out of bounds and add bounds checking to the probe()
and resume() code accessing these.
Fixes: e424fb8cc4e6 ("panasonic-laptop: avoid overflow in acpi_pcc_hotkey_add()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240909113227.254470-1-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
diff --git a/drivers/platform/x86/panasonic-laptop.c b/drivers/platform/x86/panasonic-laptop.c
index cf845ee1c7b1..39044119d2a6 100644
--- a/drivers/platform/x86/panasonic-laptop.c
+++ b/drivers/platform/x86/panasonic-laptop.c
@@ -773,6 +773,24 @@ static DEVICE_ATTR_RW(dc_brightness);
static DEVICE_ATTR_RW(current_brightness);
static DEVICE_ATTR_RW(cdpower);
+static umode_t pcc_sysfs_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct acpi_device *acpi = to_acpi_device(dev);
+ struct pcc_acpi *pcc = acpi_driver_data(acpi);
+
+ if (attr == &dev_attr_mute.attr)
+ return (pcc->num_sifr > SINF_MUTE) ? attr->mode : 0;
+
+ if (attr == &dev_attr_eco_mode.attr)
+ return (pcc->num_sifr > SINF_ECO_MODE) ? attr->mode : 0;
+
+ if (attr == &dev_attr_current_brightness.attr)
+ return (pcc->num_sifr > SINF_CUR_BRIGHT) ? attr->mode : 0;
+
+ return attr->mode;
+}
+
static struct attribute *pcc_sysfs_entries[] = {
&dev_attr_numbatt.attr,
&dev_attr_lcdtype.attr,
@@ -787,8 +805,9 @@ static struct attribute *pcc_sysfs_entries[] = {
};
static const struct attribute_group pcc_attr_group = {
- .name = NULL, /* put in device directory */
- .attrs = pcc_sysfs_entries,
+ .name = NULL, /* put in device directory */
+ .attrs = pcc_sysfs_entries,
+ .is_visible = pcc_sysfs_is_visible,
};
@@ -941,12 +960,15 @@ static int acpi_pcc_hotkey_resume(struct device *dev)
if (!pcc)
return -EINVAL;
- acpi_pcc_write_sset(pcc, SINF_MUTE, pcc->mute);
- acpi_pcc_write_sset(pcc, SINF_ECO_MODE, pcc->eco_mode);
+ if (pcc->num_sifr > SINF_MUTE)
+ acpi_pcc_write_sset(pcc, SINF_MUTE, pcc->mute);
+ if (pcc->num_sifr > SINF_ECO_MODE)
+ acpi_pcc_write_sset(pcc, SINF_ECO_MODE, pcc->eco_mode);
acpi_pcc_write_sset(pcc, SINF_STICKY_KEY, pcc->sticky_key);
acpi_pcc_write_sset(pcc, SINF_AC_CUR_BRIGHT, pcc->ac_brightness);
acpi_pcc_write_sset(pcc, SINF_DC_CUR_BRIGHT, pcc->dc_brightness);
- acpi_pcc_write_sset(pcc, SINF_CUR_BRIGHT, pcc->current_brightness);
+ if (pcc->num_sifr > SINF_CUR_BRIGHT)
+ acpi_pcc_write_sset(pcc, SINF_CUR_BRIGHT, pcc->current_brightness);
return 0;
}
@@ -963,8 +985,12 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
num_sifr = acpi_pcc_get_sqty(device);
- if (num_sifr < 0 || num_sifr > 255) {
- pr_err("num_sifr out of range");
+ /*
+ * pcc->sinf is expected to at least have the AC+DC brightness entries.
+ * Accesses to higher SINF entries are checked against num_sifr.
+ */
+ if (num_sifr <= SINF_DC_CUR_BRIGHT || num_sifr > 255) {
+ pr_err("num_sifr %d out of range %d - 255\n", num_sifr, SINF_DC_CUR_BRIGHT + 1);
return -ENODEV;
}
@@ -1020,11 +1046,14 @@ static int acpi_pcc_hotkey_add(struct acpi_device *device)
acpi_pcc_write_sset(pcc, SINF_STICKY_KEY, 0);
pcc->sticky_key = 0;
- pcc->eco_mode = pcc->sinf[SINF_ECO_MODE];
- pcc->mute = pcc->sinf[SINF_MUTE];
pcc->ac_brightness = pcc->sinf[SINF_AC_CUR_BRIGHT];
pcc->dc_brightness = pcc->sinf[SINF_DC_CUR_BRIGHT];
- pcc->current_brightness = pcc->sinf[SINF_CUR_BRIGHT];
+ if (pcc->num_sifr > SINF_MUTE)
+ pcc->mute = pcc->sinf[SINF_MUTE];
+ if (pcc->num_sifr > SINF_ECO_MODE)
+ pcc->eco_mode = pcc->sinf[SINF_ECO_MODE];
+ if (pcc->num_sifr > SINF_CUR_BRIGHT)
+ pcc->current_brightness = pcc->sinf[SINF_CUR_BRIGHT];
/* add sysfs attributes */
result = sysfs_create_group(&device->dev.kobj, &pcc_attr_group);
Slackware 15.0 64-bit compiles and runs fine.
Slackware 15.0 32-bit fails to build and gives the "out of memory" error:
cc1: out of memory allocating 180705472 bytes after a total of 284454912
bytes
...
make[4]: *** [scripts/Makefile.build:289:
drivers/staging/media/atomisp/pci/isp/kernels/ynr/ynr_1.0/ia_css_ynr.ho
st.o] Error 1
Patching it with help from Lorenzo Stoakes allows the build to
run:
https://lore.kernel.org/lkml/5882b96e-1287-4390-8174-3316d39038ef@lucifer.l…
And then 32-bit runs fine too.
Richard Narron
Spec says SW is expected to round up to the nearest 128K, if not already
aligned. We are seeing the assert sometimes pop on BMG to tell us that
there is a hole between GSM and CCS, as well as popping other asserts
with having a vram size with strange alignment, which is likely caused
by misaligned offset here.
BSpec: 68023
Fixes: b5c2ca0372dc ("drm/xe/xe2hpg: Determine flat ccs offset for vram")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar(a)intel.com>
Cc: Shuicheng Lin <shuicheng.lin(a)intel.com>
Cc: Matt Roper <matthew.d.roper(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.10+
---
drivers/gpu/drm/xe/xe_vram.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xe/xe_vram.c b/drivers/gpu/drm/xe/xe_vram.c
index 7e765b1499b1..8e65cb4cc477 100644
--- a/drivers/gpu/drm/xe/xe_vram.c
+++ b/drivers/gpu/drm/xe/xe_vram.c
@@ -181,6 +181,7 @@ static inline u64 get_flat_ccs_offset(struct xe_gt *gt, u64 tile_size)
offset = offset_hi << 32; /* HW view bits 39:32 */
offset |= offset_lo << 6; /* HW view bits 31:6 */
+ offset = round_up(offset, SZ_128K); /* SW must round up to nearest 128K */
offset *= num_enabled; /* convert to SW view */
/* We don't expect any holes */
--
2.46.0
Registering a gadget driver is expected to complete synchronously and
immediately after calling driver_register() this function checks that
the driver has bound so as to return an error.
Set PROBE_FORCE_SYNCHRONOUS to ensure this is the case even when
asynchronous probing is set as the default.
Fixes: fc274c1e99731 ("USB: gadget: Add a new bus for gadgets")
Cc: stable(a)vger.kernel.org
Signed-off-by: John Keeping <jkeeping(a)inmusicbrands.com>
---
v3:
- Add cc: stable
v2:
- Add "Fixes" trailer
drivers/usb/gadget/udc/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index cf6478f97f4a..a6f46364be65 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1696,6 +1696,7 @@ int usb_gadget_register_driver_owner(struct usb_gadget_driver *driver,
driver->driver.bus = &gadget_bus_type;
driver->driver.owner = owner;
driver->driver.mod_name = mod_name;
+ driver->driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
ret = driver_register(&driver->driver);
if (ret) {
pr_warn("%s: driver registration failed: %d\n",
--
2.46.0
Streams should flush their TRB cache, re-read TRBs, and start executing
TRBs from the beginning of the new dequeue pointer after a 'Set TR Dequeue
Pointer' command.
Cadence controllers may fail to start from the beginning of the dequeue
TRB as it doesn't clear the Opaque 'RsvdO' field of the stream context
during 'Set TR Dequeue' command. This stream context area is where xHC
stores information about the last partially executed TD when a stream
is stopped. xHC uses this information to resume the transfer where it left
mid TD, when the stream is restarted.
Patch fixes this by clearing out all RsvdO fields before initializing new
Stream transfer using a 'Set TR Dequeue Pointer' command.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
Changelog:
v3:
- changed patch to patch Cadence specific
v2:
- removed restoring of EDTLA field
drivers/usb/cdns3/host.c | 4 +++-
drivers/usb/host/xhci-pci.c | 7 +++++++
drivers/usb/host/xhci-ring.c | 14 ++++++++++++++
drivers/usb/host/xhci.h | 1 +
4 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/cdns3/host.c b/drivers/usb/cdns3/host.c
index ceca4d839dfd..7ba760ee62e3 100644
--- a/drivers/usb/cdns3/host.c
+++ b/drivers/usb/cdns3/host.c
@@ -62,7 +62,9 @@ static const struct xhci_plat_priv xhci_plat_cdns3_xhci = {
.resume_quirk = xhci_cdns3_resume_quirk,
};
-static const struct xhci_plat_priv xhci_plat_cdnsp_xhci;
+static const struct xhci_plat_priv xhci_plat_cdnsp_xhci = {
+ .quirks = XHCI_CDNS_SCTX_QUIRK,
+};
static int __cdns_host_init(struct cdns *cdns)
{
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index b9ae5c2a2527..9199dbfcea07 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -74,6 +74,9 @@
#define PCI_DEVICE_ID_ASMEDIA_2142_XHCI 0x2142
#define PCI_DEVICE_ID_ASMEDIA_3242_XHCI 0x3242
+#define PCI_DEVICE_ID_CADENCE 0x17CD
+#define PCI_DEVICE_ID_CADENCE_SSP 0x0200
+
static const char hcd_name[] = "xhci_hcd";
static struct hc_driver __read_mostly xhci_pci_hc_driver;
@@ -532,6 +535,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
}
+ if (pdev->vendor == PCI_DEVICE_ID_CADENCE &&
+ pdev->device == PCI_DEVICE_ID_CADENCE_SSP)
+ xhci->quirks |= XHCI_CDNS_SCTX_QUIRK;
+
/* xHC spec requires PCI devices to support D3hot and D3cold */
if (xhci->hci_version >= 0x120)
xhci->quirks |= XHCI_DEFAULT_PM_RUNTIME_ALLOW;
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1dde53f6eb31..a1ad2658c0c7 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1386,6 +1386,20 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
struct xhci_stream_ctx *ctx =
&ep->stream_info->stream_ctx_array[stream_id];
deq = le64_to_cpu(ctx->stream_ring) & SCTX_DEQ_MASK;
+
+ /*
+ * Cadence xHCI controllers store some endpoint state
+ * information within Rsvd0 fields of Stream Endpoint
+ * context. This field is not cleared during Set TR
+ * Dequeue Pointer command which causes XDMA to skip
+ * over transfer ring and leads to data loss on stream
+ * pipe.
+ * To fix this issue driver must clear Rsvd0 field.
+ */
+ if (xhci->quirks & XHCI_CDNS_SCTX_QUIRK) {
+ ctx->reserved[0] = 0;
+ ctx->reserved[1] = 0;
+ }
} else {
deq = le64_to_cpu(ep_ctx->deq) & ~EP_CTX_CYCLE_MASK;
}
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 101e74c9060f..4cbd58eed214 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1907,6 +1907,7 @@ struct xhci_hcd {
#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
+#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.43.0
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dec92020671c ("drm: Use the state pointer directly in planes atomic_check")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_cursor.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
index db0a1eb53532..e460f5ba2d87 100644
--- a/drivers/gpu/drm/sti/sti_cursor.c
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -200,6 +200,8 @@ static int sti_cursor_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
@cpu is a offline cpu would cause system stuck forever.
This can be happen if a node is online while all its CPUs are
offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
So, in the above case, let pci_call_probe() call local_pci_probe()
instead of work_on_cpu() when the best selected cpu is offline.
Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping CPUs")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
---
v2 -> v3: Modify commit message according to Markus's suggestion
v1 -> v2: Add a method to reproduce the problem
---
drivers/pci/pci-driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index af2996d0d17f..32a99828e6a3 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
free_cpumask_var(wq_domain_mask);
}
- if (cpu < nr_cpu_ids)
+ if ((cpu < nr_cpu_ids) && cpu_online(cpu))
error = work_on_cpu(cpu, local_pci_probe, &ddi);
else
error = local_pci_probe(&ddi);
--
2.33.0
From: Peter Wang <peter.wang(a)mediatek.com>
ufshcd_abort_all forcibly aborts all on-going commands and the host
controller will automatically fill in the OCS field of the corresponding
response with OCS_ABORTED based on different working modes.
MCQ mode: aborts a command using SQ cleanup, The host controller
will post a Completion Queue entry with OCS = ABORTED.
SDB mode: aborts a command using UTRLCLR. Task Management response
which means a Transfer Request was aborted.
For these two cases, set a flag to notify SCSI to requeue the
command after receiving response with OCS_ABORTED.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 35 ++++++++++++++++++++---------------
include/ufs/ufshcd.h | 3 +++
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..a0548a495f30 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
lrbp->req_abort_skip = false;
+ lrbp->abort_initiated_by_eh = false;
ufshcd_comp_scsi_upiu(hba, lrbp);
@@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
}
break;
case OCS_ABORTED:
- result |= DID_ABORT << 16;
+ if (lrbp->abort_initiated_by_eh)
+ result |= DID_REQUEUE << 16;
+ else
+ result |= DID_ABORT << 16;
break;
case OCS_INVALID_COMMAND_STATUS:
result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
struct ufs_hba *hba = shost_priv(shost);
- struct ufshcd_lrb *lrbp = &hba->lrb[tag];
- struct ufs_hw_queue *hwq;
- unsigned long flags;
*ret = ufshcd_try_to_abort_task(hba, tag);
dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
*ret ? "failed" : "succeeded");
- /* Release cmd in MCQ mode if abort succeeds */
- if (hba->mcq_enabled && (*ret == 0)) {
- hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
- if (!hwq)
- return 0;
- spin_lock_irqsave(&hwq->cq_lock, flags);
- if (ufshcd_cmd_inflight(lrbp->cmd))
- ufshcd_release_scsi_cmd(hba, lrbp);
- spin_unlock_irqrestore(&hwq->cq_lock, flags);
- }
-
return *ret == 0;
}
@@ -7561,6 +7551,21 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
goto out;
}
+ /*
+ * When the host software receives a "FUNCTION COMPLETE", set flag
+ * to requeue command after receive response with OCS_ABORTED
+ * SDB mode: UTRLCLR Task Management response which means a Transfer
+ * Request was aborted.
+ * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
+ *
+ * This flag is set because error handler ufshcd_abort_all forcibly
+ * aborts all commands, and the host controller will automatically
+ * fill in the OCS field of the corresponding response with OCS_ABORTED.
+ * Therefore, upon receiving this response, it needs to be requeued.
+ */
+ if (!err && ufshcd_eh_in_progress(hba))
+ lrbp->abort_initiated_by_eh = true;
+
err = ufshcd_clear_cmd(hba, tag);
if (err)
dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..07b5e60e6c44 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
* @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
* @data_unit_num: the data unit number for the first block for inline crypto
* @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_eh: The flag is specifically used to handle
+ * ufshcd_err_handler forcibly aborts.
*/
struct ufshcd_lrb {
struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@ struct ufshcd_lrb {
#endif
bool req_abort_skip;
+ bool abort_initiated_by_eh;
};
/**
--
2.45.2
devm_kasprintf() can return a NULL pointer on failure but this returned
value is not checked. Fix this lack and check the returned value.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: a0f160ffcb83 ("pinctrl: add pinctrl/GPIO driver for Apple SoCs")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-apple-gpio.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-apple-gpio.c b/drivers/pinctrl/pinctrl-apple-gpio.c
index 3751c7de37aa..f861e63f4115 100644
--- a/drivers/pinctrl/pinctrl-apple-gpio.c
+++ b/drivers/pinctrl/pinctrl-apple-gpio.c
@@ -474,6 +474,9 @@ static int apple_gpio_pinctrl_probe(struct platform_device *pdev)
for (i = 0; i < npins; i++) {
pins[i].number = i;
pins[i].name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "PIN%u", i);
+ if (!pins[i].name)
+ return -ENOMEM;
+
pins[i].drv_data = pctl;
pin_names[i] = pins[i].name;
pin_nums[i] = i;
--
2.25.1
read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is
bigger than the variable hv_sched_clock_offset, which is cached during
early boot, but depending on the timing this assumption may be false
when a hibernated VM starts again (the clock counter starts from 0
again) and is resuming back (Note: hv_init_tsc_clocksource() is not
called during hibernation/resume); consequently,
read_hv_sched_clock_tsc() may return a negative integer (which is
interpreted as a huge positive integer since the return type is u64)
and new kernel messages are prefixed with huge timestamps before
read_hv_sched_clock_tsc() grows big enough (which typically takes
several seconds).
Fix the issue by saving the Hyper-V clock counter just before the
suspend, and using it to correct the hv_sched_clock_offset in
resume. Override x86_platform.save_sched_clock_state and
x86_platform.restore_sched_clock_state.
Note: if Invariant TSC is available, the issue doesn't happen because
1) we don't register read_hv_sched_clock_tsc() for sched clock:
See commit e5313f1c5404 ("clocksource/drivers/hyper-v: Rework
clocksource and sched clock setup");
2) the common x86 code adjusts TSC similarly: see
__restore_processor_state() -> tsc_verify_tsc_adjust(true) and
x86_platform.restore_sched_clock_state().
Cc: stable(a)vger.kernel.org
Fixes: 1349401ff1aa ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation")
Co-developed-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
---
Changes from v1:
https://lore.kernel.org/all/20240909053923.8512-1-namjain@linux.microsoft.c…
* Reorganized code as per Michael's comment, and moved the logic to x86
specific files, to keep hyperv_timer.c arch independent.
---
arch/x86/kernel/cpu/mshyperv.c | 70 ++++++++++++++++++++++++++++++
drivers/clocksource/hyperv_timer.c | 8 +++-
include/clocksource/hyperv_timer.h | 8 ++++
3 files changed, 85 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba84..d83a694e387c 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -224,6 +224,75 @@ static void hv_machine_crash_shutdown(struct pt_regs *regs)
hyperv_cleanup();
}
#endif /* CONFIG_CRASH_DUMP */
+
+static u64 hv_sched_clock_offset_saved;
+static void (*old_save_sched_clock_state)(void);
+static void (*old_restore_sched_clock_state)(void);
+
+/*
+ * Hyper-V clock counter resets during hibernation. Save and restore clock
+ * offset during suspend/resume, while also considering the time passed
+ * before suspend. This is to make sure that sched_clock using hv tsc page
+ * based clocksource, proceeds from where it left off during suspend and
+ * it shows correct time for the timestamps of kernel messages after resume.
+ */
+static void save_hv_clock_tsc_state(void)
+{
+ hv_sched_clock_offset_saved = hv_read_reference_counter();
+}
+
+static void restore_hv_clock_tsc_state(void)
+{
+ /*
+ * hv_sched_clock_offset = offset that is used by hyperv_timer clocksource driver
+ * to get time.
+ * Time passed before suspend = hv_sched_clock_offset_saved
+ * - hv_sched_clock_offset (old)
+ *
+ * After Hyper-V clock counter resets, hv_sched_clock_offset needs a correction.
+ *
+ * New time = hv_read_reference_counter() (future) - hv_sched_clock_offset (new)
+ * New time = Time passed before suspend + hv_read_reference_counter() (future)
+ * - hv_read_reference_counter() (now)
+ *
+ * Solving the above two equations gives:
+ *
+ * hv_sched_clock_offset (new) = hv_sched_clock_offset (old)
+ * - hv_sched_clock_offset_saved
+ * + hv_read_reference_counter() (now))
+ */
+ hv_adj_sched_clock_offset(hv_sched_clock_offset_saved - hv_read_reference_counter());
+}
+
+/*
+ * Functions to override save_sched_clock_state and restore_sched_clock_state
+ * functions of x86_platform. The Hyper-V clock counter is reset during
+ * suspend-resume and the offset used to measure time needs to be
+ * corrected, post resume.
+ */
+static void hv_save_sched_clock_state(void)
+{
+ old_save_sched_clock_state();
+ save_hv_clock_tsc_state();
+}
+
+static void hv_restore_sched_clock_state(void)
+{
+ restore_hv_clock_tsc_state();
+ old_restore_sched_clock_state();
+}
+
+static void __init x86_setup_ops_for_tsc_pg_clock(void)
+{
+ if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
+ return;
+
+ old_save_sched_clock_state = x86_platform.save_sched_clock_state;
+ x86_platform.save_sched_clock_state = hv_save_sched_clock_state;
+
+ old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
+ x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
+}
#endif /* CONFIG_HYPERV */
static uint32_t __init ms_hyperv_platform(void)
@@ -575,6 +644,7 @@ static void __init ms_hyperv_init_platform(void)
/* Register Hyper-V specific clocksource */
hv_init_clocksource();
+ x86_setup_ops_for_tsc_pg_clock();
hv_vtl_init_platform();
#endif
/*
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index b2a080647e41..e424892444ed 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -27,7 +27,8 @@
#include <asm/mshyperv.h>
static struct clock_event_device __percpu *hv_clock_event;
-static u64 hv_sched_clock_offset __ro_after_init;
+/* Note: offset can hold negative values after hibernation. */
+static u64 hv_sched_clock_offset __read_mostly;
/*
* If false, we're using the old mechanism for stimer0 interrupts
@@ -456,6 +457,11 @@ static void resume_hv_clock_tsc(struct clocksource *arg)
hv_set_msr(HV_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
}
+void hv_adj_sched_clock_offset(u64 offset)
+{
+ hv_sched_clock_offset -= offset;
+}
+
#ifdef HAVE_VDSO_CLOCKMODE_HVCLOCK
static int hv_cs_enable(struct clocksource *cs)
{
diff --git a/include/clocksource/hyperv_timer.h b/include/clocksource/hyperv_timer.h
index 6cdc873ac907..62e2bad754c0 100644
--- a/include/clocksource/hyperv_timer.h
+++ b/include/clocksource/hyperv_timer.h
@@ -38,6 +38,14 @@ extern void hv_remap_tsc_clocksource(void);
extern unsigned long hv_get_tsc_pfn(void);
extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+/*
+ * Called during resume from hibernation, from overridden
+ * x86_platform.restore_sched_clock_state routine. This is to adjust offsets
+ * used to calculate time for hv tsc page based sched_clock, to account for
+ * time spent before hibernation.
+ */
+extern void hv_adj_sched_clock_offset(u64 offset);
+
static __always_inline bool
hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
u64 *cur_tsc, u64 *time)
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.25.1
From: Willem de Bruijn <willemb(a)google.com>
The referenced commit drops bad input, but has false positives.
Tighten the check to avoid these.
The check detects illegal checksum offload requests, which produce
csum_start/csum_off beyond end of packet after segmentation.
But it is based on two incorrect assumptions:
1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO.
True in callers that inject into the tx path, such as tap.
But false in callers that inject into rx, like virtio-net.
Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or
CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal.
2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL.
False, as tcp[46]_gso_segment will fix up csum_start and offset for
all other ip_summed by calling __tcp_v4_send_check.
Because of 2, we can limit the scope of the fix to virtio_net_hdr
that do try to set these fields, with a bogus value.
Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
Changes v1->v2:
- Fix Cc:
- Add Acks from v1
---
include/linux/virtio_net.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 6c395a2600e8d..276ca543ef44d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -173,7 +173,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
break;
case SKB_GSO_TCPV4:
case SKB_GSO_TCPV6:
- if (skb->csum_offset != offsetof(struct tcphdr, check))
+ if (skb->ip_summed == CHECKSUM_PARTIAL &&
+ skb->csum_offset != offsetof(struct tcphdr, check))
return -EINVAL;
break;
}
--
2.46.0.598.g6f2099f65c-goog
From: Lai Jiangshan <jiangshan.ljs(a)antgroup.com>
Marc Hartmayer reported:
[ 23.133876] Unable to handle kernel pointer dereference in virtual kernel address space
[ 23.133950] Failing address: 0000000000000000 TEID: 0000000000000483
[ 23.133954] Fault in home space mode while using kernel ASCE.
[ 23.133957] AS:000000001b8f0007 R3:0000000056cf4007 S:0000000056cf3800 P:000000000000003d
[ 23.134207] Oops: 0004 ilc:2 [#1] SMP
(snip)
[ 23.134516] Call Trace:
[ 23.134520] [<0000024e326caf28>] worker_thread+0x48/0x430
[ 23.134525] ([<0000024e326caf18>] worker_thread+0x38/0x430)
[ 23.134528] [<0000024e326d3a3e>] kthread+0x11e/0x130
[ 23.134533] [<0000024e3264b0dc>] __ret_from_fork+0x3c/0x60
[ 23.134536] [<0000024e333fb37a>] ret_from_fork+0xa/0x38
[ 23.134552] Last Breaking-Event-Address:
[ 23.134553] [<0000024e333f4c04>] mutex_unlock+0x24/0x30
[ 23.134562] Kernel panic - not syncing: Fatal exception: panic_on_oops
With debuging and analysis, worker_thread() accesses to the nullified
worker->pool when the newly created worker is destroyed before being
waken-up, in which case worker_thread() can see the result detach_worker()
reseting worker->pool to NULL at the begining.
Move the code "worker->pool = NULL;" out from detach_worker() to fix the
problem.
worker->pool had been designed to be constant for regular workers and
changeable for rescuer. To share attaching/detaching code for regular
and rescuer workers and to avoid worker->pool being accessed inadvertently
when the worker has been detached, worker->pool is reset to NULL when
detached no matter the worker is rescuer or not.
To maintain worker->pool being reset after detached, move the code
"worker->pool = NULL;" in the worker thread context after detached.
It is either be in the regular worker thread context after PF_WQ_WORKER
is cleared or in rescuer worker thread context with wq_pool_attach_mutex
held. So it is safe to do so.
Cc: Marc Hartmayer <mhartmay(a)linux.ibm.com>
Link: https://lore.kernel.org/lkml/87wmjj971b.fsf@linux.ibm.com/
Reported-by: Marc Hartmayer <mhartmay(a)linux.ibm.com>
Fixes: f4b7b53c94af ("workqueue: Detach workers directly in idle_cull_fn()")
Cc: stable(a)vger.kernel.org # v6.11+
Signed-off-by: Lai Jiangshan <jiangshan.ljs(a)antgroup.com>
---
kernel/workqueue.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e7b005ff3750..6f2545037e57 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2709,7 +2709,6 @@ static void detach_worker(struct worker *worker)
unbind_worker(worker);
list_del(&worker->node);
- worker->pool = NULL;
}
/**
@@ -2729,6 +2728,7 @@ static void worker_detach_from_pool(struct worker *worker)
mutex_lock(&wq_pool_attach_mutex);
detach_worker(worker);
+ worker->pool = NULL;
mutex_unlock(&wq_pool_attach_mutex);
/* clear leftover flags without pool->lock after it is detached */
@@ -3349,7 +3349,11 @@ static int worker_thread(void *__worker)
if (unlikely(worker->flags & WORKER_DIE)) {
raw_spin_unlock_irq(&pool->lock);
set_pf_worker(false);
-
+ /*
+ * The worker is dead and PF_WQ_WORKER is cleared, worker->pool
+ * shouldn't be accessed, reset it to NULL in case otherwise.
+ */
+ worker->pool = NULL;
ida_free(&pool->worker_ida, worker->id);
return 0;
}
--
2.19.1.6.gb485710b
The violation of atomicity occurs when the brbd_uuid_set_bm function is
executed simultaneously with modifying the value of
device->ldev->md.uuid[UI_BITMAP]. Consider a scenario where, while
device->ldev->md.uuid[UI_BITMAP] passes the validity check when its value
is not zero, the value of device->ldev->md.uuid[UI_BITMAP] is written to
zero. In this case, the check in brbd_uuid_set_bm might refer to the old
value of device->ldev->md.uuid[UI_BITMAP] (before locking), which allows
an invalid value to pass the validity check, resulting in inconsistency.
To address this issue, it is recommended to include the data validity check
within the locked section of the function. This modification ensures that
the value of device->ldev->md.uuid[UI_BITMAP] does not change during the
validation process, thereby maintaining its integrity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency bugs
including data races and atomicity violations.
Fixes: 9f2247bb9b75 ("drbd: Protect accesses to the uuid set with a spinlock")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/block/drbd/drbd_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a9e49b212341..abafc4edf9ed 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3399,10 +3399,12 @@ void drbd_uuid_new_current(struct drbd_device *device) __must_hold(local)
void drbd_uuid_set_bm(struct drbd_device *device, u64 val) __must_hold(local)
{
unsigned long flags;
- if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0)
+ spin_lock_irqsave(&device->ldev->md.uuid_lock, flags);
+ if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0) {
+ spin_unlock_irqrestore(&device->ldev->md.uuid_lock, flags);
return;
+ }
- spin_lock_irqsave(&device->ldev->md.uuid_lock, flags);
if (val == 0) {
drbd_uuid_move_history(device);
device->ldev->md.uuid[UI_HISTORY_START] = device->ldev->md.uuid[UI_BITMAP];
--
2.34.1
As we process the second byte of a control transfer, transfers
of less than 2 bytes must be discarded.
This bug is as old as the driver.
SIgned-off-by: Oliver Neukum <oneukum(a)suse.com>
CC: stable(a)vger.kernel.org
---
drivers/usb/misc/cypress_cy7c63.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/usb/misc/cypress_cy7c63.c b/drivers/usb/misc/cypress_cy7c63.c
index cecd7693b741..75f5a740cba3 100644
--- a/drivers/usb/misc/cypress_cy7c63.c
+++ b/drivers/usb/misc/cypress_cy7c63.c
@@ -88,6 +88,9 @@ static int vendor_command(struct cypress *dev, unsigned char request,
USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_OTHER,
address, data, iobuf, CYPRESS_MAX_REQSIZE,
USB_CTRL_GET_TIMEOUT);
+ /* we must not process garbage */
+ if (retval < 2)
+ goto err_buf;
/* store returned data (more READs to be added) */
switch (request) {
@@ -107,6 +110,7 @@ static int vendor_command(struct cypress *dev, unsigned char request,
break;
}
+err_buf:
kfree(iobuf);
error:
return retval;
--
2.45.2
There is a small window during probing when IO is running
but the backlight is not registered. Processing events
during that time will crash. The completion handler
needs to check for a backlight before scheduling work.
The bug is as old as the driver.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
CC: stable(a)vger.kernel.org
---
drivers/usb/misc/appledisplay.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/misc/appledisplay.c b/drivers/usb/misc/appledisplay.c
index c8098e9b432e..62b5a30edc42 100644
--- a/drivers/usb/misc/appledisplay.c
+++ b/drivers/usb/misc/appledisplay.c
@@ -107,7 +107,12 @@ static void appledisplay_complete(struct urb *urb)
case ACD_BTN_BRIGHT_UP:
case ACD_BTN_BRIGHT_DOWN:
pdata->button_pressed = 1;
- schedule_delayed_work(&pdata->work, 0);
+ /*
+ * there is a window during which no device
+ * is registered
+ */
+ if (pdata->bd )
+ schedule_delayed_work(&pdata->work, 0);
break;
case ACD_BTN_NONE:
default:
@@ -202,6 +207,7 @@ static int appledisplay_probe(struct usb_interface *iface,
const struct usb_device_id *id)
{
struct backlight_properties props;
+ struct backlight_device *backlight;
struct appledisplay *pdata;
struct usb_device *udev = interface_to_usbdev(iface);
struct usb_endpoint_descriptor *endpoint;
@@ -272,13 +278,14 @@ static int appledisplay_probe(struct usb_interface *iface,
memset(&props, 0, sizeof(struct backlight_properties));
props.type = BACKLIGHT_RAW;
props.max_brightness = 0xff;
- pdata->bd = backlight_device_register(bl_name, NULL, pdata,
+ backlight = backlight_device_register(bl_name, NULL, pdata,
&appledisplay_bl_data, &props);
- if (IS_ERR(pdata->bd)) {
+ if (IS_ERR(backlight)) {
dev_err(&iface->dev, "Backlight registration failed\n");
- retval = PTR_ERR(pdata->bd);
+ retval = PTR_ERR(backlight);
goto error;
}
+ pdata->bd = backlight;
/* Try to get brightness */
brightness = appledisplay_bl_get_brightness(pdata->bd);
--
2.45.2
Virtual machines under Xen Hypervisor (DomU) running in Xen PV mode use a
special, nonstandard synthetized CPU topology which "just works" under
kernels 6.9.x while newer kernels wrongly assuming a "crash kernel" and
disable SMP (reducing to one CPU core) because the newer topology
implementation produces a wrong error "[Firmware Bug]: APIC enumeration
order not specification compliant" after new topology checks which are
improper for Xen PV platform. As a result, the kernel disables SMT and
activates just one CPU core within the VM (DomU).
The patch disables the regarding checks if it is running in Xen PV
mode (only) and bring back SMP / all CPUs as in the past to such DomU
VMs.
Signed-off-by: Niels Dettenbach <nd(a)syndicat.com>
---
The current behaviour leads all of our production Xen Host platforms
(amd64 - HPE proliant) unusable after updating to newer linux kernels
(with just one CPU available/activated per VM) while older kernels and
other OS (current NetBSD PV DomU) still work fully (and stable since many
years on the platform).
Xen PV mode is still provided by current Xen and widely used - even
if less wide as the newer Xen PVH mode today. So a solution probably
will be required.
So we assume that bug affects stable(a)vger.kernel.org as well.
dmesg from affected kernel:
-- snip --
[ 0.640364] CPU topo: Enumerated BSP APIC 0 is not marked in APICBASE MSR
[ 0.640367] CPU topo: Assuming crash kernel. Limiting to one CPU to prevent machine INIT
[ 0.640368] CPU topo: [Firmware Bug]: APIC enumeration order not specification compliant
[ 0.640376] CPU topo: Max. logical packages: 1
[ 0.640378] CPU topo: Max. logical dies: 1
[ 0.640379] CPU topo: Max. dies per package: 1
[ 0.640386] CPU topo: Max. threads per core: 1
[ 0.640388] CPU topo: Num. cores per package: 1
[ 0.640389] CPU topo: Num. threads per package: 1
[ 0.640390] CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
[ 0.640402] Cannot find an available gap in the 32-bit address range
-- snap --
after patch applied:
-- snip --
[ 0.369439] CPU topo: Max. logical packages: 1
[ 0.369441] CPU topo: Max. logical dies: 1
[ 0.369442] CPU topo: Max. dies per package: 1
[ 0.369450] CPU topo: Max. threads per core: 2
[ 0.369452] CPU topo: Num. cores per package: 3
[ 0.369453] CPU topo: Num. threads per package: 6
[ 0.369453] CPU topo: Allowing 6 present CPUs plus 0 hotplug CPUs
-- snap --
We tested the patch intensely under productive / high load since 2 days now with no issues.
references:
arch/x86/kernel/cpu/topology.c
[line 448]
-- snip --
/*
* XEN PV is special as it does not advertise the local APIC
* properly, but provides a fake topology for it so that the
* infrastructure works. So don't apply the restrictions vs. APIC
* here.
*/
--snap --
--- linux/arch/x86/kernel/cpu/topology.c 2024-09-11 17:42:42.699278317 +0200
+++ linux/arch/x86/kernel/cpu/topology.c.orig 2024-09-11 09:53:16.194095250 +0200
@@ -132,14 +132,6 @@
u64 msr;
/*
- * assume Xen PV has a working (special) topology
- */
- if (xen_pv_domain()) {
- topo_info.real_bsp_apic_id = topo_info.boot_cpu_apic_id;
- return false;
- }
-
- /*
* There is no real good way to detect whether this a kdump()
* kernel, but except on the Voyager SMP monstrosity which is not
* longer supported, the real BSP APIC ID is the first one which is
Witam,
Pomagamy pozyskiwać nowych klientów B2B.
Czy interesuje Państwa dotarcie do nowych potencjalnych partnerów oraz uruchomienie z nimi rozmów handlowch ?
Pozdrawiam
Szymon Jankowski
From: Eliav Bar-ilan <eliavb(a)nvidia.com>
An incorrect argument order calling amd_iommu_dev_flush_pasid_pages()
causes improper flushing of the IOMMU, leaving the old value of GCR3 from
a previous process attached to the same PASID.
The function has the signature:
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
ioasid_t pasid, u64 address, size_t size)
Correct the argument order.
Cc: stable(a)vger.kernel.org
Fixes: 474bf01ed9f0 ("iommu/amd: Add support for device based TLB invalidation")
Signed-off-by: Eliav Bar-ilan <eliavb(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
drivers/iommu/amd/iommu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
This was discovered while testing SVA, but I suppose it is probably a bigger issue.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b19e8c0f48fa25..6bc4030a6ba8ed 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1552,8 +1552,8 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
void amd_iommu_dev_flush_pasid_all(struct iommu_dev_data *dev_data,
ioasid_t pasid)
{
- amd_iommu_dev_flush_pasid_pages(dev_data, 0,
- CMD_INV_IOMMU_ALL_PAGES_ADDRESS, pasid);
+ amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0,
+ CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
}
void amd_iommu_domain_flush_complete(struct protection_domain *domain)
base-commit: cf2840f59119f41de3d9641a8b18a5da1b2cf6bf
--
2.46.0
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline xattr.
The problematic function is ocfs2_reflink_xattr_inline(). By the time this
function is called the reflink tree is already recreated at the destination
inode from the source inode. At this point, this function reserves space
space inline xattrs at the destination inode without even checking if there
is space at the root metadata block. It simply reduces the l_count from 243
to 227 thereby making space of 256 bytes for inline xattr whereas the inode
already has extents beyond this index (in this case upto 230), thereby causing
corruption.
The fix for this is to reserve space for inline metadata before the at the
destination inode before the reflink tree gets recreated. The customer has
verified the fix.
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 3f80a56d0d60..05105d271fc8 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4155,8 +4156,9 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4182,6 +4184,26 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = new_bh->b_data;
+ struct ocfs2_dinode *old_di = old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4189,7 +4211,7 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3b81213ed7b8..a9f716ec89e2 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(struct ocfs2_xattr_reflink *args)
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
--
2.39.3
Supposing the following scenario.
CPU0 CPU1
blk_mq_insert_request() 1) store blk_mq_unquiesce_queue()
blk_mq_run_hw_queue() blk_queue_flag_clear(QUEUE_FLAG_QUIESCED) 3) store
if (blk_queue_quiesced()) 2) load blk_mq_run_hw_queues()
return blk_mq_run_hw_queue()
blk_mq_sched_dispatch_requests() if (!blk_mq_hctx_has_pending()) 4) load
return
The full memory barrier should be inserted between 1) and 2), as well as
between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is
cleared or CPU1 sees dispatch list or setting of bitmap of software queue.
Otherwise, either CPU will not re-run the hardware queue causing starvation.
So the first solution is to 1) add a pair of memory barrier to fix the
problem, another solution is to 2) use hctx->queue->queue_lock to synchronize
QUEUE_FLAG_QUIESCED. Here, we chose 2) to fix it since memory barrier is not
easy to be maintained.
Fixes: f4560ffe8cec1 ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
---
block/blk-mq.c | 47 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 34 insertions(+), 13 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b2d0f22de0c7f..ac39f2a346a52 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2202,6 +2202,24 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
}
EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
+{
+ bool need_run;
+
+ /*
+ * When queue is quiesced, we may be switching io scheduler, or
+ * updating nr_hw_queues, or other things, and we can't run queue
+ * any more, even blk_mq_hctx_has_pending() can't be called safely.
+ *
+ * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+ * quiesced.
+ */
+ __blk_mq_run_dispatch_ops(hctx->queue, false,
+ need_run = !blk_queue_quiesced(hctx->queue) &&
+ blk_mq_hctx_has_pending(hctx));
+ return need_run;
+}
+
/**
* blk_mq_run_hw_queue - Start to run a hardware queue.
* @hctx: Pointer to the hardware queue to run.
@@ -2222,20 +2240,23 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
- /*
- * When queue is quiesced, we may be switching io scheduler, or
- * updating nr_hw_queues, or other things, and we can't run queue
- * any more, even __blk_mq_hctx_has_pending() can't be called safely.
- *
- * And queue will be rerun in blk_mq_unquiesce_queue() if it is
- * quiesced.
- */
- __blk_mq_run_dispatch_ops(hctx->queue, false,
- need_run = !blk_queue_quiesced(hctx->queue) &&
- blk_mq_hctx_has_pending(hctx));
+ need_run = blk_mq_hw_queue_need_run(hctx);
+ if (!need_run) {
+ unsigned long flags;
- if (!need_run)
- return;
+ /*
+ * Synchronize with blk_mq_unquiesce_queue(), becuase we check
+ * if hw queue is quiesced locklessly above, we need the use
+ * ->queue_lock to make sure we see the up-to-date status to
+ * not miss rerunning the hw queue.
+ */
+ spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+ need_run = blk_mq_hw_queue_need_run(hctx);
+ spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+
+ if (!need_run)
+ return;
+ }
if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
blk_mq_delay_run_hw_queue(hctx, 0);
--
2.20.1
virtual machines under Xen Hypervisor (DomU) running in Xen PV mode use a
special, nonstandard synthetized CPU topology which "just works" under
kernels 6.9.x while newer kernels assuming a "crash kernel" and disable
SMT (reducing to one CPU core) because the newer topology implementation
produces a wrong error "[Firmware Bug]: APIC enumeration order not
specification compliant" after new topology checks which are improper for
Xen PV platform. As a result, the kernel disables SMT and activates just
one CPU core within the VM (DomU).
The patch disables the regarding checks if it is running in Xen PV
mode (only) and bring back SMT / all CPUs as in the past to such DomU
VMs.
Signed-off-by: Niels Dettenbach <nd(a)syndicat.com>
---
The current behaviour leads all of our production Xen Host platforms
unusable after updating to newer linux kernels (with just one CPU
available/activated per VM) while older kernels other OS still work
fully (and stable since many years on the platform).
Xen PV mode is still provided by current Xen and widely used - even
if less wide as the newer Xen PVH mode today. So a solution probably
will be required.
So we assume that bug affects stable(a)vger.kernel.org as well.
dmesg from affected kernel:
-- snip --
Sep 10 14:35:50 ffm1 kernel: [ 0.640364] CPU topo: Enumerated BSP APIC 0 is not marked in APICBASE MSR
Sep 10 14:35:50 ffm1 kernel: [ 0.640367] CPU topo: Assuming crash kernel. Limiting to one CPU to prevent machine INIT
Sep 10 14:35:50 ffm1 kernel: [ 0.640368] CPU topo: [Firmware Bug]: APIC enumeration order not specification compliant
Sep 10 14:35:50 ffm1 kernel: [ 0.640376] CPU topo: Max. logical packages: 1
Sep 10 14:35:50 ffm1 kernel: [ 0.640378] CPU topo: Max. logical dies: 1
Sep 10 14:35:50 ffm1 kernel: [ 0.640379] CPU topo: Max. dies per package: 1
Sep 10 14:35:50 ffm1 kernel: [ 0.640386] CPU topo: Max. threads per core: 1
Sep 10 14:35:50 ffm1 kernel: [ 0.640388] CPU topo: Num. cores per package: 1
Sep 10 14:35:50 ffm1 kernel: [ 0.640389] CPU topo: Num. threads per package: 1
Sep 10 14:35:50 ffm1 kernel: [ 0.640390] CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
Sep 10 14:35:50 ffm1 kernel: [ 0.640402] Cannot find an available gap in the 32-bit address range
-- snap --
We tested the patch intensely under productive / high load since 2 days now with no issues.
references:
arch/x86/kernel/cpu/topology.c
[line 448]
--- snip ---
/*
* XEN PV is special as it does not advertise the local APIC
* properly, but provides a fake topology for it so that the
* infrastructure works. So don't apply the restrictions vs. APIC
* here.
*/
---snap ---
Related errors / tickets:
https://forum.qubes-os.org/t/fedora-sees-only-1-cpu-core-after-updating-the…
--- linux/arch/x86/kernel/cpu/topology.c.orig 2024-09-11 09:53:16.194095250 +0200
+++ linux/arch/x86/kernel/cpu/topology.c 2024-09-11 09:55:17.338448094 +0200
@@ -158,7 +158,7 @@ static __init bool check_for_real_bsp(u3
is_bsp = !!(msr & MSR_IA32_APICBASE_BSP);
}
- if (apic_id == topo_info.boot_cpu_apic_id) {
+ if (!xen_pv_domain() && apic_id == topo_info.boot_cpu_apic_id) {
/*
* If the boot CPU has the APIC BSP bit set then the
* firmware enumeration is agreeing. If the CPU does not
@@ -185,7 +185,7 @@ static __init bool check_for_real_bsp(u3
pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x != %x\n",
topo_info.boot_cpu_apic_id, apic_id);
- if (is_bsp) {
+ if (!xen_pv_domain() && is_bsp) {
/*
* The boot CPU has the APIC BSP bit set. Use it and complain
* about the broken firmware enumeration.
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline xattr.
The problematic function is ocfs2_reflink_xattr_inline(). By the time this
function is called the reflink tree is already recreated at the destination
inode from the source inode. At this point, this function reserves space
space inline xattrs at the destination inode without even checking if there
is space at the root metadata block. It simply reduces the l_count from 243
to 227 thereby making space of 256 bytes for inline xattr whereas the inode
already has extents beyond this index (in this case upto 230), thereby causing
corruption.
The fix for this is to reserve space for inline metadata before the at the
destination inode before the reflink tree gets recreated. The customer has
verified the fix.
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 3f80a56d0d60..1d427da06bee 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4155,8 +4156,9 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4182,6 +4184,26 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *ni = OCFS2_I(new_inode);
+
+ if (!(ni->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = new_bh->b_data;
+ struct ocfs2_dinode *old_di = old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4189,7 +4211,7 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3b81213ed7b8..a9f716ec89e2 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(struct ocfs2_xattr_reflink *args)
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
--
2.39.3
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline xattr.
The problematic function is ocfs2_reflink_xattr_inline(). By the time this
function is called the reflink tree is already recreated at the destination
inode from the source inode. At this point, this function reserves space
space inline xattrs at the destination inode without even checking if there
is space at the root metadata block. It simply reduces the l_count from 243
to 227 thereby making space of 256 bytes for inline xattr whereas the inode
already has extents beyond this index (in this case upto 230), thereby causing
corruption.
The fix for this is to reserve space for inline metadata before the at the
destination inode before the reflink tree gets recreated. The customer has
verified the fix.
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 3f80a56d0d60..1d427da06bee 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4155,8 +4156,9 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4182,6 +4184,26 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *ni = OCFS2_I(new_inode);
+
+ if (!(ni->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = new_bh->b_data;
+ struct ocfs2_dinode *old_di = old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4189,7 +4211,7 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3b81213ed7b8..a9f716ec89e2 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(struct ocfs2_xattr_reflink *args)
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
--
2.39.3
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
commit 61cbfb5368dd50ed0d65ce21d305aa923581db2b upstream
The cfg pointer is set before reading the channel number that the
configuration should point to. This causes configurations to be shifted
by one channel.
For example setting bipolar to the first channel defined in the DT will
cause bipolar mode to be active on the second defined channel.
Fix by moving the cfg pointer setting after reading the channel number.
Fixes: 7b8d045e497a ("iio: adc: ad7124: allow more than 8 channels")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240806085133.114547-1-dumitru.ceclan@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: He Lugang <helugang(a)uniontech.com>
---
drivers/iio/adc/ad7124.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index e7b1d517d3de..089398a7664a 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -837,8 +837,6 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels = channels;
device_for_each_child_node_scoped(dev, child) {
- cfg = &st->channels[channel].cfg;
-
ret = fwnode_property_read_u32(child, "reg", &channel);
if (ret)
return ret;
@@ -856,6 +854,7 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels[channel].ain = AD7124_CHANNEL_AINP(ain[0]) |
AD7124_CHANNEL_AINM(ain[1]);
+ cfg = &st->channels[channel].cfg;
cfg->bipolar = fwnode_property_read_bool(child, "bipolar");
ret = fwnode_property_read_u32(child, "adi,reference-select", &tmp);
--
2.45.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 61cbfb5368dd50ed0d65ce21d305aa923581db2b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090914-province-underdone-1eea@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
61cbfb5368dd ("iio: adc: ad7124: fix DT configuration parsing")
a6eaf02b8274 ("iio: adc: ad7124: Switch from of specific to fwnode based property handling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61cbfb5368dd50ed0d65ce21d305aa923581db2b Mon Sep 17 00:00:00 2001
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
Date: Tue, 6 Aug 2024 11:51:33 +0300
Subject: [PATCH] iio: adc: ad7124: fix DT configuration parsing
The cfg pointer is set before reading the channel number that the
configuration should point to. This causes configurations to be shifted
by one channel.
For example setting bipolar to the first channel defined in the DT will
cause bipolar mode to be active on the second defined channel.
Fix by moving the cfg pointer setting after reading the channel number.
Fixes: 7b8d045e497a ("iio: adc: ad7124: allow more than 8 channels")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240806085133.114547-1-dumitru.ceclan@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index afb5f4d741e6..108e9ccab1ef 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -844,8 +844,6 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels = channels;
device_for_each_child_node_scoped(dev, child) {
- cfg = &st->channels[channel].cfg;
-
ret = fwnode_property_read_u32(child, "reg", &channel);
if (ret)
return ret;
@@ -863,6 +861,7 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels[channel].ain = AD7124_CHANNEL_AINP(ain[0]) |
AD7124_CHANNEL_AINM(ain[1]);
+ cfg = &st->channels[channel].cfg;
cfg->bipolar = fwnode_property_read_bool(child, "bipolar");
ret = fwnode_property_read_u32(child, "adi,reference-select", &tmp);
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
From: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
[ Upstream commit 61cbfb5368dd50ed0d65ce21d305aa923581db2b ]
The cfg pointer is set before reading the channel number that the
configuration should point to. This causes configurations to be shifted
by one channel.
For example setting bipolar to the first channel defined in the DT will
cause bipolar mode to be active on the second defined channel.
Fix by moving the cfg pointer setting after reading the channel number.
Fixes: 7b8d045e497a ("iio: adc: ad7124: allow more than 8 channels")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240806085133.114547-1-dumitru.ceclan@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: He Lugang <helugang(a)uniontech.com>
---
drivers/iio/adc/ad7124.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index e7b1d517d3de..089398a7664a 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -837,8 +837,6 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels = channels;
device_for_each_child_node_scoped(dev, child) {
- cfg = &st->channels[channel].cfg;
-
ret = fwnode_property_read_u32(child, "reg", &channel);
if (ret)
return ret;
@@ -856,6 +854,7 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels[channel].ain = AD7124_CHANNEL_AINP(ain[0]) |
AD7124_CHANNEL_AINM(ain[1]);
+ cfg = &st->channels[channel].cfg;
cfg->bipolar = fwnode_property_read_bool(child, "bipolar");
ret = fwnode_property_read_u32(child, "adi,reference-select", &tmp);
--
2.45.2
The number of transmit and receive descriptors must be a multiple of 128
due to the hardware limitation. If it is set to a multiple of 8 instead of
a multiple 128, the queues will easily be hung.
Cc: stable(a)vger.kernel.org
Fixes: 883b5984a5d2 ("net: wangxun: add ethtool_ops for ring parameters")
Signed-off-by: Jiawen Wu <jiawenwu(a)trustnetic.com>
---
drivers/net/ethernet/wangxun/libwx/wx_type.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index 1d57b047817b..b54bffda027b 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -426,9 +426,9 @@ enum WX_MSCA_CMD_value {
#define WX_MIN_RXD 128
#define WX_MIN_TXD 128
-/* Number of Transmit and Receive Descriptors must be a multiple of 8 */
-#define WX_REQ_RX_DESCRIPTOR_MULTIPLE 8
-#define WX_REQ_TX_DESCRIPTOR_MULTIPLE 8
+/* Number of Transmit and Receive Descriptors must be a multiple of 128 */
+#define WX_REQ_RX_DESCRIPTOR_MULTIPLE 128
+#define WX_REQ_TX_DESCRIPTOR_MULTIPLE 128
#define WX_MAX_JUMBO_FRAME_SIZE 9432 /* max payload 9414 */
#define VMDQ_P(p) p
--
2.27.0
Here are some various fixes for the MPTCP selftests.
Patch 1 fixes a recently modified test to continue to work as expected
on older kernels. This is a fix for a recent fix that can be backported
up to v5.15.
Patch 2 and 3 include dependences when exporting or installing the
tests. Two fixes for v6.11-rc1.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (3):
selftests: mptcp: join: restrict fullmesh endp on 1st sf
selftests: mptcp: include lib.sh file
selftests: mptcp: include net_helper.sh file
tools/testing/selftests/net/mptcp/Makefile | 2 ++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
---
base-commit: 48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
change-id: 20240910-net-selftests-mptcp-fix-install-2b82ae5a99c8
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
It is possible that the usb power_supply is registered after the probe
of dwc3. In this case, trying to get the usb power_supply during the
probe will fail and there is no chance to try again. Also the usb
power_supply might be unregistered at anytime so that the handle of it
in dwc3 would become invalid. To fix this, get the handle right before
calling to power_supply functions and put it afterward.
Fixes: 6f0764b5adea ("usb: dwc3: add a power supply for current control")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kyle Tso <kyletso(a)google.com>
---
drivers/usb/dwc3/core.c | 25 +++++--------------------
drivers/usb/dwc3/core.h | 4 ++--
drivers/usb/dwc3/gadget.c | 19 ++++++++++++++-----
3 files changed, 21 insertions(+), 27 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..ab563edd9b4c 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1631,8 +1631,6 @@ static void dwc3_get_properties(struct dwc3 *dwc)
u8 tx_thr_num_pkt_prd = 0;
u8 tx_max_burst_prd = 0;
u8 tx_fifo_resize_max_num;
- const char *usb_psy_name;
- int ret;
/* default to highest possible threshold */
lpm_nyet_threshold = 0xf;
@@ -1667,12 +1665,7 @@ static void dwc3_get_properties(struct dwc3 *dwc)
dwc->sys_wakeup = device_may_wakeup(dwc->sysdev);
- ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name);
- if (ret >= 0) {
- dwc->usb_psy = power_supply_get_by_name(usb_psy_name);
- if (!dwc->usb_psy)
- dev_err(dev, "couldn't get usb power supply\n");
- }
+ device_property_read_string(dev, "usb-psy-name", &dwc->usb_psy_name);
dwc->has_lpm_erratum = device_property_read_bool(dev,
"snps,has-lpm-erratum");
@@ -2133,18 +2126,16 @@ static int dwc3_probe(struct platform_device *pdev)
dwc3_get_software_properties(dwc);
dwc->reset = devm_reset_control_array_get_optional_shared(dev);
- if (IS_ERR(dwc->reset)) {
- ret = PTR_ERR(dwc->reset);
- goto err_put_psy;
- }
+ if (IS_ERR(dwc->reset))
+ return PTR_ERR(dwc->reset);
ret = dwc3_get_clocks(dwc);
if (ret)
- goto err_put_psy;
+ return ret;
ret = reset_control_deassert(dwc->reset);
if (ret)
- goto err_put_psy;
+ return ret;
ret = dwc3_clk_enable(dwc);
if (ret)
@@ -2245,9 +2236,6 @@ static int dwc3_probe(struct platform_device *pdev)
dwc3_clk_disable(dwc);
err_assert_reset:
reset_control_assert(dwc->reset);
-err_put_psy:
- if (dwc->usb_psy)
- power_supply_put(dwc->usb_psy);
return ret;
}
@@ -2276,9 +2264,6 @@ static void dwc3_remove(struct platform_device *pdev)
pm_runtime_set_suspended(&pdev->dev);
dwc3_free_event_buffers(dwc);
-
- if (dwc->usb_psy)
- power_supply_put(dwc->usb_psy);
}
#ifdef CONFIG_PM
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..ecfe2cc224f7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1045,7 +1045,7 @@ struct dwc3_scratchpad_array {
* @role_sw: usb_role_switch handle
* @role_switch_default_mode: default operation mode of controller while
* usb role is USB_ROLE_NONE.
- * @usb_psy: pointer to power supply interface.
+ * @usb_psy_name: name of the usb power supply interface.
* @usb2_phy: pointer to USB2 PHY
* @usb3_phy: pointer to USB3 PHY
* @usb2_generic_phy: pointer to array of USB2 PHYs
@@ -1223,7 +1223,7 @@ struct dwc3 {
struct usb_role_switch *role_sw;
enum usb_dr_mode role_switch_default_mode;
- struct power_supply *usb_psy;
+ const char *usb_psy_name;
u32 fladj;
u32 ref_clk_per;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..c89b5b5a64cf 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3049,20 +3049,29 @@ static void dwc3_gadget_set_ssp_rate(struct usb_gadget *g,
static int dwc3_gadget_vbus_draw(struct usb_gadget *g, unsigned int mA)
{
- struct dwc3 *dwc = gadget_to_dwc(g);
+ struct dwc3 *dwc = gadget_to_dwc(g);
+ struct power_supply *usb_psy;
union power_supply_propval val = {0};
int ret;
if (dwc->usb2_phy)
return usb_phy_set_power(dwc->usb2_phy, mA);
- if (!dwc->usb_psy)
+ if (!dwc->usb_psy_name)
return -EOPNOTSUPP;
- val.intval = 1000 * mA;
- ret = power_supply_set_property(dwc->usb_psy, POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT, &val);
+ usb_psy = power_supply_get_by_name(dwc->usb_psy_name);
+ if (usb_psy) {
+ val.intval = 1000 * mA;
+ ret = power_supply_set_property(usb_psy, POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT,
+ &val);
+ power_supply_put(usb_psy);
+ return ret;
+ }
- return ret;
+ dev_err(dwc->dev, "couldn't get usb power supply\n");
+
+ return -EOPNOTSUPP;
}
/**
--
2.45.2.993.g49e7a77208-goog
The qcom_geni_serial_poll_bit() can be used to wait for events like
command completion and is supposed to wait for the time it takes to
clear a full fifo before timing out.
As noted by Doug, the current implementation does not account for start,
stop and parity bits when determining the timeout. The helper also does
not currently account for the shift register and the two-word
intermediate transfer register.
A too short timeout can specifically lead to lost characters when
waiting for a transfer to complete as the transfer is cancelled on
timeout.
Instead of determining the poll timeout on every call, store the fifo
timeout when updating it in set_termios() and make sure to take the
shift and intermediate registers into account. Note that serial core has
already added a 20 ms margin to the fifo timeout.
Also note that the current uart_fifo_timeout() interface does
unnecessary calculations on every call and did not exist in earlier
kernels so only store its result once. This facilitates backports too as
earlier kernels can derive the timeout from uport->timeout, which has
since been removed.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/tty/serial/qcom_geni_serial.c | 31 +++++++++++++++------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 69a632fefc41..309c0bddf26a 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -124,7 +124,7 @@ struct qcom_geni_serial_port {
dma_addr_t tx_dma_addr;
dma_addr_t rx_dma_addr;
bool setup;
- unsigned int baud;
+ unsigned long poll_timeout_us;
unsigned long clk_rate;
void *rx_buf;
u32 loopback;
@@ -270,22 +270,13 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport,
{
u32 reg;
struct qcom_geni_serial_port *port;
- unsigned int baud;
- unsigned int fifo_bits;
unsigned long timeout_us = 20000;
struct qcom_geni_private_data *private_data = uport->private_data;
if (private_data->drv) {
port = to_dev_port(uport);
- baud = port->baud;
- if (!baud)
- baud = 115200;
- fifo_bits = port->tx_fifo_depth * port->tx_fifo_width;
- /*
- * Total polling iterations based on FIFO worth of bytes to be
- * sent at current baud. Add a little fluff to the wait.
- */
- timeout_us = ((fifo_bits * USEC_PER_SEC) / baud) + 500;
+ if (port->poll_timeout_us)
+ timeout_us = port->poll_timeout_us;
}
/*
@@ -1244,11 +1235,11 @@ static void qcom_geni_serial_set_termios(struct uart_port *uport,
unsigned long clk_rate;
u32 ver, sampling_rate;
unsigned int avg_bw_core;
+ unsigned long timeout;
qcom_geni_serial_stop_rx(uport);
/* baud rate */
baud = uart_get_baud_rate(uport, termios, old, 300, 4000000);
- port->baud = baud;
sampling_rate = UART_OVERSAMPLING;
/* Sampling rate is halved for IP versions >= 2.5 */
@@ -1326,9 +1317,21 @@ static void qcom_geni_serial_set_termios(struct uart_port *uport,
else
tx_trans_cfg |= UART_CTS_MASK;
- if (baud)
+ if (baud) {
uart_update_timeout(uport, termios->c_cflag, baud);
+ /*
+ * Make sure that qcom_geni_serial_poll_bitfield() waits for
+ * the FIFO, two-word intermediate transfer register and shift
+ * register to clear.
+ *
+ * Note that uart_fifo_timeout() also adds a 20 ms margin.
+ */
+ timeout = jiffies_to_usecs(uart_fifo_timeout(uport));
+ timeout += 3 * timeout / port->tx_fifo_depth;
+ WRITE_ONCE(port->poll_timeout_us, timeout);
+ }
+
if (!uart_console(uport))
writel(port->loopback,
uport->membase + SE_UART_LOOPBACK_CFG);
--
2.44.2
Avoid unnecessary nested min()/max() which results in egregious macro
expansion. Use clamp_t() as this introduces the least possible expansion.
Not doing so results in an impact on build times.
This resolves an issue with slackware 15.0 32-bit compilation as reported
by Richard Narron.
Presumably the min/max fixups would be difficult to backport, this patch
should be easier and fix's Richard's problem in 5.15.
Reported-by: Richard Narron <richard(a)aaazen.com>
Closes: https://lore.kernel.org/all/4a5321bd-b1f-1832-f0c-cea8694dc5aa@aaazen.com/
Fixes: 867046cc7027 ("minmax: relax check to allow comparison between unsigned arguments and signed constants")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
---
.../staging/media/atomisp/pci/sh_css_frac.h | 26 ++++++++++++++-----
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/media/atomisp/pci/sh_css_frac.h b/drivers/staging/media/atomisp/pci/sh_css_frac.h
index b90b5b330dfa..8ba65161f7a9 100644
--- a/drivers/staging/media/atomisp/pci/sh_css_frac.h
+++ b/drivers/staging/media/atomisp/pci/sh_css_frac.h
@@ -32,12 +32,24 @@
#define uISP_VAL_MAX ((unsigned int)((1 << uISP_REG_BIT) - 1))
/* a:fraction bits for 16bit precision, b:fraction bits for ISP precision */
-#define sDIGIT_FITTING(v, a, b) \
- min_t(int, max_t(int, (((v) >> sSHIFT) >> max(sFRACTION_BITS_FITTING(a) - (b), 0)), \
- sISP_VAL_MIN), sISP_VAL_MAX)
-#define uDIGIT_FITTING(v, a, b) \
- min((unsigned int)max((unsigned)(((v) >> uSHIFT) \
- >> max((int)(uFRACTION_BITS_FITTING(a) - (b)), 0)), \
- uISP_VAL_MIN), uISP_VAL_MAX)
+static inline int sDIGIT_FITTING(int v, int a, int b)
+{
+ int fit_shift = sFRACTION_BITS_FITTING(a) - b;
+
+ v >>= sSHIFT;
+ v >>= fit_shift > 0 ? fit_shift : 0;
+
+ return clamp_t(int, v, sISP_VAL_MIN, sISP_VAL_MAX);
+}
+
+static inline unsigned int uDIGIT_FITTING(unsigned int v, int a, int b)
+{
+ int fit_shift = uFRACTION_BITS_FITTING(a) - b;
+
+ v >>= uSHIFT;
+ v >>= fit_shift > 0 ? fit_shift : 0;
+
+ return clamp_t(unsigned int, v, uISP_VAL_MIN, uISP_VAL_MAX);
+}
#endif /* __SH_CSS_FRAC_H */
--
2.46.0
Avoid unnecessary nested min()/max() which reults in egregious macro
expansion.
The nesting occurs when NET_SKB_PAD is used in a min() or max() invocation,
for instance, various ethernet drivers wrap it in a max().
Not doing so results in an impact on build times.
Cc: stable(a)vger.kernel.org
Fixes: 867046cc7027 ("minmax: relax check to allow comparison between unsigned arguments and signed constants")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
---
include/linux/skbuff.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 29c3ea5b6e93..d53b296df504 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3164,7 +3164,11 @@ static inline int pskb_network_may_pull(struct sk_buff *skb, unsigned int len)
* NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8)
*/
#ifndef NET_SKB_PAD
-#define NET_SKB_PAD max(32, L1_CACHE_BYTES)
+#if L1_CACHE_BYTES < 32
+#define NET_SKB_PAD 32
+#else
+#define NET_SKB_PAD L1_CACHE_BYTES
+#endif
#endif
int ___pskb_trim(struct sk_buff *skb, unsigned int len);
--
2.46.0
Avoid unnecessary nested min()/max() which reults in egregious macro
expansion. Use clamp_t() as this introduces the least possible expansion.
Not doing so results in an impact on build times.
Cc: stable(a)vger.kernel.org
Fixes: 867046cc7027 ("minmax: relax check to allow comparison between unsigned arguments and signed constants")
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
---
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index e809f91c08fb..8b431f90efc3 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -23,7 +23,7 @@
/* The PacketOffset field is measured in units of 32 bytes and is 3 bits wide,
* so the maximum offset is 7 * 32 = 224
*/
-#define MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224)
+#define MVPP2_SKB_HEADROOM clamp_t(int, XDP_PACKET_HEADROOM, NET_SKB_PAD, 224)
#define MVPP2_XDP_PASS 0
#define MVPP2_XDP_DROPPED BIT(0)
--
2.46.0
Avoid nested min()/max() which results in egregious macro expansion.
This issue was introduced by commit 867046cc7027 ("minmax: relax check to
allow comparison between unsigned arguments and signed constants") [2].
Work has been done to address the issue of egregious min()/max() macro
expansion in commit 22f546873149 ("minmax: improve macro expansion and type
checking") and related, however it appears that some issues remain on more
tightly constrained systems.
Adjust a few known-bad cases of deeply nested macros to avoid doing so to
mitigate this. Porting the patch first proposed in [1] to Linus's tree.
Running an allmodconfig build using the methodology described in [2] we
observe a 35 MiB reduction in generated code.
The difference is much more significant prior to recent minmax fixes which
were not backported. As per [1] prior these the reduction is more like 200
MiB.
This resolves an issue with slackware 15.0 32-bit compilation as reported
by Richard Narron.
Presumably the min/max fixups would be difficult to backport, this patch
should be easier and fix's Richard's problem in 5.15.
[0]:https://lore.kernel.org/all/b97faef60ad24922b530241c5d7c933c@AcuMS.acula…
[1]:https://lore.kernel.org/lkml/5882b96e-1287-4390-8174-3316d39038ef@lucife…
[2]:https://lore.kernel.org/linux-mm/36aa2cad-1db1-4abf-8dd2-fb20484aabc3@lu…
Reported-by: Richard Narron <richard(a)aaazen.com>
Closes: https://lore.kernel.org/all/4a5321bd-b1f-1832-f0c-cea8694dc5aa@aaazen.com/
Fixes: 867046cc7027 ("minmax: relax check to allow comparison between unsigned arguments and signed constants")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
---
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +-
.../staging/media/atomisp/pci/sh_css_frac.h | 26 ++++++++++++++-----
include/linux/skbuff.h | 6 ++++-
3 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index e809f91c08fb..8b431f90efc3 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -23,7 +23,7 @@
/* The PacketOffset field is measured in units of 32 bytes and is 3 bits wide,
* so the maximum offset is 7 * 32 = 224
*/
-#define MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224)
+#define MVPP2_SKB_HEADROOM clamp_t(int, XDP_PACKET_HEADROOM, NET_SKB_PAD, 224)
#define MVPP2_XDP_PASS 0
#define MVPP2_XDP_DROPPED BIT(0)
diff --git a/drivers/staging/media/atomisp/pci/sh_css_frac.h b/drivers/staging/media/atomisp/pci/sh_css_frac.h
index b90b5b330dfa..a973394c5bc0 100644
--- a/drivers/staging/media/atomisp/pci/sh_css_frac.h
+++ b/drivers/staging/media/atomisp/pci/sh_css_frac.h
@@ -32,12 +32,24 @@
#define uISP_VAL_MAX ((unsigned int)((1 << uISP_REG_BIT) - 1))
/* a:fraction bits for 16bit precision, b:fraction bits for ISP precision */
-#define sDIGIT_FITTING(v, a, b) \
- min_t(int, max_t(int, (((v) >> sSHIFT) >> max(sFRACTION_BITS_FITTING(a) - (b), 0)), \
- sISP_VAL_MIN), sISP_VAL_MAX)
-#define uDIGIT_FITTING(v, a, b) \
- min((unsigned int)max((unsigned)(((v) >> uSHIFT) \
- >> max((int)(uFRACTION_BITS_FITTING(a) - (b)), 0)), \
- uISP_VAL_MIN), uISP_VAL_MAX)
+static inline int sDIGIT_FITTING(short v, int a, int b)
+{
+ int fit_shift = sFRACTION_BITS_FITTING(a) - b;
+
+ v >>= sSHIFT;
+ v >>= fit_shift > 0 ? fit_shift : 0;
+
+ return clamp_t(int, v, sISP_VAL_MIN, sISP_VAL_MAX);
+}
+
+static inline unsigned int uDIGIT_FITTING(unsigned int v, int a, int b)
+{
+ int fit_shift = uFRACTION_BITS_FITTING(a) - b;
+
+ v >>= uSHIFT;
+ v >>= fit_shift > 0 ? fit_shift : 0;
+
+ return clamp_t(unsigned int, v, uISP_VAL_MIN, uISP_VAL_MAX);
+}
#endif /* __SH_CSS_FRAC_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 29c3ea5b6e93..d53b296df504 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3164,7 +3164,11 @@ static inline int pskb_network_may_pull(struct sk_buff *skb, unsigned int len)
* NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8)
*/
#ifndef NET_SKB_PAD
-#define NET_SKB_PAD max(32, L1_CACHE_BYTES)
+#if L1_CACHE_BYTES < 32
+#define NET_SKB_PAD 32
+#else
+#define NET_SKB_PAD L1_CACHE_BYTES
+#endif
#endif
int ___pskb_trim(struct sk_buff *skb, unsigned int len);
--
2.46.0
This patch addresses a reference count handling issue in the
ice_dpll_init_rclk_pins() function. The function calls ice_dpll_get_pins(),
which increments the reference count of the relevant resources. However,
if the condition WARN_ON((!vsi || !vsi->netdev)) is met, the function
currently returns an error without properly releasing the resources
acquired by ice_dpll_get_pins(), leading to a reference count leak.
To resolve this, the check has been moved to the top of the function. This
ensures that the function verifies the state before any resources are
acquired, avoiding the need for additional resource management in the
error path.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and detecting potential issues where resources are not properly managed.
In this case, the tool flagged the missing release operation as a
potential problem, which led to the development of this patch.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)outlook.com>
---
v2:
* In this patch v2, the check for vsi and vsi->netdev has been moved to
the top of the function to simplify error handling and avoid the need for
resource unwinding.
Thanks to Simon Horman for suggesting this improvement.
---
drivers/net/ethernet/intel/ice/ice_dpll.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index e92be6f130a3..b403d55303b8 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -1626,6 +1626,8 @@ ice_dpll_init_rclk_pins(struct ice_pf *pf, struct ice_dpll_pin *pin,
struct dpll_pin *parent;
int ret, i;
+ if (WARN_ON((!vsi || !vsi->netdev)))
+ return -EINVAL;
ret = ice_dpll_get_pins(pf, pin, start_idx, ICE_DPLL_RCLK_NUM_PER_PF,
pf->dplls.clock_id);
if (ret)
@@ -1641,8 +1643,6 @@ ice_dpll_init_rclk_pins(struct ice_pf *pf, struct ice_dpll_pin *pin,
if (ret)
goto unregister_pins;
}
- if (WARN_ON((!vsi || !vsi->netdev)))
- return -EINVAL;
dpll_netdev_pin_set(vsi->netdev, pf->dplls.rclk.pin);
return 0;
--
2.25.1
From: Roman Li <Roman.Li(a)amd.com>
[WHY]
RCG state of IPX in idle is more stable for DCN351 and some variants of
DCN35 than IPS2.
[HOW]
Rework dm_get_default_ips_mode() to specify default per ASIC and update
DCN35/DCN351 defaults accordingly.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sun peng Li <sunpeng.li(a)amd.com>
Signed-off-by: Roman Li <Roman.Li(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 50 ++++++++++++-------
1 file changed, 33 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c77982245f60..c50880422502 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1771,25 +1771,41 @@ static struct dml2_soc_bb *dm_dmub_get_vbios_bounding_box(struct amdgpu_device *
static enum dmub_ips_disable_type dm_get_default_ips_mode(
struct amdgpu_device *adev)
{
- /*
- * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
- * cause a hard hang. A fix exists for newer PMFW.
- *
- * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
- * IPS state in all cases, except for s0ix and all displays off (DPMS),
- * where IPS2 is allowed.
- *
- * When checking pmfw version, use the major and minor only.
- */
- if (amdgpu_ip_version(adev, DCE_HWIP, 0) == IP_VERSION(3, 5, 0) &&
- (adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
- return DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ enum dmub_ips_disable_type ret = DMUB_IPS_ENABLE;
- if (amdgpu_ip_version(adev, DCE_HWIP, 0) >= IP_VERSION(3, 5, 0))
- return DMUB_IPS_ENABLE;
+ switch (amdgpu_ip_version(adev, DCE_HWIP, 0)) {
+ case IP_VERSION(3, 5, 0):
+ /*
+ * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
+ * cause a hard hang. A fix exists for newer PMFW.
+ *
+ * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
+ * IPS state in all cases, except for s0ix and all displays off (DPMS),
+ * where IPS2 is allowed.
+ *
+ * When checking pmfw version, use the major and minor only.
+ */
+ if ((adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
+ ret = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ else if (amdgpu_ip_version(adev, GC_HWIP, 0) > IP_VERSION(11, 5, 0))
+ /*
+ * Other ASICs with DCN35 that have residency issues with
+ * IPS2 in idle.
+ * We want them to use IPS2 only in display off cases.
+ */
+ ret = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ break;
+ case IP_VERSION(3, 5, 1):
+ ret = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ break;
+ default:
+ /* ASICs older than DCN35 do not have IPSs */
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) < IP_VERSION(3, 5, 0))
+ ret = DMUB_IPS_DISABLE_ALL;
+ break;
+ }
- /* ASICs older than DCN35 do not have IPSs */
- return DMUB_IPS_DISABLE_ALL;
+ return ret;
}
static int amdgpu_dm_init(struct amdgpu_device *adev)
--
2.34.1
From: Charlene Liu <Charlene.Liu(a)amd.com>
[WHY & HOW]
1) We did linear/non linear transition properly long ago
2) We used that path to handle SystemDisplayEnable
3) We fixed a SystemDisplayEnable inability to fallback to passive by
impacting the transition flow generically
4) AFMF later relied on the generic transition behavior
Separating the two flows to make (3) non-generic is the best immediate
coarse of action.
DC can discern SSAMPO3 very easily from SDE.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Chris Park <chris.park(a)amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 6 +++---
drivers/gpu/drm/amd/display/dc/dc.h | 1 +
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 67812fbbb006..a1652130e4be 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -2376,7 +2376,7 @@ static bool is_surface_in_context(
return false;
}
-static enum surface_update_type get_plane_info_update_type(const struct dc_surface_update *u)
+static enum surface_update_type get_plane_info_update_type(const struct dc *dc, const struct dc_surface_update *u)
{
union surface_update_flags *update_flags = &u->surface->update_flags;
enum surface_update_type update_type = UPDATE_TYPE_FAST;
@@ -2455,7 +2455,7 @@ static enum surface_update_type get_plane_info_update_type(const struct dc_surfa
/* todo: below are HW dependent, we should add a hook to
* DCE/N resource and validated there.
*/
- if (u->plane_info->tiling_info.gfx9.swizzle != DC_SW_LINEAR) {
+ if (!dc->debug.skip_full_updated_if_possible) {
/* swizzled mode requires RQ to be setup properly,
* thus need to run DML to calculate RQ settings
*/
@@ -2547,7 +2547,7 @@ static enum surface_update_type det_surface_update(const struct dc *dc,
update_flags->raw = 0; // Reset all flags
- type = get_plane_info_update_type(u);
+ type = get_plane_info_update_type(dc, u);
elevate_update_type(&overall_type, type);
type = get_scaling_info_update_type(dc, u);
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index e659f4fed19f..78ebe636389e 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -1060,6 +1060,7 @@ struct dc_debug_options {
bool enable_ips_visual_confirm;
unsigned int sharpen_policy;
unsigned int scale_to_sharpness_policy;
+ bool skip_full_updated_if_possible;
};
--
2.34.1
From: Zhikai Zhai <zhikai.zhai(a)amd.com>
[WHY]
It makes DSC enable when we commit the stream which need
keep power off, and then it will skip to disable DSC if
pipe reset at this situation as power has been off. It may
cause the DSC unexpected enable on the pipe with the
next new stream which doesn't support DSC.
[HOW]
Check the DSC used on current pipe status when update stream.
Skip to enable if it has been off. The operation enable
DSC should happen when set power on.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Wenjing Liu <wenjing.liu(a)amd.com>
Signed-off-by: Zhikai Zhai <zhikai.zhai(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
.../drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c | 14 ++++++++++++++
.../drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 13 +++++++++++++
2 files changed, 27 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
index a36e11606f90..2e8c9f738259 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
@@ -1032,6 +1032,20 @@ void dcn32_update_dsc_on_stream(struct pipe_ctx *pipe_ctx, bool enable)
struct dsc_config dsc_cfg;
struct dsc_optc_config dsc_optc_cfg = {0};
enum optc_dsc_mode optc_dsc_mode;
+ struct dcn_dsc_state dsc_state = {0};
+
+ if (!dsc) {
+ DC_LOG_DSC("DSC is NULL for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+
+ if (dsc->funcs->dsc_read_state) {
+ dsc->funcs->dsc_read_state(dsc, &dsc_state);
+ if (!dsc_state.dsc_fw_en) {
+ DC_LOG_DSC("DSC has been disabled for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+ }
/* Enable DSC hw block */
dsc_cfg.pic_width = (stream->timing.h_addressable + stream->timing.h_border_left + stream->timing.h_border_right) / opp_cnt;
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
index 479fd3e89e5a..bd309dbdf7b2 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
@@ -334,7 +334,20 @@ static void update_dsc_on_stream(struct pipe_ctx *pipe_ctx, bool enable)
struct dsc_config dsc_cfg;
struct dsc_optc_config dsc_optc_cfg = {0};
enum optc_dsc_mode optc_dsc_mode;
+ struct dcn_dsc_state dsc_state = {0};
+ if (!dsc) {
+ DC_LOG_DSC("DSC is NULL for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+
+ if (dsc->funcs->dsc_read_state) {
+ dsc->funcs->dsc_read_state(dsc, &dsc_state);
+ if (!dsc_state.dsc_fw_en) {
+ DC_LOG_DSC("DSC has been disabled for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+ }
/* Enable DSC hw block */
dsc_cfg.pic_width = (stream->timing.h_addressable + stream->timing.h_border_left + stream->timing.h_border_right) / opp_cnt;
dsc_cfg.pic_height = stream->timing.v_addressable + stream->timing.v_border_top + stream->timing.v_border_bottom;
--
2.34.1
From: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
[WHY & HOW]
When underscan is set through xrandr, it causes the stream destination
rect to change in a way it becomes complicated to handle the calculations
for subvp. Since this is a corner case, disable subvp when underscan is
set.
Fix the existing check that is supposed to catch this corner case by
adding a check based on the parameters in the stream
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
.../drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c
index b0d9aed0f265..8697eac1e1f7 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c
@@ -858,7 +858,9 @@ static void populate_dml21_plane_config_from_plane_state(struct dml2_context *dm
plane->immediate_flip = plane_state->flip_immediate;
- plane->composition.rect_out_height_spans_vactive = plane_state->dst_rect.height >= stream->timing.v_addressable;
+ plane->composition.rect_out_height_spans_vactive =
+ plane_state->dst_rect.height >= stream->timing.v_addressable &&
+ stream->dst.height >= stream->timing.v_addressable;
}
//TODO : Could be possibly moved to a common helper layer.
--
2.34.1
From: Martin Tsai <martin.tsai(a)amd.com>
[WHY]
DSC on eDP could be enabled during VBIOS post. The enabled
DSC may not be disabled when enter to OS, once the system was
in second screen only mode before entering to S4. In this
case, OS will not send setTimings to reset eDP path again.
The enabled DSC HW will make a new stream without DSC cannot
output normally if it reused this pipe with enabled DSC.
[HOW]
In accelerated mode, to clean up DSC blocks if eDP is on link
but not active when we are not in fast boot and seamless boot.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu(a)amd.com>
Signed-off-by: Martin Tsai <martin.tsai(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
.../amd/display/dc/hwss/dce110/dce110_hwseq.c | 50 +++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index d52ce58c6a98..aa7479b31898 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -57,6 +57,7 @@
#include "panel_cntl.h"
#include "dc_state_priv.h"
#include "dpcd_defs.h"
+#include "dsc.h"
/* include DCE11 register header files */
#include "dce/dce_11_0_d.h"
#include "dce/dce_11_0_sh_mask.h"
@@ -1823,6 +1824,48 @@ static void get_edp_links_with_sink(
}
}
+static void clean_up_dsc_blocks(struct dc *dc)
+{
+ struct display_stream_compressor *dsc = NULL;
+ struct timing_generator *tg = NULL;
+ struct stream_encoder *se = NULL;
+ struct dccg *dccg = dc->res_pool->dccg;
+ struct pg_cntl *pg_cntl = dc->res_pool->pg_cntl;
+ int i;
+
+ if (dc->ctx->dce_version != DCN_VERSION_3_5 &&
+ dc->ctx->dce_version != DCN_VERSION_3_51)
+ return;
+
+ for (i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
+ struct dcn_dsc_state s = {0};
+
+ dsc = dc->res_pool->dscs[i];
+ dsc->funcs->dsc_read_state(dsc, &s);
+ if (s.dsc_fw_en) {
+ /* disable DSC in OPTC */
+ if (i < dc->res_pool->timing_generator_count) {
+ tg = dc->res_pool->timing_generators[i];
+ tg->funcs->set_dsc_config(tg, OPTC_DSC_DISABLED, 0, 0);
+ }
+ /* disable DSC in stream encoder */
+ if (i < dc->res_pool->stream_enc_count) {
+ se = dc->res_pool->stream_enc[i];
+ se->funcs->dp_set_dsc_config(se, OPTC_DSC_DISABLED, 0, 0);
+ se->funcs->dp_set_dsc_pps_info_packet(se, false, NULL, true);
+ }
+ /* disable DSC block */
+ if (dccg->funcs->set_ref_dscclk)
+ dccg->funcs->set_ref_dscclk(dccg, dsc->inst);
+ dsc->funcs->dsc_disable(dsc);
+
+ /* power down DSC */
+ if (pg_cntl != NULL)
+ pg_cntl->funcs->dsc_pg_control(pg_cntl, dsc->inst, false);
+ }
+ }
+}
+
/*
* When ASIC goes from VBIOS/VGA mode to driver/accelerated mode we need:
* 1. Power down all DC HW blocks
@@ -1927,6 +1970,13 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
clk_mgr_exit_optimized_pwr_state(dc, dc->clk_mgr);
power_down_all_hw_blocks(dc);
+
+ /* DSC could be enabled on eDP during VBIOS post.
+ * To clean up dsc blocks if eDP is in link but not active.
+ */
+ if (edp_link_with_sink && (edp_stream_num == 0))
+ clean_up_dsc_blocks(dc);
+
disable_vga_and_power_gate_all_controllers(dc);
if (edp_link_with_sink && !keep_edp_vdd_on)
dc->hwss.edp_power_control(edp_link_with_sink, false);
--
2.34.1
There is a real deadlock as well as sleeping in atomic() bug in here, if
the bo put happens to be the last ref, since bo destruction wants to
grab the same spinlock and sleeping locks. Fix that by dropping the ref
using xe_bo_put_deferred(), and moving the final commit outside of the
lock. Dropping the lock around the put is tricky since the bo can go
out of scope and delete itself from the list, making it difficult to
navigate to the next list entry.
Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay(a)intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost(a)intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay(a)intel.com>
---
drivers/gpu/drm/xe/xe_drm_client.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index e64f4b645e2e..badfa045ead8 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -196,6 +196,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
struct xe_drm_client *client;
struct drm_gem_object *obj;
struct xe_bo *bo;
+ LLIST_HEAD(deferred);
unsigned int id;
u32 mem_type;
@@ -215,11 +216,14 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
list_for_each_entry(bo, &client->bos_list, client_link) {
if (!kref_get_unless_zero(&bo->ttm.base.refcount))
continue;
+
bo_meminfo(bo, stats);
- xe_bo_put(bo);
+ xe_bo_put_deferred(bo, &deferred);
}
spin_unlock(&client->bos_lock);
+ xe_bo_put_commit(&deferred);
+
for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
if (!xe_mem_type_to_name[mem_type])
continue;
--
2.46.0
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
---
arch/x86/coco/tdx/tdx.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2bac2553..c90d2fdb5fc4 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -405,6 +405,11 @@ static bool mmio_write(int size, unsigned long addr, unsigned long val)
EPT_WRITE, addr, val);
}
+static inline bool is_kernel_addr(unsigned long addr)
+{
+ return (long)addr < 0;
+}
+
static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
{
unsigned long *reg, val, vaddr;
@@ -434,6 +439,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!user_mode(regs) && !is_kernel_addr(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
--
2.46.0
Commit 413db06c05e7 ("phy: qcom-qmp-usb: clean up probe initialisation")
removed most users of the platform device driver data from the
qcom-qmp-usb driver, but mistakenly also removed the initialisation
despite the data still being used in the runtime PM callbacks. This bug
was later reproduced when the driver was copied to create the qmp-usbc
driver.
Restore the driver data initialisation at probe to avoid a NULL-pointer
dereference on runtime suspend.
Apparently no one uses runtime PM, which currently needs to be enabled
manually through sysfs, with these drivers.
Fixes: 19281571a4d5 ("phy: qcom: qmp-usb: split USB-C PHY driver")
Cc: stable(a)vger.kernel.org # 6.9
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-usbc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-usbc.c b/drivers/phy/qualcomm/phy-qcom-qmp-usbc.c
index 5cbc5fd529eb..dea3456f88b1 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-usbc.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-usbc.c
@@ -1049,6 +1049,7 @@ static int qmp_usbc_probe(struct platform_device *pdev)
return -ENOMEM;
qmp->dev = dev;
+ dev_set_drvdata(dev, qmp);
qmp->orientation = TYPEC_ORIENTATION_NORMAL;
--
2.44.2
Commit 413db06c05e7 ("phy: qcom-qmp-usb: clean up probe initialisation")
removed most users of the platform device driver data from the
qcom-qmp-usb driver, but mistakenly also removed the initialisation
despite the data still being used in the runtime PM callbacks. This bug
was later reproduced when the driver was copied to create the
qmp-usb-legacy driver.
Restore the driver data initialisation at probe to avoid a NULL-pointer
dereference on runtime suspend.
Apparently no one uses runtime PM, which currently needs to be enabled
manually through sysfs, with these drivers.
Fixes: e464a3180a43 ("phy: qcom-qmp-usb: split off the legacy USB+dp_com support")
Cc: stable(a)vger.kernel.org # 6.6
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-usb-legacy.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-usb-legacy.c b/drivers/phy/qualcomm/phy-qcom-qmp-usb-legacy.c
index 6d0ba39c1943..8bf951b0490c 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-usb-legacy.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-usb-legacy.c
@@ -1248,6 +1248,7 @@ static int qmp_usb_legacy_probe(struct platform_device *pdev)
return -ENOMEM;
qmp->dev = dev;
+ dev_set_drvdata(dev, qmp);
qmp->cfg = of_device_get_match_data(dev);
if (!qmp->cfg)
--
2.44.2
Commit 413db06c05e7 ("phy: qcom-qmp-usb: clean up probe initialisation")
removed most users of the platform device driver data, but mistakenly
also removed the initialisation despite the data still being used in the
runtime PM callbacks.
Restore the driver data initialisation at probe to avoid a NULL-pointer
dereference on runtime suspend.
Apparently no one uses runtime PM, which currently needs to be enabled
manually through sysfs, with this driver.
Fixes: 413db06c05e7 ("phy: qcom-qmp-usb: clean up probe initialisation")
Cc: stable(a)vger.kernel.org # 6.2
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
index 49f4a53f9b2c..76068393e4ba 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
@@ -2191,6 +2191,7 @@ static int qmp_usb_probe(struct platform_device *pdev)
return -ENOMEM;
qmp->dev = dev;
+ dev_set_drvdata(dev, qmp);
qmp->cfg = of_device_get_match_data(dev);
if (!qmp->cfg)
--
2.44.2
From: Kan Liang <kan.liang(a)linux.intel.com>
The BPF subsystem may capture LBR data on a counting event. However, the
current implementation assumes that LBR can/should only be used with
sampling events.
For instance, retsnoop tool ([0]) makes an extensive use of this
functionality and sets up perf event as follows:
struct perf_event_attr attr;
memset(&attr, 0, sizeof(attr));
attr.size = sizeof(attr);
attr.type = PERF_TYPE_HARDWARE;
attr.config = PERF_COUNT_HW_CPU_CYCLES;
attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL;
To limit the LBR for a sampling event is to avoid unnecessary branch
stack setup for a counting event in the sample read. Because LBR is only
read in the sampling event's overflow.
Although in most cases LBR is used in sampling, there is no HW limit to
bind LBR to the sampling mode. Allow an LBR setup for a counting event
unless in the sample read mode.
Fixes: 85846b27072d ("perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag")
Reported-by: Andrii Nakryiko <andrii.nakryiko(a)gmail.com>
Closes: https://lore.kernel.org/lkml/20240905180055.1221620-1-andrii@kernel.org/
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 605ed19043ed..2b5ff112d8d1 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3981,8 +3981,12 @@ static int intel_pmu_hw_config(struct perf_event *event)
x86_pmu.pebs_aliases(event);
}
- if (needs_branch_stack(event) && is_sampling_event(event))
- event->hw.flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
+ if (needs_branch_stack(event)) {
+ /* Avoid branch stack setup for counting events in SAMPLE READ */
+ if (is_sampling_event(event) ||
+ !(event->attr.sample_type & PERF_SAMPLE_READ))
+ event->hw.flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
+ }
if (branch_sample_counters(event)) {
struct perf_event *leader, *sibling;
--
2.38.1
There is a real deadlock as well as sleeping in atomic() bug in here, if
the bo put happens to be the last ref, since bo destruction wants to
grab the same spinlock and sleeping locks. Fix that by dropping the ref
using xe_bo_put_deferred(), and moving the final commit outside of the
lock. Dropping the lock around the put is tricky since the bo can go
out of scope and delete itself from the list, making it difficult to
navigate to the next list entry.
Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay(a)intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_drm_client.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index e64f4b645e2e..badfa045ead8 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -196,6 +196,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
struct xe_drm_client *client;
struct drm_gem_object *obj;
struct xe_bo *bo;
+ LLIST_HEAD(deferred);
unsigned int id;
u32 mem_type;
@@ -215,11 +216,14 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
list_for_each_entry(bo, &client->bos_list, client_link) {
if (!kref_get_unless_zero(&bo->ttm.base.refcount))
continue;
+
bo_meminfo(bo, stats);
- xe_bo_put(bo);
+ xe_bo_put_deferred(bo, &deferred);
}
spin_unlock(&client->bos_lock);
+ xe_bo_put_commit(&deferred);
+
for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
if (!xe_mem_type_to_name[mem_type])
continue;
--
2.46.0
This reverts commit ab8d66d132bc8f1992d3eb6cab8d32dda6733c84 because it
breaks codecs using non-continuous masks in source and sink ports. The
commit missed the point that port numbers are not used as indices for
iterating over prop.sink_ports or prop.source_ports.
Soundwire core and existing codecs expect that the array passed as
prop.sink_ports and prop.source_ports is continuous. The port mask still
might be non-continuous, but that's unrelated.
Reported-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Closes: https://lore.kernel.org/all/b6c75eee-761d-44c8-8413-2a5b34ee2f98@linux.inte…
Fixes: ab8d66d132bc ("soundwire: stream: fix programming slave ports for non-continous port maps")
Acked-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Reviewed-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Resending with Ack/Rb tags and missing Cc-stable.
---
drivers/soundwire/stream.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c
index f275143d7b18..7aa4900dcf31 100644
--- a/drivers/soundwire/stream.c
+++ b/drivers/soundwire/stream.c
@@ -1291,18 +1291,18 @@ struct sdw_dpn_prop *sdw_get_slave_dpn_prop(struct sdw_slave *slave,
unsigned int port_num)
{
struct sdw_dpn_prop *dpn_prop;
- unsigned long mask;
+ u8 num_ports;
int i;
if (direction == SDW_DATA_DIR_TX) {
- mask = slave->prop.source_ports;
+ num_ports = hweight32(slave->prop.source_ports);
dpn_prop = slave->prop.src_dpn_prop;
} else {
- mask = slave->prop.sink_ports;
+ num_ports = hweight32(slave->prop.sink_ports);
dpn_prop = slave->prop.sink_dpn_prop;
}
- for_each_set_bit(i, &mask, 32) {
+ for (i = 0; i < num_ports; i++) {
if (dpn_prop[i].num == port_num)
return &dpn_prop[i];
}
--
2.43.0
Hi,
the issue was so far handled in https://lore.kernel.org/regressions/ZsyMzW-4ee_U8NoX@eldamar.lan/T/#m390d6e… and https://bugzilla.kernel.org/show_bug.cgi?id=219129
I haven’t seen any communication whether a backport for 5.15 is already in progress, so I thought I’d follow up here.
Today we rolled out 5.15.165 and immediately ran into the same issue. There’s at least one other person asking for a 5.15 backport in Bugzilla.
Hugs,
Christian
--
Christian Theune - A97C62CE - 0179 7808366
@theuni - christian(a)theune.cc
From: Peter Wang <peter.wang(a)mediatek.com>
ufshcd_abort_all forcibly aborts all on-going commands and the host
controller will automatically fill in the OCS field of the corresponding
response with OCS_ABORTED based on different working modes.
MCQ mode: aborts a command using SQ cleanup, The host controller
will post a Completion Queue entry with OCS = ABORTED.
SDB mode: aborts a command using UTRLCLR. Task Management response
which means a Transfer Request was aborted.
For these two cases, set a flag to notify SCSI to requeue the
command after receiving response with OCS_ABORTED.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 35 ++++++++++++++++++++---------------
include/ufs/ufshcd.h | 3 +++
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..8380e8861d83 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
lrbp->req_abort_skip = false;
+ lrbp->abort_initiated_by_eh = false;
ufshcd_comp_scsi_upiu(hba, lrbp);
@@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
}
break;
case OCS_ABORTED:
- result |= DID_ABORT << 16;
+ if (lrbp->abort_initiated_by_eh)
+ result |= DID_REQUEUE << 16;
+ else
+ result |= DID_ABORT << 16;
break;
case OCS_INVALID_COMMAND_STATUS:
result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
struct ufs_hba *hba = shost_priv(shost);
- struct ufshcd_lrb *lrbp = &hba->lrb[tag];
- struct ufs_hw_queue *hwq;
- unsigned long flags;
*ret = ufshcd_try_to_abort_task(hba, tag);
dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
*ret ? "failed" : "succeeded");
- /* Release cmd in MCQ mode if abort succeeds */
- if (hba->mcq_enabled && (*ret == 0)) {
- hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
- if (!hwq)
- return 0;
- spin_lock_irqsave(&hwq->cq_lock, flags);
- if (ufshcd_cmd_inflight(lrbp->cmd))
- ufshcd_release_scsi_cmd(hba, lrbp);
- spin_unlock_irqrestore(&hwq->cq_lock, flags);
- }
-
return *ret == 0;
}
@@ -7561,6 +7551,21 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
goto out;
}
+ /*
+ * When the host software receives a "FUNCTION COMPLETE", set flag
+ * to requeue command after receive response with OCS_ABORTED
+ * SDB mode: UTRLCLR Task Management response which means a Transfer
+ * Request was aborted.
+ * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
+ *
+ * This flag is set because error handler ufshcd_abort_all forcibly
+ * aborts all commands, and the host controller will automatically
+ * fill in the OCS field of the corresponding response with OCS_ABORTED.
+ * Therefore, upon receiving this response, it needs to be requeued.
+ */
+ if (!err)
+ lrbp->abort_initiated_by_eh = true;
+
err = ufshcd_clear_cmd(hba, tag);
if (err)
dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..07b5e60e6c44 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
* @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
* @data_unit_num: the data unit number for the first block for inline crypto
* @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_eh: The flag is specifically used to handle
+ * ufshcd_err_handler forcibly aborts.
*/
struct ufshcd_lrb {
struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@ struct ufshcd_lrb {
#endif
bool req_abort_skip;
+ bool abort_initiated_by_eh;
};
/**
--
2.45.2
From: Jason Andryuk <jason.andryuk(a)amd.com>
Probing xen-fbfront faults in video_is_primary_device(). The passed-in
struct device is NULL since xen-fbfront doesn't assign it and the
memory is kzalloc()-ed. Assign fb_info->device to avoid this.
This was exposed by the conversion of fb_is_primary_device() to
video_is_primary_device() which dropped a NULL check for struct device.
Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers")
Reported-by: Arthur Borsboom <arthurborsboom(a)gmail.com>
Closes: https://lore.kernel.org/xen-devel/CALUcmUncX=LkXWeiSiTKsDY-cOe8QksWhFvcCneO…
Tested-by: Arthur Borsboom <arthurborsboom(a)gmail.com>
CC: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
The other option would be to re-instate the NULL check in
video_is_primary_device()
---
drivers/video/fbdev/xen-fbfront.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c
index 66d4628a96ae..c90f48ebb15e 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -407,6 +407,7 @@ static int xenfb_probe(struct xenbus_device *dev,
/* complete the abuse: */
fb_info->pseudo_palette = fb_info->par;
fb_info->par = info;
+ fb_info->device = &dev->dev;
fb_info->screen_buffer = info->fb;
--
2.43.0
From: Willem de Bruijn <willemb(a)google.com>
The referenced commit drops bad input, but has false positives.
Tighten the check to avoid these.
The check detects illegal checksum offload requests, which produce
csum_start/csum_off beyond end of packet after segmentation.
But it is based on two incorrect assumptions:
1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO.
True in callers that inject into the tx path, such as tap.
But false in callers that inject into rx, like virtio-net.
Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or
CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal.
2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL.
False, as tcp[46]_gso_segment will fix up csum_start and offset for
all other ip_summed by calling __tcp_v4_send_check.
Because of 2, we can limit the scope of the fix to virtio_net_hdr
that do try to set these fields, with a bogus value.
Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Cc: <stable(a)vger.kernel.net>
---
Verified that the syzbot repro is still caught.
An equivalent alternative would be to move the check for csum_offset
to where the csum_start check is in segmentation:
- if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb)))
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb) ||
+ skb->csum_offset != offsetof(struct tcphdr, check)))
Cleaner, but messier stable backport.
We'll need an equivalent patch to this for VIRTIO_NET_HDR_GSO_UDP_L4.
But that csum_offset test was in a different commit, so different
Fixes tag.
---
include/linux/virtio_net.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 6c395a2600e8d..276ca543ef44d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -173,7 +173,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
break;
case SKB_GSO_TCPV4:
case SKB_GSO_TCPV6:
- if (skb->csum_offset != offsetof(struct tcphdr, check))
+ if (skb->ip_summed == CHECKSUM_PARTIAL &&
+ skb->csum_offset != offsetof(struct tcphdr, check))
return -EINVAL;
break;
}
--
2.46.0.598.g6f2099f65c-goog
Supposing first scenario with a virtio_blk driver.
CPU0 CPU1
blk_mq_try_issue_directly()
__blk_mq_issue_directly()
q->mq_ops->queue_rq()
virtio_queue_rq()
blk_mq_stop_hw_queue()
virtblk_done()
blk_mq_request_bypass_insert() blk_mq_start_stopped_hw_queues()
/* Add IO request to dispatch list */ 1) store blk_mq_start_stopped_hw_queue()
clear_bit(BLK_MQ_S_STOPPED) 3) store
blk_mq_run_hw_queue() blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) if (!blk_mq_hctx_has_pending()) 4) load
return return
blk_mq_sched_dispatch_requests() blk_mq_sched_dispatch_requests()
if (blk_mq_hctx_stopped()) 2) load if (blk_mq_hctx_stopped())
return return
__blk_mq_sched_dispatch_requests() __blk_mq_sched_dispatch_requests()
Supposing another scenario.
CPU0 CPU1
blk_mq_requeue_work()
/* Add IO request to dispatch list */ 1) store virtblk_done()
blk_mq_run_hw_queues()/blk_mq_delay_run_hw_queues() blk_mq_start_stopped_hw_queues()
if (blk_mq_hctx_stopped()) 2) load blk_mq_start_stopped_hw_queue()
continue clear_bit(BLK_MQ_S_STOPPED) 3) store
blk_mq_run_hw_queue()/blk_mq_delay_run_hw_queue() blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) 4) load
return
blk_mq_sched_dispatch_requests()
Both scenarios are similar, the full memory barrier should be inserted between
1) and 2), as well as between 3) and 4) to make sure that either CPU0 sees
BLK_MQ_S_STOPPED is cleared or CPU1 sees dispatch list. Otherwise, either CPU
will not rerun the hardware queue causing starvation of the request.
The easy way to fix it is to add the essential full memory barrier into helper
of blk_mq_hctx_stopped(). In order to not affect the fast path (hardware queue
is not stopped most of the time), we only insert the barrier into the slow path.
Actually, only slow path needs to care about missing of dispatching the request
to the low-level device driver.
Fixes: 320ae51feed5c ("blk-mq: new multi-queue block IO queueing mechanism")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
---
block/blk-mq.c | 6 ++++++
block/blk-mq.h | 13 +++++++++++++
2 files changed, 19 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ac39f2a346a52..48a6a437fba5e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2413,6 +2413,12 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
return;
clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+ /*
+ * Pairs with the smp_mb() in blk_mq_hctx_stopped() to order the
+ * clearing of BLK_MQ_S_STOPPED above and the checking of dispatch
+ * list in the subsequent routine.
+ */
+ smp_mb__after_atomic();
blk_mq_run_hw_queue(hctx, async);
}
EXPORT_SYMBOL_GPL(blk_mq_start_stopped_hw_queue);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 260beea8e332c..f36f3bff70d86 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -228,6 +228,19 @@ static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
{
+ /* Fast path: hardware queue is not stopped most of the time. */
+ if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+ return false;
+
+ /*
+ * This barrier is used to order adding of dispatch list before and
+ * the test of BLK_MQ_S_STOPPED below. Pairs with the memory barrier
+ * in blk_mq_start_stopped_hw_queue() so that dispatch code could
+ * either see BLK_MQ_S_STOPPED is cleared or dispatch list is not
+ * empty to avoid missing dispatching requests.
+ */
+ smp_mb();
+
return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
}
--
2.20.1
Supposing the following scenario with a virtio_blk driver.
CPU0 CPU1 CPU2
blk_mq_try_issue_directly()
__blk_mq_issue_directly()
q->mq_ops->queue_rq()
virtio_queue_rq()
blk_mq_stop_hw_queue()
blk_mq_try_issue_directly() virtblk_done()
if (blk_mq_hctx_stopped())
blk_mq_request_bypass_insert() blk_mq_start_stopped_hw_queue()
blk_mq_run_hw_queue() blk_mq_run_hw_queue()
blk_mq_insert_request()
return // Who is responsible for dispatching this IO request?
After CPU0 has marked the queue as stopped, CPU1 will see the queue is stopped.
But before CPU1 puts the request on the dispatch list, CPU2 receives the interrupt
of completion of request, so it will run the hardware queue and marks the queue
as non-stopped. Meanwhile, CPU1 also runs the same hardware queue. After both CPU1
and CPU2 complete blk_mq_run_hw_queue(), CPU1 just puts the request to the same
hardware queue and returns. It misses dispatching a request. Fix it by running
the hardware queue explicitly. And blk_mq_request_issue_directly() should handle
a similar situation. Fix it as well.
Fixes: d964f04a8fde8 ("blk-mq: fix direct issue")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-mq.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b553..b2d0f22de0c7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2619,6 +2619,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
blk_mq_insert_request(rq, 0);
+ blk_mq_run_hw_queue(hctx, false);
return;
}
@@ -2649,6 +2650,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
blk_mq_insert_request(rq, 0);
+ blk_mq_run_hw_queue(hctx, false);
return BLK_STS_OK;
}
--
2.20.1
When we share memory through FF-A and the description of the buffers
exceeds the size of the mapped buffer, the fragmentation API is used.
The fragmentation API allows specifying chunks of descriptors in subsequent
FF-A fragment calls and no upper limit has been established for this.
The entire memory region transferred is identified by a handle which can be
used to reclaim the transferred memory.
To be able to reclaim the memory, the description of the buffers has to fit
in the ffa_desc_buf.
Add a bounds check on the FF-A sharing path to prevent the memory reclaim
from failing.
Also do_ffa_mem_xfer() does not need __always_inline
Fixes: 634d90cf0ac65 ("KVM: arm64: Handle FFA_MEM_LEND calls from the host")
Cc: stable(a)vger.kernel.org
Reviewed-by: Sebastian Ene <sebastianene(a)google.com>
Signed-off-by: Snehal Koukuntla <snehalreddy(a)google.com>
---
arch/arm64/kvm/hyp/nvhe/ffa.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
index e715c157c2c4..637425f63fd1 100644
--- a/arch/arm64/kvm/hyp/nvhe/ffa.c
+++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
@@ -426,7 +426,7 @@ static void do_ffa_mem_frag_tx(struct arm_smccc_res *res,
return;
}
-static __always_inline void do_ffa_mem_xfer(const u64 func_id,
+static void do_ffa_mem_xfer(const u64 func_id,
struct arm_smccc_res *res,
struct kvm_cpu_context *ctxt)
{
@@ -461,6 +461,11 @@ static __always_inline void do_ffa_mem_xfer(const u64 func_id,
goto out_unlock;
}
+ if (len > ffa_desc_buf.len) {
+ ret = FFA_RET_NO_MEMORY;
+ goto out_unlock;
+ }
+
buf = hyp_buffers.tx;
memcpy(buf, host_buffers.tx, fraglen);
--
2.46.0.598.g6f2099f65c-goog
Add explicit casting to prevent expantion of 32th bit of
u32 into highest half of u64.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
From: Andrea Parri <parri.andrea(a)gmail.com>
[ Upstream commit d6cfd1770f20392d7009ae1fdb04733794514fa9 ]
The membarrier system call requires a full memory barrier after storing
to rq->curr, before going back to user-space. The barrier is only
needed when switching between processes: the barrier is implied by
mmdrop() when switching from kernel to userspace, and it's not needed
when switching from userspace to kernel.
Rely on the feature/mechanism ARCH_HAS_MEMBARRIER_CALLBACKS and on the
primitive membarrier_arch_switch_mm(), already adopted by the PowerPC
architecture, to insert the required barrier.
Fixes: fab957c11efe2f ("RISC-V: Atomic and Locking Code")
Signed-off-by: Andrea Parri <parri.andrea(a)gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/r/20240131144936.29190-2-parri.andrea@gmail.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
MAINTAINERS | 2 +-
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/membarrier.h | 31 +++++++++++++++++++++++++++++
arch/riscv/mm/context.c | 2 ++
kernel/sched/core.c | 5 +++--
5 files changed, 38 insertions(+), 3 deletions(-)
create mode 100644 arch/riscv/include/asm/membarrier.h
diff --git a/MAINTAINERS b/MAINTAINERS
index f09415b2b3c5..ae4c0cec5073 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13702,7 +13702,7 @@ M: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
M: "Paul E. McKenney" <paulmck(a)kernel.org>
L: linux-kernel(a)vger.kernel.org
S: Supported
-F: arch/powerpc/include/asm/membarrier.h
+F: arch/*/include/asm/membarrier.h
F: include/uapi/linux/membarrier.h
F: kernel/sched/membarrier.c
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c785a0200573..1df4afccc381 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MMIOWB
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API
diff --git a/arch/riscv/include/asm/membarrier.h b/arch/riscv/include/asm/membarrier.h
new file mode 100644
index 000000000000..6c016ebb5020
--- /dev/null
+++ b/arch/riscv/include/asm/membarrier.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_MEMBARRIER_H
+#define _ASM_RISCV_MEMBARRIER_H
+
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+ struct mm_struct *next,
+ struct task_struct *tsk)
+{
+ /*
+ * Only need the full barrier when switching between processes.
+ * Barrier when switching from kernel to userspace is not
+ * required here, given that it is implied by mmdrop(). Barrier
+ * when switching from userspace to kernel is not needed after
+ * store to rq->curr.
+ */
+ if (IS_ENABLED(CONFIG_SMP) &&
+ likely(!(atomic_read(&next->membarrier_state) &
+ (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+ MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
+ return;
+
+ /*
+ * The membarrier system call requires a full memory barrier
+ * after storing to rq->curr, before going back to user-space.
+ * Matches a full barrier in the proximity of the membarrier
+ * system call entry.
+ */
+ smp_mb();
+}
+
+#endif /* _ASM_RISCV_MEMBARRIER_H */
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 217fd4de6134..ba8eb3944687 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -323,6 +323,8 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
if (unlikely(prev == next))
return;
+ membarrier_arch_switch_mm(prev, next, task);
+
/*
* Mark the current MM context as inactive, and the next as
* active. This is at least used by the icache flushing
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 97571d390f18..9b406d988654 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6679,8 +6679,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
*
* Here are the schemes providing that barrier on the
* various architectures:
- * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
- * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+ * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+ * RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
+ * on PowerPC and on RISC-V.
* - finish_lock_switch() for weakly-ordered
* architectures where spin_unlock is a full barrier,
* - switch_to() for arm64 (weakly-ordered, spin_unlock
--
2.43.4
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
[ Upstream commit ac2b81eb8b2d104033560daea886ee84531e3d0a ]
When changing the interface from CAN-CC to CAN-FD mode the old
coalescing parameters are re-used. This might cause problem, as the
configured parameters are too big for CAN-FD mode.
During testing an invalid TX coalescing configuration has been seen.
The problem should be been fixed in the previous patch, but add a
safeguard here to ensure that the number of TEF coalescing buffers (if
configured) is exactly the half of all TEF buffers.
Link: https://lore.kernel.org/all/20240805-mcp251xfd-fix-ringconfig-v1-2-72086f0c…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
index 4d0246a0779a..f677a4cd00f3 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
@@ -279,7 +279,7 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv)
const struct mcp251xfd_rx_ring *rx_ring;
u16 base = 0, ram_used;
u8 fifo_nr = 1;
- int i;
+ int err = 0, i;
netdev_reset_queue(priv->ndev);
@@ -375,10 +375,18 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv)
netdev_err(priv->ndev,
"Error during ring configuration, using more RAM (%u bytes) than available (%u bytes).\n",
ram_used, MCP251XFD_RAM_SIZE);
- return -ENOMEM;
+ err = -ENOMEM;
}
- return 0;
+ if (priv->tx_obj_num_coalesce_irq &&
+ priv->tx_obj_num_coalesce_irq * 2 != priv->tx->obj_num) {
+ netdev_err(priv->ndev,
+ "Error during ring configuration, number of TEF coalescing buffers (%u) must be half of TEF buffers (%u).\n",
+ priv->tx_obj_num_coalesce_irq, priv->tx->obj_num);
+ err = -EINVAL;
+ }
+
+ return err;
}
void mcp251xfd_ring_free(struct mcp251xfd_priv *priv)
--
2.43.0
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
[ Upstream commit ac2b81eb8b2d104033560daea886ee84531e3d0a ]
When changing the interface from CAN-CC to CAN-FD mode the old
coalescing parameters are re-used. This might cause problem, as the
configured parameters are too big for CAN-FD mode.
During testing an invalid TX coalescing configuration has been seen.
The problem should be been fixed in the previous patch, but add a
safeguard here to ensure that the number of TEF coalescing buffers (if
configured) is exactly the half of all TEF buffers.
Link: https://lore.kernel.org/all/20240805-mcp251xfd-fix-ringconfig-v1-2-72086f0c…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
index 4cb79a4f2461..489d1439563a 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
@@ -289,7 +289,7 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv)
const struct mcp251xfd_rx_ring *rx_ring;
u16 base = 0, ram_used;
u8 fifo_nr = 1;
- int i;
+ int err = 0, i;
netdev_reset_queue(priv->ndev);
@@ -385,10 +385,18 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv)
netdev_err(priv->ndev,
"Error during ring configuration, using more RAM (%u bytes) than available (%u bytes).\n",
ram_used, MCP251XFD_RAM_SIZE);
- return -ENOMEM;
+ err = -ENOMEM;
}
- return 0;
+ if (priv->tx_obj_num_coalesce_irq &&
+ priv->tx_obj_num_coalesce_irq * 2 != priv->tx->obj_num) {
+ netdev_err(priv->ndev,
+ "Error during ring configuration, number of TEF coalescing buffers (%u) must be half of TEF buffers (%u).\n",
+ priv->tx_obj_num_coalesce_irq, priv->tx->obj_num);
+ err = -EINVAL;
+ }
+
+ return err;
}
void mcp251xfd_ring_free(struct mcp251xfd_priv *priv)
--
2.43.0
From: Larysa Zaremba <larysa.zaremba(a)intel.com>
[ Upstream commit f50c68763436bc8f805712a7c5ceaf58cfcf5f07 ]
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
VSI without applying new ring configuration. When unconfiguring the VSI, we
can encounter the state in which there is an XDP program but no XDP rings
to destroy or there will be XDP rings that need to be destroyed, but no XDP
program to indicate their presence.
When unconfiguring, rely on the presence of XDP rings rather then XDP
program, as they better represent the current state that has to be
destroyed.
Reviewed-by: Wojciech Drewek <wojciech.drewek(a)intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller(a)intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout(a)intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski(a)intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 4 ++--
drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 7629b0190578..73d03535561a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2426,7 +2426,7 @@ void ice_vsi_decfg(struct ice_vsi *vsi)
dev_err(ice_pf_to_dev(pf), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
vsi->vsi_num, err);
- if (ice_is_xdp_ena_vsi(vsi))
+ if (vsi->xdp_rings)
/* return value check can be skipped here, it always returns
* 0 if reset is in progress
*/
@@ -2528,7 +2528,7 @@ static void ice_vsi_release_msix(struct ice_vsi *vsi)
for (q = 0; q < q_vector->num_ring_tx; q++) {
ice_write_itr(&q_vector->tx, 0);
wr32(hw, QINT_TQCTL(vsi->txq_map[txq]), 0);
- if (ice_is_xdp_ena_vsi(vsi)) {
+ if (vsi->xdp_rings) {
u32 xdp_txq = txq + vsi->num_xdp_txq;
wr32(hw, QINT_TQCTL(vsi->txq_map[xdp_txq]), 0);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f16d13e9ff6e..448b854d1128 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -7222,7 +7222,7 @@ int ice_down(struct ice_vsi *vsi)
if (tx_err)
netdev_err(vsi->netdev, "Failed stop Tx rings, VSI %d error %d\n",
vsi->vsi_num, tx_err);
- if (!tx_err && ice_is_xdp_ena_vsi(vsi)) {
+ if (!tx_err && vsi->xdp_rings) {
tx_err = ice_vsi_stop_xdp_tx_rings(vsi);
if (tx_err)
netdev_err(vsi->netdev, "Failed stop XDP rings, VSI %d error %d\n",
@@ -7239,7 +7239,7 @@ int ice_down(struct ice_vsi *vsi)
ice_for_each_txq(vsi, i)
ice_clean_tx_ring(vsi->tx_rings[i]);
- if (ice_is_xdp_ena_vsi(vsi))
+ if (vsi->xdp_rings)
ice_for_each_xdp_txq(vsi, i)
ice_clean_tx_ring(vsi->xdp_rings[i]);
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 240a7bec242b..c2aa6f589937 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -39,7 +39,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
sizeof(vsi_stat->rx_ring_stats[q_idx]->rx_stats));
memset(&vsi_stat->tx_ring_stats[q_idx]->stats, 0,
sizeof(vsi_stat->tx_ring_stats[q_idx]->stats));
- if (ice_is_xdp_ena_vsi(vsi))
+ if (vsi->xdp_rings)
memset(&vsi->xdp_rings[q_idx]->ring_stats->stats, 0,
sizeof(vsi->xdp_rings[q_idx]->ring_stats->stats));
}
@@ -52,7 +52,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
{
ice_clean_tx_ring(vsi->tx_rings[q_idx]);
- if (ice_is_xdp_ena_vsi(vsi))
+ if (vsi->xdp_rings)
ice_clean_tx_ring(vsi->xdp_rings[q_idx]);
ice_clean_rx_ring(vsi->rx_rings[q_idx]);
}
@@ -194,7 +194,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
if (!fail)
fail = err;
- if (ice_is_xdp_ena_vsi(vsi)) {
+ if (vsi->xdp_rings) {
struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
memset(&txq_meta, 0, sizeof(txq_meta));
--
2.43.0
From: Paulo Alcantara <pc(a)manguebit.com>
[ Upstream commit 7ccc1465465d78e6411b7bd730d06e7435802b5c ]
Call cifs_reconnect() to wake up processes waiting on negotiate
protocol to handle the case where server abruptly shut down and had no
chance to properly close the socket.
Simple reproducer:
ssh 192.168.2.100 pkill -STOP smbd
mount.cifs //192.168.2.100/test /mnt -o ... [never returns]
Cc: Rickard Andersson <rickaran(a)axis.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/smb/client/connect.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index d2307162a2de..e325e06357ff 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -656,6 +656,19 @@ allocate_buffers(struct TCP_Server_Info *server)
static bool
server_unresponsive(struct TCP_Server_Info *server)
{
+ /*
+ * If we're in the process of mounting a share or reconnecting a session
+ * and the server abruptly shut down (e.g. socket wasn't closed, packet
+ * had been ACK'ed but no SMB response), don't wait longer than 20s to
+ * negotiate protocol.
+ */
+ spin_lock(&server->srv_lock);
+ if (server->tcpStatus == CifsInNegotiate &&
+ time_after(jiffies, server->lstrp + 20 * HZ)) {
+ spin_unlock(&server->srv_lock);
+ cifs_reconnect(server, false);
+ return true;
+ }
/*
* We need to wait 3 echo intervals to make sure we handle such
* situations right:
@@ -667,7 +680,6 @@ server_unresponsive(struct TCP_Server_Info *server)
* 65s kernel_recvmsg times out, and we see that we haven't gotten
* a response in >60s.
*/
- spin_lock(&server->srv_lock);
if ((server->tcpStatus == CifsGood ||
server->tcpStatus == CifsNeedNegotiate) &&
(!server->ops->can_echo || server->ops->can_echo(server)) &&
--
2.43.0
From: Baochen Qiang <quic_bqiang(a)quicinc.com>
[ Upstream commit d3e154d7776ba57ab679fb816fb87b627fba21c9 ]
This reverts commit 7f0343b7b8710436c1e6355c71782d32ada47e0c.
We are going to revert commit 166a490f59ac ("wifi: ath11k: support hibernation"), on
which this commit depends. With that commit reverted, this one is not needed any
more, so revert this commit first.
Signed-off-by: Baochen Qiang <quic_bqiang(a)quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson(a)quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://patch.msgid.link/20240830073420.5790-2-quic_bqiang@quicinc.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath11k/core.c | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index 47554c361963..a14d0c65000a 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -1009,16 +1009,6 @@ int ath11k_core_resume(struct ath11k_base *ab)
return -ETIMEDOUT;
}
- if (ab->hw_params.current_cc_support &&
- ar->alpha2[0] != 0 && ar->alpha2[1] != 0) {
- ret = ath11k_reg_set_cc(ar);
- if (ret) {
- ath11k_warn(ab, "failed to set country code during resume: %d\n",
- ret);
- return ret;
- }
- }
-
ret = ath11k_dp_rx_pktlog_start(ab);
if (ret)
ath11k_warn(ab, "failed to start rx pktlog during resume: %d\n",
--
2.43.0
From: "hongchi.peng" <hongchi.peng(a)siengine.com>
[ Upstream commit 258905cb9a6414be5c9ca4aa20ef855f8dc894d4 ]
We use komeda_crtc_normalize_zpos to normalize zpos of affected planes
to their blending zorder in CU. If there's only one slave plane in
affected planes and its layer_split property is enabled, order++ for
its split layer, so that when calculating the normalized_zpos
of master planes, the split layer of the slave plane is included, but
the max_slave_zorder does not include the split layer and keep zero
because there's only one slave plane in affacted planes, although we
actually use two slave layers in this commit.
In most cases, this bug does not result in a commit failure, but assume
the following situation:
slave_layer 0: zpos = 0, layer split enabled, normalized_zpos =
0;(use slave_layer 2 as its split layer)
master_layer 0: zpos = 2, layer_split enabled, normalized_zpos =
2;(use master_layer 2 as its split layer)
master_layer 1: zpos = 4, normalized_zpos = 4;
master_layer 3: zpos = 5, normalized_zpos = 5;
kcrtc_st->max_slave_zorder = 0;
When we use master_layer 3 as a input of CU in function
komeda_compiz_set_input and check it with function
komeda_component_check_input, the parameter idx is equal to
normailzed_zpos minus max_slave_zorder, the value of idx is 5
and is euqal to CU's max_active_inputs, so that
komeda_component_check_input returns a -EINVAL value.
To fix the bug described above, when calculating the max_slave_zorder
with the layer_split enabled, count the split layer in this calculation
directly.
Signed-off-by: hongchi.peng <hongchi.peng(a)siengine.com>
Acked-by: Liviu Dudau <liviu.dudau(a)arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau(a)arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826024517.3739-1-hongchi…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/arm/display/komeda/komeda_kms.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c
index fe46b0ebefea..e5eb5d672bcd 100644
--- a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c
+++ b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c
@@ -160,6 +160,7 @@ static int komeda_crtc_normalize_zpos(struct drm_crtc *crtc,
struct drm_plane *plane;
struct list_head zorder_list;
int order = 0, err;
+ u32 slave_zpos = 0;
DRM_DEBUG_ATOMIC("[CRTC:%d:%s] calculating normalized zpos values\n",
crtc->base.id, crtc->name);
@@ -199,10 +200,13 @@ static int komeda_crtc_normalize_zpos(struct drm_crtc *crtc,
plane_st->zpos, plane_st->normalized_zpos);
/* calculate max slave zorder */
- if (has_bit(drm_plane_index(plane), kcrtc->slave_planes))
+ if (has_bit(drm_plane_index(plane), kcrtc->slave_planes)) {
+ slave_zpos = plane_st->normalized_zpos;
+ if (to_kplane_st(plane_st)->layer_split)
+ slave_zpos++;
kcrtc_st->max_slave_zorder =
- max(plane_st->normalized_zpos,
- kcrtc_st->max_slave_zorder);
+ max(slave_zpos, kcrtc_st->max_slave_zorder);
+ }
}
crtc_st->zpos_changed = true;
--
2.43.0
From: Hans de Goede <hdegoede(a)redhat.com>
[ Upstream commit 839a4ec06f75cec8fec2cc5fc14e921d0c3f7369 ]
There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it
turns out that the 2G version has a DMI product name of
"CHERRYVIEW D1 PLATFORM" where as the 4G version has
"CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version check are
unique enough that the product-name check is not necessary.
Drop the product-name check so that the existing DMI match for the 4G
RAM version also matches the 2G RAM version.
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Link: https://patch.msgid.link/20240823074305.16873-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/intel/common/soc-acpi-intel-cht-match.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/sound/soc/intel/common/soc-acpi-intel-cht-match.c b/sound/soc/intel/common/soc-acpi-intel-cht-match.c
index 5e2ec60e2954..e4c3492a0c28 100644
--- a/sound/soc/intel/common/soc-acpi-intel-cht-match.c
+++ b/sound/soc/intel/common/soc-acpi-intel-cht-match.c
@@ -84,7 +84,6 @@ static const struct dmi_system_id lenovo_yoga_tab3_x90[] = {
/* Lenovo Yoga Tab 3 Pro YT3-X90, codec missing from DSDT */
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Intel Corporation"),
- DMI_MATCH(DMI_PRODUCT_NAME, "CHERRYVIEW D1 PLATFORM"),
DMI_MATCH(DMI_PRODUCT_VERSION, "Blade3-10A-001"),
},
},
--
2.43.0
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
[ Upstream commit ac2b81eb8b2d104033560daea886ee84531e3d0a ]
When changing the interface from CAN-CC to CAN-FD mode the old
coalescing parameters are re-used. This might cause problem, as the
configured parameters are too big for CAN-FD mode.
During testing an invalid TX coalescing configuration has been seen.
The problem should be been fixed in the previous patch, but add a
safeguard here to ensure that the number of TEF coalescing buffers (if
configured) is exactly the half of all TEF buffers.
Link: https://lore.kernel.org/all/20240805-mcp251xfd-fix-ringconfig-v1-2-72086f0c…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
index 4cb79a4f2461..489d1439563a 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
@@ -289,7 +289,7 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv)
const struct mcp251xfd_rx_ring *rx_ring;
u16 base = 0, ram_used;
u8 fifo_nr = 1;
- int i;
+ int err = 0, i;
netdev_reset_queue(priv->ndev);
@@ -385,10 +385,18 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv)
netdev_err(priv->ndev,
"Error during ring configuration, using more RAM (%u bytes) than available (%u bytes).\n",
ram_used, MCP251XFD_RAM_SIZE);
- return -ENOMEM;
+ err = -ENOMEM;
}
- return 0;
+ if (priv->tx_obj_num_coalesce_irq &&
+ priv->tx_obj_num_coalesce_irq * 2 != priv->tx->obj_num) {
+ netdev_err(priv->ndev,
+ "Error during ring configuration, number of TEF coalescing buffers (%u) must be half of TEF buffers (%u).\n",
+ priv->tx_obj_num_coalesce_irq, priv->tx->obj_num);
+ err = -EINVAL;
+ }
+
+ return err;
}
void mcp251xfd_ring_free(struct mcp251xfd_priv *priv)
--
2.43.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a5a3c952e82c1ada12bf8c55b73af26f1a454bd2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090831-camera-backlands-a643@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a5a3c952e82c ("rust: macros: provide correct provenance when constructing THIS_MODULE")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a5a3c952e82c1ada12bf8c55b73af26f1a454bd2 Mon Sep 17 00:00:00 2001
From: Boqun Feng <boqun.feng(a)gmail.com>
Date: Wed, 28 Aug 2024 11:01:29 -0700
Subject: [PATCH] rust: macros: provide correct provenance when constructing
THIS_MODULE
Currently while defining `THIS_MODULE` symbol in `module!()`, the
pointer used to construct `ThisModule` is derived from an immutable
reference of `__this_module`, which means the pointer doesn't have
the provenance for writing, and that means any write to that pointer
is UB regardless of data races or not. However, the usage of
`THIS_MODULE` includes passing this pointer to functions that may write
to it (probably in unsafe code), and this will create soundness issues.
One way to fix this is using `addr_of_mut!()` but that requires the
unstable feature "const_mut_refs". So instead of `addr_of_mut()!`,
an extern static `Opaque` is used here: since `Opaque<T>` is transparent
to `T`, an extern static `Opaque` will just wrap the C symbol (defined
in a C compile unit) in an `Opaque`, which provides a pointer with
writable provenance via `Opaque::get()`. This fix the potential UBs
because of pointer provenance unmatched.
Reported-by: Alice Ryhl <aliceryhl(a)google.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Reviewed-by: Trevor Gross <tmgross(a)umich.edu>
Reviewed-by: Benno Lossin <benno.lossin(a)proton.me>
Reviewed-by: Gary Guo <gary(a)garyguo.net>
Closes: https://rust-for-linux.zulipchat.com/#narrow/stream/x/topic/x/near/465412664
Fixes: 1fbde52bde73 ("rust: add `macros` crate")
Cc: stable(a)vger.kernel.org # 6.6.x: be2ca1e03965: ("rust: types: Make Opaque::get const")
Link: https://lore.kernel.org/r/20240828180129.4046355-1-boqun.feng@gmail.com
[ Fixed two typos, reworded title. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index 411dc103d82e..7a5b899e47b7 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -217,7 +217,11 @@ pub(crate) fn module(ts: TokenStream) -> TokenStream {
// freed until the module is unloaded.
#[cfg(MODULE)]
static THIS_MODULE: kernel::ThisModule = unsafe {{
- kernel::ThisModule::from_ptr(&kernel::bindings::__this_module as *const _ as *mut _)
+ extern \"C\" {{
+ static __this_module: kernel::types::Opaque<kernel::bindings::module>;
+ }}
+
+ kernel::ThisModule::from_ptr(__this_module.get())
}};
#[cfg(not(MODULE))]
static THIS_MODULE: kernel::ThisModule = unsafe {{
From: Hans de Goede <hdegoede(a)redhat.com>
commit 815d5121927093017947fd76e627da03f0f70be7 upstream.
There is only one "goto done;" in set_device_flags() and this happens
*before* hci_dev_lock() is called, move the done label to after the
hci_dev_unlock() to fix the following unlock balance:
[ 31.493567] =====================================
[ 31.493571] WARNING: bad unlock balance detected!
[ 31.493576] 5.17.0-rc2+ #13 Tainted: G C E
[ 31.493581] -------------------------------------
[ 31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at:
[ 31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth]
[ 31.493684] but there are no more locks to release!
Note this bug has been around for a couple of years, but before
commit fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags")
supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so
the check for unsupported flags which does the "goto done;" never
triggered.
Fixes: fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags")
Cc: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Marcel Holtmann <marcel(a)holtmann.org>
[fp: the check for unsupported flags can actually be triggered before
commit fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params
flags") by "Set Device Flags - Invalid Parameter 2" Bluez test as
current_flags value comes from userspace]
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
net/bluetooth/mgmt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 0078e33e12ba..5d2289cf2bb1 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -4138,9 +4138,9 @@ static int set_device_flags(struct sock *sk, struct hci_dev *hdev, void *data,
}
}
-done:
hci_dev_unlock(hdev);
+done:
if (status == MGMT_STATUS_SUCCESS)
device_flags_changed(sk, hdev, &cp->addr.bdaddr, cp->addr.type,
supported_flags, current_flags);
--
2.39.2
On Mon, 9 Sep 2024 08:48:14 -0400
Sasha Levin <sashal(a)kernel.org> wrote:
> This is a note to let you know that I've just added the patch titled
>
> rust: macros: provide correct provenance when constructing THIS_MODULE
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> rust-macros-provide-correct-provenance-when-construc.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
Hi Sasha,
The `Opaque` type doesn't exist yet in 6.1, so this patch should not be
applied to it.
Best,
Gary
>
> commit 5b320b29ddf985d8de92c3afa9aebe13ecd5cfad
> Author: Boqun Feng <boqun.feng(a)gmail.com>
> Date: Wed Aug 28 11:01:29 2024 -0700
>
> rust: macros: provide correct provenance when constructing THIS_MODULE
>
> [ Upstream commit a5a3c952e82c1ada12bf8c55b73af26f1a454bd2 ]
>
> Currently while defining `THIS_MODULE` symbol in `module!()`, the
> pointer used to construct `ThisModule` is derived from an immutable
> reference of `__this_module`, which means the pointer doesn't have
> the provenance for writing, and that means any write to that pointer
> is UB regardless of data races or not. However, the usage of
> `THIS_MODULE` includes passing this pointer to functions that may write
> to it (probably in unsafe code), and this will create soundness issues.
>
> One way to fix this is using `addr_of_mut!()` but that requires the
> unstable feature "const_mut_refs". So instead of `addr_of_mut()!`,
> an extern static `Opaque` is used here: since `Opaque<T>` is transparent
> to `T`, an extern static `Opaque` will just wrap the C symbol (defined
> in a C compile unit) in an `Opaque`, which provides a pointer with
> writable provenance via `Opaque::get()`. This fix the potential UBs
> because of pointer provenance unmatched.
>
> Reported-by: Alice Ryhl <aliceryhl(a)google.com>
> Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
> Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
> Reviewed-by: Trevor Gross <tmgross(a)umich.edu>
> Reviewed-by: Benno Lossin <benno.lossin(a)proton.me>
> Reviewed-by: Gary Guo <gary(a)garyguo.net>
> Closes: https://rust-for-linux.zulipchat.com/#narrow/stream/x/topic/x/near/465412664
> Fixes: 1fbde52bde73 ("rust: add `macros` crate")
> Cc: stable(a)vger.kernel.org # 6.6.x: be2ca1e03965: ("rust: types: Make Opaque::get const")
> Link: https://lore.kernel.org/r/20240828180129.4046355-1-boqun.feng@gmail.com
> [ Fixed two typos, reworded title. - Miguel ]
> Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/rust/macros/module.rs b/rust/macros/module.rs
> index 031028b3dc41..071b96639a2e 100644
> --- a/rust/macros/module.rs
> +++ b/rust/macros/module.rs
> @@ -183,7 +183,11 @@ pub(crate) fn module(ts: TokenStream) -> TokenStream {
> // freed until the module is unloaded.
> #[cfg(MODULE)]
> static THIS_MODULE: kernel::ThisModule = unsafe {{
> - kernel::ThisModule::from_ptr(&kernel::bindings::__this_module as *const _ as *mut _)
> + extern \"C\" {{
> + static __this_module: kernel::types::Opaque<kernel::bindings::module>;
> + }}
> +
> + kernel::ThisModule::from_ptr(__this_module.get())
> }};
> #[cfg(not(MODULE))]
> static THIS_MODULE: kernel::ThisModule = unsafe {{
init() was removed from ramgp102 when reworking the memory detection, as
it was thought that the code was only necessary when the driver performs
mclk changes, which nouveau doesn't support on pascal.
However, it turns out that we still need to execute this on some GPUs to
restore settings after DEVINIT, so revert to the original behaviour.
v2: fix tags in commit message, cc stable
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/319
Fixes: 2c0c15a22fa0 ("drm/nouveau/fb/gp102-ga100: switch to simpler vram size detection method")
Cc: <stable(a)vger.kernel.org> # 6.6+
Signed-off-by: Ben Skeggs <bskeggs(a)nvidia.com>
---
drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.h | 2 ++
drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp100.c | 2 +-
drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp102.c | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.h b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.h
index 50f0c1914f58..4c3f74396579 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.h
@@ -46,6 +46,8 @@ u32 gm107_ram_probe_fbp(const struct nvkm_ram_func *,
u32 gm200_ram_probe_fbp_amount(const struct nvkm_ram_func *, u32,
struct nvkm_device *, int, int *);
+int gp100_ram_init(struct nvkm_ram *);
+
/* RAM type-specific MR calculation routines */
int nvkm_sddr2_calc(struct nvkm_ram *);
int nvkm_sddr3_calc(struct nvkm_ram *);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp100.c
index 378f6fb70990..8987a21e81d1 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp100.c
@@ -27,7 +27,7 @@
#include <subdev/bios/init.h>
#include <subdev/bios/rammap.h>
-static int
+int
gp100_ram_init(struct nvkm_ram *ram)
{
struct nvkm_subdev *subdev = &ram->fb->subdev;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp102.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp102.c
index 8550f5e47347..b6b6ee59019d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp102.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgp102.c
@@ -5,6 +5,7 @@
static const struct nvkm_ram_func
gp102_ram = {
+ .init = gp100_ram_init,
};
int
--
2.45.1
In __iodyn_find_io_region(), pcmcia_make_resource() is assigned to
res and used in pci_bus_alloc_resource(). There is a dereference of res
in pci_bus_alloc_resource(), which could lead to a NULL pointer
dereference on failure of pcmcia_make_resource().
Fix this bug by adding a check of res.
Found by code review, complie tested only.
Cc: stable(a)vger.kernel.org
Fixes: 49b1153adfe1 ("pcmcia: move all pcmcia_resource_ops providers into one module")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pcmcia/rsrc_iodyn.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pcmcia/rsrc_iodyn.c b/drivers/pcmcia/rsrc_iodyn.c
index b04b16496b0c..2677b577c1f8 100644
--- a/drivers/pcmcia/rsrc_iodyn.c
+++ b/drivers/pcmcia/rsrc_iodyn.c
@@ -62,6 +62,9 @@ static struct resource *__iodyn_find_io_region(struct pcmcia_socket *s,
unsigned long min = base;
int ret;
+ if (!res)
+ return NULL;
+
data.mask = align - 1;
data.offset = base & data.mask;
--
2.25.1
This reverts commit 2e42b7f817acd6e8d78226445eb6fe44fe79c12a.
If the GC victim section has a pinned block when fallocate() trigger
FG_GC, the section is not able to be recycled. And this will return
-EAGAIN cause fallocate() failed, even though there are much spare space
as user see. As the GC policy prone to chose the same victim,
fallocate() may not successed at a long period.
This scenario has been found during Android OTA.
Link: https://lore.kernel.org/linux-f2fs-devel/20231030094024.263707-1-bo.wu@vivo…
CC: stable(a)vger.kernel.org
Signed-off-by: Wu Bo <bo.wu(a)vivo.com>
---
fs/f2fs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index b58ab1157b7e..19915faccee9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1725,7 +1725,7 @@ static int f2fs_expand_inode_data(struct inode *inode, loff_t offset,
f2fs_down_write(&sbi->gc_lock);
stat_inc_gc_call_count(sbi, FOREGROUND);
err = f2fs_gc(sbi, &gc_control);
- if (err && err != -ENODATA)
+ if (err && err != -ENODATA && err != -EAGAIN)
goto out_err;
}
--
2.25.1
From: Daniel Borkmann <daniel(a)iogearbox.net>
commit 626dfed5fa3bfb41e0dffd796032b555b69f9cde upstream.
When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.
Neil suggested:
This will propagate -EPERM up into other layers which might not be ready
to handle it. It might be safer to map EPERM to an error we would be more
likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
ECONNREFUSED as error seems reasonable. For programs setting a different error
can be out of reach (see handling in 4fbac77d2d09) in particular on kernels
which do not have f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err
instead of allow boolean"), thus given that it is better to simply remap for
consistent behavior. UDP does handle EPERM in xs_udp_send_request().
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Co-developed-by: Lex Siegel <usiegl00(a)gmail.com>
Signed-off-by: Lex Siegel <usiegl00(a)gmail.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Neil Brown <neilb(a)suse.de>
Cc: Trond Myklebust <trondmy(a)kernel.org>
Cc: Anna Schumaker <anna(a)kernel.org>
Link: https://github.com/cilium/cilium/issues/33395
Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.ne…
Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.172007507…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource(a)witekio.com>
---
net/sunrpc/xprtsock.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 938c649c5c9f..625b5f69d3ca 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2466,6 +2466,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
case -EALREADY:
xprt_unlock_connect(xprt, transport);
return;
+ case -EPERM:
+ /* Happens, for instance, if a BPF program is preventing
+ * the connect. Remap the error so upper layers can better
+ * deal with it.
+ */
+ status = -ECONNREFUSED;
+ /* fall through */
case -EINVAL:
/* Happens, for instance, if the user specified a link
* local IPv6 address without a scope-id.
--
2.43.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9972605a238339b85bd16b084eed5f18414d22db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081218-demote-shakily-f31c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9972605a2383 ("memcg: protect concurrent access to mem_cgroup_idr")
6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id is properly set up")
e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
7348cc91821b ("mm: multi-gen LRU: remove aging fairness safeguard")
a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
adb8213014b2 ("mm: memcg: fix stale protection of reclaim target memcg")
57e9cc50f4dd ("mm: vmscan: split khugepaged stats from direct reclaim stats")
e4fea72b1438 ("mglru: mm/vmscan.c: fix imprecise comments")
d396def5d86d ("memcg: rearrange code")
410f8e82689e ("memcg: extract memcg_vmstats from struct mem_cgroup")
d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface")
1332a809d95a ("mm: multi-gen LRU: thrashing prevention")
354ed5974429 ("mm: multi-gen LRU: kill switch")
f76c83378851 ("mm: multi-gen LRU: optimize multiple memcgs")
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap")
ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
ec1c86b25f4b ("mm: multi-gen LRU: groundwork")
f1e1a7be4718 ("mm/vmscan.c: refactor shrink_node()")
d3629af59f41 ("mm/vmscan: make the annotations of refaults code at the right place")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9972605a238339b85bd16b084eed5f18414d22db Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Date: Fri, 2 Aug 2024 16:58:22 -0700
Subject: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..f29157288b7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3386,11 +3386,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
#define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
static DEFINE_IDR(mem_cgroup_idr);
+static DEFINE_SPINLOCK(memcg_idr_lock);
+
+static int mem_cgroup_alloc_id(void)
+{
+ int ret;
+
+ idr_preload(GFP_KERNEL);
+ spin_lock(&memcg_idr_lock);
+ ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
+ GFP_NOWAIT);
+ spin_unlock(&memcg_idr_lock);
+ idr_preload_end();
+ return ret;
+}
static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
if (memcg->id.id > 0) {
+ spin_lock(&memcg_idr_lock);
idr_remove(&mem_cgroup_idr, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
+
memcg->id.id = 0;
}
}
@@ -3524,8 +3541,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
if (!memcg)
return ERR_PTR(error);
- memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
+ memcg->id.id = mem_cgroup_alloc_id();
if (memcg->id.id < 0) {
error = memcg->id.id;
goto fail;
@@ -3667,7 +3683,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
* publish it here at the end of onlining. This matches the
* regular ID destruction during offlining.
*/
+ spin_lock(&memcg_idr_lock);
idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
return 0;
offline_kmem:
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failures when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of trask A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f19c6aa3ea4b..17ebcf19b444 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1383,7 +1383,6 @@ struct btrfs_drop_extents_args {
struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
- bool fsync_skip_inode_lock;
};
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c44dfb4370d7..44160d4ad53e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1992,13 +1992,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -2007,13 +2000,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
err = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -2195,7 +2185,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2205,7 +2194,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 0ded32bbd001..a06bc6ad4764 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -11,6 +11,12 @@
#include "delayed-ref.h"
#include "ctree.h"
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
enum btrfs_trans_state {
TRANS_STATE_RUNNING,
TRANS_STATE_COMMIT_START,
--
2.43.0
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since drm_sched_entity_modify_sched() can modify the entities run queue,
lets make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.
Alternative of moving the spin_unlock to after the wake up would for now
be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().
v2:
* Improve commit message. (Philipp)
* Cache the scheduler pointer directly. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Philipp Stanner <pstanner(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
---
drivers/gpu/drm/scheduler/sched_entity.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index ae8be30472cd..76e422548d40 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_gpu_scheduler *sched;
+ struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ rq = entity->rq;
+ sched = rq->sched;
+
+ drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(sched, entity);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
--
2.46.0
From: Daniel Borkmann <daniel(a)iogearbox.net>
commit 626dfed5fa3bfb41e0dffd796032b555b69f9cde upstream.
When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.
Neil suggested:
This will propagate -EPERM up into other layers which might not be ready
to handle it. It might be safer to map EPERM to an error we would be more
likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
ECONNREFUSED as error seems reasonable. For programs setting a different error
can be out of reach (see handling in 4fbac77d2d09) in particular on kernels
which do not have f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err
instead of allow boolean"), thus given that it is better to simply remap for
consistent behavior. UDP does handle EPERM in xs_udp_send_request().
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Co-developed-by: Lex Siegel <usiegl00(a)gmail.com>
Signed-off-by: Lex Siegel <usiegl00(a)gmail.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Neil Brown <neilb(a)suse.de>
Cc: Trond Myklebust <trondmy(a)kernel.org>
Cc: Anna Schumaker <anna(a)kernel.org>
Link: https://github.com/cilium/cilium/issues/33395
Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.ne…
Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.172007507…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource(a)witekio.com>
---
net/sunrpc/xprtsock.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 0666f981618a..e0cd6d735053 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2314,6 +2314,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
case -EALREADY:
xprt_unlock_connect(xprt, transport);
return;
+ case -EPERM:
+ /* Happens, for instance, if a BPF program is preventing
+ * the connect. Remap the error so upper layers can better
+ * deal with it.
+ */
+ status = -ECONNREFUSED;
+ fallthrough;
case -EINVAL:
/* Happens, for instance, if the user specified a link
* local IPv6 address without a scope-id.
--
2.43.0
On Mon, Sep 09 2024 at 05:03, Jamie Heilman wrote:
> 3db03fb4995e ("x86/mm: Fix pti_clone_entry_text() for i386") which got
> landed in 6.6.46, has introduced two back to back warnings on boot on
> my 32bit system (found on 6.6.50):
Right.
> Reverting that commit removes the warnings (tested against 6.6.50).
> The follow-on commit of c48b5a4cf312 ("x86/mm: Fix PTI for i386 some
> more") doesn't apply cleanly to 6.6.50, but I did try out a build of
> 6.11-rc7 and that works fine too with no warnings on boot.
See backport below.
Thanks,
tglx
---
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue Aug 6 20:48:43 2024 +0200
Subject: [PATCH 6.6.y, 6.1.y, 5.10.y] x86/mm: Fix PTI for i386 some more
commit c48b5a4cf3125adb679e28ef093f66ff81368d05 upstream.
So it turns out that we have to do two passes of
pti_clone_entry_text(), once before initcalls, such that device and
late initcalls can use user-mode-helper / modprobe and once after
free_initmem() / mark_readonly().
Now obviously mark_readonly() can cause PMD splits, and
pti_clone_pgtable() doesn't like that much.
Allow the late clone to split PMDs so that pagetables stay in sync.
[peterz: Changelog and comments]
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lkml.kernel.org/r/20240806184843.GX37996@noisy.programming.kicks-as…
---
arch/x86/mm/pti.c | 45 +++++++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 16 deletions(-)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -241,7 +241,7 @@ static pmd_t *pti_user_pagetable_walk_pm
*
* Returns a pointer to a PTE on success, or NULL on failure.
*/
-static pte_t *pti_user_pagetable_walk_pte(unsigned long address)
+static pte_t *pti_user_pagetable_walk_pte(unsigned long address, bool late_text)
{
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
pmd_t *pmd;
@@ -251,10 +251,15 @@ static pte_t *pti_user_pagetable_walk_pt
if (!pmd)
return NULL;
- /* We can't do anything sensible if we hit a large mapping. */
+ /* Large PMD mapping found */
if (pmd_large(*pmd)) {
- WARN_ON(1);
- return NULL;
+ /* Clear the PMD if we hit a large mapping from the first round */
+ if (late_text) {
+ set_pmd(pmd, __pmd(0));
+ } else {
+ WARN_ON_ONCE(1);
+ return NULL;
+ }
}
if (pmd_none(*pmd)) {
@@ -283,7 +288,7 @@ static void __init pti_setup_vsyscall(vo
if (!pte || WARN_ON(level != PG_LEVEL_4K) || pte_none(*pte))
return;
- target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR);
+ target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR, false);
if (WARN_ON(!target_pte))
return;
@@ -301,7 +306,7 @@ enum pti_clone_level {
static void
pti_clone_pgtable(unsigned long start, unsigned long end,
- enum pti_clone_level level)
+ enum pti_clone_level level, bool late_text)
{
unsigned long addr;
@@ -390,7 +395,7 @@ pti_clone_pgtable(unsigned long start, u
return;
/* Allocate PTE in the user page-table */
- target_pte = pti_user_pagetable_walk_pte(addr);
+ target_pte = pti_user_pagetable_walk_pte(addr, late_text);
if (WARN_ON(!target_pte))
return;
@@ -453,7 +458,7 @@ static void __init pti_clone_user_shared
phys_addr_t pa = per_cpu_ptr_to_phys((void *)va);
pte_t *target_pte;
- target_pte = pti_user_pagetable_walk_pte(va);
+ target_pte = pti_user_pagetable_walk_pte(va, false);
if (WARN_ON(!target_pte))
return;
@@ -476,7 +481,7 @@ static void __init pti_clone_user_shared
start = CPU_ENTRY_AREA_BASE;
end = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
- pti_clone_pgtable(start, end, PTI_CLONE_PMD);
+ pti_clone_pgtable(start, end, PTI_CLONE_PMD, false);
}
#endif /* CONFIG_X86_64 */
@@ -493,11 +498,11 @@ static void __init pti_setup_espfix64(vo
/*
* Clone the populated PMDs of the entry text and force it RO.
*/
-static void pti_clone_entry_text(void)
+static void pti_clone_entry_text(bool late)
{
pti_clone_pgtable((unsigned long) __entry_text_start,
(unsigned long) __entry_text_end,
- PTI_LEVEL_KERNEL_IMAGE);
+ PTI_LEVEL_KERNEL_IMAGE, late);
}
/*
@@ -572,7 +577,7 @@ static void pti_clone_kernel_text(void)
* pti_set_kernel_image_nonglobal() did to clear the
* global bit.
*/
- pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE);
+ pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE, false);
/*
* pti_clone_pgtable() will set the global bit in any PMDs
@@ -639,8 +644,15 @@ void __init pti_init(void)
/* Undo all global bits from the init pagetables in head_64.S: */
pti_set_kernel_image_nonglobal();
+
/* Replace some of the global bits just for shared entry text: */
- pti_clone_entry_text();
+ /*
+ * This is very early in boot. Device and Late initcalls can do
+ * modprobe before free_initmem() and mark_readonly(). This
+ * pti_clone_entry_text() allows those user-mode-helpers to function,
+ * but notably the text is still RW.
+ */
+ pti_clone_entry_text(false);
pti_setup_espfix64();
pti_setup_vsyscall();
}
@@ -657,10 +669,11 @@ void pti_finalize(void)
if (!boot_cpu_has(X86_FEATURE_PTI))
return;
/*
- * We need to clone everything (again) that maps parts of the
- * kernel image.
+ * This is after free_initmem() (all initcalls are done) and we've done
+ * mark_readonly(). Text is now NX which might've split some PMDs
+ * relative to the early clone.
*/
- pti_clone_entry_text();
+ pti_clone_entry_text(true);
pti_clone_kernel_text();
debug_checkwx_user();
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x d33d26036a0274b472299d7dcdaa5fb34329f91b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090850-nuclear-radar-ea2b@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
d33d26036a02 ("rtmutex: Drop rt_mutex::wait_lock before scheduling")
add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
1c143c4b65da ("locking/rtmutex: Provide the spin/rwlock core lock function")
e17ba59b7e8e ("locking/rtmutex: Guard regular sleeping locks specific functions")
7980aa397cc0 ("locking/rtmutex: Use rt_mutex_wake_q_head")
c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
42254105dfe8 ("locking/rwsem: Add rtmutex based R/W semaphore implementation")
ebbdc41e90ff ("locking/rtmutex: Provide rt_mutex_slowlock_locked()")
830e6acc8a1c ("locking/rtmutex: Split out the inner parts of 'struct rtmutex'")
531ae4b06a73 ("locking/rtmutex: Split API from implementation")
785159301bed ("locking/rtmutex: Convert macros to inlines")
b41cda037655 ("locking/rtmutex: Set proper wait context for lockdep")
2f064a59a11f ("sched: Change task_struct::state")
d6c23bb3a2ad ("sched: Add get_current_state()")
b03fbd4ff24c ("sched: Introduce task_is_running()")
a9e906b71f96 ("Merge branch 'sched/urgent' into sched/core, to pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d33d26036a0274b472299d7dcdaa5fb34329f91b Mon Sep 17 00:00:00 2001
From: Roland Xu <mu001999(a)outlook.com>
Date: Thu, 15 Aug 2024 10:58:13 +0800
Subject: [PATCH] rtmutex: Drop rt_mutex::wait_lock before scheduling
rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held. In the
good case it returns with the lock held and in the deadlock case it emits a
warning and goes into an endless scheduling loop with the lock held, which
triggers the 'scheduling in atomic' warning.
Unlock rt_mutex::wait_lock in the dead lock case before issuing the warning
and dropping into the schedule for ever loop.
[ tglx: Moved unlock before the WARN(), removed the pointless comment,
massaged changelog, added Fixes tag ]
Fixes: 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter")
Signed-off-by: Roland Xu <mu001999(a)outlook.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300M…
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 88d08eeb8bc0..fba1229f1de6 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1644,6 +1644,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
}
static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
+ struct rt_mutex_base *lock,
struct rt_mutex_waiter *w)
{
/*
@@ -1656,10 +1657,10 @@ static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
if (build_ww_mutex() && w->ww_ctx)
return;
- /*
- * Yell loudly and stop the task right here.
- */
+ raw_spin_unlock_irq(&lock->wait_lock);
+
WARN(1, "rtmutex deadlock detected\n");
+
while (1) {
set_current_state(TASK_INTERRUPTIBLE);
rt_mutex_schedule();
@@ -1713,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
} else {
__set_current_state(TASK_RUNNING);
remove_waiter(lock, waiter);
- rt_mutex_handle_deadlock(ret, chwalk, waiter);
+ rt_mutex_handle_deadlock(ret, chwalk, lock, waiter);
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x aea62c744a9ae2a8247c54ec42138405216414da
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090852-importer-unadorned-f55b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
aea62c744a9a ("mmc: cqhci: Fix checking of CQHCI_HALT state")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aea62c744a9ae2a8247c54ec42138405216414da Mon Sep 17 00:00:00 2001
From: Seunghwan Baek <sh8267.baek(a)samsung.com>
Date: Thu, 29 Aug 2024 15:18:22 +0900
Subject: [PATCH] mmc: cqhci: Fix checking of CQHCI_HALT state
To check if mmc cqe is in halt state, need to check set/clear of CQHCI_HALT
bit. At this time, we need to check with &, not &&.
Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: stable(a)vger.kernel.org
Signed-off-by: Seunghwan Baek <sh8267.baek(a)samsung.com>
Reviewed-by: Ritesh Harjani <ritesh.list(a)gmail.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20240829061823.3718-2-sh8267.baek@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index c14d7251d0bb..a02da26a1efd 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -617,7 +617,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
cqhci_writel(cq_host, 0, CQHCI_CTL);
mmc->cqe_on = true;
pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
- if (cqhci_readl(cq_host, CQHCI_CTL) && CQHCI_HALT) {
+ if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT) {
pr_err("%s: cqhci: CQE failed to exit halt state\n",
mmc_hostname(mmc));
}
From: Willem de Bruijn <willemb(a)google.com>
Backport the following commit, because it fixes an existing backport
that has caused multiple reports of breakage on 5.15 based kernels:
net: drop bad gso csum_start and offset in virtio_net_hdr
To backport without conflicts, also backport its two dependencies:
net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation
gso: fix dodgy bit handling for GSO_UDP_L4
Also backport the one patch in netdev-net/main that references one
of the above in its Fixes tag:
net: change maximum number of UDP segments to 128
All four patches also exist in 6.1.109
include/linux/udp.h | 2 +-
include/linux/virtio_net.h | 35 +++++++++++++++++-----------
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 17 +++++++++++---
tools/testing/selftests/net/udpgso.c | 2 +-
5 files changed, 40 insertions(+), 19 deletions(-)
--
2.46.0.598.g6f2099f65c-goog
read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is
bigger than the variable hv_sched_clock_offset, which is cached during
early boot, but depending on the timing this assumption may be false
when a hibernated VM starts again (the clock counter starts from 0
again) and is resuming back (Note: hv_init_tsc_clocksource() is not
called during hibernation/resume); consequently,
read_hv_sched_clock_tsc() may return a negative integer (which is
interpreted as a huge positive integer since the return type is u64)
and new kernel messages are prefixed with huge timestamps before
read_hv_sched_clock_tsc() grows big enough (which typically takes
several seconds).
Fix the issue by saving the Hyper-V clock counter just before the
suspend, and using it to correct the hv_sched_clock_offset in
resume. Override x86_platform.save_sched_clock_state and
x86_platform.restore_sched_clock_state so that we don't
have to touch the common x86 code.
Note: if Invariant TSC is available, the issue doesn't happen because
1) we don't register read_hv_sched_clock_tsc() for sched clock:
See commit e5313f1c5404 ("clocksource/drivers/hyper-v: Rework
clocksource and sched clock setup");
2) the common x86 code adjusts TSC similarly: see
__restore_processor_state() -> tsc_verify_tsc_adjust(true) and
x86_platform.restore_sched_clock_state().
Cc: stable(a)vger.kernel.org
Fixes: 1349401ff1aa ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation")
Co-developed-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
---
drivers/clocksource/hyperv_timer.c | 64 +++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index b2a080647e41..7aa44b8aae2e 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -27,7 +27,10 @@
#include <asm/mshyperv.h>
static struct clock_event_device __percpu *hv_clock_event;
-static u64 hv_sched_clock_offset __ro_after_init;
+
+/* Can have negative values, after resume from hibernation, so keep them s64 */
+static s64 hv_sched_clock_offset __read_mostly;
+static s64 hv_sched_clock_offset_saved;
/*
* If false, we're using the old mechanism for stimer0 interrupts
@@ -51,6 +54,9 @@ static int stimer0_irq = -1;
static int stimer0_message_sint;
static __maybe_unused DEFINE_PER_CPU(long, stimer0_evt);
+static void (*old_save_sched_clock_state)(void);
+static void (*old_restore_sched_clock_state)(void);
+
/*
* Common code for stimer0 interrupts coming via Direct Mode or
* as a VMbus message.
@@ -434,6 +440,39 @@ static u64 noinstr read_hv_sched_clock_tsc(void)
(NSEC_PER_SEC / HV_CLOCK_HZ);
}
+/*
+ * Hyper-V clock counter resets during hibernation. Save and restore clock
+ * offset during suspend/resume, while also considering the time passed
+ * before suspend. This is to make sure that sched_clock using hv tsc page
+ * based clocksource, proceeds from where it left off during suspend and
+ * it shows correct time for the timestamps of kernel messages after resume.
+ */
+static void save_hv_clock_tsc_state(void)
+{
+ hv_sched_clock_offset_saved = hv_read_reference_counter();
+}
+
+static void restore_hv_clock_tsc_state(void)
+{
+ /*
+ * Time passed before suspend = hv_sched_clock_offset_saved
+ * - hv_sched_clock_offset (old)
+ *
+ * After Hyper-V clock counter resets, hv_sched_clock_offset needs a correction.
+ *
+ * New time = hv_read_reference_counter() (future) - hv_sched_clock_offset (new)
+ * New time = Time passed before suspend + hv_read_reference_counter() (future)
+ * - hv_read_reference_counter() (now)
+ *
+ * Solving the above two equations gives:
+ *
+ * hv_sched_clock_offset (new) = hv_sched_clock_offset (old)
+ * - hv_sched_clock_offset_saved
+ * + hv_read_reference_counter() (now))
+ */
+ hv_sched_clock_offset -= hv_sched_clock_offset_saved - hv_read_reference_counter();
+}
+
static void suspend_hv_clock_tsc(struct clocksource *arg)
{
union hv_reference_tsc_msr tsc_msr;
@@ -456,6 +495,24 @@ static void resume_hv_clock_tsc(struct clocksource *arg)
hv_set_msr(HV_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
}
+/*
+ * Functions to override save_sched_clock_state and restore_sched_clock_state
+ * functions of x86_platform. The Hyper-V clock counter is reset during
+ * suspend-resume and the offset used to measure time needs to be
+ * corrected, post resume.
+ */
+static void hv_save_sched_clock_state(void)
+{
+ save_hv_clock_tsc_state();
+ old_save_sched_clock_state();
+}
+
+static void hv_restore_sched_clock_state(void)
+{
+ restore_hv_clock_tsc_state();
+ old_restore_sched_clock_state();
+}
+
#ifdef HAVE_VDSO_CLOCKMODE_HVCLOCK
static int hv_cs_enable(struct clocksource *cs)
{
@@ -539,6 +596,11 @@ static void __init hv_init_tsc_clocksource(void)
hv_read_reference_counter = read_hv_clock_tsc;
+ old_save_sched_clock_state = x86_platform.save_sched_clock_state;
+ x86_platform.save_sched_clock_state = hv_save_sched_clock_state;
+ old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
+ x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
+
/*
* TSC page mapping works differently in root compared to guest.
* - In guest partition the guest PFN has to be passed to the
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.25.1
If the net_conf pointer is NULL and the code attempts to access its
fields without a check, it will lead to a null pointer dereference.
Add a NULL check before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikhail Lobanov <m.lobanov(a)rosalinux.ru>
---
drivers/block/drbd/drbd_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 287a8d1d3f70..87cf5883078f 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -876,7 +876,7 @@ is_valid_state(struct drbd_device *device, union drbd_state ns)
ns.disk == D_OUTDATED)
rv = SS_CONNECTED_OUTDATES;
- else if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) &&
+ else if (nc && (ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) &&
(nc->verify_alg[0] == 0))
rv = SS_NO_VERIFY_ALG;
--
2.43.0
From: Willem de Bruijn <willemb(a)google.com>
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable(a)vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
v1->v2
- skb_transport_header instead of skb->transport_header (edumazet@)
- typo: migitate -> mitigate
---
include/linux/virtio_net.h | 16 +++++-----------
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 4 ++++
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index d1d7825318c32..6c395a2600e8d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,7 +56,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
unsigned int thlen = 0;
unsigned int p_off = 0;
unsigned int ip_proto;
- u64 ret, remainder, gso_size;
if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -99,16 +98,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
- if (hdr->gso_size) {
- gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
- ret = div64_u64_rem(skb->len, gso_size, &remainder);
- if (!(ret && (hdr->gso_size > needed) &&
- ((remainder > needed) || (remainder == 0)))) {
- return -EINVAL;
- }
- skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
- }
-
if (!pskb_may_pull(skb, needed))
return -EINVAL;
@@ -182,6 +171,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
if (gso_type != SKB_GSO_UDP_L4)
return -EINVAL;
break;
+ case SKB_GSO_TCPV4:
+ case SKB_GSO_TCPV6:
+ if (skb->csum_offset != offsetof(struct tcphdr, check))
+ return -EINVAL;
+ break;
}
/* Kernel has a special handling for GSO_BY_FRAGS. */
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 4b791e74529e1..e4ad3311e1489 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -140,6 +140,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
if (thlen < sizeof(*th))
goto out;
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb)))
+ goto out;
+
if (!pskb_may_pull(skb, thlen))
goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index aa2e0a28ca613..bc8a9da750fed 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -278,6 +278,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
if (gso_skb->len <= sizeof(*uh) + mss)
return ERR_PTR(-EINVAL);
+ if (unlikely(skb_checksum_start(gso_skb) !=
+ skb_transport_header(gso_skb)))
+ return ERR_PTR(-EINVAL);
+
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
--
2.46.0.rc1.232.g9752f9e123-goog
The quilt patch titled
Subject: mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock
has been removed from the -mm tree. Its filename was
mm-damon-vaddr-protect-vma-traversal-in-__damon_va_thre_regions-with-rcu-read-lock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock
Date: Wed, 4 Sep 2024 17:12:04 -0700
Traversing VMAs of a given maple tree should be protected by rcu read
lock. However, __damon_va_three_regions() is not doing the protection.
Hold the lock.
Link: https://lkml.kernel.org/r/20240905001204.1481-1-sj@kernel.org
Fixes: d0cf3dd47f0d ("damon: convert __damon_va_three_regions to use the VMA iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Closes: https://lore.kernel.org/b83651a0-5b24-4206-b860-cb54ffdf209b@roeck-us.net
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/vaddr.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/damon/vaddr.c~mm-damon-vaddr-protect-vma-traversal-in-__damon_va_thre_regions-with-rcu-read-lock
+++ a/mm/damon/vaddr.c
@@ -126,6 +126,7 @@ static int __damon_va_three_regions(stru
* If this is too slow, it can be optimised to examine the maple
* tree gaps.
*/
+ rcu_read_lock();
for_each_vma(vmi, vma) {
unsigned long gap;
@@ -146,6 +147,7 @@ static int __damon_va_three_regions(stru
next:
prev = vma;
}
+ rcu_read_unlock();
if (!sz_range(&second_gap) || !sz_range(&first_gap))
return -EINVAL;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-mark-three-functions-as-__maybe_unused.patch
The quilt patch titled
Subject: ocfs2: cancel dqi_sync_work before freeing oinfo
has been removed from the -mm tree. Its filename was
ocfs2-cancel-dqi_sync_work-before-freeing-oinfo.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: cancel dqi_sync_work before freeing oinfo
Date: Wed, 4 Sep 2024 15:10:03 +0800
ocfs2_global_read_info() will initialize and schedule dqi_sync_work at the
end, if error occurs after successfully reading global quota, it will
trigger the following warning with CONFIG_DEBUG_OBJECTS_* enabled:
ODEBUG: free active (active state 0) object: 00000000d8b0ce28 object type: timer_list hint: qsync_work_fn+0x0/0x16c
This reports that there is an active delayed work when freeing oinfo in
error handling, so cancel dqi_sync_work first. BTW, return status instead
of -1 when .read_file_info fails.
Link: https://syzkaller.appspot.com/bug?extid=f7af59df5d6b25f0febd
Link: https://lkml.kernel.org/r/20240904071004.2067695-1-joseph.qi@linux.alibaba.…
Fixes: 171bf93ce11f ("ocfs2: Periodic quota syncing")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao(a)suse.com>
Reported-by: syzbot+f7af59df5d6b25f0febd(a)syzkaller.appspotmail.com
Tested-by: syzbot+f7af59df5d6b25f0febd(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/quota_local.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/quota_local.c~ocfs2-cancel-dqi_sync_work-before-freeing-oinfo
+++ a/fs/ocfs2/quota_local.c
@@ -692,7 +692,7 @@ static int ocfs2_local_read_info(struct
int status;
struct buffer_head *bh = NULL;
struct ocfs2_quota_recovery *rec;
- int locked = 0;
+ int locked = 0, global_read = 0;
info->dqi_max_spc_limit = 0x7fffffffffffffffLL;
info->dqi_max_ino_limit = 0x7fffffffffffffffLL;
@@ -700,6 +700,7 @@ static int ocfs2_local_read_info(struct
if (!oinfo) {
mlog(ML_ERROR, "failed to allocate memory for ocfs2 quota"
" info.");
+ status = -ENOMEM;
goto out_err;
}
info->dqi_priv = oinfo;
@@ -712,6 +713,7 @@ static int ocfs2_local_read_info(struct
status = ocfs2_global_read_info(sb, type);
if (status < 0)
goto out_err;
+ global_read = 1;
status = ocfs2_inode_lock(lqinode, &oinfo->dqi_lqi_bh, 1);
if (status < 0) {
@@ -782,10 +784,12 @@ out_err:
if (locked)
ocfs2_inode_unlock(lqinode, 1);
ocfs2_release_local_quota_bitmaps(&oinfo->dqi_chunk);
+ if (global_read)
+ cancel_delayed_work_sync(&oinfo->dqi_sync_work);
kfree(oinfo);
}
brelse(bh);
- return -1;
+ return status;
}
/* Write local info to quota file */
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-cleanup-return-value-and-mlog-in-ocfs2_global_read_info.patch
The quilt patch titled
Subject: ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
has been removed from the -mm tree. Its filename was
ocfs2-fix-possible-null-ptr-deref-in-ocfs2_set_buffer_uptodate.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lizhi Xu <lizhi.xu(a)windriver.com>
Subject: ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
Date: Mon, 2 Sep 2024 10:36:36 +0800
When doing cleanup, if flags without OCFS2_BH_READAHEAD, it may trigger
NULL pointer dereference in the following ocfs2_set_buffer_uptodate() if
bh is NULL.
Link: https://lkml.kernel.org/r/20240902023636.1843422-3-joseph.qi@linux.alibaba.…
Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu(a)windriver.com>
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: Heming Zhao <heming.zhao(a)suse.com>
Suggested-by: Heming Zhao <heming.zhao(a)suse.com>
Cc: <stable(a)vger.kernel.org> [4.20+]
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/buffer_head_io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/buffer_head_io.c~ocfs2-fix-possible-null-ptr-deref-in-ocfs2_set_buffer_uptodate
+++ a/fs/ocfs2/buffer_head_io.c
@@ -388,7 +388,8 @@ read_failure:
/* Always set the buffer in the cache, even if it was
* a forced read, or read-ahead which hasn't yet
* completed. */
- ocfs2_set_buffer_uptodate(ci, bh);
+ if (bh)
+ ocfs2_set_buffer_uptodate(ci, bh);
}
ocfs2_metadata_cache_io_unlock(ci);
_
Patches currently in -mm which might be from lizhi.xu(a)windriver.com are
The quilt patch titled
Subject: ocfs2: remove unreasonable unlock in ocfs2_read_blocks
has been removed from the -mm tree. Its filename was
ocfs2-remove-unreasonable-unlock-in-ocfs2_read_blocks.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lizhi Xu <lizhi.xu(a)windriver.com>
Subject: ocfs2: remove unreasonable unlock in ocfs2_read_blocks
Date: Mon, 2 Sep 2024 10:36:35 +0800
Patch series "Misc fixes for ocfs2_read_blocks", v5.
This series contains 2 fixes for ocfs2_read_blocks(). The first patch fix
the issue reported by syzbot, which detects bad unlock balance in
ocfs2_read_blocks(). The second patch fixes an issue reported by Heming
Zhao when reviewing above fix.
This patch (of 2):
There was a lock release before exiting, so remove the unreasonable unlock.
Link: https://lkml.kernel.org/r/20240902023636.1843422-1-joseph.qi@linux.alibaba.…
Link: https://lkml.kernel.org/r/20240902023636.1843422-2-joseph.qi@linux.alibaba.…
Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu(a)windriver.com>
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao(a)suse.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+ab134185af9ef88dfed5(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ab134185af9ef88dfed5
Tested-by: syzbot+ab134185af9ef88dfed5(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org> [4.20+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/buffer_head_io.c | 1 -
1 file changed, 1 deletion(-)
--- a/fs/ocfs2/buffer_head_io.c~ocfs2-remove-unreasonable-unlock-in-ocfs2_read_blocks
+++ a/fs/ocfs2/buffer_head_io.c
@@ -235,7 +235,6 @@ int ocfs2_read_blocks(struct ocfs2_cachi
if (bhs[i] == NULL) {
bhs[i] = sb_getblk(sb, block++);
if (bhs[i] == NULL) {
- ocfs2_metadata_cache_io_unlock(ci);
status = -ENOMEM;
mlog_errno(status);
/* Don't forget to put previous bh! */
_
Patches currently in -mm which might be from lizhi.xu(a)windriver.com are
The quilt patch titled
Subject: ocfs2: fix null-ptr-deref when journal load failed.
has been removed from the -mm tree. Its filename was
ocfs2-fix-null-ptr-deref-when-journal-load-failed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Julian Sun <sunjunchao2870(a)gmail.com>
Subject: ocfs2: fix null-ptr-deref when journal load failed.
Date: Mon, 2 Sep 2024 11:08:44 +0800
During the mounting process, if journal_reset() fails because of too short
journal, then lead to jbd2_journal_load() fails with NULL j_sb_buffer.
Subsequently, ocfs2_journal_shutdown() calls
jbd2_journal_flush()->jbd2_cleanup_journal_tail()->
__jbd2_update_log_tail()->jbd2_journal_update_sb_log_tail()
->lock_buffer(journal->j_sb_buffer), resulting in a null-pointer
dereference error.
To resolve this issue, we should check the JBD2_LOADED flag to ensure the
journal was properly loaded. Additionally, use journal instead of
osb->journal directly to simplify the code.
Link: https://syzkaller.appspot.com/bug?extid=05b9b39d8bdfe1a0861f
Link: https://lkml.kernel.org/r/20240902030844.422725-1-sunjunchao2870@gmail.com
Fixes: f6f50e28f0cb ("jbd2: Fail to load a journal if it is too short")
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
Reported-by: syzbot+05b9b39d8bdfe1a0861f(a)syzkaller.appspotmail.com
Suggested-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/journal.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/journal.c~ocfs2-fix-null-ptr-deref-when-journal-load-failed
+++ a/fs/ocfs2/journal.c
@@ -1055,7 +1055,7 @@ void ocfs2_journal_shutdown(struct ocfs2
if (!igrab(inode))
BUG();
- num_running_trans = atomic_read(&(osb->journal->j_num_trans));
+ num_running_trans = atomic_read(&(journal->j_num_trans));
trace_ocfs2_journal_shutdown(num_running_trans);
/* Do a commit_cache here. It will flush our journal, *and*
@@ -1074,9 +1074,10 @@ void ocfs2_journal_shutdown(struct ocfs2
osb->commit_task = NULL;
}
- BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0);
+ BUG_ON(atomic_read(&(journal->j_num_trans)) != 0);
- if (ocfs2_mount_local(osb)) {
+ if (ocfs2_mount_local(osb) &&
+ (journal->j_journal->j_flags & JBD2_LOADED)) {
jbd2_journal_lock_updates(journal->j_journal);
status = jbd2_journal_flush(journal->j_journal, 0);
jbd2_journal_unlock_updates(journal->j_journal);
_
Patches currently in -mm which might be from sunjunchao2870(a)gmail.com are
The mwifiex chips support simultaneous Accesspoint and station mode,
but this only works when all are using the same channel. The downstream
driver uses ECSA which makes the Accesspoint automatically switch to the
channel the station is going to use. Until this is implemented in the
mwifiex driver at least catch this situation and bail out with an error.
Userspace doesn't have a meaningful way to figure out what went wrong,
so print an error message to give the user a clue.
Without this patch the driver would timeout on the
HostCmd_CMD_802_11_ASSOCIATE command when creating a station with a
channel different from the one that an existing accesspoint uses.
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 52 ++++++++++++++++++++++++
drivers/net/wireless/marvell/mwifiex/main.h | 1 +
drivers/net/wireless/marvell/mwifiex/sta_ioctl.c | 3 ++
3 files changed, 56 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 5697a02e6b8d3..0d3bf624cd3de 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -2054,6 +2054,55 @@ static int mwifiex_cfg80211_stop_ap(struct wiphy *wiphy, struct net_device *dev,
return 0;
}
+bool mwifiex_channel_conflict(struct mwifiex_private *priv, struct ieee80211_channel *ch)
+{
+ struct mwifiex_adapter *adapter = priv->adapter;
+ struct mwifiex_current_bss_params *bss_params;
+ u8 band;
+ int freq, i;
+
+ for (i = 0; i < adapter->priv_num; i++) {
+ struct mwifiex_private *p = adapter->priv[i];
+ struct ieee80211_channel *used = NULL;
+
+ if (p == priv)
+ continue;
+
+ switch (GET_BSS_ROLE(p)) {
+ case MWIFIEX_BSS_ROLE_UAP:
+ if (!netif_carrier_ok(p->netdev))
+ break;
+
+ if (!cfg80211_chandef_valid(&p->bss_chandef))
+ break;
+
+ used = p->bss_chandef.chan;
+
+ break;
+ case MWIFIEX_BSS_ROLE_STA:
+ if (!p->media_connected)
+ break;
+
+ bss_params = &p->curr_bss_params;
+ band = mwifiex_band_to_radio_type(bss_params->band);
+ freq = ieee80211_channel_to_frequency(bss_params->bss_descriptor.channel,
+ band);
+
+ used = ieee80211_get_channel(priv->wdev.wiphy, freq);
+
+ break;
+ }
+
+ if (used && !ieee80211_channel_equal(used, ch)) {
+ mwifiex_dbg(priv->adapter, MSG,
+ "all AP and STA must operate on same channel\n");
+ return false;
+ }
+ }
+
+ return true;
+}
+
/* cfg80211 operation handler for start_ap.
* Function sets beacon period, DTIM period, SSID and security into
* AP config structure.
@@ -2069,6 +2118,9 @@ static int mwifiex_cfg80211_start_ap(struct wiphy *wiphy,
if (GET_BSS_ROLE(priv) != MWIFIEX_BSS_ROLE_UAP)
return -1;
+ if (!mwifiex_channel_conflict(priv, params->chandef.chan))
+ return -EBUSY;
+
bss_cfg = kzalloc(sizeof(struct mwifiex_uap_bss_param), GFP_KERNEL);
if (!bss_cfg)
return -ENOMEM;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 529863edd7a25..b68dbf884156b 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -1697,6 +1697,7 @@ int mwifiex_set_mac_address(struct mwifiex_private *priv,
struct net_device *dev,
bool external, u8 *new_mac);
void mwifiex_devdump_tmo_func(unsigned long function_context);
+bool mwifiex_channel_conflict(struct mwifiex_private *priv, struct ieee80211_channel *ch);
#ifdef CONFIG_DEBUG_FS
void mwifiex_debugfs_init(void);
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
index d3cba6895f8ce..9794816d8a0c6 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
@@ -291,6 +291,9 @@ int mwifiex_bss_start(struct mwifiex_private *priv, struct cfg80211_bss *bss,
if (!bss_desc)
return -1;
+ if (!mwifiex_channel_conflict(priv, bss->channel))
+ return -EBUSY;
+
if (mwifiex_band_to_radio_type(bss_desc->bss_band) ==
HostCmd_SCAN_RADIO_TYPE_BG) {
config_bands = BAND_B | BAND_G | BAND_GN;
---
base-commit: 67a72043aa2e6f60f7bbe7bfa598ba168f16d04f
change-id: 20240830-mwifiex-check-channel-f411a156bbe0
Best regards,
--
Sascha Hauer <s.hauer(a)pengutronix.de>
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9149c9b0c7e046273141e41eebd8a517416144ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090943-justness-geologist-75e9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9149c9b0c7e0 ("usb: dwc3: core: update LC timer as per USB Spec V3.2")
63d7f9810a38 ("usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug")
843714bb37d9 ("usb: dwc3: Decouple USB 2.0 L1 & L2 events")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9149c9b0c7e046273141e41eebd8a517416144ac Mon Sep 17 00:00:00 2001
From: Faisal Hassan <quic_faisalh(a)quicinc.com>
Date: Thu, 29 Aug 2024 15:15:02 +0530
Subject: [PATCH] usb: dwc3: core: update LC timer as per USB Spec V3.2
This fix addresses STAR 9001285599, which only affects DWC_usb3 version
3.20a. The timer value for PM_LC_TIMER in DWC_usb3 3.20a for the Link
ECN changes is incorrect. If the PM TIMER ECN is enabled via GUCTL2[19],
the link compliance test (TD7.21) may fail. If the ECN is not enabled
(GUCTL2[19] = 0), the controller will use the old timer value (5us),
which is still acceptable for the link compliance test. Therefore, clear
GUCTL2[19] to pass the USB link compliance test: TD 7.21.
Cc: stable(a)vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240829094502.26502-1-quic_faisalh@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index ccc3895dbd7f..9eb085f359ce 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1386,6 +1386,21 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
}
+ /*
+ * STAR 9001285599: This issue affects DWC_usb3 version 3.20a
+ * only. If the PM TIMER ECM is enabled through GUCTL2[19], the
+ * link compliance test (TD7.21) may fail. If the ECN is not
+ * enabled (GUCTL2[19] = 0), the controller will use the old timer
+ * value (5us), which is still acceptable for the link compliance
+ * test. Therefore, do not enable PM TIMER ECM in 3.20a by
+ * setting GUCTL2[19] by default; instead, use GUCTL2[19] = 0.
+ */
+ if (DWC3_VER_IS(DWC3, 320A)) {
+ reg = dwc3_readl(dwc->regs, DWC3_GUCTL2);
+ reg &= ~DWC3_GUCTL2_LC_TIMER;
+ dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
+ }
+
/*
* When configured in HOST mode, after issuing U3/L2 exit controller
* fails to send proper CRC checksum in CRC5 feild. Because of this
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..c71240e8f7c7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -421,6 +421,7 @@
/* Global User Control Register 2 */
#define DWC3_GUCTL2_RST_ACTBITLATER BIT(14)
+#define DWC3_GUCTL2_LC_TIMER BIT(19)
/* Global User Control Register 3 */
#define DWC3_GUCTL3_SPLITDISABLE BIT(14)
@@ -1269,6 +1270,7 @@ struct dwc3 {
#define DWC3_REVISION_290A 0x5533290a
#define DWC3_REVISION_300A 0x5533300a
#define DWC3_REVISION_310A 0x5533310a
+#define DWC3_REVISION_320A 0x5533320a
#define DWC3_REVISION_330A 0x5533330a
#define DWC31_REVISION_ANY 0x0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9149c9b0c7e046273141e41eebd8a517416144ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090942-clunky-disobey-80a9@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
9149c9b0c7e0 ("usb: dwc3: core: update LC timer as per USB Spec V3.2")
63d7f9810a38 ("usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug")
843714bb37d9 ("usb: dwc3: Decouple USB 2.0 L1 & L2 events")
9af21dd6faeb ("usb: dwc3: Add support for DWC_usb32 IP")
8bb14308a869 ("usb: dwc3: core: Use role-switch default dr_mode")
d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit")
586f4335700f ("usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name")
7ba6b09fda5e ("usb: dwc3: core: add support for disabling SS instances in park mode")
5eb5afb07853 ("usb: dwc3: use proper initializers for property entries")
9ba3aca8fe82 ("usb: dwc3: Disable phy suspend after power-on reset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9149c9b0c7e046273141e41eebd8a517416144ac Mon Sep 17 00:00:00 2001
From: Faisal Hassan <quic_faisalh(a)quicinc.com>
Date: Thu, 29 Aug 2024 15:15:02 +0530
Subject: [PATCH] usb: dwc3: core: update LC timer as per USB Spec V3.2
This fix addresses STAR 9001285599, which only affects DWC_usb3 version
3.20a. The timer value for PM_LC_TIMER in DWC_usb3 3.20a for the Link
ECN changes is incorrect. If the PM TIMER ECN is enabled via GUCTL2[19],
the link compliance test (TD7.21) may fail. If the ECN is not enabled
(GUCTL2[19] = 0), the controller will use the old timer value (5us),
which is still acceptable for the link compliance test. Therefore, clear
GUCTL2[19] to pass the USB link compliance test: TD 7.21.
Cc: stable(a)vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240829094502.26502-1-quic_faisalh@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index ccc3895dbd7f..9eb085f359ce 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1386,6 +1386,21 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
}
+ /*
+ * STAR 9001285599: This issue affects DWC_usb3 version 3.20a
+ * only. If the PM TIMER ECM is enabled through GUCTL2[19], the
+ * link compliance test (TD7.21) may fail. If the ECN is not
+ * enabled (GUCTL2[19] = 0), the controller will use the old timer
+ * value (5us), which is still acceptable for the link compliance
+ * test. Therefore, do not enable PM TIMER ECM in 3.20a by
+ * setting GUCTL2[19] by default; instead, use GUCTL2[19] = 0.
+ */
+ if (DWC3_VER_IS(DWC3, 320A)) {
+ reg = dwc3_readl(dwc->regs, DWC3_GUCTL2);
+ reg &= ~DWC3_GUCTL2_LC_TIMER;
+ dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
+ }
+
/*
* When configured in HOST mode, after issuing U3/L2 exit controller
* fails to send proper CRC checksum in CRC5 feild. Because of this
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..c71240e8f7c7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -421,6 +421,7 @@
/* Global User Control Register 2 */
#define DWC3_GUCTL2_RST_ACTBITLATER BIT(14)
+#define DWC3_GUCTL2_LC_TIMER BIT(19)
/* Global User Control Register 3 */
#define DWC3_GUCTL3_SPLITDISABLE BIT(14)
@@ -1269,6 +1270,7 @@ struct dwc3 {
#define DWC3_REVISION_290A 0x5533290a
#define DWC3_REVISION_300A 0x5533300a
#define DWC3_REVISION_310A 0x5533310a
+#define DWC3_REVISION_320A 0x5533320a
#define DWC3_REVISION_330A 0x5533330a
#define DWC31_REVISION_ANY 0x0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 96f9ab0d5933c1c00142dd052f259fce0bc3ced2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090907-pancreas-remodeler-f80d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
96f9ab0d5933 ("iio: adc: ad7124: fix chip ID mismatch")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 96f9ab0d5933c1c00142dd052f259fce0bc3ced2 Mon Sep 17 00:00:00 2001
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
Date: Wed, 31 Jul 2024 15:37:22 +0300
Subject: [PATCH] iio: adc: ad7124: fix chip ID mismatch
The ad7124_soft_reset() function has the assumption that the chip will
assert the "power-on reset" bit in the STATUS register after a software
reset without any delay. The POR bit =0 is used to check if the chip
initialization is done.
A chip ID mismatch probe error appears intermittently when the probe
continues too soon and the ID register does not contain the expected
value.
Fix by adding a 200us delay after the software reset command is issued.
Fixes: b3af341bbd96 ("iio: adc: Add ad7124 support")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240731-ad7124-fix-v1-1-46a76aa4b9be@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index 3beed78496c5..c0b82f64c976 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -764,6 +764,7 @@ static int ad7124_soft_reset(struct ad7124_state *st)
if (ret < 0)
return ret;
+ fsleep(200);
timeout = 100;
do {
ret = ad_sd_read_reg(&st->sd, AD7124_STATUS, 1, &readval);
Hi,
We are offering you the visitors contact list of Big Data LDN Expo 2024.
We have 12,137 Verified Contact List with discount.
List Contains: Contact Name, Title, Phone Number, Fax Number, Physical address, Company Name, Company URL, Employee Size, Revenue Size, Industry, and more…
Let me know if you’re interested so that I can share you the pricing for the same.
Kind Regards,
Jacob Smith
Senior Marketing Executive
If you do not wish to receive our emails, please reply with "Not Interested."
On 9/9/24 16:36, Charles Keepax wrote:
> On Wed, Sep 04, 2024 at 04:52:28PM +0200, Krzysztof Kozlowski wrote:
>> This reverts commit ab8d66d132bc8f1992d3eb6cab8d32dda6733c84 because it
>> breaks codecs using non-continuous masks in source and sink ports. The
>> commit missed the point that port numbers are not used as indices for
>> iterating over prop.sink_ports or prop.source_ports.
>>
>> Soundwire core and existing codecs expect that the array passed as
>> prop.sink_ports and prop.source_ports is continuous. The port mask still
>> might be non-continuous, but that's unrelated.
>>
>> Reported-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
>> Closes: https://lore.kernel.org/all/b6c75eee-761d-44c8-8413-2a5b34ee2f98@linux.inte…
>> Fixes: ab8d66d132bc ("soundwire: stream: fix programming slave ports for non-continous port maps")
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
>>
>> ---
>
> Would be good to merge this as soon as we can, this is causing
> soundwire regressions from rc6 onwards.
the revert also needs to happen in -stable. 6.10.8 is broken as well.
https://github.com/thesofproject/linux/issues/5168
These are a few patches broken out from [1]. Kalle requested to limit
the number of patches per series to approximately 12 and Francesco to
move the fixes to the front of the series, so here we go.
First two patches are fixes. First one is for host mlme support which
currently is in wireless-next, so no stable tag needed, second one has a
stable tag.
The remaining patches except the last one I have chosen to upstream
first. I'll continue with the other patches after having this series
in shape and merged.
The last one is a new patch not included in [1].
Sascha
[1] https://lore.kernel.org/all/20240820-mwifiex-cleanup-v1-0-320d8de4a4b7@peng…
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
---
Sascha Hauer (12):
wifi: mwifiex: add missing locking
wifi: mwifiex: fix MAC address handling
wifi: mwifiex: deduplicate code in mwifiex_cmd_tx_rate_cfg()
wifi: mwifiex: use adapter as context pointer for mwifiex_hs_activated_event()
wifi: mwifiex: drop unnecessary initialization
wifi: mwifiex: make region_code_mapping_t const
wifi: mwifiex: pass adapter to mwifiex_dnld_cmd_to_fw()
wifi: mwifiex: simplify mwifiex_setup_ht_caps()
wifi: mwifiex: fix indention
wifi: mwifiex: make locally used function static
wifi: mwifiex: move common settings out of switch/case
wifi: mwifiex: drop asynchronous init waiting code
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 38 ++++------
drivers/net/wireless/marvell/mwifiex/cfp.c | 4 +-
drivers/net/wireless/marvell/mwifiex/cmdevt.c | 76 +++++++-------------
drivers/net/wireless/marvell/mwifiex/init.c | 19 ++---
drivers/net/wireless/marvell/mwifiex/main.c | 94 +++++++++----------------
drivers/net/wireless/marvell/mwifiex/main.h | 16 ++---
drivers/net/wireless/marvell/mwifiex/sta_cmd.c | 49 ++++---------
drivers/net/wireless/marvell/mwifiex/txrx.c | 3 +-
drivers/net/wireless/marvell/mwifiex/util.c | 22 +-----
drivers/net/wireless/marvell/mwifiex/wmm.c | 12 ++--
10 files changed, 105 insertions(+), 228 deletions(-)
---
base-commit: 67a72043aa2e6f60f7bbe7bfa598ba168f16d04f
change-id: 20240826-mwifiex-cleanup-1-b5035c7faff6
Best regards,
--
Sascha Hauer <s.hauer(a)pengutronix.de>
It's incorrect to assume that LBR can/should only be used with sampling
events. BPF subsystem provides bpf_get_branch_snapshot() BPF helper,
which expects a properly setup and activated perf event which allows
kernel to capture LBR data.
For instance, retsnoop tool ([0]) makes an extensive use of this
functionality and sets up perf event as follows:
struct perf_event_attr attr;
memset(&attr, 0, sizeof(attr));
attr.size = sizeof(attr);
attr.type = PERF_TYPE_HARDWARE;
attr.config = PERF_COUNT_HW_CPU_CYCLES;
attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL;
Commit referenced in Fixes tag broke this setup by making invalid assumption
that LBR is useful only for sampling events. Remove that assumption.
Note, earlier we removed a similar assumption on AMD side of LBR support,
see [1] for details.
[0] https://github.com/anakryiko/retsnoop
[1] 9794563d4d05 ("perf/x86/amd: Don't reject non-sampling events with configured LBR")
Cc: stable(a)vger.kernel.org # 6.8+
Fixes: 85846b27072d ("perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag")
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
---
arch/x86/events/intel/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 9e519d8a810a..f82a342b8852 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3972,7 +3972,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
x86_pmu.pebs_aliases(event);
}
- if (needs_branch_stack(event) && is_sampling_event(event))
+ if (needs_branch_stack(event))
event->hw.flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
if (branch_sample_counters(event)) {
--
2.43.5
On Mon, Sep 9, 2024 at 2:48 PM Sasha Levin <sashal(a)kernel.org> wrote:
> This is a note to let you know that I've just added the patch titled
>
> userfaultfd: fix checks for huge PMDs
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
Thanks for the backport!
Are you also doing the backport for older trees, or should someone
else take care of that?
The following commit has been merged into the core/debugobjects branch of tip:
Commit-ID: 684d28feb8546d1e9597aa363c3bfcf52fe250b7
Gitweb: https://git.kernel.org/tip/684d28feb8546d1e9597aa363c3bfcf52fe250b7
Author: Zhen Lei <thunder.leizhen(a)huawei.com>
AuthorDate: Wed, 04 Sep 2024 21:39:40 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 09 Sep 2024 16:40:25 +02:00
debugobjects: Fix conditions in fill_pool()
fill_pool() uses 'obj_pool_min_free' to decide whether objects should be
handed back to the kmem cache. But 'obj_pool_min_free' records the lowest
historical value of the number of objects in the object pool and not the
minimum number of objects which should be kept in the pool.
Use 'debug_objects_pool_min_level' instead, which holds the minimum number
which was scaled to the number of CPUs at boot time.
[ tglx: Massage change log ]
Fixes: d26bf5056fc0 ("debugobjects: Reduce number of pool_lock acquisitions in fill_pool()")
Fixes: 36c4ead6f6df ("debugobjects: Add global free list and the counter")
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240904133944.2124-3-thunder.leizhen@huawei.com
---
lib/debugobjects.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 7226fdb..6329a86 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -142,13 +142,14 @@ static void fill_pool(void)
* READ_ONCE()s pair with the WRITE_ONCE()s in pool_lock critical
* sections.
*/
- while (READ_ONCE(obj_nr_tofree) && (READ_ONCE(obj_pool_free) < obj_pool_min_free)) {
+ while (READ_ONCE(obj_nr_tofree) &&
+ READ_ONCE(obj_pool_free) < debug_objects_pool_min_level) {
raw_spin_lock_irqsave(&pool_lock, flags);
/*
* Recheck with the lock held as the worker thread might have
* won the race and freed the global free list already.
*/
- while (obj_nr_tofree && (obj_pool_free < obj_pool_min_free)) {
+ while (obj_nr_tofree && (obj_pool_free < debug_objects_pool_min_level)) {
obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
hlist_del(&obj->node);
WRITE_ONCE(obj_nr_tofree, obj_nr_tofree - 1);
Two bitmasks in 'struct sdw_slave_prop' - 'source_ports' and
'sink_ports' - define which ports to program in
sdw_program_slave_port_params(). The masks are used to get the
appropriate data port properties ('struct sdw_get_slave_dpn_prop') from
an array.
Bitmasks can be non-continuous or can start from index different than 0,
thus when looking for matching port property for given port, we must
iterate over mask bits, not from 0 up to number of ports.
This fixes allocation and programming slave ports, when a source or sink
masks start from further index.
Fixes: f8101c74aa54 ("soundwire: Add Master and Slave port programming")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
drivers/soundwire/stream.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c
index 7aa4900dcf31..f275143d7b18 100644
--- a/drivers/soundwire/stream.c
+++ b/drivers/soundwire/stream.c
@@ -1291,18 +1291,18 @@ struct sdw_dpn_prop *sdw_get_slave_dpn_prop(struct sdw_slave *slave,
unsigned int port_num)
{
struct sdw_dpn_prop *dpn_prop;
- u8 num_ports;
+ unsigned long mask;
int i;
if (direction == SDW_DATA_DIR_TX) {
- num_ports = hweight32(slave->prop.source_ports);
+ mask = slave->prop.source_ports;
dpn_prop = slave->prop.src_dpn_prop;
} else {
- num_ports = hweight32(slave->prop.sink_ports);
+ mask = slave->prop.sink_ports;
dpn_prop = slave->prop.sink_dpn_prop;
}
- for (i = 0; i < num_ports; i++) {
+ for_each_set_bit(i, &mask, 32) {
if (dpn_prop[i].num == port_num)
return &dpn_prop[i];
}
--
2.43.0
This patch addresses an issue with improper reference count handling in the
ice_sriov_set_msix_vec_count() function.
First, the function calls ice_get_vf_by_id(), which increments the
reference count of the vf pointer. If the subsequent call to
ice_get_vf_vsi() fails, the function currently returns an error without
decrementing the reference count of the vf pointer, leading to a reference
count leak. The correct behavior, as implemented in this patch, is to
decrement the reference count using ice_put_vf(vf) before returning an
error when vsi is NULL.
Second, the function calls ice_sriov_get_irqs(), which sets
vf->first_vector_idx. If this call returns a negative value, indicating an
error, the function returns an error without decrementing the reference
count of the vf pointer, resulting in another reference count leak. The
patch addresses this by adding a call to ice_put_vf(vf) before returning
an error when vf->first_vector_idx < 0.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and identifying potential mismanagement of reference counts. In this case,
the tool flagged the missing decrement operation as a potential issue,
leading to this patch.
Fixes: 4035c72dc1ba ("ice: reconfig host after changing MSI-X on VF")
Fixes: 4d38cb44bd32 ("ice: manage VFs MSI-X using resource tracking")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)outlook.com>
---
v2:
* In this patch v2, an additional resource leak was addressed when
vf->first_vector_idx < 0. The issue is now fixed by adding ice_put_vf(vf)
before returning an error.
Thanks to Simon Horman for identifying this additional leak scenario.
---
drivers/net/ethernet/intel/ice/ice_sriov.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 55ef33208456..fbf18ac97875 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1096,8 +1096,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
return -ENOENT;
vsi = ice_get_vf_vsi(vf);
- if (!vsi)
+ if (!vsi) {
+ ice_put_vf(vf);
return -ENOENT;
+ }
prev_msix = vf->num_msix;
prev_queues = vf->num_vf_qs;
@@ -1142,8 +1144,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
vf->num_msix = prev_msix;
vf->num_vf_qs = prev_queues;
vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
- if (vf->first_vector_idx < 0)
+ if (vf->first_vector_idx < 0) {
+ ice_put_vf(vf);
return -EINVAL;
+ }
if (needs_rebuild) {
ice_vf_reconfig_vsi(vf);
--
2.25.1
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Without the locking amdgpu currently can race
amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(), leading to the
latter accesing potentially inconsitent entity->sched_list and
entity->num_sched_list pair.
The comment on drm_sched_entity_modify_sched() however says:
"""
* Note that this must be called under the same common lock for @entity as
* drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver needs to
* guarantee through some other means that this is never called while new jobs
* can be pushed to @entity.
"""
It is unclear if that is referring to this race or something else.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
---
drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..ae8be30472cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
{
WARN_ON(!num_sched_list || !sched_list);
+ spin_lock(&entity->rq_lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
+ spin_unlock(&entity->rq_lock);
}
EXPORT_SYMBOL(drm_sched_entity_modify_sched);
--
2.46.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9972605a238339b85bd16b084eed5f18414d22db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081211-owl-snowdrop-d2aa@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
9972605a2383 ("memcg: protect concurrent access to mem_cgroup_idr")
6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id is properly set up")
e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
7348cc91821b ("mm: multi-gen LRU: remove aging fairness safeguard")
a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
adb8213014b2 ("mm: memcg: fix stale protection of reclaim target memcg")
57e9cc50f4dd ("mm: vmscan: split khugepaged stats from direct reclaim stats")
e4fea72b1438 ("mglru: mm/vmscan.c: fix imprecise comments")
d396def5d86d ("memcg: rearrange code")
410f8e82689e ("memcg: extract memcg_vmstats from struct mem_cgroup")
d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface")
1332a809d95a ("mm: multi-gen LRU: thrashing prevention")
354ed5974429 ("mm: multi-gen LRU: kill switch")
f76c83378851 ("mm: multi-gen LRU: optimize multiple memcgs")
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap")
ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
ec1c86b25f4b ("mm: multi-gen LRU: groundwork")
f1e1a7be4718 ("mm/vmscan.c: refactor shrink_node()")
d3629af59f41 ("mm/vmscan: make the annotations of refaults code at the right place")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9972605a238339b85bd16b084eed5f18414d22db Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Date: Fri, 2 Aug 2024 16:58:22 -0700
Subject: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..f29157288b7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3386,11 +3386,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
#define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
static DEFINE_IDR(mem_cgroup_idr);
+static DEFINE_SPINLOCK(memcg_idr_lock);
+
+static int mem_cgroup_alloc_id(void)
+{
+ int ret;
+
+ idr_preload(GFP_KERNEL);
+ spin_lock(&memcg_idr_lock);
+ ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
+ GFP_NOWAIT);
+ spin_unlock(&memcg_idr_lock);
+ idr_preload_end();
+ return ret;
+}
static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
if (memcg->id.id > 0) {
+ spin_lock(&memcg_idr_lock);
idr_remove(&mem_cgroup_idr, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
+
memcg->id.id = 0;
}
}
@@ -3524,8 +3541,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
if (!memcg)
return ERR_PTR(error);
- memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
+ memcg->id.id = mem_cgroup_alloc_id();
if (memcg->id.id < 0) {
error = memcg->id.id;
goto fail;
@@ -3667,7 +3683,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
* publish it here at the end of onlining. This matches the
* regular ID destruction during offlining.
*/
+ spin_lock(&memcg_idr_lock);
idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
return 0;
offline_kmem:
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 9972605a238339b85bd16b084eed5f18414d22db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081259-plow-freezing-a93e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
9972605a2383 ("memcg: protect concurrent access to mem_cgroup_idr")
6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id is properly set up")
e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
7348cc91821b ("mm: multi-gen LRU: remove aging fairness safeguard")
a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
adb8213014b2 ("mm: memcg: fix stale protection of reclaim target memcg")
57e9cc50f4dd ("mm: vmscan: split khugepaged stats from direct reclaim stats")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9972605a238339b85bd16b084eed5f18414d22db Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Date: Fri, 2 Aug 2024 16:58:22 -0700
Subject: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..f29157288b7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3386,11 +3386,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
#define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
static DEFINE_IDR(mem_cgroup_idr);
+static DEFINE_SPINLOCK(memcg_idr_lock);
+
+static int mem_cgroup_alloc_id(void)
+{
+ int ret;
+
+ idr_preload(GFP_KERNEL);
+ spin_lock(&memcg_idr_lock);
+ ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
+ GFP_NOWAIT);
+ spin_unlock(&memcg_idr_lock);
+ idr_preload_end();
+ return ret;
+}
static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
if (memcg->id.id > 0) {
+ spin_lock(&memcg_idr_lock);
idr_remove(&mem_cgroup_idr, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
+
memcg->id.id = 0;
}
}
@@ -3524,8 +3541,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
if (!memcg)
return ERR_PTR(error);
- memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
+ memcg->id.id = mem_cgroup_alloc_id();
if (memcg->id.id < 0) {
error = memcg->id.id;
goto fail;
@@ -3667,7 +3683,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
* publish it here at the end of onlining. This matches the
* regular ID destruction during offlining.
*/
+ spin_lock(&memcg_idr_lock);
idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
return 0;
offline_kmem:
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failures when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of trask A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 853b1f96b1fd..cca1acf2e037 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1553,7 +1553,6 @@ struct btrfs_drop_extents_args {
struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
- bool fsync_skip_inode_lock;
};
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e23d178f9778..c8231677c79e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1534,13 +1534,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -1549,13 +1542,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
err = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -1795,7 +1785,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1805,7 +1794,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 970ff316069d..8b88446df36d 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -11,6 +11,12 @@
#include "delayed-ref.h"
#include "ctree.h"
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
enum btrfs_trans_state {
TRANS_STATE_RUNNING,
TRANS_STATE_COMMIT_START,
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failures when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of trask A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 86c7f8ce1715..06333a74d6c4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -445,7 +445,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 952cf145c629..15fd8c00f4c0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1543,13 +1543,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -1558,13 +1551,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
err = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -1796,7 +1786,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1806,7 +1795,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 238a0ab85df9..7623db359881 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -12,6 +12,12 @@
#include "ctree.h"
#include "misc.h"
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failures when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of trask A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a56209d275c1..b2e4b30b8fae 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -457,7 +457,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ca434f0cd27f..66dfee873906 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1558,13 +1558,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
ret = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -1573,13 +1566,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
ret = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -1811,7 +1801,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
@@ -1821,7 +1810,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 4e451ab173b1..62ec85f4b777 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,6 +27,12 @@ struct btrfs_root_item;
struct btrfs_root;
struct btrfs_path;
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
--
2.43.0
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since drm_sched_entity_modify_sched() can modify the entities run queue
lets make sure to only derefernce the pointer once so both adding and
waking up are guaranteed to be consistent.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
---
drivers/gpu/drm/scheduler/sched_entity.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index ae8be30472cd..62b07ef7630a 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +610,15 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ rq = entity->rq;
+
+ drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(rq->sched, entity);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
--
2.46.0
Memory access #VEs are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VEs so the "other" OSes will work. That,
unfortunately has left us in the lurch and exposed to these
hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static
configuration is set to "send nasty #VEs", the kernel can dynamically
request that they be disabled. Once they are disabled, access to private
memory that is not in the Mapped state in the Secure-EPT (SEPT) will
result in an exit to the VMM rather than injecting a #VE.
Check if the feature is available and disable SEPT #VE if possible.
If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Cc: stable(a)vger.kernel.org
Acked-by: Kai Huang <kai.huang(a)intel.com>
---
arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++-------
arch/x86/include/asm/shared/tdx.h | 10 +++-
2 files changed, 69 insertions(+), 17 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 08ce488b54d0..f969f4f5ebf8 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -78,7 +78,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
}
/* Read TD-scoped metadata */
-static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+static inline u64 tdg_vm_rd(u64 field, u64 *value)
{
struct tdx_module_args args = {
.rdx = field,
@@ -193,6 +193,62 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
+/*
+ * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure
+ * that no #VE will be delivered for accesses to TD-private memory.
+ *
+ * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM
+ * controls if the guest will receive such #VE with TD attribute
+ * ATTR_SEPT_VE_DISABLE.
+ *
+ * Newer TDX modules allow the guest to control if it wants to receive SEPT
+ * violation #VEs.
+ *
+ * Check if the feature is available and disable SEPT #VE if possible.
+ *
+ * If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
+ * attribute is no longer reliable. It reflects the initial state of the
+ * control for the TD, but it will not be updated if someone (e.g. bootloader)
+ * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
+ * determine if SEPT #VEs are enabled or disabled.
+ */
+static void disable_sept_ve(u64 td_attr)
+{
+ const char *msg = "TD misconfiguration: SEPT #VE has to be disabled";
+ bool debug = td_attr & ATTR_DEBUG;
+ u64 config, controls;
+
+ /* Is this TD allowed to disable SEPT #VE */
+ tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
+ if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) {
+ /* No SEPT #VE controls for the guest: check the attribute */
+ if (td_attr & ATTR_SEPT_VE_DISABLE)
+ return;
+
+ /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */
+ if (debug)
+ pr_warn("%s\n", msg);
+ else
+ tdx_panic(msg);
+ return;
+ }
+
+ /* Check if SEPT #VE has been disabled before us */
+ tdg_vm_rd(TDCS_TD_CTLS, &controls);
+ if (controls & TD_CTLS_PENDING_VE_DISABLE)
+ return;
+
+ /* Keep #VEs enabled for splats in debugging environments */
+ if (debug)
+ return;
+
+ /* Disable SEPT #VEs */
+ tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE,
+ TD_CTLS_PENDING_VE_DISABLE);
+
+ return;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -218,24 +274,12 @@ static void tdx_setup(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ td_attr = args.rdx;
+
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
- /*
- * The kernel can not handle #VE's when accessing normal kernel
- * memory. Ensure that no #VE will be delivered for accesses to
- * TD-private memory. Only VMM-shared memory (MMIO) will #VE.
- */
- td_attr = args.rdx;
- if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
- const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";
-
- /* Relax SEPT_VE_DISABLE check for debug TD. */
- if (td_attr & ATTR_DEBUG)
- pr_warn("%s\n", msg);
- else
- tdx_panic(msg);
- }
+ disable_sept_ve(td_attr);
}
/*
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 7e12cfa28bec..fecb2a6e864b 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -19,9 +19,17 @@
#define TDG_VM_RD 7
#define TDG_VM_WR 8
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+/* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
+#define TDCS_CONFIG_FLAGS 0x1110000300000016
+#define TDCS_TD_CTLS 0x1110000300000017
#define TDCS_NOTIFY_ENABLES 0x9100000000000010
+/* TDCS_CONFIG_FLAGS bits */
+#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1)
+
+/* TDCS_TD_CTLS bits */
+#define TD_CTLS_PENDING_VE_DISABLE BIT_ULL(0)
+
/* TDX hypercall Leaf IDs */
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_GET_QUOTE 0x10002
--
2.45.2
Rename tdx_parse_tdinfo() to tdx_setup() and move setting NOTIFY_ENABLES
there.
The function will be extended to adjust TD configuration.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 64717a96a936..08ce488b54d0 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -193,7 +193,7 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
-static void tdx_parse_tdinfo(u64 *cc_mask)
+static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
unsigned int gpa_width;
@@ -218,6 +218,9 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
+ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
+
/*
* The kernel can not handle #VE's when accessing normal kernel
* memory. Ensure that no #VE will be delivered for accesses to
@@ -964,11 +967,11 @@ void __init tdx_early_init(void)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
cc_vendor = CC_VENDOR_INTEL;
- tdx_parse_tdinfo(&cc_mask);
- cc_set_mask(cc_mask);
- /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
+ /* Configure the TD */
+ tdx_setup(&cc_mask);
+
+ cc_set_mask(cc_mask);
/*
* All bits above GPA width are reserved and kernel treats shared bit
--
2.45.2
From: Jason Andryuk <jason.andryuk(a)amd.com>
Hi Arthur,
Can you give the patch below a try? If it works, please respond with a
Tested-by. I'll then submit it with your Reported-by and Tested-by.
Thanks,
Jason
[PATCH] fbdev/xen-fbfront: Assign fb_info->device
Probing xen-fbfront faults in video_is_primary_device(). The passed-in
struct device is NULL since xen-fbfront doesn't assign it and the
memory is kzalloc()-ed. Assign fb_info->device to avoid this.
This was exposed by the conversion of fb_is_primary_device() to
video_is_primary_device() which dropped a NULL check for struct device.
Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers")
CC: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
The other option would be to re-instate the NULL check in
video_is_primary_device()
---
drivers/video/fbdev/xen-fbfront.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c
index 66d4628a96ae..c90f48ebb15e 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -407,6 +407,7 @@ static int xenfb_probe(struct xenbus_device *dev,
/* complete the abuse: */
fb_info->pseudo_palette = fb_info->par;
fb_info->par = info;
+ fb_info->device = &dev->dev;
fb_info->screen_buffer = info->fb;
--
2.43.0