hi,
we see broken access to user space with bpf_probe_read/bpf_probe_read_str
helpers on arm64 with 5.4 kernel. The problem is that both helpers try to
read user memory by calling probe_kernel_read, which seems to work on x86
but fails on arm64.
There are several fixes after v5.4 to address this issue.
There was an attempt to fix that in past [1], but it deviated far from
upstream changes.
This patchset tries to follow the upstream changes with 2 notable exceptions:
1) bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
- this upsgream patch adds new helpers, which we don't need to do,
we just need following functions (and related helper's glue):
bpf_probe_read_kernel_common
bpf_probe_read_kernel_str_common
that implement bpf_probe_read* helpers and receive fix in next
patch below, ommiting any other hunks
2) bpf: rework the compat kernel probe handling
- taking only fixes for functions and realted helper's glue that we took
from patch above, ommiting any other hunks
It's possible to add new helpers and keep the patches closer to upstream,
but I thought trying this way first as RFC without adding any new helpers
into stable kernel, which might possibly end up later with additional fixes.
Also I'm sending this as RFC, because I might be missing some mm related
dependency change, that I'm not aware of.
We tested new kernel with our use case on arm64 and x86.
thanks,
jirka
[1] https://yhbt.net/lore/all/YHGMFzQlHomDtZYG@kroah.com/t/
---
Andrii Nakryiko (1):
bpf: bpf_probe_read_kernel_str() has to return amount of data read on success
Christoph Hellwig (4):
maccess: clarify kerneldoc comments
maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
bpf: rework the compat kernel probe handling
Daniel Borkmann (3):
uaccess: Add strict non-pagefault kernel-space read function
bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/maccess.c | 43 ++++++++++++++++++++++++++++++++++++++
include/linux/uaccess.h | 8 +++++--
init/Kconfig | 3 +++
kernel/trace/bpf_trace.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------
kernel/trace/trace_kprobe.c | 2 +-
mm/maccess.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-------
10 files changed, 207 insertions(+), 57 deletions(-)
create mode 100644 arch/x86/mm/maccess.c
---------- Forwarded message ---------
From: beld zhang
Date: Sun, May 28, 2023 at 2:07 PM
Subject: Re: 6.1.30: thunderbolt: Clear registers properly when auto
clear isn't in use cause call trace after resume
To: Mario Limonciello
On Sun, May 28, 2023 at 8:55 AM Mario Limonciello
<mario.limonciello(a)amd.com> wrote:
>
> This is specific resuming from s2idle, doesn't happen at boot?
>
> Does it happen with hot-plugging or hot-unplugging a TBT3 or USB4 dock too?
>
> In addition to checking mainline, can you please attach a full dmesg to
> somewhere ephemeral like a kernel bugzilla with thunderbolt.dyndbg='+p'
> on the kernel command line set?
>
6.4-rc4:
*) test 1~4 was done with usb hub with ethernet plugged-in
model: UE330, usb 3.0 3-port hub & GIgabit Ether adapter
a rapoo wireless mouse in one of the ports
1) no crash at boot
until [169.099024]
2) no crash after plug an extra usb dock
from [297.004691]
3) no crash after remove it
from [373.273511]
4) crash after suspend/resume: 2 call-stacks
from [438.356253]
5) removed that hub(only ac-power left): NO crash after resume
from [551.820333]
6) plug in the hub(no mouse): NO crash after resume
from [1250.256607]
7) put on mouse: CRASH after resume
from [1311.400963]
mouse model: Rapoo Wireless Optical Mouse 1620
sorry I have no idea how to fill a proper bug report at kernel
bugzilla, hope these shared links work.
btw I have no TB devices to test.
dmesg:
https://drive.google.com/file/d/1bUWnV7q2ziM4tdTzmuGiVuvEzaLcdfKm/view?usp=…
config:
https://drive.google.com/file/d/1It75_AV5tOzfkXXBAX5zAiZMoeJAe0Au/view?usp=…
From: Liu Peibao <liupeibao(a)loongson.cn>
In DeviceTree path, when ht_vec_base is not zero, the hwirq of PCH PIC
will be assigned incorrectly. Because when pch_pic_domain_translate()
adds the ht_vec_base to hwirq, the hwirq does not have the ht_vec_base
subtracted when calling irq_domain_set_info().
The ht_vec_base is designed for the parent irq chip/domain of the PCH PIC.
It seems not proper to deal this in callbacks of the PCH PIC domain and
let's put this back like the initial commit ef8c01eb64ca ("irqchip: Add
Loongson PCH PIC controller").
Fixes: bcdd75c596c8 ("irqchip/loongson-pch-pic: Add ACPI init support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Huacai Chen <chenhuacai(a)loongson.cn>
Signed-off-by: Liu Peibao <liupeibao(a)loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin(a)loongson.cn>
---
drivers/irqchip/irq-loongson-pch-pic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/irqchip/irq-loongson-pch-pic.c b/drivers/irqchip/irq-loongson-pch-pic.c
index 921c5c0190d1..93a71f66efeb 100644
--- a/drivers/irqchip/irq-loongson-pch-pic.c
+++ b/drivers/irqchip/irq-loongson-pch-pic.c
@@ -164,7 +164,7 @@ static int pch_pic_domain_translate(struct irq_domain *d,
if (fwspec->param_count < 2)
return -EINVAL;
- *hwirq = fwspec->param[0] + priv->ht_vec_base;
+ *hwirq = fwspec->param[0];
*type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
} else {
if (fwspec->param_count < 1)
@@ -196,7 +196,7 @@ static int pch_pic_alloc(struct irq_domain *domain, unsigned int virq,
parent_fwspec.fwnode = domain->parent->fwnode;
parent_fwspec.param_count = 1;
- parent_fwspec.param[0] = hwirq;
+ parent_fwspec.param[0] = hwirq + priv->ht_vec_base;
err = irq_domain_alloc_irqs_parent(domain, virq, 1, &parent_fwspec);
if (err)
--
2.31.1
In an ACPI-based dual-bridge system, IRQ of each bridge's
PCH PIC sent to CPU is always a zero-based number, which
means that the IRQ on PCH PIC of each bridge is mapped into
vector range from 0 to 63 of upstream irqchip(e.g. EIOINTC).
EIOINTC N: [0 ... 63 | 64 ... 255]
-------- ----------
^ ^
| |
PCH PIC N |
PCH MSI N
For example, the IRQ vector number of sata controller on
PCH PIC of each bridge is 16, which is sent to upstream
irqchip of EIOINTC when an interrupt occurs, which will set
bit 16 of EIOINTC. Since hwirq of 16 on EIOINTC has been
mapped to a irq_desc for sata controller during hierarchy
irq allocation, the related mapped IRQ will be found through
irq_resolve_mapping() in the IRQ domain of EIOINTC.
So, the IRQ number set in HT vector register should be fixed
to be a zero-based number.
Cc: stable(a)vger.kernel.org
Reviewed-by: Huacai Chen <chenhuacai(a)loongson.cn>
Co-developed-by: liuyun <liuyun(a)loongson.cn>
Signed-off-by: liuyun <liuyun(a)loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin(a)loongson.cn>
---
drivers/irqchip/irq-loongson-pch-pic.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/irqchip/irq-loongson-pch-pic.c b/drivers/irqchip/irq-loongson-pch-pic.c
index e5fe4d50be05..921c5c0190d1 100644
--- a/drivers/irqchip/irq-loongson-pch-pic.c
+++ b/drivers/irqchip/irq-loongson-pch-pic.c
@@ -401,14 +401,12 @@ static int __init acpi_cascade_irqdomain_init(void)
int __init pch_pic_acpi_init(struct irq_domain *parent,
struct acpi_madt_bio_pic *acpi_pchpic)
{
- int ret, vec_base;
+ int ret;
struct fwnode_handle *domain_handle;
if (find_pch_pic(acpi_pchpic->gsi_base) >= 0)
return 0;
- vec_base = acpi_pchpic->gsi_base - GSI_MIN_PCH_IRQ;
-
domain_handle = irq_domain_alloc_fwnode(&acpi_pchpic->address);
if (!domain_handle) {
pr_err("Unable to allocate domain handle\n");
@@ -416,7 +414,7 @@ int __init pch_pic_acpi_init(struct irq_domain *parent,
}
ret = pch_pic_init(acpi_pchpic->address, acpi_pchpic->size,
- vec_base, parent, domain_handle, acpi_pchpic->gsi_base);
+ 0, parent, domain_handle, acpi_pchpic->gsi_base);
if (ret < 0) {
irq_domain_free_fwnode(domain_handle);
--
2.31.1
This is a note to let you know that I've just added the patch titled
iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 626d312028bec44209d0ecd5beaa9b1aa8945f7d Mon Sep 17 00:00:00 2001
From: Masahiro Honda <honda(a)mechatrax.com>
Date: Thu, 18 May 2023 20:08:16 +0900
Subject: iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY
flag
The Sigma-Delta ADCs supported by this driver can use SDO as an interrupt
line to indicate the completion of a conversion. However, some devices
cannot properly detect the completion of a conversion by an interrupt.
This is for the reason mentioned in the following commit.
commit e9849777d0e2 ("genirq: Add flag to force mask in
disable_irq[_nosync]()")
A read operation is performed by an extra interrupt before the completion
of a conversion. At this time, the value read from the ADC data register
is the same as the previous conversion result. This patch fixes the issue
by setting IRQ_DISABLE_UNLAZY flag.
Fixes: 0c6ef985a1fd ("iio: adc: ad7791: fix IRQ flags")
Fixes: 1a913270e57a ("iio: adc: ad7793: Fix IRQ flag")
Fixes: e081102f3077 ("iio: adc: ad7780: Fix IRQ flag")
Fixes: 89a86da5cb8e ("iio: adc: ad7192: Add IRQ flag")
Fixes: 79ef91493f54 ("iio: adc: ad7124: Set IRQ type to falling")
Signed-off-by: Masahiro Honda <honda(a)mechatrax.com>
Link: https://lore.kernel.org/r/20230518110816.248-1-honda@mechatrax.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad_sigma_delta.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/iio/adc/ad_sigma_delta.c b/drivers/iio/adc/ad_sigma_delta.c
index d8570f620785..7e2192870743 100644
--- a/drivers/iio/adc/ad_sigma_delta.c
+++ b/drivers/iio/adc/ad_sigma_delta.c
@@ -584,6 +584,10 @@ static int devm_ad_sd_probe_trigger(struct device *dev, struct iio_dev *indio_de
init_completion(&sigma_delta->completion);
sigma_delta->irq_dis = true;
+
+ /* the IRQ core clears IRQ_DISABLE_UNLAZY flag when freeing an IRQ */
+ irq_set_status_flags(sigma_delta->spi->irq, IRQ_DISABLE_UNLAZY);
+
ret = devm_request_irq(dev, sigma_delta->spi->irq,
ad_sd_data_rdy_trig_poll,
sigma_delta->info->irq_flags | IRQF_NO_AUTOEN,
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: imu: inv_icm42600: fix timestamp reset
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From bbaae0c79ebd49f61ad942a8bf9e12bfc7f821bb Mon Sep 17 00:00:00 2001
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Date: Tue, 9 May 2023 15:22:02 +0000
Subject: iio: imu: inv_icm42600: fix timestamp reset
Timestamp reset is not done in the correct place. It must be done
before enabling buffer. The reason is that interrupt timestamping
is always happening when the chip is on, even if the
corresponding sensor is off. When the sensor restarts, timestamp
is wrong if you don't do a reset first.
Fixes: ec74ae9fd37c ("iio: imu: inv_icm42600: add accurate timestamping")
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230509152202.245444-1-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
index 99576b2c171f..32d7f8364230 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
@@ -275,9 +275,14 @@ static int inv_icm42600_buffer_preenable(struct iio_dev *indio_dev)
{
struct inv_icm42600_state *st = iio_device_get_drvdata(indio_dev);
struct device *dev = regmap_get_device(st->map);
+ struct inv_icm42600_timestamp *ts = iio_priv(indio_dev);
pm_runtime_get_sync(dev);
+ mutex_lock(&st->lock);
+ inv_icm42600_timestamp_reset(ts);
+ mutex_unlock(&st->lock);
+
return 0;
}
@@ -375,7 +380,6 @@ static int inv_icm42600_buffer_postdisable(struct iio_dev *indio_dev)
struct device *dev = regmap_get_device(st->map);
unsigned int sensor;
unsigned int *watermark;
- struct inv_icm42600_timestamp *ts;
struct inv_icm42600_sensor_conf conf = INV_ICM42600_SENSOR_CONF_INIT;
unsigned int sleep_temp = 0;
unsigned int sleep_sensor = 0;
@@ -385,11 +389,9 @@ static int inv_icm42600_buffer_postdisable(struct iio_dev *indio_dev)
if (indio_dev == st->indio_gyro) {
sensor = INV_ICM42600_SENSOR_GYRO;
watermark = &st->fifo.watermark.gyro;
- ts = iio_priv(st->indio_gyro);
} else if (indio_dev == st->indio_accel) {
sensor = INV_ICM42600_SENSOR_ACCEL;
watermark = &st->fifo.watermark.accel;
- ts = iio_priv(st->indio_accel);
} else {
return -EINVAL;
}
@@ -417,8 +419,6 @@ static int inv_icm42600_buffer_postdisable(struct iio_dev *indio_dev)
if (!st->fifo.on)
ret = inv_icm42600_set_temp_conf(st, false, &sleep_temp);
- inv_icm42600_timestamp_reset(ts);
-
out_unlock:
mutex_unlock(&st->lock);
--
2.40.1
This is a note to let you know that I've just added the patch titled
dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 55720d242052e860b9fde445e302e0425722e7f1 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas(a)glider.be>
Date: Tue, 9 May 2023 14:34:22 +0200
Subject: dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476
compatible value
The conversion to json-schema accidentally dropped the "ad" part prefix
from the compatible value.
Fixes: 8c41245872e2 ("dt-bindings:iio:adc:renesas,rcar-gyroadc: txt to yaml conversion.")
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Reviewed-by: Marek Vasut <marek.vasut+renesas(a)mailbox.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Reviewed-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Link: https://lore.kernel.org/r/6b328a3f52657c20759f3a5bb2fe033d47644ba8.16836354…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
.../devicetree/bindings/iio/adc/renesas,rcar-gyroadc.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/iio/adc/renesas,rcar-gyroadc.yaml b/Documentation/devicetree/bindings/iio/adc/renesas,rcar-gyroadc.yaml
index 1c7aee5ed3e0..36dff3250ea7 100644
--- a/Documentation/devicetree/bindings/iio/adc/renesas,rcar-gyroadc.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/renesas,rcar-gyroadc.yaml
@@ -90,7 +90,7 @@ patternProperties:
of the MAX chips to the GyroADC, while MISO line of each Maxim
ADC connects to a shared input pin of the GyroADC.
enum:
- - adi,7476
+ - adi,ad7476
- fujitsu,mb88101a
- maxim,max1162
- maxim,max11100
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: dac: mcp4725: Fix i2c_master_send() return value handling
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 09d3bec7009186bdba77039df01e5834788b3f95 Mon Sep 17 00:00:00 2001
From: Marek Vasut <marex(a)denx.de>
Date: Thu, 11 May 2023 02:43:30 +0200
Subject: iio: dac: mcp4725: Fix i2c_master_send() return value handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The i2c_master_send() returns number of sent bytes on success,
or negative on error. The suspend/resume callbacks expect zero
on success and non-zero on error. Adapt the return value of the
i2c_master_send() to the expectation of the suspend and resume
callbacks, including proper validation of the return value.
Fixes: cf35ad61aca2 ("iio: add mcp4725 I2C DAC driver")
Signed-off-by: Marek Vasut <marex(a)denx.de>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/20230511004330.206942-1-marex@denx.de
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/dac/mcp4725.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/dac/mcp4725.c b/drivers/iio/dac/mcp4725.c
index 46bf758760f8..3f5661a3718f 100644
--- a/drivers/iio/dac/mcp4725.c
+++ b/drivers/iio/dac/mcp4725.c
@@ -47,12 +47,18 @@ static int mcp4725_suspend(struct device *dev)
struct mcp4725_data *data = iio_priv(i2c_get_clientdata(
to_i2c_client(dev)));
u8 outbuf[2];
+ int ret;
outbuf[0] = (data->powerdown_mode + 1) << 4;
outbuf[1] = 0;
data->powerdown = true;
- return i2c_master_send(data->client, outbuf, 2);
+ ret = i2c_master_send(data->client, outbuf, 2);
+ if (ret < 0)
+ return ret;
+ else if (ret != 2)
+ return -EIO;
+ return 0;
}
static int mcp4725_resume(struct device *dev)
@@ -60,13 +66,19 @@ static int mcp4725_resume(struct device *dev)
struct mcp4725_data *data = iio_priv(i2c_get_clientdata(
to_i2c_client(dev)));
u8 outbuf[2];
+ int ret;
/* restore previous DAC value */
outbuf[0] = (data->dac_value >> 8) & 0xf;
outbuf[1] = data->dac_value & 0xff;
data->powerdown = false;
- return i2c_master_send(data->client, outbuf, 2);
+ ret = i2c_master_send(data->client, outbuf, 2);
+ if (ret < 0)
+ return ret;
+ else if (ret != 2)
+ return -EIO;
+ return 0;
}
static DEFINE_SIMPLE_DEV_PM_OPS(mcp4725_pm_ops, mcp4725_suspend,
mcp4725_resume);
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: accel: kx022a fix irq getting
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 56cd3d1c5c5b073a1a444eafdcf97d4d866d351a Mon Sep 17 00:00:00 2001
From: Matti Vaittinen <mazziesaccount(a)gmail.com>
Date: Fri, 12 May 2023 10:53:41 +0300
Subject: iio: accel: kx022a fix irq getting
The fwnode_irq_get_byname() was returning 0 at device-tree mapping
error. If this occurred, the KX022A driver did abort the probe but
errorneously directly returned the return value from
fwnode_irq_get_byname() from probe. In case of a device-tree mapping
error this indicated success.
The fwnode_irq_get_byname() has since been fixed to not return zero on
error so the check for fwnode_irq_get_byname() can be relaxed to only
treat negative values as errors. This will also do decent fix even when
backported to branches where fwnode_irq_get_byname() can still return
zero on error because KX022A probe should later fail at IRQ requesting
and a prober error handling should follow.
Relax the return value check for fwnode_irq_get_byname() to only treat
negative values as errors.
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <error27(a)gmail.com>
Closes: https://lore.kernel.org/r/202305110245.MFxC9bUj-lkp@intel.com/
Link: https://lore.kernel.org/r/202305110245.MFxC9bUj-lkp@intel.com/
Signed-off-by: Matti Vaittinen <mazziesaccount(a)gmail.com>
Fixes: 7c1d1677b322 ("iio: accel: Support Kionix/ROHM KX022A accelerometer")
Link: https://lore.kernel.org/r/b45b4b638db109c6078d243252df3a7b0485f7d5.16838753…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/kionix-kx022a.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/accel/kionix-kx022a.c b/drivers/iio/accel/kionix-kx022a.c
index f98393d74666..b8636fa8eaeb 100644
--- a/drivers/iio/accel/kionix-kx022a.c
+++ b/drivers/iio/accel/kionix-kx022a.c
@@ -1048,7 +1048,7 @@ int kx022a_probe_internal(struct device *dev)
data->ien_reg = KX022A_REG_INC4;
} else {
irq = fwnode_irq_get_byname(fwnode, "INT2");
- if (irq <= 0)
+ if (irq < 0)
return dev_err_probe(dev, irq, "No suitable IRQ\n");
data->inc_reg = KX022A_REG_INC5;
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: dac: build ad5758 driver when AD5758 is selected
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From a146eccb68be161ae9eab5f3f68bb0ed7c0fbaa8 Mon Sep 17 00:00:00 2001
From: Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
Date: Mon, 8 May 2023 06:02:08 +0200
Subject: iio: dac: build ad5758 driver when AD5758 is selected
Commit 28d1a7ac2a0d ("iio: dac: Add AD5758 support") adds the config AD5758
and the corresponding driver ad5758.c. In the Makefile, the ad5758 driver
is however included when AD5755 is selected, not when AD5758 is selected.
Probably, this was simply a mistake that happened by copy-and-paste and
forgetting to adjust the actual line. Surprisingly, no one has ever noticed
that this driver is actually only included when AD5755 is selected and that
the config AD5758 has actually no effect on the build.
Fixes: 28d1a7ac2a0d ("iio: dac: Add AD5758 support")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
Link: https://lore.kernel.org/r/20230508040208.12033-1-lukas.bulwahn@gmail.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/dac/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/dac/Makefile b/drivers/iio/dac/Makefile
index 6c74fea21736..addd97a78838 100644
--- a/drivers/iio/dac/Makefile
+++ b/drivers/iio/dac/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_AD5592R_BASE) += ad5592r-base.o
obj-$(CONFIG_AD5592R) += ad5592r.o
obj-$(CONFIG_AD5593R) += ad5593r.o
obj-$(CONFIG_AD5755) += ad5755.o
-obj-$(CONFIG_AD5755) += ad5758.o
+obj-$(CONFIG_AD5758) += ad5758.o
obj-$(CONFIG_AD5761) += ad5761.o
obj-$(CONFIG_AD5764) += ad5764.o
obj-$(CONFIG_AD5766) += ad5766.o
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: addac: ad74413: fix resistance input processing
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 24febc99ca725dcf42d57168a2f4e8a75a5ade92 Mon Sep 17 00:00:00 2001
From: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Date: Wed, 3 May 2023 11:58:17 +0200
Subject: iio: addac: ad74413: fix resistance input processing
On success, ad74413r_get_single_adc_result() returns IIO_VAL_INT aka
1. So currently, the IIO_CHAN_INFO_PROCESSED case is effectively
equivalent to the IIO_CHAN_INFO_RAW case, and we never call
ad74413r_adc_to_resistance_result() to convert the adc measurement to
ohms.
Check ret for being negative rather than non-zero.
Fixes: fea251b6a5dbd (iio: addac: add AD74413R driver)
Signed-off-by: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://lore.kernel.org/r/20230503095817.452551-1-linux@rasmusvillemoes.dk
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/addac/ad74413r.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/addac/ad74413r.c b/drivers/iio/addac/ad74413r.c
index 07e9f6ae16a8..e3366cf5eb31 100644
--- a/drivers/iio/addac/ad74413r.c
+++ b/drivers/iio/addac/ad74413r.c
@@ -1007,7 +1007,7 @@ static int ad74413r_read_raw(struct iio_dev *indio_dev,
ret = ad74413r_get_single_adc_result(indio_dev, chan->channel,
val);
- if (ret)
+ if (ret < 0)
return ret;
ad74413r_adc_to_resistance_result(*val, val);
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: light: vcnl4035: fixed chip ID check
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From a551c26e8e568fad42120843521529241b9bceec Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li(a)nxp.com>
Date: Mon, 1 May 2023 10:36:04 -0400
Subject: iio: light: vcnl4035: fixed chip ID check
VCNL4035 register(0xE) ID_L and ID_M define as:
ID_L: 0x80
ID_H: 7:6 (0:0)
5:4 (0:0) slave address = 0x60 (7-bit)
(0:1) slave address = 0x51 (7-bit)
(1:0) slave address = 0x40 (7-bit)
(1:0) slave address = 0x41 (7-bit)
3:0 Version code default (0:0:0:0)
So just check ID_L.
Fixes: 55707294c4eb ("iio: light: Add support for vishay vcnl4035")
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
Link: https://lore.kernel.org/r/20230501143605.1615549-1-Frank.Li@nxp.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/light/vcnl4035.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/iio/light/vcnl4035.c b/drivers/iio/light/vcnl4035.c
index 14e29330e972..94f5d611e98c 100644
--- a/drivers/iio/light/vcnl4035.c
+++ b/drivers/iio/light/vcnl4035.c
@@ -8,6 +8,7 @@
* TODO: Proximity
*/
#include <linux/bitops.h>
+#include <linux/bitfield.h>
#include <linux/i2c.h>
#include <linux/module.h>
#include <linux/pm_runtime.h>
@@ -42,6 +43,7 @@
#define VCNL4035_ALS_PERS_MASK GENMASK(3, 2)
#define VCNL4035_INT_ALS_IF_H_MASK BIT(12)
#define VCNL4035_INT_ALS_IF_L_MASK BIT(13)
+#define VCNL4035_DEV_ID_MASK GENMASK(7, 0)
/* Default values */
#define VCNL4035_MODE_ALS_ENABLE BIT(0)
@@ -413,6 +415,7 @@ static int vcnl4035_init(struct vcnl4035_data *data)
return ret;
}
+ id = FIELD_GET(VCNL4035_DEV_ID_MASK, id);
if (id != VCNL4035_DEV_ID_VAL) {
dev_err(&data->client->dev, "Wrong id, got %x, expected %x\n",
id, VCNL4035_DEV_ID_VAL);
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: adc: stm32-adc: skip adc-channels setup if none is present
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 3e27ef0ced49f8ae7883c25fadf76a2086e99025 Mon Sep 17 00:00:00 2001
From: Sean Nyekjaer <sean(a)geanix.com>
Date: Wed, 3 May 2023 18:20:29 +0200
Subject: iio: adc: stm32-adc: skip adc-channels setup if none is present
If only adc differential channels are defined driver will fail with
stm32-adc: probe of 48003000.adc:adc@0 failed with error -22
Fix this by skipping the initialization if no channels are defined.
This applies only to the legacy way of initializing adc channels.
Fixes: d7705f35448a ("iio: adc: stm32-adc: convert to device properties")
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
Reviewed-by: Olivier Moysan <olivier.moysan(a)foss.st.com>
Link: https://lore.kernel.org/r/20230503162029.3654093-2-sean@geanix.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/stm32-adc.c | 42 ++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
index c105d82c3e45..bd7e2408bf28 100644
--- a/drivers/iio/adc/stm32-adc.c
+++ b/drivers/iio/adc/stm32-adc.c
@@ -2036,6 +2036,7 @@ static int stm32_adc_legacy_chan_init(struct iio_dev *indio_dev,
struct stm32_adc_diff_channel diff[STM32_ADC_CH_MAX];
struct device *dev = &indio_dev->dev;
u32 num_diff = adc->num_diff;
+ int num_se = nchans - num_diff;
int size = num_diff * sizeof(*diff) / sizeof(u32);
int scan_index = 0, ret, i, c;
u32 smp = 0, smps[STM32_ADC_CH_MAX], chans[STM32_ADC_CH_MAX];
@@ -2062,29 +2063,32 @@ static int stm32_adc_legacy_chan_init(struct iio_dev *indio_dev,
scan_index++;
}
}
-
- ret = device_property_read_u32_array(dev, "st,adc-channels", chans,
- nchans);
- if (ret)
- return ret;
-
- for (c = 0; c < nchans; c++) {
- if (chans[c] >= adc_info->max_channels) {
- dev_err(&indio_dev->dev, "Invalid channel %d\n",
- chans[c]);
- return -EINVAL;
+ if (num_se > 0) {
+ ret = device_property_read_u32_array(dev, "st,adc-channels", chans, num_se);
+ if (ret) {
+ dev_err(&indio_dev->dev, "Failed to get st,adc-channels %d\n", ret);
+ return ret;
}
- /* Channel can't be configured both as single-ended & diff */
- for (i = 0; i < num_diff; i++) {
- if (chans[c] == diff[i].vinp) {
- dev_err(&indio_dev->dev, "channel %d misconfigured\n", chans[c]);
+ for (c = 0; c < num_se; c++) {
+ if (chans[c] >= adc_info->max_channels) {
+ dev_err(&indio_dev->dev, "Invalid channel %d\n",
+ chans[c]);
return -EINVAL;
}
+
+ /* Channel can't be configured both as single-ended & diff */
+ for (i = 0; i < num_diff; i++) {
+ if (chans[c] == diff[i].vinp) {
+ dev_err(&indio_dev->dev, "channel %d misconfigured\n",
+ chans[c]);
+ return -EINVAL;
+ }
+ }
+ stm32_adc_chan_init_one(indio_dev, &channels[scan_index],
+ chans[c], 0, scan_index, false);
+ scan_index++;
}
- stm32_adc_chan_init_one(indio_dev, &channels[scan_index],
- chans[c], 0, scan_index, false);
- scan_index++;
}
if (adc->nsmps > 0) {
@@ -2305,7 +2309,7 @@ static int stm32_adc_chan_fw_init(struct iio_dev *indio_dev, bool timestamping)
if (legacy)
ret = stm32_adc_legacy_chan_init(indio_dev, adc, channels,
- num_channels);
+ timestamping ? num_channels - 1 : num_channels);
else
ret = stm32_adc_generic_chan_init(indio_dev, adc, channels);
if (ret < 0)
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: adc: stm32-adc: skip adc-diff-channels setup if none is present
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 9c0d6ccd7d6bbd275e390b55a3390b4274291d95 Mon Sep 17 00:00:00 2001
From: Sean Nyekjaer <sean(a)geanix.com>
Date: Wed, 3 May 2023 18:20:28 +0200
Subject: iio: adc: stm32-adc: skip adc-diff-channels setup if none is present
If no adc differential channels are defined driver will fail with EINVAL:
stm32-adc: probe of 48003000.adc:adc@0 failed with error -22
Fix this by skipping the initialization if no channels are defined.
This applies only to the legacy way of initializing adc channels.
Fixes: d7705f35448a ("iio: adc: stm32-adc: convert to device properties")
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
Link: https://lore.kernel.org/r/20230503162029.3654093-1-sean@geanix.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/stm32-adc.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
index 1aadb2ad2cab..c105d82c3e45 100644
--- a/drivers/iio/adc/stm32-adc.c
+++ b/drivers/iio/adc/stm32-adc.c
@@ -2006,16 +2006,15 @@ static int stm32_adc_get_legacy_chan_count(struct iio_dev *indio_dev, struct stm
* to get the *real* number of channels.
*/
ret = device_property_count_u32(dev, "st,adc-diff-channels");
- if (ret < 0)
- return ret;
-
- ret /= (int)(sizeof(struct stm32_adc_diff_channel) / sizeof(u32));
- if (ret > adc_info->max_channels) {
- dev_err(&indio_dev->dev, "Bad st,adc-diff-channels?\n");
- return -EINVAL;
- } else if (ret > 0) {
- adc->num_diff = ret;
- num_channels += ret;
+ if (ret > 0) {
+ ret /= (int)(sizeof(struct stm32_adc_diff_channel) / sizeof(u32));
+ if (ret > adc_info->max_channels) {
+ dev_err(&indio_dev->dev, "Bad st,adc-diff-channels?\n");
+ return -EINVAL;
+ } else if (ret > 0) {
+ adc->num_diff = ret;
+ num_channels += ret;
+ }
}
/* Optional sample time is provided either for each, or all channels */
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: adc: ad7192: Change "shorted" channels to differential
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From e55245d115bb9054cb72cdd5dda5660f4484873a Mon Sep 17 00:00:00 2001
From: Paul Cercueil <paul(a)crapouillou.net>
Date: Thu, 30 Mar 2023 12:21:00 +0200
Subject: iio: adc: ad7192: Change "shorted" channels to differential
The AD7192 provides a specific channel configuration where both negative
and positive inputs are connected to AIN2. This was represented in the
ad7192 driver as a IIO channel with .channel = 2 and .extended_name set
to "shorted".
The problem with this approach, is that the driver provided two IIO
channels with the identifier .channel = 2; one "shorted" and the other
not. This goes against the IIO ABI, as a channel identifier should be
unique.
Address this issue by changing "shorted" channels to being differential
instead, with channel 2 vs. itself, as we're actually measuring AIN2 vs.
itself.
Note that the fix tag is for the commit that moved the driver out of
staging. The bug existed before that, but backporting would become very
complex further down and unlikely to happen.
Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
Co-developed-by: Alisa Roman <alisa.roman(a)analog.com>
Signed-off-by: Alisa Roman <alisa.roman(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://lore.kernel.org/r/20230330102100.17590-1-paul@crapouillou.net
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7192.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/adc/ad7192.c b/drivers/iio/adc/ad7192.c
index 55a6ab591016..99bb604b78c8 100644
--- a/drivers/iio/adc/ad7192.c
+++ b/drivers/iio/adc/ad7192.c
@@ -897,10 +897,6 @@ static const struct iio_info ad7195_info = {
__AD719x_CHANNEL(_si, _channel1, -1, _address, NULL, IIO_VOLTAGE, \
BIT(IIO_CHAN_INFO_SCALE), ad7192_calibsys_ext_info)
-#define AD719x_SHORTED_CHANNEL(_si, _channel1, _address) \
- __AD719x_CHANNEL(_si, _channel1, -1, _address, "shorted", IIO_VOLTAGE, \
- BIT(IIO_CHAN_INFO_SCALE), ad7192_calibsys_ext_info)
-
#define AD719x_TEMP_CHANNEL(_si, _address) \
__AD719x_CHANNEL(_si, 0, -1, _address, NULL, IIO_TEMP, 0, NULL)
@@ -908,7 +904,7 @@ static const struct iio_chan_spec ad7192_channels[] = {
AD719x_DIFF_CHANNEL(0, 1, 2, AD7192_CH_AIN1P_AIN2M),
AD719x_DIFF_CHANNEL(1, 3, 4, AD7192_CH_AIN3P_AIN4M),
AD719x_TEMP_CHANNEL(2, AD7192_CH_TEMP),
- AD719x_SHORTED_CHANNEL(3, 2, AD7192_CH_AIN2P_AIN2M),
+ AD719x_DIFF_CHANNEL(3, 2, 2, AD7192_CH_AIN2P_AIN2M),
AD719x_CHANNEL(4, 1, AD7192_CH_AIN1),
AD719x_CHANNEL(5, 2, AD7192_CH_AIN2),
AD719x_CHANNEL(6, 3, AD7192_CH_AIN3),
@@ -922,7 +918,7 @@ static const struct iio_chan_spec ad7193_channels[] = {
AD719x_DIFF_CHANNEL(2, 5, 6, AD7193_CH_AIN5P_AIN6M),
AD719x_DIFF_CHANNEL(3, 7, 8, AD7193_CH_AIN7P_AIN8M),
AD719x_TEMP_CHANNEL(4, AD7193_CH_TEMP),
- AD719x_SHORTED_CHANNEL(5, 2, AD7193_CH_AIN2P_AIN2M),
+ AD719x_DIFF_CHANNEL(5, 2, 2, AD7193_CH_AIN2P_AIN2M),
AD719x_CHANNEL(6, 1, AD7193_CH_AIN1),
AD719x_CHANNEL(7, 2, AD7193_CH_AIN2),
AD719x_CHANNEL(8, 3, AD7193_CH_AIN3),
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: accel: st_accel: Fix invalid mount_matrix on devices without
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 79b8ded9d9c595db9bd5b2f62f5f738b36de1e22 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Sun, 16 Apr 2023 23:24:09 +0200
Subject: iio: accel: st_accel: Fix invalid mount_matrix on devices without
ACPI _ONT method
When apply_acpi_orientation() fails, st_accel_common_probe() will fall back
to iio_read_mount_matrix(), which checks for a mount-matrix device property
and if that is not set falls back to the identity matrix.
But when a sensor has no ACPI companion fwnode, or when the ACPI fwnode
does not have a "_ONT" method apply_acpi_orientation() was returning 0,
causing iio_read_mount_matrix() to never get called resulting in an
invalid mount_matrix:
[root@fedora ~]# cat /sys/bus/iio/devices/iio\:device0/mount_matrix
(null), (null), (null); (null), (null), (null); (null), (null), (null)
Fix this by making apply_acpi_orientation() always return an error when
it did not set the mount_matrix.
Fixes: 3d8ad94bb175 ("iio: accel: st_sensors: Support generic mounting matrix")
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Tested-by: Marius Hoch <mail(a)mariushoch.de>
Link: https://lore.kernel.org/r/20230416212409.310936-1-hdegoede@redhat.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/st_accel_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c
index 5f7d81b44b1d..282e539157f8 100644
--- a/drivers/iio/accel/st_accel_core.c
+++ b/drivers/iio/accel/st_accel_core.c
@@ -1291,12 +1291,12 @@ static int apply_acpi_orientation(struct iio_dev *indio_dev)
adev = ACPI_COMPANION(indio_dev->dev.parent);
if (!adev)
- return 0;
+ return -ENXIO;
/* Read _ONT data, which should be a package of 6 integers. */
status = acpi_evaluate_object(adev->handle, "_ONT", NULL, &buffer);
if (status == AE_NOT_FOUND) {
- return 0;
+ return -ENXIO;
} else if (ACPI_FAILURE(status)) {
dev_warn(&indio_dev->dev, "failed to execute _ONT: %d\n",
status);
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: adc: mxs-lradc: fix the order of two cleanup operations
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 27b2ed5b6d53cd62fc61c3f259ae52f5cac23b66 Mon Sep 17 00:00:00 2001
From: Jiakai Luo <jkluo(a)hust.edu.cn>
Date: Sat, 22 Apr 2023 06:34:06 -0700
Subject: iio: adc: mxs-lradc: fix the order of two cleanup operations
Smatch reports:
drivers/iio/adc/mxs-lradc-adc.c:766 mxs_lradc_adc_probe() warn:
missing unwind goto?
the order of three init operation:
1.mxs_lradc_adc_trigger_init
2.iio_triggered_buffer_setup
3.mxs_lradc_adc_hw_init
thus, the order of three cleanup operation should be:
1.mxs_lradc_adc_hw_stop
2.iio_triggered_buffer_cleanup
3.mxs_lradc_adc_trigger_remove
we exchange the order of two cleanup operations,
introducing the following differences:
1.if mxs_lradc_adc_trigger_init fails, returns directly;
2.if trigger_init succeeds but iio_triggered_buffer_setup fails,
goto err_trig and remove the trigger.
In addition, we also reorder the unwind that goes on in the
remove() callback to match the new ordering.
Fixes: 6dd112b9f85e ("iio: adc: mxs-lradc: Add support for ADC driver")
Signed-off-by: Jiakai Luo <jkluo(a)hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91(a)hust.edu.cn>
Link: https://lore.kernel.org/r/20230422133407.72908-1-jkluo@hust.edu.cn
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/mxs-lradc-adc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/iio/adc/mxs-lradc-adc.c b/drivers/iio/adc/mxs-lradc-adc.c
index bca79a93cbe4..a50f39143d3e 100644
--- a/drivers/iio/adc/mxs-lradc-adc.c
+++ b/drivers/iio/adc/mxs-lradc-adc.c
@@ -757,13 +757,13 @@ static int mxs_lradc_adc_probe(struct platform_device *pdev)
ret = mxs_lradc_adc_trigger_init(iio);
if (ret)
- goto err_trig;
+ return ret;
ret = iio_triggered_buffer_setup(iio, &iio_pollfunc_store_time,
&mxs_lradc_adc_trigger_handler,
&mxs_lradc_adc_buffer_ops);
if (ret)
- return ret;
+ goto err_trig;
adc->vref_mv = mxs_lradc_adc_vref_mv[lradc->soc];
@@ -801,9 +801,9 @@ static int mxs_lradc_adc_probe(struct platform_device *pdev)
err_dev:
mxs_lradc_adc_hw_stop(adc);
- mxs_lradc_adc_trigger_remove(iio);
-err_trig:
iio_triggered_buffer_cleanup(iio);
+err_trig:
+ mxs_lradc_adc_trigger_remove(iio);
return ret;
}
@@ -814,8 +814,8 @@ static int mxs_lradc_adc_remove(struct platform_device *pdev)
iio_device_unregister(iio);
mxs_lradc_adc_hw_stop(adc);
- mxs_lradc_adc_trigger_remove(iio);
iio_triggered_buffer_cleanup(iio);
+ mxs_lradc_adc_trigger_remove(iio);
return 0;
}
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: tmag5273: Fix runtime PM leak on measurement error
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 265c82ea8b172129cb6d4eff41af856c3aff6168 Mon Sep 17 00:00:00 2001
From: Lars-Peter Clausen <lars(a)metafoo.de>
Date: Thu, 13 Apr 2023 18:37:52 -0700
Subject: iio: tmag5273: Fix runtime PM leak on measurement error
The tmag5273 gets a runtime PM reference before reading a measurement and
releases it when done. But if the measurement fails the tmag5273_read_raw()
function exits before releasing the reference.
Make sure that this error path also releases the runtime PM reference.
Fixes: 866a1389174b ("iio: magnetometer: add ti tmag5273 driver")
Signed-off-by: Lars-Peter Clausen <lars(a)metafoo.de>
Acked-by: Gerald Loacker <gerald.loacker(a)wolfvision.net>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://lore.kernel.org/r/20230414013752.498767-1-lars@metafoo.de
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/magnetometer/tmag5273.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/magnetometer/tmag5273.c b/drivers/iio/magnetometer/tmag5273.c
index 28bb7efe8df8..e155a75b3cd2 100644
--- a/drivers/iio/magnetometer/tmag5273.c
+++ b/drivers/iio/magnetometer/tmag5273.c
@@ -296,12 +296,13 @@ static int tmag5273_read_raw(struct iio_dev *indio_dev,
return ret;
ret = tmag5273_get_measure(data, &t, &x, &y, &z, &angle, &magnitude);
- if (ret)
- return ret;
pm_runtime_mark_last_busy(data->dev);
pm_runtime_put_autosuspend(data->dev);
+ if (ret)
+ return ret;
+
switch (chan->address) {
case TEMPERATURE:
*val = t;
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: ad4130: Make sure clock provider gets removed
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 28f73ded19d403697f87473c9b85a27eb8ed9cf2 Mon Sep 17 00:00:00 2001
From: Lars-Peter Clausen <lars(a)metafoo.de>
Date: Fri, 14 Apr 2023 08:07:02 -0700
Subject: iio: ad4130: Make sure clock provider gets removed
The ad4130 driver registers a clock provider, but never removes it. This
leaves a stale clock provider behind that references freed clocks when the
device is unbound.
Register a managed action to remove the clock provider when the device is
removed.
Fixes: 62094060cf3a ("iio: adc: ad4130: add AD4130 driver")
Signed-off-by: Lars-Peter Clausen <lars(a)metafoo.de>
Link: https://lore.kernel.org/r/20230414150702.518441-1-lars@metafoo.de
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad4130.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ad4130.c b/drivers/iio/adc/ad4130.c
index 38394341fd6e..5a5dd5e87ffc 100644
--- a/drivers/iio/adc/ad4130.c
+++ b/drivers/iio/adc/ad4130.c
@@ -1817,6 +1817,11 @@ static const struct clk_ops ad4130_int_clk_ops = {
.unprepare = ad4130_int_clk_unprepare,
};
+static void ad4130_clk_del_provider(void *of_node)
+{
+ of_clk_del_provider(of_node);
+}
+
static int ad4130_setup_int_clk(struct ad4130_state *st)
{
struct device *dev = &st->spi->dev;
@@ -1824,6 +1829,7 @@ static int ad4130_setup_int_clk(struct ad4130_state *st)
struct clk_init_data init;
const char *clk_name;
struct clk *clk;
+ int ret;
if (st->int_pin_sel == AD4130_INT_PIN_CLK ||
st->mclk_sel != AD4130_MCLK_76_8KHZ)
@@ -1843,7 +1849,11 @@ static int ad4130_setup_int_clk(struct ad4130_state *st)
if (IS_ERR(clk))
return PTR_ERR(clk);
- return of_clk_add_provider(of_node, of_clk_src_simple_get, clk);
+ ret = of_clk_add_provider(of_node, of_clk_src_simple_get, clk);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(dev, ad4130_clk_del_provider, of_node);
}
static int ad4130_setup(struct iio_dev *indio_dev)
--
2.40.1
This is a note to let you know that I've just added the patch titled
iio: adc: mt6370: Fix ibus and ibat scaling value of some specific
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 00ffdd6fa90298522d45ca0c348b23485584dcdc Mon Sep 17 00:00:00 2001
From: ChiaEn Wu <chiaen_wu(a)richtek.com>
Date: Mon, 10 Apr 2023 18:34:22 +0800
Subject: iio: adc: mt6370: Fix ibus and ibat scaling value of some specific
vendor ID chips
The scale value of ibus and ibat on the datasheet is incorrect due to the
customer report after the experimentation with some specific vendor ID
chips.
Fixes: c1404d1b659f ("iio: adc: mt6370: Add MediaTek MT6370 support")
Signed-off-by: ChiaEn Wu <chiaen_wu(a)richtek.com>
Reviewed-by: Alexandre Mergnat <amergnat(a)baylibre.com>
Link: https://lore.kernel.org/r/1681122862-1994-1-git-send-email-chiaen_wu@richte…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/mt6370-adc.c | 53 ++++++++++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/mt6370-adc.c b/drivers/iio/adc/mt6370-adc.c
index bc62e5a9d50d..0bc112135bca 100644
--- a/drivers/iio/adc/mt6370-adc.c
+++ b/drivers/iio/adc/mt6370-adc.c
@@ -19,6 +19,7 @@
#include <dt-bindings/iio/adc/mediatek,mt6370_adc.h>
+#define MT6370_REG_DEV_INFO 0x100
#define MT6370_REG_CHG_CTRL3 0x113
#define MT6370_REG_CHG_CTRL7 0x117
#define MT6370_REG_CHG_ADC 0x121
@@ -27,6 +28,7 @@
#define MT6370_ADC_START_MASK BIT(0)
#define MT6370_ADC_IN_SEL_MASK GENMASK(7, 4)
#define MT6370_AICR_ICHG_MASK GENMASK(7, 2)
+#define MT6370_VENID_MASK GENMASK(7, 4)
#define MT6370_AICR_100_mA 0x0
#define MT6370_AICR_150_mA 0x1
@@ -47,6 +49,10 @@
#define ADC_CONV_TIME_MS 35
#define ADC_CONV_POLLING_TIME_US 1000
+#define MT6370_VID_RT5081 0x8
+#define MT6370_VID_RT5081A 0xA
+#define MT6370_VID_MT6370 0xE
+
struct mt6370_adc_data {
struct device *dev;
struct regmap *regmap;
@@ -55,6 +61,7 @@ struct mt6370_adc_data {
* from being read at the same time.
*/
struct mutex adc_lock;
+ unsigned int vid;
};
static int mt6370_adc_read_channel(struct mt6370_adc_data *priv, int chan,
@@ -98,6 +105,30 @@ static int mt6370_adc_read_channel(struct mt6370_adc_data *priv, int chan,
return ret;
}
+static int mt6370_adc_get_ibus_scale(struct mt6370_adc_data *priv)
+{
+ switch (priv->vid) {
+ case MT6370_VID_RT5081:
+ case MT6370_VID_RT5081A:
+ case MT6370_VID_MT6370:
+ return 3350;
+ default:
+ return 3875;
+ }
+}
+
+static int mt6370_adc_get_ibat_scale(struct mt6370_adc_data *priv)
+{
+ switch (priv->vid) {
+ case MT6370_VID_RT5081:
+ case MT6370_VID_RT5081A:
+ case MT6370_VID_MT6370:
+ return 2680;
+ default:
+ return 3870;
+ }
+}
+
static int mt6370_adc_read_scale(struct mt6370_adc_data *priv,
int chan, int *val1, int *val2)
{
@@ -123,7 +154,7 @@ static int mt6370_adc_read_scale(struct mt6370_adc_data *priv,
case MT6370_AICR_250_mA:
case MT6370_AICR_300_mA:
case MT6370_AICR_350_mA:
- *val1 = 3350;
+ *val1 = mt6370_adc_get_ibus_scale(priv);
break;
default:
*val1 = 5000;
@@ -150,7 +181,7 @@ static int mt6370_adc_read_scale(struct mt6370_adc_data *priv,
case MT6370_ICHG_600_mA:
case MT6370_ICHG_700_mA:
case MT6370_ICHG_800_mA:
- *val1 = 2680;
+ *val1 = mt6370_adc_get_ibat_scale(priv);
break;
default:
*val1 = 5000;
@@ -251,6 +282,20 @@ static const struct iio_chan_spec mt6370_adc_channels[] = {
MT6370_ADC_CHAN(TEMP_JC, IIO_TEMP, 12, BIT(IIO_CHAN_INFO_OFFSET)),
};
+static int mt6370_get_vendor_info(struct mt6370_adc_data *priv)
+{
+ unsigned int dev_info;
+ int ret;
+
+ ret = regmap_read(priv->regmap, MT6370_REG_DEV_INFO, &dev_info);
+ if (ret)
+ return ret;
+
+ priv->vid = FIELD_GET(MT6370_VENID_MASK, dev_info);
+
+ return 0;
+}
+
static int mt6370_adc_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -272,6 +317,10 @@ static int mt6370_adc_probe(struct platform_device *pdev)
priv->regmap = regmap;
mutex_init(&priv->adc_lock);
+ ret = mt6370_get_vendor_info(priv);
+ if (ret)
+ return dev_err_probe(dev, ret, "Failed to get vid\n");
+
ret = regmap_write(priv->regmap, MT6370_REG_CHG_ADC, 0);
if (ret)
return dev_err_probe(dev, ret, "Failed to reset ADC\n");
--
2.40.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x afbed3f74830163f9559579dee382cac3cff82da
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052832-bunkbed-probable-b8e8@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
afbed3f74830 ("net/mlx5e: do as little as possible in napi poll when budget is 0")
214baf22870c ("net/mlx5e: Support HTB offload")
1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
145e5637d941 ("net/mlx5e: Add TX PTP port object support")
1a7f51240dfb ("net/mlx5e: Split SW group counters update function")
0b676aaecc25 ("net/mlx5e: Change skb fifo push/pop API to be used without SQ")
579524c6eace ("net/mlx5e: Validate stop_room size upon user input")
3180472f582b ("net/mlx5: Add functions to set/query MFRL register")
573a8095f68c ("Merge tag 'mlx5-updates-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From afbed3f74830163f9559579dee382cac3cff82da Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba(a)kernel.org>
Date: Tue, 16 May 2023 18:59:35 -0700
Subject: [PATCH] net/mlx5e: do as little as possible in napi poll when budget
is 0
NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.
Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.
In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.
The immediate stack trace I was seeing is:
__do_softirq+0xd1/0x2c0
__local_bh_enable_ip+0xc7/0x120
</IRQ>
<TASK>
page_pool_put_defragged_page+0x267/0x320
mlx5e_free_xdpsq_desc+0x99/0xd0
mlx5e_poll_xdpsq_cq+0x138/0x3b0
mlx5e_napi_poll+0xc3/0x8b0
netpoll_poll_dev+0xce/0x150
AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.
Reviewed-by: Tariq Toukan <tariqt(a)nvidia.com>
Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <simon.horman(a)corigine.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index a50bfda18e96..fbb2d963fb7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -161,20 +161,22 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
}
}
+ /* budget=0 means we may be in IRQ context, do as little as possible */
+ if (unlikely(!budget))
+ goto out;
+
busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
if (c->xdp)
busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
- if (likely(budget)) { /* budget=0 means: don't poll rx rings */
- if (xsk_open)
- work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
+ if (xsk_open)
+ work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
- if (likely(budget - work_done))
- work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
+ if (likely(budget - work_done))
+ work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
- busy |= work_done == budget;
- }
+ busy |= work_done == budget;
mlx5e_poll_ico_cq(&c->icosq.cq);
if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 71460c9ec5c743e9ffffca3c874d66267c36345e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052856-propeller-drainpipe-465e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
71460c9ec5c7 ("net: phy: mscc: enable VSC8501/2 RGMII RX clock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71460c9ec5c743e9ffffca3c874d66267c36345e Mon Sep 17 00:00:00 2001
From: David Epping <david.epping(a)missinglinkelectronics.com>
Date: Tue, 23 May 2023 17:31:08 +0200
Subject: [PATCH] net: phy: mscc: enable VSC8501/2 RGMII RX clock
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.
For other PHYs supported by this driver the clock output is enabled
by default.
Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping(a)missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h
index 79cbb2418664..defe5cc6d4fc 100644
--- a/drivers/net/phy/mscc/mscc.h
+++ b/drivers/net/phy/mscc/mscc.h
@@ -179,6 +179,7 @@ enum rgmii_clock_delay {
#define VSC8502_RGMII_CNTL 20
#define VSC8502_RGMII_RX_DELAY_MASK 0x0070
#define VSC8502_RGMII_TX_DELAY_MASK 0x0007
+#define VSC8502_RGMII_RX_CLK_DISABLE 0x0800
#define MSCC_PHY_WOL_LOWER_MAC_ADDR 21
#define MSCC_PHY_WOL_MID_MAC_ADDR 22
diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 0c39b3ecb1f2..28df8a2e4230 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -519,14 +519,27 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
* * 2.0 ns (which causes the data to be sampled at exactly half way between
* clock transitions at 1000 Mbps) if delays should be enabled
*/
-static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
- u16 rgmii_rx_delay_mask,
- u16 rgmii_tx_delay_mask)
+static int vsc85xx_update_rgmii_cntl(struct phy_device *phydev, u32 rgmii_cntl,
+ u16 rgmii_rx_delay_mask,
+ u16 rgmii_tx_delay_mask)
{
u16 rgmii_rx_delay_pos = ffs(rgmii_rx_delay_mask) - 1;
u16 rgmii_tx_delay_pos = ffs(rgmii_tx_delay_mask) - 1;
u16 reg_val = 0;
- int rc;
+ u16 mask = 0;
+ int rc = 0;
+
+ /* For traffic to pass, the VSC8502 family needs the RX_CLK disable bit
+ * to be unset for all PHY modes, so do that as part of the paged
+ * register modification.
+ * For some family members (like VSC8530/31/40/41) this bit is reserved
+ * and read-only, and the RX clock is enabled by default.
+ */
+ if (rgmii_cntl == VSC8502_RGMII_CNTL)
+ mask |= VSC8502_RGMII_RX_CLK_DISABLE;
+
+ if (phy_interface_is_rgmii(phydev))
+ mask |= rgmii_rx_delay_mask | rgmii_tx_delay_mask;
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
@@ -535,29 +548,20 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
reg_val |= RGMII_CLK_DELAY_2_0_NS << rgmii_tx_delay_pos;
- rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
- rgmii_cntl,
- rgmii_rx_delay_mask | rgmii_tx_delay_mask,
- reg_val);
+ if (mask)
+ rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
+ rgmii_cntl, mask, reg_val);
return rc;
}
static int vsc85xx_default_config(struct phy_device *phydev)
{
- int rc;
-
phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
- if (phy_interface_mode_is_rgmii(phydev->interface)) {
- rc = vsc85xx_rgmii_set_skews(phydev, VSC8502_RGMII_CNTL,
- VSC8502_RGMII_RX_DELAY_MASK,
- VSC8502_RGMII_TX_DELAY_MASK);
- if (rc)
- return rc;
- }
-
- return 0;
+ return vsc85xx_update_rgmii_cntl(phydev, VSC8502_RGMII_CNTL,
+ VSC8502_RGMII_RX_DELAY_MASK,
+ VSC8502_RGMII_TX_DELAY_MASK);
}
static int vsc85xx_get_tunable(struct phy_device *phydev,
@@ -1754,13 +1758,11 @@ static int vsc8584_config_init(struct phy_device *phydev)
if (ret)
return ret;
- if (phy_interface_is_rgmii(phydev)) {
- ret = vsc85xx_rgmii_set_skews(phydev, VSC8572_RGMII_CNTL,
- VSC8572_RGMII_RX_DELAY_MASK,
- VSC8572_RGMII_TX_DELAY_MASK);
- if (ret)
- return ret;
- }
+ ret = vsc85xx_update_rgmii_cntl(phydev, VSC8572_RGMII_CNTL,
+ VSC8572_RGMII_RX_DELAY_MASK,
+ VSC8572_RGMII_TX_DELAY_MASK);
+ if (ret)
+ return ret;
ret = genphy_soft_reset(phydev);
if (ret)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 71460c9ec5c743e9ffffca3c874d66267c36345e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052856-economy-footwork-7159@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
71460c9ec5c7 ("net: phy: mscc: enable VSC8501/2 RGMII RX clock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71460c9ec5c743e9ffffca3c874d66267c36345e Mon Sep 17 00:00:00 2001
From: David Epping <david.epping(a)missinglinkelectronics.com>
Date: Tue, 23 May 2023 17:31:08 +0200
Subject: [PATCH] net: phy: mscc: enable VSC8501/2 RGMII RX clock
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.
For other PHYs supported by this driver the clock output is enabled
by default.
Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping(a)missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h
index 79cbb2418664..defe5cc6d4fc 100644
--- a/drivers/net/phy/mscc/mscc.h
+++ b/drivers/net/phy/mscc/mscc.h
@@ -179,6 +179,7 @@ enum rgmii_clock_delay {
#define VSC8502_RGMII_CNTL 20
#define VSC8502_RGMII_RX_DELAY_MASK 0x0070
#define VSC8502_RGMII_TX_DELAY_MASK 0x0007
+#define VSC8502_RGMII_RX_CLK_DISABLE 0x0800
#define MSCC_PHY_WOL_LOWER_MAC_ADDR 21
#define MSCC_PHY_WOL_MID_MAC_ADDR 22
diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 0c39b3ecb1f2..28df8a2e4230 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -519,14 +519,27 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
* * 2.0 ns (which causes the data to be sampled at exactly half way between
* clock transitions at 1000 Mbps) if delays should be enabled
*/
-static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
- u16 rgmii_rx_delay_mask,
- u16 rgmii_tx_delay_mask)
+static int vsc85xx_update_rgmii_cntl(struct phy_device *phydev, u32 rgmii_cntl,
+ u16 rgmii_rx_delay_mask,
+ u16 rgmii_tx_delay_mask)
{
u16 rgmii_rx_delay_pos = ffs(rgmii_rx_delay_mask) - 1;
u16 rgmii_tx_delay_pos = ffs(rgmii_tx_delay_mask) - 1;
u16 reg_val = 0;
- int rc;
+ u16 mask = 0;
+ int rc = 0;
+
+ /* For traffic to pass, the VSC8502 family needs the RX_CLK disable bit
+ * to be unset for all PHY modes, so do that as part of the paged
+ * register modification.
+ * For some family members (like VSC8530/31/40/41) this bit is reserved
+ * and read-only, and the RX clock is enabled by default.
+ */
+ if (rgmii_cntl == VSC8502_RGMII_CNTL)
+ mask |= VSC8502_RGMII_RX_CLK_DISABLE;
+
+ if (phy_interface_is_rgmii(phydev))
+ mask |= rgmii_rx_delay_mask | rgmii_tx_delay_mask;
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
@@ -535,29 +548,20 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
reg_val |= RGMII_CLK_DELAY_2_0_NS << rgmii_tx_delay_pos;
- rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
- rgmii_cntl,
- rgmii_rx_delay_mask | rgmii_tx_delay_mask,
- reg_val);
+ if (mask)
+ rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
+ rgmii_cntl, mask, reg_val);
return rc;
}
static int vsc85xx_default_config(struct phy_device *phydev)
{
- int rc;
-
phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
- if (phy_interface_mode_is_rgmii(phydev->interface)) {
- rc = vsc85xx_rgmii_set_skews(phydev, VSC8502_RGMII_CNTL,
- VSC8502_RGMII_RX_DELAY_MASK,
- VSC8502_RGMII_TX_DELAY_MASK);
- if (rc)
- return rc;
- }
-
- return 0;
+ return vsc85xx_update_rgmii_cntl(phydev, VSC8502_RGMII_CNTL,
+ VSC8502_RGMII_RX_DELAY_MASK,
+ VSC8502_RGMII_TX_DELAY_MASK);
}
static int vsc85xx_get_tunable(struct phy_device *phydev,
@@ -1754,13 +1758,11 @@ static int vsc8584_config_init(struct phy_device *phydev)
if (ret)
return ret;
- if (phy_interface_is_rgmii(phydev)) {
- ret = vsc85xx_rgmii_set_skews(phydev, VSC8572_RGMII_CNTL,
- VSC8572_RGMII_RX_DELAY_MASK,
- VSC8572_RGMII_TX_DELAY_MASK);
- if (ret)
- return ret;
- }
+ ret = vsc85xx_update_rgmii_cntl(phydev, VSC8572_RGMII_CNTL,
+ VSC8572_RGMII_RX_DELAY_MASK,
+ VSC8572_RGMII_TX_DELAY_MASK);
+ if (ret)
+ return ret;
ret = genphy_soft_reset(phydev);
if (ret)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 71460c9ec5c743e9ffffca3c874d66267c36345e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052855-kindling-naturist-d30b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
71460c9ec5c7 ("net: phy: mscc: enable VSC8501/2 RGMII RX clock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71460c9ec5c743e9ffffca3c874d66267c36345e Mon Sep 17 00:00:00 2001
From: David Epping <david.epping(a)missinglinkelectronics.com>
Date: Tue, 23 May 2023 17:31:08 +0200
Subject: [PATCH] net: phy: mscc: enable VSC8501/2 RGMII RX clock
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.
For other PHYs supported by this driver the clock output is enabled
by default.
Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping(a)missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h
index 79cbb2418664..defe5cc6d4fc 100644
--- a/drivers/net/phy/mscc/mscc.h
+++ b/drivers/net/phy/mscc/mscc.h
@@ -179,6 +179,7 @@ enum rgmii_clock_delay {
#define VSC8502_RGMII_CNTL 20
#define VSC8502_RGMII_RX_DELAY_MASK 0x0070
#define VSC8502_RGMII_TX_DELAY_MASK 0x0007
+#define VSC8502_RGMII_RX_CLK_DISABLE 0x0800
#define MSCC_PHY_WOL_LOWER_MAC_ADDR 21
#define MSCC_PHY_WOL_MID_MAC_ADDR 22
diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 0c39b3ecb1f2..28df8a2e4230 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -519,14 +519,27 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
* * 2.0 ns (which causes the data to be sampled at exactly half way between
* clock transitions at 1000 Mbps) if delays should be enabled
*/
-static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
- u16 rgmii_rx_delay_mask,
- u16 rgmii_tx_delay_mask)
+static int vsc85xx_update_rgmii_cntl(struct phy_device *phydev, u32 rgmii_cntl,
+ u16 rgmii_rx_delay_mask,
+ u16 rgmii_tx_delay_mask)
{
u16 rgmii_rx_delay_pos = ffs(rgmii_rx_delay_mask) - 1;
u16 rgmii_tx_delay_pos = ffs(rgmii_tx_delay_mask) - 1;
u16 reg_val = 0;
- int rc;
+ u16 mask = 0;
+ int rc = 0;
+
+ /* For traffic to pass, the VSC8502 family needs the RX_CLK disable bit
+ * to be unset for all PHY modes, so do that as part of the paged
+ * register modification.
+ * For some family members (like VSC8530/31/40/41) this bit is reserved
+ * and read-only, and the RX clock is enabled by default.
+ */
+ if (rgmii_cntl == VSC8502_RGMII_CNTL)
+ mask |= VSC8502_RGMII_RX_CLK_DISABLE;
+
+ if (phy_interface_is_rgmii(phydev))
+ mask |= rgmii_rx_delay_mask | rgmii_tx_delay_mask;
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
@@ -535,29 +548,20 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
reg_val |= RGMII_CLK_DELAY_2_0_NS << rgmii_tx_delay_pos;
- rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
- rgmii_cntl,
- rgmii_rx_delay_mask | rgmii_tx_delay_mask,
- reg_val);
+ if (mask)
+ rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
+ rgmii_cntl, mask, reg_val);
return rc;
}
static int vsc85xx_default_config(struct phy_device *phydev)
{
- int rc;
-
phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
- if (phy_interface_mode_is_rgmii(phydev->interface)) {
- rc = vsc85xx_rgmii_set_skews(phydev, VSC8502_RGMII_CNTL,
- VSC8502_RGMII_RX_DELAY_MASK,
- VSC8502_RGMII_TX_DELAY_MASK);
- if (rc)
- return rc;
- }
-
- return 0;
+ return vsc85xx_update_rgmii_cntl(phydev, VSC8502_RGMII_CNTL,
+ VSC8502_RGMII_RX_DELAY_MASK,
+ VSC8502_RGMII_TX_DELAY_MASK);
}
static int vsc85xx_get_tunable(struct phy_device *phydev,
@@ -1754,13 +1758,11 @@ static int vsc8584_config_init(struct phy_device *phydev)
if (ret)
return ret;
- if (phy_interface_is_rgmii(phydev)) {
- ret = vsc85xx_rgmii_set_skews(phydev, VSC8572_RGMII_CNTL,
- VSC8572_RGMII_RX_DELAY_MASK,
- VSC8572_RGMII_TX_DELAY_MASK);
- if (ret)
- return ret;
- }
+ ret = vsc85xx_update_rgmii_cntl(phydev, VSC8572_RGMII_CNTL,
+ VSC8572_RGMII_RX_DELAY_MASK,
+ VSC8572_RGMII_TX_DELAY_MASK);
+ if (ret)
+ return ret;
ret = genphy_soft_reset(phydev);
if (ret)
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x 71460c9ec5c743e9ffffca3c874d66267c36345e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052855-unsold-dimple-67d9@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
71460c9ec5c7 ("net: phy: mscc: enable VSC8501/2 RGMII RX clock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71460c9ec5c743e9ffffca3c874d66267c36345e Mon Sep 17 00:00:00 2001
From: David Epping <david.epping(a)missinglinkelectronics.com>
Date: Tue, 23 May 2023 17:31:08 +0200
Subject: [PATCH] net: phy: mscc: enable VSC8501/2 RGMII RX clock
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.
For other PHYs supported by this driver the clock output is enabled
by default.
Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping(a)missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h
index 79cbb2418664..defe5cc6d4fc 100644
--- a/drivers/net/phy/mscc/mscc.h
+++ b/drivers/net/phy/mscc/mscc.h
@@ -179,6 +179,7 @@ enum rgmii_clock_delay {
#define VSC8502_RGMII_CNTL 20
#define VSC8502_RGMII_RX_DELAY_MASK 0x0070
#define VSC8502_RGMII_TX_DELAY_MASK 0x0007
+#define VSC8502_RGMII_RX_CLK_DISABLE 0x0800
#define MSCC_PHY_WOL_LOWER_MAC_ADDR 21
#define MSCC_PHY_WOL_MID_MAC_ADDR 22
diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 0c39b3ecb1f2..28df8a2e4230 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -519,14 +519,27 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
* * 2.0 ns (which causes the data to be sampled at exactly half way between
* clock transitions at 1000 Mbps) if delays should be enabled
*/
-static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
- u16 rgmii_rx_delay_mask,
- u16 rgmii_tx_delay_mask)
+static int vsc85xx_update_rgmii_cntl(struct phy_device *phydev, u32 rgmii_cntl,
+ u16 rgmii_rx_delay_mask,
+ u16 rgmii_tx_delay_mask)
{
u16 rgmii_rx_delay_pos = ffs(rgmii_rx_delay_mask) - 1;
u16 rgmii_tx_delay_pos = ffs(rgmii_tx_delay_mask) - 1;
u16 reg_val = 0;
- int rc;
+ u16 mask = 0;
+ int rc = 0;
+
+ /* For traffic to pass, the VSC8502 family needs the RX_CLK disable bit
+ * to be unset for all PHY modes, so do that as part of the paged
+ * register modification.
+ * For some family members (like VSC8530/31/40/41) this bit is reserved
+ * and read-only, and the RX clock is enabled by default.
+ */
+ if (rgmii_cntl == VSC8502_RGMII_CNTL)
+ mask |= VSC8502_RGMII_RX_CLK_DISABLE;
+
+ if (phy_interface_is_rgmii(phydev))
+ mask |= rgmii_rx_delay_mask | rgmii_tx_delay_mask;
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
@@ -535,29 +548,20 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
reg_val |= RGMII_CLK_DELAY_2_0_NS << rgmii_tx_delay_pos;
- rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
- rgmii_cntl,
- rgmii_rx_delay_mask | rgmii_tx_delay_mask,
- reg_val);
+ if (mask)
+ rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
+ rgmii_cntl, mask, reg_val);
return rc;
}
static int vsc85xx_default_config(struct phy_device *phydev)
{
- int rc;
-
phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
- if (phy_interface_mode_is_rgmii(phydev->interface)) {
- rc = vsc85xx_rgmii_set_skews(phydev, VSC8502_RGMII_CNTL,
- VSC8502_RGMII_RX_DELAY_MASK,
- VSC8502_RGMII_TX_DELAY_MASK);
- if (rc)
- return rc;
- }
-
- return 0;
+ return vsc85xx_update_rgmii_cntl(phydev, VSC8502_RGMII_CNTL,
+ VSC8502_RGMII_RX_DELAY_MASK,
+ VSC8502_RGMII_TX_DELAY_MASK);
}
static int vsc85xx_get_tunable(struct phy_device *phydev,
@@ -1754,13 +1758,11 @@ static int vsc8584_config_init(struct phy_device *phydev)
if (ret)
return ret;
- if (phy_interface_is_rgmii(phydev)) {
- ret = vsc85xx_rgmii_set_skews(phydev, VSC8572_RGMII_CNTL,
- VSC8572_RGMII_RX_DELAY_MASK,
- VSC8572_RGMII_TX_DELAY_MASK);
- if (ret)
- return ret;
- }
+ ret = vsc85xx_update_rgmii_cntl(phydev, VSC8572_RGMII_CNTL,
+ VSC8572_RGMII_RX_DELAY_MASK,
+ VSC8572_RGMII_TX_DELAY_MASK);
+ if (ret)
+ return ret;
ret = genphy_soft_reset(phydev);
if (ret)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 368d3cb406cdd074d1df2ad9ec06d1bfcb664882
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052820-treachery-paper-0d3a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
368d3cb406cd ("page_pool: fix inconsistency for page_pool_ring_[un]lock()")
542bcea4be86 ("net: page_pool: use in_softirq() instead")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 368d3cb406cdd074d1df2ad9ec06d1bfcb664882 Mon Sep 17 00:00:00 2001
From: Yunsheng Lin <linyunsheng(a)huawei.com>
Date: Mon, 22 May 2023 11:17:14 +0800
Subject: [PATCH] page_pool: fix inconsistency for page_pool_ring_[un]lock()
page_pool_ring_[un]lock() use in_softirq() to decide which
spin lock variant to use, and when they are called in the
context with in_softirq() being false, spin_lock_bh() is
called in page_pool_ring_lock() while spin_unlock() is
called in page_pool_ring_unlock(), because spin_lock_bh()
has disabled the softirq in page_pool_ring_lock(), which
causes inconsistency for spin lock pair calling.
This patch fixes it by returning in_softirq state from
page_pool_producer_lock(), and use it to decide which
spin lock variant to use in page_pool_producer_unlock().
As pool->ring has both producer and consumer lock, so
rename it to page_pool_producer_[un]lock() to reflect
the actual usage. Also move them to page_pool.c as they
are only used there, and remove the 'inline' as the
compiler may have better idea to do inlining or not.
Fixes: 7886244736a4 ("net: page_pool: Add bulk support for ptr_ring")
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer(a)redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c8ec2f34722b..126f9e294389 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -399,22 +399,4 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
page_pool_update_nid(pool, new_nid);
}
-static inline void page_pool_ring_lock(struct page_pool *pool)
- __acquires(&pool->ring.producer_lock)
-{
- if (in_softirq())
- spin_lock(&pool->ring.producer_lock);
- else
- spin_lock_bh(&pool->ring.producer_lock);
-}
-
-static inline void page_pool_ring_unlock(struct page_pool *pool)
- __releases(&pool->ring.producer_lock)
-{
- if (in_softirq())
- spin_unlock(&pool->ring.producer_lock);
- else
- spin_unlock_bh(&pool->ring.producer_lock);
-}
-
#endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e212e9d7edcb..a3e12a61d456 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -134,6 +134,29 @@ EXPORT_SYMBOL(page_pool_ethtool_stats_get);
#define recycle_stat_add(pool, __stat, val)
#endif
+static bool page_pool_producer_lock(struct page_pool *pool)
+ __acquires(&pool->ring.producer_lock)
+{
+ bool in_softirq = in_softirq();
+
+ if (in_softirq)
+ spin_lock(&pool->ring.producer_lock);
+ else
+ spin_lock_bh(&pool->ring.producer_lock);
+
+ return in_softirq;
+}
+
+static void page_pool_producer_unlock(struct page_pool *pool,
+ bool in_softirq)
+ __releases(&pool->ring.producer_lock)
+{
+ if (in_softirq)
+ spin_unlock(&pool->ring.producer_lock);
+ else
+ spin_unlock_bh(&pool->ring.producer_lock);
+}
+
static int page_pool_init(struct page_pool *pool,
const struct page_pool_params *params)
{
@@ -617,6 +640,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
int count)
{
int i, bulk_len = 0;
+ bool in_softirq;
for (i = 0; i < count; i++) {
struct page *page = virt_to_head_page(data[i]);
@@ -635,7 +659,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
return;
/* Bulk producer into ptr_ring page_pool cache */
- page_pool_ring_lock(pool);
+ in_softirq = page_pool_producer_lock(pool);
for (i = 0; i < bulk_len; i++) {
if (__ptr_ring_produce(&pool->ring, data[i])) {
/* ring full */
@@ -644,7 +668,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
}
}
recycle_stat_add(pool, ring, i);
- page_pool_ring_unlock(pool);
+ page_pool_producer_unlock(pool, in_softirq);
/* Hopefully all pages was return into ptr_ring */
if (likely(i == bulk_len))
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2be5bd42a5bba1a05daedc86cf0e248210009669
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052830-jeep-cod-05b2@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2be5bd42a5bb ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
c9668f0b1d28 ("net/mlx5e: Fix cleanup null-ptr deref on encap lock")
d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter")
0e414518d6d8 ("net/mlx5e: Add hairpin debugfs files")
1a8034720f38 ("net/mlx5e: Add hairpin params structure")
8facc02f22f1 ("net/mlx5e: TC, reuse flow attribute post parser processing")
f2bb566f5c97 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2be5bd42a5bba1a05daedc86cf0e248210009669 Mon Sep 17 00:00:00 2001
From: Shay Drory <shayd(a)nvidia.com>
Date: Mon, 20 Mar 2023 13:07:53 +0200
Subject: [PATCH] net/mlx5: Handle pairing of E-switch via uplink un/load APIs
In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.
The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.
[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.
[ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[ 157.964662 ] #PF: supervisor read access in kernel mode
[ 157.965123 ] #PF: error_code(0x0000) - not-present page
[ 157.965582 ] PGD 0 P4D 0
[ 157.965866 ] Oops: 0000 [#1] SMP
[ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[ 157.976164 ] Call Trace:
[ 157.976437 ] <TASK>
[ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core]
[ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core]
[ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core]
[ 157.988549 ] pci_device_remove+0x31/0xa0
[ 157.988933 ] device_release_driver_internal+0x18f/0x1f0
[ 157.989402 ] driver_detach+0x3f/0x80
[ 157.989754 ] bus_remove_driver+0x70/0xf0
[ 157.990129 ] pci_unregister_driver+0x34/0x90
[ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core]
[ 157.990972 ] __x64_sys_delete_module+0x15a/0x250
[ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110
[ 157.991840 ] do_syscall_64+0x3d/0x90
[ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd(a)nvidia.com>
Reviewed-by: Roi Dayan <roid(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 728b82ce4031..65fe40f55d84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5301,6 +5301,8 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
goto err_action_counter;
}
+ mlx5_esw_offloads_devcom_init(esw);
+
return 0;
err_action_counter:
@@ -5329,7 +5331,7 @@ void mlx5e_tc_esw_cleanup(struct mlx5_rep_uplink_priv *uplink_priv)
priv = netdev_priv(rpriv->netdev);
esw = priv->mdev->priv.eswitch;
- mlx5e_tc_clean_fdb_peer_flows(esw);
+ mlx5_esw_offloads_devcom_cleanup(esw);
mlx5e_tc_tun_cleanup(uplink_priv->encap);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1a042c981713..9f007c5438ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -369,6 +369,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs);
void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf);
void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw);
void mlx5_eswitch_disable(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw);
int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
u16 vport, const u8 *mac);
int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw,
@@ -767,6 +769,8 @@ static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {}
static inline int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) { return 0; }
static inline void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) {}
static inline void mlx5_eswitch_disable(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw) {}
static inline bool mlx5_eswitch_is_funcs_handler(struct mlx5_core_dev *dev) { return false; }
static inline
int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw, u16 vport, int link_state) { return 0; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 69215ffb9999..7c34c7cf506f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2779,7 +2779,7 @@ static int mlx5_esw_offloads_devcom_event(int event,
return err;
}
-static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw)
{
struct mlx5_devcom *devcom = esw->dev->priv.devcom;
@@ -2802,7 +2802,7 @@ static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
ESW_OFFLOADS_DEVCOM_PAIR, esw);
}
-static void esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
{
struct mlx5_devcom *devcom = esw->dev->priv.devcom;
@@ -3250,8 +3250,6 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
if (err)
goto err_vports;
- esw_offloads_devcom_init(esw);
-
return 0;
err_vports:
@@ -3292,7 +3290,6 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
void esw_offloads_disable(struct mlx5_eswitch *esw)
{
- esw_offloads_devcom_cleanup(esw);
mlx5_eswitch_disable_pf_vf_vports(esw);
esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK);
esw_set_passing_vport_metadata(esw, false);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2be5bd42a5bba1a05daedc86cf0e248210009669
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052828-hatchet-drier-8e0e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2be5bd42a5bb ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
c9668f0b1d28 ("net/mlx5e: Fix cleanup null-ptr deref on encap lock")
d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter")
0e414518d6d8 ("net/mlx5e: Add hairpin debugfs files")
1a8034720f38 ("net/mlx5e: Add hairpin params structure")
8facc02f22f1 ("net/mlx5e: TC, reuse flow attribute post parser processing")
f2bb566f5c97 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2be5bd42a5bba1a05daedc86cf0e248210009669 Mon Sep 17 00:00:00 2001
From: Shay Drory <shayd(a)nvidia.com>
Date: Mon, 20 Mar 2023 13:07:53 +0200
Subject: [PATCH] net/mlx5: Handle pairing of E-switch via uplink un/load APIs
In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.
The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.
[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.
[ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[ 157.964662 ] #PF: supervisor read access in kernel mode
[ 157.965123 ] #PF: error_code(0x0000) - not-present page
[ 157.965582 ] PGD 0 P4D 0
[ 157.965866 ] Oops: 0000 [#1] SMP
[ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[ 157.976164 ] Call Trace:
[ 157.976437 ] <TASK>
[ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core]
[ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core]
[ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core]
[ 157.988549 ] pci_device_remove+0x31/0xa0
[ 157.988933 ] device_release_driver_internal+0x18f/0x1f0
[ 157.989402 ] driver_detach+0x3f/0x80
[ 157.989754 ] bus_remove_driver+0x70/0xf0
[ 157.990129 ] pci_unregister_driver+0x34/0x90
[ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core]
[ 157.990972 ] __x64_sys_delete_module+0x15a/0x250
[ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110
[ 157.991840 ] do_syscall_64+0x3d/0x90
[ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd(a)nvidia.com>
Reviewed-by: Roi Dayan <roid(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 728b82ce4031..65fe40f55d84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5301,6 +5301,8 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
goto err_action_counter;
}
+ mlx5_esw_offloads_devcom_init(esw);
+
return 0;
err_action_counter:
@@ -5329,7 +5331,7 @@ void mlx5e_tc_esw_cleanup(struct mlx5_rep_uplink_priv *uplink_priv)
priv = netdev_priv(rpriv->netdev);
esw = priv->mdev->priv.eswitch;
- mlx5e_tc_clean_fdb_peer_flows(esw);
+ mlx5_esw_offloads_devcom_cleanup(esw);
mlx5e_tc_tun_cleanup(uplink_priv->encap);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1a042c981713..9f007c5438ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -369,6 +369,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs);
void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf);
void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw);
void mlx5_eswitch_disable(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw);
int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
u16 vport, const u8 *mac);
int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw,
@@ -767,6 +769,8 @@ static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {}
static inline int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) { return 0; }
static inline void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) {}
static inline void mlx5_eswitch_disable(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw) {}
static inline bool mlx5_eswitch_is_funcs_handler(struct mlx5_core_dev *dev) { return false; }
static inline
int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw, u16 vport, int link_state) { return 0; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 69215ffb9999..7c34c7cf506f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2779,7 +2779,7 @@ static int mlx5_esw_offloads_devcom_event(int event,
return err;
}
-static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw)
{
struct mlx5_devcom *devcom = esw->dev->priv.devcom;
@@ -2802,7 +2802,7 @@ static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
ESW_OFFLOADS_DEVCOM_PAIR, esw);
}
-static void esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
{
struct mlx5_devcom *devcom = esw->dev->priv.devcom;
@@ -3250,8 +3250,6 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
if (err)
goto err_vports;
- esw_offloads_devcom_init(esw);
-
return 0;
err_vports:
@@ -3292,7 +3290,6 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
void esw_offloads_disable(struct mlx5_eswitch *esw)
{
- esw_offloads_devcom_cleanup(esw);
mlx5_eswitch_disable_pf_vf_vports(esw);
esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK);
esw_set_passing_vport_metadata(esw, false);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2be5bd42a5bba1a05daedc86cf0e248210009669
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052826-curve-data-272d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2be5bd42a5bb ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
c9668f0b1d28 ("net/mlx5e: Fix cleanup null-ptr deref on encap lock")
d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter")
0e414518d6d8 ("net/mlx5e: Add hairpin debugfs files")
1a8034720f38 ("net/mlx5e: Add hairpin params structure")
8facc02f22f1 ("net/mlx5e: TC, reuse flow attribute post parser processing")
f2bb566f5c97 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2be5bd42a5bba1a05daedc86cf0e248210009669 Mon Sep 17 00:00:00 2001
From: Shay Drory <shayd(a)nvidia.com>
Date: Mon, 20 Mar 2023 13:07:53 +0200
Subject: [PATCH] net/mlx5: Handle pairing of E-switch via uplink un/load APIs
In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.
The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.
[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.
[ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[ 157.964662 ] #PF: supervisor read access in kernel mode
[ 157.965123 ] #PF: error_code(0x0000) - not-present page
[ 157.965582 ] PGD 0 P4D 0
[ 157.965866 ] Oops: 0000 [#1] SMP
[ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[ 157.976164 ] Call Trace:
[ 157.976437 ] <TASK>
[ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core]
[ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core]
[ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core]
[ 157.988549 ] pci_device_remove+0x31/0xa0
[ 157.988933 ] device_release_driver_internal+0x18f/0x1f0
[ 157.989402 ] driver_detach+0x3f/0x80
[ 157.989754 ] bus_remove_driver+0x70/0xf0
[ 157.990129 ] pci_unregister_driver+0x34/0x90
[ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core]
[ 157.990972 ] __x64_sys_delete_module+0x15a/0x250
[ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110
[ 157.991840 ] do_syscall_64+0x3d/0x90
[ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd(a)nvidia.com>
Reviewed-by: Roi Dayan <roid(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 728b82ce4031..65fe40f55d84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5301,6 +5301,8 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
goto err_action_counter;
}
+ mlx5_esw_offloads_devcom_init(esw);
+
return 0;
err_action_counter:
@@ -5329,7 +5331,7 @@ void mlx5e_tc_esw_cleanup(struct mlx5_rep_uplink_priv *uplink_priv)
priv = netdev_priv(rpriv->netdev);
esw = priv->mdev->priv.eswitch;
- mlx5e_tc_clean_fdb_peer_flows(esw);
+ mlx5_esw_offloads_devcom_cleanup(esw);
mlx5e_tc_tun_cleanup(uplink_priv->encap);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1a042c981713..9f007c5438ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -369,6 +369,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs);
void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf);
void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw);
void mlx5_eswitch_disable(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw);
int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
u16 vport, const u8 *mac);
int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw,
@@ -767,6 +769,8 @@ static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {}
static inline int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) { return 0; }
static inline void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) {}
static inline void mlx5_eswitch_disable(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw) {}
static inline bool mlx5_eswitch_is_funcs_handler(struct mlx5_core_dev *dev) { return false; }
static inline
int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw, u16 vport, int link_state) { return 0; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 69215ffb9999..7c34c7cf506f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2779,7 +2779,7 @@ static int mlx5_esw_offloads_devcom_event(int event,
return err;
}
-static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw)
{
struct mlx5_devcom *devcom = esw->dev->priv.devcom;
@@ -2802,7 +2802,7 @@ static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
ESW_OFFLOADS_DEVCOM_PAIR, esw);
}
-static void esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
{
struct mlx5_devcom *devcom = esw->dev->priv.devcom;
@@ -3250,8 +3250,6 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
if (err)
goto err_vports;
- esw_offloads_devcom_init(esw);
-
return 0;
err_vports:
@@ -3292,7 +3290,6 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
void esw_offloads_disable(struct mlx5_eswitch *esw)
{
- esw_offloads_devcom_cleanup(esw);
mlx5_eswitch_disable_pf_vf_vports(esw);
esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK);
esw_set_passing_vport_metadata(esw, false);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x afbed3f74830163f9559579dee382cac3cff82da
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052845-probably-overpass-092e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
afbed3f74830 ("net/mlx5e: do as little as possible in napi poll when budget is 0")
214baf22870c ("net/mlx5e: Support HTB offload")
1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
145e5637d941 ("net/mlx5e: Add TX PTP port object support")
1a7f51240dfb ("net/mlx5e: Split SW group counters update function")
0b676aaecc25 ("net/mlx5e: Change skb fifo push/pop API to be used without SQ")
579524c6eace ("net/mlx5e: Validate stop_room size upon user input")
3180472f582b ("net/mlx5: Add functions to set/query MFRL register")
573a8095f68c ("Merge tag 'mlx5-updates-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From afbed3f74830163f9559579dee382cac3cff82da Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba(a)kernel.org>
Date: Tue, 16 May 2023 18:59:35 -0700
Subject: [PATCH] net/mlx5e: do as little as possible in napi poll when budget
is 0
NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.
Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.
In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.
The immediate stack trace I was seeing is:
__do_softirq+0xd1/0x2c0
__local_bh_enable_ip+0xc7/0x120
</IRQ>
<TASK>
page_pool_put_defragged_page+0x267/0x320
mlx5e_free_xdpsq_desc+0x99/0xd0
mlx5e_poll_xdpsq_cq+0x138/0x3b0
mlx5e_napi_poll+0xc3/0x8b0
netpoll_poll_dev+0xce/0x150
AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.
Reviewed-by: Tariq Toukan <tariqt(a)nvidia.com>
Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <simon.horman(a)corigine.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index a50bfda18e96..fbb2d963fb7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -161,20 +161,22 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
}
}
+ /* budget=0 means we may be in IRQ context, do as little as possible */
+ if (unlikely(!budget))
+ goto out;
+
busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
if (c->xdp)
busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
- if (likely(budget)) { /* budget=0 means: don't poll rx rings */
- if (xsk_open)
- work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
+ if (xsk_open)
+ work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
- if (likely(budget - work_done))
- work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
+ if (likely(budget - work_done))
+ work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
- busy |= work_done == budget;
- }
+ busy |= work_done == budget;
mlx5e_poll_ico_cq(&c->icosq.cq);
if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e764f12208b99ac7892c4e3f6bf88d71ca71036f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052827-dancing-proud-0d5b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e764f12208b9 ("cxl: Move cxl_await_media_ready() to before capacity info retrieval")
fd35fdcbf75b ("cxl/test: Add mock test for set_timestamp")
f8d22bf50ca5 ("tools/testing/cxl: Mock support for Get Poison List")
a5fcd228ca1d ("Merge branch 'for-6.3/cxl-rr-emu' into cxl/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e764f12208b99ac7892c4e3f6bf88d71ca71036f Mon Sep 17 00:00:00 2001
From: Dave Jiang <dave.jiang(a)intel.com>
Date: Thu, 18 May 2023 16:38:20 -0700
Subject: [PATCH] cxl: Move cxl_await_media_ready() to before capacity info
retrieval
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information. Allow cxl_pci_probe() to succeed
even if media is not ready. Cache the media failure in cxlds and don't ask
the device for any media information.
The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
Suggested-by: Dan Williams <dan.j.williams(a)intel.com>
Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Signed-off-by: Dave Jiang <dave.jiang(a)intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@dj…
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 23b9ff920d7e..2c8dc7e2b84d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1028,7 +1028,7 @@ static int cxl_mem_get_partition_info(struct cxl_dev_state *cxlds)
* cxl_dev_state_identify() - Send the IDENTIFY command to the device.
* @cxlds: The device data for the operation
*
- * Return: 0 if identify was executed successfully.
+ * Return: 0 if identify was executed successfully or media not ready.
*
* This will dispatch the identify command to the device and on success populate
* structures to be exported to sysfs.
@@ -1041,6 +1041,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
u32 val;
int rc;
+ if (!cxlds->media_ready)
+ return 0;
+
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_IDENTIFY,
.size_out = sizeof(id),
@@ -1115,10 +1118,12 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
cxlds->persistent_only_bytes, "pmem");
}
- rc = cxl_mem_get_partition_info(cxlds);
- if (rc) {
- dev_err(dev, "Failed to query partition information\n");
- return rc;
+ if (cxlds->media_ready) {
+ rc = cxl_mem_get_partition_info(cxlds);
+ if (rc) {
+ dev_err(dev, "Failed to query partition information\n");
+ return rc;
+ }
}
rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index db12b6313afb..a2845a7a69d8 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -266,6 +266,7 @@ struct cxl_poison_state {
* @regs: Parsed register blocks
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
* @payload_size: Size of space for payload
* (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
* @lsa_size: Size of Label Storage Area
@@ -303,6 +304,7 @@ struct cxl_dev_state {
int cxl_dvsec;
bool rcd;
+ bool media_ready;
size_t payload_size;
size_t lsa_size;
struct mutex mbox_mutex; /* Protects device mailbox and firmware */
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 10caf180b3fa..519edd0eb196 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -124,6 +124,9 @@ static int cxl_mem_probe(struct device *dev)
struct dentry *dentry;
int rc;
+ if (!cxlds->media_ready)
+ return -EBUSY;
+
/*
* Someone is trying to reattach this device after it lost its port
* connection (an endpoint port previously registered by this memdev was
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index f7a5b8e9c102..0872f2233ed0 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -708,6 +708,12 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
+ rc = cxl_await_media_ready(cxlds);
+ if (rc == 0)
+ cxlds->media_ready = true;
+ else
+ dev_warn(&pdev->dev, "Media not active (%d)\n", rc);
+
rc = cxl_pci_setup_mailbox(cxlds);
if (rc)
return rc;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 17a95f469c26..c23b6164e1c0 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -117,12 +117,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
if (rc)
return rc;
- rc = cxl_await_media_ready(cxlds);
- if (rc) {
- dev_err(&port->dev, "Media not active (%d)\n", rc);
- return rc;
- }
-
rc = devm_cxl_enumerate_decoders(cxlhdm, &info);
if (rc)
return rc;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index ba572d03c687..34b48027b3de 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1256,6 +1256,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
+ cxlds->media_ready = true;
rc = cxl_dev_state_identify(cxlds);
if (rc)
return rc;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x e764f12208b99ac7892c4e3f6bf88d71ca71036f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052826-shallow-paltry-04ed@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
e764f12208b9 ("cxl: Move cxl_await_media_ready() to before capacity info retrieval")
fd35fdcbf75b ("cxl/test: Add mock test for set_timestamp")
f8d22bf50ca5 ("tools/testing/cxl: Mock support for Get Poison List")
a5fcd228ca1d ("Merge branch 'for-6.3/cxl-rr-emu' into cxl/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e764f12208b99ac7892c4e3f6bf88d71ca71036f Mon Sep 17 00:00:00 2001
From: Dave Jiang <dave.jiang(a)intel.com>
Date: Thu, 18 May 2023 16:38:20 -0700
Subject: [PATCH] cxl: Move cxl_await_media_ready() to before capacity info
retrieval
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information. Allow cxl_pci_probe() to succeed
even if media is not ready. Cache the media failure in cxlds and don't ask
the device for any media information.
The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
Suggested-by: Dan Williams <dan.j.williams(a)intel.com>
Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Signed-off-by: Dave Jiang <dave.jiang(a)intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@dj…
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 23b9ff920d7e..2c8dc7e2b84d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1028,7 +1028,7 @@ static int cxl_mem_get_partition_info(struct cxl_dev_state *cxlds)
* cxl_dev_state_identify() - Send the IDENTIFY command to the device.
* @cxlds: The device data for the operation
*
- * Return: 0 if identify was executed successfully.
+ * Return: 0 if identify was executed successfully or media not ready.
*
* This will dispatch the identify command to the device and on success populate
* structures to be exported to sysfs.
@@ -1041,6 +1041,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
u32 val;
int rc;
+ if (!cxlds->media_ready)
+ return 0;
+
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_IDENTIFY,
.size_out = sizeof(id),
@@ -1115,10 +1118,12 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
cxlds->persistent_only_bytes, "pmem");
}
- rc = cxl_mem_get_partition_info(cxlds);
- if (rc) {
- dev_err(dev, "Failed to query partition information\n");
- return rc;
+ if (cxlds->media_ready) {
+ rc = cxl_mem_get_partition_info(cxlds);
+ if (rc) {
+ dev_err(dev, "Failed to query partition information\n");
+ return rc;
+ }
}
rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index db12b6313afb..a2845a7a69d8 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -266,6 +266,7 @@ struct cxl_poison_state {
* @regs: Parsed register blocks
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
* @payload_size: Size of space for payload
* (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
* @lsa_size: Size of Label Storage Area
@@ -303,6 +304,7 @@ struct cxl_dev_state {
int cxl_dvsec;
bool rcd;
+ bool media_ready;
size_t payload_size;
size_t lsa_size;
struct mutex mbox_mutex; /* Protects device mailbox and firmware */
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 10caf180b3fa..519edd0eb196 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -124,6 +124,9 @@ static int cxl_mem_probe(struct device *dev)
struct dentry *dentry;
int rc;
+ if (!cxlds->media_ready)
+ return -EBUSY;
+
/*
* Someone is trying to reattach this device after it lost its port
* connection (an endpoint port previously registered by this memdev was
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index f7a5b8e9c102..0872f2233ed0 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -708,6 +708,12 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
+ rc = cxl_await_media_ready(cxlds);
+ if (rc == 0)
+ cxlds->media_ready = true;
+ else
+ dev_warn(&pdev->dev, "Media not active (%d)\n", rc);
+
rc = cxl_pci_setup_mailbox(cxlds);
if (rc)
return rc;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 17a95f469c26..c23b6164e1c0 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -117,12 +117,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
if (rc)
return rc;
- rc = cxl_await_media_ready(cxlds);
- if (rc) {
- dev_err(&port->dev, "Media not active (%d)\n", rc);
- return rc;
- }
-
rc = devm_cxl_enumerate_decoders(cxlhdm, &info);
if (rc)
return rc;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index ba572d03c687..34b48027b3de 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1256,6 +1256,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
+ cxlds->media_ready = true;
rc = cxl_dev_state_identify(cxlds);
if (rc)
return rc;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b34ffb0c6d23583830f9327864b9c1f486003305
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052848-esquire-crisped-c290@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
b34ffb0c6d23 ("bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b34ffb0c6d23583830f9327864b9c1f486003305 Mon Sep 17 00:00:00 2001
From: Anton Protopopov <aspsk(a)isovalent.com>
Date: Mon, 22 May 2023 15:45:58 +0000
Subject: [PATCH] bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
The LRU and LRU_PERCPU maps allocate a new element on update before locking the
target hash table bucket. Right after that the maps try to lock the bucket.
If this fails, then maps return -EBUSY to the caller without releasing the
allocated element. This makes the element untracked: it doesn't belong to
either of free lists, and it doesn't belong to the hash table, so can't be
re-used; this eventually leads to the permanent -ENOMEM on LRU map updates,
which is unexpected. Fix this by returning the element to the local free list
if bucket locking fails.
Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")
Signed-off-by: Anton Protopopov <aspsk(a)isovalent.com>
Link: https://lore.kernel.org/r/20230522154558.2166815-1-aspsk@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau(a)kernel.org>
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 00c253b84bf5..9901efee4339 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1215,7 +1215,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
ret = htab_lock_bucket(htab, b, hash, &flags);
if (ret)
- return ret;
+ goto err_lock_bucket;
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1236,6 +1236,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
err:
htab_unlock_bucket(htab, b, hash, flags);
+err_lock_bucket:
if (ret)
htab_lru_push_free(htab, l_new);
else if (l_old)
@@ -1338,7 +1339,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
ret = htab_lock_bucket(htab, b, hash, &flags);
if (ret)
- return ret;
+ goto err_lock_bucket;
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1361,6 +1362,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
ret = 0;
err:
htab_unlock_bucket(htab, b, hash, flags);
+err_lock_bucket:
if (l_new)
bpf_lru_push_free(&htab->lru, &l_new->lru_node);
return ret;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 0613d8ca9ab382caabe9ed2dceb429e9781e443f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052830-mothproof-folic-5a0f@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
0613d8ca9ab3 ("bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields")
e2f7fc0ac695 ("bpf: fix undefined behavior in narrow load handling")
46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
bc23105ca0ab ("bpf: fix context access in tracing progs on 32 bit archs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0613d8ca9ab382caabe9ed2dceb429e9781e443f Mon Sep 17 00:00:00 2001
From: Will Deacon <will(a)kernel.org>
Date: Thu, 18 May 2023 11:25:28 +0100
Subject: [PATCH] bpf: Fix mask generation for 32-bit narrow loads of 64-bit
fields
A narrow load from a 64-bit context field results in a 64-bit load
followed potentially by a 64-bit right-shift and then a bitwise AND
operation to extract the relevant data.
In the case of a 32-bit access, an immediate mask of 0xffffffff is used
to construct a 64-bit BPP_AND operation which then sign-extends the mask
value and effectively acts as a glorified no-op. For example:
0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
results in the following code generation for a 64-bit field:
ldr x7, [x7] // 64-bit load
mov x10, #0xffffffffffffffff
and x7, x7, x10
Fix the mask generation so that narrow loads always perform a 32-bit AND
operation:
ldr x7, [x7] // 64-bit load
mov w10, #0xffffffff
and w7, w7, w10
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: John Fastabend <john.fastabend(a)gmail.com>
Cc: Krzesimir Nowak <krzesimir(a)kinvolk.io>
Cc: Andrey Ignatov <rdna(a)fb.com>
Acked-by: Yonghong Song <yhs(a)fb.com>
Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Will Deacon <will(a)kernel.org>
Link: https://lore.kernel.org/r/20230518102528.1341-1-will@kernel.org
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbcf5a4e2fcd..5871aa78d01a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17033,7 +17033,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
insn->dst_reg,
shift);
- insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
+ insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
(1ULL << size * 8) - 1);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x ae9b15fbe63447bc1d3bba3769f409d17ca6fdf6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052819-uptake-scorpion-a23a@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
ae9b15fbe634 ("net: fix stack overflow when LRO is disabled for virtual interfaces")
0e400d602f46 ("bonding: fix NULL deref in bond_rr_gen_slave_id")
ad9bd8daf2f9 ("bonding: fix using uninitialized mode_lock")
f3b0a18bb6cb ("net: remove unnecessary variables and callback")
2bce1ebed17d ("macsec: fix refcnt leak in module exit routine")
089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass")
ab92d68fc22f ("net: core: add generic lockdep keys")
76052d8c4f2d ("vlan: do not transfer link state in vlan bridge binding mode")
35a605db168c ("net/mlx5e: Offload TC e-switch rules with ingress VLAN device")
278748a95aa3 ("net/mlx5e: Offload TC e-switch rules with egress VLAN device")
b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection")
4bd97d51a5e6 ("net: dev: rename queue selection helpers.")
b473b0d23529 ("devlink: create a special NDO for getting the devlink instance")
4eceba17200c ("ethtool: add compat for flash update")
3ceb745baa4c ("devlink: fix condition for compat device info")
a655fe9f1948 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ae9b15fbe63447bc1d3bba3769f409d17ca6fdf6 Mon Sep 17 00:00:00 2001
From: Taehee Yoo <ap420073(a)gmail.com>
Date: Wed, 17 May 2023 14:30:10 +0000
Subject: [PATCH] net: fix stack overflow when LRO is disabled for virtual
interfaces
When the virtual interface's feature is updated, it synchronizes the
updated feature for its own lower interface.
This propagation logic should be worked as the iteration, not recursively.
But it works recursively due to the netdev notification unexpectedly.
This problem occurs when it disables LRO only for the team and bonding
interface type.
team0
|
+------+------+-----+-----+
| | | | |
team1 team2 team3 ... team200
If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
event to its own lower interfaces(team1 ~ team200).
It is worked by netdev_sync_lower_features().
So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
work iteratively.
But generated NETDEV_FEAT_CHANGE event is also sent to the upper
interface too.
upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
lower interfaces again.
lower and upper interfaces receive this event and generate this
event again and again.
So, the stack overflow occurs.
But it is not the infinite loop issue.
Because the netdev_sync_lower_features() updates features before
generating the NETDEV_FEAT_CHANGE event.
Already synchronized lower interfaces skip notification logic.
So, it is just the problem that iteration logic is changed to the
recursive unexpectedly due to the notification mechanism.
Reproducer:
ip link add team0 type team
ethtool -K team0 lro on
for i in {1..200}
do
ip link add team$i master team0 type team
ethtool -K team$i lro on
done
ethtool -K team0 lro off
In order to fix it, the notifier_ctx member of bonding/team is introduced.
Reported-by: syzbot+60748c96cf5c6df8e581(a)syzkaller.appspotmail.com
Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Taehee Yoo <ap420073(a)gmail.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3fed888629f7..edbaa1444f8e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3947,7 +3947,11 @@ static int bond_slave_netdev_event(unsigned long event,
unblock_netpoll_tx();
break;
case NETDEV_FEAT_CHANGE:
- bond_compute_features(bond);
+ if (!bond->notifier_ctx) {
+ bond->notifier_ctx = true;
+ bond_compute_features(bond);
+ bond->notifier_ctx = false;
+ }
break;
case NETDEV_RESEND_IGMP:
/* Propagate to master device */
@@ -6342,6 +6346,8 @@ static int bond_init(struct net_device *bond_dev)
if (!bond->wq)
return -ENOMEM;
+ bond->notifier_ctx = false;
+
spin_lock_init(&bond->stats_lock);
netdev_lockdep_set_classes(bond_dev);
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index d10606f257c4..555b0b1e9a78 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1629,6 +1629,7 @@ static int team_init(struct net_device *dev)
team->dev = dev;
team_set_no_mode(team);
+ team->notifier_ctx = false;
team->pcpu_stats = netdev_alloc_pcpu_stats(struct team_pcpu_stats);
if (!team->pcpu_stats)
@@ -3022,7 +3023,11 @@ static int team_device_event(struct notifier_block *unused,
team_del_slave(port->team->dev, dev);
break;
case NETDEV_FEAT_CHANGE:
- team_compute_features(port->team);
+ if (!port->team->notifier_ctx) {
+ port->team->notifier_ctx = true;
+ team_compute_features(port->team);
+ port->team->notifier_ctx = false;
+ }
break;
case NETDEV_PRECHANGEMTU:
/* Forbid to change mtu of underlaying device */
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index fc985e5c739d..8de6b6e67829 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -208,6 +208,7 @@ struct team {
bool queue_override_enabled;
struct list_head *qom_lists; /* array of queue override mapping lists */
bool port_mtu_change_allowed;
+ bool notifier_ctx;
struct {
unsigned int count;
unsigned int interval; /* in ms */
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 0efef2a952b7..59955ac33157 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -221,6 +221,7 @@ struct bonding {
struct bond_up_slave __rcu *usable_slaves;
struct bond_up_slave __rcu *all_slaves;
bool force_primary;
+ bool notifier_ctx;
s32 slave_cnt; /* never change this value outside the attach/detach wrappers */
int (*recv_probe)(const struct sk_buff *, struct bonding *,
struct slave *);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ae9b15fbe63447bc1d3bba3769f409d17ca6fdf6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052818-trapdoor-acclaim-d86d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ae9b15fbe634 ("net: fix stack overflow when LRO is disabled for virtual interfaces")
0e400d602f46 ("bonding: fix NULL deref in bond_rr_gen_slave_id")
ad9bd8daf2f9 ("bonding: fix using uninitialized mode_lock")
f3b0a18bb6cb ("net: remove unnecessary variables and callback")
2bce1ebed17d ("macsec: fix refcnt leak in module exit routine")
089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass")
ab92d68fc22f ("net: core: add generic lockdep keys")
76052d8c4f2d ("vlan: do not transfer link state in vlan bridge binding mode")
35a605db168c ("net/mlx5e: Offload TC e-switch rules with ingress VLAN device")
278748a95aa3 ("net/mlx5e: Offload TC e-switch rules with egress VLAN device")
b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection")
4bd97d51a5e6 ("net: dev: rename queue selection helpers.")
b473b0d23529 ("devlink: create a special NDO for getting the devlink instance")
4eceba17200c ("ethtool: add compat for flash update")
3ceb745baa4c ("devlink: fix condition for compat device info")
a655fe9f1948 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ae9b15fbe63447bc1d3bba3769f409d17ca6fdf6 Mon Sep 17 00:00:00 2001
From: Taehee Yoo <ap420073(a)gmail.com>
Date: Wed, 17 May 2023 14:30:10 +0000
Subject: [PATCH] net: fix stack overflow when LRO is disabled for virtual
interfaces
When the virtual interface's feature is updated, it synchronizes the
updated feature for its own lower interface.
This propagation logic should be worked as the iteration, not recursively.
But it works recursively due to the netdev notification unexpectedly.
This problem occurs when it disables LRO only for the team and bonding
interface type.
team0
|
+------+------+-----+-----+
| | | | |
team1 team2 team3 ... team200
If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
event to its own lower interfaces(team1 ~ team200).
It is worked by netdev_sync_lower_features().
So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
work iteratively.
But generated NETDEV_FEAT_CHANGE event is also sent to the upper
interface too.
upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
lower interfaces again.
lower and upper interfaces receive this event and generate this
event again and again.
So, the stack overflow occurs.
But it is not the infinite loop issue.
Because the netdev_sync_lower_features() updates features before
generating the NETDEV_FEAT_CHANGE event.
Already synchronized lower interfaces skip notification logic.
So, it is just the problem that iteration logic is changed to the
recursive unexpectedly due to the notification mechanism.
Reproducer:
ip link add team0 type team
ethtool -K team0 lro on
for i in {1..200}
do
ip link add team$i master team0 type team
ethtool -K team$i lro on
done
ethtool -K team0 lro off
In order to fix it, the notifier_ctx member of bonding/team is introduced.
Reported-by: syzbot+60748c96cf5c6df8e581(a)syzkaller.appspotmail.com
Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Taehee Yoo <ap420073(a)gmail.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3fed888629f7..edbaa1444f8e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3947,7 +3947,11 @@ static int bond_slave_netdev_event(unsigned long event,
unblock_netpoll_tx();
break;
case NETDEV_FEAT_CHANGE:
- bond_compute_features(bond);
+ if (!bond->notifier_ctx) {
+ bond->notifier_ctx = true;
+ bond_compute_features(bond);
+ bond->notifier_ctx = false;
+ }
break;
case NETDEV_RESEND_IGMP:
/* Propagate to master device */
@@ -6342,6 +6346,8 @@ static int bond_init(struct net_device *bond_dev)
if (!bond->wq)
return -ENOMEM;
+ bond->notifier_ctx = false;
+
spin_lock_init(&bond->stats_lock);
netdev_lockdep_set_classes(bond_dev);
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index d10606f257c4..555b0b1e9a78 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1629,6 +1629,7 @@ static int team_init(struct net_device *dev)
team->dev = dev;
team_set_no_mode(team);
+ team->notifier_ctx = false;
team->pcpu_stats = netdev_alloc_pcpu_stats(struct team_pcpu_stats);
if (!team->pcpu_stats)
@@ -3022,7 +3023,11 @@ static int team_device_event(struct notifier_block *unused,
team_del_slave(port->team->dev, dev);
break;
case NETDEV_FEAT_CHANGE:
- team_compute_features(port->team);
+ if (!port->team->notifier_ctx) {
+ port->team->notifier_ctx = true;
+ team_compute_features(port->team);
+ port->team->notifier_ctx = false;
+ }
break;
case NETDEV_PRECHANGEMTU:
/* Forbid to change mtu of underlaying device */
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index fc985e5c739d..8de6b6e67829 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -208,6 +208,7 @@ struct team {
bool queue_override_enabled;
struct list_head *qom_lists; /* array of queue override mapping lists */
bool port_mtu_change_allowed;
+ bool notifier_ctx;
struct {
unsigned int count;
unsigned int interval; /* in ms */
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 0efef2a952b7..59955ac33157 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -221,6 +221,7 @@ struct bonding {
struct bond_up_slave __rcu *usable_slaves;
struct bond_up_slave __rcu *all_slaves;
bool force_primary;
+ bool notifier_ctx;
s32 slave_cnt; /* never change this value outside the attach/detach wrappers */
int (*recv_probe)(const struct sk_buff *, struct bonding *,
struct slave *);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x edc0a2b5957652f4685ef3516f519f84807087db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052818-glider-vivacious-f7d5@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
edc0a2b59576 ("x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms")
7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()")
55e6d279abd9 ("x86/cpu: Remove the pointless CPU printout")
9305bd6ca7b4 ("x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()")
a2aa578fec8c ("x86/Centaur: Report correct CPU/cache topology")
2cc61be60e37 ("x86/CPU: Make intel_num_cpu_cores() generic")
b5cf8707e6c9 ("x86/CPU: Move cpu local function declarations to local header")
6c4f5abaf356 ("x86/CPU: Modify detect_extended_topology() to return result")
60882cc159e1 ("x86/Centaur: Initialize supported CPU features properly")
7d5905dc14a8 ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo")
b29c6ef7bb12 ("x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()")
d6ec9d9a4def ("Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From edc0a2b5957652f4685ef3516f519f84807087db Mon Sep 17 00:00:00 2001
From: Zhang Rui <rui.zhang(a)intel.com>
Date: Thu, 23 Mar 2023 09:56:40 +0800
Subject: [PATCH] x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid
platforms
Traditionally, all CPUs in a system have identical numbers of SMT
siblings. That changes with hybrid processors where some logical CPUs
have a sibling and others have none.
Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it
is an Ecore, smp_num_siblings == 1.
smp_num_siblings describes if the *system* supports SMT. It should
specify the maximum number of SMT threads among all cores.
Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.
On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.
Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
Notice that the "before" 'package_cpus_list' has only one CPU. This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:
-Core(s) per socket: 1
-Socket(s): 11
+Core(s) per socket: 16
+Socket(s): 1
This is also expected to make the scheduler do rather wonky things
too.
[ dhansen: remove CPUID detail from changelog, add end user effects ]
CC: stable(a)kernel.org
Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..0270925fe013 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -79,7 +79,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
* initial apic id, which also represents 32-bit extended x2apic id.
*/
c->initial_apicid = edx;
- smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
#endif
return 0;
}
@@ -109,7 +109,8 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
*/
cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
c->initial_apicid = edx;
- core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x edc0a2b5957652f4685ef3516f519f84807087db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052817-extras-parasail-a281@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
edc0a2b59576 ("x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms")
7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From edc0a2b5957652f4685ef3516f519f84807087db Mon Sep 17 00:00:00 2001
From: Zhang Rui <rui.zhang(a)intel.com>
Date: Thu, 23 Mar 2023 09:56:40 +0800
Subject: [PATCH] x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid
platforms
Traditionally, all CPUs in a system have identical numbers of SMT
siblings. That changes with hybrid processors where some logical CPUs
have a sibling and others have none.
Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it
is an Ecore, smp_num_siblings == 1.
smp_num_siblings describes if the *system* supports SMT. It should
specify the maximum number of SMT threads among all cores.
Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.
On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.
Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
Notice that the "before" 'package_cpus_list' has only one CPU. This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:
-Core(s) per socket: 1
-Socket(s): 11
+Core(s) per socket: 16
+Socket(s): 1
This is also expected to make the scheduler do rather wonky things
too.
[ dhansen: remove CPUID detail from changelog, add end user effects ]
CC: stable(a)kernel.org
Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..0270925fe013 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -79,7 +79,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
* initial apic id, which also represents 32-bit extended x2apic id.
*/
c->initial_apicid = edx;
- smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
#endif
return 0;
}
@@ -109,7 +109,8 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
*/
cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
c->initial_apicid = edx;
- core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
On some arch (for example IPQ8074) and other with
KERNEL_STACKPROTECTOR_STRONG enabled, the following compilation error is
triggered:
drivers/net/wireguard/allowedips.c: In function 'root_remove_peer_lists':
drivers/net/wireguard/allowedips.c:80:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
80 | }
| ^
drivers/net/wireguard/allowedips.c: In function 'root_free_rcu':
drivers/net/wireguard/allowedips.c:67:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
67 | }
| ^
cc1: all warnings being treated as errors
Since these are free function and returns void, using function that can
fail is not ideal since an error would result in data not freed.
Since the free are under RCU lock, we can allocate the required stack
array as static outside the function and memset when needed.
This effectively fix the stack frame warning without changing how the
function work.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
Changes v2:
- Fix double Fixes in fixes tag
drivers/net/wireguard/allowedips.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
index 5bf7822c53f1..c129082f04c6 100644
--- a/drivers/net/wireguard/allowedips.c
+++ b/drivers/net/wireguard/allowedips.c
@@ -53,12 +53,16 @@ static void node_free_rcu(struct rcu_head *rcu)
kmem_cache_free(node_cache, container_of(rcu, struct allowedips_node, rcu));
}
+static struct allowedips_node *tmpstack[MAX_ALLOWEDIPS_BITS];
+
static void root_free_rcu(struct rcu_head *rcu)
{
- struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = {
- container_of(rcu, struct allowedips_node, rcu) };
+ struct allowedips_node *node, **stack = tmpstack;
unsigned int len = 1;
+ memset(stack, 0, sizeof(*stack) * MAX_ALLOWEDIPS_BITS);
+ stack[0] = container_of(rcu, struct allowedips_node, rcu);
+
while (len > 0 && (node = stack[--len])) {
push_rcu(stack, node->bit[0], &len);
push_rcu(stack, node->bit[1], &len);
@@ -68,9 +72,12 @@ static void root_free_rcu(struct rcu_head *rcu)
static void root_remove_peer_lists(struct allowedips_node *root)
{
- struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = { root };
+ struct allowedips_node *node, **stack = tmpstack;
unsigned int len = 1;
+ memset(stack, 0, sizeof(*stack) * MAX_ALLOWEDIPS_BITS);
+ stack[0] = root;
+
while (len > 0 && (node = stack[--len])) {
push_rcu(stack, node->bit[0], &len);
push_rcu(stack, node->bit[1], &len);
--
2.39.2
We are Zhao & Associates. We contacting you prior to the COVID relief donation
from Joseph Tsai foundation. You were selected amongst 4 others from different
parts of the world. Kindly confirm your email with us, before we proceed.
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 46930b7cc7727271c9c27aac1fdc97a8645e2d00
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052842-facedown-sliceable-49ca@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
46930b7cc772 ("block: fix bio-cache for passthru IO")
7e2e355dd9c9 ("block: extend bio-cache for non-polled requests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46930b7cc7727271c9c27aac1fdc97a8645e2d00 Mon Sep 17 00:00:00 2001
From: Anuj Gupta <anuj20.g(a)samsung.com>
Date: Tue, 23 May 2023 16:47:09 +0530
Subject: [PATCH] block: fix bio-cache for passthru IO
commit <8af870aa5b847> ("block: enable bio caching use for passthru IO")
introduced bio-cache for passthru IO. In case when nr_vecs are greater
than BIO_INLINE_VECS, bio and bvecs are allocated from mempool (instead
of percpu cache) and REQ_ALLOC_CACHE is cleared. This causes the side
effect of not freeing bio/bvecs into mempool on completion.
This patch lets the passthru IO fallback to allocation using bio_kmalloc
when nr_vecs are greater than BIO_INLINE_VECS. The corresponding bio
is freed during call to blk_mq_map_bio_put during completion.
Cc: stable(a)vger.kernel.org # 6.1
fixes <8af870aa5b847> ("block: enable bio caching use for passthru IO")
Signed-off-by: Anuj Gupta <anuj20.g(a)samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k(a)samsung.com>
Link: https://lore.kernel.org/r/20230523111709.145676-1-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-map.c b/block/blk-map.c
index 04c55f1c492e..46eed2e627c3 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -248,7 +248,7 @@ static struct bio *blk_rq_map_bio_alloc(struct request *rq,
{
struct bio *bio;
- if (rq->cmd_flags & REQ_ALLOC_CACHE) {
+ if (rq->cmd_flags & REQ_ALLOC_CACHE && (nr_vecs <= BIO_INLINE_VECS)) {
bio = bio_alloc_bioset(NULL, nr_vecs, rq->cmd_flags, gfp_mask,
&fs_bio_set);
if (!bio)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 597441b3436a43011f31ce71dc0a6c0bf5ce958a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052843-pants-drearily-a496@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
597441b3436a ("btrfs: use nofs when cleaning up aborted transactions")
fe816d0f1d4c ("btrfs: Fix delalloc inodes invalidation during transaction abort")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 597441b3436a43011f31ce71dc0a6c0bf5ce958a Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 May 2023 12:45:59 -0400
Subject: [PATCH] btrfs: use nofs when cleaning up aborted transactions
Our CI system caught a lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc7+ #1167 Not tainted
------------------------------------------------------
kswapd0/46 is trying to acquire lock:
ffff8c6543abd650 (sb_internal#2){++++}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x5f/0x120
but task is already holding lock:
ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0xa5/0xe0
kmem_cache_alloc+0x31/0x2c0
alloc_extent_state+0x1d/0xd0
__clear_extent_bit+0x2e0/0x4f0
try_release_extent_mapping+0x216/0x280
btrfs_release_folio+0x2e/0x90
invalidate_inode_pages2_range+0x397/0x470
btrfs_cleanup_dirty_bgs+0x9e/0x210
btrfs_cleanup_one_transaction+0x22/0x760
btrfs_commit_transaction+0x3b7/0x13a0
create_subvol+0x59b/0x970
btrfs_mksubvol+0x435/0x4f0
__btrfs_ioctl_snap_create+0x11e/0x1b0
btrfs_ioctl_snap_create_v2+0xbf/0x140
btrfs_ioctl+0xa45/0x28f0
__x64_sys_ioctl+0x88/0xc0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
-> #0 (sb_internal#2){++++}-{0:0}:
__lock_acquire+0x1435/0x21a0
lock_acquire+0xc2/0x2b0
start_transaction+0x401/0x730
btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_evict_inode+0x292/0x3d0
evict+0xcc/0x1d0
inode_lru_isolate+0x14d/0x1e0
__list_lru_walk_one+0xbe/0x1c0
list_lru_walk_one+0x58/0x80
prune_icache_sb+0x39/0x60
super_cache_scan+0x161/0x1f0
do_shrink_slab+0x163/0x340
shrink_slab+0x1d3/0x290
shrink_node+0x300/0x720
balance_pgdat+0x35c/0x7a0
kswapd+0x205/0x410
kthread+0xf0/0x120
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
*** DEADLOCK ***
3 locks held by kswapd0/46:
#0: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
#1: ffffffffabe50270 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x113/0x290
#2: ffff8c6543abd0e0 (&type->s_umount_key#44){++++}-{3:3}, at: super_cache_scan+0x38/0x1f0
stack backtrace:
CPU: 0 PID: 46 Comm: kswapd0 Not tainted 6.3.0-rc7+ #1167
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x58/0x90
check_noncircular+0xd6/0x100
? save_trace+0x3f/0x310
? add_lock_to_list+0x97/0x120
__lock_acquire+0x1435/0x21a0
lock_acquire+0xc2/0x2b0
? btrfs_commit_inode_delayed_inode+0x5f/0x120
start_transaction+0x401/0x730
? btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_evict_inode+0x292/0x3d0
? lock_release+0x134/0x270
? __pfx_wake_bit_function+0x10/0x10
evict+0xcc/0x1d0
inode_lru_isolate+0x14d/0x1e0
__list_lru_walk_one+0xbe/0x1c0
? __pfx_inode_lru_isolate+0x10/0x10
? __pfx_inode_lru_isolate+0x10/0x10
list_lru_walk_one+0x58/0x80
prune_icache_sb+0x39/0x60
super_cache_scan+0x161/0x1f0
do_shrink_slab+0x163/0x340
shrink_slab+0x1d3/0x290
shrink_node+0x300/0x720
balance_pgdat+0x35c/0x7a0
kswapd+0x205/0x410
? __pfx_autoremove_wake_function+0x10/0x10
? __pfx_kswapd+0x10/0x10
kthread+0xf0/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
This happens because when we abort the transaction in the transaction
commit path we call invalidate_inode_pages2_range on our block group
cache inodes (if we have space cache v1) and any delalloc inodes we may
have. The plain invalidate_inode_pages2_range() call passes through
GFP_KERNEL, which makes sense in most cases, but not here. Wrap these
two invalidate callees with memalloc_nofs_save/memalloc_nofs_restore to
make sure we don't end up with the fs reclaim dependency under the
transaction dependency.
CC: stable(a)vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index fbf9006c6234..6de6dcf2743e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4936,7 +4936,11 @@ static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
*/
inode = igrab(&btrfs_inode->vfs_inode);
if (inode) {
+ unsigned int nofs_flag;
+
+ nofs_flag = memalloc_nofs_save();
invalidate_inode_pages2(inode->i_mapping);
+ memalloc_nofs_restore(nofs_flag);
iput(inode);
}
spin_lock(&root->delalloc_lock);
@@ -5042,7 +5046,12 @@ static void btrfs_cleanup_bg_io(struct btrfs_block_group *cache)
inode = cache->io_ctl.inode;
if (inode) {
+ unsigned int nofs_flag;
+
+ nofs_flag = memalloc_nofs_save();
invalidate_inode_pages2(inode->i_mapping);
+ memalloc_nofs_restore(nofs_flag);
+
BTRFS_I(inode)->generation = 0;
cache->io_ctl.inode = NULL;
iput(inode);
Partially backport v6.3 commit 11f75a01448f ("selftests/memfd: add
tests for MFD_NOEXEC_SEAL MFD_EXEC") to fix an unknown type name
build error.
In some systems, the __u64 typedef is not present due to differences
in system headers, causing compilation errors like this one:
fuse_test.c:64:8: error: unknown type name '__u64'
64 | static __u64 mfd_assert_get_seals(int fd)
This header includes the __u64 typedef which increases the
likelihood of successful compilation on a wider variety of systems.
Signed-off-by: Hardik Garg <hargar(a)linux.microsoft.com>
---
tools/testing/selftests/memfd/fuse_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/memfd/fuse_test.c b/tools/testing/selftests/memfd/fuse_test.c
index be675002f918..93798c8c5d54 100644
--- a/tools/testing/selftests/memfd/fuse_test.c
+++ b/tools/testing/selftests/memfd/fuse_test.c
@@ -22,6 +22,7 @@
#include <linux/falloc.h>
#include <fcntl.h>
#include <linux/memfd.h>
+#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
--
2.25.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ce0b15d11ad837fbacc5356941712218e38a0a83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052612-reproach-snowbird-d3a2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ce0b15d11ad8 ("x86/mm: Avoid incomplete Global INVLPG flushes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce0b15d11ad837fbacc5356941712218e38a0a83 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 16 May 2023 12:24:25 -0700
Subject: [PATCH] x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3cdac0f0055d..8192452d1d2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -9,6 +9,7 @@
#include <linux/sched/task.h>
#include <asm/set_memory.h>
+#include <asm/cpu_device_id.h>
#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/page.h>
@@ -261,6 +262,24 @@ static void __init probe_page_size_mask(void)
}
}
+#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
+ .family = 6, \
+ .model = _model, \
+ }
+/*
+ * INVLPG may not properly flush Global entries
+ * on these CPUs when PCIDs are enabled.
+ */
+static const struct x86_cpu_id invlpg_miss_ids[] = {
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ {}
+};
+
static void setup_pcid(void)
{
if (!IS_ENABLED(CONFIG_X86_64))
@@ -269,6 +288,12 @@ static void setup_pcid(void)
if (!boot_cpu_has(X86_FEATURE_PCID))
return;
+ if (x86_match_cpu(invlpg_miss_ids)) {
+ pr_info("Incomplete global flushes, disabling PCID");
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ return;
+ }
+
if (boot_cpu_has(X86_FEATURE_PGE)) {
/*
* This can't be cr4_set_bits_and_update_boot() -- the
Hi Greg, Sasha,
[ This is v2 including two initial missing backport patches and one new
patch at the end of this batch. ]
This is second round of -stable backport fixes for 4.14. This batch
includes dependency patches which are not currently in the 4.14 branch.
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 4f16d25c68ec ("netfilter: nftables: add nft_parse_register_load() and use it")
2) 345023b0db31 ("netfilter: nftables: add nft_parse_register_store() and use it")
3) 08a01c11a5bb ("netfilter: nftables: statify nft_parse_register()")
4) 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.")
5) 20a1452c3542 ("netfilter: nf_tables: add nft_setelem_parse_key()")
6) fdb9c405e35b ("netfilter: nf_tables: allow up to 64 bytes in the set element data area")
7) 7e6bc1f6cabc ("netfilter: nf_tables: stricter validation of element data")
8) 215a31f19ded ("netfilter: nft_dynset: do not reject set updates with NFT_SET_EVAL")
9) 36d5b2913219 ("netfilter: nf_tables: do not allow RULE_ID to refer to another chain")
10) 470ee20e069a ("netfilter: nf_tables: do not allow SET_ID to refer to another table")
11) d209df3e7f70 ("netfilter: nf_tables: fix register ordering")
Please, apply.
Thanks.
Florian Westphal (1):
netfilter: nf_tables: fix register ordering
Pablo Neira Ayuso (10):
netfilter: nftables: add nft_parse_register_load() and use it
netfilter: nftables: add nft_parse_register_store() and use it
netfilter: nftables: statify nft_parse_register()
netfilter: nf_tables: validate registers coming from userspace.
netfilter: nf_tables: add nft_setelem_parse_key()
netfilter: nf_tables: allow up to 64 bytes in the set element data area
netfilter: nf_tables: stricter validation of element data
netfilter: nft_dynset: do not reject set updates with NFT_SET_EVAL
netfilter: nf_tables: do not allow RULE_ID to refer to another chain
netfilter: nf_tables: do not allow SET_ID to refer to another table
include/net/netfilter/nf_tables.h | 17 +-
include/net/netfilter/nf_tables_core.h | 14 +-
include/net/netfilter/nft_fib.h | 2 +-
include/net/netfilter/nft_masq.h | 4 +-
include/net/netfilter/nft_meta.h | 4 +-
include/net/netfilter/nft_redir.h | 4 +-
include/uapi/linux/netfilter/nf_tables.h | 2 +-
net/bridge/netfilter/nft_meta_bridge.c | 5 +-
net/ipv4/netfilter/nft_dup_ipv4.c | 18 +-
net/ipv6/netfilter/nft_dup_ipv6.c | 18 +-
net/netfilter/nf_tables_api.c | 220 +++++++++++++++--------
net/netfilter/nft_bitwise.c | 14 +-
net/netfilter/nft_byteorder.c | 14 +-
net/netfilter/nft_cmp.c | 8 +-
net/netfilter/nft_ct.c | 12 +-
net/netfilter/nft_dup_netdev.c | 6 +-
net/netfilter/nft_dynset.c | 16 +-
net/netfilter/nft_exthdr.c | 14 +-
net/netfilter/nft_fib.c | 5 +-
net/netfilter/nft_fwd_netdev.c | 6 +-
net/netfilter/nft_hash.c | 25 ++-
net/netfilter/nft_immediate.c | 8 +-
net/netfilter/nft_lookup.c | 14 +-
net/netfilter/nft_masq.c | 14 +-
net/netfilter/nft_meta.c | 8 +-
net/netfilter/nft_nat.c | 35 ++--
net/netfilter/nft_numgen.c | 15 +-
net/netfilter/nft_objref.c | 6 +-
net/netfilter/nft_payload.c | 10 +-
net/netfilter/nft_queue.c | 12 +-
net/netfilter/nft_range.c | 6 +-
net/netfilter/nft_redir.c | 14 +-
net/netfilter/nft_rt.c | 7 +-
33 files changed, 320 insertions(+), 257 deletions(-)
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052834-shakable-mobile-fdb0@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()")
c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
7b0dbd940765 ("binder: fix binder_alloc kernel-doc warnings")
d6d04d71daae ("binder: remove binder_alloc_set_vma()")
e66b77e50522 ("binder: rename alloc->vma_vm_mm to alloc->mm")
50e177c5bfd9 ("Merge 6.0-rc4 into char-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 19 May 2023 19:59:49 +0000
Subject: [PATCH] binder: fix UAF of alloc->vma in race with munmap()
[ cmllamas: clean forward port from commit 015ac18be7de ("binder: fix
UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed
in mainline after the revert of commit a43cfc87caaf ("android: binder:
stop saving a pointer to the VMA") as pointed out by Liam. The commit
log and tags have been tweaked to reflect this. ]
In commit 720c24192404 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to teardown and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert back to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Fixes: c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver
Cc: <stable(a)vger.kernel.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yang Shi <yang.shi(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e7c9d466f8e8..662a2a2e2e84 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -212,7 +212,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
mm = alloc->mm;
if (mm) {
- mmap_read_lock(mm);
+ mmap_write_lock(mm);
vma = alloc->vma;
}
@@ -270,7 +270,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
trace_binder_alloc_page_end(alloc, index);
}
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return 0;
@@ -303,7 +303,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
err_no_vma:
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052832-opposing-flavoring-f084@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()")
c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
7b0dbd940765 ("binder: fix binder_alloc kernel-doc warnings")
d6d04d71daae ("binder: remove binder_alloc_set_vma()")
e66b77e50522 ("binder: rename alloc->vma_vm_mm to alloc->mm")
50e177c5bfd9 ("Merge 6.0-rc4 into char-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 19 May 2023 19:59:49 +0000
Subject: [PATCH] binder: fix UAF of alloc->vma in race with munmap()
[ cmllamas: clean forward port from commit 015ac18be7de ("binder: fix
UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed
in mainline after the revert of commit a43cfc87caaf ("android: binder:
stop saving a pointer to the VMA") as pointed out by Liam. The commit
log and tags have been tweaked to reflect this. ]
In commit 720c24192404 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to teardown and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert back to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Fixes: c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver
Cc: <stable(a)vger.kernel.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yang Shi <yang.shi(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e7c9d466f8e8..662a2a2e2e84 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -212,7 +212,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
mm = alloc->mm;
if (mm) {
- mmap_read_lock(mm);
+ mmap_write_lock(mm);
vma = alloc->vma;
}
@@ -270,7 +270,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
trace_binder_alloc_page_end(alloc, index);
}
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return 0;
@@ -303,7 +303,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
err_no_vma:
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052830-unblended-envision-4bbd@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()")
c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
7b0dbd940765 ("binder: fix binder_alloc kernel-doc warnings")
d6d04d71daae ("binder: remove binder_alloc_set_vma()")
e66b77e50522 ("binder: rename alloc->vma_vm_mm to alloc->mm")
50e177c5bfd9 ("Merge 6.0-rc4 into char-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 19 May 2023 19:59:49 +0000
Subject: [PATCH] binder: fix UAF of alloc->vma in race with munmap()
[ cmllamas: clean forward port from commit 015ac18be7de ("binder: fix
UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed
in mainline after the revert of commit a43cfc87caaf ("android: binder:
stop saving a pointer to the VMA") as pointed out by Liam. The commit
log and tags have been tweaked to reflect this. ]
In commit 720c24192404 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to teardown and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert back to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Fixes: c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver
Cc: <stable(a)vger.kernel.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yang Shi <yang.shi(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e7c9d466f8e8..662a2a2e2e84 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -212,7 +212,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
mm = alloc->mm;
if (mm) {
- mmap_read_lock(mm);
+ mmap_write_lock(mm);
vma = alloc->vma;
}
@@ -270,7 +270,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
trace_binder_alloc_page_end(alloc, index);
}
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return 0;
@@ -303,7 +303,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
err_no_vma:
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x bdc1c5fac982845a58d28690cdb56db8c88a530d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052804-exploit-passion-ce51@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
bdc1c5fac982 ("binder: fix UAF caused by faulty buffer cleanup")
9864bb480133 ("Binder: add TF_UPDATE_TXN to replace outdated txn")
32e9f56a96d8 ("binder: don't detect sender/target during buffer cleanup")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bdc1c5fac982845a58d28690cdb56db8c88a530d Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 5 May 2023 20:30:20 +0000
Subject: [PATCH] binder: fix UAF caused by faulty buffer cleanup
In binder_transaction_buffer_release() the 'failed_at' offset indicates
the number of objects to clean up. However, this function was changed by
commit 44d8047f1d87 ("binder: use standard functions to allocate fds"),
to release all the objects in the buffer when 'failed_at' is zero.
This introduced an issue when a transaction buffer is released without
any objects having been processed so far. In this case, 'failed_at' is
indeed zero yet it is misinterpreted as releasing the entire buffer.
This leads to use-after-free errors where nodes are incorrectly freed
and subsequently accessed. Such is the case in the following KASAN
report:
==================================================================
BUG: KASAN: slab-use-after-free in binder_thread_read+0xc40/0x1f30
Read of size 8 at addr ffff4faf037cfc58 by task poc/474
CPU: 6 PID: 474 Comm: poc Not tainted 6.3.0-12570-g7df047b3f0aa #5
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x94/0xec
show_stack+0x18/0x24
dump_stack_lvl+0x48/0x60
print_report+0xf8/0x5b8
kasan_report+0xb8/0xfc
__asan_load8+0x9c/0xb8
binder_thread_read+0xc40/0x1f30
binder_ioctl+0xd9c/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
Allocated by task 474:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_alloc_info+0x24/0x34
__kasan_kmalloc+0xb8/0xbc
kmalloc_trace+0x48/0x5c
binder_new_node+0x3c/0x3a4
binder_transaction+0x2b58/0x36f0
binder_thread_write+0x8e0/0x1b78
binder_ioctl+0x14a0/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
Freed by task 475:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_free_info+0x38/0x5c
__kasan_slab_free+0xe8/0x154
__kmem_cache_free+0x128/0x2bc
kfree+0x58/0x70
binder_dec_node_tmpref+0x178/0x1fc
binder_transaction_buffer_release+0x430/0x628
binder_transaction+0x1954/0x36f0
binder_thread_write+0x8e0/0x1b78
binder_ioctl+0x14a0/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
==================================================================
In order to avoid these issues, let's always calculate the intended
'failed_at' offset beforehand. This is renamed and wrapped in a helper
function to make it clear and convenient.
Fixes: 32e9f56a96d8 ("binder: don't detect sender/target during buffer cleanup")
Reported-by: Zi Fan Tan <zifantan(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20230505203020.4101154-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index fb56bfc45096..8fb7672021ee 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1934,24 +1934,23 @@ static void binder_deferred_fd_close(int fd)
static void binder_transaction_buffer_release(struct binder_proc *proc,
struct binder_thread *thread,
struct binder_buffer *buffer,
- binder_size_t failed_at,
+ binder_size_t off_end_offset,
bool is_failure)
{
int debug_id = buffer->debug_id;
- binder_size_t off_start_offset, buffer_offset, off_end_offset;
+ binder_size_t off_start_offset, buffer_offset;
binder_debug(BINDER_DEBUG_TRANSACTION,
"%d buffer release %d, size %zd-%zd, failed at %llx\n",
proc->pid, buffer->debug_id,
buffer->data_size, buffer->offsets_size,
- (unsigned long long)failed_at);
+ (unsigned long long)off_end_offset);
if (buffer->target_node)
binder_dec_node(buffer->target_node, 1, 0);
off_start_offset = ALIGN(buffer->data_size, sizeof(void *));
- off_end_offset = is_failure && failed_at ? failed_at :
- off_start_offset + buffer->offsets_size;
+
for (buffer_offset = off_start_offset; buffer_offset < off_end_offset;
buffer_offset += sizeof(binder_size_t)) {
struct binder_object_header *hdr;
@@ -2111,6 +2110,21 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
}
}
+/* Clean up all the objects in the buffer */
+static inline void binder_release_entire_buffer(struct binder_proc *proc,
+ struct binder_thread *thread,
+ struct binder_buffer *buffer,
+ bool is_failure)
+{
+ binder_size_t off_end_offset;
+
+ off_end_offset = ALIGN(buffer->data_size, sizeof(void *));
+ off_end_offset += buffer->offsets_size;
+
+ binder_transaction_buffer_release(proc, thread, buffer,
+ off_end_offset, is_failure);
+}
+
static int binder_translate_binder(struct flat_binder_object *fp,
struct binder_transaction *t,
struct binder_thread *thread)
@@ -2806,7 +2820,7 @@ static int binder_proc_transaction(struct binder_transaction *t,
t_outdated->buffer = NULL;
buffer->transaction = NULL;
trace_binder_transaction_update_buffer_release(buffer);
- binder_transaction_buffer_release(proc, NULL, buffer, 0, 0);
+ binder_release_entire_buffer(proc, NULL, buffer, false);
binder_alloc_free_buf(&proc->alloc, buffer);
kfree(t_outdated);
binder_stats_deleted(BINDER_STAT_TRANSACTION);
@@ -3775,7 +3789,7 @@ binder_free_buf(struct binder_proc *proc,
binder_node_inner_unlock(buf_node);
}
trace_binder_transaction_buffer_release(buffer);
- binder_transaction_buffer_release(proc, thread, buffer, 0, is_failure);
+ binder_release_entire_buffer(proc, thread, buffer, is_failure);
binder_alloc_free_buf(&proc->alloc, buffer);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0fa53349c3acba0239369ba4cd133740a408d246
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052832-ideally-gleeful-6ac6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0fa53349c3ac ("binder: add lockless binder_alloc_(set|get)_vma()")
c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
b15655b12ddc ("Revert "binder_alloc: add missing mmap_lock calls when using the VMA"")
7b0dbd940765 ("binder: fix binder_alloc kernel-doc warnings")
d6d04d71daae ("binder: remove binder_alloc_set_vma()")
e66b77e50522 ("binder: rename alloc->vma_vm_mm to alloc->mm")
50e177c5bfd9 ("Merge 6.0-rc4 into char-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0fa53349c3acba0239369ba4cd133740a408d246 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Tue, 2 May 2023 20:12:19 +0000
Subject: [PATCH] binder: add lockless binder_alloc_(set|get)_vma()
Bring back the original lockless design in binder_alloc to determine
whether the buffer setup has been completed by the ->mmap() handler.
However, this time use smp_load_acquire() and smp_store_release() to
wrap all the ordering in a single macro call.
Also, add comments to make it evident that binder uses alloc->vma to
determine when the binder_alloc has been fully initialized. In these
scenarios acquiring the mmap_lock is not required.
Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index eb082b33115b..e7c9d466f8e8 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -309,17 +309,18 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
return vma ? -ENOMEM : -ESRCH;
}
+static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
+ struct vm_area_struct *vma)
+{
+ /* pairs with smp_load_acquire in binder_alloc_get_vma() */
+ smp_store_release(&alloc->vma, vma);
+}
+
static inline struct vm_area_struct *binder_alloc_get_vma(
struct binder_alloc *alloc)
{
- struct vm_area_struct *vma = NULL;
-
- if (alloc->vma) {
- /* Look at description in binder_alloc_set_vma */
- smp_rmb();
- vma = alloc->vma;
- }
- return vma;
+ /* pairs with smp_store_release in binder_alloc_set_vma() */
+ return smp_load_acquire(&alloc->vma);
}
static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
@@ -382,6 +383,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
size_t size, data_offsets_size;
int ret;
+ /* Check binder_alloc is fully initialized */
if (!binder_alloc_get_vma(alloc)) {
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf, no vma\n",
@@ -777,7 +779,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
buffer->free = 1;
binder_insert_free_buffer(alloc, buffer);
alloc->free_async_space = alloc->buffer_size / 2;
- alloc->vma = vma;
+
+ /* Signal binder_alloc is fully initialized */
+ binder_alloc_set_vma(alloc, vma);
return 0;
@@ -959,7 +963,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
*/
void binder_alloc_vma_close(struct binder_alloc *alloc)
{
- alloc->vma = 0;
+ binder_alloc_set_vma(alloc, NULL);
}
/**
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c0fd2101781ef761b636769b2f445351f71c3626
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052814-bannister-undaunted-57c2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
7b0dbd940765 ("binder: fix binder_alloc kernel-doc warnings")
d6d04d71daae ("binder: remove binder_alloc_set_vma()")
e66b77e50522 ("binder: rename alloc->vma_vm_mm to alloc->mm")
50e177c5bfd9 ("Merge 6.0-rc4 into char-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0fd2101781ef761b636769b2f445351f71c3626 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Tue, 2 May 2023 20:12:18 +0000
Subject: [PATCH] Revert "android: binder: stop saving a pointer to the VMA"
This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19.
This patch fixed an issue reported by syzkaller in [1]. However, this
turned out to be only a band-aid in binder. The root cause, as bisected
by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap()
when mas_preallocate() fails"). We no longer need the patch for binder.
Reverting such patch allows us to have a lockless access to alloc->vma
in specific cases where the mmap_lock is not required. This approach
avoids the contention that caused a performance regression.
[1] https://lore.kernel.org/all/0000000000004a0dbe05e1d749e0@google.com
[cmllamas: resolved conflicts with rework of alloc->mm and removal of
binder_alloc_set_vma() also fixed comment section]
Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 92c814ec44fe..eb082b33115b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -213,7 +213,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
if (mm) {
mmap_read_lock(mm);
- vma = vma_lookup(mm, alloc->vma_addr);
+ vma = alloc->vma;
}
if (!vma && need_mm) {
@@ -314,9 +314,11 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
{
struct vm_area_struct *vma = NULL;
- if (alloc->vma_addr)
- vma = vma_lookup(alloc->mm, alloc->vma_addr);
-
+ if (alloc->vma) {
+ /* Look at description in binder_alloc_set_vma */
+ smp_rmb();
+ vma = alloc->vma;
+ }
return vma;
}
@@ -775,7 +777,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
buffer->free = 1;
binder_insert_free_buffer(alloc, buffer);
alloc->free_async_space = alloc->buffer_size / 2;
- alloc->vma_addr = vma->vm_start;
+ alloc->vma = vma;
return 0;
@@ -805,8 +807,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
buffers = 0;
mutex_lock(&alloc->mutex);
- BUG_ON(alloc->vma_addr &&
- vma_lookup(alloc->mm, alloc->vma_addr));
+ BUG_ON(alloc->vma);
while ((n = rb_first(&alloc->allocated_buffers))) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
@@ -958,7 +959,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
*/
void binder_alloc_vma_close(struct binder_alloc *alloc)
{
- alloc->vma_addr = 0;
+ alloc->vma = 0;
}
/**
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 0f811ac4bcff..138d1d5af9ce 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -75,7 +75,7 @@ struct binder_lru_page {
/**
* struct binder_alloc - per-binder proc state for binder allocator
* @mutex: protects binder_alloc fields
- * @vma_addr: vm_area_struct->vm_start passed to mmap_handler
+ * @vma: vm_area_struct passed to mmap_handler
* (invariant after mmap)
* @mm: copy of task->mm (invariant after open)
* @buffer: base of per-proc address space mapped via mmap
@@ -99,7 +99,7 @@ struct binder_lru_page {
*/
struct binder_alloc {
struct mutex mutex;
- unsigned long vma_addr;
+ struct vm_area_struct *vma;
struct mm_struct *mm;
void __user *buffer;
struct list_head buffers;
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index 43a881073a42..c2b323bc3b3a 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -287,7 +287,7 @@ void binder_selftest_alloc(struct binder_alloc *alloc)
if (!binder_selftest_run)
return;
mutex_lock(&binder_selftest_lock);
- if (!binder_selftest_run || !alloc->vma_addr)
+ if (!binder_selftest_run || !alloc->vma)
goto done;
pr_info("STARTED\n");
binder_selftest_alloc_offset(alloc, end_offset, 0);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b15655b12ddca7ade09807f790bafb6fab61b50a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052842-ability-badness-50c6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
b15655b12ddc ("Revert "binder_alloc: add missing mmap_lock calls when using the VMA"")
e66b77e50522 ("binder: rename alloc->vma_vm_mm to alloc->mm")
50e177c5bfd9 ("Merge 6.0-rc4 into char-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b15655b12ddca7ade09807f790bafb6fab61b50a Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Tue, 2 May 2023 20:12:17 +0000
Subject: [PATCH] Revert "binder_alloc: add missing mmap_lock calls when using
the VMA"
This reverts commit 44e602b4e52f70f04620bbbf4fe46ecb40170bde.
This caused a performance regression particularly when pages are getting
reclaimed. We don't need to acquire the mmap_lock to determine when the
binder buffer has been fully initialized. A subsequent patch will bring
back the lockless approach for this.
[cmllamas: resolved trivial conflicts with renaming of alloc->mm]
Fixes: 44e602b4e52f ("binder_alloc: add missing mmap_lock calls when using the VMA")
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 55a3c3c2409f..92c814ec44fe 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -380,15 +380,12 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
size_t size, data_offsets_size;
int ret;
- mmap_read_lock(alloc->mm);
if (!binder_alloc_get_vma(alloc)) {
- mmap_read_unlock(alloc->mm);
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf, no vma\n",
alloc->pid);
return ERR_PTR(-ESRCH);
}
- mmap_read_unlock(alloc->mm);
data_offsets_size = ALIGN(data_size, sizeof(void *)) +
ALIGN(offsets_size, sizeof(void *));
@@ -916,25 +913,17 @@ void binder_alloc_print_pages(struct seq_file *m,
* Make sure the binder_alloc is fully initialized, otherwise we might
* read inconsistent state.
*/
-
- mmap_read_lock(alloc->mm);
- if (binder_alloc_get_vma(alloc) == NULL) {
- mmap_read_unlock(alloc->mm);
- goto uninitialized;
- }
-
- mmap_read_unlock(alloc->mm);
- for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
- page = &alloc->pages[i];
- if (!page->page_ptr)
- free++;
- else if (list_empty(&page->lru))
- active++;
- else
- lru++;
+ if (binder_alloc_get_vma(alloc) != NULL) {
+ for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
+ page = &alloc->pages[i];
+ if (!page->page_ptr)
+ free++;
+ else if (list_empty(&page->lru))
+ active++;
+ else
+ lru++;
+ }
}
-
-uninitialized:
mutex_unlock(&alloc->mutex);
seq_printf(m, " pages: %d:%d:%d\n", active, lru, free);
seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0f554e37dad416f445cd3ec5935f5aec1b0e7ba5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052807-thorn-tavern-5f16@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0f554e37dad4 ("arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed")
a8bd7f155126 ("arm64: dts: imx8qxp: add cadence usb3 support")
8065fc937f0f ("arm64: dts: imx8dxl: add usb1 and usb2 support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0f554e37dad416f445cd3ec5935f5aec1b0e7ba5 Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li(a)nxp.com>
Date: Mon, 15 May 2023 12:20:53 -0400
Subject: [PATCH] arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at
super speed
Resolve USB 3.0 gadget failure for QM and QXPB0 in super speed mode with
single IN and OUT endpoints, like mass storage devices, due to incorrect
ACTUAL_MEM_SIZE in ep_cap2 (32k instead of actual 18k). Implement dt
property cdns,on-chip-buff-size to override ep_cap2 and set it to 18k for
imx8QM and imx8QXP chips. No adverse effects for 8QXP C0.
Cc: stable(a)vger.kernel.org
Fixes: dce49449e04f ("usb: cdns3: allocate TX FIFO size according to composite EP number")
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Shawn Guo <shawnguo(a)kernel.org>
diff --git a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
index 2209c1ac6e9b..e62a43591361 100644
--- a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
@@ -171,6 +171,7 @@ usbotg3_cdns3: usb@5b120000 {
interrupt-names = "host", "peripheral", "otg", "wakeup";
phys = <&usb3_phy>;
phy-names = "cdns3,usb3-phy";
+ cdns,on-chip-buff-size = /bits/ 16 <18>;
status = "disabled";
};
};
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x 0f554e37dad416f445cd3ec5935f5aec1b0e7ba5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052806-unlawful-nearest-9622@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
0f554e37dad4 ("arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed")
a8bd7f155126 ("arm64: dts: imx8qxp: add cadence usb3 support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0f554e37dad416f445cd3ec5935f5aec1b0e7ba5 Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li(a)nxp.com>
Date: Mon, 15 May 2023 12:20:53 -0400
Subject: [PATCH] arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at
super speed
Resolve USB 3.0 gadget failure for QM and QXPB0 in super speed mode with
single IN and OUT endpoints, like mass storage devices, due to incorrect
ACTUAL_MEM_SIZE in ep_cap2 (32k instead of actual 18k). Implement dt
property cdns,on-chip-buff-size to override ep_cap2 and set it to 18k for
imx8QM and imx8QXP chips. No adverse effects for 8QXP C0.
Cc: stable(a)vger.kernel.org
Fixes: dce49449e04f ("usb: cdns3: allocate TX FIFO size according to composite EP number")
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Shawn Guo <shawnguo(a)kernel.org>
diff --git a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
index 2209c1ac6e9b..e62a43591361 100644
--- a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
@@ -171,6 +171,7 @@ usbotg3_cdns3: usb@5b120000 {
interrupt-names = "host", "peripheral", "otg", "wakeup";
phys = <&usb3_phy>;
phy-names = "cdns3,usb3-phy";
+ cdns,on-chip-buff-size = /bits/ 16 <18>;
status = "disabled";
};
};
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3bf8c6307bad5c0cc09cde982e146d847859b651
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052832-arise-impeding-18c7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3bf8c6307bad ("cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf()")
2dd6d0ebf740 ("cpufreq: amd-pstate: Add guided autonomous mode")
3e6e07805764 ("Documentation: cpufreq: amd-pstate: Move amd_pstate param to alphabetical order")
5014603e409b ("Documentation: introduce amd pstate active mode kernel command line options")
ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
36c5014e5460 ("cpufreq: amd-pstate: optimize driver working mode selection in amd_pstate_param()")
4f3085f87b51 ("cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3bf8c6307bad5c0cc09cde982e146d847859b651 Mon Sep 17 00:00:00 2001
From: Wyes Karny <wyes.karny(a)amd.com>
Date: Thu, 18 May 2023 05:58:19 +0000
Subject: [PATCH] cpufreq: amd-pstate: Update policy->cur in
amd_pstate_adjust_perf()
Driver should update policy->cur after updating the frequency.
Currently amd_pstate doesn't update policy->cur when `adjust_perf`
is used. Which causes /proc/cpuinfo to show wrong cpu frequency.
Fix this by updating policy->cur with correct frequency value in
adjust_perf function callback.
- Before the fix: (setting min freq to 1.5 MHz)
[root@amd]# cat /proc/cpuinfo | grep "cpu MHz" | sort | uniq --count
1 cpu MHz : 1777.016
1 cpu MHz : 1797.160
1 cpu MHz : 1797.270
189 cpu MHz : 400.000
- After the fix: (setting min freq to 1.5 MHz)
[root@amd]# cat /proc/cpuinfo | grep "cpu MHz" | sort | uniq --count
1 cpu MHz : 1753.353
1 cpu MHz : 1756.838
1 cpu MHz : 1776.466
1 cpu MHz : 1776.873
1 cpu MHz : 1777.308
1 cpu MHz : 1779.900
183 cpu MHz : 1805.231
1 cpu MHz : 1956.815
1 cpu MHz : 2246.203
1 cpu MHz : 2259.984
Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Signed-off-by: Wyes Karny <wyes.karny(a)amd.com>
[ rjw: Subject edits ]
Cc: 5.17+ <stable(a)vger.kernel.org> # 5.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ac54779f5a49..ddd346a239e0 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -501,12 +501,14 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
unsigned long capacity)
{
unsigned long max_perf, min_perf, des_perf,
- cap_perf, lowest_nonlinear_perf;
+ cap_perf, lowest_nonlinear_perf, max_freq;
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
struct amd_cpudata *cpudata = policy->driver_data;
+ unsigned int target_freq;
cap_perf = READ_ONCE(cpudata->highest_perf);
lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+ max_freq = READ_ONCE(cpudata->max_freq);
des_perf = cap_perf;
if (target_perf < capacity)
@@ -523,6 +525,10 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
if (max_perf < min_perf)
max_perf = min_perf;
+ des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
+ target_freq = div_u64(des_perf * max_freq, max_perf);
+ policy->cur = target_freq;
+
amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
policy->governor->flags);
cpufreq_cpu_put(policy);
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x 3bf8c6307bad5c0cc09cde982e146d847859b651
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052831-chest-shabby-7961@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
3bf8c6307bad ("cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf()")
2dd6d0ebf740 ("cpufreq: amd-pstate: Add guided autonomous mode")
3e6e07805764 ("Documentation: cpufreq: amd-pstate: Move amd_pstate param to alphabetical order")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3bf8c6307bad5c0cc09cde982e146d847859b651 Mon Sep 17 00:00:00 2001
From: Wyes Karny <wyes.karny(a)amd.com>
Date: Thu, 18 May 2023 05:58:19 +0000
Subject: [PATCH] cpufreq: amd-pstate: Update policy->cur in
amd_pstate_adjust_perf()
Driver should update policy->cur after updating the frequency.
Currently amd_pstate doesn't update policy->cur when `adjust_perf`
is used. Which causes /proc/cpuinfo to show wrong cpu frequency.
Fix this by updating policy->cur with correct frequency value in
adjust_perf function callback.
- Before the fix: (setting min freq to 1.5 MHz)
[root@amd]# cat /proc/cpuinfo | grep "cpu MHz" | sort | uniq --count
1 cpu MHz : 1777.016
1 cpu MHz : 1797.160
1 cpu MHz : 1797.270
189 cpu MHz : 400.000
- After the fix: (setting min freq to 1.5 MHz)
[root@amd]# cat /proc/cpuinfo | grep "cpu MHz" | sort | uniq --count
1 cpu MHz : 1753.353
1 cpu MHz : 1756.838
1 cpu MHz : 1776.466
1 cpu MHz : 1776.873
1 cpu MHz : 1777.308
1 cpu MHz : 1779.900
183 cpu MHz : 1805.231
1 cpu MHz : 1956.815
1 cpu MHz : 2246.203
1 cpu MHz : 2259.984
Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Signed-off-by: Wyes Karny <wyes.karny(a)amd.com>
[ rjw: Subject edits ]
Cc: 5.17+ <stable(a)vger.kernel.org> # 5.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ac54779f5a49..ddd346a239e0 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -501,12 +501,14 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
unsigned long capacity)
{
unsigned long max_perf, min_perf, des_perf,
- cap_perf, lowest_nonlinear_perf;
+ cap_perf, lowest_nonlinear_perf, max_freq;
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
struct amd_cpudata *cpudata = policy->driver_data;
+ unsigned int target_freq;
cap_perf = READ_ONCE(cpudata->highest_perf);
lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+ max_freq = READ_ONCE(cpudata->max_freq);
des_perf = cap_perf;
if (target_perf < capacity)
@@ -523,6 +525,10 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
if (max_perf < min_perf)
max_perf = min_perf;
+ des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
+ target_freq = div_u64(des_perf * max_freq, max_perf);
+ policy->cur = target_freq;
+
amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
policy->governor->flags);
cpufreq_cpu_put(policy);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x eb0764b822b9b26880b28ccb9100b2983e01bc17
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052837-carve-antics-2e83@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports")
7bba261e0aa6 ("cxl/port: Scan single-target ports for decoders")
a5fcd228ca1d ("Merge branch 'for-6.3/cxl-rr-emu' into cxl/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eb0764b822b9b26880b28ccb9100b2983e01bc17 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 May 2023 20:19:43 -0700
Subject: [PATCH] cxl/port: Enable the HDM decoder capability for switch ports
Derick noticed, when testing hot plug, that hot-add behaves nominally
after a removal. However, if the hot-add is done without a prior
removal, CXL.mem accesses fail. It turns out that the original
implementation of the port driver and region programming wrongly assumed
that platform-firmware always enables the host-bridge HDM decoder
capability. Add support turning on switch-level HDM decoders in the case
where platform-firmware has not.
The implementation is careful to only arrange for the enable to be
undone if the current instance of the driver was the one that did the
enable. This is to interoperate with platform-firmware that may expect
CXL.mem to remain active after the driver is shutdown. This comes at the
cost of potentially not shutting down the enable on kexec flows, but it
is mitigated by the fact that the related HDM decoders still need to be
enabled on an individual basis.
Cc: <stable(a)vger.kernel.org>
Reported-by: Derick Marks <derick.w.marks(a)intel.com>
Fixes: 54cdbf845cf7 ("cxl/port: Add a driver for 'struct cxl_port' objects")
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Link: https://lore.kernel.org/r/168437998331.403037.15719879757678389217.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index f332fe7af92b..c35c002b65dd 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -241,17 +241,36 @@ static void disable_hdm(void *_cxlhdm)
hdm + CXL_HDM_DECODER_CTRL_OFFSET);
}
-static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
+int devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm)
{
- void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+ void __iomem *hdm;
u32 global_ctrl;
+ /*
+ * If the hdm capability was not mapped there is nothing to enable and
+ * the caller is responsible for what happens next. For example,
+ * emulate a passthrough decoder.
+ */
+ if (IS_ERR(cxlhdm))
+ return 0;
+
+ hdm = cxlhdm->regs.hdm_decoder;
global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+ /*
+ * If the HDM decoder capability was enabled on entry, skip
+ * registering disable_hdm() since this decode capability may be
+ * owned by platform firmware.
+ */
+ if (global_ctrl & CXL_HDM_DECODER_ENABLE)
+ return 0;
+
writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
hdm + CXL_HDM_DECODER_CTRL_OFFSET);
- return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
+ return devm_add_action_or_reset(&port->dev, disable_hdm, cxlhdm);
}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_enable_hdm, CXL);
int cxl_dvsec_rr_decode(struct device *dev, int d,
struct cxl_endpoint_dvsec_info *info)
@@ -425,7 +444,7 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
if (info->mem_enabled)
return 0;
- rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
+ rc = devm_cxl_enable_hdm(port, cxlhdm);
if (rc)
return rc;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 044a92d9813e..f93a28538962 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -710,6 +710,7 @@ struct cxl_endpoint_dvsec_info {
struct cxl_hdm;
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info);
+int devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm);
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index eb57324c4ad4..17a95f469c26 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -60,13 +60,17 @@ static int discover_region(struct device *dev, void *root)
static int cxl_switch_port_probe(struct cxl_port *port)
{
struct cxl_hdm *cxlhdm;
- int rc;
+ int rc, nr_dports;
- rc = devm_cxl_port_enumerate_dports(port);
- if (rc < 0)
- return rc;
+ nr_dports = devm_cxl_port_enumerate_dports(port);
+ if (nr_dports < 0)
+ return nr_dports;
cxlhdm = devm_cxl_setup_hdm(port, NULL);
+ rc = devm_cxl_enable_hdm(port, cxlhdm);
+ if (rc)
+ return rc;
+
if (!IS_ERR(cxlhdm))
return devm_cxl_enumerate_decoders(cxlhdm, NULL);
@@ -75,7 +79,7 @@ static int cxl_switch_port_probe(struct cxl_port *port)
return PTR_ERR(cxlhdm);
}
- if (rc == 1) {
+ if (nr_dports == 1) {
dev_dbg(&port->dev, "Fallback to passthrough decoder\n");
return devm_cxl_add_passthrough_decoder(port);
}
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index fba7bec96acd..6f9347ade82c 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -6,6 +6,7 @@ ldflags-y += --wrap=acpi_pci_find_root
ldflags-y += --wrap=nvdimm_bus_register
ldflags-y += --wrap=devm_cxl_port_enumerate_dports
ldflags-y += --wrap=devm_cxl_setup_hdm
+ldflags-y += --wrap=devm_cxl_enable_hdm
ldflags-y += --wrap=devm_cxl_add_passthrough_decoder
ldflags-y += --wrap=devm_cxl_enumerate_decoders
ldflags-y += --wrap=cxl_await_media_ready
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index de3933a776fd..284416527644 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -149,6 +149,21 @@ struct cxl_hdm *__wrap_devm_cxl_setup_hdm(struct cxl_port *port,
}
EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_setup_hdm, CXL);
+int __wrap_devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm)
+{
+ int index, rc;
+ struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+ if (ops && ops->is_mock_port(port->uport))
+ rc = 0;
+ else
+ rc = devm_cxl_enable_hdm(port, cxlhdm);
+ put_cxl_mock_ops(index);
+
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_enable_hdm, CXL);
+
int __wrap_devm_cxl_add_passthrough_decoder(struct cxl_port *port)
{
int rc, index;
[Resending as a plain text email]
Turned out that this is a mixture of an ACPICA issue and an EFISTUB issue.
Kernel v6.2 can boot by reverting the *both* of the following two commits:
- 5c62d5aab8752e5ee7bfbe75ed6060db1c787f98 "ACPICA: Events: Support
fixed PCIe wake event"
- e346bebbd36b1576a3335331fed61bb48c6d8823 "efi: libstub: Always
enable initrd command line loader and bump version"
Kernel v6.3 can boot by just reverting e346bebb, as 5c62d5a has been
already reverted in 8e41e0a575664d26bb87e012c39435c4c3914ed9.
The situation is the same for v6.4-rc3 too.
Note that in my test I let Virtualization.framework directly load
bzImage without GRUB (akin to `qemu-system-x86_64 -kernel bzImage`).
Apparently, reverting e346bebb is not necessary for loading bzImage via GRUB.
> Also, the reporter can't provide dmesg log (forget to attach serial console?).
Uploaded v6.1 dmesg in the bugzilla.
v6.2 dmesg can't be provided, as it hangs before printing something in
console=hvc0.
(IIUC, console=ttyS0 (RS-232C) is not implemented in Virtualization.framework.)
> 2023年5月25日(木) 21:46 Bagas Sanjaya <bagasdotme(a)gmail.com>:
>>
>> Hi,
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>
>> > Linux kernel >= v6.2 no longer boots on Apple's Virtualization.framework (x86_64).
>> >
>> > It is reported that the issue is not reproducible on ARM64: https://github.com/lima-vm/lima/issues/1577#issuecomment-1561577694
>> >
>> >
>> > ## Reproduction
>> > - Checkout the kernel repo, and run `make defconfig bzImage`.
>> >
>> > - Create an initrd (see the attached `initrd-example.txt`)
>> >
>> > - Transfer the bzImage and initrd to an Intel Mac.
>> >
>> > - On Mac, download `RunningLinuxInAVirtualMachine.zip` from https://developer.apple.com/documentation/virtualization/running_linux_in_a… , and build the `LinuxVirtualMachine` binary with Xcode.
>> > Building this binary with Xcode requires logging in to Apple.
>> > If you do not like logging in, a third party equivalent such as https://github.com/Code-Hex/vz/blob/v3.0.6/example/linux/main.go can be used.
>> >
>> > - Run `LinuxVirtualMachine /tmp/bzImage /tmp/initrd.img`.
>> > v6.1 successfully boots into the busybox shell.
>> > v6.2 just hangs before printing something in the console.
>> >
>> >
>> > ## Tested versions
>> > ```
>> > v6.1: OK
>> > ...
>> > v6.1.0-rc2-00002-g60f2096b59bc (included in v6.2-rc1): OK
>> > v6.1.0-rc2-00003-g5c62d5aab875 (included in v6.2-rc1): NG <-- This commit caused a regression
>> > ...
>> > v6.2-rc1: NG
>> > ...
>> > v6.2: NG
>> > ...
>> > v6.3.0-rc7-00181-g8e41e0a57566 (included in v6.3): NG <-- Reverts 5c62d5aab875 but still NG
>> > ...
>> > v6.3: NG
>> > v6.4-rc3: NG
>> > ```
>> >
>> > Tested on MacBookPro 2020 (Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz) running macOS 13.4.
>> >
>> >
>> > The issue seems a regression in [5c62d5aab8752e5ee7bfbe75ed6060db1c787f98](https://git.kernel.org/pub/scm/li… "ACPICA: Events: Support fixed PCIe wake event".
>> >
>> > This commit was introduced in v6.2-rc1, and apparently reverted in v6.3 ([8e41e0a575664d26bb87e012c39435c4c3914ed9](https://git.kernel.org/pub/scm/li…).
>> > However, v6.3 and the latest v6.4-rc3 still don't boot.
>>
>> See bugzilla for the full thread.
>>
>> Interestingly, this regression still occurs despite the culprit is
>> reverted in 8e41e0a575664d ("Revert "ACPICA: Events: Support fixed
>> PCIe wake event""), so this (obviously) isn't wake-on-lan regression,
>> but rather early boot one.
>>
>> Also, the reporter can't provide dmesg log (forget to attach serial
>> console?).
>>
>> Anyway, I'm adding it to regzbot:
>>
>> #regzbot introduced: 5c62d5aab8752e https://bugzilla.kernel.org/show_bug.cgi?id=217485
>> #regzbot title: Linux v6.2+ (x86_64) no longer boots on Apple's Virtualization framework (ACPICA issue)
>>
>> Thanks.
>>
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217485
>>
>> --
>> An old man doll... just what I always wanted! - Clara
Hi Greg,
A regression in 6.3.0 has been identified in XFS that causes
filesystem corruption. It has been seen in the wild by a number of
users, and bisected down to an issued we'd already fixed in 6.4-rc1
with commit:
9419092fb263 ("xfs: fix livelock in delayed allocation at ENOSPC")
This was reported with much less harmful symptoms (alloctor
livelock) and it wasn't realised that it could have other, more
impactful symptoms. A reproducer for the corruption was found
yesterday and, soon after than, the cause of the corruption reports
was identified.
The commit applies cleanly to a 6.3.0 kernel here, so it should also
apply cleanly to a current 6.3.x kernel. I've included the entire
commit below in case that's easier for you.
Can you please pull this commit into the next 6.3.x release as a
matter of priority?
Cheers,
Dave.
--
Dave Chinner
david(a)fromorbit.com
xfs: fix livelock in delayed allocation at ENOSPC
From: Dave Chinner <dchinner(a)redhat.com>
On a filesystem with a non-zero stripe unit and a large sequential
write, delayed allocation will set a minimum allocation length of
the stripe unit. If allocation fails because there are no extents
long enough for an aligned minlen allocation, it is supposed to
fall back to unaligned allocation which allows single block extents
to be allocated.
When the allocator code was rewritting in the 6.3 cycle, this
fallback was broken - the old code used args->fsbno as the both the
allocation target and the allocation result, the new code passes the
target as a separate parameter. The conversion didn't handle the
aligned->unaligned fallback path correctly - it reset args->fsbno to
the target fsbno on failure which broke allocation failure detection
in the high level code and so it never fell back to unaligned
allocations.
This resulted in a loop in writeback trying to allocate an aligned
block, getting a false positive success, trying to insert the result
in the BMBT. This did nothing because the extent already was in the
BMBT (merge results in an unchanged extent) and so it returned the
prior extent to the conversion code as the current iomap.
Because the iomap returned didn't cover the offset we tried to map,
xfs_convert_blocks() then retries the allocation, which fails in the
same way and now we have a livelock.
Reported-and-tested-by: Brian Foster <bfoster(a)redhat.com>
Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
---
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1a4e446194dd..b512de0540d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3540,7 +3540,6 @@ xfs_bmap_btalloc_at_eof(
* original non-aligned state so the caller can proceed on allocation
* failure as if this function was never called.
*/
- args->fsbno = ap->blkno;
args->alignment = 1;
return 0;
}
Greetings.
I am Ms Nadage Lassou,I have something important to discuss with you.
i will send you the details once i hear from you.
Thanks,
Ms Nadage Lassou
Western Union Money Transfer
Republic of Burkina Faso
Address: CG46+G6G, Somgandé, Ouagadougou, Burkina Faso
website: www.westernunion.com
Money Available For Pick Up,
Western Union Money Transfer in conjunction with the United Nation during
their recently concluded meeting with the International Monetary Fund (IMF)
Payment Reconciliation Board Committee agreed to pay you a total amount of
($350.000.00) Three Hundred and Fifty Thousand US Dollars Only
compensation fund. The awarded compensation amount has been signed under
giving credence ensuring your payment transaction is devoid of any form of
illegality to be paid to you for the money you lost to scammers through
internet frauds.
Your name appeared on our payment schedule list 2023 of beneficiaries that
will receive their funds in this 2nd Quarter Payment of the year 2023. The
sum of $5,000.00 daily sending or its equivalent in your local currency
will be transferred to you via Western Union every day until all the
($350.000.00) Three Hundred and Fifty Thousand US Dollars Only is
completely sent across to you.
Meanwhile, we have registered your First Payment Transfer Sum $5.000 USD
online through Western Union Money Transfer. For you to confirm your First
Payment Transfer please follow the instructions. Open this website:
www.westernunion.com, then click "Track Transfer", then enter this Tracking
Number (MTCN) 9361788041 and click "Track Transfer" to confirm you will see
the status of the transaction online.
Below are the details you need to present to the Western Union officials.
Also go with your ID Card.
MTCN: 9361788041
Sender Names: Prince Maxwell Okoronkwo
Sender City: Ouagadougou
Sender Country: Burkina Faso
Amount sent: $5.000 Dollars
Money Transfer | Global Money Transfer | Western Union Go to any Western
Union Office in your area and pick it up. Make sure you go with an
Identification of yourself like a Driving License or National ID or
International Passport.
Sincerely,
Ms Omarh Waleed.
Send Money Worldwide Registered C 2007 - 2023 Western Union Money Transfer.
All Right Reserved. This email is intended for the named recipient only and
may contain privileged and confidential information. If you have received
this email in error, please notify us immediately. Please do not disclose
the contents to anyone or copy it to outside parties. Our website: http.
westernunion.com. Message from Western Union Aspect of your payment Burkina
Faso
On some arch (for example IPQ8074) and other with
KERNEL_STACKPROTECTOR_STRONG enabled, the following compilation error is
triggered:
drivers/net/wireguard/allowedips.c: In function 'root_remove_peer_lists':
drivers/net/wireguard/allowedips.c:80:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
80 | }
| ^
drivers/net/wireguard/allowedips.c: In function 'root_free_rcu':
drivers/net/wireguard/allowedips.c:67:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
67 | }
| ^
cc1: all warnings being treated as errors
Since these are free function and returns void, using function that can
fail is not ideal since an error would result in data not freed.
Since the free are under RCU lock, we can allocate the required stack
array as static outside the function and memset when needed.
This effectively fix the stack frame warning without changing how the
function work.
Fixes: Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/net/wireguard/allowedips.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
index 5bf7822c53f1..c129082f04c6 100644
--- a/drivers/net/wireguard/allowedips.c
+++ b/drivers/net/wireguard/allowedips.c
@@ -53,12 +53,16 @@ static void node_free_rcu(struct rcu_head *rcu)
kmem_cache_free(node_cache, container_of(rcu, struct allowedips_node, rcu));
}
+static struct allowedips_node *tmpstack[MAX_ALLOWEDIPS_BITS];
+
static void root_free_rcu(struct rcu_head *rcu)
{
- struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = {
- container_of(rcu, struct allowedips_node, rcu) };
+ struct allowedips_node *node, **stack = tmpstack;
unsigned int len = 1;
+ memset(stack, 0, sizeof(*stack) * MAX_ALLOWEDIPS_BITS);
+ stack[0] = container_of(rcu, struct allowedips_node, rcu);
+
while (len > 0 && (node = stack[--len])) {
push_rcu(stack, node->bit[0], &len);
push_rcu(stack, node->bit[1], &len);
@@ -68,9 +72,12 @@ static void root_free_rcu(struct rcu_head *rcu)
static void root_remove_peer_lists(struct allowedips_node *root)
{
- struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = { root };
+ struct allowedips_node *node, **stack = tmpstack;
unsigned int len = 1;
+ memset(stack, 0, sizeof(*stack) * MAX_ALLOWEDIPS_BITS);
+ stack[0] = root;
+
while (len > 0 && (node = stack[--len])) {
push_rcu(stack, node->bit[0], &len);
push_rcu(stack, node->bit[1], &len);
--
2.39.2
Hello dear
Do you have the passion for humanitarian welfare?
Can you devote your time and be totally committed and devoted
to run multi-million pounds humanitarian charity project sponsored
totally by me; with an incentive/compensation accrual to you for
your time and effort and at no cost to you.
If interested, reply me for the full details
Thanks & regards
Les Scadding.
� ��������� � ��� � ������ �������!
�������� ����� ����������
���������-�������� ��� "��������������"
���.: +7 (48444) 6-91-69 ���. 495
e-mail: info(a)ludinovocable.ru
�������������� � ������ ��������
Commit ebeb20a9cd3f ("soc: qcom: mdt_loader: Always invoke PAS
mem_setup") dropped the relocate check and made pas_mem_setup run
unconditionally. The code was later moved with commit f4e526ff7e38
("soc: qcom: mdt_loader: Extract PAS operations") to
qcom_mdt_pas_init() effectively losing track of what was actually
done.
The assumption that PAS mem_setup can be done anytime was effectively
wrong, with no good reason and this caused regression on some SoC
that use remoteproc to bringup ath11k. One example is IPQ8074 SoC that
effectively broke resulting in remoteproc silently die and ath11k not
working.
On this SoC FW relocate is not enabled and PAS mem_setup was correctly
skipped in previous kernel version resulting in correct bringup and
function of remoteproc and ath11k.
To fix the regression, reintroduce the relocate check in
qcom_mdt_pas_init() and correctly skip PAS mem_setup where relocate is
not enabled.
Fixes: ebeb20a9cd3f ("soc: qcom: mdt_loader: Always invoke PAS mem_setup")
Tested-by: Robert Marko <robimarko(a)gmail.com>
Co-developed-by: Robert Marko <robimarko(a)gmail.com>
Signed-off-by: Robert Marko <robimarko(a)gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/soc/qcom/mdt_loader.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/soc/qcom/mdt_loader.c b/drivers/soc/qcom/mdt_loader.c
index 33dd8c315eb7..46820bcdae98 100644
--- a/drivers/soc/qcom/mdt_loader.c
+++ b/drivers/soc/qcom/mdt_loader.c
@@ -210,6 +210,7 @@ int qcom_mdt_pas_init(struct device *dev, const struct firmware *fw,
const struct elf32_hdr *ehdr;
phys_addr_t min_addr = PHYS_ADDR_MAX;
phys_addr_t max_addr = 0;
+ bool relocate = false;
size_t metadata_len;
void *metadata;
int ret;
@@ -224,6 +225,9 @@ int qcom_mdt_pas_init(struct device *dev, const struct firmware *fw,
if (!mdt_phdr_valid(phdr))
continue;
+ if (phdr->p_flags & QCOM_MDT_RELOCATABLE)
+ relocate = true;
+
if (phdr->p_paddr < min_addr)
min_addr = phdr->p_paddr;
@@ -246,11 +250,13 @@ int qcom_mdt_pas_init(struct device *dev, const struct firmware *fw,
goto out;
}
- ret = qcom_scm_pas_mem_setup(pas_id, mem_phys, max_addr - min_addr);
- if (ret) {
- /* Unable to set up relocation */
- dev_err(dev, "error %d setting up firmware %s\n", ret, fw_name);
- goto out;
+ if (relocate) {
+ ret = qcom_scm_pas_mem_setup(pas_id, mem_phys, max_addr - min_addr);
+ if (ret) {
+ /* Unable to set up relocation */
+ dev_err(dev, "error %d setting up firmware %s\n", ret, fw_name);
+ goto out;
+ }
}
out:
--
2.39.2
If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses
in damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide
error, let's validate the values of them in damon_set_attrs() to fix it,
which similar to others attrs check.
Reported-by: syzbot+841a46899768ec7bec67(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=841a46899768ec7bec67
Link: https://lore.kernel.org/damon/00000000000055fc4e05fc975bc2@google.com/
Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes")
Cc: <stable(a)vger.kernel.org> # 6.3.x-
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
---
v2: close checkpatch warning, add RB/cc stable, per SJ
mm/damon/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index d9ef62047bf5..91cff7f2997e 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -551,6 +551,8 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs)
return -EINVAL;
if (attrs->min_nr_regions > attrs->max_nr_regions)
return -EINVAL;
+ if (attrs->sample_interval > attrs->aggr_interval)
+ return -EINVAL;
damon_update_monitoring_results(ctx, attrs);
ctx->attrs = *attrs;
--
2.35.3
Libuv recently started using it so there is at least one consumer now.
Cc: stable(a)vger.kernel.org
Fixes: 61a2732af4b0 ("io_uring: deprecate epoll_ctl support")
Link: https://github.com/libuv/libuv/pull/3979
Signed-off-by: Ben Noordhuis <info(a)bnoordhuis.nl>
---
io_uring/epoll.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 9aa74d2c80bc..89bff2068a19 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -25,10 +25,6 @@ int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_epoll *epoll = io_kiocb_to_cmd(req, struct io_epoll);
- pr_warn_once("%s: epoll_ctl support in io_uring is deprecated and will "
- "be removed in a future Linux kernel version.\n",
- current->comm);
-
if (sqe->buf_index || sqe->splice_fd_in)
return -EINVAL;
--
2.39.2
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x ce0b15d11ad837fbacc5356941712218e38a0a83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052616-audibly-grinning-73b4@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
ce0b15d11ad8 ("x86/mm: Avoid incomplete Global INVLPG flushes")
6cff64b86aaa ("x86/mm: Use INVPCID for __native_flush_tlb_single()")
6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
48e111982cda ("x86/mm: Abstract switching CR3")
2ea907c4fe7b ("x86/mm: Allow flushing for future ASID switches")
aa8c6248f8c7 ("x86/mm/pti: Add infrastructure for page table isolation")
8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching")
613e396bc0d4 ("init: Invoke init_espfix_bsp() from mm_init()")
1a3b0caeb77e ("x86/mm: Create asm/invpcid.h")
dd95f1a4b5ca ("x86/mm: Put MMU to hardware ASID translation in one place")
cb0a9144a744 ("x86/mm: Remove hard-coded ASID limit checks")
50fb83a62cf4 ("x86/mm: Move the CR3 construction functions to tlbflush.h")
3f67af51e56f ("x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what")
23cb7d46f371 ("x86/microcode: Dont abuse the TLB-flush interface")
c482feefe1ae ("x86/entry/64: Make cpu_entry_area.tss read-only")
0f9a48100fba ("x86/entry: Clean up the SYSENTER_stack code")
7fbbd5cbebf1 ("x86/entry/64: Remove the SYSENTER stack canary")
40e7f949e0d9 ("x86/entry/64: Move the IST stacks into struct cpu_entry_area")
3386bc8aed82 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
3e3b9293d392 ("x86/entry/64: Return to userspace from the trampoline stack")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce0b15d11ad837fbacc5356941712218e38a0a83 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 16 May 2023 12:24:25 -0700
Subject: [PATCH] x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3cdac0f0055d..8192452d1d2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -9,6 +9,7 @@
#include <linux/sched/task.h>
#include <asm/set_memory.h>
+#include <asm/cpu_device_id.h>
#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/page.h>
@@ -261,6 +262,24 @@ static void __init probe_page_size_mask(void)
}
}
+#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
+ .family = 6, \
+ .model = _model, \
+ }
+/*
+ * INVLPG may not properly flush Global entries
+ * on these CPUs when PCIDs are enabled.
+ */
+static const struct x86_cpu_id invlpg_miss_ids[] = {
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ {}
+};
+
static void setup_pcid(void)
{
if (!IS_ENABLED(CONFIG_X86_64))
@@ -269,6 +288,12 @@ static void setup_pcid(void)
if (!boot_cpu_has(X86_FEATURE_PCID))
return;
+ if (x86_match_cpu(invlpg_miss_ids)) {
+ pr_info("Incomplete global flushes, disabling PCID");
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ return;
+ }
+
if (boot_cpu_has(X86_FEATURE_PGE)) {
/*
* This can't be cr4_set_bits_and_update_boot() -- the
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ce0b15d11ad837fbacc5356941712218e38a0a83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052615-manila-armoire-d077@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ce0b15d11ad8 ("x86/mm: Avoid incomplete Global INVLPG flushes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce0b15d11ad837fbacc5356941712218e38a0a83 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 16 May 2023 12:24:25 -0700
Subject: [PATCH] x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3cdac0f0055d..8192452d1d2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -9,6 +9,7 @@
#include <linux/sched/task.h>
#include <asm/set_memory.h>
+#include <asm/cpu_device_id.h>
#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/page.h>
@@ -261,6 +262,24 @@ static void __init probe_page_size_mask(void)
}
}
+#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
+ .family = 6, \
+ .model = _model, \
+ }
+/*
+ * INVLPG may not properly flush Global entries
+ * on these CPUs when PCIDs are enabled.
+ */
+static const struct x86_cpu_id invlpg_miss_ids[] = {
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ {}
+};
+
static void setup_pcid(void)
{
if (!IS_ENABLED(CONFIG_X86_64))
@@ -269,6 +288,12 @@ static void setup_pcid(void)
if (!boot_cpu_has(X86_FEATURE_PCID))
return;
+ if (x86_match_cpu(invlpg_miss_ids)) {
+ pr_info("Incomplete global flushes, disabling PCID");
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ return;
+ }
+
if (boot_cpu_has(X86_FEATURE_PGE)) {
/*
* This can't be cr4_set_bits_and_update_boot() -- the
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ce0b15d11ad837fbacc5356941712218e38a0a83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052614-ascertain-reshuffle-d127@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ce0b15d11ad8 ("x86/mm: Avoid incomplete Global INVLPG flushes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce0b15d11ad837fbacc5356941712218e38a0a83 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 16 May 2023 12:24:25 -0700
Subject: [PATCH] x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3cdac0f0055d..8192452d1d2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -9,6 +9,7 @@
#include <linux/sched/task.h>
#include <asm/set_memory.h>
+#include <asm/cpu_device_id.h>
#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/page.h>
@@ -261,6 +262,24 @@ static void __init probe_page_size_mask(void)
}
}
+#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
+ .family = 6, \
+ .model = _model, \
+ }
+/*
+ * INVLPG may not properly flush Global entries
+ * on these CPUs when PCIDs are enabled.
+ */
+static const struct x86_cpu_id invlpg_miss_ids[] = {
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ {}
+};
+
static void setup_pcid(void)
{
if (!IS_ENABLED(CONFIG_X86_64))
@@ -269,6 +288,12 @@ static void setup_pcid(void)
if (!boot_cpu_has(X86_FEATURE_PCID))
return;
+ if (x86_match_cpu(invlpg_miss_ids)) {
+ pr_info("Incomplete global flushes, disabling PCID");
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ return;
+ }
+
if (boot_cpu_has(X86_FEATURE_PGE)) {
/*
* This can't be cr4_set_bits_and_update_boot() -- the
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ce0b15d11ad837fbacc5356941712218e38a0a83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052613-galore-flame-b5de@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ce0b15d11ad8 ("x86/mm: Avoid incomplete Global INVLPG flushes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce0b15d11ad837fbacc5356941712218e38a0a83 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 16 May 2023 12:24:25 -0700
Subject: [PATCH] x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3cdac0f0055d..8192452d1d2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -9,6 +9,7 @@
#include <linux/sched/task.h>
#include <asm/set_memory.h>
+#include <asm/cpu_device_id.h>
#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/page.h>
@@ -261,6 +262,24 @@ static void __init probe_page_size_mask(void)
}
}
+#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
+ .family = 6, \
+ .model = _model, \
+ }
+/*
+ * INVLPG may not properly flush Global entries
+ * on these CPUs when PCIDs are enabled.
+ */
+static const struct x86_cpu_id invlpg_miss_ids[] = {
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ {}
+};
+
static void setup_pcid(void)
{
if (!IS_ENABLED(CONFIG_X86_64))
@@ -269,6 +288,12 @@ static void setup_pcid(void)
if (!boot_cpu_has(X86_FEATURE_PCID))
return;
+ if (x86_match_cpu(invlpg_miss_ids)) {
+ pr_info("Incomplete global flushes, disabling PCID");
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ return;
+ }
+
if (boot_cpu_has(X86_FEATURE_PGE)) {
/*
* This can't be cr4_set_bits_and_update_boot() -- the
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 81302b1c7c997e8a56c1c2fc63a296ebeb0cd2d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052649-kilt-concert-cd8a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
81302b1c7c99 ("ALSA: hda: Fix unhandled register update during auto-suspend period")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81302b1c7c997e8a56c1c2fc63a296ebeb0cd2d0 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Thu, 18 May 2023 13:35:20 +0200
Subject: [PATCH] ALSA: hda: Fix unhandled register update during auto-suspend
period
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
It's reported that the recording started right after the driver probe
doesn't work properly, and it turned out that this is related with the
codec auto-suspend. Namely, after the probe phase, the usage count
goes zero, and the auto-suspend is programmed, but the codec is kept
still active until the auto-suspend expiration. When an application
(e.g. alsactl) updates the mixer values at this moment, the values are
cached but not actually written. Then, starting arecord thereafter
also results in the silence because of the missing unmute.
The root cause is the handling of "lazy update" mode; when a mixer
value is updated *after* the suspend, it should update only the cache
and exits. At the resume, the cached value is written to the device,
in turn. The problem is that the current code misinterprets the state
of auto-suspend as if it were already suspended.
Although we can add the check of the actual device state after
pm_runtime_get_if_in_use() for catching the missing state, this won't
suffice; the second call of regmap_update_bits_check() will skip
writing the register because the cache has been already updated by the
first call. So we'd need fixes in two different places.
OTOH, a simpler fix is to replace pm_runtime_get_if_in_use() with
pm_runtime_get_if_active() (with ign_usage_count=true). This change
implies that the driver takes the pm refcount if the device is still
in ACTIVE state and continues the processing. A small caveat is that
this will leave the auto-suspend timer. But, since the timer callback
itself checks the device state and aborts gracefully when it's active,
this won't be any substantial problem.
Long story short: we address the missing register-write problem just
by replacing the pm_runtime_*() call in snd_hda_keep_power_up().
Fixes: fc4f000bf8c0 ("ALSA: hda - Fix unexpected resume through regmap code path")
Reported-by: Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
Closes: https://lore.kernel.org/r/a7478636-af11-92ab-731c-9b13c582a70d@linux.intel.…
Suggested-by: Cezary Rojewski <cezary.rojewski(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230518113520.15213-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/hda/hdac_device.c b/sound/hda/hdac_device.c
index accc9d279ce5..6c043fbd606f 100644
--- a/sound/hda/hdac_device.c
+++ b/sound/hda/hdac_device.c
@@ -611,7 +611,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_power_up_pm);
int snd_hdac_keep_power_up(struct hdac_device *codec)
{
if (!atomic_inc_not_zero(&codec->in_pm)) {
- int ret = pm_runtime_get_if_in_use(&codec->dev);
+ int ret = pm_runtime_get_if_active(&codec->dev, true);
if (!ret)
return -1;
if (ret < 0)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 81302b1c7c997e8a56c1c2fc63a296ebeb0cd2d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052647-had-steering-38d3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
81302b1c7c99 ("ALSA: hda: Fix unhandled register update during auto-suspend period")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81302b1c7c997e8a56c1c2fc63a296ebeb0cd2d0 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Thu, 18 May 2023 13:35:20 +0200
Subject: [PATCH] ALSA: hda: Fix unhandled register update during auto-suspend
period
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
It's reported that the recording started right after the driver probe
doesn't work properly, and it turned out that this is related with the
codec auto-suspend. Namely, after the probe phase, the usage count
goes zero, and the auto-suspend is programmed, but the codec is kept
still active until the auto-suspend expiration. When an application
(e.g. alsactl) updates the mixer values at this moment, the values are
cached but not actually written. Then, starting arecord thereafter
also results in the silence because of the missing unmute.
The root cause is the handling of "lazy update" mode; when a mixer
value is updated *after* the suspend, it should update only the cache
and exits. At the resume, the cached value is written to the device,
in turn. The problem is that the current code misinterprets the state
of auto-suspend as if it were already suspended.
Although we can add the check of the actual device state after
pm_runtime_get_if_in_use() for catching the missing state, this won't
suffice; the second call of regmap_update_bits_check() will skip
writing the register because the cache has been already updated by the
first call. So we'd need fixes in two different places.
OTOH, a simpler fix is to replace pm_runtime_get_if_in_use() with
pm_runtime_get_if_active() (with ign_usage_count=true). This change
implies that the driver takes the pm refcount if the device is still
in ACTIVE state and continues the processing. A small caveat is that
this will leave the auto-suspend timer. But, since the timer callback
itself checks the device state and aborts gracefully when it's active,
this won't be any substantial problem.
Long story short: we address the missing register-write problem just
by replacing the pm_runtime_*() call in snd_hda_keep_power_up().
Fixes: fc4f000bf8c0 ("ALSA: hda - Fix unexpected resume through regmap code path")
Reported-by: Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
Closes: https://lore.kernel.org/r/a7478636-af11-92ab-731c-9b13c582a70d@linux.intel.…
Suggested-by: Cezary Rojewski <cezary.rojewski(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230518113520.15213-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/hda/hdac_device.c b/sound/hda/hdac_device.c
index accc9d279ce5..6c043fbd606f 100644
--- a/sound/hda/hdac_device.c
+++ b/sound/hda/hdac_device.c
@@ -611,7 +611,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_power_up_pm);
int snd_hdac_keep_power_up(struct hdac_device *codec)
{
if (!atomic_inc_not_zero(&codec->in_pm)) {
- int ret = pm_runtime_get_if_in_use(&codec->dev);
+ int ret = pm_runtime_get_if_active(&codec->dev, true);
if (!ret)
return -1;
if (ret < 0)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 81302b1c7c997e8a56c1c2fc63a296ebeb0cd2d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052628-overfull-secular-50e4@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
81302b1c7c99 ("ALSA: hda: Fix unhandled register update during auto-suspend period")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81302b1c7c997e8a56c1c2fc63a296ebeb0cd2d0 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Thu, 18 May 2023 13:35:20 +0200
Subject: [PATCH] ALSA: hda: Fix unhandled register update during auto-suspend
period
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
It's reported that the recording started right after the driver probe
doesn't work properly, and it turned out that this is related with the
codec auto-suspend. Namely, after the probe phase, the usage count
goes zero, and the auto-suspend is programmed, but the codec is kept
still active until the auto-suspend expiration. When an application
(e.g. alsactl) updates the mixer values at this moment, the values are
cached but not actually written. Then, starting arecord thereafter
also results in the silence because of the missing unmute.
The root cause is the handling of "lazy update" mode; when a mixer
value is updated *after* the suspend, it should update only the cache
and exits. At the resume, the cached value is written to the device,
in turn. The problem is that the current code misinterprets the state
of auto-suspend as if it were already suspended.
Although we can add the check of the actual device state after
pm_runtime_get_if_in_use() for catching the missing state, this won't
suffice; the second call of regmap_update_bits_check() will skip
writing the register because the cache has been already updated by the
first call. So we'd need fixes in two different places.
OTOH, a simpler fix is to replace pm_runtime_get_if_in_use() with
pm_runtime_get_if_active() (with ign_usage_count=true). This change
implies that the driver takes the pm refcount if the device is still
in ACTIVE state and continues the processing. A small caveat is that
this will leave the auto-suspend timer. But, since the timer callback
itself checks the device state and aborts gracefully when it's active,
this won't be any substantial problem.
Long story short: we address the missing register-write problem just
by replacing the pm_runtime_*() call in snd_hda_keep_power_up().
Fixes: fc4f000bf8c0 ("ALSA: hda - Fix unexpected resume through regmap code path")
Reported-by: Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
Closes: https://lore.kernel.org/r/a7478636-af11-92ab-731c-9b13c582a70d@linux.intel.…
Suggested-by: Cezary Rojewski <cezary.rojewski(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230518113520.15213-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/hda/hdac_device.c b/sound/hda/hdac_device.c
index accc9d279ce5..6c043fbd606f 100644
--- a/sound/hda/hdac_device.c
+++ b/sound/hda/hdac_device.c
@@ -611,7 +611,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_power_up_pm);
int snd_hdac_keep_power_up(struct hdac_device *codec)
{
if (!atomic_inc_not_zero(&codec->in_pm)) {
- int ret = pm_runtime_get_if_in_use(&codec->dev);
+ int ret = pm_runtime_get_if_active(&codec->dev, true);
if (!ret)
return -1;
if (ret < 0)
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereof causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 05ed84c2fc4c..35417e0af3f4 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -353,16 +353,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -373,7 +383,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -384,9 +395,7 @@ static int test_dev_config_update_size_t(const char *buf,
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -402,7 +411,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -411,14 +420,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -471,10 +489,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -518,10 +536,10 @@ static ssize_t config_buf_size_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -548,10 +566,10 @@ static ssize_t config_file_offset_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.30.2
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 3632679d9e4f879f49949bb5b050e0de553e4739
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052637-smudge-shucking-ac36@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
91d0b78c5177 ("inet: Add IP_LOCAL_PORT_RANGE socket option")
28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
d2c135619cb8 ("inet: add READ_ONCE(sk->sk_bound_dev_if) in inet_csk_bind_conflict()")
ca7af0402550 ("tcp: add small random increments to the source port")
ffa84b5ffb37 ("net: add netns refcount tracker to struct sock")
938cca9e4109 ("sock: fix /proc/net/sockstat underflow in sk_clone_lock()")
990c74e3f41d ("memcg: enable accounting for inet_bin_bucket cache")
333bb73f620e ("tcp: Keep TCP_CLOSE sockets in the reuseport group.")
5c040eaf5d17 ("tcp: Add num_closed_socks to struct sock_reuseport.")
c579bd1b4021 ("tcp: add some entropy in __inet_hash_connect()")
190cc82489f4 ("tcp: change source port randomizarion at connect() time")
bbc20b70424a ("net: reduce indentation level in sk_clone_lock()")
62ffc589abb1 ("net: refactor bind_bucket fastreuse into helper")
47ec5303d73e ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3632679d9e4f879f49949bb5b050e0de553e4739 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Date: Mon, 22 May 2023 14:08:20 +0200
Subject: [PATCH] ipv{4,6}/raw: fix output xfrm lookup wrt protocol
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..acec504c469a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -76,6 +76,7 @@ struct ipcm_cookie {
__be32 addr;
int oif;
struct ip_options_rcu *opt;
+ __u8 protocol;
__u8 ttl;
__s16 tos;
char priority;
@@ -96,6 +97,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
ipcm->sockc.tsflags = inet->sk.sk_tsflags;
ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
ipcm->addr = inet->inet_saddr;
+ ipcm->protocol = inet->inet_num;
}
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 4b7f2df66b99..e682ab628dfa 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -163,6 +163,7 @@ struct in_addr {
#define IP_MULTICAST_ALL 49
#define IP_UNICAST_IF 50
#define IP_LOCAL_PORT_RANGE 51
+#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0
#define MCAST_INCLUDE 1
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b511ff0adc0a..8e97d8d4cc9d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
ipc->tos = val;
ipc->priority = rt_tos2priority(ipc->tos);
break;
-
+ case IP_PROTOCOL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->protocol = val;
+ break;
default:
return -EINVAL;
}
@@ -1761,6 +1768,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_LOCAL_PORT_RANGE:
val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
break;
+ case IP_PROTOCOL:
+ val = inet_sk(sk)->inet_num;
+ break;
default:
sockopt_release_sock(sk);
return -ENOPROTOOPT;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff712bf2a98d..eadf1c9ef7e4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -532,6 +532,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
ipcm_init_sk(&ipc, inet);
+ /* Keep backward compat */
+ if (hdrincl)
+ ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) {
err = ip_cmsg_send(sk, msg, &ipc, false);
@@ -599,7 +602,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
RT_SCOPE_UNIVERSE,
- hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? ipc.protocol : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d0adb612bdd..44ee7a2e72ac 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -793,7 +793,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!proto)
proto = inet->inet_num;
- else if (proto != inet->inet_num)
+ else if (proto != inet->inet_num &&
+ inet->inet_num != IPPROTO_RAW)
return -EINVAL;
if (proto > 255)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 3632679d9e4f879f49949bb5b050e0de553e4739
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052636-scrounger-dirtiness-37d2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
91d0b78c5177 ("inet: Add IP_LOCAL_PORT_RANGE socket option")
28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
d2c135619cb8 ("inet: add READ_ONCE(sk->sk_bound_dev_if) in inet_csk_bind_conflict()")
ca7af0402550 ("tcp: add small random increments to the source port")
ffa84b5ffb37 ("net: add netns refcount tracker to struct sock")
938cca9e4109 ("sock: fix /proc/net/sockstat underflow in sk_clone_lock()")
990c74e3f41d ("memcg: enable accounting for inet_bin_bucket cache")
333bb73f620e ("tcp: Keep TCP_CLOSE sockets in the reuseport group.")
5c040eaf5d17 ("tcp: Add num_closed_socks to struct sock_reuseport.")
c579bd1b4021 ("tcp: add some entropy in __inet_hash_connect()")
190cc82489f4 ("tcp: change source port randomizarion at connect() time")
bbc20b70424a ("net: reduce indentation level in sk_clone_lock()")
62ffc589abb1 ("net: refactor bind_bucket fastreuse into helper")
47ec5303d73e ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3632679d9e4f879f49949bb5b050e0de553e4739 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Date: Mon, 22 May 2023 14:08:20 +0200
Subject: [PATCH] ipv{4,6}/raw: fix output xfrm lookup wrt protocol
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..acec504c469a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -76,6 +76,7 @@ struct ipcm_cookie {
__be32 addr;
int oif;
struct ip_options_rcu *opt;
+ __u8 protocol;
__u8 ttl;
__s16 tos;
char priority;
@@ -96,6 +97,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
ipcm->sockc.tsflags = inet->sk.sk_tsflags;
ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
ipcm->addr = inet->inet_saddr;
+ ipcm->protocol = inet->inet_num;
}
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 4b7f2df66b99..e682ab628dfa 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -163,6 +163,7 @@ struct in_addr {
#define IP_MULTICAST_ALL 49
#define IP_UNICAST_IF 50
#define IP_LOCAL_PORT_RANGE 51
+#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0
#define MCAST_INCLUDE 1
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b511ff0adc0a..8e97d8d4cc9d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
ipc->tos = val;
ipc->priority = rt_tos2priority(ipc->tos);
break;
-
+ case IP_PROTOCOL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->protocol = val;
+ break;
default:
return -EINVAL;
}
@@ -1761,6 +1768,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_LOCAL_PORT_RANGE:
val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
break;
+ case IP_PROTOCOL:
+ val = inet_sk(sk)->inet_num;
+ break;
default:
sockopt_release_sock(sk);
return -ENOPROTOOPT;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff712bf2a98d..eadf1c9ef7e4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -532,6 +532,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
ipcm_init_sk(&ipc, inet);
+ /* Keep backward compat */
+ if (hdrincl)
+ ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) {
err = ip_cmsg_send(sk, msg, &ipc, false);
@@ -599,7 +602,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
RT_SCOPE_UNIVERSE,
- hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? ipc.protocol : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d0adb612bdd..44ee7a2e72ac 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -793,7 +793,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!proto)
proto = inet->inet_num;
- else if (proto != inet->inet_num)
+ else if (proto != inet->inet_num &&
+ inet->inet_num != IPPROTO_RAW)
return -EINVAL;
if (proto > 255)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3632679d9e4f879f49949bb5b050e0de553e4739
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052635-styling-unbutton-ac91@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
91d0b78c5177 ("inet: Add IP_LOCAL_PORT_RANGE socket option")
28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
d2c135619cb8 ("inet: add READ_ONCE(sk->sk_bound_dev_if) in inet_csk_bind_conflict()")
ca7af0402550 ("tcp: add small random increments to the source port")
ffa84b5ffb37 ("net: add netns refcount tracker to struct sock")
938cca9e4109 ("sock: fix /proc/net/sockstat underflow in sk_clone_lock()")
990c74e3f41d ("memcg: enable accounting for inet_bin_bucket cache")
333bb73f620e ("tcp: Keep TCP_CLOSE sockets in the reuseport group.")
5c040eaf5d17 ("tcp: Add num_closed_socks to struct sock_reuseport.")
c579bd1b4021 ("tcp: add some entropy in __inet_hash_connect()")
190cc82489f4 ("tcp: change source port randomizarion at connect() time")
bbc20b70424a ("net: reduce indentation level in sk_clone_lock()")
62ffc589abb1 ("net: refactor bind_bucket fastreuse into helper")
47ec5303d73e ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3632679d9e4f879f49949bb5b050e0de553e4739 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Date: Mon, 22 May 2023 14:08:20 +0200
Subject: [PATCH] ipv{4,6}/raw: fix output xfrm lookup wrt protocol
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..acec504c469a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -76,6 +76,7 @@ struct ipcm_cookie {
__be32 addr;
int oif;
struct ip_options_rcu *opt;
+ __u8 protocol;
__u8 ttl;
__s16 tos;
char priority;
@@ -96,6 +97,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
ipcm->sockc.tsflags = inet->sk.sk_tsflags;
ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
ipcm->addr = inet->inet_saddr;
+ ipcm->protocol = inet->inet_num;
}
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 4b7f2df66b99..e682ab628dfa 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -163,6 +163,7 @@ struct in_addr {
#define IP_MULTICAST_ALL 49
#define IP_UNICAST_IF 50
#define IP_LOCAL_PORT_RANGE 51
+#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0
#define MCAST_INCLUDE 1
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b511ff0adc0a..8e97d8d4cc9d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
ipc->tos = val;
ipc->priority = rt_tos2priority(ipc->tos);
break;
-
+ case IP_PROTOCOL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->protocol = val;
+ break;
default:
return -EINVAL;
}
@@ -1761,6 +1768,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_LOCAL_PORT_RANGE:
val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
break;
+ case IP_PROTOCOL:
+ val = inet_sk(sk)->inet_num;
+ break;
default:
sockopt_release_sock(sk);
return -ENOPROTOOPT;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff712bf2a98d..eadf1c9ef7e4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -532,6 +532,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
ipcm_init_sk(&ipc, inet);
+ /* Keep backward compat */
+ if (hdrincl)
+ ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) {
err = ip_cmsg_send(sk, msg, &ipc, false);
@@ -599,7 +602,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
RT_SCOPE_UNIVERSE,
- hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? ipc.protocol : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d0adb612bdd..44ee7a2e72ac 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -793,7 +793,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!proto)
proto = inet->inet_num;
- else if (proto != inet->inet_num)
+ else if (proto != inet->inet_num &&
+ inet->inet_num != IPPROTO_RAW)
return -EINVAL;
if (proto > 255)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3632679d9e4f879f49949bb5b050e0de553e4739
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052634-surgical-sulfite-a551@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
91d0b78c5177 ("inet: Add IP_LOCAL_PORT_RANGE socket option")
28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
d2c135619cb8 ("inet: add READ_ONCE(sk->sk_bound_dev_if) in inet_csk_bind_conflict()")
ca7af0402550 ("tcp: add small random increments to the source port")
ffa84b5ffb37 ("net: add netns refcount tracker to struct sock")
938cca9e4109 ("sock: fix /proc/net/sockstat underflow in sk_clone_lock()")
990c74e3f41d ("memcg: enable accounting for inet_bin_bucket cache")
333bb73f620e ("tcp: Keep TCP_CLOSE sockets in the reuseport group.")
5c040eaf5d17 ("tcp: Add num_closed_socks to struct sock_reuseport.")
c579bd1b4021 ("tcp: add some entropy in __inet_hash_connect()")
190cc82489f4 ("tcp: change source port randomizarion at connect() time")
bbc20b70424a ("net: reduce indentation level in sk_clone_lock()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3632679d9e4f879f49949bb5b050e0de553e4739 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Date: Mon, 22 May 2023 14:08:20 +0200
Subject: [PATCH] ipv{4,6}/raw: fix output xfrm lookup wrt protocol
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..acec504c469a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -76,6 +76,7 @@ struct ipcm_cookie {
__be32 addr;
int oif;
struct ip_options_rcu *opt;
+ __u8 protocol;
__u8 ttl;
__s16 tos;
char priority;
@@ -96,6 +97,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
ipcm->sockc.tsflags = inet->sk.sk_tsflags;
ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
ipcm->addr = inet->inet_saddr;
+ ipcm->protocol = inet->inet_num;
}
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 4b7f2df66b99..e682ab628dfa 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -163,6 +163,7 @@ struct in_addr {
#define IP_MULTICAST_ALL 49
#define IP_UNICAST_IF 50
#define IP_LOCAL_PORT_RANGE 51
+#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0
#define MCAST_INCLUDE 1
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b511ff0adc0a..8e97d8d4cc9d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
ipc->tos = val;
ipc->priority = rt_tos2priority(ipc->tos);
break;
-
+ case IP_PROTOCOL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->protocol = val;
+ break;
default:
return -EINVAL;
}
@@ -1761,6 +1768,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_LOCAL_PORT_RANGE:
val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
break;
+ case IP_PROTOCOL:
+ val = inet_sk(sk)->inet_num;
+ break;
default:
sockopt_release_sock(sk);
return -ENOPROTOOPT;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff712bf2a98d..eadf1c9ef7e4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -532,6 +532,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
ipcm_init_sk(&ipc, inet);
+ /* Keep backward compat */
+ if (hdrincl)
+ ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) {
err = ip_cmsg_send(sk, msg, &ipc, false);
@@ -599,7 +602,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
RT_SCOPE_UNIVERSE,
- hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? ipc.protocol : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d0adb612bdd..44ee7a2e72ac 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -793,7 +793,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!proto)
proto = inet->inet_num;
- else if (proto != inet->inet_num)
+ else if (proto != inet->inet_num &&
+ inet->inet_num != IPPROTO_RAW)
return -EINVAL;
if (proto > 255)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3632679d9e4f879f49949bb5b050e0de553e4739
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023052623-available-vagueness-5c1e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3632679d9e4f879f49949bb5b050e0de553e4739 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Date: Mon, 22 May 2023 14:08:20 +0200
Subject: [PATCH] ipv{4,6}/raw: fix output xfrm lookup wrt protocol
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..acec504c469a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -76,6 +76,7 @@ struct ipcm_cookie {
__be32 addr;
int oif;
struct ip_options_rcu *opt;
+ __u8 protocol;
__u8 ttl;
__s16 tos;
char priority;
@@ -96,6 +97,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
ipcm->sockc.tsflags = inet->sk.sk_tsflags;
ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
ipcm->addr = inet->inet_saddr;
+ ipcm->protocol = inet->inet_num;
}
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 4b7f2df66b99..e682ab628dfa 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -163,6 +163,7 @@ struct in_addr {
#define IP_MULTICAST_ALL 49
#define IP_UNICAST_IF 50
#define IP_LOCAL_PORT_RANGE 51
+#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0
#define MCAST_INCLUDE 1
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b511ff0adc0a..8e97d8d4cc9d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
ipc->tos = val;
ipc->priority = rt_tos2priority(ipc->tos);
break;
-
+ case IP_PROTOCOL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->protocol = val;
+ break;
default:
return -EINVAL;
}
@@ -1761,6 +1768,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_LOCAL_PORT_RANGE:
val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
break;
+ case IP_PROTOCOL:
+ val = inet_sk(sk)->inet_num;
+ break;
default:
sockopt_release_sock(sk);
return -ENOPROTOOPT;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff712bf2a98d..eadf1c9ef7e4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -532,6 +532,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
ipcm_init_sk(&ipc, inet);
+ /* Keep backward compat */
+ if (hdrincl)
+ ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) {
err = ip_cmsg_send(sk, msg, &ipc, false);
@@ -599,7 +602,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
RT_SCOPE_UNIVERSE,
- hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? ipc.protocol : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d0adb612bdd..44ee7a2e72ac 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -793,7 +793,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!proto)
proto = inet->inet_num;
- else if (proto != inet->inet_num)
+ else if (proto != inet->inet_num &&
+ inet->inet_num != IPPROTO_RAW)
return -EINVAL;
if (proto > 255)