The size of device tree node secmon (bl31_secmon_reserved) was
incorrect. It should be increased to 2MiB (0x200000).
The origin setting will cause some abnormal behavior due to
trusted-firmware-a and related firmware didn't load correctly.
The incorrect behavior may vary because of different software stacks.
For example, it will cause build error in some Yocto project because
it will check if there was enough memory to load trusted-firmware-a
to the reserved memory.
When mt8195-demo.dts sent to the upstream, at that time the size of
BL31 was small. Because supported functions and modules in BL31 are
basic sets when the board was under early development stage.
Now BL31 includes more firmwares of coprocessors and maturer functions
so the size has grown bigger in real applications. According to the value
reported by customers, we think reserved 2MiB for BL31 might be enough
for maybe the following 2 or 3 years.
Cc: stable(a)vger.kernel.org # v5.19
Fixes: 6147314aeedc ("arm64: dts: mediatek: Add device-tree for MT8195 Demo board")
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
Reviewed-by: Miles Chen <miles.chen(a)mediatek.com>
---
Changes for v2
- Add more information about the size difference for BL31 in commit message.
Thanks for Miles's review.
arch/arm64/boot/dts/mediatek/mt8195-demo.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/mediatek/mt8195-demo.dts b/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
index 4fbd99eb496a..dec85d254838 100644
--- a/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
@@ -56,10 +56,10 @@
#size-cells = <2>;
ranges;
- /* 192 KiB reserved for ARM Trusted Firmware (BL31) */
+ /* 2 MiB reserved for ARM Trusted Firmware (BL31) */
bl31_secmon_reserved: secmon@54600000 {
no-map;
- reg = <0 0x54600000 0x0 0x30000>;
+ reg = <0 0x54600000 0x0 0x200000>;
};
/* 12 MiB reserved for OP-TEE (BL32)
--
2.18.0
This is a note to let you know that I've just added the patch titled
dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid"
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 398e3479874f381cca8726ca5d8a31e1bf35a3cd Mon Sep 17 00:00:00 2001
From: Billy Tsai <billy_tsai(a)aspeedtech.com>
Date: Mon, 14 Nov 2022 10:50:57 +0800
Subject: dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid"
If the property is set on a device without valid trimming data in the OTP
the ADC will not function correctly. Therefore, this patch drops the use
of this property to avoid this scenario.
Fixes: 2bdb2f00a895 ("dt-bindings: iio: adc: Add ast2600-adc bindings")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
Acked-by: Rob Herring <robh(a)kernel.org>
Link: https://lore.kernel.org/r/20221114025057.10843-2-billy_tsai@aspeedtech.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
.../devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml | 7 -------
1 file changed, 7 deletions(-)
diff --git a/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml b/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml
index b283c8ca2bbf..5c08d8b6e995 100644
--- a/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml
@@ -62,13 +62,6 @@ properties:
description:
Inform the driver that last channel will be used to sensor battery.
- aspeed,trim-data-valid:
- type: boolean
- description: |
- The ADC reference voltage can be calibrated to obtain the trimming
- data which will be stored in otp. This property informs the driver that
- the data store in the otp is valid.
-
required:
- compatible
- reg
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: adc: aspeed: Remove the trim valid dts property.
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From fdd0d6b2eb35c83d6b1226ad20b346a4b45ddfb8 Mon Sep 17 00:00:00 2001
From: Billy Tsai <billy_tsai(a)aspeedtech.com>
Date: Mon, 14 Nov 2022 10:50:56 +0800
Subject: iio: adc: aspeed: Remove the trim valid dts property.
The dts property "aspeed,trim-data-valid" is currently used to determine
whether to read trimming data from the OTP register. If this is set on
a device without valid trimming data in the OTP the ADC will not function
correctly. This patch drops the use of this property and instead uses the
default (unprogrammed) OTP value of 0 to detect when a fallback value of
0x8 should be used rather then the value read from the OTP.
Fixes: d0a4c17b4073 ("iio: adc: aspeed: Get and set trimming data.")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
Link: https://lore.kernel.org/r/20221114025057.10843-1-billy_tsai@aspeedtech.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/aspeed_adc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/adc/aspeed_adc.c b/drivers/iio/adc/aspeed_adc.c
index 9341e0e0eb55..998e8bcc06e1 100644
--- a/drivers/iio/adc/aspeed_adc.c
+++ b/drivers/iio/adc/aspeed_adc.c
@@ -202,6 +202,8 @@ static int aspeed_adc_set_trim_data(struct iio_dev *indio_dev)
((scu_otp) &
(data->model_data->trim_locate->field)) >>
__ffs(data->model_data->trim_locate->field);
+ if (!trimming_val)
+ trimming_val = 0x8;
}
dev_dbg(data->dev,
"trimming val = %d, offset = %08x, fields = %08x\n",
@@ -563,12 +565,9 @@ static int aspeed_adc_probe(struct platform_device *pdev)
if (ret)
return ret;
- if (of_find_property(data->dev->of_node, "aspeed,trim-data-valid",
- NULL)) {
- ret = aspeed_adc_set_trim_data(indio_dev);
- if (ret)
- return ret;
- }
+ ret = aspeed_adc_set_trim_data(indio_dev);
+ if (ret)
+ return ret;
if (of_find_property(data->dev->of_node, "aspeed,battery-sensing",
NULL)) {
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: core: Fix entry not deleted when iio_register_sw_trigger_type()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 4ad09d956f8eacff61e67e5b13ba8ebec3232f76 Mon Sep 17 00:00:00 2001
From: Chen Zhongjin <chenzhongjin(a)huawei.com>
Date: Tue, 8 Nov 2022 11:28:02 +0800
Subject: iio: core: Fix entry not deleted when iio_register_sw_trigger_type()
fails
In iio_register_sw_trigger_type(), configfs_register_default_group() is
possible to fail, but the entry add to iio_trigger_types_list is not
deleted.
This leaves wild in iio_trigger_types_list, which can cause page fault
when module is loading again. So fix this by list_del(&t->list) in error
path.
BUG: unable to handle page fault for address: fffffbfff81d7400
Call Trace:
<TASK>
iio_register_sw_trigger_type
do_one_initcall
do_init_module
load_module
...
Fixes: b662f809d410 ("iio: core: Introduce IIO software triggers")
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Link: https://lore.kernel.org/r/20221108032802.168623-1-chenzhongjin@huawei.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/industrialio-sw-trigger.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/industrialio-sw-trigger.c b/drivers/iio/industrialio-sw-trigger.c
index 994f03a71520..d86a3305d9e8 100644
--- a/drivers/iio/industrialio-sw-trigger.c
+++ b/drivers/iio/industrialio-sw-trigger.c
@@ -58,8 +58,12 @@ int iio_register_sw_trigger_type(struct iio_sw_trigger_type *t)
t->group = configfs_register_default_group(iio_triggers_group, t->name,
&iio_trigger_type_group_type);
- if (IS_ERR(t->group))
+ if (IS_ERR(t->group)) {
+ mutex_lock(&iio_trigger_types_lock);
+ list_del(&t->list);
+ mutex_unlock(&iio_trigger_types_lock);
ret = PTR_ERR(t->group);
+ }
return ret;
}
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: accel: bma400: Fix memory leak in bma400_get_steps_reg()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 20690cd50e68c0313472c7539460168b8ea6444d Mon Sep 17 00:00:00 2001
From: Dong Chenchen <dongchenchen2(a)huawei.com>
Date: Thu, 10 Nov 2022 09:07:26 +0800
Subject: iio: accel: bma400: Fix memory leak in bma400_get_steps_reg()
When regmap_bulk_read() fails, it does not free steps_raw,
which will cause a memory leak issue, this patch fixes it.
Fixes: d221de60eee3 ("iio: accel: bma400: Add separate channel for step counter")
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
Reviewed-by: Jagath Jog J <jagathjog1996(a)gmail.com>
Link: https://lore.kernel.org/r/20221110010726.235601-1-dongchenchen2@huawei.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/bma400_core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index 490c342ef72a..6e4d10a7cd32 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -805,8 +805,10 @@ static int bma400_get_steps_reg(struct bma400_data *data, int *val)
ret = regmap_bulk_read(data->regmap, BMA400_STEP_CNT0_REG,
steps_raw, BMA400_STEP_RAW_LEN);
- if (ret)
+ if (ret) {
+ kfree(steps_raw);
return ret;
+ }
*val = get_unaligned_le24(steps_raw);
kfree(steps_raw);
return IIO_VAL_INT;
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: light: apds9960: fix wrong register for gesture gain
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 0aa60ff5d996d4ecdd4a62699c01f6d00f798d59 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alejandro=20Concepci=C3=B3n=20Rodr=C3=ADguez?=
<asconcepcion(a)acoro.eu>
Date: Sun, 6 Nov 2022 01:56:51 +0000
Subject: iio: light: apds9960: fix wrong register for gesture gain
Gesture Gain Control is in REG_GCONF_2 (0xa3), not in REG_CONFIG_2 (0x90).
Fixes: aff268cd532e ("iio: light: add APDS9960 ALS + promixity driver")
Signed-off-by: Alejandro Concepcion-Rodriguez <asconcepcion(a)acoro.eu>
Acked-by: Matt Ranostay <matt.ranostay(a)konsulko.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/EaT-NKC-H4DNX5z4Lg9B6IWPD5TrTrYBr5DYB784wfDKQkTmz…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/light/apds9960.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/light/apds9960.c b/drivers/iio/light/apds9960.c
index b62c139baf41..38d4c7644bef 100644
--- a/drivers/iio/light/apds9960.c
+++ b/drivers/iio/light/apds9960.c
@@ -54,9 +54,6 @@
#define APDS9960_REG_CONTROL_PGAIN_MASK_SHIFT 2
#define APDS9960_REG_CONFIG_2 0x90
-#define APDS9960_REG_CONFIG_2_GGAIN_MASK 0x60
-#define APDS9960_REG_CONFIG_2_GGAIN_MASK_SHIFT 5
-
#define APDS9960_REG_ID 0x92
#define APDS9960_REG_STATUS 0x93
@@ -77,6 +74,9 @@
#define APDS9960_REG_GCONF_1_GFIFO_THRES_MASK_SHIFT 6
#define APDS9960_REG_GCONF_2 0xa3
+#define APDS9960_REG_GCONF_2_GGAIN_MASK 0x60
+#define APDS9960_REG_GCONF_2_GGAIN_MASK_SHIFT 5
+
#define APDS9960_REG_GOFFSET_U 0xa4
#define APDS9960_REG_GOFFSET_D 0xa5
#define APDS9960_REG_GPULSE 0xa6
@@ -396,9 +396,9 @@ static int apds9960_set_pxs_gain(struct apds9960_data *data, int val)
}
ret = regmap_update_bits(data->regmap,
- APDS9960_REG_CONFIG_2,
- APDS9960_REG_CONFIG_2_GGAIN_MASK,
- idx << APDS9960_REG_CONFIG_2_GGAIN_MASK_SHIFT);
+ APDS9960_REG_GCONF_2,
+ APDS9960_REG_GCONF_2_GGAIN_MASK,
+ idx << APDS9960_REG_GCONF_2_GGAIN_MASK_SHIFT);
if (!ret)
data->pxs_gain = idx;
mutex_unlock(&data->lock);
--
2.38.1
This is the start of the stable review cycle for the 4.19.266 release.
There are 34 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 23 Nov 2022 12:41:40 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.266-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.266-rc1
Daniel Sneddon <daniel.sneddon(a)linux.intel.com>
x86/speculation: Add RSB VM Exit protections
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
Nathan Chancellor <nathan(a)kernel.org>
x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Disable RRSBA behavior
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Add Cannon lake to RETBleed affected CPU list
Andrew Cooper <andrew.cooper3(a)citrix.com>
x86/cpu/amd: Enumerate BTC_NO
Peter Zijlstra <peterz(a)infradead.org>
x86/common: Stamp out the stepping madness
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fill RSB on vmexit for IBRS
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Fix IBRS handling after vmexit
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Remove x86_spec_ctrl_mask
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix SPEC_CTRL write on SMT state change
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix firmware entry SPEC_CTRL handling
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
Peter Zijlstra <peterz(a)infradead.org>
intel_idle: Disable IBRS during long idle
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Report Intel retbleed vulnerability
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation()
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Optimize SPEC_CTRL MSR writes
Peter Zijlstra <peterz(a)infradead.org>
x86/entry: Add kernel IBRS implementation
Peter Zijlstra <peterz(a)infradead.org>
x86/entry: Remove skip_r11rcx
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bugs: Add AMD retbleed= boot parameter
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bugs: Report AMD retbleed vulnerability
Peter Zijlstra <peterz(a)infradead.org>
x86/cpufeatures: Move RETPOLINE flags to word 11
Mark Gross <mgross(a)linux.intel.com>
x86/cpu: Add a steppings field to struct x86_cpu_id
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu: Add consistent CPU match macros
Thomas Gleixner <tglx(a)linutronix.de>
x86/devicetable: Move x86 specific macro out of generic code
Ingo Molnar <mingo(a)kernel.org>
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Kan Liang <kan.liang(a)linux.intel.com>
x86/cpufeature: Add facility to check for min microcode revisions
Suleiman Souhlal <suleiman(a)google.com>
Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"
Suleiman Souhlal <suleiman(a)google.com>
Revert "x86/speculation: Add RSB VM Exit protections"
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 13 +
Makefile | 4 +-
arch/x86/entry/calling.h | 68 ++++-
arch/x86/entry/entry_32.S | 2 -
arch/x86/entry/entry_64.S | 34 ++-
arch/x86/entry/entry_64_compat.S | 11 +-
arch/x86/include/asm/cpu_device_id.h | 168 ++++++++++-
arch/x86/include/asm/cpufeatures.h | 18 +-
arch/x86/include/asm/intel-family.h | 6 +
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/nospec-branch.h | 53 ++--
arch/x86/kernel/cpu/amd.c | 21 +-
arch/x86/kernel/cpu/bugs.c | 368 +++++++++++++++++++-----
arch/x86/kernel/cpu/common.c | 60 ++--
arch/x86/kernel/cpu/match.c | 44 ++-
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/process.c | 2 +-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 53 +++-
arch/x86/kvm/x86.c | 4 +-
drivers/base/cpu.c | 8 +
drivers/cpufreq/acpi-cpufreq.c | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 1 +
drivers/idle/intel_idle.c | 43 ++-
include/linux/cpu.h | 2 +
include/linux/kvm_host.h | 2 +-
include/linux/mod_devicetable.h | 4 +-
tools/arch/x86/include/asm/cpufeatures.h | 1 +
28 files changed, 815 insertions(+), 188 deletions(-)
Hello
I will like to use the liberty of this medium to inform you as a consultant,that my principal is interested in investing his bond/funds as a silent business partner in your company.Taking into proper
consideration the Return on Investment(ROI) based on a ten (10) year strategic plan.
I shall give you details when you reply.
Regards,
This is a note to let you know that I've just added the patch titled
usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 7a21b27aafa3edead79ed97e6f22236be6b9f447 Mon Sep 17 00:00:00 2001
From: Pawel Laszczak <pawell(a)cadence.com>
Date: Tue, 15 Nov 2022 04:22:18 -0500
Subject: usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1
Patch modifies the TD_SIZE in TRB before ZLP TRB.
The TD_SIZE in TRB before ZLP TRB must be set to 1 to force
processing ZLP TRB by controller.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20221115092218.421267-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-ring.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 04dfcaa08dc4..2f29431f612e 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -1763,10 +1763,15 @@ static u32 cdnsp_td_remainder(struct cdnsp_device *pdev,
int trb_buff_len,
unsigned int td_total_len,
struct cdnsp_request *preq,
- bool more_trbs_coming)
+ bool more_trbs_coming,
+ bool zlp)
{
u32 maxp, total_packet_count;
+ /* Before ZLP driver needs set TD_SIZE = 1. */
+ if (zlp)
+ return 1;
+
/* One TRB with a zero-length data packet. */
if (!more_trbs_coming || (transferred == 0 && trb_buff_len == 0) ||
trb_buff_len == td_total_len)
@@ -1960,7 +1965,8 @@ int cdnsp_queue_bulk_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
/* Set the TRB length, TD size, and interrupter fields. */
remainder = cdnsp_td_remainder(pdev, enqd_len, trb_buff_len,
full_len, preq,
- more_trbs_coming);
+ more_trbs_coming,
+ zero_len_trb);
length_field = TRB_LEN(trb_buff_len) | TRB_TD_SIZE(remainder) |
TRB_INTR_TARGET(0);
@@ -2025,7 +2031,7 @@ int cdnsp_queue_ctrl_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
if (preq->request.length > 0) {
remainder = cdnsp_td_remainder(pdev, 0, preq->request.length,
- preq->request.length, preq, 1);
+ preq->request.length, preq, 1, 0);
length_field = TRB_LEN(preq->request.length) |
TRB_TD_SIZE(remainder) | TRB_INTR_TARGET(0);
@@ -2226,7 +2232,7 @@ static int cdnsp_queue_isoc_tx(struct cdnsp_device *pdev,
/* Set the TRB length, TD size, & interrupter fields. */
remainder = cdnsp_td_remainder(pdev, running_total,
trb_buff_len, td_len, preq,
- more_trbs_coming);
+ more_trbs_coming, 0);
length_field = TRB_LEN(trb_buff_len) | TRB_INTR_TARGET(0);
--
2.38.1
This is a note to let you know that I've just added the patch titled
usb: cdnsp: Fix issue with Clear Feature Halt Endpoint
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From b25264f22b498dff3fa5c70c9bea840e83fff0d1 Mon Sep 17 00:00:00 2001
From: Pawel Laszczak <pawell(a)cadence.com>
Date: Thu, 10 Nov 2022 01:30:05 -0500
Subject: usb: cdnsp: Fix issue with Clear Feature Halt Endpoint
During handling Clear Halt Endpoint Feature request, driver invokes
Reset Endpoint command. Because this command has some issue with
transition endpoint from Running to Idle state the driver must
stop the endpoint by using Stop Endpoint command.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
Link: https://lore.kernel.org/r/20221110063005.370656-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-gadget.c | 12 ++++--------
drivers/usb/cdns3/cdnsp-ring.c | 3 ++-
2 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index c67715f6f756..f9aa50ff14d4 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -600,11 +600,11 @@ int cdnsp_halt_endpoint(struct cdnsp_device *pdev,
trace_cdnsp_ep_halt(value ? "Set" : "Clear");
- if (value) {
- ret = cdnsp_cmd_stop_ep(pdev, pep);
- if (ret)
- return ret;
+ ret = cdnsp_cmd_stop_ep(pdev, pep);
+ if (ret)
+ return ret;
+ if (value) {
if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_STOPPED) {
cdnsp_queue_halt_endpoint(pdev, pep->idx);
cdnsp_ring_cmd_db(pdev);
@@ -613,10 +613,6 @@ int cdnsp_halt_endpoint(struct cdnsp_device *pdev,
pep->ep_state |= EP_HALTED;
} else {
- /*
- * In device mode driver can call reset endpoint command
- * from any endpoint state.
- */
cdnsp_queue_reset_ep(pdev, pep->idx);
cdnsp_ring_cmd_db(pdev);
ret = cdnsp_wait_for_cmd_compl(pdev);
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 794e413800ae..04dfcaa08dc4 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -2076,7 +2076,8 @@ int cdnsp_cmd_stop_ep(struct cdnsp_device *pdev, struct cdnsp_ep *pep)
u32 ep_state = GET_EP_CTX_STATE(pep->out_ctx);
int ret = 0;
- if (ep_state == EP_STATE_STOPPED || ep_state == EP_STATE_DISABLED) {
+ if (ep_state == EP_STATE_STOPPED || ep_state == EP_STATE_DISABLED ||
+ ep_state == EP_STATE_HALTED) {
trace_cdnsp_ep_stopped_or_disabled(pep->out_ctx);
goto ep_stopped;
}
--
2.38.1
commit 1980860e0c8299316cddaf0992dd9e1258ec9d88 upstream.
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_port.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 97a64bc79350..b47335dfae16 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1824,10 +1824,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- /* fall-through */
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
--
2.30.2
We face some regressions on a few IXP42x systems when
accessing flash, the following unrelated error prints
appear from the PCI driver:
ixp4xx-pci c0000000.pci: PCI: abort_handler addr = 0xff9ffb5f,
isr = 0x0, status = 0x22a0
ixp4xx-pci c0000000.pci: imprecise abort
(...)
It turns out that while bit 7 is masked "reserved" it is
not unused, so masking it off as zero is dangerous, and
breaks flash access on some systems such as the NSLU2.
Be more careful and avoid masking off any of the reserved
bits 7, 8, 9 or 30. Only keep masking EXP_WORD (bit 2)
on IXP43x which is necessary in some setups.
Cc: stable(a)vger.kernel.org
Fixes: 1c953bda90ca ("bus: ixp4xx: Add a driver for IXP4xx expansion bus")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
SoC folks: please apply this directly for fixes since it
fixes a regression.
---
drivers/bus/intel-ixp4xx-eb.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/bus/intel-ixp4xx-eb.c b/drivers/bus/intel-ixp4xx-eb.c
index a4388440aca7..91db001eb69a 100644
--- a/drivers/bus/intel-ixp4xx-eb.c
+++ b/drivers/bus/intel-ixp4xx-eb.c
@@ -49,7 +49,7 @@
#define IXP4XX_EXP_SIZE_SHIFT 10
#define IXP4XX_EXP_CNFG_0 BIT(9) /* Always zero */
#define IXP43X_EXP_SYNC_INTEL BIT(8) /* Only on IXP43x */
-#define IXP43X_EXP_EXP_CHIP BIT(7) /* Only on IXP43x */
+#define IXP43X_EXP_EXP_CHIP BIT(7) /* Only on IXP43x, dangerous to touch on IXP42x */
#define IXP4XX_EXP_BYTE_RD16 BIT(6)
#define IXP4XX_EXP_HRDY_POL BIT(5) /* Only on IXP42x */
#define IXP4XX_EXP_MUX_EN BIT(4)
@@ -57,8 +57,6 @@
#define IXP4XX_EXP_WORD BIT(2) /* Always zero */
#define IXP4XX_EXP_WR_EN BIT(1)
#define IXP4XX_EXP_BYTE_EN BIT(0)
-#define IXP42X_RESERVED (BIT(30)|IXP4XX_EXP_CNFG_0|BIT(8)|BIT(7)|IXP4XX_EXP_WORD)
-#define IXP43X_RESERVED (BIT(30)|IXP4XX_EXP_CNFG_0|BIT(5)|IXP4XX_EXP_WORD)
#define IXP4XX_EXP_CNFG0 0x20
#define IXP4XX_EXP_CNFG0_MEM_MAP BIT(31)
@@ -252,10 +250,9 @@ static void ixp4xx_exp_setup_chipselect(struct ixp4xx_eb *eb,
cs_cfg |= val << IXP4XX_EXP_CYC_TYPE_SHIFT;
}
- if (eb->is_42x)
- cs_cfg &= ~IXP42X_RESERVED;
if (eb->is_43x) {
- cs_cfg &= ~IXP43X_RESERVED;
+ /* Should always be zero */
+ cs_cfg &= ~IXP4XX_EXP_WORD;
/*
* This bit for Intel strata flash is currently unused, but let's
* report it if we find one.
--
2.38.1
commit 7090abd6ad0610a144523ce4ffcb8560909bf2a8 upstream.
Configure DMA to use 16B burst size with Elkhart Lake. This makes the
bus use more efficient and works around an issue which occurs with the
previously used 1B.
The fix was initially developed by Srikanth Thokala and Aman Kumar.
This together with the previous config change is the cleaned up version
of the original fix.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake")
Cc: <stable(a)vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter
Reported-by: Wentong Wu <wentong.wu(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-4-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_lpss.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
index 49ae73f4d3a0..078f12f856f6 100644
--- a/drivers/tty/serial/8250/8250_lpss.c
+++ b/drivers/tty/serial/8250/8250_lpss.c
@@ -177,6 +177,9 @@ static int ehl_serial_setup(struct lpss8250 *lpss, struct uart_port *port)
* matching with the registered General Purpose DMA controllers.
*/
up->dma = dma;
+
+ lpss->dma_maxburst = 16;
+
return 0;
}
--
2.30.2
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream.
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to stable-4.9 instead of the patch that could
not be applied to it last time.
A rejected hunk has been manually resolved, and replaced sb_rdonly()
with its equivalent bitwise operation since the function does not yet
exist in this kernel. Also, retested against that stable tree.
fs/nilfs2/segment.c | 15 ++++++++-------
fs/nilfs2/super.c | 2 --
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 450bfdee513b..86769ecfe61f 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -329,7 +329,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb->s_flags & MS_RDONLY || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2252,7 +2252,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_transaction_info *ti;
int err;
- if (!sci)
+ if (sb->s_flags & MS_RDONLY || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2291,7 +2291,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb->s_flags & MS_RDONLY || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2788,11 +2788,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index c95d369e90aa..9e8f186b14d1 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1147,8 +1147,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
goto out;
if (*flags & MS_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= MS_RDONLY;
/*
--
2.31.1
From: Keith Busch <kbusch(a)kernel.org>
commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream.
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
These backports are for CVE-2022-3169.
drivers/nvme/host/ioctl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index d3281f87cd6e..a48a79ed5c4c 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -764,11 +764,17 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
dev_warn(ctrl->device, "resetting controller\n");
return nvme_reset_ctrl_sync(ctrl);
case NVME_IOCTL_SUBSYS_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
return nvme_reset_subsystem(ctrl);
case NVME_IOCTL_RESCAN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
nvme_queue_scan(ctrl);
return 0;
default:
--
2.38.1
From: Keith Busch <kbusch(a)kernel.org>
commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream.
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
These backports are for CVE-2022-3169.
drivers/nvme/host/ioctl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 22314962842d..7397fad4c96f 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -484,11 +484,17 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
dev_warn(ctrl->device, "resetting controller\n");
return nvme_reset_ctrl_sync(ctrl);
case NVME_IOCTL_SUBSYS_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
return nvme_reset_subsystem(ctrl);
case NVME_IOCTL_RESCAN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
nvme_queue_scan(ctrl);
return 0;
default:
--
2.38.1
test_bpf tail call tests end up as:
test_bpf: #0 Tail call leaf jited:1 85 PASS
test_bpf: #1 Tail call 2 jited:1 111 PASS
test_bpf: #2 Tail call 3 jited:1 145 PASS
test_bpf: #3 Tail call 4 jited:1 170 PASS
test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
test_bpf: #5 Tail call load/store jited:1
BUG: Unable to handle kernel data access on write at 0xf1b4e000
Faulting instruction address: 0xbe86b710
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in: test_bpf(+)
CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
NIP: be86b710 LR: be857e88 CTR: be86b704
REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+)
MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28008242 XER: 00000000
DAR: f1b4e000 DSISR: 42000000
GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
NIP [be86b710] 0xbe86b710
LR [be857e88] __run_one+0xec/0x264 [test_bpf]
Call Trace:
[f1b4dfe0] [00000002] 0x2 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
This is a tentative to write above the stack. The problem is encoutered
with tests added by commit 38608ee7b690 ("bpf, tests: Add load store
test case for tail call")
This happens because tail call is done to a BPF prog with a different
stack_depth. At the time being, the stack is kept as is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.
This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary the maintain the stack as is for the tail
call. But it was not anticipated that the BPF frame size could be
different.
Let's take a new approach. Use register r0 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. That's a deviation from the ppc32 ABI, but after all the way
tail calls are implemented is already not in accordance with the ABI.
With the fix, tail call tests are now successfull:
test_bpf: #0 Tail call leaf jited:1 53 PASS
test_bpf: #1 Tail call 2 jited:1 115 PASS
test_bpf: #2 Tail call 3 jited:1 154 PASS
test_bpf: #3 Tail call 4 jited:1 165 PASS
test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
test_bpf: #5 Tail call load/store jited:1 141 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
Fixes: 51c66ad849a7 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
---
arch/powerpc/net/bpf_jit_comp32.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 43f1c76d48ce..97e75b8181ca 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -115,21 +115,19 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
/* First arg comes in as a 32 bits pointer. */
EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
- EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
+ EMIT(PPC_RAW_LI(_R0, 0));
+
+#define BPF_TAILCALL_PROLOGUE_SIZE 8
+
EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
/*
- * Initialize tail_call_cnt in stack frame if we do tail calls.
- * Otherwise, put in NOPs so that it can be skipped when we are
- * invoked through a tail call.
+ * Save tail_call_cnt in stack frame if we do tail calls.
*/
if (ctx->seen & SEEN_TAILCALL)
- EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_1) - 1, _R1,
- bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
- else
- EMIT(PPC_RAW_NOP());
+ EMIT(PPC_RAW_STW(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
-#define BPF_TAILCALL_PROLOGUE_SIZE 16
+ EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
/*
* We need a stack frame, but we don't necessarily need to
@@ -244,7 +242,6 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));
EMIT(PPC_RAW_ADD(_R3, _R3, b2p_bpf_array));
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_array, ptrs)));
- EMIT(PPC_RAW_STW(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
/*
* if (prog == NULL)
@@ -257,20 +254,20 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_prog, bpf_func)));
if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_LWZ(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
+ EMIT(PPC_RAW_LWZ(_R5, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
EMIT(PPC_RAW_ADDIC(_R3, _R3, BPF_TAILCALL_PROLOGUE_SIZE));
if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_MTLR(_R0));
+ EMIT(PPC_RAW_MTLR(_R5));
EMIT(PPC_RAW_MTCTR(_R3));
- EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_1)));
-
/* tear restore NVRs, ... */
bpf_jit_emit_common_epilogue(image, ctx);
+ EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_PPC_STACKFRAME(ctx)));
+
EMIT(PPC_RAW_BCTR());
/* out: */
--
2.38.1
The VM_SOFTDIRTY should be set in the vma flags to be tested if new
allocation should be merged in previous vma or not. With this patch,
the new allocations are merged in the previous VMAs.
I've tested it by reverting the commit 34228d473efe ("mm: ignore
VM_SOFTDIRTY on VMA merging") and after adding this following patch,
I'm seeing that all the new allocations done through mmap() are merged
in the previous VMAs. The number of VMAs doesn't increase drastically
which had contributed to the crash of gimp. If I run the same test after
reverting and not including this patch, the number of VMAs keep on
increasing with every mmap() syscall which proves this patch.
The commit 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging")
seems like a workaround. But it lets the soft-dirty and non-soft-dirty
VMA to get merged. It helps in avoiding the creation of too many VMAs.
But it creates the problem while adding the feature of clearing the
soft-dirty status of only a part of the memory region.
Cc: <stable(a)vger.kernel.org>
Fixes: d9104d1ca966 ("mm: track vma changes with VM_SOFTDIRTY bit")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
We need more testing of this patch.
While implementing clear soft-dirty bit for a range of address space, I'm
facing an issue. The non-soft dirty VMA gets merged sometimes with the soft
dirty VMA. Thus the non-soft dirty VMA become dirty which is undesirable.
When discussed with the some other developers they consider it the
regression. Why the non-soft dirty page should appear as soft dirty when it
isn't soft dirty in reality? I agree with them. Should we revert
34228d473efe or find a workaround in the IOCTL?
* Revert may cause the VMAs to expand in uncontrollable situation where the
soft dirty bit of a lot of memory regions or the whole address space is
being cleared again and again. AFAIK normal process must either be only
clearing a few memory regions. So the applications should be okay. There is
still chance of regressions if some applications are already using the
soft-dirty bit. I'm not sure how to test it.
* Add a flag in the IOCTL to ignore the dirtiness of VMA. The user will
surely lose the functionality to detect reused memory regions. But the
extraneous soft-dirty pages would not appear. I'm trying to do this in the
patch series [1]. Some discussion is going on that this fails with some
mprotect use case [2]. I still need to have a look at the mprotect selftest
to see how and why this fails. I think this can be implemented after some
more work probably in mprotect side.
[1] https://lore.kernel.org/all/20221109102303.851281-1-usama.anjum@collabora.c…
[2] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/
---
mm/mmap.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index f9b96b387a6f..6934b8f61fdc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1708,6 +1708,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags |= VM_ACCOUNT;
}
+ /*
+ * New (or expanded) vma always get soft dirty status.
+ * Otherwise user-space soft-dirty page tracker won't
+ * be able to distinguish situation when vma area unmapped,
+ * then new mapped in-place (which must be aimed as
+ * a completely new data area).
+ */
+ vm_flags |= VM_SOFTDIRTY;
+
/*
* Can we just expand an old mapping?
*/
@@ -1823,15 +1832,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (file)
uprobe_mmap(vma);
- /*
- * New (or expanded) vma always get soft dirty status.
- * Otherwise user-space soft-dirty page tracker won't
- * be able to distinguish situation when vma area unmapped,
- * then new mapped in-place (which must be aimed as
- * a completely new data area).
- */
- vma->vm_flags |= VM_SOFTDIRTY;
-
vma_set_page_prot(vma);
return addr;
@@ -2998,6 +2998,8 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
return -ENOMEM;
+ flags |= VM_SOFTDIRTY;
+
/* Can we just expand an old private anonymous mapping? */
vma = vma_merge(mm, prev, addr, addr + len, flags,
NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
@@ -3026,7 +3028,6 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
mm->data_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
- vma->vm_flags |= VM_SOFTDIRTY;
return 0;
}
--
2.30.2
From: Joe Korty <joe.korty(a)concurrent-rt.com>
The TVAL register is 32 bit signed. Thus only the lower 31 bits are
available to specify when an interrupt is to occur at some time in the
near future. Attempting to specify a larger interval with TVAL results
in a negative time delta which means the timer fires immediately upon
being programmed, rather than firing at that expected future time.
The solution is for Linux to declare that TVAL is a 31 bit register rather
than give its true size of 32 bits. This prevents Linux from programming
TVAL with a too-large value. Note that, prior to 5.16, this little trick
was the standard way to handle TVAL in Linux, so there is nothing new
happening here on that front.
The softlockup detector hides the issue, because it keeps generating
short timer deadlines that are within the scope of the broken timer.
Disable it, and you start using NO_HZ with much longer timer deadlines,
which turns into an interrupt flood:
11: 1124855130 949168462 758009394 76417474 104782230 30210281
310890 1734323687 GICv2 29 Level arch_timer
And "much longer" isn't that long: it takes less than 43s to underflow
TVAL at 50MHz (the frequency of the counter on XGene-1).
Some comments on the v1 version of this patch by Marc Zyngier:
XGene implements CVAL (a 64bit comparator) in terms of TVAL (a countdown
register) instead of the other way around. TVAL being a 32bit register,
the width of the counter should equally be 32. However, TVAL is a
*signed* value, and keeps counting down in the negative range once the
timer fires.
It means that any TVAL value with bit 31 set will fire immediately,
as it cannot be distinguished from an already expired timer. Reducing
the timer range back to a paltry 31 bits papers over the issue.
Another problem cannot be fixed though, which is that the timer interrupt
*must* be handled within the negative countdown period, or the interrupt
will be lost (TVAL will rollover to a positive value, indicative of a
new timer deadline).
Cc: stable(a)vger.kernel.org # 5.16+
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Signed-off-by: Joe Korty <joe.korty(a)concurrent-rt.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
[maz: revamped the commit message]
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20221024165422.GA51107@zipoli.concurrent-rt.com
---
Notes:
v3: rejigged commit message along the lines of the v2 review.
drivers/clocksource/arm_arch_timer.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index a7ff77550e17..41323c2bc0b1 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -806,6 +806,9 @@ static u64 __arch_timer_check_delta(void)
/*
* XGene-1 implements CVAL in terms of TVAL, meaning
* that the maximum timer range is 32bit. Shame on them.
+ *
+ * Note that TVAL is signed, thus has only 31 of its
+ * 32 bits to express magnitude.
*/
MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
APM_CPU_PART_POTENZA)),
@@ -813,8 +816,8 @@ static u64 __arch_timer_check_delta(void)
};
if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
- pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
- return CLOCKSOURCE_MASK(32);
+ pr_warn_once("Broken CNTx_CVAL_EL1, using 32 bit TVAL instead.\n");
+ return CLOCKSOURCE_MASK(31);
}
#endif
return CLOCKSOURCE_MASK(arch_counter_get_width());
--
2.34.1
Förmånstagare
Det finns en utmärkelse i ditt namn från FN och
Världshälsoorganisationen som är ansluten till den internationella
monetära fonden där din e-postadress och din fond lämnades till oss
för din överföring, vänligen bekräfta dina uppgifter för din
överföring.
Vi fick i uppdrag att överföra alla pågående transaktioner inom de
kommande två, men om du har fått din fond vänligen ignorera detta
meddelande, om inte följ detta omedelbart.
Vi behöver ditt brådskande svar på det här meddelandet, det här är
inte en av de där internetbedragarna där ute, det är en
pandemilindring.
Jennifer
Since commit 0f0101719138 ("usb: dwc3: Don't switch OTG -> peripheral if extcon is present")
Dual Role support on Intel Merrifield platform broke due to rearranging
the call to dwc3_get_extcon().
It appears to be caused by ulpi_read_id() on the first test write failing
with -ETIMEDOUT. Currently ulpi_read_id() expects to discover the phy via
DT when the test write fails and returns 0 in that case, even if DT does not
provide the phy. As a result usb probe completes without phy.
On Intel Merrifield the issue is reproducible but difficult to find the
root cause. Using an ordinary kernel and rootfs with buildroot ulpi_read_id()
succeeds. As soon as adding ftrace / bootconfig to find out why,
ulpi_read_id() fails and we can't analyze the flow. Using another rootfs
ulpi_read_id() fails even without adding ftrace. Appearantly the issue is
some kind of timing / race, but merely retrying ulpi_read_id() does not
resolve the issue.
This patch makes ulpi_read_id() return -ETIMEDOUT to its user if the first
test write fails. The user should then handle it appropriately. A follow up
patch will make dwc3_core_init() set -EPROBE_DEFER in this case and bail out.
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ferry Toth <ftoth(a)exalondelft.nl>
---
drivers/usb/common/ulpi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/common/ulpi.c b/drivers/usb/common/ulpi.c
index d7c8461976ce..60e8174686a1 100644
--- a/drivers/usb/common/ulpi.c
+++ b/drivers/usb/common/ulpi.c
@@ -207,7 +207,7 @@ static int ulpi_read_id(struct ulpi *ulpi)
/* Test the interface */
ret = ulpi_write(ulpi, ULPI_SCRATCH, 0xaa);
if (ret < 0)
- goto err;
+ return ret;
ret = ulpi_read(ulpi, ULPI_SCRATCH);
if (ret < 0)
--
2.37.2
The patch titled
Subject: tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
Date: Sat, 19 Nov 2022 10:36:59 +0800
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Link: https://lkml.kernel.org/r/1668825419-30584-1-git-send-email-yangtiezhu@loon…
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/vm/slabinfo-gnuplot.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/vm/slabinfo-gnuplot.sh~tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep
+++ a/tools/vm/slabinfo-gnuplot.sh
@@ -150,7 +150,7 @@ do_preprocess()
let lines=3
out=`basename "$in"`"-slabs-by-loss"
`cat "$in" | grep -A "$lines" 'Slabs sorted by loss' |\
- egrep -iv '\-\-|Name|Slabs'\
+ grep -E -iv '\-\-|Name|Slabs'\
| awk '{print $1" "$4+$2*$3" "$4}' > "$out"`
if [ $? -eq 0 ]; then
do_slabs_plotting "$out"
@@ -159,7 +159,7 @@ do_preprocess()
let lines=3
out=`basename "$in"`"-slabs-by-size"
`cat "$in" | grep -A "$lines" 'Slabs sorted by size' |\
- egrep -iv '\-\-|Name|Slabs'\
+ grep -E -iv '\-\-|Name|Slabs'\
| awk '{print $1" "$4" "$4-$2*$3}' > "$out"`
if [ $? -eq 0 ]; then
do_slabs_plotting "$out"
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep.patch
The patch titled
Subject: nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: ZhangPeng <zhangpeng362(a)huawei.com>
Subject: nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
Date: Sat, 19 Nov 2022 21:05:42 +0900
Syzbot reported a null-ptr-deref bug:
NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
frequency < 30 seconds
general protection fault, probably for non-canonical address
0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 3603 Comm: segctord Not tainted
6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
10/11/2022
RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
fs/nilfs2/alloc.c:608
Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
Call Trace:
<TASK>
nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
...
If DAT metadata file is corrupted on disk, there is a case where
req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
because nilfs_dat_commit_alloc() for a lower level block can initialize
the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
nilfs_dat_commit_end().
If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
causes the NULL pointer dereference above in
nilfs_palloc_commit_free_entry() function, which leads to a crash.
Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
This also calls nilfs_error() in that case to notify that there is a fatal
flaw in the filesystem metadata and prevent further operations.
Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com
Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com
Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com
Signed-off-by: ZhangPeng <zhangpeng362(a)huawei.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+ebe05ee8e98f755f61d0(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dat.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/nilfs2/dat.c~nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry
+++ a/fs/nilfs2/dat.c
@@ -111,6 +111,13 @@ static void nilfs_dat_commit_free(struct
kunmap_atomic(kaddr);
nilfs_dat_commit_entry(dat, req);
+
+ if (unlikely(req->pr_desc_bh == NULL || req->pr_bitmap_bh == NULL)) {
+ nilfs_error(dat->i_sb,
+ "state inconsistency probably due to duplicate use of vblocknr = %llu",
+ (unsigned long long)req->pr_entry_nr);
+ return;
+ }
nilfs_palloc_commit_free_entry(dat, req);
}
_
Patches currently in -mm which might be from zhangpeng362(a)huawei.com are
nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry.patch
The patch titled
Subject: error-injection: add prompt for function error injection
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
error-injection-add-prompt-for-function-error-injection.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Subject: error-injection: add prompt for function error injection
Date: Mon, 21 Nov 2022 10:44:03 -0500
The config to be able to inject error codes into any function annotated
with ALLOW_ERROR_INJECTION() is enabled when
CONFIG_FUNCTION_ERROR_INJECTION is enabled. But unfortunately, this is
always enabled on x86 when KPROBES is enabled, and there's no way to turn
it off.
As kprobes is useful for observability of the kernel, it is useful to have
it enabled in production environments. But error injection should be
avoided. Add a prompt to the config to allow it to be disabled even when
kprobes is enabled, and get rid of the "def_bool y".
This is a kernel debug feature (it's in Kconfig.debug), and should have
never been something enabled by default.
Link: https://lkml.kernel.org/r/20221121104403.1545f9b5@gandalf.local.home
Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Borislav Petkov <bp(a)suse.de>
Cc: Alexei Starovoitov <alexei.starovoitov(a)gmail.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Florent Revest <revest(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.debug | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/lib/Kconfig.debug~error-injection-add-prompt-for-function-error-injection
+++ a/lib/Kconfig.debug
@@ -1874,8 +1874,14 @@ config NETDEV_NOTIFIER_ERROR_INJECT
If unsure, say N.
config FUNCTION_ERROR_INJECTION
- def_bool y
+ bool "Fault-injections of functions"
depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
+ help
+ Add fault injections into various functions that are annotated with
+ ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
+ value of theses functions. This is useful to test error paths of code.
+
+ If unsure, say N
config FAULT_INJECTION
bool "Fault-injection framework"
_
Patches currently in -mm which might be from rostedt(a)goodmis.org are
error-injection-add-prompt-for-function-error-injection.patch
Now {pmd,pte}_mkdirty() set _PAGE_DIRTY bit unconditionally, this causes
random segmentation fault after commit 0ccf7f168e17bb7e ("mm/thp: carry
over dirty bit when thp splits on pmd").
The reason is: when fork(), parent process use pmd_wrprotect() to clear
huge page's _PAGE_WRITE and _PAGE_DIRTY (for COW); then pte_mkdirty() set
_PAGE_DIRTY as well as _PAGE_MODIFIED while splitting dirty huge pages;
once _PAGE_DIRTY is set, there will be no tlb modify exception so the COW
machanism fails; and at last memory corruption occurred between parent
and child processes.
So, we should set _PAGE_DIRTY only when _PAGE_WRITE is set in {pmd,pte}_
mkdirty().
Cc: stable(a)vger.kernel.org
Cc: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
Note: CC sparc maillist because they have similar issues.
arch/loongarch/include/asm/pgtable.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 946704bee599..debbe116f105 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -349,7 +349,9 @@ static inline pte_t pte_mkclean(pte_t pte)
static inline pte_t pte_mkdirty(pte_t pte)
{
- pte_val(pte) |= (_PAGE_DIRTY | _PAGE_MODIFIED);
+ pte_val(pte) |= _PAGE_MODIFIED;
+ if (pte_val(pte) & _PAGE_WRITE)
+ pte_val(pte) |= _PAGE_DIRTY;
return pte;
}
@@ -478,7 +480,9 @@ static inline pmd_t pmd_mkclean(pmd_t pmd)
static inline pmd_t pmd_mkdirty(pmd_t pmd)
{
- pmd_val(pmd) |= (_PAGE_DIRTY | _PAGE_MODIFIED);
+ pmd_val(pmd) |= _PAGE_MODIFIED;
+ if (pmd_val(pmd) & _PAGE_WRITE)
+ pmd_val(pmd) |= _PAGE_DIRTY;
return pmd;
}
--
2.31.1
It is valid to receive external interrupt and have broken IDT entry,
which will lead to #GP with exit_int_into that will contain the index of
the IDT entry (e.g any value).
Other exceptions can happen as well, like #NP or #SS
(if stack switch fails).
Thus this warning can be user triggred and has very little value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/svm.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e9cec1b692051c..36f651ce842174 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3428,15 +3428,6 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return 0;
}
- if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
- exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
- exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
- exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
- printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
- "exit_code 0x%x\n",
- __func__, svm->vmcb->control.exit_int_info,
- exit_code);
-
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;
--
2.34.3
While not obivous, kvm_vcpu_reset() leaves the nested mode by clearing
'vcpu->arch.hflags' but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because
unlike VMX, on SVM, INIT's are not latched in SVM non root mode and in
addition to that L1 doesn't have to intercept triple fault, which should
also trigger L1's reset if happens in L2 while L1 didn't intercept it.
If one of the above conditions happen, KVM will continue to use vmcb02
while not having in the guest mode.
Later the IA32_EFER will be cleared which will lead to freeing of the
nested guest state which will (correctly) free the vmcb02, but since
KVM still uses it (incorrectly) this will lead to a use after free
and kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/x86.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 316ab1d5317f92..3fd900504e683b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11694,8 +11694,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
--
2.34.3
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344 making the kernel print a warning
instead of a kernel panic.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/nested.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b258d6988f5dde..b74da40c1fc40c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1126,6 +1126,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
--
2.34.3
If the VM was terminated while nested, we free the nested state
while the vCPU still is in nested mode.
Soon a warning will be added for this condition.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/svm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d22a809d923339..e9cec1b692051c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1440,6 +1440,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
*/
svm_clear_current_vmcb(svm->vmcb);
+ svm_leave_nested(vcpu);
svm_free_nested(svm);
sev_free_vcpu(vcpu);
--
2.34.3
From: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
Erase can be zeroed in spi_nor_parse_4bait() or
spi_nor_init_non_uniform_erase_map(). In practice it happened with
mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands,
but only 4K and 64K erase with 4b address commands.
Fixes: dc92843159a7 ("mtd: spi-nor: fix erase_type array to indicate current map conf")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
---
Changes in v2:
erase->opcode -> erase->size
drivers/mtd/spi-nor/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 88dd090..183ea9d 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -1400,6 +1400,8 @@ spi_nor_find_best_erase_type(const struct spi_nor_erase_map *map,
continue;
erase = &map->erase_type[i];
+ if (!erase->size)
+ continue;
/* Alignment is not mandatory for overlaid regions */
if (region->offset & SNOR_OVERLAID_REGION &&
--
2.10.2
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream.
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to stable-4.14 instead of the patch that could
not be applied to it last time.
A rejected hunk has been manually resolved without noticeable change,
and testing against that stable tree was fine.
fs/nilfs2/segment.c | 15 ++++++++-------
fs/nilfs2/super.c | 2 --
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index e5a2b04c77ad..bff7fca4762d 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -331,7 +331,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2256,7 +2256,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_transaction_info *ti;
int err;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2295,7 +2295,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2792,11 +2792,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index af7d0d5cce50..36e60a45a1bf 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1148,8 +1148,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((bool)(*flags & MS_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & MS_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= MS_RDONLY;
/*
--
2.31.1
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f0861f49bd94 ("x86/sgx: Add overflow check in sgx_validate_offset_length()")
dda03e2c331b ("x86/sgx: Create utility to validate user provided offset and length")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0861f49bd946ff94fce4f82509c45e167f63690 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Borys=20Pop=C5=82awski?= <borysp(a)invisiblethingslab.com>
Date: Wed, 5 Oct 2022 00:59:03 +0200
Subject: [PATCH] x86/sgx: Add overflow check in sgx_validate_offset_length()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
sgx_validate_offset_length() function verifies "offset" and "length"
arguments provided by userspace, but was missing an overflow check on
their addition. Add it.
Fixes: c6d26d370767 ("x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES")
Signed-off-by: Borys Popławski <borysp(a)invisiblethingslab.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: stable(a)vger.kernel.org # v5.11+
Link: https://lore.kernel.org/r/0d91ac79-6d84-abed-5821-4dbe59fa1a38@invisiblethi…
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ebe79d60619f..da8b8ea6b063 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -356,6 +356,9 @@ static int sgx_validate_offset_length(struct sgx_encl *encl,
if (!length || !IS_ALIGNED(length, PAGE_SIZE))
return -EINVAL;
+ if (offset + length < offset)
+ return -EINVAL;
+
if (offset + length - PAGE_SIZE >= encl->size)
return -EINVAL;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
222cfa0118aa ("mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put()")
c31165d7400b ("mmc: sdhci-pci: Add support for HS200 tuning mode on AMD, eMMC-4.5.1")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 222cfa0118aa68687ace74aab8fdf77ce8fbd7e6 Mon Sep 17 00:00:00 2001
From: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Date: Mon, 14 Nov 2022 16:31:00 +0800
Subject: [PATCH] mmc: sdhci-pci: Fix possible memory leak caused by missing
pci_dev_put()
pci_get_device() will increase the reference count for the returned
pci_dev. We need to use pci_dev_put() to decrease the reference count
before amd_probe() returns. There is no problem for the 'smbus_dev ==
NULL' branch because pci_dev_put() can also handle the NULL input
parameter case.
Fixes: 659c9bc114a8 ("mmc: sdhci-pci: Build o2micro support in the same module")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221114083100.149200-1-wangxiongfeng2@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 34ea1acbb3cc..28dc65023fa9 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -1749,6 +1749,8 @@ static int amd_probe(struct sdhci_pci_chip *chip)
}
}
+ pci_dev_put(smbus_dev);
+
if (gen == AMD_CHIPSET_BEFORE_ML || gen == AMD_CHIPSET_CZ)
chip->quirks2 |= SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
65946690ed8d ("firmware: coreboot: Register bus in module init")
cae0970ee9c4 ("firmware: google: Release devices before unregistering the bus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65946690ed8d972fdb91a74ee75ac0f0f0d68321 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 19 Oct 2022 18:10:53 -0700
Subject: [PATCH] firmware: coreboot: Register bus in module init
The coreboot_table driver registers a coreboot bus while probing a
"coreboot_table" device representing the coreboot table memory region.
Probing this device (i.e., registering the bus) is a dependency for the
module_init() functions of any driver for this bus (e.g.,
memconsole-coreboot.c / memconsole_driver_init()).
With synchronous probe, this dependency works OK, as the link order in
the Makefile ensures coreboot_table_driver_init() (and thus,
coreboot_table_probe()) completes before a coreboot device driver tries
to add itself to the bus.
With asynchronous probe, however, coreboot_table_probe() may race with
memconsole_driver_init(), and so we're liable to hit one of these two:
1. coreboot_driver_register() eventually hits "[...] the bus was not
initialized.", and the memconsole driver fails to register; or
2. coreboot_driver_register() gets past #1, but still races with
bus_register() and hits some other undefined/crashing behavior (e.g.,
in driver_find() [1])
We can resolve this by registering the bus in our initcall, and only
deferring "device" work (scanning the coreboot memory region and
creating sub-devices) to probe().
[1] Example failure, using 'driver_async_probe=*' kernel command line:
[ 0.114217] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[ 0.114307] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1 #63
[ 0.114316] Hardware name: Google Scarlet (DT)
...
[ 0.114488] Call trace:
[ 0.114494] _raw_spin_lock+0x34/0x60
[ 0.114502] kset_find_obj+0x28/0x84
[ 0.114511] driver_find+0x30/0x50
[ 0.114520] driver_register+0x64/0x10c
[ 0.114528] coreboot_driver_register+0x30/0x3c
[ 0.114540] memconsole_driver_init+0x24/0x30
[ 0.114550] do_one_initcall+0x154/0x2e0
[ 0.114560] do_initcall_level+0x134/0x160
[ 0.114571] do_initcalls+0x60/0xa0
[ 0.114579] do_basic_setup+0x28/0x34
[ 0.114588] kernel_init_freeable+0xf8/0x150
[ 0.114596] kernel_init+0x2c/0x12c
[ 0.114607] ret_from_fork+0x10/0x20
[ 0.114624] Code: 5280002b 1100054a b900092a f9800011 (885ffc01)
[ 0.114631] ---[ end trace 0000000000000000 ]---
Fixes: b81e3140e412 ("firmware: coreboot: Make bus registration symmetric")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Reviewed-by: Stephen Boyd <swboyd(a)chromium.org>
Link: https://lore.kernel.org/r/20221019180934.1.If29e167d8a4771b0bf4a39c89c6946e…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/firmware/google/coreboot_table.c b/drivers/firmware/google/coreboot_table.c
index c52bcaa9def6..9ca21feb9d45 100644
--- a/drivers/firmware/google/coreboot_table.c
+++ b/drivers/firmware/google/coreboot_table.c
@@ -149,12 +149,8 @@ static int coreboot_table_probe(struct platform_device *pdev)
if (!ptr)
return -ENOMEM;
- ret = bus_register(&coreboot_bus_type);
- if (!ret) {
- ret = coreboot_table_populate(dev, ptr);
- if (ret)
- bus_unregister(&coreboot_bus_type);
- }
+ ret = coreboot_table_populate(dev, ptr);
+
memunmap(ptr);
return ret;
@@ -169,7 +165,6 @@ static int __cb_dev_unregister(struct device *dev, void *dummy)
static int coreboot_table_remove(struct platform_device *pdev)
{
bus_for_each_dev(&coreboot_bus_type, NULL, NULL, __cb_dev_unregister);
- bus_unregister(&coreboot_bus_type);
return 0;
}
@@ -199,6 +194,32 @@ static struct platform_driver coreboot_table_driver = {
.of_match_table = of_match_ptr(coreboot_of_match),
},
};
-module_platform_driver(coreboot_table_driver);
+
+static int __init coreboot_table_driver_init(void)
+{
+ int ret;
+
+ ret = bus_register(&coreboot_bus_type);
+ if (ret)
+ return ret;
+
+ ret = platform_driver_register(&coreboot_table_driver);
+ if (ret) {
+ bus_unregister(&coreboot_bus_type);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit coreboot_table_driver_exit(void)
+{
+ platform_driver_unregister(&coreboot_table_driver);
+ bus_unregister(&coreboot_bus_type);
+}
+
+module_init(coreboot_table_driver_init);
+module_exit(coreboot_table_driver_exit);
+
MODULE_AUTHOR("Google, Inc.");
MODULE_LICENSE("GPL");
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7fc961cf7ffc ("iommu/vt-d: Set SRE bit only when hardware has SRS cap")
54c80d907400 ("iommu/vt-d: Use user privilege for RID2PASID translation")
672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory")
3b50142d8528 ("MAINTAINERS: sort field names for all entries")
4400b7d68f6e ("MAINTAINERS: sort entries by entry name")
b032227c6293 ("Merge tag 'nios2-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7fc961cf7ffcb130c4e93ee9a5628134f9de700a Mon Sep 17 00:00:00 2001
From: Tina Zhang <tina.zhang(a)intel.com>
Date: Wed, 16 Nov 2022 13:15:44 +0800
Subject: [PATCH] iommu/vt-d: Set SRE bit only when hardware has SRS cap
SRS cap is the hardware cap telling if the hardware IOMMU can support
requests seeking supervisor privilege or not. SRE bit in scalable-mode
PASID table entry is treated as Reserved(0) for implementation not
supporting SRS cap.
Checking SRS cap before setting SRE bit can avoid the non-recoverable
fault of "Non-zero reserved field set in PASID Table Entry" caused by
setting SRE bit while there is no SRS cap support. The fault messages
look like below:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
[fault reason 0x5a]
SM: Non-zero reserved field set in PASID Table Entry
Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang(a)intel.com>
Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c30ddac40ee5..e13d7e5273e1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -642,7 +642,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
* Since it is a second level only translation setup, we should
* set SRE bit as well (addresses are expected to be GPAs).
*/
- if (pasid != PASID_RID2PASID)
+ if (pasid != PASID_RID2PASID && ecap_srs(iommu->ecap))
pasid_set_sre(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);
@@ -685,7 +685,8 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
* We should set SRE bit as well since the addresses are expected
* to be GPAs.
*/
- pasid_set_sre(pte);
+ if (ecap_srs(iommu->ecap))
+ pasid_set_sre(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
539bcb57da2f ("io_uring: fix tw losing poll events")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
a9c210cebe13 ("io_uring: move epoll handler to its own file")
4cf90495281b ("io_uring: add a dummy -EOPNOTSUPP prep handler")
99f15d8d6136 ("io_uring: move uring_cmd handling to its own file")
cd40cae29ef8 ("io_uring: split out open/close operations")
453b329be5ea ("io_uring: separate out file table handling code")
f4c163dd7d4b ("io_uring: split out fadvise/madvise operations")
0d5847274037 ("io_uring: split out fs related sync/fallocate functions")
531113bbd5bf ("io_uring: split out splice related operations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 539bcb57da2f58886d7d5c17134236b0ec9cd15d Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 17 Nov 2022 18:40:15 +0000
Subject: [PATCH] io_uring: fix tw losing poll events
We may never try to process a poll wake and its mask if there was
multiple wake ups racing for queueing up a tw. Force
io_poll_check_events() to update the mask by vfs_poll().
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.16687102…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 90920abf91ff..c34019b18211 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -228,6 +228,13 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
return IOU_POLL_DONE;
if (v & IO_POLL_CANCEL_FLAG)
return -ECANCELED;
+ /*
+ * cqe.res contains only events of the first wake up
+ * and all others are be lost. Redo vfs_poll() to get
+ * up to date state.
+ */
+ if ((v & IO_POLL_REF_MASK) != 1)
+ req->cqe.res = 0;
/* the mask was stashed in __io_poll_execute */
if (!req->cqe.res) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
3ce00bb7e91c ("binder: validate alloc->mm in ->mmap() handler")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ce00bb7e91cf57d723905371507af57182c37ef Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 4 Nov 2022 23:12:35 +0000
Subject: [PATCH] binder: validate alloc->mm in ->mmap() handler
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr
dereference") binder caches a pointer to the current->mm during open().
This fixes a null-ptr dereference reported by syzkaller. Unfortunately,
it also opens the door for a process to update its mm after the open(),
(e.g. via execve) making the cached alloc->mm pointer invalid.
Things get worse when the process continues to mmap() a vma. From this
point forward, binder will attempt to find this vma using an obsolete
alloc->mm reference. Such as in binder_update_page_range(), where the
wrong vma is obtained via vma_lookup(), yet binder proceeds to happily
insert new pages into it.
To avoid this issue fail the ->mmap() callback if we detect a mismatch
between the vma->vm_mm and the original alloc->mm pointer. This prevents
alloc->vm_addr from getting set, so that any subsequent vma_lookup()
calls fail as expected.
Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.15+
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 1c39cfce32fa..4ad42b0f75cd 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -739,6 +739,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
const char *failure_string;
struct binder_buffer *buffer;
+ if (unlikely(vma->vm_mm != alloc->mm)) {
+ ret = -EINVAL;
+ failure_string = "invalid vma->vm_mm";
+ goto err_invalid_mm;
+ }
+
mutex_lock(&binder_alloc_mmap_lock);
if (alloc->buffer_size) {
ret = -EBUSY;
@@ -785,6 +791,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
alloc->buffer_size = 0;
err_already_mapped:
mutex_unlock(&binder_alloc_mmap_lock);
+err_invalid_mm:
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%s: %d %lx-%lx %s failed %d\n", __func__,
alloc->pid, vma->vm_start, vma->vm_end,
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
3ce00bb7e91c ("binder: validate alloc->mm in ->mmap() handler")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ce00bb7e91cf57d723905371507af57182c37ef Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 4 Nov 2022 23:12:35 +0000
Subject: [PATCH] binder: validate alloc->mm in ->mmap() handler
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr
dereference") binder caches a pointer to the current->mm during open().
This fixes a null-ptr dereference reported by syzkaller. Unfortunately,
it also opens the door for a process to update its mm after the open(),
(e.g. via execve) making the cached alloc->mm pointer invalid.
Things get worse when the process continues to mmap() a vma. From this
point forward, binder will attempt to find this vma using an obsolete
alloc->mm reference. Such as in binder_update_page_range(), where the
wrong vma is obtained via vma_lookup(), yet binder proceeds to happily
insert new pages into it.
To avoid this issue fail the ->mmap() callback if we detect a mismatch
between the vma->vm_mm and the original alloc->mm pointer. This prevents
alloc->vm_addr from getting set, so that any subsequent vma_lookup()
calls fail as expected.
Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.15+
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 1c39cfce32fa..4ad42b0f75cd 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -739,6 +739,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
const char *failure_string;
struct binder_buffer *buffer;
+ if (unlikely(vma->vm_mm != alloc->mm)) {
+ ret = -EINVAL;
+ failure_string = "invalid vma->vm_mm";
+ goto err_invalid_mm;
+ }
+
mutex_lock(&binder_alloc_mmap_lock);
if (alloc->buffer_size) {
ret = -EBUSY;
@@ -785,6 +791,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
alloc->buffer_size = 0;
err_already_mapped:
mutex_unlock(&binder_alloc_mmap_lock);
+err_invalid_mm:
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%s: %d %lx-%lx %s failed %d\n", __func__,
alloc->pid, vma->vm_start, vma->vm_end,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
{
struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);
- if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+ if (!usb_dev->gpio_desc)
return;
gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
return -ENOMEM;
usb_dev->core = core;
- if (core->dev.of_node)
- usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
- GPIOD_OUT_HIGH);
+ usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+ GPIOD_OUT_HIGH);
+ if (IS_ERR(usb_dev->gpio_desc))
+ return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+ "error obtaining VCC GPIO");
switch (core->id.id) {
case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
{
struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);
- if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+ if (!usb_dev->gpio_desc)
return;
gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
return -ENOMEM;
usb_dev->core = core;
- if (core->dev.of_node)
- usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
- GPIOD_OUT_HIGH);
+ usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+ GPIOD_OUT_HIGH);
+ if (IS_ERR(usb_dev->gpio_desc))
+ return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+ "error obtaining VCC GPIO");
switch (core->id.id) {
case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
{
struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);
- if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+ if (!usb_dev->gpio_desc)
return;
gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
return -ENOMEM;
usb_dev->core = core;
- if (core->dev.of_node)
- usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
- GPIOD_OUT_HIGH);
+ usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+ GPIOD_OUT_HIGH);
+ if (IS_ERR(usb_dev->gpio_desc))
+ return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+ "error obtaining VCC GPIO");
switch (core->id.id) {
case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
{
struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);
- if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+ if (!usb_dev->gpio_desc)
return;
gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
return -ENOMEM;
usb_dev->core = core;
- if (core->dev.of_node)
- usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
- GPIOD_OUT_HIGH);
+ usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+ GPIOD_OUT_HIGH);
+ if (IS_ERR(usb_dev->gpio_desc))
+ return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+ "error obtaining VCC GPIO");
switch (core->id.id) {
case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0954256e970e ("scsi: zfcp: Fix double free of FSF request when qdio send fails")
f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
dac37e15b7d5 ("scsi: zfcp: fix use-after-"free" in FC ingress path after TMF")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0954256e970ecf371b03a6c9af2cf91b9c4085ff Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Wed, 16 Nov 2022 11:50:37 +0100
Subject: [PATCH] scsi: zfcp: Fix double free of FSF request when qdio send
fails
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
and so can successfully remove the request from the hash table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.16685955…
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0954256e970e ("scsi: zfcp: Fix double free of FSF request when qdio send fails")
f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0954256e970ecf371b03a6c9af2cf91b9c4085ff Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Wed, 16 Nov 2022 11:50:37 +0100
Subject: [PATCH] scsi: zfcp: Fix double free of FSF request when qdio send
fails
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
and so can successfully remove the request from the hash table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.16685955…
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0954256e970e ("scsi: zfcp: Fix double free of FSF request when qdio send fails")
f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0954256e970ecf371b03a6c9af2cf91b9c4085ff Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Wed, 16 Nov 2022 11:50:37 +0100
Subject: [PATCH] scsi: zfcp: Fix double free of FSF request when qdio send
fails
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
and so can successfully remove the request from the hash table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.16685955…
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
2e586641c950 ("ceph: do not update snapshot context when there is no new snapshot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51884d153f7ec85e18d607b2467820a90e0f4359 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Wed, 9 Nov 2022 11:00:39 +0800
Subject: [PATCH] ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 864cdaa0d2bd..e4151852184e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -763,7 +763,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
struct ceph_mds_snap_realm *ri; /* encoded */
__le64 *snaps; /* encoded */
__le64 *prior_parent_snaps; /* encoded */
- struct ceph_snap_realm *realm = NULL;
+ struct ceph_snap_realm *realm;
struct ceph_snap_realm *first_realm = NULL;
struct ceph_snap_realm *realm_to_rebuild = NULL;
int rebuild_snapcs;
@@ -774,6 +774,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("%s deletion=%d\n", __func__, deletion);
more:
+ realm = NULL;
rebuild_snapcs = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
2e586641c950 ("ceph: do not update snapshot context when there is no new snapshot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51884d153f7ec85e18d607b2467820a90e0f4359 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Wed, 9 Nov 2022 11:00:39 +0800
Subject: [PATCH] ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 864cdaa0d2bd..e4151852184e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -763,7 +763,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
struct ceph_mds_snap_realm *ri; /* encoded */
__le64 *snaps; /* encoded */
__le64 *prior_parent_snaps; /* encoded */
- struct ceph_snap_realm *realm = NULL;
+ struct ceph_snap_realm *realm;
struct ceph_snap_realm *first_realm = NULL;
struct ceph_snap_realm *realm_to_rebuild = NULL;
int rebuild_snapcs;
@@ -774,6 +774,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("%s deletion=%d\n", __func__, deletion);
more:
+ realm = NULL;
rebuild_snapcs = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
2e586641c950 ("ceph: do not update snapshot context when there is no new snapshot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51884d153f7ec85e18d607b2467820a90e0f4359 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Wed, 9 Nov 2022 11:00:39 +0800
Subject: [PATCH] ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 864cdaa0d2bd..e4151852184e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -763,7 +763,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
struct ceph_mds_snap_realm *ri; /* encoded */
__le64 *snaps; /* encoded */
__le64 *prior_parent_snaps; /* encoded */
- struct ceph_snap_realm *realm = NULL;
+ struct ceph_snap_realm *realm;
struct ceph_snap_realm *first_realm = NULL;
struct ceph_snap_realm *realm_to_rebuild = NULL;
int rebuild_snapcs;
@@ -774,6 +774,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("%s deletion=%d\n", __func__, deletion);
more:
+ realm = NULL;
rebuild_snapcs = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
7acae6183cf3 ("ceph: fix possible NULL pointer dereference for req->r_session")
89d43d0551a8 ("ceph: put the requests/sessions when it fails to alloc memory")
708c87168b61 ("ceph: fix off by one bugs in unsafe_request_wait()")
e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session maybe changed when it was forwarded or
resent. Both the forwarding and resending cases the requests will
be protected by the mdsc->mutex.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
7acae6183cf3 ("ceph: fix possible NULL pointer dereference for req->r_session")
89d43d0551a8 ("ceph: put the requests/sessions when it fails to alloc memory")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session maybe changed when it was forwarded or
resent. Both the forwarding and resending cases the requests will
be protected by the mdsc->mutex.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
7acae6183cf3 ("ceph: fix possible NULL pointer dereference for req->r_session")
89d43d0551a8 ("ceph: put the requests/sessions when it fails to alloc memory")
708c87168b61 ("ceph: fix off by one bugs in unsafe_request_wait()")
e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session maybe changed when it was forwarded or
resent. Both the forwarding and resending cases the requests will
be protected by the mdsc->mutex.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session maybe changed when it was forwarded or
resent. Both the forwarding and resending cases the requests will
be protected by the mdsc->mutex.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7fdbc5f014c3 ("io_uring: disallow self-propelled ring polling")
2ba69707d915 ("io_uring: clean up io_poll_check_events return values")
d245bca6375b ("io_uring: don't expose io_fill_cqe_aux()")
f3b44f92e59a ("io_uring: move read/write related opcodes to its own file")
c98817e6cd44 ("io_uring: move remaining file table manipulation to filetable.c")
735729844819 ("io_uring: move rsrc related data, core, and commands")
3b77495a9723 ("io_uring: split provided buffers handling into its own file")
7aaff708a768 ("io_uring: move cancelation into its own file")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
92ac8beaea1f ("io_uring: include and forward-declaration sanitation")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7fdbc5f014c3f71bc44673a2d6c5bb2d12d45f25 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 18 Nov 2022 15:41:41 +0000
Subject: [PATCH] io_uring: disallow self-propelled ring polling
When we post a CQE we wake all ring pollers as it normally should be.
However, if a CQE was generated by a multishot poll request targeting
its own ring, it'll wake that request up, which will make it to post
a new CQE, which will wake the request and so on until it exhausts all
CQ entries.
Don't allow multishot polling io_uring files but downgrade them to
oneshots, which was always stated as a correct behaviour that the
userspace should check for.
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.16687857…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c34019b18211..055632e9092a 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -246,6 +246,8 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
continue;
if (req->apoll_events & EPOLLONESHOT)
return IOU_POLL_DONE;
+ if (io_is_uring_fops(req->file))
+ return IOU_POLL_DONE;
/* multishot, just fill a CQE and proceed */
if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7090abd6ad06 ("serial: 8250_lpss: Use 16B DMA burst with Elkhart Lake")
2cb3315107b5 ("serial: 8250_lpss: Enable PSE UART Auto Flow Control")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7090abd6ad0610a144523ce4ffcb8560909bf2a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:51 +0200
Subject: [PATCH] serial: 8250_lpss: Use 16B DMA burst with Elkhart Lake
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Configure DMA to use 16B burst size with Elkhart Lake. This makes the
bus use more efficient and works around an issue which occurs with the
previously used 1B.
The fix was initially developed by Srikanth Thokala and Aman Kumar.
This together with the previous config change is the cleaned up version
of the original fix.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake")
Cc: <stable(a)vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter
Reported-by: Wentong Wu <wentong.wu(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-4-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
index 7d9cddbfef40..0e43bdfb7459 100644
--- a/drivers/tty/serial/8250/8250_lpss.c
+++ b/drivers/tty/serial/8250/8250_lpss.c
@@ -174,6 +174,8 @@ static int ehl_serial_setup(struct lpss8250 *lpss, struct uart_port *port)
*/
up->dma = dma;
+ lpss->dma_maxburst = 16;
+
port->set_termios = dw8250_do_set_termios;
return 0;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
57572cacd36e ("iio: accel: bma400: Ensure VDDIO is enable defore reading the chip ID.")
12c99f859fd3 ("iio: accel: bma400: conversion to device-managed function")
02e2af20f4f9 ("Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57572cacd36e6d4be7722d7770d23f4430219827 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Date: Sun, 2 Oct 2022 15:41:33 +0100
Subject: [PATCH] iio: accel: bma400: Ensure VDDIO is enable defore reading the
chip ID.
The regulator enables were after the check on the chip variant, which was
very unlikely to return a correct value when not powered.
Presumably all the device anyone is testing on have a regulator that
is already powered up when this code runs for reasons beyond the scope
of this driver. Move the read call down a few lines.
Fixes: 3cf7ded15e40 ("iio: accel: bma400: basic regulator support")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Dan Robertson <dan(a)dlrobertson.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20221002144133.3771029-1-jic23@kernel.org
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index ad8fce3e08cd..490c342ef72a 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -869,18 +869,6 @@ static int bma400_init(struct bma400_data *data)
unsigned int val;
int ret;
- /* Try to read chip_id register. It must return 0x90. */
- ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
- if (ret) {
- dev_err(data->dev, "Failed to read chip id register\n");
- return ret;
- }
-
- if (val != BMA400_ID_REG_VAL) {
- dev_err(data->dev, "Chip ID mismatch\n");
- return -ENODEV;
- }
-
data->regulators[BMA400_VDD_REGULATOR].supply = "vdd";
data->regulators[BMA400_VDDIO_REGULATOR].supply = "vddio";
ret = devm_regulator_bulk_get(data->dev,
@@ -906,6 +894,18 @@ static int bma400_init(struct bma400_data *data)
if (ret)
return ret;
+ /* Try to read chip_id register. It must return 0x90. */
+ ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
+ if (ret) {
+ dev_err(data->dev, "Failed to read chip id register\n");
+ return ret;
+ }
+
+ if (val != BMA400_ID_REG_VAL) {
+ dev_err(data->dev, "Chip ID mismatch\n");
+ return -ENODEV;
+ }
+
ret = bma400_get_power_mode(data);
if (ret) {
dev_err(data->dev, "Failed to get the initial power-mode\n");
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
57572cacd36e ("iio: accel: bma400: Ensure VDDIO is enable defore reading the chip ID.")
12c99f859fd3 ("iio: accel: bma400: conversion to device-managed function")
02e2af20f4f9 ("Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57572cacd36e6d4be7722d7770d23f4430219827 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Date: Sun, 2 Oct 2022 15:41:33 +0100
Subject: [PATCH] iio: accel: bma400: Ensure VDDIO is enable defore reading the
chip ID.
The regulator enables were after the check on the chip variant, which was
very unlikely to return a correct value when not powered.
Presumably all the device anyone is testing on have a regulator that
is already powered up when this code runs for reasons beyond the scope
of this driver. Move the read call down a few lines.
Fixes: 3cf7ded15e40 ("iio: accel: bma400: basic regulator support")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Dan Robertson <dan(a)dlrobertson.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20221002144133.3771029-1-jic23@kernel.org
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index ad8fce3e08cd..490c342ef72a 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -869,18 +869,6 @@ static int bma400_init(struct bma400_data *data)
unsigned int val;
int ret;
- /* Try to read chip_id register. It must return 0x90. */
- ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
- if (ret) {
- dev_err(data->dev, "Failed to read chip id register\n");
- return ret;
- }
-
- if (val != BMA400_ID_REG_VAL) {
- dev_err(data->dev, "Chip ID mismatch\n");
- return -ENODEV;
- }
-
data->regulators[BMA400_VDD_REGULATOR].supply = "vdd";
data->regulators[BMA400_VDDIO_REGULATOR].supply = "vddio";
ret = devm_regulator_bulk_get(data->dev,
@@ -906,6 +894,18 @@ static int bma400_init(struct bma400_data *data)
if (ret)
return ret;
+ /* Try to read chip_id register. It must return 0x90. */
+ ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
+ if (ret) {
+ dev_err(data->dev, "Failed to read chip id register\n");
+ return ret;
+ }
+
+ if (val != BMA400_ID_REG_VAL) {
+ dev_err(data->dev, "Chip ID mismatch\n");
+ return -ENODEV;
+ }
+
ret = bma400_get_power_mode(data);
if (ret) {
dev_err(data->dev, "Failed to get the initial power-mode\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9d5333c93134 ("usb: cdns3: host: fix endless superspeed hub port reset")
88171f67a2c1 ("usb: cdns3: Removes xhci_cdns3_suspend_quirk from host-export.h")
3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
e93e58d27402 ("usb: cdnsp: Device side header file for CDNSP driver")
0b490046d8d7 ("usb: cdns3: Refactoring names in reusable code")
394c3a144de8 ("usb: cdns3: Moves reusable code to separate module")
f738957277ba ("usb: cdns3: Split core.c into cdns3-plat and core.c file")
db8892bb1bb6 ("usb: cdns3: Add support for DRD CDNSP")
d2a968dddf98 ("Merge tag 'usb-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d5333c931347005352d5b8beaa43528c94cfc9c Mon Sep 17 00:00:00 2001
From: Li Jun <jun.li(a)nxp.com>
Date: Wed, 26 Oct 2022 15:07:49 -0400
Subject: [PATCH] usb: cdns3: host: fix endless superspeed hub port reset
When usb 3.0 hub connect with one USB 2.0 device and NO USB 3.0 device,
some usb hub reports endless port reset message.
[ 190.324169] usb 2-1: new SuperSpeed USB device number 88 using xhci-hcd
[ 190.352834] hub 2-1:1.0: USB hub found
[ 190.356995] hub 2-1:1.0: 4 ports detected
[ 190.700056] usb 2-1: USB disconnect, device number 88
[ 192.472139] usb 2-1: new SuperSpeed USB device number 89 using xhci-hcd
[ 192.500820] hub 2-1:1.0: USB hub found
[ 192.504977] hub 2-1:1.0: 4 ports detected
[ 192.852066] usb 2-1: USB disconnect, device number 89
The reason is the runtime pm state of USB2.0 port is active and
USB 3.0 port is suspend, so parent device is active state.
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/xhci-hcd.1.auto/usb2/power/runtime_status
suspended
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/xhci-hcd.1.auto/usb1/power/runtime_status
active
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/xhci-hcd.1.auto/power/runtime_status
active
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/power/runtime_status
active
So xhci_cdns3_suspend_quirk() have not called. U3 configure is not applied.
move U3 configure into host start. Reinit again in resume function in case
controller power lost during suspend.
Cc: stable(a)vger.kernel.org 5.10
Signed-off-by: Li Jun <jun.li(a)nxp.com>
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Acked-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
Link: https://lore.kernel.org/r/20221026190749.2280367-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/cdns3/host.c b/drivers/usb/cdns3/host.c
index 9643b905e2d8..6164fc4c96a4 100644
--- a/drivers/usb/cdns3/host.c
+++ b/drivers/usb/cdns3/host.c
@@ -24,11 +24,37 @@
#define CFG_RXDET_P3_EN BIT(15)
#define LPM_2_STB_SWITCH_EN BIT(25)
-static int xhci_cdns3_suspend_quirk(struct usb_hcd *hcd);
+static void xhci_cdns3_plat_start(struct usb_hcd *hcd)
+{
+ struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+ u32 value;
+
+ /* set usbcmd.EU3S */
+ value = readl(&xhci->op_regs->command);
+ value |= CMD_PM_INDEX;
+ writel(value, &xhci->op_regs->command);
+
+ if (hcd->regs) {
+ value = readl(hcd->regs + XECP_AUX_CTRL_REG1);
+ value |= CFG_RXDET_P3_EN;
+ writel(value, hcd->regs + XECP_AUX_CTRL_REG1);
+
+ value = readl(hcd->regs + XECP_PORT_CAP_REG);
+ value |= LPM_2_STB_SWITCH_EN;
+ writel(value, hcd->regs + XECP_PORT_CAP_REG);
+ }
+}
+
+static int xhci_cdns3_resume_quirk(struct usb_hcd *hcd)
+{
+ xhci_cdns3_plat_start(hcd);
+ return 0;
+}
static const struct xhci_plat_priv xhci_plat_cdns3_xhci = {
.quirks = XHCI_SKIP_PHY_INIT | XHCI_AVOID_BEI,
- .suspend_quirk = xhci_cdns3_suspend_quirk,
+ .plat_start = xhci_cdns3_plat_start,
+ .resume_quirk = xhci_cdns3_resume_quirk,
};
static int __cdns_host_init(struct cdns *cdns)
@@ -90,32 +116,6 @@ static int __cdns_host_init(struct cdns *cdns)
return ret;
}
-static int xhci_cdns3_suspend_quirk(struct usb_hcd *hcd)
-{
- struct xhci_hcd *xhci = hcd_to_xhci(hcd);
- u32 value;
-
- if (pm_runtime_status_suspended(hcd->self.controller))
- return 0;
-
- /* set usbcmd.EU3S */
- value = readl(&xhci->op_regs->command);
- value |= CMD_PM_INDEX;
- writel(value, &xhci->op_regs->command);
-
- if (hcd->regs) {
- value = readl(hcd->regs + XECP_AUX_CTRL_REG1);
- value |= CFG_RXDET_P3_EN;
- writel(value, hcd->regs + XECP_AUX_CTRL_REG1);
-
- value = readl(hcd->regs + XECP_PORT_CAP_REG);
- value |= LPM_2_STB_SWITCH_EN;
- writel(value, hcd->regs + XECP_PORT_CAP_REG);
- }
-
- return 0;
-}
-
static void cdns_host_exit(struct cdns *cdns)
{
kfree(cdns->xhci_plat_data);
On Mon, Nov 21, 2022 at 12:08 AM -05, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> to the 6.0-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> l2tp-serialize-access-to-sk_user_data-with-sk_callba.patch
> and it can be found in the queue-6.0 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please don't add the commit below to stable tree, yet.
We have a fix-up for it under review:
https://lore.kernel.org/netdev/20221121085426.21315-1-jakub@cloudflare.com/
> commit 87a828cfa5ce4b2075d26660756c07751648d13f
> Author: Jakub Sitnicki <jakub(a)cloudflare.com>
> Date: Mon Nov 14 20:16:19 2022 +0100
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> [ Upstream commit b68777d54fac21fc833ec26ea1a2a84f975ab035 ]
>
> sk->sk_user_data has multiple users, which are not compatible with each
> other. Writers must synchronize by grabbing the sk->sk_callback_lock.
>
> l2tp currently fails to grab the lock when modifying the underlying tunnel
> socket fields. Fix it by adding appropriate locking.
>
> We err on the side of safety and grab the sk_callback_lock also inside the
> sk_destruct callback overridden by l2tp, even though there should be no
> refs allowing access to the sock at the time when sk_destruct gets called.
>
> v4:
> - serialize write to sk_user_data in l2tp sk_destruct
>
> v3:
> - switch from sock lock to sk_callback_lock
> - document write-protection for sk_user_data
>
> v2:
> - update Fixes to point to origin of the bug
> - use real names in Reported/Tested-by tags
>
> Cc: Tom Parkin <tparkin(a)katalix.com>
> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
> Reported-by: Haowei Yan <g1042620637(a)gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f6e6838c82df..03a4ebe3ccc8 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -323,7 +323,7 @@ struct sk_filter;
> * @sk_tskey: counter to disambiguate concurrent tstamp requests
> * @sk_zckey: counter to order MSG_ZEROCOPY notifications
> * @sk_socket: Identd and reporting IO signals
> - * @sk_user_data: RPC layer private data
> + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock.
> * @sk_frag: cached page frag
> * @sk_peek_off: current peek_offset value
> * @sk_send_head: front of stuff to transmit
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 7499c51b1850..754fdda8a5f5 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk)
> }
>
> /* Remove hooks into tunnel socket */
> + write_lock_bh(&sk->sk_callback_lock);
> sk->sk_destruct = tunnel->old_sk_destruct;
> sk->sk_user_data = NULL;
> + write_unlock_bh(&sk->sk_callback_lock);
>
> /* Call the original destructor */
> if (sk->sk_destruct)
> @@ -1469,16 +1471,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock = sockfd_lookup(tunnel->fd, &ret);
> if (!sock)
> goto err;
> -
> - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap);
> - if (ret < 0)
> - goto err_sock;
> }
>
> + sk = sock->sk;
> + write_lock(&sk->sk_callback_lock);
> +
> + ret = l2tp_validate_socket(sk, net, tunnel->encap);
> + if (ret < 0)
> + goto err_sock;
> +
> tunnel->l2tp_net = net;
> pn = l2tp_pernet(net);
>
> - sk = sock->sk;
> sock_hold(sk);
> tunnel->sock = sk;
>
> @@ -1504,7 +1508,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
>
> setup_udp_tunnel_sock(net, sock, &udp_cfg);
> } else {
> - sk->sk_user_data = tunnel;
> + rcu_assign_sk_user_data(sk, tunnel);
> }
>
> tunnel->old_sk_destruct = sk->sk_destruct;
> @@ -1518,6 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> if (tunnel->fd >= 0)
> sockfd_put(sock);
>
> + write_unlock(&sk->sk_callback_lock);
> return 0;
>
> err_sock:
> @@ -1525,6 +1530,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock_release(sock);
> else
> sockfd_put(sock);
> +
> + write_unlock(&sk->sk_callback_lock);
> err:
> return ret;
> }
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
b98186aee22f ("io_uring: update res mask in io_poll_check_events")
d245bca6375b ("io_uring: don't expose io_fill_cqe_aux()")
f3b44f92e59a ("io_uring: move read/write related opcodes to its own file")
c98817e6cd44 ("io_uring: move remaining file table manipulation to filetable.c")
735729844819 ("io_uring: move rsrc related data, core, and commands")
3b77495a9723 ("io_uring: split provided buffers handling into its own file")
7aaff708a768 ("io_uring: move cancelation into its own file")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
92ac8beaea1f ("io_uring: include and forward-declaration sanitation")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
a9c210cebe13 ("io_uring: move epoll handler to its own file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b98186aee22fa593bc8c6b2c5d839c2ee518bc8c Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 17 Nov 2022 18:40:14 +0000
Subject: [PATCH] io_uring: update res mask in io_poll_check_events
When io_poll_check_events() collides with someone attempting to queue a
task work, it'll spin for one more time. However, it'll continue to use
the mask from the first iteration instead of updating it. For example,
if the first wake up was a EPOLLIN and the second EPOLLOUT, the
userspace will not get EPOLLOUT in time.
Clear the mask for all subsequent iterations to force vfs_poll().
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.16687102…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index f500506984ec..90920abf91ff 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -258,6 +258,9 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
return ret;
}
+ /* force the next iteration to vfs_poll() */
+ req->cqe.res = 0;
+
/*
* Release all references, retry if someone tried to restart
* task_work while we were executing it.
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4d2852412306 ("drm/amd/display: Fix calculation for cursor CAB allocation")
525a65c77db5 ("drm/amd/display: Update MALL SS NumWays calculation")
6eef37460584 ("drm/amd/display: Add debug option for allocating extra way for cursor")
5c1a431aaf52 ("drm/amd/display: Added debug option for forcing subvp num ways")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4d285241230676ba8b888701b89684b4e0360fcc Mon Sep 17 00:00:00 2001
From: George Shen <george.shen(a)amd.com>
Date: Tue, 1 Nov 2022 23:03:03 -0400
Subject: [PATCH] drm/amd/display: Fix calculation for cursor CAB allocation
[Why]
The cursor size (in memory) is currently incorrectly calculated,
resulting not enough CAB being allocated for static screen cursor
in MALL refresh. This results in cursor image corruption.
[How]
Use cursor pitch instead of cursor width when calculating cursor size.
Update num cache lines calculation to use the result of the cursor size
calculation instead of manually recalculating again.
Reviewed-by: Alvin Lee <Alvin.Lee2(a)amd.com>
Acked-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: George Shen <george.shen(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.0.x
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index cf5bd9713f54..ac41a763bb1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -283,8 +283,7 @@ static uint32_t dcn32_calculate_cab_allocation(struct dc *dc, struct dc_state *c
using the max for calculation */
if (hubp->curs_attr.width > 0) {
- // Round cursor width to next multiple of 64
- cursor_size = (((hubp->curs_attr.width + 63) / 64) * 64) * hubp->curs_attr.height;
+ cursor_size = hubp->curs_attr.pitch * hubp->curs_attr.height;
switch (pipe->stream->cursor_attributes.color_format) {
case CURSOR_MODE_MONO:
@@ -309,9 +308,9 @@ static uint32_t dcn32_calculate_cab_allocation(struct dc *dc, struct dc_state *c
cursor_size > 16384) {
/* cursor_num_mblk = CEILING(num_cursors*cursor_width*cursor_width*cursor_Bpe/mblk_bytes, 1)
*/
- cache_lines_used += (((hubp->curs_attr.width * hubp->curs_attr.height * cursor_bpp +
- DCN3_2_MALL_MBLK_SIZE_BYTES - 1) / DCN3_2_MALL_MBLK_SIZE_BYTES) *
- DCN3_2_MALL_MBLK_SIZE_BYTES) / dc->caps.cache_line_size + 2;
+ cache_lines_used += (((cursor_size + DCN3_2_MALL_MBLK_SIZE_BYTES - 1) /
+ DCN3_2_MALL_MBLK_SIZE_BYTES) * DCN3_2_MALL_MBLK_SIZE_BYTES) /
+ dc->caps.cache_line_size + 2;
}
break;
}
@@ -727,10 +726,7 @@ void dcn32_update_mall_sel(struct dc *dc, struct dc_state *context)
struct hubp *hubp = pipe->plane_res.hubp;
if (pipe->stream && pipe->plane_state && hubp && hubp->funcs->hubp_update_mall_sel) {
- //Round cursor width up to next multiple of 64
- int cursor_width = ((hubp->curs_attr.width + 63) / 64) * 64;
- int cursor_height = hubp->curs_attr.height;
- int cursor_size = cursor_width * cursor_height;
+ int cursor_size = hubp->curs_attr.pitch * hubp->curs_attr.height;
switch (hubp->curs_attr.color_format) {
case CURSOR_MODE_MONO:
From: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
In vc4_platform_drm_probe() function vc4_match_add_drivers() is called to
find component matches for the component drivers. If no such match is found
the passed variable "match" is still NULL after the function returns.
Do not pass "match" to component_master_add_with_match() in this case since
this results in a NULL pointer access as soon as match->num is used to
allocate a component_match array. Instead return with -ENODEV from the
drivers probe function.
Fixes: c8b75bca92cb ("drm/vc4: Add KMS support for Raspberry Pi.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
---
drivers/gpu/drm/vc4/vc4_drv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index 2027063fdc30..2e53d7f8ad44 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -437,6 +437,9 @@ static int vc4_platform_drm_probe(struct platform_device *pdev)
vc4_match_add_drivers(dev, &match,
component_drivers, ARRAY_SIZE(component_drivers));
+ if (!match)
+ return -ENODEV;
+
return component_master_add_with_match(dev, &vc4_drm_ops, match);
}
base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763
--
2.36.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
a47b53e95acc ("tracing: Rename tracing_reset() to tracing_reset_cpu()")
46cc0b44428d ("tracing/snapshot: Resize spare buffer if size changed")
d2d8b146043a ("Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splice does on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Primiano Tucci <primiano(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
a47b53e95acc ("tracing: Rename tracing_reset() to tracing_reset_cpu()")
46cc0b44428d ("tracing/snapshot: Resize spare buffer if size changed")
d2d8b146043a ("Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splice does on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Primiano Tucci <primiano(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
a47b53e95acc ("tracing: Rename tracing_reset() to tracing_reset_cpu()")
46cc0b44428d ("tracing/snapshot: Resize spare buffer if size changed")
d2d8b146043a ("Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splice does on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Primiano Tucci <primiano(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
This bug is marked as fixed by commit:
net: core: netlink: add helper refcount dec and lock function
net: sched: add helper function to take reference to Qdisc
net: sched: extend Qdisc with rcu
net: sched: rename qdisc_destroy() to qdisc_put()
net: sched: use Qdisc rcu API instead of relying on rtnl lock
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
When extending segments, nilfs_sufile_alloc() is called to get an
unassigned segment, then mark it as dirty to avoid accidentally
allocating the same segment in the future.
But for some special cases such as a corrupted image it can be
unreliable.
If such corruption of the dirty state of the segment occurs, nilfs2 may
reallocate a segment that is in use and pick the same segment for
writing twice at the same time.
This will cause the problem reported by syzkaller:
https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd…
This case started with segbuf1.segnum = 3, nextnum = 4 when constructed.
It supposed segment 4 has already been allocated and marked as dirty.
However the dirty state was corrupted and segment 4 usage was not dirty.
For the first time nilfs_segctor_extend_segments() segment 4 was
allocated again, which made segbuf2 and next segbuf3 had same segment 4.
sb_getblk() will get same bh for segbuf2 and segbuf3, and this bh is
added to both buffer lists of two segbuf. It makes the lists broken
which causes NULL pointer dereference.
Fix the problem by setting usage as dirty every time in
nilfs_sufile_mark_dirty(), which is called during constructing current
segment to be written out and before allocating next segment.
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+77e4f005cb899d4268d1(a)syzkaller.appspotmail.com
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
---
v1 -> v2:
1) Add lock protection as Ryusuke suggested and slightly fix commit
message.
2) Fix and add tags.
v2 -> v3:
Fix commit message to make it clear.
---
fs/nilfs2/sufile.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 77ff8e95421f..dc359b56fdfa 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -495,14 +495,22 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum)
{
struct buffer_head *bh;
+ void *kaddr;
+ struct nilfs_segment_usage *su;
int ret;
+ down_write(&NILFS_MDT(sufile)->mi_sem);
ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
if (!ret) {
mark_buffer_dirty(bh);
nilfs_mdt_mark_dirty(sufile);
+ kaddr = kmap_atomic(bh->b_page);
+ su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
+ nilfs_segment_usage_set_dirty(su);
+ kunmap_atomic(kaddr);
brelse(bh);
}
+ up_write(&NILFS_MDT(sufile)->mi_sem);
return ret;
}
--
2.17.1
In nilfs_sufile_mark_dirty(), the buffer and inode are set dirty, but
nilfs_segment_usage is not set dirty, which makes it can be found by
nilfs_sufile_alloc() because it checks nilfs_segment_usage_clean(su).
This will cause the problem reported by syzkaller:
https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd…
It's because the case starts with segbuf1.segnum = 3, nextnum = 4, and
nilfs_sufile_alloc() not called to allocate a new segment.
The first time nilfs_segctor_extend_segments() allocated segment
segbuf2.segnum = segbuf1.nextnum = 4, then nilfs_sufile_alloc() found
nextnextnum = 4 segment because its su is not set dirty.
So segbuf2.nextnum = 4, which causes next segbuf3.segnum = 4.
sb_getblk() will get same bh for segbuf2 and segbuf3, and this bh is
added to both buffer lists of two segbuf.
It makes the list head of second list linked to the first one. When
iterating the first one, it will access and deref the head of second,
which causes NULL pointer dereference.
Fix this by setting usage as dirty in nilfs_sufile_mark_dirty(),
and add lock in it to protect the usage modification.
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+77e4f005cb899d4268d1(a)syzkaller.appspotmail.com
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
---
v1 -> v2:
1) Add lock protection as Ryusuke suggested and slightly fix commit
message.
2) Fix and add tags.
---
fs/nilfs2/sufile.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 77ff8e95421f..dc359b56fdfa 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -495,14 +495,22 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum)
{
struct buffer_head *bh;
+ void *kaddr;
+ struct nilfs_segment_usage *su;
int ret;
+ down_write(&NILFS_MDT(sufile)->mi_sem);
ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
if (!ret) {
mark_buffer_dirty(bh);
nilfs_mdt_mark_dirty(sufile);
+ kaddr = kmap_atomic(bh->b_page);
+ su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
+ nilfs_segment_usage_set_dirty(su);
+ kunmap_atomic(kaddr);
brelse(bh);
}
+ up_write(&NILFS_MDT(sufile)->mi_sem);
return ret;
}
--
2.17.1
On Mon, Nov 21, 2022 at 12:12 AM -05, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> l2tp-serialize-access-to-sk_user_data-with-sk_callba.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please don't add the commit below to stable tree, yet.
We have a fix-up for it under review:
https://lore.kernel.org/netdev/20221121085426.21315-1-jakub@cloudflare.com/
> commit 92aa1132edc6e6e912efd056c698cd6e52b83c6f
> Author: Jakub Sitnicki <jakub(a)cloudflare.com>
> Date: Mon Nov 14 20:16:19 2022 +0100
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> [ Upstream commit b68777d54fac21fc833ec26ea1a2a84f975ab035 ]
>
> sk->sk_user_data has multiple users, which are not compatible with each
> other. Writers must synchronize by grabbing the sk->sk_callback_lock.
>
> l2tp currently fails to grab the lock when modifying the underlying tunnel
> socket fields. Fix it by adding appropriate locking.
>
> We err on the side of safety and grab the sk_callback_lock also inside the
> sk_destruct callback overridden by l2tp, even though there should be no
> refs allowing access to the sock at the time when sk_destruct gets called.
>
> v4:
> - serialize write to sk_user_data in l2tp sk_destruct
>
> v3:
> - switch from sock lock to sk_callback_lock
> - document write-protection for sk_user_data
>
> v2:
> - update Fixes to point to origin of the bug
> - use real names in Reported/Tested-by tags
>
> Cc: Tom Parkin <tparkin(a)katalix.com>
> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
> Reported-by: Haowei Yan <g1042620637(a)gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e1a303e4f0f7..3e9db5146765 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -323,7 +323,7 @@ struct bpf_local_storage;
> * @sk_tskey: counter to disambiguate concurrent tstamp requests
> * @sk_zckey: counter to order MSG_ZEROCOPY notifications
> * @sk_socket: Identd and reporting IO signals
> - * @sk_user_data: RPC layer private data
> + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock.
> * @sk_frag: cached page frag
> * @sk_peek_off: current peek_offset value
> * @sk_send_head: front of stuff to transmit
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 93271a2632b8..c77032638a06 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk)
> }
>
> /* Remove hooks into tunnel socket */
> + write_lock_bh(&sk->sk_callback_lock);
> sk->sk_destruct = tunnel->old_sk_destruct;
> sk->sk_user_data = NULL;
> + write_unlock_bh(&sk->sk_callback_lock);
>
> /* Call the original destructor */
> if (sk->sk_destruct)
> @@ -1471,16 +1473,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock = sockfd_lookup(tunnel->fd, &ret);
> if (!sock)
> goto err;
> -
> - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap);
> - if (ret < 0)
> - goto err_sock;
> }
>
> + sk = sock->sk;
> + write_lock(&sk->sk_callback_lock);
> +
> + ret = l2tp_validate_socket(sk, net, tunnel->encap);
> + if (ret < 0)
> + goto err_sock;
> +
> tunnel->l2tp_net = net;
> pn = l2tp_pernet(net);
>
> - sk = sock->sk;
> sock_hold(sk);
> tunnel->sock = sk;
>
> @@ -1506,7 +1510,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
>
> setup_udp_tunnel_sock(net, sock, &udp_cfg);
> } else {
> - sk->sk_user_data = tunnel;
> + rcu_assign_sk_user_data(sk, tunnel);
> }
>
> tunnel->old_sk_destruct = sk->sk_destruct;
> @@ -1520,6 +1524,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> if (tunnel->fd >= 0)
> sockfd_put(sock);
>
> + write_unlock(&sk->sk_callback_lock);
> return 0;
>
> err_sock:
> @@ -1527,6 +1532,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock_release(sock);
> else
> sockfd_put(sock);
> +
> + write_unlock(&sk->sk_callback_lock);
> err:
> return ret;
> }
#regzbot introduced v5.19-rc6..1dd685c414a7b9fdb3d23aca3aedae84f0b998ae
Hi, I recently tried to upgrade to linux v6.0.x but when trying to
boot it fails with "error: out of memory" when or after loading
initramfs (which then kpanics because the vfs root is missing).
The latest releases I tested are v6.0.9 and v6.1-rc5 and it's broken there too.
I bisected the error to this patch:
1dd685c414a7b9fdb3d23aca3aedae84f0b998ae "XArray: Add calls to
might_alloc()" is the first bad commit.
I've confirmed this is not a side effect of a poor bitsect because
1dd685c414a7b9fdb3d23aca3aedae84f0b998ae~1 (v5.19-rc6) works.
I've tried reverting the failing commit on top of v6.0.9 and it didn't fixed it.
My system is:
CPU: Ryzen 3600
Motherboard: B550 AORUS ELITE V2
Ram: 48GB (16+32) of unmatched DDR4
GPU: AMD rx580
Various ssds, hdds and network cards plugged with various buses.
You can find a folder with my .config, bisect logs and screenshots of
the error messages there:
https://jorropo.net/ipfs/QmaWH84UPEen4E9n69KZiLjPDaTi2aJvizv3JYiL7Gfmnr/https://ipfs.io/ipfs/QmaWH84UPEen4E9n69KZiLjPDaTi2aJvizv3JYiL7Gfmnr/
I'll be happy to assist you if you need help reproducing this issue
and or testing fixes.
Thx, Jorropo
Since commit 0f0101719138 ("usb: dwc3: Don't switch OTG -> peripheral if extcon is present"),
Dual Role support on Intel Merrifield platform broke due to rearranging
the call to dwc3_get_extcon().
It appears to be caused by ulpi_read_id() masking the timeout on the first
test write. In the past dwc3 probe continued by calling dwc3_core_soft_reset()
followed by dwc3_get_extcon() which happend to return -EPROBE_DEFER.
On deferred probe ulpi_read_id() finally succeeded. Due to above mentioned
rearranging -EPROBE_DEFER is not returned and probe completes without phy.
As we now changed ulpi_read_id() to return -ETIMEDOUT in this case, we
need to handle the error by calling dwc3_core_soft_reset() and request
-EPROBE_DEFER. On deferred probe ulpi_read_id() is retried and succeeds.
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ferry Toth <ftoth(a)exalondelft.nl>
---
drivers/usb/dwc3/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 648f1c570021..2779f17bffaf 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1106,8 +1106,13 @@ static int dwc3_core_init(struct dwc3 *dwc)
if (!dwc->ulpi_ready) {
ret = dwc3_core_ulpi_init(dwc);
- if (ret)
+ if (ret) {
+ if (ret == -ETIMEDOUT) {
+ dwc3_core_soft_reset(dwc);
+ ret = -EPROBE_DEFER;
+ }
goto err0;
+ }
dwc->ulpi_ready = true;
}
--
2.37.2
Hi, this is your Linux kernel regression tracker speaking.
I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developer don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216675 :
> Casey Tucker 2022-11-09 20:09:22 UTC
>
> After updating kernel to latest, my Roland audio device no longer shows up in aplay -l or cat /proc/asound/cards.
>
> I'm running Arch Linux. The last known good kernel was 6.0.2-arch1, and I was able to determine by booting a couple of virtual machines in qemu that an upstream patch shipped in 6.0.3-arch3 refactors card registration, and effectively breaks initialization of this device.
>
> Patch: https://lore.kernel.org/all/20220904161247.16461-1-tiwai@suse.de/
@Casey, btw and just to be sure you are aware of it, there was a fix-up
patch ("ALSA: usb-audio: Fix last interface check for registration") for
that commit in 6.0.3, too:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
> I will be looking at this later this evening attempting to implement a fix that doesn't depend on reverting this patch, and may update this bug report with more details.
See the ticket for more details.
BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:
#regzbot introduced: 30d629795e2
https://bugzilla.kernel.org/show_bug.cgi?id=216675
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
The cpufreq_driver->get() callback is supposed to return the current
frequency of the CPU and not the one requested by the CPUFreq core.
Fix it by returning the frequency that gets supplied to the CPU after
the DCVS operation of EPSS/OSM.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 2849dd8bc72b ("cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver")
Reported-by: Sudeep Holla <sudeep.holla(a)arm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 42 +++++++++++++++++++++----------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 1faa325d3e52..62f36c76e26c 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -131,7 +131,35 @@ static int qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy,
return 0;
}
+static unsigned long qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
+{
+ unsigned int lval;
+
+ if (qcom_cpufreq.soc_data->reg_current_vote)
+ lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_current_vote) & 0x3ff;
+ else
+ lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_domain_state) & 0xff;
+
+ return lval * xo_rate;
+}
+
+/* Get the current frequency of the CPU (after throttling) */
static unsigned int qcom_cpufreq_hw_get(unsigned int cpu)
+{
+ struct qcom_cpufreq_data *data;
+ struct cpufreq_policy *policy;
+
+ policy = cpufreq_cpu_get_raw(cpu);
+ if (!policy)
+ return 0;
+
+ data = policy->driver_data;
+
+ return qcom_lmh_get_throttle_freq(data) / HZ_PER_KHZ;
+}
+
+/* Get the frequency requested by the cpufreq core for the CPU */
+static unsigned int qcom_cpufreq_get_freq(unsigned int cpu)
{
struct qcom_cpufreq_data *data;
const struct qcom_cpufreq_soc_data *soc_data;
@@ -292,18 +320,6 @@ static void qcom_get_related_cpus(int index, struct cpumask *m)
}
}
-static unsigned long qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
-{
- unsigned int lval;
-
- if (qcom_cpufreq.soc_data->reg_current_vote)
- lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_current_vote) & 0x3ff;
- else
- lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_domain_state) & 0xff;
-
- return lval * xo_rate;
-}
-
static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
{
struct cpufreq_policy *policy = data->policy;
@@ -347,7 +363,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
* If h/w throttled frequency is higher than what cpufreq has requested
* for, then stop polling and switch back to interrupt mechanism.
*/
- if (throttled_freq >= qcom_cpufreq_hw_get(cpu))
+ if (throttled_freq >= qcom_cpufreq_get_freq(cpu))
enable_irq(data->throttle_irq);
else
mod_delayed_work(system_highpri_wq, &data->throttle_work,
--
2.25.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
bigjoiner_pipes() doesn't consider that:
- RKL only has three pipes
- some pipes may be fused off
This means that intel_atomic_check_bigjoiner() won't reject
all configurations that would need a non-existent pipe.
Instead we just keep on rolling witout actually having
reserved the slave pipe we need.
It's possible that we don't outright explode anywhere due to
this since eg. for_each_intel_crtc_in_pipe_mask() will only
walk the crtcs we've registered even though the passed in
pipe_mask asks for more of them. But clearly the thing won't
do what is expected of it when the required pipes are not
present.
Fix the problem by consulting the device info pipe_mask already
in bigjoiner_pipes().
Cc: stable(a)vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b3e23708d194..6c2686ecb62a 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -3733,12 +3733,16 @@ static bool ilk_get_pipe_config(struct intel_crtc *crtc,
static u8 bigjoiner_pipes(struct drm_i915_private *i915)
{
+ u8 pipes;
+
if (DISPLAY_VER(i915) >= 12)
- return BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C) | BIT(PIPE_D);
+ pipes = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C) | BIT(PIPE_D);
else if (DISPLAY_VER(i915) >= 11)
- return BIT(PIPE_B) | BIT(PIPE_C);
+ pipes = BIT(PIPE_B) | BIT(PIPE_C);
else
- return 0;
+ pipes = 0;
+
+ return pipes & RUNTIME_INFO(i915)->pipe_mask;
}
static bool transcoder_ddi_func_is_enabled(struct drm_i915_private *dev_priv,
--
2.37.4
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelm…
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.…
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov(a)gmail.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reported-by: Rafael Mendonca <rafaelmendsr(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_eprobe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index 5dd0617e5df6..9cda9a38422c 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -563,6 +563,9 @@ static void eprobe_trigger_func(struct event_trigger_data *data,
{
struct eprobe_data *edata = data->private_data;
+ if (unlikely(!rec))
+ return;
+
__eprobe_trace_func(edata, rec);
}
--
2.35.1
From: Xiu Jianfeng <xiujianfeng(a)huawei.com>
The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next}
of @ftrace_mode->list are NULL, it's not a valid state to call list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free
tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del()
will write prev->next and next->prev, where null pointer dereference
happens.
BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
<TASK>
ftrace_mod_callback+0x20d/0x220
? do_filp_open+0xd9/0x140
ftrace_process_regex.isra.51+0xbf/0x130
ftrace_regex_write.isra.52.part.53+0x6e/0x90
vfs_write+0xee/0x3a0
? __audit_filter_op+0xb1/0x100
? auditd_test_task+0x38/0x50
ksys_write+0xa5/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception
So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
Cc: stable(a)vger.kernel.org
Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 56a168121bfc..33236241f236 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1289,6 +1289,7 @@ static int ftrace_add_mod(struct trace_array *tr,
if (!ftrace_mod)
return -ENOMEM;
+ INIT_LIST_HEAD(&ftrace_mod->list);
ftrace_mod->func = kstrdup(func, GFP_KERNEL);
ftrace_mod->module = kstrdup(module, GFP_KERNEL);
ftrace_mod->enable = enable;
--
2.35.1
From: Wang Wensheng <wangwensheng4(a)huawei.com>
If we can't allocate this size, try something smaller with half of the
size. Its order should be decreased by one instead of divided by two.
Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
Cc: <mhiramat(a)kernel.org>
Cc: <mark.rutland(a)arm.com>
Cc: stable(a)vger.kernel.org
Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups")
Signed-off-by: Wang Wensheng <wangwensheng4(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8b13ce2eae70..56a168121bfc 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3190,7 +3190,7 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
/* if we can't allocate this size, try something smaller */
if (!order)
return -ENOMEM;
- order >>= 1;
+ order--;
goto again;
}
--
2.35.1
Pass in EPOLL_URING when signaling eventfd or doing poll related wakups,
so that we can check for a circular event dependency between eventfd
and epoll. If this flag is set when our wakeup handlers are called, then
we know we have a dependency that needs to terminate multishot requests.
eventfd and epoll are the only such possible dependencies.
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
io_uring/io_uring.c | 4 ++--
io_uring/io_uring.h | 15 +++++++++++----
io_uring/poll.c | 8 ++++++++
3 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8840cf3e20f2..53d0043b77a5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -495,7 +495,7 @@ static void io_eventfd_ops(struct rcu_head *rcu)
int ops = atomic_xchg(&ev_fd->ops, 0);
if (ops & BIT(IO_EVENTFD_OP_SIGNAL_BIT))
- eventfd_signal(ev_fd->cq_ev_fd, 1);
+ eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING);
/* IO_EVENTFD_OP_FREE_BIT may not be set here depending on callback
* ordering in a race but if references are 0 we know we have to free
@@ -531,7 +531,7 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx)
goto out;
if (likely(eventfd_signal_allowed())) {
- eventfd_signal(ev_fd->cq_ev_fd, 1);
+ eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING);
} else {
atomic_inc(&ev_fd->refs);
if (!atomic_fetch_or(BIT(IO_EVENTFD_OP_SIGNAL_BIT), &ev_fd->ops))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index cef5ff924e63..f6cf74cd692b 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -4,6 +4,7 @@
#include <linux/errno.h>
#include <linux/lockdep.h>
#include <linux/io_uring_types.h>
+#include <uapi/linux/eventpoll.h>
#include "io-wq.h"
#include "slist.h"
#include "filetable.h"
@@ -207,12 +208,18 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
static inline void __io_cqring_wake(struct io_ring_ctx *ctx)
{
/*
- * wake_up_all() may seem excessive, but io_wake_function() and
- * io_should_wake() handle the termination of the loop and only
- * wake as many waiters as we need to.
+ * Trigger waitqueue handler on all waiters on our waitqueue. This
+ * won't necessarily wake up all the tasks, io_should_wake() will make
+ * that decision.
+ *
+ * Pass in EPOLLIN|EPOLL_URING as the poll wakeup key. The latter set
+ * in the mask so that if we recurse back into our own poll waitqueue
+ * handlers, we know we have a dependency between eventfd or epoll and
+ * should terminate multishot poll at that point.
*/
if (waitqueue_active(&ctx->cq_wait))
- wake_up_all(&ctx->cq_wait);
+ __wake_up(&ctx->cq_wait, TASK_NORMAL, 0,
+ poll_to_key(EPOLL_URING | EPOLLIN));
}
static inline void io_cqring_wake(struct io_ring_ctx *ctx)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 055632e9092a..b5d9426c60f6 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -394,6 +394,14 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
return 0;
if (io_poll_get_ownership(req)) {
+ /*
+ * If we trigger a multishot poll off our own wakeup path,
+ * disable multishot as there is a circular dependency between
+ * CQ posting and triggering the event.
+ */
+ if (mask & EPOLL_URING)
+ poll->events |= EPOLLONESHOT;
+
/* optional, saves extra locking for removal in tw handler */
if (mask && poll->events & EPOLLONESHOT) {
list_del_init(&poll->wait.entry);
--
2.35.1
This is identical to eventfd_signal(), but it allows the caller to pass
in a mask to be used for the poll wakeup key. The use case is avoiding
repeated multishot triggers if we have a dependency between eventfd and
io_uring.
If we setup an eventfd context and register that as the io_uring eventfd,
and at the same time queue a multishot poll request for the eventfd
context, then any CQE posted will repeatedly trigger the multishot request
until it terminates when the CQ ring overflows.
In preparation for io_uring detecting this circular dependency, add the
mentioned helper so that io_uring can pass in EPOLL_URING as part of the
poll wakeup key.
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/eventfd.c | 37 +++++++++++++++++++++----------------
include/linux/eventfd.h | 1 +
2 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/fs/eventfd.c b/fs/eventfd.c
index c0ffee99ad23..249ca6c0b784 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -43,21 +43,7 @@ struct eventfd_ctx {
int id;
};
-/**
- * eventfd_signal - Adds @n to the eventfd counter.
- * @ctx: [in] Pointer to the eventfd context.
- * @n: [in] Value of the counter to be added to the eventfd internal counter.
- * The value cannot be negative.
- *
- * This function is supposed to be called by the kernel in paths that do not
- * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
- * value, and we signal this as overflow condition by returning a EPOLLERR
- * to poll(2).
- *
- * Returns the amount by which the counter was incremented. This will be less
- * than @n if the counter has overflowed.
- */
-__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask)
{
unsigned long flags;
@@ -78,12 +64,31 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
n = ULLONG_MAX - ctx->count;
ctx->count += n;
if (waitqueue_active(&ctx->wqh))
- wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+ wake_up_locked_poll(&ctx->wqh, EPOLLIN | mask);
current->in_eventfd = 0;
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
return n;
}
+
+/**
+ * eventfd_signal - Adds @n to the eventfd counter.
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ * The value cannot be negative.
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
+ * value, and we signal this as overflow condition by returning a EPOLLERR
+ * to poll(2).
+ *
+ * Returns the amount by which the counter was incremented. This will be less
+ * than @n if the counter has overflowed.
+ */
+__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+{
+ return eventfd_signal_mask(ctx, n, 0);
+}
EXPORT_SYMBOL_GPL(eventfd_signal);
static void eventfd_free_ctx(struct eventfd_ctx *ctx)
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 30eb30d6909b..e849329ce1a8 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -40,6 +40,7 @@ struct file *eventfd_fget(int fd);
struct eventfd_ctx *eventfd_ctx_fdget(int fd);
struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n);
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask);
int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait,
__u64 *cnt);
void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);
--
2.35.1
We can have dependencies between epoll and io_uring. Consider an epoll
context, identified by the epfd file descriptor, and an io_uring file
descriptor identified by iofd. If we add iofd to the epfd context, and
arm a multishot poll request for epfd with iofd, then the multishot
poll request will repeatedly trigger and generate events until terminated
by CQ ring overflow. This isn't a desired behavior.
Add EPOLL_URING so that io_uring can pass it in as part of the poll wakeup
key, and io_uring can check for that to detect a potential recursive
invocation.
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/eventpoll.c | 18 ++++++++++--------
include/uapi/linux/eventpoll.h | 6 ++++++
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 52954d4637b5..e864256001e8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -491,7 +491,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
*/
#ifdef CONFIG_DEBUG_LOCK_ALLOC
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
+static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
+ unsigned pollflags)
{
struct eventpoll *ep_src;
unsigned long flags;
@@ -522,16 +523,17 @@ static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
}
spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
ep->nests = nests + 1;
- wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
+ wake_up_locked_poll(&ep->poll_wait, EPOLLIN | pollflags);
ep->nests = 0;
spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
}
#else
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
+static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
+ unsigned pollflags)
{
- wake_up_poll(&ep->poll_wait, EPOLLIN);
+ wake_up_poll(&ep->poll_wait, EPOLLIN | pollflags);
}
#endif
@@ -742,7 +744,7 @@ static void ep_free(struct eventpoll *ep)
/* We need to release all tasks waiting for these file */
if (waitqueue_active(&ep->poll_wait))
- ep_poll_safewake(ep, NULL);
+ ep_poll_safewake(ep, NULL, 0);
/*
* We need to lock this because we could be hit by
@@ -1208,7 +1210,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
/* We have to call this outside the lock */
if (pwake)
- ep_poll_safewake(ep, epi);
+ ep_poll_safewake(ep, epi, pollflags & EPOLL_URING);
if (!(epi->event.events & EPOLLEXCLUSIVE))
ewake = 1;
@@ -1553,7 +1555,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
/* We have to call this outside the lock */
if (pwake)
- ep_poll_safewake(ep, NULL);
+ ep_poll_safewake(ep, NULL, 0);
return 0;
}
@@ -1629,7 +1631,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
/* We have to call this outside the lock */
if (pwake)
- ep_poll_safewake(ep, NULL);
+ ep_poll_safewake(ep, NULL, 0);
return 0;
}
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 8a3432d0f0dc..532cc7fa75c0 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -41,6 +41,12 @@
#define EPOLLMSG (__force __poll_t)0x00000400
#define EPOLLRDHUP (__force __poll_t)0x00002000
+/*
+ * Internal flag - wakeup generated by io_uring, used to detect recursion back
+ * into the io_uring poll handler.
+ */
+#define EPOLL_URING ((__force __poll_t)(1U << 27))
+
/* Set exclusive wakeup mode for the target file descriptor */
#define EPOLLEXCLUSIVE ((__force __poll_t)(1U << 28))
--
2.35.1
The member void *data in the structure devfreq can be overwrite
by governor_userspace. For example:
1. The device driver assigned the devfreq governor to simple_ondemand
by the function devfreq_add_device() and init the devfreq member
void *data to a pointer of a static structure devfreq_simple_ondemand_data
by the function devfreq_add_device().
2. The user changed the devfreq governor to userspace by the command
"echo userspace > /sys/class/devfreq/.../governor".
3. The governor userspace alloced a dynamic memory for the struct
userspace_data and assigend the member void *data of devfreq to
this memory by the function userspace_init().
4. The user changed the devfreq governor back to simple_ondemand
by the command "echo simple_ondemand > /sys/class/devfreq/.../governor".
5. The governor userspace exited and assigned the member void *data
in the structure devfreq to NULL by the function userspace_exit().
6. The governor simple_ondemand fetched the static information of
devfreq_simple_ondemand_data in the function
devfreq_simple_ondemand_func() but the member void *data of devfreq was
assigned to NULL by the function userspace_exit().
7. The information of upthreshold and downdifferential is lost
and the governor simple_ondemand can't work correctly.
The member void *data in the structure devfreq is designed for
a static pointer used in a governor and inited by the function
devfreq_add_device(). This patch add an element named governor_data
in the devfreq structure which can be used by a governor(E.g userspace)
who want to assign a private data to do some private things.
Fixes: ce26c5bb9569 ("PM / devfreq: Add basic governors")
Cc: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Chanwoo Choi <cwchoi00(a)gmail.com>
Acked-by: MyungJoo Ham <myungjoo.ham(a)samsung.com>
Signed-off-by: Kant Fan <kant(a)allwinnertech.com>
---
drivers/devfreq/devfreq.c | 6 ++----
drivers/devfreq/governor_userspace.c | 12 ++++++------
include/linux/devfreq.h | 7 ++++---
3 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 63347a5ae599..8c5f6f7fca11 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -776,8 +776,7 @@ static void remove_sysfs_files(struct devfreq *devfreq,
* @dev: the device to add devfreq feature.
* @profile: device-specific profile to run devfreq.
* @governor_name: name of the policy to choose frequency.
- * @data: private data for the governor. The devfreq framework does not
- * touch this value.
+ * @data: devfreq driver pass to governors, governor should not change it.
*/
struct devfreq *devfreq_add_device(struct device *dev,
struct devfreq_dev_profile *profile,
@@ -1011,8 +1010,7 @@ static void devm_devfreq_dev_release(struct device *dev, void *res)
* @dev: the device to add devfreq feature.
* @profile: device-specific profile to run devfreq.
* @governor_name: name of the policy to choose frequency.
- * @data: private data for the governor. The devfreq framework does not
- * touch this value.
+ * @data: devfreq driver pass to governors, governor should not change it.
*
* This function manages automatically the memory of devfreq device using device
* resource management and simplify the free operation for memory of devfreq
diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
index ab9db7adb3ad..d69672ccacc4 100644
--- a/drivers/devfreq/governor_userspace.c
+++ b/drivers/devfreq/governor_userspace.c
@@ -21,7 +21,7 @@ struct userspace_data {
static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
{
- struct userspace_data *data = df->data;
+ struct userspace_data *data = df->governor_data;
if (data->valid)
*freq = data->user_frequency;
@@ -40,7 +40,7 @@ static ssize_t set_freq_store(struct device *dev, struct device_attribute *attr,
int err = 0;
mutex_lock(&devfreq->lock);
- data = devfreq->data;
+ data = devfreq->governor_data;
sscanf(buf, "%lu", &wanted);
data->user_frequency = wanted;
@@ -60,7 +60,7 @@ static ssize_t set_freq_show(struct device *dev,
int err = 0;
mutex_lock(&devfreq->lock);
- data = devfreq->data;
+ data = devfreq->governor_data;
if (data->valid)
err = sprintf(buf, "%lu\n", data->user_frequency);
@@ -91,7 +91,7 @@ static int userspace_init(struct devfreq *devfreq)
goto out;
}
data->valid = false;
- devfreq->data = data;
+ devfreq->governor_data = data;
err = sysfs_create_group(&devfreq->dev.kobj, &dev_attr_group);
out:
@@ -107,8 +107,8 @@ static void userspace_exit(struct devfreq *devfreq)
if (devfreq->dev.kobj.sd)
sysfs_remove_group(&devfreq->dev.kobj, &dev_attr_group);
- kfree(devfreq->data);
- devfreq->data = NULL;
+ kfree(devfreq->governor_data);
+ devfreq->governor_data = NULL;
}
static int devfreq_userspace_handler(struct devfreq *devfreq,
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 34aab4dd336c..4dc7cda4fd46 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -152,8 +152,8 @@ struct devfreq_stats {
* @max_state: count of entry present in the frequency table.
* @previous_freq: previously configured frequency value.
* @last_status: devfreq user device info, performance statistics
- * @data: Private data of the governor. The devfreq framework does not
- * touch this.
+ * @data: devfreq driver pass to governors, governor should not change it.
+ * @governor_data: private data for governors, devfreq core doesn't touch it.
* @user_min_freq_req: PM QoS minimum frequency request from user (via sysfs)
* @user_max_freq_req: PM QoS maximum frequency request from user (via sysfs)
* @scaling_min_freq: Limit minimum frequency requested by OPP interface
@@ -193,7 +193,8 @@ struct devfreq {
unsigned long previous_freq;
struct devfreq_dev_status last_status;
- void *data; /* private data for governors */
+ void *data;
+ void *governor_data;
struct dev_pm_qos_request user_min_freq_req;
struct dev_pm_qos_request user_max_freq_req;
--
2.29.0
The AMD Secure Processor (ASP) and an SNP guest use a series of
AES-GCM keys called VMPCKs to communicate securely with each other.
The IV to this scheme is a sequence number that both the ASP and the
guest track. Currently this sequence number in a guest request must
exactly match the sequence number tracked by the ASP. This means that
if the guest sees an error from the host during a request it can only
retry that exact request or disable the VMPCK to prevent an IV reuse.
AES-GCM cannot tolerate IV reuse see: "Authentication Failures in NIST
version of GCM" - Antoine Joux et al.
In order to address this make handle_guest_request() delete the VMPCK
on any non successful return. To allow userspace querying the cert_data
length make handle_guest_request() safe the number of pages required by
the host, then handle_guest_request() retry the request without
requesting the extended data, then return the number of pages required
back to userspace.
Fixes: fce96cf044308 ("virt: Add SEV-SNP guest driver")
Signed-off-by: Peter Gonda <pgonda(a)google.com>
Reported-by: Peter Gonda <pgonda(a)google.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Michael Roth <michael.roth(a)amd.com>
Cc: Haowen Bai <baihaowen(a)meizu.com>
Cc: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: Marc Orr <marcorr(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Dionna Glaze <dionnaglaze(a)google.com>
Cc: Ashish Kalra <Ashish.Kalra(a)amd.com>
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
---
drivers/virt/coco/sev-guest/sev-guest.c | 83 ++++++++++++++++++++-----
1 file changed, 69 insertions(+), 14 deletions(-)
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index f422f9c58ba79..64b4234c14da8 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -67,8 +67,27 @@ static bool is_vmpck_empty(struct snp_guest_dev *snp_dev)
return true;
}
+/*
+ * If an error is received from the host or AMD Secure Processor (ASP) there
+ * are two options. Either retry the exact same encrypted request or discontinue
+ * using the VMPCK.
+ *
+ * This is because in the current encryption scheme GHCB v2 uses AES-GCM to
+ * encrypt the requests. The IV for this scheme is the sequence number. GCM
+ * cannot tolerate IV reuse.
+ *
+ * The ASP FW v1.51 only increments the sequence numbers on a successful
+ * guest<->ASP back and forth and only accepts messages at its exact sequence
+ * number.
+ *
+ * So if the sequence number were to be reused the encryption scheme is
+ * vulnerable. If the sequence number were incremented for a fresh IV the ASP
+ * will reject the request.
+ */
static void snp_disable_vmpck(struct snp_guest_dev *snp_dev)
{
+ dev_alert(snp_dev->dev, "Disabling vmpck_id: %d to prevent IV reuse.\n",
+ vmpck_id);
memzero_explicit(snp_dev->vmpck, VMPCK_KEY_LEN);
snp_dev->vmpck = NULL;
}
@@ -321,34 +340,70 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
if (rc)
return rc;
- /* Call firmware to process the request */
+ /*
+ * Call firmware to process the request. In this function the encrypted
+ * message enters shared memory with the host. So after this call the
+ * sequence number must be incremented or the VMPCK must be deleted to
+ * prevent reuse of the IV.
+ */
rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
+
+ /*
+ * If the extended guest request fails due to having too small of a
+ * certificate data buffer retry the same guest request without the
+ * extended data request in order to not have to reuse the IV.
+ */
+ if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
+ err == SNP_GUEST_REQ_INVALID_LEN) {
+ const unsigned int certs_npages = snp_dev->input.data_npages;
+
+ exit_code = SVM_VMGEXIT_GUEST_REQUEST;
+
+ /*
+ * If this call to the firmware succeeds the sequence number can
+ * be incremented allowing for continued use of the VMPCK. If
+ * there is an error reflected in the return value, this value
+ * is checked further down and the result will be the deletion
+ * of the VMPCK and the error code being propagated back to the
+ * user as an IOCLT return code.
+ */
+ rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
+
+ /*
+ * Override the error to inform callers the given extended
+ * request buffer size was too small and give the caller the
+ * required buffer size.
+ */
+ err = SNP_GUEST_REQ_INVALID_LEN;
+ snp_dev->input.data_npages = certs_npages;
+ }
+
if (fw_err)
*fw_err = err;
- if (rc)
- return rc;
+ if (rc) {
+ dev_alert(snp_dev->dev,
+ "Detected error from ASP request. rc: %d, fw_err: %llu\n",
+ rc, *fw_err);
+ goto disable_vmpck;
+ }
- /*
- * The verify_and_dec_payload() will fail only if the hypervisor is
- * actively modifying the message header or corrupting the encrypted payload.
- * This hints that hypervisor is acting in a bad faith. Disable the VMPCK so that
- * the key cannot be used for any communication. The key is disabled to ensure
- * that AES-GCM does not use the same IV while encrypting the request payload.
- */
rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
if (rc) {
dev_alert(snp_dev->dev,
- "Detected unexpected decode failure, disabling the vmpck_id %d\n",
- vmpck_id);
- snp_disable_vmpck(snp_dev);
- return rc;
+ "Detected unexpected decode failure from ASP. rc: %d\n",
+ rc);
+ goto disable_vmpck;
}
/* Increment to new message sequence after payload decryption was successful. */
snp_inc_msg_seqno(snp_dev);
return 0;
+
+disable_vmpck:
+ snp_disable_vmpck(snp_dev);
+ return rc;
}
static int get_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
--
2.38.1.493.g58b659f92b-goog
The second (UID) strcmp in acpi_dev_hid_uid_match considers
"0" and "00" different, which can prevent device registration.
Have the AMD IOMMU driver's ivrs_acpihid parsing code remove
any leading zeroes to make the UID strcmp succeed. Now users
can safely specify "AMDxxxxx:00" or "AMDxxxxx:0" and expect
the same behaviour.
Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
Signed-off-by: Kim Phillips <kim.phillips(a)amd.com>
Cc: stable(a)vger.kernel.org
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit(a)amd.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
---
v2: no changes
drivers/iommu/amd/init.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fdc642362c14..ef0e1a4b5a11 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3471,6 +3471,13 @@ static int __init parse_ivrs_acpihid(char *str)
return 1;
}
+ /*
+ * Ignore leading zeroes after ':', so e.g., AMDI0095:00
+ * will match AMDI0095:0 in the second strcmp in acpi_dev_hid_uid_match
+ */
+ while (*uid == '0' && *(uid + 1))
+ uid++;
+
i = early_acpihid_map_size++;
memcpy(early_acpihid_map[i].hid, hid, strlen(hid));
memcpy(early_acpihid_map[i].uid, uid, strlen(uid));
--
2.34.1
If O_EXCL is *not* specified, then linkat() can be
used to link the temporary file into the filesystem.
If O_EXCL is specified then linkat() should fail (-1).
After commit 863f144f12ad ("vfs: open inside ->tmpfile()")
the O_EXCL flag is no longer honored by the vfs layer for
tmpfile, which means the file can be linked even if O_EXCL
flag is specified, which is a change in behaviour for
userspace!
The open flags was previously passed as a parameter, so it
was uneffected by the changes to file->f_flags caused by
finish_open(). This patch fixes the issue by storing
file->f_flags in a local variable so the O_EXCL test
logic is restored.
This regression was detected by Android CTS Bionic fcntl()
tests running on android-mainline [1].
[1] https://android.googlesource.com/platform/bionic/+/
refs/heads/master/tests/fcntl_test.cpp#352
Fixes: 863f144f12ad ("vfs: open inside ->tmpfile()")
To: lkml <linux-kernel(a)vger.kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Miklos Szeredi <mszeredi(a)redhat.com>
Cc: stable(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Will McVicker <willmcvicker(a)google.com>
Cc: Peter Griffin <gpeter(a)google.com>
Signed-off-by: Peter Griffin <peter.griffin(a)linaro.org>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/namei.c b/fs/namei.c
index 578c2110df02..9155ecb547ce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3591,6 +3591,7 @@ static int vfs_tmpfile(struct user_namespace *mnt_userns,
struct inode *dir = d_inode(parentpath->dentry);
struct inode *inode;
int error;
+ int open_flag = file->f_flags;
/* we want directory to be writable */
error = inode_permission(mnt_userns, dir, MAY_WRITE | MAY_EXEC);
@@ -3613,7 +3614,7 @@ static int vfs_tmpfile(struct user_namespace *mnt_userns,
if (error)
return error;
inode = file_inode(file);
- if (!(file->f_flags & O_EXCL)) {
+ if (!(open_flag & O_EXCL)) {
spin_lock(&inode->i_lock);
inode->i_state |= I_LINKABLE;
spin_unlock(&inode->i_lock);
--
2.34.1
It is a bit unlcear to us why that's helping, but it does and unbreaks
suspend/resume on a lot of GPUs without any known drawbacks.
Cc: stable(a)vger.kernel.org # v5.15+
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
Signed-off-by: Karol Herbst <kherbst(a)redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 35bb0bb3fe61..126b3c6e12f9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
+ /* TODO: figure out a better solution here
+ *
+ * wait on the fence here explicitly as going through
+ * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+ *
+ * Without this the operation can timeout and we'll fallback to a
+ * software copy, which might take several minutes to finish.
+ */
+ nouveau_fence_wait(fence, false, false);
ret = ttm_bo_move_accel_cleanup(bo,
&fence->base,
evict, false,
--
2.37.1
From: Jonas Jelonek <jelonek.jonas(a)gmail.com>
[ Upstream commit 69188df5f6e4cecc6b76b958979ba363cd5240e8 ]
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas(a)gmail.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 70251c703c9e..53e0fec274a4 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -671,6 +671,7 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
struct hwsim_vif_priv *vp = (void *)vif->drv_priv;
struct sk_buff *skb;
struct ieee80211_hdr *hdr;
+ struct ieee80211_tx_info *cb;
if (!vp->assoc)
return;
@@ -691,6 +692,10 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
memcpy(hdr->addr2, mac, ETH_ALEN);
memcpy(hdr->addr3, vp->bssid, ETH_ALEN);
+ cb = IEEE80211_SKB_CB(skb);
+ cb->control.rates[0].count = 1;
+ cb->control.rates[1].idx = -1;
+
rcu_read_lock();
mac80211_hwsim_tx_frame(data->hw, skb,
rcu_dereference(vif->chanctx_conf)->def.chan);
--
2.35.1
From: Jonas Jelonek <jelonek.jonas(a)gmail.com>
[ Upstream commit 69188df5f6e4cecc6b76b958979ba363cd5240e8 ]
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas(a)gmail.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 55cca2ffa392..d3905e70b1e9 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -670,6 +670,7 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
struct hwsim_vif_priv *vp = (void *)vif->drv_priv;
struct sk_buff *skb;
struct ieee80211_hdr *hdr;
+ struct ieee80211_tx_info *cb;
if (!vp->assoc)
return;
@@ -690,6 +691,10 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
memcpy(hdr->addr2, mac, ETH_ALEN);
memcpy(hdr->addr3, vp->bssid, ETH_ALEN);
+ cb = IEEE80211_SKB_CB(skb);
+ cb->control.rates[0].count = 1;
+ cb->control.rates[1].idx = -1;
+
rcu_read_lock();
mac80211_hwsim_tx_frame(data->hw, skb,
rcu_dereference(vif->chanctx_conf)->def.chan);
--
2.35.1
From: Jonas Jelonek <jelonek.jonas(a)gmail.com>
[ Upstream commit 69188df5f6e4cecc6b76b958979ba363cd5240e8 ]
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas(a)gmail.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index c52802adb5b2..22738ba7d65b 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -686,6 +686,7 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
struct hwsim_vif_priv *vp = (void *)vif->drv_priv;
struct sk_buff *skb;
struct ieee80211_hdr *hdr;
+ struct ieee80211_tx_info *cb;
if (!vp->assoc)
return;
@@ -707,6 +708,10 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
memcpy(hdr->addr2, mac, ETH_ALEN);
memcpy(hdr->addr3, vp->bssid, ETH_ALEN);
+ cb = IEEE80211_SKB_CB(skb);
+ cb->control.rates[0].count = 1;
+ cb->control.rates[1].idx = -1;
+
rcu_read_lock();
mac80211_hwsim_tx_frame(data->hw, skb,
rcu_dereference(vif->chanctx_conf)->def.chan);
--
2.35.1
The patch titled
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
Date: Fri, 18 Nov 2022 12:36:03 +0530
balance_dirty_pages doesn't do the required dirty throttling on cgroupv1.
See commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback
on traditional hierarchies"). Instead, the kernel depends on writeback
throttling in shrink_folio_list to achieve the same goal. With large
memory systems, the flusher may not be able to writeback quickly enough
such that we will start finding pages in the shrink_folio_list already in
writeback. Hence for cgroupv1 let's do a reclaim throttle after waking up
the flusher.
The below test which used to fail on a 256GB system completes till the the
file system is full with this change.
root@lp2:/sys/fs/cgroup/memory# mkdir test
root@lp2:/sys/fs/cgroup/memory# cd test/
root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
Killed
Link: https://lkml.kernel.org/r/20221118070603.84081-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Suggested-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: zefan li <lizefan.x(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1
+++ a/mm/vmscan.c
@@ -2514,8 +2514,20 @@ static unsigned long shrink_inactive_lis
* the flushers simply cannot keep up with the allocation
* rate. Nudge the flusher threads in case they are asleep.
*/
- if (stat.nr_unqueued_dirty == nr_taken)
+ if (stat.nr_unqueued_dirty == nr_taken) {
wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /*
+ * For cgroupv1 dirty throttling is achieved by waking up
+ * the kernel flusher here and later waiting on folios
+ * which are in writeback to finish (see shrink_folio_list()).
+ *
+ * Flusher may not be able to issue writeback quickly
+ * enough for cgroupv1 writeback throttling to work
+ * on a large system.
+ */
+ if (!writeback_throttling_sane(sc))
+ reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+ }
sc->nr.dirty += stat.nr_dirty;
sc->nr.congested += stat.nr_congested;
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch
The patch titled
Subject: mm: fix unexpected changes to {failslab|fail_page_alloc}.attr
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-unexpected-changes-to-failslabfail_page_allocattr.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: mm: fix unexpected changes to {failslab|fail_page_alloc}.attr
Date: Fri, 18 Nov 2022 18:00:11 +0800
When we specify __GFP_NOWARN, we only expect that no warnings will be
issued for current caller. But in the __should_failslab() and
__should_fail_alloc_page(), the local GFP flags alter the global
{failslab|fail_page_alloc}.attr, which is persistent and shared by all
tasks. This is not what we expected, let's fix it.
Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com
Fixes: 3f913fc5f974 ("mm: fix missing handler for __GFP_NOWARN")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Reviewed-by: Akinobu Mita <akinobu.mita(a)gmail.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Akinobu Mita <akinobu.mita(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fault-inject.h | 7 +++++--
lib/fault-inject.c | 14 +++++++++-----
mm/failslab.c | 12 ++++++++++--
mm/page_alloc.c | 7 +++++--
4 files changed, 29 insertions(+), 11 deletions(-)
--- a/include/linux/fault-inject.h~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/include/linux/fault-inject.h
@@ -20,7 +20,6 @@ struct fault_attr {
atomic_t space;
unsigned long verbose;
bool task_filter;
- bool no_warn;
unsigned long stacktrace_depth;
unsigned long require_start;
unsigned long require_end;
@@ -32,6 +31,10 @@ struct fault_attr {
struct dentry *dname;
};
+enum fault_flags {
+ FAULT_NOWARN = 1 << 0,
+};
+
#define FAULT_ATTR_INITIALIZER { \
.interval = 1, \
.times = ATOMIC_INIT(1), \
@@ -40,11 +43,11 @@ struct fault_attr {
.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \
.verbose = 2, \
.dname = NULL, \
- .no_warn = false, \
}
#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
int setup_fault_attr(struct fault_attr *attr, char *str);
+bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags);
bool should_fail(struct fault_attr *attr, ssize_t size);
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
--- a/lib/fault-inject.c~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/lib/fault-inject.c
@@ -41,9 +41,6 @@ EXPORT_SYMBOL_GPL(setup_fault_attr);
static void fail_dump(struct fault_attr *attr)
{
- if (attr->no_warn)
- return;
-
if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) {
printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n"
"name %pd, interval %lu, probability %lu, "
@@ -103,7 +100,7 @@ static inline bool fail_stacktrace(struc
* http://www.nongnu.org/failmalloc/
*/
-bool should_fail(struct fault_attr *attr, ssize_t size)
+bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags)
{
if (in_task()) {
unsigned int fail_nth = READ_ONCE(current->fail_nth);
@@ -146,13 +143,20 @@ bool should_fail(struct fault_attr *attr
return false;
fail:
- fail_dump(attr);
+ if (!(flags & FAULT_NOWARN))
+ fail_dump(attr);
if (atomic_read(&attr->times) != -1)
atomic_dec_not_zero(&attr->times);
return true;
}
+EXPORT_SYMBOL_GPL(should_fail_ex);
+
+bool should_fail(struct fault_attr *attr, ssize_t size)
+{
+ return should_fail_ex(attr, size, 0);
+}
EXPORT_SYMBOL_GPL(should_fail);
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
--- a/mm/failslab.c~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/mm/failslab.c
@@ -16,6 +16,8 @@ static struct {
bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags)
{
+ int flags = 0;
+
/* No fault-injection for bootstrap cache */
if (unlikely(s == kmem_cache))
return false;
@@ -30,10 +32,16 @@ bool __should_failslab(struct kmem_cache
if (failslab.cache_filter && !(s->flags & SLAB_FAILSLAB))
return false;
+ /*
+ * In some cases, it expects to specify __GFP_NOWARN
+ * to avoid printing any information(not just a warning),
+ * thus avoiding deadlocks. See commit 6b9dbedbe349 for
+ * details.
+ */
if (gfpflags & __GFP_NOWARN)
- failslab.attr.no_warn = true;
+ flags |= FAULT_NOWARN;
- return should_fail(&failslab.attr, s->object_size);
+ return should_fail_ex(&failslab.attr, s->object_size, flags);
}
static int __init setup_failslab(char *str)
--- a/mm/page_alloc.c~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/mm/page_alloc.c
@@ -3887,6 +3887,8 @@ __setup("fail_page_alloc=", setup_fail_p
static bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
{
+ int flags = 0;
+
if (order < fail_page_alloc.min_order)
return false;
if (gfp_mask & __GFP_NOFAIL)
@@ -3897,10 +3899,11 @@ static bool __should_fail_alloc_page(gfp
(gfp_mask & __GFP_DIRECT_RECLAIM))
return false;
+ /* See comment in __should_failslab() */
if (gfp_mask & __GFP_NOWARN)
- fail_page_alloc.attr.no_warn = true;
+ flags |= FAULT_NOWARN;
- return should_fail(&fail_page_alloc.attr, 1 << order);
+ return should_fail_ex(&fail_page_alloc.attr, 1 << order, flags);
}
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
mm-fix-unexpected-changes-to-failslabfail_page_allocattr.patch
There's been a very long running bug that seems to have been neglected for
a while, where amdgpu consistently triggers a KASAN error at start:
BUG: KASAN: global-out-of-bounds in read_indirect_azalia_reg+0x1d4/0x2a0 [amdgpu]
Read of size 4 at addr ffffffffc2274b28 by task modprobe/1889
After digging through amd's rather creative method for accessing registers,
I eventually discovered the problem likely has to do with the fact that on
my dce120 GPU there are supposedly 7 sets of audio registers. But we only
define a register mapping for 6 sets.
So, fix this and fix the KASAN warning finally.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
Sending this one separately from the rest of my fixes since:
* It's definitely completely unrelated to the Gitlab 2171 issue
* I'm not sure if this is the correct fix since it's in DC
drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
index 1b70b78e2fa15..af631085e88c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
@@ -359,7 +359,8 @@ static const struct dce_audio_registers audio_regs[] = {
audio_regs(2),
audio_regs(3),
audio_regs(4),
- audio_regs(5)
+ audio_regs(5),
+ audio_regs(6),
};
#define DCE120_AUD_COMMON_MASK_SH_LIST(mask_sh)\
--
2.37.3
> On Fri, Oct 28, 2022 at 02:51:43PM +0000, Dominic Jones wrote:
> > Updating the machine's kernel from v5.19.x to v6.0.x causes the machine to not
> > successfully boot. The machine boots successfully (and exhibits stable operation)
> > with version v5.19.17 and multiple earlier releases in the 5.19 line. Multiple releases
> > from the 6.0 line (including 6.0.0, 6.0.3, and 6.0.5), with no other changes to the
> > software environment, do not boot. Instead, the machine hangs after loading services
> > but before presenting a display manager; the machine instead shows repetitive hard
> > drive activity at this point and then no apparent activity.
> >
> > ''uname'' output for the machine successfully running v5.19.17 is:
> >
> > Linux [MACHINE_NAME] 5.19.17 #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:32:29 2022 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux
> >
> > The machine is an OCZ Neutrino netbook, running a custom OS build largely similar to
> > LFS development. The kernel update uses ''make olddefconfig''.
>
> Can you use 'git bisect' to find the offending change that causes this
> to happen?
Wanted to follow up here. I got up to speed on ''git bisect'' and am currently
running the process. It looks like I have about six iterations to go and I'm
typically building/testing 1-3 kernels per day. I can also verify that the issue
occurs under 6.1rc4 as a side effect of the process.
Dominic Jones
jonesd(a)xmission.com
Hello!
This is an experimental semi-automated report about issues detected by
Coverity from a scan of next-20221118 as part of the linux-next scan project:
https://scan.coverity.com/projects/linux-next-weekly-scan
You're getting this email because you were associated with the identified
lines of code (noted below) that were touched by commits:
Thu Nov 17 00:18:25 2022 -0500
7cce4cd628be ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
Coverity reported the following:
*** CID 1527373: Uninitialized variables (UNINIT)
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c:1227 in pre_compute_mst_dsc_configs_for_state()
1221 for (j = 0; j < dc_state->stream_count; j++) {
1222 if (dc_state->streams[j]->link == stream->link)
1223 computed_streams[j] = true;
1224 }
1225 }
1226
vvv CID 1527373: Uninitialized variables (UNINIT)
vvv Using uninitialized value "ret".
1227 return ret;
1228 }
1229
1230 static int find_crtc_index_in_state_by_stream(struct drm_atomic_state *state,
1231 struct dc_stream_state *stream)
1232 {
If this is a false positive, please let us know so we can mark it as
such, or teach the Coverity rules to be smarter. If not, please make
sure fixes get into linux-next. :) For patches fixing this, please
include these lines (but double-check the "Fixes" first):
Reported-by: coverity-bot <keescook+coverity-bot(a)chromium.org>
Addresses-Coverity-ID: 1527373 ("Uninitialized variables")
Fixes: 7cce4cd628be ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
If dc_state->stream_count is 0, "ret" is undefined. Perhaps initialize
it as -EINVAL?
Thanks for your attention!
--
Coverity-bot
--
Hello,,
We are privileged and delighted to reach you via email" And we are
urgently waiting to hear from you. and again your number is not
connecting.
Thanks,
Woo Nam
The SPI framework checks for each transfer (with the struct
spi_controller::can_dma callback) whether the driver wants to use DMA
for the transfer. If the driver returns true, the SPI framework will
map the transfer's data to the device, start the actual transfer and
map the data back.
In commit 07e759387788 ("spi: spi-imx: add PIO polling support") the
spi-imx driver's spi_imx_transfer_one() function was extended. If the
estimated duration of a transfer does not exceed a configurable
duration, a polling transfer function is used. This check happens
before checking if the driver decided earlier for a DMA transfer.
If spi_imx_can_dma() decided to use a DMA transfer, and the user
configured a big maximum polling duration, a polling transfer will be
used. The DMA unmap after the transfer destroys the transferred data.
To fix this problem check in spi_imx_transfer_one() if the driver
decided for DMA transfer first, then check the limits for a polling
transfer.
Fixes: 07e759387788 ("spi: spi-imx: add PIO polling support")
Link: https://lore.kernel.org/all/20221111003032.82371-1-festevam@gmail.com
Reported-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Reported-by: Fabio Estevam <festevam(a)gmail.com>
Tested-by: Fabio Estevam <festevam(a)gmail.com>
Cc: David Jander <david(a)protonic.nl>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/spi/spi-imx.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 30d82cc7300b..3240c139f8f3 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1607,6 +1607,13 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
if (spi_imx->slave_mode)
return spi_imx_pio_transfer_slave(spi, transfer);
+ /*
+ * If we decided in spi_imx_can_dma() that we want to do a DMA
+ * transfer, the SPI transfer has already been mapped, so we
+ * have to do the DMA transfer here.
+ */
+ if (spi_imx->usedma)
+ return spi_imx_dma_transfer(spi_imx, transfer);
/*
* Calculate the estimated time in us the transfer runs. Find
* the number of Hz per byte per polling limit.
@@ -1618,9 +1625,6 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
if (transfer->len < byte_limit)
return spi_imx_poll_transfer(spi, transfer);
- if (spi_imx->usedma)
- return spi_imx_dma_transfer(spi_imx, transfer);
-
return spi_imx_pio_transfer(spi, transfer);
}
--
2.35.1
política em nosso país, meus pais e minhas três irmãs foram
envenenados pela crueldade. Felizmente para mim, eu estava na escola
quando essa tragédia aconteceu com minha família. Por falar nisso. No
momento, ainda estou aqui no país, mas muito inseguro para mim. Estou
vivendo com muito medo e escravidão. Pretendo deixar este país o mais
rápido possível, mas apenas uma coisa me atrapalhou. Meu falecido pai
depositou uma quantia em dinheiro de 3,2 milhões de euros em uma das
principais instituições da Europa para transferir
Infelizmente, porém, ele não concluiu a transação até morrer
repentinamente. 45% pela ajuda e assistência, porque acho estúpido
tentar confiar em um total desconhecido que nunca conheci antes. Estou
instintivamente convencido de que você é uma pessoa honesta e tem a
capacidade de lidar com essa transação comigo. Quando estiver pronto,
vou encontrá-lo e passar o resto da minha vida em seu país. Estou com
medo aqui porque os inimigos dos meus pais, tios e parentes ruins
estão atrás de mim. Por favor, deixe-me saber o que você acha da minha
proposta para você.
Miss Michelle
Taking the minimum is wrong, if the bootloader or EFI stub is actually
passing on a bunch of bytes that it expects the kernel to hash itself.
Ideally, a bootloader will hash it for us, but STUB won't do that, so we
should map all the bytes. Also, all those bytes must be zeroed out after
use to preserve forward secrecy.
Fixes: 161a438d730d ("efi: random: reduce seed size to 32 bytes")
Cc: stable(a)vger.kernel.org # v4.14+
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/firmware/efi/efi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index f73709f7589a..819409b7b43b 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -630,7 +630,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
seed = early_memremap(efi_rng_seed, sizeof(*seed));
if (seed != NULL) {
- size = min(seed->size, EFI_RANDOM_SEED_SIZE);
+ size = seed->size;
early_memunmap(seed, sizeof(*seed));
} else {
pr_err("Could not map UEFI random seed!\n");
@@ -641,6 +641,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
if (seed != NULL) {
pr_notice("seeding entropy pool\n");
add_bootloader_randomness(seed->bits, size);
+ memzero_explicit(seed->bits, size);
early_memunmap(seed, sizeof(*seed) + size);
} else {
pr_err("Could not map UEFI random seed!\n");
--
2.38.1
Hallo, wie geht es Ihnen und haben Sie meine letzte E-Mail erhalten? Ich versuche schon seit einiger Zeit, mit Ihnen in Kontakt zu treten, ich erwarte Ihre Antwort!
ЧИТАЙ ВНИМАТЕЛЬНО!!!,
Это сообщение приходит из центра обмена сообщениями веб-почты на все наши
Владелец счета. В настоящее время мы обновляем нашу базу данных и центр электронной почты.
на этот 2022 год. Мы удаляем все неиспользуемые учетные записи, чтобы создать новые
Место для нового и для предотвращения нежелательной почты. Чтобы предотвратить вашу учетную запись
При закрытии вам нужно обновить его ниже, чтобы мы знали, что это один
используется текущий счет.
Предупреждение!!! Владелец электронной почты, который отказывается обновлять свою электронную почту, введите
Через 48 часов после получения этого уведомления вы навсегда потеряете свой адрес электронной почты.
Вы должны отправить нам по электронной почте информацию ниже.
ПОДТВЕРДИТЕ ВАШУ ИДЕНТИФИКАЦИЮ ЭЛЕКТРОННОЙ ПОЧТЫ НИЖЕ:
Имя:____________________________
Фамилия:_____________________________
Электронная почта (имя пользователя:_____________________________
Пароль от электронной почты:_______________________
Нажмите «Ответить» и отправьте нам указанные выше данные.
Предупреждение!!!
Если вы не подтвердите свою учетную запись в течение 48 часов с момента получения
уведомление, ваша учетная запись будет автоматически деактивирована.
Благодарим вас за использование учетной записи веб-почты.
Код предупреждения: QATO8B52AXV
С уважением,
Спасибо за вашу помощь.
Copyright @ 2022 WEBMAIL OFFICE Все права защищены.
From: Xiubo Li <xiubli(a)redhat.com>
For the POSIX locks they are using the same owner, which is the
thread id. And multiple POSIX locks could be merged into single one,
so when checking whether the 'file' has locks may fail.
For a file where some openers use locking and others don't is a
really odd usage pattern though. Locks are like stoplights -- they
only work if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
URL: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/caps.c | 2 +-
fs/ceph/locks.c | 4 ----
fs/ceph/super.h | 1 -
3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 065e9311b607..948136f81fc8 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2964,7 +2964,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
while (true) {
flags &= CEPH_FILE_MODE_MASK;
- if (atomic_read(&fi->num_locks))
+ if (vfs_inode_has_locks(inode))
flags |= CHECK_FILELOCK;
_got = 0;
ret = try_get_cap_refs(inode, need, want, endoff,
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 3e2843e86e27..b191426bf880 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -32,18 +32,14 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
- struct ceph_file_info *fi = dst->fl_file->private_data;
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
- atomic_inc(&fi->num_locks);
}
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct ceph_file_info *fi = fl->fl_file->private_data;
struct inode *inode = file_inode(fl->fl_file);
struct ceph_inode_info *ci = ceph_inode(inode);
- atomic_dec(&fi->num_locks);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 7b75a84ba48d..87dc55c866e9 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -803,7 +803,6 @@ struct ceph_file_info {
struct list_head rw_contexts;
u32 filp_gen;
- atomic_t num_locks;
};
struct ceph_dir_file_info {
--
2.31.1
Add missing <string.h> include for strcmp.
Clang 16 makes -Wimplicit-function-declaration an error by default. Unfortunately,
out of tree modules may use this in configure scripts, which means failure
might cause silent miscompilation or misconfiguration.
For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].
[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-…
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6…
[3] hosted at lists.linux.dev.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: trivial(a)kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam James <sam(a)gentoo.org>
---
include/linux/license.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/license.h b/include/linux/license.h
index 7cce390f120b..1c0f28904ed0 100644
--- a/include/linux/license.h
+++ b/include/linux/license.h
@@ -2,6 +2,8 @@
#ifndef __LICENSE_H
#define __LICENSE_H
+#include <string.h>
+
static inline int license_is_gpl_compatible(const char *license)
{
return (strcmp(license, "GPL") == 0
--
2.38.1
We used to use the wrong type of integer in `zfcp_fsf_req_send()` to
cache the FSF request ID when sending a new FSF request. This is used in
case the sending fails and we need to remove the request from our
internal hash table again (so we don't keep an invalid reference and use
it when we free the request again).
In `zfcp_fsf_req_send()` we used to cache the ID as `int` (signed and 32
bit wide), but the rest of the zfcp code (and the firmware
specification) handles the ID as `unsigned long`/`u64` (unsigned and 64
bit wide [s390x ELF ABI]).
For one this has the obvious problem that when the ID grows past 32
bit (this can happen reasonably fast) it is truncated to 32 bit when
storing it in the cache variable and so doesn't match the original ID
anymore.
The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to `unsigned long`. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to
load the 32 bit variable into a 64 bit register (e.g.:
`lgf %r11,188(%r15)`). So once we pass the cached variable into
`zfcp_reqlist_find_rm()` to remove the request again all the leading
zeros will be flipped to ones to extend the sign and won't match the
original ID anymore (this has been observed in practice).
If we can't successfully remove the request from the hash table again
after `zfcp_qdio_send()` fails (this happens regularly when zfcp cannot
notify the adapter about new work because the adapter is already gone
during e.g. a ChpID toggle) we will end up with a double free.
We unconditionally free the request in the calling function when
`zfcp_fsf_req_send()` fails, but because the request is still in the
hash table we end up with a stale memory reference, and once the zfcp
adapter is either reset during recovery or shutdown we end up freeing
the same memory twice.
The resulting stack traces vary depending on the kernel and have no
direct correlation to the place where the bug occurs. Here are three
examples that have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to `unsigned
long`, like the rest of zfcp and also the argument for
`zfcp_reqlist_find_rm()`. This prevents truncation and wrong sign
extension and so can successfully remove the request from the hash
table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
---
drivers/s390/scsi/zfcp_fsf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
base-commit: ecb8c2580d37dbb641451049376d80c8afaa387f
--
2.38.1