The LPASS WSA macro codec driver is updating the digital gain settings
behind the back of user space on DAPM events if companding has been
enabled.
As compander control is exported to user space, this can result in the
digital gain setting being incremented (or decremented) every time the
sound server is started and the codec suspended depending on what the
UCM configuration looks like.
Soon enough playback will become distorted (or too quiet).
This is specifically a problem on the Lenovo ThinkPad X13s as this
bypasses the limit for the digital gain setting that has been set by the
machine driver.
Fix this by simply dropping the compander gain offset hack. If someone
cares about modelling the impact of the compander setting this can
possibly be done by exporting it as a volume control later.
Note that the volume registers still need to be written after enabling
clocks in order for any prior updates to take effect.
Fixes: 2c4066e5d428 ("ASoC: codecs: lpass-wsa-macro: add dapm widgets and route")
Cc: stable(a)vger.kernel.org # 5.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/codecs/lpass-wsa-macro.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/sound/soc/codecs/lpass-wsa-macro.c b/sound/soc/codecs/lpass-wsa-macro.c
index 7e21cec3c2fb..6ce309980cd1 100644
--- a/sound/soc/codecs/lpass-wsa-macro.c
+++ b/sound/soc/codecs/lpass-wsa-macro.c
@@ -1584,7 +1584,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
u16 gain_reg;
u16 reg;
int val;
- int offset_val = 0;
struct wsa_macro *wsa = snd_soc_component_get_drvdata(component);
if (w->shift == WSA_MACRO_COMP1) {
@@ -1623,10 +1622,8 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_ENABLE);
- offset_val = -2;
}
val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
snd_soc_component_write(component, gain_reg, val);
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
@@ -1654,10 +1651,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_DISABLE);
- offset_val = 2;
- val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
- snd_soc_component_write(component, gain_reg, val);
}
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
--
2.41.0
The PA gain can be set in steps of 1.5 dB from -3 dB to 18 dB, that is,
in 15 levels.
Fix the dB values for the PA volume control as experiments using wsa8835
show that the first 16 levels all map to the same lowest gain while the
last three map to the highest gain.
These values specifically need to be correct for the sound server to
provide proper volume control.
Note that level 0 (-3 dB) does not mute the PA so the mute flag should
also not be set.
Fixes: cdb09e623143 ("ASoC: codecs: wsa883x: add control, dapm widgets and map")
Cc: stable(a)vger.kernel.org # 6.0
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/codecs/wsa883x.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/sound/soc/codecs/wsa883x.c b/sound/soc/codecs/wsa883x.c
index cb83c569e18d..a2e86ef7d18f 100644
--- a/sound/soc/codecs/wsa883x.c
+++ b/sound/soc/codecs/wsa883x.c
@@ -1098,7 +1098,11 @@ static int wsa_dev_mode_put(struct snd_kcontrol *kcontrol,
return 1;
}
-static const DECLARE_TLV_DB_SCALE(pa_gain, -300, 150, -300);
+static const SNDRV_CTL_TLVD_DECLARE_DB_RANGE(pa_gain,
+ 0, 14, TLV_DB_SCALE_ITEM(-300, 0, 0),
+ 15, 29, TLV_DB_SCALE_ITEM(-300, 150, 0),
+ 30, 31, TLV_DB_SCALE_ITEM(1800, 0, 0),
+);
static int wsa883x_get_swr_port(struct snd_kcontrol *kcontrol,
struct snd_ctl_elem_value *ucontrol)
--
2.41.0
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
On HSW non-ULT (or at least on Dell Latitude E6540) external displays
start to flicker when we enable PSR on the eDP. We observe a much higher
SR and PC6 residency than should be possible with an external display,
and indeen much higher than what we observe with eDP disabled and
only the external display enabled. Looks like the hardware is somehow
ignoring the fact that the external display is active during PSR.
I wasn't able to redproduce this on my HSW ULT machine, or BDW.
So either there's something specific about this particular laptop
(eg. some unknown firmware thing) or the issue is limited to just
non-ULT HSW systems. All known registers that could affect this
look perfectly reasonable on the affected machine.
As a workaround let's unmask the LPSP event to prevent PSR entry
except while in LPSP mode (only pipe A + eDP active). This
will prevent PSR entry entirely when multiple pipes are active.
The one slight downside is that we now also prevent PSR entry
when driving eDP with pipe B or C, but I think that's a reasonable
tradeoff to avoid having to implement a more complex workaround.
Cc: stable(a)vger.kernel.org
Fixes: 783d8b80871f ("drm/i915/psr: Re-enable PSR1 on hsw/bdw")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10092
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_psr.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c
index 696d5d32ca9d..1010b8c405df 100644
--- a/drivers/gpu/drm/i915/display/intel_psr.c
+++ b/drivers/gpu/drm/i915/display/intel_psr.c
@@ -1544,8 +1544,18 @@ static void intel_psr_enable_source(struct intel_dp *intel_dp,
* can rely on frontbuffer tracking.
*/
mask = EDP_PSR_DEBUG_MASK_MEMUP |
- EDP_PSR_DEBUG_MASK_HPD |
- EDP_PSR_DEBUG_MASK_LPSP;
+ EDP_PSR_DEBUG_MASK_HPD;
+
+ /*
+ * For some unknown reason on HSW non-ULT (or at least on
+ * Dell Latitude E6540) external displays start to flicker
+ * when PSR is enabled on the eDP. SR/PC6 residency is much
+ * higher than should be possible with an external display.
+ * As a workaround leave LPSP unmasked to prevent PSR entry
+ * when external displays are active.
+ */
+ if (DISPLAY_VER(dev_priv) >= 8 || IS_HASWELL_ULT(dev_priv))
+ mask |= EDP_PSR_DEBUG_MASK_LPSP;
if (DISPLAY_VER(dev_priv) < 20)
mask |= EDP_PSR_DEBUG_MASK_MAX_SLEEP;
--
2.41.0
The LPASS WSA macro codec driver is updating the digital gain settings
behind the back of user space on DAPM events if companding has been
enabled.
As compander control is exported to user space, this can result in the
digital gain setting being incremented (or decremented) every time the
sound server is started and the codec suspended depending on what the
UCM configuration looks like.
Soon enough playback will become distorted (or too quiet).
This is specifically a problem on the Lenovo ThinkPad X13s as this
bypasses the limit for the digital gain setting that has been set by the
machine driver.
Fix this by simply dropping the compander gain offset hack. If someone
cares about modelling the impact of the compander setting this can
possibly be done by exporting it as a volume control later.
Note that the volume registers still need to be written after enabling
clocks in order for any prior updates to take effect.
Fixes: 2c4066e5d428 ("ASoC: codecs: lpass-wsa-macro: add dapm widgets and route")
Cc: stable(a)vger.kernel.org # 5.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/codecs/lpass-wsa-macro.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/sound/soc/codecs/lpass-wsa-macro.c b/sound/soc/codecs/lpass-wsa-macro.c
index 7e21cec3c2fb..6ce309980cd1 100644
--- a/sound/soc/codecs/lpass-wsa-macro.c
+++ b/sound/soc/codecs/lpass-wsa-macro.c
@@ -1584,7 +1584,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
u16 gain_reg;
u16 reg;
int val;
- int offset_val = 0;
struct wsa_macro *wsa = snd_soc_component_get_drvdata(component);
if (w->shift == WSA_MACRO_COMP1) {
@@ -1623,10 +1622,8 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_ENABLE);
- offset_val = -2;
}
val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
snd_soc_component_write(component, gain_reg, val);
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
@@ -1654,10 +1651,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_DISABLE);
- offset_val = 2;
- val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
- snd_soc_component_write(component, gain_reg, val);
}
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
--
2.41.0
The UCM configuration for the Lenovo ThinkPad X13s has up until now
been setting the speaker PA volume to -3 dB when enabling the speakers,
but this does not prevent the user from increasing the volume further.
Limit the PA volume to -3 dB in the machine driver to reduce the risk of
speaker damage until we have active speaker protection in place.
Note that this will probably need to be generalised using
machine-specific limits, but a common limit should do for now.
Cc: stable(a)vger.kernel.org # 6.5
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/qcom/sc8280xp.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/sound/soc/qcom/sc8280xp.c b/sound/soc/qcom/sc8280xp.c
index ed4bb551bfbb..a19bfa354af8 100644
--- a/sound/soc/qcom/sc8280xp.c
+++ b/sound/soc/qcom/sc8280xp.c
@@ -32,12 +32,14 @@ static int sc8280xp_snd_init(struct snd_soc_pcm_runtime *rtd)
case WSA_CODEC_DMA_RX_0:
case WSA_CODEC_DMA_RX_1:
/*
- * set limit of 0dB on Digital Volume for Speakers,
- * this can prevent damage of speakers to some extent without
- * active speaker protection
+ * Set limit of 0 dB on Digital Volume and -3 dB on PA Volume
+ * to reduce the risk of speaker damage until we have active
+ * speaker protection in place.
*/
snd_soc_limit_volume(card, "WSA_RX0 Digital Volume", 84);
snd_soc_limit_volume(card, "WSA_RX1 Digital Volume", 84);
+ snd_soc_limit_volume(card, "SpkrLeft PA Volume", 1);
+ snd_soc_limit_volume(card, "SpkrRight PA Volume", 1);
break;
default:
break;
--
2.41.0
The value of an arithmetic expression period_ns * 1000 is subject
to overflow due to a failure to cast operands to a larger data
type before performing arithmetic
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 3e90b1c7ebe9 ("staging: comedi: ni_tio: tidy up ni_tio_set_clock_src() and helpers")
Cc: <stable(a)vger.kernel.org> # v5.15+
Reviewed-by: Ian Abbott <abbotti(a)mev.co.uk>
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
V1 -> V2: Oh, good point. It should be 1000ULL.
drivers/comedi/drivers/ni_tio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/comedi/drivers/ni_tio.c b/drivers/comedi/drivers/ni_tio.c
index da6826d77e60..acc914903c70 100644
--- a/drivers/comedi/drivers/ni_tio.c
+++ b/drivers/comedi/drivers/ni_tio.c
@@ -800,7 +800,7 @@ static int ni_tio_set_clock_src(struct ni_gpct *counter,
GI_PRESCALE_X2(counter_dev->variant) |
GI_PRESCALE_X8(counter_dev->variant), bits);
}
- counter->clock_period_ps = period_ns * 1000;
+ counter->clock_period_ps = period_ns * 1000ULL;
ni_tio_set_sync_mode(counter);
return 0;
}
--
2.25.1
The PA gain can be set in steps of 1.5 dB from -3 dB to 18 dB, that is,
in fifteen levels.
Fix the range of the PA volume control to avoid having the first
sixteen levels all map to -3 dB.
Note that level 0 (-3 dB) does not mute the PA so the mute flag should
also not be set.
Fixes: cdb09e623143 ("ASoC: codecs: wsa883x: add control, dapm widgets and map")
Cc: stable(a)vger.kernel.org # 6.0
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/codecs/wsa883x.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/wsa883x.c b/sound/soc/codecs/wsa883x.c
index cb83c569e18d..32983ca9afba 100644
--- a/sound/soc/codecs/wsa883x.c
+++ b/sound/soc/codecs/wsa883x.c
@@ -1098,7 +1098,7 @@ static int wsa_dev_mode_put(struct snd_kcontrol *kcontrol,
return 1;
}
-static const DECLARE_TLV_DB_SCALE(pa_gain, -300, 150, -300);
+static const DECLARE_TLV_DB_SCALE(pa_gain, -300, 150, 0);
static int wsa883x_get_swr_port(struct snd_kcontrol *kcontrol,
struct snd_ctl_elem_value *ucontrol)
@@ -1239,7 +1239,7 @@ static const struct snd_soc_dapm_widget wsa883x_dapm_widgets[] = {
static const struct snd_kcontrol_new wsa883x_snd_controls[] = {
SOC_SINGLE_RANGE_TLV("PA Volume", WSA883X_DRE_CTL_1, 1,
- 0x0, 0x1f, 1, pa_gain),
+ 0x1, 0xf, 1, pa_gain),
SOC_ENUM_EXT("WSA MODE", wsa_dev_mode_enum,
wsa_dev_mode_get, wsa_dev_mode_put),
SOC_SINGLE_EXT("COMP Offset", SND_SOC_NOPM, 0, 4, 0,
--
2.41.0
In current scenario if Plug-out and Plug-In performed continuously
there could be a chance while checking for dwc->gadget_driver in
dwc3_gadget_suspend, a NULL pointer dereference may occur.
Call Stack:
CPU1: CPU2:
gadget_unbind_driver dwc3_suspend_common
dw3_gadget_stop dwc3_gadget_suspend
dwc3_disconnect_gadget
CPU1 basically clears the variable and CPU2 checks the variable.
Consider CPU1 is running and right before gadget_driver is cleared
and in parallel CPU2 executes dwc3_gadget_suspend where it finds
dwc->gadget_driver which is not NULL and resumes execution and then
CPU1 completes execution. CPU2 executes dwc3_disconnect_gadget where
it checks dwc->gadget_driver is already NULL because of which the
NULL pointer deference occur.
Cc: <stable(a)vger.kernel.org>
Fixes: 9772b47a4c291 ("usb: dwc3: gadget: Fix suspend/resume during device mode")
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa(a)quicinc.com>
---
Changes in v2:
Added cc and fixes tag missing in v1.
Link to v1:
https://lore.kernel.org/linux-usb/0ef3fb11-a207-2db4-1714-b3bca2ce2cea@quic…
drivers/usb/dwc3/gadget.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 019368f8e9c4..564976b3e2b9 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4709,15 +4709,13 @@ int dwc3_gadget_suspend(struct dwc3 *dwc)
unsigned long flags;
int ret;
- if (!dwc->gadget_driver)
- return 0;
-
ret = dwc3_gadget_soft_disconnect(dwc);
if (ret)
goto err;
spin_lock_irqsave(&dwc->lock, flags);
- dwc3_disconnect_gadget(dwc);
+ if (dwc->gadget_driver)
+ dwc3_disconnect_gadget(dwc);
spin_unlock_irqrestore(&dwc->lock, flags);
return 0;
--
2.17.1
The patch titled
Subject: mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-writeback-fix-possible-divide-by-zero-in-wb_dirty_limits-again.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe(a)google.com>
Subject: mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
Date: Thu, 18 Jan 2024 10:19:53 -0800
(struct dirty_throttle_control *)->thresh is an unsigned long, but is
passed as the u32 divisor argument to div_u64(). On architectures where
unsigned long is 64 bytes, the argument will be implicitly truncated.
Use div64_u64() instead of div_u64() so that the value used in the "is
this a safe division" check is the same as the divisor.
Also, remove redundant cast of the numerator to u64, as that should happen
implicitly.
This would be difficult to exploit in memcg domain, given the ratio-based
arithmetic domain_drity_limits() uses, but is much easier in global
writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g.
vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)
Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Cc: Maxim Patlasov <MPatlasov(a)parallels.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page-writeback.c~mm-writeback-fix-possible-divide-by-zero-in-wb_dirty_limits-again
+++ a/mm/page-writeback.c
@@ -1638,7 +1638,7 @@ static inline void wb_dirty_limits(struc
*/
dtc->wb_thresh = __wb_calc_thresh(dtc);
dtc->wb_bg_thresh = dtc->thresh ?
- div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
_
Patches currently in -mm which might be from zokeefe(a)google.com are
mm-writeback-fix-possible-divide-by-zero-in-wb_dirty_limits-again.patch
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3f489c2067c5824528212b0fc18b28d51332d906
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024011855-lyricist-marshy-4883@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f489c2067c5824528212b0fc18b28d51332d906 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 1 Dec 2023 17:21:31 +0000
Subject: [PATCH] binder: fix use-after-free in shinker's callback
The mmap read lock is used during the shrinker's callback, which means
that using alloc->vma pointer isn't safe as it can race with munmap().
As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap") the mmap lock is downgraded after the vma has been isolated.
I was able to reproduce this issue by manually adding some delays and
triggering page reclaiming through the shrinker's debug sysfs. The
following KASAN report confirms the UAF:
==================================================================
BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
Read of size 8 at addr ffff356ed50e50f0 by task bash/478
CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
Hardware name: linux,dummy-virt (DT)
Call trace:
zap_page_range_single+0x470/0x4b8
binder_alloc_free_page+0x608/0xadc
__list_lru_walk_one+0x130/0x3b0
list_lru_walk_node+0xc4/0x22c
binder_shrink_scan+0x108/0x1dc
shrinker_debugfs_scan_write+0x2b4/0x500
full_proxy_write+0xd4/0x140
vfs_write+0x1ac/0x758
ksys_write+0xf0/0x1dc
__arm64_sys_write+0x6c/0x9c
Allocated by task 492:
kmem_cache_alloc+0x130/0x368
vm_area_alloc+0x2c/0x190
mmap_region+0x258/0x18bc
do_mmap+0x694/0xa60
vm_mmap_pgoff+0x170/0x29c
ksys_mmap_pgoff+0x290/0x3a0
__arm64_sys_mmap+0xcc/0x144
Freed by task 491:
kmem_cache_free+0x17c/0x3c8
vm_area_free_rcu_cb+0x74/0x98
rcu_core+0xa38/0x26d4
rcu_core_si+0x10/0x1c
__do_softirq+0x2fc/0xd24
Last potentially related work creation:
__call_rcu_common.constprop.0+0x6c/0xba0
call_rcu+0x10/0x1c
vm_area_free+0x18/0x24
remove_vma+0xe4/0x118
do_vmi_align_munmap.isra.0+0x718/0xb5c
do_vmi_munmap+0xdc/0x1fc
__vm_munmap+0x10c/0x278
__arm64_sys_munmap+0x58/0x7c
Fix this issue by performing instead a vma_lookup() which will fail to
find the vma that was isolated before the mmap lock downgrade. Note that
this option has better performance than upgrading to a mmap write lock
which would increase contention. Plus, mmap_write_trylock() has been
recently removed anyway.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable(a)vger.kernel.org
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 138f6d43d13b..9d2eff70c3ba 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- vma = binder_alloc_get_vma(alloc);
+ vma = vma_lookup(mm, page_addr);
+ if (vma && vma != binder_alloc_get_vma(alloc))
+ goto err_invalid_vma;
list_lru_isolate(lru, item);
spin_unlock(lock);
@@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;
+err_invalid_vma:
+ mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
The value of an arithmetic expression period_ns * 1000 is subject
to overflow due to a failure to cast operands to a larger data
type before performing arithmetic
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 3e90b1c7ebe9 ("staging: comedi: ni_tio: tidy up ni_tio_set_clock_src() and helpers")
Cc: <stable(a)vger.kernel.org> # v5.15+
Reviewed-by: Ian Abbott <abbotti(a)mev.co.uk>
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/comedi/drivers/ni_tio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/comedi/drivers/ni_tio.c b/drivers/comedi/drivers/ni_tio.c
index da6826d77e60..acc914903c70 100644
--- a/drivers/comedi/drivers/ni_tio.c
+++ b/drivers/comedi/drivers/ni_tio.c
@@ -800,7 +800,7 @@ static int ni_tio_set_clock_src(struct ni_gpct *counter,
GI_PRESCALE_X2(counter_dev->variant) |
GI_PRESCALE_X8(counter_dev->variant), bits);
}
- counter->clock_period_ps = period_ns * 1000;
+ counter->clock_period_ps = period_ns * 1000UL;
ni_tio_set_sync_mode(counter);
return 0;
}
--
2.25.1
hi,
we need to be able to use latest pahole options for 6.1 kernels,
updating the scripts/pahole-flags.sh with that (clean backports).
thanks,
jirka
v2 changes:
- added missing SOB
---
Alan Maguire (1):
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
Martin Rodriguez Reboredo (1):
btf, scripts: Exclude Rust CUs with pahole
init/Kconfig | 2 +-
lib/Kconfig.debug | 9 +++++++++
scripts/pahole-flags.sh | 7 +++++++
3 files changed, 17 insertions(+), 1 deletion(-)
For !CONFIG_SPARSE_IRQ kernel, early_irq_init() is supposed to
initialize all the desc entries in system, desc->resend_node
included.
Thus, initialize desc->resend_node for all irq_desc entries, rather
than irq_desc[0] only, which is the current implementation is about.
Fixes: bc06a9e08742 ("genirq: Use hlist for managing resend handlers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dawei Li <dawei.li(a)shingroup.cn>
---
kernel/irq/irqdesc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 27ca1c866f29..371eb1711d34 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -600,7 +600,7 @@ int __init early_irq_init(void)
mutex_init(&desc[i].request_mutex);
init_waitqueue_head(&desc[i].wait_for_threads);
desc_set_defaults(i, &desc[i], node, NULL, NULL);
- irq_resend_init(desc);
+ irq_resend_init(&desc[i]);
}
return arch_early_irq_init();
}
--
2.27.0
From: Zheng Wang <zyytlz.wz(a)163.com>
This is the candidate patch of CVE-2023-47233 :
https://nvd.nist.gov/vuln/detail/CVE-2023-47233
In brcm80211 driver,it starts with the following invoking chain
to start init a timeout worker:
->brcmf_usb_probe
->brcmf_usb_probe_cb
->brcmf_attach
->brcmf_bus_started
->brcmf_cfg80211_attach
->wl_init_priv
->brcmf_init_escan
->INIT_WORK(&cfg->escan_timeout_work,
brcmf_cfg80211_escan_timeout_worker);
If we disconnect the USB by hotplug, it will call
brcmf_usb_disconnect to make cleanup. The invoking chain is :
brcmf_usb_disconnect
->brcmf_usb_disconnect_cb
->brcmf_detach
->brcmf_cfg80211_detach
->kfree(cfg);
While the timeout woker may still be running. This will cause
a use-after-free bug on cfg in brcmf_cfg80211_escan_timeout_worker.
Fix it by deleting the timer and canceling the worker in
brcmf_cfg80211_detach.
Fixes: e756af5b30b0 ("brcmfmac: add e-scan support.")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Cc: stable(a)vger.kernel.org
[arend.vanspriel(a)broadcom.com: keep timer delete as is and cancel work just before free]
Signed-off-by: Arend van Spriel <arend.vanspriel(a)broadcom.com>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 133c5ea6429c..52df03243c9f 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -1179,8 +1179,7 @@ s32 brcmf_notify_escan_complete(struct brcmf_cfg80211_info *cfg,
scan_request = cfg->scan_request;
cfg->scan_request = NULL;
- if (timer_pending(&cfg->escan_timeout))
- del_timer_sync(&cfg->escan_timeout);
+ timer_delete_sync(&cfg->escan_timeout);
if (fw_abort) {
/* Do a scan abort to stop the driver's scan engine */
@@ -8435,6 +8434,7 @@ void brcmf_cfg80211_detach(struct brcmf_cfg80211_info *cfg)
brcmf_btcoex_detach(cfg);
wiphy_unregister(cfg->wiphy);
wl_deinit_priv(cfg);
+ cancel_work_sync(&cfg->escan_timeout_work);
brcmf_free_wiphy(cfg->wiphy);
kfree(cfg);
}
base-commit: 3aca362a4c1411ec11ff04f81b6cdf2359fee962
--
2.32.0
The value of an arithmetic expression period_ns * 1000 is subject
to overflow due to a failure to cast operands to a larger data
type before performing arithmetic
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 3e90b1c7ebe9 ("staging: comedi: ni_tio: tidy up ni_tio_set_clock_src() and helpers")
Reviewed-by: Ian Abbott <abbotti(a)mev.co.uk>
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/comedi/drivers/ni_tio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/comedi/drivers/ni_tio.c b/drivers/comedi/drivers/ni_tio.c
index da6826d77e60..acc914903c70 100644
--- a/drivers/comedi/drivers/ni_tio.c
+++ b/drivers/comedi/drivers/ni_tio.c
@@ -800,7 +800,7 @@ static int ni_tio_set_clock_src(struct ni_gpct *counter,
GI_PRESCALE_X2(counter_dev->variant) |
GI_PRESCALE_X8(counter_dev->variant), bits);
}
- counter->clock_period_ps = period_ns * 1000;
+ counter->clock_period_ps = period_ns * 1000UL;
ni_tio_set_sync_mode(counter);
return 0;
}
--
2.25.1
When sme_alloc() is called with existing storage and we are not flushing we
will always allocate new storage, both leaking the existing storage and
corrupting the state. Fix this by separating the checks for flushing and
for existing storage as we do for SVE.
Callers that reallocate (eg, due to changing the vector length) should
call sme_free() themselves.
Fixes: 5d0a8d2fba50 (arm64/ptrace: Ensure that SME is set up for target when writing SSVE state)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
---
arch/arm64/kernel/fpsimd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1559c706d32d..7363f2eb98e8 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1245,8 +1245,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
*/
void sme_alloc(struct task_struct *task, bool flush)
{
- if (task->thread.sme_state && flush) {
- memset(task->thread.sme_state, 0, sme_state_size(task));
+ if (task->thread.sme_state) {
+ if (flush)
+ memset(task->thread.sme_state, 0,
+ sme_state_size(task));
return;
}
---
base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
change-id: 20240112-arm64-sme-flush-09bdc40bb4fc
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3f489c2067c5824528212b0fc18b28d51332d906
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024011837-velocity-scheme-4395@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f489c2067c5824528212b0fc18b28d51332d906 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 1 Dec 2023 17:21:31 +0000
Subject: [PATCH] binder: fix use-after-free in shinker's callback
The mmap read lock is used during the shrinker's callback, which means
that using alloc->vma pointer isn't safe as it can race with munmap().
As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap") the mmap lock is downgraded after the vma has been isolated.
I was able to reproduce this issue by manually adding some delays and
triggering page reclaiming through the shrinker's debug sysfs. The
following KASAN report confirms the UAF:
==================================================================
BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
Read of size 8 at addr ffff356ed50e50f0 by task bash/478
CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
Hardware name: linux,dummy-virt (DT)
Call trace:
zap_page_range_single+0x470/0x4b8
binder_alloc_free_page+0x608/0xadc
__list_lru_walk_one+0x130/0x3b0
list_lru_walk_node+0xc4/0x22c
binder_shrink_scan+0x108/0x1dc
shrinker_debugfs_scan_write+0x2b4/0x500
full_proxy_write+0xd4/0x140
vfs_write+0x1ac/0x758
ksys_write+0xf0/0x1dc
__arm64_sys_write+0x6c/0x9c
Allocated by task 492:
kmem_cache_alloc+0x130/0x368
vm_area_alloc+0x2c/0x190
mmap_region+0x258/0x18bc
do_mmap+0x694/0xa60
vm_mmap_pgoff+0x170/0x29c
ksys_mmap_pgoff+0x290/0x3a0
__arm64_sys_mmap+0xcc/0x144
Freed by task 491:
kmem_cache_free+0x17c/0x3c8
vm_area_free_rcu_cb+0x74/0x98
rcu_core+0xa38/0x26d4
rcu_core_si+0x10/0x1c
__do_softirq+0x2fc/0xd24
Last potentially related work creation:
__call_rcu_common.constprop.0+0x6c/0xba0
call_rcu+0x10/0x1c
vm_area_free+0x18/0x24
remove_vma+0xe4/0x118
do_vmi_align_munmap.isra.0+0x718/0xb5c
do_vmi_munmap+0xdc/0x1fc
__vm_munmap+0x10c/0x278
__arm64_sys_munmap+0x58/0x7c
Fix this issue by performing instead a vma_lookup() which will fail to
find the vma that was isolated before the mmap lock downgrade. Note that
this option has better performance than upgrading to a mmap write lock
which would increase contention. Plus, mmap_write_trylock() has been
recently removed anyway.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable(a)vger.kernel.org
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 138f6d43d13b..9d2eff70c3ba 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- vma = binder_alloc_get_vma(alloc);
+ vma = vma_lookup(mm, page_addr);
+ if (vma && vma != binder_alloc_get_vma(alloc))
+ goto err_invalid_vma;
list_lru_isolate(lru, item);
spin_unlock(lock);
@@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;
+err_invalid_vma:
+ mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d592a9158a112d419f341f035d18d02f8d232def
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024011805-atlantic-diffuser-53a5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d592a9158a11 ("ksmbd: don't allow O_TRUNC open on read-only share")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
abdb1742a312 ("cifs: get rid of mount options string parsing")
9fd29a5bae6e ("cifs: use fs_context for automounts")
5dd8ce24667a ("cifs: missing directory in MAINTAINERS file")
332019e23a51 ("Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d592a9158a112d419f341f035d18d02f8d232def Mon Sep 17 00:00:00 2001
From: Namjae Jeon <linkinjeon(a)kernel.org>
Date: Sun, 7 Jan 2024 21:24:07 +0900
Subject: [PATCH] ksmbd: don't allow O_TRUNC open on read-only share
When file is changed using notepad on read-only share(read_only = yes in
ksmbd.conf), There is a problem where existing data is truncated.
notepad in windows try to O_TRUNC open(FILE_OVERWRITE_IF) and all data
in file is truncated. This patch don't allow O_TRUNC open on read-only
share and add KSMBD_TREE_CONN_FLAG_WRITABLE check in smb2_set_info().
Cc: stable(a)vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index a2f729675183..2a335bfe25a4 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -2972,7 +2972,7 @@ int smb2_open(struct ksmbd_work *work)
&may_flags);
if (!test_tree_conn_flag(tcon, KSMBD_TREE_CONN_FLAG_WRITABLE)) {
- if (open_flags & O_CREAT) {
+ if (open_flags & (O_CREAT | O_TRUNC)) {
ksmbd_debug(SMB,
"User does not have write permission\n");
rc = -EACCES;
@@ -5944,12 +5944,6 @@ static int smb2_set_info_file(struct ksmbd_work *work, struct ksmbd_file *fp,
}
case FILE_RENAME_INFORMATION:
{
- if (!test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE)) {
- ksmbd_debug(SMB,
- "User does not have write permission\n");
- return -EACCES;
- }
-
if (buf_len < sizeof(struct smb2_file_rename_info))
return -EINVAL;
@@ -5969,12 +5963,6 @@ static int smb2_set_info_file(struct ksmbd_work *work, struct ksmbd_file *fp,
}
case FILE_DISPOSITION_INFORMATION:
{
- if (!test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE)) {
- ksmbd_debug(SMB,
- "User does not have write permission\n");
- return -EACCES;
- }
-
if (buf_len < sizeof(struct smb2_file_disposition_info))
return -EINVAL;
@@ -6036,7 +6024,7 @@ int smb2_set_info(struct ksmbd_work *work)
{
struct smb2_set_info_req *req;
struct smb2_set_info_rsp *rsp;
- struct ksmbd_file *fp;
+ struct ksmbd_file *fp = NULL;
int rc = 0;
unsigned int id = KSMBD_NO_FID, pid = KSMBD_NO_FID;
@@ -6056,6 +6044,13 @@ int smb2_set_info(struct ksmbd_work *work)
rsp = smb2_get_msg(work->response_buf);
}
+ if (!test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE)) {
+ ksmbd_debug(SMB, "User does not have write permission\n");
+ pr_err("User does not have write permission\n");
+ rc = -EACCES;
+ goto err_out;
+ }
+
if (!has_file_id(id)) {
id = req->VolatileFileId;
pid = req->PersistentFileId;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8cf9bedfc3c47d24bb0de386f808f925dc52863e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024011859-dragging-hermit-6aa8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
8cf9bedfc3c4 ("ksmbd: free ppace array on error in parse_dacl")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cf9bedfc3c47d24bb0de386f808f925dc52863e Mon Sep 17 00:00:00 2001
From: Fedor Pchelkin <pchelkin(a)ispras.ru>
Date: Tue, 9 Jan 2024 17:14:44 +0300
Subject: [PATCH] ksmbd: free ppace array on error in parse_dacl
The ppace array is not freed if one of the init_acl_state() calls inside
parse_dacl() fails. At the moment the function may fail only due to the
memory allocation errors so it's highly unlikely in this case but
nevertheless a fix is needed.
Move ppace allocation after the init_acl_state() calls with proper error
handling.
Found by Linux Verification Center (linuxtesting.org).
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smbacl.c b/fs/smb/server/smbacl.c
index 1164365533f0..1c9775f1efa5 100644
--- a/fs/smb/server/smbacl.c
+++ b/fs/smb/server/smbacl.c
@@ -401,10 +401,6 @@ static void parse_dacl(struct mnt_idmap *idmap,
if (num_aces > ULONG_MAX / sizeof(struct smb_ace *))
return;
- ppace = kmalloc_array(num_aces, sizeof(struct smb_ace *), GFP_KERNEL);
- if (!ppace)
- return;
-
ret = init_acl_state(&acl_state, num_aces);
if (ret)
return;
@@ -414,6 +410,13 @@ static void parse_dacl(struct mnt_idmap *idmap,
return;
}
+ ppace = kmalloc_array(num_aces, sizeof(struct smb_ace *), GFP_KERNEL);
+ if (!ppace) {
+ free_acl_state(&default_acl_state);
+ free_acl_state(&acl_state);
+ return;
+ }
+
/*
* reset rwx permissions for user/group/other.
* Also, if num_aces is 0 i.e. DACL has no ACEs,
The patch titled
Subject: selftests/mm: switch to bash from sh
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-switch-to-bash-from-sh.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: selftests/mm: switch to bash from sh
Date: Tue, 16 Jan 2024 14:04:54 +0500
Running charge_reserved_hugetlb.sh generates errors if sh is set to
dash:
./charge_reserved_hugetlb.sh: 9: [[: not found
./charge_reserved_hugetlb.sh: 19: [[: not found
./charge_reserved_hugetlb.sh: 27: [[: not found
./charge_reserved_hugetlb.sh: 37: [[: not found
./charge_reserved_hugetlb.sh: 45: Syntax error: "(" unexpected
Switch to using /bin/bash instead of /bin/sh. Make the switch for
write_hugetlb_memory.sh as well which is called from
charge_reserved_hugetlb.sh.
Link: https://lkml.kernel.org/r/20240116090455.3407378-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 2 +-
tools/testing/selftests/mm/write_hugetlb_memory.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh~selftests-mm-switch-to-bash-from-sh
+++ a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Kselftest framework requirement - SKIP code is 4.
--- a/tools/testing/selftests/mm/write_hugetlb_memory.sh~selftests-mm-switch-to-bash-from-sh
+++ a/tools/testing/selftests/mm/write_hugetlb_memory.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
set -e
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
selftests-mm-mremap_test-fix-build-warning.patch
selftests-mm-run_vmtestssh-add-missing-tests.patch
selftests-mm-switch-to-bash-from-sh.patch
From: Tony Krowiak <akrowiak(a)linux.ibm.com>
When filtering the adapters from the configuration profile for a guest to
create or update a guest's AP configuration, if the APID of an adapter and
the APQI of a domain identify a queue device that is not bound to the
vfio_ap device driver, the APID of the adapter will be filtered because an
individual APQN can not be filtered due to the fact the APQNs are assigned
to an AP configuration as a matrix of APIDs and APQIs. Consequently, a
guest will not have access to all of the queues associated with the
filtered adapter. If the queues are subsequently made available again to
the guest, they should re-appear in a reset state; so, let's make sure all
queues associated with an adapter unplugged from the guest are reset.
In order to identify the set of queues that need to be reset, let's allow a
vfio_ap_queue object to be simultaneously stored in both a hashtable and a
list: A hashtable used to store all of the queues assigned
to a matrix mdev; and/or, a list used to store a subset of the queues that
need to be reset. For example, when an adapter is hot unplugged from a
guest, all guest queues associated with that adapter must be reset. Since
that may be a subset of those assigned to the matrix mdev, they can be
stored in a list that can be passed to the vfio_ap_mdev_reset_queues
function.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Acked-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 171 +++++++++++++++++++-------
drivers/s390/crypto/vfio_ap_private.h | 11 +-
2 files changed, 133 insertions(+), 49 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e45d0bf7b126..44cd29aace8e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -32,7 +32,8 @@
#define AP_RESET_INTERVAL 20 /* Reset sleep interval (20ms) */
-static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
+static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev);
+static int vfio_ap_mdev_reset_qlist(struct list_head *qlist);
static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
static void vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
@@ -665,16 +666,23 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
* device driver.
*
* @matrix_mdev: the matrix mdev whose matrix is to be filtered.
+ * @apm_filtered: a 256-bit bitmap for storing the APIDs filtered from the
+ * guest's AP configuration that are still in the host's AP
+ * configuration.
*
* Note: If an APQN referencing a queue device that is not bound to the vfio_ap
* driver, its APID will be filtered from the guest's APCB. The matrix
* structure precludes filtering an individual APQN, so its APID will be
- * filtered.
+ * filtered. Consequently, all queues associated with the adapter that
+ * are in the host's AP configuration must be reset. If queues are
+ * subsequently made available again to the guest, they should re-appear
+ * in a reset state
*
* Return: a boolean value indicating whether the KVM guest's APCB was changed
* by the filtering or not.
*/
-static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
+static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apm_filtered)
{
unsigned long apid, apqi, apqn;
DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES);
@@ -684,6 +692,7 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
bitmap_copy(prev_shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
bitmap_copy(prev_shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+ bitmap_clear(apm_filtered, 0, AP_DEVICES);
/*
* Copy the adapters, domains and control domains to the shadow_apcb
@@ -709,8 +718,16 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
apqn = AP_MKQID(apid, apqi);
q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q || q->reset_status.response_code) {
- clear_bit_inv(apid,
- matrix_mdev->shadow_apcb.apm);
+ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+ /*
+ * If the adapter was previously plugged into
+ * the guest, let's let the caller know that
+ * the APID was filtered.
+ */
+ if (test_bit_inv(apid, prev_shadow_apm))
+ set_bit_inv(apid, apm_filtered);
+
break;
}
}
@@ -812,7 +829,7 @@ static void vfio_ap_mdev_remove(struct mdev_device *mdev)
mutex_lock(&matrix_dev->guests_lock);
mutex_lock(&matrix_dev->mdevs_lock);
- vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
+ vfio_ap_mdev_reset_queues(matrix_mdev);
vfio_ap_mdev_unlink_fr_queues(matrix_mdev);
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->mdevs_lock);
@@ -922,6 +939,47 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
AP_MKQID(apid, apqi));
}
+static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apm_reset)
+{
+ struct vfio_ap_queue *q, *tmpq;
+ struct list_head qlist;
+ unsigned long apid, apqi;
+ int apqn, ret = 0;
+
+ if (bitmap_empty(apm_reset, AP_DEVICES))
+ return 0;
+
+ INIT_LIST_HEAD(&qlist);
+
+ for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS) {
+ /*
+ * If the domain is not in the host's AP configuration,
+ * then resetting it will fail with response code 01
+ * (APQN not valid).
+ */
+ if (!test_bit_inv(apqi,
+ (unsigned long *)matrix_dev->info.aqm))
+ continue;
+
+ apqn = AP_MKQID(apid, apqi);
+ q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
+
+ if (q)
+ list_add_tail(&q->reset_qnode, &qlist);
+ }
+ }
+
+ ret = vfio_ap_mdev_reset_qlist(&qlist);
+
+ list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode)
+ list_del(&q->reset_qnode);
+
+ return ret;
+}
+
/**
* assign_adapter_store - parses the APID from @buf and sets the
* corresponding bit in the mediated matrix device's APM
@@ -962,6 +1020,7 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -991,8 +1050,10 @@ static ssize_t assign_adapter_store(struct device *dev,
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) {
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+ }
ret = count;
done:
@@ -1023,11 +1084,12 @@ static struct vfio_ap_queue
* adapter was assigned.
* @matrix_mdev: the matrix mediated device to which the adapter was assigned.
* @apid: the APID of the unassigned adapter.
- * @qtable: table for storing queues associated with unassigned adapter.
+ * @qlist: list for storing queues associated with unassigned adapter that
+ * need to be reset.
*/
static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
unsigned long apid,
- struct ap_queue_table *qtable)
+ struct list_head *qlist)
{
unsigned long apqi;
struct vfio_ap_queue *q;
@@ -1035,11 +1097,10 @@ static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
q = vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi);
- if (q && qtable) {
+ if (q && qlist) {
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
- hash_add(qtable->queues, &q->mdev_qnode,
- q->apqn);
+ list_add_tail(&q->reset_qnode, qlist);
}
}
}
@@ -1047,26 +1108,23 @@ static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
unsigned long apid)
{
- int loop_cursor;
- struct vfio_ap_queue *q;
- struct ap_queue_table *qtable = kzalloc(sizeof(*qtable), GFP_KERNEL);
+ struct vfio_ap_queue *q, *tmpq;
+ struct list_head qlist;
- hash_init(qtable->queues);
- vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, qtable);
+ INIT_LIST_HEAD(&qlist);
+ vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, &qlist);
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
}
- vfio_ap_mdev_reset_queues(qtable);
+ vfio_ap_mdev_reset_qlist(&qlist);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
+ list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) {
vfio_ap_unlink_mdev_fr_queue(q);
- hash_del(&q->mdev_qnode);
+ list_del(&q->reset_qnode);
}
-
- kfree(qtable);
}
/**
@@ -1167,6 +1225,7 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -1196,8 +1255,10 @@ static ssize_t assign_domain_store(struct device *dev,
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) {
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+ }
ret = count;
done:
@@ -1210,7 +1271,7 @@ static DEVICE_ATTR_WO(assign_domain);
static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
unsigned long apqi,
- struct ap_queue_table *qtable)
+ struct list_head *qlist)
{
unsigned long apid;
struct vfio_ap_queue *q;
@@ -1218,11 +1279,10 @@ static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
q = vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi);
- if (q && qtable) {
+ if (q && qlist) {
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
- hash_add(qtable->queues, &q->mdev_qnode,
- q->apqn);
+ list_add_tail(&q->reset_qnode, qlist);
}
}
}
@@ -1230,26 +1290,23 @@ static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
unsigned long apqi)
{
- int loop_cursor;
- struct vfio_ap_queue *q;
- struct ap_queue_table *qtable = kzalloc(sizeof(*qtable), GFP_KERNEL);
+ struct vfio_ap_queue *q, *tmpq;
+ struct list_head qlist;
- hash_init(qtable->queues);
- vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, qtable);
+ INIT_LIST_HEAD(&qlist);
+ vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, &qlist);
if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
}
- vfio_ap_mdev_reset_queues(qtable);
+ vfio_ap_mdev_reset_qlist(&qlist);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
+ list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) {
vfio_ap_unlink_mdev_fr_queue(q);
- hash_del(&q->mdev_qnode);
+ list_del(&q->reset_qnode);
}
-
- kfree(qtable);
}
/**
@@ -1604,7 +1661,7 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
get_update_locks_for_kvm(kvm);
kvm_arch_crypto_clear_masks(kvm);
- vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
+ vfio_ap_mdev_reset_queues(matrix_mdev);
kvm_put_kvm(kvm);
matrix_mdev->kvm = NULL;
@@ -1740,15 +1797,33 @@ static void vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
}
}
-static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable)
+static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
{
int ret = 0, loop_cursor;
struct vfio_ap_queue *q;
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode)
+ hash_for_each(matrix_mdev->qtable.queues, loop_cursor, q, mdev_qnode)
vfio_ap_mdev_reset_queue(q);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
+ hash_for_each(matrix_mdev->qtable.queues, loop_cursor, q, mdev_qnode) {
+ flush_work(&q->reset_work);
+
+ if (q->reset_status.response_code)
+ ret = -EIO;
+ }
+
+ return ret;
+}
+
+static int vfio_ap_mdev_reset_qlist(struct list_head *qlist)
+{
+ int ret = 0;
+ struct vfio_ap_queue *q;
+
+ list_for_each_entry(q, qlist, reset_qnode)
+ vfio_ap_mdev_reset_queue(q);
+
+ list_for_each_entry(q, qlist, reset_qnode) {
flush_work(&q->reset_work);
if (q->reset_status.response_code)
@@ -1934,7 +2009,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct vfio_device *vdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
- ret = vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
+ ret = vfio_ap_mdev_reset_queues(matrix_mdev);
break;
case VFIO_DEVICE_GET_IRQ_INFO:
ret = vfio_ap_get_irq_info(arg);
@@ -2080,6 +2155,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
int ret;
struct vfio_ap_queue *q;
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev;
ret = sysfs_create_group(&apdev->device.kobj, &vfio_queue_attr_group);
@@ -2112,15 +2188,17 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
!bitmap_empty(matrix_mdev->aqm_add, AP_DOMAINS))
goto done;
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) {
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+ }
}
done:
dev_set_drvdata(&apdev->device, q);
release_update_locks_for_mdev(matrix_mdev);
- return 0;
+ return ret;
err_remove_group:
sysfs_remove_group(&apdev->device.kobj, &vfio_queue_attr_group);
@@ -2464,6 +2542,7 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *cur_cfg_info,
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
{
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock);
@@ -2477,7 +2556,7 @@ static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
matrix_mdev->adm_add, AP_DOMAINS);
if (filter_adapters || filter_domains)
- do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev);
+ do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered);
if (filter_cdoms)
do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
@@ -2485,6 +2564,8 @@ static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
if (do_hotplug)
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+
mutex_unlock(&matrix_dev->mdevs_lock);
mutex_unlock(&matrix_mdev->kvm->lock);
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 88aff8b81f2f..20eac8b0f0b9 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -83,10 +83,10 @@ struct ap_matrix {
};
/**
- * struct ap_queue_table - a table of queue objects.
- *
- * @queues: a hashtable of queues (struct vfio_ap_queue).
- */
+ * struct ap_queue_table - a table of queue objects.
+ *
+ * @queues: a hashtable of queues (struct vfio_ap_queue).
+ */
struct ap_queue_table {
DECLARE_HASHTABLE(queues, 8);
};
@@ -133,6 +133,8 @@ struct ap_matrix_mdev {
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
* @mdev_qnode: allows the vfio_ap_queue struct to be added to a hashtable
+ * @reset_qnode: allows the vfio_ap_queue struct to be added to a list of queues
+ * that need to be reset
* @reset_status: the status from the last reset of the queue
* @reset_work: work to wait for queue reset to complete
*/
@@ -143,6 +145,7 @@ struct vfio_ap_queue {
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
struct hlist_node mdev_qnode;
+ struct list_head reset_qnode;
struct ap_queue_status reset_status;
struct work_struct reset_work;
};
--
2.43.0
hi,
we need to be able to use latest pahole options for 6.1 kernels,
updating the scripts/pahole-flags.sh with that (clean backports).
thanks,
jirka
---
Alan Maguire (1):
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
Martin Rodriguez Reboredo (1):
btf, scripts: Exclude Rust CUs with pahole
init/Kconfig | 2 +-
lib/Kconfig.debug | 9 +++++++++
scripts/pahole-flags.sh | 7 +++++++
3 files changed, 17 insertions(+), 1 deletion(-)
[ Upstream commit a7939f01672034a58ad3fdbce69bb6c665ce0024 ]
Builtin/initrd microcode will not be used the ucode loader is disabled.
But currently, save_microcode_in_initrd is always performed and it
accesses MSR_IA32_UCODE_REV even if dis_ucode_ldr is true, and in
particular even if X86_FEATURE_HYPERVISOR is set; the TDX module does not
implement the MSR and the result is a call trace at boot for TDX guests.
Mainline Linux fixed this as part of a more complex rework of microcode
caching that went into 6.7 (see in particular commits dd5e3e3ca6,
"x86/microcode/intel: Simplify early loading"; and a7939f0167203,
"x86/microcode/amd: Cache builtin/initrd microcode early"). Do the bare
minimum in stable kernels, setting initrd_gone just like mainline Linux
does in mark_initrd_gone().
Note that save_microcode_in_initrd() is not in the microcode application
path, which runs with paging disabled on 32-bit systems, so it can (and
has to) use dis_ucode_ldr instead of check_loader_disabled_ap().
Cc: stable(a)vger.kernel.org # v6.6+
Cc: x86(a)kernel.org # v6.6+
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/cpu/microcode/core.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 35d39a13dc90..503b5da56685 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -214,6 +214,11 @@ static int __init save_microcode_in_initrd(void)
struct cpuinfo_x86 *c = &boot_cpu_data;
int ret = -EINVAL;
+ if (dis_ucode_ldr) {
+ ret = 0;
+ goto out;
+ }
+
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
if (c->x86 >= 6)
@@ -227,6 +230,7 @@ static int __init save_microcode_in_initrd(void)
break;
}
+out:
initrd_gone = true;
return ret;
--
2.43.0
Some DSA tagging protocols change the EtherType field in the MAC header
e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105). On TX these tagged
frames are ignored by the checksum offload engine and IP header checker of
some stmmac cores.
On RX, the stmmac driver wrongly assumes that checksums have been computed
for these tagged packets, and sets CHECKSUM_UNNECESSARY.
Add an additional check in the stmmac TX and RX hotpaths so that COE is
deactivated for packets with ethertypes that will not trigger the COE and
IP header checks.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
Cc: <stable(a)vger.kernel.org>
Reported-by: Richard Tresidder <rtresidd(a)electromag.com.au>
Link: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electro…
Reported-by: Romain Gantois <romain.gantois(a)bootlin.com>
Link: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin…
Reviewed-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Romain Gantois <romain.gantois(a)bootlin.com>
---
Hello everyone,
This is the sixth version of my proposed fix for the stmmac checksum
offloading issue that has recently been reported.
significant changes in v4:
- Removed "inline" from declaration of stmmac_has_ip_ethertype
significant changes in v3:
- Use __vlan_get_protocol to make sure that 8021Q-encapsulated
traffic is checked correctly.
significant changes in v2:
- Replaced the stmmac_link_up-based fix with an ethertype check in the TX
and RX hotpaths.
The Checksum Offloading Engine of some stmmac cores (e.g. DWMAC1000)
computes an incorrect checksum when presented with DSA-tagged packets. This
causes all TCP/UDP transfers to break when the stmmac device is connected
to the CPU port of a DSA switch.
I ran some tests using different tagging protocols with DSA_LOOP, and all
of the protocols that set a custom ethertype field in the MAC header caused
the checksum offload engine to ignore the tagged packets. On TX, this
caused packets to egress with incorrect checksums. On RX, these packets
were similarly ignored by the COE, yet the stmmac driver set
CHECKSUM_UNNECESSARY, wrongly assuming that their checksums had been
verified in hardware.
Version 2 of this patch series fixes this issue by checking ethertype
fields in both the TX and RX hotpaths of the stmmac driver. On TX, if a
non-IP ethertype is detected, the packet is checksummed in software. On
RX, the same condition causes stmmac to avoid setting CHECKSUM_UNNECESSARY.
To measure the performance degradation to the TX/RX hotpaths, I did some
iperf3 runs with 512-byte unfragmented UDP packets.
measured degradation on TX: -466 pps (-0.2%) on RX: -338 pps (-1.2%)
original performances on TX: 22kpps on RX: 27kpps
The performance hit on the RX path can be partly explained by the fact that
the stmmac driver doesn't set CHECKSUM_UNNECESSARY anymore.
The TX performance degradation observed in v2 seems to have improved.
It's not entirely clear to me why that is.
Best Regards,
Romain
Romain Gantois (1):
net: stmmac: Prevent DSA tags from breaking COE
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 23 ++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
--
2.43.0
---
Changes in v6:
- Style fixes
- Link to v5: https://lore.kernel.org/r/20240111-prevent_dsa_tags-v5-1-63e795a4d129@bootl…
Changes in v5:
- Added missing "net" tag to subject of patch series
- Link to v4: https://lore.kernel.org/r/20240109-prevent_dsa_tags-v4-1-f888771fa2f6@bootl…
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 32 ++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c78a96b8eb64..a0e46369ae15 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4435,6 +4435,28 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
+/**
+ * stmmac_has_ip_ethertype() - Check if packet has IP ethertype
+ * @skb: socket buffer to check
+ *
+ * Check if a packet has an ethertype that will trigger the IP header checks
+ * and IP/TCP checksum engine of the stmmac core.
+ *
+ * Return: true if the ethertype can trigger the checksum engine, false
+ * otherwise
+ */
+static bool stmmac_has_ip_ethertype(struct sk_buff *skb)
+{
+ int depth = 0;
+ __be16 proto;
+
+ proto = __vlan_get_protocol(skb, eth_header_parse_protocol(skb),
+ &depth);
+
+ return (depth <= ETH_HLEN) &&
+ (proto == htons(ETH_P_IP) || proto == htons(ETH_P_IPV6));
+}
+
/**
* stmmac_xmit - Tx entry point of the driver
* @skb : the socket buffer
@@ -4499,9 +4521,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
/* DWMAC IPs can be synthesized to support tx coe only for a few tx
* queues. In that case, checksum offloading for those queues that don't
* support tx coe needs to fallback to software checksum calculation.
+ *
+ * Packets that won't trigger the COE e.g. most DSA-tagged packets will
+ * also have to be checksummed in software.
*/
if (csum_insertion &&
- priv->plat->tx_queues_cfg[queue].coe_unsupported) {
+ (priv->plat->tx_queues_cfg[queue].coe_unsupported ||
+ !stmmac_has_ip_ethertype(skb))) {
if (unlikely(skb_checksum_help(skb)))
goto dma_map_err;
csum_insertion = !csum_insertion;
@@ -5066,7 +5092,7 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
stmmac_rx_vlan(priv->dev, skb);
skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe))
+ if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb))
skb_checksum_none_assert(skb);
else
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -5589,7 +5615,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe))
+ if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb))
skb_checksum_none_assert(skb);
else
skb->ip_summed = CHECKSUM_UNNECESSARY;
---
base-commit: a23aa04042187cbde16f470b49d4ad60d32e9206
change-id: 20240108-prevent_dsa_tags-7bb0def0db81
Best regards,
--
Romain Gantois <romain.gantois(a)bootlin.com>
The LPASS WSA macro codec driver is updating the digital gain settings
behind the back of user space on DAPM events if companding has been
enabled.
As compander control is exported to user space, this can result in the
digital gain setting being incremented (or decremented) every time the
sound server is started and the codec suspended depending on what the
UCM configuration looks like.
Soon enough playback will become distorted (or too quiet).
This is specifically a problem on the Lenovo ThinkPad X13s as this
bypasses the limit for the digital gain setting that has been set by the
machine driver.
Fix this by simply dropping the compander gain hack. If someone cares
about modelling the impact of the compander setting this can possibly be
done by exporting it as a volume control later.
Fixes: 2c4066e5d428 ("ASoC: codecs: lpass-wsa-macro: add dapm widgets and route")
Cc: stable(a)vger.kernel.org # 5.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/codecs/lpass-wsa-macro.c | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/sound/soc/codecs/lpass-wsa-macro.c b/sound/soc/codecs/lpass-wsa-macro.c
index 7e21cec3c2fb..7de221464d47 100644
--- a/sound/soc/codecs/lpass-wsa-macro.c
+++ b/sound/soc/codecs/lpass-wsa-macro.c
@@ -1583,8 +1583,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
struct snd_soc_component *component = snd_soc_dapm_to_component(w->dapm);
u16 gain_reg;
u16 reg;
- int val;
- int offset_val = 0;
struct wsa_macro *wsa = snd_soc_component_get_drvdata(component);
if (w->shift == WSA_MACRO_COMP1) {
@@ -1623,11 +1621,7 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_ENABLE);
- offset_val = -2;
}
- val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
- snd_soc_component_write(component, gain_reg, val);
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
break;
@@ -1654,10 +1648,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_DISABLE);
- offset_val = 2;
- val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
- snd_soc_component_write(component, gain_reg, val);
}
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
--
2.41.0
The LPASS WSA macro codec driver is updating the digital gain settings
behind the back of user space on DAPM events if companding has been
enabled.
As compander control is exported to user space, this can result in the
digital gain setting being incremented (or decremented) every time the
sound server is started and the codec suspended depending on what the
UCM configuration looks like.
Soon enough playback will become distorted (or too quiet).
This is specifically a problem on the Lenovo ThinkPad X13s as this
bypasses the limit for the digital gain setting that has been set by the
machine driver.
Fix this by simply dropping the compander gain offset hack. If someone
cares about modelling the impact of the compander setting this can
possibly be done by exporting it as a volume control later.
Note that the volume registers still need to be written after enabling
clocks in order for any prior updates to take effect.
Fixes: 2c4066e5d428 ("ASoC: codecs: lpass-wsa-macro: add dapm widgets and route")
Cc: stable(a)vger.kernel.org # 5.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/codecs/lpass-wsa-macro.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/sound/soc/codecs/lpass-wsa-macro.c b/sound/soc/codecs/lpass-wsa-macro.c
index 7e21cec3c2fb..6ce309980cd1 100644
--- a/sound/soc/codecs/lpass-wsa-macro.c
+++ b/sound/soc/codecs/lpass-wsa-macro.c
@@ -1584,7 +1584,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
u16 gain_reg;
u16 reg;
int val;
- int offset_val = 0;
struct wsa_macro *wsa = snd_soc_component_get_drvdata(component);
if (w->shift == WSA_MACRO_COMP1) {
@@ -1623,10 +1622,8 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_ENABLE);
- offset_val = -2;
}
val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
snd_soc_component_write(component, gain_reg, val);
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
@@ -1654,10 +1651,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
CDC_WSA_RX1_RX_PATH_MIX_SEC0,
CDC_WSA_RX_PGA_HALF_DB_MASK,
CDC_WSA_RX_PGA_HALF_DB_DISABLE);
- offset_val = 2;
- val = snd_soc_component_read(component, gain_reg);
- val += offset_val;
- snd_soc_component_write(component, gain_reg, val);
}
wsa_macro_config_ear_spkr_gain(component, wsa,
event, gain_reg);
--
2.41.0
The current UCM configuration sets the speaker PA volume to 15 dB when
enabling the speakers but this does not prevent the user from increasing
the volume further.
Limit the PA volume to 15 dB in the machine driver to reduce the risk of
speaker damage until we have active speaker protection in place.
Note that this will probably need to be generalised using
machine-specific limits, but a common limit should do for now.
Cc: stable(a)vger.kernel.org # 6.5
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/qcom/sc8280xp.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/sound/soc/qcom/sc8280xp.c b/sound/soc/qcom/sc8280xp.c
index ed4bb551bfbb..aa43903421f5 100644
--- a/sound/soc/qcom/sc8280xp.c
+++ b/sound/soc/qcom/sc8280xp.c
@@ -32,12 +32,14 @@ static int sc8280xp_snd_init(struct snd_soc_pcm_runtime *rtd)
case WSA_CODEC_DMA_RX_0:
case WSA_CODEC_DMA_RX_1:
/*
- * set limit of 0dB on Digital Volume for Speakers,
- * this can prevent damage of speakers to some extent without
- * active speaker protection
+ * Set limit of 0 dB on Digital Volume and 15 dB on PA Volume
+ * to reduce the risk of speaker damage until we have active
+ * speaker protection in place.
*/
snd_soc_limit_volume(card, "WSA_RX0 Digital Volume", 84);
snd_soc_limit_volume(card, "WSA_RX1 Digital Volume", 84);
+ snd_soc_limit_volume(card, "SpkrLeft PA Volume", 12);
+ snd_soc_limit_volume(card, "SpkrRight PA Volume", 12);
break;
default:
break;
--
2.41.0
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
This reverts commit 88b065943cb583e890324d618e8d4b23460d51a3.
Lenovo 82TQ is unhappy if we do the display on sequence this
late. The display output shows severe corruption.
It's unclear if this is a failure on our part (perhaps
something to do with sending commands in LP mode after HS
/video mode transmission has been started? Though the backlight
on command at least seems to work) or simply that there are
some commands in the sequence that are needed to be done
earlier (eg. could be some DSC init stuff?). If the latter
then I don't think the current Windows code would work
either, but maybe this was originally tested with an older
driver, who knows.
Root causing this fully would likely require a lot of
experimentation which isn't really feasible without direct
access to the machine, so let's just accept failure and
go back to the original sequence.
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10071
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/icl_dsi.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/icl_dsi.c b/drivers/gpu/drm/i915/display/icl_dsi.c
index ac456a2275db..eda4a8b88590 100644
--- a/drivers/gpu/drm/i915/display/icl_dsi.c
+++ b/drivers/gpu/drm/i915/display/icl_dsi.c
@@ -1155,6 +1155,7 @@ static void gen11_dsi_powerup_panel(struct intel_encoder *encoder)
}
intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_INIT_OTP);
+ intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_DISPLAY_ON);
/* ensure all panel commands dispatched before enabling transcoder */
wait_for_cmds_dispatched_to_panel(encoder);
@@ -1255,8 +1256,6 @@ static void gen11_dsi_enable(struct intel_atomic_state *state,
/* step6d: enable dsi transcoder */
gen11_dsi_enable_transcoder(encoder);
- intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_DISPLAY_ON);
-
/* step7: enable backlight */
intel_backlight_enable(crtc_state, conn_state);
intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_BACKLIGHT_ON);
--
2.41.0
There's a warning in btrfs_issue_discard() when the range is not aligned
to 512 bytes, originally added in 4d89d377bbb0 ("btrfs:
btrfs_issue_discard ensure offset/length are aligned to sector
boundaries"). We can't do sub-sector writes anyway so the adjustment is
the only thing that we can do and the warning is unnecessary.
CC: stable(a)vger.kernel.org # 4.19+
Reported-by: syzbot+4a4f1eba14eb5c3417d1(a)syzkaller.appspotmail.com
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/extent-tree.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6d680031211a..8e8cc1111277 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1260,7 +1260,8 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
u64 bytes_left, end;
u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
- if (WARN_ON(start != aligned_start)) {
+ /* Adjust the range to be aligned to 512B sectors if necessary. */
+ if (start != aligned_start) {
len -= aligned_start - start;
len = round_down(len, 1 << SECTOR_SHIFT);
start = aligned_start;
--
2.42.1
Upstream commit bac1ec551434 ("usb: xhci: Set quirk for
XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI
which fixes XHC timeout, which was seen on synopsys XHCs while
using SG buffers. But the support for this quirk isn't present
in the DWC3 layer.
We will encounter this XHCI timeout/hung issue if we run iperf
loopback tests using RTL8156 ethernet adaptor on DWC3 targets
with scatter-gather enabled. This gets resolved after enabling
the XHCI_SG_TRB_CACHE_SIZE_QUIRK. This patch enables it using
the xhci device property since its needed for DWC3 controller.
In Synopsys DWC3 databook,
Table 9-3: xHCI Debug Capability Limitations
Chained TRBs greater than TRB cache size: The debug capability
driver must not create a multi-TRB TD that describes smaller
than a 1K packet that spreads across 8 or more TRBs on either
the IN TR or the OUT TR.
Cc: <stable(a)vger.kernel.org> #5.11
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/dwc3/host.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 61f57fe5bb78..43230915323c 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -61,7 +61,7 @@ static int dwc3_host_get_irq(struct dwc3 *dwc)
int dwc3_host_init(struct dwc3 *dwc)
{
- struct property_entry props[4];
+ struct property_entry props[5];
struct platform_device *xhci;
int ret, irq;
int prop_idx = 0;
@@ -89,6 +89,8 @@ int dwc3_host_init(struct dwc3 *dwc)
memset(props, 0, sizeof(struct property_entry) * ARRAY_SIZE(props));
+ props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-sg-trb-cache-size-quirk");
+
if (dwc->usb3_lpm_capable)
props[prop_idx++] = PROPERTY_ENTRY_BOOL("usb3-lpm-capable");
--
2.25.1
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
If there is a problem after resetting a port, the do/while() loop that
checks the default value of DIVLSB register may run forever and spam the
I2C bus.
Add a delay before each read of DIVLSB, and a maximum number of tries to
prevent that situation from happening.
Also fail probe if port reset is unsuccessful.
Fixes: 10d8b34a4217 ("serial: max310x: Driver rework")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
---
drivers/tty/serial/max310x.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c
index 552e153a24e0..10bf6d75bf9e 100644
--- a/drivers/tty/serial/max310x.c
+++ b/drivers/tty/serial/max310x.c
@@ -237,6 +237,10 @@
#define MAX310x_REV_MASK (0xf8)
#define MAX310X_WRITE_BIT 0x80
+/* Port startup definitions */
+#define MAX310X_PORT_STARTUP_WAIT_RETRIES 20 /* Number of retries */
+#define MAX310X_PORT_STARTUP_WAIT_DELAY_MS 10 /* Delay between retries */
+
/* Crystal-related definitions */
#define MAX310X_XTAL_WAIT_RETRIES 20 /* Number of retries */
#define MAX310X_XTAL_WAIT_DELAY_MS 10 /* Delay between retries */
@@ -1346,6 +1350,9 @@ static int max310x_probe(struct device *dev, const struct max310x_devtype *devty
goto out_clk;
for (i = 0; i < devtype->nr; i++) {
+ bool started = false;
+ unsigned int try = 0, val = 0;
+
/* Reset port */
regmap_write(regmaps[i], MAX310X_MODE2_REG,
MAX310X_MODE2_RST_BIT);
@@ -1354,8 +1361,17 @@ static int max310x_probe(struct device *dev, const struct max310x_devtype *devty
/* Wait for port startup */
do {
- regmap_read(regmaps[i], MAX310X_BRGDIVLSB_REG, &ret);
- } while (ret != 0x01);
+ msleep(MAX310X_PORT_STARTUP_WAIT_DELAY_MS);
+ regmap_read(regmaps[i], MAX310X_BRGDIVLSB_REG, &val);
+
+ if (val == 0x01)
+ started = true;
+ } while (!started && (++try < MAX310X_PORT_STARTUP_WAIT_RETRIES));
+
+ if (!started) {
+ ret = dev_err_probe(dev, -EAGAIN, "port reset failed\n");
+ goto out_uart;
+ }
regmap_write(regmaps[i], MAX310X_MODE1_REG, devtype->mode1);
}
--
2.39.2
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Some people are seeing a warning similar to this when using a crystal:
max310x 11-006c: clock is not stable yet
The datasheet doesn't mention the maximum time to wait for the clock to be
stable when using a crystal, and it seems that the 10ms delay in the driver
is not always sufficient.
Jan Kundrát reported that it took three tries (each separated by 10ms) to
get a stable clock.
Modify behavior to check stable clock ready bit multiple times (20), and
waiting 10ms between each try.
Note: the first draft of the driver originally used a 50ms delay, without
checking the clock stable bit.
Then a loop with 1000 retries was implemented, each time reading the clock
stable bit.
Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Jan Kundrát <jan.kundrat(a)cesnet.cz>
Link: https://www.spinics.net/lists/linux-serial/msg35773.html
Link: https://lore.kernel.org/all/20240110174015.6f20195fde08e5c9e64e5675@hugovil…
Link: https://github.com/boundarydevices/linux/commit/e5dfe3e4a751392515d78051973…
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
---
drivers/tty/serial/max310x.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c
index b2c753ba9cbf..c0eb0615d945 100644
--- a/drivers/tty/serial/max310x.c
+++ b/drivers/tty/serial/max310x.c
@@ -237,6 +237,10 @@
#define MAX310x_REV_MASK (0xf8)
#define MAX310X_WRITE_BIT 0x80
+/* Crystal-related definitions */
+#define MAX310X_XTAL_WAIT_RETRIES 20 /* Number of retries */
+#define MAX310X_XTAL_WAIT_DELAY_MS 10 /* Delay between retries */
+
/* MAX3107 specific */
#define MAX3107_REV_ID (0xa0)
@@ -641,12 +645,19 @@ static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s,
/* Wait for crystal */
if (xtal) {
- unsigned int val = 0;
- msleep(10);
- regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val);
- if (!(val & MAX310X_STS_CLKREADY_BIT)) {
+ bool stable = false;
+ unsigned int try = 0, val = 0;
+
+ do {
+ msleep(MAX310X_XTAL_WAIT_DELAY_MS);
+ regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val);
+
+ if (val & MAX310X_STS_CLKREADY_BIT)
+ stable = true;
+ } while (!stable && (++try < MAX310X_XTAL_WAIT_RETRIES));
+
+ if (!stable)
dev_warn(dev, "clock is not stable yet\n");
- }
}
return bestfreq;
--
2.39.2
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
If regmap_read() returns a non-zero value, the 'val' variable can be left
uninitialized.
Clear it before calling regmap_read() to make sure we properly detect
the clock ready bit.
Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
---
drivers/tty/serial/max310x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c
index f3a99daebdaa..b2c753ba9cbf 100644
--- a/drivers/tty/serial/max310x.c
+++ b/drivers/tty/serial/max310x.c
@@ -641,7 +641,7 @@ static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s,
/* Wait for crystal */
if (xtal) {
- unsigned int val;
+ unsigned int val = 0;
msleep(10);
regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val);
if (!(val & MAX310X_STS_CLKREADY_BIT)) {
--
2.39.2
Some DSA tagging protocols change the EtherType field in the MAC header
e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105). On TX these tagged
frames are ignored by the checksum offload engine and IP header checker of
some stmmac cores.
On RX, the stmmac driver wrongly assumes that checksums have been computed
for these tagged packets, and sets CHECKSUM_UNNECESSARY.
Add an additional check in the stmmac TX and RX hotpaths so that COE is
deactivated for packets with ethertypes that will not trigger the COE and
IP header checks.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
Cc: <stable(a)vger.kernel.org>
Reported-by: Richard Tresidder <rtresidd(a)electromag.com.au>
Link: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electro…
Reported-by: Romain Gantois <romain.gantois(a)bootlin.com>
Link: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin…
Reviewed-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Romain Gantois <romain.gantois(a)bootlin.com>
---
Hello everyone,
This is the fifth version of my proposed fix for the stmmac checksum
offloading issue that has recently been reported.
significant changes in v4:
- Removed "inline" from declaration of stmmac_has_ip_ethertype
significant changes in v3:
- Use __vlan_get_protocol to make sure that 8021Q-encapsulated
traffic is checked correctly.
significant changes in v2:
- Replaced the stmmac_link_up-based fix with an ethertype check in the TX
and RX hotpaths.
The Checksum Offloading Engine of some stmmac cores (e.g. DWMAC1000)
computes an incorrect checksum when presented with DSA-tagged packets. This
causes all TCP/UDP transfers to break when the stmmac device is connected
to the CPU port of a DSA switch.
I ran some tests using different tagging protocols with DSA_LOOP, and all
of the protocols that set a custom ethertype field in the MAC header caused
the checksum offload engine to ignore the tagged packets. On TX, this
caused packets to egress with incorrect checksums. On RX, these packets
were similarly ignored by the COE, yet the stmmac driver set
CHECKSUM_UNNECESSARY, wrongly assuming that their checksums had been
verified in hardware.
Version 2 of this patch series fixes this issue by checking ethertype
fields in both the TX and RX hotpaths of the stmmac driver. On TX, if a
non-IP ethertype is detected, the packet is checksummed in software. On
RX, the same condition causes stmmac to avoid setting CHECKSUM_UNNECESSARY.
To measure the performance degradation to the TX/RX hotpaths, I did some
iperf3 runs with 512-byte unfragmented UDP packets.
measured degradation on TX: -466 pps (-0.2%) on RX: -338 pps (-1.2%)
original performances on TX: 22kpps on RX: 27kpps
The performance hit on the RX path can be partly explained by the fact that
the stmmac driver doesn't set CHECKSUM_UNNECESSARY anymore.
The TX performance degradation observed in v2 seems to have improved.
It's not entirely clear to me why that is.
Best Regards,
Romain
Romain Gantois (1):
net: stmmac: Prevent DSA tags from breaking COE
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 23 ++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
--
2.43.0
---
Changes in v5:
- Added missing "net" tag to subject of patch series
- Link to v4: https://lore.kernel.org/r/20240109-prevent_dsa_tags-v4-1-f888771fa2f6@bootl…
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 29 ++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 37e64283f910..b30dba06dbd1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4371,6 +4371,25 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
+/**
+ * stmmac_has_ip_ethertype() - Check if packet has IP ethertype
+ * @skb: socket buffer to check
+ *
+ * Check if a packet has an ethertype that will trigger the IP header checks
+ * and IP/TCP checksum engine of the stmmac core.
+ *
+ * Return: true if the ethertype can trigger the checksum engine, false otherwise
+ */
+static bool stmmac_has_ip_ethertype(struct sk_buff *skb)
+{
+ int depth = 0;
+ __be16 proto;
+
+ proto = __vlan_get_protocol(skb, eth_header_parse_protocol(skb), &depth);
+
+ return (depth <= ETH_HLEN) && (proto == htons(ETH_P_IP) || proto == htons(ETH_P_IPV6));
+}
+
/**
* stmmac_xmit - Tx entry point of the driver
* @skb : the socket buffer
@@ -4435,9 +4454,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
/* DWMAC IPs can be synthesized to support tx coe only for a few tx
* queues. In that case, checksum offloading for those queues that don't
* support tx coe needs to fallback to software checksum calculation.
+ *
+ * Packets that won't trigger the COE e.g. most DSA-tagged packets will
+ * also have to be checksummed in software.
*/
if (csum_insertion &&
- priv->plat->tx_queues_cfg[queue].coe_unsupported) {
+ (priv->plat->tx_queues_cfg[queue].coe_unsupported ||
+ !stmmac_has_ip_ethertype(skb))) {
if (unlikely(skb_checksum_help(skb)))
goto dma_map_err;
csum_insertion = !csum_insertion;
@@ -4997,7 +5020,7 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
stmmac_rx_vlan(priv->dev, skb);
skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe))
+ if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb))
skb_checksum_none_assert(skb);
else
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -5513,7 +5536,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
stmmac_rx_vlan(priv->dev, skb);
skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe))
+ if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb))
skb_checksum_none_assert(skb);
else
skb->ip_summed = CHECKSUM_UNNECESSARY;
---
base-commit: ac631873c9e7a50d2a8de457cfc4b9f86666403e
change-id: 20240108-prevent_dsa_tags-7bb0def0db81
Best regards,
--
Romain Gantois <romain.gantois(a)bootlin.com>
Il 16/01/24 11:52, Sasha Levin ha scritto:
> This is a note to let you know that I've just added the patch titled
>
> ASoC: SOF: mediatek: mt8186: Add Google Steelix topology compatible
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> asoc-sof-mediatek-mt8186-add-google-steelix-topology.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
This commit got reverted afterwards, please don't backport.
Ref.:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git/commit/?i…
Thanks,
Angelo
>
>
> commit 152629176282853223e1768c312539ee0d847384
> Author: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
> Date: Thu Nov 23 09:44:54 2023 +0100
>
> ASoC: SOF: mediatek: mt8186: Add Google Steelix topology compatible
>
> [ Upstream commit 505c83212da5bfca95109421b8f5d9f8c6cdfef2 ]
>
> Add the machine compatible and topology filename for the Google Steelix
> MT8186 Chromebook to load the correct SOF topology file.
>
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
> Link: https://lore.kernel.org/r/20231123084454.20471-1-angelogioacchino.delregno@…
> Signed-off-by: Mark Brown <broonie(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/sound/soc/sof/mediatek/mt8186/mt8186.c b/sound/soc/sof/mediatek/mt8186/mt8186.c
> index 181189e00e02..76ce90e1f103 100644
> --- a/sound/soc/sof/mediatek/mt8186/mt8186.c
> +++ b/sound/soc/sof/mediatek/mt8186/mt8186.c
> @@ -596,6 +596,9 @@ static struct snd_sof_dsp_ops sof_mt8186_ops = {
>
> static struct snd_sof_of_mach sof_mt8186_machs[] = {
> {
> + .compatible = "google,steelix",
> + .sof_tplg_filename = "sof-mt8186-google-steelix.tplg"
> + }, {
> .compatible = "mediatek,mt8186",
> .sof_tplg_filename = "sof-mt8186.tplg",
> },
In stable linux (4.19~5.15), if “CONFIG_BPF_SYSCALL=y” is set,
the .config generated by Kconfig does not set
“CONFIG_BPF_JIT_ALWAYS_ON” and “CONFIG_BPF_UNPRIV_DEFAULT_OFF”.
If the kernel is compiled with such .config, a normal user
without any capabilities at all can load eBPF programs
(SOCKET_FILTER type), and uses the interpreter.
Due to the threat of side-channel attacks and inextirpable
mistakes in the verifier, this is considered insecure.
We have report this issue to maintainers of architectures.
RISCV and s390 maintainers have confirmed and advise us to
patch the Kconfig so that all architectures can be fixed.
So this patch add "default y" to these config entries.
On the other hand, we found that such configs facilitate kernel
bug exploitation. Specifically, an attacker can leverage existing
CVEs to corrupt eBPF prog-array map, hijacking a bpf_prog pointer
(ptrs[xx]) to point to a forged BPF program. In this way, arbitrary
bytecode execution can be achieved, we have proved this concept with
various CVEs(e.g. CVE-2018-18445). Such an attack enhances the
exploitability of CVEs, and is more dangerous than side-channel
threats.
Signed-off-by: liboti <hoshimi10mang(a)163.com>
---
kernel/bpf/Kconfig | 91 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 91 insertions(+)
create mode 100644 kernel/bpf/Kconfig
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
new file mode 100644
index 0000000..8abdc0d
--- /dev/null
+++ b/kernel/bpf/Kconfig
@@ -0,0 +1,91 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+# BPF interpreter that, for example, classic socket filters depend on.
+config BPF
+ bool
+
+# Used by archs to tell that they support BPF JIT compiler plus which
+# flavour. Only one of the two can be selected for a specific arch since
+# eBPF JIT supersedes the cBPF JIT.
+
+# Classic BPF JIT (cBPF)
+config HAVE_CBPF_JIT
+ bool
+
+# Extended BPF JIT (eBPF)
+config HAVE_EBPF_JIT
+ bool
+
+# Used by archs to tell that they want the BPF JIT compiler enabled by
+# default for kernels that were compiled with BPF JIT support.
+config ARCH_WANT_DEFAULT_BPF_JIT
+ bool
+
+menu "BPF subsystem"
+
+config BPF_SYSCALL
+ bool "Enable bpf() system call"
+ select BPF
+ select IRQ_WORK
+ select TASKS_TRACE_RCU
+ select BINARY_PRINTF
+ select NET_SOCK_MSG if NET
+ default n
+ help
+ Enable the bpf() system call that allows to manipulate BPF programs
+ and maps via file descriptors.
+
+config BPF_JIT
+ bool "Enable BPF Just In Time compiler"
+ depends on BPF
+ depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
+ depends on MODULES
+ help
+ BPF programs are normally handled by a BPF interpreter. This option
+ allows the kernel to generate native code when a program is loaded
+ into the kernel. This will significantly speed-up processing of BPF
+ programs.
+
+ Note, an admin should enable this feature changing:
+ /proc/sys/net/core/bpf_jit_enable
+ /proc/sys/net/core/bpf_jit_harden (optional)
+ /proc/sys/net/core/bpf_jit_kallsyms (optional)
+
+config BPF_JIT_ALWAYS_ON
+ bool "Permanently enable BPF JIT and remove BPF interpreter"
+ depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
+ default y
+ help
+ Enables BPF JIT and removes BPF interpreter to avoid speculative
+ execution of BPF instructions by the interpreter.
+
+config BPF_JIT_DEFAULT_ON
+ def_bool ARCH_WANT_DEFAULT_BPF_JIT || BPF_JIT_ALWAYS_ON
+ depends on HAVE_EBPF_JIT && BPF_JIT
+
+config BPF_UNPRIV_DEFAULT_OFF
+ bool "Disable unprivileged BPF by default"
+ depends on BPF_SYSCALL
+ default y
+ help
+ Disables unprivileged BPF by default by setting the corresponding
+ /proc/sys/kernel/unprivileged_bpf_disabled knob to 2. An admin can
+ still reenable it by setting it to 0 later on, or permanently
+ disable it by setting it to 1 (from which no other transition to
+ 0 is possible anymore).
+
+source "kernel/bpf/preload/Kconfig"
+
+config BPF_LSM
+ bool "Enable BPF LSM Instrumentation"
+ depends on BPF_EVENTS
+ depends on BPF_SYSCALL
+ depends on SECURITY
+ depends on BPF_JIT
+ help
+ Enables instrumentation of the security hooks with BPF programs for
+ implementing dynamic MAC and Audit Policies.
+
+ If you are unsure how to answer this question, answer N.
+
+endmenu # "BPF subsystem"
--
2.34.1
The current UCM configuration sets the speaker PA volume to 15 dB when
enabling the speakers but this does not prevent the user from increasing
the volume further.
Limit the PA volume to 15 dB in the machine driver to reduce the risk of
speaker damage until we have active speaker protection in place.
Note that this will probably need to be generalised using
machine-specific limits, but a common limit should do for now.
Cc: stable(a)vger.kernel.org # 6.5
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/qcom/sc8280xp.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/sound/soc/qcom/sc8280xp.c b/sound/soc/qcom/sc8280xp.c
index ed4bb551bfbb..aa43903421f5 100644
--- a/sound/soc/qcom/sc8280xp.c
+++ b/sound/soc/qcom/sc8280xp.c
@@ -32,12 +32,14 @@ static int sc8280xp_snd_init(struct snd_soc_pcm_runtime *rtd)
case WSA_CODEC_DMA_RX_0:
case WSA_CODEC_DMA_RX_1:
/*
- * set limit of 0dB on Digital Volume for Speakers,
- * this can prevent damage of speakers to some extent without
- * active speaker protection
+ * Set limit of 0 dB on Digital Volume and 15 dB on PA Volume
+ * to reduce the risk of speaker damage until we have active
+ * speaker protection in place.
*/
snd_soc_limit_volume(card, "WSA_RX0 Digital Volume", 84);
snd_soc_limit_volume(card, "WSA_RX1 Digital Volume", 84);
+ snd_soc_limit_volume(card, "SpkrLeft PA Volume", 12);
+ snd_soc_limit_volume(card, "SpkrRight PA Volume", 12);
break;
default:
break;
--
2.41.0
From: Rui Zhang <zr.zhang(a)vivo.com>
[ Upstream commit 7993d3a9c34f609c02171e115fd12c10e2105ff4 ]
The use_count of a regulator should only be incremented when the
enable_count changes from 0 to 1. Similarly, the use_count should
only be decremented when the enable_count changes from 1 to 0.
In the previous implementation, use_count was sometimes decremented
to 0 when some consumer called unbalanced disable,
leading to unexpected disable even the regulator is enabled by
other consumers. With this change, the use_count accurately reflects
the number of users which the regulator is enabled.
This should make things more robust in the case where a consumer does
leak references.
Signed-off-by: Rui Zhang <zr.zhang(a)vivo.com>
Link: https://lore.kernel.org/r/20231103074231.8031-1-zr.zhang@vivo.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/regulator/core.c | 56 +++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 26 deletions(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 87d0cd6f49ca..894915892eaf 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2658,7 +2658,8 @@ static int _regulator_enable(struct regulator *regulator)
/* Fallthrough on positive return values - already enabled */
}
- rdev->use_count++;
+ if (regulator->enable_count == 1)
+ rdev->use_count++;
return 0;
@@ -2736,37 +2737,40 @@ static int _regulator_disable(struct regulator *regulator)
lockdep_assert_held_once(&rdev->mutex.base);
- if (WARN(rdev->use_count <= 0,
+ if (WARN(regulator->enable_count == 0,
"unbalanced disables for %s\n", rdev_get_name(rdev)))
return -EIO;
- /* are we the last user and permitted to disable ? */
- if (rdev->use_count == 1 &&
- (rdev->constraints && !rdev->constraints->always_on)) {
-
- /* we are last user */
- if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
- ret = _notifier_call_chain(rdev,
- REGULATOR_EVENT_PRE_DISABLE,
- NULL);
- if (ret & NOTIFY_STOP_MASK)
- return -EINVAL;
-
- ret = _regulator_do_disable(rdev);
- if (ret < 0) {
- rdev_err(rdev, "failed to disable\n");
- _notifier_call_chain(rdev,
- REGULATOR_EVENT_ABORT_DISABLE,
+ if (regulator->enable_count == 1) {
+ /* disabling last enable_count from this regulator */
+ /* are we the last user and permitted to disable ? */
+ if (rdev->use_count == 1 &&
+ (rdev->constraints && !rdev->constraints->always_on)) {
+
+ /* we are last user */
+ if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
+ ret = _notifier_call_chain(rdev,
+ REGULATOR_EVENT_PRE_DISABLE,
+ NULL);
+ if (ret & NOTIFY_STOP_MASK)
+ return -EINVAL;
+
+ ret = _regulator_do_disable(rdev);
+ if (ret < 0) {
+ rdev_err(rdev, "failed to disable\n");
+ _notifier_call_chain(rdev,
+ REGULATOR_EVENT_ABORT_DISABLE,
+ NULL);
+ return ret;
+ }
+ _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
NULL);
- return ret;
}
- _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
- NULL);
- }
- rdev->use_count = 0;
- } else if (rdev->use_count > 1) {
- rdev->use_count--;
+ rdev->use_count = 0;
+ } else if (rdev->use_count > 1) {
+ rdev->use_count--;
+ }
}
if (ret == 0)
--
2.43.0
From: Chris Riches <chris.riches(a)nutanix.com>
[ Upstream commit 022732e3d846e197539712e51ecada90ded0572a ]
When auditd_set sets the auditd_conn pointer, audit messages can
immediately be put on the socket by other kernel threads. If the backlog
is large or the rate is high, this can immediately fill the socket
buffer. If the audit daemon requested an ACK for this operation, a full
socket buffer causes the ACK to get dropped, also setting ENOBUFS on the
socket.
To avoid this race and ensure ACKs get through, fast-track the ACK in
this specific case to ensure it is sent before auditd_conn is set.
Signed-off-by: Chris Riches <chris.riches(a)nutanix.com>
[PM: fix some tab vs space damage]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/audit.c | 31 ++++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/kernel/audit.c b/kernel/audit.c
index 471d3ad910aa..5fb87eccb8c2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -498,15 +498,19 @@ static void auditd_conn_free(struct rcu_head *rcu)
* @pid: auditd PID
* @portid: auditd netlink portid
* @net: auditd network namespace pointer
+ * @skb: the netlink command from the audit daemon
+ * @ack: netlink ack flag, cleared if ack'd here
*
* Description:
* This function will obtain and drop network namespace references as
* necessary. Returns zero on success, negative values on failure.
*/
-static int auditd_set(struct pid *pid, u32 portid, struct net *net)
+static int auditd_set(struct pid *pid, u32 portid, struct net *net,
+ struct sk_buff *skb, bool *ack)
{
unsigned long flags;
struct auditd_connection *ac_old, *ac_new;
+ struct nlmsghdr *nlh;
if (!pid || !net)
return -EINVAL;
@@ -518,6 +522,13 @@ static int auditd_set(struct pid *pid, u32 portid, struct net *net)
ac_new->portid = portid;
ac_new->net = get_net(net);
+ /* send the ack now to avoid a race with the queue backlog */
+ if (*ack) {
+ nlh = nlmsg_hdr(skb);
+ netlink_ack(skb, nlh, 0, NULL);
+ *ack = false;
+ }
+
spin_lock_irqsave(&auditd_conn_lock, flags);
ac_old = rcu_dereference_protected(auditd_conn,
lockdep_is_held(&auditd_conn_lock));
@@ -1204,7 +1215,8 @@ static int audit_replace(struct pid *pid)
return auditd_send_unicast_skb(skb);
}
-static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
+static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
+ bool *ack)
{
u32 seq;
void *data;
@@ -1296,7 +1308,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
/* register a new auditd connection */
err = auditd_set(req_pid,
NETLINK_CB(skb).portid,
- sock_net(NETLINK_CB(skb).sk));
+ sock_net(NETLINK_CB(skb).sk),
+ skb, ack);
if (audit_enabled != AUDIT_OFF)
audit_log_config_change("audit_pid",
new_pid,
@@ -1529,9 +1542,10 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
* Parse the provided skb and deal with any messages that may be present,
* malformed skbs are discarded.
*/
-static void audit_receive(struct sk_buff *skb)
+static void audit_receive(struct sk_buff *skb)
{
struct nlmsghdr *nlh;
+ bool ack;
/*
* len MUST be signed for nlmsg_next to be able to dec it below 0
* if the nlmsg_len was not aligned
@@ -1544,9 +1558,12 @@ static void audit_receive(struct sk_buff *skb)
audit_ctl_lock();
while (nlmsg_ok(nlh, len)) {
- err = audit_receive_msg(skb, nlh);
- /* if err or if this message says it wants a response */
- if (err || (nlh->nlmsg_flags & NLM_F_ACK))
+ ack = nlh->nlmsg_flags & NLM_F_ACK;
+ err = audit_receive_msg(skb, nlh, &ack);
+
+ /* send an ack if the user asked for one and audit_receive_msg
+ * didn't already do it, or if there was an error. */
+ if (ack || err)
netlink_ack(skb, nlh, err, NULL);
nlh = nlmsg_next(nlh, &len);
--
2.43.0
Upstream commit bac1ec551434 ("usb: xhci: Set quirk for
XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI
which fixes XHC timeout, which was seen on synopsys XHCs while
using SG buffers. Currently this quirk can only be set using
xhci private data. But there are some drivers like dwc3/host.c
which adds adds quirks using software node for xhci device.
Hence set this xhci quirk by iterating over device properties.
Cc: <stable(a)vger.kernel.org> # 5.11
Fixes: bac1ec551434 ("usb: xhci: Set quirk for XHCI_SG_TRB_CACHE_SIZE_QUIRK")
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/host/xhci-plat.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index f04fde19f551..3d071b875308 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -253,6 +253,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s
if (device_property_read_bool(tmpdev, "quirk-broken-port-ped"))
xhci->quirks |= XHCI_BROKEN_PORT_PED;
+ if (device_property_read_bool(tmpdev, "xhci-sg-trb-cache-size-quirk"))
+ xhci->quirks |= XHCI_SG_TRB_CACHE_SIZE_QUIRK;
+
device_property_read_u32(tmpdev, "imod-interval-ns",
&xhci->imod_interval);
}
--
2.25.1
Upstream commit bac1ec551434 ("usb: xhci: Set quirk for
XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI
which fixes XHC timeout, which was seen on synopsys XHCs while
using SG buffers. But the support for this quirk isn't present
in the DWC3 layer.
We will encounter this XHCI timeout/hung issue if we run iperf
loopback tests using RTL8156 ethernet adaptor on DWC3 targets
with scatter-gather enabled. This gets resolved after enabling
the XHCI_SG_TRB_CACHE_SIZE_QUIRK. This patch enables it using
the xhci device property since its needed for DWC3 controller.
In Synopsys DWC3 databook,
Table 9-3: xHCI Debug Capability Limitations
Chained TRBs greater than TRB cache size: The debug capability
driver must not create a multi-TRB TD that describes smaller
than a 1K packet that spreads across 8 or more TRBs on either
the IN TR or the OUT TR.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/dwc3/host.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 61f57fe5bb78..31a496233d87 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -89,6 +89,8 @@ int dwc3_host_init(struct dwc3 *dwc)
memset(props, 0, sizeof(struct property_entry) * ARRAY_SIZE(props));
+ props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-sg-trb-cache-size-quirk");
+
if (dwc->usb3_lpm_capable)
props[prop_idx++] = PROPERTY_ENTRY_BOOL("usb3-lpm-capable");
--
2.25.1
has_extra_refcount() makes the assumption that the page cache adds a ref
count of 1 and subtracts this in the extra_pins case. Commit a08c7193e4f1
(mm/filemap: remove hugetlb special casing in filemap.c) modifies
__filemap_add_folio() by calling folio_ref_add(folio, nr); for all cases
(including hugtetlb) where nr is the number of pages in the folio. We
should adjust the number of references coming from the page cache by
subtracing the number of pages rather than 1.
In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong
flag as, in the hugetlb case, memory-failure code calls
folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is the
correct function to test for that flag.
After these fixes, the hugetlb hwpoison read selftest passes all cases.
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.c…
Cc: <stable(a)vger.kernel.org> # 6.7+
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
v1 -> v2:
move ref_count adjustment to if(extra_pins) block as that represents
ref counts from the page cache per Miaohe Lin.
fs/hugetlbfs/inode.c | 2 +-
mm/memory-failure.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 36132c9125f9..3a248e4f7e93 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
folio_unlock(folio);
- if (!folio_test_has_hwpoisoned(folio))
+ if (!folio_test_hwpoison(folio))
want = nr;
else {
/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..ef7ae73b65bd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -976,7 +976,7 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
int count = page_count(p) - 1;
if (extra_pins)
- count -= 1;
+ count -= folio_nr_pages(page_folio(p));
if (count > 0) {
pr_err("%#lx: %s still referenced by %d users\n",
--
2.31.1
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit d6e81532b10d8deb2bc30f7b44f09534876893e3 ]
Making virt_to_pfn() a static inline taking a strongly typed
(const void *) makes the contract of a passing a pointer of that
type to the function explicit and exposes any misuse of the
macro virt_to_pfn() acting polymorphic and accepting many types
such as (void *), (unitptr_t) or (unsigned long) as arguments
without warnings.
For symmetry do the same with pfn_to_virt().
For compiletime resolution of __pa() we need PAGE_OFFSET which
was not available to __pa() and resolved by the preprocessor
wherever __pa() was used. Fix this by explicitly including
<asm/mem-layout.h> where required, following the pattern of the
architectures page.h file.
Acked-by: Brian Cain <bcain(a)quicinc.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/hexagon/include/asm/page.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h
index 93f5669b4aa1..a12ba19e6460 100644
--- a/arch/hexagon/include/asm/page.h
+++ b/arch/hexagon/include/asm/page.h
@@ -91,6 +91,9 @@ typedef struct page *pgtable_t;
#define __pgd(x) ((pgd_t) { (x) })
#define __pgprot(x) ((pgprot_t) { (x) })
+/* Needed for PAGE_OFFSET used in the macro right below */
+#include <asm/mem-layout.h>
+
/*
* We need a __pa and a __va routine for kernel space.
* MIPS says they're only used during mem_init.
@@ -140,8 +143,16 @@ static inline void clear_page(void *page)
*/
#define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
-#define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
-#define pfn_to_virt(pfn) __va((pfn) << PAGE_SHIFT)
+static inline unsigned long virt_to_pfn(const void *kaddr)
+{
+ return __pa(kaddr) >> PAGE_SHIFT;
+}
+
+static inline void *pfn_to_virt(unsigned long pfn)
+{
+ return (void *)((unsigned long)__va(pfn) << PAGE_SHIFT);
+}
+
#define page_to_virt(page) __va(page_to_phys(page))
--
2.43.0
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit d6e81532b10d8deb2bc30f7b44f09534876893e3 ]
Making virt_to_pfn() a static inline taking a strongly typed
(const void *) makes the contract of a passing a pointer of that
type to the function explicit and exposes any misuse of the
macro virt_to_pfn() acting polymorphic and accepting many types
such as (void *), (unitptr_t) or (unsigned long) as arguments
without warnings.
For symmetry do the same with pfn_to_virt().
For compiletime resolution of __pa() we need PAGE_OFFSET which
was not available to __pa() and resolved by the preprocessor
wherever __pa() was used. Fix this by explicitly including
<asm/mem-layout.h> where required, following the pattern of the
architectures page.h file.
Acked-by: Brian Cain <bcain(a)quicinc.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/hexagon/include/asm/page.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h
index ee31f36f48f3..62976e38a963 100644
--- a/arch/hexagon/include/asm/page.h
+++ b/arch/hexagon/include/asm/page.h
@@ -78,6 +78,9 @@ typedef struct page *pgtable_t;
#define __pgd(x) ((pgd_t) { (x) })
#define __pgprot(x) ((pgprot_t) { (x) })
+/* Needed for PAGE_OFFSET used in the macro right below */
+#include <asm/mem-layout.h>
+
/*
* We need a __pa and a __va routine for kernel space.
* MIPS says they're only used during mem_init.
@@ -127,8 +130,16 @@ static inline void clear_page(void *page)
*/
#define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
-#define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
-#define pfn_to_virt(pfn) __va((pfn) << PAGE_SHIFT)
+static inline unsigned long virt_to_pfn(const void *kaddr)
+{
+ return __pa(kaddr) >> PAGE_SHIFT;
+}
+
+static inline void *pfn_to_virt(unsigned long pfn)
+{
+ return (void *)((unsigned long)__va(pfn) << PAGE_SHIFT);
+}
+
#define page_to_virt(page) __va(page_to_phys(page))
--
2.43.0
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit d6e81532b10d8deb2bc30f7b44f09534876893e3 ]
Making virt_to_pfn() a static inline taking a strongly typed
(const void *) makes the contract of a passing a pointer of that
type to the function explicit and exposes any misuse of the
macro virt_to_pfn() acting polymorphic and accepting many types
such as (void *), (unitptr_t) or (unsigned long) as arguments
without warnings.
For symmetry do the same with pfn_to_virt().
For compiletime resolution of __pa() we need PAGE_OFFSET which
was not available to __pa() and resolved by the preprocessor
wherever __pa() was used. Fix this by explicitly including
<asm/mem-layout.h> where required, following the pattern of the
architectures page.h file.
Acked-by: Brian Cain <bcain(a)quicinc.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/hexagon/include/asm/page.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h
index 7cbf719c578e..2d8c681c3469 100644
--- a/arch/hexagon/include/asm/page.h
+++ b/arch/hexagon/include/asm/page.h
@@ -78,6 +78,9 @@ typedef struct page *pgtable_t;
#define __pgd(x) ((pgd_t) { (x) })
#define __pgprot(x) ((pgprot_t) { (x) })
+/* Needed for PAGE_OFFSET used in the macro right below */
+#include <asm/mem-layout.h>
+
/*
* We need a __pa and a __va routine for kernel space.
* MIPS says they're only used during mem_init.
@@ -126,8 +129,16 @@ static inline void clear_page(void *page)
*/
#define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
-#define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
-#define pfn_to_virt(pfn) __va((pfn) << PAGE_SHIFT)
+static inline unsigned long virt_to_pfn(const void *kaddr)
+{
+ return __pa(kaddr) >> PAGE_SHIFT;
+}
+
+static inline void *pfn_to_virt(unsigned long pfn)
+{
+ return (void *)((unsigned long)__va(pfn) << PAGE_SHIFT);
+}
+
#define page_to_virt(page) __va(page_to_phys(page))
--
2.43.0
From: Alexander Gordeev <agordeev(a)linux.ibm.com>
[ Upstream commit 65f8780e2d70257200547b5a7654974aa7c37ce1 ]
The size of vmalloc area depends from various factors
on boot and could be set to:
1. Default size as determined by VMALLOC_DEFAULT_SIZE macro;
2. One half of the virtual address space not occupied by
modules and fixed mappings;
3. The size provided by user with vmalloc= kernel command
line parameter;
In cases [1] and [2] the vmalloc area base address is aligned
on Region3 table type boundary, while in case [3] in might get
aligned on page boundary.
Limit the waste of page tables and always align vmalloc area
size and base address on segment boundary.
Acked-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/s390/boot/ipl_parm.c | 2 +-
arch/s390/boot/startup.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/s390/boot/ipl_parm.c b/arch/s390/boot/ipl_parm.c
index 7b7521762633..4230144645bc 100644
--- a/arch/s390/boot/ipl_parm.c
+++ b/arch/s390/boot/ipl_parm.c
@@ -272,7 +272,7 @@ void parse_boot_command_line(void)
memory_limit = round_down(memparse(val, NULL), PAGE_SIZE);
if (!strcmp(param, "vmalloc") && val) {
- vmalloc_size = round_up(memparse(val, NULL), PAGE_SIZE);
+ vmalloc_size = round_up(memparse(val, NULL), _SEGMENT_SIZE);
vmalloc_size_set = 1;
}
diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
index d3e48bd9c394..d08db5df6091 100644
--- a/arch/s390/boot/startup.c
+++ b/arch/s390/boot/startup.c
@@ -212,7 +212,8 @@ static unsigned long setup_kernel_memory_layout(void)
VMALLOC_END = MODULES_VADDR;
/* allow vmalloc area to occupy up to about 1/2 of the rest virtual space left */
- vmalloc_size = min(vmalloc_size, round_down(VMALLOC_END / 2, _REGION3_SIZE));
+ vsize = round_down(VMALLOC_END / 2, _SEGMENT_SIZE);
+ vmalloc_size = min(vmalloc_size, vsize);
VMALLOC_START = VMALLOC_END - vmalloc_size;
/* split remaining virtual space between 1:1 mapping & vmemmap array */
--
2.43.0
From: Alexander Gordeev <agordeev(a)linux.ibm.com>
[ Upstream commit 65f8780e2d70257200547b5a7654974aa7c37ce1 ]
The size of vmalloc area depends from various factors
on boot and could be set to:
1. Default size as determined by VMALLOC_DEFAULT_SIZE macro;
2. One half of the virtual address space not occupied by
modules and fixed mappings;
3. The size provided by user with vmalloc= kernel command
line parameter;
In cases [1] and [2] the vmalloc area base address is aligned
on Region3 table type boundary, while in case [3] in might get
aligned on page boundary.
Limit the waste of page tables and always align vmalloc area
size and base address on segment boundary.
Acked-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/s390/boot/ipl_parm.c | 2 +-
arch/s390/boot/startup.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/s390/boot/ipl_parm.c b/arch/s390/boot/ipl_parm.c
index 2ab4872fbee1..b24de9aabf7d 100644
--- a/arch/s390/boot/ipl_parm.c
+++ b/arch/s390/boot/ipl_parm.c
@@ -274,7 +274,7 @@ void parse_boot_command_line(void)
memory_limit = round_down(memparse(val, NULL), PAGE_SIZE);
if (!strcmp(param, "vmalloc") && val) {
- vmalloc_size = round_up(memparse(val, NULL), PAGE_SIZE);
+ vmalloc_size = round_up(memparse(val, NULL), _SEGMENT_SIZE);
vmalloc_size_set = 1;
}
diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
index 8104e0e3d188..9cc76e631759 100644
--- a/arch/s390/boot/startup.c
+++ b/arch/s390/boot/startup.c
@@ -255,7 +255,8 @@ static unsigned long setup_kernel_memory_layout(void)
VMALLOC_END = MODULES_VADDR;
/* allow vmalloc area to occupy up to about 1/2 of the rest virtual space left */
- vmalloc_size = min(vmalloc_size, round_down(VMALLOC_END / 2, _REGION3_SIZE));
+ vsize = round_down(VMALLOC_END / 2, _SEGMENT_SIZE);
+ vmalloc_size = min(vmalloc_size, vsize);
VMALLOC_START = VMALLOC_END - vmalloc_size;
/* split remaining virtual space between 1:1 mapping & vmemmap array */
--
2.43.0
From: Osama Muhammad <osmtendev(a)gmail.com>
[ Upstream commit 9862ec7ac1cbc6eb5ee4a045b5d5b8edbb2f7e68 ]
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb
Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index ea330ce921b1..e8c1f3738c39 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -2935,6 +2935,9 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval)
/* is the current value the same as the old value ? if so,
* there is nothing to do.
*/
+ if (WARN_ON_ONCE(lp >= CTLTREESIZE))
+ return;
+
if (tp->dmt_stree[lp] == newval)
return;
--
2.43.0
From: Osama Muhammad <osmtendev(a)gmail.com>
[ Upstream commit 9862ec7ac1cbc6eb5ee4a045b5d5b8edbb2f7e68 ]
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb
Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index 72eb5ed54c2a..985beb1c654d 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -2935,6 +2935,9 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval)
/* is the current value the same as the old value ? if so,
* there is nothing to do.
*/
+ if (WARN_ON_ONCE(lp >= CTLTREESIZE))
+ return;
+
if (tp->dmt_stree[lp] == newval)
return;
--
2.43.0
From: Osama Muhammad <osmtendev(a)gmail.com>
[ Upstream commit 9862ec7ac1cbc6eb5ee4a045b5d5b8edbb2f7e68 ]
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb
Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index 5b01026fff9b..bd2bb5724cc1 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -2939,6 +2939,9 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval)
/* is the current value the same as the old value ? if so,
* there is nothing to do.
*/
+ if (WARN_ON_ONCE(lp >= CTLTREESIZE))
+ return;
+
if (tp->dmt_stree[lp] == newval)
return;
--
2.43.0
From: Osama Muhammad <osmtendev(a)gmail.com>
[ Upstream commit 9862ec7ac1cbc6eb5ee4a045b5d5b8edbb2f7e68 ]
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb
Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index 4d56f6081a5d..34e230b2110b 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -2871,6 +2871,9 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval)
/* is the current value the same as the old value ? if so,
* there is nothing to do.
*/
+ if (WARN_ON_ONCE(lp >= CTLTREESIZE))
+ return;
+
if (tp->dmt_stree[lp] == newval)
return;
--
2.43.0
From: Osama Muhammad <osmtendev(a)gmail.com>
[ Upstream commit 9862ec7ac1cbc6eb5ee4a045b5d5b8edbb2f7e68 ]
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb
Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index 11c77757ead9..d55f0dd8d754 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -2871,6 +2871,9 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval)
/* is the current value the same as the old value ? if so,
* there is nothing to do.
*/
+ if (WARN_ON_ONCE(lp >= CTLTREESIZE))
+ return;
+
if (tp->dmt_stree[lp] == newval)
return;
--
2.43.0
From: Osama Muhammad <osmtendev(a)gmail.com>
[ Upstream commit 9862ec7ac1cbc6eb5ee4a045b5d5b8edbb2f7e68 ]
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb
Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index 11c77757ead9..d55f0dd8d754 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -2871,6 +2871,9 @@ static void dbAdjTree(dmtree_t * tp, int leafno, int newval)
/* is the current value the same as the old value ? if so,
* there is nothing to do.
*/
+ if (WARN_ON_ONCE(lp >= CTLTREESIZE))
+ return;
+
if (tp->dmt_stree[lp] == newval)
return;
--
2.43.0
From: Rui Zhang <zr.zhang(a)vivo.com>
[ Upstream commit 7993d3a9c34f609c02171e115fd12c10e2105ff4 ]
The use_count of a regulator should only be incremented when the
enable_count changes from 0 to 1. Similarly, the use_count should
only be decremented when the enable_count changes from 1 to 0.
In the previous implementation, use_count was sometimes decremented
to 0 when some consumer called unbalanced disable,
leading to unexpected disable even the regulator is enabled by
other consumers. With this change, the use_count accurately reflects
the number of users which the regulator is enabled.
This should make things more robust in the case where a consumer does
leak references.
Signed-off-by: Rui Zhang <zr.zhang(a)vivo.com>
Link: https://lore.kernel.org/r/20231103074231.8031-1-zr.zhang@vivo.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/regulator/core.c | 56 +++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 26 deletions(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 8ad50dc8fb35..9b1f27f87c95 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2881,7 +2881,8 @@ static int _regulator_enable(struct regulator *regulator)
/* Fallthrough on positive return values - already enabled */
}
- rdev->use_count++;
+ if (regulator->enable_count == 1)
+ rdev->use_count++;
return 0;
@@ -2956,37 +2957,40 @@ static int _regulator_disable(struct regulator *regulator)
lockdep_assert_held_once(&rdev->mutex.base);
- if (WARN(rdev->use_count <= 0,
+ if (WARN(regulator->enable_count == 0,
"unbalanced disables for %s\n", rdev_get_name(rdev)))
return -EIO;
- /* are we the last user and permitted to disable ? */
- if (rdev->use_count == 1 &&
- (rdev->constraints && !rdev->constraints->always_on)) {
-
- /* we are last user */
- if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
- ret = _notifier_call_chain(rdev,
- REGULATOR_EVENT_PRE_DISABLE,
- NULL);
- if (ret & NOTIFY_STOP_MASK)
- return -EINVAL;
-
- ret = _regulator_do_disable(rdev);
- if (ret < 0) {
- rdev_err(rdev, "failed to disable: %pe\n", ERR_PTR(ret));
- _notifier_call_chain(rdev,
- REGULATOR_EVENT_ABORT_DISABLE,
+ if (regulator->enable_count == 1) {
+ /* disabling last enable_count from this regulator */
+ /* are we the last user and permitted to disable ? */
+ if (rdev->use_count == 1 &&
+ (rdev->constraints && !rdev->constraints->always_on)) {
+
+ /* we are last user */
+ if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
+ ret = _notifier_call_chain(rdev,
+ REGULATOR_EVENT_PRE_DISABLE,
+ NULL);
+ if (ret & NOTIFY_STOP_MASK)
+ return -EINVAL;
+
+ ret = _regulator_do_disable(rdev);
+ if (ret < 0) {
+ rdev_err(rdev, "failed to disable: %pe\n", ERR_PTR(ret));
+ _notifier_call_chain(rdev,
+ REGULATOR_EVENT_ABORT_DISABLE,
+ NULL);
+ return ret;
+ }
+ _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
NULL);
- return ret;
}
- _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
- NULL);
- }
- rdev->use_count = 0;
- } else if (rdev->use_count > 1) {
- rdev->use_count--;
+ rdev->use_count = 0;
+ } else if (rdev->use_count > 1) {
+ rdev->use_count--;
+ }
}
if (ret == 0)
--
2.43.0
From: Rui Zhang <zr.zhang(a)vivo.com>
[ Upstream commit 7993d3a9c34f609c02171e115fd12c10e2105ff4 ]
The use_count of a regulator should only be incremented when the
enable_count changes from 0 to 1. Similarly, the use_count should
only be decremented when the enable_count changes from 1 to 0.
In the previous implementation, use_count was sometimes decremented
to 0 when some consumer called unbalanced disable,
leading to unexpected disable even the regulator is enabled by
other consumers. With this change, the use_count accurately reflects
the number of users which the regulator is enabled.
This should make things more robust in the case where a consumer does
leak references.
Signed-off-by: Rui Zhang <zr.zhang(a)vivo.com>
Link: https://lore.kernel.org/r/20231103074231.8031-1-zr.zhang@vivo.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/regulator/core.c | 56 +++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 26 deletions(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 34d3d8281906..c8702011b761 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2925,7 +2925,8 @@ static int _regulator_enable(struct regulator *regulator)
/* Fallthrough on positive return values - already enabled */
}
- rdev->use_count++;
+ if (regulator->enable_count == 1)
+ rdev->use_count++;
return 0;
@@ -3000,37 +3001,40 @@ static int _regulator_disable(struct regulator *regulator)
lockdep_assert_held_once(&rdev->mutex.base);
- if (WARN(rdev->use_count <= 0,
+ if (WARN(regulator->enable_count == 0,
"unbalanced disables for %s\n", rdev_get_name(rdev)))
return -EIO;
- /* are we the last user and permitted to disable ? */
- if (rdev->use_count == 1 &&
- (rdev->constraints && !rdev->constraints->always_on)) {
-
- /* we are last user */
- if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
- ret = _notifier_call_chain(rdev,
- REGULATOR_EVENT_PRE_DISABLE,
- NULL);
- if (ret & NOTIFY_STOP_MASK)
- return -EINVAL;
-
- ret = _regulator_do_disable(rdev);
- if (ret < 0) {
- rdev_err(rdev, "failed to disable: %pe\n", ERR_PTR(ret));
- _notifier_call_chain(rdev,
- REGULATOR_EVENT_ABORT_DISABLE,
+ if (regulator->enable_count == 1) {
+ /* disabling last enable_count from this regulator */
+ /* are we the last user and permitted to disable ? */
+ if (rdev->use_count == 1 &&
+ (rdev->constraints && !rdev->constraints->always_on)) {
+
+ /* we are last user */
+ if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
+ ret = _notifier_call_chain(rdev,
+ REGULATOR_EVENT_PRE_DISABLE,
+ NULL);
+ if (ret & NOTIFY_STOP_MASK)
+ return -EINVAL;
+
+ ret = _regulator_do_disable(rdev);
+ if (ret < 0) {
+ rdev_err(rdev, "failed to disable: %pe\n", ERR_PTR(ret));
+ _notifier_call_chain(rdev,
+ REGULATOR_EVENT_ABORT_DISABLE,
+ NULL);
+ return ret;
+ }
+ _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
NULL);
- return ret;
}
- _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
- NULL);
- }
- rdev->use_count = 0;
- } else if (rdev->use_count > 1) {
- rdev->use_count--;
+ rdev->use_count = 0;
+ } else if (rdev->use_count > 1) {
+ rdev->use_count--;
+ }
}
if (ret == 0)
--
2.43.0
From: Rui Zhang <zr.zhang(a)vivo.com>
[ Upstream commit 7993d3a9c34f609c02171e115fd12c10e2105ff4 ]
The use_count of a regulator should only be incremented when the
enable_count changes from 0 to 1. Similarly, the use_count should
only be decremented when the enable_count changes from 1 to 0.
In the previous implementation, use_count was sometimes decremented
to 0 when some consumer called unbalanced disable,
leading to unexpected disable even the regulator is enabled by
other consumers. With this change, the use_count accurately reflects
the number of users which the regulator is enabled.
This should make things more robust in the case where a consumer does
leak references.
Signed-off-by: Rui Zhang <zr.zhang(a)vivo.com>
Link: https://lore.kernel.org/r/20231103074231.8031-1-zr.zhang@vivo.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/regulator/core.c | 56 +++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 26 deletions(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3137e40fcd3e..a7b3e548ea5a 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2918,7 +2918,8 @@ static int _regulator_enable(struct regulator *regulator)
/* Fallthrough on positive return values - already enabled */
}
- rdev->use_count++;
+ if (regulator->enable_count == 1)
+ rdev->use_count++;
return 0;
@@ -2993,37 +2994,40 @@ static int _regulator_disable(struct regulator *regulator)
lockdep_assert_held_once(&rdev->mutex.base);
- if (WARN(rdev->use_count <= 0,
+ if (WARN(regulator->enable_count == 0,
"unbalanced disables for %s\n", rdev_get_name(rdev)))
return -EIO;
- /* are we the last user and permitted to disable ? */
- if (rdev->use_count == 1 &&
- (rdev->constraints && !rdev->constraints->always_on)) {
-
- /* we are last user */
- if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
- ret = _notifier_call_chain(rdev,
- REGULATOR_EVENT_PRE_DISABLE,
- NULL);
- if (ret & NOTIFY_STOP_MASK)
- return -EINVAL;
-
- ret = _regulator_do_disable(rdev);
- if (ret < 0) {
- rdev_err(rdev, "failed to disable: %pe\n", ERR_PTR(ret));
- _notifier_call_chain(rdev,
- REGULATOR_EVENT_ABORT_DISABLE,
+ if (regulator->enable_count == 1) {
+ /* disabling last enable_count from this regulator */
+ /* are we the last user and permitted to disable ? */
+ if (rdev->use_count == 1 &&
+ (rdev->constraints && !rdev->constraints->always_on)) {
+
+ /* we are last user */
+ if (regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS)) {
+ ret = _notifier_call_chain(rdev,
+ REGULATOR_EVENT_PRE_DISABLE,
+ NULL);
+ if (ret & NOTIFY_STOP_MASK)
+ return -EINVAL;
+
+ ret = _regulator_do_disable(rdev);
+ if (ret < 0) {
+ rdev_err(rdev, "failed to disable: %pe\n", ERR_PTR(ret));
+ _notifier_call_chain(rdev,
+ REGULATOR_EVENT_ABORT_DISABLE,
+ NULL);
+ return ret;
+ }
+ _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
NULL);
- return ret;
}
- _notifier_call_chain(rdev, REGULATOR_EVENT_DISABLE,
- NULL);
- }
- rdev->use_count = 0;
- } else if (rdev->use_count > 1) {
- rdev->use_count--;
+ rdev->use_count = 0;
+ } else if (rdev->use_count > 1) {
+ rdev->use_count--;
+ }
}
if (ret == 0)
--
2.43.0
From: Raghavendra K T <raghavendra.kt(a)amd.com>
[ Upstream commit 84db47ca7146d7bd00eb5cf2b93989a971c84650 ]
Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
NUMA Balancing allows updating PTEs to trap NUMA hinting faults if the
task had previously accessed VMA. However unconditional scan of VMAs are
allowed during initial phase of VMA creation until process's
mm numa_scan_seq reaches 2 even though current task had not accessed VMA.
Rationale:
- Without initial scan subsequent PTE update may never happen.
- Give fair opportunity to all the VMAs to be scanned and subsequently
understand the access pattern of all the VMAs.
But it has a corner case where, if a VMA is created after some time,
process's mm numa_scan_seq could be already greater than 2.
For e.g., values of mm numa_scan_seq when VMAs are created by running
mmtest autonuma benchmark briefly looks like:
start_seq=0 : 459
start_seq=2 : 138
start_seq=3 : 144
start_seq=4 : 8
start_seq=8 : 1
start_seq=9 : 1
This results in no unconditional PTE updates for those VMAs created after
some time.
Fix:
- Note down the initial value of mm numa_scan_seq in per VMA start_seq.
- Allow unconditional scan till start_seq + 2.
Result:
SUT: AMD EPYC Milan with 2 NUMA nodes 256 cpus.
base kernel: upstream 6.6-rc6 with Mels patches [1] applied.
kernbench
========== base patched %gain
Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*
Duration User 41404.28 41375.08
Duration System 9862.22 9768.48
Duration Elapsed 519.87 518.72
Ops NUMA PTE updates 1041416.00 831536.00
Ops NUMA hint faults 263296.00 220966.00
Ops NUMA pages migrated 258021.00 212769.00
Ops AutoNUMA cost 1328.67 1114.69
autonumabench
NUMA01_THREADLOCAL
==================
Amean elsp-NUMA01_THREADLOCAL 81.79 (0.00%) 67.74 * 17.18%*
Duration User 54832.73 47379.67
Duration System 75.00 185.75
Duration Elapsed 576.72 476.09
Ops NUMA PTE updates 394429.00 11121044.00
Ops NUMA hint faults 1001.00 8906404.00
Ops NUMA pages migrated 288.00 2998694.00
Ops AutoNUMA cost 7.77 44666.84
Signed-off-by: Raghavendra K T <raghavendra.kt(a)amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Link: https://lore.kernel.org/r/2ea7cbce80ac7c62e90cbfb9653a7972f902439f.16978166…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/linux/mm_types.h | 3 +++
kernel/sched/fair.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 957ce38768b2..950df415d7de 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -600,6 +600,9 @@ struct vma_numab_state {
*/
unsigned long pids_active[2];
+ /* MM scan sequence ID when scan first started after VMA creation */
+ int start_scan_seq;
+
/*
* MM scan sequence ID when the VMA was last completely scanned.
* A VMA is not eligible for scanning if prev_scan_seq == numa_scan_seq
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..44b5262b6657 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3164,7 +3164,7 @@ static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma)
* This is also done to avoid any side effect of task scanning
* amplifying the unfairness of disjoint set of VMAs' access.
*/
- if (READ_ONCE(current->mm->numa_scan_seq) < 2)
+ if ((READ_ONCE(current->mm->numa_scan_seq) - vma->numab_state->start_scan_seq) < 2)
return true;
pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1];
@@ -3307,6 +3307,8 @@ static void task_numa_work(struct callback_head *work)
if (!vma->numab_state)
continue;
+ vma->numab_state->start_scan_seq = mm->numa_scan_seq;
+
vma->numab_state->next_scan = now +
msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
--
2.43.0
From: Kunwu Chan <chentao(a)kylinos.cn>
[ Upstream commit f46c8a75263f97bda13c739ba1c90aced0d3b071 ]
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
Suggested-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Suggested-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Kunwu Chan <chentao(a)kylinos.cn>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20231204023223.2447523-1-chentao@kylinos.cn
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/powerpc/mm/init-common.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2b656e67f2ea..927703af49be 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -65,7 +65,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
* as to leave enough 0 bits in the address to contain it. */
unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
HUGEPD_SHIFT_MASK + 1);
- struct kmem_cache *new;
+ struct kmem_cache *new = NULL;
/* It would be nice if this was a BUILD_BUG_ON(), but at the
* moment, gcc doesn't seem to recognize is_power_of_2 as a
@@ -78,7 +78,8 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
align = max_t(unsigned long, align, minalign);
name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
- new = kmem_cache_create(name, table_size, align, 0, ctor);
+ if (name)
+ new = kmem_cache_create(name, table_size, align, 0, ctor);
if (!new)
panic("Could not allocate pgtable cache for order %d", shift);
--
2.43.0
From: Kunwu Chan <chentao(a)kylinos.cn>
[ Upstream commit f46c8a75263f97bda13c739ba1c90aced0d3b071 ]
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
Suggested-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Suggested-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Kunwu Chan <chentao(a)kylinos.cn>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20231204023223.2447523-1-chentao@kylinos.cn
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/powerpc/mm/init-common.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index a84da92920f7..e7b9cc90fd9e 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -104,7 +104,7 @@ void pgtable_cache_add(unsigned int shift)
* as to leave enough 0 bits in the address to contain it. */
unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
HUGEPD_SHIFT_MASK + 1);
- struct kmem_cache *new;
+ struct kmem_cache *new = NULL;
/* It would be nice if this was a BUILD_BUG_ON(), but at the
* moment, gcc doesn't seem to recognize is_power_of_2 as a
@@ -117,7 +117,8 @@ void pgtable_cache_add(unsigned int shift)
align = max_t(unsigned long, align, minalign);
name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
- new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
+ if (name)
+ new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
if (!new)
panic("Could not allocate pgtable cache for order %d", shift);
--
2.43.0
Return value of 'to_amdgpu_crtc' which is container_of(...) can't be
null, so it's null check 'acrtc' is dropped.
Fixing the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9302 amdgpu_dm_atomic_commit_tail() error: we previously assumed 'acrtc' could be null (see line 9299)
Add 'new_crtc_state'NULL check for function
'drm_atomic_get_new_crtc_state' that retrieves the new state for a CRTC,
while enabling writeback requests.
Cc: stable(a)vger.kernel.org
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 95ff3800fc87..8eb381d5f6b8 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -9294,10 +9294,10 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
if (!new_con_state->writeback_job)
continue;
- new_crtc_state = NULL;
+ new_crtc_state = drm_atomic_get_new_crtc_state(state, &acrtc->base);
- if (acrtc)
- new_crtc_state = drm_atomic_get_new_crtc_state(state, &acrtc->base);
+ if (!new_crtc_state)
+ continue;
if (acrtc->wb_enabled)
continue;
--
2.34.1
This is the start of the stable review cycle for the 5.10.208 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 15 Jan 2024 09:41:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.208-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.208-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "nvme: use command_id instead of req->tag in trace_nvme_complete_rq()"
Bartosz Pawlowski <bartosz.pawlowski(a)intel.com>
PCI: Disable ATS for specific Intel IPU E2000 devices
Bartosz Pawlowski <bartosz.pawlowski(a)intel.com>
PCI: Extract ATS disabling to a helper function
Phil Sutter <phil(a)nwl.cc>
netfilter: nf_tables: Reject tables of unsupported family
Wander Lairson Costa <wander(a)redhat.com>
drm/qxl: fix UAF on handle creation
Jon Maxwell <jmaxwell37(a)gmail.com>
ipv6: remove max_size check inline with ipv4
John Fastabend <john.fastabend(a)gmail.com>
net: tls, update curr on splice as well
Aditya Gupta <adityag(a)linux.ibm.com>
powerpc: update ppc_save_regs to save current r1 in pt_regs
Wenchao Chen <wenchao.chen(a)unisoc.com>
mmc: sdhci-sprd: Fix eMMC init failure after hw reset
Geert Uytterhoeven <geert+renesas(a)glider.be>
mmc: core: Cancel delayed work before releasing host
Jorge Ramirez-Ortiz <jorge(a)foundries.io>
mmc: rpmb: fixes pause retune on all RPMB partitions.
Ziyang Huang <hzyitc(a)outlook.com>
mmc: meson-mx-sdhc: Fix initialization frozen issue
Jiajun Xie <jiajun.xie.sh(a)gmail.com>
mm: fix unmap_mapping_range high bits shift bug
Benjamin Bara <benjamin.bara(a)skidata.com>
i2c: core: Fix atomic xfer check for non-preempt config
Jinghao Jia <jinghao7(a)illinois.edu>
x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
Matthew Wilcox (Oracle) <willy(a)infradead.org>
mm/memory-failure: check the mapcount of the precise page
Thomas Lange <thomas(a)corelatus.se>
net: Implement missing SO_TIMESTAMPING_NEW cmsg support
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
Chen Ni <nichen(a)iscas.ac.cn>
asix: Add check for usbnet_get_endpoints
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
net/qla3xxx: switch from 'pci_' to 'dma_' API
Andrii Staikov <andrii.staikov(a)intel.com>
i40e: Restore VF MSI-X state during PCI reset
Mark Brown <broonie(a)kernel.org>
ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux
Mark Brown <broonie(a)kernel.org>
ASoC: meson: g12a-toacodec: Fix event generation
Mark Brown <broonie(a)kernel.org>
ASoC: meson: g12a-tohdmitx: Validate written enum values
Mark Brown <broonie(a)kernel.org>
ASoC: meson: g12a-toacodec: Validate written enum values
Ke Xiao <xiaoke(a)sangfor.com.cn>
i40e: fix use-after-free in i40e_aqc_add_filters()
Marc Dionne <marc.dionne(a)auristor.com>
net: Save and restore msg_namelen in sock_sendmsg
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_immediate: drop chain reference counter on error
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nftables: add loop check helper function
Adrian Cinal <adriancinal(a)gmail.com>
net: bcmgenet: Fix FCS generation for fragmented skbuffs
Zhipeng Lu <alexious(a)zju.edu.cn>
sfc: fix a double-free bug in efx_probe_filters
Stefan Wahren <wahrenst(a)gmx.net>
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
Jörn-Thorben Hinz <jthinz(a)mailbox.tu-berlin.de>
net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)
Hangyu Hua <hbh25y(a)gmail.com>
net: sched: em_text: fix possible memory leak in em_text_destroy()
Sudheer Mogilappagari <sudheer.mogilappagari(a)intel.com>
i40e: Fix filter input checks to prevent config with invalid values
Khaled Almahallawy <khaled.almahallawy(a)intel.com>
drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern
Suman Ghosh <sumang(a)marvell.com>
octeontx2-af: Fix marking couple of structure as __packed
Siddh Raman Pant <code(a)siddh.me>
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
Siddhesh Dharme <siddheshdharme18(a)gmail.com>
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6
Sarthak Kukreti <sarthakkukreti(a)chromium.org>
block: Don't invalidate pagecache for invalid falloc modes
Edward Adam Davis <eadavis(a)qq.com>
keys, dns: Fix missing size check of V1 server-list header
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-sunxi/mc_smp.c | 4 +-
arch/powerpc/kernel/ppc_save_regs.S | 6 +-
arch/x86/kernel/kprobes/core.c | 3 +-
drivers/firewire/ohci.c | 51 ++++++
drivers/gpu/drm/i915/display/intel_dp.c | 2 +-
drivers/gpu/drm/qxl/qxl_drv.h | 2 +-
drivers/gpu/drm/qxl/qxl_dumb.c | 5 +-
drivers/gpu/drm/qxl/qxl_gem.c | 25 ++-
drivers/gpu/drm/qxl/qxl_ioctl.c | 6 +-
drivers/i2c/i2c-core.h | 4 +-
drivers/mmc/core/block.c | 7 +-
drivers/mmc/core/host.c | 1 +
drivers/mmc/host/meson-mx-sdhc-mmc.c | 26 +--
drivers/mmc/host/sdhci-sprd.c | 10 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 +++-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 3 +
drivers/net/ethernet/marvell/octeontx2/af/npc.h | 4 +-
drivers/net/ethernet/qlogic/qla3xxx.c | 198 +++++++++------------
drivers/net/ethernet/sfc/rx_common.c | 4 +-
drivers/net/usb/ax88172a.c | 4 +-
drivers/nvme/host/trace.h | 2 +-
drivers/pci/quirks.c | 28 ++-
fs/block_dev.c | 21 ++-
include/net/dst_ops.h | 2 +-
mm/memory-failure.c | 6 +-
mm/memory.c | 4 +-
net/core/dst.c | 8 +-
net/core/sock.c | 12 +-
net/dns_resolver/dns_key.c | 19 +-
net/ipv6/route.c | 13 +-
net/netfilter/nf_tables_api.c | 57 +++++-
net/netfilter/nft_immediate.c | 2 +-
net/nfc/llcp_core.c | 39 +++-
net/sched/em_text.c | 4 +-
net/socket.c | 2 +
net/tls/tls_sw.c | 2 +
sound/pci/hda/patch_realtek.c | 1 +
sound/soc/meson/g12a-toacodec.c | 5 +-
sound/soc/meson/g12a-tohdmitx.c | 8 +-
43 files changed, 429 insertions(+), 228 deletions(-)
This is the start of the stable review cycle for the 6.1.73 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 15 Jan 2024 09:41:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.73-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.73-rc1
Jon Maxwell <jmaxwell37(a)gmail.com>
ipv6: remove max_size check inline with ipv4
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "nfsd: separate nfsd_last_thread() from nfsd_put()"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "nfsd: call nfsd_last_thread() before final nfsd_put()"
Steve French <stfrench(a)microsoft.com>
cifs: fix flushing folio regression for 6.1 backport
-------------
Diffstat:
Makefile | 4 ++--
fs/nfsd/nfsctl.c | 9 ++-------
fs/nfsd/nfsd.h | 8 +-------
fs/nfsd/nfssvc.c | 52 ++++++++++++++++++++++++++++++++------------------
fs/smb/client/cifsfs.c | 2 +-
include/net/dst_ops.h | 2 +-
net/core/dst.c | 8 ++------
net/ipv6/route.c | 13 +++++--------
8 files changed, 47 insertions(+), 51 deletions(-)
This is the start of the stable review cycle for the 6.6.12 release.
There are 1 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 15 Jan 2024 09:41:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.12-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.12-rc1
Jeff Layton <jlayton(a)kernel.org>
nfsd: drop the nfsd_put helper
-------------
Diffstat:
Makefile | 4 ++--
fs/nfsd/nfsctl.c | 31 +++++++++++++++++--------------
fs/nfsd/nfsd.h | 7 -------
3 files changed, 19 insertions(+), 23 deletions(-)
This is the start of the stable review cycle for the 5.4.267 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 15 Jan 2024 09:41:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.267-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.267-rc1
Jon Maxwell <jmaxwell37(a)gmail.com>
ipv6: remove max_size check inline with ipv4
Eric Dumazet <edumazet(a)google.com>
ipv6: make ip6_rt_gc_expire an atomic_t
Eric Dumazet <edumazet(a)google.com>
net/dst: use a smaller percpu_counter batch for dst entries accounting
Bartosz Pawlowski <bartosz.pawlowski(a)intel.com>
PCI: Disable ATS for specific Intel IPU E2000 devices
Bartosz Pawlowski <bartosz.pawlowski(a)intel.com>
PCI: Extract ATS disabling to a helper function
Phil Sutter <phil(a)nwl.cc>
netfilter: nf_tables: Reject tables of unsupported family
John Fastabend <john.fastabend(a)gmail.com>
net: tls, update curr on splice as well
Douglas Anderson <dianders(a)chromium.org>
ath10k: Get rid of "per_ce_irq" hw param
Douglas Anderson <dianders(a)chromium.org>
ath10k: Keep track of which interrupts fired, don't poll them
Rakesh Pillai <pillair(a)codeaurora.org>
ath10k: Add interrupt summary based CE processing
Douglas Anderson <dianders(a)chromium.org>
ath10k: Wait until copy complete is actually done before completing
Wenchao Chen <wenchao.chen(a)unisoc.com>
mmc: sdhci-sprd: Fix eMMC init failure after hw reset
Geert Uytterhoeven <geert+renesas(a)glider.be>
mmc: core: Cancel delayed work before releasing host
Jorge Ramirez-Ortiz <jorge(a)foundries.io>
mmc: rpmb: fixes pause retune on all RPMB partitions.
Jiajun Xie <jiajun.xie.sh(a)gmail.com>
mm: fix unmap_mapping_range high bits shift bug
Benjamin Bara <benjamin.bara(a)skidata.com>
i2c: core: Fix atomic xfer check for non-preempt config
Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
Matthew Wilcox (Oracle) <willy(a)infradead.org>
mm/memory-failure: check the mapcount of the precise page
Thomas Lange <thomas(a)corelatus.se>
net: Implement missing SO_TIMESTAMPING_NEW cmsg support
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
Chen Ni <nichen(a)iscas.ac.cn>
asix: Add check for usbnet_get_endpoints
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
net/qla3xxx: switch from 'pci_' to 'dma_' API
Andrii Staikov <andrii.staikov(a)intel.com>
i40e: Restore VF MSI-X state during PCI reset
Mark Brown <broonie(a)kernel.org>
ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux
Mark Brown <broonie(a)kernel.org>
ASoC: meson: g12a-tohdmitx: Validate written enum values
Jerome Brunet <jbrunet(a)baylibre.com>
ASoC: meson: g12a: extract codec-to-codec utils
Ke Xiao <xiaoke(a)sangfor.com.cn>
i40e: fix use-after-free in i40e_aqc_add_filters()
Marc Dionne <marc.dionne(a)auristor.com>
net: Save and restore msg_namelen in sock_sendmsg
Adrian Cinal <adriancinal(a)gmail.com>
net: bcmgenet: Fix FCS generation for fragmented skbuffs
Stefan Wahren <wahrenst(a)gmx.net>
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
Vadim Fedorenko <vadfed(a)meta.com>
net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps
Marc Kleine-Budde <mkl(a)pengutronix.de>
can: raw: add support for SO_MARK
Marc Kleine-Budde <mkl(a)pengutronix.de>
can: raw: add support for SO_TXTIME/SCM_TXTIME
Jörn-Thorben Hinz <jthinz(a)mailbox.tu-berlin.de>
net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)
Hangyu Hua <hbh25y(a)gmail.com>
net: sched: em_text: fix possible memory leak in em_text_destroy()
Sudheer Mogilappagari <sudheer.mogilappagari(a)intel.com>
i40e: Fix filter input checks to prevent config with invalid values
Siddh Raman Pant <code(a)siddh.me>
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-sunxi/mc_smp.c | 4 +-
drivers/firewire/ohci.c | 51 +++++
drivers/i2c/i2c-core.h | 4 +-
drivers/mmc/core/block.c | 7 +-
drivers/mmc/core/host.c | 1 +
drivers/mmc/host/sdhci-sprd.c | 10 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 ++-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 3 +
drivers/net/ethernet/qlogic/qla3xxx.c | 198 ++++++++----------
drivers/net/usb/ax88172a.c | 4 +-
drivers/net/wireless/ath/ath10k/ce.c | 79 +++----
drivers/net/wireless/ath/ath10k/ce.h | 15 +-
drivers/net/wireless/ath/ath10k/core.c | 13 --
drivers/net/wireless/ath/ath10k/hw.h | 3 -
drivers/net/wireless/ath/ath10k/snoc.c | 19 +-
drivers/net/wireless/ath/ath10k/snoc.h | 1 +
drivers/pci/quirks.c | 28 ++-
include/net/dst_ops.h | 6 +-
include/net/netns/ipv6.h | 4 +-
mm/memory-failure.c | 6 +-
mm/memory.c | 4 +-
net/can/raw.c | 12 +-
net/core/dst.c | 12 +-
net/core/sock.c | 12 +-
net/ipv4/ip_output.c | 2 +-
net/ipv6/ip6_output.c | 2 +-
net/ipv6/route.c | 25 +--
net/netfilter/nf_tables_api.c | 27 +++
net/nfc/llcp_core.c | 39 +++-
net/sched/em_text.c | 4 +-
net/socket.c | 2 +
net/tls/tls_sw.c | 2 +
sound/soc/meson/Kconfig | 4 +
sound/soc/meson/Makefile | 2 +
sound/soc/meson/g12a-tohdmitx.c | 227 +++++----------------
sound/soc/meson/meson-codec-glue.c | 149 ++++++++++++++
sound/soc/meson/meson-codec-glue.h | 32 +++
41 files changed, 658 insertions(+), 412 deletions(-)
From: Tony Krowiak <akrowiak(a)linux.ibm.com>
When a queue is unbound from the vfio_ap device driver, it is reset to
ensure its crypto data is not leaked when it is bound to another device
driver. If the queue is unbound due to the fact that the adapter or domain
was removed from the host's AP configuration, then attempting to reset it
will fail with response code 01 (APID not valid) getting returned from the
reset command. Let's ensure that the queue is assigned to the host's
configuration before resetting it.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne(a)linux.ibm.com>
Reviewed-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 550c936c413d..983b3b16196c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -2215,10 +2215,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
q = dev_get_drvdata(&apdev->device);
get_update_locks_for_queue(q);
matrix_mdev = q->matrix_mdev;
+ apid = AP_QID_CARD(q->apqn);
+ apqi = AP_QID_QUEUE(q->apqn);
if (matrix_mdev) {
- apid = AP_QID_CARD(q->apqn);
- apqi = AP_QID_QUEUE(q->apqn);
/* If the queue is assigned to the guest's AP configuration */
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
@@ -2234,8 +2234,16 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
}
}
- vfio_ap_mdev_reset_queue(q);
- flush_work(&q->reset_work);
+ /*
+ * If the queue is not in the host's AP configuration, then resetting
+ * it will fail with response code 01, (APQN not valid); so, let's make
+ * sure it is in the host's config.
+ */
+ if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm) &&
+ test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm)) {
+ vfio_ap_mdev_reset_queue(q);
+ flush_work(&q->reset_work);
+ }
done:
if (matrix_mdev)
--
2.43.0
From: Tony Krowiak <akrowiak(a)linux.ibm.com>
When a queue is unbound from the vfio_ap device driver, if that queue is
assigned to a guest's AP configuration, its associated adapter is removed
because queues are defined to a guest via a matrix of adapters and
domains; so, it is not possible to remove a single queue.
If an adapter is removed from the guest's AP configuration, all associated
queues must be reset to prevent leaking crypto data should any of them be
assigned to a different guest or device driver. The one caveat is that if
the queue is being removed because the adapter or domain has been removed
from the host's AP configuration, then an attempt to reset the queue will
fail with response code 01, AP-queue number not valid; so resetting these
queues should be skipped.
Acked-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Fixes: 09d31ff78793 ("s390/vfio-ap: hot plug/unplug of AP devices when probed/removed")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 76 +++++++++++++++++--------------
1 file changed, 41 insertions(+), 35 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 44cd29aace8e..550c936c413d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -939,45 +939,45 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
AP_MKQID(apid, apqi));
}
+static void collect_queues_to_reset(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid,
+ struct list_head *qlist)
+{
+ struct vfio_ap_queue *q;
+ unsigned long apqi;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ if (q)
+ list_add_tail(&q->reset_qnode, qlist);
+ }
+}
+
+static void reset_queues_for_apid(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ struct list_head qlist;
+
+ INIT_LIST_HEAD(&qlist);
+ collect_queues_to_reset(matrix_mdev, apid, &qlist);
+ vfio_ap_mdev_reset_qlist(&qlist);
+}
+
static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev,
unsigned long *apm_reset)
{
- struct vfio_ap_queue *q, *tmpq;
struct list_head qlist;
- unsigned long apid, apqi;
- int apqn, ret = 0;
+ unsigned long apid;
if (bitmap_empty(apm_reset, AP_DEVICES))
return 0;
INIT_LIST_HEAD(&qlist);
- for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) {
- for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
- AP_DOMAINS) {
- /*
- * If the domain is not in the host's AP configuration,
- * then resetting it will fail with response code 01
- * (APQN not valid).
- */
- if (!test_bit_inv(apqi,
- (unsigned long *)matrix_dev->info.aqm))
- continue;
-
- apqn = AP_MKQID(apid, apqi);
- q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
-
- if (q)
- list_add_tail(&q->reset_qnode, &qlist);
- }
- }
+ for_each_set_bit_inv(apid, apm_reset, AP_DEVICES)
+ collect_queues_to_reset(matrix_mdev, apid, &qlist);
- ret = vfio_ap_mdev_reset_qlist(&qlist);
-
- list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode)
- list_del(&q->reset_qnode);
-
- return ret;
+ return vfio_ap_mdev_reset_qlist(&qlist);
}
/**
@@ -2217,24 +2217,30 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
matrix_mdev = q->matrix_mdev;
if (matrix_mdev) {
- vfio_ap_unlink_queue_fr_mdev(q);
-
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
-
- /*
- * If the queue is assigned to the guest's APCB, then remove
- * the adapter's APID from the APCB and hot it into the guest.
- */
+ /* If the queue is assigned to the guest's AP configuration */
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+ /*
+ * Since the queues are defined via a matrix of adapters
+ * and domains, it is not possible to hot unplug a
+ * single queue; so, let's unplug the adapter.
+ */
clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apid(matrix_mdev, apid);
+ goto done;
}
}
vfio_ap_mdev_reset_queue(q);
flush_work(&q->reset_work);
+
+done:
+ if (matrix_mdev)
+ vfio_ap_unlink_queue_fr_mdev(q);
+
dev_set_drvdata(&apdev->device, NULL);
kfree(q);
release_update_locks_for_mdev(matrix_mdev);
--
2.43.0
From: Tony Krowiak <akrowiak(a)linux.ibm.com>
While filtering the mdev matrix, it doesn't make sense - and will have
unexpected results - to filter an APID from the matrix if the APID or one
of the associated APQIs is not in the host's AP configuration. There are
two reasons for this:
1. An adapter or domain that is not in the host's AP configuration can be
assigned to the matrix; this is known as over-provisioning. Queue
devices, however, are only created for adapters and domains in the
host's AP configuration, so there will be no queues associated with an
over-provisioned adapter or domain to filter.
2. The adapter or domain may have been externally removed from the host's
configuration via an SE or HMC attached to a DPM enabled LPAR. In this
case, the vfio_ap device driver would have been notified by the AP bus
via the on_config_changed callback and the adapter or domain would
have already been filtered.
Since the matrix_mdev->shadow_apcb.apm and matrix_mdev->shadow_apcb.aqm are
copied from the mdev matrix sans the APIDs and APQIs not in the host's AP
configuration, let's loop over those bitmaps instead of those assigned to
the matrix.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Reviewed-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1f7a6c106786..e825e13847fe 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -695,8 +695,9 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
(unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+ for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS) {
/*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
--
2.43.0
From: Tony Krowiak <akrowiak(a)linux.ibm.com>
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or
domain is assigned to the mdev. The purpose of the function is to update
the guest's AP configuration by filtering the matrix of adapters and
domains assigned to the mdev. When an adapter or domain is assigned, only
the APQNs associated with the APID of the new adapter or APQI of the new
domain are inspected. If an APQN does not reference a queue device bound to
the vfio_ap device driver, then it's APID will be filtered from the mdev's
matrix when updating the guest's AP configuration.
Inspecting only the APID of the new adapter or APQI of the new domain will
result in passing AP queues through to a guest that are not bound to the
vfio_ap device driver under certain circumstances. Consider the following:
guest's AP configuration (all also assigned to the mdev's matrix):
14.0004
14.0005
14.0006
16.0004
16.0005
16.0006
unassign domain 4
unbind queue 16.0005
assign domain 4
When domain 4 is re-assigned, since only domain 4 will be inspected, the
APQNs that will be examined will be:
14.0004
16.0004
Since both of those APQNs reference queue devices that are bound to the
vfio_ap device driver, nothing will get filtered from the mdev's matrix
when updating the guest's AP configuration. Consequently, queue 16.0005
will get passed through despite not being bound to the driver. This
violates the linux device model requirement that a guest shall only be
given access to devices bound to the device driver facilitating their
pass-through.
To resolve this problem, every adapter and domain assigned to the mdev will
be inspected when filtering the mdev's matrix.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Acked-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 57 +++++++++----------------------
1 file changed, 17 insertions(+), 40 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index acb710d3d7bc..1f7a6c106786 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -674,8 +674,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
* Return: a boolean value indicating whether the KVM guest's APCB was changed
* by the filtering or not.
*/
-static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm,
- struct ap_matrix_mdev *matrix_mdev)
+static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
{
unsigned long apid, apqi, apqn;
DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES);
@@ -696,8 +695,8 @@ static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm,
bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
(unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, apm, AP_DEVICES) {
- for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
/*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
@@ -962,7 +961,6 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
- DECLARE_BITMAP(apm_delta, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -991,11 +989,8 @@ static ssize_t assign_adapter_store(struct device *dev,
}
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- memset(apm_delta, 0, sizeof(apm_delta));
- set_bit_inv(apid, apm_delta);
- if (vfio_ap_mdev_filter_matrix(apm_delta,
- matrix_mdev->matrix.aqm, matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count;
@@ -1171,7 +1166,6 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
- DECLARE_BITMAP(aqm_delta, AP_DOMAINS);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -1200,11 +1194,8 @@ static ssize_t assign_domain_store(struct device *dev,
}
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- memset(aqm_delta, 0, sizeof(aqm_delta));
- set_bit_inv(apqi, aqm_delta);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm, aqm_delta,
- matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count;
@@ -2109,9 +2100,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
if (matrix_mdev) {
vfio_ap_mdev_link_queue(matrix_mdev, q);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
}
dev_set_drvdata(&apdev->device, q);
@@ -2461,34 +2450,22 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *cur_cfg_info,
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
{
- bool do_hotplug = false;
- int filter_domains = 0;
- int filter_adapters = 0;
- DECLARE_BITMAP(apm, AP_DEVICES);
- DECLARE_BITMAP(aqm, AP_DOMAINS);
+ bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock);
mutex_lock(&matrix_dev->mdevs_lock);
- filter_adapters = bitmap_and(apm, matrix_mdev->matrix.apm,
- matrix_mdev->apm_add, AP_DEVICES);
- filter_domains = bitmap_and(aqm, matrix_mdev->matrix.aqm,
- matrix_mdev->aqm_add, AP_DOMAINS);
-
- if (filter_adapters && filter_domains)
- do_hotplug |= vfio_ap_mdev_filter_matrix(apm, aqm, matrix_mdev);
- else if (filter_adapters)
- do_hotplug |=
- vfio_ap_mdev_filter_matrix(apm,
- matrix_mdev->shadow_apcb.aqm,
- matrix_mdev);
- else
- do_hotplug |=
- vfio_ap_mdev_filter_matrix(matrix_mdev->shadow_apcb.apm,
- aqm, matrix_mdev);
+ filter_adapters = bitmap_intersects(matrix_mdev->matrix.apm,
+ matrix_mdev->apm_add, AP_DEVICES);
+ filter_domains = bitmap_intersects(matrix_mdev->matrix.aqm,
+ matrix_mdev->aqm_add, AP_DOMAINS);
+ filter_cdoms = bitmap_intersects(matrix_mdev->matrix.adm,
+ matrix_mdev->adm_add, AP_DOMAINS);
+
+ if (filter_adapters || filter_domains)
+ do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev);
- if (bitmap_intersects(matrix_mdev->matrix.adm, matrix_mdev->adm_add,
- AP_DOMAINS))
+ if (filter_cdoms)
do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
if (do_hotplug)
--
2.43.0
I'm announcing the release of the 6.6.12 kernel.
All users of the 6.6 kernel series must upgrade.
The updated 6.6.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/nfsd/nfsctl.c | 31 +++++++++++++++++--------------
fs/nfsd/nfsd.h | 7 -------
3 files changed, 18 insertions(+), 22 deletions(-)
Greg Kroah-Hartman (1):
Linux 6.6.12
Jeff Layton (1):
nfsd: drop the nfsd_put helper
I'm announcing the release of the 6.1.73 kernel.
All users of the 6.1 kernel series must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
fs/nfsd/nfsctl.c | 9 +-------
fs/nfsd/nfsd.h | 8 -------
fs/nfsd/nfssvc.c | 52 +++++++++++++++++++++++++++++++------------------
fs/smb/client/cifsfs.c | 2 -
include/net/dst_ops.h | 2 -
net/core/dst.c | 8 +------
net/ipv6/route.c | 13 ++++--------
8 files changed, 46 insertions(+), 50 deletions(-)
Greg Kroah-Hartman (3):
Revert "nfsd: call nfsd_last_thread() before final nfsd_put()"
Revert "nfsd: separate nfsd_last_thread() from nfsd_put()"
Linux 6.1.73
Jon Maxwell (1):
ipv6: remove max_size check inline with ipv4
Steve French (1):
cifs: fix flushing folio regression for 6.1 backport
I'm announcing the release of the 4.19.305 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/mach-sunxi/mc_smp.c | 4
drivers/firewire/ohci.c | 51 +++++
drivers/mmc/core/block.c | 7
drivers/mmc/core/host.c | 1
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4
drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 +++
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 3
drivers/net/ethernet/qlogic/qla3xxx.c | 198 +++++++++------------
drivers/net/usb/ax88172a.c | 4
drivers/pci/quirks.c | 28 ++
fs/fuse/dir.c | 10 -
include/net/dst_ops.h | 6
include/net/netns/ipv6.h | 4
mm/memory-failure.c | 6
mm/memory.c | 4
net/core/dst.c | 8
net/ipv6/route.c | 25 +-
net/netfilter/nf_tables_api.c | 27 ++
net/nfc/llcp_core.c | 39 +++-
net/sched/em_text.c | 4
net/socket.c | 2
24 files changed, 330 insertions(+), 156 deletions(-)
Adrian Cinal (1):
net: bcmgenet: Fix FCS generation for fragmented skbuffs
Andrii Staikov (1):
i40e: Restore VF MSI-X state during PCI reset
Bartosz Pawlowski (2):
PCI: Extract ATS disabling to a helper function
PCI: Disable ATS for specific Intel IPU E2000 devices
Chen Ni (1):
asix: Add check for usbnet_get_endpoints
Christophe JAILLET (1):
net/qla3xxx: switch from 'pci_' to 'dma_' API
Dinghao Liu (1):
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
Eric Dumazet (2):
net/dst: use a smaller percpu_counter batch for dst entries accounting
ipv6: make ip6_rt_gc_expire an atomic_t
Geert Uytterhoeven (1):
mmc: core: Cancel delayed work before releasing host
Greg Kroah-Hartman (1):
Linux 4.19.305
Hangyu Hua (1):
net: sched: em_text: fix possible memory leak in em_text_destroy()
Jiajun Xie (1):
mm: fix unmap_mapping_range high bits shift bug
Jon Maxwell (1):
ipv6: remove max_size check inline with ipv4
Jorge Ramirez-Ortiz (1):
mmc: rpmb: fixes pause retune on all RPMB partitions.
Ke Xiao (1):
i40e: fix use-after-free in i40e_aqc_add_filters()
Marc Dionne (1):
net: Save and restore msg_namelen in sock_sendmsg
Matthew Wilcox (Oracle) (1):
mm/memory-failure: check the mapcount of the precise page
Michael Chan (1):
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
Peter Oskolkov (1):
net: add a route cache full diagnostic message
Phil Sutter (1):
netfilter: nf_tables: Reject tables of unsupported family
Siddh Raman Pant (1):
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
Stefan Wahren (1):
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
Sudheer Mogilappagari (1):
i40e: Fix filter input checks to prevent config with invalid values
Takashi Sakamoto (1):
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
ruanmeisi (1):
fuse: nlookup missing decrement in fuse_direntplus_link
This is the start of the stable review cycle for the 4.19.305 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 15 Jan 2024 09:41:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.305-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.305-rc1
Jon Maxwell <jmaxwell37(a)gmail.com>
ipv6: remove max_size check inline with ipv4
Eric Dumazet <edumazet(a)google.com>
ipv6: make ip6_rt_gc_expire an atomic_t
Eric Dumazet <edumazet(a)google.com>
net/dst: use a smaller percpu_counter batch for dst entries accounting
Peter Oskolkov <posk(a)google.com>
net: add a route cache full diagnostic message
Bartosz Pawlowski <bartosz.pawlowski(a)intel.com>
PCI: Disable ATS for specific Intel IPU E2000 devices
Bartosz Pawlowski <bartosz.pawlowski(a)intel.com>
PCI: Extract ATS disabling to a helper function
Phil Sutter <phil(a)nwl.cc>
netfilter: nf_tables: Reject tables of unsupported family
ruanmeisi <ruan.meisi(a)zte.com.cn>
fuse: nlookup missing decrement in fuse_direntplus_link
Geert Uytterhoeven <geert+renesas(a)glider.be>
mmc: core: Cancel delayed work before releasing host
Jorge Ramirez-Ortiz <jorge(a)foundries.io>
mmc: rpmb: fixes pause retune on all RPMB partitions.
Jiajun Xie <jiajun.xie.sh(a)gmail.com>
mm: fix unmap_mapping_range high bits shift bug
Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
Matthew Wilcox (Oracle) <willy(a)infradead.org>
mm/memory-failure: check the mapcount of the precise page
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
Chen Ni <nichen(a)iscas.ac.cn>
asix: Add check for usbnet_get_endpoints
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
net/qla3xxx: switch from 'pci_' to 'dma_' API
Andrii Staikov <andrii.staikov(a)intel.com>
i40e: Restore VF MSI-X state during PCI reset
Ke Xiao <xiaoke(a)sangfor.com.cn>
i40e: fix use-after-free in i40e_aqc_add_filters()
Marc Dionne <marc.dionne(a)auristor.com>
net: Save and restore msg_namelen in sock_sendmsg
Adrian Cinal <adriancinal(a)gmail.com>
net: bcmgenet: Fix FCS generation for fragmented skbuffs
Stefan Wahren <wahrenst(a)gmx.net>
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
Hangyu Hua <hbh25y(a)gmail.com>
net: sched: em_text: fix possible memory leak in em_text_destroy()
Sudheer Mogilappagari <sudheer.mogilappagari(a)intel.com>
i40e: Fix filter input checks to prevent config with invalid values
Siddh Raman Pant <code(a)siddh.me>
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-sunxi/mc_smp.c | 4 +-
drivers/firewire/ohci.c | 51 ++++++
drivers/mmc/core/block.c | 7 +-
drivers/mmc/core/host.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 +++-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 3 +
drivers/net/ethernet/qlogic/qla3xxx.c | 198 +++++++++------------
drivers/net/usb/ax88172a.c | 4 +-
drivers/pci/quirks.c | 28 ++-
fs/fuse/dir.c | 10 +-
include/net/dst_ops.h | 6 +-
include/net/netns/ipv6.h | 4 +-
mm/memory-failure.c | 6 +-
mm/memory.c | 4 +-
net/core/dst.c | 8 +-
net/ipv6/route.c | 25 +--
net/netfilter/nf_tables_api.c | 27 +++
net/nfc/llcp_core.c | 39 +++-
net/sched/em_text.c | 4 +-
net/socket.c | 2 +
24 files changed, 331 insertions(+), 157 deletions(-)
On Mon, Jan 15, 2024 at 12:19:06PM +0500, Марк Коренберг wrote:
> Kernel 6.6.9-200.fc39.x86_64
>
> The following bash script demonstrates the problem (run under root):
>
> ```
> #!/bin/bash
>
> set -e -u -x
>
> # Some cleanups
> ip netns delete myspace || :
> ip link del qweqwe1 || :
>
> # The bug happens only with physical interfaces, not with, say, dummy one
> ip link property add dev enp0s20f0u2 altname myname
> ip netns add myspace
> ip link set enp0s20f0u2 netns myspace
>
> # add dummy interface + set the same altname as in background namespace.
> ip link add name qweqwe1 type dummy
> ip link property add dev qweqwe1 altname myname
>
> # Trigger the bug. The kernel will try to return ethernet interface
> back to root namespace, but it can not, because of conflicting
> altnames.
> ip netns delete myspace
>
> # now `ip link` will hang forever !!!!!
> ```
>
> I think, the problem is obvious. Althougn I don't know how to fix.
> Remove conflicting altnames for interfaces that returns from killed
> namespaces ?
As this can only be triggered by root, not much for us to do here,
perhaps discuss it on the netdev mailing list for all network developers
to work on?
> On kernel 6.3.8 (at least) was another bug, that allows dulicate
> altnames, and it was fixed mainline somewhere. I have another script
> to trigger the bug on these old kernels. I did not bisect.
If this is an issue on 6.1.y, that would be good to know so that we can
try to fix the issue there if bisection can find it. Care to share the
script so that I can test?
thanks,
greg k-h
From: Stefan Hajnoczi <stefanha(a)redhat.com>
[ Upstream commit b8e0792449928943c15d1af9f63816911d139267 ]
Commit 4e0400525691 ("virtio-blk: support polling I/O") triggers the
following gcc 13 W=1 warnings:
drivers/block/virtio_blk.c: In function ‘init_vq’:
drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=]
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~
drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534]
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~~~~~~~~~~~~
drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a false positive because the lower bound -2147483648 is
incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs <
65536.
The code mixes int, unsigned short, and unsigned int types in addition
to using "%d" for an unsigned value. Use unsigned short and "%u"
consistently to solve the compiler warning.
Cc: Suwan Kim <suwan.kim027(a)gmail.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/
Signed-off-by: Stefan Hajnoczi <stefanha(a)redhat.com>
Message-Id: <20231204140743.1487843-1-stefanha(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/block/virtio_blk.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index efa5535a8e1d8..3124837aa406f 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -609,12 +609,12 @@ static void virtblk_config_changed(struct virtio_device *vdev)
static int init_vq(struct virtio_blk *vblk)
{
int err;
- int i;
+ unsigned short i;
vq_callback_t **callbacks;
const char **names;
struct virtqueue **vqs;
unsigned short num_vqs;
- unsigned int num_poll_vqs;
+ unsigned short num_poll_vqs;
struct virtio_device *vdev = vblk->vdev;
struct irq_affinity desc = { 0, };
@@ -658,13 +658,13 @@ static int init_vq(struct virtio_blk *vblk)
for (i = 0; i < num_vqs - num_poll_vqs; i++) {
callbacks[i] = virtblk_done;
- snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
+ snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%u", i);
names[i] = vblk->vqs[i].name;
}
for (; i < num_vqs; i++) {
callbacks[i] = NULL;
- snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
+ snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%u", i);
names[i] = vblk->vqs[i].name;
}
--
2.43.0
On Sat, Jan 13, 2024 at 11:08:00AM -0600, Steve French wrote:
> I thought that it was "safer" since if it was misapplied to version where
> new folio rc behavior it wouldn't regress anything
There are only three versions where this patch can be applied: 6.7, 6.6
and 6.1. AIUI it's a backport from 6.7, it's already applied to 6.6,
and it misapplies to 6.1. So this kind of belt-and-braces approach is
unnecessary.
With 5.10LTS (e.g., 5.10.206), on a machine using an NVME device, the
following tracing commands will trigger a crash due to a NULL pointer
dereference:
KDIR=/sys/kernel/debug/tracing
echo 1 > $KDIR/tracing_on
echo 1 > $KDIR/events/nvme/enable
echo "Waiting for trace events..."
cat $KDIR/trace_pipe
The backtrace looks something like this:
Call Trace:
<IRQ>
? __die_body+0x6b/0xb0
? __die+0x9e/0xb0
? no_context+0x3eb/0x460
? ttwu_do_activate+0xf0/0x120
? __bad_area_nosemaphore+0x157/0x200
? select_idle_sibling+0x2f/0x410
? bad_area_nosemaphore+0x13/0x20
? do_user_addr_fault+0x2ab/0x360
? exc_page_fault+0x69/0x180
? asm_exc_page_fault+0x1e/0x30
? trace_event_raw_event_nvme_complete_rq+0xba/0x170
? trace_event_raw_event_nvme_complete_rq+0xa3/0x170
nvme_complete_rq+0x168/0x170
nvme_pci_complete_rq+0x16c/0x1f0
nvme_handle_cqe+0xde/0x190
nvme_irq+0x78/0x100
__handle_irq_event_percpu+0x77/0x1e0
handle_irq_event+0x54/0xb0
handle_edge_irq+0xdf/0x230
asm_call_irq_on_stack+0xf/0x20
</IRQ>
common_interrupt+0x9e/0x150
asm_common_interrupt+0x1e/0x40
It looks to me like these two upstream commits were backported to 5.10:
679c54f2de67 ("nvme: use command_id instead of req->tag in trace_nvme_complete_rq()")
e7006de6c238 ("nvme: code command_id with a genctr for use-after-free validation")
But they depend on this upstream commit to initialize the 'cmd' field in
some cases:
f4b9e6c90c57 ("nvme: use driver pdu command for passthrough")
Does it sound like I'm on the right track? The 5.15LTS and later seems to be okay.
For 5.15 attempting to use an ax88179_178a adapter "0b95:1790 ASIX
Electronics Corp. AX88179 Gigabit Ethernet"
started causing crashes.
This did not reproduce in the 6.6 kernel.
The crashes were narrowed down to the following two commits brought
into v5.15.146:
commit d63fafd6cc28 ("net: usb: ax88179_178a: avoid failed operations
when device is disconnected")
commit f860413aa00c ("net: usb: ax88179_178a: wol optimizations")
Those two use an uninitialized pointer `dev->driver_priv`.
In later kernels this pointer is initialized in commit 2bcbd3d8a7b4
("net: usb: ax88179_178a: move priv to driver_priv").
Picking in the two following commits fixed the issue for me on 5.15:
commit 9718f9ce5b86 ("net: usb: ax88179_178a: remove redundant init code")
commit 2bcbd3d8a7b4 ("net: usb: ax88179_178a: move priv to driver_priv")
The commit 9718f9ce5b86 ("net: usb: ax88179_178a: remove redundant
init code") was required for
the fix to apply cleanly.
This backports the fix to the kprobe_events interface allowing to create
kprobes on symbols defined in loadable modules again. The backport is
simpler than ones for later kernels, since the backport of the commit
introducing the bug already brought along much of the code needed to fix
it.
Andrii Nakryiko (1):
tracing/kprobes: Fix symbol counting logic by looking at modules as
well
Jiri Olsa (1):
kallsyms: Make module_kallsyms_on_each_symbol generally available
include/linux/module.h | 9 +++++++++
kernel/module.c | 2 --
kernel/trace/trace_kprobe.c | 2 ++
3 files changed, 11 insertions(+), 2 deletions(-)
--
2.40.1
From: Peter Oskolkov <posk(a)google.com>
commit 22c2ad616b74f3de2256b242572ab449d031d941 upstream.
In some testing scenarios, dst/route cache can fill up so quickly
that even an explicit GC call occasionally fails to clean it up. This leads
to sporadically failing calls to dst_alloc and "network unreachable" errors
to the user, which is confusing.
This patch adds a diagnostic message to make the cause of the failure
easier to determine.
Signed-off-by: Peter Oskolkov <posk(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
Cc: <stable(a)vger.kernel.org> # 4.19.x
---
net/core/dst.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/core/dst.c b/net/core/dst.c
index 81ccf20e2826..a263309df115 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -98,8 +98,12 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
struct dst_entry *dst;
if (ops->gc && dst_entries_get_fast(ops) > ops->gc_thresh) {
- if (ops->gc(ops))
+ if (ops->gc(ops)) {
+ printk_ratelimited(KERN_NOTICE "Route cache is full: "
+ "consider increasing sysctl "
+ "net.ipv[4|6].route.max_size.\n");
return NULL;
+ }
}
dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
--
2.34.1
From: Phil Sutter <phil(a)nwl.cc>
commit f1082dd31fe461d482d69da2a8eccfeb7bf07ac2 upstream.
An nftables family is merely a hollow container, its family just a
number and such not reliant on compile-time options other than nftables
support itself. Add an artificial check so attempts at using a family
the kernel can't support fail as early as possible. This helps user
space detect kernels which lack e.g. NFPROTO_INET.
Signed-off-by: Phil Sutter <phil(a)nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Cengiz Can <cengiz.can(a)canonical.com>
---
net/netfilter/nf_tables_api.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 78be121f38ac..915df77161e1 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1005,6 +1005,30 @@ static int nft_objname_hash_cmp(struct rhashtable_compare_arg *arg,
return strcmp(obj->key.name, k->name);
}
+static bool nft_supported_family(u8 family)
+{
+ return false
+#ifdef CONFIG_NF_TABLES_INET
+ || family == NFPROTO_INET
+#endif
+#ifdef CONFIG_NF_TABLES_IPV4
+ || family == NFPROTO_IPV4
+#endif
+#ifdef CONFIG_NF_TABLES_ARP
+ || family == NFPROTO_ARP
+#endif
+#ifdef CONFIG_NF_TABLES_NETDEV
+ || family == NFPROTO_NETDEV
+#endif
+#if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
+ || family == NFPROTO_BRIDGE
+#endif
+#ifdef CONFIG_NF_TABLES_IPV6
+ || family == NFPROTO_IPV6
+#endif
+ ;
+}
+
static int nf_tables_newtable(struct net *net, struct sock *nlsk,
struct sk_buff *skb, const struct nlmsghdr *nlh,
const struct nlattr * const nla[],
@@ -1020,6 +1044,9 @@ static int nf_tables_newtable(struct net *net, struct sock *nlsk,
struct nft_ctx ctx;
int err;
+ if (!nft_supported_family(family))
+ return -EOPNOTSUPP;
+
lockdep_assert_held(&nft_net->commit_mutex);
attr = nla[NFTA_TABLE_NAME];
table = nft_table_lookup(net, attr, family, genmask);
--
2.40.1
This patch series addresses the problem with A an B steppings of
Intel IPU E2000 which expects incorrect endianness in data field of ATS
invalidation request TLP by disabling ATS capability for vulnerable
devices.
Bartosz Pawlowski (2):
PCI: Extract ATS disabling to a helper function
PCI: Disable ATS for specific Intel IPU E2000 devices
drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
--
2.43.0
The quilt patch titled
Subject: selftests: mm: hugepage-vmemmap fails on 64K page size systems
has been removed from the -mm tree. Its filename was
selftests-mm-hugepage-vmemmap-fails-on-64k-page-size-systems.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Donet Tom <donettom(a)linux.vnet.ibm.com>
Subject: selftests: mm: hugepage-vmemmap fails on 64K page size systems
Date: Wed, 10 Jan 2024 14:03:35 +0530
The kernel sefltest mm/hugepage-vmemmap fails on architectures which has
different page size other than 4K. In hugepage-vmemmap page size used is
4k so the pfn calculation will go wrong on systems which has different
page size .The length of MAP_HUGETLB memory must be hugepage aligned but
in hugepage-vmemmap map length is 2M so this will not get aligned if the
system has differnet hugepage size.
Added psize() to get the page size and default_huge_page_size() to
get the default hugepage size at run time, hugepage-vmemmap test pass
on powerpc with 64K page size and x86 with 4K page size.
Result on powerpc without patch (page size 64K)
*# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 0
Head page flags (100000000) is invalid
check_page_flags: Invalid argument
*#
Result on powerpc with patch (page size 64K)
*# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 600
*#
Result on x86 with patch (page size 4K)
*# ./hugepage-vmemmap
Returned address is 0x7fc7c2c00000 whose pfn is 1dac00
*#
Link: https://lkml.kernel.org/r/3b3a3ae37ba21218481c482a872bbf7526031600.17048657…
Fixes: b147c89cd429 ("selftests: vm: add a hugetlb test case")
Signed-off-by: Donet Tom <donettom(a)linux.vnet.ibm.com>
Reported-by: Geetika Moolchandani <geetika(a)linux.ibm.com>
Tested-by: Geetika Moolchandani <geetika(a)linux.ibm.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugepage-vmemmap.c | 29 +++++++++-------
1 file changed, 18 insertions(+), 11 deletions(-)
--- a/tools/testing/selftests/mm/hugepage-vmemmap.c~selftests-mm-hugepage-vmemmap-fails-on-64k-page-size-systems
+++ a/tools/testing/selftests/mm/hugepage-vmemmap.c
@@ -10,10 +10,7 @@
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
-
-#define MAP_LENGTH (2UL * 1024 * 1024)
-
-#define PAGE_SIZE 4096
+#include "vm_util.h"
#define PAGE_COMPOUND_HEAD (1UL << 15)
#define PAGE_COMPOUND_TAIL (1UL << 16)
@@ -39,6 +36,9 @@
#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif
+static size_t pagesize;
+static size_t maplength;
+
static void write_bytes(char *addr, size_t length)
{
unsigned long i;
@@ -56,7 +56,7 @@ static unsigned long virt_to_pfn(void *a
if (fd < 0)
return -1UL;
- lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+ lseek(fd, (unsigned long)addr / pagesize * sizeof(pagemap), SEEK_SET);
read(fd, &pagemap, sizeof(pagemap));
close(fd);
@@ -86,7 +86,7 @@ static int check_page_flags(unsigned lon
* this also verifies kernel has correctly set the fake page_head to tail
* while hugetlb_free_vmemmap is enabled.
*/
- for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+ for (i = 1; i < maplength / pagesize; i++) {
read(fd, &pageflags, sizeof(pageflags));
if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
(pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
@@ -106,18 +106,25 @@ int main(int argc, char **argv)
void *addr;
unsigned long pfn;
- addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+ pagesize = psize();
+ maplength = default_huge_page_size();
+ if (!maplength) {
+ printf("Unable to determine huge page size\n");
+ exit(1);
+ }
+
+ addr = mmap(MAP_ADDR, maplength, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(1);
}
/* Trigger allocation of HugeTLB page. */
- write_bytes(addr, MAP_LENGTH);
+ write_bytes(addr, maplength);
pfn = virt_to_pfn(addr);
if (pfn == -1UL) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("virt_to_pfn");
exit(1);
}
@@ -125,13 +132,13 @@ int main(int argc, char **argv)
printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
if (check_page_flags(pfn) < 0) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("check_page_flags");
exit(1);
}
/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
- if (munmap(addr, MAP_LENGTH)) {
+ if (munmap(addr, maplength)) {
perror("munmap");
exit(1);
}
_
Patches currently in -mm which might be from donettom(a)linux.vnet.ibm.com are
The quilt patch titled
Subject: mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval
has been removed from the -mm tree. Its filename was
mm-memory_hotplug-fix-memmap_on_memory-sysfs-value-retrieval.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Subject: mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval
Date: Wed, 10 Jan 2024 15:01:27 +0100
set_memmap_mode() stores the kernel parameter memmap mode as an integer.
However, the get_memmap_mode() function utilizes param_get_bool() to fetch
the value as a boolean, leading to potential endianness issue. On
Big-endian architectures, the memmap_on_memory is consistently displayed
as 'N' regardless of its actual status.
To address this endianness problem, the solution involves obtaining the
mode as an integer. This adjustment ensures the proper display of the
memmap_on_memory parameter, presenting it as one of the following options:
Force, Y, or N.
Link: https://lkml.kernel.org/r/20240110140127.241451-1-sumanthk@linux.ibm.com
Fixes: 2d1f649c7c08 ("mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks")
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Suggested-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-memmap_on_memory-sysfs-value-retrieval
+++ a/mm/memory_hotplug.c
@@ -101,9 +101,11 @@ static int set_memmap_mode(const char *v
static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
{
- if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE)
- return sprintf(buffer, "force\n");
- return param_get_bool(buffer, kp);
+ int mode = *((int *)kp->arg);
+
+ if (mode == MEMMAP_ON_MEMORY_FORCE)
+ return sprintf(buffer, "force\n");
+ return sprintf(buffer, "%c\n", mode ? 'Y' : 'N');
}
static const struct kernel_param_ops memmap_mode_ops = {
_
Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are
mm-memory_hotplug-introduce-mem_prepare_online-mem_finish_offline-notifiers.patch
s390-mm-allocate-vmemmap-pages-from-self-contained-memory-range.patch
s390-sclp-remove-unhandled-memory-notifier-type.patch
s390-mm-implement-mem_prepare_online-mem_finish_offline-notifiers.patch
s390-enable-mhp_memmap_on_memory.patch
The quilt patch titled
Subject: efi: disable mirror feature during crashkernel
has been removed from the -mm tree. Its filename was
efi-disable-mirror-feature-during-crashkernel.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: efi: disable mirror feature during crashkernel
Date: Tue, 9 Jan 2024 12:15:36 +0800
If the system has no mirrored memory or uses crashkernel.high while
kernelcore=mirror is enabled on the command line then during crashkernel,
there will be limited mirrored memory and this usually leads to OOM.
To solve this problem, disable the mirror feature during crashkernel.
Link: https://lkml.kernel.org/r/20240109041536.3903042-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mm_init.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/mm_init.c~efi-disable-mirror-feature-during-crashkernel
+++ a/mm/mm_init.c
@@ -26,6 +26,7 @@
#include <linux/pgtable.h>
#include <linux/swap.h>
#include <linux/cma.h>
+#include <linux/crash_dump.h>
#include "internal.h"
#include "slab.h"
#include "shuffle.h"
@@ -381,6 +382,11 @@ static void __init find_zone_movable_pfn
goto out;
}
+ if (is_kdump_kernel()) {
+ pr_warn("The system is under kdump, ignore kernelcore=mirror.\n");
+ goto out;
+ }
+
for_each_mem_region(r) {
if (memblock_is_mirror(r))
continue;
_
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
The quilt patch titled
Subject: kexec: do syscore_shutdown() in kernel_kexec
has been removed from the -mm tree. Its filename was
kexec-do-syscore_shutdown-in-kernel_kexec.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: James Gowans <jgowans(a)amazon.com>
Subject: kexec: do syscore_shutdown() in kernel_kexec
Date: Wed, 13 Dec 2023 08:40:04 +0200
syscore_shutdown() runs driver and module callbacks to get the system into
a state where it can be correctly shut down. In commit 6f389a8f1dd2 ("PM
/ reboot: call syscore_shutdown() after disable_nonboot_cpus()")
syscore_shutdown() was removed from kernel_restart_prepare() and hence got
(incorrectly?) removed from the kexec flow. This was innocuous until
commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to
hook restart/shutdown") changed the way that KVM registered its shutdown
callbacks, switching from reboot notifiers to syscore_ops.shutdown. As
syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run
and virtualisation is left enabled on the boot CPU which results in triple
faults when switching to the new kernel on Intel x86 VT-x with VMXE
enabled.
Fix this by adding syscore_shutdown() to the kexec sequence. In terms of
where to add it, it is being added after migrating the kexec task to the
boot CPU, but before APs are shut down. It is not totally clear if this
is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call
syscore_shutdown() after disable_nonboot_cpus()") it is stated that
"syscore_ops operations should be carried with one CPU on-line and
interrupts disabled." APs are only offlined later in machine_shutdown(),
so this syscore_shutdown() is being run while APs are still online. This
seems to be the correct place as it matches where syscore_shutdown() is
run in the reboot and halt flows - they also run it before APs are shut
down. The assumption is that the commit message in commit 6f389a8f1dd2
("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is
no longer valid.
KVM has been discussed here as it is what broke loudly by not having
syscore_shutdown() in kexec, but this change impacts more than just KVM;
all drivers/modules which register a syscore_ops.shutdown callback will
now be invoked in the kexec flow. Looking at some of them like x86 MCE it
is probably more correct to also shut these down during kexec.
Maintainers of all drivers which use syscore_ops.shutdown are added on CC
for visibility. They are:
arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown,
arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown,
arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown,
drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown,
drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown,
drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown,
drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown,
kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown,
virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
This has been tested by doing a kexec on x86_64 and aarch64.
Link: https://lkml.kernel.org/r/20231213064004.2419447-1-jgowans@amazon.com
Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")
Signed-off-by: James Gowans <jgowans(a)amazon.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Chen-Yu Tsai <wens(a)csie.org>
Cc: Jernej Skrabec <jernej.skrabec(a)gmail.com>
Cc: Samuel Holland <samuel(a)sholland.org>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: Sebastian Reichel <sre(a)kernel.org>
Cc: Orson Zhai <orsonzhai(a)gmail.com>
Cc: Alexander Graf <graf(a)amazon.de>
Cc: Jan H. Schoenherr <jschoenh(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_core.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/kexec_core.c~kexec-do-syscore_shutdown-in-kernel_kexec
+++ a/kernel/kexec_core.c
@@ -1257,6 +1257,7 @@ int kernel_kexec(void)
kexec_in_progress = true;
kernel_restart_prepare("kexec reboot");
migrate_to_reboot_cpu();
+ syscore_shutdown();
/*
* migrate_to_reboot_cpu() disables CPU hotplug assuming that
_
Patches currently in -mm which might be from jgowans(a)amazon.com are
The quilt patch titled
Subject: fs/proc/task_mmu: move mmu notification mechanism inside mm lock
has been removed from the -mm tree. Its filename was
fs-proc-task_mmu-move-mmu-notification-mechanism-inside-mm-lock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: fs/proc/task_mmu: move mmu notification mechanism inside mm lock
Date: Tue, 9 Jan 2024 16:24:42 +0500
Move mmu notification mechanism inside mm lock to prevent race condition
in other components which depend on it. The notifier will invalidate
memory range. Depending upon the number of iterations, different memory
ranges would be invalidated.
The following warning would be removed by this patch:
WARNING: CPU: 0 PID: 5067 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 kvm_mmu_notifier_change_pte+0x860/0x960 arch/x86/kvm/../../../virt/kvm/kvm_main.c:734
There is no behavioural and performance change with this patch when
there is no component registered with the mmu notifier.
[akpm(a)linux-foundation.org: narrow the scope of `range', per Sean]
Link: https://lkml.kernel.org/r/20240109112445.590736-1-usama.anjum@collabora.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reported-by: syzbot+81227d2bd69e9dedb802(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000f6d051060c6785bc@google.com/
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Micha�� Miros��aw <mirq-linux(a)rere.qmqm.pl>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-move-mmu-notification-mechanism-inside-mm-lock
+++ a/fs/proc/task_mmu.c
@@ -2432,7 +2432,6 @@ static long pagemap_scan_flush_buffer(st
static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
{
- struct mmu_notifier_range range;
struct pagemap_scan_private p = {0};
unsigned long walk_start;
size_t n_ranges_out = 0;
@@ -2448,15 +2447,9 @@ static long do_pagemap_scan(struct mm_st
if (ret)
return ret;
- /* Protection change for the range is going to happen. */
- if (p.arg.flags & PM_SCAN_WP_MATCHING) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
- mm, p.arg.start, p.arg.end);
- mmu_notifier_invalidate_range_start(&range);
- }
-
for (walk_start = p.arg.start; walk_start < p.arg.end;
walk_start = p.arg.walk_end) {
+ struct mmu_notifier_range range;
long n_out;
if (fatal_signal_pending(current)) {
@@ -2467,8 +2460,20 @@ static long do_pagemap_scan(struct mm_st
ret = mmap_read_lock_killable(mm);
if (ret)
break;
+
+ /* Protection change for the range is going to happen. */
+ if (p.arg.flags & PM_SCAN_WP_MATCHING) {
+ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
+ mm, walk_start, p.arg.end);
+ mmu_notifier_invalidate_range_start(&range);
+ }
+
ret = walk_page_range(mm, walk_start, p.arg.end,
&pagemap_scan_ops, &p);
+
+ if (p.arg.flags & PM_SCAN_WP_MATCHING)
+ mmu_notifier_invalidate_range_end(&range);
+
mmap_read_unlock(mm);
n_out = pagemap_scan_flush_buffer(&p);
@@ -2494,9 +2499,6 @@ static long do_pagemap_scan(struct mm_st
if (pagemap_scan_writeback_args(&p.arg, uarg))
ret = -EFAULT;
- if (p.arg.flags & PM_SCAN_WP_MATCHING)
- mmu_notifier_invalidate_range_end(&range);
-
kfree(p.vec_buf);
return ret;
}
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
selftests-mm-mremap_test-fix-build-warning.patch
selftests-mm-hugepage-shm-conform-test-to-tap-format-output.patch
selftests-mm-hugepage-vmemmap-conform-test-to-tap-format-output.patch
selftests-mm-hugetlb-madvise-conform-test-to-tap-format-output.patch
selftests-mm-khugepaged-conform-test-to-tap-format-output.patch
selftests-mm-hugetlb-read-hwpoison-conform-test-to-tap-format-output.patch
selftests-mm-ksm_tests-conform-test-to-tap-format-output.patch
selftests-mm-config-add-missing-configs.patch
The quilt patch titled
Subject: scripts/decode_stacktrace.sh: optionally use LLVM utilities
has been removed from the -mm tree. Its filename was
scripts-decode_stacktracesh-optionally-use-llvm-utilities.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Carlos Llamas <cmllamas(a)google.com>
Subject: scripts/decode_stacktrace.sh: optionally use LLVM utilities
Date: Fri, 29 Sep 2023 03:48:17 +0000
GNU's addr2line can have problems parsing a vmlinux built with LLVM,
particularly when LTO was used. In order to decode the traces correctly
this patch adds the ability to switch to LLVM's utilities readelf and
addr2line. The same approach is followed by Will in [1].
Before:
$ scripts/decode_stacktrace.sh vmlinux < kernel.log
[17716.240635] Call trace:
[17716.240646] skb_cow_data (??:?)
[17716.240654] esp6_input (ld-temp.o:?)
[17716.240666] xfrm_input (ld-temp.o:?)
[17716.240674] xfrm6_rcv (??:?)
[...]
After:
$ LLVM=1 scripts/decode_stacktrace.sh vmlinux < kernel.log
[17716.240635] Call trace:
[17716.240646] skb_cow_data (include/linux/skbuff.h:2172 net/core/skbuff.c:4503)
[17716.240654] esp6_input (net/ipv6/esp6.c:977)
[17716.240666] xfrm_input (net/xfrm/xfrm_input.c:659)
[17716.240674] xfrm6_rcv (net/ipv6/xfrm6_input.c:172)
[...]
Note that one could set CROSS_COMPILE=llvm- instead to hack around this
issue. However, doing so can break the decodecode routine as it will
force the selection of other LLVM utilities down the line e.g. llvm-as.
[1] https://lore.kernel.org/all/20230914131225.13415-3-will@kernel.org/
Link: https://lkml.kernel.org/r/20230929034836.403735-1-cmllamas@google.com
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Elliot Berman <quic_eberman(a)quicinc.com>
Tested-by: Justin Stitt <justinstitt(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: John Stultz <jstultz(a)google.com>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/decode_stacktrace.sh | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
--- a/scripts/decode_stacktrace.sh~scripts-decode_stacktracesh-optionally-use-llvm-utilities
+++ a/scripts/decode_stacktrace.sh
@@ -16,6 +16,21 @@ elif type c++filt >/dev/null 2>&1 ; then
cppfilt_opts=-i
fi
+UTIL_SUFFIX=
+if [[ -z ${LLVM:-} ]]; then
+ UTIL_PREFIX=${CROSS_COMPILE:-}
+else
+ UTIL_PREFIX=llvm-
+ if [[ ${LLVM} == */ ]]; then
+ UTIL_PREFIX=${LLVM}${UTIL_PREFIX}
+ elif [[ ${LLVM} == -* ]]; then
+ UTIL_SUFFIX=${LLVM}
+ fi
+fi
+
+READELF=${UTIL_PREFIX}readelf${UTIL_SUFFIX}
+ADDR2LINE=${UTIL_PREFIX}addr2line${UTIL_SUFFIX}
+
if [[ $1 == "-r" ]] ; then
vmlinux=""
basepath="auto"
@@ -75,7 +90,7 @@ find_module() {
if [[ "$modpath" != "" ]] ; then
for fn in $(find "$modpath" -name "${module//_/[-_]}.ko*") ; do
- if readelf -WS "$fn" | grep -qwF .debug_line ; then
+ if ${READELF} -WS "$fn" | grep -qwF .debug_line ; then
echo $fn
return
fi
@@ -169,7 +184,7 @@ parse_symbol() {
if [[ $aarray_support == true && "${cache[$module,$address]+isset}" == "isset" ]]; then
local code=${cache[$module,$address]}
else
- local code=$(${CROSS_COMPILE}addr2line -i -e "$objfile" "$address" 2>/dev/null)
+ local code=$(${ADDR2LINE} -i -e "$objfile" "$address" 2>/dev/null)
if [[ $aarray_support == true ]]; then
cache[$module,$address]=$code
fi
_
Patches currently in -mm which might be from cmllamas(a)google.com are
I'm announcing the release of the 5.10.207 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/scsi/scsi.c | 2 +-
drivers/scsi/scsi_error.c | 34 +++++++++++++++++++---------------
drivers/scsi/scsi_lib.c | 38 +++++++++++++-------------------------
drivers/scsi/scsi_logging.c | 18 ++++++++----------
drivers/scsi/scsi_priv.h | 1 -
include/scsi/scsi_cmnd.h | 29 +++--------------------------
include/scsi/scsi_device.h | 16 +++++++---------
8 files changed, 52 insertions(+), 88 deletions(-)
Alexander Atanasov (1):
scsi: core: Always send batch on reset or error handling command
Greg Kroah-Hartman (7):
Revert "scsi: core: Always send batch on reset or error handling command"
Revert "scsi: core: Use a structure member to track the SCSI command submitter"
Revert "scsi: core: Use scsi_cmd_to_rq() instead of scsi_cmnd.request"
Revert "scsi: core: Make scsi_get_lba() return the LBA"
Revert "scsi: core: Introduce scsi_get_sector()"
Revert "scsi: core: Add scsi_prot_ref_tag() helper"
Linux 5.10.207
The patch titled
Subject: fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-hugetlbfs-inodec-mm-memory-failurec-fix-hugetlbfs-hwpoison-handling.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Subject: fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling
Date: Fri, 12 Jan 2024 10:08:40 -0800
has_extra_refcount() makes the assumption that the page cache adds a ref
count of 1 and subtracts this in the extra_pins case. Commit a08c7193e4f1
(mm/filemap: remove hugetlb special casing in filemap.c) modifies
__filemap_add_folio() by calling folio_ref_add(folio, nr); for all cases
(including hugtetlb) where nr is the number of pages in the folio. We
should adjust the number of references coming from the page cache by
subtracing the number of pages rather than 1.
In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong
flag as, in the hugetlb case, memory-failure code calls
folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is
the correct function to test for that flag.
After these fixes, the hugetlb hwpoison read selftest passes all cases.
Link: https://lkml.kernel.org/r/20240112180840.367006-1-sidhartha.kumar@oracle.com
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.c…
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jiaqi Yan <jiaqiyan(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org> [6.7+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 2 +-
mm/memory-failure.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c~fs-hugetlbfs-inodec-mm-memory-failurec-fix-hugetlbfs-hwpoison-handling
+++ a/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struc
} else {
folio_unlock(folio);
- if (!folio_test_has_hwpoisoned(folio))
+ if (!folio_test_hwpoison(folio))
want = nr;
else {
/*
--- a/mm/memory-failure.c~fs-hugetlbfs-inodec-mm-memory-failurec-fix-hugetlbfs-hwpoison-handling
+++ a/mm/memory-failure.c
@@ -982,7 +982,7 @@ static bool has_extra_refcount(struct pa
int count = page_count(p) - 1;
if (extra_pins)
- count -= 1;
+ count -= folio_nr_pages(page_folio(p));
if (count > 0) {
pr_err("%#lx: %s still referenced by %d users\n",
_
Patches currently in -mm which might be from sidhartha.kumar(a)oracle.com are
fs-hugetlbfs-inodec-mm-memory-failurec-fix-hugetlbfs-hwpoison-handling.patch
maple_tree-fix-comment-describing-mas_node_count_gfp.patch
In uart_throttle() and uart_unthrottle():
if (port->status & mask) {
port->ops->throttle/unthrottle(port);
mask &= ~port->status;
}
// Code segment utilizing the mask value to determine UART behavior
In uart_change_line_settings():
uart_port_lock_irq(uport);
// Code segment responsible for updating uport->status
uart_port_unlock_irq(uport);
In the uart_throttle() and uart_unthrottle() functions, there is a double
fetch issue due to concurrent execution with uart_change_line_settings().
In uart_throttle() and uart_unthrottle(), the check
if (port->status & mask) is made, followed by mask &= ~port->status,
where the relevant bits are cleared. However, port->status may be modified
in uart_change_line_settings(). The current implementation does not ensure
atomicity in the access and modification of port->status and mask. This
can result in mask being updated based on a modified port->status value,
leading to improper UART actions.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 5.17.
To resolve this double fetch, it is suggested to add a uart_port_lock pair
in uart_throttle() and uart_unthrottle(). With this patch applied, our
tool no longer reports the bug, with the kernel configuration allyesconfig
for x86_64. Due to the absence of the requisite hardware, we are unable to
conduct runtime testing of the patch. Therefore, our verification is
solely based on code logic analysis.
[1] https://sites.google.com/view/basscheck/
Fixes: 391f93f2ec9f ("serial: core: Rework hw-assisted flow control support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
drivers/tty/serial/serial_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 80085b151b34..9d905fdf2843 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -723,11 +723,13 @@ static void uart_throttle(struct tty_struct *tty)
mask |= UPSTAT_AUTOXOFF;
if (C_CRTSCTS(tty))
mask |= UPSTAT_AUTORTS;
-
+
+ uart_port_lock_irq(port);
if (port->status & mask) {
port->ops->throttle(port);
mask &= ~port->status;
}
+ uart_port_unlock_irq(port);
if (mask & UPSTAT_AUTORTS)
uart_clear_mctrl(port, TIOCM_RTS);
@@ -753,10 +755,12 @@ static void uart_unthrottle(struct tty_struct *tty)
if (C_CRTSCTS(tty))
mask |= UPSTAT_AUTORTS;
+ uart_port_lock_irq(port);
if (port->status & mask) {
port->ops->unthrottle(port);
mask &= ~port->status;
}
+ uart_port_unlock_irq(port);
if (mask & UPSTAT_AUTORTS)
uart_set_mctrl(port, TIOCM_RTS);
--
2.34.1
This is the start of the stable review cycle for the 5.10.207 release.
There are 7 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 13 Jan 2024 09:46:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.207-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.207-rc1
Alexander Atanasov <alexander.atanasov(a)virtuozzo.com>
scsi: core: Always send batch on reset or error handling command
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: core: Add scsi_prot_ref_tag() helper"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: core: Introduce scsi_get_sector()"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: core: Make scsi_get_lba() return the LBA"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: core: Use scsi_cmd_to_rq() instead of scsi_cmnd.request"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: core: Use a structure member to track the SCSI command submitter"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: core: Always send batch on reset or error handling command"
-------------
Diffstat:
Makefile | 4 ++--
drivers/scsi/scsi.c | 2 +-
drivers/scsi/scsi_error.c | 34 +++++++++++++++++++---------------
drivers/scsi/scsi_lib.c | 38 +++++++++++++-------------------------
drivers/scsi/scsi_logging.c | 18 ++++++++----------
drivers/scsi/scsi_priv.h | 1 -
include/scsi/scsi_cmnd.h | 29 +++--------------------------
include/scsi/scsi_device.h | 16 +++++++---------
8 files changed, 53 insertions(+), 89 deletions(-)
This patch series addresses the problem with A an B steppings of
Intel IPU E2000 which expects incorrect endianness in data field of ATS
invalidation request TLP by disabling ATS capability for vulnerable
devices.
Bartosz Pawlowski (2):
PCI: Extract ATS disabling to a helper function
PCI: Disable ATS for specific Intel IPU E2000 devices
drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
--
2.43.0
This patch series addresses the problem with A an B steppings of
Intel IPU E2000 which expects incorrect endianness in data field of ATS
invalidation request TLP by disabling ATS capability for vulnerable
devices.
Bartosz Pawlowski (2):
PCI: Extract ATS disabling to a helper function
PCI: Disable ATS for specific Intel IPU E2000 devices
drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
--
2.43.0
Syzkaller reports warning in ext4_set_page_dirty() in 5.10 stable
releases. The problem can be fixed by the following patches
which can be cleanly applied to the 5.10 branch.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Link: https://syzkaller.appspot.com/bug?extid=02f21431b65c214aa1d6
Matthew Wilcox (Oracle) (2):
mm/truncate: Inline invalidate_complete_page() into its one caller
mm/truncate: Replace page_mapped() call in invalidate_inode_page()
kernel/futex/core.c | 2 +-
mm/truncate.c | 34 +++++++---------------------------
2 files changed, 8 insertions(+), 28 deletions(-)
--
2.34.1
In uart_tiocmget():
result = uport->mctrl;
uart_port_lock_irq(uport);
result |= uport->ops->get_mctrl(uport);
uart_port_unlock_irq(uport);
...
return result;
In uart_update_mctrl():
uart_port_lock_irqsave(port, &flags);
...
port->mctrl = (old & ~clear) | set;
...
port->ops->set_mctrl(port, port->mctrl);
...
uart_port_unlock_irqrestore(port, flags);
An atomicity violation is identified due to the concurrent execution of
uart_tiocmget() and uart_update_mctrl(). After assigning
result = uport->mctrl, the mctrl value may change in uart_update_mctrl(),
leading to a mismatch between the value returned by
uport->ops->get_mctrl(uport) and the mctrl value previously read.
This can result in uart_tiocmget() returning an incorrect value.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 5.17.
To address this issue, it is suggested to move the line
result = uport->mctrl inside the uart_port_lock block to ensure atomicity
and prevent the mctrl value from being altered during the execution of
uart_tiocmget(). With this patch applied, our tool no longer reports the
bug, with the kernel configuration allyesconfig for x86_64. Due to the
absence of the requisite hardware, we are unable to conduct runtime
testing of the patch. Therefore, our verification is solely based on code
logic analysis.
[1] https://sites.google.com/view/basscheck/
Fixes: c5f4644e6c8b ("[PATCH] Serial: Adjust serial locking")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
v2:
* In this patch v2, we've updated the right Fixes.
Thank John Ogness for helpful advice.
---
drivers/tty/serial/serial_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 80085b151b34..a9e39416d877 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -1085,8 +1085,8 @@ static int uart_tiocmget(struct tty_struct *tty)
goto out;
if (!tty_io_error(tty)) {
- result = uport->mctrl;
uart_port_lock_irq(uport);
+ result = uport->mctrl;
result |= uport->ops->get_mctrl(uport);
uart_port_unlock_irq(uport);
}
--
2.34.1
In uart_tiocmget():
result = uport->mctrl;
uart_port_lock_irq(uport);
result |= uport->ops->get_mctrl(uport);
uart_port_unlock_irq(uport);
...
return result;
In uart_update_mctrl():
uart_port_lock_irqsave(port, &flags);
...
port->mctrl = (old & ~clear) | set;
...
uart_port_unlock_irqrestore(port, flags);
An atomicity violation is identified due to the concurrent execution of
uart_tiocmget() and uart_update_mctrl(). After assigning
result = uport->mctrl, the mctrl value may change in uart_update_mctrl(),
leading to a mismatch between the value returned by
uport->ops->get_mctrl(uport) and the mctrl value previously read.
This can result in uart_tiocmget() returning an incorrect value.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 5.17.
To address this issue, it is suggested to move the line
result = uport->mctrl inside the uart_port_lock block to ensure atomicity
and prevent the mctrl value from being altered during the execution of
uart_tiocmget(). With this patch applied, our tool no longer reports the
bug, with the kernel configuration allyesconfig for x86_64. Due to the
absence of the requisite hardware, we are unable to conduct runtime
testing of the patch. Therefore, our verification is solely based on code
logic analysis.
[1] https://sites.google.com/view/basscheck/
Fixes: 559c7ff4e324 ("serial: core: Use port lock wrappers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
drivers/tty/serial/serial_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 80085b151b34..a9e39416d877 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -1085,8 +1085,8 @@ static int uart_tiocmget(struct tty_struct *tty)
goto out;
if (!tty_io_error(tty)) {
- result = uport->mctrl;
uart_port_lock_irq(uport);
+ result = uport->mctrl;
result |= uport->ops->get_mctrl(uport);
uart_port_unlock_irq(uport);
}
--
2.34.1
In gameport_run_poll_handler():
...
if (gameport->poll_cnt)
mod_timer(&gameport->poll_timer, jiffies + ...));
In gameport_stop_polling():
spin_lock(&gameport->timer_lock);
if (!--gameport->poll_cnt)
del_timer(&gameport->poll_timer);
spin_unlock(&gameport->timer_lock);
An atomicity violation occurs due to the concurrent execution of
gameport_run_poll_handler() and gameport_stop_polling(). The current check
for gameport->poll_cnt in gameport_run_poll_handler() is not effective
because poll_cnt can be decremented to 0 and del_timer can be called in
gameport_stop_polling() before mod_timer is called in
gameport_run_poll_handler(). This situation leads to the risk of calling
mod_timer for a timer that has already been deleted in
gameport_stop_polling(). Since calling mod_timer on a deleted timer
reactivates it, this atomicity violation could result in the timer being
activated while the poll_cnt value is 0.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 5.17.
To resolve this issue, it is suggested to add a spinlock pair in
gameport_run_poll_handler() to ensure atomicity. With this patch applied,
our tool no longer reports the bug, with the kernel configuration
allyesconfig for x86_64. Due to the absence of the requisite hardware, we
are unable to conduct runtime testing of the patch. Therefore, our
verification is solely based on code logic analysis.
[1] https://sites.google.com/view/basscheck/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
drivers/input/gameport/gameport.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/input/gameport/gameport.c b/drivers/input/gameport/gameport.c
index 34f416a3ebcb..12af46d3c059 100644
--- a/drivers/input/gameport/gameport.c
+++ b/drivers/input/gameport/gameport.c
@@ -202,8 +202,13 @@ static void gameport_run_poll_handler(struct timer_list *t)
struct gameport *gameport = from_timer(gameport, t, poll_timer);
gameport->poll_handler(gameport);
+
+ spin_lock(&gameport->timer_lock);
+
if (gameport->poll_cnt)
mod_timer(&gameport->poll_timer, jiffies + msecs_to_jiffies(gameport->poll_interval));
+
+ spin_unlock(&gameport->timer_lock);
}
/*
--
2.34.1
has_extra_refcount() makes the assumption that a ref count of 1 means
the page is not referenced by other users. Commit a08c7193e4f1
(mm/filemap: remove hugetlb special casing in filemap.c) modifies
__filemap_add_folio() by calling folio_ref_add(folio, nr); for all cases
(including hugtetlb) where nr is the number of pages in the folio. We
should check if the page is not referenced by other users by checking
the page count against the number of pages rather than 1.
In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong
flag as, in the hugetlb case, memory-failure code calls
folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is the
correct function to test for that flag.
After these fixes, the hugetlb hwpoison read selftest passes all cases.
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.c…
Cc: <stable(a)vger.kernel.org> # 6.7+
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
fs/hugetlbfs/inode.c | 2 +-
mm/memory-failure.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 36132c9125f9..3a248e4f7e93 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
folio_unlock(folio);
- if (!folio_test_has_hwpoisoned(folio))
+ if (!folio_test_hwpoison(folio))
want = nr;
else {
/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..87f6bf7d8bc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -973,7 +973,7 @@ struct page_state {
static bool has_extra_refcount(struct page_state *ps, struct page *p,
bool extra_pins)
{
- int count = page_count(p) - 1;
+ int count = page_count(p) - folio_nr_pages(page_folio(p));
if (extra_pins)
count -= 1;
--
2.31.1
Unmapped folios accessed through file descriptors can be
underprotected. Those folios are added to the oldest generation based
on:
1. The fact that they are less costly to reclaim (no need to walk the
rmap and flush the TLB) and have less impact on performance (don't
cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
client use cases like Android, its apps parse configuration files
and store the data in heap (anon); for server use cases like MySQL,
it reads from InnoDB files and holds the cached data for tables in
buffer pools (anon).
However, the oldest generation can be very short lived, and if so, it
doesn't provide the PID controller with enough time to respond to a
surge of refaults. (Note that the PID controller uses weighted
refaults and those from evicted generations only take a half of the
whole weight.) In other words, for a short lived generation, the
moving average smooths out the spike quickly.
To fix the problem:
1. For folios that are already on LRU, if they can be beyond the
tracking range of tiers, i.e., five accesses through file
descriptors, move them to the second oldest generation to give them
more time to age. (Note that tiers are used by the PID controller
to statistically determine whether folios accessed multiple times
through file descriptors are worth protecting.)
2. When adding unmapped folios to LRU, adjust the placement of them so
that they are not too close to the tail. The effect of this is
similar to the above.
On Android, launching 55 apps sequentially:
Before After Change
workingset_refault_anon 25641024 25598972 0%
workingset_refault_file 115016834 106178438 -8%
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Tested-by: Kalesh Singh <kaleshsingh(a)google.com>
Cc: stable(a)vger.kernel.org
---
include/linux/mm_inline.h | 23 ++++++++++++++---------
mm/vmscan.c | 2 +-
mm/workingset.c | 6 +++---
3 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 9ae7def16cb2..f4fe593c1400 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -232,22 +232,27 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
if (folio_test_unevictable(folio) || !lrugen->enabled)
return false;
/*
- * There are three common cases for this page:
- * 1. If it's hot, e.g., freshly faulted in or previously hot and
- * migrated, add it to the youngest generation.
- * 2. If it's cold but can't be evicted immediately, i.e., an anon page
- * not in swapcache or a dirty page pending writeback, add it to the
- * second oldest generation.
- * 3. Everything else (clean, cold) is added to the oldest generation.
+ * There are four common cases for this page:
+ * 1. If it's hot, i.e., freshly faulted in, add it to the youngest
+ * generation, and it's protected over the rest below.
+ * 2. If it can't be evicted immediately, i.e., a dirty page pending
+ * writeback, add it to the second youngest generation.
+ * 3. If it should be evicted first, e.g., cold and clean from
+ * folio_rotate_reclaimable(), add it to the oldest generation.
+ * 4. Everything else falls between 2 & 3 above and is added to the
+ * second oldest generation if it's considered inactive, or the
+ * oldest generation otherwise. See lru_gen_is_active().
*/
if (folio_test_active(folio))
seq = lrugen->max_seq;
else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
(folio_test_reclaim(folio) &&
(folio_test_dirty(folio) || folio_test_writeback(folio))))
- seq = lrugen->min_seq[type] + 1;
- else
+ seq = lrugen->max_seq - 1;
+ else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq)
seq = lrugen->min_seq[type];
+ else
+ seq = lrugen->min_seq[type] + 1;
gen = lru_gen_from_seq(seq);
flags = (gen + 1UL) << LRU_GEN_PGOFF;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..e67631c60ac0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4260,7 +4260,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
}
/* protected */
- if (tier > tier_idx) {
+ if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
int hist = lru_hist_from_seq(lrugen->min_seq[type]);
gen = folio_inc_gen(lruvec, folio, false);
diff --git a/mm/workingset.c b/mm/workingset.c
index 7d3dacab8451..2a2a34234df9 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
* 1. For pages accessed through page tables, hotter pages pushed out
* hot pages which refaulted immediately.
* 2. For pages accessed multiple times through file descriptors,
- * numbers of accesses might have been out of the range.
+ * they would have been protected by sort_folio().
*/
- if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
- folio_set_workingset(folio);
+ if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) {
+ set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset));
mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
}
unlock:
--
2.43.0.472.g3155946c3a-goog
The patch titled
Subject: selftests/mm: mremap_test: fix build warning
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-mremap_test-fix-build-warning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: selftests/mm: mremap_test: fix build warning
Date: Thu, 11 Jan 2024 13:20:38 +0500
Fix following build warning:
warning: format `%d' expects argument of type `int', but argument 2 has type `long long unsigned int'
Link: https://lkml.kernel.org/r/20240111082039.3398848-1-usama.anjum@collabora.com
Fixes: a4cb3b243343 ("selftests: mm: add a test for remapping to area immediately after existing mapping")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/mremap_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/mremap_test.c~selftests-mm-mremap_test-fix-build-warning
+++ a/tools/testing/selftests/mm/mremap_test.c
@@ -457,7 +457,7 @@ static long long remap_region(struct con
char c = (char) rand();
if (((char *) dest_preamble_addr)[i] != c) {
- ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
+ ksft_print_msg("Preamble data after remap doesn't match at offset %llu\n",
i);
ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
((char *) dest_preamble_addr)[i] & 0xff);
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
fs-proc-task_mmu-move-mmu-notification-mechanism-inside-mm-lock.patch
selftests-mm-mremap_test-fix-build-warning.patch
The patch titled
Subject: fs/hugetlbfs/inode.c: mm/memory-failure.c
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fix-hugetlbfs-hwpoison-handling.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Subject: fs/hugetlbfs/inode.c: mm/memory-failure.c
fix hugetlbfs hwpoison handling
Date: Thu, 11 Jan 2024 11:16:55 -0800
has_extra_refcount() makes the assumption that a ref count of 1 means the
page is not referenced by other users. Commit a08c7193e4f1 (mm/filemap:
remove hugetlb special casing in filemap.c) modifies __filemap_add_folio()
by calling folio_ref_add(folio, nr); for all cases (including hugtetlb)
where nr is the number of pages in the folio. We should check if the page
is not referenced by other users by checking the page count against the
number of pages rather than 1.
In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong
flag as, in the hugetlb case, memory-failure code calls
folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is
the correct function to test for that flag.
After these fixes, the hugetlb hwpoison read selftest passes all cases.
Link: https://lkml.kernel.org/r/20240111191655.295530-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.c…
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jiaqi Yan <jiaqiyan(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [6.7+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 2 +-
mm/memory-failure.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c~fix-hugetlbfs-hwpoison-handling
+++ a/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struc
} else {
folio_unlock(folio);
- if (!folio_test_has_hwpoisoned(folio))
+ if (!folio_test_hwpoison(folio))
want = nr;
else {
/*
--- a/mm/memory-failure.c~fix-hugetlbfs-hwpoison-handling
+++ a/mm/memory-failure.c
@@ -979,7 +979,7 @@ struct page_state {
static bool has_extra_refcount(struct page_state *ps, struct page *p,
bool extra_pins)
{
- int count = page_count(p) - 1;
+ int count = page_count(p) - folio_nr_pages(page_folio(p));
if (extra_pins)
count -= 1;
_
Patches currently in -mm which might be from sidhartha.kumar(a)oracle.com are
fix-hugetlbfs-hwpoison-handling.patch
maple_tree-fix-comment-describing-mas_node_count_gfp.patch
Hello Ext4 Developers,
I hope this email finds you well. We are reaching out to report a
persistent issue that we have been facing on Windows Subsystem for
Linux (WSL)[1] with various kernel versions. We have encountered the
problem on kernel versions v5.15, v6.1, v6.6 stable kernels, and also
the current upstream kernel. While the issue takes longer to reproduce
on v5.15, it is consistently observable across these versions.
Issue Description:
Intermittent segfault with memory corruption. The time of segfault and
output can vary, though one of the notable failures manifests as a
segfault with the following error message:
EXT4-fs error (device sdc): ext4_find_dest_de:2092: inode #32168:
block 2334198: comm dpkg: bad entry in directory: rec_len is smaller
than minimal - offset=0, inode=0, rec_len=0, size=4084 fake=0
and
EXT4-fs warning (device sdc): dx_probe:890: inode #27771: comm dpkg:
dx entry: limit 0 != root limit 508
EXT4-fs warning (device sdc): dx_probe:964: inode #27771: comm dpkg:
Corrupt directory, running e2fsck is recommended
EXT4-fs error (device sdc): ext4_empty_dir:3098: inode #27753: block
133944722: comm dpkg: bad entry in directory: rec_len is smaller than
minimal - offset=0, inode=0, rec_len=0, size=4096 fake=0
or we see a segfault message where the source can change depending on
which command we're testing with (dpkg, apt, gcc..):
dpkg[135]: segfault at 0 ip 00007f9209eb6a19 sp 00007ffd8a6a0b08 error
4 in libc-2.31.so[7f9209d6e000+159000] likely on CPU 1 (core 0, socket
0)
Reproduction Steps:
The steps to reproduce the issue are seemingly straightforward: Run an
install or upgrade. The larger the change the better.
Installing Gimp has been a go to for testing, though we have
reproduced the failure with:
- apt upgrade
- apt install
- dpkg install
- gcc building source files
Observations:
The issue occurs consistently across multiple kernel versions.
Reproduction is faster on more recent kernels.
Longer intervals are required for v5.15.
When adding more debugging options that increases processing time,
segfault seems to be harder to hit.
When DX_DEBUG is enabled, during the installation process(dpkg
install), we observed instances where both rlen and de->name_len
values become 0.
We wanted to bring this to your attention and seek guidance on how we
could proceed with debugging and resolving this issue. Your insights
and assistance would be greatly appreciated.
Thank you for your time and consideration.
[1] What is Windows Subsystem for Linux:
https://learn.microsoft.com/en-us/windows/wsl/about
--
- Allen
Some DSA tagging protocols change the EtherType field in the MAC header
e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105). On TX these tagged
frames are ignored by the checksum offload engine and IP header checker of
some stmmac cores.
On RX, the stmmac driver wrongly assumes that checksums have been computed
for these tagged packets, and sets CHECKSUM_UNNECESSARY.
Add an additional check in the stmmac TX and RX hotpaths so that COE is
deactivated for packets with ethertypes that will not trigger the COE and
IP header checks.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
Cc: <stable(a)vger.kernel.org>
Reported-by: Richard Tresidder <rtresidd(a)electromag.com.au>
Link: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electro…
Reported-by: Romain Gantois <romain.gantois(a)bootlin.com>
Link: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin…
Reviewed-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Signed-off-by: Romain Gantois <romain.gantois(a)bootlin.com>
---
Hello everyone,
This is the fourth version of my proposed fix for the stmmac checksum
offloading issue that has recently been reported.
significant changes in v4:
- Removed "inline" from declaration of stmmac_has_ip_ethertype
significant changes in v3:
- Use __vlan_get_protocol to make sure that 8021Q-encapsulated
traffic is checked correctly.
significant changes in v2:
- Replaced the stmmac_link_up-based fix with an ethertype check in the TX
and RX hotpaths.
The Checksum Offloading Engine of some stmmac cores (e.g. DWMAC1000)
computes an incorrect checksum when presented with DSA-tagged packets. This
causes all TCP/UDP transfers to break when the stmmac device is connected
to the CPU port of a DSA switch.
I ran some tests using different tagging protocols with DSA_LOOP, and all
of the protocols that set a custom ethertype field in the MAC header caused
the checksum offload engine to ignore the tagged packets. On TX, this
caused packets to egress with incorrect checksums. On RX, these packets
were similarly ignored by the COE, yet the stmmac driver set
CHECKSUM_UNNECESSARY, wrongly assuming that their checksums had been
verified in hardware.
Version 2 of this patch series fixes this issue by checking ethertype
fields in both the TX and RX hotpaths of the stmmac driver. On TX, if a
non-IP ethertype is detected, the packet is checksummed in software. On
RX, the same condition causes stmmac to avoid setting CHECKSUM_UNNECESSARY.
To measure the performance degradation to the TX/RX hotpaths, I did some
iperf3 runs with 512-byte unfragmented UDP packets.
measured degradation on TX: -466 pps (-0.2%) on RX: -338 pps (-1.2%)
original performances on TX: 22kpps on RX: 27kpps
The performance hit on the RX path can be partly explained by the fact that
the stmmac driver doesn't set CHECKSUM_UNNECESSARY anymore.
The TX performance degradation observed in v2 seems to have improved.
It's not entirely clear to me why that is.
Best Regards,
Romain
Romain Gantois (1):
net: stmmac: Prevent DSA tags from breaking COE
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 23 ++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
--
2.43.0
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 29 ++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 37e64283f910..b30dba06dbd1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4371,6 +4371,25 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
+/**
+ * stmmac_has_ip_ethertype() - Check if packet has IP ethertype
+ * @skb: socket buffer to check
+ *
+ * Check if a packet has an ethertype that will trigger the IP header checks
+ * and IP/TCP checksum engine of the stmmac core.
+ *
+ * Return: true if the ethertype can trigger the checksum engine, false otherwise
+ */
+static bool stmmac_has_ip_ethertype(struct sk_buff *skb)
+{
+ int depth = 0;
+ __be16 proto;
+
+ proto = __vlan_get_protocol(skb, eth_header_parse_protocol(skb), &depth);
+
+ return (depth <= ETH_HLEN) && (proto == htons(ETH_P_IP) || proto == htons(ETH_P_IPV6));
+}
+
/**
* stmmac_xmit - Tx entry point of the driver
* @skb : the socket buffer
@@ -4435,9 +4454,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
/* DWMAC IPs can be synthesized to support tx coe only for a few tx
* queues. In that case, checksum offloading for those queues that don't
* support tx coe needs to fallback to software checksum calculation.
+ *
+ * Packets that won't trigger the COE e.g. most DSA-tagged packets will
+ * also have to be checksummed in software.
*/
if (csum_insertion &&
- priv->plat->tx_queues_cfg[queue].coe_unsupported) {
+ (priv->plat->tx_queues_cfg[queue].coe_unsupported ||
+ !stmmac_has_ip_ethertype(skb))) {
if (unlikely(skb_checksum_help(skb)))
goto dma_map_err;
csum_insertion = !csum_insertion;
@@ -4997,7 +5020,7 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
stmmac_rx_vlan(priv->dev, skb);
skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe))
+ if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb))
skb_checksum_none_assert(skb);
else
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -5513,7 +5536,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
stmmac_rx_vlan(priv->dev, skb);
skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe))
+ if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb))
skb_checksum_none_assert(skb);
else
skb->ip_summed = CHECKSUM_UNNECESSARY;
---
base-commit: ac631873c9e7a50d2a8de457cfc4b9f86666403e
change-id: 20240108-prevent_dsa_tags-7bb0def0db81
Best regards,
--
Romain Gantois <romain.gantois(a)bootlin.com>
When a queue is unbound from the vfio_ap device driver, it is reset to
ensure its crypto data is not leaked when it is bound to another device
driver. If the queue is unbound due to the fact that the adapter or domain
was removed from the host's AP configuration, then attempting to reset it
will fail with response code 01 (APID not valid) getting returned from the
reset command. Let's ensure that the queue is assigned to the host's
configuration before resetting it.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne(a)linux.ibm.com>
Reviewed-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e014108067dc..84decb0d5c97 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -2197,6 +2197,8 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
q = dev_get_drvdata(&apdev->device);
get_update_locks_for_queue(q);
matrix_mdev = q->matrix_mdev;
+ apid = AP_QID_CARD(q->apqn);
+ apqi = AP_QID_QUEUE(q->apqn);
if (matrix_mdev) {
/* If the queue is assigned to the guest's AP configuration */
@@ -2214,8 +2216,16 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
}
}
- vfio_ap_mdev_reset_queue(q);
- flush_work(&q->reset_work);
+ /*
+ * If the queue is not in the host's AP configuration, then resetting
+ * it will fail with response code 01, (APQN not valid); so, let's make
+ * sure it is in the host's config.
+ */
+ if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm) &&
+ test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm)) {
+ vfio_ap_mdev_reset_queue(q);
+ flush_work(&q->reset_work);
+ }
done:
if (matrix_mdev)
--
2.43.0
When a queue is unbound from the vfio_ap device driver, if that queue is
assigned to a guest's AP configuration, its associated adapter is removed
because queues are defined to a guest via a matrix of adapters and
domains; so, it is not possible to remove a single queue.
If an adapter is removed from the guest's AP configuration, all associated
queues must be reset to prevent leaking crypto data should any of them be
assigned to a different guest or device driver. The one caveat is that if
the queue is being removed because the adapter or domain has been removed
from the host's AP configuration, then an attempt to reset the queue will
fail with response code 01, AP-queue number not valid; so resetting these
queues should be skipped.
Acked-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Fixes: 09d31ff78793 ("s390/vfio-ap: hot plug/unplug of AP devices when probed/removed")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 78 ++++++++++++++++---------------
1 file changed, 41 insertions(+), 37 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 11f8f0bcc7ed..e014108067dc 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -935,45 +935,45 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
AP_MKQID(apid, apqi));
}
+static void collect_queues_to_reset(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid,
+ struct list_head *qlist)
+{
+ struct vfio_ap_queue *q;
+ unsigned long apqi;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ if (q)
+ list_add_tail(&q->reset_qnode, qlist);
+ }
+}
+
+static void reset_queues_for_apid(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ struct list_head qlist;
+
+ INIT_LIST_HEAD(&qlist);
+ collect_queues_to_reset(matrix_mdev, apid, &qlist);
+ vfio_ap_mdev_reset_qlist(&qlist);
+}
+
static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev,
unsigned long *apm_reset)
{
- struct vfio_ap_queue *q, *tmpq;
struct list_head qlist;
- unsigned long apid, apqi;
- int apqn, ret = 0;
+ unsigned long apid;
if (bitmap_empty(apm_reset, AP_DEVICES))
return 0;
INIT_LIST_HEAD(&qlist);
- for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) {
- for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
- AP_DOMAINS) {
- /*
- * If the domain is not in the host's AP configuration,
- * then resetting it will fail with response code 01
- * (APQN not valid).
- */
- if (!test_bit_inv(apqi,
- (unsigned long *)matrix_dev->info.aqm))
- continue;
-
- apqn = AP_MKQID(apid, apqi);
- q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
-
- if (q)
- list_add_tail(&q->reset_qnode, &qlist);
- }
- }
+ for_each_set_bit_inv(apid, apm_reset, AP_DEVICES)
+ collect_queues_to_reset(matrix_mdev, apid, &qlist);
- ret = vfio_ap_mdev_reset_qlist(&qlist);
-
- list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode)
- list_del(&q->reset_qnode);
-
- return ret;
+ return vfio_ap_mdev_reset_qlist(&qlist);
}
/**
@@ -2199,24 +2199,28 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
matrix_mdev = q->matrix_mdev;
if (matrix_mdev) {
- vfio_ap_unlink_queue_fr_mdev(q);
-
- apid = AP_QID_CARD(q->apqn);
- apqi = AP_QID_QUEUE(q->apqn);
-
- /*
- * If the queue is assigned to the guest's APCB, then remove
- * the adapter's APID from the APCB and hot it into the guest.
- */
+ /* If the queue is assigned to the guest's AP configuration */
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+ /*
+ * Since the queues are defined via a matrix of adapters
+ * and domains, it is not possible to hot unplug a
+ * single queue; so, let's unplug the adapter.
+ */
clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apid(matrix_mdev, apid);
+ goto done;
}
}
vfio_ap_mdev_reset_queue(q);
flush_work(&q->reset_work);
+
+done:
+ if (matrix_mdev)
+ vfio_ap_unlink_queue_fr_mdev(q);
+
dev_set_drvdata(&apdev->device, NULL);
kfree(q);
release_update_locks_for_mdev(matrix_mdev);
--
2.43.0
When filtering the adapters from the configuration profile for a guest to
create or update a guest's AP configuration, if the APID of an adapter and
the APQI of a domain identify a queue device that is not bound to the
vfio_ap device driver, the APID of the adapter will be filtered because an
individual APQN can not be filtered due to the fact the APQNs are assigned
to an AP configuration as a matrix of APIDs and APQIs. Consequently, a
guest will not have access to all of the queues associated with the
filtered adapter. If the queues are subsequently made available again to
the guest, they should re-appear in a reset state; so, let's make sure all
queues associated with an adapter unplugged from the guest are reset.
In order to identify the set of queues that need to be reset, let's allow a
vfio_ap_queue object to be simultaneously stored in both a hashtable and a
list: A hashtable used to store all of the queues assigned
to a matrix mdev; and/or, a list used to store a subset of the queues that
need to be reset. For example, when an adapter is hot unplugged from a
guest, all guest queues associated with that adapter must be reset. Since
that may be a subset of those assigned to the matrix mdev, they can be
stored in a list that can be passed to the vfio_ap_mdev_reset_queues
function.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Acked-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 171 +++++++++++++++++++-------
drivers/s390/crypto/vfio_ap_private.h | 11 +-
2 files changed, 133 insertions(+), 49 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 26bd4aca497a..11f8f0bcc7ed 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -32,7 +32,8 @@
#define AP_RESET_INTERVAL 20 /* Reset sleep interval (20ms) */
-static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
+static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev);
+static int vfio_ap_mdev_reset_qlist(struct list_head *qlist);
static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
static void vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
@@ -661,16 +662,23 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
* device driver.
*
* @matrix_mdev: the matrix mdev whose matrix is to be filtered.
+ * @apm_filtered: a 256-bit bitmap for storing the APIDs filtered from the
+ * guest's AP configuration that are still in the host's AP
+ * configuration.
*
* Note: If an APQN referencing a queue device that is not bound to the vfio_ap
* driver, its APID will be filtered from the guest's APCB. The matrix
* structure precludes filtering an individual APQN, so its APID will be
- * filtered.
+ * filtered. Consequently, all queues associated with the adapter that
+ * are in the host's AP configuration must be reset. If queues are
+ * subsequently made available again to the guest, they should re-appear
+ * in a reset state
*
* Return: a boolean value indicating whether the KVM guest's APCB was changed
* by the filtering or not.
*/
-static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
+static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apm_filtered)
{
unsigned long apid, apqi, apqn;
DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES);
@@ -680,6 +688,7 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
bitmap_copy(prev_shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
bitmap_copy(prev_shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+ bitmap_clear(apm_filtered, 0, AP_DEVICES);
/*
* Copy the adapters, domains and control domains to the shadow_apcb
@@ -705,8 +714,16 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
apqn = AP_MKQID(apid, apqi);
q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q || q->reset_status.response_code) {
- clear_bit_inv(apid,
- matrix_mdev->shadow_apcb.apm);
+ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+ /*
+ * If the adapter was previously plugged into
+ * the guest, let's let the caller know that
+ * the APID was filtered.
+ */
+ if (test_bit_inv(apid, prev_shadow_apm))
+ set_bit_inv(apid, apm_filtered);
+
break;
}
}
@@ -808,7 +825,7 @@ static void vfio_ap_mdev_remove(struct mdev_device *mdev)
mutex_lock(&matrix_dev->guests_lock);
mutex_lock(&matrix_dev->mdevs_lock);
- vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
+ vfio_ap_mdev_reset_queues(matrix_mdev);
vfio_ap_mdev_unlink_fr_queues(matrix_mdev);
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->mdevs_lock);
@@ -918,6 +935,47 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
AP_MKQID(apid, apqi));
}
+static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apm_reset)
+{
+ struct vfio_ap_queue *q, *tmpq;
+ struct list_head qlist;
+ unsigned long apid, apqi;
+ int apqn, ret = 0;
+
+ if (bitmap_empty(apm_reset, AP_DEVICES))
+ return 0;
+
+ INIT_LIST_HEAD(&qlist);
+
+ for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS) {
+ /*
+ * If the domain is not in the host's AP configuration,
+ * then resetting it will fail with response code 01
+ * (APQN not valid).
+ */
+ if (!test_bit_inv(apqi,
+ (unsigned long *)matrix_dev->info.aqm))
+ continue;
+
+ apqn = AP_MKQID(apid, apqi);
+ q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
+
+ if (q)
+ list_add_tail(&q->reset_qnode, &qlist);
+ }
+ }
+
+ ret = vfio_ap_mdev_reset_qlist(&qlist);
+
+ list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode)
+ list_del(&q->reset_qnode);
+
+ return ret;
+}
+
/**
* assign_adapter_store - parses the APID from @buf and sets the
* corresponding bit in the mediated matrix device's APM
@@ -958,6 +1016,7 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -987,8 +1046,10 @@ static ssize_t assign_adapter_store(struct device *dev,
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) {
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+ }
ret = count;
done:
@@ -1019,11 +1080,12 @@ static struct vfio_ap_queue
* adapter was assigned.
* @matrix_mdev: the matrix mediated device to which the adapter was assigned.
* @apid: the APID of the unassigned adapter.
- * @qtable: table for storing queues associated with unassigned adapter.
+ * @qlist: list for storing queues associated with unassigned adapter that
+ * need to be reset.
*/
static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
unsigned long apid,
- struct ap_queue_table *qtable)
+ struct list_head *qlist)
{
unsigned long apqi;
struct vfio_ap_queue *q;
@@ -1031,11 +1093,10 @@ static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
q = vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi);
- if (q && qtable) {
+ if (q && qlist) {
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
- hash_add(qtable->queues, &q->mdev_qnode,
- q->apqn);
+ list_add_tail(&q->reset_qnode, qlist);
}
}
}
@@ -1043,26 +1104,23 @@ static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
unsigned long apid)
{
- int loop_cursor;
- struct vfio_ap_queue *q;
- struct ap_queue_table *qtable = kzalloc(sizeof(*qtable), GFP_KERNEL);
+ struct vfio_ap_queue *q, *tmpq;
+ struct list_head qlist;
- hash_init(qtable->queues);
- vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, qtable);
+ INIT_LIST_HEAD(&qlist);
+ vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, &qlist);
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
}
- vfio_ap_mdev_reset_queues(qtable);
+ vfio_ap_mdev_reset_qlist(&qlist);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
+ list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) {
vfio_ap_unlink_mdev_fr_queue(q);
- hash_del(&q->mdev_qnode);
+ list_del(&q->reset_qnode);
}
-
- kfree(qtable);
}
/**
@@ -1163,6 +1221,7 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -1192,8 +1251,10 @@ static ssize_t assign_domain_store(struct device *dev,
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) {
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+ }
ret = count;
done:
@@ -1206,7 +1267,7 @@ static DEVICE_ATTR_WO(assign_domain);
static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
unsigned long apqi,
- struct ap_queue_table *qtable)
+ struct list_head *qlist)
{
unsigned long apid;
struct vfio_ap_queue *q;
@@ -1214,11 +1275,10 @@ static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
q = vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi);
- if (q && qtable) {
+ if (q && qlist) {
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
- hash_add(qtable->queues, &q->mdev_qnode,
- q->apqn);
+ list_add_tail(&q->reset_qnode, qlist);
}
}
}
@@ -1226,26 +1286,23 @@ static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
unsigned long apqi)
{
- int loop_cursor;
- struct vfio_ap_queue *q;
- struct ap_queue_table *qtable = kzalloc(sizeof(*qtable), GFP_KERNEL);
+ struct vfio_ap_queue *q, *tmpq;
+ struct list_head qlist;
- hash_init(qtable->queues);
- vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, qtable);
+ INIT_LIST_HEAD(&qlist);
+ vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, &qlist);
if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
}
- vfio_ap_mdev_reset_queues(qtable);
+ vfio_ap_mdev_reset_qlist(&qlist);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
+ list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) {
vfio_ap_unlink_mdev_fr_queue(q);
- hash_del(&q->mdev_qnode);
+ list_del(&q->reset_qnode);
}
-
- kfree(qtable);
}
/**
@@ -1600,7 +1657,7 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
get_update_locks_for_kvm(kvm);
kvm_arch_crypto_clear_masks(kvm);
- vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
+ vfio_ap_mdev_reset_queues(matrix_mdev);
kvm_put_kvm(kvm);
matrix_mdev->kvm = NULL;
@@ -1736,15 +1793,33 @@ static void vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
}
}
-static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable)
+static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
{
int ret = 0, loop_cursor;
struct vfio_ap_queue *q;
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode)
+ hash_for_each(matrix_mdev->qtable.queues, loop_cursor, q, mdev_qnode)
vfio_ap_mdev_reset_queue(q);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
+ hash_for_each(matrix_mdev->qtable.queues, loop_cursor, q, mdev_qnode) {
+ flush_work(&q->reset_work);
+
+ if (q->reset_status.response_code)
+ ret = -EIO;
+ }
+
+ return ret;
+}
+
+static int vfio_ap_mdev_reset_qlist(struct list_head *qlist)
+{
+ int ret = 0;
+ struct vfio_ap_queue *q;
+
+ list_for_each_entry(q, qlist, reset_qnode)
+ vfio_ap_mdev_reset_queue(q);
+
+ list_for_each_entry(q, qlist, reset_qnode) {
flush_work(&q->reset_work);
if (q->reset_status.response_code)
@@ -1930,7 +2005,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct vfio_device *vdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
- ret = vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
+ ret = vfio_ap_mdev_reset_queues(matrix_mdev);
break;
case VFIO_DEVICE_GET_IRQ_INFO:
ret = vfio_ap_get_irq_info(arg);
@@ -2062,6 +2137,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
int ret;
struct vfio_ap_queue *q;
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev;
ret = sysfs_create_group(&apdev->device.kobj, &vfio_queue_attr_group);
@@ -2094,15 +2170,17 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
!bitmap_empty(matrix_mdev->aqm_add, AP_DOMAINS))
goto done;
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) {
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+ }
}
done:
dev_set_drvdata(&apdev->device, q);
release_update_locks_for_mdev(matrix_mdev);
- return 0;
+ return ret;
err_remove_group:
sysfs_remove_group(&apdev->device.kobj, &vfio_queue_attr_group);
@@ -2446,6 +2524,7 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *cur_cfg_info,
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
{
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock);
@@ -2459,7 +2538,7 @@ static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
matrix_mdev->adm_add, AP_DOMAINS);
if (filter_adapters || filter_domains)
- do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev);
+ do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered);
if (filter_cdoms)
do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
@@ -2467,6 +2546,8 @@ static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
if (do_hotplug)
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered);
+
mutex_unlock(&matrix_dev->mdevs_lock);
mutex_unlock(&matrix_mdev->kvm->lock);
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 88aff8b81f2f..20eac8b0f0b9 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -83,10 +83,10 @@ struct ap_matrix {
};
/**
- * struct ap_queue_table - a table of queue objects.
- *
- * @queues: a hashtable of queues (struct vfio_ap_queue).
- */
+ * struct ap_queue_table - a table of queue objects.
+ *
+ * @queues: a hashtable of queues (struct vfio_ap_queue).
+ */
struct ap_queue_table {
DECLARE_HASHTABLE(queues, 8);
};
@@ -133,6 +133,8 @@ struct ap_matrix_mdev {
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
* @mdev_qnode: allows the vfio_ap_queue struct to be added to a hashtable
+ * @reset_qnode: allows the vfio_ap_queue struct to be added to a list of queues
+ * that need to be reset
* @reset_status: the status from the last reset of the queue
* @reset_work: work to wait for queue reset to complete
*/
@@ -143,6 +145,7 @@ struct vfio_ap_queue {
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
struct hlist_node mdev_qnode;
+ struct list_head reset_qnode;
struct ap_queue_status reset_status;
struct work_struct reset_work;
};
--
2.43.0
While filtering the mdev matrix, it doesn't make sense - and will have
unexpected results - to filter an APID from the matrix if the APID or one
of the associated APQIs is not in the host's AP configuration. There are
two reasons for this:
1. An adapter or domain that is not in the host's AP configuration can be
assigned to the matrix; this is known as over-provisioning. Queue
devices, however, are only created for adapters and domains in the
host's AP configuration, so there will be no queues associated with an
over-provisioned adapter or domain to filter.
2. The adapter or domain may have been externally removed from the host's
configuration via an SE or HMC attached to a DPM enabled LPAR. In this
case, the vfio_ap device driver would have been notified by the AP bus
via the on_config_changed callback and the adapter or domain would
have already been filtered.
Since the matrix_mdev->shadow_apcb.apm and matrix_mdev->shadow_apcb.aqm are
copied from the mdev matrix sans the APIDs and APQIs not in the host's AP
configuration, let's loop over those bitmaps instead of those assigned to
the matrix.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Reviewed-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9382b32e5bd1..47232e19a50e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -691,8 +691,9 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
(unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+ for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS) {
/*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
--
2.43.0
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or
domain is assigned to the mdev. The purpose of the function is to update
the guest's AP configuration by filtering the matrix of adapters and
domains assigned to the mdev. When an adapter or domain is assigned, only
the APQNs associated with the APID of the new adapter or APQI of the new
domain are inspected. If an APQN does not reference a queue device bound to
the vfio_ap device driver, then it's APID will be filtered from the mdev's
matrix when updating the guest's AP configuration.
Inspecting only the APID of the new adapter or APQI of the new domain will
result in passing AP queues through to a guest that are not bound to the
vfio_ap device driver under certain circumstances. Consider the following:
guest's AP configuration (all also assigned to the mdev's matrix):
14.0004
14.0005
14.0006
16.0004
16.0005
16.0006
unassign domain 4
unbind queue 16.0005
assign domain 4
When domain 4 is re-assigned, since only domain 4 will be inspected, the
APQNs that will be examined will be:
14.0004
16.0004
Since both of those APQNs reference queue devices that are bound to the
vfio_ap device driver, nothing will get filtered from the mdev's matrix
when updating the guest's AP configuration. Consequently, queue 16.0005
will get passed through despite not being bound to the driver. This
violates the linux device model requirement that a guest shall only be
given access to devices bound to the device driver facilitating their
pass-through.
To resolve this problem, every adapter and domain assigned to the mdev will
be inspected when filtering the mdev's matrix.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Acked-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 57 +++++++++----------------------
1 file changed, 17 insertions(+), 40 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 4db538a55192..9382b32e5bd1 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -670,8 +670,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
* Return: a boolean value indicating whether the KVM guest's APCB was changed
* by the filtering or not.
*/
-static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm,
- struct ap_matrix_mdev *matrix_mdev)
+static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
{
unsigned long apid, apqi, apqn;
DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES);
@@ -692,8 +691,8 @@ static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm,
bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
(unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, apm, AP_DEVICES) {
- for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
/*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
@@ -958,7 +957,6 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
- DECLARE_BITMAP(apm_delta, AP_DEVICES);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -987,11 +985,8 @@ static ssize_t assign_adapter_store(struct device *dev,
}
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- memset(apm_delta, 0, sizeof(apm_delta));
- set_bit_inv(apid, apm_delta);
- if (vfio_ap_mdev_filter_matrix(apm_delta,
- matrix_mdev->matrix.aqm, matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count;
@@ -1167,7 +1162,6 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
- DECLARE_BITMAP(aqm_delta, AP_DOMAINS);
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex);
@@ -1196,11 +1190,8 @@ static ssize_t assign_domain_store(struct device *dev,
}
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- memset(aqm_delta, 0, sizeof(aqm_delta));
- set_bit_inv(apqi, aqm_delta);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm, aqm_delta,
- matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count;
@@ -2091,9 +2082,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
if (matrix_mdev) {
vfio_ap_mdev_link_queue(matrix_mdev, q);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev))
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
}
dev_set_drvdata(&apdev->device, q);
@@ -2443,34 +2432,22 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *cur_cfg_info,
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev)
{
- bool do_hotplug = false;
- int filter_domains = 0;
- int filter_adapters = 0;
- DECLARE_BITMAP(apm, AP_DEVICES);
- DECLARE_BITMAP(aqm, AP_DOMAINS);
+ bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock);
mutex_lock(&matrix_dev->mdevs_lock);
- filter_adapters = bitmap_and(apm, matrix_mdev->matrix.apm,
- matrix_mdev->apm_add, AP_DEVICES);
- filter_domains = bitmap_and(aqm, matrix_mdev->matrix.aqm,
- matrix_mdev->aqm_add, AP_DOMAINS);
-
- if (filter_adapters && filter_domains)
- do_hotplug |= vfio_ap_mdev_filter_matrix(apm, aqm, matrix_mdev);
- else if (filter_adapters)
- do_hotplug |=
- vfio_ap_mdev_filter_matrix(apm,
- matrix_mdev->shadow_apcb.aqm,
- matrix_mdev);
- else
- do_hotplug |=
- vfio_ap_mdev_filter_matrix(matrix_mdev->shadow_apcb.apm,
- aqm, matrix_mdev);
+ filter_adapters = bitmap_intersects(matrix_mdev->matrix.apm,
+ matrix_mdev->apm_add, AP_DEVICES);
+ filter_domains = bitmap_intersects(matrix_mdev->matrix.aqm,
+ matrix_mdev->aqm_add, AP_DOMAINS);
+ filter_cdoms = bitmap_intersects(matrix_mdev->matrix.adm,
+ matrix_mdev->adm_add, AP_DOMAINS);
+
+ if (filter_adapters || filter_domains)
+ do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev);
- if (bitmap_intersects(matrix_mdev->matrix.adm, matrix_mdev->adm_add,
- AP_DOMAINS))
+ if (filter_cdoms)
do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
if (do_hotplug)
--
2.43.0
When a queue is unbound from the vfio_ap device driver, it is reset to
ensure its crypto data is not leaked when it is bound to another device
driver. If the queue is unbound due to the fact that the adapter or domain
was removed from the host's AP configuration, then attempting to reset it
will fail with response code 01 (APID not valid) getting returned from the
reset command. Let's ensure that the queue is assigned to the host's
configuration before resetting it.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification")
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e014108067dc..84decb0d5c97 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -2197,6 +2197,8 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
q = dev_get_drvdata(&apdev->device);
get_update_locks_for_queue(q);
matrix_mdev = q->matrix_mdev;
+ apid = AP_QID_CARD(q->apqn);
+ apqi = AP_QID_QUEUE(q->apqn);
if (matrix_mdev) {
/* If the queue is assigned to the guest's AP configuration */
@@ -2214,8 +2216,16 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
}
}
- vfio_ap_mdev_reset_queue(q);
- flush_work(&q->reset_work);
+ /*
+ * If the queue is not in the host's AP configuration, then resetting
+ * it will fail with response code 01, (APQN not valid); so, let's make
+ * sure it is in the host's config.
+ */
+ if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm) &&
+ test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm)) {
+ vfio_ap_mdev_reset_queue(q);
+ flush_work(&q->reset_work);
+ }
done:
if (matrix_mdev)
--
2.43.0
IPQ6018 has 32 tcsr_mutex hwlock registers with stride 0x1000.
The compatible string qcom,ipq6018-tcsr-mutex is mapped to
of_msm8226_tcsr_mutex which has 32 locks configured with stride of 0x80
and doesn't match the HW present in IPQ6018.
Remove IPQ6018 specific compatible string so that it fallsback to
of_tcsr_mutex data which maps to the correct configuration for IPQ6018.
Cc: stable(a)vger.kernel.org
Fixes: 5d4753f741d8 ("hwspinlock: qcom: add support for MMIO on older SoCs")
Signed-off-by: Vignesh Viswanathan <quic_viswanat(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
---
This patch was already posted [2] and applied [3], but missing in the
linux-next TIP. Resending with r-b tags so that it can be picked up
again.
[2] https://lore.kernel.org/all/20230905095535.1263113-3-quic_viswanat@quicinc.…
[3] https://lore.kernel.org/all/169522934567.2501390.1112201061322984444.b4-ty@…
Changes in v2:
- Updated commit message
- Added Fixes and stable tags
drivers/hwspinlock/qcom_hwspinlock.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/hwspinlock/qcom_hwspinlock.c b/drivers/hwspinlock/qcom_hwspinlock.c
index a0fd67fd2934..814dfe8697bf 100644
--- a/drivers/hwspinlock/qcom_hwspinlock.c
+++ b/drivers/hwspinlock/qcom_hwspinlock.c
@@ -115,7 +115,6 @@ static const struct of_device_id qcom_hwspinlock_of_match[] = {
{ .compatible = "qcom,sfpb-mutex", .data = &of_sfpb_mutex },
{ .compatible = "qcom,tcsr-mutex", .data = &of_tcsr_mutex },
{ .compatible = "qcom,apq8084-tcsr-mutex", .data = &of_msm8226_tcsr_mutex },
- { .compatible = "qcom,ipq6018-tcsr-mutex", .data = &of_msm8226_tcsr_mutex },
{ .compatible = "qcom,msm8226-tcsr-mutex", .data = &of_msm8226_tcsr_mutex },
{ .compatible = "qcom,msm8974-tcsr-mutex", .data = &of_msm8226_tcsr_mutex },
{ .compatible = "qcom,msm8994-tcsr-mutex", .data = &of_msm8226_tcsr_mutex },
--
2.41.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b8bd342d50cbf606666488488f9fea374aceb2d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091601-spotted-untie-0ba4@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
b8bd342d50cb ("fuse: nlookup missing decrement in fuse_direntplus_link")
d123d8e1833c ("fuse: split out readdir.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b8bd342d50cbf606666488488f9fea374aceb2d5 Mon Sep 17 00:00:00 2001
From: ruanmeisi <ruan.meisi(a)zte.com.cn>
Date: Tue, 25 Apr 2023 19:13:54 +0800
Subject: [PATCH] fuse: nlookup missing decrement in fuse_direntplus_link
During our debugging of glusterfs, we found an Assertion failed error:
inode_lookup >= nlookup, which was caused by the nlookup value in the
kernel being greater than that in the FUSE file system.
The issue was introduced by fuse_direntplus_link, where in the function,
fuse_iget increments nlookup, and if d_splice_alias returns failure,
fuse_direntplus_link returns failure without decrementing nlookup
https://github.com/gluster/glusterfs/pull/4081
Signed-off-by: ruanmeisi <ruan.meisi(a)zte.com.cn>
Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support")
Cc: <stable(a)vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index dc603479b30e..b3d498163f97 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -243,8 +243,16 @@ static int fuse_direntplus_link(struct file *file,
dput(dentry);
dentry = alias;
}
- if (IS_ERR(dentry))
+ if (IS_ERR(dentry)) {
+ if (!IS_ERR(inode)) {
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fi->nlookup--;
+ spin_unlock(&fi->lock);
+ }
return PTR_ERR(dentry);
+ }
}
if (fc->readdirplus_auto)
set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state);
The bug `KASAN: slab-use-after-free in qxl_mode_dumb_create` is reproduced
on 5.10 stable branch.
The problem has been fixed by the following patch which can be cleanly
applied to 5.10. The fix is already included in all stable branches
starting from 5.15.
Link to the "failed to apply to 5.10" report [1].
[1]: https://lore.kernel.org/stable/2023082121-mumps-residency-9108@gregkh/
This is the start of the stable review cycle for the 5.15.142 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 07 Dec 2023 03:14:57 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.142-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.142-rc1
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix deadlock on RTL8125 in jumbo mtu mode
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: disable ASPM in case of tx timeout
Wenchao Chen <wenchao.chen(a)unisoc.com>
mmc: sdhci-sprd: Fix vqmmc not shutting down after the card was pulled
Heiner Kallweit <hkallweit1(a)gmail.com>
mmc: core: add helpers mmc_regulator_enable/disable_vqmmc
Lu Baolu <baolu.lu(a)linux.intel.com>
iommu/vt-d: Make context clearing consistent with context mapping
Lu Baolu <baolu.lu(a)linux.intel.com>
iommu/vt-d: Omit devTLB invalidation requests when TES=0
Christoph Niedermaier <cniedermaier(a)dh-electronics.com>
cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
Christoph Niedermaier <cniedermaier(a)dh-electronics.com>
cpufreq: imx6q: don't warn for disabling a non-existing frequency
Steve French <stfrench(a)microsoft.com>
smb3: fix caching of ctime on setxattr
Jeff Layton <jlayton(a)kernel.org>
fs: add ctime accessors infrastructure
Helge Deller <deller(a)gmx.de>
fbdev: stifb: Make the STI next font pointer a 32-bit signed offset
Mark Hasemeyer <markhas(a)chromium.org>
ASoC: SOF: sof-pci-dev: Fix community key quirk detection
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: SOF: sof-pci-dev: don't use the community key on APL Chromebooks
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: SOF: sof-pci-dev: add parameter to override topology filename
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: SOF: sof-pci-dev: use community key on all Up boards
Hans de Goede <hdegoede(a)redhat.com>
ASoC: Intel: Move soc_intel_is_foo() helpers to a generic header
Steve French <stfrench(a)microsoft.com>
smb3: fix touch -h of symlink
Gaurav Batra <gbatra(a)linux.vnet.ibm.com>
powerpc/pseries/iommu: enable_ddw incorrectly returns direct mapping for SR-IOV device
Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
selftests/resctrl: Move _GNU_SOURCE define into Makefile
Shaopeng Tan <tan.shaopeng(a)jp.fujitsu.com>
selftests/resctrl: Add missing SPDX license to Makefile
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix async branch flags
Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
net: ravb: Stop DMA in case of failures on ravb_open()
Phil Edworthy <phil.edworthy(a)renesas.com>
ravb: Support separate Line0 (Desc), Line1 (Err) and Line2 (Mgmt) irqs
Phil Edworthy <phil.edworthy(a)renesas.com>
ravb: Separate handling of irq enable/disable regs into feature
Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
net: ravb: Start TX queues after HW initialization succeeded
Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
net: ravb: Use pm_runtime_resume_and_get()
Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
net: ravb: Check return value of reset_control_deassert()
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
ravb: Fix races between ravb_tx_timeout_work() and net related ops
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: prevent potential deadlock in rtl8169_close
Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Revert "workqueue: remove unused cancel_work()"
Geetha sowjanya <gakula(a)marvell.com>
octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64
Furong Xu <0x1207(a)gmail.com>
net: stmmac: xgmac: Disable FPE MMC interrupts
Elena Salomatkina <elena.salomatkina.cmc(a)gmail.com>
octeontx2-af: Fix possible buffer overflow
Willem de Bruijn <willemb(a)google.com>
selftests/net: ipsec: fix constant out of range
Dmitry Antipov <dmantipov(a)yandex.ru>
uapi: propagate __struct_group() attributes to the container union
Ioana Ciornei <ioana.ciornei(a)nxp.com>
dpaa2-eth: increase the needed headroom to account for alignment
Zhengchao Shao <shaozhengchao(a)huawei.com>
ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet
Niklas Neronin <niklas.neronin(a)linux.intel.com>
usb: config: fix iteration issue in 'usb_get_bos_descriptor()'
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Change configuration warnings to notices
Haiyang Zhang <haiyangz(a)microsoft.com>
hv_netvsc: fix race of netvsc and VF register_netdevice
Patrick Wang <patrick.wang.shcn(a)gmail.com>
rcu: Avoid tracing a few functions executed in stop machine
Xin Long <lucien.xin(a)gmail.com>
vlan: move dev_put into vlan_dev_uninit
Xin Long <lucien.xin(a)gmail.com>
vlan: introduce vlan_dev_free_egress_priority
Max Nguyen <maxwell.nguyen(a)hp.com>
Input: xpad - add HyperX Clutch Gladiate Support
Filipe Manana <fdmanana(a)suse.com>
btrfs: make error messages more clear when getting a chunk map
Jann Horn <jannh(a)google.com>
btrfs: send: ensure send_fd is writable
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix off-by-one when checking chunk map includes logical address
Bragatheswaran Manickavel <bragathemanick0908(a)gmail.com>
btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod()
Qu Wenruo <wqu(a)suse.com>
btrfs: add dmesg output for first mount and last unmount of a filesystem
Helge Deller <deller(a)gmx.de>
parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
Timothy Pearson <tpearson(a)raptorengineering.com>
powerpc: Don't clobber f0/vs0 during fp|altivec register save
Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim(a)intel.com>
iommu/vt-d: Add MTL to quirk list to skip TE disabling
Markus Weippert <markus(a)gekmihesg.de>
bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
Wu Bo <bo.wu(a)vivo.com>
dm verity: don't perform FEC for failed readahead IO
Mikulas Patocka <mpatocka(a)redhat.com>
dm-verity: align struct dm_verity_fec_io properly
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek: Add supported ALC257 for ChromeOS
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek: Headset Mic VREF to 100%
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Disable power-save on KONTRON SinglePC
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: block: Be sure to wait while busy in CQE error recovery
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: block: Do not lose cache flush during CQE error recovery
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: block: Retry commands in CQE error recovery
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: cqhci: Fix task clearing in CQE error recovery
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: cqhci: Warn of halt or task clear failure
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: cqhci: Increase recovery halt timeout
Yang Yingliang <yangyingliang(a)huawei.com>
firewire: core: fix possible memory leak in create_units()
Maria Yu <quic_aiquny(a)quicinc.com>
pinctrl: avoid reload of p state in list iteration
Adrian Hunter <adrian.hunter(a)intel.com>
perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
-------------
Diffstat:
Makefile | 4 +-
arch/parisc/include/uapi/asm/errno.h | 2 -
arch/powerpc/kernel/fpu.S | 13 ++++
arch/powerpc/kernel/vector.S | 2 +
arch/powerpc/platforms/pseries/iommu.c | 8 +-
drivers/cpufreq/imx6q-cpufreq.c | 32 ++++----
drivers/firewire/core-device.c | 11 +--
drivers/input/joystick/xpad.c | 2 +
drivers/iommu/intel/dmar.c | 18 +++++
drivers/iommu/intel/iommu.c | 6 +-
drivers/md/bcache/btree.c | 2 +-
drivers/md/dm-verity-fec.c | 3 +-
drivers/md/dm-verity-target.c | 4 +-
drivers/md/dm-verity.h | 6 --
drivers/mmc/core/block.c | 2 +
drivers/mmc/core/core.c | 9 ++-
drivers/mmc/core/regulator.c | 41 ++++++++++
drivers/mmc/host/cqhci-core.c | 44 +++++------
drivers/mmc/host/sdhci-sprd.c | 25 ++++++
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 8 +-
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h | 2 +-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 4 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 7 +-
drivers/net/ethernet/realtek/r8169_main.c | 23 +++++-
drivers/net/ethernet/renesas/ravb.h | 4 +
drivers/net/ethernet/renesas/ravb_main.c | 91 ++++++++++++++++++----
drivers/net/ethernet/renesas/ravb_ptp.c | 6 +-
drivers/net/ethernet/stmicro/stmmac/mmc_core.c | 4 +
drivers/net/hyperv/netvsc_drv.c | 25 +++---
drivers/pinctrl/core.c | 6 +-
drivers/usb/core/config.c | 85 ++++++++++----------
drivers/video/fbdev/sticore.h | 2 +-
fs/btrfs/disk-io.c | 1 +
fs/btrfs/ref-verify.c | 2 +
fs/btrfs/send.c | 2 +-
fs/btrfs/super.c | 5 +-
fs/btrfs/volumes.c | 9 ++-
fs/cifs/cifsfs.c | 1 +
fs/cifs/xattr.c | 5 +-
fs/inode.c | 16 ++++
include/linux/fs.h | 45 ++++++++++-
include/linux/mmc/host.h | 3 +
include/linux/platform_data/x86/soc.h | 65 ++++++++++++++++
include/linux/workqueue.h | 1 +
include/uapi/linux/stddef.h | 2 +-
kernel/rcu/tree_plugin.h | 8 +-
kernel/workqueue.c | 9 +++
lib/errname.c | 6 --
net/8021q/vlan.h | 2 +-
net/8021q/vlan_dev.c | 15 +++-
net/8021q/vlan_netlink.c | 7 +-
net/ipv4/igmp.c | 6 +-
sound/pci/hda/hda_intel.c | 2 +
sound/pci/hda/patch_realtek.c | 12 +++
sound/soc/intel/common/soc-intel-quirks.h | 51 +-----------
sound/soc/sof/sof-pci-dev.c | 62 +++++++++++----
tools/arch/parisc/include/uapi/asm/errno.h | 2 -
tools/perf/util/genelf.h | 4 +-
tools/perf/util/intel-pt.c | 2 +
tools/testing/selftests/net/ipsec.c | 4 +-
tools/testing/selftests/resctrl/Makefile | 4 +-
tools/testing/selftests/resctrl/resctrl.h | 1 -
62 files changed, 606 insertions(+), 249 deletions(-)
In min_key_size_set():
if (val > hdev->le_max_key_size || val < SMP_MIN_ENC_KEY_SIZE)
return -EINVAL;
hci_dev_lock(hdev);
hdev->le_min_key_size = val;
hci_dev_unlock(hdev);
In max_key_size_set():
if (val > SMP_MAX_ENC_KEY_SIZE || val < hdev->le_min_key_size)
return -EINVAL;
hci_dev_lock(hdev);
hdev->le_max_key_size = val;
hci_dev_unlock(hdev);
The atomicity violation occurs due to concurrent execution of set_min and
set_max funcs.Consider a scenario where setmin writes a new, valid 'min'
value, and concurrently, setmax writes a value that is greater than the
old 'min' but smaller than the new 'min'. In this case, setmax might check
against the old 'min' value (before acquiring the lock) but write its
value after the 'min' has been updated by setmin. This leads to a
situation where the 'max' value ends up being smaller than the 'min'
value, which is an inconsistency.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 5.17.
To resolve this issue, it is suggested to encompass the validity checks
within the locked sections in both set_min and set_max funcs. The
modification ensures that the validation of 'val' against the
current min/max values is atomic, thus maintaining the integrity of the
settings. With this patch applied, our tool no longer reports the bug,
with the kernel configuration allyesconfig for x86_64. Due to the lack of
associated hardware, we cannot test the patch in runtime testing, and just
verify it according to the code logic.
[1] https://sites.google.com/view/basscheck/
Fixes: 18f81241b74f ("Bluetooth: Move {min,max}_key_size debugfs ...")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
v2:
* Adjust the format to pass the CI.
---
net/bluetooth/hci_debugfs.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/bluetooth/hci_debugfs.c b/net/bluetooth/hci_debugfs.c
index 6b7741f6e95b..3ffbf3f25363 100644
--- a/net/bluetooth/hci_debugfs.c
+++ b/net/bluetooth/hci_debugfs.c
@@ -1045,11 +1045,13 @@ DEFINE_DEBUGFS_ATTRIBUTE(adv_max_interval_fops, adv_max_interval_get,
static int min_key_size_set(void *data, u64 val)
{
struct hci_dev *hdev = data;
-
- if (val > hdev->le_max_key_size || val < SMP_MIN_ENC_KEY_SIZE)
+
+ hci_dev_lock(hdev);
+ if (val > hdev->le_max_key_size || val < SMP_MIN_ENC_KEY_SIZE) {
+ hci_dev_unlock(hdev);
return -EINVAL;
+ }
- hci_dev_lock(hdev);
hdev->le_min_key_size = val;
hci_dev_unlock(hdev);
@@ -1073,11 +1075,13 @@ DEFINE_DEBUGFS_ATTRIBUTE(min_key_size_fops, min_key_size_get,
static int max_key_size_set(void *data, u64 val)
{
struct hci_dev *hdev = data;
-
- if (val > SMP_MAX_ENC_KEY_SIZE || val < hdev->le_min_key_size)
+
+ hci_dev_lock(hdev);
+ if (val > SMP_MAX_ENC_KEY_SIZE || val < hdev->le_min_key_size) {
+ hci_dev_unlock(hdev);
return -EINVAL;
+ }
- hci_dev_lock(hdev);
hdev->le_max_key_size = val;
hci_dev_unlock(hdev);
--
2.34.1
vmw_context_cotable can return either an error or a null pointer and its
usage sometimes went unchecked. Subsequent code would then try to access
either a null pointer or an error value.
The invalid dereferences were only possible with malformed userspace
apps which never properly initialized the rendering contexts.
Check the results of vmw_context_cotable to fix the invalid derefs.
Thanks:
ziming zhang(@ezrak1e) from Ant Group Light-Year Security Lab
who was the first person to discover it.
Niels De Graef who reported it and helped to track down the poc.
Fixes: 9c079b8ce8bf ("drm/vmwgfx: Adapt execbuf to the new validation api")
Cc: <stable(a)vger.kernel.org> # v4.20+
Reported-by: Niels De Graef <ndegraef(a)redhat.com>
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Cc: Martin Krastev <martin.krastev(a)broadcom.com>
Cc: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Cc: Ian Forbes <ian.forbes(a)broadcom.com>
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
---
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 272141b6164c..4f09959d27ba 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -447,7 +447,7 @@ static int vmw_resource_context_res_add(struct vmw_private *dev_priv,
vmw_res_type(ctx) == vmw_res_dx_context) {
for (i = 0; i < cotable_max; ++i) {
res = vmw_context_cotable(ctx, i);
- if (IS_ERR(res))
+ if (IS_ERR_OR_NULL(res))
continue;
ret = vmw_execbuf_res_val_add(sw_context, res,
@@ -1266,6 +1266,8 @@ static int vmw_cmd_dx_define_query(struct vmw_private *dev_priv,
return -EINVAL;
cotable_res = vmw_context_cotable(ctx_node->ctx, SVGA_COTABLE_DXQUERY);
+ if (IS_ERR_OR_NULL(cotable_res))
+ return cotable_res ? PTR_ERR(cotable_res) : -EINVAL;
ret = vmw_cotable_notify(cotable_res, cmd->body.queryId);
return ret;
@@ -2484,6 +2486,8 @@ static int vmw_cmd_dx_view_define(struct vmw_private *dev_priv,
return ret;
res = vmw_context_cotable(ctx_node->ctx, vmw_view_cotables[view_type]);
+ if (IS_ERR_OR_NULL(res))
+ return res ? PTR_ERR(res) : -EINVAL;
ret = vmw_cotable_notify(res, cmd->defined_id);
if (unlikely(ret != 0))
return ret;
@@ -2569,8 +2573,8 @@ static int vmw_cmd_dx_so_define(struct vmw_private *dev_priv,
so_type = vmw_so_cmd_to_type(header->id);
res = vmw_context_cotable(ctx_node->ctx, vmw_so_cotables[so_type]);
- if (IS_ERR(res))
- return PTR_ERR(res);
+ if (IS_ERR_OR_NULL(res))
+ return res ? PTR_ERR(res) : -EINVAL;
cmd = container_of(header, typeof(*cmd), header);
ret = vmw_cotable_notify(res, cmd->defined_id);
@@ -2689,6 +2693,8 @@ static int vmw_cmd_dx_define_shader(struct vmw_private *dev_priv,
return -EINVAL;
res = vmw_context_cotable(ctx_node->ctx, SVGA_COTABLE_DXSHADER);
+ if (IS_ERR_OR_NULL(res))
+ return res ? PTR_ERR(res) : -EINVAL;
ret = vmw_cotable_notify(res, cmd->body.shaderId);
if (ret)
return ret;
@@ -3010,6 +3016,8 @@ static int vmw_cmd_dx_define_streamoutput(struct vmw_private *dev_priv,
}
res = vmw_context_cotable(ctx_node->ctx, SVGA_COTABLE_STREAMOUTPUT);
+ if (IS_ERR_OR_NULL(res))
+ return res ? PTR_ERR(res) : -EINVAL;
ret = vmw_cotable_notify(res, cmd->body.soid);
if (ret)
return ret;
--
2.40.1
In xc4000_get_frequency():
*freq = priv->freq_hz + priv->freq_offset;
The code accesses priv->freq_hz and priv->freq_offset without holding any
lock.
In xc4000_set_params():
// Code that updates priv->freq_hz and priv->freq_offset
...
xc4000_get_frequency() and xc4000_set_params() may execute concurrently,
risking inconsistent reads of priv->freq_hz and priv->freq_offset. Since
these related data may update during reading, it can result in incorrect
frequency calculation, leading to atomicity violations.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 6.2.
To address this issue, it is proposed to add a mutex lock pair in
xc4000_get_frequency() to ensure atomicity. With this patch applied, our
tool no longer reports the possible bug, with the kernel configuration
allyesconfig for x86_64. Due to the lack of associated hardware, we cannot
test the patch in runtime testing, and just verify it according to the
code logic.
[1] https://sites.google.com/view/basscheck/
Fixes: 4c07e32884ab6 ("[media] xc4000: Fix get_frequency()")
Cc: stable(a)vger.kernel.org
Reported-by: BassCheck <bass(a)buaa.edu.cn>
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
v2:
* In this patch v2, we've added some information of the static analysis
tool used, as per the researcher guidelines. Also, we've added a cc in the
signed-off-by area, according to the stable-kernel-rules.
Thank Greg KH for helpful advice.
---
drivers/media/tuners/xc4000.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/tuners/xc4000.c b/drivers/media/tuners/xc4000.c
index 57ded9ff3f04..29bc63021c5a 100644
--- a/drivers/media/tuners/xc4000.c
+++ b/drivers/media/tuners/xc4000.c
@@ -1515,10 +1515,10 @@ static int xc4000_get_frequency(struct dvb_frontend *fe, u32 *freq)
{
struct xc4000_priv *priv = fe->tuner_priv;
+ mutex_lock(&priv->lock);
*freq = priv->freq_hz + priv->freq_offset;
if (debug) {
- mutex_lock(&priv->lock);
if ((priv->cur_fw.type
& (BASE | FM | DTV6 | DTV7 | DTV78 | DTV8)) == BASE) {
u16 snr = 0;
@@ -1529,8 +1529,8 @@ static int xc4000_get_frequency(struct dvb_frontend *fe, u32 *freq)
return 0;
}
}
- mutex_unlock(&priv->lock);
}
+ mutex_unlock(&priv->lock);
dprintk(1, "%s()\n", __func__);
--
2.34.1
From: Wayne Lin <Wayne.Lin(a)amd.com>
[Why]
For usb4 connector, AUX transaction is handled by dmub utilizing a differnt
code path comparing to legacy DP connector. If the usb4 DP connector is
disconnected, AUX access will report EBUSY and cause igt@kms_dp_aux_dev
fail.
[How]
Align the error code with the one reported by legacy DP as EIO.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Acked-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index d3966ce3dc91..e3915c4f8566 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -978,6 +978,11 @@ int dm_helper_dmub_aux_transfer_sync(
struct aux_payload *payload,
enum aux_return_code_type *operation_result)
{
+ if (!link->hpd_status) {
+ *operation_result = AUX_RET_ERROR_HPD_DISCON;
+ return -1;
+ }
+
return amdgpu_dm_process_dmub_aux_transfer_sync(ctx, link->link_index, payload,
operation_result);
}
--
2.34.1
From: Ilya Bakoulin <ilya.bakoulin(a)amd.com>
[Why]
Not clearing the memory select bits prior to OPTC disable can cause DSC
corruption issues when attempting to reuse a memory instance for another
OPTC that enables ODM.
[How]
Clear the memory select bits prior to disabling an OPTC.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu(a)amd.com>
Acked-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Ilya Bakoulin <ilya.bakoulin(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c | 3 +++
drivers/gpu/drm/amd/display/dc/optc/dcn35/dcn35_optc.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c
index 1788eb29474b..823493543325 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c
@@ -173,6 +173,9 @@ static bool optc32_disable_crtc(struct timing_generator *optc)
OPTC_SEG3_SRC_SEL, 0xf,
OPTC_NUM_OF_INPUT_SEGMENT, 0);
+ REG_UPDATE(OPTC_MEMORY_CONFIG,
+ OPTC_MEM_SEL, 0);
+
/* disable otg request until end of the first line
* in the vertical blank region
*/
diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn35/dcn35_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn35/dcn35_optc.c
index 3d6c1b2c2b4d..5b1547508850 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn35/dcn35_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn35/dcn35_optc.c
@@ -145,6 +145,9 @@ static bool optc35_disable_crtc(struct timing_generator *optc)
OPTC_SEG3_SRC_SEL, 0xf,
OPTC_NUM_OF_INPUT_SEGMENT, 0);
+ REG_UPDATE(OPTC_MEMORY_CONFIG,
+ OPTC_MEM_SEL, 0);
+
/* disable otg request until end of the first line
* in the vertical blank region
*/
--
2.34.1
The patch titled
Subject: selftests: mm: hugepage-vmemmap fails on 64K page size systems.
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-hugepage-vmemmap-fails-on-64k-page-size-systems.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Donet Tom <donettom(a)linux.vnet.ibm.com>
Subject: selftests: mm: hugepage-vmemmap fails on 64K page size systems.
Date: Wed, 10 Jan 2024 14:03:35 +0530
The kernel sefltest mm/hugepage-vmemmap fails on architectures which has
different page size other than 4K. In hugepage-vmemmap page size used is
4k so the pfn calculation will go wrong on systems which has different
page size .The length of MAP_HUGETLB memory must be hugepage aligned but
in hugepage-vmemmap map length is 2M so this will not get aligned if the
system has differnet hugepage size.
Added psize() to get the page size and default_huge_page_size() to
get the default hugepage size at run time, hugepage-vmemmap test pass
on powerpc with 64K page size and x86 with 4K page size.
Result on powerpc without patch (page size 64K)
*# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 0
Head page flags (100000000) is invalid
check_page_flags: Invalid argument
*#
Result on powerpc with patch (page size 64K)
*# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 600
*#
Result on x86 with patch (page size 4K)
*# ./hugepage-vmemmap
Returned address is 0x7fc7c2c00000 whose pfn is 1dac00
*#
Link: https://lkml.kernel.org/r/3b3a3ae37ba21218481c482a872bbf7526031600.17048657…
Fixes: b147c89cd429 ("selftests: vm: add a hugetlb test case")
Signed-off-by: Donet Tom <donettom(a)linux.vnet.ibm.com>
Reported-by: Geetika Moolchandani (geetika(a)linux.ibm.com)
Tested-by: Geetika Moolchandani (geetika(a)linux.ibm.com)
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugepage-vmemmap.c | 29 +++++++++-------
1 file changed, 18 insertions(+), 11 deletions(-)
--- a/tools/testing/selftests/mm/hugepage-vmemmap.c~selftests-mm-hugepage-vmemmap-fails-on-64k-page-size-systems
+++ a/tools/testing/selftests/mm/hugepage-vmemmap.c
@@ -10,10 +10,7 @@
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
-
-#define MAP_LENGTH (2UL * 1024 * 1024)
-
-#define PAGE_SIZE 4096
+#include "vm_util.h"
#define PAGE_COMPOUND_HEAD (1UL << 15)
#define PAGE_COMPOUND_TAIL (1UL << 16)
@@ -39,6 +36,9 @@
#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif
+static size_t pagesize;
+static size_t maplength;
+
static void write_bytes(char *addr, size_t length)
{
unsigned long i;
@@ -56,7 +56,7 @@ static unsigned long virt_to_pfn(void *a
if (fd < 0)
return -1UL;
- lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+ lseek(fd, (unsigned long)addr / pagesize * sizeof(pagemap), SEEK_SET);
read(fd, &pagemap, sizeof(pagemap));
close(fd);
@@ -86,7 +86,7 @@ static int check_page_flags(unsigned lon
* this also verifies kernel has correctly set the fake page_head to tail
* while hugetlb_free_vmemmap is enabled.
*/
- for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+ for (i = 1; i < maplength / pagesize; i++) {
read(fd, &pageflags, sizeof(pageflags));
if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
(pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
@@ -106,18 +106,25 @@ int main(int argc, char **argv)
void *addr;
unsigned long pfn;
- addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+ pagesize = psize();
+ maplength = default_huge_page_size();
+ if (!maplength) {
+ printf("Unable to determine huge page size\n");
+ exit(1);
+ }
+
+ addr = mmap(MAP_ADDR, maplength, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(1);
}
/* Trigger allocation of HugeTLB page. */
- write_bytes(addr, MAP_LENGTH);
+ write_bytes(addr, maplength);
pfn = virt_to_pfn(addr);
if (pfn == -1UL) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("virt_to_pfn");
exit(1);
}
@@ -125,13 +132,13 @@ int main(int argc, char **argv)
printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
if (check_page_flags(pfn) < 0) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("check_page_flags");
exit(1);
}
/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
- if (munmap(addr, MAP_LENGTH)) {
+ if (munmap(addr, maplength)) {
perror("munmap");
exit(1);
}
_
Patches currently in -mm which might be from donettom(a)linux.vnet.ibm.com are
selftests-mm-hugepage-vmemmap-fails-on-64k-page-size-systems.patch
Add extra sanity check for btrfs_ioctl_defrag_range_args::flags.
This is not really to enhance fuzzing tests, but as a preparation for
future expansion on btrfs_ioctl_defrag_range_args.
In the future we're adding new members, allowing more fine tuning for
btrfs defrag.
Without the -ENONOTSUPP error, there would be no way to detect if the
kernel supports those new defrag features.
cc: stable(a)vger.kernel.org #4.14+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/ioctl.c | 4 ++++
include/uapi/linux/btrfs.h | 2 ++
2 files changed, 6 insertions(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a1743904202b..3a846b983b28 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2608,6 +2608,10 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
ret = -EFAULT;
goto out;
}
+ if (range.flags & ~BTRFS_DEFRAG_RANGE_FLAGS_SUPP) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
/* compression requires us to start the IO */
if ((range.flags & BTRFS_DEFRAG_RANGE_COMPRESS)) {
range.flags |= BTRFS_DEFRAG_RANGE_START_IO;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 7c29d82db9ee..48e9b7ffecf1 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -614,6 +614,8 @@ struct btrfs_ioctl_clone_range_args {
*/
#define BTRFS_DEFRAG_RANGE_COMPRESS 1
#define BTRFS_DEFRAG_RANGE_START_IO 2
+#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP (BTRFS_DEFRAG_RANGE_COMPRESS |\
+ BTRFS_DEFRAG_RANGE_START_IO)
struct btrfs_ioctl_defrag_range_args {
/* start of the defrag operation */
__u64 start;
--
2.43.0
The patch titled
Subject: xfs: disable large folio support in xfile_create
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
xfs-disable-large-folio-support-in-xfile_create.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Christoph Hellwig <hch(a)lst.de>
Subject: xfs: disable large folio support in xfile_create
Date: Wed, 10 Jan 2024 10:21:09 +0100
The xfarray code will crash if large folios are force enabled using:
echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
Fixing this will require a bit of an API change, and prefeably sorting out
the hwpoison story for pages vs folio and where it is placed in the shmem
API. For now use this one liner to disable large folios.
Link: https://lkml.kernel.org/r/20240110092109.1950011-3-hch@lst.de
Reported-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Cc: Chandan Babu R <chandan.babu(a)oracle.com>
Cc: Christian K��nig <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/xfs/scrub/xfile.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/fs/xfs/scrub/xfile.c~xfs-disable-large-folio-support-in-xfile_create
+++ a/fs/xfs/scrub/xfile.c
@@ -94,6 +94,11 @@ xfile_create(
lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
+ /*
+ * We're not quite ready for large folios yet.
+ */
+ mapping_clear_large_folios(inode->i_mapping);
+
trace_xfile_create(xf);
*xfilep = xf;
_
Patches currently in -mm which might be from hch(a)lst.de are
mm-add-a-mapping_clear_large_folios-helper.patch
xfs-disable-large-folio-support-in-xfile_create.patch
The patch titled
Subject: mm: add a mapping_clear_large_folios helper
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-add-a-mapping_clear_large_folios-helper.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Christoph Hellwig <hch(a)lst.de>
Subject: mm: add a mapping_clear_large_folios helper
Date: Wed, 10 Jan 2024 10:21:08 +0100
Patch series "disable large folios for shmem file used by xfs xfile".
Darrick reported that the fairly new XFS xfile code blows up when force
enabling large folio for shmem. This series fixes this quickly by
disabling large folios for this particular shmem file for now until it can
be fixed properly, which will be a lot more invasive.
This patch (of 2):
Users of shmem_kernel_file_setup might not be able to deal with large
folios (yet). Give them a way to disable large folio support on their
mapping.
Link: https://lkml.kernel.org/r/20240110092109.1950011-1-hch@lst.de
Link: https://lkml.kernel.org/r/20240110092109.1950011-2-hch@lst.de
Fixes: 137db333b2918 ("xfs: teach xfile to pass back direct-map pages to caller")
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Cc: Chandan Babu R <chandan.babu(a)oracle.com>
Cc: Christian K��nig <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: "Darrick J. Wong" <djwong(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pagemap.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/include/linux/pagemap.h~mm-add-a-mapping_clear_large_folios-helper
+++ a/include/linux/pagemap.h
@@ -343,6 +343,20 @@ static inline void mapping_set_large_fol
__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
}
+/**
+ * mapping_clear_large_folios() - Disable large folio support for a mapping
+ * @mapping: The mapping.
+ *
+ * This can be called to undo the effect of mapping_set_large_folios().
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_clear_large_folios(struct address_space *mapping)
+{
+ __clear_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+}
+
/*
* Large folio support currently depends on THP. These dependencies are
* being worked on but are not yet fixed.
_
Patches currently in -mm which might be from hch(a)lst.de are
mm-add-a-mapping_clear_large_folios-helper.patch
xfs-disable-large-folio-support-in-xfile_create.patch
Currently, the function update_port_device_state gets the usb_hub from
udev->parent by calling usb_hub_to_struct_hub.
However, in case the actconfig or the maxchild is 0, the usb_hub would
be NULL and upon further accessing to get port_dev would result in null
pointer dereference.
Fix this by introducing an if check after the usb_hub is populated.
Fixes: 83cb2604f641 ("usb: core: add sysfs entry for usb device state")
Cc: stable(a)vger.kernel.org
Signed-off-by: Udipto Goswami <quic_ugoswami(a)quicinc.com>
---
v5: Addressed nit picks in commit and the comment.
v4: Fixed minor mistakes in the comment.
v3: Re-wrote the comment for better context.
v2: Introduced comment for the if check & CC'ed stable.
drivers/usb/core/hub.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index ffd7c99e24a3..48409d51ea43 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2053,9 +2053,19 @@ static void update_port_device_state(struct usb_device *udev)
if (udev->parent) {
hub = usb_hub_to_struct_hub(udev->parent);
- port_dev = hub->ports[udev->portnum - 1];
- WRITE_ONCE(port_dev->state, udev->state);
- sysfs_notify_dirent(port_dev->state_kn);
+
+ /*
+ * The Link Layer Validation System Driver (lvstest)
+ * has a test step to unbind the hub before running the
+ * rest of the procedure. This triggers hub_disconnect
+ * which will set the hub's maxchild to 0, further
+ * resulting in usb_hub_to_struct_hub returning NULL.
+ */
+ if (hub) {
+ port_dev = hub->ports[udev->portnum - 1];
+ WRITE_ONCE(port_dev->state, udev->state);
+ sysfs_notify_dirent(port_dev->state_kn);
+ }
}
}
--
2.17.1
The patch titled
Subject: mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory_hotplug-fix-memmap_on_memory-sysfs-value-retrieval.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Subject: mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval
Date: Wed, 10 Jan 2024 15:01:27 +0100
set_memmap_mode() stores the kernel parameter memmap mode as an integer.
However, the get_memmap_mode() function utilizes param_get_bool() to fetch
the value as a boolean, leading to potential endianness issue. On
Big-endian architectures, the memmap_on_memory is consistently displayed
as 'N' regardless of its actual status.
To address this endianness problem, the solution involves obtaining the
mode as an integer. This adjustment ensures the proper display of the
memmap_on_memory parameter, presenting it as one of the following options:
Force, Y, or N.
Link: https://lkml.kernel.org/r/20240110140127.241451-1-sumanthk@linux.ibm.com
Fixes: 2d1f649c7c08 ("mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks")
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Suggested-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-memmap_on_memory-sysfs-value-retrieval
+++ a/mm/memory_hotplug.c
@@ -101,9 +101,11 @@ static int set_memmap_mode(const char *v
static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
{
- if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE)
- return sprintf(buffer, "force\n");
- return param_get_bool(buffer, kp);
+ int mode = *((int *)kp->arg);
+
+ if (mode == MEMMAP_ON_MEMORY_FORCE)
+ return sprintf(buffer, "force\n");
+ return sprintf(buffer, "%c\n", mode ? 'Y' : 'N');
}
static const struct kernel_param_ops memmap_mode_ops = {
_
Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are
mm-memory_hotplug-fix-memmap_on_memory-sysfs-value-retrieval.patch
mm-memory_hotplug-introduce-mem_prepare_online-mem_finish_offline-notifiers.patch
s390-mm-allocate-vmemmap-pages-from-self-contained-memory-range.patch
s390-sclp-remove-unhandled-memory-notifier-type.patch
s390-mm-implement-mem_prepare_online-mem_finish_offline-notifiers.patch
s390-enable-mhp_memmap_on_memory.patch
I'm announcing the release of the 4.14.336 kernel. This is the LAST 4.14.y
kernel to be released. It is now officially end-of-life. Do NOT use this
kernel version anymore, please move to a newer one, as shown on the kernel.org
releases page.
All users of the 4.14 kernel series must upgrade. But then, move to a newer
release. If you are stuck at this version due to a vendor requiring it, go get
support from that vendor for this obsolete kernel tree, as that is what you are
paying them for :)
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/firewire/ohci.c | 51 +++++++++++++++++++++++++
drivers/mmc/core/block.c | 7 +--
drivers/mmc/core/host.c | 1
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 +++
net/nfc/llcp_core.c | 39 +++++++++++++++++--
net/sched/em_text.c | 4 +
8 files changed, 106 insertions(+), 10 deletions(-)
Adrian Cinal (1):
net: bcmgenet: Fix FCS generation for fragmented skbuffs
Geert Uytterhoeven (1):
mmc: core: Cancel delayed work before releasing host
Greg Kroah-Hartman (1):
Linux 4.14.336
Hangyu Hua (1):
net: sched: em_text: fix possible memory leak in em_text_destroy()
Jorge Ramirez-Ortiz (1):
mmc: rpmb: fixes pause retune on all RPMB partitions.
Ke Xiao (1):
i40e: fix use-after-free in i40e_aqc_add_filters()
Siddh Raman Pant (1):
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
Takashi Sakamoto (1):
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
-------------------------------
NOTE, this is the LAST 4.14.y-rc release cycle that is going to happen.
After this release, this branch will be end-of-life. You all should
have moved to the 4.19.y branch at the very least by now, as this is it,
time to stop using this one.
-------------------------------
This is the start of the stable review cycle for the 4.14.336 release.
There are 7 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Jan 2024 14:18:47 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.336-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.336-rc1
Geert Uytterhoeven <geert+renesas(a)glider.be>
mmc: core: Cancel delayed work before releasing host
Jorge Ramirez-Ortiz <jorge(a)foundries.io>
mmc: rpmb: fixes pause retune on all RPMB partitions.
Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
Ke Xiao <xiaoke(a)sangfor.com.cn>
i40e: fix use-after-free in i40e_aqc_add_filters()
Adrian Cinal <adriancinal(a)gmail.com>
net: bcmgenet: Fix FCS generation for fragmented skbuffs
Hangyu Hua <hbh25y(a)gmail.com>
net: sched: em_text: fix possible memory leak in em_text_destroy()
Siddh Raman Pant <code(a)siddh.me>
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
-------------
Diffstat:
Makefile | 4 +-
drivers/firewire/ohci.c | 51 ++++++++++++++++++++++++++
drivers/mmc/core/block.c | 7 ++--
drivers/mmc/core/host.c | 1 +
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 +++-
net/nfc/llcp_core.c | 39 ++++++++++++++++++--
net/sched/em_text.c | 4 +-
8 files changed, 107 insertions(+), 11 deletions(-)
Hi there,
We are excited to offer you a comprehensive email list of school districts that includes key contact information such as phone numbers, email addresses, mailing addresses, company revenue, size, and web addresses. Our databases also cover related industries such as:
* K-12 schools
* Universities
* Vocational schools and training programs
* Performing arts schools
* Fitness centers and gyms
* Child care services and providers
* Educational publishers and suppliers
If you're interested, we would be happy to provide you with relevant counts and a test file based on your specific requirements.
Thank you for your time and consideration, and please let us know if you have any questions or concerns.
Thanks,
Amelia Turner
To remove from this mailing reply with the subject line " LEAVE US".
This is a request for guidance on where is the most appropriate I should ask
for advice on how to do problem source identification on a problem with a
Linux PC which looks like it might be kernel-related. I an providing the
minimal amount of problem symptom information here that will help responders.
Simple problem statement: My PC (Gigabyte GA-H81M-S2H based) will not start
the OS (Linux MX 23.2, Debian 12, Kernel 6.2) if my PCIE GPU card (ZOTAC
nVidia GTX-1050) is installed. The start-up process hangs after Grub is
processed but before the Login screen is presented. Start-up completes
normally when the PCIE card is not installed.
Further detail:
The ZOTAC card works without issue in another PC with a different brand
motherboard.
This PC works without issue, with the ZOTAC card installed, with a different
kernel/distribution (e.g. Mint 21.2, Ubuntu 22.04, Kernel 5.15/5.19 or Windows
10).
With any Linux version installed in this PC, which uses a Via chip, operation
is subject to the issue which prevents USB 3 operation on back plane ports,
unless ‘iommu=off’ is set in grub. This might be an entirely different problem
to that associated with the presence of the PCIE card.
Who should I be seeking help/advice problem resolution from ?
Currently,the function update_port_device_state gets the usb_hub from
udev->parent by calling usb_hub_to_struct_hub.
However, in case the actconfig or the maxchild is 0, the usb_hub would
be NULL and upon further accessing to get port_dev would result in null
pointer dereference.
Fix this by introducing an if check after the usb_hub is populated.
Fixes: 83cb2604f641 ("usb: core: add sysfs entry for usb device state")
Cc: stable(a)vger.kernel.org
Signed-off-by: Udipto Goswami <quic_ugoswami(a)quicinc.com>
---
v4: Fixed minor mistakes in the comment.
v3: Re-wrote the comment for better context.
v2: Introduced comment for the if check & CC'ed stable.
drivers/usb/core/hub.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index ffd7c99e24a3..5ba1875e6bf4 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2053,9 +2053,22 @@ static void update_port_device_state(struct usb_device *udev)
if (udev->parent) {
hub = usb_hub_to_struct_hub(udev->parent);
- port_dev = hub->ports[udev->portnum - 1];
- WRITE_ONCE(port_dev->state, udev->state);
- sysfs_notify_dirent(port_dev->state_kn);
+
+ /*
+ * The Link Layer Validation System Driver (lvstest)
+ * has step to unbind the hub before running the rest
+ * of the procedure. This triggers hub_disconnect which
+ * will set the hub's maxchild to 0, further resulting
+ * usb_hub_to_struct_hub returning NULL.
+ *
+ * Add if check to avoid running into NULL pointer
+ * de-reference.
+ */
+ if (hub) {
+ port_dev = hub->ports[udev->portnum - 1];
+ WRITE_ONCE(port_dev->state, udev->state);
+ sysfs_notify_dirent(port_dev->state_kn);
+ }
}
}
--
2.17.1
Hi,
I wanted to check with you if you had a time to go through my previous
email,
Let me know your thoughts about acquiring this email list
Regards,
*Dyana *
------------------------------------------------------------------------------------------------------------------------------------
Hi,
Would you be interested in acquiring *Physicians Email & Mailing List* for
your upcoming campaigns?
*Physician Specialties*
Anesthesiologist
Ophthalmologist
Cardiologist
Optometrist
Dermatologist
Pathologist
Dentist
Pediatrician
Emergency Medicine
Psychiatrist
Family Practitioners
Psychologist
Gastroenterologist
Plastic Surgeon
General Practitioners
Podiatrist
Gynecologist
Pulmonologist
Hospitalist
Radiologist
Hematologist
Rheumatologist
Internal Medicine
Urologist
Nephrologists
Physician Assistants
Neurologist
Nurse Practitioners
Oncologist
Registered Nurses etc.
Let me know your *target audience* so that I will get back to you with more
information along with *pricing*.
If you think I should be talking to someone else, please forward this email
to the concerned person.
Looking forward to hearing from you.
Regards,
*Dyana Collins **| **Online Marketing Executive*
PWe have a responsibility to the environment
Before printing this e-mail or any other document, let's ask ourselves
whether we need a hard copy
To unsubscribe, reply with “leave out” in the subject line.
Currently,the function update_port_device_state gets the usb_hub from
udev->parent by calling usb_hub_to_struct_hub.
However, in case the actconfig or the maxchild is 0, the usb_hub would
be NULL and upon further accessing to get port_dev would result in null
pointer dereference.
Fix this by introducing an if check after the usb_hub is populated.
Fixes: 83cb2604f641 ("usb: core: add sysfs entry for usb device state")
Cc: stable(a)vger.kernel.org
Signed-off-by: Udipto Goswami <quic_ugoswami(a)quicinc.com>
---
v3: Re-wrote the comment for better context.
v2: Introduced comment for the if check & CC'ed stable.
drivers/usb/core/hub.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index ffd7c99e24a3..6b514546e59b 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2053,9 +2053,23 @@ static void update_port_device_state(struct usb_device *udev)
if (udev->parent) {
hub = usb_hub_to_struct_hub(udev->parent);
- port_dev = hub->ports[udev->portnum - 1];
- WRITE_ONCE(port_dev->state, udev->state);
- sysfs_notify_dirent(port_dev->state_kn);
+
+ /*
+ * The Link Layer Validation System Driver (lvstest)
+ * has procedure of unbinding the hub before running
+ * the rest of the procedure. This triggers
+ * hub_disconnect will set the hub's maxchild to 0.
+ * This would result usb_hub_to_struct_hub in this
+ * function to return NULL.
+ *
+ * Add if check to avoid running into NULL pointer
+ * de-reference.
+ */
+ if (hub) {
+ port_dev = hub->ports[udev->portnum - 1];
+ WRITE_ONCE(port_dev->state, udev->state);
+ sysfs_notify_dirent(port_dev->state_kn);
+ }
}
}
--
2.17.1
The patch titled
Subject: readahead: avoid multiple marked readahead pages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
readahead-avoid-multiple-marked-readahead-pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: readahead: avoid multiple marked readahead pages
Date: Thu, 4 Jan 2024 09:58:39 +0100
ra_alloc_folio() marks a page that should trigger next round of async
readahead. However it rounds up computed index to the order of page being
allocated. This can however lead to multiple consecutive pages being
marked with readahead flag. Consider situation with index == 1, mark ==
1, order == 0. We insert order 0 page at index 1 and mark it. Then we
bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page
at index 2 is marked as well. Then we bump order to 2, index is
incremented to 4, mark gets rounded to 4 so page at index 4 is marked as
well. The fact that multiple pages get marked within a single readahead
window confuses the readahead logic and results in readahead window being
trimmed back to 1. This situation is triggered in particular when maximum
readahead window size is not a power of two (in the observed case it was
768 KB) and as a result sequential read throughput suffers.
Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index' and thus
exactly one page is marked with readahead flag as required by the
readahead code and sequential read performance is restored.
This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:
"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."
Although this is true, the fact is this was always the case with read
sizes not aligned to folio boundaries and large folios in the page cache
just make the situation more obvious (and frequent). Also for sequential
read workloads it is better to trigger the readahead earlier rather than
later. It is true that the difference in the rounding and thus earlier
triggering of the readahead can result in reading more for semi-random
workloads. However workloads really suffering from this seem to be rare.
In particular I have verified that the workload described in commit
b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:
[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s
is not impacted by the rounding change and achieves ~70MB/s in both cases.
Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Guo Xuenan <guoxuenan(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/readahead.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/readahead.c~readahead-avoid-multiple-marked-readahead-pages
+++ a/mm/readahead.c
@@ -469,7 +469,7 @@ static inline int ra_alloc_folio(struct
if (!folio)
return -ENOMEM;
- mark = round_up(mark, 1UL << order);
+ mark = round_down(mark, 1UL << order);
if (index == mark)
folio_set_readahead(folio);
err = filemap_add_folio(ractl->mapping, folio, index, gfp);
_
Patches currently in -mm which might be from jack(a)suse.cz are
readahead-avoid-multiple-marked-readahead-pages.patch
The patch titled
Subject: efi: disable mirror feature during crashkernel
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
efi-disable-mirror-feature-during-crashkernel.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: efi: disable mirror feature during crashkernel
Date: Tue, 9 Jan 2024 12:15:36 +0800
If the system has no mirrored memory or uses crashkernel.high while
kernelcore=mirror is enabled on the command line then during crashkernel,
there will be limited mirrored memory and this usually leads to OOM.
To solve this problem, disable the mirror feature during crashkernel.
Link: https://lkml.kernel.org/r/20240109041536.3903042-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mm_init.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/mm_init.c~efi-disable-mirror-feature-during-crashkernel
+++ a/mm/mm_init.c
@@ -26,6 +26,7 @@
#include <linux/pgtable.h>
#include <linux/swap.h>
#include <linux/cma.h>
+#include <linux/crash_dump.h>
#include "internal.h"
#include "slab.h"
#include "shuffle.h"
@@ -381,6 +382,11 @@ static void __init find_zone_movable_pfn
goto out;
}
+ if (is_kdump_kernel()) {
+ pr_warn("The system is under kdump, ignore kernelcore=mirror.\n");
+ goto out;
+ }
+
for_each_mem_region(r) {
if (memblock_is_mirror(r))
continue;
_
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
efi-disable-mirror-feature-during-crashkernel.patch
The patch titled
Subject: kexec: do syscore_shutdown() in kernel_kexec
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kexec-do-syscore_shutdown-in-kernel_kexec.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: James Gowans <jgowans(a)amazon.com>
Subject: kexec: do syscore_shutdown() in kernel_kexec
Date: Wed, 13 Dec 2023 08:40:04 +0200
syscore_shutdown() runs driver and module callbacks to get the system into
a state where it can be correctly shut down. In commit 6f389a8f1dd2 ("PM
/ reboot: call syscore_shutdown() after disable_nonboot_cpus()")
syscore_shutdown() was removed from kernel_restart_prepare() and hence got
(incorrectly?) removed from the kexec flow. This was innocuous until
commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to
hook restart/shutdown") changed the way that KVM registered its shutdown
callbacks, switching from reboot notifiers to syscore_ops.shutdown. As
syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run
and virtualisation is left enabled on the boot CPU which results in triple
faults when switching to the new kernel on Intel x86 VT-x with VMXE
enabled.
Fix this by adding syscore_shutdown() to the kexec sequence. In terms of
where to add it, it is being added after migrating the kexec task to the
boot CPU, but before APs are shut down. It is not totally clear if this
is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call
syscore_shutdown() after disable_nonboot_cpus()") it is stated that
"syscore_ops operations should be carried with one CPU on-line and
interrupts disabled." APs are only offlined later in machine_shutdown(),
so this syscore_shutdown() is being run while APs are still online. This
seems to be the correct place as it matches where syscore_shutdown() is
run in the reboot and halt flows - they also run it before APs are shut
down. The assumption is that the commit message in commit 6f389a8f1dd2
("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is
no longer valid.
KVM has been discussed here as it is what broke loudly by not having
syscore_shutdown() in kexec, but this change impacts more than just KVM;
all drivers/modules which register a syscore_ops.shutdown callback will
now be invoked in the kexec flow. Looking at some of them like x86 MCE it
is probably more correct to also shut these down during kexec.
Maintainers of all drivers which use syscore_ops.shutdown are added on CC
for visibility. They are:
arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown,
arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown,
arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown,
drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown,
drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown,
drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown,
drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown,
kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown,
virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
This has been tested by doing a kexec on x86_64 and aarch64.
Link: https://lkml.kernel.org/r/20231213064004.2419447-1-jgowans@amazon.com
Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")
Signed-off-by: James Gowans <jgowans(a)amazon.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Chen-Yu Tsai <wens(a)csie.org>
Cc: Jernej Skrabec <jernej.skrabec(a)gmail.com>
Cc: Samuel Holland <samuel(a)sholland.org>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: Sebastian Reichel <sre(a)kernel.org>
Cc: Orson Zhai <orsonzhai(a)gmail.com>
Cc: Alexander Graf <graf(a)amazon.de>
Cc: Jan H. Schoenherr <jschoenh(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_core.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/kexec_core.c~kexec-do-syscore_shutdown-in-kernel_kexec
+++ a/kernel/kexec_core.c
@@ -1257,6 +1257,7 @@ int kernel_kexec(void)
kexec_in_progress = true;
kernel_restart_prepare("kexec reboot");
migrate_to_reboot_cpu();
+ syscore_shutdown();
/*
* migrate_to_reboot_cpu() disables CPU hotplug assuming that
_
Patches currently in -mm which might be from jgowans(a)amazon.com are
kexec-do-syscore_shutdown-in-kernel_kexec.patch
The patch titled
Subject: fs/proc/task_mmu: move mmu notification mechanism inside mm lock
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-proc-task_mmu-move-mmu-notification-mechanism-inside-mm-lock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: fs/proc/task_mmu: move mmu notification mechanism inside mm lock
Date: Tue, 9 Jan 2024 16:24:42 +0500
Move mmu notification mechanism inside mm lock to prevent race condition
in other components which depend on it. The notifier will invalidate
memory range. Depending upon the number of iterations, different memory
ranges would be invalidated.
The following warning would be removed by this patch:
WARNING: CPU: 0 PID: 5067 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 kvm_mmu_notifier_change_pte+0x860/0x960 arch/x86/kvm/../../../virt/kvm/kvm_main.c:734
There is no behavioural and performance change with this patch when
there is no component registered with the mmu notifier.
Link: https://lkml.kernel.org/r/20240109112445.590736-1-usama.anjum@collabora.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reported-by: syzbot+81227d2bd69e9dedb802(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000f6d051060c6785bc@google.com/
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Micha�� Miros��aw <mirq-linux(a)rere.qmqm.pl>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-move-mmu-notification-mechanism-inside-mm-lock
+++ a/fs/proc/task_mmu.c
@@ -2448,13 +2448,6 @@ static long do_pagemap_scan(struct mm_st
if (ret)
return ret;
- /* Protection change for the range is going to happen. */
- if (p.arg.flags & PM_SCAN_WP_MATCHING) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
- mm, p.arg.start, p.arg.end);
- mmu_notifier_invalidate_range_start(&range);
- }
-
for (walk_start = p.arg.start; walk_start < p.arg.end;
walk_start = p.arg.walk_end) {
long n_out;
@@ -2467,8 +2460,20 @@ static long do_pagemap_scan(struct mm_st
ret = mmap_read_lock_killable(mm);
if (ret)
break;
+
+ /* Protection change for the range is going to happen. */
+ if (p.arg.flags & PM_SCAN_WP_MATCHING) {
+ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
+ mm, walk_start, p.arg.end);
+ mmu_notifier_invalidate_range_start(&range);
+ }
+
ret = walk_page_range(mm, walk_start, p.arg.end,
&pagemap_scan_ops, &p);
+
+ if (p.arg.flags & PM_SCAN_WP_MATCHING)
+ mmu_notifier_invalidate_range_end(&range);
+
mmap_read_unlock(mm);
n_out = pagemap_scan_flush_buffer(&p);
@@ -2494,9 +2499,6 @@ static long do_pagemap_scan(struct mm_st
if (pagemap_scan_writeback_args(&p.arg, uarg))
ret = -EFAULT;
- if (p.arg.flags & PM_SCAN_WP_MATCHING)
- mmu_notifier_invalidate_range_end(&range);
-
kfree(p.vec_buf);
return ret;
}
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
fs-proc-task_mmu-move-mmu-notification-mechanism-inside-mm-lock.patch
Move mmu notification mechanism inside mm lock to prevent race condition
in other components which depend on it. The notifier will invalidate
memory range. Depending upon the number of iterations, different memory
ranges would be invalidated.
The following warning would be removed by this patch:
WARNING: CPU: 0 PID: 5067 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 kvm_mmu_notifier_change_pte+0x860/0x960 arch/x86/kvm/../../../virt/kvm/kvm_main.c:734
There is no behavioural and performance change with this patch when
there is no component registered with the mmu notifier.
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Reported-by: syzbot+81227d2bd69e9dedb802(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000f6d051060c6785bc@google.com/
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
fs/proc/task_mmu.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 62b16f42d5d2..56c2e7357494 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2448,13 +2448,6 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
if (ret)
return ret;
- /* Protection change for the range is going to happen. */
- if (p.arg.flags & PM_SCAN_WP_MATCHING) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
- mm, p.arg.start, p.arg.end);
- mmu_notifier_invalidate_range_start(&range);
- }
-
for (walk_start = p.arg.start; walk_start < p.arg.end;
walk_start = p.arg.walk_end) {
long n_out;
@@ -2467,8 +2460,20 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
ret = mmap_read_lock_killable(mm);
if (ret)
break;
+
+ /* Protection change for the range is going to happen. */
+ if (p.arg.flags & PM_SCAN_WP_MATCHING) {
+ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
+ mm, walk_start, p.arg.end);
+ mmu_notifier_invalidate_range_start(&range);
+ }
+
ret = walk_page_range(mm, walk_start, p.arg.end,
&pagemap_scan_ops, &p);
+
+ if (p.arg.flags & PM_SCAN_WP_MATCHING)
+ mmu_notifier_invalidate_range_end(&range);
+
mmap_read_unlock(mm);
n_out = pagemap_scan_flush_buffer(&p);
@@ -2494,9 +2499,6 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
if (pagemap_scan_writeback_args(&p.arg, uarg))
ret = -EFAULT;
- if (p.arg.flags & PM_SCAN_WP_MATCHING)
- mmu_notifier_invalidate_range_end(&range);
-
kfree(p.vec_buf);
return ret;
}
--
2.42.0