From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
Commit 73a6e474cb37 ("mm: memmap_init: iterate over
memblock regions rather that check each PFN") exposed several issues with
the memory map initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the problem was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2] and after a long discussion
between us [3] I think these patches are the proper fix.
v2 changes:
* added patch that adds all regions in memblock.reserved that do not
overlap with memblock.memory to memblock.memory in the beginning of
free_area_init()
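
The v2 change above can be illustrated with a small userspace model. This is only a sketch, not the code from the patch: struct region, struct region_list, list_insert() and add_uncovered() are hypothetical names, the lists are assumed to be sorted and non-overlapping, and a reserved range that is only partly covered is assumed to contribute just its uncovered part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 32

/* Half-open physical range [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical stand-in for a memblock type: sorted, non-overlapping. */
struct region_list {
	size_t cnt;
	struct region r[MAX_REGIONS];
};

/* Insert [base, base + size), keeping the list sorted by base address. */
static void list_insert(struct region_list *l, uint64_t base, uint64_t size)
{
	size_t i = l->cnt;

	assert(l->cnt < MAX_REGIONS);
	while (i > 0 && l->r[i - 1].base > base) {
		l->r[i] = l->r[i - 1];
		i--;
	}
	l->r[i].base = base;
	l->r[i].size = size;
	l->cnt++;
}

/*
 * Add to @mem every part of every range in @res that no region in @mem
 * covers. Returns the number of ranges added.
 */
static size_t add_uncovered(struct region_list *mem,
			    const struct region_list *res)
{
	struct region_list holes = { 0 };
	size_t i, j;

	for (i = 0; i < res->cnt; i++) {
		uint64_t cur = res->r[i].base;
		uint64_t end = cur + res->r[i].size;

		for (j = 0; j < mem->cnt && cur < end; j++) {
			uint64_t mbase = mem->r[j].base;
			uint64_t mend = mbase + mem->r[j].size;

			if (mend <= cur)
				continue;	/* entirely below the cursor */
			if (mbase >= end)
				break;		/* entirely above this range */
			if (mbase > cur)
				list_insert(&holes, cur, mbase - cur);
			cur = mend;
		}
		if (cur < end)
			list_insert(&holes, cur, end - cur);
	}

	/* Apply the additions after the scan so @mem is stable while iterating. */
	for (i = 0; i < holes.cnt; i++)
		list_insert(mem, holes.r[i].base, holes.r[i].size);

	return holes.cnt;
}
```

Unlike memblock, this model does not merge adjacent regions after insertion; it only shows which ranges would have to be added.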
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun…
Mike Rapoport (2):
mm: memblock: enforce overlap of memory.memblock and memory.reserved
mm: fix initialization of struct page for holes in memory layout
include/linux/memblock.h | 1 +
mm/memblock.c | 24 ++++++
mm/page_alloc.c | 159 ++++++++++++++++++---------------------
3 files changed, 97 insertions(+), 87 deletions(-)
--
2.28.0
bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
makes sense to merge the current bfq queue with the in-service queue.
However, if the in-service queue is freshly scheduled and hasn't dispatched
any requests yet, bfqd->in_serv_last_pos is stale and contains the value
from the previously scheduled bfq queue, which can result in a bogus
decision that the two queues should be merged. This bug can be observed,
for example, with the following fio jobfile:
[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read
[reader]
numjobs=4
directory=/mnt
where the 4 processes end up in one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with the slice_idle=1ms setting).
Fix the problem by invalidating bfqd->in_serv_last_pos when switching
the in-service queue.
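
The effect of the stale position can be reproduced in a toy userspace model. This is a sketch under loud assumptions, not bfq code: struct toy_sched, struct toy_queue, close_to(), would_merge() and the CLOSE_THR threshold are all hypothetical, and the real merge decision in bfq checks more conditions than proximity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical closeness threshold in sectors; not the kernel's value. */
#define CLOSE_THR 8192

struct toy_queue {
	int id;
};

struct toy_sched {
	struct toy_queue *in_service_queue;
	uint64_t in_serv_last_pos;	/* end sector of the last dispatch */
};

static bool close_to(uint64_t a, uint64_t b)
{
	uint64_t d = a > b ? a - b : b - a;

	return d <= CLOSE_THR;
}

/* Switch the in-service queue; @reset_pos models the one-line fix. */
static void set_in_service(struct toy_sched *s, struct toy_queue *q,
			   bool reset_pos)
{
	s->in_service_queue = q;
	if (reset_pos)
		s->in_serv_last_pos = 0;
}

/* Record a dispatch from the current in-service queue. */
static void dispatch(struct toy_sched *s, uint64_t end_sector)
{
	assert(s->in_service_queue != NULL);
	s->in_serv_last_pos = end_sector;
}

/* Would a request at @sector from @q look mergeable with the in-service queue? */
static bool would_merge(const struct toy_sched *s, const struct toy_queue *q,
			uint64_t sector)
{
	return s->in_service_queue != NULL && s->in_service_queue != q &&
	       close_to(sector, s->in_serv_last_pos);
}
```

In this model, if queue A dispatches at sector 1000000 and queue B is then put in service without a reset, a request from a third queue at sector 1000100 looks close to B even though B has not dispatched anything. With the reset, the same request is no longer considered close. Note that 0 is still an ordinary position in the model, so requests near sector 0 would continue to match.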
Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
block/bfq-iosched.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3d411716d7ee..50017275915f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2937,6 +2937,7 @@ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
 	}
 
 	bfqd->in_service_queue = bfqq;
+	bfqd->in_serv_last_pos = 0;
 }
 
 /*
--
2.16.4
From: Fenghua Yu <fenghua.yu(a)intel.com>
Currently when moving a task to a resource group the PQR_ASSOC MSR
is updated with the new closid and rmid in an added task callback.
If the task is running the work is run as soon as possible. If the
task is not running the work is executed later in the kernel exit path
when the kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running on is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when the
task is scheduled in) and wastes system resources because of the way in
which it is implemented: work to update the PQR_ASSOC register is queued
every time the user writes a task id to the "tasks" file, even if the task
already belongs to the resource group. This can result in multiple
pending work items associated with a single task even if they are all
identical and even though only a single update with the most recent values
is needed. Specifically, if a task is moved between different
resource groups while it is sleeping, only the last move is relevant,
yet a work item is queued during each move.
This unnecessary queueing of work items could result in significant system
resource waste, especially on tasks sleeping for a long time. For example,
as demonstrated by Shakeel Butt in [1] writing the same task id to the
"tasks" file can quickly consume significant memory. The same problem
(wasted system resources) occurs when moving a task between different
resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, the PQR_ASSOC MSR is updated synchronously
right after the new closid and rmid are ready during the task movement,
but only if the task is running. If a moved task is not running, nothing is
done since the PQR_ASSOC MSR will be updated the next time the task is
scheduled. This is the same approach used to update the register when tasks
are moved as part of resource group removal.
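
The intended behaviour can be modelled in userspace as follows. This is a sketch and not the resctrl implementation: struct toy_task, pqr_assoc[], ipi_count, toy_sched_in() and toy_move_task() are hypothetical stand-ins, the running flag stands in for task_curr(), and the direct call replaces the cross-CPU function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

struct toy_task {
	uint32_t closid;
	uint32_t rmid;
	int cpu;		/* CPU the task runs on or last ran on */
	bool running;		/* stands in for task_curr() */
};

/* Per-CPU model of what the PQR_ASSOC MSR would hold. */
struct toy_pqr {
	uint32_t closid;
	uint32_t rmid;
};

static struct toy_pqr pqr_assoc[NR_CPUS];
static unsigned int ipi_count;	/* number of immediate updates issued */

/* Models the schedule-in hook: load the task's ids into the CPU register. */
static void toy_sched_in(const struct toy_task *t)
{
	assert(t->cpu >= 0 && t->cpu < NR_CPUS);
	pqr_assoc[t->cpu].closid = t->closid;
	pqr_assoc[t->cpu].rmid = t->rmid;
}

static void toy_move_task(struct toy_task *t, uint32_t closid, uint32_t rmid)
{
	/* Write the new ids first so any later register update sees them. */
	t->closid = closid;
	t->rmid = rmid;

	/* Only a running task needs an immediate update on its CPU. */
	if (t->running) {
		ipi_count++;
		toy_sched_in(t);
	}
}
```

Moving a sleeping task any number of times costs nothing beyond the two stores in this model; only the values of the last move are loaded when toy_sched_in() eventually runs for it.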
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.…
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Valentin Schneider <valentin.schneider(a)arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
Cc: stable(a)vger.kernel.org
---
V1->V2:
* Add Reviewed-by tags.
* Use task_curr() instead of task_struct->on_cpu (James)
* Fixup subject line (James)
* Remove unnecessary check for non-existent resource group type. (James and Valentin)
* Include barrier() after closid/rmid written. (Valentin)
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 118 ++++++++++---------------
1 file changed, 49 insertions(+), 69 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f3418428682b..c5937a12d731 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,89 +525,69 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
kfree(rdtgrp);
}
-struct task_move_callback {
- struct callback_head work;
- struct rdtgroup *rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
{
- struct task_move_callback *callback;
- struct rdtgroup *rdtgrp;
-
- callback = container_of(head, struct task_move_callback, work);
- rdtgrp = callback->rdtgrp;
-
/*
- * If resource group was deleted before this task work callback
- * was invoked, then assign the task to root group and free the
- * resource group.
+ * If the task is still current on this CPU, update PQR_ASSOC MSR.
+ * Otherwise, the MSR is updated when the task is scheduled in.
*/
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- current->closid = 0;
- current->rmid = 0;
- rdtgroup_remove(rdtgrp);
- }
-
- if (unlikely(current->flags & PF_EXITING))
- goto out;
-
- preempt_disable();
- /* update PQR_ASSOC MSR to make resource group go into effect */
- resctrl_sched_in();
- preempt_enable();
+ if (task == current)
+ resctrl_sched_in();
+}
-out:
- kfree(callback);
+#ifdef CONFIG_SMP
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ if (task_curr(t))
+ smp_call_function_single(task_cpu(t), _update_task_closid_rmid,
+ t, 1);
}
+#else
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ _update_task_closid_rmid(t);
+}
+#endif
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
- struct task_move_callback *callback;
- int ret;
-
- callback = kzalloc(sizeof(*callback), GFP_KERNEL);
- if (!callback)
- return -ENOMEM;
- callback->work.func = move_myself;
- callback->rdtgrp = rdtgrp;
-
/*
- * Take a refcount, so rdtgrp cannot be freed before the
- * callback has been invoked.
+ * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+ * updated by them.
+ *
+ * For ctrl_mon groups, move both closid and rmid.
+ * For monitor groups, can move the tasks only from
+ * their parent CTRL group.
*/
- atomic_inc(&rdtgrp->waitcount);
- ret = task_work_add(tsk, &callback->work, TWA_RESUME);
- if (ret) {
- /*
- * Task is exiting. Drop the refcount and free the callback.
- * No need to check the refcount as the group cannot be
- * deleted before the write function unlocks rdtgroup_mutex.
- */
- atomic_dec(&rdtgrp->waitcount);
- kfree(callback);
- rdt_last_cmd_puts("Task exited\n");
- } else {
- /*
- * For ctrl_mon groups move both closid and rmid.
- * For monitor groups, can move the tasks only from
- * their parent CTRL group.
- */
- if (rdtgrp->type == RDTCTRL_GROUP) {
- tsk->closid = rdtgrp->closid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP) {
+ tsk->closid = rdtgrp->closid;
+ tsk->rmid = rdtgrp->mon.rmid;
+ } else if (rdtgrp->type == RDTMON_GROUP) {
+ if (rdtgrp->mon.parent->closid == tsk->closid) {
tsk->rmid = rdtgrp->mon.rmid;
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- tsk->rmid = rdtgrp->mon.rmid;
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- ret = -EINVAL;
- }
+ } else {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}
}
- return ret;
+
+ /*
+ * Ensure the task's closid and rmid are written before determining if
+ * the task is current that will decide if it will be interrupted.
+ */
+ barrier();
+
+ /*
+ * By now, the task's closid and rmid are set. If the task is current
+ * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+ * group go into effect. If the task is not current, the MSR will be
+ * updated when the task is scheduled in.
+ */
+ update_task_closid_rmid(tsk);
+
+ return 0;
}
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
--
2.26.2
From: Fenghua Yu <fenghua.yu(a)intel.com>
Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. Doing the
move in that case just wastes time, and it can be costly because it may
send an IPI to a different CPU.
Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.
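
The membership test can be sketched as a standalone predicate. This is an illustrative userspace model: enum toy_type, struct toy_group, struct toy_ids and already_in_group() are hypothetical names, not the kernel's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_type { TOY_CTRL_GROUP, TOY_MON_GROUP };

struct toy_group {
	enum toy_type type;
	uint32_t closid;		/* meaningful for control groups */
	uint32_t rmid;
	const struct toy_group *parent;	/* parent control group of a monitor group */
};

struct toy_ids {
	uint32_t closid;
	uint32_t rmid;
};

/* Return true if a task with ids @t already belongs to group @g. */
static bool already_in_group(const struct toy_ids *t,
			     const struct toy_group *g)
{
	if (g->type == TOY_CTRL_GROUP)
		return t->closid == g->closid && t->rmid == g->rmid;

	/* Monitor group: the closid comes from the parent control group. */
	assert(g->parent != NULL);
	return t->rmid == g->rmid && t->closid == g->parent->closid;
}
```

A caller would return early when the predicate is true and only fall through to the closid/rmid update otherwise.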
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Cc: stable(a)vger.kernel.org
---
V1->V2:
* No changes
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c5937a12d731..4042e1eb4f5d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -552,6 +552,13 @@ static void update_task_closid_rmid(struct task_struct *t)
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
+	/* If the task is already in rdtgrp, no need to move the task. */
+	if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
+	     tsk->rmid == rdtgrp->mon.rmid) ||
+	    (rdtgrp->type == RDTMON_GROUP && tsk->rmid == rdtgrp->mon.rmid &&
+	     tsk->closid == rdtgrp->mon.parent->closid))
+		return 0;
+
 	/*
 	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
 	 * updated by them.
--
2.26.2
In 4.x kernels a dst in the DST_OBSOLETE_DEAD state is associated
with the loopback net_device and leads to the loopback neighbour. This
results in an ethernet header with all-zero addresses.
A particularly troubling case involves mac80211 and ath9k.
A packet with an all-zero source MAC address passed to mac80211 will
eventually fail ieee80211_find_sta_by_ifaddr in ath9k (xmit.c).
As a result, ath9k flushes the tx queue (ath_tx_complete_aggr) without
updating the baw (block ack window), which damages the baw logic and
disables transmission.
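
The guard added by the diff below can be sketched in userspace like this. It is a simplified model and not the code in net/core/neighbour.c: struct toy_neigh, struct toy_dst, pick_neigh() and the TOY_DST_OBSOLETE_DEAD value are hypothetical, and the neighbour lookup is reduced to a pointer stored in the dst.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for a dead dst; the value is arbitrary in this model. */
#define TOY_DST_OBSOLETE_DEAD 2

struct toy_neigh {
	unsigned char ha[6];	/* link-layer address */
};

struct toy_dst {
	int obsolete;
	struct toy_neigh *neigh;	/* neighbour a lookup through this dst yields */
};

/*
 * Choose the neighbour used to reinject a queued packet: prefer the one
 * cached by the dst, but never consult a dead dst.
 */
static struct toy_neigh *pick_neigh(struct toy_neigh *n1,
				    const struct toy_dst *dst)
{
	struct toy_neigh *n2 = NULL;

	assert(n1 != NULL);
	if (dst && dst->obsolete != TOY_DST_OBSOLETE_DEAD)
		n2 = dst->neigh;

	return n2 ? n2 : n1;
}
```

With a dead dst whose neighbour carries an all-zero address, pick_neigh() falls back to the original neighbour, so the zero address never reaches the header.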
Signed-off-by: Tong Zhu <zhutong(a)amazon.com>
---
net/core/neighbour.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6e890f51b7d8..e471c32e448f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1271,7 +1271,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 			 * we can reinject the packet there.
 			 */
 			n2 = NULL;
-			if (dst) {
+			if (dst && dst->obsolete != DST_OBSOLETE_DEAD) {
 				n2 = dst_neigh_lookup_skb(dst, skb);
 				if (n2)
 					n1 = n2;
--
2.17.1
This is the start of the stable review cycle for the 5.10.3 release.
There are 40 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 25 Dec 2020 15:05:02 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.3-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.3-rc1
Dae R. Jeong <dae.r.jeong(a)kaist.ac.kr>
md: fix a warning caused by a race between concurrent md_ioctl()s
Anant Thazhemadam <anant.thazhemadam(a)gmail.com>
nl80211: validate key indexes for cfg80211_registered_device
Eric Biggers <ebiggers(a)google.com>
crypto: af_alg - avoid undefined behavior accessing salg_name
Antti Palosaari <crope(a)iki.fi>
media: msi2500: assign SPI bus number dynamically
Anant Thazhemadam <anant.thazhemadam(a)gmail.com>
fs: quota: fix array-index-out-of-bounds bug by passing correct argument to vfs_cleanup_quota_inode()
Jan Kara <jack(a)suse.cz>
quota: Sanity-check quota file headers on load
Peilin Ye <yepeilin.cs(a)gmail.com>
Bluetooth: Fix slab-out-of-bounds read in hci_le_direct_adv_report_evt()
Eric Biggers <ebiggers(a)google.com>
f2fs: prevent creating duplicate encrypted filenames
Eric Biggers <ebiggers(a)google.com>
ext4: prevent creating duplicate encrypted filenames
Eric Biggers <ebiggers(a)google.com>
ubifs: prevent creating duplicate encrypted filenames
Eric Biggers <ebiggers(a)google.com>
fscrypt: add fscrypt_is_nokey_name()
Eric Biggers <ebiggers(a)google.com>
fscrypt: remove kernel-internal constants from UAPI header
Alexey Kardashevskiy <aik(a)ozlabs.ru>
serial_core: Check for port state when tty is in error state
Julian Sax <jsbc(a)gmx.de>
HID: i2c-hid: add Vero K147 to descriptor override
Arnd Bergmann <arnd(a)arndb.de>
scsi: megaraid_sas: Check user-provided offsets
Jack Qiu <jack.qiu(a)huawei.com>
f2fs: init dirty_secmap incorrectly
Chao Yu <chao(a)kernel.org>
f2fs: fix to seek incorrect data offset in inline data file
Suzuki K Poulose <suzuki.poulose(a)arm.com>
coresight: etm4x: Handle TRCVIPCSSCTLR accesses
Suzuki K Poulose <suzuki.poulose(a)arm.com>
coresight: etm4x: Fix accesses to TRCPROCSELR
Suzuki K Poulose <suzuki.poulose(a)arm.com>
coresight: etm4x: Fix accesses to TRCCIDCTLR1
Suzuki K Poulose <suzuki.poulose(a)arm.com>
coresight: etm4x: Fix accesses to TRCVMIDCTLR1
Sai Prakash Ranjan <saiprakash.ranjan(a)codeaurora.org>
coresight: etm4x: Skip setting LPOVERRIDE bit for qcom, skip-power-up
Sai Prakash Ranjan <saiprakash.ranjan(a)codeaurora.org>
coresight: etb10: Fix possible NULL ptr dereference in etb_enable_perf()
Suzuki K Poulose <suzuki.poulose(a)arm.com>
coresight: tmc-etr: Fix barrier packet insertion for perf buffer
Mao Jinlong <jinlmao(a)codeaurora.org>
coresight: tmc-etr: Check if page is valid before dma_map_page()
Sai Prakash Ranjan <saiprakash.ranjan(a)codeaurora.org>
coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()
Krzysztof Kozlowski <krzk(a)kernel.org>
ARM: dts: exynos: fix USB 3.0 pins supply being turned off on Odroid XU
Krzysztof Kozlowski <krzk(a)kernel.org>
ARM: dts: exynos: fix USB 3.0 VBUS control and over-current pins on Exynos5410
Krzysztof Kozlowski <krzk(a)kernel.org>
ARM: dts: exynos: fix roles of USB 3.0 ports on Odroid XU
Fabio Estevam <festevam(a)gmail.com>
usb: chipidea: ci_hdrc_imx: Pass DISABLE_DEVICE_STREAMING flag to imx6ul
Will McVicker <willmcvicker(a)google.com>
USB: gadget: f_rndis: fix bitrate for SuperSpeed and above
Jack Pham <jackp(a)codeaurora.org>
usb: gadget: f_fs: Re-use SS descriptors for SuperSpeedPlus
Will McVicker <willmcvicker(a)google.com>
USB: gadget: f_midi: setup SuperSpeed Plus descriptors
taehyun.cho <taehyun.cho(a)samsung.com>
USB: gadget: f_acm: add support for SuperSpeed Plus
Johan Hovold <johan(a)kernel.org>
USB: serial: option: add interface-number sanity check to flag handling
Dan Carpenter <dan.carpenter(a)oracle.com>
usb: mtu3: fix memory corruption in mtu3_debugfs_regset()
Nicolin Chen <nicoleotsuka(a)gmail.com>
soc/tegra: fuse: Fix index bug in get_process_id
Artem Labazov <123321artyom(a)gmail.com>
exfat: Avoid allocating upcase table using kcalloc()
Andi Kleen <ak(a)linux.intel.com>
x86/split-lock: Avoid returning with interrupts enabled
Thierry Reding <treding(a)nvidia.com>
net: ipconfig: Avoid spurious blank lines in boot log
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/exynos5410-odroidxu.dts | 6 ++-
arch/arm/boot/dts/exynos5410-pinctrl.dtsi | 28 ++++++++++++
arch/arm/boot/dts/exynos5410.dtsi | 4 ++
arch/x86/kernel/traps.c | 3 +-
crypto/af_alg.c | 10 +++--
drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c | 8 ++++
drivers/hwtracing/coresight/coresight-etb10.c | 4 +-
drivers/hwtracing/coresight/coresight-etm4x-core.c | 41 ++++++++++-------
drivers/hwtracing/coresight/coresight-priv.h | 2 +
drivers/hwtracing/coresight/coresight-tmc-etf.c | 4 +-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 4 +-
drivers/md/md.c | 7 ++-
drivers/media/usb/msi2500/msi2500.c | 2 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 16 ++++---
drivers/soc/tegra/fuse/speedo-tegra210.c | 2 +-
drivers/tty/serial/serial_core.c | 4 ++
drivers/usb/chipidea/ci_hdrc_imx.c | 3 +-
drivers/usb/gadget/function/f_acm.c | 2 +-
drivers/usb/gadget/function/f_fs.c | 5 ++-
drivers/usb/gadget/function/f_midi.c | 6 +++
drivers/usb/gadget/function/f_rndis.c | 4 +-
drivers/usb/mtu3/mtu3_debugfs.c | 2 +-
drivers/usb/serial/option.c | 23 +++++++++-
fs/crypto/fscrypt_private.h | 9 ++--
fs/crypto/hooks.c | 5 ++-
fs/crypto/keyring.c | 2 +-
fs/crypto/keysetup.c | 4 +-
fs/crypto/policy.c | 5 ++-
fs/exfat/nls.c | 6 +--
fs/ext4/namei.c | 3 ++
fs/f2fs/f2fs.h | 2 +
fs/f2fs/file.c | 11 +++--
fs/f2fs/segment.c | 2 +-
fs/quota/dquot.c | 2 +-
fs/quota/quota_v2.c | 19 ++++++++
fs/ubifs/dir.c | 17 ++++++--
include/linux/fscrypt.h | 34 +++++++++++++++
include/uapi/linux/fscrypt.h | 5 +--
include/uapi/linux/if_alg.h | 16 +++++++
net/bluetooth/hci_event.c | 12 +++--
net/ipv4/ipconfig.c | 14 +++---
net/wireless/core.h | 2 +
net/wireless/nl80211.c | 7 +--
net/wireless/util.c | 51 ++++++++++++++++++----
45 files changed, 334 insertions(+), 88 deletions(-)