(second try, sending with mailx...)
After 3 days of successfully running 5.4.143 with this patch attached
and no issues, on a production workload (host + vms) of a busy
webserver and mysql database, I request queueing this for a future 5.4
stable, like the 5.10 one requested by Borislav; copying his mail text
in the hope that this is best form.
please queue for 5.4 stable
See https://bugzilla.kernel.org/show_bug.cgi?id=214159 for more info.
---
Commit 3a7956e25e1d7b3c148569e78895e1f3178122a9 upstream.
The kthread_is_per_cpu() construct relies on only being called on
PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the
following usage pattern:
if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
However, as reported by syzcaller, this is broken. The scenario is:
CPU0 CPU1 (running p)
(p->flags & PF_KTHREAD) // true
begin_new_exec()
me->flags &= ~(PF_KTHREAD|...);
kthread_is_per_cpu(p)
to_kthread(p)
WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT*
Introduce __to_kthread() that omits the WARN and is sure to check both
values.
Use this to remove the problematic pattern for kthread_is_per_cpu()
and fix a number of other kthread_*() functions that have similar
issues but are currently not used in ways that would expose the
problem.
Notably kthread_func() is only ever called on 'current', while
kthread_probe_data() is only used for PF_WQ_WORKER, which implies the
task is from kthread_create*().
Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Patrick Schaaf <bof(a)bof.de>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b2bac5d929d2..22750a8af83e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -76,6 +76,25 @@ static inline struct kthread *to_kthread(struct task_struct *k)
return (__force void *)k->set_child_tid;
}
+/*
+ * Variant of to_kthread() that doesn't assume @p is a kthread.
+ *
+ * Per construction; when:
+ *
+ * (p->flags & PF_KTHREAD) && p->set_child_tid
+ *
+ * the task is both a kthread and struct kthread is persistent. However
+ * PF_KTHREAD on it's own is not, kernel_thread() can exec() (See umh.c and
+ * begin_new_exec()).
+ */
+static inline struct kthread *__to_kthread(struct task_struct *p)
+{
+ void *kthread = (__force void *)p->set_child_tid;
+ if (kthread && !(p->flags & PF_KTHREAD))
+ kthread = NULL;
+ return kthread;
+}
+
void free_kthread_struct(struct task_struct *k)
{
struct kthread *kthread;
@@ -176,10 +195,11 @@ void *kthread_data(struct task_struct *task)
*/
void *kthread_probe_data(struct task_struct *task)
{
- struct kthread *kthread = to_kthread(task);
+ struct kthread *kthread = __to_kthread(task);
void *data = NULL;
- probe_kernel_read(&data, &kthread->data, sizeof(data));
+ if (kthread)
+ probe_kernel_read(&data, &kthread->data, sizeof(data));
return data;
}
@@ -490,9 +510,9 @@ void kthread_set_per_cpu(struct task_struct *k, int cpu)
set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
}
-bool kthread_is_per_cpu(struct task_struct *k)
+bool kthread_is_per_cpu(struct task_struct *p)
{
- struct kthread *kthread = to_kthread(k);
+ struct kthread *kthread = __to_kthread(p);
if (!kthread)
return false;
@@ -1272,11 +1292,9 @@ EXPORT_SYMBOL(kthread_destroy_worker);
*/
void kthread_associate_blkcg(struct cgroup_subsys_state *css)
{
- struct kthread *kthread;
+ struct kthread *kthread = __to_kthread(current);
+
- if (!(current->flags & PF_KTHREAD))
- return;
- kthread = to_kthread(current);
if (!kthread)
return;
@@ -1298,13 +1316,10 @@ EXPORT_SYMBOL(kthread_associate_blkcg);
*/
struct cgroup_subsys_state *kthread_blkcg(void)
{
- struct kthread *kthread;
+ struct kthread *kthread = __to_kthread(current);
- if (current->flags & PF_KTHREAD) {
- kthread = to_kthread(current);
- if (kthread)
- return kthread->blkcg_css;
- }
+ if (kthread)
+ return kthread->blkcg_css;
return NULL;
}
EXPORT_SYMBOL(kthread_blkcg);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74cb20f32f72..87d9fad9d01d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7301,7 +7301,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
return 0;
/* Disregard pcpu kthreads; they are where they need to be. */
- if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+ if (kthread_is_per_cpu(p))
return 0;
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
[ Upstream commit a49145acfb975d921464b84fe00279f99827d816 ]
A fb_ioctl() FBIOPUT_VSCREENINFO call with invalid xres setting
or yres setting in struct fb_var_screeninfo will result in a
KASAN: vmalloc-out-of-bounds failure in bitfill_aligned() as
the margins are being cleared. The margins are cleared in
chunks and if the xres setting or yres setting is a value of
zero upto the chunk size, the failure will occur.
Add a margin check to validate xres and yres settings.
Note that, this patch needs special handling to backport it to linux
kernel 4.19, 4.14, 4.9, 4.4.
Signed-off-by: George Kennedy <george.kennedy(a)oracle.com>
Reported-by: syzbot+e5fd3e65515b48c02a30(a)syzkaller.appspotmail.com
Reviewed-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: Dhaval Giani <dhaval.giani(a)oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie(a)samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1594149963-13801-1-git-send-e…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/video/fbdev/core/fbmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 84845275dbef..de04c097d67c 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -991,6 +991,10 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo *var)
goto done;
}
+ /* bitfill_aligned() assumes that it's at least 8x8 */
+ if (var->xres < 8 || var->yres < 8)
+ return -EINVAL;
+
ret = info->fbops->fb_check_var(var, info);
if (ret)
--
2.25.1
of_parse_thermal_zones() parses the thermal-zones node and registers a
thermal_zone device for each subnode. However, if a thermal zone is
consuming a thermal sensor and that thermal sensor device hasn't probed
yet, an attempt to set trip_point_*_temp for that thermal zone device
can cause a NULL pointer dereference. Fix it.
console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp
...
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
Call trace:
of_thermal_set_trip_temp+0x40/0xc4
trip_point_temp_store+0xc0/0x1dc
dev_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc
do_el0_svc+0x28/0xa0
el0_svc+0x14/0x24
el0_sync_handler+0x88/0xec
el0_sync+0x1c0/0x200
Cc: stable(a)vger.kernel.org
Suggested-by: David Collins <quic_collinsd(a)quicinc.com>
Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram(a)quicinc.com>
---
drivers/thermal/thermal_of.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 6379f26..ba53252 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -301,7 +301,7 @@ static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
if (trip >= data->ntrips || trip < 0)
return -EDOM;
- if (data->ops->set_trip_temp) {
+ if (data->ops && data->ops->set_trip_temp) {
int ret;
ret = data->ops->set_trip_temp(data->sensor_data, trip, temp);
--
2.7.4
The patch titled
Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
has been removed from the -mm tree. Its filename was
hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
Guillaume Morin reported hitting the following WARNING followed by GPF or
NULL pointer deference either in cgroups_destroy or in the kill_css path.:
percpu ref (css_release) <= 0 (-1) after switching to atomic
WARNING: CPU: 23 PID: 130 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x127/0x130
CPU: 23 PID: 130 Comm: ksoftirqd/23 Kdump: loaded Tainted: G O 5.10.60 #1
RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x127/0x130
Call Trace:
rcu_core+0x30f/0x530
rcu_core_si+0xe/0x10
__do_softirq+0x103/0x2a2
? sort_range+0x30/0x30
run_ksoftirqd+0x2b/0x40
smpboot_thread_fn+0x11a/0x170
kthread+0x10a/0x140
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x22/0x30
Upon further examination, it was discovered that the css structure was
associated with hugetlb reservations.
For private hugetlb mappings the vma points to a reserve map that contains
a pointer to the css. At mmap time, reservations are set up and a
reference to the css is taken. This reference is dropped in the vma close
operation; hugetlb_vm_op_close. However, if a vma is split no additional
reference to the css is taken yet hugetlb_vm_op_close will be called twice
for the split vma resulting in an underflow.
Fix by taking another reference in hugetlb_vm_op_open. Note that the
reference is only taken for the owner of the reserve map. In the more
common fork case, the pointer to the reserve map is cleared for non-owning
vmas.
Link: https://lkml.kernel.org/r/20210830215015.155224-1-mike.kravetz@oracle.com
Fixes: e9fe92ae0cd2 ("hugetlb_cgroup: add reservation accounting for
private mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Guillaume Morin <guillaume(a)morinfr.org>
Suggested-by: Guillaume Morin <guillaume(a)morinfr.org>
Tested-by: Guillaume Morin <guillaume(a)morinfr.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb_cgroup.h | 12 ++++++++++++
mm/hugetlb.c | 4 +++-
2 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb_cgroup.h~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/include/linux/hugetlb_cgroup.h
@@ -121,6 +121,13 @@ static inline void hugetlb_cgroup_put_rs
css_put(&h_cg->css);
}
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+ if (resv_map->css)
+ css_get(resv_map->css);
+}
+
extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr);
extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@@ -199,6 +206,11 @@ static inline void hugetlb_cgroup_put_rs
{
}
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+}
+
static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr)
{
--- a/mm/hugetlb.c~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/mm/hugetlb.c
@@ -4106,8 +4106,10 @@ static void hugetlb_vm_op_open(struct vm
* after this open call completes. It is therefore safe to take a
* new reference here without additional locking.
*/
- if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+ if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+ resv_map_dup_hugetlb_cgroup_uncharge_info(resv);
kref_get(&resv->refs);
+ }
}
static void hugetlb_vm_op_close(struct vm_area_struct *vma)
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: mm: fix panic caused by __page_handle_poison()
has been removed from the -mm tree. Its filename was
mm-fix-panic-caused-by-__page_handle_poison.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michael Wang <yun.wang(a)linux.alibaba.com>
Subject: mm: fix panic caused by __page_handle_poison()
In commit 510d25c92ec4 ("mm/hwpoison: disable pcp for
page_handle_poison()"), __page_handle_poison() was introduced, and if we
mark:
RET_A = dissolve_free_huge_page();
RET_B = take_page_off_buddy();
then __page_handle_poison was supposed to return TRUE When RET_A == 0 &&
RET_B == TRUE
But since it failed to take care the case when RET_A is -EBUSY or -ENOMEM,
and just return the ret as a bool which actually become TRUE, it break the
original logic.
The following result is a huge page in freelist but was
referenced as poisoned, and lead into the final panic:
kernel BUG at mm/internal.h:95!
invalid opcode: 0000 [#1] SMP PTI
skip...
RIP: 0010:set_page_refcounted mm/internal.h:95 [inline]
RIP: 0010:remove_hugetlb_page+0x23c/0x240 mm/hugetlb.c:1371
skip...
Call Trace:
remove_pool_huge_page+0xe4/0x110 mm/hugetlb.c:1892
return_unused_surplus_pages+0x8d/0x150 mm/hugetlb.c:2272
hugetlb_acct_memory.part.91+0x524/0x690 mm/hugetlb.c:4017
This patch replaces 'bool' with 'int' to handle RET_A correctly.
Link: https://lkml.kernel.org/r/61782ac6-1e8a-4f6f-35e6-e94fce3b37f5@linux.alibab…
Fixes: 510d25c92ec4 ("mm/hwpoison: disable pcp for page_handle_poison()")
Signed-off-by: Michael Wang <yun.wang(a)linux.alibaba.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Abaci <abaci(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory-failure.c~mm-fix-panic-caused-by-__page_handle_poison
+++ a/mm/memory-failure.c
@@ -68,7 +68,7 @@ atomic_long_t num_poisoned_pages __read_
static bool __page_handle_poison(struct page *page)
{
- bool ret;
+ int ret;
zone_pcp_disable(page_zone(page));
ret = dissolve_free_huge_page(page);
@@ -76,7 +76,7 @@ static bool __page_handle_poison(struct
ret = take_page_off_buddy(page);
zone_pcp_enable(page_zone(page));
- return ret;
+ return ret > 0;
}
static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
_
Patches currently in -mm which might be from yun.wang(a)linux.alibaba.com are
Changes since v1* [1]:
- Rearrange setters to be next to getters (Jonathan)
- Fix endian bug in nsl_set_slot() (kbuild robot)
- Return NULL instead of !name (Jonathan)
- Use {import,export}_uuid() where UUIDs are used in external interface
structures (Andy)
- Fix uuid_to_nvdimm_class() to be static (kbuild robot)
- Fixup changelog to note uuid copying fixups (Jonathan)
- Fix the broken nlabel/nrange confusion for CXL labels (Jonathan)
- Add a dedicated nlabel validation helper
- Add nrange helpers for CXL
- Introduce __mock to fix unnecessary global symbols (kbuild robot)
- Include core.h to fix some missing prototype warnings (kbuild robot)
- Fix excessive stack usage from devm_cxl_add_decoder() (kbuild robot)
- Add spec reference for namespace label fields (Jonathan)
- Fix uninitialized variable use in cxl_nvdimm_probe() (kbuild robot)
- Move cxl region definition to its own patch for readability (Jonathan)
- Move exclusive command validation to cxl_validate_cmd_from_user() (Ben)
- Fix exclusive command locking (Ben)
- Fold in Alison's acpi_pci_find_root() fix and rebase (Alison)
- Rebase on 0day-induced fixups of the baseline
[1]: https://lore.kernel.org/r/162854806653.1980150.3354618413963083778.stgit@dw…
Note that there were some one-off direct replies marked v2, but now this
set supersedes those.
---
Changed or new(*) patches since v1 are:
[ PATCH v3 03/28] libnvdimm/labels: Introduce label setter helpers
[ PATCH v3 09/28] libnvdimm/labels: Add address-abstraction uuid definitions
[ PATCH v3 10/28] libnvdimm/labels: Add uuid helpers
[*PATCH v3 11/28] libnvdimm/label: Add a helper for nlabel validation
[*PATCH v3 12/28] libnvdimm/labels: Introduce the concept of multi-range namespace labels
[*PATCH v3 13/28] libnvdimm/label: Define CXL region labels
[ PATCH v3 14/28] libnvdimm/labels: Introduce CXL labels
[ PATCH v3 17/28] cxl/mbox: Move mailbox and other non-PCI specific infrastructure to the core
[ PATCH v3 20/28] cxl/mbox: Add exclusive kernel command support
[ PATCH v3 21/28] cxl/pmem: Translate NVDIMM label commands to CXL label commands
[ PATCH v3 22/28] cxl/pmem: Add support for multiple nvdimm-bridge objects
[*PATCH v3 23/28] cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge ports
[ PATCH v3 24/28] tools/testing/cxl: Introduce a mocked-up CXL port hierarchy
[ PATCH v3 27/28] tools/testing/cxl: Introduce a mock memory device + driver
[*PATCH v3 28/28] cxl/core: Split decoder setup into alloc + add
---
As mentioned in patch 24 in this series the response of upstream QEMU
community to CXL device emulation has been underwhelming to date. Even
if that picked up it still results in a situation where new driver
features and new test capabilities for those features are split across
multiple repositories.
The "nfit_test" approach of mocking up platform resources via an
external test module continues to yield positive results catching
regressions early and often. So this attempts to repeat that success
with a "cxl_test" module to inject custom crafted topologies and command
responses into the CXL subsystem's sysfs and ioctl UAPIs.
The first target for cxl_test to verify is the integration of CXL with
LIBNVDIMM and the new support for the CXL namespace label + region-label
format. The first 14 patches introduce support for the new label format.
The next 9 patches rework the CXL PCI driver and to move more common
infrastructure into the core for the unit test environment to reuse. The
largest change here is disconnecting the mailbox command processing
infrastructure from the PCI specific transport. The unit test
environment replaces the PCI transport with a custom backend with mocked
responses to command requests.
Patch 24 introduces just enough mocked functionality for the cxl_acpi
driver to load against cxl_test resources. Patch 21 fixes the first bug
discovered by this framework, namely that HDM decoder target list maps
were not being filled out.
Finally patches 26 and 27 introduce a cxl_test representation of memory
expander devices. In this initial implementation these memory expander
targets implement just enough command support to pass the basic driver
init sequence and enable label command passthrough to LIBNVDIMM.
The topology of cxl_test includes:
- (4) platform fixed memory windows. One each of a x1-volatile,
x4-volatile, x1-persistent, and x4-persistent.
- (4) Host bridges each with (2) root ports
- (8) CXL memory expanders, one for each root port
- Each memory expander device supports the GET_SUPPORTED_LOGS, GET_LOG,
IDENTIFY, GET_LSA, and SET_LSA commands.
Going forward the expectation is that where possible new UAPI visible
subsystem functionality comes with cxl_test emulation of the same.
The build process for cxl_test is:
make M=tools/testing/cxl
make M=tools/testing/cxl modules_install
The implementation methodology of the test module is the same as
nfit_test where the bulk of the emulation comes from replacing symbols
that cxl_acpi and the cxl_core import with mocked implementation of
those symbols. See the "--wrap=" lines in tools/testing/cxl/Kbuild. Some
symbols need to be replaced, but are local to the modules like
match_add_root_ports(). In those cases the local symbol is marked __weak
(via __mock) with a strong implementation coming from
tools/testing/cxl/. The goal being to be minimally invasive to
production code paths.
---
Alison Schofield (1):
cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge ports
Dan Williams (27):
libnvdimm/labels: Introduce getters for namespace label fields
libnvdimm/labels: Add isetcookie validation helper
libnvdimm/labels: Introduce label setter helpers
libnvdimm/labels: Add a checksum calculation helper
libnvdimm/labels: Add blk isetcookie set / validation helpers
libnvdimm/labels: Add blk special cases for nlabel and position helpers
libnvdimm/labels: Add type-guid helpers
libnvdimm/labels: Add claim class helpers
libnvdimm/labels: Add address-abstraction uuid definitions
libnvdimm/labels: Add uuid helpers
libnvdimm/label: Add a helper for nlabel validation
libnvdimm/labels: Introduce the concept of multi-range namespace labels
libnvdimm/label: Define CXL region labels
libnvdimm/labels: Introduce CXL labels
cxl/pci: Make 'struct cxl_mem' device type generic
cxl/mbox: Introduce the mbox_send operation
cxl/mbox: Move mailbox and other non-PCI specific infrastructure to the core
cxl/pci: Use module_pci_driver
cxl/mbox: Convert 'enabled_cmds' to DECLARE_BITMAP
cxl/mbox: Add exclusive kernel command support
cxl/pmem: Translate NVDIMM label commands to CXL label commands
cxl/pmem: Add support for multiple nvdimm-bridge objects
tools/testing/cxl: Introduce a mocked-up CXL port hierarchy
cxl/bus: Populate the target list at decoder create
cxl/mbox: Move command definitions to common location
tools/testing/cxl: Introduce a mock memory device + driver
cxl/core: Split decoder setup into alloc + add
Documentation/driver-api/cxl/memory-devices.rst | 3
drivers/cxl/acpi.c | 143 ++-
drivers/cxl/core/Makefile | 1
drivers/cxl/core/bus.c | 87 +-
drivers/cxl/core/core.h | 8
drivers/cxl/core/mbox.c | 798 +++++++++++++++++
drivers/cxl/core/memdev.c | 115 ++-
drivers/cxl/core/pmem.c | 32 +
drivers/cxl/cxl.h | 45 +
drivers/cxl/cxlmem.h | 188 ++++
drivers/cxl/pci.c | 1051 +----------------------
drivers/cxl/pmem.c | 160 +++-
drivers/nvdimm/btt.c | 11
drivers/nvdimm/btt_devs.c | 14
drivers/nvdimm/core.c | 40 -
drivers/nvdimm/label.c | 361 +++++---
drivers/nvdimm/label.h | 121 ++-
drivers/nvdimm/namespace_devs.c | 204 ++--
drivers/nvdimm/nd-core.h | 5
drivers/nvdimm/nd.h | 289 ++++++
drivers/nvdimm/pfn_devs.c | 2
include/linux/nd.h | 4
tools/testing/cxl/Kbuild | 38 +
tools/testing/cxl/config_check.c | 13
tools/testing/cxl/mock_acpi.c | 109 ++
tools/testing/cxl/mock_pmem.c | 24 +
tools/testing/cxl/test/Kbuild | 10
tools/testing/cxl/test/cxl.c | 587 +++++++++++++
tools/testing/cxl/test/mem.c | 255 ++++++
tools/testing/cxl/test/mock.c | 171 ++++
tools/testing/cxl/test/mock.h | 27 +
31 files changed, 3422 insertions(+), 1494 deletions(-)
create mode 100644 drivers/cxl/core/mbox.c
create mode 100644 tools/testing/cxl/Kbuild
create mode 100644 tools/testing/cxl/config_check.c
create mode 100644 tools/testing/cxl/mock_acpi.c
create mode 100644 tools/testing/cxl/mock_pmem.c
create mode 100644 tools/testing/cxl/test/Kbuild
create mode 100644 tools/testing/cxl/test/cxl.c
create mode 100644 tools/testing/cxl/test/mem.c
create mode 100644 tools/testing/cxl/test/mock.c
create mode 100644 tools/testing/cxl/test/mock.h
base-commit: ceeb0da0a0322bcba4c50ab3cf97fe9a7aa8a2e4
This is the start of the stable review cycle for the 4.9.282 release.
There are 16 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.282-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.282-rc1
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
Sean Christopherson <seanjc(a)google.com>
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Gerd Rausch <gerd.rausch(a)oracle.com>
net/rds: dma_map_sg is entitled to merge entries
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Shreyansh Chouhan <chouhan.shreyansh630(a)gmail.com>
ip_gre: add validation for csum_start
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Tuo Li <islituo(a)gmail.com>
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Guenter Roeck <linux(a)roeck-us.net>
ARC: Fix CONFIG_STACKDEPOT
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/kernel/vmlinux.lds.S | 2 ++
arch/x86/kvm/mmu.c | 11 ++++++++++-
drivers/block/floppy.c | 27 +++++++++++++--------------
drivers/infiniband/hw/hfi1/sdma.c | 9 ++++-----
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 +++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 +++++++----
drivers/usb/dwc3/gadget.c | 16 ++++++++--------
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
net/ipv4/ip_gre.c | 2 ++
net/rds/ib_frmr.c | 4 ++--
18 files changed, 80 insertions(+), 44 deletions(-)
After 3 days of successfully running 5.4.143 with this patch attached
and no issues, on a production workload (host + vms) of a busy
webserver and mysql database, I request queueing this for a future 5.4
stable, like the 5.10 one requested by Borislav; copying his mail text
in the hope that this is best form.
please queue for 5.4 stable
See https://bugzilla.kernel.org/show_bug.cgi?id=214159 for more info.
---
Commit 3a7956e25e1d7b3c148569e78895e1f3178122a9 upstream.
The kthread_is_per_cpu() construct relies on only being called on
PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the
following usage pattern:
if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
However, as reported by syzcaller, this is broken. The scenario is:
CPU0 CPU1 (running p)
(p->flags & PF_KTHREAD) // true
begin_new_exec()
me->flags &= ~(PF_KTHREAD|...);
kthread_is_per_cpu(p)
to_kthread(p)
WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT*
Introduce __to_kthread() that omits the WARN and is sure to check both
values.
Use this to remove the problematic pattern for kthread_is_per_cpu()
and fix a number of other kthread_*() functions that have similar
issues but are currently not used in ways that would expose the
problem.
Notably kthread_func() is only ever called on 'current', while
kthread_probe_data() is only used for PF_WQ_WORKER, which implies the
task is from kthread_create*().
Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Patrick Schaaf <bof(a)bof.de>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b2bac5d929d2..22750a8af83e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -76,6 +76,25 @@ static inline struct kthread *to_kthread(struct
task_struct *k)
return (__force void *)k->set_child_tid;
}
+/*
+ * Variant of to_kthread() that doesn't assume @p is a kthread.
+ *
+ * Per construction; when:
+ *
+ * (p->flags & PF_KTHREAD) && p->set_child_tid
+ *
+ * the task is both a kthread and struct kthread is persistent. However
+ * PF_KTHREAD on it's own is not, kernel_thread() can exec() (See umh.c and
+ * begin_new_exec()).
+ */
+static inline struct kthread *__to_kthread(struct task_struct *p)
+{
+ void *kthread = (__force void *)p->set_child_tid;
+ if (kthread && !(p->flags & PF_KTHREAD))
+ kthread = NULL;
+ return kthread;
+}
+
void free_kthread_struct(struct task_struct *k)
{
struct kthread *kthread;
@@ -176,10 +195,11 @@ void *kthread_data(struct task_struct *task)
*/
void *kthread_probe_data(struct task_struct *task)
{
- struct kthread *kthread = to_kthread(task);
+ struct kthread *kthread = __to_kthread(task);
void *data = NULL;
- probe_kernel_read(&data, &kthread->data, sizeof(data));
+ if (kthread)
+ probe_kernel_read(&data, &kthread->data, sizeof(data));
return data;
}
@@ -490,9 +510,9 @@ void kthread_set_per_cpu(struct task_struct *k, int cpu)
set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
}
-bool kthread_is_per_cpu(struct task_struct *k)
+bool kthread_is_per_cpu(struct task_struct *p)
{
- struct kthread *kthread = to_kthread(k);
+ struct kthread *kthread = __to_kthread(p);
if (!kthread)
return false;
@@ -1272,11 +1292,9 @@ EXPORT_SYMBOL(kthread_destroy_worker);
*/
void kthread_associate_blkcg(struct cgroup_subsys_state *css)
{
- struct kthread *kthread;
+ struct kthread *kthread = __to_kthread(current);
+
- if (!(current->flags & PF_KTHREAD))
- return;
- kthread = to_kthread(current);
if (!kthread)
return;
@@ -1298,13 +1316,10 @@ EXPORT_SYMBOL(kthread_associate_blkcg);
*/
struct cgroup_subsys_state *kthread_blkcg(void)
{
- struct kthread *kthread;
+ struct kthread *kthread = __to_kthread(current);
- if (current->flags & PF_KTHREAD) {
- kthread = to_kthread(current);
- if (kthread)
- return kthread->blkcg_css;
- }
+ if (kthread)
+ return kthread->blkcg_css;
return NULL;
}
EXPORT_SYMBOL(kthread_blkcg);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74cb20f32f72..87d9fad9d01d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7301,7 +7301,7 @@ int can_migrate_task(struct task_struct *p,
struct lb_env *env)
return 0;
/* Disregard pcpu kthreads; they are where they need to be. */
- if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+ if (kthread_is_per_cpu(p))
return 0;
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: firewire: firedtv-avc: fix a buffer overflow in avc_ca_pmt()
Author: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Mon Jun 7 17:23:48 2021 +0200
The bounds checking in avc_ca_pmt() is not strict enough. It should
be checking "read_pos + 4" because it's reading 5 bytes. If the
"es_info_length" is non-zero then it reads a 6th byte so there needs to
be an additional check for that.
I also added checks for the "write_pos". I don't think these are
required because "read_pos" and "write_pos" are tied together so
checking one ought to be enough. But they make the code easier to
understand for me. The check on write_pos is:
if (write_pos + 4 >= sizeof(c->operand) - 4) {
The first "+ 4" is because we're writing 5 bytes and the last " - 4"
is to leave space for the CRC.
The other problem is that "length" can be invalid. It comes from
"data_length" in fdtv_ca_pmt().
Cc: stable(a)vger.kernel.org
Reported-by: Luo Likang <luolikang(a)nsfocus.com>
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/firewire/firedtv-avc.c | 14 +++++++++++---
drivers/media/firewire/firedtv-ci.c | 2 ++
2 files changed, 13 insertions(+), 3 deletions(-)
---
diff --git a/drivers/media/firewire/firedtv-avc.c b/drivers/media/firewire/firedtv-avc.c
index 2bf9467b917d..71991f8638e6 100644
--- a/drivers/media/firewire/firedtv-avc.c
+++ b/drivers/media/firewire/firedtv-avc.c
@@ -1165,7 +1165,11 @@ int avc_ca_pmt(struct firedtv *fdtv, char *msg, int length)
read_pos += program_info_length;
write_pos += program_info_length;
}
- while (read_pos < length) {
+ while (read_pos + 4 < length) {
+ if (write_pos + 4 >= sizeof(c->operand) - 4) {
+ ret = -EINVAL;
+ goto out;
+ }
c->operand[write_pos++] = msg[read_pos++];
c->operand[write_pos++] = msg[read_pos++];
c->operand[write_pos++] = msg[read_pos++];
@@ -1177,13 +1181,17 @@ int avc_ca_pmt(struct firedtv *fdtv, char *msg, int length)
c->operand[write_pos++] = es_info_length >> 8;
c->operand[write_pos++] = es_info_length & 0xff;
if (es_info_length > 0) {
+ if (read_pos >= length) {
+ ret = -EINVAL;
+ goto out;
+ }
pmt_cmd_id = msg[read_pos++];
if (pmt_cmd_id != 1 && pmt_cmd_id != 4)
dev_err(fdtv->device, "invalid pmt_cmd_id %d at stream level\n",
pmt_cmd_id);
- if (es_info_length > sizeof(c->operand) - 4 -
- write_pos) {
+ if (es_info_length > sizeof(c->operand) - 4 - write_pos ||
+ es_info_length > length - read_pos) {
ret = -EINVAL;
goto out;
}
diff --git a/drivers/media/firewire/firedtv-ci.c b/drivers/media/firewire/firedtv-ci.c
index 9363d005e2b6..e0d57e09dab0 100644
--- a/drivers/media/firewire/firedtv-ci.c
+++ b/drivers/media/firewire/firedtv-ci.c
@@ -134,6 +134,8 @@ static int fdtv_ca_pmt(struct firedtv *fdtv, void *arg)
} else {
data_length = msg->msg[3];
}
+ if (data_length > sizeof(msg->msg) - data_pos)
+ return -EINVAL;
return avc_ca_pmt(fdtv, &msg->msg[data_pos], data_length);
}
This series resolves some silent cherry-pick conflicts due to the
prototype of inode_operations::getattr having changed in v5.12, as well
as a conflict in the ubifs patch. Please apply to 5.10-stable.
Eric Biggers (4):
fscrypt: add fscrypt_symlink_getattr() for computing st_size
ext4: report correct st_size for encrypted symlinks
f2fs: report correct st_size for encrypted symlinks
ubifs: report correct st_size for encrypted symlinks
fs/crypto/hooks.c | 44 +++++++++++++++++++++++++++++++++++++++++
fs/ext4/symlink.c | 11 ++++++++++-
fs/f2fs/namei.c | 11 ++++++++++-
fs/ubifs/file.c | 12 ++++++++++-
include/linux/fscrypt.h | 7 +++++++
5 files changed, 82 insertions(+), 3 deletions(-)
--
2.33.0
This series backports some patches that failed to apply to 5.4-stable
due to the prototype of inode_operations::getattr having changed in
v5.12, as well several other conflicts. Please apply to 5.4-stable.
Eric Biggers (4):
fscrypt: add fscrypt_symlink_getattr() for computing st_size
ext4: report correct st_size for encrypted symlinks
f2fs: report correct st_size for encrypted symlinks
ubifs: report correct st_size for encrypted symlinks
fs/crypto/hooks.c | 44 +++++++++++++++++++++++++++++++++++++++++
fs/ext4/symlink.c | 11 ++++++++++-
fs/f2fs/namei.c | 11 ++++++++++-
fs/ubifs/file.c | 12 ++++++++++-
include/linux/fscrypt.h | 7 +++++++
5 files changed, 82 insertions(+), 3 deletions(-)
--
2.33.0
Kernel v5.14 has various changes to optimize unaligned memory accesses,
e.g. commit 0652035a5794 ("asm-generic: unaligned: remove byteshift helpers").
Those changes triggered an unalignment-exception and thus crashed the
bootloader on parisc because the unaligned "output_len" variable now suddenly
was read word-wise while it was read byte-wise in the past.
Fix this issue by declaring the external output_len variable as char which then
forces the compiler to generate byte-accesses.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: Arnd Bergmann <arnd(a)kernel.org>
Cc: John David Anglin <dave.anglin(a)bell.net>
Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
Fixes: 8c031ba63f8f ("parisc: Unbreak bootloader due to gcc-7 optimizations")
Fixes: 0652035a5794 ("asm-generic: unaligned: remove byteshift helpers")
Cc: <stable(a)vger.kernel.org> # v5.14+
---
arch/parisc/boot/compressed/misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/parisc/boot/compressed/misc.c b/arch/parisc/boot/compressed/misc.c
index 2d395998f524..7ee49f5881d1 100644
--- a/arch/parisc/boot/compressed/misc.c
+++ b/arch/parisc/boot/compressed/misc.c
@@ -26,7 +26,7 @@
extern char input_data[];
extern int input_len;
/* output_len is inserted by the linker possibly at an unaligned address */
-extern __le32 output_len __aligned(1);
+extern char output_len;
extern char _text, _end;
extern char _bss, _ebss;
extern char _startcode_end;
--
2.31.1
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: rtl28xxu: fix zero-length control request
Author: Johan Hovold <johan(a)kernel.org>
Date: Wed Jun 23 10:45:21 2021 +0200
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Control transfers without a data stage are treated as OUT requests by
the USB stack and should be using usb_sndctrlpipe(). Failing to do so
will now trigger a warning.
The driver uses a zero-length i2c-read request for type detection so
update the control-request code to use usb_sndctrlpipe() in this case.
Note that actually trying to read the i2c register in question does not
work as the register might not exist (e.g. depending on the demodulator)
as reported by Eero Lehtinen <debiangamer2(a)gmail.com>.
Reported-by: syzbot+faf11bbadc5a372564da(a)syzkaller.appspotmail.com
Reported-by: Eero Lehtinen <debiangamer2(a)gmail.com>
Tested-by: Eero Lehtinen <debiangamer2(a)gmail.com>
Fixes: d0f232e823af ("[media] rtl28xxu: add heuristic to detect chip type")
Cc: stable(a)vger.kernel.org # 4.0
Cc: Antti Palosaari <crope(a)iki.fi>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/usb/dvb-usb-v2/rtl28xxu.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
---
diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
index 0cbdb95f8d35..795a012d4020 100644
--- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
+++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
@@ -37,7 +37,16 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req)
} else {
/* read */
requesttype = (USB_TYPE_VENDOR | USB_DIR_IN);
- pipe = usb_rcvctrlpipe(d->udev, 0);
+
+ /*
+ * Zero-length transfers must use usb_sndctrlpipe() and
+ * rtl28xxu_identify_state() uses a zero-length i2c read
+ * command to determine the chip type.
+ */
+ if (req->size)
+ pipe = usb_rcvctrlpipe(d->udev, 0);
+ else
+ pipe = usb_sndctrlpipe(d->udev, 0);
}
ret = usb_control_msg(d->udev, pipe, 0, requesttype, req->value,
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: v4l2-ioctl: Fix check_ext_ctrls
Author: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Fri Jun 18 14:29:03 2021 +0200
Drivers that do not use the ctrl-framework use this function instead.
Fix the following issues:
- Do not check for multiple classes when getting the DEF_VAL.
- Return -EINVAL for request_api calls
- Default value cannot be changed, return EINVAL as soon as possible.
- Return the right error_idx
[If an error is found when validating the list of controls passed with
VIDIOC_G_EXT_CTRLS, then error_idx shall be set to ctrls->count to
indicate to userspace that no actual hardware was touched.
It would have been much nicer of course if error_idx could point to the
control index that failed the validation, but sadly that's not how the
API was designed.]
Fixes v4l2-compliance:
Control ioctls (Input 0):
warn: v4l2-test-controls.cpp(834): error_idx should be equal to count
warn: v4l2-test-controls.cpp(855): error_idx should be equal to count
fail: v4l2-test-controls.cpp(813): doioctl(node, VIDIOC_G_EXT_CTRLS, &ctrls)
test VIDIOC_G/S/TRY_EXT_CTRLS: FAIL
Buffer ioctls (Input 0):
fail: v4l2-test-buffers.cpp(1994): ret != EINVAL && ret != EBADR && ret != ENOTTY
test Requests: FAIL
Cc: stable(a)vger.kernel.org
Fixes: 6fa6f831f095 ("media: v4l2-ctrls: add core request support")
Suggested-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Reviewed-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/v4l2-core/v4l2-ioctl.c | 60 +++++++++++++++++++++++-------------
1 file changed, 39 insertions(+), 21 deletions(-)
---
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index d4f97ab1b237..dc817f8ba9d7 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -869,7 +869,7 @@ static void v4l_print_default(const void *arg, bool write_only)
pr_cont("driver-specific ioctl\n");
}
-static int check_ext_ctrls(struct v4l2_ext_controls *c, int allow_priv)
+static bool check_ext_ctrls(struct v4l2_ext_controls *c, unsigned long ioctl)
{
__u32 i;
@@ -878,23 +878,41 @@ static int check_ext_ctrls(struct v4l2_ext_controls *c, int allow_priv)
for (i = 0; i < c->count; i++)
c->controls[i].reserved2[0] = 0;
- /* V4L2_CID_PRIVATE_BASE cannot be used as control class
- when using extended controls.
- Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
- is it allowed for backwards compatibility.
- */
- if (!allow_priv && c->which == V4L2_CID_PRIVATE_BASE)
- return 0;
- if (!c->which)
- return 1;
+ switch (c->which) {
+ case V4L2_CID_PRIVATE_BASE:
+ /*
+ * V4L2_CID_PRIVATE_BASE cannot be used as control class
+ * when using extended controls.
+ * Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
+ * is it allowed for backwards compatibility.
+ */
+ if (ioctl == VIDIOC_G_CTRL || ioctl == VIDIOC_S_CTRL)
+ return false;
+ break;
+ case V4L2_CTRL_WHICH_DEF_VAL:
+ /* Default value cannot be changed */
+ if (ioctl == VIDIOC_S_EXT_CTRLS ||
+ ioctl == VIDIOC_TRY_EXT_CTRLS) {
+ c->error_idx = c->count;
+ return false;
+ }
+ return true;
+ case V4L2_CTRL_WHICH_CUR_VAL:
+ return true;
+ case V4L2_CTRL_WHICH_REQUEST_VAL:
+ c->error_idx = c->count;
+ return false;
+ }
+
/* Check that all controls are from the same control class. */
for (i = 0; i < c->count; i++) {
if (V4L2_CTRL_ID2WHICH(c->controls[i].id) != c->which) {
- c->error_idx = i;
- return 0;
+ c->error_idx = ioctl == VIDIOC_TRY_EXT_CTRLS ? i :
+ c->count;
+ return false;
}
}
- return 1;
+ return true;
}
static int check_fmt(struct file *file, enum v4l2_buf_type type)
@@ -2189,7 +2207,7 @@ static int v4l_g_ctrl(const struct v4l2_ioctl_ops *ops,
ctrls.controls = &ctrl;
ctrl.id = p->id;
ctrl.value = p->value;
- if (check_ext_ctrls(&ctrls, 1)) {
+ if (check_ext_ctrls(&ctrls, VIDIOC_G_CTRL)) {
int ret = ops->vidioc_g_ext_ctrls(file, fh, &ctrls);
if (ret == 0)
@@ -2223,7 +2241,7 @@ static int v4l_s_ctrl(const struct v4l2_ioctl_ops *ops,
ctrls.controls = &ctrl;
ctrl.id = p->id;
ctrl.value = p->value;
- if (check_ext_ctrls(&ctrls, 1))
+ if (check_ext_ctrls(&ctrls, VIDIOC_S_CTRL))
return ops->vidioc_s_ext_ctrls(file, fh, &ctrls);
return -EINVAL;
}
@@ -2245,8 +2263,8 @@ static int v4l_g_ext_ctrls(const struct v4l2_ioctl_ops *ops,
vfd, vfd->v4l2_dev->mdev, p);
if (ops->vidioc_g_ext_ctrls == NULL)
return -ENOTTY;
- return check_ext_ctrls(p, 0) ? ops->vidioc_g_ext_ctrls(file, fh, p) :
- -EINVAL;
+ return check_ext_ctrls(p, VIDIOC_G_EXT_CTRLS) ?
+ ops->vidioc_g_ext_ctrls(file, fh, p) : -EINVAL;
}
static int v4l_s_ext_ctrls(const struct v4l2_ioctl_ops *ops,
@@ -2266,8 +2284,8 @@ static int v4l_s_ext_ctrls(const struct v4l2_ioctl_ops *ops,
vfd, vfd->v4l2_dev->mdev, p);
if (ops->vidioc_s_ext_ctrls == NULL)
return -ENOTTY;
- return check_ext_ctrls(p, 0) ? ops->vidioc_s_ext_ctrls(file, fh, p) :
- -EINVAL;
+ return check_ext_ctrls(p, VIDIOC_S_EXT_CTRLS) ?
+ ops->vidioc_s_ext_ctrls(file, fh, p) : -EINVAL;
}
static int v4l_try_ext_ctrls(const struct v4l2_ioctl_ops *ops,
@@ -2287,8 +2305,8 @@ static int v4l_try_ext_ctrls(const struct v4l2_ioctl_ops *ops,
vfd, vfd->v4l2_dev->mdev, p);
if (ops->vidioc_try_ext_ctrls == NULL)
return -ENOTTY;
- return check_ext_ctrls(p, 0) ? ops->vidioc_try_ext_ctrls(file, fh, p) :
- -EINVAL;
+ return check_ext_ctrls(p, VIDIOC_TRY_EXT_CTRLS) ?
+ ops->vidioc_try_ext_ctrls(file, fh, p) : -EINVAL;
}
/*
I'm announcing the release of the 4.9.282 kernel.
All users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/arc/kernel/vmlinux.lds.S | 2 ++
arch/x86/kvm/mmu.c | 11 ++++++++++-
drivers/block/floppy.c | 27 +++++++++++++--------------
drivers/infiniband/hw/hfi1/sdma.c | 9 ++++-----
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 +++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 +++++++----
drivers/usb/dwc3/gadget.c | 16 ++++++++--------
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
net/ipv4/ip_gre.c | 2 ++
net/rds/ib_frmr.c | 4 ++--
18 files changed, 79 insertions(+), 43 deletions(-)
Denis Efremov (1):
Revert "floppy: reintroduce O_NDELAY fix"
George Kennedy (1):
fbmem: add margin check to fb_check_caps()
Gerd Rausch (1):
net/rds: dma_map_sg is entitled to merge entries
Greg Kroah-Hartman (1):
Linux 4.9.282
Guenter Roeck (1):
ARC: Fix CONFIG_STACKDEPOT
Johan Hovold (1):
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Linus Torvalds (1):
vt_kdsetmode: extend console locking
Maxim Kiselev (1):
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Neeraj Upadhyay (1):
vringh: Use wiov->used to check for read/write desc order
Parav Pandit (1):
virtio: Improve vq->broken access to avoid any compiler optimization
Sasha Neftin (1):
e1000e: Fix the max snoop/no-snoop latency for 10M
Sean Christopherson (1):
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
Shreyansh Chouhan (1):
ip_gre: add validation for csum_start
Stefan Mätje (1):
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Thinh Nguyen (1):
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Tuo Li (1):
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Zhengjun Zhang (1):
USB: serial: option: add new VID/PID to support Fibocom FG150
I'm announcing the release of the 4.4.283 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/block/floppy.c | 27 +++++++++++++--------------
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 +++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 +++++++----
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
12 files changed, 51 insertions(+), 27 deletions(-)
Denis Efremov (1):
Revert "floppy: reintroduce O_NDELAY fix"
George Kennedy (1):
fbmem: add margin check to fb_check_caps()
Greg Kroah-Hartman (1):
Linux 4.4.283
Johan Hovold (1):
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Linus Torvalds (1):
vt_kdsetmode: extend console locking
Maxim Kiselev (1):
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Neeraj Upadhyay (1):
vringh: Use wiov->used to check for read/write desc order
Parav Pandit (1):
virtio: Improve vq->broken access to avoid any compiler optimization
Sasha Neftin (1):
e1000e: Fix the max snoop/no-snoop latency for 10M
Stefan Mätje (1):
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Zhengjun Zhang (1):
USB: serial: option: add new VID/PID to support Fibocom FG150
ocfs2_data_convert_worker() is currently dropping any cached acl info
for FILE before down-converting meta lock. It should also drop for DIRECTORY.
Otherwise the second acl lookup returns the cached one (from VFS layer) which
could be already stale.
The problem we are seeing is that the acl changes on one node doesn't get
refreshed on other nodes in the following case:
Node 1 Node 2
-------------- ----------------
getfacl dir1
getfacl dir1 <-- this is OK
setfacl -m u:user1:rwX dir1
getfacl dir1 <-- see the change for user1
getfacl dir1 <-- can't see change for user1
Signed-off-by: Wengang Wang <wen.gang.wang(a)oracle.com>
---
fs/ocfs2/dlmglue.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 50a863fc1779..207ec61569ea 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3933,7 +3933,7 @@ static int ocfs2_data_convert_worker(struct ocfs2_lock_res *lockres,
oi = OCFS2_I(inode);
oi->ip_dir_lock_gen++;
mlog(0, "generation: %u\n", oi->ip_dir_lock_gen);
- goto out;
+ goto out_forget;
}
if (!S_ISREG(inode->i_mode))
@@ -3964,6 +3964,7 @@ static int ocfs2_data_convert_worker(struct ocfs2_lock_res *lockres,
filemap_fdatawait(mapping);
}
+out_forget:
forget_all_cached_acls(inode);
out:
--
2.21.0 (Apple Git-122.2)
sd and parent devices must not be removed as sd_open checks for events
sd_need_revalidate and sd_revalidate_disk traverse the device path
to check for event changes. If during this, e.g. the scsi host is being
removed and its resources freed, this traversal crashes.
Locking with scan_mutex for just a scsi disk open may seem blunt, but there
does not seem to be a more granular option. Also opening /dev/sdX directly
happens rarely enough that this shouldn't cause any issues.
The issue occurred on an older kernel with the following trace:
stack segment: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 121457 Comm: python3 Not tainted 4.14.238hyLinux #1
Hardware name: ASUS All Series/H81M-D, BIOS 0601 02/20/2014
task: ffff888213dbb700 task.stack: ffffc90008c14000
RIP: 0010:kobject_get_path+0x2a/0xe0
...
Call Trace:
kobject_uevent_env+0xe6/0x550
disk_check_events+0x101/0x160
disk_clear_events+0x75/0x100
check_disk_change+0x22/0x60
sd_open+0x70/0x170 [sd_mod]
__blkdev_get+0x3fd/0x4b0
? get_empty_filp+0x57/0x1b0
blkdev_get+0x11b/0x330
? bd_acquire+0xc0/0xc0
do_dentry_open+0x1ef/0x320
? __inode_permission+0x85/0xc0
path_openat+0x5cb/0x1500
? terminate_walk+0xeb/0x100
do_filp_open+0x9b/0x110
? __check_object_size+0xb4/0x190
? do_sys_open+0x1bd/0x250
do_sys_open+0x1bd/0x250
do_syscall_64+0x67/0x120
entry_SYSCALL_64_after_hwframe+0x41/0xa6
and this commit fixed that issue, as there has been no other such
synchronization in place since then, the issue should still be present in
recent kernels.
Signed-off-by: Christian Loehle <cloehle(a)hyperstone.com>
---
drivers/scsi/sd.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 610ebba0d66e..ad4da985a473 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1436,6 +1436,16 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
if (!scsi_block_when_processing_errors(sdev))
goto error_out;
+ /*
+ * Checking for changes to the device must not race with the device
+ * or its parent host being removed, so lock until sd_open returns.
+ */
+ mutex_lock(&sdev->host->scan_mutex);
+ if (sdev->sdev_state != SDEV_RUNNING) {
+ retval = -ERESTARTSYS;
+ goto unlock_scan_error_out;
+ }
+
if (sd_need_revalidate(bdev, sdkp))
sd_revalidate_disk(bdev->bd_disk);
@@ -1444,7 +1454,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
*/
retval = -ENOMEDIUM;
if (sdev->removable && !sdkp->media_present && !(mode & FMODE_NDELAY))
- goto error_out;
+ goto unlock_scan_error_out;
/*
* If the device has the write protect tab set, have the open fail
@@ -1452,7 +1462,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
*/
retval = -EROFS;
if (sdkp->write_prot && (mode & FMODE_WRITE))
- goto error_out;
+ goto unlock_scan_error_out;
/*
* It is possible that the disk changing stuff resulted in
@@ -1462,15 +1472,19 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
*/
retval = -ENXIO;
if (!scsi_device_online(sdev))
- goto error_out;
+ goto unlock_scan_error_out;
if ((atomic_inc_return(&sdkp->openers) == 1) && sdev->removable) {
if (scsi_block_when_processing_errors(sdev))
scsi_set_medium_removal(sdev, SCSI_REMOVAL_PREVENT);
}
+ mutex_unlock(&sdev->host->scan_mutex);
return 0;
+unlock_scan_error_out:
+ mutex_unlock(&sdev->host->scan_mutex);
+
error_out:
scsi_disk_put(sdkp);
return retval;
--
2.32.0=
Hyperstone GmbH | Line-Eid-Strasse 3 | 78467 Konstanz
Managing Directors: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
This is the start of the stable review cycle for the 5.14.1 release.
There are 11 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.14.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.14.1-rc1
Richard Guy Briggs <rgb(a)redhat.com>
audit: move put_tree() to avoid trim_trees refcount underflow and UAF
Peter Collingbourne <pcc(a)google.com>
net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
Eric Biggers <ebiggers(a)google.com>
ubifs: report correct st_size for encrypted symlinks
Eric Biggers <ebiggers(a)google.com>
f2fs: report correct st_size for encrypted symlinks
Eric Biggers <ebiggers(a)google.com>
ext4: report correct st_size for encrypted symlinks
Eric Biggers <ebiggers(a)google.com>
fscrypt: add fscrypt_symlink_getattr() for computing st_size
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
Qu Wenruo <wqu(a)suse.com>
btrfs: fix NULL pointer dereference when deleting device by invalid id
DENG Qingfang <dqfext(a)gmail.com>
net: dsa: mt7530: fix VLAN traffic leaks again
Pauli Virtanen <pav(a)iki.fi>
Bluetooth: btusb: check conditions before enabling USB ALT 3 for WBS
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
-------------
Diffstat:
Makefile | 4 ++--
drivers/block/floppy.c | 30 +++++++++++++++---------------
drivers/bluetooth/btusb.c | 22 ++++++++++++++--------
drivers/net/dsa/mt7530.c | 5 +----
drivers/tty/vt/vt_ioctl.c | 10 ++++++----
fs/btrfs/volumes.c | 2 +-
fs/crypto/hooks.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/symlink.c | 12 +++++++++++-
fs/f2fs/namei.c | 12 +++++++++++-
fs/ubifs/file.c | 13 ++++++++++++-
include/linux/fscrypt.h | 7 +++++++
include/linux/netdevice.h | 4 ++++
kernel/audit_tree.c | 2 +-
net/socket.c | 6 +++++-
14 files changed, 134 insertions(+), 39 deletions(-)
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
Guillaume Morin reported hitting the following WARNING followed by GPF or
NULL pointer deference either in cgroups_destroy or in the kill_css path.:
percpu ref (css_release) <= 0 (-1) after switching to atomic
WARNING: CPU: 23 PID: 130 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x127/0x130
CPU: 23 PID: 130 Comm: ksoftirqd/23 Kdump: loaded Tainted: G O 5.10.60 #1
RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x127/0x130
Call Trace:
rcu_core+0x30f/0x530
rcu_core_si+0xe/0x10
__do_softirq+0x103/0x2a2
? sort_range+0x30/0x30
run_ksoftirqd+0x2b/0x40
smpboot_thread_fn+0x11a/0x170
kthread+0x10a/0x140
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x22/0x30
Upon further examination, it was discovered that the css structure was
associated with hugetlb reservations.
For private hugetlb mappings the vma points to a reserve map that contains
a pointer to the css. At mmap time, reservations are set up and a
reference to the css is taken. This reference is dropped in the vma close
operation; hugetlb_vm_op_close. However, if a vma is split no additional
reference to the css is taken yet hugetlb_vm_op_close will be called twice
for the split vma resulting in an underflow.
Fix by taking another reference in hugetlb_vm_op_open. Note that the
reference is only taken for the owner of the reserve map. In the more
common fork case, the pointer to the reserve map is cleared for non-owning
vmas.
Link: https://lkml.kernel.org/r/20210830215015.155224-1-mike.kravetz@oracle.com
Fixes: e9fe92ae0cd2 ("hugetlb_cgroup: add reservation accounting for
private mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Guillaume Morin <guillaume(a)morinfr.org>
Suggested-by: Guillaume Morin <guillaume(a)morinfr.org>
Tested-by: Guillaume Morin <guillaume(a)morinfr.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb_cgroup.h | 12 ++++++++++++
mm/hugetlb.c | 4 +++-
2 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb_cgroup.h~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/include/linux/hugetlb_cgroup.h
@@ -121,6 +121,13 @@ static inline void hugetlb_cgroup_put_rs
css_put(&h_cg->css);
}
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+ if (resv_map->css)
+ css_get(resv_map->css);
+}
+
extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr);
extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@@ -199,6 +206,11 @@ static inline void hugetlb_cgroup_put_rs
{
}
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+}
+
static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr)
{
--- a/mm/hugetlb.c~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/mm/hugetlb.c
@@ -4106,8 +4106,10 @@ static void hugetlb_vm_op_open(struct vm
* after this open call completes. It is therefore safe to take a
* new reference here without additional locking.
*/
- if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+ if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+ resv_map_dup_hugetlb_cgroup_uncharge_info(resv);
kref_get(&resv->refs);
+ }
}
static void hugetlb_vm_op_close(struct vm_area_struct *vma)
_
From: Michael Wang <yun.wang(a)linux.alibaba.com>
Subject: mm: fix panic caused by __page_handle_poison()
In commit 510d25c92ec4 ("mm/hwpoison: disable pcp for
page_handle_poison()"), __page_handle_poison() was introduced, and if we
mark:
RET_A = dissolve_free_huge_page();
RET_B = take_page_off_buddy();
then __page_handle_poison was supposed to return TRUE When RET_A == 0 &&
RET_B == TRUE
But since it failed to take care the case when RET_A is -EBUSY or -ENOMEM,
and just return the ret as a bool which actually become TRUE, it break the
original logic.
The following result is a huge page in freelist but was
referenced as poisoned, and lead into the final panic:
kernel BUG at mm/internal.h:95!
invalid opcode: 0000 [#1] SMP PTI
skip...
RIP: 0010:set_page_refcounted mm/internal.h:95 [inline]
RIP: 0010:remove_hugetlb_page+0x23c/0x240 mm/hugetlb.c:1371
skip...
Call Trace:
remove_pool_huge_page+0xe4/0x110 mm/hugetlb.c:1892
return_unused_surplus_pages+0x8d/0x150 mm/hugetlb.c:2272
hugetlb_acct_memory.part.91+0x524/0x690 mm/hugetlb.c:4017
This patch replaces 'bool' with 'int' to handle RET_A correctly.
Link: https://lkml.kernel.org/r/61782ac6-1e8a-4f6f-35e6-e94fce3b37f5@linux.alibab…
Fixes: 510d25c92ec4 ("mm/hwpoison: disable pcp for page_handle_poison()")
Signed-off-by: Michael Wang <yun.wang(a)linux.alibaba.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Abaci <abaci(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory-failure.c~mm-fix-panic-caused-by-__page_handle_poison
+++ a/mm/memory-failure.c
@@ -68,7 +68,7 @@ atomic_long_t num_poisoned_pages __read_
static bool __page_handle_poison(struct page *page)
{
- bool ret;
+ int ret;
zone_pcp_disable(page_zone(page));
ret = dissolve_free_huge_page(page);
@@ -76,7 +76,7 @@ static bool __page_handle_poison(struct
ret = take_page_off_buddy(page);
zone_pcp_enable(page_zone(page));
- return ret;
+ return ret > 0;
}
static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
_
This is the start of the stable review cycle for the 4.19.206 release.
There are 33 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.206-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.206-rc1
Peter Collingbourne <pcc(a)google.com>
net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
Sean Christopherson <seanjc(a)google.com>
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Gerd Rausch <gerd.rausch(a)oracle.com>
net/rds: dma_map_sg is entitled to merge entries
Ben Skeggs <bskeggs(a)redhat.com>
drm/nouveau/disp: power down unused DP links during init
Mark Yacoub <markyacoub(a)google.com>
drm: Copy drm_wait_vblank to user before returning
Shai Malin <smalin(a)marvell.com>
qed: Fix null-pointer dereference in qed_rdma_create_qp()
Shai Malin <smalin(a)marvell.com>
qed: qed ll2 race condition fixes
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio_pci: Support surprise removal of virtio pci device
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
opp: remove WARN when no valid OPPs remain
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: u_audio: fix race condition on endpoint stop
Guangbin Huang <huangguangbin2(a)huawei.com>
net: hns3: fix get wrong pfc_en when query PFC configuration
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
Shreyansh Chouhan <chouhan.shreyansh630(a)gmail.com>
ip_gre: add validation for csum_start
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Tuo Li <islituo(a)gmail.com>
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Wesley Cheng <wcheng(a)codeaurora.org>
usb: dwc3: gadget: Stop EP0 transfers during pullup disable
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Kefeng Wang <wangkefeng.wang(a)huawei.com>
once: Fix panic when module unload
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: collect all entries in one cycle
Guenter Roeck <linux(a)roeck-us.net>
ARC: Fix CONFIG_STACKDEPOT
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix truncation handling for mod32 dst reg wrt zero
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix 32 bit src register truncation on div/mod
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Do not use ax register in interpreter on div/mod
Xiaolong Huang <butterflyhuangxx(a)gmail.com>
net: qrtr: fix another OOB Read in qrtr_endpoint_post
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/vmlinux.lds.S | 2 +
arch/x86/kvm/mmu.c | 11 +++-
drivers/block/floppy.c | 27 ++++----
drivers/gpu/drm/drm_ioc32.c | 4 +-
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 2 +-
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h | 1 +
drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c | 9 +++
drivers/infiniband/hw/hfi1/sdma.c | 9 ++-
drivers/net/can/usb/esd_usb2.c | 4 +-
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 13 +---
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 ++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_ll2.c | 20 ++++++
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 3 +-
drivers/opp/of.c | 5 +-
drivers/tty/vt/vt_ioctl.c | 11 ++--
drivers/usb/dwc3/gadget.c | 23 ++++---
drivers/usb/gadget/function/u_audio.c | 5 +-
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 +
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++
drivers/virtio/virtio_pci_common.c | 7 +++
drivers/virtio/virtio_ring.c | 6 +-
include/linux/filter.h | 24 ++++++++
include/linux/netdevice.h | 4 ++
include/linux/once.h | 4 +-
kernel/bpf/core.c | 32 +++++-----
kernel/bpf/verifier.c | 27 ++++----
lib/once.c | 11 +++-
net/ipv4/ip_gre.c | 2 +
net/netfilter/nf_conntrack_core.c | 71 +++++++---------------
net/qrtr/qrtr.c | 2 +-
net/rds/ib_frmr.c | 4 +-
net/socket.c | 6 +-
38 files changed, 228 insertions(+), 157 deletions(-)
This is the start of the stable review cycle for the 4.14.246 release.
There are 23 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.246-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.246-rc1
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
Lai Jiangshan <laijs(a)linux.alibaba.com>
KVM: X86: MMU: Use the correct inherited permissions to get shadow page
Sean Christopherson <seanjc(a)google.com>
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Gerd Rausch <gerd.rausch(a)oracle.com>
net/rds: dma_map_sg is entitled to merge entries
Ben Skeggs <bskeggs(a)redhat.com>
drm/nouveau/disp: power down unused DP links during init
Mark Yacoub <markyacoub(a)google.com>
drm: Copy drm_wait_vblank to user before returning
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
opp: remove WARN when no valid OPPs remain
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: u_audio: fix race condition on endpoint stop
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
Shreyansh Chouhan <chouhan.shreyansh630(a)gmail.com>
ip_gre: add validation for csum_start
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Tuo Li <islituo(a)gmail.com>
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Wesley Cheng <wcheng(a)codeaurora.org>
usb: dwc3: gadget: Stop EP0 transfers during pullup disable
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Guenter Roeck <linux(a)roeck-us.net>
ARC: Fix CONFIG_STACKDEPOT
-------------
Diffstat:
Documentation/virtual/kvm/mmu.txt | 4 ++--
Makefile | 4 ++--
arch/arc/kernel/vmlinux.lds.S | 2 ++
arch/x86/kvm/mmu.c | 11 +++++++++-
arch/x86/kvm/paging_tmpl.h | 14 ++++++++-----
drivers/base/power/opp/of.c | 5 +++--
drivers/block/floppy.c | 27 ++++++++++++-------------
drivers/gpu/drm/drm_ioc32.c | 4 +---
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 2 +-
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h | 1 +
drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c | 9 +++++++++
drivers/infiniband/hw/hfi1/sdma.c | 9 ++++-----
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +++-
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 ++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 ++++++----
drivers/usb/dwc3/gadget.c | 23 ++++++++++-----------
drivers/usb/gadget/function/u_audio.c | 5 ++---
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
net/ipv4/ip_gre.c | 2 ++
net/rds/ib_frmr.c | 4 ++--
27 files changed, 114 insertions(+), 65 deletions(-)
This is the start of the stable review cycle for the 4.4.283 release.
There are 10 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.283-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.283-rc1
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
-------------
Diffstat:
Makefile | 4 ++--
drivers/block/floppy.c | 27 +++++++++++++--------------
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 +++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 +++++++----
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
12 files changed, 52 insertions(+), 28 deletions(-)
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: aeef8b5089b76852bd84889f2809e69a7cfb414e
Gitweb: https://git.kernel.org/tip/aeef8b5089b76852bd84889f2809e69a7cfb414e
Author: Jeff Moyer <jmoyer(a)redhat.com>
AuthorDate: Wed, 11 Aug 2021 17:07:37 -04:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 02 Sep 2021 21:53:18 +02:00
x86/pat: Pass valid address to sanitize_phys()
The end address passed to memtype_reserve() is handed directly to
sanitize_phys(). However, end is exclusive and sanitize_phys() expects
an inclusive address. If end falls at the end of the physical address
space, sanitize_phys() will return 0. This can result in drivers
failing to load, and the following warning:
WARNING: CPU: 26 PID: 749 at arch/x86/mm/pat.c:354 reserve_memtype+0x262/0x450
reserve_memtype failed: [mem 0x3ffffff00000-0xffffffffffffffff], req uncached-minus
Call Trace:
[<ffffffffa427b1f2>] reserve_memtype+0x262/0x450
[<ffffffffa42764aa>] ioremap_nocache+0x1a/0x20
[<ffffffffc04620a1>] mpt3sas_base_map_resources+0x151/0xa60 [mpt3sas]
[<ffffffffc0465555>] mpt3sas_base_attach+0xf5/0xa50 [mpt3sas]
---[ end trace 6d6eea4438db89ef ]---
ioremap reserve_memtype failed -22
mpt3sas_cm0: unable to map adapter memory! or resource not found
mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10597/_scsih_probe()!
Fix this by passing the inclusive end address to sanitize_phys().
Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses")
Signed-off-by: Jeff Moyer <jmoyer(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/x49o8a3pu5i.fsf@segfault.boston.devel.redhat.com
---
arch/x86/mm/pat/memtype.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 3112ca7..4ba2a3e 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -583,7 +583,12 @@ int memtype_reserve(u64 start, u64 end, enum page_cache_mode req_type,
int err = 0;
start = sanitize_phys(start);
- end = sanitize_phys(end);
+
+ /*
+ * The end address passed into this function is exclusive, but
+ * sanitize_phys() expects an inclusive address.
+ */
+ end = sanitize_phys(end - 1) + 1;
if (start >= end) {
WARN(1, "%s failed: [mem %#010Lx-%#010Lx], req %s\n", __func__,
start, end - 1, cattr_name(req_type));
The patch titled
Subject: mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
has been added to the -mm tree. Its filename is
mm-page_allocc-avoid-accessing-uninitialized-pcp-page-migratetype.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_allocc-avoid-accessing-un…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_allocc-avoid-accessing-un…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
If it's not prepared to free unref page, the pcp page migratetype is
unset. Thus We will get rubbish from get_pcppage_migratetype() and might
list_del &page->lru again after it's already deleted from the list leading
to grumble about data corruption.
Link: https://lkml.kernel.org/r/20210902115447.57050-1-linmiaohe@huawei.com
Fixes: df1acc856923 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/page_alloc.c~mm-page_allocc-avoid-accessing-uninitialized-pcp-page-migratetype
+++ a/mm/page_alloc.c
@@ -3445,8 +3445,10 @@ void free_unref_page_list(struct list_he
/* Prepare pages for freeing */
list_for_each_entry_safe(page, next, list, lru) {
pfn = page_to_pfn(page);
- if (!free_unref_page_prepare(page, pfn, 0))
+ if (!free_unref_page_prepare(page, pfn, 0)) {
list_del(&page->lru);
+ continue;
+ }
/*
* Free isolated pages directly to the allocator, see
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-page_allocc-avoid-accessing-uninitialized-pcp-page-migratetype.patch
mm-gup-remove-set-but-unused-local-variable-major.patch
mm-gup-remove-unneed-local-variable-orig_refs.patch
mm-gup-remove-useless-bug_on-in-__get_user_pages.patch
mm-gup-fix-potential-pgmap-refcnt-leak-in-__gup_device_huge.patch
mm-gup-use-helper-page_aligned-in-populate_vma_page_range.patch
shmem-remove-unneeded-variable-ret.patch
shmem-remove-unneeded-header-file.patch
shmem-remove-unneeded-function-forward-declaration.patch
shmem-include-header-file-to-declare-swap_info.patch
mm-memcg-remove-unused-functions.patch
mm-memcg-save-some-atomic-ops-when-flush-is-already-true.patch
mm-hwpoison-remove-unneeded-variable-unmap_success.patch
mm-hwpoison-fix-potential-pte_unmap_unlock-pte-error.patch
mm-hwpoison-change-argument-struct-page-hpagep-to-hpage.patch
mm-hwpoison-fix-some-obsolete-comments.patch
mm-vmscan-remove-the-pagedirty-check-after-madv_free-pages-are-page_ref_freezed.patch
mm-vmscan-remove-misleading-setting-to-sc-priority.patch
mm-vmscan-remove-unneeded-return-value-of-kswapd_run.patch
mm-vmscan-add-else-to-remove-check_pending-label.patch
mm-vmstat-correct-some-wrong-comments.patch
mm-vmstat-simplify-the-array-size-calculation.patch
mm-vmstat-remove-unneeded-return-value.patch
mm-memory_hotplug-use-helper-zone_is_zone_device-to-simplify-the-code.patch
mm-memory_hotplug-make-hwpoisoned-dirty-swapcache-pages-unmovable.patch
mm-zsmallocc-close-race-window-between-zs_pool_dec_isolated-and-zs_unregister_migration.patch
mm-zsmallocc-combine-two-atomic-ops-in-zs_pool_dec_isolated.patch
Commit e26f023e01ef ("firmware/dmi: Include product_sku info to modalias")
added a new field to the modalias in the middle of the modalias, breaking
some existing udev/hwdb matches on the whole modalias without a wildcard
('*') in between the pvr and rvn fields.
All modalias matches in e.g. :
https://github.com/systemd/systemd/blob/main/hwdb.d/60-sensor.hwdb
deliberately end in ':*' so that new fields can be added at *the end* of
the modalias, but adding a new field in the middle like this breaks things.
Move the new sku field to the end of the modalias to fix some hwdb
entries no longer matching.
The new sku field has already been put to use in 2 new hwdb entries:
sensor:modalias:platform:HID-SENSOR-200073:dmi:*svnDell*:sku0A3E:*
ACCEL_LOCATION=base
sensor:modalias:platform:HID-SENSOR-200073:dmi:*svnDell*:sku0B0B:*
ACCEL_LOCATION=base
The wildcard use before and after the sku in these matches means that they
should keep working with the sku moved to the end.
Note that there is a second instance of in essence the same problem,
commit f5152f4ded3c ("firmware/dmi: Report DMI Bios & EC firmware release")
Added 2 new br and efr fields in the middle of the modalias. This too
breaks some hwdb modalias matches, but this has gone unnoticed for over
a year. So some newer hwdb modalias matches actually depend on these
fields being in the middle of the string. Moving these to the end now
would break 3 hwdb entries, while fixing 8 entries.
Since there is no good answer for the new br and efr fields I have chosen
to leave these as is. Instead I'll submit a hwdb update to put a wildcard
at the place where these fields may or may not be present depending on the
kernel version.
BugLink: https://github.com/systemd/systemd/issues/20550
Link: https://github.com/systemd/systemd/pull/20562
Fixes: e26f023e01ef ("firmware/dmi: Include product_sku info to modalias")
Cc: stable(a)vger.kernel.org
Cc: Kai-Chuan Hsieh <kaichuan.hsieh(a)canonical.com>
Cc: Erwan Velu <e.velu(a)criteo.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/firmware/dmi-id.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/dmi-id.c b/drivers/firmware/dmi-id.c
index 4d5421d14a41..940ddf916202 100644
--- a/drivers/firmware/dmi-id.c
+++ b/drivers/firmware/dmi-id.c
@@ -73,6 +73,10 @@ static void ascii_filter(char *d, const char *s)
static ssize_t get_modalias(char *buffer, size_t buffer_size)
{
+ /*
+ * Note new fields need to be added at the end to keep compatibility
+ * with udev's hwdb which does matches on "`cat dmi/id/modalias`*".
+ */
static const struct mafield {
const char *prefix;
int field;
@@ -85,13 +89,13 @@ static ssize_t get_modalias(char *buffer, size_t buffer_size)
{ "svn", DMI_SYS_VENDOR },
{ "pn", DMI_PRODUCT_NAME },
{ "pvr", DMI_PRODUCT_VERSION },
- { "sku", DMI_PRODUCT_SKU },
{ "rvn", DMI_BOARD_VENDOR },
{ "rn", DMI_BOARD_NAME },
{ "rvr", DMI_BOARD_VERSION },
{ "cvn", DMI_CHASSIS_VENDOR },
{ "ct", DMI_CHASSIS_TYPE },
{ "cvr", DMI_CHASSIS_VERSION },
+ { "sku", DMI_PRODUCT_SKU },
{ NULL, DMI_NONE }
};
--
2.31.1
Hi, Greg,
https://github.com/gregkh/linux seems to have gone stale. The last
update was about 24 days ago. Do you know what might have happened
there?
Thanks,
Omar
From: Luben Tuikov <luben.tuikov(a)amd.com>
This fixes a bug which if we probe a non-existing
I2C device, and the SMU returns 0xFF, from then on
we can never communicate with the SMU, because the
code before this patch reads and interprets 0xFF
as a terminal error, and thus we never write 0
into register 90 to clear the status (and
subsequently send a new command to the SMU.)
It is not an error that the SMU returns status
0xFF. This means that the SMU executed the last
command successfully (execution status), but the
command result is an error of some sort (execution
result), depending on what the command was.
When doing a status check of the SMU, before we
send a new command, the only status which
precludes us from sending a new command is 0--the
SMU hasn't finished executing a previous command,
and 0xFC--the SMU is busy.
This bug was seen as the following line in the
kernel log,
amdgpu: Msg issuing pre-check failed(0xff) and SMU may be not in the right state!
when subsequent SMU commands, not necessarily
related to I2C, were sent to the SMU.
This patch fixes this bug.
v2: Add a comment to the description of
__smu_cmn_poll_stat() to explain why we're NOT
defining the SMU FW return codes as macros, but
are instead hard-coding them. Such a change, can
be followed up by a subsequent patch.
v3: The changes are,
a) Add comments to break labels in
__smu_cmn_reg2errno().
b) When an unknown/unspecified/undefined result is
returned back from the SMU, map that to
-EREMOTEIO, to distinguish failure at the SMU
FW.
c) Add kernel-doc to
smu_cmn_send_msg_without_waiting(),
smu_cmn_wait_for_response(),
smu_cmn_send_smc_msg_with_param().
d) In smu_cmn_send_smc_msg_with_param(), since we
wait for completion of the command, if the
result of the completion is
undefined/unknown/unspecified, we print that to
the kernel log.
v4: a) Add macros as requested, though redundant, to
be removed when SMU consolidates for all
ASICs--see comment in code.
b) Get out if the SMU code is unknown.
v5: Rename the macro names.
Cc: Alex Deucher <Alexander.Deucher(a)amd.com>
Cc: Evan Quan <evan.quan(a)amd.com>
Cc: Lijo Lazar <Lijo.Lazar(a)amd.com>
Fixes: fcb1fe9c9e0031 ("drm/amd/powerplay: pre-check the SMU state before issuing message")
Signed-off-by: Luben Tuikov <luben.tuikov(a)amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 5810323ba692895b045e3f1b3e107605c3717dab)
Cc: stable(a)vger.kernel.org # 5.14.x
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1670
---
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 288 +++++++++++++++++++++----
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h | 3 +-
2 files changed, 244 insertions(+), 47 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index e802f9a95f08..a0e2111eb783 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -55,7 +55,7 @@
#undef __SMU_DUMMY_MAP
#define __SMU_DUMMY_MAP(type) #type
-static const char* __smu_message_names[] = {
+static const char * const __smu_message_names[] = {
SMU_MESSAGE_TYPES
};
@@ -76,55 +76,258 @@ static void smu_cmn_read_arg(struct smu_context *smu,
*arg = RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82);
}
-int smu_cmn_wait_for_response(struct smu_context *smu)
+/* Redefine the SMU error codes here.
+ *
+ * Note that these definitions are redundant and should be removed
+ * when the SMU has exported a unified header file containing these
+ * macros, which header file we can just include and use the SMU's
+ * macros. At the moment, these error codes are defined by the SMU
+ * per-ASIC unfortunately, yet we're a one driver for all ASICs.
+ */
+#define SMU_RESP_NONE 0
+#define SMU_RESP_OK 1
+#define SMU_RESP_CMD_FAIL 0xFF
+#define SMU_RESP_CMD_UNKNOWN 0xFE
+#define SMU_RESP_CMD_BAD_PREREQ 0xFD
+#define SMU_RESP_BUSY_OTHER 0xFC
+#define SMU_RESP_DEBUG_END 0xFB
+
+/**
+ * __smu_cmn_poll_stat -- poll for a status from the SMU
+ * smu: a pointer to SMU context
+ *
+ * Returns the status of the SMU, which could be,
+ * 0, the SMU is busy with your previous command;
+ * 1, execution status: success, execution result: success;
+ * 0xFF, execution status: success, execution result: failure;
+ * 0xFE, unknown command;
+ * 0xFD, valid command, but bad (command) prerequisites;
+ * 0xFC, the command was rejected as the SMU is busy;
+ * 0xFB, "SMC_Result_DebugDataDumpEnd".
+ *
+ * The values here are not defined by macros, because I'd rather we
+ * include a single header file which defines them, which is
+ * maintained by the SMU FW team, so that we're impervious to firmware
+ * changes. At the moment those values are defined in various header
+ * files, one for each ASIC, yet here we're a single ASIC-agnostic
+ * interface. Such a change can be followed-up by a subsequent patch.
+ */
+static u32 __smu_cmn_poll_stat(struct smu_context *smu)
{
struct amdgpu_device *adev = smu->adev;
- uint32_t cur_value, i, timeout = adev->usec_timeout * 20;
+ int timeout = adev->usec_timeout * 20;
+ u32 reg;
- for (i = 0; i < timeout; i++) {
- cur_value = RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90);
- if ((cur_value & MP1_C2PMSG_90__CONTENT_MASK) != 0)
- return cur_value;
+ for ( ; timeout > 0; timeout--) {
+ reg = RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90);
+ if ((reg & MP1_C2PMSG_90__CONTENT_MASK) != 0)
+ break;
udelay(1);
}
- /* timeout means wrong logic */
- if (i == timeout)
- return -ETIME;
-
- return RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90);
+ return reg;
}
-int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
- uint16_t msg, uint32_t param)
+static void __smu_cmn_reg_print_error(struct smu_context *smu,
+ u32 reg_c2pmsg_90,
+ int msg_index,
+ u32 param,
+ enum smu_message_type msg)
{
struct amdgpu_device *adev = smu->adev;
- int ret;
+ const char *message = smu_get_message_name(smu, msg);
- ret = smu_cmn_wait_for_response(smu);
- if (ret != 0x1) {
- dev_err(adev->dev, "Msg issuing pre-check failed(0x%x) and "
- "SMU may be not in the right state!\n", ret);
- if (ret != -ETIME)
- ret = -EIO;
- return ret;
+ switch (reg_c2pmsg_90) {
+ case SMU_RESP_NONE:
+ dev_err_ratelimited(adev->dev,
+ "SMU: I'm not done with your previous command!");
+ break;
+ case SMU_RESP_OK:
+ /* The SMU executed the command. It completed with a
+ * successful result.
+ */
+ break;
+ case SMU_RESP_CMD_FAIL:
+ /* The SMU executed the command. It completed with an
+ * unsuccessful result.
+ */
+ break;
+ case SMU_RESP_CMD_UNKNOWN:
+ dev_err_ratelimited(adev->dev,
+ "SMU: unknown command: index:%d param:0x%08X message:%s",
+ msg_index, param, message);
+ break;
+ case SMU_RESP_CMD_BAD_PREREQ:
+ dev_err_ratelimited(adev->dev,
+ "SMU: valid command, bad prerequisites: index:%d param:0x%08X message:%s",
+ msg_index, param, message);
+ break;
+ case SMU_RESP_BUSY_OTHER:
+ dev_err_ratelimited(adev->dev,
+ "SMU: I'm very busy for your command: index:%d param:0x%08X message:%s",
+ msg_index, param, message);
+ break;
+ case SMU_RESP_DEBUG_END:
+ dev_err_ratelimited(adev->dev,
+ "SMU: I'm debugging!");
+ break;
+ default:
+ dev_err_ratelimited(adev->dev,
+ "SMU: response:0x%08X for index:%d param:0x%08X message:%s?",
+ reg_c2pmsg_90, msg_index, param, message);
+ break;
+ }
+}
+
+static int __smu_cmn_reg2errno(struct smu_context *smu, u32 reg_c2pmsg_90)
+{
+ int res;
+
+ switch (reg_c2pmsg_90) {
+ case SMU_RESP_NONE:
+ /* The SMU is busy--still executing your command.
+ */
+ res = -ETIME;
+ break;
+ case SMU_RESP_OK:
+ res = 0;
+ break;
+ case SMU_RESP_CMD_FAIL:
+ /* Command completed successfully, but the command
+ * status was failure.
+ */
+ res = -EIO;
+ break;
+ case SMU_RESP_CMD_UNKNOWN:
+ /* Unknown command--ignored by the SMU.
+ */
+ res = -EOPNOTSUPP;
+ break;
+ case SMU_RESP_CMD_BAD_PREREQ:
+ /* Valid command--bad prerequisites.
+ */
+ res = -EINVAL;
+ break;
+ case SMU_RESP_BUSY_OTHER:
+ /* The SMU is busy with other commands. The client
+ * should retry in 10 us.
+ */
+ res = -EBUSY;
+ break;
+ default:
+ /* Unknown or debug response from the SMU.
+ */
+ res = -EREMOTEIO;
+ break;
}
+ return res;
+}
+
+static void __smu_cmn_send_msg(struct smu_context *smu,
+ u16 msg,
+ u32 param)
+{
+ struct amdgpu_device *adev = smu->adev;
+
WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90, 0);
WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82, param);
WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, msg);
+}
- return 0;
+/**
+ * smu_cmn_send_msg_without_waiting -- send the message; don't wait for status
+ * @smu: pointer to an SMU context
+ * @msg_index: message index
+ * @param: message parameter to send to the SMU
+ *
+ * Send a message to the SMU with the parameter passed. Do not wait
+ * for status/result of the message, thus the "without_waiting".
+ *
+ * Return 0 on success, -errno on error if we weren't able to _send_
+ * the message for some reason. See __smu_cmn_reg2errno() for details
+ * of the -errno.
+ */
+int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
+ uint16_t msg_index,
+ uint32_t param)
+{
+ u32 reg;
+ int res;
+
+ if (smu->adev->no_hw_access)
+ return 0;
+
+ mutex_lock(&smu->message_lock);
+ reg = __smu_cmn_poll_stat(smu);
+ res = __smu_cmn_reg2errno(smu, reg);
+ if (reg == SMU_RESP_NONE ||
+ reg == SMU_RESP_BUSY_OTHER ||
+ res == -EREMOTEIO)
+ goto Out;
+ __smu_cmn_send_msg(smu, msg_index, param);
+ res = 0;
+Out:
+ mutex_unlock(&smu->message_lock);
+ return res;
}
+/**
+ * smu_cmn_wait_for_response -- wait for response from the SMU
+ * @smu: pointer to an SMU context
+ *
+ * Wait for status from the SMU.
+ *
+ * Return 0 on success, -errno on error, indicating the execution
+ * status and result of the message being waited for. See
+ * __smu_cmn_reg2errno() for details of the -errno.
+ */
+int smu_cmn_wait_for_response(struct smu_context *smu)
+{
+ u32 reg;
+
+ reg = __smu_cmn_poll_stat(smu);
+ return __smu_cmn_reg2errno(smu, reg);
+}
+
+/**
+ * smu_cmn_send_smc_msg_with_param -- send a message with parameter
+ * @smu: pointer to an SMU context
+ * @msg: message to send
+ * @param: parameter to send to the SMU
+ * @read_arg: pointer to u32 to return a value from the SMU back
+ * to the caller
+ *
+ * Send the message @msg with parameter @param to the SMU, wait for
+ * completion of the command, and return back a value from the SMU in
+ * @read_arg pointer.
+ *
+ * Return 0 on success, -errno on error, if we weren't able to send
+ * the message or if the message completed with some kind of
+ * error. See __smu_cmn_reg2errno() for details of the -errno.
+ *
+ * If we weren't able to send the message to the SMU, we also print
+ * the error to the standard log.
+ *
+ * Command completion status is printed only if the -errno is
+ * -EREMOTEIO, indicating that the SMU returned back an
+ * undefined/unknown/unspecified result. All other cases are
+ * well-defined, not printed, but instead given back to the client to
+ * decide what further to do.
+ *
+ * The return value, @read_arg is read back regardless, to give back
+ * more information to the client, which on error would most likely be
+ * @param, but we can't assume that. This also eliminates more
+ * conditionals.
+ */
int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
enum smu_message_type msg,
uint32_t param,
uint32_t *read_arg)
{
- struct amdgpu_device *adev = smu->adev;
- int ret = 0, index = 0;
+ int res, index;
+ u32 reg;
if (smu->adev->no_hw_access)
return 0;
@@ -136,31 +339,24 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
return index == -EACCES ? 0 : index;
mutex_lock(&smu->message_lock);
- ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index, param);
- if (ret)
- goto out;
-
- ret = smu_cmn_wait_for_response(smu);
- if (ret != 0x1) {
- if (ret == -ETIME) {
- dev_err(adev->dev, "message: %15s (%d) \tparam: 0x%08x is timeout (no response)\n",
- smu_get_message_name(smu, msg), index, param);
- } else {
- dev_err(adev->dev, "failed send message: %15s (%d) \tparam: 0x%08x response %#x\n",
- smu_get_message_name(smu, msg), index, param,
- ret);
- ret = -EIO;
- }
- goto out;
+ reg = __smu_cmn_poll_stat(smu);
+ res = __smu_cmn_reg2errno(smu, reg);
+ if (reg == SMU_RESP_NONE ||
+ reg == SMU_RESP_BUSY_OTHER ||
+ res == -EREMOTEIO) {
+ __smu_cmn_reg_print_error(smu, reg, index, param, msg);
+ goto Out;
}
-
+ __smu_cmn_send_msg(smu, (uint16_t) index, param);
+ reg = __smu_cmn_poll_stat(smu);
+ res = __smu_cmn_reg2errno(smu, reg);
+ if (res == -EREMOTEIO)
+ __smu_cmn_reg_print_error(smu, reg, index, param, msg);
if (read_arg)
smu_cmn_read_arg(smu, read_arg);
-
- ret = 0; /* 0 as driver return value */
-out:
+Out:
mutex_unlock(&smu->message_lock);
- return ret;
+ return res;
}
int smu_cmn_send_smc_msg(struct smu_context *smu,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h
index 9add5f16ff56..16993daa2ae0 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h
@@ -27,7 +27,8 @@
#if defined(SWSMU_CODE_LAYER_L2) || defined(SWSMU_CODE_LAYER_L3) || defined(SWSMU_CODE_LAYER_L4)
int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
- uint16_t msg, uint32_t param);
+ uint16_t msg_index,
+ uint32_t param);
int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
enum smu_message_type msg,
uint32_t param,
--
2.31.1
Currently, outgoing packets larger than 1496 bytes are dropped when
tagged VLAN is used on a switch port.
Add the frame check sequence length to the value of the register
GSWIP_MAC_FLEN to fix this. This matches the lantiq_ppa vendor driver,
which uses a value consisting of 1518 bytes for the MAC frame, plus the
lengths of special tag and VLAN tags.
Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jan Hoffmann <jan(a)3e8.eu>
---
drivers/net/dsa/lantiq_gswip.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index e78026ef6d8c..64d6dfa83122 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -843,7 +843,8 @@ static int gswip_setup(struct dsa_switch *ds)
gswip_switch_mask(priv, 0, GSWIP_MAC_CTRL_2_MLEN,
GSWIP_MAC_CTRL_2p(cpu_port));
- gswip_switch_w(priv, VLAN_ETH_FRAME_LEN + 8, GSWIP_MAC_FLEN);
+ gswip_switch_w(priv, VLAN_ETH_FRAME_LEN + 8 + ETH_FCS_LEN,
+ GSWIP_MAC_FLEN);
gswip_switch_mask(priv, 0, GSWIP_BM_QUEUE_GCTRL_GL_MOD,
GSWIP_BM_QUEUE_GCTRL);
--
2.33.0
Seems like newer cards can have even more instances now.
Found by UBSAN: array-index-out-of-bounds in
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29
index 8 is out of range for type 'uint32_t *[8]'
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable(a)vger.kernel.org
Signed-off-by: Ernst Sjöstrand <ernstp(a)gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dc3c6b3a00e5..d356e329e6f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -758,7 +758,7 @@ enum amd_hw_ip_block_type {
MAX_HWIP
};
-#define HWIP_MAX_INSTANCE 8
+#define HWIP_MAX_INSTANCE 10
struct amd_powerplay {
void *pp_handle;
--
2.30.2
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: d049bfc3077d - Linux 5.13.14-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system integration test - as root
🚧 ✅ Podman system integration test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage nvme - tcp
🚧 💥 stress: stress-ng
Host 2:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
🚧 ✅ lvm cache test
🚧 ✅ lvm snapper test
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ lvm snapper test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ lvm snapper test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
x86_64:
Host 1:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: sanity smoke test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
🚧 ✅ lvm cache test
🚧 ✅ lvm snapper test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
The patch titled
Subject: mm,vmscan: fix divide by zero in get_scan_count
has been added to the -mm tree. Its filename is
mmvmscan-fix-divide-by-zero-in-get_scan_count.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mmvmscan-fix-divide-by-zero-in-ge…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mmvmscan-fix-divide-by-zero-in-ge…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: mm,vmscan: fix divide by zero in get_scan_count
Changeset f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to
proportional memory.low reclaim") introduced a divide by zero corner case
when oomd is being used in combination with cgroup memory.low protection.
When oomd decides to kill a cgroup, it will force the cgroup memory to be
reclaimed after killing the tasks, by writing to the memory.max file for
that cgroup, forcing the remaining page cache and reclaimable slab to be
reclaimed down to zero.
Previously, on cgroups with some memory.low protection that would result
in the memory being reclaimed down to the memory.low limit, or likely not
at all, having the page cache reclaimed asynchronously later.
With f56ce412a59d the oomd write to memory.max tries to reclaim all the
way down to zero, which may race with another reclaimer, to the point of
ending up with the divide by zero below.
This patch implements the obvious fix.
Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
Fixes: f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Chris Down <chris(a)chrisdown.name>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mmvmscan-fix-divide-by-zero-in-get_scan_count
+++ a/mm/vmscan.c
@@ -2592,7 +2592,7 @@ out:
cgroup_size = max(cgroup_size, protection);
scan = lruvec_size - lruvec_size * protection /
- cgroup_size;
+ (cgroup_size + 1);
/*
* Minimally target SWAP_CLUSTER_MAX pages to keep
_
Patches currently in -mm which might be from riel(a)surriel.com are
mmvmscan-fix-divide-by-zero-in-get_scan_count.patch
From: Rafał Miłecki <rafal(a)milecki.pl>
It isn't true that CPU port is always the last one. Switches BCM5301x
have 9 ports (port 6 being inactive) and they use port 5 as CPU by
default (depending on design some other may be CPU ports too).
A more reliable way of determining number of ports is to check for the
last set bit in the "enabled_ports" bitfield.
This fixes b53 internal state, it will allow providing accurate info to
the DSA and is required to fix BCM5301x support.
Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
---
drivers/net/dsa/b53/b53_common.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index bd1417a66cbf..dcf9d7e5ae14 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -2612,9 +2612,8 @@ static int b53_switch_init(struct b53_device *dev)
dev->cpu_port = 5;
}
- /* cpu port is always last */
- dev->num_ports = dev->cpu_port + 1;
dev->enabled_ports |= BIT(dev->cpu_port);
+ dev->num_ports = fls(dev->enabled_ports);
/* Include non standard CPU port built-in PHYs to be probed */
if (is539x(dev) || is531x5(dev)) {
--
2.26.2
A common implementation of isatty(3) involves calling a ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).
Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.
Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.
Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba0…
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Cc: <stable(a)vger.kernel.org> # 4.19
---
v2:
- simplify check by using _IOC_TYPE()
- move function inline into header
include/linux/netdevice.h | 4 ++++
net/socket.c | 6 +++++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eaf5bb008aa9..d65ce093e5a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4012,6 +4012,10 @@ int netdev_rx_handler_register(struct net_device *dev,
void netdev_rx_handler_unregister(struct net_device *dev);
bool dev_valid_name(const char *name);
+static inline bool is_socket_ioctl_cmd(unsigned int cmd)
+{
+ return _IOC_TYPE(cmd) == SOCK_IOC_TYPE;
+}
int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr,
bool *need_copyout);
int dev_ifconf(struct net *net, struct ifconf *, int);
diff --git a/net/socket.c b/net/socket.c
index 0b2dad3bdf7f..8808b3617dac 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1109,7 +1109,7 @@ static long sock_do_ioctl(struct net *net, struct socket *sock,
rtnl_unlock();
if (!err && copy_to_user(argp, &ifc, sizeof(struct ifconf)))
err = -EFAULT;
- } else {
+ } else if (is_socket_ioctl_cmd(cmd)) {
struct ifreq ifr;
bool need_copyout;
if (copy_from_user(&ifr, argp, sizeof(struct ifreq)))
@@ -1118,6 +1118,8 @@ static long sock_do_ioctl(struct net *net, struct socket *sock,
if (!err && need_copyout)
if (copy_to_user(argp, &ifr, sizeof(struct ifreq)))
return -EFAULT;
+ } else {
+ err = -ENOTTY;
}
return err;
}
@@ -3306,6 +3308,8 @@ static int compat_ifr_data_ioctl(struct net *net, unsigned int cmd,
struct ifreq ifreq;
u32 data32;
+ if (!is_socket_ioctl_cmd(cmd))
+ return -ENOTTY;
if (copy_from_user(ifreq.ifr_name, u_ifreq32->ifr_name, IFNAMSIZ))
return -EFAULT;
if (get_user(data32, &u_ifreq32->ifr_data))
--
2.33.0.259.gc128427fd7-goog
On 01/09/2021 07:40, Loic Poulain wrote:
> iw wlan0 set power_save off
I do this on wcn3680b and get no loss of signal
If I do this though
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 03966072f34c..ba613fbb728d 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -2345,6 +2345,8 @@ int wcn36xx_smd_feature_caps_exchange(struct
wcn36xx *wcn)
set_feat_caps(msg_body.feat_caps, DOT11AC);
set_feat_caps(msg_body.feat_caps,
ANTENNA_DIVERSITY_SELECTION);
}
+ set_feat_caps(msg_body.feat_caps, IBSS_HEARTBEAT_OFFLOAD);
+ set_feat_caps(msg_body.feat_caps, WLANACTIVE_OFFLOAD);
PREPARE_HAL_BUF(wcn->hal_buf, msg_body);
@@ -2589,7 +2591,7 @@ static int wcn36xx_smd_missed_beacon_ind(struct
wcn36xx *wcn,
struct wcn36xx_hal_missed_beacon_ind_msg *rsp = buf;
struct ieee80211_vif *vif = NULL;
struct wcn36xx_vif *tmp;
-
+wcn36xx_info("%s/%d\n", __func__, __LINE__);
/* Old FW does not have bss index */
if (wcn36xx_is_fw_version(wcn, 1, 2, 2, 24)) {
list_for_each_entry(tmp, &wcn->vif_list, list) {
@@ -2608,7 +2610,7 @@ static int wcn36xx_smd_missed_beacon_ind(struct
wcn36xx *wcn,
list_for_each_entry(tmp, &wcn->vif_list, list) {
if (tmp->bss_index == rsp->bss_index) {
- wcn36xx_dbg(WCN36XX_DBG_HAL, "beacon missed
bss_index %d\n",
+ wcn36xx_info("beacon missed bss_index %d\n",
rsp->bss_index);
vif = wcn36xx_priv_to_vif(tmp);
ieee80211_connection_loss(vif);
bingo
root@linaro-developer:~# iw wlan0 set power_save off
# pulls plug on AP
root@linaro-developer:~# [ 83.290987] wcn36xx:
wcn36xx_smd_missed_beacon_ind/2594
[ 83.291070] wcn36xx: beacon missed bss_index 0
[ 83.295403] wlan0: Connection to AP e2:63:da:9c:a4:bd lost
I'm not sure if both flags are required but, this is the behavior we want
From: Jonathan Bell <jonathan(a)raspberrypi.org>
Seen on a VLI VL805 PCIe to USB controller. For non-stream endpoints
at least, if the xHC halts on a particular TRB due to an error then
the DCS field in the Out Endpoint Context maintained by the hardware
is not updated with the current cycle state.
Using the quirk XHCI_EP_CTX_BROKEN_DCS and instead fetch the DCS bit
from the TRB that the xHC stopped on.
[ bjorn: rebased to v5.14-rc2 ]
Cc: stable(a)vger.kernel.org
Link: https://github.com/raspberrypi/linux/issues/3060
Signed-off-by: Jonathan Bell <jonathan(a)raspberrypi.org>
Signed-off-by: Bjørn Mork <bjorn(a)mork.no>
---
[ resending this as I haven't seen any ack/nak and wonder if it might
have gotten lost in a large pile of vacation backlog ]
This took some time...
But now it is at least build and runtime tested on top of v5.14-rc2,
using an RPi4 and a generic RTL2838 DVB-T radio. Running rtl_test
from rtl-sdr on an unpatched v5.14-rc2:
root@idefix:/home/bjorn# rtl_test
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 00000001
Using device 0: Generic RTL2832U OEM
rtlsdr_read_reg failed with -7
rtlsdr_write_reg failed with -7
rtlsdr_read_reg failed with -7
rtlsdr_write_reg failed with -7
rtlsdr_read_reg failed with -7
rtlsdr_write_reg failed with -7
rtlsdr_read_reg failed with -7
rtlsdr_write_reg failed with -7
No supported tuner found
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
Enabled direct sampling mode, input 1
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
Supported gain values (1): 0.0
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
WARNING: Failed to set sample rate.
rtlsdr_demod_write_reg failed with -7
rtlsdr_demod_read_reg failed with -7
rtlsdr_write_reg failed with -7
rtlsdr_write_reg failed with -7
Info: This tool will continuously read from the device, and report if
samples get lost. If you observe no further output, everything is fine.
Reading samples in async mode...
Allocating 15 zero-copy buffers
And with this fix in place:
root@idefix:/home/bjorn# rtl_test
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 00000001
Using device 0: Generic RTL2832U OEM
Detached kernel driver
Found Fitipower FC0012 tuner
Supported gain values (5): -9.9 -4.0 7.1 17.9 19.2
Sampling at 2048000 S/s.
Info: This tool will continuously read from the device, and report if
samples get lost. If you observe no further output, everything is fine.
Reading samples in async mode...
Allocating 15 zero-copy buffers
lost at least 88 bytes
Please apply to stable as well. I'd really like to see this fixed in
my favourite distro kernel. You'll probably want Jonathans original
patch, as posted in <20210702071338.42777-1-bjorn(a)mork.no>, for
anything older than v5.12. I'll resend that for stable once/if this
is accepted in mainline.
Bjørn
drivers/usb/host/xhci-pci.c | 4 +++-
drivers/usb/host/xhci-ring.c | 25 ++++++++++++++++++++++++-
drivers/usb/host/xhci.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 18c2bbddf080..6f3bed09028c 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -279,8 +279,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_EP_CTX_BROKEN_DCS;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 8fea44bbc266..ba978a8fa414 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -559,8 +559,11 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
struct xhci_ring *ep_ring;
struct xhci_command *cmd;
struct xhci_segment *new_seg;
+ struct xhci_segment *halted_seg = NULL;
union xhci_trb *new_deq;
int new_cycle;
+ union xhci_trb *halted_trb;
+ int index = 0;
dma_addr_t addr;
u64 hw_dequeue;
bool cycle_found = false;
@@ -598,7 +601,27 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
hw_dequeue = xhci_get_hw_deq(xhci, dev, ep_index, stream_id);
new_seg = ep_ring->deq_seg;
new_deq = ep_ring->dequeue;
- new_cycle = hw_dequeue & 0x1;
+
+ /*
+ * Quirk: xHC write-back of the DCS field in the hardware dequeue
+ * pointer is wrong - use the cycle state of the TRB pointed to by
+ * the dequeue pointer.
+ */
+ if (xhci->quirks & XHCI_EP_CTX_BROKEN_DCS &&
+ !(ep->ep_state & EP_HAS_STREAMS))
+ halted_seg = trb_in_td(xhci, td->start_seg,
+ td->first_trb, td->last_trb,
+ hw_dequeue & ~0xf, false);
+ if (halted_seg) {
+ index = ((dma_addr_t)(hw_dequeue & ~0xf) - halted_seg->dma) /
+ sizeof(*halted_trb);
+ halted_trb = &halted_seg->trbs[index];
+ new_cycle = halted_trb->generic.field[3] & 0x1;
+ xhci_dbg(xhci, "Endpoint DCS = %d TRB index = %d cycle = %d\n",
+ (u8)(hw_dequeue & 0x1), index, new_cycle);
+ } else {
+ new_cycle = hw_dequeue & 0x1;
+ }
/*
* We want to find the pointer, segment and cycle state of the new trb
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 3c7d281672ae..911aeb7d8a19 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1896,6 +1896,7 @@ struct xhci_hcd {
#define XHCI_SG_TRB_CACHE_SIZE_QUIRK BIT_ULL(39)
#define XHCI_NO_SOFT_RETRY BIT_ULL(40)
#define XHCI_BROKEN_D3COLD BIT_ULL(41)
+#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.20.1
The upstream changes necessary to fix these CVEs rely on the presence of JMP32,
which is not a small backport and brings its own potential set of necessary
follow-ups.
Daniel Borkmann, John Fastabend and Alexei Starovoitov came up with a fix
involving the use of the AX register.
This has been tested against the test_verifier in 4.14.y tree and some tests
specific to the two referred CVEs. The test_bpf module was also tested.
Daniel Borkmann (4):
bpf: Do not use ax register in interpreter on div/mod
bpf: fix subprog verifier bypass by div/mod by 0 exception
bpf: Fix 32 bit src register truncation on div/mod
bpf: Fix truncation handling for mod32 dst reg wrt zero
include/linux/filter.h | 24 ++++++++++++++++++++++++
kernel/bpf/core.c | 40 +++++++++++++++-------------------------
kernel/bpf/verifier.c | 39 +++++++++++++++++++++++++++++++--------
net/core/filter.c | 9 ++++++++-
4 files changed, 78 insertions(+), 34 deletions(-)
--
2.30.2
Hi stable folks,
please queue for 5.10-stable.
See https://bugzilla.kernel.org/show_bug.cgi?id=214159 for more info.
---
Commit 3a7956e25e1d7b3c148569e78895e1f3178122a9 upstream.
The kthread_is_per_cpu() construct relies on only being called on
PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the
following usage pattern:
if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
However, as reported by syzcaller, this is broken. The scenario is:
CPU0 CPU1 (running p)
(p->flags & PF_KTHREAD) // true
begin_new_exec()
me->flags &= ~(PF_KTHREAD|...);
kthread_is_per_cpu(p)
to_kthread(p)
WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT*
Introduce __to_kthread() that omits the WARN and is sure to check both
values.
Use this to remove the problematic pattern for kthread_is_per_cpu()
and fix a number of other kthread_*() functions that have similar
issues but are currently not used in ways that would expose the
problem.
Notably kthread_func() is only ever called on 'current', while
kthread_probe_data() is only used for PF_WQ_WORKER, which implies the
task is from kthread_create*().
Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider(a)arm.com>
Link: https://lkml.kernel.org/r/YH6WJc825C4P0FCK@hirez.programming.kicks-ass.net
[ Drop the balance_push() hunk as it is not needed. ]
Signed-off-by: Borislav Petkov <bp(a)suse.de>
---
kernel/kthread.c | 33 +++++++++++++++++++++++++++------
kernel/sched/fair.c | 2 +-
2 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9825cf89c614..508fe5278285 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -84,6 +84,25 @@ static inline struct kthread *to_kthread(struct task_struct *k)
return (__force void *)k->set_child_tid;
}
+/*
+ * Variant of to_kthread() that doesn't assume @p is a kthread.
+ *
+ * Per construction; when:
+ *
+ * (p->flags & PF_KTHREAD) && p->set_child_tid
+ *
+ * the task is both a kthread and struct kthread is persistent. However
+ * PF_KTHREAD on it's own is not, kernel_thread() can exec() (See umh.c and
+ * begin_new_exec()).
+ */
+static inline struct kthread *__to_kthread(struct task_struct *p)
+{
+ void *kthread = (__force void *)p->set_child_tid;
+ if (kthread && !(p->flags & PF_KTHREAD))
+ kthread = NULL;
+ return kthread;
+}
+
void free_kthread_struct(struct task_struct *k)
{
struct kthread *kthread;
@@ -168,8 +187,9 @@ EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
*/
void *kthread_func(struct task_struct *task)
{
- if (task->flags & PF_KTHREAD)
- return to_kthread(task)->threadfn;
+ struct kthread *kthread = __to_kthread(task);
+ if (kthread)
+ return kthread->threadfn;
return NULL;
}
EXPORT_SYMBOL_GPL(kthread_func);
@@ -199,10 +219,11 @@ EXPORT_SYMBOL_GPL(kthread_data);
*/
void *kthread_probe_data(struct task_struct *task)
{
- struct kthread *kthread = to_kthread(task);
+ struct kthread *kthread = __to_kthread(task);
void *data = NULL;
- copy_from_kernel_nofault(&data, &kthread->data, sizeof(data));
+ if (kthread)
+ copy_from_kernel_nofault(&data, &kthread->data, sizeof(data));
return data;
}
@@ -514,9 +535,9 @@ void kthread_set_per_cpu(struct task_struct *k, int cpu)
set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
}
-bool kthread_is_per_cpu(struct task_struct *k)
+bool kthread_is_per_cpu(struct task_struct *p)
{
- struct kthread *kthread = to_kthread(k);
+ struct kthread *kthread = __to_kthread(p);
if (!kthread)
return false;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 262b02d75007..bad97d35684d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7569,7 +7569,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
return 0;
/* Disregard pcpu kthreads; they are where they need to be. */
- if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+ if (kthread_is_per_cpu(p))
return 0;
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
--
2.29.2
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 064c734986011390b4d111f1a99372b7f26c3850 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:49 -0700
Subject: [PATCH] ubifs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ubifs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 2e4e1d159969..5cfa28cd00cd 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1630,6 +1630,17 @@ static const char *ubifs_get_link(struct dentry *dentry,
return fscrypt_get_symlink(inode, ui->data, ui->data_len, done);
}
+static int ubifs_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ ubifs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ if (IS_ENCRYPTED(d_inode(path->dentry)))
+ return fscrypt_symlink_getattr(path, stat);
+ return 0;
+}
+
const struct address_space_operations ubifs_file_address_operations = {
.readpage = ubifs_readpage,
.writepage = ubifs_writepage,
@@ -1655,7 +1666,7 @@ const struct inode_operations ubifs_file_inode_operations = {
const struct inode_operations ubifs_symlink_inode_operations = {
.get_link = ubifs_get_link,
.setattr = ubifs_setattr,
- .getattr = ubifs_getattr,
+ .getattr = ubifs_symlink_getattr,
.listxattr = ubifs_listxattr,
.update_time = ubifs_update_time,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:48 -0700
Subject: [PATCH] f2fs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e149c8c66a71..9c528e583c9d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1323,9 +1323,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry,
return target;
}
+static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
.get_link = f2fs_encrypted_get_link,
- .getattr = f2fs_getattr,
+ .getattr = f2fs_encrypted_symlink_getattr,
.setattr = f2fs_setattr,
.listxattr = f2fs_listxattr,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c4bca10ceafc43b1ca0a9fab5fa27e13cbce99e Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:47 -0700
Subject: [PATCH] ext4: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: f348c252320b ("ext4 crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index dd05af983092..69109746e6e2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry,
return paddr;
}
+static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.get_link = ext4_encrypted_get_link,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d18760560593e5af921f51a8c9b64b6109d634c2 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:46 -0700
Subject: [PATCH] fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..af74599ae1cf 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
+
+/**
+ * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks
+ * @path: the path for the encrypted symlink being queried
+ * @stat: the struct being filled with the symlink's attributes
+ *
+ * Override st_size of encrypted symlinks to be the length of the decrypted
+ * symlink target (or the no-key encoded symlink target, if the key is
+ * unavailable) rather than the length of the encrypted symlink target. This is
+ * necessary for st_size to match the symlink target that userspace actually
+ * sees. POSIX requires this, and some userspace programs depend on it.
+ *
+ * This requires reading the symlink target from disk if needed, setting up the
+ * inode's encryption key if possible, and then decrypting or encoding the
+ * symlink target. This makes lstat() more heavyweight than is normally the
+ * case. However, decrypted symlink targets will be cached in ->i_link, so
+ * usually the symlink won't have to be read and decrypted again later if/when
+ * it is actually followed, readlink() is called, or lstat() is called again.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat)
+{
+ struct dentry *dentry = path->dentry;
+ struct inode *inode = d_inode(dentry);
+ const char *link;
+ DEFINE_DELAYED_CALL(done);
+
+ /*
+ * To get the symlink target that userspace will see (whether it's the
+ * decrypted target or the no-key encoded target), we can just get it in
+ * the same way the VFS does during path resolution and readlink().
+ */
+ link = READ_ONCE(inode->i_link);
+ if (!link) {
+ link = inode->i_op->get_link(dentry, inode, &done);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+ stat->size = strlen(link);
+ do_delayed_call(&done);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..b7bfd0cd4f3e 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
@@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode,
return ERR_PTR(-EOPNOTSUPP);
}
+static inline int fscrypt_symlink_getattr(const struct path *path,
+ struct kstat *stat)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d18760560593e5af921f51a8c9b64b6109d634c2 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:46 -0700
Subject: [PATCH] fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..af74599ae1cf 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
+
+/**
+ * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks
+ * @path: the path for the encrypted symlink being queried
+ * @stat: the struct being filled with the symlink's attributes
+ *
+ * Override st_size of encrypted symlinks to be the length of the decrypted
+ * symlink target (or the no-key encoded symlink target, if the key is
+ * unavailable) rather than the length of the encrypted symlink target. This is
+ * necessary for st_size to match the symlink target that userspace actually
+ * sees. POSIX requires this, and some userspace programs depend on it.
+ *
+ * This requires reading the symlink target from disk if needed, setting up the
+ * inode's encryption key if possible, and then decrypting or encoding the
+ * symlink target. This makes lstat() more heavyweight than is normally the
+ * case. However, decrypted symlink targets will be cached in ->i_link, so
+ * usually the symlink won't have to be read and decrypted again later if/when
+ * it is actually followed, readlink() is called, or lstat() is called again.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat)
+{
+ struct dentry *dentry = path->dentry;
+ struct inode *inode = d_inode(dentry);
+ const char *link;
+ DEFINE_DELAYED_CALL(done);
+
+ /*
+ * To get the symlink target that userspace will see (whether it's the
+ * decrypted target or the no-key encoded target), we can just get it in
+ * the same way the VFS does during path resolution and readlink().
+ */
+ link = READ_ONCE(inode->i_link);
+ if (!link) {
+ link = inode->i_op->get_link(dentry, inode, &done);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+ stat->size = strlen(link);
+ do_delayed_call(&done);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..b7bfd0cd4f3e 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
@@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode,
return ERR_PTR(-EOPNOTSUPP);
}
+static inline int fscrypt_symlink_getattr(const struct path *path,
+ struct kstat *stat)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c4bca10ceafc43b1ca0a9fab5fa27e13cbce99e Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:47 -0700
Subject: [PATCH] ext4: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: f348c252320b ("ext4 crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index dd05af983092..69109746e6e2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry,
return paddr;
}
+static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.get_link = ext4_encrypted_get_link,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
};
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 064c734986011390b4d111f1a99372b7f26c3850 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:49 -0700
Subject: [PATCH] ubifs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ubifs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 2e4e1d159969..5cfa28cd00cd 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1630,6 +1630,17 @@ static const char *ubifs_get_link(struct dentry *dentry,
return fscrypt_get_symlink(inode, ui->data, ui->data_len, done);
}
+static int ubifs_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ ubifs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ if (IS_ENCRYPTED(d_inode(path->dentry)))
+ return fscrypt_symlink_getattr(path, stat);
+ return 0;
+}
+
const struct address_space_operations ubifs_file_address_operations = {
.readpage = ubifs_readpage,
.writepage = ubifs_writepage,
@@ -1655,7 +1666,7 @@ const struct inode_operations ubifs_file_inode_operations = {
const struct inode_operations ubifs_symlink_inode_operations = {
.get_link = ubifs_get_link,
.setattr = ubifs_setattr,
- .getattr = ubifs_getattr,
+ .getattr = ubifs_symlink_getattr,
.listxattr = ubifs_listxattr,
.update_time = ubifs_update_time,
};
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:48 -0700
Subject: [PATCH] f2fs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e149c8c66a71..9c528e583c9d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1323,9 +1323,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry,
return target;
}
+static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
.get_link = f2fs_encrypted_get_link,
- .getattr = f2fs_getattr,
+ .getattr = f2fs_encrypted_symlink_getattr,
.setattr = f2fs_setattr,
.listxattr = f2fs_listxattr,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 064c734986011390b4d111f1a99372b7f26c3850 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:49 -0700
Subject: [PATCH] ubifs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ubifs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 2e4e1d159969..5cfa28cd00cd 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1630,6 +1630,17 @@ static const char *ubifs_get_link(struct dentry *dentry,
return fscrypt_get_symlink(inode, ui->data, ui->data_len, done);
}
+static int ubifs_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ ubifs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ if (IS_ENCRYPTED(d_inode(path->dentry)))
+ return fscrypt_symlink_getattr(path, stat);
+ return 0;
+}
+
const struct address_space_operations ubifs_file_address_operations = {
.readpage = ubifs_readpage,
.writepage = ubifs_writepage,
@@ -1655,7 +1666,7 @@ const struct inode_operations ubifs_file_inode_operations = {
const struct inode_operations ubifs_symlink_inode_operations = {
.get_link = ubifs_get_link,
.setattr = ubifs_setattr,
- .getattr = ubifs_getattr,
+ .getattr = ubifs_symlink_getattr,
.listxattr = ubifs_listxattr,
.update_time = ubifs_update_time,
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 064c734986011390b4d111f1a99372b7f26c3850 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:49 -0700
Subject: [PATCH] ubifs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ubifs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 2e4e1d159969..5cfa28cd00cd 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1630,6 +1630,17 @@ static const char *ubifs_get_link(struct dentry *dentry,
return fscrypt_get_symlink(inode, ui->data, ui->data_len, done);
}
+static int ubifs_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ ubifs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ if (IS_ENCRYPTED(d_inode(path->dentry)))
+ return fscrypt_symlink_getattr(path, stat);
+ return 0;
+}
+
const struct address_space_operations ubifs_file_address_operations = {
.readpage = ubifs_readpage,
.writepage = ubifs_writepage,
@@ -1655,7 +1666,7 @@ const struct inode_operations ubifs_file_inode_operations = {
const struct inode_operations ubifs_symlink_inode_operations = {
.get_link = ubifs_get_link,
.setattr = ubifs_setattr,
- .getattr = ubifs_getattr,
+ .getattr = ubifs_symlink_getattr,
.listxattr = ubifs_listxattr,
.update_time = ubifs_update_time,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:48 -0700
Subject: [PATCH] f2fs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e149c8c66a71..9c528e583c9d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1323,9 +1323,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry,
return target;
}
+static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
.get_link = f2fs_encrypted_get_link,
- .getattr = f2fs_getattr,
+ .getattr = f2fs_encrypted_symlink_getattr,
.setattr = f2fs_setattr,
.listxattr = f2fs_listxattr,
};
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:48 -0700
Subject: [PATCH] f2fs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e149c8c66a71..9c528e583c9d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1323,9 +1323,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry,
return target;
}
+static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
.get_link = f2fs_encrypted_get_link,
- .getattr = f2fs_getattr,
+ .getattr = f2fs_encrypted_symlink_getattr,
.setattr = f2fs_setattr,
.listxattr = f2fs_listxattr,
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:48 -0700
Subject: [PATCH] f2fs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e149c8c66a71..9c528e583c9d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1323,9 +1323,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry,
return target;
}
+static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
.get_link = f2fs_encrypted_get_link,
- .getattr = f2fs_getattr,
+ .getattr = f2fs_encrypted_symlink_getattr,
.setattr = f2fs_setattr,
.listxattr = f2fs_listxattr,
};
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:48 -0700
Subject: [PATCH] f2fs: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e149c8c66a71..9c528e583c9d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1323,9 +1323,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry,
return target;
}
+static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
.get_link = f2fs_encrypted_get_link,
- .getattr = f2fs_getattr,
+ .getattr = f2fs_encrypted_symlink_getattr,
.setattr = f2fs_setattr,
.listxattr = f2fs_listxattr,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d18760560593e5af921f51a8c9b64b6109d634c2 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:46 -0700
Subject: [PATCH] fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..af74599ae1cf 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
+
+/**
+ * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks
+ * @path: the path for the encrypted symlink being queried
+ * @stat: the struct being filled with the symlink's attributes
+ *
+ * Override st_size of encrypted symlinks to be the length of the decrypted
+ * symlink target (or the no-key encoded symlink target, if the key is
+ * unavailable) rather than the length of the encrypted symlink target. This is
+ * necessary for st_size to match the symlink target that userspace actually
+ * sees. POSIX requires this, and some userspace programs depend on it.
+ *
+ * This requires reading the symlink target from disk if needed, setting up the
+ * inode's encryption key if possible, and then decrypting or encoding the
+ * symlink target. This makes lstat() more heavyweight than is normally the
+ * case. However, decrypted symlink targets will be cached in ->i_link, so
+ * usually the symlink won't have to be read and decrypted again later if/when
+ * it is actually followed, readlink() is called, or lstat() is called again.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat)
+{
+ struct dentry *dentry = path->dentry;
+ struct inode *inode = d_inode(dentry);
+ const char *link;
+ DEFINE_DELAYED_CALL(done);
+
+ /*
+ * To get the symlink target that userspace will see (whether it's the
+ * decrypted target or the no-key encoded target), we can just get it in
+ * the same way the VFS does during path resolution and readlink().
+ */
+ link = READ_ONCE(inode->i_link);
+ if (!link) {
+ link = inode->i_op->get_link(dentry, inode, &done);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+ stat->size = strlen(link);
+ do_delayed_call(&done);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..b7bfd0cd4f3e 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
@@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode,
return ERR_PTR(-EOPNOTSUPP);
}
+static inline int fscrypt_symlink_getattr(const struct path *path,
+ struct kstat *stat)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d18760560593e5af921f51a8c9b64b6109d634c2 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:46 -0700
Subject: [PATCH] fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..af74599ae1cf 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
+
+/**
+ * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks
+ * @path: the path for the encrypted symlink being queried
+ * @stat: the struct being filled with the symlink's attributes
+ *
+ * Override st_size of encrypted symlinks to be the length of the decrypted
+ * symlink target (or the no-key encoded symlink target, if the key is
+ * unavailable) rather than the length of the encrypted symlink target. This is
+ * necessary for st_size to match the symlink target that userspace actually
+ * sees. POSIX requires this, and some userspace programs depend on it.
+ *
+ * This requires reading the symlink target from disk if needed, setting up the
+ * inode's encryption key if possible, and then decrypting or encoding the
+ * symlink target. This makes lstat() more heavyweight than is normally the
+ * case. However, decrypted symlink targets will be cached in ->i_link, so
+ * usually the symlink won't have to be read and decrypted again later if/when
+ * it is actually followed, readlink() is called, or lstat() is called again.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat)
+{
+ struct dentry *dentry = path->dentry;
+ struct inode *inode = d_inode(dentry);
+ const char *link;
+ DEFINE_DELAYED_CALL(done);
+
+ /*
+ * To get the symlink target that userspace will see (whether it's the
+ * decrypted target or the no-key encoded target), we can just get it in
+ * the same way the VFS does during path resolution and readlink().
+ */
+ link = READ_ONCE(inode->i_link);
+ if (!link) {
+ link = inode->i_op->get_link(dentry, inode, &done);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+ stat->size = strlen(link);
+ do_delayed_call(&done);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..b7bfd0cd4f3e 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
@@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode,
return ERR_PTR(-EOPNOTSUPP);
}
+static inline int fscrypt_symlink_getattr(const struct path *path,
+ struct kstat *stat)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d18760560593e5af921f51a8c9b64b6109d634c2 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:46 -0700
Subject: [PATCH] fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..af74599ae1cf 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
+
+/**
+ * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks
+ * @path: the path for the encrypted symlink being queried
+ * @stat: the struct being filled with the symlink's attributes
+ *
+ * Override st_size of encrypted symlinks to be the length of the decrypted
+ * symlink target (or the no-key encoded symlink target, if the key is
+ * unavailable) rather than the length of the encrypted symlink target. This is
+ * necessary for st_size to match the symlink target that userspace actually
+ * sees. POSIX requires this, and some userspace programs depend on it.
+ *
+ * This requires reading the symlink target from disk if needed, setting up the
+ * inode's encryption key if possible, and then decrypting or encoding the
+ * symlink target. This makes lstat() more heavyweight than is normally the
+ * case. However, decrypted symlink targets will be cached in ->i_link, so
+ * usually the symlink won't have to be read and decrypted again later if/when
+ * it is actually followed, readlink() is called, or lstat() is called again.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat)
+{
+ struct dentry *dentry = path->dentry;
+ struct inode *inode = d_inode(dentry);
+ const char *link;
+ DEFINE_DELAYED_CALL(done);
+
+ /*
+ * To get the symlink target that userspace will see (whether it's the
+ * decrypted target or the no-key encoded target), we can just get it in
+ * the same way the VFS does during path resolution and readlink().
+ */
+ link = READ_ONCE(inode->i_link);
+ if (!link) {
+ link = inode->i_op->get_link(dentry, inode, &done);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+ stat->size = strlen(link);
+ do_delayed_call(&done);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..b7bfd0cd4f3e 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
@@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode,
return ERR_PTR(-EOPNOTSUPP);
}
+static inline int fscrypt_symlink_getattr(const struct path *path,
+ struct kstat *stat)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d18760560593e5af921f51a8c9b64b6109d634c2 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:46 -0700
Subject: [PATCH] fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..af74599ae1cf 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
+
+/**
+ * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks
+ * @path: the path for the encrypted symlink being queried
+ * @stat: the struct being filled with the symlink's attributes
+ *
+ * Override st_size of encrypted symlinks to be the length of the decrypted
+ * symlink target (or the no-key encoded symlink target, if the key is
+ * unavailable) rather than the length of the encrypted symlink target. This is
+ * necessary for st_size to match the symlink target that userspace actually
+ * sees. POSIX requires this, and some userspace programs depend on it.
+ *
+ * This requires reading the symlink target from disk if needed, setting up the
+ * inode's encryption key if possible, and then decrypting or encoding the
+ * symlink target. This makes lstat() more heavyweight than is normally the
+ * case. However, decrypted symlink targets will be cached in ->i_link, so
+ * usually the symlink won't have to be read and decrypted again later if/when
+ * it is actually followed, readlink() is called, or lstat() is called again.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat)
+{
+ struct dentry *dentry = path->dentry;
+ struct inode *inode = d_inode(dentry);
+ const char *link;
+ DEFINE_DELAYED_CALL(done);
+
+ /*
+ * To get the symlink target that userspace will see (whether it's the
+ * decrypted target or the no-key encoded target), we can just get it in
+ * the same way the VFS does during path resolution and readlink().
+ */
+ link = READ_ONCE(inode->i_link);
+ if (!link) {
+ link = inode->i_op->get_link(dentry, inode, &done);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+ stat->size = strlen(link);
+ do_delayed_call(&done);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..b7bfd0cd4f3e 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
+int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
@@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode,
return ERR_PTR(-EOPNOTSUPP);
}
+static inline int fscrypt_symlink_getattr(const struct path *path,
+ struct kstat *stat)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c4bca10ceafc43b1ca0a9fab5fa27e13cbce99e Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:47 -0700
Subject: [PATCH] ext4: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: f348c252320b ("ext4 crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index dd05af983092..69109746e6e2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry,
return paddr;
}
+static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.get_link = ext4_encrypted_get_link,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c4bca10ceafc43b1ca0a9fab5fa27e13cbce99e Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:47 -0700
Subject: [PATCH] ext4: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: f348c252320b ("ext4 crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index dd05af983092..69109746e6e2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry,
return paddr;
}
+static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.get_link = ext4_encrypted_get_link,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
};
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c4bca10ceafc43b1ca0a9fab5fa27e13cbce99e Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:47 -0700
Subject: [PATCH] ext4: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: f348c252320b ("ext4 crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index dd05af983092..69109746e6e2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry,
return paddr;
}
+static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.get_link = ext4_encrypted_get_link,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
};
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c4bca10ceafc43b1ca0a9fab5fa27e13cbce99e Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 1 Jul 2021 23:53:47 -0700
Subject: [PATCH] ext4: report correct st_size for encrypted symlinks
The stat() family of syscalls report the wrong size for encrypted
symlinks, which has caused breakage in several userspace programs.
Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for
encrypted symlinks. This function computes the correct size by reading
and decrypting the symlink target (if it's not already cached).
For more details, see the commit which added fscrypt_symlink_getattr().
Fixes: f348c252320b ("ext4 crypto: add symlink encryption")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index dd05af983092..69109746e6e2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry,
return paddr;
}
+static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+ const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+
+ return fscrypt_symlink_getattr(path, stat);
+}
+
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.get_link = ext4_encrypted_get_link,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
};
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
pgd[] pud[] pmd[] virtual address pointers
/->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
/->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
pgd-| (shared pmd[] as above)
\->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
\->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 points to the same page.
- ptr2 and ptr4 points to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 first and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
- apply documentation changes to Documentation/virtual/kvm/mmu.txt
- add vcpu parameter to gpte_access() call]
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
Note: The backport was validated by running the kvm-unit-tests testcase [1]
mentioned in the commit message (the testcase fails without the patch and
passes with the patch applied).
[1] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/commit/47fd6bc54674fb1d8…
Documentation/virtual/kvm/mmu.txt | 4 ++--
arch/x86/kvm/paging_tmpl.h | 14 +++++++++-----
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index e507a9e0421e..851a8abcadce 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -152,8 +152,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8cf7a09bdd73..677b195f7cf1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -93,8 +93,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -388,13 +388,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(vcpu, pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(vcpu, pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(vcpu, pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -432,7 +434,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -601,7 +604,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
{
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, ret;
gfn_t gfn, base_gfn;
@@ -633,6 +636,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
--
2.18.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 112022bdb5bc372e00e6e43cb88ee38ea67b97bd Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:47 -0700
Subject: [PATCH] KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP
shadow MMUs
Mark NX as being used for all non-nested shadow MMUs, as KVM will set the
NX bit for huge SPTEs if the iTLB mutli-hit mitigation is enabled.
Checking the mitigation itself is not sufficient as it can be toggled on
at any time and KVM doesn't reset MMU contexts when that happens. KVM
could reset the contexts, but that would require purging all SPTEs in all
MMUs, for no real benefit. And, KVM already forces EFER.NX=1 when TDP is
disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved
for shadow MMUs.
Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b3be690d081a..444e068e6ad9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4221,7 +4221,15 @@ static inline u64 reserved_hpa_bits(void)
void
reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
{
- bool uses_nx = context->nx ||
+ /*
+ * KVM uses NX when TDP is disabled to handle a variety of scenarios,
+ * notably for huge SPTEs if iTLB multi-hit mitigation is enabled and
+ * to generate correct permissions for CR0.WP=0/CR4.SMEP=1/EFER.NX=0.
+ * The iTLB multi-hit workaround can be toggled at any time, so assume
+ * NX can be used by any non-nested shadow MMU to avoid having to reset
+ * MMU contexts. Note, KVM forces EFER.NX=1 when TDP is disabled.
+ */
+ bool uses_nx = context->nx || !tdp_enabled ||
context->mmu_role.base.smep_andnot_wp;
struct rsvd_bits_validate *shadow_zero_check;
int i;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f123c42bbeff26bfe8bdb08a01307e92d51eec39 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 23 Jun 2021 13:39:33 -0700
Subject: [PATCH] lkdtm: Enable DOUBLE_FAULT on all architectures
Where feasible, I prefer to have all tests visible on all architectures,
but to have them wired to XFAIL. DOUBLE_FAIL was set up to XFAIL, but
wasn't actually being added to the test list.
Fixes: cea23efb4de2 ("lkdtm/bugs: Make double-fault test always available")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20210623203936.3151093-7-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 645b31e98c77..2c89fc18669f 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -178,9 +178,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(STACKLEAK_ERASING),
CRASHTYPE(CFI_FORWARD_PROTO),
CRASHTYPE(FORTIFIED_STRSCPY),
-#ifdef CONFIG_X86_32
CRASHTYPE(DOUBLE_FAULT),
-#endif
#ifdef CONFIG_PPC_BOOK3S_64
CRASHTYPE(PPC_SLB_MULTIHIT),
#endif
[ Upstream commit 7428022b50d0fbb4846dd0f00639ea09d36dff02 ]
When a port leaves a VLAN-aware bridge, the current code does not clear
other ports' matrix field bit. If the bridge is later set to VLAN-unaware
mode, traffic in the bridge may leak to that port.
Remove the VLAN filtering check in mt7530_port_bridge_leave.
Fixes: 4fe4e1f48ba1 ("net: dsa: mt7530: fix VLAN traffic leaks")
Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530")
Signed-off-by: DENG Qingfang <dqfext(a)gmail.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/dsa/mt7530.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index aec606058d98..fc45af12612f 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -840,11 +840,8 @@ mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
/* Remove this port from the port matrix of the other ports
* in the same bridge. If the port is disabled, port matrix
* is kept and not being setup until the port becomes enabled.
- * And the other port's port matrix cannot be broken when the
- * other port is still a VLAN-aware port.
*/
- if (dsa_is_user_port(ds, i) && i != port &&
- !dsa_port_is_vlan_filtering(&ds->ports[i])) {
+ if (dsa_is_user_port(ds, i) && i != port) {
if (dsa_to_port(ds, i)->bridge_dev != bridge)
continue;
if (priv->ports[i].enable)
--
2.25.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7b40066c97ec66a44e388f82fcf694987451768f Mon Sep 17 00:00:00 2001
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Date: Thu, 5 Aug 2021 15:29:54 -0400
Subject: [PATCH] tracepoint: Use rcu get state and cond sync for static call
updates
State transitions from 1->0->1 and N->2->1 callbacks require RCU
synchronization. Rather than performing the RCU synchronization every
time the state change occurs, which is quite slow when many tracepoints
are registered in batch, instead keep a snapshot of the RCU state on the
most recent transitions which belong to a chain, and conditionally wait
for a grace period on the last transition of the chain if one g.p. has
not elapsed since the last snapshot.
This applies to both RCU and SRCU.
This brings the performance regression caused by commit 231264d6927f
("Fix: tracepoint: static call function vs data state mismatch") back to
what it was originally.
Before this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m10.593s
user 0m0.017s
sys 0m0.259s
After this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m0.878s
user 0m0.000s
sys 0m0.103s
Link: https://lkml.kernel.org/r/20210805192954.30688-1-mathieu.desnoyers@efficios…
Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba…
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Stefan Metzmacher <metze(a)samba.org>
Fixes: 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 8d772bd6894d..efd14c79fab4 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -28,6 +28,44 @@ extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
DEFINE_SRCU(tracepoint_srcu);
EXPORT_SYMBOL_GPL(tracepoint_srcu);
+enum tp_transition_sync {
+ TP_TRANSITION_SYNC_1_0_1,
+ TP_TRANSITION_SYNC_N_2_1,
+
+ _NR_TP_TRANSITION_SYNC,
+};
+
+struct tp_transition_snapshot {
+ unsigned long rcu;
+ unsigned long srcu;
+ bool ongoing;
+};
+
+/* Protected by tracepoints_mutex */
+static struct tp_transition_snapshot tp_transition_snapshot[_NR_TP_TRANSITION_SYNC];
+
+static void tp_rcu_get_state(enum tp_transition_sync sync)
+{
+ struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
+
+ /* Keep the latest get_state snapshot. */
+ snapshot->rcu = get_state_synchronize_rcu();
+ snapshot->srcu = start_poll_synchronize_srcu(&tracepoint_srcu);
+ snapshot->ongoing = true;
+}
+
+static void tp_rcu_cond_sync(enum tp_transition_sync sync)
+{
+ struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
+
+ if (!snapshot->ongoing)
+ return;
+ cond_synchronize_rcu(snapshot->rcu);
+ if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu))
+ synchronize_srcu(&tracepoint_srcu);
+ snapshot->ongoing = false;
+}
+
/* Set to 1 to enable tracepoint debug output */
static const int tracepoint_debug;
@@ -311,6 +349,11 @@ static int tracepoint_add_func(struct tracepoint *tp,
*/
switch (nr_func_state(tp_funcs)) {
case TP_FUNC_1: /* 0->1 */
+ /*
+ * Make sure new static func never uses old data after a
+ * 1->0->1 transition sequence.
+ */
+ tp_rcu_cond_sync(TP_TRANSITION_SYNC_1_0_1);
/* Set static call to first function */
tracepoint_update_call(tp, tp_funcs);
/* Both iterator and static call handle NULL tp->funcs */
@@ -325,10 +368,15 @@ static int tracepoint_add_func(struct tracepoint *tp,
* Requires ordering between RCU assign/dereference and
* static call update/call.
*/
- rcu_assign_pointer(tp->funcs, tp_funcs);
- break;
+ fallthrough;
case TP_FUNC_N: /* N->N+1 (N>1) */
rcu_assign_pointer(tp->funcs, tp_funcs);
+ /*
+ * Make sure static func never uses incorrect data after a
+ * N->...->2->1 (N>1) transition sequence.
+ */
+ if (tp_funcs[0].data != old[0].data)
+ tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1);
break;
default:
WARN_ON_ONCE(1);
@@ -372,24 +420,23 @@ static int tracepoint_remove_func(struct tracepoint *tp,
/* Both iterator and static call handle NULL tp->funcs */
rcu_assign_pointer(tp->funcs, NULL);
/*
- * Make sure new func never uses old data after a 1->0->1
- * transition sequence.
- * Considering that transition 0->1 is the common case
- * and don't have rcu-sync, issue rcu-sync after
- * transition 1->0 to break that sequence by waiting for
- * readers to be quiescent.
+ * Make sure new static func never uses old data after a
+ * 1->0->1 transition sequence.
*/
- tracepoint_synchronize_unregister();
+ tp_rcu_get_state(TP_TRANSITION_SYNC_1_0_1);
break;
case TP_FUNC_1: /* 2->1 */
rcu_assign_pointer(tp->funcs, tp_funcs);
/*
- * On 2->1 transition, RCU sync is needed before setting
- * static call to first callback, because the observer
- * may have loaded any prior tp->funcs after the last one
- * associated with an rcu-sync.
+ * Make sure static func never uses incorrect data after a
+ * N->...->2->1 (N>2) transition sequence. If the first
+ * element's data has changed, then force the synchronization
+ * to prevent current readers that have loaded the old data
+ * from calling the new function.
*/
- tracepoint_synchronize_unregister();
+ if (tp_funcs[0].data != old[0].data)
+ tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1);
+ tp_rcu_cond_sync(TP_TRANSITION_SYNC_N_2_1);
/* Set static call to first function */
tracepoint_update_call(tp, tp_funcs);
break;
@@ -397,6 +444,12 @@ static int tracepoint_remove_func(struct tracepoint *tp,
fallthrough;
case TP_FUNC_N:
rcu_assign_pointer(tp->funcs, tp_funcs);
+ /*
+ * Make sure static func never uses incorrect data after a
+ * N->...->2->1 (N>2) transition sequence.
+ */
+ if (tp_funcs[0].data != old[0].data)
+ tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1);
break;
default:
WARN_ON_ONCE(1);
One cannot ftrace functions used to setup ftrace. Doing so leads to a
racy kernel panic as observed by users on SiFive HiFive Unmatched
boards with Ubuntu kernels.
This has been debugged and fixed in v5.12 kernels by ensuring that all
sbi functions are excluded from ftrace.
Link: https://forums.sifive.com/t/u-boot-says-unhandled-exception-illegal-instruc…
BugLink: https://bugs.launchpad.net/bugs/1934548
Guo Ren (2):
riscv: Fixup wrong ftrace remove cflag
riscv: Fixup patch_text panic in ftrace
arch/riscv/kernel/Makefile | 5 +++--
arch/riscv/mm/Makefile | 3 ++-
2 files changed, 5 insertions(+), 3 deletions(-)
--
2.30.2
During tracee-ebpf regression tests, it was discovered that a CO-RE capable
eBPF program, that relied on a kconfig BTF extern, could not be loaded with
the following error:
libbpf: prog 'tracepoint__raw_syscalls__sys_enter': failed to attach to raw
tracepoint 'sys_enter': Invalid argument
That happened because the CONFIG_ARCH_HAS_SYSCALL_WRAPPER variable had the
wrong value, despite kconfig map existing, misleading the eBPF program
execution (which would then have different pointers, not accepted by the
verifier during load time).
I got the patch proposed here by bisecting upstream tree with the testcase
just described. I kindly ask you to include this patch in the LTS v5.4.x
series so CO-RE (Compile Once - Run Everywhere) eBPF programs, relying in
kconfig settings, can be correctly loaded in kernel series v5.4.
Link: https://github.com/aquasecurity/tracee/issues/851#issuecomment-903074596
I have tested latest 5.4 stable tree with this patch and it fixes the issue.
-rafaeldtinoco
A bad "Fixes" SHA in mainline 7387a72c5f8 references a non-public
SHA, instead of referencing f8dd60de1948 -- see:
https://lore.kernel.org/lkml/20210817075644.0b5123d2@canb.auug.org.au/
This matters to -stable since the broken commit is here:
stable-queue$git grep -l f8dd60de1948
releases/5.10.56/tipc-fix-implicit-connect-for-syn.patch
releases/5.13.8/tipc-fix-implicit-connect-for-syn.patch
and hence those releases will need mainline 7387a72c5f8 applied but
I suspect automatic Fixes: parsing won't "see" it.
Thanks,
Paul.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4e9655763b82a91e4c341835bb504a2b1590f984 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Wed, 25 Aug 2021 13:41:42 +0800
Subject: [PATCH] Revert "btrfs: compression: don't try to compress if we don't
have enough pages"
This reverts commit f2165627319ffd33a6217275e5690b1ab5c45763.
[BUG]
It's no longer possible to create compressed inline extent after commit
f2165627319f ("btrfs: compression: don't try to compress if we don't
have enough pages").
[CAUSE]
For compression code, there are several possible reasons we have a range
that needs to be compressed while it's no more than one page.
- Compressed inline write
The data is always smaller than one sector and the test lacks the
condition to properly recognize a non-inline extent.
- Compressed subpage write
For the incoming subpage compressed write support, we require page
alignment of the delalloc range.
And for 64K page size, we can compress just one page into smaller
sectors.
For those reasons, the requirement for the data to be more than one page
is not correct, and is already causing regression for compressed inline
data writeback. The idea of skipping one page to avoid wasting CPU time
could be revisited in the future.
[FIX]
Fix it by reverting the offending commit.
Reported-by: Zygo Blaxell <ce3g8jdj(a)umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.n…
Fixes: f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 06f9f167222b..bd5689fa290e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -629,7 +629,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
* inode has not been flagged as nocompress. This flag can
* change at any time if we discover bad compression ratios.
*/
- if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {
+ if (inode_need_compress(BTRFS_I(inode), start, end)) {
WARN_ON(pages);
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
if (!pages) {
From: Frieder Schrempf <frieder.schrempf(a)kontron.de>
The new generic NAND ECC framework stores the configuration and
requirements in separate places since commit 93ef92f6f422 ("mtd: nand: Use
the new generic ECC object"). In 5.10.x The SPI NAND layer still uses only
the requirements to track the ECC properties. This mismatch leads to
values of zero being used for ECC strength and step_size in the SPI NAND
layer wherever nanddev_get_ecc_conf() is used and therefore breaks the SPI
NAND on-die ECC support in 5.10.x.
By using nanddev_get_ecc_requirements() instead of nanddev_get_ecc_conf()
for SPI NAND, we make sure that the correct parameters for the detected
chip are used. In later versions (5.11.x) this is fixed anyway with the
implementation of the SPI NAND on-die ECC engine.
Cc: stable(a)vger.kernel.org # 5.10.x
Reported-by: voice INTER connect GmbH <developer(a)voiceinterconnect.de>
Signed-off-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Acked-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changes in v2:
* Fix checkpatch error/warnings for commit message style
* Add Miquel's A-b tag
---
drivers/mtd/nand/spi/core.c | 6 +++---
drivers/mtd/nand/spi/macronix.c | 6 +++---
drivers/mtd/nand/spi/toshiba.c | 6 +++---
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 558d8a14810b..8794a1f6eacd 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -419,7 +419,7 @@ static int spinand_check_ecc_status(struct spinand_device *spinand, u8 status)
* fixed, so let's return the maximum possible value so that
* wear-leveling layers move the data immediately.
*/
- return nanddev_get_ecc_conf(nand)->strength;
+ return nanddev_get_ecc_requirements(nand)->strength;
case STATUS_ECC_UNCOR_ERROR:
return -EBADMSG;
@@ -1090,8 +1090,8 @@ static int spinand_init(struct spinand_device *spinand)
mtd->oobavail = ret;
/* Propagate ECC information to mtd_info */
- mtd->ecc_strength = nanddev_get_ecc_conf(nand)->strength;
- mtd->ecc_step_size = nanddev_get_ecc_conf(nand)->step_size;
+ mtd->ecc_strength = nanddev_get_ecc_requirements(nand)->strength;
+ mtd->ecc_step_size = nanddev_get_ecc_requirements(nand)->step_size;
return 0;
diff --git a/drivers/mtd/nand/spi/macronix.c b/drivers/mtd/nand/spi/macronix.c
index 8e801e4c3a00..cd7a9cacc3fb 100644
--- a/drivers/mtd/nand/spi/macronix.c
+++ b/drivers/mtd/nand/spi/macronix.c
@@ -84,11 +84,11 @@ static int mx35lf1ge4ab_ecc_get_status(struct spinand_device *spinand,
* data around if it's not necessary.
*/
if (mx35lf1ge4ab_get_eccsr(spinand, &eccsr))
- return nanddev_get_ecc_conf(nand)->strength;
+ return nanddev_get_ecc_requirements(nand)->strength;
- if (WARN_ON(eccsr > nanddev_get_ecc_conf(nand)->strength ||
+ if (WARN_ON(eccsr > nanddev_get_ecc_requirements(nand)->strength ||
!eccsr))
- return nanddev_get_ecc_conf(nand)->strength;
+ return nanddev_get_ecc_requirements(nand)->strength;
return eccsr;
diff --git a/drivers/mtd/nand/spi/toshiba.c b/drivers/mtd/nand/spi/toshiba.c
index 21fde2875674..6fe7bd2a94d2 100644
--- a/drivers/mtd/nand/spi/toshiba.c
+++ b/drivers/mtd/nand/spi/toshiba.c
@@ -90,12 +90,12 @@ static int tx58cxgxsxraix_ecc_get_status(struct spinand_device *spinand,
* data around if it's not necessary.
*/
if (spi_mem_exec_op(spinand->spimem, &op))
- return nanddev_get_ecc_conf(nand)->strength;
+ return nanddev_get_ecc_requirements(nand)->strength;
mbf >>= 4;
- if (WARN_ON(mbf > nanddev_get_ecc_conf(nand)->strength || !mbf))
- return nanddev_get_ecc_conf(nand)->strength;
+ if (WARN_ON(mbf > nanddev_get_ecc_requirements(nand)->strength || !mbf))
+ return nanddev_get_ecc_requirements(nand)->strength;
return mbf;
--
2.32.0
From: Filipe Manana <fdmanana(a)suse.com>
commit bc0939fcfab0d7efb2ed12896b1af3d819954a14 upstream.
We have a race between marking that an inode needs to be logged, either
at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and between
btrfs_sync_log(). The following steps describe how the race happens.
1) We are at transaction N;
2) Inode I was previously fsynced in the current transaction so it has:
inode->logged_trans set to N;
3) The inode's root currently has:
root->log_transid set to 1
root->last_log_commit set to 0
Which means only one log transaction was committed to far, log
transaction 0. When a log tree is created we set ->log_transid and
->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());
4) One more range of pages is dirtied in inode I;
5) Some task A starts an fsync against some other inode J (same root), and
so it joins log transaction 1.
Before task A calls btrfs_sync_log()...
6) Task B starts an fsync against inode I, which currently has the full
sync flag set, so it starts delalloc and waits for the ordered extent
to complete before calling btrfs_inode_in_log() at btrfs_sync_file();
7) During ordered extent completion we have btrfs_update_inode() called
against inode I, which in turn calls btrfs_set_inode_last_trans(),
which does the following:
spin_lock(&inode->lock);
inode->last_trans = trans->transaction->transid;
inode->last_sub_trans = inode->root->log_transid;
inode->last_log_commit = inode->root->last_log_commit;
spin_unlock(&inode->lock);
So ->last_trans is set to N and ->last_sub_trans set to 1.
But before setting ->last_log_commit...
8) Task A is at btrfs_sync_log():
- it increments root->log_transid to 2
- starts writeback for all log tree extent buffers
- waits for the writeback to complete
- writes the super blocks
- updates root->last_log_commit to 1
It's a lot of slow steps between updating root->log_transid and
root->last_log_commit;
9) The task doing the ordered extent completion, currently at
btrfs_set_inode_last_trans(), then finally runs:
inode->last_log_commit = inode->root->last_log_commit;
spin_unlock(&inode->lock);
Which results in inode->last_log_commit being set to 1.
The ordered extent completes;
10) Task B is resumed, and it calls btrfs_inode_in_log() which returns
true because we have all the following conditions met:
inode->logged_trans == N which matches fs_info->generation &&
inode->last_subtrans (1) <= inode->last_log_commit (1) &&
inode->last_subtrans (1) <= root->last_log_commit (1) &&
list inode->extent_tree.modified_extents is empty
And as a consequence we return without logging the inode, so the
existing logged version of the inode does not point to the extent
that was written after the previous fsync.
It should be impossible in practice for one task be able to do so much
progress in btrfs_sync_log() while another task is at
btrfs_set_inode_last_trans() right after it reads root->log_transid and
before it reads root->last_log_commit. Even if kernel preemption is enabled
we know the task at btrfs_set_inode_last_trans() can not be preempted
because it is holding the inode's spinlock.
However there is another place where we do the same without holding the
spinlock, which is in the memory mapped write path at:
vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
{
(...)
BTRFS_I(inode)->last_trans = fs_info->generation;
BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
(...)
So with preemption happening after setting ->last_sub_trans and before
setting ->last_log_commit, it is less of a stretch to have another task
do enough progress at btrfs_sync_log() such that the task doing the memory
mapped write ends up with ->last_sub_trans and ->last_log_commit set to
the same value. It is still a big stretch to get there, as the task doing
btrfs_sync_log() has to start writeback, wait for its completion and write
the super blocks.
So fix this in two different ways:
1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the
value of ->last_sub_trans minus 1;
2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just
like we do for buffered and direct writes at btrfs_file_write_iter(),
which is all we need to make sure multiple writes and fsyncs to an
inode in the same transaction never result in an fsync missing that
the inode changed and needs to be logged. Turn this into a helper
function and use it both at btrfs_page_mkwrite() and at
btrfs_file_write_iter() - this also fixes the problem that at
btrfs_page_mkwrite() we were setting those fields without the
protection of the inode's spinlock.
This is an extremely unlikely race to happen in practice.
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
fs/btrfs/btrfs_inode.h | 15 +++++++++++++++
fs/btrfs/file.c | 10 ++--------
fs/btrfs/inode.c | 4 +---
fs/btrfs/transaction.h | 2 +-
4 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index f853835c409c..f3ff57b93158 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -268,6 +268,21 @@ static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
mod);
}
+/*
+ * Called every time after doing a buffered, direct IO or memory mapped write.
+ *
+ * This is to ensure that if we write to a file that was previously fsynced in
+ * the current transaction, then try to fsync it again in the same transaction,
+ * we will know that there were changes in the file and that it needs to be
+ * logged.
+ */
+static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
+{
+ spin_lock(&inode->lock);
+ inode->last_sub_trans = inode->root->log_transid;
+ spin_unlock(&inode->lock);
+}
+
static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
{
int ret = 0;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 400b0717b9d4..1279359ed172 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2004,14 +2004,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
inode_unlock(inode);
- /*
- * We also have to set last_sub_trans to the current log transid,
- * otherwise subsequent syncs to a file that's been synced in this
- * transaction will appear to have already occurred.
- */
- spin_lock(&BTRFS_I(inode)->lock);
- BTRFS_I(inode)->last_sub_trans = root->log_transid;
- spin_unlock(&BTRFS_I(inode)->lock);
+ btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
+
if (num_written > 0)
num_written = generic_write_sync(iocb, num_written);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b044b1d910de..1117335374ff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9250,9 +9250,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
set_page_dirty(page);
SetPageUptodate(page);
- BTRFS_I(inode)->last_trans = fs_info->generation;
- BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
- BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
+ btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index d8a7d460e436..cbede328bda5 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -160,7 +160,7 @@ static inline void btrfs_set_inode_last_trans(struct btrfs_trans_handle *trans,
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->last_trans = trans->transaction->transid;
BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
- BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
+ BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->last_sub_trans - 1;
spin_unlock(&BTRFS_I(inode)->lock);
}
--
2.31.1
Joe reports that using a statically allocated buffer for converting CPER
error records into human readable text is probably a bad idea. Even
though we are not aware of any actual issues, a stack buffer is clearly
a better choice here anyway, so let's move the buffer into the stack
frames of the two functions that refer to it.
Cc: <stable(a)vger.kernel.org>
Reported-by: Joe Perches <joe(a)perches.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/cper.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 73bdbd207e7a..6ec8edec6329 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -25,8 +25,6 @@
#include <acpi/ghes.h>
#include <ras/ras_event.h>
-static char rcd_decode_str[CPER_REC_LEN];
-
/*
* CPER record ID need to be unique even after reboot, because record
* ID is used as index for ERST storage, while CPER records from
@@ -312,6 +310,7 @@ const char *cper_mem_err_unpack(struct trace_seq *p,
struct cper_mem_err_compact *cmem)
{
const char *ret = trace_seq_buffer_ptr(p);
+ char rcd_decode_str[CPER_REC_LEN];
if (cper_mem_err_location(cmem, rcd_decode_str))
trace_seq_printf(p, "%s", rcd_decode_str);
@@ -326,6 +325,7 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
int len)
{
struct cper_mem_err_compact cmem;
+ char rcd_decode_str[CPER_REC_LEN];
/* Don't trust UEFI 2.1/2.2 structure with bad validation bits */
if (len == sizeof(struct cper_sec_mem_err_old) &&
--
2.30.2
Reading from the CMOS involves writing to the index register and then
reading from the data register. Therefore access to the CMOS has to be
serialized with a spinlock. An invocation in cmos_set_alarm was not
serialized with rtc_lock, fix this.
Use spin_lock_irq() like the rest of the function.
Nothing in kernel modifies the RTC_DM_BINARY bit, so use a separate pair
of spin_lock_irq() / spin_unlock_irq() before doing the math.
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Cc: Alessandro Zummo <a.zummo(a)towertech.it>
Cc: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Cc: stable(a)vger.kernel.org
---
drivers/rtc/rtc-cmos.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 0fa66d1039b9..e6ff0fb7591b 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -463,7 +463,10 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
min = t->time.tm_min;
sec = t->time.tm_sec;
+ spin_lock_irq(&rtc_lock);
rtc_control = CMOS_READ(RTC_CONTROL);
+ spin_unlock_irq(&rtc_lock);
+
if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
/* Writing 0xff means "don't care" or "match all". */
mon = (mon <= 12) ? bin2bcd(mon) : 0xff;
--
2.25.1
The patch titled
Subject: mm/hugetlb: initialize hugetlb_usage in mm_init
has been added to the -mm tree. Its filename is
mm-hugetlb-initialize-hugetlb_usage-in-mm_init.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-initialize-hugetlb_usa…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-initialize-hugetlb_usa…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Liu Zixian <liuzixian4(a)huawei.com>
Subject: mm/hugetlb: initialize hugetlb_usage in mm_init
After fork, the child process will get incorrect (2x) hugetlb_usage.
If a process uses 5 2MB hugetlb pages in an anonymous mapping,
HugetlbPages: 10240 kB
and then forks, the child will show,
HugetlbPages: 20480 kB
The reason for double the amount is because hugetlb_usage will be copied
from the parent and then increased when we copy page tables from parent to
child. Child will have 2x actual usage.
Fix this by adding hugetlb_count_init in mm_init.
Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes: 5d317b2b6536 ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
Signed-off-by: Liu Zixian <liuzixian4(a)huawei.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 9 +++++++++
kernel/fork.c | 1 +
2 files changed, 10 insertions(+)
--- a/include/linux/hugetlb.h~mm-hugetlb-initialize-hugetlb_usage-in-mm_init
+++ a/include/linux/hugetlb.h
@@ -858,6 +858,11 @@ static inline spinlock_t *huge_pte_lockp
void hugetlb_report_usage(struct seq_file *m, struct mm_struct *mm);
+static inline void hugetlb_count_init(struct mm_struct *mm)
+{
+ atomic_long_set(&mm->hugetlb_usage, 0);
+}
+
static inline void hugetlb_count_add(long l, struct mm_struct *mm)
{
atomic_long_add(l, &mm->hugetlb_usage);
@@ -1042,6 +1047,10 @@ static inline spinlock_t *huge_pte_lockp
return &mm->page_table_lock;
}
+static inline void hugetlb_count_init(struct mm_struct *mm)
+{
+}
+
static inline void hugetlb_report_usage(struct seq_file *f, struct mm_struct *m)
{
}
--- a/kernel/fork.c~mm-hugetlb-initialize-hugetlb_usage-in-mm_init
+++ a/kernel/fork.c
@@ -1052,6 +1052,7 @@ static struct mm_struct *mm_init(struct
mm->pmd_huge_pte = NULL;
#endif
mm_init_uprobes_state(mm);
+ hugetlb_count_init(mm);
if (current->mm) {
mm->flags = current->mm->flags & MMF_INIT_MASK;
_
Patches currently in -mm which might be from liuzixian4(a)huawei.com are
mm-hugetlb-initialize-hugetlb_usage-in-mm_init.patch
The patch titled
Subject: mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
has been added to the -mm tree. Its filename is
mm-hmm-bypass-devmap-pte-when-all-pfn-requested-flags-are-fulfilled.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-bypass-devmap-pte-when-all…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-bypass-devmap-pte-when-all…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Subject: mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
Previously, we noticed the one rpma example was failed[1] since
36f30e486d, where it will use ODP feature to do RDMA WRITE between fsdax
files.
After digging into the code, we found hmm_vma_handle_pte() will still
return EFAULT even though all the its requesting flags has been fulfilled.
That's because a DAX page will be marked as (_PAGE_SPECIAL | PAGE_DEVMAP)
by pte_mkdevmap().
[1]: https://github.com/pmem/rpma/issues/1142
Link: https://lkml.kernel.org/r/20210830094232.203029-1-lizhijian@cn.fujitsu.com
Fixes: 405506274922 ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling")
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hmm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/hmm.c~mm-hmm-bypass-devmap-pte-when-all-pfn-requested-flags-are-fulfilled
+++ a/mm/hmm.c
@@ -295,10 +295,13 @@ static int hmm_vma_handle_pte(struct mm_
goto fault;
/*
+ * Bypass devmap pte such as DAX page when all pfn requested
+ * flags(pfn_req_flags) are fulfilled.
* Since each architecture defines a struct page for the zero page, just
* fall through and treat it like a normal page.
*/
- if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+ if (pte_special(pte) && !pte_devmap(pte) &&
+ !is_zero_pfn(pte_pfn(pte))) {
if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
pte_unmap(ptep);
return -EFAULT;
_
Patches currently in -mm which might be from lizhijian(a)cn.fujitsu.com are
mm-hmm-bypass-devmap-pte-when-all-pfn-requested-flags-are-fulfilled.patch
The patch titled
Subject: mm: fix panic caused by __page_handle_poison()
has been added to the -mm tree. Its filename is
mm-fix-panic-caused-by-__page_handle_poison.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-panic-caused-by-__page_han…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-panic-caused-by-__page_han…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Michael Wang <yun.wang(a)linux.alibaba.com>
Subject: mm: fix panic caused by __page_handle_poison()
In commit 510d25c92ec4 ("mm/hwpoison: disable pcp for
page_handle_poison()"), __page_handle_poison() was introduced, and if we
mark:
RET_A = dissolve_free_huge_page();
RET_B = take_page_off_buddy();
then __page_handle_poison was supposed to return TRUE When RET_A == 0 &&
RET_B == TRUE
But since it failed to take care the case when RET_A is -EBUSY or -ENOMEM,
and just return the ret as a bool which actually become TRUE, it break the
original logic.
The following result is a huge page in freelist but was
referenced as poisoned, and lead into the final panic:
kernel BUG at mm/internal.h:95!
invalid opcode: 0000 [#1] SMP PTI
skip...
RIP: 0010:set_page_refcounted mm/internal.h:95 [inline]
RIP: 0010:remove_hugetlb_page+0x23c/0x240 mm/hugetlb.c:1371
skip...
Call Trace:
remove_pool_huge_page+0xe4/0x110 mm/hugetlb.c:1892
return_unused_surplus_pages+0x8d/0x150 mm/hugetlb.c:2272
hugetlb_acct_memory.part.91+0x524/0x690 mm/hugetlb.c:4017
This patch replaces 'bool' with 'int' to handle RET_A correctly.
Link: https://lkml.kernel.org/r/61782ac6-1e8a-4f6f-35e6-e94fce3b37f5@linux.alibab…
Fixes: 510d25c92ec4 ("mm/hwpoison: disable pcp for page_handle_poison()")
Signed-off-by: Michael Wang <yun.wang(a)linux.alibaba.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Abaci <abaci(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory-failure.c~mm-fix-panic-caused-by-__page_handle_poison
+++ a/mm/memory-failure.c
@@ -68,7 +68,7 @@ atomic_long_t num_poisoned_pages __read_
static bool __page_handle_poison(struct page *page)
{
- bool ret;
+ int ret;
zone_pcp_disable(page_zone(page));
ret = dissolve_free_huge_page(page);
@@ -76,7 +76,7 @@ static bool __page_handle_poison(struct
ret = take_page_off_buddy(page);
zone_pcp_enable(page_zone(page));
- return ret;
+ return ret > 0;
}
static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
_
Patches currently in -mm which might be from yun.wang(a)linux.alibaba.com are
mm-fix-panic-caused-by-__page_handle_poison.patch