While reading from the tracing/trace, the ftrace reader rarely encounters
a KASAN invalid access issue.
It is likely that the writer has disrupted the ring_buffer that the reader
is currently parsing. the kasan report is as below:
[name:report&]BUG: KASAN: invalid-access in rb_iter_head_event+0x27c/0x3d0
[name:report&]Read of size 4 at addr 71ffff8111a18000 by task xxxx
[name:report_sw_tags&]Pointer tag: [71], memory tag: [0f]
[name:report&]
CPU: 2 PID: 380 Comm: xxxx
Call trace:
dump_backtrace+0x168/0x1b0
show_stack+0x2c/0x3c
dump_stack_lvl+0xa4/0xd4
print_report+0x268/0x9b0
kasan_report+0xdc/0x148
kasan_tag_mismatch+0x28/0x3c
__hwasan_tag_mismatch+0x2c/0x58
rb_event_length() [inline]
rb_iter_head_event+0x27c/0x3d0
ring_buffer_iter_peek+0x23c/0x6e0
__find_next_entry+0x1ac/0x3d8
s_next+0x1f0/0x310
seq_read_iter+0x4e8/0x77c
seq_read+0xf8/0x150
vfs_read+0x1a8/0x4cc
In some edge cases, ftrace reader could access to an invalid address,
specifically when reading 4 bytes beyond the end of the currently page.
While issue happened, the dump of rb_iter_head_event is shown as below:
in function rb_iter_head_event:
- iter->head = 0xFEC
- iter->next_event = 0xFEC
- commit = 0xFF0
- read_stamp = 0x2955AC46DB0
- page_stamp = 0x2955AC2439A
- iter->head_page->page = 0x71FFFF8111A17000
- iter->head_page->time_stamp = 0x2956A142267
- iter->head_page->page->commit = 0xFF0
- the content in iter->head_page->page
0x71FFFF8111A17FF0: 01010075 00002421 0A123B7C FFFFFFC0
In rb_iter_head_event, reader will call rb_event_length with argument
(struct ring_buffer_event *event = 0x71FFFF8111A17FFC).
Since the content data start at address 0x71FFFF8111A17FFC are 0xFFFFFFC0.
event->type will be interpret as 0x0, than the reader will try to get the
length by accessing event->array[0], which is an invalid address:
&event->array[0] = 0x71FFFF8111A18000
Cc: stable(a)vger.kernel.org
Signed-off-by: Tze-nan Wu <Tze-nan.Wu(a)mediatek.com>
---
resend again due to forget cc stable(a)vger.kernel.org
Following patch may not become a solution, it merely checks if the address
to be accessed is valid or not within the rb_event_length before access.
And not sure if there is any side-effect it can lead to.
I am curious about what a better solution for this issue would look like.
Should we address the problem from the writer or the reader?
Also I wonder if the cause of the issue is exactly as I suspected.
Any Suggestion will be appreciated.
Test below can reproduce the issue in 2 hours on kernel-6.1.24:
$cd /sys/kernel/tracing/
# make the reader and writer race more through resize the buffer to 8kb
$echo 8 > buffer_size_kn
# enable all events
$echo 1 > event/enable
# enable trace
$echo 1 > tracing_on
# write and run a script that keep reading trace
$./read_trace.sh
``` read_trace.sh
while :
do
cat /sys/kernel/tracing/trace > /dev/null
done
```
---
kernel/trace/ring_buffer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 78502d4c7214..ed5ddc3a134b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -200,6 +200,8 @@ rb_event_length(struct ring_buffer_event *event)
if (rb_null_event(event))
/* undefined */
return -1;
+ if (((unsigned long)event & 0xfffUL) >= PAGE_SIZE - 4)
+ return -1;
return event->array[0] + RB_EVNT_HDR_SIZE;
case RINGBUF_TYPE_TIME_EXTEND:
@@ -209,6 +211,8 @@ rb_event_length(struct ring_buffer_event *event)
return RB_LEN_TIME_STAMP;
case RINGBUF_TYPE_DATA:
+ if ((((unsigned long)event & 0xfffUL) >= PAGE_SIZE - 4) && !event->type_len)
+ return -1;
return rb_event_data_length(event);
default:
WARN_ON_ONCE(1);
--
2.18.0
Dear Andrey, dear Nick, dear Greg, dear Sasha,
Compiling the kernel with UBSAN enabled and with gcc-8 and later fails when:
commit 1e1b6d63d634 ("lib/string.c: implement stpcpy") is applied, and
commit bac7a1fff792 ("lib/ubsan: remove returns-nonnull-attribute checks") is
not applied.
To reproduce, run:
tuxmake -r docker -a arm64 -t gcc-13 -k allnoconfig --kconfig-add
CONFIG_UBSAN=y
It then fails with:
aarch64-linux-gnu-ld: lib/string.o: in function `stpcpy':
string.c:(.text+0x694): undefined reference to
`__ubsan_handle_nonnull_return_v1'
string.c:(.text+0x694): relocation truncated to fit:
R_AARCH64_CALL26 against undefined symbol
`__ubsan_handle_nonnull_return_v1'
Below you find a complete list of architectures, compiler versions and kernel
versions that I have tested with.
As commit bac7a1fff792 ("lib/ubsan: remove returns-nonnull-attribute checks") is
included in v4.16, and commit 1e1b6d63d634 ("lib/string.c: implement stpcpy") is
included in v5.9, this is not an issue that can happen on any mainline release
or the stable releases v4.19.y and later.
In the v4.14.y branch, however, commit 1e1b6d63d634 ("lib/string.c: implement
stpcpy") was included with v4.14.200 as commit b6d38137c19f and commit
bac7a1fff792 ("lib/ubsan: remove returns-nonnull-attribute checks") from
mainline was not included yet. Hence, this reported failure with UBSAN can be
observed on v4.14.y with recent gcc versions.
Greg, once checked and confirmed by Andrey or Nick, could you please include
commit bac7a1fff792 ("lib/ubsan: remove returns-nonnull-attribute checks") into
the linux-4.14.y branch?
The commit applies directly, without any change, on v4.14.200 to v4.14.325.
With that, future versions of v4.14.y will have a working UBSAN with the recent
gcc compiler versions.
Note: For any users, intending to run UBSAN on versions 4.14.200 to v4.14.325,
e.g., for bisecting UBSAN-detected kernel bugs on the linux-4.14.y branch, they
would simply need to apply commit bac7a1fff792 on those release versions.
Appendix of my full testing record:
For arm64 and x86-64 architecture, I tested this whole matrix of combinations of
building v4.14.200, i.e., the first version that failed with the reported build
failure and v4.14.325, i.e., the latest v4.14 release version at the time of
writing.
On v4.14.200 and on v4.14.325:
x86_64:
gcc-7: unsupported configuration (according to tuxmake)
gcc-8: affected and resolved by cherry-picking bac7a1fff792
gcc-9: affected and resolved by cherry-picking bac7a1fff792
gcc-10: affected and resolved by cherry-picking bac7a1fff792
gcc-11:
v4.14.200 fails with an unrelated build error on this compiler and arch
v4.14.325 affected and resolved by cherry-picking bac7a1fff792
gcc-12:
v4.14.200 fails with an unrelated build error on this compiler and arch
v4.14.325 affected and resolved by cherry-picking bac7a1fff792
gcc-13:
v4.14.200 fails with an unrelated build error on this compiler and arch
v4.14.325 affected and resolved by cherry-picking bac7a1fff792
clang-9: unsupported configuration (according to tuxmake)
clang-10: not affected, builds with and without cherry-picking bac7a1fff792
clang-17: not affected, builds with and without cherry-picking bac7a1fff792
arm64:
gcc-7: unsupported configuration (according to tuxmake)
gcc-8: affected and resolved by cherry-picking bac7a1fff792
gcc-9: affected and resolved by cherry-picking bac7a1fff792
gcc-10: affected and resolved by cherry-picking bac7a1fff792
gcc-11: affected and resolved by cherry-picking bac7a1fff792
gcc-12: affected and resolved by cherry-picking bac7a1fff792
gcc-13: affected and resolved by cherry-picking bac7a1fff792
clang-9: unsupported configuration (according to tuxmake)
clang-10: not affected, builds with and without cherry-picking bac7a1fff792
clang-17: not affected, builds with and without cherry-picking bac7a1fff792
Best regards,
Lukas
On 9/4/23 08:21, Denis Arefev wrote:
> The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> is subject to overflow due to a failure to cast operands to a larger
> data type before performing arithmetic.
>
> The maximum result of this subtraction is defined by the RCU_FANOUT
> or other srcu level-spread values assigned by rcu_init_levelspread(),
> which can indeed cause the signed 32-bit integer literal ("1") to overflow
> when shifted by any value greater than 31.
We could expand on this:
The maximum result of this subtraction is defined by the RCU_FANOUT
or other srcu level-spread values assigned by rcu_init_levelspread(),
which can indeed cause the signed 32-bit integer literal ("1") to overflow
when shifted by any value greater than 31 on a 64-bit system.
Moreover, when the subtraction value is 31, the 1 << 31 expression results
in 0xffffffff80000000 when the signed integer is promoted to unsigned long
on 64-bit systems due to type promotion rules, which is certainly not the
intended result.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
With the commit message updated with my comment above, please also add:
Fixes: c7e88067c1 ("srcu: Exact tracking of srcu_data structures containing callbacks")
Cc: <stable(a)vger.kernel.org> # v4.11
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Thanks!
Mathieu
>
> Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
> ---
> v3: Changed the name of the patch, as suggested by
> Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
> v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
> Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
> kernel/rcu/srcutree.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 20d7a238d675..6c18e6005ae1 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
> snp->grplo = cpu;
> snp->grphi = cpu;
> }
> - sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> + sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
> }
> smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
> return true;
> @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> int cpu;
>
> for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> - if (!(mask & (1 << (cpu - snp->grplo))))
> + if (!(mask & (1UL << (cpu - snp->grplo))))
> continue;
> srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> }
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Hi all,
Recently "memfd: improve userspace warnings for missing exec-related
flags" was merged. On my system, this is a regression, not an
improvement, because the entire 256k kernel log buffer (default on x86)
is filled with these warnings and "__do_sys_memfd_create: 122 callbacks
suppressed". I haven't investigated too closely, but the most likely
cause is Wayland libraries.
This is too serious of a consequence for using an old API, especially
considering how recently the flags were added. The vast majority of
software has not had time to add the flags: glibc does not define the
macros until 2.38 which was released less than one month ago, man-pages
does not document the flags, and according to Debian Code Search, only
systemd, stress-ng, and strace actually pass either of these flags.
Furthermore, since old kernels reject unknown flags, it's not just a
matter of defining and passing the flag; every program needs to
add logic to handle EINVAL and try again.
Some other way needs to be found to encourage userspace to add the
flags; otherwise, this message will be patched out because the kernel
log becomes unusable after running unupdated programs, which will still
exist even after upstreams are fixed. In particular, AppImages,
flatpaks, snaps, and similar app bundles contain vendored Wayland
libraries which can be difficult or impossible to update.
Thanks,
Alex.
From: Eric Biggers <ebiggers(a)google.com>
Since commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for
fscrypt_master_key"), xfstest generic/270 causes a WARNING when run on
f2fs with test_dummy_encryption in the mount options:
$ kvm-xfstests -c f2fs/encrypt generic/270
[...]
WARNING: CPU: 1 PID: 2453 at fs/crypto/keyring.c:240 fscrypt_destroy_keyring+0x1f5/0x260
The cause of the WARNING is that not all encrypted inodes have been
evicted before fscrypt_destroy_keyring() is called, which violates an
assumption. This happens because the test uses an external quota file,
which gets automatically encrypted due to test_dummy_encryption.
Encryption of quota files has never really been supported. On ext4,
ext4_quota_read() does not decrypt the data, so encrypted quota files
are always considered invalid on ext4. On f2fs, f2fs_quota_read() uses
the pagecache, so trying to use an encrypted quota file gets farther,
resulting in the issue described above being possible. But this was
never intended to be possible, and there is no use case for it.
Therefore, make the quota support layer explicitly reject using
IS_ENCRYPTED inodes when quotaon is attempted.
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/quota/dquot.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9e72bfe8bbad9..7e268cd2727cc 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2339,6 +2339,20 @@ static int vfs_setup_quota_inode(struct inode *inode, int type)
if (sb_has_quota_loaded(sb, type))
return -EBUSY;
+ /*
+ * Quota files should never be encrypted. They should be thought of as
+ * filesystem metadata, not user data. New-style internal quota files
+ * cannot be encrypted by users anyway, but old-style external quota
+ * files could potentially be incorrectly created in an encrypted
+ * directory, hence this explicit check. Some reasons why encrypted
+ * quota files don't work include: (1) some filesystems that support
+ * encryption don't handle it in their quota_read and quota_write, and
+ * (2) cleaning up encrypted quota files at unmount would need special
+ * consideration, as quota files are cleaned up later than user files.
+ */
+ if (IS_ENCRYPTED(inode))
+ return -EINVAL;
+
dqopt->files[type] = igrab(inode);
if (!dqopt->files[type])
return -EIO;
base-commit: 708283abf896dd4853e673cc8cba70acaf9bf4ea
--
2.42.0