After my improvement of the test:
https://lore.kernel.org/all/20240522070435.773918-3-dev.jain@arm.com/
the test begins to fail with 4k and 16k pages on non-LPA2 systems. To
reduce noise in the CI systems, let us skip the test when the higher
address space is not implemented.
v1->v2:
- Guard with ifdeffery to prevent compiler warning on other arches
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
The patch applies on linux-next.
tools/testing/selftests/mm/va_high_addr_switch.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/va_high_addr_switch.c b/tools/testing/selftests/mm/va_high_addr_switch.c
index fa7eabfaf841..896b3f73fc53 100644
--- a/tools/testing/selftests/mm/va_high_addr_switch.c
+++ b/tools/testing/selftests/mm/va_high_addr_switch.c
@@ -293,6 +293,20 @@ static int run_test(struct testcase *test, int count)
return ret;
}
+#ifdef __aarch64__
+/* Check if userspace VA > 48 bits */
+static int high_address_present(void)
+{
+ void *ptr = mmap((void *)(1UL << 50), 1, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+ if (ptr == MAP_FAILED)
+ return 0;
+
+ munmap(ptr, 1);
+ return 1;
+}
+#endif
+
static int supported_arch(void)
{
#if defined(__powerpc64__)
@@ -300,7 +314,7 @@ static int supported_arch(void)
#elif defined(__x86_64__)
return 1;
#elif defined(__aarch64__)
- return 1;
+ return high_address_present();
#else
return 0;
#endif
--
2.34.1
Hi guys,
This is another attempt to allow userspace to change ID_AA64PFR1_EL1. We want to
give userspace the ability to control the visible feature set for a VM, which
userspace could use to migrate VMs transparently.
The patch series has three parts:
The first patch disables those fields which KVM doesn't know how to handle, so
KVM will only expose the value 0 for those fields to the guest.
The second patch allows userspace to change ID_AA64PFR1_EL1. It makes as many
fields as possible writable, except for some special fields which are still not
writable.
The third patch adds a kselftest to test whether userspace can change
ID_AA64PFR1_EL1.
Besides, I also noticed there is another patch [1] which tries to make
ID_AA64PFR1_EL1 writable. That patch [1] enables GCS on bare metal and
adds GCS support for the guest. My understanding is that once we have GCS support
on bare metal, it will be clear how to handle it in KVM. The same goes for the
other fields like NMI, THE, DF2, MTEX..
I'm still not confident about the correctness of this patch series, but I've
tried my best to understand each of the fields and followed Marc's comments to
tweak this patch series.
The question that confuses me a lot is: should we make those fields (NMI, GCS,
THE, DF2, MTEX..) which KVM doesn't know how to handle writable? Bare metal
doesn't know about them, and ftr_id_aa64pfr1[] doesn't know about them. I
followed the comment "I should handle all 15 fields", so I made them writable:
they're disabled in the register read accessor, so their value will
always be 0, and userspace can write to them, but only the value 0.
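The "writable, but only to 0" behaviour described above can be sketched roughly as follows. This is a standalone illustration, not the KVM code from the series: the field offsets, `HIDDEN_FIELDS`, `sanitised_read()` and `write_allowed()` are all hypothetical names and assumed values, and real KVM applies per-field rules (signedness, exact-match fields) that this sketch ignores.

```rust
/// Every ID register field is 4 bits wide.
const FIELD_WIDTH: u32 = 4;

// Assumed field offsets within ID_AA64PFR1_EL1, for illustration only;
// check the Arm architecture reference before relying on them.
const NMI_SHIFT: u32 = 36;
const GCS_SHIFT: u32 = 44;
const THE_SHIFT: u32 = 48;
const MTEX_SHIFT: u32 = 52;
const DF2_SHIFT: u32 = 56;

/// Mask covering the 4-bit field that starts at `shift`.
const fn field_mask(shift: u32) -> u64 {
    ((1u64 << FIELD_WIDTH) - 1) << shift
}

/// Fields the hypervisor does not know how to handle (hypothetical set).
pub const HIDDEN_FIELDS: u64 = field_mask(NMI_SHIFT)
    | field_mask(GCS_SHIFT)
    | field_mask(THE_SHIFT)
    | field_mask(MTEX_SHIFT)
    | field_mask(DF2_SHIFT);

/// Extract the 4-bit field that starts at `shift`.
fn field(value: u64, shift: u32) -> u64 {
    (value >> shift) & ((1u64 << FIELD_WIDTH) - 1)
}

/// Read accessor: hidden fields always read as 0.
pub fn sanitised_read(hw_value: u64) -> u64 {
    hw_value & !HIDDEN_FIELDS
}

/// Write check: each field of `new_value` must not exceed the same field in
/// `limit`. Because hidden fields are 0 in a sanitised `limit`, the only
/// value userspace can write to them is 0.
pub fn write_allowed(limit: u64, new_value: u64) -> bool {
    (0..u64::BITS)
        .step_by(FIELD_WIDTH as usize)
        .all(|shift| field(new_value, shift) <= field(limit, shift))
}
```

With `limit = sanitised_read(hw)`, a write that sets any hidden field to a non-zero value is rejected, while lowering a visible field is accepted.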
If I did anything wrong, please point it out. Thanks a lot.
[1] [PATCH v9 13/39] KVM: arm64: Manage GCS registers for guests
https://lore.kernel.org/all/20240625-arm64-gcs-v9-13-0f634469b8f0@kernel.or…
Changelog:
----------
v3 -> v4:
* Add a new patch to disable some features which KVM doesn't know how to
handle in the register accessor.
* Handle all the fields in the register.
* Fix a small cnt issue in the kselftest.
v2 -> v3:
* Give more description about why only part of the fields can be writable.
* Updated the writable mask by referring to the latest ARM spec.
v1 -> v2:
* Tackling the full register instead of single field.
* Changing the patch title and commit message.
RFCv1 -> v1:
* Fix the compilation error.
* Delete the machine-specific information and make the description more
general.
RFCv1: https://lore.kernel.org/all/20240612023553.127813-1-shahuang@redhat.com/
v1: https://lore.kernel.org/all/20240617075131.1006173-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20240618063808.1040085-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240628060454.1936886-2-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: arm64: Disable fields that KVM doesn't know how to handle in
ID_AA64PFR1_EL1
KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1
KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1
arch/arm64/kvm/sys_regs.c | 13 ++++++++++-
.../selftests/kvm/aarch64/set_id_regs.c | 23 ++++++++++++++++---
2 files changed, 32 insertions(+), 4 deletions(-)
--
2.40.1
`CStr` became part of the `core` library in Rust 1.75. This change replaces
the custom `CStr` implementation with the one from `core`.
`core::ffi::CStr` behaves generally the same as the removed implementation,
with the following differences:
- It does not implement `Display`.
- It does not provide a `from_bytes_with_nul_unchecked_mut` method.
- It has an `as_ptr()` method instead of `as_char_ptr()`, which also returns
`*const c_char`.
The first two differences are handled by providing the `CStrExt` trait,
with `display()` and `from_bytes_with_nul_unchecked_mut()` methods.
`display()` returns a `CStrDisplay` wrapper, with a custom `Display`
implementation.
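As a rough userspace illustration of the wrapper approach described above, the sketch below uses `std::ffi::CStr` so that it builds outside the kernel tree. The names mirror the patch, but this is a simplified stand-in rather than the kernel code, and it omits `from_bytes_with_nul_unchecked_mut()`.

```rust
use std::ffi::CStr;
use std::fmt::{self, Write};

/// Wrapper around `CStr` that implements `Display`.
pub struct CStrDisplay<'a>(&'a CStr);

impl fmt::Display for CStrDisplay<'_> {
    /// Writes printable ASCII bytes as-is and escapes the rest as `\xNN`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for &c in self.0.to_bytes() {
            if (0x20..0x7f).contains(&c) {
                // Printable character.
                f.write_char(c as char)?;
            } else {
                write!(f, "\\x{:02x}", c)?;
            }
        }
        Ok(())
    }
}

/// Extension trait that adds `display()` to the foreign `CStr` type.
pub trait CStrExt {
    fn display(&self) -> CStrDisplay<'_>;
}

impl CStrExt for CStr {
    fn display(&self) -> CStrDisplay<'_> {
        CStrDisplay(self)
    }
}
```

An extension trait is needed because `Display` cannot be implemented directly on a type defined in another crate; callers then write `format!("{}", name.display())` instead of `format!("{}", name)`.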
The `DerefMut` implementation for `CString` is removed here, as it's not
being used anywhere.
Signed-off-by: Michal Rostecki <vadorovsky(a)gmail.com>
---
v1 -> v2:
- Do not remove `c_str` macro. While it's preferred to use C-string
literals, there are two cases where `c_str` is helpful:
- When working with macros, which already return a Rust string literal
(e.g. `stringify!`).
- When building macros, where we want to take a Rust string literal as an
argument (for caller's convenience), but still use it as a C-string
internally.
- Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`,
`new_mutex`). Use the `c_str` macro to convert these literals to C-string
literals.
- Use `c_str` in kunit.rs for converting the output of `stringify!` to a
`CStr`.
- Remove `DerefMut` implementation for `CString`.
v2 -> v3:
- Fix the commit message.
- Remove redundant braces in `use`, when only one item is imported.
v3 -> v4:
- Provide the `CStrExt` trait with `display()` method, which returns a
`CStrDisplay` wrapper with `Display` implementation. This addresses
the lack of `Display` implementation for `core::ffi::CStr`.
- Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`,
which might be useful and is going to prevent manual, unsafe casts.
- Fix a typo (s/preffered/prefered/).
v4 -> v5:
- Keep the `test_cstr_display*` unit tests.
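The reason for keeping `c_str` (see v1 -> v2 above) can be illustrated with a standalone sketch. It is adapted to `std` so it compiles outside the kernel; the macro body follows the shape used in the patch, while `lock_name()` and the `my_lock` identifier are hypothetical and exist only for this example.

```rust
/// Converts a Rust string literal (or a macro that expands to one, such as
/// `stringify!`) into a `&'static CStr` at compile time.
macro_rules! c_str {
    ($str:expr) => {{
        // Append the terminating NUL at compile time.
        const S: &str = concat!($str, "\0");
        const C: &::std::ffi::CStr =
            match ::std::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
                Ok(v) => v,
                // An interior NUL becomes a compile-time error.
                Err(_) => panic!("string contains interior NUL"),
            };
        C
    }};
}

/// Hypothetical helper: turns an identifier into a C string, which a
/// `c"..."` literal cannot do because `stringify!` yields a `&str` literal.
pub fn lock_name() -> &'static ::std::ffi::CStr {
    c_str!(stringify!(my_lock))
}
```

When the text is known up front, a C-string literal such as `c"my_lock"` is simpler; the macro only earns its place when the input arrives as a regular string literal from another macro.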
rust/kernel/error.rs | 7 +-
rust/kernel/kunit.rs | 18 +-
rust/kernel/net/phy.rs | 2 +-
rust/kernel/prelude.rs | 4 +-
rust/kernel/str.rs | 501 ++++++------------------------------
rust/kernel/sync/condvar.rs | 5 +-
rust/kernel/sync/lock.rs | 6 +-
rust/kernel/workqueue.rs | 2 +-
scripts/rustdoc_test_gen.rs | 4 +-
9 files changed, 111 insertions(+), 438 deletions(-)
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 55280ae9fe40..18808b29604d 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -4,10 +4,11 @@
//!
//! C header: [`include/uapi/asm-generic/errno-base.h`](srctree/include/uapi/asm-generic/errno-base.h)
-use crate::{alloc::AllocError, str::CStr};
+use crate::alloc::AllocError;
use alloc::alloc::LayoutError;
+use core::ffi::CStr;
use core::fmt;
use core::num::TryFromIntError;
use core::str::Utf8Error;
@@ -142,7 +143,7 @@ pub fn name(&self) -> Option<&'static CStr> {
None
} else {
// SAFETY: The string returned by `errname` is static and `NUL`-terminated.
- Some(unsafe { CStr::from_char_ptr(ptr) })
+ Some(unsafe { CStr::from_ptr(ptr) })
}
}
@@ -164,7 +165,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
None => f.debug_tuple("Error").field(&-self.0).finish(),
// SAFETY: These strings are ASCII-only.
Some(name) => f
- .debug_tuple(unsafe { core::str::from_utf8_unchecked(name) })
+ .debug_tuple(unsafe { core::str::from_utf8_unchecked(name.to_bytes()) })
.finish(),
}
}
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 0ba77276ae7e..79a50ab59af0 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -56,13 +56,15 @@ macro_rules! kunit_assert {
break 'out;
}
- static FILE: &'static $crate::str::CStr = $crate::c_str!($file);
+ static FILE: &'static core::ffi::CStr = $file;
static LINE: i32 = core::line!() as i32 - $diff;
- static CONDITION: &'static $crate::str::CStr = $crate::c_str!(stringify!($condition));
+ static CONDITION: &'static core::ffi::CStr = $crate::c_str!(stringify!($condition));
// SAFETY: FFI call without safety requirements.
let kunit_test = unsafe { $crate::bindings::kunit_get_current_test() };
if kunit_test.is_null() {
+ use kernel::str::CStrExt;
+
// The assertion failed but this task is not running a KUnit test, so we cannot call
// KUnit, but at least print an error to the kernel log. This may happen if this
// macro is called from an spawned thread in a test (see
@@ -71,11 +73,13 @@ macro_rules! kunit_assert {
//
// This mimics KUnit's failed assertion format.
$crate::kunit::err(format_args!(
- " # {}: ASSERTION FAILED at {FILE}:{LINE}\n",
- $name
+ " # {}: ASSERTION FAILED at {}:{LINE}\n",
+ $name.display(),
+ FILE.display(),
));
$crate::kunit::err(format_args!(
- " Expected {CONDITION} to be true, but is false\n"
+ " Expected {} to be true, but is false\n",
+ CONDITION.display(),
));
$crate::kunit::err(format_args!(
" Failure not reported to KUnit since this is a non-KUnit task\n"
@@ -98,12 +102,12 @@ unsafe impl Sync for Location {}
unsafe impl Sync for UnaryAssert {}
static LOCATION: Location = Location($crate::bindings::kunit_loc {
- file: FILE.as_char_ptr(),
+ file: FILE.as_ptr(),
line: LINE,
});
static ASSERTION: UnaryAssert = UnaryAssert($crate::bindings::kunit_unary_assert {
assert: $crate::bindings::kunit_assert {},
- condition: CONDITION.as_char_ptr(),
+ condition: CONDITION.as_ptr(),
expected_true: true,
});
diff --git a/rust/kernel/net/phy.rs b/rust/kernel/net/phy.rs
index fd40b703d224..19f45922ec42 100644
--- a/rust/kernel/net/phy.rs
+++ b/rust/kernel/net/phy.rs
@@ -502,7 +502,7 @@ unsafe impl Sync for DriverVTable {}
pub const fn create_phy_driver<T: Driver>() -> DriverVTable {
// INVARIANT: All the fields of `struct phy_driver` are initialized properly.
DriverVTable(Opaque::new(bindings::phy_driver {
- name: T::NAME.as_char_ptr().cast_mut(),
+ name: T::NAME.as_ptr().cast_mut(),
flags: T::FLAGS,
phy_id: T::PHY_DEVICE_ID.id,
phy_id_mask: T::PHY_DEVICE_ID.mask_as_int(),
diff --git a/rust/kernel/prelude.rs b/rust/kernel/prelude.rs
index b37a0b3180fb..b0969ca78f10 100644
--- a/rust/kernel/prelude.rs
+++ b/rust/kernel/prelude.rs
@@ -12,7 +12,7 @@
//! ```
#[doc(no_inline)]
-pub use core::pin::Pin;
+pub use core::{ffi::CStr, pin::Pin};
pub use crate::alloc::{box_ext::BoxExt, flags::*, vec_ext::VecExt};
@@ -35,7 +35,7 @@
pub use super::error::{code::*, Error, Result};
-pub use super::{str::CStr, ThisModule};
+pub use super::ThisModule;
pub use super::init::{InPlaceInit, Init, PinInit};
diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index bb8d4f41475b..f1acaa377694 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -4,8 +4,9 @@
use crate::alloc::{flags::*, vec_ext::VecExt, AllocError};
use alloc::vec::Vec;
+use core::ffi::CStr;
use core::fmt::{self, Write};
-use core::ops::{self, Deref, DerefMut, Index};
+use core::ops::Deref;
use crate::error::{code::*, Error};
@@ -41,11 +42,11 @@ impl fmt::Display for BStr {
/// # use kernel::{fmt, b_str, str::{BStr, CString}};
/// let ascii = b_str!("Hello, BStr!");
/// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "Hello, BStr!".as_bytes());
+ /// assert_eq!(s.to_bytes(), "Hello, BStr!".as_bytes());
///
/// let non_ascii = b_str!("🦀");
/// let s = CString::try_from_fmt(fmt!("{}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for &b in &self.0 {
@@ -72,11 +73,11 @@ impl fmt::Debug for BStr {
/// // Embedded double quotes are escaped.
/// let ascii = b_str!("Hello, \"BStr\"!");
/// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
///
/// let non_ascii = b_str!("😺");
/// let s = CString::try_from_fmt(fmt!("{:?}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_char('"')?;
@@ -128,271 +129,29 @@ macro_rules! b_str {
}};
}
-/// Possible errors when using conversion functions in [`CStr`].
-#[derive(Debug, Clone, Copy)]
-pub enum CStrConvertError {
- /// Supplied bytes contain an interior `NUL`.
- InteriorNul,
+/// Wrapper around [`CStr`] which implements [`Display`](core::fmt::Display).
+pub struct CStrDisplay<'a>(&'a CStr);
- /// Supplied bytes are not terminated by `NUL`.
- NotNulTerminated,
-}
-
-impl From<CStrConvertError> for Error {
- #[inline]
- fn from(_: CStrConvertError) -> Error {
- EINVAL
- }
-}
-
-/// A string that is guaranteed to have exactly one `NUL` byte, which is at the
-/// end.
-///
-/// Used for interoperability with kernel APIs that take C strings.
-#[repr(transparent)]
-pub struct CStr([u8]);
-
-impl CStr {
- /// Returns the length of this string excluding `NUL`.
- #[inline]
- pub const fn len(&self) -> usize {
- self.len_with_nul() - 1
- }
-
- /// Returns the length of this string with `NUL`.
- #[inline]
- pub const fn len_with_nul(&self) -> usize {
- // SAFETY: This is one of the invariant of `CStr`.
- // We add a `unreachable_unchecked` here to hint the optimizer that
- // the value returned from this function is non-zero.
- if self.0.is_empty() {
- unsafe { core::hint::unreachable_unchecked() };
- }
- self.0.len()
- }
-
- /// Returns `true` if the string only includes `NUL`.
- #[inline]
- pub const fn is_empty(&self) -> bool {
- self.len() == 0
- }
-
- /// Wraps a raw C string pointer.
- ///
- /// # Safety
- ///
- /// `ptr` must be a valid pointer to a `NUL`-terminated C string, and it must
- /// last at least `'a`. When `CStr` is alive, the memory pointed by `ptr`
- /// must not be mutated.
- #[inline]
- pub unsafe fn from_char_ptr<'a>(ptr: *const core::ffi::c_char) -> &'a Self {
- // SAFETY: The safety precondition guarantees `ptr` is a valid pointer
- // to a `NUL`-terminated C string.
- let len = unsafe { bindings::strlen(ptr) } + 1;
- // SAFETY: Lifetime guaranteed by the safety precondition.
- let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len as _) };
- // SAFETY: As `len` is returned by `strlen`, `bytes` does not contain interior `NUL`.
- // As we have added 1 to `len`, the last byte is known to be `NUL`.
- unsafe { Self::from_bytes_with_nul_unchecked(bytes) }
- }
-
- /// Creates a [`CStr`] from a `[u8]`.
- ///
- /// The provided slice must be `NUL`-terminated, does not contain any
- /// interior `NUL` bytes.
- pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError> {
- if bytes.is_empty() {
- return Err(CStrConvertError::NotNulTerminated);
- }
- if bytes[bytes.len() - 1] != 0 {
- return Err(CStrConvertError::NotNulTerminated);
- }
- let mut i = 0;
- // `i + 1 < bytes.len()` allows LLVM to optimize away bounds checking,
- // while it couldn't optimize away bounds checks for `i < bytes.len() - 1`.
- while i + 1 < bytes.len() {
- if bytes[i] == 0 {
- return Err(CStrConvertError::InteriorNul);
- }
- i += 1;
- }
- // SAFETY: We just checked that all properties hold.
- Ok(unsafe { Self::from_bytes_with_nul_unchecked(bytes) })
- }
-
- /// Creates a [`CStr`] from a `[u8]` without performing any additional
- /// checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub const unsafe fn from_bytes_with_nul_unchecked(bytes: &[u8]) -> &CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { core::mem::transmute(bytes) }
- }
-
- /// Creates a mutable [`CStr`] from a `[u8]` without performing any
- /// additional checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
- }
-
- /// Returns a C pointer to the string.
- #[inline]
- pub const fn as_char_ptr(&self) -> *const core::ffi::c_char {
- self.0.as_ptr() as _
- }
-
- /// Convert the string to a byte slice without the trailing `NUL` byte.
- #[inline]
- pub fn as_bytes(&self) -> &[u8] {
- &self.0[..self.len()]
- }
-
- /// Convert the string to a byte slice containing the trailing `NUL` byte.
- #[inline]
- pub const fn as_bytes_with_nul(&self) -> &[u8] {
- &self.0
- }
-
- /// Yields a [`&str`] slice if the [`CStr`] contains valid UTF-8.
- ///
- /// If the contents of the [`CStr`] are valid UTF-8 data, this
- /// function will return the corresponding [`&str`] slice. Otherwise,
- /// it will return an error with details of where UTF-8 validation failed.
- ///
- /// # Examples
- ///
- /// ```
- /// # use kernel::str::CStr;
- /// let cstr = CStr::from_bytes_with_nul(b"foo\0").unwrap();
- /// assert_eq!(cstr.to_str(), Ok("foo"));
- /// ```
- #[inline]
- pub fn to_str(&self) -> Result<&str, core::str::Utf8Error> {
- core::str::from_utf8(self.as_bytes())
- }
-
- /// Unsafely convert this [`CStr`] into a [`&str`], without checking for
- /// valid UTF-8.
- ///
- /// # Safety
- ///
- /// The contents must be valid UTF-8.
+impl fmt::Display for CStrDisplay<'_> {
+ /// Formats printable ASCII characters, escaping the rest.
///
/// # Examples
///
/// ```
- /// # use kernel::c_str;
- /// # use kernel::str::CStr;
- /// let bar = c_str!("ツ");
- /// // SAFETY: String literals are guaranteed to be valid UTF-8
- /// // by the Rust compiler.
- /// assert_eq!(unsafe { bar.as_str_unchecked() }, "ツ");
- /// ```
- #[inline]
- pub unsafe fn as_str_unchecked(&self) -> &str {
- unsafe { core::str::from_utf8_unchecked(self.as_bytes()) }
- }
-
- /// Convert this [`CStr`] into a [`CString`] by allocating memory and
- /// copying over the string data.
- pub fn to_cstring(&self) -> Result<CString, AllocError> {
- CString::try_from(self)
- }
-
- /// Converts this [`CStr`] to its ASCII lower case equivalent in-place.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new lowercased value without modifying the existing one, use
- /// [`to_ascii_lowercase()`].
- ///
- /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase
- pub fn make_ascii_lowercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_lowercase();
- }
-
- /// Converts this [`CStr`] to its ASCII upper case equivalent in-place.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new uppercased value without modifying the existing one, use
- /// [`to_ascii_uppercase()`].
- ///
- /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase
- pub fn make_ascii_uppercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_uppercase();
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII lower case equivalent.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To lowercase the value in-place, use [`make_ascii_lowercase`].
- ///
- /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
- pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_lowercase();
-
- Ok(s)
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII upper case equivalent.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To uppercase the value in-place, use [`make_ascii_uppercase`].
- ///
- /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
- pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_uppercase();
-
- Ok(s)
- }
-}
-
-impl fmt::Display for CStr {
- /// Formats printable ASCII characters, escaping the rest.
- ///
- /// ```
+ /// # use core::ffi::CStr;
/// # use kernel::c_str;
/// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
- ///
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "so \"cool\"\0".as_bytes());
+ /// # use kernel::str::{CStrExt, CString};
+ /// let penguin = c"🐧";
+ /// let s = CString::try_from_fmt(fmt!("{}", penguin.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
+ ///
+ /// let ascii = c"so \"cool\"";
+ /// let s = CString::try_from_fmt(fmt!("{}", ascii.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "so \"cool\"\0".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- for &c in self.as_bytes() {
+ for &c in self.0.to_bytes() {
if (0x20..0x7f).contains(&c) {
// Printable character.
f.write_char(c as char)?;
@@ -404,116 +163,70 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
}
}
-impl fmt::Debug for CStr {
- /// Formats printable ASCII characters with a double quote on either end, escaping the rest.
+/// Extensions to [`CStr`].
+pub trait CStrExt {
+ /// Returns an object that implements [`Display`](core::fmt::Display) for
+ /// safely printing a [`CStr`] that may contain non-ASCII data, which are
+ /// escaped.
+ ///
+ /// # Examples
///
/// ```
+ /// # use core::ffi::CStr;
/// # use kernel::c_str;
/// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{:?}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"\\xf0\\x9f\\x90\\xa7\"\0".as_bytes());
- ///
- /// // Embedded double quotes are escaped.
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"so \\\"cool\\\"\"\0".as_bytes());
+ /// # use kernel::str::{CStrExt, CString};
+ /// let penguin = c"🐧";
+ /// let s = CString::try_from_fmt(fmt!("{}", penguin.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
+ ///
+ /// let ascii = c"so \"cool\"";
+ /// let s = CString::try_from_fmt(fmt!("{}", ascii.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "so \"cool\"\0".as_bytes());
/// ```
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- f.write_str("\"")?;
- for &c in self.as_bytes() {
- match c {
- // Printable characters.
- b'\"' => f.write_str("\\\"")?,
- 0x20..=0x7e => f.write_char(c as char)?,
- _ => write!(f, "\\x{:02x}", c)?,
- }
- }
- f.write_str("\"")
- }
-}
-
-impl AsRef<BStr> for CStr {
- #[inline]
- fn as_ref(&self) -> &BStr {
- BStr::from_bytes(self.as_bytes())
- }
-}
-
-impl Deref for CStr {
- type Target = BStr;
-
- #[inline]
- fn deref(&self) -> &Self::Target {
- self.as_ref()
- }
-}
+ fn display(&self) -> CStrDisplay<'_>;
-impl Index<ops::RangeFrom<usize>> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, index: ops::RangeFrom<usize>) -> &Self::Output {
- // Delegate bounds checking to slice.
- // Assign to _ to mute clippy's unnecessary operation warning.
- let _ = &self.as_bytes()[index.start..];
- // SAFETY: We just checked the bounds.
- unsafe { Self::from_bytes_with_nul_unchecked(&self.0[index.start..]) }
- }
+ /// Creates a mutable [`CStr`] from a `[u8]` without performing any
+ /// additional checks.
+ ///
+ /// # Safety
+ ///
+ /// `bytes` *must* end with a `NUL` byte, and should only have a single
+ /// `NUL` byte (or the string will be truncated).
+ unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut Self;
}
-impl Index<ops::RangeFull> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, _index: ops::RangeFull) -> &Self::Output {
- self
+impl CStrExt for CStr {
+ fn display(&self) -> CStrDisplay<'_> {
+ CStrDisplay(self)
}
-}
-
-mod private {
- use core::ops;
- // Marker trait for index types that can be forward to `BStr`.
- pub trait CStrIndex {}
-
- impl CStrIndex for usize {}
- impl CStrIndex for ops::Range<usize> {}
- impl CStrIndex for ops::RangeInclusive<usize> {}
- impl CStrIndex for ops::RangeToInclusive<usize> {}
-}
-
-impl<Idx> Index<Idx> for CStr
-where
- Idx: private::CStrIndex,
- BStr: Index<Idx>,
-{
- type Output = <BStr as Index<Idx>>::Output;
-
- #[inline]
- fn index(&self, index: Idx) -> &Self::Output {
- &self.as_ref()[index]
+ unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut Self {
+ // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
+ unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
}
}
/// Creates a new [`CStr`] from a string literal.
///
-/// The string literal should not contain any `NUL` bytes.
+/// This macro is not needed when C-string literals (`c"hello"` syntax) can be
+/// used directly, but can be used when a C-string version of a standard string
+/// literal is required (often when working with macros).
+///
+/// The string should not contain any `NUL` bytes.
///
/// # Examples
///
/// ```
+/// # use core::ffi::CStr;
/// # use kernel::c_str;
-/// # use kernel::str::CStr;
-/// const MY_CSTR: &CStr = c_str!("My awesome CStr!");
+/// const MY_CSTR: &CStr = c_str!(stringify!(5));
/// ```
#[macro_export]
macro_rules! c_str {
($str:expr) => {{
const S: &str = concat!($str, "\0");
- const C: &$crate::str::CStr = match $crate::str::CStr::from_bytes_with_nul(S.as_bytes()) {
+ const C: &core::ffi::CStr = match core::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
Ok(v) => v,
Err(_) => panic!("string contains interior NUL"),
};
@@ -540,65 +253,6 @@ mod tests {
\\xe0\\xe1\\xe2\\xe3\\xe4\\xe5\\xe6\\xe7\\xe8\\xe9\\xea\\xeb\\xec\\xed\\xee\\xef\
\\xf0\\xf1\\xf2\\xf3\\xf4\\xf5\\xf6\\xf7\\xf8\\xf9\\xfa\\xfb\\xfc\\xfd\\xfe\\xff";
- #[test]
- fn test_cstr_to_str() {
- let good_bytes = b"\xf0\x9f\xa6\x80\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let checked_str = checked_cstr.to_str().unwrap();
- assert_eq!(checked_str, "🦀");
- }
-
- #[test]
- #[should_panic]
- fn test_cstr_to_str_panic() {
- let bad_bytes = b"\xc3\x28\0";
- let checked_cstr = CStr::from_bytes_with_nul(bad_bytes).unwrap();
- checked_cstr.to_str().unwrap();
- }
-
- #[test]
- fn test_cstr_as_str_unchecked() {
- let good_bytes = b"\xf0\x9f\x90\xA7\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let unchecked_str = unsafe { checked_cstr.as_str_unchecked() };
- assert_eq!(unchecked_str, "🐧");
- }
-
- #[test]
- fn test_cstr_display() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{}", hello_world), "hello, world!");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{}", non_printables), "\\x01\\x09\\x0a");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{}", non_ascii), "d\\xe9j\\xe0 vu");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{}", good_bytes), "\\xf0\\x9f\\xa6\\x80");
- }
-
- #[test]
- fn test_cstr_display_all_bytes() {
- let mut bytes: [u8; 256] = [0; 256];
- // fill `bytes` with [1..=255] + [0]
- for i in u8::MIN..=u8::MAX {
- bytes[i as usize] = i.wrapping_add(1);
- }
- let cstr = CStr::from_bytes_with_nul(&bytes).unwrap();
- assert_eq!(format!("{}", cstr), ALL_ASCII_CHARS);
- }
-
- #[test]
- fn test_cstr_debug() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{:?}", hello_world), "\"hello, world!\"");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{:?}", non_printables), "\"\\x01\\x09\\x0a\"");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{:?}", non_ascii), "\"d\\xe9j\\xe0 vu\"");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{:?}", good_bytes), "\"\\xf0\\x9f\\xa6\\x80\"");
- }
-
#[test]
fn test_bstr_display() {
let hello_world = BStr::from_bytes(b"hello, world!");
@@ -626,6 +280,29 @@ fn test_bstr_debug() {
let good_bytes = BStr::from_bytes(b"\xf0\x9f\xa6\x80");
assert_eq!(format!("{:?}", good_bytes), "\"\\xf0\\x9f\\xa6\\x80\"");
}
+
+ #[test]
+ fn test_cstr_display() {
+ let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
+ assert_eq!(format!("{}", hello_world.display()), "hello, world!");
+ let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
+ assert_eq!(format!("{}", non_printables.display()), "\\x01\\x09\\x0a");
+ let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
+ assert_eq!(format!("{}", non_ascii.display()), "d\\xe9j\\xe0 vu");
+ let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
+ assert_eq!(format!("{}", good_bytes.display()), "\\xf0\\x9f\\xa6\\x80");
+ }
+
+ #[test]
+ fn test_cstr_display_all_bytes() {
+ let mut bytes: [u8; 256] = [0; 256];
+ // fill `bytes` with [1..=255] + [0]
+ for i in u8::MIN..=u8::MAX {
+ bytes[i as usize] = i.wrapping_add(1);
+ }
+ let cstr = CStr::from_bytes_with_nul(&bytes).unwrap();
+ assert_eq!(format!("{}", cstr.display()), ALL_ASCII_CHARS);
+ }
}
/// Allows formatting of [`fmt::Arguments`] into a raw buffer.
@@ -779,11 +456,11 @@ fn write_str(&mut self, s: &str) -> fmt::Result {
/// use kernel::{str::CString, fmt};
///
/// let s = CString::try_from_fmt(fmt!("{}{}{}", "abc", 10, 20)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "abc1020\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "abc1020\0".as_bytes());
///
/// let tmp = "testing";
/// let s = CString::try_from_fmt(fmt!("{tmp}{}", 123)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "testing123\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "testing123\0".as_bytes());
///
/// // This fails because it has an embedded `NUL` byte.
/// let s = CString::try_from_fmt(fmt!("a\0b{}", 123));
@@ -838,21 +515,13 @@ fn deref(&self) -> &Self::Target {
}
}
-impl DerefMut for CString {
- fn deref_mut(&mut self) -> &mut Self::Target {
- // SAFETY: A `CString` is always NUL-terminated and contains no other
- // NUL bytes.
- unsafe { CStr::from_bytes_with_nul_unchecked_mut(self.buf.as_mut_slice()) }
- }
-}
-
impl<'a> TryFrom<&'a CStr> for CString {
type Error = AllocError;
fn try_from(cstr: &'a CStr) -> Result<CString, AllocError> {
let mut buf = Vec::new();
- <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.as_bytes_with_nul(), GFP_KERNEL)
+ <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.to_bytes_with_nul(), GFP_KERNEL)
.map_err(|_| AllocError)?;
// INVARIANT: The `CStr` and `CString` types have the same invariants for
diff --git a/rust/kernel/sync/condvar.rs b/rust/kernel/sync/condvar.rs
index 2b306afbe56d..16d1a1cb8d00 100644
--- a/rust/kernel/sync/condvar.rs
+++ b/rust/kernel/sync/condvar.rs
@@ -9,12 +9,11 @@
use crate::{
init::PinInit,
pin_init,
- str::CStr,
task::{MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE, TASK_NORMAL, TASK_UNINTERRUPTIBLE},
time::Jiffies,
types::Opaque,
};
-use core::ffi::{c_int, c_long};
+use core::ffi::{c_int, c_long, CStr};
use core::marker::PhantomPinned;
use core::ptr;
use macros::pin_data;
@@ -108,7 +107,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
wait_queue_head <- Opaque::ffi_init(|slot| unsafe {
- bindings::__init_waitqueue_head(slot, name.as_char_ptr(), key.as_ptr())
+ bindings::__init_waitqueue_head(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index f6c34ca4d819..318ecb5a5916 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -6,8 +6,8 @@
//! spinlocks, raw spinlocks) to be provided with minimal effort.
use super::LockClassKey;
-use crate::{init::PinInit, pin_init, str::CStr, types::Opaque, types::ScopeGuard};
-use core::{cell::UnsafeCell, marker::PhantomData, marker::PhantomPinned};
+use crate::{init::PinInit, pin_init, types::Opaque, types::ScopeGuard};
+use core::{cell::UnsafeCell, ffi::CStr, marker::PhantomData, marker::PhantomPinned};
use macros::pin_data;
pub mod mutex;
@@ -113,7 +113,7 @@ pub fn new(t: T, name: &'static CStr, key: &'static LockClassKey) -> impl PinIni
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
state <- Opaque::ffi_init(|slot| unsafe {
- B::init(slot, name.as_char_ptr(), key.as_ptr())
+ B::init(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 553a5cba2adc..a6418873e82e 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -380,7 +380,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
slot,
Some(T::Pointer::run),
false,
- name.as_char_ptr(),
+ name.as_ptr(),
key.as_ptr(),
)
}
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 5ebd42ae4a3f..339991ee6885 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -172,7 +172,7 @@ pub extern "C" fn {kunit_name}(__kunit_test: *mut kernel::bindings::kunit) {{
#[allow(unused)]
macro_rules! assert {{
($cond:expr $(,)?) => {{{{
- kernel::kunit_assert!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
+ kernel::kunit_assert!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
}}}}
}}
@@ -180,7 +180,7 @@ macro_rules! assert {{
#[allow(unused)]
macro_rules! assert_eq {{
($left:expr, $right:expr $(,)?) => {{{{
- kernel::kunit_assert_eq!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
+ kernel::kunit_assert_eq!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
}}}}
}}
--
2.45.2
`CStr` became a part of the `core` library in Rust 1.75. This change replaces
the custom `CStr` implementation with the one from `core`.
`core::ffi::CStr` behaves generally the same as the removed implementation,
with the following differences:
- It does not implement `Display`.
- It does not provide the `from_bytes_with_nul_unchecked_mut` method.
- It has an `as_ptr()` method instead of `as_char_ptr()`; `as_ptr()` also
returns `*const c_char`.
The first two differences are handled by providing the `CStrExt` trait,
with `display()` and `from_bytes_with_nul_unchecked_mut()` methods.
`display()` returns a `CStrDisplay` wrapper, with a custom `Display`
implementation.
The `DerefMut` implementation for `CString` is removed here, as it is not
used anywhere.
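The wrapper approach described above can be sketched outside the kernel tree as follows. This is a minimal userspace illustration, not the code from the patch: it reuses the names `CStrDisplay` and `CStrExt` for readability, builds against `std` rather than the `kernel` crate, and assumes the escape format `\xNN` for non-printable bytes (the format the tests in the patch expect).

```rust
use core::ffi::CStr;
use core::fmt::{self, Write};

/// Illustrative stand-in for the `CStrDisplay` wrapper (userspace sketch).
pub struct CStrDisplay<'a>(&'a CStr);

impl fmt::Display for CStrDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Iterate over the bytes without the trailing NUL.
        for &c in self.0.to_bytes() {
            if (0x20..0x7f).contains(&c) {
                // Printable ASCII is written as-is.
                f.write_char(c as char)?;
            } else {
                // Assumption: everything else is escaped as `\xNN`.
                write!(f, "\\x{:02x}", c)?;
            }
        }
        Ok(())
    }
}

/// Illustrative extension trait adding `display()` to `core::ffi::CStr`.
pub trait CStrExt {
    fn display(&self) -> CStrDisplay<'_>;
}

impl CStrExt for CStr {
    fn display(&self) -> CStrDisplay<'_> {
        CStrDisplay(self)
    }
}

fn main() {
    let hello = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
    assert_eq!(format!("{}", hello.display()), "hello, world!");

    // Non-ASCII bytes (here the UTF-8 encoding of a crab emoji) are escaped.
    let crab = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
    assert_eq!(format!("{}", crab.display()), "\\xf0\\x9f\\xa6\\x80");
}
```

Because `core::ffi::CStr` is a foreign type, `Display` cannot be implemented on it directly; an extension trait returning a local wrapper type is one common way around that restriction.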
Signed-off-by: Michal Rostecki <vadorovsky(a)gmail.com>
---
v1 -> v2:
- Do not remove `c_str` macro. While it's preferred to use C-string
literals, there are two cases where `c_str` is helpful:
- When working with macros, which already return a Rust string literal
(e.g. `stringify!`).
- When building macros, where we want to take a Rust string literal as an
argument (for caller's convenience), but still use it as a C-string
internally.
- Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`).
Use the `c_str` macro to convert these literals to C-string literals.
- Use `c_str` in kunit.rs for converting the output of `stringify!` to a
`CStr`.
- Remove `DerefMut` implementation for `CString`.
v2 -> v3:
- Fix the commit message.
- Remove redundant braces in `use`, when only one item is imported.
v3 -> v4:
- Provide the `CStrExt` trait with `display()` method, which returns a
`CStrDisplay` wrapper with `Display` implementation. This addresses
the lack of `Display` implementation for `core::ffi::CStr`.
- Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`,
which might be useful and is going to prevent manual, unsafe casts.
- Fix a typo (s/preffered/preferred/).
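To illustrate the v1 -> v2 note on why `c_str` is kept alongside C-string literals, here is a hedged userspace sketch. The macro below mirrors the shape of the kernel's `c_str!` but is defined locally for the example and uses `::core::ffi::CStr` paths; it assumes a toolchain that supports `c"..."` literals.

```rust
use core::ffi::{c_char, CStr};

// Local stand-in for the kernel's `c_str!` macro (illustration only).
macro_rules! c_str {
    ($str:expr) => {{
        // Append the terminating NUL at compile time.
        const S: &str = concat!($str, "\0");
        const C: &::core::ffi::CStr =
            match ::core::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
                Ok(v) => v,
                Err(_) => panic!("string contains interior NUL"),
            };
        C
    }};
}

fn main() {
    // A C-string literal is enough when the text is written by hand.
    let literal: &CStr = c"5";

    // `stringify!` yields a Rust string literal, so a conversion macro is
    // still needed to obtain a `&CStr` from it.
    let from_macro: &CStr = c_str!(stringify!(5));

    assert_eq!(literal, from_macro);
    assert_eq!(from_macro.to_bytes_with_nul(), b"5\0");

    // `as_ptr()` takes the place of the removed `as_char_ptr()`.
    let ptr: *const c_char = from_macro.as_ptr();
    assert!(!ptr.is_null());
}
```

The conversion happens in a `const` context, so a string with an interior `NUL` is rejected at compile time rather than at run time.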
rust/kernel/error.rs | 7 +-
rust/kernel/kunit.rs | 18 +-
rust/kernel/net/phy.rs | 2 +-
rust/kernel/prelude.rs | 4 +-
rust/kernel/str.rs | 492 +++++-------------------------------
rust/kernel/sync/condvar.rs | 5 +-
rust/kernel/sync/lock.rs | 6 +-
rust/kernel/workqueue.rs | 2 +-
scripts/rustdoc_test_gen.rs | 4 +-
9 files changed, 88 insertions(+), 452 deletions(-)
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 55280ae9fe40..18808b29604d 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -4,10 +4,11 @@
//!
//! C header: [`include/uapi/asm-generic/errno-base.h`](srctree/include/uapi/asm-generic/errno-base.h)
-use crate::{alloc::AllocError, str::CStr};
+use crate::alloc::AllocError;
use alloc::alloc::LayoutError;
+use core::ffi::CStr;
use core::fmt;
use core::num::TryFromIntError;
use core::str::Utf8Error;
@@ -142,7 +143,7 @@ pub fn name(&self) -> Option<&'static CStr> {
None
} else {
// SAFETY: The string returned by `errname` is static and `NUL`-terminated.
- Some(unsafe { CStr::from_char_ptr(ptr) })
+ Some(unsafe { CStr::from_ptr(ptr) })
}
}
@@ -164,7 +165,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
None => f.debug_tuple("Error").field(&-self.0).finish(),
// SAFETY: These strings are ASCII-only.
Some(name) => f
- .debug_tuple(unsafe { core::str::from_utf8_unchecked(name) })
+ .debug_tuple(unsafe { core::str::from_utf8_unchecked(name.to_bytes()) })
.finish(),
}
}
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 0ba77276ae7e..79a50ab59af0 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -56,13 +56,15 @@ macro_rules! kunit_assert {
break 'out;
}
- static FILE: &'static $crate::str::CStr = $crate::c_str!($file);
+ static FILE: &'static core::ffi::CStr = $file;
static LINE: i32 = core::line!() as i32 - $diff;
- static CONDITION: &'static $crate::str::CStr = $crate::c_str!(stringify!($condition));
+ static CONDITION: &'static core::ffi::CStr = $crate::c_str!(stringify!($condition));
// SAFETY: FFI call without safety requirements.
let kunit_test = unsafe { $crate::bindings::kunit_get_current_test() };
if kunit_test.is_null() {
+ use kernel::str::CStrExt;
+
// The assertion failed but this task is not running a KUnit test, so we cannot call
// KUnit, but at least print an error to the kernel log. This may happen if this
// macro is called from an spawned thread in a test (see
@@ -71,11 +73,13 @@ macro_rules! kunit_assert {
//
// This mimics KUnit's failed assertion format.
$crate::kunit::err(format_args!(
- " # {}: ASSERTION FAILED at {FILE}:{LINE}\n",
- $name
+ " # {}: ASSERTION FAILED at {}:{LINE}\n",
+ $name.display(),
+ FILE.display(),
));
$crate::kunit::err(format_args!(
- " Expected {CONDITION} to be true, but is false\n"
+ " Expected {} to be true, but is false\n",
+ CONDITION.display(),
));
$crate::kunit::err(format_args!(
" Failure not reported to KUnit since this is a non-KUnit task\n"
@@ -98,12 +102,12 @@ unsafe impl Sync for Location {}
unsafe impl Sync for UnaryAssert {}
static LOCATION: Location = Location($crate::bindings::kunit_loc {
- file: FILE.as_char_ptr(),
+ file: FILE.as_ptr(),
line: LINE,
});
static ASSERTION: UnaryAssert = UnaryAssert($crate::bindings::kunit_unary_assert {
assert: $crate::bindings::kunit_assert {},
- condition: CONDITION.as_char_ptr(),
+ condition: CONDITION.as_ptr(),
expected_true: true,
});
diff --git a/rust/kernel/net/phy.rs b/rust/kernel/net/phy.rs
index fd40b703d224..19f45922ec42 100644
--- a/rust/kernel/net/phy.rs
+++ b/rust/kernel/net/phy.rs
@@ -502,7 +502,7 @@ unsafe impl Sync for DriverVTable {}
pub const fn create_phy_driver<T: Driver>() -> DriverVTable {
// INVARIANT: All the fields of `struct phy_driver` are initialized properly.
DriverVTable(Opaque::new(bindings::phy_driver {
- name: T::NAME.as_char_ptr().cast_mut(),
+ name: T::NAME.as_ptr().cast_mut(),
flags: T::FLAGS,
phy_id: T::PHY_DEVICE_ID.id,
phy_id_mask: T::PHY_DEVICE_ID.mask_as_int(),
diff --git a/rust/kernel/prelude.rs b/rust/kernel/prelude.rs
index b37a0b3180fb..b0969ca78f10 100644
--- a/rust/kernel/prelude.rs
+++ b/rust/kernel/prelude.rs
@@ -12,7 +12,7 @@
//! ```
#[doc(no_inline)]
-pub use core::pin::Pin;
+pub use core::{ffi::CStr, pin::Pin};
pub use crate::alloc::{box_ext::BoxExt, flags::*, vec_ext::VecExt};
@@ -35,7 +35,7 @@
pub use super::error::{code::*, Error, Result};
-pub use super::{str::CStr, ThisModule};
+pub use super::ThisModule;
pub use super::init::{InPlaceInit, Init, PinInit};
diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index bb8d4f41475b..ec220a760d89 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -4,8 +4,9 @@
use crate::alloc::{flags::*, vec_ext::VecExt, AllocError};
use alloc::vec::Vec;
+use core::ffi::CStr;
use core::fmt::{self, Write};
-use core::ops::{self, Deref, DerefMut, Index};
+use core::ops::Deref;
use crate::error::{code::*, Error};
@@ -41,11 +42,11 @@ impl fmt::Display for BStr {
/// # use kernel::{fmt, b_str, str::{BStr, CString}};
/// let ascii = b_str!("Hello, BStr!");
/// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "Hello, BStr!".as_bytes());
+ /// assert_eq!(s.to_bytes(), "Hello, BStr!".as_bytes());
///
/// let non_ascii = b_str!("🦀");
/// let s = CString::try_from_fmt(fmt!("{}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for &b in &self.0 {
@@ -72,11 +73,11 @@ impl fmt::Debug for BStr {
/// // Embedded double quotes are escaped.
/// let ascii = b_str!("Hello, \"BStr\"!");
/// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
///
/// let non_ascii = b_str!("😺");
/// let s = CString::try_from_fmt(fmt!("{:?}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_char('"')?;
@@ -128,271 +129,29 @@ macro_rules! b_str {
}};
}
-/// Possible errors when using conversion functions in [`CStr`].
-#[derive(Debug, Clone, Copy)]
-pub enum CStrConvertError {
- /// Supplied bytes contain an interior `NUL`.
- InteriorNul,
+/// Wrapper around [`CStr`] which implements [`Display`](core::fmt::Display).
+pub struct CStrDisplay<'a>(&'a CStr);
- /// Supplied bytes are not terminated by `NUL`.
- NotNulTerminated,
-}
-
-impl From<CStrConvertError> for Error {
- #[inline]
- fn from(_: CStrConvertError) -> Error {
- EINVAL
- }
-}
-
-/// A string that is guaranteed to have exactly one `NUL` byte, which is at the
-/// end.
-///
-/// Used for interoperability with kernel APIs that take C strings.
-#[repr(transparent)]
-pub struct CStr([u8]);
-
-impl CStr {
- /// Returns the length of this string excluding `NUL`.
- #[inline]
- pub const fn len(&self) -> usize {
- self.len_with_nul() - 1
- }
-
- /// Returns the length of this string with `NUL`.
- #[inline]
- pub const fn len_with_nul(&self) -> usize {
- // SAFETY: This is one of the invariant of `CStr`.
- // We add a `unreachable_unchecked` here to hint the optimizer that
- // the value returned from this function is non-zero.
- if self.0.is_empty() {
- unsafe { core::hint::unreachable_unchecked() };
- }
- self.0.len()
- }
-
- /// Returns `true` if the string only includes `NUL`.
- #[inline]
- pub const fn is_empty(&self) -> bool {
- self.len() == 0
- }
-
- /// Wraps a raw C string pointer.
- ///
- /// # Safety
- ///
- /// `ptr` must be a valid pointer to a `NUL`-terminated C string, and it must
- /// last at least `'a`. When `CStr` is alive, the memory pointed by `ptr`
- /// must not be mutated.
- #[inline]
- pub unsafe fn from_char_ptr<'a>(ptr: *const core::ffi::c_char) -> &'a Self {
- // SAFETY: The safety precondition guarantees `ptr` is a valid pointer
- // to a `NUL`-terminated C string.
- let len = unsafe { bindings::strlen(ptr) } + 1;
- // SAFETY: Lifetime guaranteed by the safety precondition.
- let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len as _) };
- // SAFETY: As `len` is returned by `strlen`, `bytes` does not contain interior `NUL`.
- // As we have added 1 to `len`, the last byte is known to be `NUL`.
- unsafe { Self::from_bytes_with_nul_unchecked(bytes) }
- }
-
- /// Creates a [`CStr`] from a `[u8]`.
- ///
- /// The provided slice must be `NUL`-terminated, does not contain any
- /// interior `NUL` bytes.
- pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError> {
- if bytes.is_empty() {
- return Err(CStrConvertError::NotNulTerminated);
- }
- if bytes[bytes.len() - 1] != 0 {
- return Err(CStrConvertError::NotNulTerminated);
- }
- let mut i = 0;
- // `i + 1 < bytes.len()` allows LLVM to optimize away bounds checking,
- // while it couldn't optimize away bounds checks for `i < bytes.len() - 1`.
- while i + 1 < bytes.len() {
- if bytes[i] == 0 {
- return Err(CStrConvertError::InteriorNul);
- }
- i += 1;
- }
- // SAFETY: We just checked that all properties hold.
- Ok(unsafe { Self::from_bytes_with_nul_unchecked(bytes) })
- }
-
- /// Creates a [`CStr`] from a `[u8]` without performing any additional
- /// checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub const unsafe fn from_bytes_with_nul_unchecked(bytes: &[u8]) -> &CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { core::mem::transmute(bytes) }
- }
-
- /// Creates a mutable [`CStr`] from a `[u8]` without performing any
- /// additional checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
- }
-
- /// Returns a C pointer to the string.
- #[inline]
- pub const fn as_char_ptr(&self) -> *const core::ffi::c_char {
- self.0.as_ptr() as _
- }
-
- /// Convert the string to a byte slice without the trailing `NUL` byte.
- #[inline]
- pub fn as_bytes(&self) -> &[u8] {
- &self.0[..self.len()]
- }
-
- /// Convert the string to a byte slice containing the trailing `NUL` byte.
- #[inline]
- pub const fn as_bytes_with_nul(&self) -> &[u8] {
- &self.0
- }
-
- /// Yields a [`&str`] slice if the [`CStr`] contains valid UTF-8.
- ///
- /// If the contents of the [`CStr`] are valid UTF-8 data, this
- /// function will return the corresponding [`&str`] slice. Otherwise,
- /// it will return an error with details of where UTF-8 validation failed.
- ///
- /// # Examples
- ///
- /// ```
- /// # use kernel::str::CStr;
- /// let cstr = CStr::from_bytes_with_nul(b"foo\0").unwrap();
- /// assert_eq!(cstr.to_str(), Ok("foo"));
- /// ```
- #[inline]
- pub fn to_str(&self) -> Result<&str, core::str::Utf8Error> {
- core::str::from_utf8(self.as_bytes())
- }
-
- /// Unsafely convert this [`CStr`] into a [`&str`], without checking for
- /// valid UTF-8.
- ///
- /// # Safety
- ///
- /// The contents must be valid UTF-8.
+impl fmt::Display for CStrDisplay<'_> {
+ /// Formats printable ASCII characters, escaping the rest.
///
/// # Examples
///
/// ```
- /// # use kernel::c_str;
- /// # use kernel::str::CStr;
- /// let bar = c_str!("ツ");
- /// // SAFETY: String literals are guaranteed to be valid UTF-8
- /// // by the Rust compiler.
- /// assert_eq!(unsafe { bar.as_str_unchecked() }, "ツ");
- /// ```
- #[inline]
- pub unsafe fn as_str_unchecked(&self) -> &str {
- unsafe { core::str::from_utf8_unchecked(self.as_bytes()) }
- }
-
- /// Convert this [`CStr`] into a [`CString`] by allocating memory and
- /// copying over the string data.
- pub fn to_cstring(&self) -> Result<CString, AllocError> {
- CString::try_from(self)
- }
-
- /// Converts this [`CStr`] to its ASCII lower case equivalent in-place.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new lowercased value without modifying the existing one, use
- /// [`to_ascii_lowercase()`].
- ///
- /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase
- pub fn make_ascii_lowercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_lowercase();
- }
-
- /// Converts this [`CStr`] to its ASCII upper case equivalent in-place.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new uppercased value without modifying the existing one, use
- /// [`to_ascii_uppercase()`].
- ///
- /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase
- pub fn make_ascii_uppercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_uppercase();
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII lower case equivalent.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To lowercase the value in-place, use [`make_ascii_lowercase`].
- ///
- /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
- pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_lowercase();
-
- Ok(s)
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII upper case equivalent.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To uppercase the value in-place, use [`make_ascii_uppercase`].
- ///
- /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
- pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_uppercase();
-
- Ok(s)
- }
-}
-
-impl fmt::Display for CStr {
- /// Formats printable ASCII characters, escaping the rest.
- ///
- /// ```
+ /// # use core::ffi::CStr;
/// # use kernel::c_str;
/// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
- ///
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "so \"cool\"\0".as_bytes());
+ /// # use kernel::str::{CStrExt, CString};
+ /// let penguin = c"🐧";
+ /// let s = CString::try_from_fmt(fmt!("{}", penguin.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
+ ///
+ /// let ascii = c"so \"cool\"";
+ /// let s = CString::try_from_fmt(fmt!("{}", ascii.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "so \"cool\"\0".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- for &c in self.as_bytes() {
+ for &c in self.0.to_bytes() {
if (0x20..0x7f).contains(&c) {
// Printable character.
f.write_char(c as char)?;
@@ -404,116 +163,70 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
}
}
-impl fmt::Debug for CStr {
- /// Formats printable ASCII characters with a double quote on either end, escaping the rest.
+/// Extensions to [`CStr`].
+pub trait CStrExt {
+ /// Returns an object that implements [`Display`](core::fmt::Display) for
+ /// safely printing a [`CStr`] that may contain non-ASCII data, which are
+ /// escaped.
+ ///
+ /// # Examples
///
/// ```
+ /// # use core::ffi::CStr;
/// # use kernel::c_str;
/// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{:?}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"\\xf0\\x9f\\x90\\xa7\"\0".as_bytes());
- ///
- /// // Embedded double quotes are escaped.
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"so \\\"cool\\\"\"\0".as_bytes());
+ /// # use kernel::str::{CStrExt, CString};
+ /// let penguin = c"🐧";
+ /// let s = CString::try_from_fmt(fmt!("{}", penguin.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
+ ///
+ /// let ascii = c"so \"cool\"";
+ /// let s = CString::try_from_fmt(fmt!("{}", ascii.display())).unwrap();
+ /// assert_eq!(s.to_bytes_with_nul(), "so \"cool\"\0".as_bytes());
/// ```
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- f.write_str("\"")?;
- for &c in self.as_bytes() {
- match c {
- // Printable characters.
- b'\"' => f.write_str("\\\"")?,
- 0x20..=0x7e => f.write_char(c as char)?,
- _ => write!(f, "\\x{:02x}", c)?,
- }
- }
- f.write_str("\"")
- }
-}
-
-impl AsRef<BStr> for CStr {
- #[inline]
- fn as_ref(&self) -> &BStr {
- BStr::from_bytes(self.as_bytes())
- }
-}
-
-impl Deref for CStr {
- type Target = BStr;
-
- #[inline]
- fn deref(&self) -> &Self::Target {
- self.as_ref()
- }
-}
-
-impl Index<ops::RangeFrom<usize>> for CStr {
- type Output = CStr;
+ fn display(&self) -> CStrDisplay<'_>;
- #[inline]
- fn index(&self, index: ops::RangeFrom<usize>) -> &Self::Output {
- // Delegate bounds checking to slice.
- // Assign to _ to mute clippy's unnecessary operation warning.
- let _ = &self.as_bytes()[index.start..];
- // SAFETY: We just checked the bounds.
- unsafe { Self::from_bytes_with_nul_unchecked(&self.0[index.start..]) }
- }
+ /// Creates a mutable [`CStr`] from a `[u8]` without performing any
+ /// additional checks.
+ ///
+ /// # Safety
+ ///
+ /// `bytes` *must* end with a `NUL` byte, and should only have a single
+ /// `NUL` byte (or the string will be truncated).
+ unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut Self;
}
-impl Index<ops::RangeFull> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, _index: ops::RangeFull) -> &Self::Output {
- self
+impl CStrExt for CStr {
+ fn display(&self) -> CStrDisplay<'_> {
+ CStrDisplay(self)
}
-}
-
-mod private {
- use core::ops;
-
- // Marker trait for index types that can be forward to `BStr`.
- pub trait CStrIndex {}
-
- impl CStrIndex for usize {}
- impl CStrIndex for ops::Range<usize> {}
- impl CStrIndex for ops::RangeInclusive<usize> {}
- impl CStrIndex for ops::RangeToInclusive<usize> {}
-}
-
-impl<Idx> Index<Idx> for CStr
-where
- Idx: private::CStrIndex,
- BStr: Index<Idx>,
-{
- type Output = <BStr as Index<Idx>>::Output;
- #[inline]
- fn index(&self, index: Idx) -> &Self::Output {
- &self.as_ref()[index]
+ unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut Self {
+ // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
+ unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
}
}
/// Creates a new [`CStr`] from a string literal.
///
-/// The string literal should not contain any `NUL` bytes.
+/// This macro is not needed when C-string literals (`c"hello"` syntax) can be
+/// used directly, but can be used when a C-string version of a standard string
+/// literal is required (often when working with macros).
+///
+/// The string should not contain any `NUL` bytes.
///
/// # Examples
///
/// ```
+/// # use core::ffi::CStr;
/// # use kernel::c_str;
-/// # use kernel::str::CStr;
-/// const MY_CSTR: &CStr = c_str!("My awesome CStr!");
+/// const MY_CSTR: &CStr = c_str!(stringify!(5));
/// ```
#[macro_export]
macro_rules! c_str {
($str:expr) => {{
const S: &str = concat!($str, "\0");
- const C: &$crate::str::CStr = match $crate::str::CStr::from_bytes_with_nul(S.as_bytes()) {
+ const C: &core::ffi::CStr = match core::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
Ok(v) => v,
Err(_) => panic!("string contains interior NUL"),
};
@@ -526,79 +239,6 @@ mod tests {
use super::*;
use alloc::format;
- const ALL_ASCII_CHARS: &'static str =
- "\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09\\x0a\\x0b\\x0c\\x0d\\x0e\\x0f\
- \\x10\\x11\\x12\\x13\\x14\\x15\\x16\\x17\\x18\\x19\\x1a\\x1b\\x1c\\x1d\\x1e\\x1f \
- !\"#$%&'()*+,-./0123456789:;<=>?@\
- ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\\x7f\
- \\x80\\x81\\x82\\x83\\x84\\x85\\x86\\x87\\x88\\x89\\x8a\\x8b\\x8c\\x8d\\x8e\\x8f\
- \\x90\\x91\\x92\\x93\\x94\\x95\\x96\\x97\\x98\\x99\\x9a\\x9b\\x9c\\x9d\\x9e\\x9f\
- \\xa0\\xa1\\xa2\\xa3\\xa4\\xa5\\xa6\\xa7\\xa8\\xa9\\xaa\\xab\\xac\\xad\\xae\\xaf\
- \\xb0\\xb1\\xb2\\xb3\\xb4\\xb5\\xb6\\xb7\\xb8\\xb9\\xba\\xbb\\xbc\\xbd\\xbe\\xbf\
- \\xc0\\xc1\\xc2\\xc3\\xc4\\xc5\\xc6\\xc7\\xc8\\xc9\\xca\\xcb\\xcc\\xcd\\xce\\xcf\
- \\xd0\\xd1\\xd2\\xd3\\xd4\\xd5\\xd6\\xd7\\xd8\\xd9\\xda\\xdb\\xdc\\xdd\\xde\\xdf\
- \\xe0\\xe1\\xe2\\xe3\\xe4\\xe5\\xe6\\xe7\\xe8\\xe9\\xea\\xeb\\xec\\xed\\xee\\xef\
- \\xf0\\xf1\\xf2\\xf3\\xf4\\xf5\\xf6\\xf7\\xf8\\xf9\\xfa\\xfb\\xfc\\xfd\\xfe\\xff";
-
- #[test]
- fn test_cstr_to_str() {
- let good_bytes = b"\xf0\x9f\xa6\x80\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let checked_str = checked_cstr.to_str().unwrap();
- assert_eq!(checked_str, "🦀");
- }
-
- #[test]
- #[should_panic]
- fn test_cstr_to_str_panic() {
- let bad_bytes = b"\xc3\x28\0";
- let checked_cstr = CStr::from_bytes_with_nul(bad_bytes).unwrap();
- checked_cstr.to_str().unwrap();
- }
-
- #[test]
- fn test_cstr_as_str_unchecked() {
- let good_bytes = b"\xf0\x9f\x90\xA7\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let unchecked_str = unsafe { checked_cstr.as_str_unchecked() };
- assert_eq!(unchecked_str, "🐧");
- }
-
- #[test]
- fn test_cstr_display() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{}", hello_world), "hello, world!");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{}", non_printables), "\\x01\\x09\\x0a");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{}", non_ascii), "d\\xe9j\\xe0 vu");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{}", good_bytes), "\\xf0\\x9f\\xa6\\x80");
- }
-
- #[test]
- fn test_cstr_display_all_bytes() {
- let mut bytes: [u8; 256] = [0; 256];
- // fill `bytes` with [1..=255] + [0]
- for i in u8::MIN..=u8::MAX {
- bytes[i as usize] = i.wrapping_add(1);
- }
- let cstr = CStr::from_bytes_with_nul(&bytes).unwrap();
- assert_eq!(format!("{}", cstr), ALL_ASCII_CHARS);
- }
-
- #[test]
- fn test_cstr_debug() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{:?}", hello_world), "\"hello, world!\"");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{:?}", non_printables), "\"\\x01\\x09\\x0a\"");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{:?}", non_ascii), "\"d\\xe9j\\xe0 vu\"");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{:?}", good_bytes), "\"\\xf0\\x9f\\xa6\\x80\"");
- }
-
#[test]
fn test_bstr_display() {
let hello_world = BStr::from_bytes(b"hello, world!");
@@ -779,11 +419,11 @@ fn write_str(&mut self, s: &str) -> fmt::Result {
/// use kernel::{str::CString, fmt};
///
/// let s = CString::try_from_fmt(fmt!("{}{}{}", "abc", 10, 20)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "abc1020\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "abc1020\0".as_bytes());
///
/// let tmp = "testing";
/// let s = CString::try_from_fmt(fmt!("{tmp}{}", 123)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "testing123\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "testing123\0".as_bytes());
///
/// // This fails because it has an embedded `NUL` byte.
/// let s = CString::try_from_fmt(fmt!("a\0b{}", 123));
@@ -838,21 +478,13 @@ fn deref(&self) -> &Self::Target {
}
}
-impl DerefMut for CString {
- fn deref_mut(&mut self) -> &mut Self::Target {
- // SAFETY: A `CString` is always NUL-terminated and contains no other
- // NUL bytes.
- unsafe { CStr::from_bytes_with_nul_unchecked_mut(self.buf.as_mut_slice()) }
- }
-}
-
impl<'a> TryFrom<&'a CStr> for CString {
type Error = AllocError;
fn try_from(cstr: &'a CStr) -> Result<CString, AllocError> {
let mut buf = Vec::new();
- <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.as_bytes_with_nul(), GFP_KERNEL)
+ <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.to_bytes_with_nul(), GFP_KERNEL)
.map_err(|_| AllocError)?;
// INVARIANT: The `CStr` and `CString` types have the same invariants for
diff --git a/rust/kernel/sync/condvar.rs b/rust/kernel/sync/condvar.rs
index 2b306afbe56d..16d1a1cb8d00 100644
--- a/rust/kernel/sync/condvar.rs
+++ b/rust/kernel/sync/condvar.rs
@@ -9,12 +9,11 @@
use crate::{
init::PinInit,
pin_init,
- str::CStr,
task::{MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE, TASK_NORMAL, TASK_UNINTERRUPTIBLE},
time::Jiffies,
types::Opaque,
};
-use core::ffi::{c_int, c_long};
+use core::ffi::{c_int, c_long, CStr};
use core::marker::PhantomPinned;
use core::ptr;
use macros::pin_data;
@@ -108,7 +107,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
wait_queue_head <- Opaque::ffi_init(|slot| unsafe {
- bindings::__init_waitqueue_head(slot, name.as_char_ptr(), key.as_ptr())
+ bindings::__init_waitqueue_head(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index f6c34ca4d819..318ecb5a5916 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -6,8 +6,8 @@
//! spinlocks, raw spinlocks) to be provided with minimal effort.
use super::LockClassKey;
-use crate::{init::PinInit, pin_init, str::CStr, types::Opaque, types::ScopeGuard};
-use core::{cell::UnsafeCell, marker::PhantomData, marker::PhantomPinned};
+use crate::{init::PinInit, pin_init, types::Opaque, types::ScopeGuard};
+use core::{cell::UnsafeCell, ffi::CStr, marker::PhantomData, marker::PhantomPinned};
use macros::pin_data;
pub mod mutex;
@@ -113,7 +113,7 @@ pub fn new(t: T, name: &'static CStr, key: &'static LockClassKey) -> impl PinIni
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
state <- Opaque::ffi_init(|slot| unsafe {
- B::init(slot, name.as_char_ptr(), key.as_ptr())
+ B::init(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 553a5cba2adc..a6418873e82e 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -380,7 +380,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
slot,
Some(T::Pointer::run),
false,
- name.as_char_ptr(),
+ name.as_ptr(),
key.as_ptr(),
)
}
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 5ebd42ae4a3f..339991ee6885 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -172,7 +172,7 @@ pub extern "C" fn {kunit_name}(__kunit_test: *mut kernel::bindings::kunit) {{
#[allow(unused)]
macro_rules! assert {{
($cond:expr $(,)?) => {{{{
- kernel::kunit_assert!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
+ kernel::kunit_assert!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
}}}}
}}
@@ -180,7 +180,7 @@ macro_rules! assert {{
#[allow(unused)]
macro_rules! assert_eq {{
($left:expr, $right:expr $(,)?) => {{{{
- kernel::kunit_assert_eq!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
+ kernel::kunit_assert_eq!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
}}}}
}}
--
2.45.2
`CStr` became part of the `core` library in Rust 1.64. This change replaces
the custom `CStr` implementation with the one from `core`.
`core::ffi::CStr` generally behaves the same as the removed implementation,
with the following differences:
- It does not implement `Display`, only `Debug`, so switching to
  `core::ffi::CStr` loses the `Display` implementation.
  - The lack of a `Display` implementation affects only
    rust/kernel/kunit.rs, which now uses the `Debug` format instead. The
    only difference between the removed `Display` output and the `Debug`
    output is the quotation marks present in the latter (`foo` vs `"foo"`).
- It does not provide a `from_bytes_with_nul_unchecked_mut` method.
  - It was used only in the `DerefMut` implementation for `CString`, which
    this change removes.
  - Such a method is not desirable anyway. The rule in Rust std is that
    `str` is used only as an immutable reference (`&str`), while mutating
    strings is done with the owned `String` type. Similarly, we can
    introduce the rule that `CStr` should be used only as an immutable
    reference (`&CStr`), while mutating is done only with the owned
    `CString` type.
- It has an `as_ptr()` method instead of `as_char_ptr()`; it also returns
  `*const c_char`.
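
To illustrate the differences listed above, here is a minimal userspace
sketch (not part of this patch) that exercises `core::ffi::CStr` from an
ordinary Rust program. The helper names `debug_format` and
`roundtrip_via_ptr` are made up for this example, and the string is built
with `from_bytes_with_nul` rather than a C-string literal so that it does
not depend on a recent toolchain.

```rust
use core::ffi::{c_char, CStr};

/// Formats a `CStr` through `Debug`, since `core::ffi::CStr` has no `Display`.
/// (Hypothetical helper, for illustration only.)
fn debug_format(s: &CStr) -> String {
    format!("{s:?}")
}

/// Passes the string through a raw `*const c_char`, as an FFI caller would,
/// and returns the bytes without the trailing NUL.
/// (Hypothetical helper, for illustration only.)
fn roundtrip_via_ptr(s: &CStr) -> Vec<u8> {
    // `as_ptr()` is the `core` counterpart of the removed `as_char_ptr()`.
    let ptr: *const c_char = s.as_ptr();
    // SAFETY: `ptr` was obtained from a live `&CStr`, so it points to a
    // valid NUL-terminated string that is not mutated during this call.
    let back = unsafe { CStr::from_ptr(ptr) };
    // `to_bytes()` replaces the removed `as_bytes()`.
    back.to_bytes().to_vec()
}

fn main() {
    let foo = CStr::from_bytes_with_nul(b"foo\0").unwrap();

    // `Debug` output carries the surrounding quotation marks.
    assert_eq!(debug_format(foo), "\"foo\"");

    // The pointer round trip yields the original bytes without the NUL.
    assert_eq!(roundtrip_via_ptr(foo), b"foo".to_vec());

    // `to_bytes_with_nul()` replaces the removed `as_bytes_with_nul()`.
    assert_eq!(foo.to_bytes_with_nul(), &b"foo\0"[..]);
}
```

The mapping shown here (`as_char_ptr` to `as_ptr`, `as_bytes` to
`to_bytes`, `as_bytes_with_nul` to `to_bytes_with_nul`) mirrors the
renames made throughout the diff below.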
Signed-off-by: Michal Rostecki <vadorovsky(a)gmail.com>
---
v1 -> v2:
- Do not remove `c_str` macro. While it's preferred to use C-string
literals, there are two cases where `c_str` is helpful:
- When working with macros, which already return a Rust string literal
(e.g. `stringify!`).
- When building macros, where we want to take a Rust string literal as an
argument (for caller's convenience), but still use it as a C-string
internally.
- Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`).
Use the `c_str` macro to convert these literals to C-string literals.
- Use `c_str` in kunit.rs for converting the output of `stringify!` to a
`CStr`.
- Remove `DerefMut` implementation for `CString`.
v2 -> v3:
- Fix the commit message.
- Remove redundant braces in `use`, when only one item is imported.
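
As a rough illustration of why the `c_str` macro is kept (see the
v1 -> v2 notes above), the following standalone sketch mimics its shape
on top of `core::ffi::CStr`. The macro name `c_str_sketch` and the helper
functions are assumptions made for this example only; the real macro
lives in rust/kernel/str.rs. The sketch assumes a toolchain where
`CStr::from_bytes_with_nul` is usable in const context.

```rust
use core::ffi::CStr;

/// Turns a Rust string literal (or a macro expanding to one) into a
/// `&'static CStr` at compile time. Hypothetical stand-in for `c_str!`.
macro_rules! c_str_sketch {
    ($str:expr) => {{
        // Append the terminating NUL at compile time.
        const S: &str = concat!($str, "\0");
        // Validate in const context: an interior NUL fails the build.
        const C: &::core::ffi::CStr =
            match ::core::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
                Ok(v) => v,
                Err(_) => panic!("string contains interior NUL"),
            };
        C
    }};
}

/// Case 1: the caller passes a plain Rust string literal for convenience.
fn lock_name() -> &'static CStr {
    c_str_sketch!("my_mutex")
}

/// Case 2: the input comes from `stringify!`, which yields a Rust string
/// literal and therefore cannot be written as a C-string literal directly.
fn condition_text() -> &'static CStr {
    c_str_sketch!(stringify!(my_condition))
}

fn main() {
    assert_eq!(lock_name().to_bytes(), &b"my_mutex"[..]);
    assert_eq!(condition_text().to_bytes_with_nul(), &b"my_condition\0"[..]);
}
```

Both cases produce a `&'static CStr` without any runtime check, which is
what the kunit and lock-naming call sites in this series rely on.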
rust/kernel/error.rs | 7 +-
rust/kernel/kunit.rs | 12 +-
rust/kernel/net/phy.rs | 2 +-
rust/kernel/prelude.rs | 4 +-
rust/kernel/str.rs | 486 ++----------------------------------
rust/kernel/sync/condvar.rs | 5 +-
rust/kernel/sync/lock.rs | 6 +-
rust/kernel/workqueue.rs | 2 +-
scripts/rustdoc_test_gen.rs | 4 +-
9 files changed, 44 insertions(+), 484 deletions(-)
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 55280ae9fe40..18808b29604d 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -4,10 +4,11 @@
//!
//! C header: [`include/uapi/asm-generic/errno-base.h`](srctree/include/uapi/asm-generic/errno-base.h)
-use crate::{alloc::AllocError, str::CStr};
+use crate::alloc::AllocError;
use alloc::alloc::LayoutError;
+use core::ffi::CStr;
use core::fmt;
use core::num::TryFromIntError;
use core::str::Utf8Error;
@@ -142,7 +143,7 @@ pub fn name(&self) -> Option<&'static CStr> {
None
} else {
// SAFETY: The string returned by `errname` is static and `NUL`-terminated.
- Some(unsafe { CStr::from_char_ptr(ptr) })
+ Some(unsafe { CStr::from_ptr(ptr) })
}
}
@@ -164,7 +165,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
None => f.debug_tuple("Error").field(&-self.0).finish(),
// SAFETY: These strings are ASCII-only.
Some(name) => f
- .debug_tuple(unsafe { core::str::from_utf8_unchecked(name) })
+ .debug_tuple(unsafe { core::str::from_utf8_unchecked(name.to_bytes()) })
.finish(),
}
}
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 0ba77276ae7e..c08f9dddaa6f 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -56,9 +56,9 @@ macro_rules! kunit_assert {
break 'out;
}
- static FILE: &'static $crate::str::CStr = $crate::c_str!($file);
+ static FILE: &'static core::ffi::CStr = $file;
static LINE: i32 = core::line!() as i32 - $diff;
- static CONDITION: &'static $crate::str::CStr = $crate::c_str!(stringify!($condition));
+ static CONDITION: &'static core::ffi::CStr = $crate::c_str!(stringify!($condition));
// SAFETY: FFI call without safety requirements.
let kunit_test = unsafe { $crate::bindings::kunit_get_current_test() };
@@ -71,11 +71,11 @@ macro_rules! kunit_assert {
//
// This mimics KUnit's failed assertion format.
$crate::kunit::err(format_args!(
- " # {}: ASSERTION FAILED at {FILE}:{LINE}\n",
+ " # {:?}: ASSERTION FAILED at {FILE:?}:{LINE:?}\n",
$name
));
$crate::kunit::err(format_args!(
- " Expected {CONDITION} to be true, but is false\n"
+ " Expected {CONDITION:?} to be true, but is false\n"
));
$crate::kunit::err(format_args!(
" Failure not reported to KUnit since this is a non-KUnit task\n"
@@ -98,12 +98,12 @@ unsafe impl Sync for Location {}
unsafe impl Sync for UnaryAssert {}
static LOCATION: Location = Location($crate::bindings::kunit_loc {
- file: FILE.as_char_ptr(),
+ file: FILE.as_ptr(),
line: LINE,
});
static ASSERTION: UnaryAssert = UnaryAssert($crate::bindings::kunit_unary_assert {
assert: $crate::bindings::kunit_assert {},
- condition: CONDITION.as_char_ptr(),
+ condition: CONDITION.as_ptr(),
expected_true: true,
});
diff --git a/rust/kernel/net/phy.rs b/rust/kernel/net/phy.rs
index fd40b703d224..19f45922ec42 100644
--- a/rust/kernel/net/phy.rs
+++ b/rust/kernel/net/phy.rs
@@ -502,7 +502,7 @@ unsafe impl Sync for DriverVTable {}
pub const fn create_phy_driver<T: Driver>() -> DriverVTable {
// INVARIANT: All the fields of `struct phy_driver` are initialized properly.
DriverVTable(Opaque::new(bindings::phy_driver {
- name: T::NAME.as_char_ptr().cast_mut(),
+ name: T::NAME.as_ptr().cast_mut(),
flags: T::FLAGS,
phy_id: T::PHY_DEVICE_ID.id,
phy_id_mask: T::PHY_DEVICE_ID.mask_as_int(),
diff --git a/rust/kernel/prelude.rs b/rust/kernel/prelude.rs
index b37a0b3180fb..b0969ca78f10 100644
--- a/rust/kernel/prelude.rs
+++ b/rust/kernel/prelude.rs
@@ -12,7 +12,7 @@
//! ```
#[doc(no_inline)]
-pub use core::pin::Pin;
+pub use core::{ffi::CStr, pin::Pin};
pub use crate::alloc::{box_ext::BoxExt, flags::*, vec_ext::VecExt};
@@ -35,7 +35,7 @@
pub use super::error::{code::*, Error, Result};
-pub use super::{str::CStr, ThisModule};
+pub use super::ThisModule;
pub use super::init::{InPlaceInit, Init, PinInit};
diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index bb8d4f41475b..e491a9803187 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -4,8 +4,9 @@
use crate::alloc::{flags::*, vec_ext::VecExt, AllocError};
use alloc::vec::Vec;
+use core::ffi::CStr;
use core::fmt::{self, Write};
-use core::ops::{self, Deref, DerefMut, Index};
+use core::ops::Deref;
use crate::error::{code::*, Error};
@@ -41,11 +42,11 @@ impl fmt::Display for BStr {
/// # use kernel::{fmt, b_str, str::{BStr, CString}};
/// let ascii = b_str!("Hello, BStr!");
/// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "Hello, BStr!".as_bytes());
+ /// assert_eq!(s.to_bytes(), "Hello, BStr!".as_bytes());
///
/// let non_ascii = b_str!("🦀");
/// let s = CString::try_from_fmt(fmt!("{}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for &b in &self.0 {
@@ -72,11 +73,11 @@ impl fmt::Debug for BStr {
/// // Embedded double quotes are escaped.
/// let ascii = b_str!("Hello, \"BStr\"!");
/// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
///
/// let non_ascii = b_str!("😺");
/// let s = CString::try_from_fmt(fmt!("{:?}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_char('"')?;
@@ -128,392 +129,32 @@ macro_rules! b_str {
}};
}
-/// Possible errors when using conversion functions in [`CStr`].
-#[derive(Debug, Clone, Copy)]
-pub enum CStrConvertError {
- /// Supplied bytes contain an interior `NUL`.
- InteriorNul,
-
- /// Supplied bytes are not terminated by `NUL`.
- NotNulTerminated,
-}
-
-impl From<CStrConvertError> for Error {
- #[inline]
- fn from(_: CStrConvertError) -> Error {
- EINVAL
- }
-}
-
-/// A string that is guaranteed to have exactly one `NUL` byte, which is at the
-/// end.
-///
-/// Used for interoperability with kernel APIs that take C strings.
-#[repr(transparent)]
-pub struct CStr([u8]);
-
-impl CStr {
- /// Returns the length of this string excluding `NUL`.
- #[inline]
- pub const fn len(&self) -> usize {
- self.len_with_nul() - 1
- }
-
- /// Returns the length of this string with `NUL`.
- #[inline]
- pub const fn len_with_nul(&self) -> usize {
- // SAFETY: This is one of the invariant of `CStr`.
- // We add a `unreachable_unchecked` here to hint the optimizer that
- // the value returned from this function is non-zero.
- if self.0.is_empty() {
- unsafe { core::hint::unreachable_unchecked() };
- }
- self.0.len()
- }
-
- /// Returns `true` if the string only includes `NUL`.
- #[inline]
- pub const fn is_empty(&self) -> bool {
- self.len() == 0
- }
-
- /// Wraps a raw C string pointer.
- ///
- /// # Safety
- ///
- /// `ptr` must be a valid pointer to a `NUL`-terminated C string, and it must
- /// last at least `'a`. When `CStr` is alive, the memory pointed by `ptr`
- /// must not be mutated.
- #[inline]
- pub unsafe fn from_char_ptr<'a>(ptr: *const core::ffi::c_char) -> &'a Self {
- // SAFETY: The safety precondition guarantees `ptr` is a valid pointer
- // to a `NUL`-terminated C string.
- let len = unsafe { bindings::strlen(ptr) } + 1;
- // SAFETY: Lifetime guaranteed by the safety precondition.
- let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len as _) };
- // SAFETY: As `len` is returned by `strlen`, `bytes` does not contain interior `NUL`.
- // As we have added 1 to `len`, the last byte is known to be `NUL`.
- unsafe { Self::from_bytes_with_nul_unchecked(bytes) }
- }
-
- /// Creates a [`CStr`] from a `[u8]`.
- ///
- /// The provided slice must be `NUL`-terminated, does not contain any
- /// interior `NUL` bytes.
- pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError> {
- if bytes.is_empty() {
- return Err(CStrConvertError::NotNulTerminated);
- }
- if bytes[bytes.len() - 1] != 0 {
- return Err(CStrConvertError::NotNulTerminated);
- }
- let mut i = 0;
- // `i + 1 < bytes.len()` allows LLVM to optimize away bounds checking,
- // while it couldn't optimize away bounds checks for `i < bytes.len() - 1`.
- while i + 1 < bytes.len() {
- if bytes[i] == 0 {
- return Err(CStrConvertError::InteriorNul);
- }
- i += 1;
- }
- // SAFETY: We just checked that all properties hold.
- Ok(unsafe { Self::from_bytes_with_nul_unchecked(bytes) })
- }
-
- /// Creates a [`CStr`] from a `[u8]` without performing any additional
- /// checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub const unsafe fn from_bytes_with_nul_unchecked(bytes: &[u8]) -> &CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { core::mem::transmute(bytes) }
- }
-
- /// Creates a mutable [`CStr`] from a `[u8]` without performing any
- /// additional checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
- }
-
- /// Returns a C pointer to the string.
- #[inline]
- pub const fn as_char_ptr(&self) -> *const core::ffi::c_char {
- self.0.as_ptr() as _
- }
-
- /// Convert the string to a byte slice without the trailing `NUL` byte.
- #[inline]
- pub fn as_bytes(&self) -> &[u8] {
- &self.0[..self.len()]
- }
-
- /// Convert the string to a byte slice containing the trailing `NUL` byte.
- #[inline]
- pub const fn as_bytes_with_nul(&self) -> &[u8] {
- &self.0
- }
-
- /// Yields a [`&str`] slice if the [`CStr`] contains valid UTF-8.
- ///
- /// If the contents of the [`CStr`] are valid UTF-8 data, this
- /// function will return the corresponding [`&str`] slice. Otherwise,
- /// it will return an error with details of where UTF-8 validation failed.
- ///
- /// # Examples
- ///
- /// ```
- /// # use kernel::str::CStr;
- /// let cstr = CStr::from_bytes_with_nul(b"foo\0").unwrap();
- /// assert_eq!(cstr.to_str(), Ok("foo"));
- /// ```
- #[inline]
- pub fn to_str(&self) -> Result<&str, core::str::Utf8Error> {
- core::str::from_utf8(self.as_bytes())
- }
-
- /// Unsafely convert this [`CStr`] into a [`&str`], without checking for
- /// valid UTF-8.
- ///
- /// # Safety
- ///
- /// The contents must be valid UTF-8.
- ///
- /// # Examples
- ///
- /// ```
- /// # use kernel::c_str;
- /// # use kernel::str::CStr;
- /// let bar = c_str!("ツ");
- /// // SAFETY: String literals are guaranteed to be valid UTF-8
- /// // by the Rust compiler.
- /// assert_eq!(unsafe { bar.as_str_unchecked() }, "ツ");
- /// ```
- #[inline]
- pub unsafe fn as_str_unchecked(&self) -> &str {
- unsafe { core::str::from_utf8_unchecked(self.as_bytes()) }
- }
-
- /// Convert this [`CStr`] into a [`CString`] by allocating memory and
- /// copying over the string data.
- pub fn to_cstring(&self) -> Result<CString, AllocError> {
- CString::try_from(self)
- }
-
- /// Converts this [`CStr`] to its ASCII lower case equivalent in-place.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new lowercased value without modifying the existing one, use
- /// [`to_ascii_lowercase()`].
- ///
- /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase
- pub fn make_ascii_lowercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_lowercase();
- }
-
- /// Converts this [`CStr`] to its ASCII upper case equivalent in-place.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new uppercased value without modifying the existing one, use
- /// [`to_ascii_uppercase()`].
- ///
- /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase
- pub fn make_ascii_uppercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_uppercase();
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII lower case equivalent.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To lowercase the value in-place, use [`make_ascii_lowercase`].
- ///
- /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
- pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_lowercase();
-
- Ok(s)
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII upper case equivalent.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To uppercase the value in-place, use [`make_ascii_uppercase`].
- ///
- /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
- pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_uppercase();
-
- Ok(s)
- }
-}
-
-impl fmt::Display for CStr {
- /// Formats printable ASCII characters, escaping the rest.
- ///
- /// ```
- /// # use kernel::c_str;
- /// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
- ///
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "so \"cool\"\0".as_bytes());
- /// ```
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- for &c in self.as_bytes() {
- if (0x20..0x7f).contains(&c) {
- // Printable character.
- f.write_char(c as char)?;
- } else {
- write!(f, "\\x{:02x}", c)?;
- }
- }
- Ok(())
- }
-}
-
-impl fmt::Debug for CStr {
- /// Formats printable ASCII characters with a double quote on either end, escaping the rest.
- ///
- /// ```
- /// # use kernel::c_str;
- /// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{:?}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"\\xf0\\x9f\\x90\\xa7\"\0".as_bytes());
- ///
- /// // Embedded double quotes are escaped.
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"so \\\"cool\\\"\"\0".as_bytes());
- /// ```
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- f.write_str("\"")?;
- for &c in self.as_bytes() {
- match c {
- // Printable characters.
- b'\"' => f.write_str("\\\"")?,
- 0x20..=0x7e => f.write_char(c as char)?,
- _ => write!(f, "\\x{:02x}", c)?,
- }
- }
- f.write_str("\"")
- }
-}
-
-impl AsRef<BStr> for CStr {
- #[inline]
- fn as_ref(&self) -> &BStr {
- BStr::from_bytes(self.as_bytes())
- }
-}
-
-impl Deref for CStr {
- type Target = BStr;
-
- #[inline]
- fn deref(&self) -> &Self::Target {
- self.as_ref()
- }
-}
-
-impl Index<ops::RangeFrom<usize>> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, index: ops::RangeFrom<usize>) -> &Self::Output {
- // Delegate bounds checking to slice.
- // Assign to _ to mute clippy's unnecessary operation warning.
- let _ = &self.as_bytes()[index.start..];
- // SAFETY: We just checked the bounds.
- unsafe { Self::from_bytes_with_nul_unchecked(&self.0[index.start..]) }
- }
-}
-
-impl Index<ops::RangeFull> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, _index: ops::RangeFull) -> &Self::Output {
- self
- }
-}
-
-mod private {
- use core::ops;
-
- // Marker trait for index types that can be forward to `BStr`.
- pub trait CStrIndex {}
-
- impl CStrIndex for usize {}
- impl CStrIndex for ops::Range<usize> {}
- impl CStrIndex for ops::RangeInclusive<usize> {}
- impl CStrIndex for ops::RangeToInclusive<usize> {}
-}
-
-impl<Idx> Index<Idx> for CStr
-where
- Idx: private::CStrIndex,
- BStr: Index<Idx>,
-{
- type Output = <BStr as Index<Idx>>::Output;
-
- #[inline]
- fn index(&self, index: Idx) -> &Self::Output {
- &self.as_ref()[index]
- }
-}
-
/// Creates a new [`CStr`] from a string literal.
///
-/// The string literal should not contain any `NUL` bytes.
+/// Usually, defining C-string literals directly should be preferred, but this
+/// macro is helpful in situations when C-string literals are hard or
+/// impossible to use, for example:
+///
+/// - When working with macros, which already return a Rust string literal
+/// (e.g. `stringify!`).
+/// - When building macros, where we want to take a Rust string literal as an
+/// argument (for caller's convenience), but still use it as a C-string
+/// internally.
+///
+/// The string should not contain any `NUL` bytes.
///
/// # Examples
///
/// ```
+/// # use core::ffi::CStr;
/// # use kernel::c_str;
-/// # use kernel::str::CStr;
-/// const MY_CSTR: &CStr = c_str!("My awesome CStr!");
+/// const MY_CSTR: &CStr = c_str!(stringify!(5));
/// ```
#[macro_export]
macro_rules! c_str {
($str:expr) => {{
const S: &str = concat!($str, "\0");
- const C: &$crate::str::CStr = match $crate::str::CStr::from_bytes_with_nul(S.as_bytes()) {
+ const C: &core::ffi::CStr = match core::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
Ok(v) => v,
Err(_) => panic!("string contains interior NUL"),
};
@@ -526,79 +167,6 @@ mod tests {
use super::*;
use alloc::format;
- const ALL_ASCII_CHARS: &'static str =
- "\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09\\x0a\\x0b\\x0c\\x0d\\x0e\\x0f\
- \\x10\\x11\\x12\\x13\\x14\\x15\\x16\\x17\\x18\\x19\\x1a\\x1b\\x1c\\x1d\\x1e\\x1f \
- !\"#$%&'()*+,-./0123456789:;<=>?@\
- ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\\x7f\
- \\x80\\x81\\x82\\x83\\x84\\x85\\x86\\x87\\x88\\x89\\x8a\\x8b\\x8c\\x8d\\x8e\\x8f\
- \\x90\\x91\\x92\\x93\\x94\\x95\\x96\\x97\\x98\\x99\\x9a\\x9b\\x9c\\x9d\\x9e\\x9f\
- \\xa0\\xa1\\xa2\\xa3\\xa4\\xa5\\xa6\\xa7\\xa8\\xa9\\xaa\\xab\\xac\\xad\\xae\\xaf\
- \\xb0\\xb1\\xb2\\xb3\\xb4\\xb5\\xb6\\xb7\\xb8\\xb9\\xba\\xbb\\xbc\\xbd\\xbe\\xbf\
- \\xc0\\xc1\\xc2\\xc3\\xc4\\xc5\\xc6\\xc7\\xc8\\xc9\\xca\\xcb\\xcc\\xcd\\xce\\xcf\
- \\xd0\\xd1\\xd2\\xd3\\xd4\\xd5\\xd6\\xd7\\xd8\\xd9\\xda\\xdb\\xdc\\xdd\\xde\\xdf\
- \\xe0\\xe1\\xe2\\xe3\\xe4\\xe5\\xe6\\xe7\\xe8\\xe9\\xea\\xeb\\xec\\xed\\xee\\xef\
- \\xf0\\xf1\\xf2\\xf3\\xf4\\xf5\\xf6\\xf7\\xf8\\xf9\\xfa\\xfb\\xfc\\xfd\\xfe\\xff";
-
- #[test]
- fn test_cstr_to_str() {
- let good_bytes = b"\xf0\x9f\xa6\x80\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let checked_str = checked_cstr.to_str().unwrap();
- assert_eq!(checked_str, "🦀");
- }
-
- #[test]
- #[should_panic]
- fn test_cstr_to_str_panic() {
- let bad_bytes = b"\xc3\x28\0";
- let checked_cstr = CStr::from_bytes_with_nul(bad_bytes).unwrap();
- checked_cstr.to_str().unwrap();
- }
-
- #[test]
- fn test_cstr_as_str_unchecked() {
- let good_bytes = b"\xf0\x9f\x90\xA7\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let unchecked_str = unsafe { checked_cstr.as_str_unchecked() };
- assert_eq!(unchecked_str, "🐧");
- }
-
- #[test]
- fn test_cstr_display() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{}", hello_world), "hello, world!");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{}", non_printables), "\\x01\\x09\\x0a");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{}", non_ascii), "d\\xe9j\\xe0 vu");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{}", good_bytes), "\\xf0\\x9f\\xa6\\x80");
- }
-
- #[test]
- fn test_cstr_display_all_bytes() {
- let mut bytes: [u8; 256] = [0; 256];
- // fill `bytes` with [1..=255] + [0]
- for i in u8::MIN..=u8::MAX {
- bytes[i as usize] = i.wrapping_add(1);
- }
- let cstr = CStr::from_bytes_with_nul(&bytes).unwrap();
- assert_eq!(format!("{}", cstr), ALL_ASCII_CHARS);
- }
-
- #[test]
- fn test_cstr_debug() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{:?}", hello_world), "\"hello, world!\"");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{:?}", non_printables), "\"\\x01\\x09\\x0a\"");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{:?}", non_ascii), "\"d\\xe9j\\xe0 vu\"");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{:?}", good_bytes), "\"\\xf0\\x9f\\xa6\\x80\"");
- }
-
#[test]
fn test_bstr_display() {
let hello_world = BStr::from_bytes(b"hello, world!");
@@ -779,11 +347,11 @@ fn write_str(&mut self, s: &str) -> fmt::Result {
/// use kernel::{str::CString, fmt};
///
/// let s = CString::try_from_fmt(fmt!("{}{}{}", "abc", 10, 20)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "abc1020\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "abc1020\0".as_bytes());
///
/// let tmp = "testing";
/// let s = CString::try_from_fmt(fmt!("{tmp}{}", 123)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "testing123\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "testing123\0".as_bytes());
///
/// // This fails because it has an embedded `NUL` byte.
/// let s = CString::try_from_fmt(fmt!("a\0b{}", 123));
@@ -838,21 +406,13 @@ fn deref(&self) -> &Self::Target {
}
}
-impl DerefMut for CString {
- fn deref_mut(&mut self) -> &mut Self::Target {
- // SAFETY: A `CString` is always NUL-terminated and contains no other
- // NUL bytes.
- unsafe { CStr::from_bytes_with_nul_unchecked_mut(self.buf.as_mut_slice()) }
- }
-}
-
impl<'a> TryFrom<&'a CStr> for CString {
type Error = AllocError;
fn try_from(cstr: &'a CStr) -> Result<CString, AllocError> {
let mut buf = Vec::new();
- <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.as_bytes_with_nul(), GFP_KERNEL)
+ <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.to_bytes_with_nul(), GFP_KERNEL)
.map_err(|_| AllocError)?;
// INVARIANT: The `CStr` and `CString` types have the same invariants for
diff --git a/rust/kernel/sync/condvar.rs b/rust/kernel/sync/condvar.rs
index 2b306afbe56d..16d1a1cb8d00 100644
--- a/rust/kernel/sync/condvar.rs
+++ b/rust/kernel/sync/condvar.rs
@@ -9,12 +9,11 @@
use crate::{
init::PinInit,
pin_init,
- str::CStr,
task::{MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE, TASK_NORMAL, TASK_UNINTERRUPTIBLE},
time::Jiffies,
types::Opaque,
};
-use core::ffi::{c_int, c_long};
+use core::ffi::{c_int, c_long, CStr};
use core::marker::PhantomPinned;
use core::ptr;
use macros::pin_data;
@@ -108,7 +107,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
wait_queue_head <- Opaque::ffi_init(|slot| unsafe {
- bindings::__init_waitqueue_head(slot, name.as_char_ptr(), key.as_ptr())
+ bindings::__init_waitqueue_head(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index f6c34ca4d819..318ecb5a5916 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -6,8 +6,8 @@
//! spinlocks, raw spinlocks) to be provided with minimal effort.
use super::LockClassKey;
-use crate::{init::PinInit, pin_init, str::CStr, types::Opaque, types::ScopeGuard};
-use core::{cell::UnsafeCell, marker::PhantomData, marker::PhantomPinned};
+use crate::{init::PinInit, pin_init, types::Opaque, types::ScopeGuard};
+use core::{cell::UnsafeCell, ffi::CStr, marker::PhantomData, marker::PhantomPinned};
use macros::pin_data;
pub mod mutex;
@@ -113,7 +113,7 @@ pub fn new(t: T, name: &'static CStr, key: &'static LockClassKey) -> impl PinIni
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
state <- Opaque::ffi_init(|slot| unsafe {
- B::init(slot, name.as_char_ptr(), key.as_ptr())
+ B::init(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 553a5cba2adc..a6418873e82e 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -380,7 +380,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
slot,
Some(T::Pointer::run),
false,
- name.as_char_ptr(),
+ name.as_ptr(),
key.as_ptr(),
)
}
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 5ebd42ae4a3f..339991ee6885 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -172,7 +172,7 @@ pub extern "C" fn {kunit_name}(__kunit_test: *mut kernel::bindings::kunit) {{
#[allow(unused)]
macro_rules! assert {{
($cond:expr $(,)?) => {{{{
- kernel::kunit_assert!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
+ kernel::kunit_assert!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
}}}}
}}
@@ -180,7 +180,7 @@ macro_rules! assert {{
#[allow(unused)]
macro_rules! assert_eq {{
($left:expr, $right:expr $(,)?) => {{{{
- kernel::kunit_assert_eq!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
+ kernel::kunit_assert_eq!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
}}}}
}}
--
2.45.2
After my improvement of the test:
https://lore.kernel.org/all/20240522070435.773918-3-dev.jain@arm.com/
the test begins to fail with 4k and 16k pages on non-LPA2 systems. To
reduce noise in CI systems, skip the test when the higher address
space is not implemented.
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
The patch applies on linux-next.
tools/testing/selftests/mm/va_high_addr_switch.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/va_high_addr_switch.c b/tools/testing/selftests/mm/va_high_addr_switch.c
index fa7eabfaf841..c6040e1d6e53 100644
--- a/tools/testing/selftests/mm/va_high_addr_switch.c
+++ b/tools/testing/selftests/mm/va_high_addr_switch.c
@@ -293,6 +293,18 @@ static int run_test(struct testcase *test, int count)
return ret;
}
+/* Check if userspace VA > 48 bits */
+static int high_address_present(void)
+{
+ void *ptr = mmap((void *)(1UL << 50), 1, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+ if (ptr == MAP_FAILED)
+ return 0;
+
+ munmap(ptr, 1);
+ return 1;
+}
+
static int supported_arch(void)
{
#if defined(__powerpc64__)
@@ -300,7 +312,7 @@ static int supported_arch(void)
#elif defined(__x86_64__)
return 1;
#elif defined(__aarch64__)
- return 1;
+ return high_address_present();
#else
return 0;
#endif
--
2.34.1
Hi Linus,
Please pull the kselftest update for Linux 6.11-rc1.
This kselftest next update for Linux 6.11-rc1 consists of:
-- changes to resctrl test to cleanup resctrl_val() and
generalize it by removing test name specific handling
from the function.
-- several clang build failure fixes to framework and tests
-- adds tests to verify IFS (In Field Scan) driver functionality
-- cleanups to remove unused variables and document changes
Testing notes:
Passed on linux-next and linux-kselftest next branch:
- Build - make kselftest-all
- Run - make kselftest
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 256abd8e550ce977b728be79a74e1729438b4948:
Linux 6.10-rc7 (2024-07-07 14:23:46 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.11-rc1
for you to fetch changes up to bb408dae9e73803eab8a648115d6c4a1bca4dba3:
selftests: ifs: verify IFS ARRAY BIST functionality (2024-07-11 11:31:11 -0600)
----------------------------------------------------------------
linux_kselftest-next-6.11-rc1
This kselftest next update for Linux 6.11-rc1 consists of:
-- changes to resctrl test to cleanup resctrl_val() and
generalize it by removing test name specific handling
from the function.
-- several clang build failure fixes to framework and tests
-- adds tests to verify IFS (In Field Scan) driver functionality
-- cleanups to remove unused variables and document changes
----------------------------------------------------------------
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1) only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() & document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from write_bm_pid_to_resctrl()
John Hubbard (8):
selftests/lib.mk: silence some clang warnings that gcc already ignores
selftests/timers: remove unused irqcount variable
selftests/x86: fix Makefile dependencies to work with clang
selftests/x86: build fsgsbase_restore.c with clang
selftests/x86: build sysret_rip.c with clang
selftests/x86: avoid -no-pie warnings from clang during compilation
selftests/x86: remove (or use) unused variables and functions
selftests/x86: fix printk warnings reported by clang
Muhammad Usama Anjum (2):
selftests: Add information about TAP conformance in tests
selftests: x86: test_FISTTP: use fisttps instead of ambiguous fisttp
Pengfei Xu (4):
selftests: ifs: verify test interfaces are created by the driver
selftests: ifs: verify test image loading functionality
selftests: ifs: verify IFS scan test functionality
selftests: ifs: verify IFS ARRAY BIST functionality
Zhu Jun (2):
selftests/breakpoints:Remove unused variable
selftests/dma:remove unused variable
aigourensheng (1):
selftests/sched: fix code format issues
Documentation/dev-tools/kselftest.rst | 7 +
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../breakpoints/step_after_suspend_test.c | 1 -
tools/testing/selftests/dma/dma_map_benchmark.c | 1 -
.../drivers/platform/x86/intel/ifs/Makefile | 6 +
.../drivers/platform/x86/intel/ifs/test_ifs.sh | 494 +++++++++++++++++++++
tools/testing/selftests/lib.mk | 8 +
tools/testing/selftests/resctrl/cache.c | 10 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 +-
tools/testing/selftests/resctrl/resctrl_val.c | 371 +++++++---------
tools/testing/selftests/resctrl/resctrlfs.c | 67 ++-
tools/testing/selftests/sched/cs_prctl_test.c | 10 +-
tools/testing/selftests/timers/rtcpie.c | 3 +-
tools/testing/selftests/x86/Makefile | 31 +-
tools/testing/selftests/x86/amx.c | 16 -
tools/testing/selftests/x86/clang_helpers_32.S | 11 +
tools/testing/selftests/x86/clang_helpers_64.S | 28 ++
tools/testing/selftests/x86/fsgsbase.c | 6 -
tools/testing/selftests/x86/fsgsbase_restore.c | 11 +-
tools/testing/selftests/x86/sigreturn.c | 2 +-
tools/testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/sysret_rip.c | 20 +-
tools/testing/selftests/x86/test_FISTTP.c | 8 +-
tools/testing/selftests/x86/test_vsyscall.c | 15 +-
tools/testing/selftests/x86/vdso_restorer.c | 2 +
30 files changed, 901 insertions(+), 358 deletions(-)
create mode 100644 tools/testing/selftests/drivers/platform/x86/intel/ifs/Makefile
create mode 100755 tools/testing/selftests/drivers/platform/x86/intel/ifs/test_ifs.sh
create mode 100644 tools/testing/selftests/x86/clang_helpers_32.S
create mode 100644 tools/testing/selftests/x86/clang_helpers_64.S
----------------------------------------------------------------
This series makes 4 tests conform to TAP.
Changes since v1:
- Correct the description of patches with what improvements they are
bringing and why they are required
Muhammad Usama Anjum (4):
selftests: x86: check_initial_reg_state: remove manual counting and
increase maintainability
selftests: x86: corrupt_xstate_header: remove manual counting and
increase maintainability
selftests: x86: fsgsbase_restore: remove manual counting and increase
maintainability
selftests: x86: entry_from_vm86: remove manual counting and increase
maintainability
.../selftests/x86/check_initial_reg_state.c | 24 ++--
.../selftests/x86/corrupt_xstate_header.c | 30 +++--
tools/testing/selftests/x86/entry_from_vm86.c | 109 ++++++++--------
.../testing/selftests/x86/fsgsbase_restore.c | 117 +++++++++---------
4 files changed, 139 insertions(+), 141 deletions(-)
--
2.39.2
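
To illustrate what "remove manual counting" can mean in practice, here is a minimal sketch of TAP-style reporting in which the test number is derived from a counter instead of being typed by hand. The struct tap_state type and the tap_begin(), tap_result() and tap_finish() helpers are hypothetical names made up for this sketch; they are not the kselftest.h API, and the series itself may be structured differently.

```c
#include <stdio.h>

/* Hypothetical helper state; not the kselftest.h API. */
struct tap_state {
	unsigned int planned;
	unsigned int run;
	unsigned int failed;
};

/* Print the TAP header and the plan line ("1..N"). */
static void tap_begin(struct tap_state *t, unsigned int planned)
{
	t->planned = planned;
	t->run = 0;
	t->failed = 0;
	printf("TAP version 13\n");
	printf("1..%u\n", planned);
}

/* Emit one result line; the test number comes from the counter. */
static void tap_result(struct tap_state *t, int ok, const char *name)
{
	t->run++;
	if (!ok)
		t->failed++;
	printf("%s %u %s\n", ok ? "ok" : "not ok", t->run, name);
}

/* Return 0 only if every planned test ran and none failed. */
static int tap_finish(const struct tap_state *t)
{
	printf("# Totals: pass:%u fail:%u\n", t->run - t->failed, t->failed);
	return (t->run == t->planned && t->failed == 0) ? 0 : 1;
}
```

With helpers like these, adding or removing a test case only requires updating the plan, which is the maintainability gain the patch subjects refer to.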
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit 73810cd45b99c6c418e1c6a487b52c1e74edb20d ]
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...there are several warnings, and an error. This fixes all of those and
allows these tests to run and pass.
1. Fix linker error (undefined reference to memcpy) by providing a local
version of memcpy.
2. clang complains about using this form:
if (g = h & 0xf0000000)
...so factor out the assignment into a separate step.
3. The code is passing a signed const char* to elf_hash(), which expects
a const unsigned char *. There are several callers, so fix this at
the source by allowing the function to accept a signed argument, and
then converting to unsigned operations, once inside the function.
4. clang doesn't have __attribute__((externally_visible)) and generates
a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem
to require that attribute in order to build, run and pass tests here,
so remove it.
Reviewed-by: Carlos Llamas <cmllamas(a)google.com>
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/vDSO/parse_vdso.c | 16 +++++++++++-----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++++++++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c
index 1dbb4b87268fa..9ef3ad3789c17 100644
--- a/tools/testing/selftests/vDSO/parse_vdso.c
+++ b/tools/testing/selftests/vDSO/parse_vdso.c
@@ -77,14 +77,20 @@ static struct vdso_info
ELF(Verdef) *verdef;
} vdso_info;
-/* Straight from the ELF specification. */
-static unsigned long elf_hash(const unsigned char *name)
+/*
+ * Straight from the ELF specification...and then tweaked slightly, in order to
+ * avoid a few clang warnings.
+ */
+static unsigned long elf_hash(const char *name)
{
unsigned long h = 0, g;
- while (*name)
+ const unsigned char *uch_name = (const unsigned char *)name;
+
+ while (*uch_name)
{
- h = (h << 4) + *name++;
- if (g = h & 0xf0000000)
+ h = (h << 4) + *uch_name++;
+ g = h & 0xf0000000;
+ if (g)
h ^= g >> 24;
h &= ~g;
}
diff --git a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
index 93b0ebf8cc38d..805e8c1892764 100644
--- a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
+++ b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
@@ -20,7 +20,7 @@ extern void *vdso_sym(const char *version, const char *name);
extern void vdso_init_from_sysinfo_ehdr(uintptr_t base);
extern void vdso_init_from_auxv(void *auxv);
-/* We need a libc functions... */
+/* We need some libc functions... */
int strcmp(const char *a, const char *b)
{
/* This implementation is buggy: it never returns -1. */
@@ -36,6 +36,20 @@ int strcmp(const char *a, const char *b)
return 0;
}
+/*
+ * The clang build needs this, although gcc does not.
+ * Stolen from lib/string.c.
+ */
+void *memcpy(void *dest, const void *src, size_t count)
+{
+ char *tmp = dest;
+ const char *s = src;
+
+ while (count--)
+ *tmp++ = *s++;
+ return dest;
+}
+
/* ...and two syscalls. This is x86-specific. */
static inline long x86_syscall3(long nr, long a0, long a1, long a2)
{
@@ -72,7 +86,7 @@ void to_base10(char *lastdig, time_t n)
}
}
-__attribute__((externally_visible)) void c_main(void **stack)
+void c_main(void **stack)
{
/* Parse the stack */
long argc = (long)*stack;
--
2.43.0
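
Items 2 and 3 of the commit message above can be checked in isolation with a small standalone sketch. The function body follows the patched elf_hash() from the diff; the assert() on the argument and the return statement are additions for this sketch, since the diff context does not show the end of the function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ELF symbol hash as rewritten by the patch: callers pass a plain
 * char pointer, and the bytes are read through an unsigned pointer.
 */
static unsigned long elf_hash(const char *name)
{
	unsigned long h = 0, g;
	const unsigned char *uch_name = (const unsigned char *)name;

	assert(name != NULL);	/* added for the sketch only */

	while (*uch_name) {
		h = (h << 4) + *uch_name++;
		g = h & 0xf0000000;	/* assignment split out of the if () */
		if (g)
			h ^= g >> 24;
		h &= ~g;
	}
	return h;
}
```

Reading through the unsigned pointer matters for bytes with the top bit set: elf_hash("\xff") yields 0xff, whereas reading the same byte as a signed char would sign-extend it before the addition.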
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit 73810cd45b99c6c418e1c6a487b52c1e74edb20d ]
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...there are several warnings, and an error. This fixes all of those and
allows these tests to run and pass.
1. Fix linker error (undefined reference to memcpy) by providing a local
version of memcpy.
2. clang complains about using this form:
if (g = h & 0xf0000000)
...so factor out the assignment into a separate step.
3. The code is passing a signed const char* to elf_hash(), which expects
a const unsigned char *. There are several callers, so fix this at
the source by allowing the function to accept a signed argument, and
then converting to unsigned operations, once inside the function.
4. clang doesn't have __attribute__((externally_visible)) and generates
a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem
to require that attribute in order to build, run and pass tests here,
so remove it.
Reviewed-by: Carlos Llamas <cmllamas(a)google.com>
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/vDSO/parse_vdso.c | 16 +++++++++++-----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++++++++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c
index 1dbb4b87268fa..9ef3ad3789c17 100644
--- a/tools/testing/selftests/vDSO/parse_vdso.c
+++ b/tools/testing/selftests/vDSO/parse_vdso.c
@@ -77,14 +77,20 @@ static struct vdso_info
ELF(Verdef) *verdef;
} vdso_info;
-/* Straight from the ELF specification. */
-static unsigned long elf_hash(const unsigned char *name)
+/*
+ * Straight from the ELF specification...and then tweaked slightly, in order to
+ * avoid a few clang warnings.
+ */
+static unsigned long elf_hash(const char *name)
{
unsigned long h = 0, g;
- while (*name)
+ const unsigned char *uch_name = (const unsigned char *)name;
+
+ while (*uch_name)
{
- h = (h << 4) + *name++;
- if (g = h & 0xf0000000)
+ h = (h << 4) + *uch_name++;
+ g = h & 0xf0000000;
+ if (g)
h ^= g >> 24;
h &= ~g;
}
diff --git a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
index 5ac4b00acfbcd..64c369fa43893 100644
--- a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
+++ b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
@@ -20,7 +20,7 @@ extern void *vdso_sym(const char *version, const char *name);
extern void vdso_init_from_sysinfo_ehdr(uintptr_t base);
extern void vdso_init_from_auxv(void *auxv);
-/* We need a libc functions... */
+/* We need some libc functions... */
int strcmp(const char *a, const char *b)
{
/* This implementation is buggy: it never returns -1. */
@@ -36,6 +36,20 @@ int strcmp(const char *a, const char *b)
return 0;
}
+/*
+ * The clang build needs this, although gcc does not.
+ * Stolen from lib/string.c.
+ */
+void *memcpy(void *dest, const void *src, size_t count)
+{
+ char *tmp = dest;
+ const char *s = src;
+
+ while (count--)
+ *tmp++ = *s++;
+ return dest;
+}
+
/* ...and two syscalls. This is x86-specific. */
static inline long x86_syscall3(long nr, long a0, long a1, long a2)
{
@@ -72,7 +86,7 @@ void to_base10(char *lastdig, time_t n)
}
}
-__attribute__((externally_visible)) void c_main(void **stack)
+void c_main(void **stack)
{
/* Parse the stack */
long argc = (long)*stack;
--
2.43.0
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit 73810cd45b99c6c418e1c6a487b52c1e74edb20d ]
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...there are several warnings, and an error. This fixes all of those and
allows these tests to run and pass.
1. Fix linker error (undefined reference to memcpy) by providing a local
version of memcpy.
2. clang complains about using this form:
if (g = h & 0xf0000000)
...so factor out the assignment into a separate step.
3. The code is passing a signed const char* to elf_hash(), which expects
a const unsigned char *. There are several callers, so fix this at
the source by allowing the function to accept a signed argument, and
then converting to unsigned operations, once inside the function.
4. clang doesn't have __attribute__((externally_visible)) and generates
a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem
to require that attribute in order to build, run and pass tests here,
so remove it.
Reviewed-by: Carlos Llamas <cmllamas(a)google.com>
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/vDSO/parse_vdso.c | 16 +++++++++++-----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++++++++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c
index 413f75620a35b..4ae417372e9eb 100644
--- a/tools/testing/selftests/vDSO/parse_vdso.c
+++ b/tools/testing/selftests/vDSO/parse_vdso.c
@@ -55,14 +55,20 @@ static struct vdso_info
ELF(Verdef) *verdef;
} vdso_info;
-/* Straight from the ELF specification. */
-static unsigned long elf_hash(const unsigned char *name)
+/*
+ * Straight from the ELF specification...and then tweaked slightly, in order to
+ * avoid a few clang warnings.
+ */
+static unsigned long elf_hash(const char *name)
{
unsigned long h = 0, g;
- while (*name)
+ const unsigned char *uch_name = (const unsigned char *)name;
+
+ while (*uch_name)
{
- h = (h << 4) + *name++;
- if (g = h & 0xf0000000)
+ h = (h << 4) + *uch_name++;
+ g = h & 0xf0000000;
+ if (g)
h ^= g >> 24;
h &= ~g;
}
diff --git a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
index 8a44ff973ee17..27f6fdf119691 100644
--- a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
+++ b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
@@ -18,7 +18,7 @@
#include "parse_vdso.h"
-/* We need a libc functions... */
+/* We need some libc functions... */
int strcmp(const char *a, const char *b)
{
/* This implementation is buggy: it never returns -1. */
@@ -34,6 +34,20 @@ int strcmp(const char *a, const char *b)
return 0;
}
+/*
+ * The clang build needs this, although gcc does not.
+ * Stolen from lib/string.c.
+ */
+void *memcpy(void *dest, const void *src, size_t count)
+{
+ char *tmp = dest;
+ const char *s = src;
+
+ while (count--)
+ *tmp++ = *s++;
+ return dest;
+}
+
/* ...and two syscalls. This is x86-specific. */
static inline long x86_syscall3(long nr, long a0, long a1, long a2)
{
@@ -70,7 +84,7 @@ void to_base10(char *lastdig, time_t n)
}
}
-__attribute__((externally_visible)) void c_main(void **stack)
+void c_main(void **stack)
{
/* Parse the stack */
long argc = (long)*stack;
--
2.43.0
The opened file should be closed before exit; otherwise a resource leak
will occur. This problem was discovered by reading the code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
tools/testing/selftests/rtc/setdate.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/rtc/setdate.c b/tools/testing/selftests/rtc/setdate.c
index b303890b3de2..17a00affb0ec 100644
--- a/tools/testing/selftests/rtc/setdate.c
+++ b/tools/testing/selftests/rtc/setdate.c
@@ -65,6 +65,7 @@ int main(int argc, char **argv)
retval = ioctl(fd, RTC_RD_TIME, &current);
if (retval == -1) {
perror("RTC_RD_TIME ioctl");
+ close(fd);
exit(errno);
}
--
2.17.1
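
One detail worth keeping in mind with this kind of fix is that close() is allowed to change errno, so code that reports errno after closing may want to save it first. The following is a minimal sketch of that pattern, not the code from setdate.c: read_rtc_time() is a hypothetical helper written for illustration, and it returns the error code instead of calling exit().

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the RTC time from the device at `path`.
 * Returns 0 on success, or the errno value of the call that failed.
 */
static int read_rtc_time(const char *path, struct rtc_time *out)
{
	int fd, saved_errno;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return errno;

	if (ioctl(fd, RTC_RD_TIME, out) == -1) {
		saved_errno = errno;	/* perror() and close() may overwrite errno */
		perror("RTC_RD_TIME ioctl");
		close(fd);		/* release the descriptor on the error path */
		return saved_errno;
	}

	close(fd);
	return 0;
}
```

A caller could pass a path such as /dev/rtc0 (an assumption; the device name varies between systems) and use the return value as the process exit status.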
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cc'ed everyone on all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they all build
upon each other. Or the DT part can be merged through the DT tree to
reduce the dependencies.
Changes from v6: https://lore.kernel.org/r/20240706045454.215701-1-sboyd@kernel.org
* Fix kasan error in platform test by fixing the condition to check for
correct free callback
* Add module descriptions to new modules
Changes from v5: https://lore.kernel.org/r/20240603223811.3815762-1-sboyd@kernel.org
* Pick up reviewed-by tags
* Drop test vendor prefix bindings as dtschema allows anything now
* Use of_node_put_kunit() more to plug some reference leaks
* Select DTC config to avoid compile fails because of missing dtc
* Don't skip for OF_OVERLAY in overlay tests because they depend on it
Changes from v4: https://lore.kernel.org/r/20240422232404.213174-1-sboyd@kernel.org
* Picked up reviewed-by tags
* Check for non-NULL device pointers before calling put_device()
* Fix CFI issues with kunit actions
* Introduce platform_device_prepare_wait_for_probe() helper to wait for
a platform device to probe
* Move platform code to lib/kunit and rename functions to have kunit
prefix
* Fix issue with platform wrappers messing up reference counting
because they used kunit actions
* New patch to populate overlay devices on root node for powerpc
* Make fixed-rate binding generic single clk consumer binding
Changes from v3: https://lore.kernel.org/r/20230327222159.3509818-1-sboyd@kernel.org
* No longer depend on Frank's series[1] because it was merged upstream[2]
* Use kunit_add_action_or_reset() to shorten code
* Skip tests properly when CONFIG_OF_OVERLAY isn't set
Changes from v2: https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1: https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
[2] https://lore.kernel.org/r/20240308195737.GA1174908-robh@kernel.org
Stephen Boyd (8):
of/platform: Allow overlays to create platform devices from the root
node
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 21 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/clk/.kunitconfig | 2 +
drivers/clk/Kconfig | 11 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 380 +++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit_helpers.c | 204 ++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 453 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 1 +
drivers/of/Kconfig | 10 +
drivers/of/Makefile | 2 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit_helpers.c | 74 +++
drivers/of/overlay_test.c | 115 +++++
drivers/of/platform.c | 9 +-
include/kunit/clk.h | 28 ++
include/kunit/of.h | 115 +++++
include/kunit/platform_device.h | 20 +
lib/kunit/Makefile | 4 +-
lib/kunit/platform-test.c | 224 +++++++++
lib/kunit/platform.c | 302 ++++++++++++
28 files changed, 2087 insertions(+), 6 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit_helpers.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit_helpers.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
create mode 100644 lib/kunit/platform-test.c
create mode 100644 lib/kunit/platform.c
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- modifications that better address the root causes.
- only contains the first two patches for -net.
v2:
- add patch 2, a new fix for sk_msg_memcopy_from_iter.
- update patch 3, only test "sk->sk_prot->close" as Eric suggested.
- update patch 4, use "goto err" instead of "return" as Eduard
suggested.
- add "fixes" tag for patch 1-3.
- change subject prefixes as "bpf-next" to trigger BPF CI.
- cc Loongarch maintainers too.
BPF selftests seem not to have been fully tested on Loongarch. When I
ran these tests on Loongarch recently, some errors occurred. This patch set
contains two bugfixes for skmsg.
Geliang Tang (2):
skmsg: prevent empty ingress skb from enqueuing
skmsg: bugfix for sk_msg sge iteration
net/core/skmsg.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
2.43.0
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
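To make the return-address check described above concrete, here is a toy software model in Rust. It is only a sketch for illustration: the real mechanism lives in hardware (CET, GCS, Zicfiss), and the `ShadowStack` type and its method names are hypothetical, not part of this series or any kernel API.

```rust
/// Toy software model of a shadow stack (hypothetical type, for
/// illustration only; the real check is performed by the hardware).
#[derive(Default)]
struct ShadowStack {
    /// Recorded return addresses, most recent call last.
    return_addrs: Vec<u64>,
}

impl ShadowStack {
    /// On a call, the return address is also recorded on the shadow stack.
    fn on_call(&mut self, return_addr: u64) {
        self.return_addrs.push(return_addr);
    }

    /// On a return, the address read from the normal stack must match the
    /// recorded one. A mismatch models the fault raised by the hardware.
    fn on_return(&mut self, addr_from_normal_stack: u64) -> Result<u64, &'static str> {
        match self.return_addrs.pop() {
            Some(expected) if expected == addr_from_normal_stack => Ok(expected),
            Some(_) => Err("return address mismatch"),
            None => Err("shadow stack underflow"),
        }
    }
}
```

A return address overwritten on the normal stack, as in a ROP attack, no longer matches the recorded entry, so `on_return()` reports an error instead of continuing.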
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). The user must provide a shadow stack
address and size, this must point to memory mapped for use as a shadow
stack by map_shadow_stack() with a shadow stack token at the top of the
stack.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET available to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
Please further note that the token consumption done by clone3() is not
currently implemented in an atomic fashion, Rick indicated that he would
look into fixing this if people are OK with the implementation.
A new architecture feature Kconfig option for shadow stacks is added
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (9):
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Explicitly handle child exits due to signals
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 41 ++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 104 +++++++---
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 13 ++
include/uapi/linux/sched.h | 13 +-
kernel/fork.c | 76 ++++++--
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 225 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 40 +++-
tools/testing/selftests/ksft_shstk.h | 63 ++++++
15 files changed, 512 insertions(+), 88 deletions(-)
---
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
`CStr` became a part of the `core` library in Rust 1.75, so there is no
need to keep the custom implementation. This change replaces the custom
`CStr` implementation with the one from `core`.
`core::ffi::CStr` behaves generally the same as the removed implementation,
with the following differences:
- It does not implement `Display` (but implements `Debug`). Therefore,
by switching to `core::ffi::CStr`, we lose the `Display` implementation.
- The lack of a `Display` implementation impacted only rust/kernel/kunit.rs.
In this change, we use the `Debug` format there. The only difference
between the removed `Display` output and the `Debug` output is the quotation
marks present in the latter (`foo` vs `"foo"`).
- It does not provide `from_bytes_with_nul_unchecked_mut` method.
- It was used only in `DerefMut` implementation for `CString`. This
change removes that implementation.
- Otherwise, having such a method is not desirable. The rule in Rust
std is that `str` is used only as an immutable reference (`&str`),
while mutating strings is done with the owned `String` type.
Similarly, we can introduce the rule that `CStr` should be used only
as an immutable reference (`&CStr`), while mutating is done only with
the owned `CString` type.
- It has `as_ptr()` method instead of `as_char_ptr()`, which also returns
`*const c_char`.
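As a rough illustration of the differences listed above, here is a minimal userspace sketch. It is built against `std` (whose `std::ffi::CStr` is the same type re-exported from `core`) purely so it can be compiled standalone; that choice is an assumption of the example, not something this patch does. The helper names `debug_repr`, `raw_ptr` and `payload` are hypothetical.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical helper: formats a `CStr` with `Debug`, because the
/// `core` type does not implement `Display`. The result is quoted.
fn debug_repr(s: &CStr) -> String {
    format!("{:?}", s)
}

/// Hypothetical helper: `as_ptr()` takes the place of the removed
/// `as_char_ptr()` and also yields `*const c_char`.
fn raw_ptr(s: &CStr) -> *const c_char {
    s.as_ptr()
}

/// Hypothetical helper: `to_bytes()` is the `core` counterpart of the
/// removed `as_bytes()`; the trailing `NUL` is not included.
fn payload(s: &CStr) -> &[u8] {
    s.to_bytes()
}
```

For a `CStr` built from `b"foo\0"`, `debug_repr()` yields the quoted form `"foo"`, which is the only visible change in the KUnit assertion messages.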
Signed-off-by: Michal Rostecki <vadorovsky(a)gmail.com>
---
v1 -> v2:
- Do not remove `c_str` macro. While it's preferred to use C-string
literals, there are two cases where `c_str` is helpful:
- When working with macros, which already return a Rust string literal
(e.g. `stringify!`).
- When building macros, where we want to take a Rust string literal as an
argument (for caller's convenience), but still use it as a C-string
internally.
- Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`,
`new_mutex`). Use the `c_str` macro to convert these literals to C-string
literals.
- Use `c_str` in kunit.rs for converting the output of `stringify!` to a
`CStr`.
- Remove `DerefMut` implementation for `CString`.
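The two cases where `c_str` stays useful can be sketched outside the kernel as follows. This is a standalone approximation that mirrors the shape of the macro in rust/kernel/str.rs but uses `::std::ffi::CStr` so it compiles in userspace; treat it as an illustration under that assumption rather than the kernel's definition.

```rust
/// Standalone approximation of the kernel's `c_str!` macro: appends a
/// `NUL` to a Rust string literal and validates it at compile time.
macro_rules! c_str {
    ($str:expr) => {{
        const S: &str = concat!($str, "\0");
        const C: &::std::ffi::CStr =
            match ::std::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
                Ok(v) => v,
                // An interior NUL in the literal becomes a build error.
                Err(_) => panic!("string contains interior NUL"),
            };
        C
    }};
}
```

Because `concat!` expands nested macros eagerly, `c_str!(stringify!(5))` works even though `stringify!` produces a plain Rust string literal, which a C-string literal could not wrap.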
rust/kernel/error.rs | 7 +-
rust/kernel/kunit.rs | 12 +-
rust/kernel/net/phy.rs | 2 +-
rust/kernel/prelude.rs | 4 +-
rust/kernel/str.rs | 486 ++----------------------------------
rust/kernel/sync/condvar.rs | 5 +-
rust/kernel/sync/lock.rs | 6 +-
rust/kernel/workqueue.rs | 2 +-
scripts/rustdoc_test_gen.rs | 4 +-
9 files changed, 44 insertions(+), 484 deletions(-)
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 55280ae9fe40..18808b29604d 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -4,10 +4,11 @@
//!
//! C header: [`include/uapi/asm-generic/errno-base.h`](srctree/include/uapi/asm-generic/errno-base.h)
-use crate::{alloc::AllocError, str::CStr};
+use crate::alloc::AllocError;
use alloc::alloc::LayoutError;
+use core::ffi::CStr;
use core::fmt;
use core::num::TryFromIntError;
use core::str::Utf8Error;
@@ -142,7 +143,7 @@ pub fn name(&self) -> Option<&'static CStr> {
None
} else {
// SAFETY: The string returned by `errname` is static and `NUL`-terminated.
- Some(unsafe { CStr::from_char_ptr(ptr) })
+ Some(unsafe { CStr::from_ptr(ptr) })
}
}
@@ -164,7 +165,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
None => f.debug_tuple("Error").field(&-self.0).finish(),
// SAFETY: These strings are ASCII-only.
Some(name) => f
- .debug_tuple(unsafe { core::str::from_utf8_unchecked(name) })
+ .debug_tuple(unsafe { core::str::from_utf8_unchecked(name.to_bytes()) })
.finish(),
}
}
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 0ba77276ae7e..c08f9dddaa6f 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -56,9 +56,9 @@ macro_rules! kunit_assert {
break 'out;
}
- static FILE: &'static $crate::str::CStr = $crate::c_str!($file);
+ static FILE: &'static core::ffi::CStr = $file;
static LINE: i32 = core::line!() as i32 - $diff;
- static CONDITION: &'static $crate::str::CStr = $crate::c_str!(stringify!($condition));
+ static CONDITION: &'static core::ffi::CStr = $crate::c_str!(stringify!($condition));
// SAFETY: FFI call without safety requirements.
let kunit_test = unsafe { $crate::bindings::kunit_get_current_test() };
@@ -71,11 +71,11 @@ macro_rules! kunit_assert {
//
// This mimics KUnit's failed assertion format.
$crate::kunit::err(format_args!(
- " # {}: ASSERTION FAILED at {FILE}:{LINE}\n",
+ " # {:?}: ASSERTION FAILED at {FILE:?}:{LINE:?}\n",
$name
));
$crate::kunit::err(format_args!(
- " Expected {CONDITION} to be true, but is false\n"
+ " Expected {CONDITION:?} to be true, but is false\n"
));
$crate::kunit::err(format_args!(
" Failure not reported to KUnit since this is a non-KUnit task\n"
@@ -98,12 +98,12 @@ unsafe impl Sync for Location {}
unsafe impl Sync for UnaryAssert {}
static LOCATION: Location = Location($crate::bindings::kunit_loc {
- file: FILE.as_char_ptr(),
+ file: FILE.as_ptr(),
line: LINE,
});
static ASSERTION: UnaryAssert = UnaryAssert($crate::bindings::kunit_unary_assert {
assert: $crate::bindings::kunit_assert {},
- condition: CONDITION.as_char_ptr(),
+ condition: CONDITION.as_ptr(),
expected_true: true,
});
diff --git a/rust/kernel/net/phy.rs b/rust/kernel/net/phy.rs
index fd40b703d224..19f45922ec42 100644
--- a/rust/kernel/net/phy.rs
+++ b/rust/kernel/net/phy.rs
@@ -502,7 +502,7 @@ unsafe impl Sync for DriverVTable {}
pub const fn create_phy_driver<T: Driver>() -> DriverVTable {
// INVARIANT: All the fields of `struct phy_driver` are initialized properly.
DriverVTable(Opaque::new(bindings::phy_driver {
- name: T::NAME.as_char_ptr().cast_mut(),
+ name: T::NAME.as_ptr().cast_mut(),
flags: T::FLAGS,
phy_id: T::PHY_DEVICE_ID.id,
phy_id_mask: T::PHY_DEVICE_ID.mask_as_int(),
diff --git a/rust/kernel/prelude.rs b/rust/kernel/prelude.rs
index b37a0b3180fb..5efabfaa5804 100644
--- a/rust/kernel/prelude.rs
+++ b/rust/kernel/prelude.rs
@@ -12,7 +12,7 @@
//! ```
#[doc(no_inline)]
-pub use core::pin::Pin;
+pub use core::{ffi::CStr, pin::Pin};
pub use crate::alloc::{box_ext::BoxExt, flags::*, vec_ext::VecExt};
@@ -35,7 +35,7 @@
pub use super::error::{code::*, Error, Result};
-pub use super::{str::CStr, ThisModule};
+pub use super::{ThisModule};
pub use super::init::{InPlaceInit, Init, PinInit};
diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index bb8d4f41475b..e491a9803187 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -4,8 +4,9 @@
use crate::alloc::{flags::*, vec_ext::VecExt, AllocError};
use alloc::vec::Vec;
+use core::ffi::CStr;
use core::fmt::{self, Write};
-use core::ops::{self, Deref, DerefMut, Index};
+use core::ops::Deref;
use crate::error::{code::*, Error};
@@ -41,11 +42,11 @@ impl fmt::Display for BStr {
/// # use kernel::{fmt, b_str, str::{BStr, CString}};
/// let ascii = b_str!("Hello, BStr!");
/// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "Hello, BStr!".as_bytes());
+ /// assert_eq!(s.to_bytes(), "Hello, BStr!".as_bytes());
///
/// let non_ascii = b_str!("🦀");
/// let s = CString::try_from_fmt(fmt!("{}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\\xf0\\x9f\\xa6\\x80".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for &b in &self.0 {
@@ -72,11 +73,11 @@ impl fmt::Debug for BStr {
/// // Embedded double quotes are escaped.
/// let ascii = b_str!("Hello, \"BStr\"!");
/// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"Hello, \\\"BStr\\\"!\"".as_bytes());
///
/// let non_ascii = b_str!("😺");
/// let s = CString::try_from_fmt(fmt!("{:?}", non_ascii)).unwrap();
- /// assert_eq!(s.as_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
+ /// assert_eq!(s.to_bytes(), "\"\\xf0\\x9f\\x98\\xba\"".as_bytes());
/// ```
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_char('"')?;
@@ -128,392 +129,32 @@ macro_rules! b_str {
}};
}
-/// Possible errors when using conversion functions in [`CStr`].
-#[derive(Debug, Clone, Copy)]
-pub enum CStrConvertError {
- /// Supplied bytes contain an interior `NUL`.
- InteriorNul,
-
- /// Supplied bytes are not terminated by `NUL`.
- NotNulTerminated,
-}
-
-impl From<CStrConvertError> for Error {
- #[inline]
- fn from(_: CStrConvertError) -> Error {
- EINVAL
- }
-}
-
-/// A string that is guaranteed to have exactly one `NUL` byte, which is at the
-/// end.
-///
-/// Used for interoperability with kernel APIs that take C strings.
-#[repr(transparent)]
-pub struct CStr([u8]);
-
-impl CStr {
- /// Returns the length of this string excluding `NUL`.
- #[inline]
- pub const fn len(&self) -> usize {
- self.len_with_nul() - 1
- }
-
- /// Returns the length of this string with `NUL`.
- #[inline]
- pub const fn len_with_nul(&self) -> usize {
- // SAFETY: This is one of the invariant of `CStr`.
- // We add a `unreachable_unchecked` here to hint the optimizer that
- // the value returned from this function is non-zero.
- if self.0.is_empty() {
- unsafe { core::hint::unreachable_unchecked() };
- }
- self.0.len()
- }
-
- /// Returns `true` if the string only includes `NUL`.
- #[inline]
- pub const fn is_empty(&self) -> bool {
- self.len() == 0
- }
-
- /// Wraps a raw C string pointer.
- ///
- /// # Safety
- ///
- /// `ptr` must be a valid pointer to a `NUL`-terminated C string, and it must
- /// last at least `'a`. When `CStr` is alive, the memory pointed by `ptr`
- /// must not be mutated.
- #[inline]
- pub unsafe fn from_char_ptr<'a>(ptr: *const core::ffi::c_char) -> &'a Self {
- // SAFETY: The safety precondition guarantees `ptr` is a valid pointer
- // to a `NUL`-terminated C string.
- let len = unsafe { bindings::strlen(ptr) } + 1;
- // SAFETY: Lifetime guaranteed by the safety precondition.
- let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len as _) };
- // SAFETY: As `len` is returned by `strlen`, `bytes` does not contain interior `NUL`.
- // As we have added 1 to `len`, the last byte is known to be `NUL`.
- unsafe { Self::from_bytes_with_nul_unchecked(bytes) }
- }
-
- /// Creates a [`CStr`] from a `[u8]`.
- ///
- /// The provided slice must be `NUL`-terminated, does not contain any
- /// interior `NUL` bytes.
- pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError> {
- if bytes.is_empty() {
- return Err(CStrConvertError::NotNulTerminated);
- }
- if bytes[bytes.len() - 1] != 0 {
- return Err(CStrConvertError::NotNulTerminated);
- }
- let mut i = 0;
- // `i + 1 < bytes.len()` allows LLVM to optimize away bounds checking,
- // while it couldn't optimize away bounds checks for `i < bytes.len() - 1`.
- while i + 1 < bytes.len() {
- if bytes[i] == 0 {
- return Err(CStrConvertError::InteriorNul);
- }
- i += 1;
- }
- // SAFETY: We just checked that all properties hold.
- Ok(unsafe { Self::from_bytes_with_nul_unchecked(bytes) })
- }
-
- /// Creates a [`CStr`] from a `[u8]` without performing any additional
- /// checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub const unsafe fn from_bytes_with_nul_unchecked(bytes: &[u8]) -> &CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { core::mem::transmute(bytes) }
- }
-
- /// Creates a mutable [`CStr`] from a `[u8]` without performing any
- /// additional checks.
- ///
- /// # Safety
- ///
- /// `bytes` *must* end with a `NUL` byte, and should only have a single
- /// `NUL` byte (or the string will be truncated).
- #[inline]
- pub unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
- // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
- unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
- }
-
- /// Returns a C pointer to the string.
- #[inline]
- pub const fn as_char_ptr(&self) -> *const core::ffi::c_char {
- self.0.as_ptr() as _
- }
-
- /// Convert the string to a byte slice without the trailing `NUL` byte.
- #[inline]
- pub fn as_bytes(&self) -> &[u8] {
- &self.0[..self.len()]
- }
-
- /// Convert the string to a byte slice containing the trailing `NUL` byte.
- #[inline]
- pub const fn as_bytes_with_nul(&self) -> &[u8] {
- &self.0
- }
-
- /// Yields a [`&str`] slice if the [`CStr`] contains valid UTF-8.
- ///
- /// If the contents of the [`CStr`] are valid UTF-8 data, this
- /// function will return the corresponding [`&str`] slice. Otherwise,
- /// it will return an error with details of where UTF-8 validation failed.
- ///
- /// # Examples
- ///
- /// ```
- /// # use kernel::str::CStr;
- /// let cstr = CStr::from_bytes_with_nul(b"foo\0").unwrap();
- /// assert_eq!(cstr.to_str(), Ok("foo"));
- /// ```
- #[inline]
- pub fn to_str(&self) -> Result<&str, core::str::Utf8Error> {
- core::str::from_utf8(self.as_bytes())
- }
-
- /// Unsafely convert this [`CStr`] into a [`&str`], without checking for
- /// valid UTF-8.
- ///
- /// # Safety
- ///
- /// The contents must be valid UTF-8.
- ///
- /// # Examples
- ///
- /// ```
- /// # use kernel::c_str;
- /// # use kernel::str::CStr;
- /// let bar = c_str!("ツ");
- /// // SAFETY: String literals are guaranteed to be valid UTF-8
- /// // by the Rust compiler.
- /// assert_eq!(unsafe { bar.as_str_unchecked() }, "ツ");
- /// ```
- #[inline]
- pub unsafe fn as_str_unchecked(&self) -> &str {
- unsafe { core::str::from_utf8_unchecked(self.as_bytes()) }
- }
-
- /// Convert this [`CStr`] into a [`CString`] by allocating memory and
- /// copying over the string data.
- pub fn to_cstring(&self) -> Result<CString, AllocError> {
- CString::try_from(self)
- }
-
- /// Converts this [`CStr`] to its ASCII lower case equivalent in-place.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new lowercased value without modifying the existing one, use
- /// [`to_ascii_lowercase()`].
- ///
- /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase
- pub fn make_ascii_lowercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_lowercase();
- }
-
- /// Converts this [`CStr`] to its ASCII upper case equivalent in-place.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To return a new uppercased value without modifying the existing one, use
- /// [`to_ascii_uppercase()`].
- ///
- /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase
- pub fn make_ascii_uppercase(&mut self) {
- // INVARIANT: This doesn't introduce or remove NUL bytes in the C
- // string.
- self.0.make_ascii_uppercase();
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII lower case equivalent.
- ///
- /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To lowercase the value in-place, use [`make_ascii_lowercase`].
- ///
- /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
- pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_lowercase();
-
- Ok(s)
- }
-
- /// Returns a copy of this [`CString`] where each character is mapped to its
- /// ASCII upper case equivalent.
- ///
- /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
- /// but non-ASCII letters are unchanged.
- ///
- /// To uppercase the value in-place, use [`make_ascii_uppercase`].
- ///
- /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
- pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
- let mut s = self.to_cstring()?;
-
- s.make_ascii_uppercase();
-
- Ok(s)
- }
-}
-
-impl fmt::Display for CStr {
- /// Formats printable ASCII characters, escaping the rest.
- ///
- /// ```
- /// # use kernel::c_str;
- /// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\\xf0\\x9f\\x90\\xa7\0".as_bytes());
- ///
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "so \"cool\"\0".as_bytes());
- /// ```
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- for &c in self.as_bytes() {
- if (0x20..0x7f).contains(&c) {
- // Printable character.
- f.write_char(c as char)?;
- } else {
- write!(f, "\\x{:02x}", c)?;
- }
- }
- Ok(())
- }
-}
-
-impl fmt::Debug for CStr {
- /// Formats printable ASCII characters with a double quote on either end, escaping the rest.
- ///
- /// ```
- /// # use kernel::c_str;
- /// # use kernel::fmt;
- /// # use kernel::str::CStr;
- /// # use kernel::str::CString;
- /// let penguin = c_str!("🐧");
- /// let s = CString::try_from_fmt(fmt!("{:?}", penguin)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"\\xf0\\x9f\\x90\\xa7\"\0".as_bytes());
- ///
- /// // Embedded double quotes are escaped.
- /// let ascii = c_str!("so \"cool\"");
- /// let s = CString::try_from_fmt(fmt!("{:?}", ascii)).unwrap();
- /// assert_eq!(s.as_bytes_with_nul(), "\"so \\\"cool\\\"\"\0".as_bytes());
- /// ```
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- f.write_str("\"")?;
- for &c in self.as_bytes() {
- match c {
- // Printable characters.
- b'\"' => f.write_str("\\\"")?,
- 0x20..=0x7e => f.write_char(c as char)?,
- _ => write!(f, "\\x{:02x}", c)?,
- }
- }
- f.write_str("\"")
- }
-}
-
-impl AsRef<BStr> for CStr {
- #[inline]
- fn as_ref(&self) -> &BStr {
- BStr::from_bytes(self.as_bytes())
- }
-}
-
-impl Deref for CStr {
- type Target = BStr;
-
- #[inline]
- fn deref(&self) -> &Self::Target {
- self.as_ref()
- }
-}
-
-impl Index<ops::RangeFrom<usize>> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, index: ops::RangeFrom<usize>) -> &Self::Output {
- // Delegate bounds checking to slice.
- // Assign to _ to mute clippy's unnecessary operation warning.
- let _ = &self.as_bytes()[index.start..];
- // SAFETY: We just checked the bounds.
- unsafe { Self::from_bytes_with_nul_unchecked(&self.0[index.start..]) }
- }
-}
-
-impl Index<ops::RangeFull> for CStr {
- type Output = CStr;
-
- #[inline]
- fn index(&self, _index: ops::RangeFull) -> &Self::Output {
- self
- }
-}
-
-mod private {
- use core::ops;
-
- // Marker trait for index types that can be forward to `BStr`.
- pub trait CStrIndex {}
-
- impl CStrIndex for usize {}
- impl CStrIndex for ops::Range<usize> {}
- impl CStrIndex for ops::RangeInclusive<usize> {}
- impl CStrIndex for ops::RangeToInclusive<usize> {}
-}
-
-impl<Idx> Index<Idx> for CStr
-where
- Idx: private::CStrIndex,
- BStr: Index<Idx>,
-{
- type Output = <BStr as Index<Idx>>::Output;
-
- #[inline]
- fn index(&self, index: Idx) -> &Self::Output {
- &self.as_ref()[index]
- }
-}
-
/// Creates a new [`CStr`] from a string literal.
///
-/// The string literal should not contain any `NUL` bytes.
+/// Usually, defining C-string literals directly should be preferred, but this
+/// macro is helpful in situations when C-string literals are hard or
+/// impossible to use, for example:
+///
+/// - When working with macros, which already return a Rust string literal
+/// (e.g. `stringify!`).
+/// - When building macros, where we want to take a Rust string literal as an
+/// argument (for caller's convenience), but still use it as a C-string
+/// internally.
+///
+/// The string should not contain any `NUL` bytes.
///
/// # Examples
///
/// ```
+/// # use core::ffi::CStr;
/// # use kernel::c_str;
-/// # use kernel::str::CStr;
-/// const MY_CSTR: &CStr = c_str!("My awesome CStr!");
+/// const MY_CSTR: &CStr = c_str!(stringify!(5));
/// ```
#[macro_export]
macro_rules! c_str {
($str:expr) => {{
const S: &str = concat!($str, "\0");
- const C: &$crate::str::CStr = match $crate::str::CStr::from_bytes_with_nul(S.as_bytes()) {
+ const C: &core::ffi::CStr = match core::ffi::CStr::from_bytes_with_nul(S.as_bytes()) {
Ok(v) => v,
Err(_) => panic!("string contains interior NUL"),
};
@@ -526,79 +167,6 @@ mod tests {
use super::*;
use alloc::format;
- const ALL_ASCII_CHARS: &'static str =
- "\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09\\x0a\\x0b\\x0c\\x0d\\x0e\\x0f\
- \\x10\\x11\\x12\\x13\\x14\\x15\\x16\\x17\\x18\\x19\\x1a\\x1b\\x1c\\x1d\\x1e\\x1f \
- !\"#$%&'()*+,-./0123456789:;<=>?@\
- ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\\x7f\
- \\x80\\x81\\x82\\x83\\x84\\x85\\x86\\x87\\x88\\x89\\x8a\\x8b\\x8c\\x8d\\x8e\\x8f\
- \\x90\\x91\\x92\\x93\\x94\\x95\\x96\\x97\\x98\\x99\\x9a\\x9b\\x9c\\x9d\\x9e\\x9f\
- \\xa0\\xa1\\xa2\\xa3\\xa4\\xa5\\xa6\\xa7\\xa8\\xa9\\xaa\\xab\\xac\\xad\\xae\\xaf\
- \\xb0\\xb1\\xb2\\xb3\\xb4\\xb5\\xb6\\xb7\\xb8\\xb9\\xba\\xbb\\xbc\\xbd\\xbe\\xbf\
- \\xc0\\xc1\\xc2\\xc3\\xc4\\xc5\\xc6\\xc7\\xc8\\xc9\\xca\\xcb\\xcc\\xcd\\xce\\xcf\
- \\xd0\\xd1\\xd2\\xd3\\xd4\\xd5\\xd6\\xd7\\xd8\\xd9\\xda\\xdb\\xdc\\xdd\\xde\\xdf\
- \\xe0\\xe1\\xe2\\xe3\\xe4\\xe5\\xe6\\xe7\\xe8\\xe9\\xea\\xeb\\xec\\xed\\xee\\xef\
- \\xf0\\xf1\\xf2\\xf3\\xf4\\xf5\\xf6\\xf7\\xf8\\xf9\\xfa\\xfb\\xfc\\xfd\\xfe\\xff";
-
- #[test]
- fn test_cstr_to_str() {
- let good_bytes = b"\xf0\x9f\xa6\x80\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let checked_str = checked_cstr.to_str().unwrap();
- assert_eq!(checked_str, "🦀");
- }
-
- #[test]
- #[should_panic]
- fn test_cstr_to_str_panic() {
- let bad_bytes = b"\xc3\x28\0";
- let checked_cstr = CStr::from_bytes_with_nul(bad_bytes).unwrap();
- checked_cstr.to_str().unwrap();
- }
-
- #[test]
- fn test_cstr_as_str_unchecked() {
- let good_bytes = b"\xf0\x9f\x90\xA7\0";
- let checked_cstr = CStr::from_bytes_with_nul(good_bytes).unwrap();
- let unchecked_str = unsafe { checked_cstr.as_str_unchecked() };
- assert_eq!(unchecked_str, "🐧");
- }
-
- #[test]
- fn test_cstr_display() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{}", hello_world), "hello, world!");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{}", non_printables), "\\x01\\x09\\x0a");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{}", non_ascii), "d\\xe9j\\xe0 vu");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{}", good_bytes), "\\xf0\\x9f\\xa6\\x80");
- }
-
- #[test]
- fn test_cstr_display_all_bytes() {
- let mut bytes: [u8; 256] = [0; 256];
- // fill `bytes` with [1..=255] + [0]
- for i in u8::MIN..=u8::MAX {
- bytes[i as usize] = i.wrapping_add(1);
- }
- let cstr = CStr::from_bytes_with_nul(&bytes).unwrap();
- assert_eq!(format!("{}", cstr), ALL_ASCII_CHARS);
- }
-
- #[test]
- fn test_cstr_debug() {
- let hello_world = CStr::from_bytes_with_nul(b"hello, world!\0").unwrap();
- assert_eq!(format!("{:?}", hello_world), "\"hello, world!\"");
- let non_printables = CStr::from_bytes_with_nul(b"\x01\x09\x0a\0").unwrap();
- assert_eq!(format!("{:?}", non_printables), "\"\\x01\\x09\\x0a\"");
- let non_ascii = CStr::from_bytes_with_nul(b"d\xe9j\xe0 vu\0").unwrap();
- assert_eq!(format!("{:?}", non_ascii), "\"d\\xe9j\\xe0 vu\"");
- let good_bytes = CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").unwrap();
- assert_eq!(format!("{:?}", good_bytes), "\"\\xf0\\x9f\\xa6\\x80\"");
- }
-
#[test]
fn test_bstr_display() {
let hello_world = BStr::from_bytes(b"hello, world!");
@@ -779,11 +347,11 @@ fn write_str(&mut self, s: &str) -> fmt::Result {
/// use kernel::{str::CString, fmt};
///
/// let s = CString::try_from_fmt(fmt!("{}{}{}", "abc", 10, 20)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "abc1020\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "abc1020\0".as_bytes());
///
/// let tmp = "testing";
/// let s = CString::try_from_fmt(fmt!("{tmp}{}", 123)).unwrap();
-/// assert_eq!(s.as_bytes_with_nul(), "testing123\0".as_bytes());
+/// assert_eq!(s.to_bytes_with_nul(), "testing123\0".as_bytes());
///
/// // This fails because it has an embedded `NUL` byte.
/// let s = CString::try_from_fmt(fmt!("a\0b{}", 123));
@@ -838,21 +406,13 @@ fn deref(&self) -> &Self::Target {
}
}
-impl DerefMut for CString {
- fn deref_mut(&mut self) -> &mut Self::Target {
- // SAFETY: A `CString` is always NUL-terminated and contains no other
- // NUL bytes.
- unsafe { CStr::from_bytes_with_nul_unchecked_mut(self.buf.as_mut_slice()) }
- }
-}
-
impl<'a> TryFrom<&'a CStr> for CString {
type Error = AllocError;
fn try_from(cstr: &'a CStr) -> Result<CString, AllocError> {
let mut buf = Vec::new();
- <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.as_bytes_with_nul(), GFP_KERNEL)
+ <Vec<_> as VecExt<_>>::extend_from_slice(&mut buf, cstr.to_bytes_with_nul(), GFP_KERNEL)
.map_err(|_| AllocError)?;
// INVARIANT: The `CStr` and `CString` types have the same invariants for
diff --git a/rust/kernel/sync/condvar.rs b/rust/kernel/sync/condvar.rs
index 2b306afbe56d..16d1a1cb8d00 100644
--- a/rust/kernel/sync/condvar.rs
+++ b/rust/kernel/sync/condvar.rs
@@ -9,12 +9,11 @@
use crate::{
init::PinInit,
pin_init,
- str::CStr,
task::{MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE, TASK_NORMAL, TASK_UNINTERRUPTIBLE},
time::Jiffies,
types::Opaque,
};
-use core::ffi::{c_int, c_long};
+use core::ffi::{c_int, c_long, CStr};
use core::marker::PhantomPinned;
use core::ptr;
use macros::pin_data;
@@ -108,7 +107,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
wait_queue_head <- Opaque::ffi_init(|slot| unsafe {
- bindings::__init_waitqueue_head(slot, name.as_char_ptr(), key.as_ptr())
+ bindings::__init_waitqueue_head(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index f6c34ca4d819..318ecb5a5916 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -6,8 +6,8 @@
//! spinlocks, raw spinlocks) to be provided with minimal effort.
use super::LockClassKey;
-use crate::{init::PinInit, pin_init, str::CStr, types::Opaque, types::ScopeGuard};
-use core::{cell::UnsafeCell, marker::PhantomData, marker::PhantomPinned};
+use crate::{init::PinInit, pin_init, types::Opaque, types::ScopeGuard};
+use core::{cell::UnsafeCell, ffi::CStr, marker::PhantomData, marker::PhantomPinned};
use macros::pin_data;
pub mod mutex;
@@ -113,7 +113,7 @@ pub fn new(t: T, name: &'static CStr, key: &'static LockClassKey) -> impl PinIni
// SAFETY: `slot` is valid while the closure is called and both `name` and `key` have
// static lifetimes so they live indefinitely.
state <- Opaque::ffi_init(|slot| unsafe {
- B::init(slot, name.as_char_ptr(), key.as_ptr())
+ B::init(slot, name.as_ptr(), key.as_ptr())
}),
})
}
diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 553a5cba2adc..a6418873e82e 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -380,7 +380,7 @@ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self
slot,
Some(T::Pointer::run),
false,
- name.as_char_ptr(),
+ name.as_ptr(),
key.as_ptr(),
)
}
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 5ebd42ae4a3f..339991ee6885 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -172,7 +172,7 @@ pub extern "C" fn {kunit_name}(__kunit_test: *mut kernel::bindings::kunit) {{
#[allow(unused)]
macro_rules! assert {{
($cond:expr $(,)?) => {{{{
- kernel::kunit_assert!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
+ kernel::kunit_assert!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $cond);
}}}}
}}
@@ -180,7 +180,7 @@ macro_rules! assert {{
#[allow(unused)]
macro_rules! assert_eq {{
($left:expr, $right:expr $(,)?) => {{{{
- kernel::kunit_assert_eq!("{kunit_name}", "{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
+ kernel::kunit_assert_eq!(c"{kunit_name}", c"{real_path}", __DOCTEST_ANCHOR - {line}, $left, $right);
}}}}
}}
--
2.45.2
Hello everyone,
this small series is a first step in a larger effort aiming to help improve
eBPF selftests and the testing coverage in CI. It focuses for now on
test_xdp_veth.sh, a small test which is not integrated yet in test_progs.
The series is mostly about a rewrite of test_xdp_veth.sh to make it able to
run under test_progs, relying on libbpf to manipulate bpf programs involved
in the test.
Signed-off-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
---
Changes in v2:
- fix many formatting issues raised by checkpatch
- use static namespaces instead of random ones
- use SYS_NOFAIL instead of snprintf() + system()
- squashed the new test addition patch and the old test removal patch
- Link to v1: https://lore.kernel.org/r/20240711-convert_test_xdp_veth-v1-0-868accb0a727@…
---
Alexis Lothoré (eBPF Foundation) (2):
selftests/bpf: update xdp_redirect_map prog sections for libbpf
selftests/bpf: integrate test_xdp_veth into test_progs
tools/testing/selftests/bpf/Makefile | 1 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 211 +++++++++++++++++++++
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 6 +-
tools/testing/selftests/bpf/test_xdp_veth.sh | 121 ------------
4 files changed, 214 insertions(+), 125 deletions(-)
---
base-commit: 4837cbaa1365cdb213b58577197c5b10f6e2aa81
change-id: 20240710-convert_test_xdp_veth-04cc05f5557d
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
v2:
- Fix test_cpuset_prs.sh problems reported by test robot
- Relax restriction imposed between cpuset.cpus.exclusive and
cpuset.cpus of sibling cpusets.
- Make cpuset.cpus.exclusive independent of cpuset.cpus.
- Update test_cpuset_prs.sh accordingly.
[v1] https://lore.kernel.org/lkml/20240605171858.1323464-1-longman@redhat.com/
This patchset attempts to address the following cpuset issues.
1) While reviewing the generate_sched_domains() function, I found a bug
in generating sched domains for remote non-isolating partitions.
2) Test robot had reported a test_cpuset_prs.sh test failure.
3) The current exclusivity test between cpuset.cpus.exclusive and
cpuset.cpus and the restriction that the set effective exclusive
CPUs has to be a subset of cpuset.cpus make it harder to preconfigure
the cgroup hierarchy to enable remote partition.
The test_cpuset_prs.sh script is updated to match changes made in this
patchset and was run to verify that the new code did not cause any
regression.
Waiman Long (5):
cgroup/cpuset: Fix remote root partition creation problem
selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test
robot
cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
selftest/cgroup: Update test_cpuset_prs.sh to match changes
Documentation/admin-guide/cgroup-v2.rst | 12 +-
kernel/cgroup/cpuset.c | 158 +++++++++++++-----
.../selftests/cgroup/test_cpuset_prs.sh | 75 ++++++---
3 files changed, 180 insertions(+), 65 deletions(-)
--
2.39.3
On ARM64 the stack pointer should be aligned at a 16 byte boundary or
the SPAlignmentFault can occur. The fexit_sleep selftest allocates the
stack for the child process as a character array, which is not guaranteed
to be aligned at 16 bytes.
Because of the SPAlignmentFault, the child process is killed before it
can do the nanosleep call and hence fentry_cnt remains as 0. This causes
the main thread to hang on the following line:
while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
Fix this by allocating the stack using mmap() as described in the
example in the man page of clone().
Remove the fexit_sleep test from the DENYLIST of arm64.
Signed-off-by: Puranjay Mohan <puranjay(a)kernel.org>
---
tools/testing/selftests/bpf/DENYLIST.aarch64 | 1 -
tools/testing/selftests/bpf/prog_tests/fexit_sleep.c | 8 +++++++-
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 3c7c3e79aa931..901349da680fa 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,6 +1,5 @@
bpf_cookie/multi_kprobe_attach_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
bpf_cookie/multi_kprobe_link_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
-fexit_sleep # The test never returns. The remaining tests cannot start.
kprobe_multi_bench_attach # needs CONFIG_FPROBE
kprobe_multi_test # needs CONFIG_FPROBE
module_attach # prog 'kprobe_multi': failed to auto-attach: -95
diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_sleep.c b/tools/testing/selftests/bpf/prog_tests/fexit_sleep.c
index f949647dbbc21..552a0875ca6db 100644
--- a/tools/testing/selftests/bpf/prog_tests/fexit_sleep.c
+++ b/tools/testing/selftests/bpf/prog_tests/fexit_sleep.c
@@ -21,13 +21,13 @@ static int do_sleep(void *skel)
}
#define STACK_SIZE (1024 * 1024)
-static char child_stack[STACK_SIZE];
void test_fexit_sleep(void)
{
struct fexit_sleep_lskel *fexit_skel = NULL;
int wstatus, duration = 0;
pid_t cpid;
+ char *child_stack = NULL;
int err, fexit_cnt;
fexit_skel = fexit_sleep_lskel__open_and_load();
@@ -38,6 +38,11 @@ void test_fexit_sleep(void)
if (CHECK(err, "fexit_attach", "fexit attach failed: %d\n", err))
goto cleanup;
+ child_stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+ MAP_ANONYMOUS | MAP_STACK, -1, 0);
+ if (!ASSERT_NEQ(child_stack, MAP_FAILED, "mmap"))
+ goto cleanup;
+
cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);
if (CHECK(cpid == -1, "clone", "%s\n", strerror(errno)))
goto cleanup;
@@ -78,5 +83,6 @@ void test_fexit_sleep(void)
goto cleanup;
cleanup:
+ munmap(child_stack, STACK_SIZE);
fexit_sleep_lskel__destroy(fexit_skel);
}
--
2.40.1
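The stack allocation used by the fexit_sleep fix can be sketched in
isolation. This is a minimal sketch and not the selftest itself:
sketch_child(), run_child_on_mmap_stack() and SKETCH_STACK_SIZE are
hypothetical names, the child does nothing but exit, and Linux with the
glibc clone() wrapper is assumed. An anonymous mapping is page aligned,
so the top of the mapping also satisfies the 16 byte requirement.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for the clone() declaration in <sched.h> */
#endif
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SKETCH_STACK_SIZE (1024 * 1024)	/* hypothetical, mirrors STACK_SIZE */

/* Child entry point: does nothing and exits with status 0. */
static int sketch_child(void *arg)
{
	(void)arg;
	return 0;
}

/*
 * Run sketch_child() on an mmap()ed stack.
 * Returns the child's exit status, or -1 on any failure.
 */
static int run_child_on_mmap_stack(void)
{
	int wstatus = 0;
	pid_t cpid, waited;
	char *stack;

	stack = mmap(NULL, SKETCH_STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* The stack grows down, so clone() gets the top of the mapping. */
	cpid = clone(sketch_child, stack + SKETCH_STACK_SIZE, SIGCHLD, NULL);
	if (cpid == -1) {
		munmap(stack, SKETCH_STACK_SIZE);
		return -1;
	}

	waited = waitpid(cpid, &wstatus, 0);
	munmap(stack, SKETCH_STACK_SIZE);

	if (waited != cpid || !WIFEXITED(wstatus))
		return -1;
	return WEXITSTATUS(wstatus);
}
```

A caller would treat any non-zero return from run_child_on_mmap_stack()
as a failure of either the mapping, the clone() call or the child.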
It looks like we missed these two errors recently:
- SC2068: Double quote array expansions to avoid re-splitting elements.
- SC2145: Argument mixes string and array. Use * or separate argument.
Two simple fixes. They are not supposed to change the behaviour, as the
variable names should not contain any spaces. Still, better to fix them
to easily spot new issues.
Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- The mentioned commit is currently only in 'net-next', not in 'net'.
---
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 194c8fc2e55a..438280e68434 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -428,8 +428,8 @@ mptcp_lib_check_tools() {
}
mptcp_lib_ns_init() {
- if ! setup_ns ${@}; then
- mptcp_lib_pr_fail "Failed to setup namespace ${@}"
+ if ! setup_ns "${@}"; then
+ mptcp_lib_pr_fail "Failed to setup namespaces ${*}"
exit ${KSFT_FAIL}
fi
---
base-commit: 2146b7dd354c2a1384381ca3cd5751bfff6137d6
change-id: 20240712-upstream-net-next-20240712-selftests-mptcp-fix-shellcheck-6f17e65c6c1b
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Hello everyone,
this small series is a first step in a larger effort aiming to help improve
eBPF selftests and the testing coverage in CI. It focuses for now on
test_xdp_veth.sh, a small test which is not integrated yet in test_progs.
The series is mostly about a rewrite of test_xdp_veth.sh to make it able to
run under test_progs, relying on libbpf to manipulate bpf programs involved
in the test.
Signed-off-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (3):
selftests/bpf: update xdp_redirect_map prog sections for libbpf
selftests/bpf: integrate test_xdp_veth into test_progs
bpf/selftests: drop old version of test_xdp_veth.sh
tools/testing/selftests/bpf/Makefile | 1 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 234 +++++++++++++++++++++
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 6 +-
tools/testing/selftests/bpf/test_xdp_veth.sh | 121 -----------
4 files changed, 237 insertions(+), 125 deletions(-)
---
base-commit: 4837cbaa1365cdb213b58577197c5b10f6e2aa81
change-id: 20240710-convert_test_xdp_veth-04cc05f5557d
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
The requested resources should be closed before returning from main(),
otherwise a resource leak will occur. Add a check of cg_fd before close().
Fixes: 435f90a338ae ("selftests/bpf: add a test case for sock_ops perf-event notification")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
tools/testing/selftests/bpf/test_tcpnotify_user.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tcpnotify_user.c b/tools/testing/selftests/bpf/test_tcpnotify_user.c
index 595194453ff8..f81c60db586e 100644
--- a/tools/testing/selftests/bpf/test_tcpnotify_user.c
+++ b/tools/testing/selftests/bpf/test_tcpnotify_user.c
@@ -161,7 +161,8 @@ int main(int argc, char **argv)
error = 0;
err:
bpf_prog_detach(cg_fd, BPF_CGROUP_SOCK_OPS);
- close(cg_fd);
+ if (cg_fd >= 0)
+ close(cg_fd);
cleanup_cgroup_environment();
perf_buffer__free(pb);
return error;
--
2.25.1
Log errors are the most widely used mechanism for reporting issues in
the kernel. When an error is logged using the device helpers, e.g.
dev_err(), it gets metadata attached that identifies the subsystem and
device where the message is coming from. This series makes use of that
metadata in a new test to report which devices logged errors.
The first two patches move a test and a helper script to keep things
organized before this new test is added in the third patch.
It is expected that there might be many false-positive error messages
throughout the drivers code which will be reported by this test. By
having this test in the first place and working through the results we
can address those occurrences by adjusting the loglevel of the messages
that turn out to not be real errors that require the user's attention.
It will also motivate additional error messages to be introduced in the
code to detect real errors where they turn out to be missing, since
it will be possible to detect said issues automatically.
As an example, below you can see the test result for
mt8192-asurada-spherion. The single standing issue has been investigated
and will be addressed in an EC firmware update [1]:
TAP version 13
1..1
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `model_name' property: -6
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `energy_full_design' property: -6
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
power_supply sbs-8-000b: driver failed to report `time_to_empty_now' property: -5
not ok 1 +power_supply:sbs-8-000b
Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
[1] https://lore.kernel.org/all/cf4d8131-4b63-4c7a-9f27-5a0847c656c4@notapiano
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Changes in v2:
- Rebased onto next-20240703
- Link to v1: https://lore.kernel.org/r/20240423-dev-err-log-selftest-v1-0-690c1741d68b@c…
---
Nícolas F. R. A. Prado (3):
kselftest: devices: Move discoverable devices test to subdirectory
kselftest: Move ksft helper module to common directory
kselftest: devices: Add test to detect device error logs
tools/testing/selftests/Makefile | 4 +-
tools/testing/selftests/devices/Makefile | 4 -
.../testing/selftests/devices/error_logs/Makefile | 3 +
.../devices/error_logs/test_device_error_logs.py | 85 ++++++++++++++++++++++
tools/testing/selftests/devices/probe/Makefile | 4 +
.../{ => probe}/boards/Dell Inc.,XPS 13 9300.yaml | 0
.../{ => probe}/boards/google,spherion.yaml | 0
.../{ => probe}/test_discoverable_devices.py | 7 +-
.../selftests/{devices => kselftest}/ksft.py | 0
9 files changed, 101 insertions(+), 6 deletions(-)
---
base-commit: 0b58e108042b0ed28a71cd7edf5175999955b233
change-id: 20240421-dev-err-log-selftest-28f5b8fc7cd0
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
From: Allison Henderson <allison.henderson(a)oracle.com>
Hi All,
This series is a new selftest that Vegard, Chuck and myself have been
working on to provide some test coverage for rds. I've made quite a few
updates since the rfc sent a few weeks ago:
I've added several knobs to the script to tune network turbulence, and
documented their usage in the README.txt. By default these options
are left off.
Added an extra flag to specify log location
I've also added a flag to the config.sh to skip gcov configurations if
the coverage report is not desired. run.sh has been adapted to skip
the report if the required configs are not present, or if the required
packages are not available
A timeout has been added to prevent the test from hanging
indefinitely
The previous gcov issues have been resolved with an appropriate gcov
patch, as well as some extra logic to detect incompatible gcov and gcc
versions.
The shellcheck nits reported in the last review have been addressed
In order to return an appropriate exit code, the run.sh script has
been adapted to analyze the test.py strace, and determine if the test
passed, failed or timed out.
RDS specific GCOV configs have been documented under
Documentation/dev-tools/gcov.rst
Questions and comments appreciated. Thanks everyone!
Allison
Vegard Nossum (3):
.gitignore: add .gcda files
net: rds: add option for GCOV profiling
selftests: rds: add testing infrastructure
.gitignore | 1 +
Documentation/dev-tools/gcov.rst | 11 +
MAINTAINERS | 1 +
net/rds/Kconfig | 9 +
net/rds/Makefile | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/net/rds/Makefile | 13 +
tools/testing/selftests/net/rds/README.txt | 41 ++++
tools/testing/selftests/net/rds/config.sh | 56 +++++
tools/testing/selftests/net/rds/init.sh | 69 ++++++
tools/testing/selftests/net/rds/run.sh | 271 +++++++++++++++++++++
tools/testing/selftests/net/rds/test.py | 251 +++++++++++++++++++
12 files changed, 729 insertions(+)
create mode 100644 tools/testing/selftests/net/rds/Makefile
create mode 100644 tools/testing/selftests/net/rds/README.txt
create mode 100755 tools/testing/selftests/net/rds/config.sh
create mode 100755 tools/testing/selftests/net/rds/init.sh
create mode 100755 tools/testing/selftests/net/rds/run.sh
create mode 100644 tools/testing/selftests/net/rds/test.py
--
2.25.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- update patch 2 as Martin suggested.
This is the 9th part of the series "use network helpers" across all BPF
selftests.
Patches 1-2 update network helpers interfaces suggested by Martin.
Patch 3 adds a new helper connect_to_addr_str() as Martin suggested
instead of adding connect_fd_to_addr_str().
Patch 4 uses this newly added helper in make_client().
Patch 5 uses make_client() in sk_lookup and drops make_socket().
Geliang Tang (5):
selftests/bpf: Drop type of connect_to_fd_opts
selftests/bpf: Drop must_fail from network_helper_opts
selftests/bpf: Add connect_to_addr_str helper
selftests/bpf: Use connect_to_addr_str in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 67 +++++++--------
tools/testing/selftests/bpf/network_helpers.h | 5 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 12 +--
.../selftests/bpf/prog_tests/sk_lookup.c | 84 ++++---------------
5 files changed, 53 insertions(+), 117 deletions(-)
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- patch 1, only commit log updated
- update patch 2
- add an unsigned variable
- use "switch-case"
- only "Operation not supported", no (-524) in string
- patch 3, a new one
This patchset contains three fixes for handling errno ENOTSUPP.
Patch 1 fixes the return value of fixup_call_args() to make sure
ENOTSUPP is returned to user space correctly.
Patch 2 handles ENOTSUPP in libbpf_strerror_r() in libbpf.
Patch 3 includes str_error.h in BPF selftests, and drops the duplicate
ENOTSUPP definitions.
Geliang Tang (3):
bpf: verifier: Fix return value of fixup_call_args
libbpf: handle ENOTSUPP in libbpf_strerror_r
selftests/bpf: Drop duplicate ENOTSUPP definitions
kernel/bpf/verifier.c | 6 +++---
tools/lib/bpf/str_error.c | 18 +++++++++++++-----
tools/lib/bpf/str_error.h | 4 ++++
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ----
.../selftests/bpf/prog_tests/lsm_cgroup.c | 4 ----
.../selftests/bpf/prog_tests/sock_addr.c | 4 ----
tools/testing/selftests/bpf/test_maps.c | 4 ----
tools/testing/selftests/bpf/test_verifier.c | 4 ----
tools/testing/selftests/bpf/testing_helpers.h | 1 +
9 files changed, 21 insertions(+), 28 deletions(-)
--
2.43.0
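The ENOTSUPP handling described for patch 2 of this series (an unsigned
variable, a switch-case, and a plain "Operation not supported" string)
could look roughly like the sketch below. This is not the libbpf code:
sketch_strerror() and SKETCH_ENOTSUPP are hypothetical names, and
strerror() is used for all other codes only to keep the sketch portable.
The value 524 is the kernel-internal ENOTSUPP, which userspace
<errno.h> does not define.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal errno value, not provided by userspace <errno.h>. */
#define SKETCH_ENOTSUPP 524

/*
 * Write a message for err into dst. Both positive and negative error
 * codes are accepted, as kernel-style callers often pass -errno.
 */
static void sketch_strerror(int err, char *dst, size_t len)
{
	/* Take the magnitude in unsigned arithmetic to avoid overflow. */
	unsigned int code = err < 0 ? -(unsigned int)err : (unsigned int)err;

	switch (code) {
	case SKETCH_ENOTSUPP:
		snprintf(dst, len, "Operation not supported");
		break;
	default:
		snprintf(dst, len, "%s", strerror((int)code));
		break;
	}
}
```

Keeping the special case in a switch leaves room for further
kernel-only codes without touching the default path.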
In this series, 4 tests are being conformed to TAP.
Changes since v1:
- Correct the description of patches with what improvements they are
bringing and why they are required
Changes since v2:
- Correct the subject of series
Muhammad Usama Anjum (4):
selftests: x86: check_initial_reg_state: remove manual counting and
increase maintainability
selftests: x86: corrupt_xstate_header: remove manual counting and
increase maintainability
selftests: x86: fsgsbase_restore: remove manual counting and increase
maintainability
selftests: x86: entry_from_vm86: remove manual counting and increase
maintainability
.../selftests/x86/check_initial_reg_state.c | 24 ++--
.../selftests/x86/corrupt_xstate_header.c | 30 +++--
tools/testing/selftests/x86/entry_from_vm86.c | 109 ++++++++--------
.../testing/selftests/x86/fsgsbase_restore.c | 117 +++++++++---------
4 files changed, 139 insertions(+), 141 deletions(-)
--
2.39.2
Hello,
KernelCI is hosting a bi-weekly call on Thursday to discuss improvements to
existing upstream tests, the development of new tests to increase kernel
testing coverage, and the enablement of these tests in KernelCI. In recent
months, we at Collabora have focused on various kernel areas, assessing the
tests already available upstream and contributing patches to make them
easily runnable in CIs.
Below is a list of the tests we've been working on and their latest status
updates, as discussed in the last meeting held on 2024-07-11:
*USB/PCI devices kselftest*
- Upstream test to detect unprobed devices on discoverable buses:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
- Updated KernelCI PRs according to feedback, now waiting for the first
test results: https://github.com/kernelci/kernelci-core/pull/2577 and
https://github.com/kernelci/kernelci-pipeline/pull/642
*Error log test*
- Proposing new kselftest to report device log errors:
https://lore.kernel.org/all/20240423-dev-err-log-selftest-v1-0-690c1741d68b…
- Series got Acked-By from Greg, going to be picked up by Shuah soon
- Feedback from Tim Bird: this series follows an unusual model where
tests can only fail but never pass, as no test case is generated unless
there is an error. It takes an unusual approach to detect regressions and
fixes. The autogenerated test case names are not very descriptive.
*Suspend/resume in cpufreq kselftest*
- Enabling suspend/resume test within the cpufreq kselftest in KernelCI -
- Sent patch upstream for adding RTC wakeup alarm in the cpufreq
kselftest:
https://lore.kernel.org/all/2e667d-668ff800-1-22d70300@133606496/
- Received a review from Rafael J. Wysocki, who suggested using the
rtcwake utility instead of the sysfs entry
*Boot time test*
- Drafted initial implementation with two scripts, a config fragment and
a bootconfig file
- One script generates a YAML file containing initial timestamps for
relevant boot events, parsed from the trace file (run once)
- The other script is the actual test, which takes the generated YAML
file and a delta in seconds as arguments. The script then parses the
current trace file and checks if any timestamp deviates from the
reference timestamps in the YAML file by more than the specified delta.
- Tracking only a few functions at the moment (populate_rootfs,
unpack_to_rootfs, run_init_process). Next steps: refine bootconfig file
to include more tracepoints (potentially initcalls too?). Useful
tracepoints should be discussed upstream.
- Will present this at LPC 2024 (embedded and IoT MC)
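The comparison step of the boot time test described above can be
sketched as follows. This is an illustration in C under stated
assumptions, not the drafted scripts: struct boot_event, find_event()
and count_deviations() are hypothetical names, parsing of the YAML and
trace files is left out, and a missing event is assumed to count as a
deviation.

```c
#include <stddef.h>
#include <string.h>

/* One boot event and its timestamp in seconds. */
struct boot_event {
	const char *name;
	double ts;
};

/* Look up an event by name; returns NULL if it is not present. */
static const struct boot_event *find_event(const struct boot_event *ev,
					   size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(ev[i].name, name) == 0)
			return &ev[i];
	return NULL;
}

/*
 * Count reference events whose current timestamp is missing or differs
 * from the reference by more than delta seconds.
 */
static size_t count_deviations(const struct boot_event *ref, size_t nref,
			       const struct boot_event *cur, size_t ncur,
			       double delta)
{
	size_t i, bad = 0;

	for (i = 0; i < nref; i++) {
		const struct boot_event *c = find_event(cur, ncur, ref[i].name);
		double diff;

		if (!c) {
			bad++;
			continue;
		}
		diff = c->ts - ref[i].ts;
		if (diff < 0)
			diff = -diff;
		if (diff > delta)
			bad++;
	}
	return bad;
}
```

A test built on this would report a failure whenever
count_deviations() returns a non-zero count for the chosen delta.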
*Support for benchmark data in KTAP*
- Tim Bird is working on adding performance data to KTAP output, which
can be used in tests to detect slowdowns
- The idea is to keep reference values and criteria separate from the
test itself
- There is a need to store per-platform files with previous times for
comparison
- Will need to figure out where these files can be stored so they can be
shared and used by different people and systems. Potential options: KCIDB
or https://github.com/kernelci/platform-test-parameters
- Submitted a proposal for LPC 2024
- Other related topics for discussion at LPC 2024 include: how to avoid
device tree overhead in the boot process and boot phases (time-critical
vs non-critical)
*TAP conformance in kselftests*
- Focusing on standardizing the way kernel's testing modules report
results
- Discussion ongoing upstream over patches converting tests to TAP:
https://lore.kernel.org/all/fb305513-580a-4bac-a078-fe0170a6ffa2@linuxfound…
and
https://lore.kernel.org/all/6d82fa16-ed2e-41f1-a466-c752032b6f68@linuxfound…
Please reply to this thread if you'd like to join the call or discuss any
of the topics further. We look forward to collaborating with the community
to improve upstream tests and expand coverage to more areas of interest
within the kernel.
Best regards,
Laura Nao
This series makes the kunit macros neater and clearer.
It fixes a comment and renames a macro.
It also introduces new assertion macros (KUNIT_ASSERT_MEMEQ and
KUNIT_ASSERT_MEMNEQ).
This is a follow-up to v1 [0].
v1 -> v2: [PATCH 2/3] changed KUNIT_ASSERT to KUNIT_FAIL_AND_ABORT
[0] https://lore.kernel.org/lkml/20240710170448.1399967-1-ericchancf@google.com/
Eric Chan (3):
kunit: Fix the comment of KUNIT_ASSERT_STRNEQ as assertion
kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT for
readability
kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros
drivers/input/tests/input_test.c | 2 +-
include/kunit/assert.h | 2 +-
include/kunit/test.h | 71 ++++++++++++++++++++++++++++++--
3 files changed, 70 insertions(+), 5 deletions(-)
--
2.45.2.993.g49e7a77208-goog
The '%u' in the format string in __wait_for_test() requires an
'unsigned int', but the argument type is 'signed int'. This problem was
discovered by reading the code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
Changes in v2:
- update the commit log to describe how the problem was found
tools/testing/selftests/kselftest_harness.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index b634969cbb6f..dbbbcc6c04ee 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -1084,7 +1084,7 @@ void __wait_for_test(struct __test_metadata *t)
}
} else {
fprintf(TH_LOG_STREAM,
- "# %s: Test ended in some other way [%u]\n",
+ "# %s: Test ended in some other way [%d]\n",
t->name,
status);
}
--
2.17.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This is the 9th part of the series "use network helpers" across all BPF
selftests.
Patches 1-2 update network helpers interfaces suggested by Martin.
Patch 3 adds a new helper connect_to_addr_str() as Martin suggested
instead of adding connect_fd_to_addr_str().
Patch 4 uses this newly added helper in make_client().
Patch 5 uses make_client() in sk_lookup and drop make_socket().
Geliang Tang (5):
selftests/bpf: Drop type of connect_to_fd_opts
selftests/bpf: Drop must_fail from network_helper_opts
selftests/bpf: Add connect_to_addr_str helper
selftests/bpf: Use connect_to_addr_str in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 67 +++++++--------
tools/testing/selftests/bpf/network_helpers.h | 5 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 10 +--
.../selftests/bpf/prog_tests/sk_lookup.c | 84 ++++---------------
5 files changed, 53 insertions(+), 115 deletions(-)
--
2.43.0
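Patch 3 adds a helper that connects to an address given as a string. The cover letter does not show its signature or how it handles options, so the following is only a userspace sketch of the general idea: parse the string into a socket address, create a socket, and connect. The name sketch_connect_to_addr_str() and its parameter list are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the selftests helper itself: turn an
 * address string plus port into a sockaddr, then connect to it.
 * Returns a connected fd, or -1 on any error.
 */
static int sketch_connect_to_addr_str(int family, int type,
				      const char *addr_str,
				      unsigned short port)
{
	struct sockaddr_storage ss;
	socklen_t len;
	int fd;

	assert(addr_str != NULL);
	memset(&ss, 0, sizeof(ss));

	if (family == AF_INET) {
		struct sockaddr_in *sin = (struct sockaddr_in *)&ss;

		sin->sin_family = AF_INET;
		sin->sin_port = htons(port);
		if (inet_pton(AF_INET, addr_str, &sin->sin_addr) != 1)
			return -1;	/* not a valid IPv4 literal */
		len = sizeof(*sin);
	} else if (family == AF_INET6) {
		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;

		sin6->sin6_family = AF_INET6;
		sin6->sin6_port = htons(port);
		if (inet_pton(AF_INET6, addr_str, &sin6->sin6_addr) != 1)
			return -1;	/* not a valid IPv6 literal */
		len = sizeof(*sin6);
	} else {
		return -1;		/* family not handled by this sketch */
	}

	fd = socket(family, type, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&ss, len) < 0) {
		close(fd);		/* do not leak the socket on failure */
		return -1;
	}
	return fd;
}
```

Taking the address as a string keeps callers such as make_client() free of sockaddr setup code, which is presumably why the series prefers it over a variant that takes an existing fd.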
v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=*
====
v15 got a thorough review and some testing, and this version addresses almost
all the feedback. I left out a few more minor comments where the commenters said
the change could be done later.
Major changes:
- Addition of dma-buf introspection to page-pool-get and queue-get.
- Fixes to selftests suggested by Taehee.
- Fixes to documentation suggested by Donald.
- A couple of suggestions and fixes to TCP patches by Eric and David.
- Fixes to number assignments suggested by Arnd.
- Use rtnl_lock()ing to guard against queue reconfiguration while the
page_pool initialization is happening. (Jakub).
- Fixes to a few warnings reproduced by Taehee.
- Fixes to dma-buf binding suggested by Taehee and Jakub.
- Fixes to netlink UAPI suggested by Jakub
- Applied a number of Reviewed-bys and Acked-bys (including ones I lost
from v13+).
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v16/
One caveat: Taehee reproduced a KASAN warning and reported it here:
https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd…
I estimate the issue to be minor and easily fixable:
https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW…
I hope to be able to follow up with a fix to net tree as net-next closes
imminently, but if this iteration doesn't make it in, I will repost with
a fix squashed after net-next reopens, no problem.
v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====
No material changes in this version, only a fix to linking against
libynl.a from the last version. Per Jakub's instructions I've pulled one
of his patches into this series, and now use the new libynl.a correctly,
I hope.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v15/
v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====
No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:
https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that, v11, AFAICT, received no feedback. There is one
discussion about the specifics of plugging io_uring memory through
the page pool, but it is not relevant to the content of this particular
patchset, AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Taehee Yoo <ap420073(a)gmail.com>
Cc: Donald Hunter <donald.hunter(a)gmail.com>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helper (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
defined as a preprocessor macro and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles, but there is occasionally
1 cycle of noise when running this test repeatedly.
Lastly, I tried disabling the static_branch_unlikely() check in
netmem_is_net_iov(). To my surprise, disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
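Item 1 above describes the netmem type as referring to "LSB set pointers". The idea can be sketched in userspace as a tagged pointer: aligned pointers always have a zero low bit, so that bit can mark which kind of object a reference points to. All sketch_* names below are hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel's real types are much richer. */
struct sketch_page { int id; };
struct sketch_net_iov { int id; };

/* One opaque reference type that can carry either kind of pointer. */
typedef uintptr_t sketch_netmem_ref;

#define SKETCH_NET_IOV_BIT ((uintptr_t)1)

static sketch_netmem_ref sketch_page_to_netmem(struct sketch_page *p)
{
	/* Aligned pointers have a zero LSB, so the bit is free for tagging. */
	assert(((uintptr_t)p & SKETCH_NET_IOV_BIT) == 0);
	return (uintptr_t)p;
}

static sketch_netmem_ref sketch_net_iov_to_netmem(struct sketch_net_iov *n)
{
	assert(((uintptr_t)n & SKETCH_NET_IOV_BIT) == 0);
	return (uintptr_t)n | SKETCH_NET_IOV_BIT;	/* set the tag bit */
}

static int sketch_netmem_is_net_iov(sketch_netmem_ref ref)
{
	return (ref & SKETCH_NET_IOV_BIT) != 0;
}

static struct sketch_page *sketch_netmem_to_page(sketch_netmem_ref ref)
{
	assert(!sketch_netmem_is_net_iov(ref));
	return (struct sketch_page *)ref;
}

static struct sketch_net_iov *sketch_netmem_to_net_iov(sketch_netmem_ref ref)
{
	assert(sketch_netmem_is_net_iov(ref));
	/* Clear the tag bit to recover the original pointer. */
	return (struct sketch_net_iov *)(ref & ~SKETCH_NET_IOV_BIT);
}
```

A distinct reference type means code that only handles ordinary pages has to convert explicitly, so passing device memory to it by accident becomes harder than when struct page is reused directly.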
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators have drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
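The header/payload separation described above can be sketched as follows. This is only an illustration of the data layout: both destinations are ordinary memory here, whereas with devmem TCP the NIC performs the split itself and the payload buffer would be device memory that the host never reads. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration only: split one received frame into a
 * header part and a payload part held in separate buffers.
 * Returns 0 on success, -1 if the frame or a buffer is too small.
 */
static int sketch_header_split(const unsigned char *frame, size_t frame_len,
			       size_t hdr_len,
			       unsigned char *hdr_buf, size_t hdr_cap,
			       unsigned char *payload_buf, size_t payload_cap,
			       size_t *payload_len)
{
	assert(frame && hdr_buf && payload_buf && payload_len);

	if (hdr_len > frame_len || hdr_len > hdr_cap)
		return -1;
	if (frame_len - hdr_len > payload_cap)
		return -1;

	/* Headers land in the buffer the TCP/IP stack will parse. */
	memcpy(hdr_buf, frame, hdr_len);

	/* Payload lands in a separate buffer that the stack never parses. */
	*payload_len = frame_len - hdr_len;
	memcpy(payload_buf, frame + hdr_len, *payload_len);
	return 0;
}
```

Because the stack only ever needs the header buffer, the payload buffer can live somewhere the CPU cannot read, which is why header split is a strict requirement for this design.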
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
netdev: add dmabuf introspection
Documentation/netlink/specs/netdev.yaml | 61 +++
Documentation/networking/devmem.rst | 269 ++++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 9 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 123 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 193 ++++++++-
include/net/page_pool/helpers.h | 47 ++-
include/net/page_pool/types.h | 8 +
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 8 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 13 +
include/uapi/linux/uio.h | 17 +
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 364 ++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 111 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 96 +++--
net/core/page_pool_user.c | 4 +
net/core/skbuff.c | 76 +++-
net/core/sock.c | 68 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 13 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 9 +
tools/testing/selftests/net/ncdevmem.c | 536 ++++++++++++++++++++++++
47 files changed, 2505 insertions(+), 98 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.2.803.g4e1b14247a-goog
This series makes the kunit macros neater and clearer.
It fixes a comment and renames a macro.
It also introduces a new type of assertion macro.
Eric Chan (3):
kunit: Fix the comment of KUNIT_ASSERT_STRNEQ as assertion
kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_ASSERT for readability
kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros
drivers/input/tests/input_test.c | 2 +-
include/kunit/assert.h | 2 +-
include/kunit/test.h | 71 ++++++++++++++++++++++++++++++--
3 files changed, 70 insertions(+), 5 deletions(-)
--
2.45.2.803.g4e1b14247a-goog
To verify IFS (In Field Scan [1]) driver functionality, add the following 6
test cases:
1. Verify that IFS sysfs entries are created after loading the IFS module
2. Check if loading an invalid IFS test image fails and loading a valid
one succeeds
3. Perform IFS scan test on each CPU using all the available image files
4. Perform IFS scan with first test image file on a random CPU for 3
rounds
5. Perform IFS ARRAY BIST (Built-In Self-Test) test on each CPU
6. Perform IFS ARRAY BIST test on a random CPU for 3 rounds
These are not exhaustive, but some minimal test runs to check various
parts of the driver. Some negative tests are also included.
[1] https://docs.kernel.org/arch/x86/ifs.html
Pengfei Xu (4):
selftests: ifs: verify test interfaces are created by the driver
selftests: ifs: verify test image loading functionality
selftests: ifs: verify IFS scan test functionality
selftests: ifs: verify IFS ARRAY BIST functionality
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../drivers/platform/x86/intel/ifs/Makefile | 6 +
.../platform/x86/intel/ifs/test_ifs.sh | 494 ++++++++++++++++++
4 files changed, 502 insertions(+)
create mode 100644 tools/testing/selftests/drivers/platform/x86/intel/ifs/Makefile
create mode 100755 tools/testing/selftests/drivers/platform/x86/intel/ifs/test_ifs.sh
---
Changes:
v1 to v2:
- Rebase to v6.10 cycle kernel and resolve some code conflicts
- Improved checking of IFS ARRAY_BIST support by leveraging sysfs entry
methods (suggested by Ashok)
--
2.43.0
In this series, 4 tests are being conformed to TAP.
Muhammad Usama Anjum (4):
selftests: x86: check_initial_reg_state: conform test to TAP format
output
selftests: x86: corrupt_xstate_header: conform test to TAP format
output
selftests: fsgsbase_restore: conform test to TAP format output
selftests: entry_from_vm86: conform test to TAP format output
.../selftests/x86/check_initial_reg_state.c | 24 ++--
.../selftests/x86/corrupt_xstate_header.c | 30 +++--
tools/testing/selftests/x86/entry_from_vm86.c | 109 ++++++++--------
.../testing/selftests/x86/fsgsbase_restore.c | 117 +++++++++---------
4 files changed, 139 insertions(+), 141 deletions(-)
--
2.39.2
Add an RTC wakeup alarm so that devices resume after a specific time interval.
This improvement will help in enabling this test
in CI systems and will eliminate the need for manual intervention
to resume the devices after suspend/hibernation.
Signed-off-by: Shreeya Patel <shreeya.patel(a)collabora.com>
---
tools/testing/selftests/cpufreq/cpufreq.sh | 24 ++++++++++++++++++++++
tools/testing/selftests/cpufreq/main.sh | 13 +++++++++++-
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cpufreq/cpufreq.sh b/tools/testing/selftests/cpufreq/cpufreq.sh
index a8b1dbc0a3a5..a0f5b944a8fe 100755
--- a/tools/testing/selftests/cpufreq/cpufreq.sh
+++ b/tools/testing/selftests/cpufreq/cpufreq.sh
@@ -231,6 +231,30 @@ do_suspend()
for i in `seq 1 $2`; do
printf "Starting $1\n"
+
+ if [ "$3" = "rtc" ]; then
+ now=$(date +%s)
+ wakeup_time=$((now + 15)) # Wake up after 15 seconds
+
+ echo $wakeup_time > /sys/class/rtc/rtc0/wakealarm
+
+ if [ $? -ne 0 ]; then
+ printf "Failed to set RTC wake alarm\n"
+ return 1
+ fi
+
+ # Enable the RTC as a wakeup source
+ echo enabled > /sys/class/rtc/rtc0/device/power/wakeup
+
+ if [ $? -ne 0 ]; then
+ printf "Failed to enable RTC as a wakeup source\n"
+ return 1
+ fi
+
+ # Reset the wakeup alarm
+ echo 0 > /sys/class/rtc/rtc0/wakealarm
+ fi
+
echo $filename > $SYSFS/power/state
printf "Came out of $1\n"
diff --git a/tools/testing/selftests/cpufreq/main.sh b/tools/testing/selftests/cpufreq/main.sh
index a0eb84cf7167..f12ff7416e41 100755
--- a/tools/testing/selftests/cpufreq/main.sh
+++ b/tools/testing/selftests/cpufreq/main.sh
@@ -24,6 +24,8 @@ helpme()
[-t <basic: Basic cpufreq testing
suspend: suspend/resume,
hibernate: hibernate/resume,
+ suspend_rtc: suspend/resume back using the RTC wakeup alarm,
+ hibernate_rtc: hibernate/resume back using the RTC wakeup alarm,
modtest: test driver or governor modules. Only to be used with -d or -g options,
sptest1: Simple governor switch to produce lockdep.
sptest2: Concurrent governor switch to produce lockdep.
@@ -76,7 +78,8 @@ parse_arguments()
helpme
;;
- t) # --func_type (Function to perform: basic, suspend, hibernate, modtest, sptest1/2/3/4 (default: basic))
+ t) # --func_type (Function to perform: basic, suspend, hibernate,
+ # suspend_rtc, hibernate_rtc, modtest, sptest1/2/3/4 (default: basic))
FUNC=$OPTARG
;;
@@ -121,6 +124,14 @@ do_test()
do_suspend "hibernate" 1
;;
+ "suspend_rtc")
+ do_suspend "suspend" 1 rtc
+ ;;
+
+ "hibernate_rtc")
+ do_suspend "hibernate" 1 rtc
+ ;;
+
"modtest")
# Do we have modules in place?
if [ -z $DRIVER_MOD ] && [ -z $GOVERNOR_MOD ]; then
--
2.39.2
This patch series adds a selftest suite to validate the s390x
architecture specific ucontrol KVM interface.
When creating a VM on s390x it is possible to create it as a userspace
controlled VM, or ucontrol VM for short.
These VMs delegate the management of the VM to userspace instead
of handling most events within the kernel. Consequently, userspace
has to manage interrupts, memory allocation etc.
Before this patch set, this functionality lacked any public test cases.
It is desirable to add test cases for this interface to be able to
reduce the risk of breaking changes in the future.
In order to provision a ucontrol VM the kernel needs to be compiled with
CONFIG_KVM_S390_UCONTROL enabled. Users with the sys_admin capability
can then create a new ucontrol VM by passing the KVM_VM_S390_UCONTROL
parameter to the KVM_CREATE_VM ioctl.
The kernel's existing selftest helper functions can only partially be
reused for these tests.
The test cases cover existing special handling of ucontrol VMs within the
implementation and basic VM creation and handling cases:
* Reject setting HPAGE when VM is ucontrol
* Assert KVM_GET_DIRTY_LOG is rejected
* Assert KVM_S390_VM_MEM_LIMIT_SIZE is rejected
* Assert state of initial SIE flags setup by the kernel
* Run simple program in VM with and without DAT
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
The patch set also contains some code cleanup / consolidation of
architecture specific defines that are now used in multiple test cases.
Christoph Schlameuss (9):
selftests: kvm: s390: Define page sizes in shared header
selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace
tests
selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
selftests: kvm: s390: Add test fixture and simple VM setup tests
selftests: kvm: s390: Add debug print functions
selftests: kvm: s390: Add VM run test case
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/s390x/debug_print.h | 78 +++
.../selftests/kvm/include/s390x/processor.h | 5 +
.../testing/selftests/kvm/include/s390x/sie.h | 240 +++++++
.../selftests/kvm/lib/s390x/processor.c | 10 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +-
tools/testing/selftests/kvm/s390x/config | 2 +
.../testing/selftests/kvm/s390x/debug_test.c | 4 +-
tools/testing/selftests/kvm/s390x/memop.c | 4 +-
tools/testing/selftests/kvm/s390x/tprot.c | 5 +-
.../selftests/kvm/s390x/ucontrol_test.c | 612 ++++++++++++++++++
12 files changed, 953 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/s390x/debug_print.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/sie.h
create mode 100644 tools/testing/selftests/kvm/s390x/config
create mode 100644 tools/testing/selftests/kvm/s390x/ucontrol_test.c
base-commit: 256abd8e550ce977b728be79a74e1729438b4948
--
2.45.2
The requested resources should be closed before returning from main(),
otherwise a resource leak will occur. Add a check of cgroup_fd and close().
Fixes: 4939b2847d26 ("bpf, selftests: Use single cgroup helpers for both test_sockmap/progs")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
tools/testing/selftests/bpf/test_dev_cgroup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_dev_cgroup.c b/tools/testing/selftests/bpf/test_dev_cgroup.c
index adeaf63cb6fa..e97fc061fab2 100644
--- a/tools/testing/selftests/bpf/test_dev_cgroup.c
+++ b/tools/testing/selftests/bpf/test_dev_cgroup.c
@@ -81,5 +81,7 @@ int main(int argc, char **argv)
cleanup_cgroup_environment();
out:
+ if (cgroup_fd >= 0)
+ close(cgroup_fd);
return error;
}
--
2.25.1
The requested resources should be closed before returning from main(),
otherwise a resource leak will occur. Add a check of cgroup_fd and close().
Fixes: 4939b2847d26 ("bpf, selftests: Use single cgroup helpers for both test_sockmap/progs")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
tools/testing/selftests/bpf/test_cgroup_storage.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c b/tools/testing/selftests/bpf/test_cgroup_storage.c
index 0861ea60dcdd..4265f1348b6b 100644
--- a/tools/testing/selftests/bpf/test_cgroup_storage.c
+++ b/tools/testing/selftests/bpf/test_cgroup_storage.c
@@ -79,6 +79,8 @@ int main(int argc, char **argv)
}
cgroup_fd = cgroup_setup_and_join(TEST_CGROUP);
+ if (cgroup_fd < 0)
+ goto out;
/* Attach the bpf program */
if (bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0)) {
@@ -170,5 +172,7 @@ int main(int argc, char **argv)
free(percpu_value);
out:
+ if (cgroup_fd >= 0)
+ close(cgroup_fd);
return error;
}
--
2.25.1
Don't print that 88 sub-tests are going to be executed and then skip
them all. Otherwise an error is printed that only 1 test was executed
while 88 should have run:
Old output:
TAP version 13
1..88
ok 2 # SKIP all tests require euid == 0
# Planned tests != run tests (88 != 1)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
New and correct output:
TAP version 13
1..0 # SKIP all tests require euid == 0
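The difference between the two outputs comes down to which plan line is
emitted. A minimal sketch of that choice follows. tap_format_plan() is a
hypothetical helper written for illustration only; it is not part of
kselftest.h and not the code changed by this patch.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): format a TAP plan line.
 * When the whole run is skipped, the plan is "1..0" with a SKIP
 * directive, so no per-test result lines are expected afterwards.
 * Otherwise the plan announces how many results will follow.
 */
int tap_format_plan(char *buf, size_t len, unsigned int planned,
		    const char *skip_reason)
{
	if (skip_reason)
		return snprintf(buf, len, "1..0 # SKIP %s", skip_reason);

	return snprintf(buf, len, "1..%u", planned);
}
```

Deciding whether to skip before announcing the plan is what keeps the
planned and executed test counts consistent.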
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/openat2/resolve_test.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/openat2/resolve_test.c b/tools/testing/selftests/openat2/resolve_test.c
index bbafad440893c..5472ec478d227 100644
--- a/tools/testing/selftests/openat2/resolve_test.c
+++ b/tools/testing/selftests/openat2/resolve_test.c
@@ -508,12 +508,13 @@ void test_openat2_opath_tests(void)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(NUM_TESTS);
/* NOTE: We should be checking for CAP_SYS_ADMIN here... */
- if (geteuid() != 0)
+ if (geteuid())
ksft_exit_skip("all tests require euid == 0\n");
+ ksft_set_plan(NUM_TESTS);
+
test_openat2_opath_tests();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
--
2.39.2
While exploring uretprobe syscall and trampoline for ARM64, we observed
a slight performance gain for Redis benchmark using uretprobe syscall.
This patchset aims to further improve the performance of uretprobe by
optimizing the management of struct return_instance data.
In detail, uretprobe utilizes dynamically allocated memory for struct
return_instance data. These data track the call chain of instrumented
functions. This approach is not efficient, especially considering the
inherent locality of function invocation.
This patchset proposes a rework of the return_instances management. It
replaces dynamic memory allocation with a statically allocated array.
This approach leverages the stack-style usage of return_instance and
removes the need for kmalloc/kfree operations.
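To illustrate the general idea only, here is a userspace sketch of a
fixed-size, array-backed stack. It is not the code from this series: the
names ri_entry, ri_stack, ri_push(), ri_pop() and the depth of 4 are all
made up for the example, and the real struct return_instance carries
more state.

```c
#include <stddef.h>

#define RI_STACK_DEPTH 4	/* illustrative depth, chosen arbitrarily */

/* Hypothetical, stripped-down stand-in for a return instance. */
struct ri_entry {
	unsigned long func;		/* address of the probed function */
	unsigned long orig_ret_vaddr;	/* original return address */
};

/* All slots live inside the structure, so push/pop never allocate. */
struct ri_stack {
	struct ri_entry slots[RI_STACK_DEPTH];
	size_t depth;			/* number of slots in use */
};

/* Claim the next slot; returns NULL when the stack is full. */
struct ri_entry *ri_push(struct ri_stack *s, unsigned long func,
			 unsigned long orig_ret_vaddr)
{
	struct ri_entry *e;

	if (s->depth == RI_STACK_DEPTH)
		return NULL;

	e = &s->slots[s->depth++];
	e->func = func;
	e->orig_ret_vaddr = orig_ret_vaddr;
	return e;
}

/* Release the most recent slot; returns NULL when the stack is empty. */
struct ri_entry *ri_pop(struct ri_stack *s)
{
	if (s->depth == 0)
		return NULL;

	return &s->slots[--s->depth];
}
```

Because function calls return in last-in, first-out order, the entry for
the innermost call is always the top slot, which is why an index into an
array can stand in for a dynamically allocated list.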
This patch has been tested on Kunpeng916 (Hi1616), 4 NUMA nodes, 64
cores @ 2.4GHz. Redis benchmarks show a throughput gain of about 2% for
Redis GET and SET commands:
------------------------------------------------------------------
Test case | No uretprobes | uretprobes | uretprobes
| | (current) | (optimized)
==================================================================
Redis SET (RPS) | 47025 | 40619 (-13.6%) | 41529 (-11.6%)
------------------------------------------------------------------
Redis GET (RPS) | 46715 | 41426 (-11.3%) | 42306 (-9.4%)
------------------------------------------------------------------
Liao Chang (2):
uprobes: Optimize the return_instance related routines
selftests/bpf: Add uretprobe test for return_instance management
include/linux/uprobes.h | 10 +-
kernel/events/uprobes.c | 162 +++++++++++-------
.../bpf/prog_tests/uretprobe_depth.c | 150 ++++++++++++++++
.../selftests/bpf/progs/uretprobe_depth.c | 19 ++
4 files changed, 274 insertions(+), 67 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/uretprobe_depth.c
create mode 100644 tools/testing/selftests/bpf/progs/uretprobe_depth.c
--
2.34.1
'%u' in format string requires 'unsigned int' in __wait_for_test()
but the argument type is 'signed int'.
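A small standalone sketch of the mismatch, separate from the harness
code: format_wait_status() is a hypothetical function used only for this
example. Some compilers can report this kind of mismatch when asked to
(GCC has -Wformat-signedness, for instance).

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical example function: print a signed status value.
 * The argument is a signed int, so the matching conversion is %d.
 * With %u, a value such as -1 would be printed as a large unsigned
 * number instead of -1.
 */
int format_wait_status(char *buf, size_t len, int status)
{
	return snprintf(buf, len, "Test ended in some other way [%d]",
			status);
}
```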
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
tools/testing/selftests/kselftest_harness.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index b634969cbb6f..dbbbcc6c04ee 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -1084,7 +1084,7 @@ void __wait_for_test(struct __test_metadata *t)
}
} else {
fprintf(TH_LOG_STREAM,
- "# %s: Test ended in some other way [%u]\n",
+ "# %s: Test ended in some other way [%d]\n",
t->name,
status);
}
--
2.17.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- only check the first "link" (link_nl) in test_mixed_links().
- Drop patch 2 in v1.
Resend patch 1 out of "skip ENOTSUPP BPF selftests" set as Eduard
suggested, together with another fix for xdp_adjust_tail.
Geliang Tang (2):
selftests/bpf: Null checks for links in bpf_tcp_ca
selftests/bpf: Close obj in error path in xdp_adjust_tail
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 16 ++++++++++++----
.../selftests/bpf/prog_tests/xdp_adjust_tail.c | 2 +-
2 files changed, 13 insertions(+), 5 deletions(-)
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v11:
- new patches 2, 4, 6.
- drop expect_errno from network_helper_opts as Eduard and Martin
suggested.
- drop sockmap_ktls patches from this set.
- add a new helper connect_fd_to_addr_str.
v10:
- a new patch 10 is added.
- patches 1-6, 8-9 unchanged, only commit logs updated.
- "err = -errno" is used in patches 7, 11, 12 to get the real error
number before checking value of "err".
v9:
- new patches 5-7, new struct member expect_errno for network_helper_opts.
- patches 1-4, 8-9 unchanged.
- update patches 10-11 to make sure all tests pass.
v8:
- only patch 8 updated, to fix errors reported by CI.
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead of connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sk_lookup, and drops the local
helpers inetaddr_len() and make_socket().
Geliang Tang (9):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Add ASSERT_OK_FD macro
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use start_server_addr in sk_lookup
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Add connect_fd_to_addr_str helper
selftests/bpf: Use connect_fd_to_addr_str in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 23 ++-
tools/testing/selftests/bpf/network_helpers.h | 7 +
.../selftests/bpf/prog_tests/sk_lookup.c | 156 ++++++------------
tools/testing/selftests/bpf/test_progs.h | 8 +
4 files changed, 92 insertions(+), 102 deletions(-)
--
2.43.0
Changes from PATCH v1 -> v2:
- Updated selftest to use ksft_test_result_code instead of switch-case
(Muhammad Usama Anjum)
- Included more use cases in the cover letter
(Huang, Ying)
- Added documentation for sysfs and memcg interfaces
- Added an aging-specific struct lru_gen_mm_walk in struct pglist_data
to avoid allocating for each lruvec.
Changes from RFC v3 -> PATCH v1:
- Updated selftest to use ksft_print_msg instead of fprintf(stderr, ...)
(Muhammad Usama Anjum)
- Included more detail in patch skipping pmd_young with force_scan
(Huang, Ying)
- Deferred reaccess histogram as a followup
- Removed per-memcg page age interval configs for simplicity
Changes from RFC v2 -> RFC v3:
- Update to v6.8
- Added an aging kernel thread (gated behind config)
- Added basic selftests for sysfs interface files
- Track swapped out pages for reaccesses
- Refactoring and cleanup
- Dropped the virtio-balloon extension to make things manageable
Changes from RFC v1 -> RFC v2:
- Refactored the patches into smaller pieces
- Renamed interfaces and functions from wss to wsr (Working Set Reporting)
- Fixed build errors when CONFIG_WSR is not set
- Changed working_set_num_bins to u8 for virtio-balloon
- Added support for per-NUMA node reporting for virtio-balloon
[rfc v1]
https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.co…
[rfc v2]
https://lore.kernel.org/linux-mm/20230621180454.973862-1-yuanchu@google.com/
[rfc v3]
https://lore.kernel.org/linux-mm/20240327213108.2384666-1-yuanchu@google.co…
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace. IMO, the
kernel should provide a set of workingset interfaces that should be
generic enough to accommodate the various use cases, and be extensible
to potential future use cases. The current proposed interfaces are not
sufficient in that regard, but I would like to start somewhere, solicit
feedback, and iterate.
Use cases
==========
Job scheduling
On overcommitted hosts, workingset information allows the job scheduler
to right-size each job and land more jobs on the same host or NUMA node,
and in the case of a job with increasing workingset, policy decisions
can be made to migrate other jobs off the host/NUMA node, or oom-kill
the misbehaving job. If the job shape is very different from the machine
shape, knowing the workingset per-node can also help inform page
allocation policies.
Proactive reclaim
Workingset information allows a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting allows the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
While this patch series does not extend the virtio-balloon device,
balloon policies benefit from workingset to more precisely determine
the size of the memory balloon. On desktops/laptops/mobile devices where
memory is scarce and overcommitted, the balloon sizing in multiple VMs
running on the same device can be orchestrated with workingset reports
from each one.
Promotion/Demotion
If different mechanisms are used for promotion and demotion, workingset
information can help connect the two and avoid pages being migrated back
and forth.
For example, consider a promotion hot page threshold defined as a
reaccess distance of N seconds (promote pages accessed more often than
every N seconds). The threshold N should be set so that, e.g., ~80% of
pages on the fast memory node pass the threshold. This calculation can
be done with workingset reports.
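A rough sketch of that calculation, under stated assumptions: struct
ws_bin and ws_threshold_ms() are hypothetical names, and each bin is
assumed to count only the pages whose reaccess distance falls in that
bin (not a running total). Neither is an interface from this series.

```c
#include <stddef.h>

/* Hypothetical histogram bin, for illustration only. */
struct ws_bin {
	unsigned long long upper_ms;	/* upper bound of the interval */
	unsigned long long pages;	/* pages counted in this bin only */
};

/*
 * Return the smallest bin upper bound such that at least `percent` of
 * all pages have a reaccess distance at or below it. Returns 0 when
 * the histogram holds no pages.
 */
unsigned long long ws_threshold_ms(const struct ws_bin *bins, size_t nr,
				   unsigned int percent)
{
	unsigned long long total = 0, cum = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		total += bins[i].pages;
	if (!total)
		return 0;

	for (i = 0; i < nr; i++) {
		cum += bins[i].pages;
		/* Integer form of: cum / total >= percent / 100 */
		if (cum * 100 >= total * percent)
			return bins[i].upper_ms;
	}

	/* Only reached for percent > 100; fall back to the last bin. */
	return bins[nr - 1].upper_ms;
}
```

With made-up bins of 50, 30, 15 and 5 pages at upper bounds of 1000,
2000, 3000 ms and "infinity", a target of 80% selects N = 2000 ms,
because the first two bins together hold 80 of the 100 pages.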
To be directly useful for promotion policies, the workingset report
interfaces need to be extended to report hotness and gather hotness
information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
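For userspace consumers, a line in the format shown above could be
parsed roughly as follows. This is a sketch that assumes the line layout
is exactly "<interval> anon=<n> file=<n>"; struct ws_line and
ws_parse_line() are hypothetical names, not part of the proposed
interface or its selftests.

```c
#include <stdio.h>

/* Hypothetical container for one parsed report line. */
struct ws_line {
	unsigned long long interval_ms;	/* leading interval value */
	unsigned long long anon;	/* value after "anon=" */
	unsigned long long file;	/* value after "file=" */
};

/* Parse one report line; returns 0 on success, -1 on a malformed line. */
int ws_parse_line(const char *line, struct ws_line *out)
{
	int n = sscanf(line, "%llu anon=%llu file=%llu",
		       &out->interval_ms, &out->anon, &out->file);

	return n == 3 ? 0 : -1;
}
```

The last interval in the sample, 9223372036854775807, still fits in an
unsigned long long, so the catch-all bin needs no special handling here.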
I realize this does not generalize well to hotness information, but I
lack the intuition for an abstraction that presents hotness in a useful
way. Based on a recent proposal for move_phys_pages[2], it seems like
userspace tiering software would like to move specific physical pages,
instead of informing the kernel "move x number of hot pages to y
device". Please advise.
[2]
https://lore.kernel.org/lkml/20240319172609.332900-1-gregory.price@memverge…
Implementation
==========
Currently, the reporting of user pages is based off of MGLRU, and
therefore requires CONFIG_LRU_GEN=y. We would benefit from more MGLRU
generations for a more fine-grained workingset report. I will make the
generation count configurable in the next version. The workingset
reporting mechanism is gated behind CONFIG_WORKINGSET_REPORT, and the
aging thread is behind CONFIG_WORKINGSET_REPORT_AGING.
Yuanchu Xie (8):
mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
mm: aggregate working set information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend working set reporting to memcgs
mm: add kernel aging thread for workingset reporting
selftest: test system-wide workingset reporting
Docs/admin-guide/mm/workingset_report: document sysfs and memcg
interfaces
Documentation/admin-guide/mm/index.rst | 1 +
.../admin-guide/mm/workingset_report.rst | 105 ++++
drivers/base/node.c | 6 +
include/linux/memcontrol.h | 5 +
include/linux/mmzone.h | 9 +
include/linux/workingset_report.h | 97 +++
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 18 +
mm/memcontrol.c | 184 +++++-
mm/mm_init.c | 2 +
mm/mmzone.c | 2 +
mm/vmscan.c | 58 +-
mm/workingset_report.c | 561 ++++++++++++++++++
mm/workingset_report_aging.c | 127 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
tools/testing/selftests/mm/run_vmtests.sh | 5 +
.../testing/selftests/mm/workingset_report.c | 306 ++++++++++
.../testing/selftests/mm/workingset_report.h | 39 ++
.../selftests/mm/workingset_report_test.c | 329 ++++++++++
21 files changed, 1869 insertions(+), 6 deletions(-)
create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.45.1.467.gbab1589fc0-goog
The variable is never referenced in the code, so just remove it.
This problem was discovered by reading the code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
tools/testing/selftests/dma/dma_map_benchmark.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
index 3fcea00961c0..c91b485ca99a 100644
--- a/tools/testing/selftests/dma/dma_map_benchmark.c
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -33,7 +33,6 @@ int main(int argc, char **argv)
int granule = 1;
int cmd = DMA_MAP_BENCHMARK;
- char *p;
while ((opt = getopt(argc, argv, "t:s:n:b:d:x:g:")) != -1) {
switch (opt) {
--
2.17.1
This variable is never referenced in the code, so just remove it.
This problem was discovered by reading the code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
Changes in v2:
- modify commit info
tools/testing/selftests/breakpoints/step_after_suspend_test.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index b8703c499d28..dfec31fb9b30 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -130,7 +130,6 @@ int run_test(int cpu)
void suspend(void)
{
int power_state_fd;
- struct sigevent event = {};
int timerfd;
int err;
struct itimerspec spec = {};
--
2.17.1
The main function's return type is int, so add a return value at the
end. This problem was discovered by reading the code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
Changes in v2:
- modify commit info
tools/testing/selftests/breakpoints/step_after_suspend_test.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index dfec31fb9b30..b473131fce3e 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -166,7 +166,7 @@ int main(int argc, char **argv)
bool succeeded = true;
unsigned int tests = 0;
cpu_set_t available_cpus;
- int err;
+ int err = 0;
int cpu;
ksft_print_header();
@@ -222,4 +222,6 @@ int main(int argc, char **argv)
ksft_exit_pass();
else
ksft_exit_fail();
+
+ return err;
}
--
2.17.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Resend patch 1 out of "skip ENOTSUPP BPF selftests" set as Eduard
suggested, together with two other cleanups.
Geliang Tang (3):
selftests/bpf: Null checks for links in bpf_tcp_ca
selftests/bpf: Check ASSERT_OK(err) in dummy_st_ops
selftests/bpf: Close obj in error paths in xdp_adjust_tail
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 21 +++++++++++++------
.../selftests/bpf/prog_tests/dummy_st_ops.c | 8 +++++--
.../bpf/prog_tests/xdp_adjust_tail.c | 2 +-
3 files changed, 22 insertions(+), 9 deletions(-)
--
2.43.0
On Thu, Jul 04, 2024 at 04:36:04PM +0200, Arnd Bergmann wrote:
> #define __ARCH_WANT_SYS_CLONE
> +#define __ARCH_WANT_NEW_STAT
>
> -#ifndef __COMPAT_SYSCALL_NR
> -#include <uapi/asm/unistd.h>
> -#endif
> +#include <asm/unistd_64.h>
It looks like this is causing widespread build breakage in kselftest in
-next for arm64, there are *many* errors in the form:
In file included from test_signals_utils.c:14:
/build/stage/build-work/usr/include/asm/unistd.h:2:10: fatal error: unistd_64.h: No such file or directory
2 | #include <unistd_64.h>
| ^~~~~~~~~~~~~
which obviously looks like it's tied to the above but I've not fully
understood the patch/series yet. Build log at:
https://builds.sirena.org.uk/82d01fe6ee52086035b201cfa1410a3b04384257/arm64…
A bisect appears to confirm that it's this commit, which is in -next as
6e4a077c0b607c674536908c5b68f1c31e4e26ec.
git bisect start
# status: waiting for both good and bad commits
# bad: [82d01fe6ee52086035b201cfa1410a3b04384257] Add linux-next specific files for 20240709
git bisect bad 82d01fe6ee52086035b201cfa1410a3b04384257
# status: waiting for good commit(s), bad commit known
# good: [037206cd4cb43d535453723140fde1bcde0b296e] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
git bisect good 037206cd4cb43d535453723140fde1bcde0b296e
# bad: [2ae3e655fc40f1b6620194b90dcf9a4515257918] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect bad 2ae3e655fc40f1b6620194b90dcf9a4515257918
# bad: [4f2a367612d46dff2068582feadfbdd8e1c0443f] Merge branch 'fs-next' of linux-next
git bisect bad 4f2a367612d46dff2068582feadfbdd8e1c0443f
# bad: [d3da7ed72840f3660f90966490adfd499d96ea8f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
git bisect bad d3da7ed72840f3660f90966490adfd499d96ea8f
# good: [6355edbb3dfe322f0748b1eb3987973a568bbb42] Merge tag 'v6.11-rockchip-dts64-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
git bisect good 6355edbb3dfe322f0748b1eb3987973a568bbb42
# good: [2073cda629a47f2ebe2afcd3cb8b3000d5cd13d1] mm: optimization on page allocation when CMA enabled
git bisect good 2073cda629a47f2ebe2afcd3cb8b3000d5cd13d1
# good: [91a2b5b12867f77dc68d2d15ec7381e6e43820cb] Merge branch 'perf-tools-next' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
git bisect good 91a2b5b12867f77dc68d2d15ec7381e6e43820cb
# bad: [b8c38a39b6ee44b02ee563b60439f417fec441ad] Merge branch 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git
git bisect bad b8c38a39b6ee44b02ee563b60439f417fec441ad
# good: [c100216635e922f43d9e783da918a749995350ca] Merge branch 'for-next/vcpu-hotplug' into for-next/core
git bisect good c100216635e922f43d9e783da918a749995350ca
# bad: [fafb823fc82dfb746cc9043b1573c4b29ef1d52a] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux.git
git bisect bad fafb823fc82dfb746cc9043b1573c4b29ef1d52a
# bad: [8d46f9dd06378e346a562c75bc2a260a03abe807] csky: convert to generic syscall table
git bisect bad 8d46f9dd06378e346a562c75bc2a260a03abe807
# good: [57029ba74296a4dafe35f147e88d56d8ae7b69da] kbuild: add syscall table generation to scripts/Makefile.asm-headers
git bisect good 57029ba74296a4dafe35f147e88d56d8ae7b69da
# good: [ea0130bf3c45f276b1f9e005eeb255a80a10358b] arm64: convert unistd_32.h to syscall.tbl format
git bisect good ea0130bf3c45f276b1f9e005eeb255a80a10358b
# bad: [b2595bdb3eb3fe24137d0bd07a51bc622f068a81] arm64: rework compat syscall macros
git bisect bad b2595bdb3eb3fe24137d0bd07a51bc622f068a81
# bad: [6e4a077c0b607c674536908c5b68f1c31e4e26ec] arm64: generate 64-bit syscall.tbl
git bisect bad 6e4a077c0b607c674536908c5b68f1c31e4e26ec
# first bad commit: [6e4a077c0b607c674536908c5b68f1c31e4e26ec] arm64: generate 64-bit syscall.tbl
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0…
---
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 13 +
arch/riscv/include/asm/hwprobe.h | 5 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 249 +++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 51 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 25 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 67 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 911 insertions(+), 271 deletions(-)
---
base-commit: 11cc01d4d2af304b7288251aad7e03315db8dffc
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
In the middle of the thread about a patch to add the skip test result,
I suggested documenting the process of deprecating the KTAP v1 Specification
method of marking a skipped test:
https://lore.kernel.org/all/490271eb-1429-2217-6e38-837c6e5e328b@gmail.com/…
In a reply to that email I suggested that we ought to have a process to transition
the KTAP Specification from v1 to v2, and possibly v3 and future.
This email is meant to be the root of that discussion.
My initial thinking is that there are at least three different types of project
and/or community that may have different needs in this area.
Type 1 - project controls both the test output generation and the test output
parsing tool. Both generation and parsing code are in the same repository
and/or synchronized versions are distributed together.
Devicetree unittests are an example of Type 1. I plan to maintain changes
of test output to KTAP v2 format in coordination with updating the parser
to process KTAP v2 data.
Type 2 - project controls both the test output generation and the test output
parsing tool. The test output generation and parser modifications may be
controlled by the project BUT there are one or more external testing projects
that (1) may have their own parsers, and (2) may have a single framework that
tests multiple versions of the tests.
I think that kselftest and kunit tests are probably examples of Type 2. I also
think that DT unittests will become a Type 2 project as a result of converting
to KTAP v2 data.
Type 3 - project may create and maintain some tests, but is primarily a consumer
of tests created by other projects. Type 3 projects typically have a single
framework that is able to execute and process multiple versions of the tests.
The Fuego test project is an example of Type 3.
Maybe adding all of this complexity of different Types in my initial thinking
was silly -- maybe everything in this topic is governed by the more complex
Type 3.
My thinking was that the three different Types of project would be impacted
in different ways by transition plans. Type 3 would be the most impacted,
so I wanted to be sure that any transition plan especially considered their
needs.
There is an important aspect of the KTAP format that might ease the transition
from one version to another: All KTAP formatted results begin with a "version
line", so as soon as a parser has processed the first line of a test, it can
apply the appropriate KTAP Specification version to all subsequent lines of
test output. A parser implementation could choose to process all versions,
could choose to invoke a version specific parser, or some other approach
altogether.
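As a sketch of the first step only, a parser could extract the version
number like this. ktap_parse_version() is a hypothetical function name,
and the code assumes the version line has the form "KTAP version <N>"
with no leading whitespace.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the version number from a KTAP version
 * line, or -1 if the line is not a version line.
 */
int ktap_parse_version(const char *line)
{
	int version;

	if (sscanf(line, "KTAP version %d", &version) != 1 || version < 1)
		return -1;

	return version;
}
```

A caller could then select a version specific parser (or reject the
input) based on the returned number before reading any further lines.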
In the "add skip test results" thread, I suggested deprecating the v1
method of marking a skipped test in v2, with a scheduled removal of
the v1 method in v3. But since the KTAP format version is available
in the very first line of test output, is it necessary to do a slow
deprecation and removal over two versions?
One argument for doing a two version deprecation/removal process is that
a parser that is one version older than the test output _might_ be able
to process the test output without error, but would not be able to take
advantage of features added in the newer version of the Specification.
My opinion is that a two version deprecation/removal process will slow
the Specification update process and lead to more versions of the
Specification over a given time interval.
A one version deprecation/removal process puts more of a burden on Type 3
projects and external parsers for Type 2 projects to implement parsers
that can process the newer Specification more quickly and puts a burden
on test maintainers to delay a move to the newer Specification, or possibly
pressure to support selection of more than one Specification version format
for output data.
One additional item...  On the KTAP Specification version 2 process wiki page,
I suggested that it is "desirable for test result parsers that understand the
KTAP Specification version 2 data also be able to parse version 1 data",
with the implication that "Converting version 1 compliant data to version 2
compliant data should not require a 'flag day' switch of test result parsers."
If this thread discussion results in a different decision, I will update the
wiki.
Thoughts?
-Frank
Hi everyone,
I am new to this forum and excited to be here.
I'm interested in learning more about Linux kernel self-tests and contributing where I can.
I look forward to engaging with you all and gaining a deeper understanding of the topics discussed here.
Could someone please guide me on how to ask questions here?
Where should I post if I have a query?
Looking forward to your advice and connecting with you all.
Thanks,
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- Although all CI tests passed on x86_64 "bpf/vmtest-bpf-next-VM_Test-22
Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64
with gcc", some unexpected "SKIP"s are shown in the log:
#29/1 bpf_tcp_ca/dctcp:SKIP
#29/2 bpf_tcp_ca/cubic:OK
#29/3 bpf_tcp_ca/invalid_license:OK
#29/4 bpf_tcp_ca/dctcp_fallback:SKIP
#29/5 bpf_tcp_ca/rel_setsockopt:OK
#29/6 bpf_tcp_ca/write_sk_pacing:OK
#29/7 bpf_tcp_ca/incompl_cong_ops:OK
#29/8 bpf_tcp_ca/unsupp_cong_op:OK
#29/9 bpf_tcp_ca/update_ca:OK
#29/10 bpf_tcp_ca/update_wrong:OK
#29/11 bpf_tcp_ca/mixed_links:OK
#29/12 bpf_tcp_ca/multi_links:OK
#29/13 bpf_tcp_ca/link_replace:OK
#29/14 bpf_tcp_ca/tcp_ca_kfunc:OK
#29/15 bpf_tcp_ca/cc_cubic:OK
#29/16 bpf_tcp_ca/dctcp_autoattach_map:SKIP
#29 bpf_tcp_ca:OK (SKIP: 3/16)
No tests should be skipped on x86_64; this is fixed in v2.
- add a new helper test_progs_get_error.
BPF selftests do not seem to have been fully tested on Loongarch platforms.
Many tests report "ENOTSUPP" (-524) errors when run there because Loongarch
lacks the BPF trampoline.
For these "ENOTSUPP" tests, it's better to skip them, instead of reporting
some "ENOTSUPP" errors. This patchset skips ENOTSUPP in ASSERT_OK/
ASSERT_OK_PTR/ASSERT_GE helpers to fix them. This is useful for running BPF
selftests for other architectures too.
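The skip-instead-of-fail idea described above can be sketched roughly as
follows. This is not the code from the patches: enum outcome and
classify_result() are hypothetical names used only for illustration. The one
detail taken from the text is that ENOTSUPP is 524 and has to be defined
locally, since it is a kernel-internal value that userspace <errno.h> does
not provide.

```c
#include <assert.h>
#include <errno.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524	/* kernel-internal errno, not in userspace <errno.h> */
#endif

/* Hypothetical result classes for this sketch. */
enum outcome {
	OUTCOME_PASS,
	OUTCOME_SKIP,
	OUTCOME_FAIL,
};

/*
 * Map a negative-errno style return value to a test outcome:
 * 0 passes, -ENOTSUPP is reported as a skip, anything else fails.
 */
static enum outcome classify_result(int err)
{
	if (err == 0)
		return OUTCOME_PASS;
	if (err == -ENOTSUPP)
		return OUTCOME_SKIP;
	return OUTCOME_FAIL;
}
```

Keeping the mapping in one place means an assertion helper only has to ask
for the outcome and then either count a skip or report a failure.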
Geliang Tang (6):
selftests/bpf: Define ENOTSUPP in testing_helpers.h
selftests/bpf: Skip ENOTSUPP in ASSERT_OK
selftests/bpf: Use ASSERT_OK to skip ENOTSUPP
selftests/bpf: Null checks for link in bpf_tcp_ca
selftests/bpf: Skip ENOTSUPP in ASSERT_OK_PTR
selftests/bpf: Skip ENOTSUPP in ASSERT_GE
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 20 ++++++-----
.../testing/selftests/bpf/prog_tests/d_path.c | 2 +-
.../selftests/bpf/prog_tests/lsm_cgroup.c | 10 +-----
.../selftests/bpf/prog_tests/module_attach.c | 2 +-
.../selftests/bpf/prog_tests/ringbuf.c | 2 +-
.../selftests/bpf/prog_tests/sock_addr.c | 4 ---
.../selftests/bpf/prog_tests/test_bprm_opts.c | 2 +-
.../selftests/bpf/prog_tests/test_ima.c | 2 +-
.../selftests/bpf/prog_tests/trace_ext.c | 2 +-
tools/testing/selftests/bpf/test_maps.c | 4 ---
tools/testing/selftests/bpf/test_progs.h | 33 +++++++++++++++----
tools/testing/selftests/bpf/test_verifier.c | 4 ---
tools/testing/selftests/bpf/testing_helpers.h | 4 +++
13 files changed, 50 insertions(+), 41 deletions(-)
--
2.43.0
Conform individual tests to TAP output. Each patch conforms one test. With
this series, all vDSO tests become TAP conformant.
The first patch conforms its test by using kselftest_harness.h. The other
patches conform theirs using the default kselftest.h helpers.
All tests have been tested multiple times before and after these
patches. They work correctly and output TAP messages, so failures can be
found quickly when they happen.
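For readers unfamiliar with what "TAP conformant" output looks like, here is
a minimal sketch of the line formats involved (a plan line followed by
"ok"/"not ok" result lines). The two functions below are hypothetical and are
not the kselftest.h or kselftest_harness.h helpers; the real tests use those
helpers instead of formatting lines by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: write the plan line, e.g. "1..4". */
static int tap_format_plan(char *buf, size_t len, unsigned int count)
{
	return snprintf(buf, len, "1..%u\n", count);
}

/*
 * Hypothetical formatter: write one result line, e.g.
 * "ok 1 clock_getres" or "not ok 2 getcpu".
 */
static int tap_format_result(char *buf, size_t len, int passed,
			     unsigned int num, const char *name)
{
	assert(name != NULL);
	return snprintf(buf, len, "%s %u %s\n",
			passed ? "ok" : "not ok", num, name);
}
```

Because every result is a single self-describing line, a CI system can find
the failing test with a simple search for "not ok".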
---
Changes since v1:
- Update cover letter
- Update commit message of first patch
Muhammad Usama Anjum (4):
kselftests: vdso: vdso_test_clock_getres: conform test to TAP output
kselftests: vdso: vdso_test_correctness: conform test to TAP output
kselftests: vdso: vdso_test_getcpu: conform test to TAP output
kselftests: vdso: vdso_test_gettimeofday: conform test to TAP output
.../selftests/vDSO/vdso_test_clock_getres.c | 68 ++++----
.../selftests/vDSO/vdso_test_correctness.c | 146 +++++++++---------
.../testing/selftests/vDSO/vdso_test_getcpu.c | 16 +-
.../selftests/vDSO/vdso_test_gettimeofday.c | 23 +--
4 files changed, 126 insertions(+), 127 deletions(-)
--
2.39.2
Changes since v3:
1) Rebased onto Linux 6.10-rc6+.
2) Added Muhammad's acks for the series.
Cover letter for v3:
Hi,
Dave Hansen, Muhammad Usama Anjum, here is the combined series that we
discussed yesterday [1].
As I mentioned then, this is a bit intrusive--but no more than
necessary, IMHO. Specifically, it moves some clang-un-inlineable things
out to "pure" assembly code files.
I've tested this by building with clang, then running each binary on my
x86_64 test system with today's 6.10-rc1, and comparing the console and
dmesg output to a gcc-based build without these patches applied. Aside
from timestamps and virtual addresses, it looks identical.
Earlier cover letter:
Just a bunch of build and warnings fixes that show up when building with
clang. Some of these depend on each other, so I'm sending them as a
series.
Changes since v2:
1) Dropped my test_FISTTP.c patch, and picked up Muhammad's fix instead,
seeing as how that was posted first.
2) Updated patch descriptions to reflect that Valentin Obst's build fix
for LLVM [1] has already been merged into Linux main.
3) Minor wording and typo corrections in the commit logs throughout.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
Enjoy!
[1] https://lore.kernel.org/44428518-4d21-4de7-8587-04eceefb330d@nvidia.com
thanks,
John Hubbard
John Hubbard (6):
selftests/x86: fix Makefile dependencies to work with clang
selftests/x86: build fsgsbase_restore.c with clang
selftests/x86: build sysret_rip.c with clang
selftests/x86: avoid -no-pie warnings from clang during compilation
selftests/x86: remove (or use) unused variables and functions
selftests/x86: fix printk warnings reported by clang
Muhammad Usama Anjum (1):
selftests: x86: test_FISTTP: use fisttps instead of ambiguous fisttp
tools/testing/selftests/x86/Makefile | 31 +++++++++++++++----
tools/testing/selftests/x86/amx.c | 16 ----------
.../testing/selftests/x86/clang_helpers_32.S | 11 +++++++
.../testing/selftests/x86/clang_helpers_64.S | 28 +++++++++++++++++
tools/testing/selftests/x86/fsgsbase.c | 6 ----
.../testing/selftests/x86/fsgsbase_restore.c | 11 +++----
tools/testing/selftests/x86/sigreturn.c | 2 +-
.../testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/sysret_rip.c | 20 ++++--------
tools/testing/selftests/x86/test_FISTTP.c | 8 ++---
tools/testing/selftests/x86/test_vsyscall.c | 15 +++------
tools/testing/selftests/x86/vdso_restorer.c | 2 ++
12 files changed, 87 insertions(+), 64 deletions(-)
create mode 100644 tools/testing/selftests/x86/clang_helpers_32.S
create mode 100644 tools/testing/selftests/x86/clang_helpers_64.S
base-commit: 795c58e4c7fc6163d8fb9f2baa86cfe898fa4b19
--
2.45.2
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cc'ed everyone on all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they all build
upon each other. Or the DT part can be merged through the DT tree to
reduce the dependencies.
Changes from v5: https://lore.kernel.org/r/20240603223811.3815762-1-sboyd@kernel.org
* Pick up reviewed-by tags
* Drop test vendor prefix bindings as dtschema allows anything now
* Use of_node_put_kunit() more to plug some reference leaks
* Select DTC config to avoid compile fails because of missing dtc
* Don't skip for OF_OVERLAY in overlay tests because they depend on it
Changes from v4: https://lore.kernel.org/r/20240422232404.213174-1-sboyd@kernel.org
* Picked up reviewed-by tags
* Check for non-NULL device pointers before calling put_device()
* Fix CFI issues with kunit actions
* Introduce platform_device_prepare_wait_for_probe() helper to wait for
a platform device to probe
* Move platform code to lib/kunit and rename functions to have kunit
prefix
* Fix issue with platform wrappers messing up reference counting
because they used kunit actions
* New patch to populate overlay devices on root node for powerpc
* Make fixed-rate binding generic single clk consumer binding
Changes from v3: https://lore.kernel.org/r/20230327222159.3509818-1-sboyd@kernel.org
* No longer depend on Frank's series[1] because it was merged upstream[2]
* Use kunit_add_action_or_reset() to shorten code
* Skip tests properly when CONFIG_OF_OVERLAY isn't set
Changes from v2: https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1: https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
[2] https://lore.kernel.org/r/20240308195737.GA1174908-robh@kernel.org
Stephen Boyd (8):
of/platform: Allow overlays to create platform devices from the root
node
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 21 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/clk/.kunitconfig | 2 +
drivers/clk/Kconfig | 11 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 379 +++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit_helpers.c | 204 ++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 453 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 1 +
drivers/of/Kconfig | 10 +
drivers/of/Makefile | 2 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit_helpers.c | 74 +++
drivers/of/overlay_test.c | 114 +++++
drivers/of/platform.c | 9 +-
include/kunit/clk.h | 28 ++
include/kunit/of.h | 115 +++++
include/kunit/platform_device.h | 20 +
lib/kunit/Makefile | 4 +-
lib/kunit/platform-test.c | 223 +++++++++
lib/kunit/platform.c | 302 ++++++++++++
28 files changed, 2084 insertions(+), 6 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit_helpers.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit_helpers.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
create mode 100644 lib/kunit/platform-test.c
create mode 100644 lib/kunit/platform.c
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
Hi Linus,
Please pull this kselftest fixes update for Linux 6.10.
This kselftest fixes update for Linux 6.10 consists of fixes for clang
build failures in the timens and vDSO tests, and fixes to the vDSO Makefile.
Note: makefile fixes are included to avoid conflicts during 6.11 merge
window.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 48236960c06d32370bfa6f2cc408e786873262c8:
selftests/resctrl: Fix non-contiguous CBM for AMD (2024-06-26 13:22:34 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.10
for you to fetch changes up to 66cde337fa1b7c6cf31f856fa015bd91a4d383e7:
selftests/vDSO: remove duplicate compiler invocations from Makefile (2024-07-05 14:12:34 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.10
This kselftest fixes update for Linux 6.10 consists of fixes to clang
build failures to timerns, vDSO tests and fixes to vDSO makefile.
----------------------------------------------------------------
John Hubbard (4):
selftest/timerns: fix clang build failures for abs() calls
selftests/vDSO: fix clang build errors and warnings
selftests/vDSO: remove partially duplicated "all:" target in Makefile
selftests/vDSO: remove duplicate compiler invocations from Makefile
tools/testing/selftests/timens/exec.c | 6 ++---
tools/testing/selftests/timens/timer.c | 2 +-
tools/testing/selftests/timens/timerfd.c | 2 +-
tools/testing/selftests/timens/vfork_exec.c | 4 +--
tools/testing/selftests/vDSO/Makefile | 29 +++++++++-------------
tools/testing/selftests/vDSO/parse_vdso.c | 16 ++++++++----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++++--
7 files changed, 46 insertions(+), 31 deletions(-)
----------------------------------------------------------------
For cgroup v1, if RT_GROUP_SCHED is turned on, every cgroup in the "cpu"
hierarchy needs an RT budget assigned; otherwise the processes in it will not
be able to get RT at all. The problem with RT group scheduling is that it
requires the budget to be assigned, but there's no way we could assign a
default budget, since the values to assign are both upper and lower time
limits, are absolute, and need to sum to < 1 for each individual cgroup. That
means we cannot really come up with values that would work by default in the
general case.[1]
For cgroup v2, it's almost unusable as well. If it is turned on, the cpu
controller can only be enabled when all RT processes are in the root cgroup.
But the benefits of cgroup v2 are lost if all RT processes are placed in the
same cgroup.
Red Hat, Gentoo, Arch Linux and Debian all disable it. systemd also doesn't
support it.[2]
I leave tools/testing/selftests/bpf/config.{s390x,aarch64} untouched because
I don't know whether bpf testing requires it.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
[2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383
Celeste Liu (6):
riscv: defconfig: drop RT_GROUP_SCHED=y
loongarch: defconfig: drop RT_GROUP_SCHED=y
mips: defconfig: drop RT_GROUP_SCHED=y from generic/db1xxx/eyeq5
powerpc: defconfig: drop RT_GROUP_SCHED=y from ppc6xx_defconfig
sh: defconfig: drop RT_GROUP_SCHED=y from sdk7786/urquell
arm: defconfig: drop RT_GROUP_SCHED=y from bcm2835/tegra/omap2plus
arch/arm/configs/bcm2835_defconfig | 1 -
arch/arm/configs/omap2plus_defconfig | 1 -
arch/arm/configs/tegra_defconfig | 1 -
arch/loongarch/configs/loongson3_defconfig | 1 -
arch/mips/configs/db1xxx_defconfig | 1 -
arch/mips/configs/eyeq5_defconfig | 1 -
arch/mips/configs/generic_defconfig | 1 -
arch/powerpc/configs/ppc6xx_defconfig | 1 -
arch/riscv/configs/defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
11 files changed, 11 deletions(-)
--
2.45.1
The randomize() function opens /dev/urandom but never closes it, which
leaks the file descriptor.
Signed-off-by: Liu Jing <liujing(a)cmss.chinamobile.com>
---
tools/testing/selftests/net/tcp_mmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index 4fcce5150850..ab305e262d0a 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -438,6 +438,7 @@ static void randomize(void *target, size_t count)
perror("read /dev/urandom");
exit(1);
}
+ close(urandom);
}
int main(int argc, char *argv[])
--
2.33.0
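For context, the open/read/close pattern that the patch above completes can
be sketched as a standalone function. This is an illustrative rewrite, not
the tcp_mmap.c code: fill_random() is a hypothetical name, and unlike the
selftest it returns an error instead of calling exit().

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Fill 'target' with 'count' bytes from /dev/urandom.
 * Returns 0 on success and -1 on error. The descriptor is closed on
 * every path after a successful open(), so repeated calls do not leak.
 */
static int fill_random(void *target, size_t count)
{
	unsigned char *p = target;
	int ret = 0;
	int fd;

	assert(target != NULL);
	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return -1;

	while (count > 0) {
		ssize_t n = read(fd, p, count);

		if (n <= 0) {
			ret = -1;
			break;
		}
		p += n;
		count -= (size_t)n;
	}

	close(fd);
	return ret;
}
```

Calling such a function in a loop is a quick way to notice a missing close():
with the leak, open() starts failing once the per-process descriptor limit is
reached.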
+ Dave Miller, Jakub Kicinski, Paolo Abeni, Shuah Khan,
linux-kselftest(a)vger.kernel.org
On Mon, Jul 08, 2024 at 09:04:05PM +0000, zijianzhang(a)bytedance.com wrote:
> From: Zijian Zhang <zijianzhang(a)bytedance.com>
>
> We update selftests/net/msg_zerocopy.c to accommodate the new mechanism,
> cfg_notification_limit has the same semantics for both methods. Test
> results are as follows, we update skb_orphan_frags_rx to the same as
> skb_orphan_frags to support zerocopy in the localhost test.
>
> cfg_notification_limit = 1, both method get notifications after 1 calling
> of sendmsg. In this case, the new method has around 17% cpu savings in TCP
> and 23% cpu savings in UDP.
> +---------------------+---------+---------+---------+---------+
> | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 |
> +---------------------+---------+---------+---------+---------+
> | ZCopy (MB) | 7523 | 7706 | 7489 | 7304 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy (MB) | 8834 | 8993 | 9053 | 9228 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy / ZCopy | 117.42% | 116.70% | 120.88% | 126.34% |
> +---------------------+---------+---------+---------+---------+
>
> cfg_notification_limit = 32, both get notifications after 32 calling of
> sendmsg, which means more chances to coalesce notifications, and less
> overhead of poll + recvmsg for the original method. In this case, the new
> method has around 7% cpu savings in TCP and slightly better cpu usage in
> UDP. In the context of selftest, notifications of TCP are more likely to
> out of order than UDP, it's easier to coalesce more notifications in UDP.
> The original method can get one notification with range of 32 in a recvmsg
> most of the time. In TCP, most notifications' range is around 2, so the
> original method needs around 16 recvmsgs to get notified in one round.
> That's the reason for the "New ZCopy / ZCopy" diff in TCP and UDP here.
> +---------------------+---------+---------+---------+---------+
> | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 |
> +---------------------+---------+---------+---------+---------+
> | ZCopy (MB) | 8842 | 8735 | 10072 | 9380 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy (MB) | 9366 | 9477 | 10108 | 9385 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy / ZCopy | 106.00% | 108.28% | 100.31% | 100.01% |
> +---------------------+---------+---------+---------+---------+
>
> In conclusion, when notification interval is small or notifications are
> hard to be coalesced, the new mechanism is highly recommended. Otherwise,
> the performance gain from the new mechanism is very limited.
>
> Signed-off-by: Zijian Zhang <zijianzhang(a)bytedance.com>
> Signed-off-by: Xiaochun Lu <xiaochun.lu(a)bytedance.com>
> ---
> tools/testing/selftests/net/msg_zerocopy.c | 111 ++++++++++++++++++--
> tools/testing/selftests/net/msg_zerocopy.sh | 1 +
> 2 files changed, 105 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
...
> @@ -466,6 +504,44 @@ static void do_recv_completions(int fd, int domain)
> sends_since_notify = 0;
> }
>
> +static void do_recv_completions2(void)
> +{
> + struct cmsghdr *cm = (struct cmsghdr *)zc_ckbuf;
> + struct zc_info *zc_info;
> + __u32 hi, lo, range;
> + __u8 zerocopy;
> + int i;
> +
> + zc_info = (struct zc_info *)CMSG_DATA(cm);
> + for (i = 0; i < zc_info->size; i++) {
> + hi = zc_info->arr[i].hi;
> + lo = zc_info->arr[i].lo;
> + zerocopy = zc_info->arr[i].zerocopy;
> + range = hi - lo + 1;
> +
> + if (cfg_verbose && lo != next_completion)
> + fprintf(stderr, "gap: %u..%u does not append to %u\n",
> + lo, hi, next_completion);
> + next_completion = hi + 1;
> +
> + if (zerocopied == -1)
> + zerocopied = zerocopy;
> + else if (zerocopied != zerocopy) {
> + fprintf(stderr, "serr: inconsistent\n");
> + zerocopied = zerocopy;
> + }
nit: If any arms of a conditional have {}, then all arms should have them
> +
> + completions += range;
> +
> + if (cfg_verbose >= 2)
> + fprintf(stderr, "completed: %u (h=%u l=%u)\n",
> + range, hi, lo);
> + }
> +
> + sends_since_notify = 0;
> + added_zcopy_info = false;
> +}
...
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Running the BPF selftest (./test_progs -t sockmap_basic) on a Loongarch
platform triggers a kernel panic:
'''
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
... ...
ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 0000000c (PPLV0 +PIE +PWE)
EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
BADV: 0000000000000040
PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160
Code: ...
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
.text @ 0x9000000003710000
.data @ 0x9000000004d70000
.bss @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
'''
This crash happens every time the sockmap_skb_verdict_shutdown subtest in
sockmap_basic is run.
The crash occurs because a NULL pointer is passed to page_address() in
sk_msg_recvmsg(). Because the implementation differs between
architectures, page_address(NULL) triggers a panic on the Loongarch
platform but not on the X86 platform. So this bug was hidden on X86
for a while, but is now exposed on Loongarch.
The root cause is a zero length skb (skb->len == 0) is put on the queue.
This zero length skb is a TCP FIN packet, which is sent by shutdown(),
invoked in test_sockmap_skb_verdict_shutdown():
shutdown(p1, SHUT_WR);
In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page()), but this empty
sge is queued into the ingress_msg list.
And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is returned
by sg_page(sge). This NULL page is passed to copy_page_to_iter(), which passes
it to kmap_local_page() and to page_address(), and then the kernel panics.
To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
if copy is zero, which means it's a zero length skb, skip invoking
copy_page_to_iter(). We are using the EFAULT return triggered by
copy_page_to_iter to check for is_fin in tcp_bpf.c.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <john.fastabend(a)gmail.com>
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
v5:
- update v5 as John suggested.
- skmsg: skip zero length skb in sk_msg_recvmsg
v4:
- skmsg: skip empty sge in sk_msg_recvmsg
v3:
- skmsg: prevent empty ingress skb from enqueuing
v2:
- skmsg: null check for sg_page in sk_msg_recvmsg
---
net/core/skmsg.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..bbf40b999713 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
page = sg_page(sge);
if (copied + copy > len)
copy = len - copied;
- copy = copy_page_to_iter(page, sge->offset, copy, iter);
+ if (copy)
+ copy = copy_page_to_iter(page, sge->offset, copy, iter);
if (!copy) {
copied = copied ? copied : -EFAULT;
goto out;
--
2.43.0
Changes v3:
- Reworked patch 2.
- Changed minor things in patch 1 like function name and made
corrections to the patch message.
Changes v2:
- Removed patches 2 and 3 since now this part will be supported by the
kernel.
Sub-NUMA Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
SNC support in the kernel is currently in review [1]. With SNC enabled
and kernel support in place all the tests will function normally (aside
from effective cache size). There might be a problem when SNC is enabled
but the system is still using an older kernel version without SNC
support. Currently the only message displayed in that situation is a
guess that SNC might be enabled and is causing issues. That message also
is displayed whenever the test fails on an Intel platform.
Add a mechanism to discover kernel support for SNC which will add more
meaning and certainty to the error message.
Add runtime SNC mode detection and verify how reliable that information
is.
The series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also run with and without kernel support for
SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240628215619.76401-1-tony.luck@intel.com/
Previous versions of this series:
[v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@inte…
[v2] https://lore.kernel.org/all/cover.1715769576.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (2):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cache.c | 3 +
tools/testing/selftests/resctrl/cmt_test.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 4 +
tools/testing/selftests/resctrl/mbm_test.c | 6 +-
tools/testing/selftests/resctrl/resctrl.h | 8 +
.../testing/selftests/resctrl/resctrl_tests.c | 7 +
tools/testing/selftests/resctrl/resctrlfs.c | 138 ++++++++++++++++++
7 files changed, 166 insertions(+), 4 deletions(-)
--
2.45.2
Even if a vgem device is configured in, we will skip the import_vgem_fd()
test almost every time.
TAP version 13
1..11
# Testing heap: system
# =======================================
# Testing allocation and importing:
ok 1 # SKIP Could not open vgem -1
The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver
version information but leave the name field a non-null-terminated string.
Terminate it properly to actually test against the vgem device.
Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
---
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
index 5f541522364f..2fcc74998fa9 100644
--- a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
+++ b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
@@ -32,6 +32,8 @@ static int check_vgem(int fd)
if (ret)
return 0;
+ name[4] = '\0';
+
return !strcmp(name, "vgem");
}
--
2.33.0
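The failure mode fixed by the patch above can be reproduced without a DRM
device. The sketch below assumes, as the patch description says, that the
query fills in the name bytes without a terminating NUL; fake_query_name()
is a hypothetical stand-in for the ioctl and is not a real API.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 4

/* Stand-in for the ioctl: copies exactly NAME_LEN bytes, no terminator. */
static void fake_query_name(char *dst)
{
	memcpy(dst, "vgem", NAME_LEN);
}

/* Returns 1 if the (fake) driver name is "vgem", 0 otherwise. */
static int is_vgem(void)
{
	char name[NAME_LEN + 1];

	/* Simulate leftover stack contents in the buffer. */
	memset(name, 'X', sizeof(name));
	fake_query_name(name);

	/*
	 * Without this line the buffer holds "vgemX" with no NUL inside
	 * it, so strcmp() would read past the array and not match "vgem".
	 */
	name[NAME_LEN] = '\0';

	return strcmp(name, "vgem") == 0;
}
```

An alternative would be to compare with memcmp() over NAME_LEN bytes, but
terminating the buffer keeps the existing strcmp() call unchanged.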
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v10:
- a new patch 10 is added.
- patches 1-6, 8-9 unchanged, only commit logs updated.
- "err = -errno" is used in patches 7, 11, 12 to get the real error
number before checking value of "err".
v9:
- new patches 5-7, new struct member expect_errno for network_helper_opts.
- patches 1-4, 8-9 unchanged.
- update patches 10-11 to make sure all tests pass.
v8:
- only patch 8 updated, to fix errors reported by CI.
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead of connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drops three local helpers tcp_server(), inetaddr_len() and make_socket()
in them.
Geliang Tang (12):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd_opts in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Add network_helper_opts for connect_fd_to_fd
selftests/bpf: Add expect_errno for network_helper_opts
selftests/bpf: Set expect_errno for cgroup_skb_sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 23 ++-
tools/testing/selftests/bpf/network_helpers.h | 8 +-
.../testing/selftests/bpf/prog_tests/bpf_nf.c | 5 +-
.../bpf/prog_tests/cgroup_skb_sk_lookup.c | 8 +-
.../selftests/bpf/prog_tests/cgroup_tcp_skb.c | 4 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 1 +
.../selftests/bpf/prog_tests/sk_lookup.c | 162 +++++++-----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 53 ++----
8 files changed, 108 insertions(+), 156 deletions(-)
--
2.43.0
I realized this while having a map containing both a struct bpf_timer and
a struct bpf_wq: the third argument provided to the bpf_wq callback is
not the struct bpf_wq pointer itself, but the pointer to the value in
the map.
This means that users need to double cast the provided "value", as
it is not a struct bpf_wq *.
This is a change of API, but there don't seem to be many users of bpf_wq
right now, so we should be able to go with this right now.
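To make the pointer mix-up concrete, here is a plain C analogy of the
situation described above. It does not use any BPF API: struct fake_timer,
struct fake_wq, struct elem and wq_callback() are all hypothetical stand-ins.
The point is only that a pointer to the map value is not a pointer to the
work queue member when that member is not the first field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_timer and struct bpf_wq. */
struct fake_timer { long opaque[2]; };
struct fake_wq { long opaque[2]; };

/* A map value holding both, with the work queue at a non-zero offset. */
struct elem {
	struct fake_timer t;
	struct fake_wq w;
	int hits;
};

/*
 * Callback shaped like the one in the text: the third argument is the
 * map value, so it is converted to the element type first and the work
 * queue is reached through the member.
 * Returns 1 when the value pointer and the wq pointer differ.
 */
static int wq_callback(void *map, int *key, void *value)
{
	struct elem *e = value;
	struct fake_wq *wq = &e->w;

	(void)map;
	(void)key;
	e->hits++;
	return (void *)wq != value;
}
```

Typing the third parameter as the value pointer makes this explicit in the
signature, so callers are not tempted to treat it as the work queue itself.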
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- amended the selftests to retrieve something from the third argument of
the callback
- Link to v1: https://lore.kernel.org/r/20240705-fix-wq-v1-0-91b4d82cd825@kernel.org
---
Benjamin Tissoires (2):
bpf: helpers: fix bpf_wq_set_callback_impl signature
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
kernel/bpf/helpers.c | 2 +-
tools/testing/selftests/bpf/bpf_experimental.h | 2 +-
tools/testing/selftests/bpf/progs/wq.c | 19 ++++++++++++++-----
tools/testing/selftests/bpf/progs/wq_failures.c | 4 ++--
4 files changed, 18 insertions(+), 9 deletions(-)
---
base-commit: fd8db07705c55a995c42b1e71afc42faad675b0b
change-id: 20240705-fix-wq-f069c7fb36c3
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
BPF selftests do not seem to have been fully tested on LoongArch platforms.
Running them there produces many "ENOTSUPP" (-524) errors, because
LoongArch lacks a BPF trampoline.
For these "ENOTSUPP" tests, it's better to skip them instead of reporting
"ENOTSUPP" errors. This patchset skips ENOTSUPP in the ASSERT_OK/
ASSERT_OK_PTR/ASSERT_GE helpers to fix them. This is useful for running BPF
selftests on other architectures too.
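As a rough illustration of the skip-on-ENOTSUPP idea, here is a minimal
standalone sketch. It is not the ASSERT_OK macro from test_progs.h; the
check_ok() function and the check_result enum are hypothetical names, and the
only detail taken from the text above is that the error value is -524.

```c
#include <assert.h>
#include <stdio.h>

/* ENOTSUPP is kernel-internal and not provided by libc's errno.h. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

enum check_result { CHECK_PASS, CHECK_FAIL, CHECK_SKIP };

/*
 * Hypothetical helper: classify an error code the way a skip-aware
 * assertion might. 0 passes, -ENOTSUPP is reported as a skip, and any
 * other value is a failure.
 */
static enum check_result check_ok(int err, const char *name)
{
	assert(name != NULL);

	if (err == 0)
		return CHECK_PASS;
	if (err == -ENOTSUPP) {
		fprintf(stderr, "%s: SKIP (ENOTSUPP)\n", name);
		return CHECK_SKIP;
	}
	fprintf(stderr, "%s: FAIL (err %d)\n", name, err);
	return CHECK_FAIL;
}
```

A caller would then count CHECK_SKIP results separately from CHECK_FAIL, so an
unsupported feature does not show up as a test failure.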
Geliang Tang (6):
selftests/bpf: Define ENOTSUPP in testing_helpers.h
selftests/bpf: Skip ENOTSUPP in ASSERT_OK
selftests/bpf: Use ASSERT_OK to skip ENOTSUPP
selftests/bpf: Null checks for link in bpf_tcp_ca
selftests/bpf: Skip ENOTSUPP in ASSERT_OK_PTR
selftests/bpf: Skip ENOTSUPP in ASSERT_GE
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 20 +++++++++-------
.../testing/selftests/bpf/prog_tests/d_path.c | 2 +-
.../selftests/bpf/prog_tests/lsm_cgroup.c | 10 +-------
.../selftests/bpf/prog_tests/module_attach.c | 2 +-
.../selftests/bpf/prog_tests/ringbuf.c | 2 +-
.../selftests/bpf/prog_tests/sock_addr.c | 4 ----
.../selftests/bpf/prog_tests/test_bprm_opts.c | 2 +-
.../selftests/bpf/prog_tests/test_ima.c | 2 +-
.../selftests/bpf/prog_tests/trace_ext.c | 2 +-
tools/testing/selftests/bpf/test_maps.c | 4 ----
tools/testing/selftests/bpf/test_progs.h | 24 ++++++++++++++-----
tools/testing/selftests/bpf/test_verifier.c | 4 ----
tools/testing/selftests/bpf/testing_helpers.h | 4 ++++
13 files changed, 41 insertions(+), 41 deletions(-)
--
2.43.0
I realized this while having a map containing both a struct bpf_timer and
a struct bpf_wq: the third argument provided to the bpf_wq callback is
not the struct bpf_wq pointer itself, but the pointer to the value in
the map.
This means that users need to double cast the provided "value", as
it is not a struct bpf_wq *.
This is an API change, but there don't seem to be many users of bpf_wq
at the moment, so we should be able to make it now.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Benjamin Tissoires (2):
bpf: helpers: fix bpf_wq_set_callback_impl signature
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
kernel/bpf/helpers.c | 2 +-
tools/testing/selftests/bpf/bpf_experimental.h | 2 +-
tools/testing/selftests/bpf/progs/wq.c | 8 ++++----
tools/testing/selftests/bpf/progs/wq_failures.c | 4 ++--
4 files changed, 8 insertions(+), 8 deletions(-)
---
base-commit: fd8db07705c55a995c42b1e71afc42faad675b0b
change-id: 20240705-fix-wq-f069c7fb36c3
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v9:
- new patches 5-7, new struct member expect_errno for network_helper_opts.
- patches 1-4, 8-9 unchanged.
- update patches 10-11 to make sure all tests pass.
v8:
- only patch 8 updated, to fix errors reported by CI.
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drops three local helpers, tcp_server(), inetaddr_len() and make_socket(),
from them.
Geliang Tang (11):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd_opts in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Add network_helper_opts for connect_fd_to_fd
selftests/bpf: Add expect_errno for network_helper_opts
selftests/bpf: Set expect_errno for cgroup_skb_sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 23 ++-
tools/testing/selftests/bpf/network_helpers.h | 8 +-
.../testing/selftests/bpf/prog_tests/bpf_nf.c | 5 +-
.../bpf/prog_tests/cgroup_skb_sk_lookup.c | 8 +-
.../selftests/bpf/prog_tests/cgroup_tcp_skb.c | 4 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 1 +
.../selftests/bpf/prog_tests/sk_lookup.c | 152 +++++++-----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 53 ++----
8 files changed, 106 insertions(+), 148 deletions(-)
--
2.43.0
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.
A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.
** Problem **
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.
Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.
Looking at available infrastructure, an obvious candidate would be
to use psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.
** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.
The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
--
v8 -> v9:
- Rebased.
v7 -> v8:
- Rebased
- Redirect flow insertion to /dev/null to avoid spat in test.
- Removed inline keyword in stub execute_psample_action function.
v6 -> v7:
- Rebased
- Fixed typo in comment.
v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditional compilation of function.
v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in selftests.
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.
v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
"sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.
Adrian Moreno (10):
net: psample: add user cookie
net: sched: act_sample: add action cookie to sample
net: psample: skip packet copy if no listeners
net: psample: allow using rate as probability
net: openvswitch: add psample action
net: openvswitch: store sampling probability in cb.
selftests: openvswitch: add psample action
selftests: openvswitch: add userspace parsing
selftests: openvswitch: parse trunc action
selftests: openvswitch: add psample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++
include/net/psample.h | 5 +-
include/uapi/linux/openvswitch.h | 31 +-
include/uapi/linux/psample.h | 11 +-
net/openvswitch/Kconfig | 1 +
net/openvswitch/actions.c | 66 ++++-
net/openvswitch/datapath.h | 3 +
net/openvswitch/flow_netlink.c | 32 ++-
net/openvswitch/vport.c | 1 +
net/psample/psample.c | 16 +-
net/sched/act_sample.c | 12 +
.../selftests/net/openvswitch/openvswitch.sh | 115 +++++++-
.../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++-
13 files changed, 566 insertions(+), 16 deletions(-)
--
2.45.2
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v8:
- only patch 8 updated, to fix errors reported by CI.
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drops three local helpers, tcp_server(), inetaddr_len() and make_socket(),
from them.
Geliang Tang (9):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd_opts in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_to_fd_opts in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 2 +-
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/sk_lookup.c | 152 +++++++-----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 53 ++----
4 files changed, 76 insertions(+), 135 deletions(-)
--
2.43.0
Hi Shuah,
These are for 6.10, as we just discussed.
Changes since v4:
1) Subject line on patch #2/3: s/mm/vDSO/
2) Added Muhammad's review tag.
Changes since v3:
1. Rebased onto Linux 6.10-rc6+.
Cover letter for v3:
Jason A. Donenfeld, I've added you because I ended up looking through
your latest "implement getrandom() in vDSO" series [1], which also
touches this Makefile, so just a heads up about upcoming (minor) merge
conflicts.
Changes since v2:
1. Added two patches, both of which apply solely to the Makefile.
These provide a smaller, cleaner, and more accurate Makefile.
2. Added Reviewed-by and Tested-by tags for the original patch, which
fixes all of the clang errors and warnings for this selftest.
3. Removed an obsolete blurb from the commit description of the original
patch, now that Valentin Obst's LLVM build fix has been merged.
thanks,
John Hubbard
NVIDIA
John Hubbard (3):
selftests/vDSO: fix clang build errors and warnings
selftests/vDSO: remove partially duplicated "all:" target in Makefile
selftests/vDSO: remove duplicate compiler invocations from Makefile
tools/testing/selftests/vDSO/Makefile | 29 ++++++++-----------
tools/testing/selftests/vDSO/parse_vdso.c | 16 ++++++----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++--
3 files changed, 39 insertions(+), 24 deletions(-)
base-commit: d270dd21bee023ab627f34cfb77a9b89a688492a
--
2.40.1
Hi,
This is basically a resend, with a rebase onto today's latest Linux
main, in order to show that the patches are still relevant and correct.
Changes since v3:
1. Rebased onto Linux 6.10-rc6+.
Cover letter for v3:
Jason A. Donenfeld, I've added you because I ended up looking through
your latest "implement getrandom() in vDSO" series [1], which also
touches this Makefile, so just a heads up about upcoming (minor) merge
conflicts.
Changes since v2:
1. Added two patches, both of which apply solely to the Makefile.
These provide a smaller, cleaner, and more accurate Makefile.
2. Added Reviewed-by and Tested-by tags for the original patch, which
fixes all of the clang errors and warnings for this selftest.
3. Removed an obsolete blurb from the commit description of the original
patch, now that Valentin Obst's LLVM build fix has been merged.
John Hubbard (3):
selftests/vDSO: fix clang build errors and warnings
selftests/mm: remove partially duplicated "all:" target in Makefile
selftests/vDSO: remove duplicate compiler invocations from Makefile
tools/testing/selftests/vDSO/Makefile | 29 ++++++++-----------
tools/testing/selftests/vDSO/parse_vdso.c | 16 ++++++----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++--
3 files changed, 39 insertions(+), 24 deletions(-)
base-commit: 8a9c6c40432e265600232b864f97d7c675e8be52
--
2.45.2
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang warns about an unused irqcount variable. clang is correct: the
variable is incremented and then ignored.
Fix this by deleting the irqcount variable.
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since v2:
1) Rebased onto Linux 6.10-rc6+
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
thanks,
John Hubbard
tools/testing/selftests/timers/rtcpie.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/timers/rtcpie.c b/tools/testing/selftests/timers/rtcpie.c
index 4ef2184f1558..7c07edd0d450 100644
--- a/tools/testing/selftests/timers/rtcpie.c
+++ b/tools/testing/selftests/timers/rtcpie.c
@@ -29,7 +29,7 @@ static const char default_rtc[] = "/dev/rtc0";
int main(int argc, char **argv)
{
- int i, fd, retval, irqcount = 0;
+ int i, fd, retval;
unsigned long tmp, data, old_pie_rate;
const char *rtc = default_rtc;
struct timeval start, end, diff;
@@ -120,7 +120,6 @@ int main(int argc, char **argv)
fprintf(stderr, " %d",i);
fflush(stderr);
- irqcount++;
}
/* Disable periodic interrupts */
base-commit: 8a9c6c40432e265600232b864f97d7c675e8be52
--
2.45.2
These patches aim to make using the openvswitch testsuite more reliable.
These should address the major sources of flakiness in the openvswitch
test suite allowing the CI infrastructure to exercise the openvswitch
module for patch series. There should be no change for users who simply
run the tests (except that patch 3/3 does make some of the debugging a bit
easier by making some output more verbose).
Aaron Conole (3):
selftests: openvswitch: Bump timeout to 15 minutes.
selftests: openvswitch: Attempt to autoload module.
selftests: openvswitch: Be more verbose with selftest debugging.
.../selftests/net/openvswitch/openvswitch.sh | 23 ++++++++++++-------
.../selftests/net/openvswitch/settings | 1 +
2 files changed, 16 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/net/openvswitch/settings
--
2.45.1
On Fri, Jul 05, 2024 at 11:49:28AM GMT, Adrián Moreno wrote:
> On Thu, Jul 04, 2024 at 10:56:51AM GMT, Adrian Moreno wrote:
> > ** Background **
> > Currently, OVS supports several packet sampling mechanisms (sFlow,
> > per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
> > userspace action that needs to be handled by ovs-vswitchd's handler
> > threads only to be forwarded to some third party application that
> > will somehow process the sample and provide observability on the
> > datapath.
> >
> > A particularly interesting use-case is controller-driven
> > per-flow IPFIX sampling where the OpenFlow controller can add metadata
> > to samples (via two 32bit integers) and this metadata is then available
> > to the sample-collecting system for correlation.
> >
> > ** Problem **
> > The fact that sampled traffic shares netlink sockets and handler thread
> > time with upcalls, apart from being a performance bottleneck in the
> > sample extraction itself, can severely compromise the datapath,
> > rendering this solution unfit for highly loaded production systems.
> >
> > Users are left with few options other than guessing what sampling
> > rate will be OK for their traffic pattern and system load and dealing
> > with the lost accuracy.
> >
> > Looking at available infrastructure, an obvious candidate would be
> > to use psample. However, its current state does not help with the
> > use-case at stake because sampled packets do not contain user-defined
> > metadata.
> >
> > ** Proposal **
> > This series is an attempt to fix this situation by extending the
> > existing psample infrastructure to carry a variable length
> > user-defined cookie.
> >
> > The main existing user of psample is tc's act_sample. It is also
> > extended to forward the action's cookie to psample.
> >
> > Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created.
> > It accepts a group and an optional cookie and uses psample to
> > multicast the packet and the metadata.
> >
> > --
> > v8 -> v9:
> > - Rebased.
> >
> > v7 -> v8:
> > - Rebased
> > - Redirect flow insertion to /dev/null to avoid spat in test.
> > - Removed inline keyword in stub execute_psample_action function.
> >
> > v6 -> v7:
> > - Rebased
> > - Fixed typo in comment.
> >
> > v5 -> v6:
> > - Renamed emit_sample -> psample
> > - Addressed unused variable and conditional compilation of function.
> >
> > v4 -> v5:
> > - Rebased.
> > - Removed leftover enum value and wrapped some long lines in selftests.
> >
> > v3 -> v4:
> > - Rebased.
> > - Addressed Jakub's comment on private and unused nla attributes.
> >
> > v2 -> v3:
> > - Addressed comments from Simon, Aaron and Ilya.
> > - Dropped probability propagation in nested sample actions.
> > - Dropped patch v2's 7/9 in favor of a userspace implementation and
> > consume skb if emit_sample is the last action, same as we do with
> > userspace.
> > - Split ovs-dpctl.py features in independent patches.
> >
> > v1 -> v2:
> > - Create a new action ("emit_sample") rather than reuse existing
> > "sample" one.
> > - Add probability semantics to psample's sampling rate.
> > - Store sampling probability in skb's cb area and use it in emit_sample.
> > - Test combining "emit_sample" with "trunc"
> > - Drop group_id filtering and tracepoint in psample.
> >
> > rfc_v2 -> v1:
> > - Accommodate Ilya's comments.
> > - Split OVS's attribute in two attributes and simplify internal
> > handling of psample arguments.
> > - Extend psample and tc with a user-defined cookie.
> > - Add a tracepoint to psample to facilitate troubleshooting.
> >
> > rfc_v1 -> rfc_v2:
> > - Use psample instead of a new OVS-only multicast group.
> > - Extend psample and tc with a user-defined cookie.
> >
> > Adrian Moreno (10):
> > net: psample: add user cookie
> > net: sched: act_sample: add action cookie to sample
> > net: psample: skip packet copy if no listeners
> > net: psample: allow using rate as probability
> > net: openvswitch: add psample action
> > net: openvswitch: store sampling probability in cb.
> > selftests: openvswitch: add psample action
> > selftests: openvswitch: add userspace parsing
> > selftests: openvswitch: parse trunc action
> > selftests: openvswitch: add psample test
> >
> > Documentation/netlink/specs/ovs_flow.yaml | 17 ++
> > include/net/psample.h | 5 +-
> > include/uapi/linux/openvswitch.h | 31 +-
> > include/uapi/linux/psample.h | 11 +-
> > net/openvswitch/Kconfig | 1 +
> > net/openvswitch/actions.c | 66 ++++-
> > net/openvswitch/datapath.h | 3 +
> > net/openvswitch/flow_netlink.c | 32 ++-
> > net/openvswitch/vport.c | 1 +
> > net/psample/psample.c | 16 +-
> > net/sched/act_sample.c | 12 +
> > .../selftests/net/openvswitch/openvswitch.sh | 115 +++++++-
> > .../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++-
> > 13 files changed, 566 insertions(+), 16 deletions(-)
> >
> > --
> > 2.45.2
> >
>
> Hi,
>
> Simon Horman has spotted that openvswitch.sh tests are failing in the
> debug executor:
>
> https://netdev.bots.linux.dev/contest.html?test=openvswitch-sh
>
> The failing tests are two: psample and upcall_interfaces. These two
> tests have a known source of instability (they use "sleep") that makes
> them especially unreliable on slow systems.
>
> Aaron and I already discussed this and I'm working on a patch to make
> both tests more robust by adding a wait-and-retry mechanism.
>
> I hope this series can be considered regardless of these flaky tests.
>
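For readers unfamiliar with the wait-and-retry idea mentioned in the quoted
text, a generic sketch follows. It is not the planned openvswitch.sh change
(that test is a shell script); wait_for(), cond_fn and countdown_done() are
hypothetical names used only to show how polling a condition with a bounded
number of attempts replaces a fixed sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

typedef bool (*cond_fn)(void *ctx);

/*
 * Poll cond() up to max_tries times, sleeping delay_ms between attempts.
 * Returns the 1-based attempt on which cond() became true, or 0 if it
 * never did within max_tries attempts.
 */
static int wait_for(cond_fn cond, void *ctx, int max_tries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	assert(cond != NULL && max_tries > 0 && delay_ms >= 0);

	for (int i = 1; i <= max_tries; i++) {
		if (cond(ctx))
			return i;
		if (i < max_tries)
			nanosleep(&ts, NULL);
	}
	return 0;
}

/* Example condition: false until the counter has been drained to zero. */
static bool countdown_done(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

On a fast system the condition is met on an early attempt and the test moves
on immediately; on a slow system it simply takes more attempts instead of
failing after a fixed delay.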
Adding more context to explain our situation.
This series has a counterpart in OVS [1]. The state of this other series
is still RFC just because the kernel bits have not yet been merged.
OVS 3.4 "softfreeze" was declared last Monday, which excludes from the
release any series that is still in RFC state.
Given the kernel parts seemed very close to being merged, an exception was
given to the series so we can consider it for inclusion [2].
I hate to put any pressure on already busy maintainers but I would also
dislike missing this OVS release by just one or two days and having
to wait 6 months (OVS release cadence) for it to be available.
Again, I don't want to put pressure on maintainers. If it's not
possible, that's it. I just wanted to voice our timeline constraints.
Thanks for your understanding.
Adrián
[1] https://patchwork.ozlabs.org/project/openvswitch/cover/20240704085710.35384…
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415261.html
The return value of do_setcpu() is meaningless, so make the function
return void.
Signed-off-by: Liu Jing <liujing(a)cmss.chinamobile.com>
---
tools/testing/selftests/net/msg_zerocopy.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
index bdc03a2097e8..0b54f2011449 100644
--- a/tools/testing/selftests/net/msg_zerocopy.c
+++ b/tools/testing/selftests/net/msg_zerocopy.c
@@ -118,7 +118,7 @@ static uint16_t get_ip_csum(const uint16_t *start, int num_words)
return ~sum;
}
-static int do_setcpu(int cpu)
+static void do_setcpu(int cpu)
{
cpu_set_t mask;
@@ -129,7 +129,6 @@ static int do_setcpu(int cpu)
else if (cfg_verbose)
fprintf(stderr, "cpu: %u\n", cpu);
- return 0;
}
static void do_setsockopt(int fd, int level, int optname, int val)
--
2.33.0
Recent CI failures brought my attention to the fact that pmu_counters_test
sometimes fails because it doesn't get any LLC cache misses.
This apparently happens because CLFLUSH can race with CPU prediction.
To attempt to fix this, implement more aggressive cache flushing: the flush is now
done on each iteration of the measured loop, which should reduce the chance of this
happening by at least an order of magnitude.
This patch survived more than a day of running in a loop on a Comet Lake machine,
where the test used to fail after about 10-20 minutes.
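The flush-on-every-iteration idea can be sketched in userspace C as below.
This is not the code from the patch (which runs inside a KVM guest); the
function names are hypothetical, and the sketch assumes an x86-64 compiler
that provides the SSE2 intrinsics _mm_clflush() and _mm_mfence(). On other
architectures the flush is compiled out so the example still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_clflush(), _mm_mfence() */
/* Evict the cache line holding p and order the flush before later loads. */
static inline void flush_line(const void *p)
{
	_mm_clflush(p);
	_mm_mfence();
}
#else
/* Assumption: no portable equivalent, so this is a no-op elsewhere. */
static inline void flush_line(const void *p)
{
	(void)p;
}
#endif

/*
 * Flush the line before every load, rather than once before the loop,
 * so that each iteration has a chance to miss in the cache.
 */
static uint64_t load_after_flush(volatile uint64_t *line, size_t iterations)
{
	uint64_t sum = 0;

	assert(line != NULL);

	for (size_t i = 0; i < iterations; i++) {
		flush_line((const void *)line);
		sum += *line;
	}
	return sum;
}
```

Flushing once before the loop leaves a single window in which the line can be
pulled back into the cache before the measured access; flushing inside the
loop gives one such window per iteration, so a miss is far more likely to be
observed at least once.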
Best regards,
Maxim Levitsky
Maxim Levitsky (1):
KVM: selftests: pmu_counters_test: increase robustness of LLC cache
misses
.../selftests/kvm/x86_64/pmu_counters_test.c | 20 +++++++++----------
1 file changed, 9 insertions(+), 11 deletions(-)
--
2.26.3
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drops three local helpers, tcp_server(), inetaddr_len() and make_socket(),
from them.
Geliang Tang (9):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd_opts in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_to_fd_opts in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 2 +-
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/sk_lookup.c | 150 ++++++++----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 53 ++-----
4 files changed, 77 insertions(+), 132 deletions(-)
--
2.43.0
Hi Linus,
This PR fixes a few kselftests [1]. This has been in linux-next for a week and
was rebased to add Mark Brown's Tested-by. The race condition found while writing
this fix is not new and seems specific to UML's hostfs (I also tested against
ext4 and btrfs without being able to trigger this issue).
Feel free to take this PR if you see fit.
Regards,
Mickaël
[1] https://lore.kernel.org/r/9341d4db-5e21-418c-bf9e-9ae2da7877e1@sirena.org.uk
--
The following changes since commit f2661062f16b2de5d7b6a5c42a9a5c96326b8454:
Linux 6.10-rc5 (2024-06-23 17:08:54 -0400)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git tags/kselftest-fix-2024-07-04
for you to fetch changes up to 130e42806773013e9cf32d211922c935ae2df86c:
selftests/harness: Fix tests timeout and race condition (2024-06-28 16:06:03 +0200)
----------------------------------------------------------------
Fix Kselftests timeout and race condition
----------------------------------------------------------------
Mickaël Salaün (1):
selftests/harness: Fix tests timeout and race condition
tools/testing/selftests/kselftest_harness.h | 43 ++++++++++++++++-------------
1 file changed, 24 insertions(+), 19 deletions(-)
Current practice in the selftests Makefiles is to use $(LLVM) as a way
to decide if clang is being used as the compiler (and/or the linker
front end). Unfortunately, this does not cover all of the use cases:
1) CC could have been set within selftests/lib.mk, by inferring it from
LLVM==1, or
2) CC could have been set externally, such as when cross compiling.
Solution: In order to allow subsystem selftests to more accurately
control clang-specific behavior, such as compiler options, provide a new
Makefile variable: SELFTESTS_CC_IS_CLANG. If $(CC) contains an
invocation of clang in any form, then SELFTESTS_CC_IS_CLANG will be
non-empty.
SELFTESTS_CC_IS_CLANG does not specify which linker is being used.
However, it can still help with linker options, because $(CC) is often
used to do both the compile and link steps (often in the same step).
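The detection rule, "$(CC) contains an invocation of clang in any form", is a
plain substring match. The C sketch below mirrors that rule for illustration
only; cc_is_clang() is a hypothetical helper, not part of lib.mk, and the
sample compiler strings in the note after it are made-up inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Mirrors a case-sensitive substring search for "clang" in the compiler
 * command: true if "clang" occurs anywhere in cc, false otherwise.
 */
static bool cc_is_clang(const char *cc)
{
	assert(cc != NULL);
	return strstr(cc, "clang") != NULL;
}
```

Under this rule, "clang", "/usr/bin/clang-17" and "ccache clang
--target=aarch64-linux-gnu" all count as clang, while "aarch64-linux-gnu-gcc"
does not. A substring match also means that any CC path that merely contains
the text "clang" would be treated as clang.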
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Hi,
If this looks reasonable, I'll break it up into separate patches and
post it as a non-RFC.
thanks,
John Hubbard
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/fchmodat2/Makefile | 12 +++++++-----
tools/testing/selftests/hid/Makefile | 2 +-
tools/testing/selftests/lib.mk | 15 +++++++++++++++
tools/testing/selftests/openat2/Makefile | 16 +++++++++-------
5 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index dd49c1d23a60..6b924297ab71 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -55,7 +55,7 @@ progs/test_sk_lookup.c-CFLAGS := -fno-strict-aliasing
progs/timer_crash.c-CFLAGS := -fno-strict-aliasing
progs/test_global_func9.c-CFLAGS := -fno-strict-aliasing
-ifneq ($(LLVM),)
+ifneq ($(SELFTESTS_CC_IS_CLANG),)
# Silence some warnings when compiled with clang
CFLAGS += -Wno-unused-command-line-argument
endif
diff --git a/tools/testing/selftests/fchmodat2/Makefile b/tools/testing/selftests/fchmodat2/Makefile
index 4373cea79b79..d00b01be5d96 100644
--- a/tools/testing/selftests/fchmodat2/Makefile
+++ b/tools/testing/selftests/fchmodat2/Makefile
@@ -2,14 +2,16 @@
CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined $(KHDR_INCLUDES)
+TEST_GEN_PROGS := fchmodat2_test
+
+include ../lib.mk
+
# gcc requires -static-libasan in order to ensure that Address Sanitizer's
# library is the first one loaded. However, clang already statically links the
# Address Sanitizer if -fsanitize is specified. Therefore, simply omit
# -static-libasan for clang builds.
-ifeq ($(LLVM),)
+# This check must be done after including ../lib.mk, in order to pick up the
+# correct value of SELFTESTS_CC_IS_CLANG.
+ifeq ($(SELFTESTS_CC_IS_CLANG),)
CFLAGS += -static-libasan
endif
-
-TEST_GEN_PROGS := fchmodat2_test
-
-include ../lib.mk
diff --git a/tools/testing/selftests/hid/Makefile b/tools/testing/selftests/hid/Makefile
index 2b5ea18bde38..734a53dc8ad9 100644
--- a/tools/testing/selftests/hid/Makefile
+++ b/tools/testing/selftests/hid/Makefile
@@ -27,7 +27,7 @@ CFLAGS += -I$(OUTPUT)/tools/include
LDLIBS += -lelf -lz -lrt -lpthread
# Silence some warnings when compiled with clang
-ifneq ($(LLVM),)
+ifneq ($(SELFTESTS_CC_IS_CLANG),)
CFLAGS += -Wno-unused-command-line-argument
endif
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 429535816dbd..f321ad5a1d0c 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -43,6 +43,21 @@ else
CC := $(CROSS_COMPILE)gcc
endif # LLVM
+# SELFTESTS_CC_IS_CLANG allows subsystem selftests to more accurately control
+# clang-specific behavior, such as compiler options. If CC is an invocation of
+# clang in any form, then SELFTESTS_CC_IS_CLANG will be non-empty. Notes:
+#
+# 1) CC could have been set above, by inferring it from LLVM==1, or externally,
+# from the CC shell environment variable.
+#
+# 2) SELFTESTS_CC_IS_CLANG does not specify which linker is being used. However,
+# it can still help with linker options, if clang or gcc is used for the
+# linker front end.
+SELFTESTS_CC_IS_CLANG :=
+ifeq ($(findstring clang,$(CC)),clang)
+ SELFTESTS_CC_IS_CLANG := 1
+endif
+
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
OUTPUT := $(shell pwd)
diff --git a/tools/testing/selftests/openat2/Makefile b/tools/testing/selftests/openat2/Makefile
index 185dc76ebb5f..7acb85a8f2ac 100644
--- a/tools/testing/selftests/openat2/Makefile
+++ b/tools/testing/selftests/openat2/Makefile
@@ -3,16 +3,18 @@
CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
+LOCAL_HDRS += helpers.h
+
+include ../lib.mk
+
+$(TEST_GEN_PROGS): helpers.c
+
# gcc requires -static-libasan in order to ensure that Address Sanitizer's
# library is the first one loaded. However, clang already statically links the
# Address Sanitizer if -fsanitize is specified. Therefore, simply omit
# -static-libasan for clang builds.
-ifeq ($(LLVM),)
+# This check must be done after including ../lib.mk, in order to pick up the
+# correct value of SELFTESTS_CC_IS_CLANG.
+ifeq ($(SELFTESTS_CC_IS_CLANG),)
CFLAGS += -static-libasan
endif
-
-LOCAL_HDRS += helpers.h
-
-include ../lib.mk
-
-$(TEST_GEN_PROGS): helpers.c
base-commit: 9a5cd459be8a425d70cda1fa1c89af7875a35d17
--
2.45.2
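As an illustration of the findstring check added to lib.mk above, here is a minimal C sketch of the same substring test. The helper name cc_is_clang() is hypothetical and not part of the patch; it only mirrors the behavior of $(findstring clang,$(CC)), under which any CC value that contains "clang" is treated as clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: mirrors the make expression
 * $(findstring clang,$(CC)), which is non-empty whenever "clang" occurs
 * anywhere in CC.
 */
static bool cc_is_clang(const char *cc)
{
	return cc != NULL && strstr(cc, "clang") != NULL;
}
```

Because this is a substring match, versioned names such as clang-17, full paths and wrapper invocations such as "ccache clang" are all detected, while cross-compiler names such as aarch64-linux-gnu-gcc are not.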
Clang does not support implicit LMUL in the vset* instruction sequences.
Introduce an explicit LMUL in the vsetivli instruction.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
Fixes: 9d5328eeb185 ("riscv: selftests: Add signal handling vector tests")
---
There is one more error that occurs when the riscv test cases are
compiled with LLVM:
ld.lld: error: undefined symbol: putchar
>>> referenced by crt.h:69 (./../../../../include/nolibc/crt.h:69)
>>> /tmp/v_initval_nolibc-5b14c8.o:(dump)
>>> referenced by crt.h:67 (./../../../../include/nolibc/crt.h:67)
>>> /tmp/v_initval_nolibc-5b14c8.o:(dump)
This is fixed in my rework of the vector tests in a different series [1].
Link: https://patchwork.kernel.org/project/linux-riscv/patch/20240619-xtheadvecto… [1]
---
tools/testing/selftests/riscv/sigreturn/sigreturn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/riscv/sigreturn/sigreturn.c b/tools/testing/selftests/riscv/sigreturn/sigreturn.c
index 62397d5934f1..ed351a1cb917 100644
--- a/tools/testing/selftests/riscv/sigreturn/sigreturn.c
+++ b/tools/testing/selftests/riscv/sigreturn/sigreturn.c
@@ -51,7 +51,7 @@ static int vector_sigreturn(int data, void (*handler)(int, siginfo_t *, void *))
asm(".option push \n\
.option arch, +v \n\
- vsetivli x0, 1, e32, ta, ma \n\
+ vsetivli x0, 1, e32, m1, ta, ma \n\
vmv.s.x v0, %1 \n\
# Generate SIGSEGV \n\
lw a0, 0(x0) \n\
---
base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
change-id: 20240701-fix_sigreturn_test-47d7063ac8e6
--
- Charlie
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.
A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32-bit integers) and this metadata is then available
to the sample-collecting system for correlation.
** Problem **
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.
Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.
Looking at available infrastructure, an obvious candidate would be
to use psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.
** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.
The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
--
v7 -> v8:
- Rebased
- Redirect flow insertion to /dev/null to avoid spat in test.
- Removed inline keyword in stub execute_psample_action function.
v6 -> v7:
- Rebased
- Fixed typo in comment.
v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditional compilation of function.
v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in selftests.
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.
v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
"sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.
Adrian Moreno (10):
net: psample: add user cookie
net: sched: act_sample: add action cookie to sample
net: psample: skip packet copy if no listeners
net: psample: allow using rate as probability
net: openvswitch: add psample action
net: openvswitch: store sampling probability in cb.
selftests: openvswitch: add psample action
selftests: openvswitch: add userspace parsing
selftests: openvswitch: parse trunc action
selftests: openvswitch: add psample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++
include/net/psample.h | 5 +-
include/uapi/linux/openvswitch.h | 31 +-
include/uapi/linux/psample.h | 11 +-
net/openvswitch/Kconfig | 1 +
net/openvswitch/actions.c | 66 ++++-
net/openvswitch/datapath.h | 3 +
net/openvswitch/flow_netlink.c | 32 ++-
net/openvswitch/vport.c | 1 +
net/psample/psample.c | 16 +-
net/sched/act_sample.c | 12 +
.../selftests/net/openvswitch/openvswitch.sh | 115 +++++++-
.../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++-
13 files changed, 566 insertions(+), 16 deletions(-)
--
2.45.2
Hi,
Dave Hansen, Muhammad Usama Anjum, here is the combined series that we
discussed yesterday [1].
As I mentioned then, this is a bit intrusive--but no more than
necessary, IMHO. Specifically, it moves some clang-un-inlineable things
out to "pure" assembly code files.
I've tested this by building with clang, then running each binary on my
x86_64 test system with today's 6.10-rc1, and comparing the console and
dmesg output to a gcc-based build without these patches applied. Aside
from timestamps and virtual addresses, it looks identical.
Earlier cover letter:
Just a bunch of build and warnings fixes that show up when building with
clang. Some of these depend on each other, so I'm sending them as a
series.
Changes since v2:
1) Dropped my test_FISTTP.c patch, and picked up Muhammad's fix instead,
seeing as how that was posted first.
2) Updated patch descriptions to reflect that Valentin Obst's build fix
for LLVM [1] has already been merged into Linux main.
3) Minor wording and typo corrections in the commit logs throughout.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
Enjoy!
[1] https://lore.kernel.org/44428518-4d21-4de7-8587-04eceefb330d@nvidia.com
thanks,
John Hubbard
John Hubbard (6):
selftests/x86: fix Makefile dependencies to work with clang
selftests/x86: build fsgsbase_restore.c with clang
selftests/x86: build sysret_rip.c with clang
selftests/x86: avoid -no-pie warnings from clang during compilation
selftests/x86: remove (or use) unused variables and functions
selftests/x86: fix printk warnings reported by clang
Muhammad Usama Anjum (1):
selftests: x86: test_FISTTP: use fisttps instead of ambiguous fisttp
tools/testing/selftests/x86/Makefile | 31 +++++++++++++++----
tools/testing/selftests/x86/amx.c | 16 ----------
.../testing/selftests/x86/clang_helpers_32.S | 11 +++++++
.../testing/selftests/x86/clang_helpers_64.S | 28 +++++++++++++++++
tools/testing/selftests/x86/fsgsbase.c | 6 ----
.../testing/selftests/x86/fsgsbase_restore.c | 11 +++----
tools/testing/selftests/x86/sigreturn.c | 2 +-
.../testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/sysret_rip.c | 20 ++++--------
tools/testing/selftests/x86/test_FISTTP.c | 8 ++---
tools/testing/selftests/x86/test_vsyscall.c | 15 +++------
tools/testing/selftests/x86/vdso_restorer.c | 2 ++
12 files changed, 87 insertions(+), 64 deletions(-)
create mode 100644 tools/testing/selftests/x86/clang_helpers_32.S
create mode 100644 tools/testing/selftests/x86/clang_helpers_64.S
base-commit: 4a4be1ad3a6efea16c56615f31117590fd881358
--
2.45.1
The watchdog selftest script supports various parameters for testing
different IOCTLs. The watchdog ping functionality is validated by starting
a loop where the watchdog device is periodically pet, which can only be
stopped by the user interrupting the script.
This results in a timeout when running this test using the kselftest runner
with only non-oneshot parameters (or no parameters at all):
TAP version 13
1..1
# timeout set to 45
# selftests: watchdog: watchdog-test
# Watchdog Ticking Away!
# .............................................#
not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
To address this issue, the first patch in this series limits the loop to 5
iterations by default and adds support for a new '-c' option to customize
the number of pings as required.
The second patch conforms the test output to the KTAP format.
Laura Nao (2):
selftests/watchdog: limit ping loop and allow configuring the number
of pings
selftests/watchdog: convert the test output to KTAP format
.../selftests/watchdog/watchdog-test.c | 166 +++++++++++-------
1 file changed, 101 insertions(+), 65 deletions(-)
--
2.30.2
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drops three local helpers, tcp_server(), inetaddr_len() and make_socket(),
from them.
Geliang Tang (9):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_to_fd in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 2 +-
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/sk_lookup.c | 141 +++++++-----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 51 ++-----
4 files changed, 61 insertions(+), 134 deletions(-)
--
2.43.0
In main_loop_s(), the fd returned by open(cfg_input, O_RDONLY) is not
closed if the "--cfg_repeat > 0" branch is not taken.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Signed-off-by: Liu Jing <liujing(a)cmss.chinamobile.com>
---
Changes from v1
- add close function in main_loop_s function
---
tools/testing/selftests/net/mptcp/mptcp_connect.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index d2043ec3bf6d..48b7389ae75b 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1119,7 +1119,8 @@ int main_loop_s(int listensock)
if (cfg_input)
close(fd);
goto again;
- }
+ } else
+ close(fd);
return 0;
}
--
2.33.0
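The general pattern behind this fix, closing the input fd on every path and not only when the loop repeats, can be sketched as below. This is a hypothetical stand-in, not the actual main_loop_s() code: run_iterations() and its return convention are invented for illustration. It closes only a descriptor that was really opened, in the spirit of the existing "if (cfg_input)" guard visible in the context lines of the diff.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the repeat loop: open the input once per
 * iteration and close it on every path, including the final iteration.
 * The body runs at least once. Returns the number of completed
 * iterations, or -1 if open() fails.
 */
static int run_iterations(const char *input, int repeat)
{
	int done = 0;

	do {
		int fd = -1;

		if (input) {
			fd = open(input, O_RDONLY);
			if (fd < 0)
				return -1;
		}

		/* ... the real test would transfer data here ... */
		done++;

		/* Close only what was opened, whether or not we loop again. */
		if (fd >= 0)
			close(fd);
	} while (--repeat > 0);

	return done;
}
```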
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0…
---
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 ++
arch/riscv/include/asm/hwprobe.h | 5 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 224 ++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 51 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 25 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 67 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 888 insertions(+), 271 deletions(-)
---
base-commit: 11cc01d4d2af304b7288251aad7e03315db8dffc
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Hi,
Here is the v2 patch series to support polling on the event 'hist' file.
The previous version is here:
https://lore.kernel.org/all/171932861260.584123.15653284949837094747.stgit@…
This version updates the test program, because the previous version
fails on stable kernels which do not have this feature.
The test now checks whether poll(POLLIN) on the hist file times out when
no event is sent. If poll() is implemented, it should time out. If not,
poll(POLLIN) returns immediately.
This version also tests both POLLIN and POLLPRI.
Background
----------
There has been interest in allowing user programs to monitor kernel
events in real time. Ftrace provides the `trace_pipe` interface to wait
on events in the ring buffer, but the reader has to wait until a page
of the ring buffer is filled with events. We can also peek at the
`trace` file periodically, but that is an inefficient way to monitor
a randomly occurring event.
Overview
--------
This patch set allows the user to `poll` (or `select`, `epoll`) on the event
histogram interface. Each event has its own `hist` file
which shows histograms generated by trigger actions. So the user can set
a new hist trigger on any event to be monitored, and poll on the
`hist` file until it is updated.
Two poll events are supported, POLLIN and POLLPRI. POLLIN
means that there is a readable update on the `hist` file, and this
event is cleared only when you call read(). So, this is
useful if you want to read the histogram periodically.
The other event, POLLPRI, is for monitoring trace events. Like
POLLIN, it is returned when the histogram is updated, but
you don't need to read() the file before using poll() again.
Note that this waits for a histogram update (not event arrival), thus
you must set a histogram on the event first.
Usage
-----
Here is an example usage:
----
TRACEFS=/sys/kernel/tracing
EVENT=$TRACEFS/events/sched/sched_process_free
# setup histogram trigger and enable event
echo "hist:key=comm" >> $EVENT/trigger
echo 1 > $EVENT/enable
# Wait for update
poll pri $EVENT/hist
# Event arrived.
echo "process free event is coming"
tail $TRACEFS/trace
----
The 'poll' command is in the selftest patch.
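For readers who want to wait on the `hist` file from their own program, the following is a minimal sketch of what such a poll helper might look like. It is not the poll.c from the selftest patch; the function names wait_for_events() and wait_on_file() and their return convention are assumptions made for this example. Only the standard poll(2) interface with POLLIN and POLLPRI is relied upon.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until 'fd' reports one of 'events' or 'timeout_ms' elapses.
 * Returns 1 if a requested event fired, 0 on timeout, -1 on error
 * (including the case where only an error condition was reported).
 */
static int wait_for_events(int fd, short events, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = events };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	return (pfd.revents & events) ? 1 : -1;
}

/* Open 'path' read-only and wait on it, e.g. path = "$EVENT/hist". */
static int wait_on_file(const char *path, short events, int timeout_ms)
{
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = wait_for_events(fd, events, timeout_ms);
	close(fd);
	return ret;
}
```

Assuming a kernel with this series applied, calling wait_on_file() on the event's `hist` file with POLLPRI and a timeout of -1 would roughly correspond to the `poll pri $EVENT/hist` step in the example above. Passing POLLIN with a short timeout is one way to probe for the feature, since poll(POLLIN) is expected to time out only when hist polling is implemented and no event has occurred.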
You can also get this series from here:
https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=t…
Thank you,
---
Masami Hiramatsu (Google) (3):
tracing/hist: Add poll(POLLIN) support on hist file
tracing/hist: Support POLLPRI event for poll on histogram
selftests/tracing: Add hist poll() support test
include/linux/trace_events.h | 5 +
kernel/trace/trace_events.c | 18 ++++
kernel/trace/trace_events_hist.c | 101 +++++++++++++++++++-
tools/testing/selftests/ftrace/Makefile | 2
tools/testing/selftests/ftrace/poll.c | 62 ++++++++++++
.../ftrace/test.d/trigger/trigger-hist-poll.tc | 74 +++++++++++++++
6 files changed, 259 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/poll.c
create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-poll.tc
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Hi,
Here is an RFC patch series to support polling on the event 'hist' file.
There has been interest in allowing user programs to monitor kernel
events in real time. Ftrace provides the `trace_pipe` interface to wait
on events in the ring buffer, but the reader has to wait until a page
of the ring buffer is filled with events. We can also peek at the
`trace` file periodically, but that is an inefficient way to monitor
a randomly occurring event.
This patch set allows the user to `poll` (or `select`, `epoll`) on the event
histogram interface. Each event has its own `hist` file
which shows histograms generated by trigger actions. So the user can set
a new hist trigger on any event to be monitored, and poll on the
`hist` file until it is updated.
Two poll events are supported, POLLIN and POLLPRI. POLLIN
means that there is a readable update on the `hist` file, and this
event is cleared only when you call read(). So, this is
useful if you want to read the histogram periodically.
The other event, POLLPRI, is for monitoring trace events. Like
POLLIN, it is returned when the histogram is updated, but
you don't need to read() the file before using poll() again.
Note that this waits for a histogram update (not event arrival), thus
you must set a histogram on the event first.
Here is an example usage:
----
TRACEFS=/sys/kernel/tracing
EVENT=$TRACEFS/events/sched/sched_process_free
# setup histogram trigger and enable event
echo "hist:key=comm" >> $EVENT/trigger
echo 1 > $EVENT/enable
# Wait for update
poll $EVENT/hist
# Event arrived.
echo "process free event is coming"
tail $TRACEFS/trace
----
The 'poll' command is in the selftest patch.
You can also get this series from here:
https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=t…
Thank you,
---
Masami Hiramatsu (Google) (3):
tracing/hist: Add poll(POLLIN) support on hist file
tracing/hist: Support POLLPRI event for poll on histogram
selftests/tracing: Add hist poll() support test
include/linux/trace_events.h | 5 +
kernel/trace/trace_events.c | 18 ++++
kernel/trace/trace_events_hist.c | 101 +++++++++++++++++++-
tools/testing/selftests/ftrace/Makefile | 3 +
tools/testing/selftests/ftrace/poll.c | 34 +++++++
.../ftrace/test.d/trigger/trigger-hist-poll.tc | 46 +++++++++
6 files changed, 204 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/poll.c
create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-poll.tc
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cc'ed everyone on all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they all build
upon each other. Or the DT part can be merged through the DT tree to
reduce the dependencies.
Changes from v4: https://lore.kernel.org/r/20240422232404.213174-1-sboyd@kernel.org
* Picked up reviewed-by tags
* Check for non-NULL device pointers before calling put_device()
* Fix CFI issues with kunit actions
* Introduce platform_device_prepare_wait_for_probe() helper to wait for
a platform device to probe
* Move platform code to lib/kunit and rename functions to have kunit
prefix
* Fix issue with platform wrappers messing up reference counting
because they used kunit actions
* New patch to populate overlay devices on root node for powerpc
* Make fixed-rate binding generic single clk consumer binding
Changes from v3: https://lore.kernel.org/r/20230327222159.3509818-1-sboyd@kernel.org
* No longer depend on Frank's series[1] because it was merged upstream[2]
* Use kunit_add_action_or_reset() to shorten code
* Skip tests properly when CONFIG_OF_OVERLAY isn't set
Changes from v2: https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1: https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
[2] https://lore.kernel.org/r/20240308195737.GA1174908-robh@kernel.org
Stephen Boyd (11):
of/platform: Allow overlays to create platform devices from the root
node
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: test: Add single clk consumer
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 21 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../test/test,single-clk-consumer.yaml | 34 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/clk/.kunitconfig | 2 +
drivers/clk/Kconfig | 9 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 379 +++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit_helpers.c | 204 ++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 451 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 1 +
drivers/of/Kconfig | 10 +
drivers/of/Makefile | 2 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit_helpers.c | 74 +++
drivers/of/overlay_test.c | 116 +++++
drivers/of/platform.c | 9 +-
include/kunit/clk.h | 28 ++
include/kunit/of.h | 115 +++++
include/kunit/platform_device.h | 20 +
lib/kunit/Makefile | 4 +-
lib/kunit/platform-test.c | 223 +++++++++
lib/kunit/platform.c | 302 ++++++++++++
31 files changed, 2193 insertions(+), 6 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,single-clk-consumer.yaml
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit_helpers.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit_helpers.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
create mode 100644 lib/kunit/platform-test.c
create mode 100644 lib/kunit/platform.c
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
Hi Linus,
Please pull this kselftest fixes update for Linux 6.10-rc7.
This kselftest fixes update for Linux 6.10-rc7 consists of one single
patch to fix the non-contiguous CBM resctrl test:
- AMD supports non-contiguous CBM but does not report it via CPUID. This
test should not use CPUID on AMD to detect non-contiguous CBM support.
Fix the problem so the test uses CPUID to discover non-contiguous CBM
support only on Intel.
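The vendor-dependent check described above can be sketched as follows. This is a simplified illustration, not the code from cat_test.c: the enum, the macro name and the helper noncont_cbm_supported() are hypothetical. The sketch assumes that on Intel the non-contiguous CBM capability is reported in bit 3 of ECX for CPUID leaf 0x10 (subleaf 1 for L3, subleaf 2 for L2), and that the caller has already read that ECX value.

```c
#include <stdbool.h>

/* Illustrative vendor tag; the real test has its own vendor detection. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/* Assumed: CPUID.(EAX=0x10, ECX=ResID):ECX bit 3, meaningful on Intel only. */
#define CAT_NONCONT_BIT	(1u << 3)

/*
 * Decide whether non-contiguous CBMs should be expected to work.
 * 'cpuid_ecx' is the ECX value of the CPUID leaf above and is ignored
 * on AMD, which supports the feature without reporting it in CPUID.
 */
static bool noncont_cbm_supported(enum cpu_vendor vendor,
				  unsigned int cpuid_ecx)
{
	if (vendor == VENDOR_AMD)
		return true;
	return (cpuid_ecx & CAT_NONCONT_BIT) != 0;
}
```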
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit ed3994ac847e0d6605f248e7f6776b1d4f445f4b:
selftests/fchmodat2: fix clang build failure due to -static-libasan (2024-06-11 15:05:05 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.10-rc7
for you to fetch changes up to 48236960c06d32370bfa6f2cc408e786873262c8:
selftests/resctrl: Fix non-contiguous CBM for AMD (2024-06-26 13:22:34 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.10-rc7
This kselftest fixes update for Linux 6.10-rc7 consists of one single
patch to fix the non-contiguous CBM resctrl test:
- AMD supports non-contiguous CBM but does not report it via CPUID. This
test should not use CPUID on AMD to detect non-contiguous CBM support.
Fix the problem so the test uses CPUID to discover non-contiguous CBM
support only on Intel.
----------------------------------------------------------------
Babu Moger (1):
selftests/resctrl: Fix non-contiguous CBM for AMD
tools/testing/selftests/resctrl/cat_test.c | 32 ++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
----------------------------------------------------------------
Hi,
Jason A. Donenfeld, I've added you because I ended up looking through
your latest "implement getrandom() in vDSO" series [1], which also
touches this Makefile, so just a heads up about upcoming (minor) merge
conflicts.
Changes since v2:
1. Added two patches, both of which apply solely to the Makefile.
These provide a smaller, cleaner, and more accurate Makefile.
2. Added Reviewed-by and Tested-by tags for the original patch, which
fixes all of the clang errors and warnings for this selftest.
3. Removed an obsolete blurb from the commit description of the original
patch, now that Valentin Obst's LLVM build fix has been merged.
[1] https://lore.kernel.org/20240614190646.2081057-1-Jason@zx2c4.com
John Hubbard (3):
selftests/vDSO: fix clang build errors and warnings
selftests/mm: remove partially duplicated "all:" target in Makefile
selftests/vDSO: remove duplicate compiler invocations from Makefile
tools/testing/selftests/vDSO/Makefile | 29 ++++++++-----------
tools/testing/selftests/vDSO/parse_vdso.c | 16 ++++++----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++--
3 files changed, 39 insertions(+), 24 deletions(-)
base-commit: 2ccbdf43d5e758f8493a95252073cf9078a5fea5
--
2.45.2
The open() function returns -1 on error. openat() and open() initialize
'from' and 'to', but only 'from' is validated with an 'if' statement. If
the initialization of 'to' fails, we should check the value of 'to' and
close 'from' to avoid a possible file descriptor leak. Also tighten the
check of 'from'.
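The pattern described above can be sketched in isolation as follows. This
is a standalone illustration, not the selftest code: open_pair() is a
hypothetical helper, and saving errno around close() is an extra precaution
added for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a source and a destination file. If the second open fails, close
 * the first descriptor so that it is not leaked.
 * Returns 0 and fills fds[0]/fds[1] on success, or -1 with errno set.
 */
static int open_pair(const char *fromname, const char *toname, int fds[2])
{
	int from = open(fromname, O_RDONLY);

	if (from < 0)
		return -1;

	int to = open(toname, O_CREAT | O_WRONLY | O_EXCL, 0700);

	if (to < 0) {
		int saved_errno = errno;	/* close() may overwrite errno */

		close(from);
		errno = saved_errno;
		return -1;
	}

	fds[0] = from;
	fds[1] = to;
	return 0;
}
```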
Fixes: 32ae976ed3b5 ("selftests/capabilities: Add tests for capability evolution")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- Thank you for your interest in our vulnerability detection method. We
extract vulnerability characteristics from a known vulnerability and match
the same characteristics in the project code. As our work is still in
progress, we are not able to disclose it at this time. We appreciate your
understanding, and would prefer to focus on the potential vulnerability itself.
Reference link: https://lore.kernel.org/all/20240510003424.2016914-1-samasth.norway.ananda@…
Changes in v2:
- modified the patch according to suggestions;
- found by customized static analysis tool.
---
tools/testing/selftests/capabilities/test_execve.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/capabilities/test_execve.c b/tools/testing/selftests/capabilities/test_execve.c
index 47bad7ddc5bc..6406ab6aa1f5 100644
--- a/tools/testing/selftests/capabilities/test_execve.c
+++ b/tools/testing/selftests/capabilities/test_execve.c
@@ -145,10 +145,14 @@ static void chdir_to_tmpfs(void)
 static void copy_fromat_to(int fromfd, const char *fromname, const char *toname)
 {
 	int from = openat(fromfd, fromname, O_RDONLY);
-	if (from == -1)
+	if (from < 0)
 		ksft_exit_fail_msg("open copy source - %s\n", strerror(errno));

 	int to = open(toname, O_CREAT | O_WRONLY | O_EXCL, 0700);
+	if (to < 0) {
+		close(from);
+		ksft_exit_fail_msg("open copy destination - %s\n", strerror(errno));
+	}

 	while (true) {
 		char buf[4096];
--
2.25.1
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.
A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.
** Problem **
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.
Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.
Looking at available infrastructure, an obvious candidate would be
to use psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.
** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.
The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
--
v6 -> v7:
- Rebased
- Fixed typo in comment.
v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditional compilation of function.
v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in selftests.
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.
v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
"sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.
Adrian Moreno (10):
net: psample: add user cookie
net: sched: act_sample: add action cookie to sample
net: psample: skip packet copy if no listeners
net: psample: allow using rate as probability
net: openvswitch: add psample action
net: openvswitch: store sampling probability in cb.
selftests: openvswitch: add psample action
selftests: openvswitch: add userspace parsing
selftests: openvswitch: parse trunc action
selftests: openvswitch: add psample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++
include/net/psample.h | 5 +-
include/uapi/linux/openvswitch.h | 31 +-
include/uapi/linux/psample.h | 11 +-
net/openvswitch/Kconfig | 1 +
net/openvswitch/actions.c | 65 ++++-
net/openvswitch/datapath.h | 3 +
net/openvswitch/flow_netlink.c | 32 ++-
net/openvswitch/vport.c | 1 +
net/psample/psample.c | 16 +-
net/sched/act_sample.c | 12 +
.../selftests/net/openvswitch/openvswitch.sh | 115 +++++++-
.../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++-
13 files changed, 565 insertions(+), 16 deletions(-)
--
2.45.2
From: Geliang Tang <tanggeliang(a)kylinos.cn>
When running this BPF selftest (./test_progs -t sockmap_basic) on a
LoongArch platform, a kernel panic occurs:
'''
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
... ...
ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 0000000c (PPLV0 +PIE +PWE)
EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
BADV: 0000000000000040
PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160
Code: ...
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
.text @ 0x9000000003710000
.data @ 0x9000000004d70000
.bss @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
'''
This crash happens every time the sockmap_skb_verdict_shutdown subtest
in sockmap_basic is run.
The crash occurs because a NULL pointer is passed to page_address() in
sk_msg_recvmsg(). Because the implementation differs by architecture,
page_address(NULL) triggers a panic on the LoongArch platform but not on
the x86 platform. So this bug was hidden on x86 for a while, but it is
now exposed on LoongArch.
The root cause is that an empty skb (skb->len == 0) is put on the queue.
This empty skb is a TCP FIN packet, which is sent by shutdown(), invoked
in test_sockmap_skb_verdict_shutdown():
shutdown(p1, SHUT_WR);
In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into the ingress_msg list.
Then, in sk_msg_recvmsg(), this empty sge is used, and sg_page(sge)
returns a NULL page. This NULL page is passed to copy_page_to_iter(),
which passes it on to kmap_local_page() and page_address(), and the
kernel panics.
To solve this, we should skip the empty sge on the queue. So in
sk_msg_recvmsg(), if msg_rx->sg.end is zero, that means it's an empty sge,
so skip it.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
v4:
- skmsg: skip empty sge in sk_msg_recvmsg
v3:
- skmsg: prevent empty ingress skb from enqueuing
v2:
- skmsg: null check for sg_page in sk_msg_recvmsg
---
net/core/skmsg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..66db1631852b 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -421,7 +421,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 	while (copied != len) {
 		struct scatterlist *sge;

-		if (unlikely(!msg_rx))
+		if (unlikely(!msg_rx || !msg_rx->sg.end))
 			break;

 		i = msg_rx->sg.start;
--
2.43.0
From: Quan Zhou <zhouquan(a)iscas.ac.cn>
Because there is a path that modifies a0 in syscall_enter_from_user_mode
before the actual execution of syscall_handler [1], the kernel currently
saves a0 to orig_a0 at the entry point of do_trap_ecall_u as an original
copy of a0. Once the syscall is interrupted and later resumed, the
restarted syscall will use orig_a0 to continue execution.
The above rules generally apply except for ptrace(PTRACE_SETREGSET,),
where the kernel will ignore the tracer's setting of tracee/a0 and
will restart with the tracee/orig_a0. With the current kernel implementation
of ptrace, projects like CRIU/Proot will encounter issues where the a0
setting becomes ineffective when performing ptrace(PTRACE_SETREGSET,).
The suggested solution is to expose orig_a0 to userspace so that ptrace
can choose whether to set orig_a0 based on the actual scenario. In fact,
x86/orig_eax and LoongArch/orig_a0 have adopted similar solutions.
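The behavior described above can be modeled with a toy register file. This
is only a userspace sketch of the reasoning: struct toy_regs and all toy_*
helpers are hypothetical and do not correspond to the kernel's pt_regs,
user_regs_struct, or ptrace code.

```c
/* Toy register file; NOT the kernel's struct pt_regs. */
struct toy_regs {
	long a0;	/* first syscall argument */
	long orig_a0;	/* copy of a0 taken at syscall entry */
};

/* Models the save performed at the entry point of do_trap_ecall_u. */
static void toy_syscall_entry(struct toy_regs *regs)
{
	regs->orig_a0 = regs->a0;
}

/* Models a syscall restart: the first argument comes from orig_a0. */
static long toy_restart_arg0(const struct toy_regs *regs)
{
	return regs->orig_a0;
}

/* A tracer that can only reach a0, because orig_a0 is not exposed. */
static void toy_tracer_set_a0(struct toy_regs *regs, long val)
{
	regs->a0 = val;
}

/* A tracer that can also reach orig_a0, as the series proposes. */
static void toy_tracer_set_both(struct toy_regs *regs, long val)
{
	regs->a0 = val;
	regs->orig_a0 = val;
}
```

In this model, a value written with toy_tracer_set_a0() after syscall entry
is lost on restart, while toy_tracer_set_both() makes it take effect.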
[1] link:
https://lore.kernel.org/lkml/20230403-crisping-animosity-04ed8a45c625@spud/…
---
Changes from RFC->v1:
- Rebased on Linux 6.10-rc5.
- Updated the patch description.
- Adjust MAX_REG_OFFSET to match the new bottom of pt_regs (Charlie).
- Simplify selftest to verify if a0 can be set (Charlie).
- Fix .gitignore error (Charlie).
---
RFC link:
https://lore.kernel.org/all/cover.1718693532.git.zhouquan@iscas.ac.cn/
Quan Zhou (2):
riscv: Expose orig_a0 in the user_regs_struct structure
riscv: selftests: Add a ptrace test to verify syscall parameter
modification
arch/riscv/include/asm/ptrace.h | 7 +-
arch/riscv/include/uapi/asm/ptrace.h | 2 +
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/abi/.gitignore | 1 +
tools/testing/selftests/riscv/abi/Makefile | 12 ++
tools/testing/selftests/riscv/abi/ptrace.c | 124 +++++++++++++++++++
6 files changed, 144 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/riscv/abi/.gitignore
create mode 100644 tools/testing/selftests/riscv/abi/Makefile
create mode 100644 tools/testing/selftests/riscv/abi/ptrace.c
base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
--
2.34.1
In TEST_F(epoll_busy_poll, test_get_params), the initial value of 'ret' is
unused because it is overwritten by the return value of the ioctl. Thus
remove the initialization.
Signed-off-by: Liu Jing <liujing(a)cmss.chinamobile.com>
---
tools/testing/selftests/net/epoll_busy_poll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/epoll_busy_poll.c b/tools/testing/selftests/net/epoll_busy_poll.c
index 16e457c2f877..652b0957b6c5 100644
--- a/tools/testing/selftests/net/epoll_busy_poll.c
+++ b/tools/testing/selftests/net/epoll_busy_poll.c
@@ -130,7 +130,7 @@ TEST_F(epoll_busy_poll, test_get_params)
 	 * the default should be default and all fields should be zero'd by the
 	 * kernel, so set params fields to garbage to test this.
 	 */
-	int ret = 0;
+	int ret;

 	self->params.busy_poll_usecs = 0xff;
 	self->params.busy_poll_budget = 0xff;
--
2.33.0
We had several complaints in linux-next that there were warnings.
CKI was not happy: it was the same situation as in an earlier report
when HID-BPF was initially included: the automatically generated
vmlinux.h doesn't contain all of the required structs and the
compilation of the bpf program fails.
We also have multiple pointer-to-int cast complaints and some docs that
were not rendered properly.
Include everything here.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- Also fix the pointer-to-int casts
- Also fix the docs complaints
- Link to v1: https://lore.kernel.org/r/20240627-fix-cki-v1-1-2b47ceac116a@kernel.org
---
Benjamin Tissoires (4):
selftests/hid: ensure CKI can compile our new tests on old kernels
HID: bpf: fix gcc warning and unify __u64 into u64
HID: bpf: doc fixes for hid_hw_request() hooks
HID: bpf: doc fixes for hid_hw_request() hooks
drivers/hid/bpf/hid_bpf_dispatch.c | 8 +++---
drivers/hid/bpf/hid_bpf_struct_ops.c | 2 +-
drivers/hid/hid-core.c | 4 +--
drivers/hid/hidraw.c | 6 ++---
include/linux/hid_bpf.h | 31 +++++++++++++---------
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 16 +++++++++++
6 files changed, 44 insertions(+), 23 deletions(-)
---
base-commit: d3e15189bfd4d0a9d3a7ad8bd0e6ebb1c0419f93
change-id: 20240627-fix-cki-f372855cbf6f
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Centralizes the definition of _GNU_SOURCE into lib.mk and addresses all
resulting macro redefinition warnings.
The initial attempt at this patch was abandoned because it affected
lines in many source files and caused a large amount of churn. However,
from earlier discussions, centralizing _GNU_SOURCE is still desirable.
This attempt limits the changes to 1 source file and 14 Makefiles.
This is condensed into a single commit to avoid redefinition warnings
from partial merges.
v1: https://lore.kernel.org/linux-kselftest/20240430235057.1351993-1-edliaw@goo…
v2: https://lore.kernel.org/linux-kselftest/20240507214254.2787305-1-edliaw@goo…
- Add -D_GNU_SOURCE to KHDR_INCLUDES so that it is in a single
location.
- Remove #define _GNU_SOURCE from source code to resolve redefinition
warnings.
v3: https://lore.kernel.org/linux-kselftest/20240509200022.253089-1-edliaw@goog…
- Rebase onto linux-next 20240508.
- Split patches by directory.
- Add -D_GNU_SOURCE directly to CFLAGS in lib.mk.
- Delete additional _GNU_SOURCE definitions from source code in
linux-next.
- Delete additional -D_GNU_SOURCE flags from Makefiles.
v4: https://lore.kernel.org/linux-kselftest/20240510000842.410729-1-edliaw@goog…
- Rebase onto linux-next 20240509.
- Remove Fixes tag from patches that drop _GNU_SOURCE definition.
- Restore space between comment and includes for selftests/damon.
v5: https://lore.kernel.org/linux-kselftest/20240522005913.3540131-1-edliaw@goo…
- Rebase onto linux-next 20240521
- Drop initial patches that modify KHDR_INCLUDES.
- Incorporate Mark Brown's patch to replace static_assert with warning.
- Don't drop #define _GNU_SOURCE from nolibc and wireguard.
- Change Makefiles for x86 and vDSO to append to CFLAGS.
v6: https://lore.kernel.org/linux-kselftest/20240624232718.1154427-1-edliaw@goo…
- Rewrite patch to use -D_GNU_SOURCE= form in lib.mk.
- Reduce the amount of churn significantly by allowing definition to
coexist with source code macro defines.
v7:
- Squash patch into a single commit.
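A small sketch of why the -D_GNU_SOURCE= form can coexist with defines in
source code: C permits redefining a macro when the replacement list is
identical, and both definitions below are empty. By contrast, a plain
-D_GNU_SOURCE defines the macro as 1, which conflicts with an empty
"#define _GNU_SOURCE". In this example the first #define stands in for the
command-line flag, and gnu_source_enabled() is a hypothetical helper.

```c
/* Stands in for -D_GNU_SOURCE= on the compiler command line. */
#define _GNU_SOURCE
/* The source file's own define: same empty body, so no warning. */
#define _GNU_SOURCE

/* Returns 1 when _GNU_SOURCE is visible to this translation unit. */
static int gnu_source_enabled(void)
{
#ifdef _GNU_SOURCE
	return 1;
#else
	return 0;
#endif
}
```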
Edward Liaw (1):
selftests: Centralize -D_GNU_SOURCE= to CFLAGS in lib.mk
tools/testing/selftests/exec/Makefile | 1 -
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/intel_pstate/Makefile | 2 +-
tools/testing/selftests/iommu/Makefile | 2 --
tools/testing/selftests/kvm/Makefile | 2 +-
tools/testing/selftests/lib.mk | 3 +++
tools/testing/selftests/mm/thuge-gen.c | 2 +-
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tcp_ao/Makefile | 2 +-
tools/testing/selftests/proc/Makefile | 1 -
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/ring-buffer/Makefile | 1 -
tools/testing/selftests/riscv/mm/Makefile | 2 +-
tools/testing/selftests/sgx/Makefile | 2 +-
tools/testing/selftests/tmpfs/Makefile | 1 -
15 files changed, 12 insertions(+), 15 deletions(-)
--
2.45.2.803.g4e1b14247a-goog
From: Adrian Moreno <amorenoz(a)redhat.com>
[ Upstream commit a8763466669d21b570b26160d0a5e0a2ee529d22 ]
Netlink flags, although they don't have payload at the netlink level,
are represented as having "True" as value in pyroute2.
Without it, trying to add a flow with a flag-type action (e.g: pop_vlan)
fails with the following traceback:
Traceback (most recent call last):
File "[...]/ovs-dpctl.py", line 2498, in <module>
sys.exit(main(sys.argv))
^^^^^^^^^^^^^^
File "[...]/ovs-dpctl.py", line 2487, in main
ovsflow.add_flow(rep["dpifindex"], flow)
File "[...]/ovs-dpctl.py", line 2136, in add_flow
reply = self.nlm_request(
^^^^^^^^^^^^^^^^^
File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request
return tuple(self._genlm_request(*argv, **kwarg))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in
nlm_request
return tuple(super().nlm_request(*argv, **kwarg))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request
self.put(msg, msg_type, msg_flags, msg_seq=msg_seq)
File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put
self.sendto_gate(msg, addr)
File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate
msg.encode()
File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode
offset = self.encode_nlas(offset)
^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas
nla_instance.setvalue(cell[1])
File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue
nlv.setvalue(nla_tuple[1])
~~~~~~~~~^^^
IndexError: list index out of range
Signed-off-by: Adrian Moreno <amorenoz(a)redhat.com>
Acked-by: Aaron Conole <aconole(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 5e0e539a323d5..8b120718768ec 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -531,7 +531,7 @@ class ovsactions(nla):
for flat_act in parse_flat_map:
if parse_starts_block(actstr, flat_act[0], False):
actstr = actstr[len(flat_act[0]):]
- self["attrs"].append([flat_act[1]])
+ self["attrs"].append([flat_act[1], True])
actstr = actstr[strspn(actstr, ", ") :]
parsed = True
--
2.43.0
Correctable memory errors are very common on servers with a large
amount of memory, and are corrected by ECC, but with two
pain points for users:
1. Correction usually happens on the fly and adds latency overhead.
2. A not-fully-proven theory states that excessive correctable memory
errors can develop into uncorrectable memory errors.
Soft offline is the kernel's additional solution for memory pages
having (excessive) corrected memory errors. The impacted page is migrated
to a healthy page if it is in use, then the original page is discarded
from any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in the case of a 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when a later mmap of hugepages returns MAP_FAILED due to lack of hugepages.
In the case of a transparent hugepage, it will be split into 4K pages
as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly, and the kernel had better not
do it under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by the kernel's memory failure recovery.
This patch series gives userspace control over soft-offlining any page:
the kernel only soft offlines a raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default
enable_soft_offline is 1 to preserve existing behavior in the kernel.
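For illustration, userspace could query the setting along these lines. This
is a minimal sketch, not code from the series: read_sysctl_int() is a
hypothetical helper, and the path only exists on kernels that carry this
series.

```c
#include <stdio.h>

/* Path of the sysctl proposed by this series. */
#define SOFT_OFFLINE_SYSCTL "/proc/sys/vm/enable_soft_offline"

/*
 * Read a single integer from a sysctl-style file.
 * Returns 0 and stores the value on success, -1 on any failure
 * (file missing, unreadable, or not an integer).
 */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass SOFT_OFFLINE_SYSCTL as the path and treat a failure as
"sysctl not available on this kernel".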
Changelog
v6 => v7
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com> and David
Rientjes <rientjes(a)google.com>
* remove PFN value from enable_soft_offline log
* save/restore enable_soft_offline in run_vmtests.sh
* v7 is based on commit 7c89bdbd3778 ("khugepaged: simplify the
allocation of slab caches")
v5 => v6:
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com>
* add a ':' in soft offline log.
* close hugetlbfs file descriptor in selftest.
* no need to "return" after ksft_exit_fail_msg.
v4 => v5:
* incorporate feedbacks from Muhammad Usama Anjum
<usama.anjum(a)collabora.com>
* refactor selftest to use what available in kselftest.h
v3 => v4:
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com>,
Andrew Morton <akpm(a)linux-foundation.org>, and
Oscar Salvador <osalvador(a)suse.de>.
* insert a refactor commit to unify soft offline's logs to follow
"Soft offline: 0x${pfn}: ${message}" format.
* some rewords in document: fail => will not perform.
* v4 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3"),
akpm/mm-stable.
v2 => v3:
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com>,
Lance Yang <ioworker0(a)gmail.com>, Oscar Salvador <osalvador(a)suse.de>,
and David Rientjes <rientjes(a)google.com>.
* release potential refcount if enable_soft_offline is 0.
* soft_offline_page() returns EOPNOTSUPP if enable_soft_offline is 0.
* refactor hugetlb-soft-offline.c, for example, introduce
test_soft_offline_common to reduce repeated code.
* rewrite enable_soft_offline's documentation, adds more details about
the cost of soft-offline for transparent and hugetlb hugepages, and
components that are impacted when enable_soft_offline becomes 0.
* fix typos in commit messages.
* v3 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3").
v1 => v2:
* incorporate feedbacks from both Miaohe Lin <linmiaohe(a)huawei.com> and
Jane Chu <jane.chu(a)oracle.com>.
* make the switch to control all pages, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (4):
mm/memory-failure: refactor log format in soft offline code
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm: test enable_soft_offline behaviors
docs: mm: add enable_soft_offline sysctl
Documentation/admin-guide/sysctl/vm.rst | 32 +++
mm/memory-failure.c | 37 ++-
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 228 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 6 +
6 files changed, 297 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.2.803.g4e1b14247a-goog
Currently, if a user wants to run pmtu.sh and cover all the provided test
cases, they need to install the Open vSwitch userspace utilities. This
dependency is difficult for users as well as CI environments, because the
userspace build and setup may require lots of support and devel packages
to be installed, system setup to be correct, and things like permissions
and selinux policies to be properly configured.
The kernel selftest suite includes an ovs-dpctl.py utility which can
interact with the openvswitch module directly. This lets developers and
CI environments run without needing too many extra dependencies - just
the pyroute2 python package.
This series enhances the ovs-dpctl utility to provide support for set()
and tunnel() flow specifiers, better ipv6 handling support, and the
ability to add tunnel vports, and LWT interfaces. Finally, it modifies
the pmtu.sh script to call the ovs-dpctl.py utility rather than the
typical OVS userspace utilities. The pmtu.sh can still fall back on
the Open vSwitch userspace utilities if the ovs-dpctl.py script can't
be used.
Aaron Conole (7):
selftests: openvswitch: Support explicit tunnel port creation.
selftests: openvswitch: Refactor actions parsing.
selftests: openvswitch: Add set() and set_masked() support.
selftests: openvswitch: Add support for tunnel() key.
selftests: openvswitch: Support implicit ipv6 arguments.
selftests: net: Use the provided dpctl rather than the vswitchd for
tests.
selftests: net: add config for openvswitch
tools/testing/selftests/net/config | 5 +
.../selftests/net/openvswitch/ovs-dpctl.py | 368 +++++++++++++++---
tools/testing/selftests/net/pmtu.sh | 145 +++++--
3 files changed, 451 insertions(+), 67 deletions(-)
--
2.45.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drops three local helpers tcp_server(), inetaddr_len() and make_socket()
in them.
Geliang Tang (9):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_to_fd in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 2 +-
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/sk_lookup.c | 141 +++++++-----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 51 ++-----
4 files changed, 61 insertions(+), 134 deletions(-)
--
2.43.0
Adds a simple implementation of strerror() and makes use of it in
kselftests.
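To give an idea of what a "simple" strerror() can look like in a libc
without message tables, here is a minimal sketch. It is an assumption made
for illustration, not the code from this series: tiny_strerror() is a
hypothetical name, and it only formats the numeric value.

```c
#include <stdio.h>

/*
 * Minimal strerror()-like helper: no message table, just the number.
 * The static buffer is shared between calls, so this is not thread-safe.
 */
static const char *tiny_strerror(int errnum)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "errno=%d", errnum);
	return buf;
}
```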
Shuah, could you Ack patch 3?
Willy, this should work *without* your Ack.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
selftests/nolibc: introduce condition to run tests only on nolibc
tools/nolibc: implement strerror()
selftests: kselftest: also use strerror() on nolibc
tools/include/nolibc/stdio.h | 10 ++++++++
tools/testing/selftests/kselftest.h | 8 -------
tools/testing/selftests/nolibc/nolibc-test.c | 36 ++++++++++++++++++----------
3 files changed, 33 insertions(+), 21 deletions(-)
---
base-commit: a3063ba97f31e0364379a3ffc567203e3f79e877
change-id: 20240425-nolibc-strerror-67f4bfa03035
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.
A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.
** Problem **
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.
Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.
Looking at available infrastructure, an obvious candidate would be
to use psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.
** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.
The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
--
v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditional compilation of function.
v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in selftests.
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.
v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
"sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.
Adrian Moreno (10):
net: psample: add user cookie
net: sched: act_sample: add action cookie to sample
net: psample: skip packet copy if no listeners
net: psample: allow using rate as probability
net: openvswitch: add psample action
net: openvswitch: store sampling probability in cb.
selftests: openvswitch: add psample action
selftests: openvswitch: add userspace parsing
selftests: openvswitch: parse trunc action
selftests: openvswitch: add psample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++
include/net/psample.h | 5 +-
include/uapi/linux/openvswitch.h | 31 +-
include/uapi/linux/psample.h | 11 +-
net/openvswitch/Kconfig | 1 +
net/openvswitch/actions.c | 65 ++++-
net/openvswitch/datapath.h | 3 +
net/openvswitch/flow_netlink.c | 32 ++-
net/openvswitch/vport.c | 1 +
net/psample/psample.c | 16 +-
net/sched/act_sample.c | 12 +
.../selftests/net/openvswitch/openvswitch.sh | 115 +++++++-
.../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++-
13 files changed, 565 insertions(+), 16 deletions(-)
--
2.45.2
Correctable memory errors are very common on servers with large
amounts of memory. They are corrected by ECC, but leave users with two
pain points:
1. Correction usually happens on the fly and adds latency overhead.
2. A not-fully-proven theory states that excessive correctable memory
errors can develop into uncorrectable memory errors.
Soft offline is the kernel's additional solution for memory pages
having (excessive) corrected memory errors. The impacted page is migrated
to a healthy page if it is in use, then the original page is discarded
from any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in the case of a 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when a later mmap of hugepages returns MAP_FAILED due to lack of hugepages.
In the case of a transparent hugepage, it will be split into 4K pages
as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly, and the kernel had better not
do it under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and soft offlines the page when the counter for a PFN reaches the
threshold.
In both cases, userspace has no control of the soft offline performed
by the kernel's memory failure recovery.
This patch series gives userspace control over soft offlining any page:
the kernel only soft offlines a raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default
enable_soft_offline is 1 to preserve existing behavior in the kernel.
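As a rough illustration of how a userspace policy agent might flip the
switch, here is a minimal sketch. The helper names write_sysctl_flag() and
read_sysctl_flag() are hypothetical; only the path
/proc/sys/vm/enable_soft_offline and the 0/1 values come from this series.
Writing the real sysctl is expected to require root.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write '0' or '1' to a sysctl file. 0 on success. */
static int write_sysctl_flag(const char *path, int enabled)
{
	char value = enabled ? '1' : '0';
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, &value, 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Hypothetical helper: returns 0 or 1, or -1 if the file can't be parsed. */
static int read_sysctl_flag(const char *path)
{
	char value;
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = read(fd, &value, 1);
	close(fd);
	if (n != 1 || (value != '0' && value != '1'))
		return -1;
	return value - '0';
}
```

For example, write_sysctl_flag("/proc/sys/vm/enable_soft_offline", 0) would
ask the kernel to stop soft offlining pages, and a -1 from
read_sysctl_flag() on that path suggests the running kernel does not have
the sysctl.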
Changelog
v5 => v6:
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com>
* add a ':' in soft offline log.
* close hugetlbfs file descriptor in selftest.
* no need to "return" after ksft_exit_fail_msg.
v4 => v5:
* incorporate feedbacks from Muhammad Usama Anjum
<usama.anjum(a)collabora.com>
* refactor selftest to use what is available in kselftest.h
v3 => v4:
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com>,
Andrew Morton <akpm(a)linux-foundation.org>, and
Oscar Salvador <osalvador(a)suse.de>.
* insert a refactor commit to unify soft offline's logs to follow
"Soft offline: 0x${pfn}: ${message}" format.
* some rewords in document: fail => will not perform.
* v4 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3"),
akpm/mm-stable.
v2 => v3:
* incorporate feedbacks from Miaohe Lin <linmiaohe(a)huawei.com>,
Lance Yang <ioworker0(a)gmail.com>, Oscar Salvador <osalvador(a)suse.de>,
and David Rientjes <rientjes(a)google.com>.
* release potential refcount if enable_soft_offline is 0.
* soft_offline_page() returns EOPNOTSUPP if enable_soft_offline is 0.
* refactor hugetlb-soft-offline.c, for example, introduce
test_soft_offline_common to reduce repeated code.
* rewrite enable_soft_offline's documentation, adds more details about
the cost of soft-offline for transparent and hugetlb hugepages, and
components that are impacted when enable_soft_offline becomes 0.
* fix typos in commit messages.
* v3 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3").
v1 => v2:
* incorporate feedbacks from both Miaohe Lin <linmiaohe(a)huawei.com> and
Jane Chu <jane.chu(a)oracle.com>.
* make the switch to control all pages, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (4):
mm/memory-failure: refactor log format in soft offline code
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm: test enable_soft_offline behaviors
docs: mm: add enable_soft_offline sysctl
Documentation/admin-guide/sysctl/vm.rst | 32 +++
mm/memory-failure.c | 38 ++-
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 228 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
6 files changed, 296 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.2.741.gdbec12cfda-goog
This patch series introduces a new user namespace capability set, as
well as some plumbing around it (i.e. sysctl, secbit, lsm support).
First patch goes over the motivations for this as well as prior art.
In summary, while user namespaces are a great success today in that they
avoid running a lot of code as root, they also expand the attack surface
of the kernel substantially which is often abused by attackers.
Methods exist to limit the creation of such namespaces [1], however,
application developers often need to assume that user namespaces are
available for various tasks such as sandboxing. Thus, instead of
restricting the creation of user namespaces, we offer ways for userspace
to limit the capabilities granted to them.
Why a new capability set and not something specific to the userns (e.g.
ioctl_ns)?
1. We can't really expect userspace to patch every single callsite
and opt in to this new security mechanism.
2. We don't necessarily want policies enforced at said callsites.
For example a service like systemd-machined or a PAM session needs to
be able to place restrictions on any namespace spawned under it.
3. We would need to come up with inheritance rules, querying
capabilities, etc. At this point we're just reinventing capability
sets.
4. We can easily define interactions between capability sets, thus
helping with adoption (patch 2 is an example of this)
Some examples of how this could be leveraged in userspace:
- Prevent user from getting CAP_NET_ADMIN in user namespaces under SSH:
echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
echo "!cap_net_admin $USER" >> /etc/security/capability.conf
capsh --secbits=$((1 << 8)) -- -c /usr/sbin/sshd
- Prevent containers from ever getting CAP_DAC_OVERRIDE:
systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
-p SecureBits=userns-strict-caps \
/usr/bin/dockerd
systemd-run -p UserNSCapabilities=~CAP_DAC_OVERRIDE \
/usr/bin/incusd
- Kernel could be vulnerable to CAP_SYS_RAWIO exploits, prevent it:
sysctl -w cap_bound_userns_mask=0x1fffffdffff
- Drop CAP_SYS_ADMIN for this shell and all the user namespaces below it:
bwrap --unshare-user --cap-drop CAP_SYS_ADMIN /bin/sh
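The cap_bound_userns_mask value in the CAP_SYS_RAWIO example above can be
derived rather than typed by hand. The sketch below assumes capabilities
are numbered 0..40, as the 41-bit mask in that example implies, and that
CAP_SYS_RAWIO is 17 as in include/uapi/linux/capability.h. The EXAMPLE_*
macros and cap_mask_without() are hypothetical names for illustration.

```c
#include <stdint.h>

#define EXAMPLE_CAP_SYS_RAWIO	17	/* value from uapi/linux/capability.h */
#define EXAMPLE_CAP_LAST_CAP	40	/* assumption: highest capability number */

/*
 * Build a mask with bits 0..last_cap set, then clear the bit for the
 * capability that should never be granted in user namespaces.
 * Both arguments must be below 64.
 */
static uint64_t cap_mask_without(unsigned int cap, unsigned int last_cap)
{
	uint64_t all = (last_cap >= 63) ? UINT64_MAX
					: (UINT64_C(1) << (last_cap + 1)) - 1;

	return all & ~(UINT64_C(1) << cap);
}
```

With these assumptions,
cap_mask_without(EXAMPLE_CAP_SYS_RAWIO, EXAMPLE_CAP_LAST_CAP) yields
0x1fffffdffff, the value passed to sysctl in the example.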
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
---
Changes since v1:
- Add documentation
- Change commit wording
- Cleanup various aspects of the code based on feedback
- Add new CAP_SYS_CONTROL capability for sysctl check
- Add BPF-LSM support for modifying userns capabilities
---
Jonathan Calmels (4):
capabilities: Add user namespace capabilities
capabilities: Add securebit to restrict userns caps
capabilities: Add sysctl to mask off userns caps
bpf,lsm: Allow editing capabilities in BPF-LSM hooks
Documentation/filesystems/proc.rst | 1 +
Documentation/security/credentials.rst | 6 ++
fs/proc/array.c | 9 +++
include/linux/cred.h | 3 +
include/linux/lsm_hook_defs.h | 2 +-
include/linux/securebits.h | 1 +
include/linux/security.h | 4 +-
include/linux/user_namespace.h | 7 ++
include/uapi/linux/capability.h | 6 +-
include/uapi/linux/prctl.h | 7 ++
include/uapi/linux/securebits.h | 11 ++-
kernel/bpf/bpf_lsm.c | 55 +++++++++++++
kernel/cred.c | 3 +
kernel/sysctl.c | 10 +++
kernel/umh.c | 15 ++++
kernel/user_namespace.c | 80 +++++++++++++++++--
security/apparmor/lsm.c | 2 +-
security/commoncap.c | 62 +++++++++++++-
security/keys/process_keys.c | 3 +
security/security.c | 6 +-
security/selinux/hooks.c | 2 +-
security/selinux/include/classmap.h | 5 +-
.../selftests/bpf/prog_tests/deny_namespace.c | 12 ++-
.../selftests/bpf/progs/test_deny_namespace.c | 7 +-
24 files changed, 291 insertions(+), 28 deletions(-)
--
2.45.2
We cannot use CLONE_VFORK because we also need to wait for the timeout
signal.
Restore tests timeout by using the original fork() call in __run_test()
but also in __TEST_F_IMPL(). Also fix a race condition when waiting for
the test child process.
Because test metadata are shared between test processes, only the
parent process must set the test PID (child). Otherwise, t->pid may be
set to zero, leading to inconsistent error cases:
# RUN layout1.rule_on_mountpoint ...
# rule_on_mountpoint: Test ended in some other way [127]
# OK layout1.rule_on_mountpoint
ok 20 layout1.rule_on_mountpoint
As safeguards, initialize the "status" variable with a valid exit code,
and handle unknown test exits as errors.
The use of fork() introduces a new race condition in landlock/fs_test.c
which seems to be specific to hostfs bind mounts, but I haven't found
the root cause and it's difficult to trigger. I'll try to fix it with
another patch.
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Günther Noack <gnoack(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Will Drewry <wad(a)chromium.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/r/9341d4db-5e21-418c-bf9e-9ae2da7877e1@sirena.org.uk
Fixes: a86f18903db9 ("selftests/harness: Fix interleaved scheduling leading to race conditions")
Fixes: 24cf65a62266 ("selftests/harness: Share _metadata between forked processes")
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
Link: https://lore.kernel.org/r/20240621180605.834676-1-mic@digikod.net
---
tools/testing/selftests/kselftest_harness.h | 43 ++++++++++++---------
1 file changed, 24 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index b634969cbb6f..40723a6a083f 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -66,8 +66,6 @@
#include <sys/wait.h>
#include <unistd.h>
#include <setjmp.h>
-#include <syscall.h>
-#include <linux/sched.h>
#include "kselftest.h"
@@ -82,17 +80,6 @@
# define TH_LOG_ENABLED 1
#endif
-/* Wait for the child process to end but without sharing memory mapping. */
-static inline pid_t clone3_vfork(void)
-{
- struct clone_args args = {
- .flags = CLONE_VFORK,
- .exit_signal = SIGCHLD,
- };
-
- return syscall(__NR_clone3, &args, sizeof(args));
-}
-
/**
* TH_LOG()
*
@@ -437,7 +424,7 @@ static inline pid_t clone3_vfork(void)
} \
if (setjmp(_metadata->env) == 0) { \
/* _metadata and potentially self are shared with all forks. */ \
- child = clone3_vfork(); \
+ child = fork(); \
if (child == 0) { \
fixture_name##_setup(_metadata, self, variant->data); \
/* Let setup failure terminate early. */ \
@@ -1016,7 +1003,14 @@ void __wait_for_test(struct __test_metadata *t)
.sa_flags = SA_SIGINFO,
};
struct sigaction saved_action;
- int status;
+ /*
+ * Sets status so that WIFEXITED(status) returns true and
+ * WEXITSTATUS(status) returns KSFT_FAIL. This safe default value
+ * should never be evaluated because of the waitpid(2) check and
+ * SIGALRM handling.
+ */
+ int status = KSFT_FAIL << 8;
+ int child;
if (sigaction(SIGALRM, &action, &saved_action)) {
t->exit_code = KSFT_FAIL;
@@ -1028,7 +1022,15 @@ void __wait_for_test(struct __test_metadata *t)
__active_test = t;
t->timed_out = false;
alarm(t->timeout);
- waitpid(t->pid, &status, 0);
+ child = waitpid(t->pid, &status, 0);
+ if (child == -1 && errno != EINTR) {
+ t->exit_code = KSFT_FAIL;
+ fprintf(TH_LOG_STREAM,
+ "# %s: Failed to wait for PID %d (errno: %d)\n",
+ t->name, t->pid, errno);
+ return;
+ }
+
alarm(0);
if (sigaction(SIGALRM, &saved_action, NULL)) {
t->exit_code = KSFT_FAIL;
@@ -1083,6 +1085,7 @@ void __wait_for_test(struct __test_metadata *t)
WTERMSIG(status));
}
} else {
+ t->exit_code = KSFT_FAIL;
fprintf(TH_LOG_STREAM,
"# %s: Test ended in some other way [%u]\n",
t->name,
@@ -1218,6 +1221,7 @@ void __run_test(struct __fixture_metadata *f,
struct __test_xfail *xfail;
char test_name[1024];
const char *diagnostic;
+ int child;
/* reset test struct */
t->exit_code = KSFT_PASS;
@@ -1236,15 +1240,16 @@ void __run_test(struct __fixture_metadata *f,
fflush(stdout);
fflush(stderr);
- t->pid = clone3_vfork();
- if (t->pid < 0) {
+ child = fork();
+ if (child < 0) {
ksft_print_msg("ERROR SPAWNING TEST CHILD\n");
t->exit_code = KSFT_FAIL;
- } else if (t->pid == 0) {
+ } else if (child == 0) {
setpgrp();
t->fn(t, variant);
_exit(t->exit_code);
} else {
+ t->pid = child;
__wait_for_test(t);
}
ksft_print_msg(" %4s %s\n",
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
--
2.45.2
The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.
The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy.
mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But that only works up to a point, and on
busy systems won't always be enough.
In this patch set, clean up and stabilize the mirroring selftests. The
original intention was to port the tests over to UDP, but the logic of
ICMP ends up being so entangled in the mirroring selftests that the
changes feel overly invasive. Instead, ICMP is kept, but where possible,
we match on ICMP message type, thus filtering out hits by other ICMP
messages.
Where this is not practical (where the counter tap is put on a device
that carries encapsulated packets), switch the counter condition to _at
least_ X observed packets. This is less robust, but barely so --
probably the only scenario that this would not catch is something like
erroneous packet duplication, which would hopefully get caught by the
numerous other tests in this extensive suite.
- Patches #1 to #3 clean up parameters at various helpers.
- Patches #4 to #6 stabilize the mirroring selftests as described above.
- Mirroring tests currently allow testing SW datapath even on HW
netdevices by trapping traffic to the SW datapath. This complicates
the tests a bit without a good reason: to test SW datapath, just run
the selftests on the veth topology. Thus in patch #7, drop support for
this dual SW/HW testing.
- At this point, some cleanups were either made possible by the previous
patches, or were always possible. In patches #8 to #11, realize these
cleanups.
- In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.
Petr Machata (12):
selftests: libs: Expand "$@" where possible
selftests: mirror: Drop direction argument from several functions
selftests: lib: tc_rule_stats_get(): Move default to argument
definition
selftests: mirror_gre_lag_lacp: Check counters at tunnel
selftests: mirror: do_test_span_dir_ips(): Install accurate taps
selftests: mirror: mirror_test(): Allow exact count of packets
selftests: mirror: Drop dual SW/HW testing
selftests: mlxsw: mirror_gre: Simplify
selftests: mirror_gre_lag_lacp: Drop unnecessary code
selftests: libs: Drop slow_path_trap_install()/_uninstall()
selftests: libs: Drop unused functions
selftests: mlxsw: mirror_gre: Obey TESTS
.../selftests/drivers/net/mlxsw/mirror_gre.sh | 71 ++++++---------
.../drivers/net/mlxsw/mirror_gre_scale.sh | 18 +---
tools/testing/selftests/net/forwarding/lib.sh | 83 +++++++++++------
.../selftests/net/forwarding/mirror_gre.sh | 45 +++-------
.../net/forwarding/mirror_gre_bound.sh | 23 +----
.../net/forwarding/mirror_gre_bridge_1d.sh | 21 +----
.../forwarding/mirror_gre_bridge_1d_vlan.sh | 21 +----
.../net/forwarding/mirror_gre_bridge_1q.sh | 21 +----
.../forwarding/mirror_gre_bridge_1q_lag.sh | 29 ++----
.../net/forwarding/mirror_gre_changes.sh | 73 ++++++---------
.../net/forwarding/mirror_gre_flower.sh | 43 ++++-----
.../net/forwarding/mirror_gre_lag_lacp.sh | 65 ++++++--------
.../net/forwarding/mirror_gre_lib.sh | 90 ++++++++++++++-----
.../net/forwarding/mirror_gre_neigh.sh | 39 +++-----
.../selftests/net/forwarding/mirror_gre_nh.sh | 35 ++------
.../net/forwarding/mirror_gre_vlan.sh | 21 +----
.../forwarding/mirror_gre_vlan_bridge_1q.sh | 69 ++++++--------
.../selftests/net/forwarding/mirror_lib.sh | 79 +++++++++++-----
.../selftests/net/forwarding/mirror_vlan.sh | 43 +++------
tools/testing/selftests/net/lib.sh | 4 +-
20 files changed, 355 insertions(+), 538 deletions(-)
--
2.45.0
Hello,
KernelCI is hosting a bi-weekly call on Thursday to discuss improvements
to existing upstream tests, the development of new tests to increase
kernel testing coverage, and the enablement of these tests in KernelCI.
In recent months, we at Collabora have focused on various kernel areas,
assessing the tests already available upstream and contributing patches
to make them easily runnable in CIs.
Below is a list of the tests we've been working on and their latest
status updates, as discussed in the last meeting held on 2024-06-27:
*USB/PCI devices kselftest*
- Upstream test to detect unprobed devices on discoverable buses:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
- Kernel patches to allow running the test on more platforms on KernelCI
were merged:
https://lore.kernel.org/all/20240613-kselftest-discoverable-probe-mt8195-kc…
- Waiting for KernelCI PRs to be merged:
https://github.com/kernelci/kernelci-core/pull/2577 and https://github.com/kernelci/kernelci-pipeline/pull/642
*Error log test*
- Proposing new kselftest to report device log errors:
https://lore.kernel.org/all/20240423-dev-err-log-selftest-v1-0-690c1741d68b…
- Currently fixing test failures in KernelCI
*Suspend/resume in cpufreq kselftest*
- Enabling suspend/resume test within the cpufreq kselftest in KernelCI
- Parameter support for running subtests in a kselftest was merged:
https://github.com/Linaro/test-definitions/pull/511
- Added rtcwake support in the test to enable automated resume, currently
testing/debugging solution
*Boot time test*
- Investigating possibility of adding new test upstream to measure the
kernel boot time and detect regressions
- Currently looking into boot tracing with ftrace events and kprobes
(see: https://www.kernel.org/doc/html/latest/trace/boottime-trace.html)
- Idea for potential kselftest: insert explicit tracepoints in strategic
places, let the user configure which times to measure. The test could
provide a bootconfig file and a fragment to enable the required configs.
This could be an alternative to using external tools (e.g. grabserial
w/ early serial port init).
- Need a list of functions to track in order to measure key metrics
(e.g. device tree overhead, probe overhead, module load overhead)
- Identify key drivers that need to be loaded early, for potentially
supporting a two-phase boot: (1) time-critical, and (2) rest of the
system
*Other interesting updates*
- Flaky serial on sc7180 was recently fixed:
https://github.com/kernelci/kernelci-project/issues/380 and https://lore.kernel.org/all/20240610222515.3023730-1-dianders@chromium.org/…
*Strategy for test enablement in KernelCI*
- Guidance on test quality: KernelCI should set the standard for test
quality, providing guidance on which tests to enable from various
projects (e.g., kselftest, LTP). By doing so, KernelCI can serve as a
model for other CI systems.
- Develop mechanisms to automatically detect which tests should run on a
specific platform
- Embed metadata in the tests themselves to facilitate the test selection
process
- Leverage device tree info to determine the appropriate tests for each
platform
Please reply to this thread if you'd like to join the call or discuss
any of the topics further. We look forward to collaborating with the
community to improve upstream tests and expand coverage to more areas
of interest within the kernel.
Best regards,
Laura Nao
Changes v2:
- Removed patches 2 and 3 since now this part will be supported by the
kernel.
Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
SNC support in the kernel is currently in review [1]. With SNC enabled
and kernel support in place all the tests will function normally. There
might be a problem when SNC is enabled but the system is still using an
older kernel version without SNC support. Currently the only message
displayed in that situation is a guess that SNC might be enabled and is
causing issues. That message is also displayed whenever the test fails
on an Intel platform.
Add a mechanism to discover kernel support for SNC which will add more
meaning and certainty to the error message.
Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also run with and without kernel support for
SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240503203325.21512-1-tony.luck@intel.com/
Previous versions of this series:
[v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (2):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cat_test.c | 2 +-
tools/testing/selftests/resctrl/cmt_test.c | 6 +-
tools/testing/selftests/resctrl/mba_test.c | 2 +
tools/testing/selftests/resctrl/mbm_test.c | 4 +-
tools/testing/selftests/resctrl/resctrl.h | 8 +-
tools/testing/selftests/resctrl/resctrlfs.c | 131 +++++++++++++++++++-
6 files changed, 144 insertions(+), 9 deletions(-)
--
2.45.0
Allow userspace to change the guest-visible value of the register with
some severe limitations:
- No changes to features not virtualized by KVM (MPAM_frac, RAS_frac,
SME, RNDP_trap).
- No changes to features (CSV2_frac, NMI, MTE_frac, GCS, THE, MTEX,
DF2, PFAR) which haven't been added to ftr_id_aa64pfr1[].
This is because arm64_check_features uses the struct arm64_ftr_bits
definition of each feature in ftr_id_aa64pfr1[]. If a feature has no
entry in ftr_id_aa64pfr1[], the for loop won't check whether the
new_val is safe for that feature.
As for the question of why those fields can't be hidden depending on the VM
configuration: I couldn't find a related VM configuration; maybe we
should add a new one?
I'm not sure I'm right, so if there are any problems please point them out and
I will fix them.
Also add the selftest for it.
Changelog:
----------
v2 -> v3:
* Give more description about why only part of the fields can be writable.
* Updated the writable mask by referring to the latest ARM spec.
v1 -> v2:
* Tackling the full register instead of single field.
* Changing the patch title and commit message.
RFCv1 -> v1:
* Fix the compilation error.
* Delete the machine specific information and make the description more
general.
RFCv1: https://lore.kernel.org/all/20240612023553.127813-1-shahuang@redhat.com/
v1: https://lore.kernel.org/all/20240617075131.1006173-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20240618063808.1040085-1-shahuang@redhat.com/
Shaoqin Huang (2):
KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1
KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1
arch/arm64/kvm/sys_regs.c | 4 +++-
tools/testing/selftests/kvm/aarch64/set_id_regs.c | 8 ++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
--
2.40.1
Currently, we can run string-stream and assertion tests only when they
are built into the kernel (with config options = y), since some of the
symbols (string-stream functions and functions from assert.c) are not
exported into any of the namespaces and are therefore not accessible
to modules.
This patch series exports the required symbols into the KUnit namespace.
Also, it makes the string-stream test a separate module and removes the
log test stub from kunit-test since now we can access the string-stream
symbols even if the test which uses it is built as a module.
Additionally, this patch series merges the assertion test suite into the
kunit-test, since assert.c (and all of the assertion formatting
functions in it) is a part of the KUnit core.
V1 -> V2:
- Patch which exports the non-static assert.c functions is replaced with
the patch which prepares assert_test.c to be merged into kunit-test.c
- Also, David Gow <davidgow(a)google.com> suggested merging the 4th and 5th
patches together, but since the 4th patch now does more than it used to,
I am sending it separately
Ivan Orlov (5):
kunit: string-stream: export non-static functions
kunit: kunit-test: Remove stub for log tests
kunit: string-stream-test: Make it a separate module
kunit: assert_test: Prepare to be merged into kunit-test.c
kunit: Merge assertion test into kunit-test.c
include/kunit/assert.h | 4 +-
lib/kunit/Kconfig | 8 +
lib/kunit/Makefile | 7 +-
lib/kunit/assert.c | 19 +-
lib/kunit/assert_test.c | 388 --------------------------------
lib/kunit/kunit-test.c | 397 +++++++++++++++++++++++++++++++--
lib/kunit/string-stream-test.c | 2 +
lib/kunit/string-stream.c | 12 +-
8 files changed, 416 insertions(+), 421 deletions(-)
delete mode 100644 lib/kunit/assert_test.c
--
2.34.1
v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====
No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:
https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that v11, AFAICT, received no feedback. There is one
discussion about the specifics of plugging io_uring memory through
the page pool, but it is not relevant to the content of this particular
patchset, AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem-specific
helper (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
defined as a preprocessor macro and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
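The trade-off behind the skb->readable conversion above can be sketched in plain C. This is only a userspace model under stated assumptions: struct buf, struct shared_info and struct frag are hypothetical stand-ins for the skb, its shared info and its frags, and the real layouts differ.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's sk_buff layout. */
struct frag {
	bool is_devmem;
};

struct shared_info {
	unsigned int nr_frags;
	struct frag frags[4];
};

struct buf {
	unsigned int readable : 1;	/* cached in the hot buffer head */
	struct shared_info *shinfo;	/* lives elsewhere in memory */
};

/* Without a flag: must load shinfo and inspect the first frag. */
static bool buf_readable_slow(const struct buf *b)
{
	const struct shared_info *si = b->shinfo;

	return si->nr_frags == 0 || !si->frags[0].is_devmem;
}

/* With a flag: touches only the buffer head. */
static bool buf_readable_fast(const struct buf *b)
{
	return b->readable;
}
```

The flag-based check never follows the shinfo pointer, which is the extra memory access the changelog entry is trying to avoid.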
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles, but there is occasionally
1 cycle of noise when running this test repeatedly.
Lastly I tried disabling the static_branch_unlikely() in the
netmem_is_net_iov() check. To my surprise, disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes, including the dependent series and GVE page pool
support, are here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
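For readers unfamiliar with "LSB set pointers" as mentioned in item 1 above, here is a minimal userspace sketch of the general tagging technique. It assumes both pointed-to types are at least 2-byte aligned so the low bit is free; mem_ref, MEM_REF_IOV and the helper names are hypothetical and are not the kernel's netmem API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types standing in for a host page and a device iov. */
struct host_page { int id; };
struct dev_iov { int id; };

typedef uintptr_t mem_ref;		/* hypothetical handle type */
#define MEM_REF_IOV ((uintptr_t)1)	/* low bit set: handle is a dev_iov */

static mem_ref page_to_ref(struct host_page *p)
{
	return (uintptr_t)p;
}

static mem_ref iov_to_ref(struct dev_iov *v)
{
	return (uintptr_t)v | MEM_REF_IOV;
}

static bool ref_is_iov(mem_ref r)
{
	return (r & MEM_REF_IOV) != 0;
}

/* Strip the tag bit before dereferencing. */
static struct dev_iov *ref_to_iov(mem_ref r)
{
	return (struct dev_iov *)(r & ~MEM_REF_IOV);
}

/* Refuse to hand out a page pointer for a tagged handle. */
static struct host_page *ref_to_page(mem_ref r)
{
	return ref_is_iov(r) ? NULL : (struct host_page *)r;
}
```

A dedicated handle type makes it harder to dereference a tagged value as if it were an ordinary page pointer.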
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The suggested
solution, as I understand it, is a subset of the per-queue management
ops Jakub suggested, or something similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes only the rx devmem path, which is what is proposed
for merge. For a snapshot of my entire tree, which includes the GVE POC
page pool support & device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators have drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of Device-to-Device data transfers over the network are
implemented as the following low-level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to the traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives the user the ability to bind a dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggyback on the page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
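To make the header split requirement above concrete, here is a small software model of what the NIC does in hardware: the first hdr_len bytes of a packet land in one buffer and the rest in another. The function name and signature are hypothetical, and a real NIC DMAs the two parts rather than copying them with memcpy().

```c
#include <stddef.h>
#include <string.h>

/*
 * Software model of header split (hypothetical helper, not a driver API).
 * Copies the first hdr_len bytes of pkt into hdr_buf (host memory in the
 * devmem TCP case) and the remainder into payload_buf (device memory in
 * the devmem TCP case). Returns the payload length, or -1 if a buffer is
 * too small or hdr_len exceeds the packet length.
 */
static long split_packet(const unsigned char *pkt, size_t pkt_len,
			 size_t hdr_len,
			 unsigned char *hdr_buf, size_t hdr_cap,
			 unsigned char *payload_buf, size_t payload_cap)
{
	size_t payload_len;

	if (hdr_len > pkt_len || hdr_len > hdr_cap)
		return -1;

	payload_len = pkt_len - hdr_len;
	if (payload_len > payload_cap)
		return -1;

	memcpy(hdr_buf, pkt, hdr_len);
	memcpy(payload_buf, pkt + hdr_len, payload_len);
	return (long)payload_len;
}
```

Only hdr_buf needs to be readable by the TCP/IP stack; payload_buf is never inspected by the host in this model.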
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 208 ++++++++-
include/net/page_pool/helpers.h | 124 ++++--
include/net/page_pool/types.h | 22 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 30 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 376 ++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 103 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 362 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
47 files changed, 2753 insertions(+), 251 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.2.741.gdbec12cfda-goog
Hi,
This builds on the proposal[1] from Mark and lets me convert the
existing usercopy selftest to KUnit. Besides adding this basic test to
the KUnit collection, it also opens the door for execve testing (which
depends on having a functional current->mm), and should provide the
basic infrastructure for adding Mark's much more complete usercopy tests.
v3:
- use MEMEQ KUnit helper (David)
- exclude pathological address confusion test for systems with separate
address spaces, noticed by David
- add KUnit-conditional exports for alloc_mm() and arch_pick_mmap_layout()
noticed by 0day
v2: https://lore.kernel.org/lkml/20240610213055.it.075-kees@kernel.org/
v1: https://lore.kernel.org/lkml/20240519190422.work.715-kees@kernel.org/
-Kees
[1] https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/
Kees Cook (2):
kunit: test: Add vm_mmap() allocation resource manager
usercopy: Convert test_user_copy to KUnit test
MAINTAINERS | 1 +
include/kunit/test.h | 17 ++
kernel/fork.c | 3 +
lib/Kconfig.debug | 21 +-
lib/Makefile | 2 +-
lib/kunit/Makefile | 1 +
lib/kunit/user_alloc.c | 113 +++++++++
lib/{test_user_copy.c => usercopy_kunit.c} | 282 ++++++++++-----------
mm/util.c | 3 +
9 files changed, 288 insertions(+), 155 deletions(-)
create mode 100644 lib/kunit/user_alloc.c
rename lib/{test_user_copy.c => usercopy_kunit.c} (46%)
--
2.34.1
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string
parsing, hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have
been left out since they target microcontrollers/embedded CPUs and are
not needed by RVA23U64.
Since the Zc* extensions specification states that C implies Zca, Zcf (if F
and RV32), and Zcd (if D), this series modifies the way the ISA string is
parsed, which is now done in two phases. The first phase parses the string
and the second one validates it for the final ISA description.
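The two-phase idea above can be modelled with a toy resolver. This is a simplified sketch, not the kernel's cpufeature.c logic: the EXT_* bit values, parse_isa() and resolve_isa() are hypothetical, and only the single-letter c, f and d extensions are recognized.

```c
#include <stdbool.h>

/* Hypothetical bit assignments for this sketch only. */
enum {
	EXT_C   = 1u << 0,
	EXT_F   = 1u << 1,
	EXT_D   = 1u << 2,
	EXT_ZCA = 1u << 3,
	EXT_ZCF = 1u << 4,
	EXT_ZCD = 1u << 5,
};

/* Phase 1: collect extensions from a single-letter string such as "imafdc". */
static unsigned int parse_isa(const char *s)
{
	unsigned int set = 0;

	for (; *s; s++) {
		switch (*s) {
		case 'c': set |= EXT_C; break;
		case 'f': set |= EXT_F; break;
		case 'd': set |= EXT_D; break;
		default: break;	/* other letters are ignored in this sketch */
		}
	}
	return set;
}

/* Phase 2: apply "C implies Zca, Zcf (if F and RV32), Zcd (if D)". */
static unsigned int resolve_isa(unsigned int set, bool is_rv32)
{
	if (set & EXT_C) {
		set |= EXT_ZCA;
		if ((set & EXT_F) && is_rv32)
			set |= EXT_ZCF;
		if (set & EXT_D)
			set |= EXT_ZCD;
	}
	return set;
}
```

Running the implication rules only after the whole string is parsed means their result does not depend on the order in which extensions appear.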
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Link: https://lore.kernel.org/all/20240409143839.558784-1-cleger@rivosinc.com/ [2]
---
v7:
- Rebased on riscv/for-next to fix conflicts
v6:
- Rebased on riscv/for-next
- Remove ternary operator to use 'if()' instead in extension checks
- v5: https://lore.kernel.org/all/20240517145302.971019-1-cleger@rivosinc.com/
v5:
- Merged in Zimop to avoid any unneeded series dependencies
- Rework dependency resolution loop to loop on source isa first rather
than on all extensions.
- Disabled extensions in source isa once set in resolved isa
- Rename riscv_resolve_isa() parameters
- v4: https://lore.kernel.org/all/20240429150553.625165-1-cleger@rivosinc.com/
v4:
- Modify validate() callbacks to return 0, -EPROBE_DEFER or another
error.
- v3: https://lore.kernel.org/all/20240423124326.2532796-1-cleger@rivosinc.com/
v3:
- Fix typo "exists" -> "exist"
- Remove C implies Zca, Zcd, Zcf, dt-bindings rules
- Rework ISA string resolver to handle dependencies
- v2: https://lore.kernel.org/all/20240418124300.1387978-1-cleger@rivosinc.com/
v2:
- Add Zc* dependencies validation in dt-bindings
- v1: https://lore.kernel.org/lkml/20240410091106.749233-1-cleger@rivosinc.com/
Clément Léger (16):
dt-bindings: riscv: add Zimop ISA extension description
riscv: add ISA extension parsing for Zimop
riscv: hwprobe: export Zimop ISA extension
RISC-V: KVM: Allow Zimop extension for Guest/VM
KVM: riscv: selftests: Add Zimop extension to get-reg-list test
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA extensions validation callback
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 28 ++
.../devicetree/bindings/riscv/extensions.yaml | 95 ++++++
arch/riscv/include/asm/cpufeature.h | 1 +
arch/riscv/include/asm/hwcap.h | 6 +
arch/riscv/include/uapi/asm/hwprobe.h | 6 +
arch/riscv/include/uapi/asm/kvm.h | 6 +
arch/riscv/kernel/cpufeature.c | 277 ++++++++++++------
arch/riscv/kernel/sys_hwprobe.c | 6 +
arch/riscv/kvm/vcpu_onereg.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 24 ++
10 files changed, 375 insertions(+), 86 deletions(-)
--
2.45.2
The open() function returns -1 on error. openat() and open() initialize
'from' and 'to', but only 'from' is validated with an 'if' statement. If
the initialization of 'to' fails, we should check the value of 'to' and
close 'from' to avoid a possible file descriptor leak. Also improve the
check of 'from'.
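The pattern described above, shown outside the selftest as a standalone sketch: both descriptors are checked, and the source is closed if opening the destination fails. copy_file() is a hypothetical helper written for illustration; the selftest itself exits via ksft_exit_fail_msg() instead of returning an error.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 on failure. */
static int copy_file(const char *fromname, const char *toname)
{
	char buf[4096];
	ssize_t n;
	int ret = -1;
	int from, to;

	from = open(fromname, O_RDONLY);
	if (from < 0)
		return -1;

	to = open(toname, O_CREAT | O_WRONLY | O_EXCL, 0700);
	if (to < 0) {
		int saved = errno;

		close(from);	/* do not leak the source descriptor */
		errno = saved;
		return -1;
	}

	while ((n = read(from, buf, sizeof(buf))) > 0) {
		ssize_t off = 0;

		/* write() may be partial; loop until the chunk is flushed */
		while (off < n) {
			ssize_t w = write(to, buf + off, (size_t)(n - off));

			if (w < 0)
				goto out;
			off += w;
		}
	}
	if (n == 0)
		ret = 0;
out:
	close(from);
	close(to);
	return ret;
}
```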
Fixes: 32ae976ed3b5 ("selftests/capabilities: Add tests for capability evolution")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the patch according to suggestions;
- found by customized static analysis tool.
---
tools/testing/selftests/capabilities/test_execve.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/capabilities/test_execve.c b/tools/testing/selftests/capabilities/test_execve.c
index 47bad7ddc5bc..6406ab6aa1f5 100644
--- a/tools/testing/selftests/capabilities/test_execve.c
+++ b/tools/testing/selftests/capabilities/test_execve.c
@@ -145,10 +145,14 @@ static void chdir_to_tmpfs(void)
static void copy_fromat_to(int fromfd, const char *fromname, const char *toname)
{
int from = openat(fromfd, fromname, O_RDONLY);
- if (from == -1)
+ if (from < 0)
ksft_exit_fail_msg("open copy source - %s\n", strerror(errno));
int to = open(toname, O_CREAT | O_WRONLY | O_EXCL, 0700);
+ if (to < 0) {
+ close(from);
+ ksft_exit_fail_msg("open copy destination - %s\n", strerror(errno));
+ }
while (true) {
char buf[4096];
--
2.25.1
In the same way as commit ae7487d112cf ("selftests/hid: ensure we can
compile the tests on kernels pre-6.3"), we should expose struct hid_bpf_ops
when it's not available in vmlinux.h.
So hide any struct hid_bpf_ops that vmlinux.h may define, include
vmlinux.h, and re-export struct hid_bpf_ops.
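The rename/include/undef sequence described above works for any generated header. The sketch below reproduces it in one file, with an inline struct standing in for the contents of vmlinux.h; widget_ops and its fields are hypothetical names chosen for illustration.

```c
/* Step 1: rename whatever the generated header may define. */
#define widget_ops widget_ops___not_used

/* Step 2: stand-in for the generated header (vmlinux.h in the patch). */
struct widget_ops {
	int stale_field;
};

/* Step 3: drop the rename so the name is free again. */
#undef widget_ops

/* Step 4: provide the definition the program actually needs. */
struct widget_ops {
	int id;
	unsigned int flags;
};

static int widget_ops_id(const struct widget_ops *ops)
{
	return ops->id;
}
```

After preprocessing, the header's copy is named struct widget_ops___not_used, so the two definitions do not conflict whether or not the header provides one.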
Fixes: d7696738d66b ("selftests/hid: convert the hid_bpf selftests with struct_ops")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/r/202406270328.bscLN1IF-lkp@intel.com/
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Same situation as in an earlier report from when HID-BPF was initially
included: the automatically generated vmlinux.h doesn't contain all of
the required structs and the compilation of the bpf program fails.
---
tools/testing/selftests/hid/progs/hid_bpf_helpers.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
index c72e44321764..5a911f0e8625 100644
--- a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
+++ b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
@@ -7,6 +7,7 @@
/* "undefine" structs and enums in vmlinux.h, because we "override" them below */
#define hid_bpf_ctx hid_bpf_ctx___not_used
+#define hid_bpf_ops hid_bpf_ops___not_used
#define hid_report_type hid_report_type___not_used
#define hid_class_request hid_class_request___not_used
#define hid_bpf_attach_flags hid_bpf_attach_flags___not_used
@@ -24,6 +25,7 @@
#include "vmlinux.h"
#undef hid_bpf_ctx
+#undef hid_bpf_ops
#undef hid_report_type
#undef hid_class_request
#undef hid_bpf_attach_flags
@@ -68,6 +70,20 @@ enum hid_class_request {
HID_REQ_SET_PROTOCOL = 0x0B,
};
+struct hid_bpf_ops {
+ int hid_id;
+ u32 flags;
+ struct list_head list;
+ int (*hid_device_event)(struct hid_bpf_ctx *ctx, enum hid_report_type report_type,
+ __u64 source);
+ int (*hid_rdesc_fixup)(struct hid_bpf_ctx *ctx);
+ int (*hid_hw_request)(struct hid_bpf_ctx *ctx, unsigned char reportnum,
+ enum hid_report_type rtype, enum hid_class_request reqtype,
+ __u64 source);
+ int (*hid_hw_output_report)(struct hid_bpf_ctx *ctx, __u64 source);
+ struct hid_device *hdev;
+};
+
#ifndef BPF_F_BEFORE
#define BPF_F_BEFORE (1U << 3)
#endif
---
base-commit: d3e15189bfd4d0a9d3a7ad8bd0e6ebb1c0419f93
change-id: 20240627-fix-cki-f372855cbf6f
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
This series is a followup of the struct_ops conversion.
Therefore, it is based on top of the for-6.11/bpf branch of the hid.git
tree:
https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git/log/?h=for-6.11…
The first patch should go in ASAP, it's a fix that was detected by Dan
and which is actually breaking some use cases.
The rest is adding new capabilities to HID-BPF: being able to intercept
hid_hw_raw_request() and hid_hw_output_report(). Both operations are
write operations to the device.
Having those new hooks makes it possible to implement a "firewall" for
HID devices: this way a bpf program can selectively authorize a hidraw
client to write or not to the device depending on what is requested.
This also makes it possible to completely emulate new behavior: we can now create a
"fake" feature on a HID device, and when we receive a request on this
feature, we can emulate the answer by either statically answering or
even by communicating with the device from bpf, as those new hooks are
sleepable.
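The "firewall" idea above can be modelled in userspace C. This is a sketch under stated assumptions, not HID-BPF code: it assumes that a hook returning a non-zero value stops the request from reaching the device, and struct request, request_hook and dispatch_request() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a write request to a HID device. */
struct request {
	unsigned char report_id;
	int from_hidraw;	/* non-zero if issued by a hidraw client */
};

typedef int (*request_hook)(const struct request *req);

/* Example policy: hidraw clients may not touch report 2. */
static int deny_report_2_from_hidraw(const struct request *req)
{
	if (req->from_hidraw && req->report_id == 2)
		return -EPERM;
	return 0;
}

/*
 * Run every hook in order. The first non-zero return short-circuits the
 * request (assumed convention). *forwarded tells the caller whether the
 * request would have been passed on to the device.
 */
static int dispatch_request(const struct request *req,
			    const request_hook *hooks, size_t n,
			    int *forwarded)
{
	size_t i;

	*forwarded = 0;
	for (i = 0; i < n; i++) {
		int ret = hooks[i](req);

		if (ret)
			return ret;
	}
	*forwarded = 1;	/* stand-in for talking to the device */
	return 0;
}
```

The same short-circuit could also carry an emulated answer instead of an error, which is the "fake feature" case the cover letter mentions.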
Last, there is one change in the kfunc hid_bpf_input_report, in which it
actually waits for the device to be ready. This will not break any
potential users as the function was already declared as sleepable.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- made use of srcu, for sleepable users
- Link to v1: https://lore.kernel.org/r/20240621-hid_hw_req_bpf-v1-0-d7ab8b885a0b@kernel.…
---
Benjamin Tissoires (13):
HID: bpf: fix dispatch_hid_bpf_device_event uninitialized ret value
HID: add source argument to HID low level functions
HID: bpf: protect HID-BPF prog_list access by a SRCU
HID: bpf: add HID-BPF hooks for hid_hw_raw_requests
HID: bpf: prevent infinite recursions with hid_hw_raw_requests hooks
selftests/hid: add tests for hid_hw_raw_request HID-BPF hooks
HID: bpf: add HID-BPF hooks for hid_hw_output_report
selftests/hid: add tests for hid_hw_output_report HID-BPF hooks
HID: bpf: make hid_bpf_input_report() sleep until the device is ready
selftests/hid: add wq test for hid_bpf_input_report()
HID: bpf: allow hid_device_event hooks to inject input reports on self
selftests/hid: add another test for injecting an event from an event hook
selftests/hid: add an infinite loop test for hid_bpf_try_input_report
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 165 ++++++++++-
drivers/hid/bpf/hid_bpf_dispatch.h | 1 +
drivers/hid/bpf/hid_bpf_struct_ops.c | 6 +-
drivers/hid/hid-core.c | 118 +++++---
drivers/hid/hidraw.c | 10 +-
include/linux/hid.h | 7 +
include/linux/hid_bpf.h | 80 ++++-
tools/testing/selftests/hid/hid_bpf.c | 326 +++++++++++++++++++++
tools/testing/selftests/hid/progs/hid.c | 292 ++++++++++++++++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 13 +
11 files changed, 955 insertions(+), 65 deletions(-)
---
base-commit: 33c0fb85b571b0f1bbdbf466e770eebeb29e6f41
change-id: 20240614-hid_hw_req_bpf-df0b95aeb425
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.
A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.
** Problem **
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.
Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.
Looking at available infrastructure, an obvious candidate would be
to use psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.
** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.
The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
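As a rough picture of what "a group and an optional cookie" means for a sample, the sketch below builds a record that carries both. It is a userspace illustration only: struct sample_md and sample_md_new() are hypothetical and do not reflect psample's actual metadata layout or netlink attributes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sample record: group id plus an opaque, variable-length cookie. */
struct sample_md {
	uint32_t group;
	size_t cookie_len;
	unsigned char cookie[];	/* copied verbatim, never interpreted */
};

/* Returns a heap-allocated record, or NULL on allocation failure. */
static struct sample_md *sample_md_new(uint32_t group, const void *cookie,
				       size_t len)
{
	struct sample_md *md = malloc(sizeof(*md) + len);

	if (!md)
		return NULL;

	md->group = group;
	md->cookie_len = len;
	if (len)
		memcpy(md->cookie, cookie, len);
	return md;
}
```

A controller could, for instance, pack two 32-bit identifiers into the cookie; the record passes the bytes along without interpreting them. The caller releases the record with free().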
--
v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in selftests.
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.
v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
"sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.
Adrian Moreno (10):
net: psample: add user cookie
net: sched: act_sample: add action cookie to sample
net: psample: skip packet copy if no listeners
net: psample: allow using rate as probability
net: openvswitch: add emit_sample action
net: openvswitch: store sampling probability in cb.
selftests: openvswitch: add emit_sample action
selftests: openvswitch: add userspace parsing
selftests: openvswitch: parse trunc action
selftests: openvswitch: add emit_sample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++
include/net/psample.h | 5 +-
include/uapi/linux/openvswitch.h | 31 +-
include/uapi/linux/psample.h | 11 +-
net/openvswitch/Kconfig | 1 +
net/openvswitch/actions.c | 63 +++-
net/openvswitch/datapath.h | 3 +
net/openvswitch/flow_netlink.c | 33 ++-
net/openvswitch/vport.c | 1 +
net/psample/psample.c | 16 +-
net/sched/act_sample.c | 12 +
.../selftests/net/openvswitch/openvswitch.sh | 114 +++++++-
.../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++-
13 files changed, 563 insertions(+), 16 deletions(-)
--
2.45.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- add patch 2, a new fix for sk_msg_memcopy_from_iter.
- update patch 3, only test "sk->sk_prot->close" as Eric suggested.
- update patch 4, use "goto err" instead of "return" as Eduard
suggested.
- add "fixes" tag for patch 1-3.
- change subject prefixes as "bpf-next" to trigger BPF CI.
- cc Loongarch maintainers too.
BPF selftests do not seem to have been fully tested on Loongarch. When I
ran these tests on Loongarch recently, some errors occurred. This patch
set contains some null-check related fixes for these errors.
Geliang Tang (4):
skmsg: null check for sg_page in sk_msg_recvmsg
skmsg: null check for sg_page in sk_msg_memcopy_from_iter
inet: null check for close in inet_release
selftests/bpf: Null checks for link in bpf_tcp_ca
net/core/skmsg.c | 4 ++++
net/ipv4/af_inet.c | 3 ++-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 16 ++++++++++++----
3 files changed, 18 insertions(+), 5 deletions(-)
--
2.43.0
serial_test_fexit_stress() does not handle file descriptor closure
robustly. If an error occurs, the function may exit without closing open
file descriptors, potentially causing resource leaks.
Fix the issue by closing file descriptors in reverse order, starting
from the last one opened. This ensures proper closure even if an error
occurs early.
Fixes: 8fb9fb2f1728 ("selftests/bpf: Query BPF_MAX_TRAMP_LINKS using BTF")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
tools/testing/selftests/bpf/prog_tests/fexit_stress.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_stress.c b/tools/testing/selftests/bpf/prog_tests/fexit_stress.c
index 596536def43d..b1980bd61583 100644
--- a/tools/testing/selftests/bpf/prog_tests/fexit_stress.c
+++ b/tools/testing/selftests/bpf/prog_tests/fexit_stress.c
@@ -49,11 +49,14 @@ void serial_test_fexit_stress(void)
ASSERT_OK(err, "bpf_prog_test_run_opts");
out:
- for (i = 0; i < bpf_max_tramp_links; i++) {
+ if (i >= bpf_max_tramp_links)
+ i = bpf_max_tramp_links - 1;
+ while (i >= 0) {
if (link_fd[i])
close(link_fd[i]);
if (fexit_fd[i])
close(fexit_fd[i]);
+ i--;
}
free(fd);
}
--
2.25.1
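The cleanup pattern described in the fexit_stress fix above can be
sketched in isolation as follows. This is not the patch's code: the
function name, NR_FDS and the fail_at parameter are hypothetical, and
the sketch marks unopened slots with -1 (an assumption; the patch
itself tests for a non-zero descriptor).

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define NR_FDS 4	/* hypothetical number of descriptors to open */

/*
 * Open up to NR_FDS descriptors on `path`, simulating a failure at
 * index `fail_at` (pass -1 for no simulated failure). Whatever was
 * opened is closed in reverse order before returning. Returns the
 * number of descriptors that were opened (and then closed).
 */
int open_then_cleanup(const char *path, int fail_at)
{
	int fds[NR_FDS];
	int i, opened = 0;

	assert(path != NULL);

	/* -1 marks "never opened"; 0 is a valid descriptor. */
	for (i = 0; i < NR_FDS; i++)
		fds[i] = -1;

	for (i = 0; i < NR_FDS; i++) {
		if (i == fail_at)
			break;		/* simulated early error */
		fds[i] = open(path, O_RDONLY);
		if (fds[i] < 0)
			break;		/* real error */
		opened++;
	}

	/* Walk back from the last slot; unopened slots are skipped. */
	for (i = NR_FDS - 1; i >= 0; i--) {
		if (fds[i] >= 0)
			close(fds[i]);
	}
	return opened;
}
```

Initializing every slot to a sentinel means the cleanup loop does not
need to know how far the open loop got, which is the property the fix
is after.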
The selftest noncont_cat_run_test fails on AMD with warnings. The reason
is that AMD supports non-contiguous CBM masks but does not report it via
CPUID. Update noncont_cat_run_test to check for the vendor when verifying
CPUID.
Fixes: ae638551ab64 ("selftests/resctrl: Add non-contiguous CBMs CAT test")
Signed-off-by: Babu Moger <babu.moger(a)amd.com>
---
This was part of the series
https://lore.kernel.org/lkml/cover.1708637563.git.babu.moger@amd.com/
Sending this as a separate fix per review comments.
---
tools/testing/selftests/resctrl/cat_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index d4dffc934bc3..b2988888786e 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -308,7 +308,7 @@ static int noncont_cat_run_test(const struct resctrl_test *test,
else
return -EINVAL;
- if (sparse_masks != ((ecx >> 3) & 1)) {
+ if ((get_vendor() == ARCH_INTEL) && sparse_masks != ((ecx >> 3) & 1)) {
ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
return 1;
}
--
2.34.1
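The vendor gating in the resctrl fix above boils down to a small
predicate. The sketch below is illustrative only: the enum and the
function name are hypothetical stand-ins for the selftest's
get_vendor() and ARCH_* values, and the ECX bit position is taken from
the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the selftest's vendor constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/*
 * Return true only when the CPUID bit is expected to agree with the
 * 'sparse_masks' file content and does not. On non-Intel vendors the
 * CPUID bit is not consulted at all.
 */
bool sparse_masks_mismatch(enum cpu_vendor vendor, unsigned int ecx,
			   unsigned int sparse_masks)
{
	unsigned int cpuid_bit = (ecx >> 3) & 1;

	assert(sparse_masks <= 1);	/* the file holds 0 or 1 */

	if (vendor != VENDOR_INTEL)
		return false;
	return sparse_masks != cpuid_bit;
}
```

With this shape, an AMD system that reports sparse_masks=1 while
leaving the CPUID bit clear no longer counts as a failure.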
The open() function returns -1 on error. openat() and open() initialize
'from' and 'to', but only 'from' is validated with an 'if' statement. If
the initialization of 'to' fails, we should check its value and close
'from' to avoid a possible file descriptor leak.
Fixes: 32ae976ed3b5 ("selftests/capabilities: Add tests for capability evolution")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Found this error through static analysis.
---
tools/testing/selftests/capabilities/test_execve.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/capabilities/test_execve.c b/tools/testing/selftests/capabilities/test_execve.c
index 47bad7ddc5bc..de187eff177d 100644
--- a/tools/testing/selftests/capabilities/test_execve.c
+++ b/tools/testing/selftests/capabilities/test_execve.c
@@ -149,6 +149,10 @@ static void copy_fromat_to(int fromfd, const char *fromname, const char *toname)
ksft_exit_fail_msg("open copy source - %s\n", strerror(errno));
int to = open(toname, O_CREAT | O_WRONLY | O_EXCL, 0700);
+ if (to == -1) {
+ close(from);
+ ksft_exit_fail_msg("open copy destination - %s\n", strerror(errno));
+ }
while (true) {
char buf[4096];
--
2.25.1
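The check added by the test_execve fix above follows a common
"acquire two resources, release the first if the second fails"
pattern. A standalone sketch, with a hypothetical helper name and
signature that are not part of the selftest:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `src` for reading and create `dst` exclusively. On success both
 * descriptors are stored and 0 is returned. On failure -1 is returned,
 * nothing is left open, and errno describes the failing open().
 */
int open_copy_pair(const char *src, const char *dst, int *from, int *to)
{
	int saved;

	assert(from && to);
	*from = -1;
	*to = -1;

	*from = open(src, O_RDONLY);
	if (*from == -1)
		return -1;

	*to = open(dst, O_CREAT | O_WRONLY | O_EXCL, 0700);
	if (*to == -1) {
		saved = errno;	/* close() may overwrite errno */
		close(*from);
		*from = -1;
		errno = saved;
		return -1;
	}
	return 0;
}
```

Saving errno around close() keeps the error from the failed open()
available to the caller, for example for a strerror(errno) message.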
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
&current->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, &current->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update &current->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
Assuming that the above is indeed the intended behaviour, the series
introduces a test for mangling with uc_sigmask. The behaviour makes
semantic sense, since the signals blocked using sigaction() remain
blocked only for the duration of the handler, and not in the context
present before jumping to the handler, although nothing can be
confirmed from the man-pages. I will send a separate series to fix the
man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
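The observation that motivates the series can be reproduced with a
few lines of userspace code. The sketch below is not the proposed
selftest; the function and variable names are made up for
illustration. It registers a SIGUSR1 handler with SIGUSR2 in sa_mask,
then records whether SIGUSR2 appears in uc_sigmask and in the thread's
mask while the handler runs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Results recorded by the handler; -1 means "handler did not run". */
static volatile sig_atomic_t seen_in_ucontext = -1;
static volatile sig_atomic_t seen_in_thread_mask = -1;

static void handler(int sig, siginfo_t *info, void *ctx)
{
	ucontext_t *uc = ctx;
	sigset_t cur;

	(void)sig;
	(void)info;

	/* Mask saved by the kernel for the interrupted context. */
	seen_in_ucontext = sigismember(&uc->uc_sigmask, SIGUSR2);

	/* Mask in effect right now, while the handler executes. */
	sigprocmask(SIG_BLOCK, NULL, &cur);
	seen_in_thread_mask = sigismember(&cur, SIGUSR2);
}

/*
 * Raise SIGUSR1 with SIGUSR2 listed in sa_mask and report where
 * SIGUSR2 shows up. Returns 0 on success, -1 on a libc failure.
 */
int probe_uc_sigmask(int *in_ucontext, int *in_thread_mask)
{
	struct sigaction sa;
	sigset_t unblock;

	assert(in_ucontext && in_thread_mask);

	/* Start from a state where neither signal is blocked. */
	sigemptyset(&unblock);
	sigaddset(&unblock, SIGUSR1);
	sigaddset(&unblock, SIGUSR2);
	if (sigprocmask(SIG_UNBLOCK, &unblock, NULL))
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaddset(&sa.sa_mask, SIGUSR2);
	if (sigaction(SIGUSR1, &sa, NULL))
		return -1;

	/* In a single-threaded program the handler runs before raise() returns. */
	if (raise(SIGUSR1))
		return -1;

	*in_ucontext = seen_in_ucontext;
	*in_thread_mask = seen_in_thread_mask;
	return 0;
}
```

If the behaviour described above holds, SIGUSR2 is reported as blocked
in the thread's mask during the handler but absent from uc_sigmask.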
v2->v3:
- ucontext describes current state -> ucontext describes interrupted context
- Add a comment for blockage of USR2 even after return from handler
- Describe blockage of signals in a better way
v1->v2:
- Replace all occurrences of SIGPIPE with SIGSEGV
- Fixed a mismatch between code comment and ksft log
- Add a testcase: Raise the same signal again; it must not be queued
- Remove unneeded <assert.h>, <unistd.h>
- Give a detailed test description in the comments; also describe the
exact meaning of delivered and blocked
- Handle errors for all libc functions/syscalls
- Mention tests in Makefile and .gitignore in alphabetical order
v1:
- https://lore.kernel.org/all/20240607122319.768640-1-dev.jain@arm.com/
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 194 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 199 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
Currently if we request a feature that is not set in the Kernel
config we fail silently and return all the available features. However,
the man page indicates we should return an EINVAL.
We need to fix this issue since we can end up with a Kernel warning
should a program request the feature UFFD_FEATURE_WP_UNPOPULATED on
a kernel with the config not set with this feature.
[ 200.812896] WARNING: CPU: 91 PID: 13634 at mm/memory.c:1660 zap_pte_range+0x43d/0x660
[ 200.820738] Modules linked in:
[ 200.869387] CPU: 91 PID: 13634 Comm: userfaultfd Kdump: loaded Not tainted 6.9.0-rc5+ #8
[ 200.877477] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.7.3 03/30/2022
[ 200.885052] RIP: 0010:zap_pte_range+0x43d/0x660
Fixes: e06f1e1dd499 ("userfaultfd: wp: enabled write protection in userfaultfd API")
Signed-off-by: Audra Mitchell <audra(a)redhat.com>
---
fs/userfaultfd.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index eee7320ab0b0..17e409ceaa33 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -2057,7 +2057,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
goto out;
features = uffdio_api.features;
ret = -EINVAL;
- if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
+ if (uffdio_api.api != UFFD_API)
goto err_out;
ret = -EPERM;
if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
@@ -2081,6 +2081,11 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
#endif
+
+ ret = -EINVAL;
+ if (features & ~uffdio_api.features)
+ goto err_out;
+
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
--
2.44.0
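The reordered check in the userfaultfd fix above can be modelled in a
few lines: compute the set of features this kernel actually supports
first, and only then reject any requested bit outside it. The sketch
uses made-up feature bits and a hypothetical function; the real
UFFD_FEATURE_* values and userfaultfd_api() differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A			(1ULL << 0)
#define FEAT_B			(1ULL << 1)
#define FEAT_WP_UNPOPULATED	(1ULL << 2)
#define ALL_FEATURES		(FEAT_A | FEAT_B | FEAT_WP_UNPOPULATED)

/*
 * Validate `requested` against what is supported in this build.
 * `wp_configured` models the kernel config option being enabled.
 * On success the supported set is stored in *supported_out and 0 is
 * returned; otherwise -EINVAL is returned.
 */
int negotiate_features(uint64_t requested, int wp_configured,
		       uint64_t *supported_out)
{
	uint64_t supported = ALL_FEATURES;

	assert(supported_out);

	/* Mask out features that were compiled out. */
	if (!wp_configured)
		supported &= ~FEAT_WP_UNPOPULATED;

	/* Reject anything outside the supported set. */
	if (requested & ~supported)
		return -EINVAL;

	*supported_out = supported;
	return 0;
}
```

Checking against the already-masked set is what turns a request for a
compiled-out feature into EINVAL instead of a silent success.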
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that v11, AFAICT, received no feedback. There is one
discussion about the specifics of plugging io_uring memory through
the page pool, but it is not relevant to the content of this particular
patchset, AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helper (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
defined as a preprocessor macro and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles, but there is occasionally
1 cycle of noise when running this test repeatedly.
Lastly I tried disabling the static_branch_unlikely() in the
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
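For readers unfamiliar with header split, the following sketch shows
in software where the split point lies for a simple frame. It is only
an illustration of the concept: a real NIC does this in hardware, the
function name is hypothetical, and the sketch assumes an untagged
Ethernet frame carrying IPv4 and TCP (no VLAN tags, no IPv6, no
tunnels).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN	14	/* untagged Ethernet header */
#define MIN_IP_HDR	20	/* IPv4 header without options */
#define MIN_TCP_HDR	20	/* TCP header without options */

/*
 * Return the offset at which the TCP payload starts, i.e. the number
 * of header bytes that would stay in host memory. Returns 0 if the
 * frame is not Ethernet + IPv4 + TCP or is too short.
 */
size_t header_split_offset(const uint8_t *frame, size_t len)
{
	size_t ip_len, tcp_len, off;

	assert(frame != NULL);

	if (len < ETH_HDR_LEN + MIN_IP_HDR)
		return 0;
	/* EtherType 0x0800 means IPv4. */
	if (frame[12] != 0x08 || frame[13] != 0x00)
		return 0;
	/* High nibble: IP version. Low nibble: header length in words. */
	if ((frame[ETH_HDR_LEN] >> 4) != 4)
		return 0;
	ip_len = (size_t)(frame[ETH_HDR_LEN] & 0x0f) * 4;
	if (ip_len < MIN_IP_HDR)
		return 0;
	/* Byte 9 of the IPv4 header is the protocol; 6 means TCP. */
	if (frame[ETH_HDR_LEN + 9] != 6)
		return 0;

	off = ETH_HDR_LEN + ip_len;
	if (len < off + MIN_TCP_HDR)
		return 0;
	/* High nibble of TCP byte 12: header length in 32-bit words. */
	tcp_len = (size_t)(frame[off + 12] >> 4) * 4;
	if (tcp_len < MIN_TCP_HDR || len < off + tcp_len)
		return 0;

	return off + tcp_len;
}
```

Everything before the returned offset is the header part processed by
the TCP/IP stack from host memory; everything after it is the payload
that devmem TCP would place in device memory.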
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 208 ++++++++-
include/net/page_pool/helpers.h | 124 ++++--
include/net/page_pool/types.h | 22 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 30 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 376 ++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 103 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 362 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
47 files changed, 2753 insertions(+), 251 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.2.741.gdbec12cfda-goog
Currently, if we request a feature that is not enabled in the kernel
config, we fail silently and return all the available features. However,
the man page indicates that we should return EINVAL.
We need to fix this because we can end up with a kernel warning
if a program requests the feature UFFD_FEATURE_WP_UNPOPULATED on
a kernel whose config does not enable this feature.
[ 200.812896] WARNING: CPU: 91 PID: 13634 at mm/memory.c:1660 zap_pte_range+0x43d/0x660
[ 200.820738] Modules linked in:
[ 200.869387] CPU: 91 PID: 13634 Comm: userfaultfd Kdump: loaded Not tainted 6.9.0-rc5+ #8
[ 200.877477] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.7.3 03/30/2022
[ 200.885052] RIP: 0010:zap_pte_range+0x43d/0x660
Fixes: e06f1e1dd499 ("userfaultfd: wp: enabled write protection in userfaultfd API")
Signed-off-by: Audra Mitchell <audra(a)redhat.com>
---
fs/userfaultfd.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index eee7320ab0b0..17e409ceaa33 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -2057,7 +2057,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
goto out;
features = uffdio_api.features;
ret = -EINVAL;
- if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
+ if (uffdio_api.api != UFFD_API)
goto err_out;
ret = -EPERM;
if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
@@ -2081,6 +2081,11 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
#endif
+
+ ret = -EINVAL;
+ if (features & ~uffdio_api.features)
+ goto err_out;
+
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
--
2.44.0
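The check added by the diff above can be sketched in isolation. This is a minimal userspace model, not the kernel code: negotiate_features() and the FEATURE_* bit values are hypothetical, and the "supported" argument plays the role of uffdio_api.features after the config-dependent bits have been cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEATURE_EVENT_FORK     (UINT64_C(1) << 0)
#define FEATURE_WP_UNPOPULATED (UINT64_C(1) << 1)

/*
 * Model of the handshake: reject any requested bit that this build does
 * not support instead of silently ignoring it.
 * Returns 0 and stores the supported set in *reported on success,
 * -EINVAL if the caller asked for an unsupported feature.
 */
int negotiate_features(uint64_t requested, uint64_t supported,
		       uint64_t *reported)
{
	assert(reported != NULL);

	if (requested & ~supported)
		return -EINVAL;

	*reported = supported;
	return 0;
}
```

With this shape, a caller that asks for FEATURE_WP_UNPOPULATED on a build that only supports FEATURE_EVENT_FORK gets -EINVAL rather than a silently reduced feature set.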
Correctable memory errors are very common on servers with a large
amount of memory, and are corrected by ECC, but they cause two
pain points for users:
1. Correction usually happens on the fly and adds latency overhead
2. A not-fully-proven theory states that excessive correctable memory
errors can develop into uncorrectable memory errors.
Soft offline is the kernel's additional solution for memory pages
having (excessive) corrected memory errors. The impacted page is migrated
to a healthy page if it is in use, then the original page is discarded
for any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in the case of a 1G HugeTLB page.
Soft offline dissolves the HugeTLB page, whether in use or free, into
chunks of 4K pages, reducing the HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when a later mmap of hugepages returns MAP_FAILED due to a lack of
hugepages.
In the case of a transparent hugepage, it will be split into 4K pages
as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly, and the kernel had better
not do it under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and the counter for a PFN reaches the threshold.
In both cases, userspace has no control over the soft offline performed
by the kernel's memory failure recovery.
This patch series gives userspace control over soft offlining any page:
the kernel only soft offlines a raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to it. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default,
enable_soft_offline is 1 to preserve the existing kernel behavior.
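The gating described above can be sketched as a small userspace model. Both helper names are hypothetical and this is not the code in mm/memory-failure.c; the -EOPNOTSUPP return mirrors what the changelog below says soft_offline_page() returns when the switch is 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse the text read from a sysctl file such as
 * /proc/sys/vm/enable_soft_offline. Returns 0 or 1, or -EINVAL if the
 * text is not one of those two values (a trailing newline is allowed).
 */
int parse_enable_soft_offline(const char *text)
{
	char *end;
	long val;

	assert(text != NULL);

	val = strtol(text, &end, 10);
	if (end == text || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (val != 0 && val != 1)
		return -EINVAL;

	return (int)val;
}

/*
 * Hypothetical model of the gate: soft offline proceeds only when the
 * switch is nonzero; otherwise the request is refused.
 */
int soft_offline_gate(int enable_soft_offline)
{
	return enable_soft_offline ? 0 : -EOPNOTSUPP;
}
```

A monitoring agent could read the sysctl once, pass the parsed value to a gate like this, and skip its own soft-offline requests when the administrator has set the switch to 0.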
Changelog
v4 => v5:
* incorporate feedback from Muhammad Usama Anjum
<usama.anjum(a)collabora.com>
* refactor selftest to use what is available in kselftest.h.
* update a comment in soft_offline_page.
v3 => v4:
* incorporate feedback from Miaohe Lin <linmiaohe(a)huawei.com>,
Andrew Morton <akpm(a)linux-foundation.org>, and
Oscar Salvador <osalvador(a)suse.de>.
* insert a refactor commit to unify soft offline's logs to follow
"Soft offline: 0x${pfn}: ${message}" format.
* some rewords in document: fail => will not perform.
* v4 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3"),
akpm/mm-stable.
v2 => v3:
* incorporate feedback from Miaohe Lin <linmiaohe(a)huawei.com>,
Lance Yang <ioworker0(a)gmail.com>, Oscar Salvador <osalvador(a)suse.de>,
and David Rientjes <rientjes(a)google.com>.
* release potential refcount if enable_soft_offline is 0.
* soft_offline_page() returns EOPNOTSUPP if enable_soft_offline is 0.
* refactor hugetlb-soft-offline.c, for example, introduce
test_soft_offline_common to reduce repeated code.
* rewrite enable_soft_offline's documentation, adding more details about
the cost of soft-offline for transparent and hugetlb hugepages, and
components that are impacted when enable_soft_offline becomes 0.
* fix typos in commit messages.
* v3 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3").
v1 => v2:
* incorporate feedback from both Miaohe Lin <linmiaohe(a)huawei.com> and
Jane Chu <jane.chu(a)oracle.com>.
* make the switch to control all pages, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (4):
mm/memory-failure: refactor log format in soft offline code
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm: test enable_soft_offline behaviors
docs: mm: add enable_soft_offline sysctl
Documentation/admin-guide/sysctl/vm.rst | 32 +++
mm/memory-failure.c | 38 ++-
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 227 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
6 files changed, 295 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.2.741.gdbec12cfda-goog
Centralizes the definition of _GNU_SOURCE into lib.mk and addresses all
resulting macro redefinition warnings.
These patches will need to be merged in one shot to avoid redefinition
warnings.
The initial attempt at this patch was abandoned because it affected
lines in many source files and caused a large amount of churn. However,
from earlier discussions, centralizing _GNU_SOURCE is still desirable.
This attempt limits the changes to 1 source file and 12 Makefiles.
v1: https://lore.kernel.org/linux-kselftest/20240430235057.1351993-1-edliaw@goo…
v2: https://lore.kernel.org/linux-kselftest/20240507214254.2787305-1-edliaw@goo…
- Add -D_GNU_SOURCE to KHDR_INCLUDES so that it is in a single
location.
- Remove #define _GNU_SOURCE from source code to resolve redefinition
warnings.
v3: https://lore.kernel.org/linux-kselftest/20240509200022.253089-1-edliaw@goog…
- Rebase onto linux-next 20240508.
- Split patches by directory.
- Add -D_GNU_SOURCE directly to CFLAGS in lib.mk.
- Delete additional _GNU_SOURCE definitions from source code in
linux-next.
- Delete additional -D_GNU_SOURCE flags from Makefiles.
v4: https://lore.kernel.org/linux-kselftest/20240510000842.410729-1-edliaw@goog…
- Rebase onto linux-next 20240509.
- Remove Fixes tag from patches that drop _GNU_SOURCE definition.
- Restore space between comment and includes for selftests/damon.
v5: https://lore.kernel.org/linux-kselftest/20240522005913.3540131-1-edliaw@goo…
- Rebase onto linux-next 20240521
- Drop initial patches that modify KHDR_INCLUDES.
- Incorporate Mark Brown's patch to replace static_assert with warning.
- Don't drop #define _GNU_SOURCE from nolibc and wireguard.
- Change Makefiles for x86 and vDSO to append to CFLAGS.
v6:
- Rewrite patch to use -D_GNU_SOURCE= form in lib.mk.
- Reduce the amount of churn significantly by allowing definition to
coexist with source code macro defines.
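Why the -D_GNU_SOURCE= form can coexist with defines in source code can be illustrated with a short sketch. It assumes the source files use a bare "#define _GNU_SOURCE": C permits redefining an object-like macro when the replacement list is identical, and -D_GNU_SOURCE= gives an empty replacement list, whereas plain -D_GNU_SOURCE defines the macro as 1 and would clash. The helper names below are hypothetical.

```c
/* Equivalent to passing -D_GNU_SOURCE= on the command line. */
#define _GNU_SOURCE
/* A source file's own bare define: same empty replacement list, so legal. */
#define _GNU_SOURCE

#include <assert.h>
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)

/* Returns 1 when _GNU_SOURCE is defined, 0 otherwise. */
int gnu_source_defined(void)
{
#ifdef _GNU_SOURCE
	return 1;
#else
	return 0;
#endif
}

/* Length of the macro's expansion; 0 means it expands to nothing. */
size_t gnu_source_expansion_len(void)
{
	return strlen(XSTR(_GNU_SOURCE));
}
```

A source file that instead wrote "#define _GNU_SOURCE 1" would no longer match the empty command-line definition, which is presumably why the first patch in the series changes the define in selftests/mm to an empty string.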
Edward Liaw (13):
selftests/mm: Define _GNU_SOURCE to an empty string
selftests: Add -D_GNU_SOURCE= to CFLAGS in lib.mk
selftests/net: Append to lib.mk CFLAGS in Makefile
selftests/exec: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/futex: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/intel_pstate: Drop redundant -D_GNU_SOURCE CFLAGS in
Makefile
selftests/iommu: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/kvm: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/proc: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/resctrl: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/ring-buffer: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/riscv: Drop redundant -D_GNU_SOURCE CFLAGS in Makefile
selftests/sgx: Append CFLAGS from lib.mk to HOST_CFLAGS
tools/testing/selftests/exec/Makefile | 1 -
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/intel_pstate/Makefile | 2 +-
tools/testing/selftests/iommu/Makefile | 2 --
tools/testing/selftests/kvm/Makefile | 2 +-
tools/testing/selftests/lib.mk | 3 +++
tools/testing/selftests/mm/thuge-gen.c | 2 +-
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tcp_ao/Makefile | 2 +-
tools/testing/selftests/proc/Makefile | 1 -
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/ring-buffer/Makefile | 1 -
tools/testing/selftests/riscv/mm/Makefile | 2 +-
tools/testing/selftests/sgx/Makefile | 2 +-
14 files changed, 12 insertions(+), 14 deletions(-)
--
2.45.2.741.gdbec12cfda-goog
This series introduces the selftests/arm directory, which tests 32 and
64-bit kernel compatibility with 32-bit ELFs running on the Aarch platform.
The need for this bucket of tests is that 32-bit applications built on
the legacy ARM architecture must not break on the new Aarch64 platforms
and the 64-bit kernel. The kernel must emulate the data structures, system
calls and the registers according to Aarch32 when running a 32-bit
process; this directory fills that testing requirement.
One may find similarity between this directory and selftests/arm64; it is
advisable to refer to that since a lot has been pulled from there itself.
The mm directory includes a test for checking 4GB limit of the virtual
address space of a process.
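The 4GB property can be stated as a small predicate. This is only an illustrative sketch, not the compat_va.c test from the series; range_below_4gb() and COMPAT_VA_LIMIT are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define COMPAT_VA_LIMIT (UINT64_C(1) << 32)	/* 4GB */

/* Returns 1 if [addr, addr + len) lies entirely below 4GB, else 0. */
int range_below_4gb(uint64_t addr, uint64_t len)
{
	if (addr >= COMPAT_VA_LIMIT)
		return 0;

	/* Compare against the remaining room so addr + len cannot overflow. */
	return len <= COMPAT_VA_LIMIT - addr;
}
```

A test in this spirit could walk the mappings of a 32-bit process and expect every one of them to satisfy the predicate.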
The signal directory contains two tests following a common theme: both
mangle arm_cpsr, which the kernel dumps to user space when invoking the
signal handler; the kernel must spot this illegal attempt and terminate
the program with SEGV.
The elf directory includes a test for checking the 32-bit status of the ELF.
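One way to check the 32-bit status of an ELF is to look at the class byte in its identification header. The sketch below assumes the glibc <elf.h> constants and uses a hypothetical helper name; it is not the parse_elf.c test from the series.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the buffer starts with an ELF identification marking a
 * 32-bit object, 0 if it is an ELF identification of another class, and
 * -1 if the buffer is too short or lacks the ELF magic.
 */
int elf_is_32bit(const unsigned char *ident, size_t len)
{
	assert(ident != NULL);

	if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
		return -1;

	return ident[EI_CLASS] == ELFCLASS32;
}
```

In practice the buffer would hold the first EI_NIDENT bytes read from the binary under test.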
The abi directory includes two ptrace tests: in the first, a 32-bit parent
debugs a 32-bit child, and in the second, a 64-bit parent debugs a 32-bit
child. The second test will be skipped when running on a 32-bit kernel.
Credits to Mark Brown for suggesting this work.
Testing:
The series has been tested on the Aarch64 kernel. For the Aarch32 kernel,
I used qemu-system-arm with machine 'vexpress-a15', along with a buildroot
rootfs; the individual statically built tests pass on that, but building
the entire test suite on that remains untested, due to my lack of
experience with qemu and rootfses.
Since I have made some changes in selftests/arm64, I have tested that
those tests do not break.
v2->v3:
- mm, elf: Split into multiple testcases
- Eliminate copying in signal/ using ifdeffery and pulling from selftests/arm64
- Delete config file, since it does not make sense for testing a 32-bit kernel
- Split ptrace in selftests/arm64, and pull some stuff from there
- Add abi tests containing ptrace and ptrace_64
- Fix build warnings in selftests/arm64 (can be applied independent of this series)
v1->v2:
- Formatting changes
- Add .gitignore files and config file
v1:
- https://lore.kernel.org/all/20240405084410.256788-1-dev.jain@arm.com/
Dev Jain (9):
selftests/arm: Add mm test
selftests/arm: Add elf test
selftests: arm, arm64: Use ifdeffery to pull signal infrastructure
selftests/arm: Add signal tests
selftests/arm64: Fix build warnings for ptrace
selftests/arm64: Split ptrace, use ifdeffery
selftests/arm: Add ptrace test
selftests/arm: Add ptrace_64 test
selftests: Add build infrastructure along with README
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/arm/Makefile | 56 ++++++++
tools/testing/selftests/arm/README | 32 +++++
tools/testing/selftests/arm/abi/.gitignore | 4 +
tools/testing/selftests/arm/abi/Makefile | 26 ++++
tools/testing/selftests/arm/abi/ptrace.c | 82 +++++++++++
tools/testing/selftests/arm/abi/ptrace.h | 57 ++++++++
tools/testing/selftests/arm/abi/ptrace_64.c | 91 ++++++++++++
.../selftests/arm/abi/trivial_32bit_program.c | 14 ++
tools/testing/selftests/arm/elf/.gitignore | 2 +
tools/testing/selftests/arm/elf/Makefile | 6 +
tools/testing/selftests/arm/elf/parse_elf.c | 77 ++++++++++
tools/testing/selftests/arm/mm/.gitignore | 2 +
tools/testing/selftests/arm/mm/Makefile | 6 +
tools/testing/selftests/arm/mm/compat_va.c | 89 ++++++++++++
tools/testing/selftests/arm/signal/.gitignore | 3 +
tools/testing/selftests/arm/signal/Makefile | 30 ++++
.../selftests/arm/signal/test_signals.c | 2 +
.../selftests/arm/signal/test_signals.h | 2 +
.../selftests/arm/signal/test_signals_utils.c | 2 +
.../selftests/arm/signal/test_signals_utils.h | 2 +
.../testcases/mangle_cpsr_invalid_aif_bits.c | 33 +++++
.../mangle_cpsr_invalid_compat_toggle.c | 29 ++++
tools/testing/selftests/arm64/abi/ptrace.c | 121 ++--------------
tools/testing/selftests/arm64/abi/ptrace.h | 135 ++++++++++++++++++
.../selftests/arm64/signal/test_signals.h | 12 ++
.../arm64/signal/test_signals_utils.c | 51 +++++--
.../arm64/signal/test_signals_utils.h | 3 +
28 files changed, 850 insertions(+), 120 deletions(-)
create mode 100644 tools/testing/selftests/arm/Makefile
create mode 100644 tools/testing/selftests/arm/README
create mode 100644 tools/testing/selftests/arm/abi/.gitignore
create mode 100644 tools/testing/selftests/arm/abi/Makefile
create mode 100644 tools/testing/selftests/arm/abi/ptrace.c
create mode 100644 tools/testing/selftests/arm/abi/ptrace.h
create mode 100644 tools/testing/selftests/arm/abi/ptrace_64.c
create mode 100644 tools/testing/selftests/arm/abi/trivial_32bit_program.c
create mode 100644 tools/testing/selftests/arm/elf/.gitignore
create mode 100644 tools/testing/selftests/arm/elf/Makefile
create mode 100644 tools/testing/selftests/arm/elf/parse_elf.c
create mode 100644 tools/testing/selftests/arm/mm/.gitignore
create mode 100644 tools/testing/selftests/arm/mm/Makefile
create mode 100644 tools/testing/selftests/arm/mm/compat_va.c
create mode 100644 tools/testing/selftests/arm/signal/.gitignore
create mode 100644 tools/testing/selftests/arm/signal/Makefile
create mode 100644 tools/testing/selftests/arm/signal/test_signals.c
create mode 100644 tools/testing/selftests/arm/signal/test_signals.h
create mode 100644 tools/testing/selftests/arm/signal/test_signals_utils.c
create mode 100644 tools/testing/selftests/arm/signal/test_signals_utils.h
create mode 100644 tools/testing/selftests/arm/signal/testcases/mangle_cpsr_invalid_aif_bits.c
create mode 100644 tools/testing/selftests/arm/signal/testcases/mangle_cpsr_invalid_compat_toggle.c
create mode 100644 tools/testing/selftests/arm64/abi/ptrace.h
--
2.39.2