The current implementation of netconsole sends all log messages in parallel, which can lead to an intermixed and interleaved output on the receiving side. This makes it challenging to demultiplex the messages and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------ ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0 refcount_t: addition on 0; use-after-free. WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0 Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole subsystem that allows the automatic population of the CPU number in the userdata field for each log message. This enhancement provides several benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are sending messages concurrently, the added CPU number in the userdata makes it easier to differentiate and attribute the messages to their originating CPUs.
* Better visibility into message sources: The CPU number information gives users and developers more insight into which specific CPU a particular log message came from, which can be valuable for debugging and analysis.
The changes in this series are as follows:
Patch "Ensure dynamic_netconsole_mutex is held during userdata update"
Add a lockdep assert to make sure dynamic_netconsole_mutex is held when calling update_userdata().
Patch "netconsole: Add option to auto-populate CPU number in userdata"
Adds a new option to enable automatic CPU number population in the netconsole userdata Provides a new "populate_cpu_nr" sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
Expands the existing netconsole selftest to verify the CPU number auto-population functionality Ensures the received netconsole messages contain the expected "cpu=" entry in the userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
Updates the netconsole documentation to explain the new CPU number auto-population feature Provides instructions on how to enable and use the feature
I believe these changes will be a valuable addition to the netconsole subsystem, enhancing its usefulness for kernel developers and users.
Signed-off-by: Breno Leitao leitao@debian.org --- Breno Leitao (4): netconsole: Ensure dynamic_netconsole_mutex is held during userdata update netconsole: Add option to auto-populate CPU number in userdata netconsole: docs: Add documentation for CPU number auto-population netconsole: selftest: Validate CPU number auto-population in userdata
Documentation/networking/netconsole.rst | 44 +++++++++++++++ drivers/net/netconsole.c | 63 ++++++++++++++++++++++ .../testing/selftests/drivers/net/netcons_basic.sh | 18 +++++++ 3 files changed, 125 insertions(+) --- base-commit: a58f00ed24b849d449f7134fd5d86f07090fe2f5 change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
Add a lock assertion to verify that dynamic_netconsole_mutex is held when updating the userdata fields in the netconsole driver.
This change prevents calling update_userdata() without holding the lock, ensuring data consistency and integrity.
The update_userdata() function is invoked only when the user changes the configuration, so this addition does not impact performance as it is not in the hot path.
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 4ea44a2f48f7b1f9059d275f0f0edc40cc1997f0..965712d65a014c60f91bf35e48f1b112f66b8439 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -697,6 +697,8 @@ static void update_userdata(struct netconsole_target *nt) int complete_idx = 0, child_count = 0; struct list_head *entry;
+ lockdep_assert_held(&dynamic_netconsole_mutex); + /* Clear the current string in case the last userdatum was deleted */ nt->userdata_length = 0; nt->userdata_complete[0] = 0;
Add a new option to the netconsole subsystem to automatically populate the CPU number that is sending the log message in the userdata field.
When enabled, this feature will append a "cpu=<cpu_number>" entry to the userdata string, allowing the receiver to demultiplex and differentiate between messages sent from different CPUs. This is particularly useful when dealing with parallel log output that gets intermixed on the receiver side.
The new option is added as a flag in the enum userdata_auto and can be controlled through the populate_cpu_nr sysfs attribute in the netconsole target's configfs hierarchy. The main benefits of this change are:
* Improved log message demultiplexing for parallel output. * Better visibility into which CPU a particular log message originated from.
Create userdata_auto as an bitwise enum, since I have immediate plans to expand it.
Since the CPU id doesn't need to be precise, do not disable preemption when getting the CPU id.
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 965712d65a014c60f91bf35e48f1b112f66b8439..861e8988b8db4f29ae3de02222c9131059694aba 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -90,6 +90,14 @@ static DEFINE_MUTEX(target_cleanup_list_lock); */ static struct console netconsole_ext;
+/* Fields requiring kernel auto-population in userdata. + * The fields are designed as bitwise flags, allowing multiple fields to be set + */ +enum userdata_auto { + /* Populate CPU number that is sending the message */ + AUTO_CPU_NR = BIT(0), +}; + /** * struct netconsole_target - Represents a configured netconsole target. * @list: Links this target into the target_list. @@ -97,6 +105,7 @@ static struct console netconsole_ext; * @userdata_group: Links to the userdata configfs hierarchy * @userdata_complete: Cached, formatted string of append * @userdata_length: String length of userdata_complete + * @userdata_auto: Kernel auto-populated bitwise fields in userdata. * @enabled: On / off knob to enable / disable target. * Visible from userspace (read-write). * We maintain a strict 1:1 correspondence between this and @@ -123,6 +132,7 @@ struct netconsole_target { struct config_group userdata_group; char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS]; size_t userdata_length; + enum userdata_auto userdata_auto; #endif bool enabled; bool extended; @@ -371,6 +381,18 @@ static ssize_t remote_mac_show(struct config_item *item, char *buf) return sysfs_emit(buf, "%pM\n", to_target(item)->np.remote_mac); }
+static ssize_t populate_cpu_nr_show(struct config_item *item, char *buf) +{ + struct netconsole_target *nt = to_target(item->ci_parent); + bool cpu_nr_enabled; + + mutex_lock(&dynamic_netconsole_mutex); + cpu_nr_enabled = nt->userdata_auto & AUTO_CPU_NR; + mutex_unlock(&dynamic_netconsole_mutex); + + return sysfs_emit(buf, "%d\n", cpu_nr_enabled); +} + /* * This one is special -- targets created through the configfs interface * are not enabled (and the corresponding netpoll activated) by default. @@ -726,6 +748,15 @@ static void update_userdata(struct netconsole_target *nt) MAX_USERDATA_ENTRY_LENGTH, " %s=%s\n", item->ci_name, udm_item->value); } + + /* Check if CPU NR should be populated, and append it to the user + * dictionary. + */ + if (child_count < MAX_USERDATA_ITEMS && nt->userdata_auto & AUTO_CPU_NR) + scnprintf(&nt->userdata_complete[complete_idx], + MAX_USERDATA_ENTRY_LENGTH, " cpu=%u\n", + raw_smp_processor_id()); + nt->userdata_length = strnlen(nt->userdata_complete, sizeof(nt->userdata_complete)); } @@ -757,7 +788,36 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf, return ret; }
+static ssize_t populate_cpu_nr_store(struct config_item *item, const char *buf, + size_t count) +{ + struct netconsole_target *nt = to_target(item->ci_parent); + bool cpu_nr_enabled; + ssize_t ret; + + if (!nt) + return -EINVAL; + + mutex_lock(&dynamic_netconsole_mutex); + ret = kstrtobool(buf, &cpu_nr_enabled); + if (ret) + goto out_unlock; + + if (cpu_nr_enabled) + nt->userdata_auto |= AUTO_CPU_NR; + else + nt->userdata_auto &= ~AUTO_CPU_NR; + + update_userdata(nt); + + ret = strnlen(buf, count); +out_unlock: + mutex_unlock(&dynamic_netconsole_mutex); + return ret; +} + CONFIGFS_ATTR(userdatum_, value); +CONFIGFS_ATTR(, populate_cpu_nr);
static struct configfs_attribute *userdatum_attrs[] = { &userdatum_attr_value, @@ -819,6 +879,7 @@ static void userdatum_drop(struct config_group *group, struct config_item *item) }
static struct configfs_attribute *userdata_attrs[] = { + &attr_populate_cpu_nr, NULL, };
Sorry for the late review, I think this will miss v6.13 :(
On Wed, 13 Nov 2024 07:10:53 -0800 Breno Leitao wrote:
/**
- struct netconsole_target - Represents a configured netconsole target.
- @list: Links this target into the target_list.
@@ -97,6 +105,7 @@ static struct console netconsole_ext;
- @userdata_group: Links to the userdata configfs hierarchy
- @userdata_complete: Cached, formatted string of append
- @userdata_length: String length of userdata_complete
- @userdata_auto: Kernel auto-populated bitwise fields in userdata.
- @enabled: On / off knob to enable / disable target.
Visible from userspace (read-write).
We maintain a strict 1:1 correspondence between this and
@@ -123,6 +132,7 @@ struct netconsole_target { struct config_group userdata_group; char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS]; size_t userdata_length;
- enum userdata_auto userdata_auto;
If you want to set multiple bits here type should probably be unsigned long. Otherwise the enum will contain combination of its values, which are in themselves not valid enum values ... if that makes sense.
#endif bool enabled; bool extended;
- /* Check if CPU NR should be populated, and append it to the user
* dictionary.
*/
- if (child_count < MAX_USERDATA_ITEMS && nt->userdata_auto & AUTO_CPU_NR)
scnprintf(&nt->userdata_complete[complete_idx],
MAX_USERDATA_ENTRY_LENGTH, " cpu=%u\n",
raw_smp_processor_id());
I guess it may be tricky for backward compat, but shouldn't we return an error rather than silently skip?
nt->userdata_length = strnlen(nt->userdata_complete, sizeof(nt->userdata_complete)); } @@ -757,7 +788,36 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf, return ret; } +static ssize_t populate_cpu_nr_store(struct config_item *item, const char *buf,
size_t count)
+{
- struct netconsole_target *nt = to_target(item->ci_parent);
- bool cpu_nr_enabled;
- ssize_t ret;
- if (!nt)
return -EINVAL;
Can this happen? Only if function gets called with a NULL @item which would be pretty nutty.
- mutex_lock(&dynamic_netconsole_mutex);
- ret = kstrtobool(buf, &cpu_nr_enabled);
- if (ret)
goto out_unlock;
- if (cpu_nr_enabled)
nt->userdata_auto |= AUTO_CPU_NR;
- else
nt->userdata_auto &= ~AUTO_CPU_NR;
- update_userdata(nt);
- ret = strnlen(buf, count);
+out_unlock:
- mutex_unlock(&dynamic_netconsole_mutex);
- return ret;
+}
Hello Jakub,
On Mon, Nov 18, 2024 at 06:33:36PM -0800, Jakub Kicinski wrote:
Sorry for the late review, I think this will miss v6.13 :(
That is fine, there is no rush for this change.
On Wed, 13 Nov 2024 07:10:53 -0800 Breno Leitao wrote:
/**
- struct netconsole_target - Represents a configured netconsole target.
- @list: Links this target into the target_list.
@@ -97,6 +105,7 @@ static struct console netconsole_ext;
- @userdata_group: Links to the userdata configfs hierarchy
- @userdata_complete: Cached, formatted string of append
- @userdata_length: String length of userdata_complete
- @userdata_auto: Kernel auto-populated bitwise fields in userdata.
- @enabled: On / off knob to enable / disable target.
Visible from userspace (read-write).
We maintain a strict 1:1 correspondence between this and
@@ -123,6 +132,7 @@ struct netconsole_target { struct config_group userdata_group; char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS]; size_t userdata_length;
- enum userdata_auto userdata_auto;
If you want to set multiple bits here type should probably be unsigned long. Otherwise the enum will contain combination of its values, which are in themselves not valid enum values ... if that makes sense.
Yes, it does make sense. I had the feeling that something was off as well, but I was unclear if using something different than `enum userdata_auto` would be better. I will change to `unsigned long`
#endif bool enabled; bool extended;
- /* Check if CPU NR should be populated, and append it to the user
* dictionary.
*/
- if (child_count < MAX_USERDATA_ITEMS && nt->userdata_auto & AUTO_CPU_NR)
scnprintf(&nt->userdata_complete[complete_idx],
MAX_USERDATA_ENTRY_LENGTH, " cpu=%u\n",
raw_smp_processor_id());
I guess it may be tricky for backward compat, but shouldn't we return an error rather than silently skip?
yes, this should be easy to do, in fact. Probably return -E2BIG to userspace when trying to update the entry. I thought about something as the following patch, and piggy-back into it.
Author: Breno Leitao leitao@debian.org Date: Tue Nov 19 04:32:56 2024 -0800
netconsole: Enforce userdata entry limit
Currently, attempting to add more than MAX_USERDATA_ITEMS to the userdata dictionary silently fails. This patch modifies the code to return -E2BIG when the number of elements exceeds the preallocated limit, providing clear feedback to userspace about the failure.
Suggested-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Breno Leitao leitao@debian.org
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 4ea44a2f48f7b..41cff8c8e8f42 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -692,10 +692,11 @@ static ssize_t userdatum_value_show(struct config_item *item, char *buf) return sysfs_emit(buf, "%s\n", &(to_userdatum(item)->value[0])); }
-static void update_userdata(struct netconsole_target *nt) +static int update_userdata(struct netconsole_target *nt) { int complete_idx = 0, child_count = 0; struct list_head *entry; + int ret = 0;
/* Clear the current string in case the last userdatum was deleted */ nt->userdata_length = 0; @@ -705,8 +706,10 @@ static void update_userdata(struct netconsole_target *nt) struct userdatum *udm_item; struct config_item *item;
- if (child_count >= MAX_USERDATA_ITEMS) + if (child_count >= MAX_USERDATA_ITEMS) { + ret = -E2BIG; break; + } child_count++;
item = container_of(entry, struct config_item, ci_entry); @@ -726,6 +729,7 @@ static void update_userdata(struct netconsole_target *nt) } nt->userdata_length = strnlen(nt->userdata_complete, sizeof(nt->userdata_complete)); + return ret; }
static ssize_t userdatum_value_store(struct config_item *item, const char *buf, @@ -748,8 +752,9 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf,
ud = to_userdata(item->ci_parent); nt = userdata_to_target(ud); - update_userdata(nt); - ret = count; + ret = update_userdata(nt); + if (!ret) + ret = count; out_unlock: mutex_unlock(&dynamic_netconsole_mutex); return ret;
nt->userdata_length = strnlen(nt->userdata_complete, sizeof(nt->userdata_complete)); } @@ -757,7 +788,36 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf, return ret; } +static ssize_t populate_cpu_nr_store(struct config_item *item, const char *buf,
size_t count)
+{
- struct netconsole_target *nt = to_target(item->ci_parent);
- bool cpu_nr_enabled;
- ssize_t ret;
- if (!nt)
return -EINVAL;
Can this happen? Only if function gets called with a NULL @item which would be pretty nutty.
Probably not. It is just me being chicken here. I will remove it for the next version.
Thanks for the review, --breno
On Tue, Nov 19, 2024 at 09:07:45AM -0800, Breno Leitao wrote:
#endif bool enabled; bool extended;
- /* Check if CPU NR should be populated, and append it to the user
* dictionary.
*/
- if (child_count < MAX_USERDATA_ITEMS && nt->userdata_auto & AUTO_CPU_NR)
scnprintf(&nt->userdata_complete[complete_idx],
MAX_USERDATA_ENTRY_LENGTH, " cpu=%u\n",
raw_smp_processor_id());
I guess it may be tricky for backward compat, but shouldn't we return an error rather than silently skip?
yes, this should be easy to do, in fact. Probably return -E2BIG to userspace when trying to update the entry. I thought about something as the following patch, and piggy-back into it.
Back to this topic, in fact, this is not needed at all.
The configfs make item helper (userdatum_make_item()) checks for exceeding entries, and fails if an additional entry is created.
static struct config_item *userdatum_make_item(struct config_group *group, const char *name) { .... child_count = list_count_nodes(&nt->userdata_group.cg_children); if (child_count >= MAX_USERDATA_ITEMS) return ERR_PTR(-ENOSPC);
I've sent an additional test for this mechanism, and make the check in update_userdata() a warning instead of just silently dropping the entry.
https://lore.kernel.org/all/20241204-netcons_overflow_test-v1-0-a85a8d0ace21...
Update the netconsole documentation to explain the new feature that allows automatic population of the CPU number in the userdata field.
The key changes include introducing a new section titled "CPU number auto population in userdata", explaining how to enable the CPU number auto-population feature by writing to the "populate_cpu_nr" file in the netconsole configfs hierarchy.
This documentation update ensures users are aware of the new CPU number auto-population functionality and how to leverage it for better demultiplexing and visibility of parallel netconsole output.
Signed-off-by: Breno Leitao leitao@debian.org --- Documentation/networking/netconsole.rst | 44 +++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+)
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst index d55c2a22ec7af0e687b76176c9171eb1b19013bd..a1b269196be390b5f88a6eca77f6676c67c05720 100644 --- a/Documentation/networking/netconsole.rst +++ b/Documentation/networking/netconsole.rst @@ -237,6 +237,50 @@ Delete `userdata` entries with `rmdir`::
It is recommended to not write user data values with newlines.
+CPU number auto population in userdata +-------------------------------------- + +Inside the netconsole configfs hierarchy, there is a file called +`populate_cpu_nr` under the `userdata` directory. This file is used to enable or +disable the automatic CPU number population feature. This feature +automatically populate the CPU number that is sending the message in the +userdata field. + +To enable the CPU number auto-population:: + + echo 1 > /sys/kernel/config/netconsole/target1/userdata/populate_cpu_nr + +When this option is enabled, the netconsole messages will include an additional +line in the userdata field with the format `cpu=<cpu_number>`. This allows the +receiver of the netconsole messages to easily differentiate and demultiplex +messages originating from different CPUs, which is particularly useful when +dealing with parallel log output. + +Example:: + + echo "This is a message" > /dev/kmsg + 12,607,22085407756,-;This is a message + cpu=42 + +In this example, the message was sent by CPU 42. + +.. note:: + + If the user has set a conflicting `cpu` key in the userdata dictionary, + both keys will be reported, with the kernel-populated entry appearing after + the user one. For example:: + + # User-defined CPU entry + mkdir -p /sys/kernel/config/netconsole/target1/userdata/cpu + echo "1" > /sys/kernel/config/netconsole/target1/userdata/cpu/value + + Output might look like:: + + 12,607,22085407756,-;This is a message + cpu=1 + cpu=42 # kernel-populated value + + Extended console: =================
Expand the existing netconsole selftest to verify the functionality of the recently added CPU number auto-population feature in the netconsole userdata.
The changes include enabling the "populate_cpu_nr" option on the netconsole target before sending the test data, and validating that the received data on the listener side contains a "cpu=" entry in the userdata, indicating that the CPU number was successfully auto-populated.
This addition to the netconsole selftest ensures that the new CPU number auto-population feature works as expected and helps catch any regressions in this functionality.
Signed-off-by: Breno Leitao leitao@debian.org --- tools/testing/selftests/drivers/net/netcons_basic.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh index b175f4d966e5056ddb62e335f212c03e55f50fb0..92538a188f0696b3e54071d1d32c7b2e523db839 100755 --- a/tools/testing/selftests/drivers/net/netcons_basic.sh +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh @@ -153,6 +153,16 @@ function set_user_data() { echo "${USERDATA_VALUE}" > "${VALUE_PATH}" }
+function set_cpu_nr() { + if [[ ! -f "${NETCONS_PATH}""/userdata/populate_cpu_nr" ]] + then + echo "Populate CPU configfs path not available in ${NETCONS_PATH}/userdata/populate_cpu_nr" >&2 + exit "${ksft_skip}" + fi + + echo 1 > "${NETCONS_PATH}""/userdata/populate_cpu_nr" +} + function listen_port_and_save_to() { local OUTPUT=${1} # Just wait for 2 seconds @@ -185,6 +195,12 @@ function validate_result() { exit "${ksft_fail}" fi
+ if ! grep -q "cpu=[0-9]+" "${TMPFILENAME}"; then + echo "FAIL: 'cpu=' not found in ${TMPFILENAME}" >&2 + cat "${TMPFILENAME}" >&2 + exit "${ksft_fail}" + fi + # Delete the file once it is validated, otherwise keep it # for debugging purposes rm "${TMPFILENAME}" @@ -254,6 +270,8 @@ set_network create_dynamic_target # Set userdata "key" with the "value" value set_user_data +# Auto populate CPU number +set_cpu_nr # Listed for netconsole port inside the namespace and destination interface listen_port_and_save_to "${OUTPUT_FILE}" & # Wait for socat to start and listen to the port.
On Wed, 13 Nov 2024 07:10:55 -0800 Breno Leitao wrote:
- if ! grep -q "cpu=[0-9]+" "${TMPFILENAME}"; then
echo "FAIL: 'cpu=' not found in ${TMPFILENAME}" >&2
cat "${TMPFILENAME}" >&2
exit "${ksft_fail}"
- fi
Could we try to do something like pick a 'random' CPU ID from sysfs, taskset the write / echo, and match that the incoming message has the expected CPU ID, not just any?
linux-kselftest-mirror@lists.linaro.org