After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.
Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.
Therefore this patch set introduces a sysctl that controls this
behavior, net.ipv4.ip_forward_update_priority. It's value is 1 by
default to preserve the current behavior.
All of the above is implemented in patch #1.
Value changes prompt a new NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE
notification, so that the drivers can hook up whatever logic may depend
on this value. That is implemented in patch #2.
In patches #3 and #4, mlxsw is adapted to recognize the sysctl. On
initialization, the RGCR register that handles router configuration is
set in accordance with the sysctl. The new notification is listened to
and RGCR is reconfigured as necessary.
In patches #5 to #7, a selftest is added to verify that mlxsw reflects
the sysctl value as necessary. The test is expressed in terms of the
recently-introduced ieee_setapp support, and works by observing how DSCP
value gets rewritten depending on packet priority. For this reason, the
test is added to the subdirectory drivers/net/mlxsw. Even though it's
not particularly specific to mlxsw, it's not suitable for running on
soft devices (which don't support the ieee_setapp et.al.).
Changes from v1 to v2:
- In patch #1, init sysctl_ip_fwd_update_priority to 1 instead of true.
Changes from RFC to v1:
- Fix wrong sysctl name in ip-sysctl.txt
- Add notifications
- Add mlxsw support
- Add self test
Petr Machata (7):
net: ipv4: Control SKB reprioritization after forwarding
net: ipv4: Notify about changes to ip_forward_update_priority
mlxsw: spectrum: Extract work-scheduling into a new function
mlxsw: spectrum_router: Handle sysctl_ip_fwd_update_priority
selftests: forwarding: Move lldpad waiting to lib.sh
selftests: forwarding: Move DSCP capture to lib.sh
selftests: mlxsw: Add test for ip_forward_update_priority
Documentation/networking/ip-sysctl.txt | 9 +
.../net/ethernet/mellanox/mlxsw/spectrum_router.c | 56 +++--
include/net/netevent.h | 1 +
include/net/netns/ipv4.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/ip_forward.c | 3 +-
net/ipv4/sysctl_net_ipv4.c | 26 +++
.../selftests/drivers/net/mlxsw/qos_dscp_bridge.sh | 65 +-----
.../selftests/drivers/net/mlxsw/qos_dscp_router.sh | 233 +++++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 63 ++++++
10 files changed, 379 insertions(+), 79 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
--
2.4.11
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.
Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.
Therefore this patch set introduces a sysctl that controls this
behavior, net.ipv4.ip_forward_update_priority. It's value is 1 by
default to preserve the current behavior.
All of the above is implemented in patch #1.
Value changes prompt a new NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE
notification, so that the drivers can hook up whatever logic may depend
on this value. That is implemented in patch #2.
In patches #3 and #4, mlxsw is adapted to recognize the sysctl. On
initialization, the RGCR register that handles router configuration is
set in accordance with the sysctl. The new notification is listened to
and RGCR is reconfigured as necessary.
In patches #5 to #7, a selftest is added to verify that mlxsw reflects
the sysctl value as necessary. The test is expressed in terms of the
recently-introduced ieee_setapp support, and works by observing how DSCP
value gets rewritten depending on packet priority. For this reason, the
test is added to the subdirectory drivers/net/mlxsw. Even though it's
not particularly specific to mlxsw, it's not suitable for running on
soft devices (which don't support the ieee_setapp et.al.).
Changes from RFC to v1:
- Fix wrong sysctl name in ip-sysctl.txt
- Add notifications
- Add mlxsw support
- Add self test
Petr Machata (7):
net: ipv4: Control SKB reprioritization after forwarding
net: ipv4: Notify about changes to ip_forward_update_priority
mlxsw: spectrum: Extract work-scheduling into a new function
mlxsw: spectrum_router: Handle sysctl_ip_fwd_update_priority
selftests: forwarding: Move lldpad waiting to lib.sh
selftests: forwarding: Move DSCP capture to lib.sh
selftests: mlxsw: Add test for ip_forward_update_priority
Documentation/networking/ip-sysctl.txt | 9 +
.../net/ethernet/mellanox/mlxsw/spectrum_router.c | 56 +++--
include/net/netevent.h | 1 +
include/net/netns/ipv4.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/ip_forward.c | 3 +-
net/ipv4/sysctl_net_ipv4.c | 26 +++
.../selftests/drivers/net/mlxsw/qos_dscp_bridge.sh | 65 +-----
.../selftests/drivers/net/mlxsw/qos_dscp_router.sh | 233 +++++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 63 ++++++
10 files changed, 379 insertions(+), 79 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
--
2.4.11
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,
A tester ran the upstream selftest on a distro kernel and sounded the alarm when
it reported failures for features which aren't included in that kernel.
This patch set improves the test behavior in that scenario.
Thiago Jung Bauermann (3):
userfaultfd: selftest: Fix checking of userfaultfd_open() result
userfaultfd: selftest: Skip test if a feature isn't supported
userfaultfd: selftest: Report XFAIL if shmem doesn't support zeropage
tools/testing/selftests/vm/userfaultfd.c | 49 ++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 12 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Due to some historical mistake, xfrm User ABI differ between native and
compatible applications. The difference is in structures paddings and in
the result in the size of netlink messages.
As it's already visible ABI, it cannot be adjusted by packing structures.
Possibility for compatible application to manage xfrm tunnels was
disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
userspace socket policies on 64 bit systems") and the commit 74005991b78a
("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").
By some wonderful reasons and brilliant architecture decisions for
creating userspace, on Arista switches we still use 32-bit userspace
with 64-bit kernel. There is slow movement to full 64-bit build, but
it's not yet here. As the switches need support for ipsec tunnels, the
local kernel has reverted mentioned patches that disable xfrm for
compat apps. On the top of that there is a bunch of disgraceful hacks
in userspace to work around the size check for netlink messages
and all that jazz.
It looks like, we're not the only desirable users of compatible xfrm,
there were a couple of attempts to make it work:
https://lkml.org/lkml/2017/1/20/733https://patchwork.ozlabs.org/patch/44600/http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctl…
All the discussions end in the conclusion that xfrm should have a full
compatible layer to correctly work with 32-bit applications on 64-bit
kernels:
https://lkml.org/lkml/2017/1/23/413https://patchwork.ozlabs.org/patch/433279/
In some recent lkml discussion, Linus said that it's worth to fix this
problem and not giving people an excuse to stay on 32-bit kernel:
https://lkml.org/lkml/2018/2/13/752
So, here I add a compatible layer to xfrm.
As xfrm uses netlink notifications, kernel should send them in ABI
format that an application will parse. The proposed solution is
to save the ABI of bind() syscall. The realization detail is
to create kernel-hidden, non visible to userspace netlink groups
for compat applications.
The first two patches simplify ifdeffery, and while I've already submitted
them a while ago, I'm resending them for completeness:
https://lore.kernel.org/lkml/20180717005004.25984-1-dima@arista.com/T/#u
There is also an exhaustive selftest for ipsec tunnels and to check
that kernel parses correctly the structures those differ in size.
It doesn't depend on any library and compat version can be easy
build with: make CFLAGS=-m32 net/ipsec
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: netdev(a)vger.kernel.org
Dmitry Safonov (18):
x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
compat: Cleanup in_compat_syscall() callers
selftest/net/xfrm: Add test for ipsec tunnel
net/xfrm: Add _packed types for compat users
net/xfrm: Parse userspi_info{,_packed} depending on syscall
netlink: Do not subscribe to non-existent groups
netlink: Pass groups pointer to .bind()
xfrm: Add in-kernel groups for compat notifications
xfrm: Dump usersa_info in compat/native formats
xfrm: Send state notifications in compat format too
xfrm: Add compat support for xfrm_user_expire messages
xfrm: Add compat support for xfrm_userpolicy_info messages
xfrm: Add compat support for xfrm_user_acquire messages
xfrm: Add compat support for xfrm_user_polexpire messages
xfrm: Check compat acquire listeners in xfrm_is_alive()
xfrm: Notify compat listeners about policy flush
xfrm: Notify compat listeners about state flush
xfrm: Enable compat syscalls
MAINTAINERS | 1 +
arch/x86/include/asm/compat.h | 9 +-
arch/x86/include/asm/ftrace.h | 4 +-
arch/x86/kernel/process_64.c | 4 +-
arch/x86/kernel/sys_x86_64.c | 11 +-
arch/x86/mm/hugetlbpage.c | 4 +-
arch/x86/mm/mmap.c | 2 +-
drivers/firmware/efi/efivars.c | 16 +-
include/linux/compat.h | 4 +-
include/linux/netlink.h | 2 +-
include/net/xfrm.h | 14 -
kernel/audit.c | 2 +-
kernel/time/time.c | 2 +-
net/core/rtnetlink.c | 14 +-
net/core/sock_diag.c | 25 +-
net/netfilter/nfnetlink.c | 24 +-
net/netlink/af_netlink.c | 28 +-
net/netlink/af_netlink.h | 4 +-
net/netlink/genetlink.c | 26 +-
net/xfrm/xfrm_state.c | 5 -
net/xfrm/xfrm_user.c | 690 ++++++++---
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ipsec.c | 1987 ++++++++++++++++++++++++++++++++
24 files changed, 2612 insertions(+), 268 deletions(-)
create mode 100644 tools/testing/selftests/net/ipsec.c
--
2.13.6
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
This patchset adds a test for "tc action mirred mirror" where the
mirrored-to device is a gretap, and underlay path contains a team
device.
In patch #1 require_command() is added, which should henceforth be used
to declare dependence on a certain tool.
In patch #2, two new functions, team_create() and team_destroy(), are
added to lib.sh.
The newly-added test uses arping, which isn't necessarily available.
Therefore patch #3 introduces $ARPING, and a preexisting test is fixed
to require_command $ARPING.
In patches #4 and #5, two new tests are added. In both cases, a team
device is on egress path of a mirrored packet in a mirror-to-gretap
scenario. In the first one, the team device is in loadbalance mode, in
the second one it's in lacp mode. (The difference in modes necessitates
a different testing strategy, hence two test cases instead of just
parameterizing one.)
Petr Machata (5):
selftests: forwarding: lib: Add require_command()
selftests: forwarding: lib: Support team devices
selftests: forwarding: Introduce $ARPING
selftests: forwarding: Test mirror-to-gretap w/ UL team
selftests: forwarding: Test mirror-to-gretap w/ UL team LACP
tools/testing/selftests/net/forwarding/lib.sh | 43 +++-
.../net/forwarding/mirror_gre_bridge_1q_lag.sh | 283 ++++++++++++++++++++
.../net/forwarding/mirror_gre_lag_lacp.sh | 285 +++++++++++++++++++++
.../net/forwarding/mirror_gre_vlan_bridge_1q.sh | 6 +-
4 files changed, 607 insertions(+), 10 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q_lag.sh
create mode 100755 tools/testing/selftests/net/forwarding/mirror_gre_lag_lacp.sh
--
2.4.11
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
There are two problems in this test case:
- When indexing in bash associative array, the subscript is interpreted as
string, not as a variable name to be expanded.
- The keys stored to t0s and t1s are not DSCP values, but priority +
base (i.e. the logical DSCP value, not the full bitfield value).
In combination these two bugs conspire to make the test just work,
except it doesn't really test anything and always passes.
Fix the above two problems in obvious manner.
Signed-off-by: Petr Machata <petrm(a)mellanox.com>
---
tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
index 418319f19108..cc527660a022 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
@@ -217,13 +217,13 @@ dscp_ping_test()
for key in ${!t0s[@]}; do
local expect
- if ((key == dscp_10 || key == dscp_20)); then
+ if ((key == prio+10 || key == prio+20)); then
expect=10
else
expect=0
fi
- local delta=$((t1s[key] - t0s[key]))
+ local delta=$((t1s[$key] - t0s[$key]))
((expect == delta))
check_err $? "DSCP $key: Expected to capture $expect packets, got $delta."
done
--
2.4.11
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 20, 2017 at 1:16 AM, Kyle Huey <me(a)kylehuey.com> wrote:
> This matches the only public Intel documentation of this MSR, in the
> "Virtualization Technology FlexMigration Application Note"
> (preserved at https://bugzilla.kernel.org/attachment.cgi?id=243991)
>
> Signed-off-by: Kyle Huey <khuey(a)kylehuey.com>
The old spelling matched volume 4 of the SDM, Table 2-43. "Selected
MSRs Supported by Intel Xeon Phi Processors with
DisplayFamily_DisplayModel Signatures 06_57H and 06_85H."
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 8, 2017 at 12:09 AM, Kyle Huey <me(a)kylehuey.com> wrote:
> Hardware support for faulting on the cpuid instruction is not required to
> emulate it, because cpuid triggers a VM exit anyways. KVM handles the relevant
> MSRs (MSR_PLATFORM_INFO and MSR_MISC_FEATURES_ENABLE) and upon a
> cpuid-induced VM exit checks the cpuid faulting state and the CPL.
> kvm_require_cpl is even kind enough to inject the GP fault for us.
>
> Signed-off-by: Kyle Huey <khuey(a)kylehuey.com>
> Reviewed-by: David Matlack <dmatlack(a)google.com>
I have a couple of concerns about portions of this patch:
1) There are some backward compatibility issues:
A) Suppose we have an old userspace that doesn't know it needs to
zero MSR_PLATFORM_INFO to preserve existing behavior (to the extent
possible). If a VM starts on a new kernel it could set the bit in
MSR_MISC_FEATURES_ENABLES that enables CPUID faulting. On
live-migration to an old kernel, that bit would be lost.
B) With either an old userspace or a new userspace, as a VM migrates
between old and new kernels, the behavior of RDMSR with ECX set to
either MSR_PLATFORM_INFO or MSR_MISC_FEATURES_ENABLES will vary
depending on which kernel the VM is currently running on.
Ideally, I think this new functionality should be guarded by a KVM
capability that has to be enabled from userspace.
2) This doesn't really play well with volume 3 of the SDM, section
18.7.3, where Intel instructs developers to use
MSR_PLATFORM_INFO[15:8] to determine the TSC frequency for a variety
of microarchitectures. When reads of this MSR raised #GP, it was
pretty clear that one couldn't get the TSC frequency that way, but I
don't think many consumers would specifically check for a 0 in that
field when the RDMSR succeeds. If a guest hypervisor used that value
in the computation of the TSC scaling factor for a VMCS12, for
example, it might be surprised to get a #DE.
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html