** Background ** Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd's handler threads only to be forwarded to some third party application that will somehow process the sample and provide observability on the datapath.
A particularly interesting use-case is controller-driven per-flow IPFIX sampling where the OpenFlow controller can add metadata to samples (via two 32bit integers) and this metadata is then available to the sample-collecting system for correlation.
** Problem ** The fact that sampled traffic share netlink sockets and handler thread time with upcalls, apart from being a performance bottleneck in the sample extraction itself, can severely compromise the datapath, yielding this solution unfit for highly loaded production systems.
Users are left with little options other than guessing what sampling rate will be OK for their traffic pattern and system load and dealing with the lost accuracy.
Looking at available infrastructure, an obvious candidated would be to use psample. However, it's current state does not help with the use-case at stake because sampled packets do not contain user-defined metadata.
** Proposal ** This series is an attempt to fix this situation by extending the existing psample infrastructure to carry a variable length user-defined cookie.
The main existing user of psample is tc's act_sample. It is also extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created. It accepts a group and an optional cookie and uses psample to multicast the packet and the metadata.
-- v8 -> v9: - Rebased.
v7 -> v8: - Rebased - Redirect flow insertion to /dev/null to avoid spat in test. - Removed inline keyword in stub execute_psample_action function.
v6 -> v7: - Rebased - Fixed typo in comment.
v5 -> v6: - Renamed emit_sample -> psample - Addressed unused variable and conditionally compilation of function.
v4 -> v5: - Rebased. - Removed lefover enum value and wrapped some long lines in selftests.
v3 -> v4: - Rebased. - Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3: - Addressed comments from Simon, Aaron and Ilya. - Dropped probability propagation in nested sample actions. - Dropped patch v2's 7/9 in favor of a userspace implementation and consume skb if emit_sample is the last action, same as we do with userspace. - Split ovs-dpctl.py features in independent patches.
v1 -> v2: - Create a new action ("emit_sample") rather than reuse existing "sample" one. - Add probability semantics to psample's sampling rate. - Store sampling probability in skb's cb area and use it in emit_sample. - Test combining "emit_sample" with "trunc" - Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1: - Accommodate Ilya's comments. - Split OVS's attribute in two attributes and simplify internal handling of psample arguments. - Extend psample and tc with a user-defined cookie. - Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2: - Use psample instead of a new OVS-only multicast group. - Extend psample and tc with a user-defined cookie.
Adrian Moreno (10): net: psample: add user cookie net: sched: act_sample: add action cookie to sample net: psample: skip packet copy if no listeners net: psample: allow using rate as probability net: openvswitch: add psample action net: openvswitch: store sampling probability in cb. selftests: openvswitch: add psample action selftests: openvswitch: add userspace parsing selftests: openvswitch: parse trunc action selftests: openvswitch: add psample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++ include/net/psample.h | 5 +- include/uapi/linux/openvswitch.h | 31 +- include/uapi/linux/psample.h | 11 +- net/openvswitch/Kconfig | 1 + net/openvswitch/actions.c | 66 ++++- net/openvswitch/datapath.h | 3 + net/openvswitch/flow_netlink.c | 32 ++- net/openvswitch/vport.c | 1 + net/psample/psample.c | 16 +- net/sched/act_sample.c | 12 + .../selftests/net/openvswitch/openvswitch.sh | 115 +++++++- .../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++- 13 files changed, 566 insertions(+), 16 deletions(-)
Add sample and psample action support to ovs-dpctl.py.
Refactor common attribute parsing logic into an external function.
Reviewed-by: Aaron Conole aconole@redhat.com Signed-off-by: Adrian Moreno amorenoz@redhat.com --- .../selftests/net/openvswitch/ovs-dpctl.py | 162 +++++++++++++++++- 1 file changed, 161 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 182a09975975..dcc400a21a22 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -8,6 +8,7 @@ import argparse import errno import ipaddress import logging +import math import multiprocessing import re import socket @@ -60,6 +61,7 @@ OVS_FLOW_CMD_DEL = 2 OVS_FLOW_CMD_GET = 3 OVS_FLOW_CMD_SET = 4
+UINT32_MAX = 0xFFFFFFFF
def macstr(mac): outstr = ":".join(["%02X" % i for i in mac]) @@ -281,6 +283,75 @@ def parse_extract_field( return str_skipped, data
+def parse_attrs(actstr, attr_desc): + """Parses the given action string and returns a list of netlink + attributes based on a list of attribute descriptions. + + Each element in the attribute description list is a tuple such as: + (name, attr_name, parse_func) + where: + name: is the string representing the attribute + attr_name: is the name of the attribute as defined in the uAPI. + parse_func: is a callable accepting a string and returning either + a single object (the parsed attribute value) or a tuple of + two values (the parsed attribute value and the remaining string) + + Returns a list of attributes and the remaining string. + """ + def parse_attr(actstr, key, func): + actstr = actstr[len(key) :] + + if not func: + return None, actstr + + delim = actstr[0] + actstr = actstr[1:] + + if delim == "=": + pos = strcspn(actstr, ",)") + ret = func(actstr[:pos]) + else: + ret = func(actstr) + + if isinstance(ret, tuple): + (datum, actstr) = ret + else: + datum = ret + actstr = actstr[strcspn(actstr, ",)"):] + + if delim == "(": + if not actstr or actstr[0] != ")": + raise ValueError("Action contains unbalanced parentheses") + + actstr = actstr[1:] + + actstr = actstr[strspn(actstr, ", ") :] + + return datum, actstr + + attrs = [] + attr_desc = list(attr_desc) + while actstr and actstr[0] != ")" and attr_desc: + found = False + for i, (key, attr, func) in enumerate(attr_desc): + if actstr.startswith(key): + datum, actstr = parse_attr(actstr, key, func) + attrs.append([attr, datum]) + found = True + del attr_desc[i] + + if not found: + raise ValueError("Unknown attribute: '%s'" % actstr) + + actstr = actstr[strspn(actstr, ", ") :] + + if actstr[0] != ")": + raise ValueError("Action string contains extra garbage or has " + "unbalanced parenthesis: '%s'" % actstr) + + return attrs, actstr[1:] + + class ovs_dp_msg(genlmsg): # include the OVS version # We need a custom header rather than just being able to rely on @@ -299,7 +370,7 @@ class ovsactions(nla): ("OVS_ACTION_ATTR_SET", "ovskey"), ("OVS_ACTION_ATTR_PUSH_VLAN", "none"), ("OVS_ACTION_ATTR_POP_VLAN", "flag"), - ("OVS_ACTION_ATTR_SAMPLE", "none"), + ("OVS_ACTION_ATTR_SAMPLE", "sample"), ("OVS_ACTION_ATTR_RECIRC", "uint32"), ("OVS_ACTION_ATTR_HASH", "none"), ("OVS_ACTION_ATTR_PUSH_MPLS", "none"), @@ -318,8 +389,85 @@ class ovsactions(nla): ("OVS_ACTION_ATTR_ADD_MPLS", "none"), ("OVS_ACTION_ATTR_DEC_TTL", "none"), ("OVS_ACTION_ATTR_DROP", "uint32"), + ("OVS_ACTION_ATTR_PSAMPLE", "psample"), )
+ class psample(nla): + nla_flags = NLA_F_NESTED + + nla_map = ( + ("OVS_PSAMPLE_ATTR_UNSPEC", "none"), + ("OVS_PSAMPLE_ATTR_GROUP", "uint32"), + ("OVS_PSAMPLE_ATTR_COOKIE", "array(uint8)"), + ) + + def dpstr(self, more=False): + args = "group=%d" % self.get_attr("OVS_PSAMPLE_ATTR_GROUP") + + cookie = self.get_attr("OVS_PSAMPLE_ATTR_COOKIE") + if cookie: + args += ",cookie(%s)" % \ + "".join(format(x, "02x") for x in cookie) + + return "psample(%s)" % args + + def parse(self, actstr): + desc = ( + ("group", "OVS_PSAMPLE_ATTR_GROUP", int), + ("cookie", "OVS_PSAMPLE_ATTR_COOKIE", + lambda x: list(bytearray.fromhex(x))) + ) + + attrs, actstr = parse_attrs(actstr, desc) + + for attr in attrs: + self["attrs"].append(attr) + + return actstr + + class sample(nla): + nla_flags = NLA_F_NESTED + + nla_map = ( + ("OVS_SAMPLE_ATTR_UNSPEC", "none"), + ("OVS_SAMPLE_ATTR_PROBABILITY", "uint32"), + ("OVS_SAMPLE_ATTR_ACTIONS", "ovsactions"), + ) + + def dpstr(self, more=False): + args = [] + + args.append("sample={:.2f}%".format( + 100 * self.get_attr("OVS_SAMPLE_ATTR_PROBABILITY") / + UINT32_MAX)) + + actions = self.get_attr("OVS_SAMPLE_ATTR_ACTIONS") + if actions: + args.append("actions(%s)" % actions.dpstr(more)) + + return "sample(%s)" % ",".join(args) + + def parse(self, actstr): + def parse_nested_actions(actstr): + subacts = ovsactions() + parsed_len = subacts.parse(actstr) + return subacts, actstr[parsed_len :] + + def percent_to_rate(percent): + percent = float(percent.strip('%')) + return int(math.floor(UINT32_MAX * (percent / 100.0) + .5)) + + desc = ( + ("sample", "OVS_SAMPLE_ATTR_PROBABILITY", percent_to_rate), + ("actions", "OVS_SAMPLE_ATTR_ACTIONS", parse_nested_actions), + ) + attrs, actstr = parse_attrs(actstr, desc) + + for attr in attrs: + self["attrs"].append(attr) + + return actstr + class ctact(nla): nla_flags = NLA_F_NESTED
@@ -683,6 +831,18 @@ class ovsactions(nla): self["attrs"].append(["OVS_ACTION_ATTR_CT", ctact]) parsed = True
+ elif parse_starts_block(actstr, "sample(", False): + sampleact = self.sample() + actstr = sampleact.parse(actstr[len("sample(") : ]) + self["attrs"].append(["OVS_ACTION_ATTR_SAMPLE", sampleact]) + parsed = True + + elif parse_starts_block(actstr, "psample(", False): + psampleact = self.psample() + actstr = psampleact.parse(actstr[len("psample(") : ]) + self["attrs"].append(["OVS_ACTION_ATTR_PSAMPLE", psampleact]) + parsed = True + actstr = actstr[strspn(actstr, ", ") :] while parencount > 0: parencount -= 1
The userspace action lacks parsing support plus it contains a bug in the name of one of its attributes.
This patch makes userspace action work.
Reviewed-by: Aaron Conole aconole@redhat.com Signed-off-by: Adrian Moreno amorenoz@redhat.com --- .../selftests/net/openvswitch/ovs-dpctl.py | 24 +++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index dcc400a21a22..4ccf26f96327 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -589,13 +589,27 @@ class ovsactions(nla): print_str += "userdata=" for f in self.get_attr("OVS_USERSPACE_ATTR_USERDATA"): print_str += "%x." % f - if self.get_attr("OVS_USERSPACE_ATTR_TUN_PORT") is not None: + if self.get_attr("OVS_USERSPACE_ATTR_EGRESS_TUN_PORT") is not None: print_str += "egress_tun_port=%d" % self.get_attr( - "OVS_USERSPACE_ATTR_TUN_PORT" + "OVS_USERSPACE_ATTR_EGRESS_TUN_PORT" ) print_str += ")" return print_str
+ def parse(self, actstr): + attrs_desc = ( + ("pid", "OVS_USERSPACE_ATTR_PID", int), + ("userdata", "OVS_USERSPACE_ATTR_USERDATA", + lambda x: list(bytearray.fromhex(x))), + ("egress_tun_port", "OVS_USERSPACE_ATTR_EGRESS_TUN_PORT", int) + ) + + attrs, actstr = parse_attrs(actstr, attrs_desc) + for attr in attrs: + self["attrs"].append(attr) + + return actstr + def dpstr(self, more=False): print_str = ""
@@ -843,6 +857,12 @@ class ovsactions(nla): self["attrs"].append(["OVS_ACTION_ATTR_PSAMPLE", psampleact]) parsed = True
+ elif parse_starts_block(actstr, "userspace(", False): + uact = self.userspace() + actstr = uact.parse(actstr[len("userspace(") : ]) + self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact]) + parsed = True + actstr = actstr[strspn(actstr, ", ") :] while parencount > 0: parencount -= 1
The trunc action was supported decode-able but not parse-able. Add support for parsing the action string.
Reviewed-by: Aaron Conole aconole@redhat.com Signed-off-by: Adrian Moreno amorenoz@redhat.com --- .../testing/selftests/net/openvswitch/ovs-dpctl.py | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 4ccf26f96327..e8dc9af10d4d 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -863,6 +863,19 @@ class ovsactions(nla): self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact]) parsed = True
+ elif parse_starts_block(actstr, "trunc(", False): + parencount += 1 + actstr, val = parse_extract_field( + actstr, + "trunc(", + r"([0-9]+)", + int, + False, + None, + ) + self["attrs"].append(["OVS_ACTION_ATTR_TRUNC", val]) + parsed = True + actstr = actstr[strspn(actstr, ", ") :] while parencount > 0: parencount -= 1
Add a test to verify sampling packets via psample works.
In order to do that, create a subcommand in ovs-dpctl.py to listen to on the psample multicast group and print samples.
Reviewed-by: Aaron Conole aconole@redhat.com Tested-by: Ilya Maximets i.maximets@ovn.org Signed-off-by: Adrian Moreno amorenoz@redhat.com --- .../selftests/net/openvswitch/openvswitch.sh | 115 +++++++++++++++++- .../selftests/net/openvswitch/ovs-dpctl.py | 73 ++++++++++- 2 files changed, 182 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 531951086d9c..bc71dbc18b21 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -20,7 +20,8 @@ tests=" nat_related_v4 ip4-nat-related: ICMP related matches work with SNAT netlink_checks ovsnl: validate netlink attrs and settings upcall_interfaces ovs: test the upcall interfaces - drop_reason drop: test drop reasons are emitted" + drop_reason drop: test drop reasons are emitted + psample psample: Sampling packets with psample"
info() { [ "${ovs_dir}" != "" ] && @@ -105,12 +106,21 @@ ovs_netns_spawn_daemon() { shift netns=$1 shift - info "spawning cmd: $*" - ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & + if [ "$netns" == "_default" ]; then + $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & + else + ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & + fi pid=$! ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null" }
+ovs_spawn_daemon() { + sbx=$1 + shift + ovs_netns_spawn_daemon $sbx "_default" $* +} + ovs_add_netns_and_veths () { info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}" ovs_sbx "$1" ip netns add "$3" || return 1 @@ -173,6 +183,19 @@ ovs_drop_reason_count() return `echo "$perf_output" | grep "$pattern" | wc -l` }
+ovs_test_flow_fails () { + ERR_MSG="Flow actions may not be safe on all matching packets" + + PRE_TEST=$(dmesg | grep -c "${ERR_MSG}") + ovs_add_flow $@ &> /dev/null $@ && return 1 + POST_TEST=$(dmesg | grep -c "${ERR_MSG}") + + if [ "$PRE_TEST" == "$POST_TEST" ]; then + return 1 + fi + return 0 +} + usage() { echo echo "$0 [OPTIONS] [TEST]..." @@ -187,6 +210,92 @@ usage() { exit 1 }
+ +# psample test +# - use psample to observe packets +test_psample() { + sbx_add "test_psample" || return $? + + # Add a datapath with per-vport dispatching. + ovs_add_dp "test_psample" psample -V 2:1 || return 1 + + info "create namespaces" + ovs_add_netns_and_veths "test_psample" "psample" \ + client c0 c1 172.31.110.10/24 -u || return 1 + ovs_add_netns_and_veths "test_psample" "psample" \ + server s0 s1 172.31.110.20/24 -u || return 1 + + # Check if psample actions can be configured. + ovs_add_flow "test_psample" psample \ + 'in_port(1),eth(),eth_type(0x0806),arp()' 'psample(group=1)' &> /dev/null + if [ $? == 1 ]; then + info "no support for psample - skipping" + ovs_exit_sig + return $ksft_skip + fi + + ovs_del_flows "test_psample" psample + + # Test action verification. + OLDIFS=$IFS + IFS='*' + min_key='in_port(1),eth(),eth_type(0x0800),ipv4()' + for testcase in \ + "cookie to large"*"psample(group=1,cookie=1615141312111009080706050403020100)" \ + "no group with cookie"*"psample(cookie=abcd)" \ + "no group"*"psample()"; + do + set -- $testcase; + ovs_test_flow_fails "test_psample" psample $min_key $2 + if [ $? == 1 ]; then + info "failed - $1" + return 1 + fi + done + IFS=$OLDIFS + + ovs_del_flows "test_psample" psample + # Allow ARP + ovs_add_flow "test_psample" psample \ + 'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1 + ovs_add_flow "test_psample" psample \ + 'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1 + + # Sample first 14 bytes of all traffic. + ovs_add_flow "test_psample" psample \ + "in_port(1),eth(),eth_type(0x0800),ipv4()" \ + "trunc(14),psample(group=1,cookie=c0ffee),2" + + # Sample all traffic. In this case, use a sample() action with both + # psample and an upcall emulating simultaneous local sampling and + # sFlow / IPFIX. + nlpid=$(grep -E "listening on upcall packet handler" \ + $ovs_dir/s0.out | cut -d ":" -f 2 | tr -d ' ') + + ovs_add_flow "test_psample" psample \ + "in_port(2),eth(),eth_type(0x0800),ipv4()" \ + "sample(sample=100%,actions(psample(group=2,cookie=eeff0c),userspace(pid=${nlpid},userdata=eeff0c))),1" + + # Record psample data. + ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py psample-events + + # Send a single ping. + sleep 1 + ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1 + sleep 1 + + # We should have received one userspace action upcall and 2 psample packets. + grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1 + + # client -> server samples should only contain the first 14 bytes of the packet. + grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \ + $ovs_dir/stdout >/dev/null 2>&1 || return 1 + grep -E "rate:4294967295,group:2,cookie:eeff0c" \ + $ovs_dir/stdout >/dev/null 2>&1 || return 1 + + return 0 +} + # drop_reason test # - drop packets and verify the right drop reason is reported test_drop_reason() { diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index e8dc9af10d4d..1e15b0818074 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -28,8 +28,10 @@ try: from pyroute2.netlink import genlmsg from pyroute2.netlink import nla from pyroute2.netlink import nlmsg_atoms + from pyroute2.netlink.event import EventSocket from pyroute2.netlink.exceptions import NetlinkError from pyroute2.netlink.generic import GenericNetlinkSocket + from pyroute2.netlink.nlsocket import Marshal import pyroute2 import pyroute2.iproute
@@ -2460,10 +2462,70 @@ class OvsFlow(GenericNetlinkSocket): print("MISS upcall[%d/%s]: %s" % (seq, pktpres, keystr), flush=True)
def execute(self, packetmsg): - print("userspace execute command") + print("userspace execute command", flush=True)
def action(self, packetmsg): - print("userspace action command") + print("userspace action command", flush=True) + + +class psample_sample(genlmsg): + nla_map = ( + ("PSAMPLE_ATTR_IIFINDEX", "none"), + ("PSAMPLE_ATTR_OIFINDEX", "none"), + ("PSAMPLE_ATTR_ORIGSIZE", "none"), + ("PSAMPLE_ATTR_SAMPLE_GROUP", "uint32"), + ("PSAMPLE_ATTR_GROUP_SEQ", "none"), + ("PSAMPLE_ATTR_SAMPLE_RATE", "uint32"), + ("PSAMPLE_ATTR_DATA", "array(uint8)"), + ("PSAMPLE_ATTR_GROUP_REFCOUNT", "none"), + ("PSAMPLE_ATTR_TUNNEL", "none"), + ("PSAMPLE_ATTR_PAD", "none"), + ("PSAMPLE_ATTR_OUT_TC", "none"), + ("PSAMPLE_ATTR_OUT_TC_OCC", "none"), + ("PSAMPLE_ATTR_LATENCY", "none"), + ("PSAMPLE_ATTR_TIMESTAMP", "none"), + ("PSAMPLE_ATTR_PROTO", "none"), + ("PSAMPLE_ATTR_USER_COOKIE", "array(uint8)"), + ) + + def dpstr(self): + fields = [] + data = "" + for (attr, value) in self["attrs"]: + if attr == "PSAMPLE_ATTR_SAMPLE_GROUP": + fields.append("group:%d" % value) + if attr == "PSAMPLE_ATTR_SAMPLE_RATE": + fields.append("rate:%d" % value) + if attr == "PSAMPLE_ATTR_USER_COOKIE": + value = "".join(format(x, "02x") for x in value) + fields.append("cookie:%s" % value) + if attr == "PSAMPLE_ATTR_DATA" and len(value) > 0: + data = "data:%s" % "".join(format(x, "02x") for x in value) + + return ("%s %s" % (",".join(fields), data)).strip() + + +class psample_msg(Marshal): + PSAMPLE_CMD_SAMPLE = 0 + PSAMPLE_CMD_GET_GROUP = 1 + PSAMPLE_CMD_NEW_GROUP = 2 + PSAMPLE_CMD_DEL_GROUP = 3 + PSAMPLE_CMD_SET_FILTER = 4 + msg_map = {PSAMPLE_CMD_SAMPLE: psample_sample} + + +class PsampleEvent(EventSocket): + genl_family = "psample" + mcast_groups = ["packets"] + marshal_class = psample_msg + + def read_samples(self): + while True: + try: + for msg in self.get(): + print(msg.dpstr(), flush=True) + except NetlinkError as ne: + raise ne
def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()): @@ -2530,7 +2592,7 @@ def main(argv): help="Increment 'verbose' output counter.", default=0, ) - subparsers = parser.add_subparsers() + subparsers = parser.add_subparsers(dest="subcommand")
showdpcmd = subparsers.add_parser("show") showdpcmd.add_argument( @@ -2605,6 +2667,8 @@ def main(argv): delfscmd = subparsers.add_parser("del-flows") delfscmd.add_argument("flsbr", help="Datapath name")
+ subparsers.add_parser("psample-events") + args = parser.parse_args()
if args.verbose > 0: @@ -2619,6 +2683,9 @@ def main(argv):
sys.setrecursionlimit(100000)
+ if args.subcommand == "psample-events": + PsampleEvent().read_samples() + if hasattr(args, "showdp"): found = False for iface in ndb.interfaces:
On Thu, Jul 04, 2024 at 10:56:51AM GMT, Adrian Moreno wrote:
** Background ** Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd's handler threads only to be forwarded to some third party application that will somehow process the sample and provide observability on the datapath.
A particularly interesting use-case is controller-driven per-flow IPFIX sampling where the OpenFlow controller can add metadata to samples (via two 32bit integers) and this metadata is then available to the sample-collecting system for correlation.
** Problem ** The fact that sampled traffic share netlink sockets and handler thread time with upcalls, apart from being a performance bottleneck in the sample extraction itself, can severely compromise the datapath, yielding this solution unfit for highly loaded production systems.
Users are left with little options other than guessing what sampling rate will be OK for their traffic pattern and system load and dealing with the lost accuracy.
Looking at available infrastructure, an obvious candidated would be to use psample. However, it's current state does not help with the use-case at stake because sampled packets do not contain user-defined metadata.
** Proposal ** This series is an attempt to fix this situation by extending the existing psample infrastructure to carry a variable length user-defined cookie.
The main existing user of psample is tc's act_sample. It is also extended to forward the action's cookie to psample.
Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created. It accepts a group and an optional cookie and uses psample to multicast the packet and the metadata.
-- v8 -> v9:
- Rebased.
v7 -> v8:
- Rebased
- Redirect flow insertion to /dev/null to avoid spat in test.
- Removed inline keyword in stub execute_psample_action function.
v6 -> v7:
- Rebased
- Fixed typo in comment.
v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditionally compilation of function.
v4 -> v5:
- Rebased.
- Removed lefover enum value and wrapped some long lines in selftests.
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with userspace.
- Split ovs-dpctl.py features in independent patches.
v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing "sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.
rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.
rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.
Adrian Moreno (10): net: psample: add user cookie net: sched: act_sample: add action cookie to sample net: psample: skip packet copy if no listeners net: psample: allow using rate as probability net: openvswitch: add psample action net: openvswitch: store sampling probability in cb. selftests: openvswitch: add psample action selftests: openvswitch: add userspace parsing selftests: openvswitch: parse trunc action selftests: openvswitch: add psample test
Documentation/netlink/specs/ovs_flow.yaml | 17 ++ include/net/psample.h | 5 +- include/uapi/linux/openvswitch.h | 31 +- include/uapi/linux/psample.h | 11 +- net/openvswitch/Kconfig | 1 + net/openvswitch/actions.c | 66 ++++- net/openvswitch/datapath.h | 3 + net/openvswitch/flow_netlink.c | 32 ++- net/openvswitch/vport.c | 1 + net/psample/psample.c | 16 +- net/sched/act_sample.c | 12 + .../selftests/net/openvswitch/openvswitch.sh | 115 +++++++- .../selftests/net/openvswitch/ovs-dpctl.py | 272 +++++++++++++++++- 13 files changed, 566 insertions(+), 16 deletions(-)
-- 2.45.2
Hi,
Simon Horman has spotted that openvswitch.sh tests are failing in the debug executor:
https://netdev.bots.linux.dev/contest.html?test=openvswitch-sh
The failing tests are two: psample and upcall_interfaces. These two tests have a known source of instability (they use "sleep") that make them specially unreliable in slow systems.
Aaron and I already discussed this and I'm working on a patch to make both tests more robust by adding a wait-and-retry mechanism.
I hope this series can be considered regardless of this flaky tests.
Thanks. Adrián
Hello:
This series was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Thu, 4 Jul 2024 10:56:51 +0200 you wrote:
** Background ** Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd's handler threads only to be forwarded to some third party application that will somehow process the sample and provide observability on the datapath.
[...]
Here is the summary with links: - [net-next,v9,01/10] net: psample: add user cookie https://git.kernel.org/netdev/net-next/c/093b0f366567 - [net-next,v9,02/10] net: sched: act_sample: add action cookie to sample https://git.kernel.org/netdev/net-next/c/03448444ae5c - [net-next,v9,03/10] net: psample: skip packet copy if no listeners https://git.kernel.org/netdev/net-next/c/c35d86a23029 - [net-next,v9,04/10] net: psample: allow using rate as probability https://git.kernel.org/netdev/net-next/c/7b1b2b60c63f - [net-next,v9,05/10] net: openvswitch: add psample action https://git.kernel.org/netdev/net-next/c/aae0b82b46cb - [net-next,v9,06/10] net: openvswitch: store sampling probability in cb. https://git.kernel.org/netdev/net-next/c/71763d8a8203 - [net-next,v9,07/10] selftests: openvswitch: add psample action https://git.kernel.org/netdev/net-next/c/60ccf62d3ceb - [net-next,v9,08/10] selftests: openvswitch: add userspace parsing https://git.kernel.org/netdev/net-next/c/c7815abbea45 - [net-next,v9,09/10] selftests: openvswitch: parse trunc action https://git.kernel.org/netdev/net-next/c/b192bf12dbb0 - [net-next,v9,10/10] selftests: openvswitch: add psample test https://git.kernel.org/netdev/net-next/c/30d772a03582
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org