On Thu, May 26, 2022 at 06:40:22AM +0000, Ferenc Fejes wrote:
> Hi Vladimir!
>
> On Thu, 2022-05-26 at 00:50 +0000, Vladimir Oltean wrote:
> > On Fri, May 06, 2022 at 12:24:37PM +0000, Ferenc Fejes wrote:
> > > Hi Vladimir!
> > >
> > > On 2022. 05. 06. 14:01, Vladimir Oltean wrote:
> > > > Hi Ferenc,
> > > >
> > > > On Fri, May 06, 2022 at 07:49:40AM +0000, Ferenc Fejes wrote:
> > > >
> > > > This is correct. I have been testing only with the offloaded
> > > > tc-gate action, so I did not notice that the software path does not
> > > > act upon the IPV. Your proposal sounds straightforward enough. Care
> > > > to send a bug fix patch?
> > >
> > > Unfortunately I can't; our company policy does not allow direct
> > > open-source contributions :-(
> > > However, I would be more than happy if you could fix it.
> >
> > That's too bad.
> >
> > I have a patch which I am still testing, but you've managed to
> > captivate my attention by saying that you are testing 802.1Qch with a
> > software implementation of tc-gate.
> >
> > Do you have a use case for this? What cycle times are you targeting?
> > How are you ensuring that you deterministically meet the deadlines?
> The cycle times I targeted were nowhere near a realistic TSN scenario:
> I "generated" ping packets every 100 ms, and on the ingress port I
> marked them with prio 1 for 500 ms (gate 1) and prio 2 for another
> 500 ms (gate 2). On the egress port I applied taprio with the same
> base-time and the same 500-500 ms cycles, but with the gates in reverse
> order (that's the "definition" of Qch), so while act_gate on ingress
> was in its gate 1 cycle, taprio kept gate 2 open and gate 1 closed, and
> so on (a rough tc sketch of this setup follows the quoted thread
> below). For "verification" I simply ran tcpdump on the listener machine
> which I pinged from the talker, and eyeballed whether the bursts of 5
> packets each showed up in the arrival timestamps.
> > Do you also have a software tc-taprio on the egress device, or is
> > that at least offloaded?
> No, I experimented with the software version, but it worked in my netns
> tests and on physical machines too (after the IPV patch).
> > I'm asking these questions because the peristaltic shaper is
> > primarily intended to be used on hardware switches. The patch I'm
> > preparing includes changes to struct sk_buff, and I just want to know
> > how well I'll be able to sell these changes to maintainers who
> > strongly oppose the growth of this structure for an exceedingly small
> > niche :)
> Can you tell me about the intention behind the sk_buff changes? Is that
> required because of some offloading scenario? In my case, putting the
> IPV into skb->priority was good enough, because taprio uses that field
> by default to select the traffic class for the packet.
> Thanks,
> Ferenc
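For context, the test Ferenc describes above could be reproduced with a
configuration along these lines. This is only a sketch, not his exact
commands: the interface names (eth0 as the ingress port, eth1 as the
egress port with at least 3 TX queues), the base-time and the prio-to-TC
map are assumptions of mine.

# Ingress: tc-gate alternates the IPV between 1 and 2 every 500 ms
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress protocol ip flower \
	action gate clockid CLOCK_TAI base-time 0 \
	sched-entry open 500000000 1 -1 \
	sched-entry open 500000000 2 -1

# Egress: taprio with the same base-time and cycle, but the gates in
# reverse order: TC 2 is open while tc-gate stamps IPV 1, and TC 1 is
# open while it stamps IPV 2
tc qdisc replace dev eth1 parent root handle 100 taprio \
	num_tc 3 map 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	queues 1@0 1@1 1@2 \
	clockid CLOCK_TAI base-time 0 \
	sched-entry S 4 500000000 \
	sched-entry S 2 500000000

With this arrangement, packets stamped in one cycle can only leave in
the next one, which is the peristaltic behavior that Qch asks for.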
Hopefully this patch explains it:
-----------------------------[ cut here ]-----------------------------
From c8d33e0d65a4a63884a626dc04c7190a2bbe122b Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Wed, 25 May 2022 16:27:52 +0300
Subject: [PATCH] net/sched: act_gate: act on Internal Priority Value
IEEE 802.1Q specifies in Annex T ("Cyclic queuing and forwarding") the
use case for an Internal Priority Value set by the PSFP function.
Essentially, the intended use is to be able to select a packet's egress
queue based squarely on its arrival time: a packet with the same VLAN
PCP of 5 can be delivered to TC 7 or TC 6 depending on whether it was
received in an "odd" or "even" cycle. To quote the spec, "The use of the
IPV allows this direction of frames to outbound queues to be independent
of the received priority, and also does not affect the priority
associated with the frame on transmission."
The problem is that act_gate accepts the IPV as input from user space,
but its software implementation does not act on it - only offloading
drivers do.
To fix this, we need to keep a current_ipv variable that tracks the gate
entry currently executed by act_gate, and use this IPV to overwrite
skb->priority.
In fact, a complication arises due to the following clause from 802.1Q:
| 8.6.6.1 PSFP queuing | If PSFP is supported (8.6.5.1), and the IPV associated with the stream | filter that passed the frame is anything other than the null value, then | that IPV is used to determine the traffic class of the frame, in place | of the frame's priority, via the Traffic Class Table specified in 8.6.6. | In all other respects, the queuing actions specified in 8.6.6 are | unchanged. The IPV is used only to determine the traffic class | associated with a frame, and hence select an outbound queue; for all | other purposes, the received priority is used.
An example of a layer that we don't want to see the IPV is the
egress-qos-map of a potential VLAN output device. There, the VLAN PCP
should still be populated based on the original skb->priority.
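For instance (device names are hypothetical), with a VLAN upper
configured as below, the PCP inserted into the tag must keep coming from
the original skb->priority, even for skbs that were assigned an IPV on
ingress:

# egress-qos-map maps skb->priority to the PCP field of the VLAN tag
ip link add link eth0 name eth0.100 type vlan id 100 \
	egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7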
I'm aware of the existence of a skb_update_prio() function in __dev_queue_xmit(), right before passing the skb to the qdisc layer, for the use case where net_prio cgroups are used to assign processes to network priorities on a per-interface basis. But rewriting the skb->priority with the skb->ipv there simply wouldn't work, exactly because there might be a VLAN in the output path of the skb.
One might suggest: just delay the IPV rewriting in the presence of
stacked devices until it is actually needed and likely to be used, such
as when dev->num_tc != 0 (which a VLAN device doesn't have). But there
are still other users of skb->priority in the qdisc layer, such as
skbprio, htb, etc. From the spec's wording it isn't clear whether these
functions should look at the proper packet user priority or at the PSFP
IPV; probably the former.
So the only implementation that conforms to these requirements seems to
be the invasive one, where we introduce a dedicated helper for the
pattern where drivers and the core ask for netdev_get_prio_tc_map(dev,
skb->priority). We replace all such instances with a helper that selects
the TC based on the IPV if the skb was filtered by PSFP time gates on
ingress, and falls back to skb->priority otherwise.
Fixes: a51c328df310 ("net: qos: introduce a gate control flow action")
Reported-by: Ferenc Fejes <ferenc.fejes@ericsson.com>
Link: https://lore.kernel.org/netdev/87o80h1n65.fsf@kurt/T/#meaec9c24b3add9cb11edd...
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 include/linux/netdevice.h                     | 13 +++++++++++++
 include/linux/skbuff.h                        | 11 +++++++++++
 include/net/tc_act/tc_gate.h                  |  1 +
 net/core/dev.c                                |  4 ++--
 net/dsa/tag_ocelot.c                          |  2 +-
 net/sched/act_gate.c                          |  6 ++++++
 net/sched/sch_taprio.c                        | 12 +++---------
 8 files changed, 38 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 77c2e70b0860..719259af9aaa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8531,7 +8531,7 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 	int txq;
 
 	if (sb_dev) {
-		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
+		u8 tc = skb_get_prio_tc_map(skb, dev);
 		struct net_device *vdev = sb_dev;
 
 		txq = vdev->tc_to_txq[tc].offset;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 47b59f99b037..8b87d017288f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2366,6 +2366,19 @@ int netdev_set_prio_tc_map(struct net_device *dev, u8 prio, u8 tc)
 	return 0;
 }
 
+static inline int skb_get_prio_tc_map(const struct sk_buff *skb,
+				      const struct net_device *dev)
+{
+	__u32 prio = skb->priority;
+
+#if IS_ENABLED(CONFIG_NET_ACT_GATE)
+	if (skb->use_ipv)
+		prio = skb->ipv;
+#endif
+
+	return netdev_get_prio_tc_map(dev, prio);
+}
+
 int netdev_txq_to_tc(struct net_device *dev, unsigned int txq);
 void netdev_reset_tc(struct net_device *dev);
 int netdev_set_tc_queue(struct net_device *dev, u8 tc, u16 count, u16 offset);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da96f0d3e753..b0a463c0bc65 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -913,6 +913,9 @@ typedef unsigned char *sk_buff_data_t;
  *	@csum_start: Offset from skb->head where checksumming should start
  *	@csum_offset: Offset from csum_start where checksum should be stored
  *	@priority: Packet queueing priority
+ *	@use_ipv: Packet internal priority was altered by PSFP
+ *	@ipv: Internal Priority Value, overrides priority for traffic class
+ *		selection
  *	@ignore_df: allow local fragmentation
  *	@cloned: Head may be cloned (check refcnt to be sure)
  *	@ip_summed: Driver fed us an IP checksum
@@ -1145,6 +1148,10 @@ struct sk_buff {
 	__u8			slow_gro:1;
 	__u8			csum_not_inet:1;
 
+#ifdef CONFIG_NET_ACT_GATE
+	__u8			use_ipv:1;
+#endif
+
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
 #endif
@@ -1209,6 +1216,10 @@ struct sk_buff {
 	/* only useable after checking ->active_extensions != 0 */
 	struct skb_ext		*extensions;
 #endif
+
+#ifdef CONFIG_NET_ACT_GATE
+	__u32			ipv;
+#endif
 };
 
 /* if you move pkt_type around you also must adapt those constants */
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index c8fa11ebb397..b05c2c7d78e4 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -44,6 +44,7 @@ struct tcf_gate {
 	ktime_t			current_close_time;
 	u32			current_entry_octets;
 	s32			current_max_octets;
+	s32			current_ipv;
 	struct tcfg_gate_entry	*next_entry;
 	struct hrtimer		hitimer;
 	enum tk_offsets		tk_offset;
diff --git a/net/core/dev.c b/net/core/dev.c
index 721ba9c26554..956aa227c260 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3194,7 +3194,7 @@ static u16 skb_tx_hash(const struct net_device *dev,
 	u16 qcount = dev->real_num_tx_queues;
 
 	if (dev->num_tc) {
-		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
+		u8 tc = skb_get_prio_tc_map(skb, dev);
 
 		qoffset = sb_dev->tc_to_txq[tc].offset;
 		qcount = sb_dev->tc_to_txq[tc].count;
@@ -4002,7 +4002,7 @@ EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
 static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
 			       struct xps_dev_maps *dev_maps, unsigned int tci)
 {
-	int tc = netdev_get_prio_tc_map(dev, skb->priority);
+	int tc = skb_get_prio_tc_map(skb, dev);
 	struct xps_map *map;
 	int queue_index = -1;
 
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index 0d81f172b7a6..036e746f18eb 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -52,7 +52,7 @@ static void ocelot_xmit_common(struct sk_buff *skb, struct net_device *netdev,
 	ocelot_xmit_get_vlan_info(skb, dp, &vlan_tci, &tag_type);
 
 	qos_class = netdev_get_num_tc(netdev) ?
-		    netdev_get_prio_tc_map(netdev, skb->priority) : skb->priority;
+		    skb_get_prio_tc_map(skb, netdev) : skb->priority;
 
 	injection = skb_push(skb, OCELOT_TAG_LEN);
 	prefix = skb_push(skb, OCELOT_SHORT_PREFIX_LEN);
diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
index fd5155274733..9fb248b104f8 100644
--- a/net/sched/act_gate.c
+++ b/net/sched/act_gate.c
@@ -81,6 +81,7 @@ static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
 		gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
 		gact->current_entry_octets = 0;
 		gact->current_max_octets = next->maxoctets;
+		gact->current_ipv = next->ipv;
 
 		gact->current_close_time = ktime_add_ns(gact->current_close_time,
 							next->interval);
@@ -140,6 +141,11 @@ static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 
+	if (gact->current_ipv >= 0) {
+		skb->use_ipv = true;
+		skb->ipv = gact->current_ipv;
+	}
+
 	spin_unlock(&gact->tcf_lock);
 
 	return gact->tcf_action;
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b9c71a304d39..fb8bc17e38bb 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -201,7 +201,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
 	s32 cycle_elapsed;
 	int tc, n;
 
-	tc = netdev_get_prio_tc_map(dev, skb->priority);
+	tc = skb_get_prio_tc_map(skb, dev);
 	packet_transmit_time = length_to_duration(q, qdisc_pkt_len(skb));
 
 	*interval_start = 0;
@@ -509,7 +509,6 @@ static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
-		int prio;
 		u8 tc;
 
 		if (unlikely(!child))
@@ -522,9 +521,7 @@ static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
 		if (TXTIME_ASSIST_IS_ENABLED(q->flags))
 			return skb;
 
-		prio = skb->priority;
-		tc = netdev_get_prio_tc_map(dev, prio);
-
+		tc = skb_get_prio_tc_map(skb, dev);
 		if (!(gate_mask & BIT(tc)))
 			continue;
 
@@ -579,7 +576,6 @@ static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
 		ktime_t guard;
-		int prio;
 		int len;
 		u8 tc;
 
@@ -597,9 +593,7 @@ static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
 		if (!skb)
 			continue;
 
-		prio = skb->priority;
-		tc = netdev_get_prio_tc_map(dev, prio);
-
+		tc = skb_get_prio_tc_map(skb, dev);
 		if (!(gate_mask & BIT(tc))) {
 			skb = NULL;
 			continue;
-----------------------------[ cut here ]-----------------------------