On 12/6/22 21:43, Jakub Kicinski wrote:
On Mon, 5 Dec 2022 21:44:09 +0100 Justin Iurman wrote:
Please revert this patch.
Many people use FQ qdisc, where packets are waiting for their Earliest Departure Time to be released.
The IOAM queue depth is a very important value and is already used.
Can you say more about the use? What signal do you derive from it? I do track qlen on Meta's servers but haven't found a strong use for it yet (I did for backlog drops but not the qlen itself).
The specification goal of the queue depth was initially to be able to track the entire path with a detailed view for packets or flows (kind of a zoom on the interface to have details about its queues). With the current definition/implementation of the queue depth, if only one queue is congested, you're able to know it. Which doesn't necessarily mean that all queues are full, but this one is and there might be something going on. And this is something operators might want to be able to detect precisely, for a lot of use cases depending on the situation. On the contrary, if all queues are full, then you could deduce that as well for each queue separately, as soon as a packet is assigned to it. So I think that with "queue depth = sum(queues)", you don't have details and you're not able to detect a single queue congestion, while with "queue depth = queue" you could detect both. One might argue that it's fine to only have the aggregation in some situation. I'd say that we might need both, actually. Which is technically possible (even though expensive, as Eric mentioned) thanks to the way it is specified by the RFC, where some freedom was intentionally given. I could come up with a solution for that.
Also, the draft says:
5.4.2.7. queue depth
The "queue depth" field is a 4-octet unsigned integer field. This field indicates the current length of the egress interface queue of the interface from where the packet is forwarded out. The queue depth is expressed as the current amount of memory buffers used by the queue (a packet could consume one or more memory buffers, depending on its size). 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | queue depth | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
It is relatively clear that the egress interface is the aggregate egress interface, not a subset of the interface.
Correct, even though the definition of an interface in RFC 9197 is quite abstract (see the end of section 4.4.2.2: "[...] could represent a physical interface, a virtual or logical interface, or even a queue").
If you have 32 TX queues on a NIC, all of them being backlogged (line rate), sensing the queue length of one of the queues would give a 97% error on the measure.
Why would it? Not sure I get your idea based on that example.
Because it measures the length of a single queue not the device.
Yep, I figured that out after the off-list discussion we've had with Eric.
So my plan would be, if you all agree with, to correct and repost this patch to fix the NULL qdisc issue. Then, I'd come with a solution to allow both (with and without aggregation of queues) and post it on net-next. But again, if the consensus is to revert this patch (which I think would bring no benefit IMHO), then so be it. Thoughts?