Chengen Du wrote:
Hi Willem,
My apologies, but I still have some questions I would like to discuss with you.
On Wed, Jun 5, 2024 at 6:57 AM Willem de Bruijn willemdebruijn.kernel@gmail.com wrote:
Chengen Du wrote:
The issue initially stems from libpcap. The ethertype will be overwritten as the VLAN TPID if the network interface lacks hardware VLAN offloading. In the outbound packet path, if hardware VLAN offloading is unavailable, the VLAN tag is inserted into the payload but then cleared from the sk_buff struct. Consequently, this can lead to a false negative when checking for the presence of a VLAN tag, causing the packet sniffing outcome to lack VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing tool may be unable to parse packets as expected.
The TCI-TPID is missing because the prb_fill_vlan_info() function does not modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the payload and not in the sk_buff struct. The skb_vlan_tag_present() function only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header is stripped, preventing the packet capturing tool from determining the correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, which means the packet capturing tool cannot parse the L3 header correctly.
Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.c... Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") Cc: stable@vger.kernel.org Signed-off-by: Chengen Du chengen.du@canonical.com
net/packet/af_packet.c | 64 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index ea3ebc160e25..53d51ac87ac6 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -538,6 +538,52 @@ static void *packet_current_frame(struct packet_sock *po, return packet_lookup_frame(po, rb, rb->head, status); }
+static u16 vlan_get_tci(struct sk_buff *skb) +{
unsigned int vlan_depth = skb->mac_len;
struct vlan_hdr vhdr, *vh;
u8 *skb_head = skb->data;
int skb_len = skb->len;
if (vlan_depth) {
if (WARN_ON(vlan_depth < VLAN_HLEN))
return 0;
vlan_depth -= VLAN_HLEN;
} else {
vlan_depth = ETH_HLEN;
}
skb_push(skb, skb->data - skb_mac_header(skb));
vh = skb_header_pointer(skb, vlan_depth, sizeof(vhdr), &vhdr);
if (skb_head != skb->data) {
skb->data = skb_head;
skb->len = skb_len;
}
if (unlikely(!vh))
return 0;
return ntohs(vh->h_vlan_TCI);
As stated in the conversation: no need for the vlan_depth code.
skb->data is either at the link layer header or immediately beyond it (i.e., at the outer vlan tag).
I'm confused about this part and feel there may be some misunderstanding on my end. From what I understand, skb->data will be at different positions depending on the scenario. For example, in tpacket_rcv(), in SOCK_RAW, it will be at the link layer header, but for SOCK_DGRAM with PACKET_OUTGOING, it will be at the network layer header.
Right, but that is a binary option. Either at L2 or right after.
skb_header_pointer(skb, skb_mac_offset(skb) - ETH_VLEN, ...) will do.
Given this situation, it seems necessary to adjust skb->data to point to the link layer header and then seek the VLAN tag based on the MAC length, rather than parsing directly from the skb->data head. Could you please clarify this in more detail?
+}
+static __be16 vlan_get_protocol_dgram(struct sk_buff *skb) +{
__be16 proto = skb->protocol;
if (unlikely(eth_type_vlan(proto))) {
u8 *skb_head = skb->data;
Since skb->head is a different thing from skb->data, please call this orig_data or so.
int skb_len = skb->len;
skb_push(skb, skb->data - skb_mac_header(skb));
proto = __vlan_get_protocol(skb, proto, NULL);
if (skb_head != skb->data) {
skb->data = skb_head;
skb->len = skb_len;
}
}
I don't see a way around this temporary skb->data mangling, so +1 unless someone else suggests a cleaner way.
On second thought, one option:
This adds some parsing overhead in the datapath. SOCK_RAW does not need it, as it can see the whole VLAN tag. Perhaps limit the new branches to SOCK_DGRAM cases? Then the above can also be simplified.
I considered this approach before, but it would result in different metadata for SOCK_DGRAM and SOCK_RAW scenarios. This difference makes me hesitate because it might be better to provide consistent metadata to describe the same packet, regardless of the receiver's approach. These are just my thoughts and I'm open to further discussion.
See Alexandre's response and my follow-up.