Hello,
Thanks for your time reviewing!
On Wed, 2021-08-25 at 19:05 +0200, Pablo Neira Ayuso wrote:
Hi,
On Mon, Aug 09, 2021 at 04:10:36PM +1200, Cole Dishington wrote:
Adds support for masquerading into a smaller subset of ports - defined by the PSID values from RFC-7597 Section 5.1. This is part of the support for MAP-E and Lightweight 4over6, which allows multiple devices to share an IPv4 address by splitting the L4 port / id into ranges.
Co-developed-by: Anthony Lineham anthony.lineham@alliedtelesis.co.nz Signed-off-by: Anthony Lineham anthony.lineham@alliedtelesis.co.nz Co-developed-by: Scott Parlane scott.parlane@alliedtelesis.co.nz Signed-off-by: Scott Parlane scott.parlane@alliedtelesis.co.nz Signed-off-by: Blair Steven blair.steven@alliedtelesis.co.nz Signed-off-by: Cole Dishington Cole.Dishington@alliedtelesis.co.nz Reviewed-by: Florian Westphal fw@strlen.de
[...]
Looking at the userspace logic:
https://scanmail.trustwave.com/?c=20988&d=6vim4fcVLjPkIbLUDqz3Tj2W4gXWNC...
Chunk extracted from void parse_psid(...)
offset = (1 << (16 - offset_len));
Assuming offset_len = 6, then you skip 0-1023 ports, OK.
psid = psid << (16 - offset_len - psid_len);
This psid calculation is correct? Maybe:
psid = psid << (16 - offset_len);
PSID port numbers have the form [offset|PSID|j] and 16 = offset_length + PSID_length + j_length. The PSID calculation above is bit shifting the passed psid up j_length.
The userspace tool accepts the unshifted psid to be consistent with how RFC7597 specified it (see RFC7597 Appendix A. Examples).
instead?
psid=0 => 0 << (16 - 6) = 1024 psid=1 => 1 << (16 - 6) = 2048
This is implicitly assuming that 64 PSIDs are available, each of them taking 1024 ports, ie. psid_len is 6 bits. But why are you subtracting the psid_len above?
/* Handle the special case of no offset bits (a=0), so offset loops */ min = psid;
OK, this line above is the minimal port in the range
if (offset) min += offset;
... which is incremented by the offset (to skip the 0-1023 ports).
r->min_proto.all = htons(min); r->max_proto.all = htons(min + ((1 << (16 - offset_len - psid_len)) - 1));
Here, you subtract psid_len again, not sure why.
Each PSID port range is made up of many smaller contiguous port sub-ranges (except for the special case of offset_len = 0) e.g. for PSID=0x34,psid_length=8,psid_offset=6 the ranges are 1232-1235, 2256-2259, ..., 63696-63699, 64720-64723 (Taken from rfc7597 Appendix A. Examples). The above calculation is selecting the first sub-range. Max is computed by finding j_length and filling it with 1's.
r->base_proto.all = htons(offset);
base is set to offset, ie. 1024.
r->flags |= NF_NAT_RANGE_PSID; r->flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
Now looking at the kernel side.
diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c index 8e8a65d46345..19a4754cda76 100644 --- a/net/netfilter/nf_nat_masquerade.c +++ b/net/netfilter/nf_nat_masquerade.c @@ -55,8 +55,31 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS; newrange.min_addr.ip = newsrc; newrange.max_addr.ip = newsrc;
- newrange.min_proto = range->min_proto;
- newrange.max_proto = range->max_proto;
- if (range->flags & NF_NAT_RANGE_PSID) {
u16 base = ntohs(range->base_proto.all);
u16 min = ntohs(range->min_proto.all);
u16 off = 0;
/* xtables should stop base > 2^15 by enforcement of
* 0 <= offset_len < 16 argument, with offset_len=0
* as a special case inwhich base=0.
I don't understand this comment.
This is a sanity check. The userspace tool restricts offset_len to the specified range and since base = 2^(16 - offset_len) (or base = 0 for the special case of offset_len = 16) the below condition should never be true. However, if base greater than 1<<15 was allowed, a divide by zero error would occur on the block below.
*/
if (WARN_ON_ONCE(base > (1 << 15)))
return NF_DROP;
/* If offset=0, port range is in one contiguous block */
if (base)
off = prandom_u32_max(((1 << 16) / base) - 1);
Assuming the example above, base is set to 1024. Then, off is a random value between UINT16_MAX (you expressed this as 1 << 16) and the base which is 1024 minus 1.
So this is picking a random off (actually the PSID?) between 0 and 63. What about clashes? I mean, two different machines behind the NAT might get the same off.
newrange.min_proto.all = htons(min + base * off);
min could be 1024, 2048, 3072... you add base which is 1024 * off.
Is this duplicated? Both calculated in user and kernel space?
Each PSID value defines many contiguous port sub-ranges. The randomly chosen off selects the ith sub-range for a given PSID e.g. off=1 would select 2256-2259 for rfc7597 Appendix A. Examples.
The userspace tool calculates the min and max of the first sub-range for a given psid, whereas the above randomly selects one of the sub-ranges for a given psid.
j_length determines how large each sub-range will be, so for small j_length values there still is the risk the chosen sub-range will be exhausted.
newrange.max_proto.all = htons(ntohs(newrange.min_proto.all) + ntohs(range->max_proto.all) - min);
I'm stopping here, I'm getting lost.
My understanding about this RFC is that you would like to split the 16-bit ports in ranges to uniquely identify the host behind the NAT.
Why don't you just you just select the port range from userspace utilizing the existing infrastructure? I mean, why do you need this kernel patch?
If utilizing existing infrastruture to install PSID port ranges a lot of rules would be required as each PSID port range is made up of many smaller sub-ranges.
e.g. (from rfc7597 Appendix A. Examples) for psid_length=8,offset_length=6 each PSID would need 63 NF_NAT_RANGE_PROTO_SPECIFIED rules, hence a total of 16128 rules if all the PSIDs were allocated.
Florian already suggested:
Is it really needed to place all of this in the nat core?
The only thing that has to be done in the NAT core, afaics, is to suppress port reallocation attmepts when NF_NAT_RANGE_PSID is set.
Is there a reason why nf_nat_masquerade_ipv4/6 can't be changed instead to do what you want?
AFAICS its enough to set NF_NAT_RANGE_PROTO_SPECIFIED and init the upper/lower boundaries, i.e. change input given to nf_nat_setup_info().
extracted from:
https://scanmail.trustwave.com/?c=20988&d=6vim4fcVLjPkIbLUDqz3Tj2W4gXWNC...