This patch set extends the locked port feature for devices that are behind a locked port, but do not have the ability to authorize themselves as a supplicant using IEEE 802.1X. Such devices can be printers, meters or anything related to fixed installations. Instead of 802.1X authorization, devices can get access based on their MAC addresses being whitelisted.
For an authorization daemon to detect that a device is trying to get access through a locked port, the bridge will add the MAC address of the device to the FDB with a locked flag to it. Thus the authorization daemon can catch the FDB add event and check if the MAC address is in the whitelist and if so replace the FDB entry without the locked flag enabled, and thus open the port for the device.
This feature is known as MAC-Auth or MAC Authentication Bypass (MAB) in Cisco terminology, where the full MAB concept involves additional Cisco infrastructure for authorization. There is no real authentication process, as the MAC address of the device is the only input the authorization daemon, in the general case, has to base the decision if to unlock the port or not.
With this patch set, an implementation of the offloaded case is supplied for the mv88e6xxx driver. When a packet ingresses on a locked port, an ATU miss violation event will occur. When handling such ATU miss violation interrupts, the MAC address of the device is added to the FDB with a zero destination port vector (DPV) and the MAC address is communicated through the switchdev layer to the bridge, so that a FDB entry with the locked flag enabled can be added.
Hans Schultz (4): net: bridge: add fdb flag to extent locked port feature net: switchdev: add support for offloading of fdb locked flag net: dsa: mv88e6xxx: mac-auth/MAB implementation selftests: forwarding: add test of MAC-Auth Bypass to locked port tests
drivers/net/dsa/mv88e6xxx/Makefile | 1 + drivers/net/dsa/mv88e6xxx/chip.c | 40 ++- drivers/net/dsa/mv88e6xxx/chip.h | 5 + drivers/net/dsa/mv88e6xxx/global1.h | 1 + drivers/net/dsa/mv88e6xxx/global1_atu.c | 35 ++- .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c | 249 ++++++++++++++++++ .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h | 40 +++ drivers/net/dsa/mv88e6xxx/port.c | 32 ++- drivers/net/dsa/mv88e6xxx/port.h | 2 + include/net/dsa.h | 6 + include/net/switchdev.h | 3 +- include/uapi/linux/neighbour.h | 1 + net/bridge/br.c | 3 +- net/bridge/br_fdb.c | 18 +- net/bridge/br_if.c | 1 + net/bridge/br_input.c | 11 +- net/bridge/br_private.h | 9 +- .../net/forwarding/bridge_locked_port.sh | 42 ++- 18 files changed, 470 insertions(+), 29 deletions(-) create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h
Add an intermediate state for clients behind a locked port to allow for possible opening of the port for said clients. This feature corresponds to the Mac-Auth and MAC Authentication Bypass (MAB) named features. The latter defined by Cisco. Locked FDB entries will be limited in number, so as to prevent DOS attacks by spamming the port with random entries. The limit will be a per port limit as it is a port based feature and that the port flushes all FDB entries on link down.
Only the kernel can set this FDB entry flag, while userspace can read the flag and remove it by deleting the FDB entry.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com --- include/uapi/linux/neighbour.h | 1 + net/bridge/br_fdb.c | 11 +++++++++++ net/bridge/br_if.c | 1 + net/bridge/br_input.c | 11 ++++++++++- net/bridge/br_private.h | 7 ++++++- 5 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h index 39c565e460c7..76d65b481086 100644 --- a/include/uapi/linux/neighbour.h +++ b/include/uapi/linux/neighbour.h @@ -53,6 +53,7 @@ enum { #define NTF_ROUTER (1 << 7) /* Extended flags under NDA_FLAGS_EXT: */ #define NTF_EXT_MANAGED (1 << 0) +#define NTF_EXT_LOCKED (1 << 1)
/* * Neighbor Cache Entry States. diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index e7f4fccb6adb..6b83e2d6435d 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -105,6 +105,7 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br, struct nda_cacheinfo ci; struct nlmsghdr *nlh; struct ndmsg *ndm; + u32 ext_flags = 0;
nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags); if (nlh == NULL) @@ -125,11 +126,16 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br, ndm->ndm_flags |= NTF_EXT_LEARNED; if (test_bit(BR_FDB_STICKY, &fdb->flags)) ndm->ndm_flags |= NTF_STICKY; + if (test_bit(BR_FDB_ENTRY_LOCKED, &fdb->flags)) + ext_flags |= NTF_EXT_LOCKED;
if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->key.addr)) goto nla_put_failure; if (nla_put_u32(skb, NDA_MASTER, br->dev->ifindex)) goto nla_put_failure; + if (nla_put_u32(skb, NDA_FLAGS_EXT, ext_flags)) + goto nla_put_failure; + ci.ndm_used = jiffies_to_clock_t(now - fdb->used); ci.ndm_confirmed = 0; ci.ndm_updated = jiffies_to_clock_t(now - fdb->updated); @@ -171,6 +177,7 @@ static inline size_t fdb_nlmsg_size(void) return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN) /* NDA_LLADDR */ + nla_total_size(sizeof(u32)) /* NDA_MASTER */ + + nla_total_size(sizeof(u32)) /* NDA_FLAGS_EXT */ + nla_total_size(sizeof(u16)) /* NDA_VLAN */ + nla_total_size(sizeof(struct nda_cacheinfo)) + nla_total_size(0) /* NDA_FDB_EXT_ATTRS */ @@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
+ if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags)) + atomic_dec(&f->dst->locked_entry_cnt); + hlist_del_init_rcu(&f->fdb_node); rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode, br_fdb_rht_params); @@ -1086,6 +1096,7 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source, modified = true;
set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags); + clear_bit(BR_FDB_ENTRY_LOCKED, &fdb->flags);
fdb->used = jiffies; if (modified) { diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 47fcbade7389..0ca04cba5ebe 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -429,6 +429,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br, p->priority = 0x8000 >> BR_PORT_BITS; p->port_no = index; p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD; + p->locked_entry_cnt.counter = 0; br_init_port(p); br_set_state(p, BR_STATE_DISABLED); br_stp_port_timer_init(p); diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 68b3e850bcb9..0280806cf980 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -110,8 +110,17 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb br_fdb_find_rcu(br, eth_hdr(skb)->h_source, vid);
if (!fdb_src || READ_ONCE(fdb_src->dst) != p || - test_bit(BR_FDB_LOCAL, &fdb_src->flags)) + test_bit(BR_FDB_LOCAL, &fdb_src->flags) || + test_bit(BR_FDB_ENTRY_LOCKED, &fdb_src->flags)) { + if (!fdb_src && atomic_read(&p->locked_entry_cnt) < BR_LOCKED_ENTRIES_MAX) { + unsigned long flags = 0; + + __set_bit(BR_FDB_ENTRY_LOCKED, &flags); + br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, flags); + atomic_inc(&p->locked_entry_cnt); + } goto drop; + } }
nbp_switchdev_frame_mark(p, skb); diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 06e5f6faa431..be17c99efe65 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -31,6 +31,8 @@ #define BR_MULTICAST_QUERY_INTVL_MIN msecs_to_jiffies(1000) #define BR_MULTICAST_STARTUP_QUERY_INTVL_MIN BR_MULTICAST_QUERY_INTVL_MIN
+#define BR_LOCKED_ENTRIES_MAX 64 + #define BR_HWDOM_MAX BITS_PER_LONG
#define BR_VERSION "2.3" @@ -251,7 +253,8 @@ enum { BR_FDB_ADDED_BY_EXT_LEARN, BR_FDB_OFFLOADED, BR_FDB_NOTIFY, - BR_FDB_NOTIFY_INACTIVE + BR_FDB_NOTIFY_INACTIVE, + BR_FDB_ENTRY_LOCKED, };
struct net_bridge_fdb_key { @@ -414,6 +417,8 @@ struct net_bridge_port { u16 backup_redirected_cnt;
struct bridge_stp_xstats stp_xstats; + + atomic_t locked_entry_cnt; };
#define kobj_to_brport(obj) container_of(obj, struct net_bridge_port, kobj)
On 24/05/2022 18:21, Hans Schultz wrote:
Add an intermediate state for clients behind a locked port to allow for possible opening of the port for said clients. This feature corresponds to the Mac-Auth and MAC Authentication Bypass (MAB) named features. The latter defined by Cisco. Locked FDB entries will be limited in number, so as to prevent DOS attacks by spamming the port with random entries. The limit will be a per port limit as it is a port based feature and that the port flushes all FDB entries on link down.
Only the kernel can set this FDB entry flag, while userspace can read the flag and remove it by deleting the FDB entry.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
include/uapi/linux/neighbour.h | 1 + net/bridge/br_fdb.c | 11 +++++++++++ net/bridge/br_if.c | 1 + net/bridge/br_input.c | 11 ++++++++++- net/bridge/br_private.h | 7 ++++++- 5 files changed, 29 insertions(+), 2 deletions(-)
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h index 39c565e460c7..76d65b481086 100644 --- a/include/uapi/linux/neighbour.h +++ b/include/uapi/linux/neighbour.h @@ -53,6 +53,7 @@ enum { #define NTF_ROUTER (1 << 7) /* Extended flags under NDA_FLAGS_EXT: */ #define NTF_EXT_MANAGED (1 << 0) +#define NTF_EXT_LOCKED (1 << 1) /*
- Neighbor Cache Entry States.
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index e7f4fccb6adb..6b83e2d6435d 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -105,6 +105,7 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br, struct nda_cacheinfo ci; struct nlmsghdr *nlh; struct ndmsg *ndm;
- u32 ext_flags = 0;
nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags); if (nlh == NULL) @@ -125,11 +126,16 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br, ndm->ndm_flags |= NTF_EXT_LEARNED; if (test_bit(BR_FDB_STICKY, &fdb->flags)) ndm->ndm_flags |= NTF_STICKY;
- if (test_bit(BR_FDB_ENTRY_LOCKED, &fdb->flags))
ext_flags |= NTF_EXT_LOCKED;
if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->key.addr)) goto nla_put_failure; if (nla_put_u32(skb, NDA_MASTER, br->dev->ifindex)) goto nla_put_failure;
- if (nla_put_u32(skb, NDA_FLAGS_EXT, ext_flags))
goto nla_put_failure;
- ci.ndm_used = jiffies_to_clock_t(now - fdb->used); ci.ndm_confirmed = 0; ci.ndm_updated = jiffies_to_clock_t(now - fdb->updated);
@@ -171,6 +177,7 @@ static inline size_t fdb_nlmsg_size(void) return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN) /* NDA_LLADDR */ + nla_total_size(sizeof(u32)) /* NDA_MASTER */
+ nla_total_size(sizeof(u32)) /* NDA_FLAGS_EXT */
- nla_total_size(sizeof(u16)) /* NDA_VLAN */
- nla_total_size(sizeof(struct nda_cacheinfo))
- nla_total_size(0) /* NDA_FDB_EXT_ATTRS */
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons: - f->dst can be NULL - f->dst changes without any synchronization - there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
I could check if f->dst is NULL, but in general this should be able to work on a per port basis, so do you have an idea of how to keep a per port counter of added locked fdb entries?
Best, Hans
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
On 24/05/2022 19:21, Hans Schultz wrote:
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent fdb view with all its attributes when changing its dst in this case, which would require new locking because you have multiple dependent struct fields and it will kill roaming/learning scalability. I don't think this use case is worth the complexity it will bring, so I'd suggest an alternative - you can monitor the number of locked entries per-port from a user-space agent and disable port learning or some similar solution that doesn't require any complex kernel changes. Is the limit a requirement to add the feature?
I have an idea how to do it and to minimize the performance hit if it really is needed but it'll add a lot of complexity which I'd like to avoid if possible.
Cheers, Nik
On ons, maj 25, 2022 at 11:06, Nikolay Aleksandrov razor@blackwall.org wrote:
On 24/05/2022 19:21, Hans Schultz wrote:
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent
Hi Nik, the used dst is a port structure, so it is per-port and not globally.
Best, Hans
fdb view with all its attributes when changing its dst in this case, which would require new locking because you have multiple dependent struct fields and it will kill roaming/learning scalability. I don't think this use case is worth the complexity it will bring, so I'd suggest an alternative - you can monitor the number of locked entries per-port from a user-space agent and disable port learning or some similar solution that doesn't require any complex kernel changes. Is the limit a requirement to add the feature?
I have an idea how to do it and to minimize the performance hit if it really is needed but it'll add a lot of complexity which I'd like to avoid if possible.
Cheers, Nik
On 25/05/2022 11:34, Hans Schultz wrote:
On ons, maj 25, 2022 at 11:06, Nikolay Aleksandrov razor@blackwall.org wrote:
On 24/05/2022 19:21, Hans Schultz wrote:
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent
Hi Nik, the used dst is a port structure, so it is per-port and not globally.
Best, Hans
Yeah, I know. :) That's why I wrote it, if the limit is not a feature requirement I'd suggest dropping it altogether, it can be enforced externally (e.g. from user-space) if needed.
By the way just fyi net-next is closed right now due to merge window. And one more thing please include a short log of changes between versions when you send a new one. I had to go look for v2 to find out what changed.
fdb view with all its attributes when changing its dst in this case, which would require new locking because you have multiple dependent struct fields and it will kill roaming/learning scalability. I don't think this use case is worth the complexity it will bring, so I'd suggest an alternative - you can monitor the number of locked entries per-port from a user-space agent and disable port learning or some similar solution that doesn't require any complex kernel changes. Is the limit a requirement to add the feature?
I have an idea how to do it and to minimize the performance hit if it really is needed but it'll add a lot of complexity which I'd like to avoid if possible.
Cheers, Nik
On ons, maj 25, 2022 at 11:38, Nikolay Aleksandrov razor@blackwall.org wrote:
On 25/05/2022 11:34, Hans Schultz wrote:
On ons, maj 25, 2022 at 11:06, Nikolay Aleksandrov razor@blackwall.org wrote:
On 24/05/2022 19:21, Hans Schultz wrote:
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
@@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, if (test_bit(BR_FDB_STATIC, &f->flags)) fdb_del_hw_addr(br, f->key.addr.addr);
- if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags))
atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent
Hi Nik, the used dst is a port structure, so it is per-port and not globally.
Best, Hans
Yeah, I know. :) That's why I wrote it, if the limit is not a feature requirement I'd suggest dropping it altogether, it can be enforced externally (e.g. from user-space) if needed.
By the way just fyi net-next is closed right now due to merge window. And one more thing please include a short log of changes between versions when you send a new one. I had to go look for v2 to find out what changed.
Okay, I will drop the limit in the bridge module, which is an easy thing to do. :) (It is mostly there to ensure against DOS attacks if someone bombards a locked port with random mac addresses.) I have a similar limitation in the driver, which should then probably be dropped too?
The mayor difference between v2 and v3 is in the mv88e6xxx driver, where I now keep an inventory of locked ATU entries and remove them based on a timer (mv88e6xxx_switchcore.c).
I guess the mentioned log should be in the cover letter part?
fdb view with all its attributes when changing its dst in this case, which would require new locking because you have multiple dependent struct fields and it will kill roaming/learning scalability. I don't think this use case is worth the complexity it will bring, so I'd suggest an alternative - you can monitor the number of locked entries per-port from a user-space agent and disable port learning or some similar solution that doesn't require any complex kernel changes. Is the limit a requirement to add the feature?
I have an idea how to do it and to minimize the performance hit if it really is needed but it'll add a lot of complexity which I'd like to avoid if possible.
Cheers, Nik
On 25/05/2022 12:11, Hans Schultz wrote:
On ons, maj 25, 2022 at 11:38, Nikolay Aleksandrov razor@blackwall.org wrote:
On 25/05/2022 11:34, Hans Schultz wrote:
On ons, maj 25, 2022 at 11:06, Nikolay Aleksandrov razor@blackwall.org wrote:
On 24/05/2022 19:21, Hans Schultz wrote:
Hi Hans, So this approach has a fundamental problem, f->dst is changed without any synchronization you cannot rely on it and thus you cannot account for these entries properly. We must be very careful if we try to add any new synchronization not to affect performance as well. More below...
> @@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, > if (test_bit(BR_FDB_STATIC, &f->flags)) > fdb_del_hw_addr(br, f->key.addr.addr); > > + if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags)) > + atomic_dec(&f->dst->locked_entry_cnt);
Sorry but you cannot do this for multiple reasons:
- f->dst can be NULL
- f->dst changes without any synchronization
- there is no synchronization between fdb's flags and its ->dst
Cheers, Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent
Hi Nik, the used dst is a port structure, so it is per-port and not globally.
Best, Hans
Yeah, I know. :) That's why I wrote it, if the limit is not a feature requirement I'd suggest dropping it altogether, it can be enforced externally (e.g. from user-space) if needed.
By the way just fyi net-next is closed right now due to merge window. And one more thing please include a short log of changes between versions when you send a new one. I had to go look for v2 to find out what changed.
Okay, I will drop the limit in the bridge module, which is an easy thing to do. :) (It is mostly there to ensure against DOS attacks if someone bombards a locked port with random mac addresses.) I have a similar limitation in the driver, which should then probably be dropped too?
That is up to you/driver, I'd try looking for similar problems in other switch drivers and check how those were handled. There are people in the CC above that can directly answer that. :)
The mayor difference between v2 and v3 is in the mv88e6xxx driver, where I now keep an inventory of locked ATU entries and remove them based on a timer (mv88e6xxx_switchcore.c).
ack
I guess the mentioned log should be in the cover letter part?
Yep, usually a short mention of what changed to make it easier for reviewers. Some people also add the patch-specific changes to each patch under the --- so they're not included in the log, but I'm fine either way as long as I don't have to go digging up the old versions.
fdb view with all its attributes when changing its dst in this case, which would require new locking because you have multiple dependent struct fields and it will kill roaming/learning scalability. I don't think this use case is worth the complexity it will bring, so I'd suggest an alternative - you can monitor the number of locked entries per-port from a user-space agent and disable port learning or some similar solution that doesn't require any complex kernel changes. Is the limit a requirement to add the feature?
I have an idea how to do it and to minimize the performance hit if it really is needed but it'll add a lot of complexity which I'd like to avoid if possible.
Cheers, Nik
Hi Nikolay,
On Wed, May 25, 2022 at 01:18:49PM +0300, Nikolay Aleksandrov wrote:
> Hi Hans, > So this approach has a fundamental problem, f->dst is changed without any synchronization > you cannot rely on it and thus you cannot account for these entries properly. We must be very > careful if we try to add any new synchronization not to affect performance as well. > More below... > >> @@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, >> if (test_bit(BR_FDB_STATIC, &f->flags)) >> fdb_del_hw_addr(br, f->key.addr.addr); >> >> + if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags)) >> + atomic_dec(&f->dst->locked_entry_cnt); > > Sorry but you cannot do this for multiple reasons: > - f->dst can be NULL > - f->dst changes without any synchronization > - there is no synchronization between fdb's flags and its ->dst > > Cheers, > Nik
Hi Nik,
if a port is decoupled from the bridge, the locked entries would of course be invalid, so maybe if adding and removing a port is accounted for wrt locked entries and the count of locked entries, would that not work?
Best, Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent
Hi Nik, the used dst is a port structure, so it is per-port and not globally.
Best, Hans
Yeah, I know. :) That's why I wrote it, if the limit is not a feature requirement I'd suggest dropping it altogether, it can be enforced externally (e.g. from user-space) if needed.
By the way just fyi net-next is closed right now due to merge window. And one more thing please include a short log of changes between versions when you send a new one. I had to go look for v2 to find out what changed.
Okay, I will drop the limit in the bridge module, which is an easy thing to do. :) (It is mostly there to ensure against DOS attacks if someone bombards a locked port with random mac addresses.) I have a similar limitation in the driver, which should then probably be dropped too?
That is up to you/driver, I'd try looking for similar problems in other switch drivers and check how those were handled. There are people in the CC above that can directly answer that. :)
Not sure whom you're referring to?
In fact I was pretty sure that I didn't see any OOM protection in the source code of the Linux bridge driver itself either, so I wanted to check that for myself, so I wrote a small "killswitch" program that's supposed to, well, kill a switch. It took me a while to find a few free hours to do the test, sorry for that.
https://github.com/vladimiroltean/killswitch/blob/master/src/killswitch.c
Sure enough, I can kill a Marvell Armada 3720 device with 1GB of RAM within 3 minutes of running the test program.
[ 273.864203] ksoftirqd/0: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 [ 273.876426] CPU: 0 PID: 12 Comm: ksoftirqd/0 Not tainted 5.18.7-rc1-00013-g52b92343db13 #74 [ 273.884775] Hardware name: CZ.NIC Turris Mox Board (DT) [ 273.889994] Call trace: [ 273.892437] dump_backtrace.part.0+0xc8/0xd4 [ 273.896721] show_stack+0x18/0x70 [ 273.900039] dump_stack_lvl+0x68/0x84 [ 273.903703] dump_stack+0x18/0x34 [ 273.907017] warn_alloc+0x114/0x1a0 [ 273.910508] __alloc_pages+0xbb0/0xbe0 [ 273.914257] cache_grow_begin+0x60/0x300 [ 273.918183] fallback_alloc+0x184/0x220 [ 273.922017] ____cache_alloc_node+0x174/0x190 [ 273.926373] kmem_cache_alloc+0x1a4/0x220 [ 273.930381] fdb_create+0x40/0x430 [ 273.933784] br_fdb_update+0x198/0x210 [ 273.937532] br_handle_frame_finish+0x244/0x530 [ 273.942063] br_handle_frame+0x1c0/0x270 [ 273.945986] __netif_receive_skb_core.constprop.0+0x29c/0xd30 [ 273.951734] __netif_receive_skb_list_core+0xe8/0x210 [ 273.956784] netif_receive_skb_list_internal+0x180/0x29c [ 273.962091] napi_gro_receive+0x174/0x190 [ 273.966099] mvneta_rx_swbm+0x6b8/0xb40 [ 273.969935] mvneta_poll+0x684/0x900 [ 273.973506] __napi_poll+0x38/0x18c [ 273.976988] net_rx_action+0xe8/0x280 [ 273.980643] __do_softirq+0x124/0x2a0 [ 273.984299] run_ksoftirqd+0x4c/0x60 [ 273.987871] smpboot_thread_fn+0x23c/0x270 [ 273.991963] kthread+0x10c/0x110 [ 273.995188] ret_from_fork+0x10/0x20
(followed by lots upon lots of vomiting, followed by ...)
[ 311.138590] Out of memory and no killable processes... [ 311.143774] Kernel panic - not syncing: System is deadlocked on memory [ 311.150295] CPU: 0 PID: 6 Comm: kworker/0:0 Not tainted 5.18.7-rc1-00013-g52b92343db13 #74 [ 311.158550] Hardware name: CZ.NIC Turris Mox Board (DT) [ 311.163766] Workqueue: events rht_deferred_worker [ 311.168477] Call trace: [ 311.170916] dump_backtrace.part.0+0xc8/0xd4 [ 311.175188] show_stack+0x18/0x70 [ 311.178501] dump_stack_lvl+0x68/0x84 [ 311.182159] dump_stack+0x18/0x34 [ 311.185466] panic+0x168/0x328 [ 311.188515] out_of_memory+0x568/0x584 [ 311.192261] __alloc_pages+0xb04/0xbe0 [ 311.196006] __alloc_pages_bulk+0x15c/0x604 [ 311.200185] alloc_pages_bulk_array_mempolicy+0xbc/0x24c [ 311.205491] __vmalloc_node_range+0x238/0x550 [ 311.209843] __vmalloc_node_range+0x1c0/0x550 [ 311.214195] kvmalloc_node+0xe0/0x124 [ 311.217856] bucket_table_alloc.isra.0+0x40/0x150 [ 311.222554] rhashtable_rehash_alloc.isra.0+0x20/0x8c [ 311.227599] rht_deferred_worker+0x7c/0x540 [ 311.231775] process_one_work+0x1d0/0x320 [ 311.235779] worker_thread+0x70/0x440 [ 311.239435] kthread+0x10c/0x110 [ 311.242661] ret_from_fork+0x10/0x20 [ 311.246238] SMP: stopping secondary CPUs [ 311.250161] Kernel Offset: disabled [ 311.253642] CPU features: 0x000,00020009,00001086 [ 311.258338] Memory Limit: none [ 311.261390] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
That can't be quite alright? Shouldn't we have some sort of protection in the bridge itself too, not just tell hardware driver writers to deal with it? Or is it somewhere, but it needs to be enabled/configured?
On 06/07/2022 21:13, Vladimir Oltean wrote:
Hi Nikolay,
On Wed, May 25, 2022 at 01:18:49PM +0300, Nikolay Aleksandrov wrote:
>> Hi Hans, >> So this approach has a fundamental problem, f->dst is changed without any synchronization >> you cannot rely on it and thus you cannot account for these entries properly. We must be very >> careful if we try to add any new synchronization not to affect performance as well. >> More below... >> >>> @@ -319,6 +326,9 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, >>> if (test_bit(BR_FDB_STATIC, &f->flags)) >>> fdb_del_hw_addr(br, f->key.addr.addr); >>> >>> + if (test_bit(BR_FDB_ENTRY_LOCKED, &f->flags) && !test_bit(BR_FDB_OFFLOADED, &f->flags)) >>> + atomic_dec(&f->dst->locked_entry_cnt); >> >> Sorry but you cannot do this for multiple reasons: >> - f->dst can be NULL >> - f->dst changes without any synchronization >> - there is no synchronization between fdb's flags and its ->dst >> >> Cheers, >> Nik > > Hi Nik, > > if a port is decoupled from the bridge, the locked entries would of > course be invalid, so maybe if adding and removing a port is accounted > for wrt locked entries and the count of locked entries, would that not > work? > > Best, > Hans
Hi Hans, Unfortunately you need the correct amount of locked entries per-port if you want to limit their number per-port, instead of globally. So you need a consistent
Hi Nik, the used dst is a port structure, so it is per-port and not globally.
Best, Hans
Yeah, I know. :) That's why I wrote it, if the limit is not a feature requirement I'd suggest dropping it altogether, it can be enforced externally (e.g. from user-space) if needed.
By the way just fyi net-next is closed right now due to merge window. And one more thing please include a short log of changes between versions when you send a new one. I had to go look for v2 to find out what changed.
Okay, I will drop the limit in the bridge module, which is an easy thing to do. :) (It is mostly there to ensure against DOS attacks if someone bombards a locked port with random mac addresses.) I have a similar limitation in the driver, which should then probably be dropped too?
That is up to you/driver, I'd try looking for similar problems in other switch drivers and check how those were handled. There are people in the CC above that can directly answer that. :)
Not sure whom you're referring to?
I meant people who have dealt with hardware resource management in the drivers.
In fact I was pretty sure that I didn't see any OOM protection in the source code of the Linux bridge driver itself either, so I wanted to check that for myself, so I wrote a small "killswitch" program that's supposed to, well, kill a switch. It took me a while to find a few free hours to do the test, sorry for that.
https://github.com/vladimiroltean/killswitch/blob/master/src/killswitch.c
Sure enough, I can kill a Marvell Armada 3720 device with 1GB of RAM within 3 minutes of running the test program.
I don't think that is new or surprising, if there isn't anything to control the device resources you'll get there. You don't really need to write any new programs you can easily do it with mausezahn. I have tests that add over 10 million fdbs on devices for a few seconds.
The point is it's not the bridge's task to limit memory consumption or to watch for resource management. You can limit new entries from the device driver (in case of swdev learning) or you can use a daemon to watch the number of entries and disable learning. There are many different ways to avoid this. We've discussed it before and I don't mind adding a hard fdb per-port limit in the bridge as long as it's done properly. We've also discussed LRU and similar algorithms for fdb learning and eviction. But any hardcoded limits or limits that can break current default use cases are unacceptable, they must be opt-in.
[ 273.864203] ksoftirqd/0: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 [ 273.876426] CPU: 0 PID: 12 Comm: ksoftirqd/0 Not tainted 5.18.7-rc1-00013-g52b92343db13 #74 [ 273.884775] Hardware name: CZ.NIC Turris Mox Board (DT) [ 273.889994] Call trace: [ 273.892437] dump_backtrace.part.0+0xc8/0xd4 [ 273.896721] show_stack+0x18/0x70 [ 273.900039] dump_stack_lvl+0x68/0x84 [ 273.903703] dump_stack+0x18/0x34 [ 273.907017] warn_alloc+0x114/0x1a0 [ 273.910508] __alloc_pages+0xbb0/0xbe0 [ 273.914257] cache_grow_begin+0x60/0x300 [ 273.918183] fallback_alloc+0x184/0x220 [ 273.922017] ____cache_alloc_node+0x174/0x190 [ 273.926373] kmem_cache_alloc+0x1a4/0x220 [ 273.930381] fdb_create+0x40/0x430 [ 273.933784] br_fdb_update+0x198/0x210 [ 273.937532] br_handle_frame_finish+0x244/0x530 [ 273.942063] br_handle_frame+0x1c0/0x270 [ 273.945986] __netif_receive_skb_core.constprop.0+0x29c/0xd30 [ 273.951734] __netif_receive_skb_list_core+0xe8/0x210 [ 273.956784] netif_receive_skb_list_internal+0x180/0x29c [ 273.962091] napi_gro_receive+0x174/0x190 [ 273.966099] mvneta_rx_swbm+0x6b8/0xb40 [ 273.969935] mvneta_poll+0x684/0x900 [ 273.973506] __napi_poll+0x38/0x18c [ 273.976988] net_rx_action+0xe8/0x280 [ 273.980643] __do_softirq+0x124/0x2a0 [ 273.984299] run_ksoftirqd+0x4c/0x60 [ 273.987871] smpboot_thread_fn+0x23c/0x270 [ 273.991963] kthread+0x10c/0x110 [ 273.995188] ret_from_fork+0x10/0x20
(followed by lots upon lots of vomiting, followed by ...)
[ 311.138590] Out of memory and no killable processes... [ 311.143774] Kernel panic - not syncing: System is deadlocked on memory [ 311.150295] CPU: 0 PID: 6 Comm: kworker/0:0 Not tainted 5.18.7-rc1-00013-g52b92343db13 #74 [ 311.158550] Hardware name: CZ.NIC Turris Mox Board (DT) [ 311.163766] Workqueue: events rht_deferred_worker [ 311.168477] Call trace: [ 311.170916] dump_backtrace.part.0+0xc8/0xd4 [ 311.175188] show_stack+0x18/0x70 [ 311.178501] dump_stack_lvl+0x68/0x84 [ 311.182159] dump_stack+0x18/0x34 [ 311.185466] panic+0x168/0x328 [ 311.188515] out_of_memory+0x568/0x584 [ 311.192261] __alloc_pages+0xb04/0xbe0 [ 311.196006] __alloc_pages_bulk+0x15c/0x604 [ 311.200185] alloc_pages_bulk_array_mempolicy+0xbc/0x24c [ 311.205491] __vmalloc_node_range+0x238/0x550 [ 311.209843] __vmalloc_node_range+0x1c0/0x550 [ 311.214195] kvmalloc_node+0xe0/0x124 [ 311.217856] bucket_table_alloc.isra.0+0x40/0x150 [ 311.222554] rhashtable_rehash_alloc.isra.0+0x20/0x8c [ 311.227599] rht_deferred_worker+0x7c/0x540 [ 311.231775] process_one_work+0x1d0/0x320 [ 311.235779] worker_thread+0x70/0x440 [ 311.239435] kthread+0x10c/0x110 [ 311.242661] ret_from_fork+0x10/0x20 [ 311.246238] SMP: stopping secondary CPUs [ 311.250161] Kernel Offset: disabled [ 311.253642] CPU features: 0x000,00020009,00001086 [ 311.258338] Memory Limit: none [ 311.261390] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
That can't be quite alright? Shouldn't we have some sort of protection in the bridge itself too, not just tell hardware driver writers to deal with it? Or is it somewhere, but it needs to be enabled/configured?
This is expected, if you'd like feel free to add a hard learning limit in the driver and the bridge (again if implemented properly). Nothing can save you if someone has L2 access to the device, they can poison any table if learning is enabled.
On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote:
I don't think that is new or surprising, if there isn't anything to control the device resources you'll get there. You don't really need to write any new programs you can easily do it with mausezahn. I have tests that add over 10 million fdbs on devices for a few seconds.
Of course it isn't new, but that doesn't make the situation in any way better, quite the opposite...
The point is it's not the bridge's task to limit memory consumption or to watch for resource management. You can limit new entries from the device driver (in case of swdev learning) or you can use a daemon to watch the number of entries and disable learning. There are many different ways to avoid this. We've discussed it before and I don't mind adding a hard fdb per-port limit in the bridge as long as it's done properly. We've also discussed LRU and similar algorithms for fdb learning and eviction. But any hardcoded limits or limits that can break current default use cases are unacceptable, they must be opt-in.
I don't think you can really say that it's not the bridge's task to limit memory consumption when what it does is essentially allocate memory from untrusted and unbounded user input, in kernel softirq context.
That's in fact the problem, the kernel OOM killer will kick in, but there will be no process to kill. This is why the kernel deadlocks on memory and dies.
Maybe where our expectations differ is that I believe that a Linux bridge shouldn't need gazillions of tweaks to not kill the kernel? There are many devices in production using a bridge without such configuration, you can't just make it opt-in.
Of course, performance under heavy stress is a separate concern, and maybe user space monitoring would be a better idea for that.
I know you changed jobs, but did Cumulus Linux have an application to monitor and limit the FDB entry count? Is there some standard application which does this somewhere, or does everybody roll their own?
Anyway, limiting FDB entry count from user space is still theoretically different from not dying. If you need to schedule a task to dispose of the weight while the ship is sinking from softirq context, you may never get to actually schedule that task in time. AFAIK the bridge UAPI doesn't expose a pre-programmed limit, so what needs to be done is for user space to manually delete entries until the count falls below the limit.
On 06/07/2022 23:21, Vladimir Oltean wrote:
On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote:
I don't think that is new or surprising, if there isn't anything to control the device resources you'll get there. You don't really need to write any new programs you can easily do it with mausezahn. I have tests that add over 10 million fdbs on devices for a few seconds.
Of course it isn't new, but that doesn't make the situation in any way better, quite the opposite...
The point is it's not the bridge's task to limit memory consumption or to watch for resource management. You can limit new entries from the device driver (in case of swdev learning) or you can use a daemon to watch the number of entries and disable learning. There are many different ways to avoid this. We've discussed it before and I don't mind adding a hard fdb per-port limit in the bridge as long as it's done properly. We've also discussed LRU and similar algorithms for fdb learning and eviction. But any hardcoded limits or limits that can break current default use cases are unacceptable, they must be opt-in.
I don't think you can really say that it's not the bridge's task to limit memory consumption when what it does is essentially allocate memory from untrusted and unbounded user input, in kernel softirq context.
That's in fact the problem, the kernel OOM killer will kick in, but there will be no process to kill. This is why the kernel deadlocks on memory and dies.
Maybe where our expectations differ is that I believe that a Linux bridge shouldn't need gazillions of tweaks to not kill the kernel? There are many devices in production using a bridge without such configuration, you can't just make it opt-in.
No, you cannot suddenly enforce such limit because such limit cannot work for everyone. There is no silver bullet that works for everyone. Opt-in is the only way to go about this with specific config for different devices and deployments, anyone interested can set their limits. They can be auto-adjusted by swdev drivers after that if necessary, but first they must be implemented in software.
If you're interested in adding default limits based on memory heuristics and consumption I'd be interested to see it.
Of course, performance under heavy stress is a separate concern, and maybe user space monitoring would be a better idea for that.
You can do the whole software learning from user-space if needed, not only under heavy stress.
I know you changed jobs, but did Cumulus Linux have an application to monitor and limit the FDB entry count? Is there some standard application which does this somewhere, or does everybody roll their own?
I don't see how that is relevant.
Anyway, limiting FDB entry count from user space is still theoretically different from not dying. If you need to schedule a task to dispose of
you can disable learning altogether and add entries from a user-space daemon, ie implement complete user-space learning agent, theoretically you can solve it in many ways if that's the problem
the weight while the ship is sinking from softirq context, you may never get to actually schedule that task in time. AFAIK the bridge UAPI doesn't expose a pre-programmed limit, so what needs to be done is for user space to manually delete entries until the count falls below the limit.
That is a single case speculation, it depends on how it was implemented in the first place. You can disable learning and have more than enough time to deal with it.
I already said it's ok to add hard configurable limits if they're done properly performance-wise. Any distribution can choose to set some default limits after the option exists.
On 07/07/2022 00:01, Nikolay Aleksandrov wrote:
On 06/07/2022 23:21, Vladimir Oltean wrote:
On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote:
[snip]
I already said it's ok to add hard configurable limits if they're done properly performance-wise. Any distribution can choose to set some default limits after the option exists.
Just fyi, and to avoid duplicate efforts, I already have patches for global and per-port software fdb limits that I'll polish and submit soon (depending on time availability, of course). If I find more time I might add per-vlan limits as well to the set. They use embedded netlink attributes to config and dump, so we can easily extend them later (e.g. different action on limit hit, limit statistics etc).
Hi Nikolay,
On Thu, Jul 07, 2022 at 05:08:15PM +0300, Nikolay Aleksandrov wrote:
On 07/07/2022 00:01, Nikolay Aleksandrov wrote:
On 06/07/2022 23:21, Vladimir Oltean wrote:
On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote:
[snip]
I already said it's ok to add hard configurable limits if they're done properly performance-wise. Any distribution can choose to set some default limits after the option exists.
Just fyi, and to avoid duplicate efforts, I already have patches for global and per-port software fdb limits that I'll polish and submit soon (depending on time availability, of course). If I find more time I might add per-vlan limits as well to the set. They use embedded netlink attributes to config and dump, so we can easily extend them later (e.g. different action on limit hit, limit statistics etc).
So again, to repeat myself, it's nice to have limits on FDB size, but those won't fix the software bridges that are now out in the open and can't have their configuration scripts changed.
I haven't had the time to expand on this in a proper change yet, but I was thinking more along the lines of adding an OOM handler with register_oom_notifier() in br_fdb_init(), and on OOM, do something, like flush the FDB from all bridges. There are going to be complications, it will schedule switchdev, switchdev is going to allocate memory which we're low on, the workqueues aren't created with WQ_MEM_RECLAIM, so this isn't necessarily going to be a silver bullet either. But this is what concerns me the most, the unconfigured bridge killing the kernel so easily. As you can see, with an OOM handler I'm not so much trying to impose a fixed limit on FDB size, but do something sensible such that the bridge doesn't contribute to the kernel dying.
On 7 July 2022 20:15:07 EEST, Vladimir Oltean olteanv@gmail.com wrote:
Hi Nikolay,
On Thu, Jul 07, 2022 at 05:08:15PM +0300, Nikolay Aleksandrov wrote:
On 07/07/2022 00:01, Nikolay Aleksandrov wrote:
On 06/07/2022 23:21, Vladimir Oltean wrote:
On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote:
[snip]
I already said it's ok to add hard configurable limits if they're done properly performance-wise. Any distribution can choose to set some default limits after the option exists.
Just fyi, and to avoid duplicate efforts, I already have patches for global and per-port software fdb limits that I'll polish and submit soon (depending on time availability, of course). If I find more time I might add per-vlan limits as well to the set. They use embedded netlink attributes to config and dump, so we can easily extend them later (e.g. different action on limit hit, limit statistics etc).
So again, to repeat myself, it's nice to have limits on FDB size, but those won't fix the software bridges that are now out in the open and can't have their configuration scripts changed.
I haven't had the time to expand on this in a proper change yet, but I was thinking more along the lines of adding an OOM handler with register_oom_notifier() in br_fdb_init(), and on OOM, do something, like flush the FDB from all bridges. There are going to be complications, it will schedule switchdev, switchdev is going to allocate memory which we're low on, the workqueues aren't created with WQ_MEM_RECLAIM, so this isn't necessarily going to be a silver bullet either. But this is what concerns me the most, the unconfigured bridge killing the kernel so easily. As you can see, with an OOM handler I'm not so much trying to impose a fixed limit on FDB size, but do something sensible such that the bridge doesn't contribute to the kernel dying.
Hi Vladimir, Sounds good to me, the fdb limits have come up multiple times in the past so I decided to finally add them and build from there, with them configured oom shouldn't be hit. These limits have never been present and people are fine (everyone deals with or leaves it), but I'll be happy to review and ack such changes. I hope you can correlate the oom and the bridge fdbs, not just blindly flushing as that can be problematic if you plan to have it enabled by default.
Cheers, Nik
On Thu, Jul 7, 2022 at 4:08 PM Nikolay Aleksandrov razor@blackwall.org wrote:
On 07/07/2022 00:01, Nikolay Aleksandrov wrote:
On 06/07/2022 23:21, Vladimir Oltean wrote:
On Wed, Jul 06, 2022 at 10:38:04PM +0300, Nikolay Aleksandrov wrote:
[snip]
I already said it's ok to add hard configurable limits if they're done properly performance-wise. Any distribution can choose to set some default limits after the option exists.
Just fyi, and to avoid duplicate efforts, I already have patches for global and per-port software fdb limits that I'll polish and submit soon (depending on time availability, of course). If I find more time I might add per-vlan limits as well to the set. They use embedded netlink attributes to config and dump, so we can easily extend them later (e.g. different action on limit hit, limit statistics etc).
Sounds good, I will just limit the number of locked entries in the driver as they are not controllable from the bridge. :-)
On Tue, May 24, 2022 at 05:21:41PM +0200, Hans Schultz wrote:
Add an intermediate state for clients behind a locked port to allow for possible opening of the port for said clients. This feature corresponds to the Mac-Auth and MAC Authentication Bypass (MAB) named features. The latter defined by Cisco. Locked FDB entries will be limited in number, so as to prevent DOS attacks by spamming the port with random entries. The limit will be a per port limit as it is a port based feature and that the port flushes all FDB entries on link down.
Why locked FDB entries need a special treatment compared to regular entries? A port that has learning enabled can be spammed with random source MACs just as well.
The authorization daemon that is monitoring FDB notifications can have a policy to shut down a port if the rate / number of locked entries is above a given threshold.
I don't think this kind of policy belongs in the kernel. If it resides in user space, then the threshold can be adjusted. Currently it's hard coded to 64 and I don't see how user space can change or monitor it.
On tor, maj 26, 2022 at 17:13, Ido Schimmel idosch@idosch.org wrote:
On Tue, May 24, 2022 at 05:21:41PM +0200, Hans Schultz wrote:
Add an intermediate state for clients behind a locked port to allow for possible opening of the port for said clients. This feature corresponds to the Mac-Auth and MAC Authentication Bypass (MAB) named features. The latter defined by Cisco. Locked FDB entries will be limited in number, so as to prevent DOS attacks by spamming the port with random entries. The limit will be a per port limit as it is a port based feature and that the port flushes all FDB entries on link down.
Why locked FDB entries need a special treatment compared to regular entries? A port that has learning enabled can be spammed with random source MACs just as well.
The authorization daemon that is monitoring FDB notifications can have a policy to shut down a port if the rate / number of locked entries is above a given threshold.
I don't think this kind of policy belongs in the kernel. If it resides in user space, then the threshold can be adjusted. Currently it's hard coded to 64 and I don't see how user space can change or monitor it.
In the Mac-Auth/MAB context, the locked port feature is really a form of CPU based learning, and on mv88e6xxx switchcores, this is facilitated by violation interrupts. Based on miss violation interrupts, the locked entries are then added to a list with a timer to remove the entries according to the bridge timeout. As this is very CPU intensive compared to normal operation, the assessment is that all this will jam up most devices if bombarded with random entries at link speed, and my estimate is that any userspace daemon that listens to the ensuing fdb events will never get a chance to stop this flood and eventually the device will lock down/reset. To prevent this, the limit is introduced.
Ideally this limit could be adjustable from userspace, but in real use-cases a cap like 64 should be more than enough, as that corresponds to 64 possible devices behind a port that cannot authenticate by other means (printers etc.) than having their mac addresses white-listed.
The software bridge behavior was then just set to correspond to the offloaded behavior, but after correspondence with Nik, the software bridge locked entries limit will be removed.
On Fri, May 27, 2022 at 10:52:27AM +0200, Hans Schultz wrote:
On tor, maj 26, 2022 at 17:13, Ido Schimmel idosch@idosch.org wrote:
On Tue, May 24, 2022 at 05:21:41PM +0200, Hans Schultz wrote:
Add an intermediate state for clients behind a locked port to allow for possible opening of the port for said clients. This feature corresponds to the Mac-Auth and MAC Authentication Bypass (MAB) named features. The latter defined by Cisco. Locked FDB entries will be limited in number, so as to prevent DOS attacks by spamming the port with random entries. The limit will be a per port limit as it is a port based feature and that the port flushes all FDB entries on link down.
Why locked FDB entries need a special treatment compared to regular entries? A port that has learning enabled can be spammed with random source MACs just as well.
The authorization daemon that is monitoring FDB notifications can have a policy to shut down a port if the rate / number of locked entries is above a given threshold.
I don't think this kind of policy belongs in the kernel. If it resides in user space, then the threshold can be adjusted. Currently it's hard coded to 64 and I don't see how user space can change or monitor it.
In the Mac-Auth/MAB context, the locked port feature is really a form of CPU based learning, and on mv88e6xxx switchcores, this is facilitated by violation interrupts. Based on miss violation interrupts, the locked entries are then added to a list with a timer to remove the entries according to the bridge timeout. As this is very CPU intensive compared to normal operation, the assessment is that all this will jam up most devices if bombarded with random entries at link speed, and my estimate is that any userspace daemon that listens to the ensuing fdb events will never get a chance to stop this flood and eventually the device will lock down/reset. To prevent this, the limit is introduced.
Ideally this limit could be adjustable from userspace, but in real use-cases a cap like 64 should be more than enough, as that corresponds to 64 possible devices behind a port that cannot authenticate by other means (printers etc.) than having their mac addresses white-listed.
The software bridge behavior was then just set to correspond to the offloaded behavior, but after correspondence with Nik, the software bridge locked entries limit will be removed.
As far as the bridge is concerned, locked entries are not really different from regular learned entries in terms of processing and since we don't have limits for regular entries I don't think we should have limits for locked entries.
I do understand the problem you have in mv88e6xxx and I think it would be wise to hard code a reasonable limit there. It can be adjusted over time based on feedback and possibly exposed to user space.
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
As far as the bridge is concerned, locked entries are not really different from regular learned entries in terms of processing and since we don't have limits for regular entries I don't think we should have limits for locked entries.
I do understand the problem you have in mv88e6xxx and I think it would be wise to hard code a reasonable limit there. It can be adjusted over time based on feedback and possibly exposed to user space.
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
Ahh, I see. I think that the best is for those FDB entries to be created as dynamic entries, so that user-space does not have to keep track of unauthorized entries. Also the mv88e6xxx chipsets have a 'hold at one' feature, for authorized entries, so that if a device goes quiet after being authorized the driver can signal to the software bridge and further to userspace that an authorized device has gone quiet, and the entry can then be removed. This though requires user added dynamic entries in the ATU, or you can call it synthetically 'learned' entries.
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
Btw, will the bridge not create a SWITCHDEV_FDB_ADD_TO_DEVICE event towards the switchcore in the scheme you mention and thus add an entry that opens up for the specified mac address?
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
Btw, will the bridge not create a SWITCHDEV_FDB_ADD_TO_DEVICE event towards the switchcore in the scheme you mention and thus add an entry that opens up for the specified mac address?
It will, but the driver needs to ignore FDB entries that are notified with locked flag. I see that you extended 'struct switchdev_notifier_fdb_info' with the locked flag, but it's not initialized in br_switchdev_fdb_populate(). Can you add it in the next version?
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
Btw, will the bridge not create a SWITCHDEV_FDB_ADD_TO_DEVICE event towards the switchcore in the scheme you mention and thus add an entry that opens up for the specified mac address?
It will, but the driver needs to ignore FDB entries that are notified with locked flag. I see that you extended 'struct switchdev_notifier_fdb_info' with the locked flag, but it's not initialized in br_switchdev_fdb_populate(). Can you add it in the next version?
Yes, definitely. I have only had focus on it in the messages coming up from the driver, and neglected it the other way.
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
Btw, will the bridge not create a SWITCHDEV_FDB_ADD_TO_DEVICE event towards the switchcore in the scheme you mention and thus add an entry that opens up for the specified mac address?
It will, but the driver needs to ignore FDB entries that are notified with locked flag. I see that you extended 'struct switchdev_notifier_fdb_info' with the locked flag, but it's not initialized in br_switchdev_fdb_populate(). Can you add it in the next version?
An issue with sending the flag to the driver is that port_fdb_add() is suddenly getting more and more arguments and getting messy in my opinion, but maybe that's just how it is...
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or? Also the replace command is not really supported properly as it is. I have made a fix for that which looks something like this:
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6cbb27e3b976..f43aa204f375 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -917,6 +917,9 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source, if (flags & NLM_F_EXCL) return -EEXIST;
+ if (flags & NLM_F_REPLACE) + modified = true; + if (READ_ONCE(fdb->dst) != source) { WRITE_ONCE(fdb->dst, source); modified = true;
The argument for always sending notifications to the driver in the case of replace is that a replace command will refresh the entries timeout if the entry is the same. Any thoughts on this?
On 02/06/2022 12:17, Hans Schultz wrote:
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Just to give you another data point about how this works in other devices, I can say that at least in Spectrum this works a bit differently. Packets that ingress via a locked port and incur an FDB miss are trapped to the CPU where they should be injected into the Rx path so that the bridge will create the 'locked' FDB entry and notify it to user space. The packets are obviously rated limited as the CPU cannot handle billions of packets per second, unlike the ASIC. The limit is not per bridge port (or even per bridge), but instead global to the entire device.
Btw, will the bridge not create a SWITCHDEV_FDB_ADD_TO_DEVICE event towards the switchcore in the scheme you mention and thus add an entry that opens up for the specified mac address?
It will, but the driver needs to ignore FDB entries that are notified with locked flag. I see that you extended 'struct switchdev_notifier_fdb_info' with the locked flag, but it's not initialized in br_switchdev_fdb_populate(). Can you add it in the next version?
An issue with sending the flag to the driver is that port_fdb_add() is suddenly getting more and more arguments and getting messy in my opinion, but maybe that's just how it is...
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or?
I don't see such thing (hacked iproute2 to print the flags before cmd): $ bridge fdb add 00:11:22:33:44:55 dev vnet110 master static flags 0x4
0x4 = NTF_MASTER only
Also the replace command is not really supported properly as it is. I have made a fix for that which looks something like this:
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6cbb27e3b976..f43aa204f375 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -917,6 +917,9 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source, if (flags & NLM_F_EXCL) return -EEXIST;
if (flags & NLM_F_REPLACE)
modified = true;
if (READ_ONCE(fdb->dst) != source) { WRITE_ONCE(fdb->dst, source); modified = true;
The argument for always sending notifications to the driver in the case of replace is that a replace command will refresh the entries timeout if the entry is the same. Any thoughts on this?
I don't think so. It always updates its "used" timer, not its "updated" timer which is the one for expire. A replace that doesn't actually change anything on the entry shouldn't generate a notification.
On tor, jun 02, 2022 at 12:33, Nikolay Aleksandrov razor@blackwall.org wrote:
On 02/06/2022 12:17, Hans Schultz wrote:
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or?
I don't see such thing (hacked iproute2 to print the flags before cmd): $ bridge fdb add 00:11:22:33:44:55 dev vnet110 master static flags 0x4
0x4 = NTF_MASTER only
I also get 0x4 from iproute2, but I still get SELF entries when I look with: bridge fdb show dev DEV
Also the replace command is not really supported properly as it is. I have made a fix for that which looks something like this:
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6cbb27e3b976..f43aa204f375 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -917,6 +917,9 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source, if (flags & NLM_F_EXCL) return -EEXIST;
if (flags & NLM_F_REPLACE)
modified = true;
if (READ_ONCE(fdb->dst) != source) { WRITE_ONCE(fdb->dst, source); modified = true;
The argument for always sending notifications to the driver in the case of replace is that a replace command will refresh the entries timeout if the entry is the same. Any thoughts on this?
I don't think so. It always updates its "used" timer, not its "updated" timer which is the one for expire. A replace that doesn't actually change anything on the entry shouldn't generate a notification.
Okay, so then there is missing checks on flags as the issue arose from replacing locked entries with dynamic entries. I will do another fix based on flags as modified needs to be true for the driver to get notified.
On 02/06/2022 13:17, Hans Schultz wrote:
On tor, jun 02, 2022 at 12:33, Nikolay Aleksandrov razor@blackwall.org wrote:
On 02/06/2022 12:17, Hans Schultz wrote:
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or?
I don't see such thing (hacked iproute2 to print the flags before cmd): $ bridge fdb add 00:11:22:33:44:55 dev vnet110 master static flags 0x4
0x4 = NTF_MASTER only
I also get 0x4 from iproute2, but I still get SELF entries when I look with: bridge fdb show dev DEV
after the above add: $ bridge fdb show dev vnet110 | grep 00:11 00:11:22:33:44:55 master virbr0 static
Also the replace command is not really supported properly as it is. I have made a fix for that which looks something like this:
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6cbb27e3b976..f43aa204f375 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -917,6 +917,9 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source, if (flags & NLM_F_EXCL) return -EEXIST;
if (flags & NLM_F_REPLACE)
modified = true;
if (READ_ONCE(fdb->dst) != source) { WRITE_ONCE(fdb->dst, source); modified = true;
The argument for always sending notifications to the driver in the case of replace is that a replace command will refresh the entries timeout if the entry is the same. Any thoughts on this?
I don't think so. It always updates its "used" timer, not its "updated" timer which is the one for expire. A replace that doesn't actually change anything on the entry shouldn't generate a notification.
Okay, so then there is missing checks on flags as the issue arose from replacing locked entries with dynamic entries. I will do another fix based on flags as modified needs to be true for the driver to get notified.
On Thu, Jun 02, 2022 at 01:30:06PM +0300, Nikolay Aleksandrov wrote:
On 02/06/2022 13:17, Hans Schultz wrote:
On tor, jun 02, 2022 at 12:33, Nikolay Aleksandrov razor@blackwall.org wrote:
On 02/06/2022 12:17, Hans Schultz wrote:
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or?
I don't see such thing (hacked iproute2 to print the flags before cmd): $ bridge fdb add 00:11:22:33:44:55 dev vnet110 master static flags 0x4
0x4 = NTF_MASTER only
I also get 0x4 from iproute2, but I still get SELF entries when I look with: bridge fdb show dev DEV
after the above add: $ bridge fdb show dev vnet110 | grep 00:11 00:11:22:33:44:55 master virbr0 static
I think Hans is testing with mv88e6xxx which dumps entries directly from HW via ndo_fdb_dump(). See dsa_slave_port_fdb_do_dump() which sets NTF_SELF.
Hans, are you seeing the entry twice? Once with 'master' and once with 'self'?
Also the replace command is not really supported properly as it is. I have made a fix for that which looks something like this:
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6cbb27e3b976..f43aa204f375 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -917,6 +917,9 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source, if (flags & NLM_F_EXCL) return -EEXIST;
if (flags & NLM_F_REPLACE)
modified = true;
if (READ_ONCE(fdb->dst) != source) { WRITE_ONCE(fdb->dst, source); modified = true;
The argument for always sending notifications to the driver in the case of replace is that a replace command will refresh the entries timeout if the entry is the same. Any thoughts on this?
I don't think so. It always updates its "used" timer, not its "updated" timer which is the one for expire. A replace that doesn't actually change anything on the entry shouldn't generate a notification.
Okay, so then there is missing checks on flags as the issue arose from replacing locked entries with dynamic entries. I will do another fix based on flags as modified needs to be true for the driver to get notified.
On tor, jun 02, 2022 at 13:39, Ido Schimmel idosch@nvidia.com wrote:
On Thu, Jun 02, 2022 at 01:30:06PM +0300, Nikolay Aleksandrov wrote:
On 02/06/2022 13:17, Hans Schultz wrote:
On tor, jun 02, 2022 at 12:33, Nikolay Aleksandrov razor@blackwall.org wrote:
On 02/06/2022 12:17, Hans Schultz wrote:
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote:
On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or?
I don't see such thing (hacked iproute2 to print the flags before cmd): $ bridge fdb add 00:11:22:33:44:55 dev vnet110 master static flags 0x4
0x4 = NTF_MASTER only
I also get 0x4 from iproute2, but I still get SELF entries when I look with: bridge fdb show dev DEV
after the above add: $ bridge fdb show dev vnet110 | grep 00:11 00:11:22:33:44:55 master virbr0 static
I think Hans is testing with mv88e6xxx which dumps entries directly from HW via ndo_fdb_dump(). See dsa_slave_port_fdb_do_dump() which sets NTF_SELF.
Hans, are you seeing the entry twice? Once with 'master' and once with 'self'?
Well yes, but I get some additional entries with 'self' for different vlans. So from clean adding a random fdb entry I get 4 entries on the port, 2 with 'master' and two with 'self'. It looks like this:
# bridge fdb add 00:22:33:44:55:66 dev eth6 master static # bridge fdb show dev eth6 | grep 55 00:22:33:44:55:66 vlan 1 master br0 offload static 00:22:33:44:55:66 master br0 offload static 00:22:33:44:55:66 vlan 1 self static 00:22:33:44:55:66 vlan 4095 self static
If I do a replace of a locked entry I only get one with the 'self' flag.
On Thu, Jun 02, 2022 at 01:36:56PM +0200, Hans Schultz wrote:
On tor, jun 02, 2022 at 13:39, Ido Schimmel idosch@nvidia.com wrote:
On Thu, Jun 02, 2022 at 01:30:06PM +0300, Nikolay Aleksandrov wrote:
On 02/06/2022 13:17, Hans Schultz wrote:
On tor, jun 02, 2022 at 12:33, Nikolay Aleksandrov razor@blackwall.org wrote:
On 02/06/2022 12:17, Hans Schultz wrote:
On tis, maj 31, 2022 at 17:23, Ido Schimmel idosch@nvidia.com wrote: > On Tue, May 31, 2022 at 11:34:21AM +0200, Hans Schultz wrote:
Another issue is that bridge fdb add MAC dev DEV master static seems to add the entry with the SELF flag set, which I don't think is what we would want it to do or?
I don't see such thing (hacked iproute2 to print the flags before cmd): $ bridge fdb add 00:11:22:33:44:55 dev vnet110 master static flags 0x4
0x4 = NTF_MASTER only
I also get 0x4 from iproute2, but I still get SELF entries when I look with: bridge fdb show dev DEV
after the above add: $ bridge fdb show dev vnet110 | grep 00:11 00:11:22:33:44:55 master virbr0 static
I think Hans is testing with mv88e6xxx which dumps entries directly from HW via ndo_fdb_dump(). See dsa_slave_port_fdb_do_dump() which sets NTF_SELF.
Hans, are you seeing the entry twice? Once with 'master' and once with 'self'?
Well yes, but I get some additional entries with 'self' for different vlans. So from clean adding a random fdb entry I get 4 entries on the port, 2 with 'master' and two with 'self'. It looks like this:
# bridge fdb add 00:22:33:44:55:66 dev eth6 master static # bridge fdb show dev eth6 | grep 55 00:22:33:44:55:66 vlan 1 master br0 offload static 00:22:33:44:55:66 master br0 offload static
These two entries are added by the bridge driver ('master' is set). You get two entries because you didn't specify a VLAN, so one entry is installed with VLAN 0 (no VLAN) and the second is installed because VLAN 1 is configured on eth6.
00:22:33:44:55:66 vlan 1 self static
This entry is from the HW. It corresponds to the first entry above.
00:22:33:44:55:66 vlan 4095 self static
I assume you are using VLAN 4095 for untagged traffic, so this entry probably corresponds to the second entry above.
If I do a replace of a locked entry I only get one with the 'self' flag.
IIUC, your driver is adding the entry to the bridge and with a specific VLAN. So you have one entry reported by the bridge driver and a corresponding entry in HW.
I think Hans is testing with mv88e6xxx which dumps entries directly from HW via ndo_fdb_dump(). See dsa_slave_port_fdb_do_dump() which sets NTF_SELF.
Hans, are you seeing the entry twice? Once with 'master' and once with 'self'?
When replacing a locked entry it looks like this:
# bridge fdb show dev eth6 | grep 4c 00:4c:4c:4c:4c:4c vlan 1 master br0 extern_learn offload locked
# bridge fdb replace 00:4c:4c:4c:4c:4c dev eth6 vlan 1 master static ; bridge fdb show dev eth6 | grep 4c 00:4c:4c:4c:4c:4c vlan 1 self static
The problem is then that the function br_fdb_find_rcu(br,eth_hdr(skb)->h_source, vid); , where the h_source and vid is the entry above, does not find the entry. My hypothesis was then that this is because of the 'self' flag that I see.
I am thinking that the function dsa_slave_port_fdb_do_dump() is only for debug, and thus does not really set any flags in the bridge modules FDB, but then I don't understand why the above find function does not find the entry?
On Thu, Jun 02, 2022 at 02:08:41PM +0200, Hans Schultz wrote:
I think Hans is testing with mv88e6xxx which dumps entries directly from HW via ndo_fdb_dump(). See dsa_slave_port_fdb_do_dump() which sets NTF_SELF.
Hans, are you seeing the entry twice? Once with 'master' and once with 'self'?
When replacing a locked entry it looks like this:
# bridge fdb show dev eth6 | grep 4c 00:4c:4c:4c:4c:4c vlan 1 master br0 extern_learn offload locked
# bridge fdb replace 00:4c:4c:4c:4c:4c dev eth6 vlan 1 master static ; bridge fdb show dev eth6 | grep 4c 00:4c:4c:4c:4c:4c vlan 1 self static
This output means that the FDB entry was deleted from the bridge driver FDB.
The problem is then that the function br_fdb_find_rcu(br,eth_hdr(skb)->h_source, vid); , where the h_source and vid is the entry above, does not find the entry. My hypothesis was then that this is because of the 'self' flag that I see.
br_fdb_find_rcu() does a lookup in the bridge driver FDB, but per the output above, the entry isn't there for some reason. It's only in HW.
Can it be that you driver is deleting these entries from the bridge driver FDB via SWITCHDEV_FDB_DEL_TO_BRIDGE for some reason?
I am thinking that the function dsa_slave_port_fdb_do_dump() is only for debug, and thus does not really set any flags in the bridge modules FDB, but then I don't understand why the above find function does not find the entry?
Yes, that sounds much like the case. So the replace of course just modifies the SW fdb entry, and then it just uses port_fdb_add() to replace HW entry I assume, which then in my case triggers SWITCHDEV_FDB_DEL_TO_BRIDGE as the locked entry is removed. So I should not send the SWITCHDEV_FDB_DEL_TO_BRIDGE message when removing the locked entry from port_fdb_add() function...
(note: having problems with smtp.gmail.com...)
On Thu, Jun 2, 2022 at 2:18 PM Ido Schimmel idosch@nvidia.com wrote:
On Thu, Jun 02, 2022 at 02:08:41PM +0200, Hans Schultz wrote:
I think Hans is testing with mv88e6xxx which dumps entries directly from HW via ndo_fdb_dump(). See dsa_slave_port_fdb_do_dump() which sets NTF_SELF.
Hans, are you seeing the entry twice? Once with 'master' and once with 'self'?
When replacing a locked entry it looks like this:
# bridge fdb show dev eth6 | grep 4c 00:4c:4c:4c:4c:4c vlan 1 master br0 extern_learn offload locked
# bridge fdb replace 00:4c:4c:4c:4c:4c dev eth6 vlan 1 master static ; bridge fdb show dev eth6 | grep 4c 00:4c:4c:4c:4c:4c vlan 1 self static
This output means that the FDB entry was deleted from the bridge driver FDB.
The problem is then that the function br_fdb_find_rcu(br,eth_hdr(skb)->h_source, vid); , where the h_source and vid is the entry above, does not find the entry. My hypothesis was then that this is because of the 'self' flag that I see.
br_fdb_find_rcu() does a lookup in the bridge driver FDB, but per the output above, the entry isn't there for some reason. It's only in HW.
Can it be that you driver is deleting these entries from the bridge driver FDB via SWITCHDEV_FDB_DEL_TO_BRIDGE for some reason?
I am thinking that the function dsa_slave_port_fdb_do_dump() is only for debug, and thus does not really set any flags in the bridge modules FDB, but then I don't understand why the above find function does not find the entry?
Used for Mac-auth/MAB feature in the offloaded case.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com --- include/net/dsa.h | 6 ++++++ include/net/switchdev.h | 3 ++- net/bridge/br.c | 3 ++- net/bridge/br_fdb.c | 7 +++++-- net/bridge/br_private.h | 2 +- 5 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h index 14f07275852b..a5a843b2d67d 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -330,6 +330,12 @@ struct dsa_port { /* List of VLANs that CPU and DSA ports are members of. */ struct mutex vlans_lock; struct list_head vlans; + + /* List and maintenance of locked ATU entries */ + struct mutex locked_entries_list_lock; + struct list_head atu_locked_entries_list; + atomic_t atu_locked_entry_cnt; + struct delayed_work atu_work; };
/* TODO: ideally DSA ports would have a single dp->link_dp member, diff --git a/include/net/switchdev.h b/include/net/switchdev.h index aa0171d5786d..62f4f7c9c7c2 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -245,7 +245,8 @@ struct switchdev_notifier_fdb_info { u16 vid; u8 added_by_user:1, is_local:1, - offloaded:1; + offloaded:1, + locked:1; };
struct switchdev_notifier_port_obj_info { diff --git a/net/bridge/br.c b/net/bridge/br.c index 96e91d69a9a8..12933388a5a4 100644 --- a/net/bridge/br.c +++ b/net/bridge/br.c @@ -166,7 +166,8 @@ static int br_switchdev_event(struct notifier_block *unused, case SWITCHDEV_FDB_ADD_TO_BRIDGE: fdb_info = ptr; err = br_fdb_external_learn_add(br, p, fdb_info->addr, - fdb_info->vid, false); + fdb_info->vid, false, + fdb_info->locked); if (err) { err = notifier_from_errno(err); break; diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6b83e2d6435d..92469547283a 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -1135,7 +1135,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br, "FDB entry towards bridge must be permanent"); return -EINVAL; } - err = br_fdb_external_learn_add(br, p, addr, vid, true); + err = br_fdb_external_learn_add(br, p, addr, vid, true, false); } else { spin_lock_bh(&br->hash_lock); err = fdb_add_entry(br, p, addr, ndm, nlh_flags, vid, nfea_tb); @@ -1365,7 +1365,7 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p)
int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid, - bool swdev_notify) + bool swdev_notify, bool locked) { struct net_bridge_fdb_entry *fdb; bool modified = false; @@ -1385,6 +1385,9 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, if (!p) flags |= BIT(BR_FDB_LOCAL);
+ if (locked) + flags |= BIT(BR_FDB_ENTRY_LOCKED); + fdb = fdb_create(br, p, addr, vid, flags); if (!fdb) { err = -ENOMEM; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index be17c99efe65..88913e6a59e1 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -815,7 +815,7 @@ int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p); void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p); int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid, - bool swdev_notify); + bool swdev_notify, bool locked); int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid, bool swdev_notify);
On Tue, May 24, 2022 at 05:21:42PM +0200, Hans Schultz wrote:
Used for Mac-auth/MAB feature in the offloaded case.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
include/net/dsa.h | 6 ++++++ include/net/switchdev.h | 3 ++- net/bridge/br.c | 3 ++- net/bridge/br_fdb.c | 7 +++++-- net/bridge/br_private.h | 2 +- 5 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h index 14f07275852b..a5a843b2d67d 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -330,6 +330,12 @@ struct dsa_port { /* List of VLANs that CPU and DSA ports are members of. */ struct mutex vlans_lock; struct list_head vlans;
- /* List and maintenance of locked ATU entries */
- struct mutex locked_entries_list_lock;
- struct list_head atu_locked_entries_list;
- atomic_t atu_locked_entry_cnt;
- struct delayed_work atu_work;
DSA is not Marvell only, so please remove these from struct dsa_port and place them somewhere like struct mv88e6xxx_port. Also, the change has nothing to do in a patch with the "net: switchdev: " prefix.
}; /* TODO: ideally DSA ports would have a single dp->link_dp member, diff --git a/include/net/switchdev.h b/include/net/switchdev.h index aa0171d5786d..62f4f7c9c7c2 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -245,7 +245,8 @@ struct switchdev_notifier_fdb_info { u16 vid; u8 added_by_user:1, is_local:1,
offloaded:1;
offloaded:1,
locked:1;
As mentioned by Ido, please update br_switchdev_fdb_populate() as part of this change, in the bridge->switchdev direction. We should add a comment near struct switchdev_notifier_fdb_info stating just that, so that people don't forget.
}; struct switchdev_notifier_port_obj_info { diff --git a/net/bridge/br.c b/net/bridge/br.c index 96e91d69a9a8..12933388a5a4 100644 --- a/net/bridge/br.c +++ b/net/bridge/br.c @@ -166,7 +166,8 @@ static int br_switchdev_event(struct notifier_block *unused, case SWITCHDEV_FDB_ADD_TO_BRIDGE: fdb_info = ptr; err = br_fdb_external_learn_add(br, p, fdb_info->addr,
fdb_info->vid, false);
fdb_info->vid, false,
if (err) { err = notifier_from_errno(err); break;fdb_info->locked);
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 6b83e2d6435d..92469547283a 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -1135,7 +1135,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br, "FDB entry towards bridge must be permanent"); return -EINVAL; }
err = br_fdb_external_learn_add(br, p, addr, vid, true);
} else { spin_lock_bh(&br->hash_lock); err = fdb_add_entry(br, p, addr, ndm, nlh_flags, vid, nfea_tb);err = br_fdb_external_learn_add(br, p, addr, vid, true, false);
@@ -1365,7 +1365,7 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p) int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid,
bool swdev_notify)
bool swdev_notify, bool locked)
{ struct net_bridge_fdb_entry *fdb; bool modified = false; @@ -1385,6 +1385,9 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, if (!p) flags |= BIT(BR_FDB_LOCAL);
if (locked)
flags |= BIT(BR_FDB_ENTRY_LOCKED);
- fdb = fdb_create(br, p, addr, vid, flags); if (!fdb) { err = -ENOMEM;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index be17c99efe65..88913e6a59e1 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -815,7 +815,7 @@ int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p); void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p); int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid,
bool swdev_notify);
bool swdev_notify, bool locked);
int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid, bool swdev_notify); -- 2.30.2
This implementation for the Marvell mv88e6xxx chip series, is based on handling ATU miss violations occurring when packets ingress on a port that is locked. The mac address triggering the ATU miss violation is communicated through switchdev to the bridge module, which adds a fdb entry with the fdb locked flag set. The entry is kept according to the bridges ageing time, thus simulating a dynamic entry.
Note: The locked port must have learning enabled for the ATU miss violation to occur.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com --- drivers/net/dsa/mv88e6xxx/Makefile | 1 + drivers/net/dsa/mv88e6xxx/chip.c | 40 ++- drivers/net/dsa/mv88e6xxx/chip.h | 5 + drivers/net/dsa/mv88e6xxx/global1.h | 1 + drivers/net/dsa/mv88e6xxx/global1_atu.c | 35 ++- .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c | 249 ++++++++++++++++++ .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h | 40 +++ drivers/net/dsa/mv88e6xxx/port.c | 32 ++- drivers/net/dsa/mv88e6xxx/port.h | 2 + 9 files changed, 389 insertions(+), 16 deletions(-) create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h
diff --git a/drivers/net/dsa/mv88e6xxx/Makefile b/drivers/net/dsa/mv88e6xxx/Makefile index c8eca2b6f959..3ca57709730d 100644 --- a/drivers/net/dsa/mv88e6xxx/Makefile +++ b/drivers/net/dsa/mv88e6xxx/Makefile @@ -15,3 +15,4 @@ mv88e6xxx-objs += port_hidden.o mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_PTP) += ptp.o mv88e6xxx-objs += serdes.o mv88e6xxx-objs += smi.o +mv88e6xxx-objs += mv88e6xxx_switchdev.o \ No newline at end of file diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 5d2c57a7c708..f7a16886bee9 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -42,6 +42,7 @@ #include "ptp.h" #include "serdes.h" #include "smi.h" +#include "mv88e6xxx_switchdev.h"
static void assert_reg_lock(struct mv88e6xxx_chip *chip) { @@ -919,6 +920,9 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port); + else + if (mv88e6xxx_port_is_locked(chip, port, true)) + mv88e6xxx_atu_locked_entry_flush(ds, port); }
static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port, @@ -1685,6 +1689,9 @@ static void mv88e6xxx_port_fast_age(struct dsa_switch *ds, int port) struct mv88e6xxx_chip *chip = ds->priv; int err;
+ if (mv88e6xxx_port_is_locked(chip, port, true)) + mv88e6xxx_atu_locked_entry_flush(ds, port); + mv88e6xxx_reg_lock(chip); err = mv88e6xxx_port_fast_age_fid(chip, port, 0); mv88e6xxx_reg_unlock(chip); @@ -1721,11 +1728,11 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid, return err; }
-static int mv88e6xxx_vtu_walk(struct mv88e6xxx_chip *chip, - int (*cb)(struct mv88e6xxx_chip *chip, - const struct mv88e6xxx_vtu_entry *entry, - void *priv), - void *priv) +int mv88e6xxx_vtu_walk(struct mv88e6xxx_chip *chip, + int (*cb)(struct mv88e6xxx_chip *chip, + const struct mv88e6xxx_vtu_entry *entry, + void *priv), + void *priv) { struct mv88e6xxx_vtu_entry entry = { .vid = mv88e6xxx_max_vid(chip), @@ -2722,9 +2729,12 @@ static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port, struct mv88e6xxx_chip *chip = ds->priv; int err;
+ if (mv88e6xxx_port_is_locked(chip, port, true)) + mv88e6xxx_atu_locked_entry_find_purge(ds, port, addr, vid); + mv88e6xxx_reg_lock(chip); err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, - MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC); + MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC); mv88e6xxx_reg_unlock(chip);
return err; @@ -2735,12 +2745,17 @@ static int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port, struct dsa_db db) { struct mv88e6xxx_chip *chip = ds->priv; + bool locked_found = false; int err;
- mv88e6xxx_reg_lock(chip); - err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, 0); - mv88e6xxx_reg_unlock(chip); + if (mv88e6xxx_port_is_locked(chip, port, true)) + locked_found = mv88e6xxx_atu_locked_entry_find_purge(ds, port, addr, vid);
+ if (!locked_found) { + mv88e6xxx_reg_lock(chip); + err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, 0); + mv88e6xxx_reg_unlock(chip); + } return err; }
@@ -3809,11 +3824,16 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
static int mv88e6xxx_port_setup(struct dsa_switch *ds, int port) { - return mv88e6xxx_setup_devlink_regions_port(ds, port); + int err; + + err = mv88e6xxx_setup_devlink_regions_port(ds, port); + mv88e6xxx_init_violation_handler(ds, port); + return err; }
static void mv88e6xxx_port_teardown(struct dsa_switch *ds, int port) { + mv88e6xxx_teardown_violation_handler(ds, port); mv88e6xxx_teardown_devlink_regions_port(ds, port); }
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h index 5e03cfe50156..c9a8404a6293 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.h +++ b/drivers/net/dsa/mv88e6xxx/chip.h @@ -803,6 +803,11 @@ static inline void mv88e6xxx_reg_unlock(struct mv88e6xxx_chip *chip) mutex_unlock(&chip->reg_lock); }
+int mv88e6xxx_vtu_walk(struct mv88e6xxx_chip *chip, + int (*cb)(struct mv88e6xxx_chip *chip, + const struct mv88e6xxx_vtu_entry *entry, + void *priv), + void *priv); int mv88e6xxx_fid_map(struct mv88e6xxx_chip *chip, unsigned long *bitmap);
#endif /* _MV88E6XXX_CHIP_H */ diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h index 65958b2a0d3a..503fbf216670 100644 --- a/drivers/net/dsa/mv88e6xxx/global1.h +++ b/drivers/net/dsa/mv88e6xxx/global1.h @@ -136,6 +136,7 @@ #define MV88E6XXX_G1_ATU_DATA_TRUNK 0x8000 #define MV88E6XXX_G1_ATU_DATA_TRUNK_ID_MASK 0x00f0 #define MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_MASK 0x3ff0 +#define MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_NO_EGRESS 0x0000 #define MV88E6XXX_G1_ATU_DATA_STATE_MASK 0x000f #define MV88E6XXX_G1_ATU_DATA_STATE_UC_UNUSED 0x0000 #define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_1_OLDEST 0x0001 diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c index 40bd67a5c8e9..517376271f64 100644 --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c @@ -12,6 +12,8 @@
#include "chip.h" #include "global1.h" +#include "port.h" +#include "mv88e6xxx_switchdev.h"
/* Offset 0x01: ATU FID Register */
@@ -114,6 +116,18 @@ static int mv88e6xxx_g1_atu_op_wait(struct mv88e6xxx_chip *chip) return mv88e6xxx_g1_wait_bit(chip, MV88E6XXX_G1_ATU_OP, bit, 0); }
+static int mv88e6xxx_g1_read_atu_violation(struct mv88e6xxx_chip *chip) +{ + int err; + + err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_ATU_OP, + MV88E6XXX_G1_ATU_OP_BUSY | MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION); + if (err) + return err; + + return mv88e6xxx_g1_atu_op_wait(chip); +} + static int mv88e6xxx_g1_atu_op(struct mv88e6xxx_chip *chip, u16 fid, u16 op) { u16 val; @@ -356,11 +370,11 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) int spid; int err; u16 val; + u16 fid;
mv88e6xxx_reg_lock(chip);
- err = mv88e6xxx_g1_atu_op(chip, 0, - MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION); + err = mv88e6xxx_g1_read_atu_violation(chip); if (err) goto out;
@@ -368,6 +382,10 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) if (err) goto out;
+ err = mv88e6xxx_g1_read(chip, MV88E6352_G1_ATU_FID, &fid); + if (err) + goto out; + err = mv88e6xxx_g1_atu_data_read(chip, &entry); if (err) goto out; @@ -382,6 +400,11 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) dev_err_ratelimited(chip->dev, "ATU age out violation for %pM\n", entry.mac); + err = mv88e6xxx_handle_violation(chip, + chip->ports[spid].port, + &entry, + fid, + MV88E6XXX_G1_ATU_OP_AGE_OUT_VIOLATION); }
if (val & MV88E6XXX_G1_ATU_OP_MEMBER_VIOLATION) { @@ -396,6 +419,14 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) "ATU miss violation for %pM portvec %x spid %d\n", entry.mac, entry.portvec, spid); chip->ports[spid].atu_miss_violation++; + if (mv88e6xxx_port_is_locked(chip, chip->ports[spid].port, false)) + err = mv88e6xxx_handle_violation(chip, + chip->ports[spid].port, + &entry, + fid, + MV88E6XXX_G1_ATU_OP_MISS_VIOLATION); + if (err) + goto out; }
if (val & MV88E6XXX_G1_ATU_OP_FULL_VIOLATION) { diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c new file mode 100644 index 000000000000..8436655ceb9a --- /dev/null +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * mv88e6xxx_switchdev.c + * + * Authors: + * Hans J. Schultz hans.schultz@westermo.com + * + */ + +#include <net/switchdev.h> +#include <linux/list.h> +#include "chip.h" +#include "global1.h" +#include "mv88e6xxx_switchdev.h" + +static void mv88e6xxx_atu_locked_entry_timer_work(struct atu_locked_entry *ale) +{ + struct switchdev_notifier_fdb_info info = { + .addr = ale->mac, + .vid = ale->vid, + .added_by_user = false, + .is_local = false, + .offloaded = true, + .locked = true, + }; + struct mv88e6xxx_atu_entry entry; + struct net_device *brport; + struct dsa_port *dp; + + entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UC_UNUSED; + entry.trunk = false; + memcpy(&entry.mac, &ale->mac, ETH_ALEN); + + mv88e6xxx_reg_lock(ale->chip); + mv88e6xxx_g1_atu_loadpurge(ale->chip, ale->fid, &entry); + mv88e6xxx_reg_unlock(ale->chip); + + dp = dsa_to_port(ale->chip->ds, ale->port); + brport = dsa_port_to_bridge_port(dp); + + if (brport) { + if (!rtnl_is_locked()) { + rtnl_lock(); + call_switchdev_notifiers(SWITCHDEV_FDB_DEL_TO_BRIDGE, + brport, &info.info, NULL); + rtnl_unlock(); + } else { + call_switchdev_notifiers(SWITCHDEV_FDB_DEL_TO_BRIDGE, + brport, &info.info, NULL); + } + } else { + dev_err(ale->chip->dev, "ERR: No bridge port for dsa port belonging to port %d\n", + ale->port); + } +} + +static inline void mv88e6xxx_atu_locked_entry_purge(struct atu_locked_entry *ale) +{ + mv88e6xxx_atu_locked_entry_timer_work(ale); + del_timer(&ale->timer); + list_del(&ale->list); + kfree(ale); +} + +static void mv88e6xxx_atu_locked_entry_cleanup(struct work_struct *work) +{ + struct dsa_port *dp = container_of(work, struct dsa_port, atu_work.work); + struct atu_locked_entry *ale, *tmp; + + mutex_lock(&dp->locked_entries_list_lock); + list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) { + if (ale->timed_out) { + mv88e6xxx_atu_locked_entry_purge(ale); + atomic_dec(&dp->atu_locked_entry_cnt); + } + } + mutex_unlock(&dp->locked_entries_list_lock); + + mod_delayed_work(system_long_wq, &dp->atu_work, msecs_to_jiffies(100)); +} + +static void mv88e6xxx_atu_locked_entry_timer_handler(struct timer_list *t) +{ + struct atu_locked_entry *ale = from_timer(ale, t, timer); + + if (ale) + ale->timed_out = true; +} + +struct mv88e6xxx_fid_search_ctx { + u16 fid_search; + u16 vid_found; +}; + +static int mv88e6xxx_find_vid_on_matching_fid(struct mv88e6xxx_chip *chip, + const struct mv88e6xxx_vtu_entry *entry, + void *priv) +{ + struct mv88e6xxx_fid_search_ctx *ctx = priv; + + if (ctx->fid_search == entry->fid) { + ctx->vid_found = entry->vid; + return 1; + } + + return 0; +} + +int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip, + int port, + struct mv88e6xxx_atu_entry *entry, + u16 fid, + u16 type) +{ + struct switchdev_notifier_fdb_info info = { + .addr = entry->mac, + .vid = 0, + .added_by_user = false, + .is_local = false, + .offloaded = true, + .locked = true, + }; + struct atu_locked_entry *locked_entry; + struct mv88e6xxx_fid_search_ctx ctx; + struct net_device *brport; + struct dsa_port *dp; + int err; + + ctx.fid_search = fid; + err = mv88e6xxx_vtu_walk(chip, mv88e6xxx_find_vid_on_matching_fid, &ctx); + if (err < 0) + return err; + if (err == 1) + info.vid = ctx.vid_found; + else + return -ENODATA; + + dp = dsa_to_port(chip->ds, port); + brport = dsa_port_to_bridge_port(dp); + + if (!brport) + return -ENODEV; + + switch (type) { + case MV88E6XXX_G1_ATU_OP_MISS_VIOLATION: + if (atomic_read(&dp->atu_locked_entry_cnt) >= ATU_LOCKED_ENTRIES_MAX) { + mv88e6xxx_reg_unlock(chip); + return 0; + } + entry->portvec = MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_NO_EGRESS; + entry->state = MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC; + entry->trunk = false; + err = mv88e6xxx_g1_atu_loadpurge(chip, fid, entry); + if (err) + goto fail; + + locked_entry = kmalloc(sizeof(*locked_entry), GFP_ATOMIC); + if (!locked_entry) + return -ENOMEM; + timer_setup(&locked_entry->timer, mv88e6xxx_atu_locked_entry_timer_handler, 0); + locked_entry->timer.expires = jiffies + dp->ageing_time / 10; + locked_entry->chip = chip; + locked_entry->port = port; + locked_entry->fid = fid; + locked_entry->vid = info.vid; + locked_entry->timed_out = false; + memcpy(&locked_entry->mac, entry->mac, ETH_ALEN); + + mutex_lock(&dp->locked_entries_list_lock); + add_timer(&locked_entry->timer); + list_add(&locked_entry->list, &dp->atu_locked_entries_list); + mutex_unlock(&dp->locked_entries_list_lock); + atomic_inc(&dp->atu_locked_entry_cnt); + + mv88e6xxx_reg_unlock(chip); + + rtnl_lock(); + err = call_switchdev_notifiers(SWITCHDEV_FDB_ADD_TO_BRIDGE, + brport, &info.info, NULL); + break; + } + rtnl_unlock(); + + return err; + +fail: + mv88e6xxx_reg_unlock(chip); + return err; +} + +bool mv88e6xxx_atu_locked_entry_find_purge(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid) +{ + struct dsa_port *dp = dsa_to_port(ds, port); + struct atu_locked_entry *ale, *tmp; + bool found = false; + + mutex_lock(&dp->locked_entries_list_lock); + list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) { + if (!memcmp(&ale->mac, addr, ETH_ALEN)) { + if (ale->vid == vid) { + mv88e6xxx_atu_locked_entry_purge(ale); + atomic_dec(&dp->atu_locked_entry_cnt); + found = true; + break; + } + } + } + mutex_unlock(&dp->locked_entries_list_lock); + return found; +} + +void mv88e6xxx_atu_locked_entry_flush(struct dsa_switch *ds, int port) +{ + struct dsa_port *dp = dsa_to_port(ds, port); + struct atu_locked_entry *ale, *tmp; + + mutex_lock(&dp->locked_entries_list_lock); + list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) { + mv88e6xxx_atu_locked_entry_purge(ale); + atomic_dec(&dp->atu_locked_entry_cnt); + } + mutex_unlock(&dp->locked_entries_list_lock); + + if (atomic_read(&dp->atu_locked_entry_cnt) != 0) + dev_err(ds->dev, + "ERROR: Locked entries count is not zero after flush on port %d\n", + port); +} + +void mv88e6xxx_init_violation_handler(struct dsa_switch *ds, int port) +{ + struct dsa_port *dp = dsa_to_port(ds, port); + + INIT_LIST_HEAD(&dp->atu_locked_entries_list); + mutex_init(&dp->locked_entries_list_lock); + dp->atu_locked_entry_cnt.counter = 0; + INIT_DELAYED_WORK(&dp->atu_work, mv88e6xxx_atu_locked_entry_cleanup); + mod_delayed_work(system_long_wq, &dp->atu_work, msecs_to_jiffies(100)); +} + +void mv88e6xxx_teardown_violation_handler(struct dsa_switch *ds, int port) +{ + struct dsa_port *dp = dsa_to_port(ds, port); + + cancel_delayed_work(&dp->atu_work); + mv88e6xxx_atu_locked_entry_flush(ds, port); + mutex_destroy(&dp->locked_entries_list_lock); +} diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h new file mode 100644 index 000000000000..f0e7abf7c361 --- /dev/null +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * + * mv88e6xxx_switchdev.h + * + * Authors: + * Hans J. Schultz hans.schultz@westermo.com + * + */ + +#ifndef DRIVERS_NET_DSA_MV88E6XXX_MV88E6XXX_SWITCHDEV_H_ +#define DRIVERS_NET_DSA_MV88E6XXX_MV88E6XXX_SWITCHDEV_H_ + +#include <net/switchdev.h> +#include "chip.h" + +#define ATU_LOCKED_ENTRIES_MAX 64 + +struct atu_locked_entry { + struct list_head list; + struct mv88e6xxx_chip *chip; + int port; + u8 mac[ETH_ALEN]; + u16 fid; + u16 vid; + struct timer_list timer; + bool timed_out; +}; + +int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip, + int port, + struct mv88e6xxx_atu_entry *entry, + u16 fid, + u16 type); +bool mv88e6xxx_atu_locked_entry_find_purge(struct dsa_switch *ds, int port, + const unsigned char *addr, u16 vid); +void mv88e6xxx_atu_locked_entry_flush(struct dsa_switch *ds, int port); +void mv88e6xxx_init_violation_handler(struct dsa_switch *ds, int port); +void mv88e6xxx_teardown_violation_handler(struct dsa_switch *ds, int port); + +#endif /* DRIVERS_NET_DSA_MV88E6XXX_MV88E6XXX_SWITCHDEV_H_ */ diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c index 795b3128768f..c4e5e7174129 100644 --- a/drivers/net/dsa/mv88e6xxx/port.c +++ b/drivers/net/dsa/mv88e6xxx/port.c @@ -14,9 +14,11 @@ #include <linux/phylink.h>
#include "chip.h" +#include "global1.h" #include "global2.h" #include "port.h" #include "serdes.h" +#include "mv88e6xxx_switchdev.h"
int mv88e6xxx_port_read(struct mv88e6xxx_chip *chip, int port, int reg, u16 *val) @@ -1239,6 +1241,25 @@ int mv88e6xxx_port_set_mirror(struct mv88e6xxx_chip *chip, int port, return err; }
+bool mv88e6xxx_port_is_locked(struct mv88e6xxx_chip *chip, int port, bool chiplock) +{ + bool locked = false; + u16 reg; + + if (chiplock) + mv88e6xxx_reg_lock(chip); + + if (mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, ®)) + goto out; + locked = reg & MV88E6XXX_PORT_CTL0_SA_FILT_DROP_ON_LOCK; + +out: + if (chiplock) + mv88e6xxx_reg_unlock(chip); + + return locked; +} + int mv88e6xxx_port_set_lock(struct mv88e6xxx_chip *chip, int port, bool locked) { @@ -1261,10 +1282,13 @@ int mv88e6xxx_port_set_lock(struct mv88e6xxx_chip *chip, int port, if (err) return err;
- reg &= ~MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT; - if (locked) - reg |= MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT; - + reg &= MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK; + if (locked) { + reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | + MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | + MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | + MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1; + } return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_ASSOC_VECTOR, reg); }
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h index e0a705d82019..d377abd6ab17 100644 --- a/drivers/net/dsa/mv88e6xxx/port.h +++ b/drivers/net/dsa/mv88e6xxx/port.h @@ -231,6 +231,7 @@ #define MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT 0x2000 #define MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG 0x1000 #define MV88E6XXX_PORT_ASSOC_VECTOR_REFRESH_LOCKED 0x0800 +#define MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK 0x07ff
/* Offset 0x0C: Port ATU Control */ #define MV88E6XXX_PORT_ATU_CTL 0x0c @@ -374,6 +375,7 @@ int mv88e6xxx_port_set_fid(struct mv88e6xxx_chip *chip, int port, u16 fid); int mv88e6xxx_port_get_pvid(struct mv88e6xxx_chip *chip, int port, u16 *pvid); int mv88e6xxx_port_set_pvid(struct mv88e6xxx_chip *chip, int port, u16 pvid);
+bool mv88e6xxx_port_is_locked(struct mv88e6xxx_chip *chip, int port, bool chiplock); int mv88e6xxx_port_set_lock(struct mv88e6xxx_chip *chip, int port, bool locked);
Hi Hans,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Hans-Schultz/Extend-locked-po... base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 677fb7525331375ba2f90f4bc94a80b9b6e697a3 config: x86_64-randconfig-a005 (https://download.01.org/0day-ci/archive/20220525/202205250511.beuDDi9L-lkp@i...) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/5c2d731ec7670b3eb06906c64d66c6... git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Hans-Schultz/Extend-locked-port-feature-with-FDB-locked-flag-MAC-Auth-MAB/20220524-232455 git checkout 5c2d731ec7670b3eb06906c64d66c6098c588a6a # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/dsa/mv88e6xxx/
If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot lkp@intel.com
All warnings (new ones prefixed by >>):
drivers/net/dsa/mv88e6xxx/chip.c:2754:6: warning: variable 'err' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (!locked_found) { ^~~~~~~~~~~~~ drivers/net/dsa/mv88e6xxx/chip.c:2759:9: note: uninitialized use occurs here return err; ^~~ drivers/net/dsa/mv88e6xxx/chip.c:2754:2: note: remove the 'if' if its condition is always true if (!locked_found) { ^~~~~~~~~~~~~~~~~~~ drivers/net/dsa/mv88e6xxx/chip.c:2749:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated.
vim +2754 drivers/net/dsa/mv88e6xxx/chip.c
2742 2743 static int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port, 2744 const unsigned char *addr, u16 vid, 2745 struct dsa_db db) 2746 { 2747 struct mv88e6xxx_chip *chip = ds->priv; 2748 bool locked_found = false; 2749 int err; 2750 2751 if (mv88e6xxx_port_is_locked(chip, port, true)) 2752 locked_found = mv88e6xxx_atu_locked_entry_find_purge(ds, port, addr, vid); 2753
2754 if (!locked_found) {
2755 mv88e6xxx_reg_lock(chip); 2756 err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, 0); 2757 mv88e6xxx_reg_unlock(chip); 2758 } 2759 return err; 2760 } 2761
Hi Vladimir, maybe you have missed my upstreaming of this patch set...
According to our earlier discussions I have now implemented the feature, so that the ATU locked entries are owned by the driver, so to make the entries dynamic I add the entries to a list and use kernel timers to age out the entries as they should be 'dynamic'. See mv88e6xxx_switchdev.c.
Hans
On Tue, May 24, 2022 at 5:22 PM Hans Schultz schultz.hans@gmail.com wrote:
This implementation for the Marvell mv88e6xxx chip series, is based on handling ATU miss violations occurring when packets ingress on a port that is locked. The mac address triggering the ATU miss violation is communicated through switchdev to the bridge module, which adds a fdb entry with the fdb locked flag set. The entry is kept according to the bridges ageing time, thus simulating a dynamic entry.
Note: The locked port must have learning enabled for the ATU miss violation to occur.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
drivers/net/dsa/mv88e6xxx/Makefile | 1 + drivers/net/dsa/mv88e6xxx/chip.c | 40 ++- drivers/net/dsa/mv88e6xxx/chip.h | 5 + drivers/net/dsa/mv88e6xxx/global1.h | 1 + drivers/net/dsa/mv88e6xxx/global1_atu.c | 35 ++- .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c | 249 ++++++++++++++++++ .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h | 40 +++ drivers/net/dsa/mv88e6xxx/port.c | 32 ++- drivers/net/dsa/mv88e6xxx/port.h | 2 + 9 files changed, 389 insertions(+), 16 deletions(-) create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h
Hi Hans,
On Tue, May 24, 2022 at 05:21:43PM +0200, Hans Schultz wrote:
This implementation for the Marvell mv88e6xxx chip series, is based on handling ATU miss violations occurring when packets ingress on a port that is locked. The mac address triggering the ATU miss violation is communicated through switchdev to the bridge module, which adds a fdb entry with the fdb locked flag set. The entry is kept according to the bridges ageing time, thus simulating a dynamic entry.
Note: The locked port must have learning enabled for the ATU miss violation to occur.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
I'm sorry that I couldn't focus on the big picture of this patch, but locking is an absolute disaster and I just stopped after a while, it's really distracting :)
Would you mind addressing the feedback below first, and I'll take another look when you send v4?
drivers/net/dsa/mv88e6xxx/Makefile | 1 + drivers/net/dsa/mv88e6xxx/chip.c | 40 ++- drivers/net/dsa/mv88e6xxx/chip.h | 5 + drivers/net/dsa/mv88e6xxx/global1.h | 1 + drivers/net/dsa/mv88e6xxx/global1_atu.c | 35 ++- .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c | 249 ++++++++++++++++++ .../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h | 40 +++ drivers/net/dsa/mv88e6xxx/port.c | 32 ++- drivers/net/dsa/mv88e6xxx/port.h | 2 + 9 files changed, 389 insertions(+), 16 deletions(-) create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h
diff --git a/drivers/net/dsa/mv88e6xxx/Makefile b/drivers/net/dsa/mv88e6xxx/Makefile index c8eca2b6f959..3ca57709730d 100644 --- a/drivers/net/dsa/mv88e6xxx/Makefile +++ b/drivers/net/dsa/mv88e6xxx/Makefile @@ -15,3 +15,4 @@ mv88e6xxx-objs += port_hidden.o mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_PTP) += ptp.o mv88e6xxx-objs += serdes.o mv88e6xxx-objs += smi.o +mv88e6xxx-objs += mv88e6xxx_switchdev.o \ No newline at end of file diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 5d2c57a7c708..f7a16886bee9 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -42,6 +42,7 @@ #include "ptp.h" #include "serdes.h" #include "smi.h" +#include "mv88e6xxx_switchdev.h" static void assert_reg_lock(struct mv88e6xxx_chip *chip) { @@ -919,6 +920,9 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port);
- else
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
This is superfluous, is it not? The bridge will transition a port whose link goes down to BR_STATE_DISABLED, which will make dsa_port_set_state() fast-age the dynamic FDB entries on the port, which you've already handled below.
} static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port, @@ -1685,6 +1689,9 @@ static void mv88e6xxx_port_fast_age(struct dsa_switch *ds, int port) struct mv88e6xxx_chip *chip = ds->priv; int err;
- if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
Dumb question: if you only flush the locked entries at fast age if the port is locked, then what happens with the existing locked entries if the port becomes unlocked before an FDB flush takes place? Shouldn't mv88e6xxx_port_set_lock() call mv88e6xxx_atu_locked_entry_flush() too?
mv88e6xxx_reg_lock(chip); err = mv88e6xxx_port_fast_age_fid(chip, port, 0); mv88e6xxx_reg_unlock(chip); @@ -1721,11 +1728,11 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid, return err; } -static int mv88e6xxx_vtu_walk(struct mv88e6xxx_chip *chip,
int (*cb)(struct mv88e6xxx_chip *chip,
const struct mv88e6xxx_vtu_entry *entry,
void *priv),
void *priv)
+int mv88e6xxx_vtu_walk(struct mv88e6xxx_chip *chip,
int (*cb)(struct mv88e6xxx_chip *chip,
const struct mv88e6xxx_vtu_entry *entry,
void *priv),
void *priv)
{ struct mv88e6xxx_vtu_entry entry = { .vid = mv88e6xxx_max_vid(chip), @@ -2722,9 +2729,12 @@ static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port, struct mv88e6xxx_chip *chip = ds->priv; int err;
- if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_find_purge(ds, port, addr, vid);
- mv88e6xxx_reg_lock(chip); err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid,
MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
Unrelated and unjustified change.
mv88e6xxx_reg_unlock(chip); return err; @@ -2735,12 +2745,17 @@ static int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port, struct dsa_db db) { struct mv88e6xxx_chip *chip = ds->priv;
- bool locked_found = false; int err;
- mv88e6xxx_reg_lock(chip);
- err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, 0);
- mv88e6xxx_reg_unlock(chip);
- if (mv88e6xxx_port_is_locked(chip, port, true))
locked_found = mv88e6xxx_atu_locked_entry_find_purge(ds, port, addr, vid);
- if (!locked_found) {
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, 0);
mv88e6xxx_reg_unlock(chip);
- } return err;
} @@ -3809,11 +3824,16 @@ static int mv88e6xxx_setup(struct dsa_switch *ds) static int mv88e6xxx_port_setup(struct dsa_switch *ds, int port) {
- return mv88e6xxx_setup_devlink_regions_port(ds, port);
- int err;
- err = mv88e6xxx_setup_devlink_regions_port(ds, port);
- mv88e6xxx_init_violation_handler(ds, port);
What's with this quirky placement? You need to do error checking and call mv88e6xxx_teardown_violation_handler() if setting up the devlink port regions fails, otherwise the port will fail to probe but no one will quiesce its delayed ATU work.
By the way, do all mv88e6xxx switches support 802.1X and MAC Auth Bypass, or do we need to initialize these structures depending on some capability?
- return err;
} static void mv88e6xxx_port_teardown(struct dsa_switch *ds, int port) {
- mv88e6xxx_teardown_violation_handler(ds, port); mv88e6xxx_teardown_devlink_regions_port(ds, port);
} diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h index 5e03cfe50156..c9a8404a6293 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.h +++ b/drivers/net/dsa/mv88e6xxx/chip.h @@ -803,6 +803,11 @@ static inline void mv88e6xxx_reg_unlock(struct mv88e6xxx_chip *chip) mutex_unlock(&chip->reg_lock); } +int mv88e6xxx_vtu_walk(struct mv88e6xxx_chip *chip,
int (*cb)(struct mv88e6xxx_chip *chip,
const struct mv88e6xxx_vtu_entry *entry,
void *priv),
void *priv);
int mv88e6xxx_fid_map(struct mv88e6xxx_chip *chip, unsigned long *bitmap); #endif /* _MV88E6XXX_CHIP_H */ diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h index 65958b2a0d3a..503fbf216670 100644 --- a/drivers/net/dsa/mv88e6xxx/global1.h +++ b/drivers/net/dsa/mv88e6xxx/global1.h @@ -136,6 +136,7 @@ #define MV88E6XXX_G1_ATU_DATA_TRUNK 0x8000 #define MV88E6XXX_G1_ATU_DATA_TRUNK_ID_MASK 0x00f0 #define MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_MASK 0x3ff0 +#define MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_NO_EGRESS 0x0000 #define MV88E6XXX_G1_ATU_DATA_STATE_MASK 0x000f #define MV88E6XXX_G1_ATU_DATA_STATE_UC_UNUSED 0x0000 #define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_1_OLDEST 0x0001 diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c index 40bd67a5c8e9..517376271f64 100644 --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c @@ -12,6 +12,8 @@ #include "chip.h" #include "global1.h" +#include "port.h" +#include "mv88e6xxx_switchdev.h" /* Offset 0x01: ATU FID Register */ @@ -114,6 +116,18 @@ static int mv88e6xxx_g1_atu_op_wait(struct mv88e6xxx_chip *chip) return mv88e6xxx_g1_wait_bit(chip, MV88E6XXX_G1_ATU_OP, bit, 0); } +static int mv88e6xxx_g1_read_atu_violation(struct mv88e6xxx_chip *chip) +{
- int err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_ATU_OP,
MV88E6XXX_G1_ATU_OP_BUSY | MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION);
Split on 3 lines please.
- if (err)
return err;
- return mv88e6xxx_g1_atu_op_wait(chip);
+}
static int mv88e6xxx_g1_atu_op(struct mv88e6xxx_chip *chip, u16 fid, u16 op) { u16 val; @@ -356,11 +370,11 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) int spid; int err; u16 val;
- u16 fid;
mv88e6xxx_reg_lock(chip);
- err = mv88e6xxx_g1_atu_op(chip, 0,
MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION);
- err = mv88e6xxx_g1_read_atu_violation(chip);
I cannot comment on the validity of this change: previously, we were writing FID 0 as part of mv88e6xxx_g1_atu_op(), now we are reading back the FID. Definitely too much going on in a single change, this needs a separate patch with an explanation.
if (err) goto out; @@ -368,6 +382,10 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) if (err) goto out;
- err = mv88e6xxx_g1_read(chip, MV88E6352_G1_ATU_FID, &fid);
- if (err)
goto out;
Is it ok to read the MV88E6352_G1_ATU_FID register from an IRQ handler common for all switches, I wonder?
- err = mv88e6xxx_g1_atu_data_read(chip, &entry); if (err) goto out;
@@ -382,6 +400,11 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) dev_err_ratelimited(chip->dev, "ATU age out violation for %pM\n", entry.mac);
err = mv88e6xxx_handle_violation(chip,
chip->ports[spid].port,
Dumb question: isn't chip->ports[spid].port == spid?
&entry,
fid,
MV88E6XXX_G1_ATU_OP_AGE_OUT_VIOLATION);
This fits on 3 lines instead of 5 (and same below).
} if (val & MV88E6XXX_G1_ATU_OP_MEMBER_VIOLATION) { @@ -396,6 +419,14 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) "ATU miss violation for %pM portvec %x spid %d\n", entry.mac, entry.portvec, spid); chip->ports[spid].atu_miss_violation++;
if (mv88e6xxx_port_is_locked(chip, chip->ports[spid].port, false))
err = mv88e6xxx_handle_violation(chip,
chip->ports[spid].port,
&entry,
fid,
MV88E6XXX_G1_ATU_OP_MISS_VIOLATION);
if (err)
}goto out;
if (val & MV88E6XXX_G1_ATU_OP_FULL_VIOLATION) { diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c new file mode 100644 index 000000000000..8436655ceb9a --- /dev/null +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/*
- mv88e6xxx_switchdev.c
- Authors:
- Hans J. Schultz hans.schultz@westermo.com
- */
+#include <net/switchdev.h> +#include <linux/list.h> +#include "chip.h" +#include "global1.h" +#include "mv88e6xxx_switchdev.h"
+static void mv88e6xxx_atu_locked_entry_timer_work(struct atu_locked_entry *ale)
Please find a more adequate name for this function.
+{
- struct switchdev_notifier_fdb_info info = {
.addr = ale->mac,
.vid = ale->vid,
.added_by_user = false,
.is_local = false,
No need to have an initializer for the false members.
.offloaded = true,
.locked = true,
- };
- struct mv88e6xxx_atu_entry entry;
- struct net_device *brport;
- struct dsa_port *dp;
- entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UC_UNUSED;
- entry.trunk = false;
- memcpy(&entry.mac, &ale->mac, ETH_ALEN);
ether_addr_copy
- mv88e6xxx_reg_lock(ale->chip);
- mv88e6xxx_g1_atu_loadpurge(ale->chip, ale->fid, &entry);
The portvec will be junk memory that's on stack, is that what you want?
- mv88e6xxx_reg_unlock(ale->chip);
- dp = dsa_to_port(ale->chip->ds, ale->port);
- brport = dsa_port_to_bridge_port(dp);
- if (brport) {
if (!rtnl_is_locked()) {
rtnl_lock();
No, no, no, no, no, no, no.
As I've explained already: https://patchwork.kernel.org/project/netdevbpf/patch/20220317093902.1305816-... dsa_port_to_bridge_port() needs to be called with the rtnl_mutex held.
Please take a moment to figure out which function expects which lock and for what operation, then draw a call graph, figure out a consistent lock hierarchy where things are always acquired in the same order, and if a function needs a locking context but not all callers offer it, put an ASSERT_RTNL() (for example) and transfer the locking responsibility to the caller.
Doing this will also help you name your functions better than "locked entry timer work" (which are called from... drum roll... mv88e6xxx_port_fdb_del and mv88e6xxx_port_fast_age).
Which by the way, reminds me that..... You can't take rtnl_lock() from port_fdb_add() and port_fdb_del(), see commits d7d0d423dbaa ("net: dsa: flush switchdev workqueue when leaving the bridge") and 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), as you'll deadlock with dsa_port_pre_bridge_leave(). In fact you never could, but for a slightly different reason.
From the discussion with Ido and Nikolay I get the impression that you're not doing the right thing here either, notifying a SWITCHDEV_FDB_DEL_TO_BRIDGE from what is effectively the SWITCHDEV_FDB_DEL_TO_DEVICE handler (port_fdb_del).
call_switchdev_notifiers(SWITCHDEV_FDB_DEL_TO_BRIDGE,
brport, &info.info, NULL);
rtnl_unlock();
} else {
call_switchdev_notifiers(SWITCHDEV_FDB_DEL_TO_BRIDGE,
brport, &info.info, NULL);
}
- } else {
dev_err(ale->chip->dev, "ERR: No bridge port for dsa port belonging to port %d\n",
ale->port);
- }
+}
+static inline void mv88e6xxx_atu_locked_entry_purge(struct atu_locked_entry *ale)
No inline functions in .c files.
+{
- mv88e6xxx_atu_locked_entry_timer_work(ale);
- del_timer(&ale->timer);
- list_del(&ale->list);
- kfree(ale);
+}
+static void mv88e6xxx_atu_locked_entry_cleanup(struct work_struct *work) +{
- struct dsa_port *dp = container_of(work, struct dsa_port, atu_work.work);
- struct atu_locked_entry *ale, *tmp;
- mutex_lock(&dp->locked_entries_list_lock);
- list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) {
if (ale->timed_out) {
mv88e6xxx_atu_locked_entry_purge(ale);
Nasty lock ordering inversion. In mv88e6xxx_handle_violation() we take &dp->locked_entries_list_lock with mv88e6xxx_reg_lock() held. Here (in mv88e6xxx_atu_locked_entry_timer_work called from here) we take mv88e6xxx_reg_lock() with &dp->locked_entries_list_lock held.
atomic_dec(&dp->atu_locked_entry_cnt);
}
- }
- mutex_unlock(&dp->locked_entries_list_lock);
- mod_delayed_work(system_long_wq, &dp->atu_work, msecs_to_jiffies(100));
+}
+static void mv88e6xxx_atu_locked_entry_timer_handler(struct timer_list *t) +{
- struct atu_locked_entry *ale = from_timer(ale, t, timer);
- if (ale)
ale->timed_out = true;
+}
+struct mv88e6xxx_fid_search_ctx {
- u16 fid_search;
- u16 vid_found;
+};
+static int mv88e6xxx_find_vid_on_matching_fid(struct mv88e6xxx_chip *chip,
const struct mv88e6xxx_vtu_entry *entry,
void *priv)
+{
- struct mv88e6xxx_fid_search_ctx *ctx = priv;
- if (ctx->fid_search == entry->fid) {
ctx->vid_found = entry->vid;
return 1;
- }
- return 0;
+}
+int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip,
int port,
struct mv88e6xxx_atu_entry *entry,
u16 fid,
u16 type)
+{
- struct switchdev_notifier_fdb_info info = {
.addr = entry->mac,
.vid = 0,
.added_by_user = false,
.is_local = false,
.offloaded = true,
.locked = true,
- };
- struct atu_locked_entry *locked_entry;
- struct mv88e6xxx_fid_search_ctx ctx;
- struct net_device *brport;
- struct dsa_port *dp;
- int err;
- ctx.fid_search = fid;
- err = mv88e6xxx_vtu_walk(chip, mv88e6xxx_find_vid_on_matching_fid, &ctx);
- if (err < 0)
return err;
- if (err == 1)
info.vid = ctx.vid_found;
- else
return -ENODATA;
- dp = dsa_to_port(chip->ds, port);
- brport = dsa_port_to_bridge_port(dp);
- if (!brport)
return -ENODEV;
- switch (type) {
- case MV88E6XXX_G1_ATU_OP_MISS_VIOLATION:
if (atomic_read(&dp->atu_locked_entry_cnt) >= ATU_LOCKED_ENTRIES_MAX) {
mv88e6xxx_reg_unlock(chip);
You call mv88e6xxx_reg_lock() from mv88e6xxx_g1_atu_prob_irq_thread_fn() and mv88e6xxx_reg_unlock() from mv88e6xxx_handle_violation()? Nice!
And I understand why that is: to avoid a lock ordering inversion with rtnl_lock(). Just unlock the mv88e6xxx registers after the last hardware access in mv88e6xxx_g1_atu_prob_irq_thread_fn() - after mv88e6xxx_g1_atu_mac_read(), and call mv88e6xxx_handle_violation() with the registers unlocked, and lock them when you need them.
return 0;
}
entry->portvec = MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_NO_EGRESS;
entry->state = MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC;
entry->trunk = false;
err = mv88e6xxx_g1_atu_loadpurge(chip, fid, entry);
if (err)
goto fail;
locked_entry = kmalloc(sizeof(*locked_entry), GFP_ATOMIC);
Please be consistent in your naming of struct atu_locked_entry variables, be they "locked_entry" or "ale" or otherwise. And please create a helper function that creates such a structure and initializes it.
if (!locked_entry)
return -ENOMEM;
timer_setup(&locked_entry->timer, mv88e6xxx_atu_locked_entry_timer_handler, 0);
Does this have to be a dedicated timer per entry, or can you just record the "jiffies" at creation time per locked entry, and compare it with the current jiffies from the periodic, sleepable mv88e6xxx_atu_locked_entry_cleanup?
locked_entry->timer.expires = jiffies + dp->ageing_time / 10;
locked_entry->chip = chip;
locked_entry->port = port;
locked_entry->fid = fid;
locked_entry->vid = info.vid;
locked_entry->timed_out = false;
memcpy(&locked_entry->mac, entry->mac, ETH_ALEN);
mutex_lock(&dp->locked_entries_list_lock);
add_timer(&locked_entry->timer);
list_add(&locked_entry->list, &dp->atu_locked_entries_list);
mutex_unlock(&dp->locked_entries_list_lock);
atomic_inc(&dp->atu_locked_entry_cnt);
mv88e6xxx_reg_unlock(chip);
rtnl_lock();
err = call_switchdev_notifiers(SWITCHDEV_FDB_ADD_TO_BRIDGE,
brport, &info.info, NULL);
break;
- }
- rtnl_unlock();
Why is the rtnl_unlock() outside the switch statement but the rtnl_lock() inside? Not to mention, the dsa_port_to_bridge_port() call needs to be under rtnl_lock().
- return err;
+fail:
- mv88e6xxx_reg_unlock(chip);
- return err;
+}
+bool mv88e6xxx_atu_locked_entry_find_purge(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid)
+{
- struct dsa_port *dp = dsa_to_port(ds, port);
- struct atu_locked_entry *ale, *tmp;
- bool found = false;
- mutex_lock(&dp->locked_entries_list_lock);
- list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) {
if (!memcmp(&ale->mac, addr, ETH_ALEN)) {
if (ale->vid == vid) {
mv88e6xxx_atu_locked_entry_purge(ale);
atomic_dec(&dp->atu_locked_entry_cnt);
found = true;
break;
}
}
- }
- mutex_unlock(&dp->locked_entries_list_lock);
- return found;
+}
+void mv88e6xxx_atu_locked_entry_flush(struct dsa_switch *ds, int port) +{
- struct dsa_port *dp = dsa_to_port(ds, port);
- struct atu_locked_entry *ale, *tmp;
- mutex_lock(&dp->locked_entries_list_lock);
- list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) {
mv88e6xxx_atu_locked_entry_purge(ale);
atomic_dec(&dp->atu_locked_entry_cnt);
- }
- mutex_unlock(&dp->locked_entries_list_lock);
- if (atomic_read(&dp->atu_locked_entry_cnt) != 0)
dev_err(ds->dev,
"ERROR: Locked entries count is not zero after flush on port %d\n",
port);
And generally speaking, why would you expect it to be 0, since there's nothing that stops this check from racing with mv88e6xxx_handle_violation? I suspect that if locking is properly thought through, the atu_locked_entry_cnt can just be a plain int instead of an improperly used atomic.
Also, random fact: no need to say ERROR when printing with the KERN_ERR log level. It's kind of implied from the log level.
+}
+void mv88e6xxx_init_violation_handler(struct dsa_switch *ds, int port) +{
- struct dsa_port *dp = dsa_to_port(ds, port);
- INIT_LIST_HEAD(&dp->atu_locked_entries_list);
- mutex_init(&dp->locked_entries_list_lock);
- dp->atu_locked_entry_cnt.counter = 0;
atomic_set()
- INIT_DELAYED_WORK(&dp->atu_work, mv88e6xxx_atu_locked_entry_cleanup);
- mod_delayed_work(system_long_wq, &dp->atu_work, msecs_to_jiffies(100));
+}
+void mv88e6xxx_teardown_violation_handler(struct dsa_switch *ds, int port) +{
- struct dsa_port *dp = dsa_to_port(ds, port);
- cancel_delayed_work(&dp->atu_work);
- mv88e6xxx_atu_locked_entry_flush(ds, port);
- mutex_destroy(&dp->locked_entries_list_lock);
+} diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h
This and mv88e6xxx_switchdev.c are the only source files belonging to this driver which have the mv88e6xxx_ prefix (others are "chip.c" etc). Can you please follow the convention?
new file mode 100644 index 000000000000..f0e7abf7c361 --- /dev/null +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later
- mv88e6xxx_switchdev.h
- Authors:
- Hans J. Schultz hans.schultz@westermo.com
- */
+#ifndef DRIVERS_NET_DSA_MV88E6XXX_MV88E6XXX_SWITCHDEV_H_ +#define DRIVERS_NET_DSA_MV88E6XXX_MV88E6XXX_SWITCHDEV_H_
+#include <net/switchdev.h> +#include "chip.h"
+#define ATU_LOCKED_ENTRIES_MAX 64
+struct atu_locked_entry {
mv88e6xxx driver specific structure names should be prefixed with mv88e6xxx_.
- struct list_head list;
- struct mv88e6xxx_chip *chip;
- int port;
- u8 mac[ETH_ALEN];
Either align everything with tabs, or nothing.
- u16 fid;
- u16 vid;
- struct timer_list timer;
- bool timed_out;
+};
+int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip,
int port,
struct mv88e6xxx_atu_entry *entry,
u16 fid,
u16 type);
Both this and the function definition can easily fit on 3 lines.
+bool mv88e6xxx_atu_locked_entry_find_purge(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
+void mv88e6xxx_atu_locked_entry_flush(struct dsa_switch *ds, int port); +void mv88e6xxx_init_violation_handler(struct dsa_switch *ds, int port); +void mv88e6xxx_teardown_violation_handler(struct dsa_switch *ds, int port);
+#endif /* DRIVERS_NET_DSA_MV88E6XXX_MV88E6XXX_SWITCHDEV_H_ */ diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c index 795b3128768f..c4e5e7174129 100644 --- a/drivers/net/dsa/mv88e6xxx/port.c +++ b/drivers/net/dsa/mv88e6xxx/port.c @@ -14,9 +14,11 @@ #include <linux/phylink.h> #include "chip.h" +#include "global1.h" #include "global2.h" #include "port.h" #include "serdes.h" +#include "mv88e6xxx_switchdev.h" int mv88e6xxx_port_read(struct mv88e6xxx_chip *chip, int port, int reg, u16 *val) @@ -1239,6 +1241,25 @@ int mv88e6xxx_port_set_mirror(struct mv88e6xxx_chip *chip, int port, return err; } +bool mv88e6xxx_port_is_locked(struct mv88e6xxx_chip *chip, int port, bool chiplock) +{
- bool locked = false;
- u16 reg;
- if (chiplock)
mv88e6xxx_reg_lock(chip);
Please, no "if (chiplock) mutex_lock()" hacks. Just lockdep_assert_held(&chip->reg_lock), which serves both for documentation and for validation purposes, ensure the lock is always taken at the caller (which in this case is super easy) and move on.
- if (mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, ®))
goto out;
It would be good to actually propagate the error to the caller and "locked" via a pass-by-reference bool pointer argument, not just say that I/O errors mean that the port is unlocked.
- locked = reg & MV88E6XXX_PORT_CTL0_SA_FILT_DROP_ON_LOCK;
+out:
- if (chiplock)
mv88e6xxx_reg_unlock(chip);
- return locked;
+}
int mv88e6xxx_port_set_lock(struct mv88e6xxx_chip *chip, int port, bool locked) { @@ -1261,10 +1282,13 @@ int mv88e6xxx_port_set_lock(struct mv88e6xxx_chip *chip, int port, if (err) return err;
- reg &= ~MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT;
- if (locked)
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT;
- reg &= MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK;
- if (locked) {
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG |
MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT |
MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT |
MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I'd suggest aligning these macros vertically.
- } return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_ASSOC_VECTOR, reg);
} diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h index e0a705d82019..d377abd6ab17 100644 --- a/drivers/net/dsa/mv88e6xxx/port.h +++ b/drivers/net/dsa/mv88e6xxx/port.h @@ -231,6 +231,7 @@ #define MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT 0x2000 #define MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG 0x1000 #define MV88E6XXX_PORT_ASSOC_VECTOR_REFRESH_LOCKED 0x0800 +#define MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK 0x07ff /* Offset 0x0C: Port ATU Control */ #define MV88E6XXX_PORT_ATU_CTL 0x0c @@ -374,6 +375,7 @@ int mv88e6xxx_port_set_fid(struct mv88e6xxx_chip *chip, int port, u16 fid); int mv88e6xxx_port_get_pvid(struct mv88e6xxx_chip *chip, int port, u16 *pvid); int mv88e6xxx_port_set_pvid(struct mv88e6xxx_chip *chip, int port, u16 pvid); +bool mv88e6xxx_port_is_locked(struct mv88e6xxx_chip *chip, int port, bool chiplock); int mv88e6xxx_port_set_lock(struct mv88e6xxx_chip *chip, int port, bool locked); -- 2.30.2
On Mon, Jun 27, 2022 at 8:06 PM Vladimir Oltean olteanv@gmail.com wrote:
Hi Hans,
On Tue, May 24, 2022 at 05:21:43PM +0200, Hans Schultz wrote:
This implementation for the Marvell mv88e6xxx chip series, is based on handling ATU miss violations occurring when packets ingress on a port that is locked. The mac address triggering the ATU miss violation is communicated through switchdev to the bridge module, which adds a fdb entry with the fdb locked flag set. The entry is kept according to the bridges ageing time, thus simulating a dynamic entry.
Note: The locked port must have learning enabled for the ATU miss violation to occur.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
I'm sorry that I couldn't focus on the big picture of this patch, but locking is an absolute disaster and I just stopped after a while, it's really distracting :)
The code works, but I think that we should "undisaster" it. :)
Would you mind addressing the feedback below first, and I'll take another look when you send v4?
fine :)
if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port);
else
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
This is superfluous, is it not? The bridge will transition a port whose link goes down to BR_STATE_DISABLED, which will make dsa_port_set_state() fast-age the dynamic FDB entries on the port, which you've already handled below.
I guess you are right.
}
static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port, @@ -1685,6 +1689,9 @@ static void mv88e6xxx_port_fast_age(struct dsa_switch *ds, int port) struct mv88e6xxx_chip *chip = ds->priv; int err;
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
Dumb question: if you only flush the locked entries at fast age if the port is locked, then what happens with the existing locked entries if the port becomes unlocked before an FDB flush takes place? Shouldn't mv88e6xxx_port_set_lock() call mv88e6xxx_atu_locked_entry_flush() too?
That was my first thought too, but the way the flags are handled with the mask etc, does so that mv88e6xxx_port_set_lock() is called when other flags change. It could be done by the transition from locked->unlocked by checking if the port is locked already. On the other hand, the timers will timeout and the entries will be removed anyhow.
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_find_purge(ds, port, addr, vid);
mv88e6xxx_reg_lock(chip); err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid,
MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
Unrelated and unjustified change.
Ups, missed that one.
static int mv88e6xxx_port_setup(struct dsa_switch *ds, int port) {
return mv88e6xxx_setup_devlink_regions_port(ds, port);
int err;
err = mv88e6xxx_setup_devlink_regions_port(ds, port);
mv88e6xxx_init_violation_handler(ds, port);
What's with this quirky placement? You need to do error checking and call mv88e6xxx_teardown_violation_handler() if setting up the devlink port regions fails, otherwise the port will fail to probe but no one will quiesce its delayed ATU work.
Yes, of course.
By the way, do all mv88e6xxx switches support 802.1X and MAC Auth Bypass, or do we need to initialize these structures depending on some capability?
I will have to look into that, but I think they all do support these features.
err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_ATU_OP,
MV88E6XXX_G1_ATU_OP_BUSY | MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION);
Split on 3 lines please.
OK.
if (err)
return err;
return mv88e6xxx_g1_atu_op_wait(chip);
+}
static int mv88e6xxx_g1_atu_op(struct mv88e6xxx_chip *chip, u16 fid, u16 op) { u16 val; @@ -356,11 +370,11 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) int spid; int err; u16 val;
u16 fid; mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_g1_atu_op(chip, 0,
MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION);
err = mv88e6xxx_g1_read_atu_violation(chip);
I cannot comment on the validity of this change: previously, we were writing FID 0 as part of mv88e6xxx_g1_atu_op(), now we are reading back the FID. Definitely too much going on in a single change, this needs a separate patch with an explanation.
It is of course needed to read the fid and I couldn't really understand the reasoning behind how it was before, but I will do as you say as best I can.
if (err) goto out;
@@ -368,6 +382,10 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) if (err) goto out;
err = mv88e6xxx_g1_read(chip, MV88E6352_G1_ATU_FID, &fid);
if (err)
goto out;
Is it ok to read the MV88E6352_G1_ATU_FID register from an IRQ handler common for all switches, I wonder?
I don't know about the naming of this define (I probably overlooked the 6352 part), but it is the same as I have in the spec for 6097, and I don't see any alternative...
err = mv88e6xxx_g1_atu_data_read(chip, &entry); if (err) goto out;
@@ -382,6 +400,11 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) dev_err_ratelimited(chip->dev, "ATU age out violation for %pM\n", entry.mac);
err = mv88e6xxx_handle_violation(chip,
chip->ports[spid].port,
Dumb question: isn't chip->ports[spid].port == spid?
Probably you are right.
&entry,
fid,
MV88E6XXX_G1_ATU_OP_AGE_OUT_VIOLATION);
This fits on 3 lines instead of 5 (and same below).
OK
+static void mv88e6xxx_atu_locked_entry_timer_work(struct atu_locked_entry *ale)
Please find a more adequate name for this function.
Any suggestions?
+{
struct switchdev_notifier_fdb_info info = {
.addr = ale->mac,
.vid = ale->vid,
.added_by_user = false,
.is_local = false,
No need to have an initializer for the false members.
OK
.offloaded = true,
.locked = true,
};
struct mv88e6xxx_atu_entry entry;
struct net_device *brport;
struct dsa_port *dp;
entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UC_UNUSED;
entry.trunk = false;
memcpy(&entry.mac, &ale->mac, ETH_ALEN);
ether_addr_copy
mv88e6xxx_reg_lock(ale->chip);
mv88e6xxx_g1_atu_loadpurge(ale->chip, ale->fid, &entry);
The portvec will be junk memory that's on stack, is that what you want?
Probably not what I want.
if (brport) {
if (!rtnl_is_locked()) {
rtnl_lock();
No, no, no, no, no, no, no.
As I've explained already: https://patchwork.kernel.org/project/netdevbpf/patch/20220317093902.1305816-... dsa_port_to_bridge_port() needs to be called with the rtnl_mutex held.
Please take a moment to figure out which function expects which lock and for what operation, then draw a call graph, figure out a consistent lock hierarchy where things are always acquired in the same order, and if a function needs a locking context but not all callers offer it, put an ASSERT_RTNL() (for example) and transfer the locking responsibility to the caller.
As I remember it was because mv88e6xxx_atu_locked_entry_flush() was called both with and without the lock, but there was something I didn't know about how link down handling works.
Doing this will also help you name your functions better than "locked entry timer work" (which are called from... drum roll... mv88e6xxx_port_fdb_del and mv88e6xxx_port_fast_age).
Which by the way, reminds me that..... You can't take rtnl_lock() from port_fdb_add() and port_fdb_del(), see commits d7d0d423dbaa ("net: dsa: flush switchdev workqueue when leaving the bridge") and 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), as you'll deadlock with dsa_port_pre_bridge_leave(). In fact you never could, but for a slightly different reason.
From the discussion with Ido and Nikolay I get the impression that you're not doing the right thing here either, notifying a SWITCHDEV_FDB_DEL_TO_BRIDGE from what is effectively the SWITCHDEV_FDB_DEL_TO_DEVICE handler (port_fdb_del).
Hmm, my experience tells me that much is opposite the normal conventions when dealing with locked ports, as there was never switchdev notifications from the driver to the bridge before, but that is needed to keep ATU and FDB entries in sync.
No inline functions in .c files.
OK
Nasty lock ordering inversion. In mv88e6xxx_handle_violation() we take &dp->locked_entries_list_lock with mv88e6xxx_reg_lock() held. Here (in mv88e6xxx_atu_locked_entry_timer_work called from here) we take mv88e6xxx_reg_lock() with &dp->locked_entries_list_lock held.
I will look into that.
switch (type) {
case MV88E6XXX_G1_ATU_OP_MISS_VIOLATION:
if (atomic_read(&dp->atu_locked_entry_cnt) >= ATU_LOCKED_ENTRIES_MAX) {
mv88e6xxx_reg_unlock(chip);
You call mv88e6xxx_reg_lock() from mv88e6xxx_g1_atu_prob_irq_thread_fn() and mv88e6xxx_reg_unlock() from mv88e6xxx_handle_violation()? Nice!
And I understand why that is: to avoid a lock ordering inversion with rtnl_lock(). Just unlock the mv88e6xxx registers after the last hardware access in mv88e6xxx_g1_atu_prob_irq_thread_fn() - after mv88e6xxx_g1_atu_mac_read(), and call mv88e6xxx_handle_violation() with the registers unlocked, and lock them when you need them.
OK.
locked_entry = kmalloc(sizeof(*locked_entry), GFP_ATOMIC);
Please be consistent in your naming of struct atu_locked_entry variables, be they "locked_entry" or "ale" or otherwise. And please create a helper function that creates such a structure and initializes it.
OK
if (!locked_entry)
return -ENOMEM;
timer_setup(&locked_entry->timer, mv88e6xxx_atu_locked_entry_timer_handler, 0);
Does this have to be a dedicated timer per entry, or can you just record the "jiffies" at creation time per locked entry, and compare it with the current jiffies from the periodic, sleepable mv88e6xxx_atu_locked_entry_cleanup?
I think that approach should be sufficient too.
Why is the rtnl_unlock() outside the switch statement but the rtnl_lock() inside? Not to mention, the dsa_port_to_bridge_port() call needs to be under rtnl_lock().
Just a small optimization as I also have another case of the switch (only one switch case if you didn't notice) belonging to the next patch set regarding dynamic ATU entries.
+void mv88e6xxx_atu_locked_entry_flush(struct dsa_switch *ds, int port) +{
struct dsa_port *dp = dsa_to_port(ds, port);
struct atu_locked_entry *ale, *tmp;
mutex_lock(&dp->locked_entries_list_lock);
list_for_each_entry_safe(ale, tmp, &dp->atu_locked_entries_list, list) {
mv88e6xxx_atu_locked_entry_purge(ale);
atomic_dec(&dp->atu_locked_entry_cnt);
}
mutex_unlock(&dp->locked_entries_list_lock);
if (atomic_read(&dp->atu_locked_entry_cnt) != 0)
dev_err(ds->dev,
"ERROR: Locked entries count is not zero after flush on port %d\n",
port);
And generally speaking, why would you expect it to be 0, since there's nothing that stops this check from racing with mv88e6xxx_handle_violation?
I guess you are right that when setting the port STP state to BLOCKED, there is the potential race you mention.
Also, random fact: no need to say ERROR when printing with the KERN_ERR log level. It's kind of implied from the log level.
Of course.
dp->atu_locked_entry_cnt.counter = 0;
atomic_set()
Right!
This and mv88e6xxx_switchdev.c are the only source files belonging to this driver which have the mv88e6xxx_ prefix (others are "chip.c" etc). Can you please follow the convention?
Yes. I think I got that idea from some other driver, thus avoiding switchdev.h file, but I will change it.
+struct atu_locked_entry {
mv88e6xxx driver specific structure names should be prefixed with mv88e6xxx_.
OK
u8 mac[ETH_ALEN];
Either align everything with tabs, or nothing.
Ups.
+int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip,
int port,
struct mv88e6xxx_atu_entry *entry,
u16 fid,
u16 type);
Both this and the function definition can easily fit on 3 lines.
OK
Please, no "if (chiplock) mutex_lock()" hacks. Just lockdep_assert_held(&chip->reg_lock), which serves both for documentation and for validation purposes, ensure the lock is always taken at the caller (which in this case is super easy) and move on.
As I am calling the function in if statement checks, it would make that code more messy, while with this approach the function can be called from anywhere. I also looked at having two functions, with one being a wrapper function taking the lock and calling the other...
if (mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, ®))
goto out;
It would be good to actually propagate the error to the caller and "locked" via a pass-by-reference bool pointer argument, not just say that I/O errors mean that the port is unlocked.
Again the wish to be able to call it from if statement checks,.
reg &= MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK;
if (locked) {
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG |
MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT |
MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT |
MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I'd suggest aligning these macros vertically.
They are according to the Linux kernel coding standard wrt indentation afaik.
Hi, does anybody know what it going on with this variable? struct dsa_port *dp ->ageing_time;
I experience that it changes value like a factor ~10 at times.
On Tue, Jul 05, 2022 at 05:05:52PM +0200, Hans S wrote:
Hi, does anybody know what it going on with this variable? struct dsa_port *dp ->ageing_time;
I experience that it changes value like a factor ~10 at times.
Could you be a bit more specific? Are you talking about STP Topology Change Notification BPDUs, which trigger this code path?
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c index 7d27b2e6038f..9b25bc2dcb3e 100644 --- a/net/bridge/br_stp.c +++ b/net/bridge/br_stp.c @@ -671,10 +671,10 @@ void __br_set_topology_change(struct net_bridge *br, unsigned char val)
if (val) { t = 2 * br->forward_delay; - br_debug(br, "decreasing ageing time to %lu\n", t); + br_info(br, "decreasing ageing time to %lu\n", t); } else { t = br->bridge_ageing_time; - br_debug(br, "restoring ageing time to %lu\n", t); + br_info(br, "restoring ageing time to %lu\n", t); }
err = __set_ageing_time(br->dev, t);
Coincidentally the default values of 2 * br->forward_delay and br->bridge_ageing_time are 1 order of magnitude apart from each other.
[ 139.998310] br0: topology change detected, propagating [ 140.003490] br0: decreasing ageing time to 3000 [ 175.193054] br0: restoring ageing time to 30000
What's the problem anyway?
On Wed, Jul 6, 2022 at 3:28 PM Vladimir Oltean olteanv@gmail.com wrote:
On Tue, Jul 05, 2022 at 05:05:52PM +0200, Hans S wrote:
Hi, does anybody know what it going on with this variable? struct dsa_port *dp ->ageing_time;
I experience that it changes value like a factor ~10 at times.
Could you be a bit more specific? Are you talking about STP Topology Change Notification BPDUs, which trigger this code path?
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c index 7d27b2e6038f..9b25bc2dcb3e 100644 --- a/net/bridge/br_stp.c +++ b/net/bridge/br_stp.c @@ -671,10 +671,10 @@ void __br_set_topology_change(struct net_bridge *br, unsigned char val)
if (val) { t = 2 * br->forward_delay;
br_debug(br, "decreasing ageing time to %lu\n", t);
br_info(br, "decreasing ageing time to %lu\n", t); } else { t = br->bridge_ageing_time;
br_debug(br, "restoring ageing time to %lu\n", t);
br_info(br, "restoring ageing time to %lu\n", t); } err = __set_ageing_time(br->dev, t);
Coincidentally the default values of 2 * br->forward_delay and br->bridge_ageing_time are 1 order of magnitude apart from each other.
[ 139.998310] br0: topology change detected, propagating [ 140.003490] br0: decreasing ageing time to 3000 [ 175.193054] br0: restoring ageing time to 30000
What's the problem anyway?
It might be a topology change as you indicate, though I am not sure. So I am not using that variable any more for determining the ageing time for the locked FDB entries, but instead I have made a function to read the time from the chip instead.
The problem with that, I have mentioned in my latest reply to the mac-auth patch set...
On Tue, Jun 28, 2022 at 02:26:43PM +0200, Hans S wrote:
Dumb question: if you only flush the locked entries at fast age if the port is locked, then what happens with the existing locked entries if the port becomes unlocked before an FDB flush takes place? Shouldn't mv88e6xxx_port_set_lock() call mv88e6xxx_atu_locked_entry_flush() too?
That was my first thought too, but the way the flags are handled with the mask etc, does so that mv88e6xxx_port_set_lock() is called when other flags change. It could be done by the transition from locked->unlocked by checking if the port is locked already.
Why does mv88e6xxx_port_set_lock() get called when other flags change?
On the other hand, the timers will timeout and the entries will be removed anyhow.
+static void mv88e6xxx_atu_locked_entry_timer_work(struct atu_locked_entry *ale)
Please find a more adequate name for this function.
Any suggestions?
Not sure. It depends on whether you leave just the logic to delete a locked ATU entry, or also the switchdev FDB_DEL_TO_BRIDGE notifier. In any case, pick a name that reflects what it does. Something with locked_entry_delete() can't be too wrong.
From the discussion with Ido and Nikolay I get the impression that you're not doing the right thing here either, notifying a SWITCHDEV_FDB_DEL_TO_BRIDGE from what is effectively the SWITCHDEV_FDB_DEL_TO_DEVICE handler (port_fdb_del).
Hmm, my experience tells me that much is opposite the normal conventions when dealing with locked ports, as there was never switchdev notifications from the driver to the bridge before, but that is needed to keep ATU and FDB entries in sync.
On delete you mean? So the bridge signals switchdev a deletion of a locked FDB entry (as I pointed out, this function gets indirectly called from port_fdb_del), but it won't get deleted until switchdev signals it back, is what you're saying?
Why is the rtnl_unlock() outside the switch statement but the rtnl_lock() inside? Not to mention, the dsa_port_to_bridge_port() call needs to be under rtnl_lock().
Just a small optimization as I also have another case of the switch (only one switch case if you didn't notice) belonging to the next patch set regarding dynamic ATU entries.
What kind of optimization are you even talking about? Please get rid of coding patterns like this, sorry.
Please, no "if (chiplock) mutex_lock()" hacks. Just lockdep_assert_held(&chip->reg_lock), which serves both for documentation and for validation purposes, ensure the lock is always taken at the caller (which in this case is super easy) and move on.
As I am calling the function in if statement checks, it would make that code more messy, while with this approach the function can be called from anywhere. I also looked at having two functions, with one being a wrapper function taking the lock and calling the other...
There are many functions in mv88e6xxx that require the reg_lock to be held, there's nothing new or special here.
if (mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, ®))
goto out;
It would be good to actually propagate the error to the caller and "locked" via a pass-by-reference bool pointer argument, not just say that I/O errors mean that the port is unlocked.
Again the wish to be able to call it from if statement checks,.
reg &= MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK;
if (locked) {
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG |
MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT |
MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT |
MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I'd suggest aligning these macros vertically.
They are according to the Linux kernel coding standard wrt indentation afaik.
Compare:
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
with:
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
On Wed, Jul 6, 2022 at 10:56 AM Vladimir Oltean olteanv@gmail.com wrote:
On Tue, Jun 28, 2022 at 02:26:43PM +0200, Hans S wrote:
Dumb question: if you only flush the locked entries at fast age if the port is locked, then what happens with the existing locked entries if the port becomes unlocked before an FDB flush takes place? Shouldn't mv88e6xxx_port_set_lock() call mv88e6xxx_atu_locked_entry_flush() too?
BTW:
@@ -919,6 +920,9 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port);
else
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
This is superfluous, is it not? The bridge will transition a port whose link goes down to BR_STATE_DISABLED, which will make dsa_port_set_state() fast-age the dynamic FDB entries on the port, which you've already handled below.
I removed this code, but then on link down the locked entries were not cleared out. Something not as thought?
That was my first thought too, but the way the flags are handled with the mask etc, does so that mv88e6xxx_port_set_lock() is called when other flags change. It could be done by the transition from locked->unlocked by checking if the port is locked already.
Why does mv88e6xxx_port_set_lock() get called when other flags change?
That is what seems to happen, but maybe I am wrong. Looking at the dsa functions dsa_port_inherit_brport_flags() and dsa_port_clear_brport_flags(), they set the mask for which underlying function is called, and it looks to me that they call once for all the flags: BR_LEARNING, BR_FLOOD, BR_MCAST_FLOOD, BR_BCAST_FLOOD, BR_PORT_LOCKED?
On the other hand, the timers will timeout and the entries will be removed anyhow.
+static void mv88e6xxx_atu_locked_entry_timer_work(struct atu_locked_entry *ale)
Please find a more adequate name for this function.
Any suggestions?
Not sure. It depends on whether you leave just the logic to delete a locked ATU entry, or also the switchdev FDB_DEL_TO_BRIDGE notifier. In any case, pick a name that reflects what it does. Something with locked_entry_delete() can't be too wrong.
From the discussion with Ido and Nikolay I get the impression that you're not doing the right thing here either, notifying a SWITCHDEV_FDB_DEL_TO_BRIDGE from what is effectively the SWITCHDEV_FDB_DEL_TO_DEVICE handler (port_fdb_del).
Hmm, my experience tells me that much is opposite the normal conventions when dealing with locked ports, as there was never switchdev notifications from the driver to the bridge before, but that is needed to keep ATU and FDB entries in sync.
On delete you mean? So the bridge signals switchdev a deletion of a locked FDB entry (as I pointed out, this function gets indirectly called from port_fdb_del), but it won't get deleted until switchdev signals it back, is what you're saying?
When added they are added with bridge FDB flags: extern_learn offload locked, with a SWITCHDEV_FDB_ADD_TO_BRIDGE event. So they are owned by the driver. When the driver deletes the locked entry the bridge FDB entry is removes by the SWITCHDEV_FDB_DEL_TO_BRIDGE event from the driver. That seems quite fair?
Why is the rtnl_unlock() outside the switch statement but the rtnl_lock() inside? Not to mention, the dsa_port_to_bridge_port() call needs to be under rtnl_lock().
Just a small optimization as I also have another case of the switch (only one switch case if you didn't notice) belonging to the next patch set regarding dynamic ATU entries.
What kind of optimization are you even talking about? Please get rid of coding patterns like this, sorry.
Right!
Please, no "if (chiplock) mutex_lock()" hacks. Just lockdep_assert_held(&chip->reg_lock), which serves both for documentation and for validation purposes, ensure the lock is always taken at the caller (which in this case is super easy) and move on.
As I am calling the function in if statement checks, it would make that code more messy, while with this approach the function can be called from anywhere. I also looked at having two functions, with one being a wrapper function taking the lock and calling the other...
There are many functions in mv88e6xxx that require the reg_lock to be held, there's nothing new or special here.
Now I take the lock in the function regardless. No boolean.
if (mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, ®))
goto out;
It would be good to actually propagate the error to the caller and "locked" via a pass-by-reference bool pointer argument, not just say that I/O errors mean that the port is unlocked.
Again the wish to be able to call it from if statement checks,.
reg &= MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK;
if (locked) {
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG |
MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT |
MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT |
MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I'd suggest aligning these macros vertically.
They are according to the Linux kernel coding standard wrt indentation afaik.
Compare:
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
with:
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I cannot see the difference here...?
Another issue...
I have removed the timers as they are superfluous and now just use the worker and jiffies. But I have found that the whole ageing time seems to be broken on the 5.17 kernel I am running. I don't know if it has been fixed, but the ageing timeout is supposed to be given in seconds. Here is the output from various functions after the command "ip link set dev br0 type bridge ageing_time 1500" (that is nominally 1500 seconds according to man page!):
dsa_switch_ageing_time: ageing_time 10000, ageing_time_min 1000, ageing_time_max 3825000 mv88e6xxx_set_ageing_time: set ageing time to 10000 br0: failed (err=-34) to set attribute (id=6) dsa_switch_ageing_time: ageing_time 15000, ageing_time_min 1000, ageing_time_max 3825000 mv88e6xxx_set_ageing_time: set ageing time to 15000
The 15000 set corresponds to 150 seconds! (I hardcoded the dsa ageing_time_min to 1000)
@@ -919,6 +920,9 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port);
else
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
This is superfluous, is it not? The bridge will transition a port whose link goes down to BR_STATE_DISABLED, which will make dsa_port_set_state() fast-age the dynamic FDB entries on the port, which you've already handled below.
I removed this code, but then on link down the locked entries were not cleared out. Something not as thought?
I don't see a fast ageing happening on link down. There is the two cases: 1. Soft link down
With iproute2 command the link is brought down and mv88e6xxx_mac_link_down() is called with rtnl lock taken.
2. Hard link down
I remove the cable from the port and mv88e6xxx_mac_link_down() is called without rtnl lock.
As the hard link down case calls without rtnl lock, either I trigger the case you have mentioned or I have to use rtnl_is_locked() somewhere along the line?
On Wed, Jul 06, 2022 at 12:12:01PM +0200, Hans S wrote:
On Wed, Jul 6, 2022 at 10:56 AM Vladimir Oltean olteanv@gmail.com wrote:
On Tue, Jun 28, 2022 at 02:26:43PM +0200, Hans S wrote:
Dumb question: if you only flush the locked entries at fast age if the port is locked, then what happens with the existing locked entries if the port becomes unlocked before an FDB flush takes place? Shouldn't mv88e6xxx_port_set_lock() call mv88e6xxx_atu_locked_entry_flush() too?
BTW:
@@ -919,6 +920,9 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port);
else
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
This is superfluous, is it not? The bridge will transition a port whose link goes down to BR_STATE_DISABLED, which will make dsa_port_set_state() fast-age the dynamic FDB entries on the port, which you've already handled below.
I removed this code, but then on link down the locked entries were not cleared out. Something not as thought?
What was the port's STP state before the link down event, and did it change after the link down?
The relevant code in DSA is:
int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age) { struct dsa_switch *ds = dp->ds; int port = dp->index;
if (!ds->ops->port_stp_state_set) return -EOPNOTSUPP;
ds->ops->port_stp_state_set(ds, port, state);
if (!dsa_port_can_configure_learning(dp) || (do_fast_age && dp->learning)) { /* Fast age FDB entries or flush appropriate forwarding database * for the given port, if we are moving it from Learning or * Forwarding state, to Disabled or Blocking or Listening state. * Ports that were standalone before the STP state change don't * need to fast age the FDB, since address learning is off in * standalone mode. */
if ((dp->stp_state == BR_STATE_LEARNING || dp->stp_state == BR_STATE_FORWARDING) && (state == BR_STATE_DISABLED || state == BR_STATE_BLOCKING || state == BR_STATE_LISTENING)) dsa_port_fast_age(dp); }
dp->stp_state = state;
return 0; }
If the STP state wasn't LEARNING or FORWARDING, there weren't supposed to be dynamic FDB entries on the port in the first place, so DSA says there's nothing to flush, and doesn't call dsa_port_fast_age(). Are there dynamic FDB entries being installed on a port that isn't in a state that's supposed to learn? I guess the answer is yes. Is that what you want, or should the locked entries be recorded only in the LEARNING or FORWARDING states, otherwise discarded?
That was my first thought too, but the way the flags are handled with the mask etc, does so that mv88e6xxx_port_set_lock() is called when other flags change. It could be done by the transition from locked->unlocked by checking if the port is locked already.
Why does mv88e6xxx_port_set_lock() get called when other flags change?
That is what seems to happen, but maybe I am wrong. Looking at the dsa functions dsa_port_inherit_brport_flags() and dsa_port_clear_brport_flags(), they set the mask for which underlying function is called, and it looks to me that they call once for all the flags: BR_LEARNING, BR_FLOOD, BR_MCAST_FLOOD, BR_BCAST_FLOOD, BR_PORT_LOCKED?
What you actually want to say is: "mv88e6xxx_port_set_lock() is also called when the DSA port joins a bridge, due to the switchdev attribute replay logic present in dsa_port_switchdev_sync_attrs()".
Which, by the way, is logic that you've added yourself, in commit b9e8b58fd2cb ("net: dsa: Include BR_PORT_LOCKED in the list of synced brport flags") ;)
You are free to return early from mv88e6xxx_port_set_lock() if nothing has changed. The DSA layer doesn't keep track of the locked state of the port so it cannot deduce whether propagating to the switch driver is necessary or not.
From the discussion with Ido and Nikolay I get the impression that you're not doing the right thing here either, notifying a SWITCHDEV_FDB_DEL_TO_BRIDGE from what is effectively the SWITCHDEV_FDB_DEL_TO_DEVICE handler (port_fdb_del).
Hmm, my experience tells me that much is opposite the normal conventions when dealing with locked ports, as there was never switchdev notifications from the driver to the bridge before, but that is needed to keep ATU and FDB entries in sync.
On delete you mean? So the bridge signals switchdev a deletion of a locked FDB entry (as I pointed out, this function gets indirectly called from port_fdb_del), but it won't get deleted until switchdev signals it back, is what you're saying?
When added they are added with bridge FDB flags: extern_learn offload locked, with a SWITCHDEV_FDB_ADD_TO_BRIDGE event. So they are owned by the driver. When the driver deletes the locked entry the bridge FDB entry is removes by the SWITCHDEV_FDB_DEL_TO_BRIDGE event from the driver. That seems quite fair?
I'm just pointing out that you left other (probably unintended) code paths for which the SWITCHDEV_FDB_DEL_TO_BRIDGE notifier is quite useless. I haven't yet looked at your newest revision to see what changed there.
Why is the rtnl_unlock() outside the switch statement but the rtnl_lock() inside? Not to mention, the dsa_port_to_bridge_port() call needs to be under rtnl_lock().
Just a small optimization as I also have another case of the switch (only one switch case if you didn't notice) belonging to the next patch set regarding dynamic ATU entries.
What kind of optimization are you even talking about? Please get rid of coding patterns like this, sorry.
Right!
Right what? I'm genuinely curious what optimization are you talking about.
if (mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, ®))
goto out;
It would be good to actually propagate the error to the caller and "locked" via a pass-by-reference bool pointer argument, not just say that I/O errors mean that the port is unlocked.
Again the wish to be able to call it from if statement checks,.
reg &= MV88E6XXX_PORT_ASSOC_VECTOR_PAV_MASK;
if (locked) {
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG |
MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT |
MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT |
MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I'd suggest aligning these macros vertically.
They are according to the Linux kernel coding standard wrt indentation afaik.
Compare:
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
with:
reg |= MV88E6XXX_PORT_ASSOC_VECTOR_IGNORE_WRONG | MV88E6XXX_PORT_ASSOC_VECTOR_LOCKED_PORT | MV88E6XXX_PORT_ASSOC_VECTOR_INT_AGE_OUT | MV88E6XXX_PORT_ASSOC_VECTOR_HOLD_AT_1;
I cannot see the difference here...?
Just out of curiosity, are you even trying, are you looking at the difference using a monospace font?
Another issue...
I have removed the timers as they are superfluous and now just use the worker and jiffies. But I have found that the whole ageing time seems to be broken on the 5.17 kernel I am running. I don't know if it has been fixed, but the ageing timeout is supposed to be given in seconds. Here is the output from various functions after the command "ip link set dev br0 type bridge ageing_time 1500" (that is nominally 1500 seconds according to man page!):
dsa_switch_ageing_time: ageing_time 10000, ageing_time_min 1000, ageing_time_max 3825000 mv88e6xxx_set_ageing_time: set ageing time to 10000 br0: failed (err=-34) to set attribute (id=6) dsa_switch_ageing_time: ageing_time 15000, ageing_time_min 1000, ageing_time_max 3825000 mv88e6xxx_set_ageing_time: set ageing time to 15000
The 15000 set corresponds to 150 seconds! (I hardcoded the dsa ageing_time_min to 1000)
Are you talking about this known problem, that the ageing time values in seconds need to be scaled up by a factor of USER_HZ when passed to the kernel? https://www.spinics.net/lists/netdev/msg672070.html https://www.spinics.net/lists/netdev/msg567332.html
On Wed, Jul 6, 2022 at 4:33 PM Vladimir Oltean olteanv@gmail.com wrote:
@@ -919,6 +920,9 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, if (err) dev_err(chip->dev, "p%d: failed to force MAC link down\n", port);
else
if (mv88e6xxx_port_is_locked(chip, port, true))
mv88e6xxx_atu_locked_entry_flush(ds, port);
This is superfluous, is it not? The bridge will transition a port whose link goes down to BR_STATE_DISABLED, which will make dsa_port_set_state() fast-age the dynamic FDB entries on the port, which you've already handled below.
I removed this code, but then on link down the locked entries were not cleared out. Something not as thought?
What was the port's STP state before the link down event, and did it change after the link down?
The stp state is FORWARDING.
If the STP state wasn't LEARNING or FORWARDING, there weren't supposed to be dynamic FDB entries on the port in the first place, so DSA says there's nothing to flush, and doesn't call dsa_port_fast_age(). Are there dynamic FDB entries being installed on a port that isn't in a state that's supposed to learn? I guess the answer is yes. Is that what you want, or should the locked entries be recorded only in the LEARNING or FORWARDING states, otherwise discarded?
Learning is off as has been discussed, and I do want the locked entries to be dynamic in the sense that the driver removes them after the system ageing time has passed.
What you actually want to say is: "mv88e6xxx_port_set_lock() is also called when the DSA port joins a bridge, due to the switchdev attribute replay logic present in dsa_port_switchdev_sync_attrs()".
Which, by the way, is logic that you've added yourself, in commit b9e8b58fd2cb ("net: dsa: Include BR_PORT_LOCKED in the list of synced brport flags") ;)
You are free to return early from mv88e6xxx_port_set_lock() if nothing has changed. The DSA layer doesn't keep track of the locked state of the port so it cannot deduce whether propagating to the switch driver is necessary or not.
I think I can safely call mv88e6xxx_atu_locked_entry_flush() from mv88e6xxx_port_set_lock() when locked is off as the port setup for the respective port must have been completed successfully.
When added they are added with bridge FDB flags: extern_learn offload locked, with a SWITCHDEV_FDB_ADD_TO_BRIDGE event. So they are owned by the driver. When the driver deletes the locked entry the bridge FDB entry is removes by the SWITCHDEV_FDB_DEL_TO_BRIDGE event from the driver. That seems quite fair?
I'm just pointing out that you left other (probably unintended) code paths for which the SWITCHDEV_FDB_DEL_TO_BRIDGE notifier is quite useless. I haven't yet looked at your newest revision to see what changed there.
I guess I should add a boolean to tell if mv88e6xxx_atu_locked_entry_purge() should send a notification or not. So that port_fdb_del() will not cause a SWITCHDEV_FDB_DEL_TO_BRIDGE event.
Why is the rtnl_unlock() outside the switch statement but the rtnl_lock() inside? Not to mention, the dsa_port_to_bridge_port() call needs to be under rtnl_lock().
Just a small optimization as I also have another case of the switch (only one switch case if you didn't notice) belonging to the next patch set regarding dynamic ATU entries.
What kind of optimization are you even talking about? Please get rid of coding patterns like this, sorry.
Right!
Right what? I'm genuinely curious what optimization are you talking about.
I am just confirming that what you wrote is correct, e.g. the "Right!". So I have fixed that. :-)
Just out of curiosity, are you even trying, are you looking at the difference using a monospace font?
Another issue...
I have removed the timers as they are superfluous and now just use the worker and jiffies. But I have found that the whole ageing time seems to be broken on the 5.17 kernel I am running. I don't know if it has been fixed, but the ageing timeout is supposed to be given in seconds. Here is the output from various functions after the command "ip link set dev br0 type bridge ageing_time 1500" (that is nominally 1500 seconds according to man page!):
dsa_switch_ageing_time: ageing_time 10000, ageing_time_min 1000, ageing_time_max 3825000 mv88e6xxx_set_ageing_time: set ageing time to 10000 br0: failed (err=-34) to set attribute (id=6) dsa_switch_ageing_time: ageing_time 15000, ageing_time_min 1000, ageing_time_max 3825000 mv88e6xxx_set_ageing_time: set ageing time to 15000
The 15000 set corresponds to 150 seconds! (I hardcoded the dsa ageing_time_min to 1000)
Are you talking about this known problem, that the ageing time values in seconds need to be scaled up by a factor of USER_HZ when passed to the kernel? https://www.spinics.net/lists/netdev/msg672070.html https://www.spinics.net/lists/netdev/msg567332.html
It might be so, but there is another factor 10 which might be regarding topology change as I understand. If I want a ageing timeout of say 15 or 30 seconds, that hardly seems possible?
Hi Vladimir,
BTW, I have sent the patch to read the FID as you requested. You should have received it yesterday (6th July) at around 12:25 UTC.
Hans
Verify that the MAC-Auth mechanism works by adding a FDB entry with the locked flag set. denying access until the FDB entry is replaced with a FDB entry without the locked flag set.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com --- .../net/forwarding/bridge_locked_port.sh | 42 ++++++++++++++++--- 1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh index 5b02b6b60ce7..50b9048d044a 100755 --- a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh +++ b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0
-ALL_TESTS="locked_port_ipv4 locked_port_ipv6 locked_port_vlan" +ALL_TESTS="locked_port_ipv4 locked_port_ipv6 locked_port_vlan locked_port_mab" NUM_NETIFS=4 CHECK_TC="no" source lib.sh @@ -94,13 +94,13 @@ locked_port_ipv4() ping_do $h1 192.0.2.2 check_fail $? "Ping worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 master static + bridge fdb replace `mac_get $h1` dev $swp1 master static
ping_do $h1 192.0.2.2 check_err $? "Ping did not work after locking port and adding FDB entry"
bridge link set dev $swp1 locked off - bridge fdb del `mac_get $h1` dev $swp1 master static + bridge fdb del `mac_get $h1` dev $swp1 master
ping_do $h1 192.0.2.2 check_err $? "Ping did not work after unlocking port and removing FDB entry." @@ -124,13 +124,13 @@ locked_port_vlan() ping_do $h1.100 198.51.100.2 check_fail $? "Ping through vlan worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 vlan 100 master static + bridge fdb replace `mac_get $h1` dev $swp1 master static
ping_do $h1.100 198.51.100.2 check_err $? "Ping through vlan did not work after locking port and adding FDB entry"
bridge link set dev $swp1 locked off - bridge fdb del `mac_get $h1` dev $swp1 vlan 100 master static + bridge fdb del `mac_get $h1` dev $swp1 vlan 100 master
ping_do $h1.100 198.51.100.2 check_err $? "Ping through vlan did not work after unlocking port and removing FDB entry" @@ -153,7 +153,8 @@ locked_port_ipv6() ping6_do $h1 2001:db8:1::2 check_fail $? "Ping6 worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 master static + bridge fdb replace `mac_get $h1` dev $swp1 master static + ping6_do $h1 2001:db8:1::2 check_err $? "Ping6 did not work after locking port and adding FDB entry"
@@ -166,6 +167,35 @@ locked_port_ipv6() log_test "Locked port ipv6" }
+locked_port_mab() +{ + RET=0 + check_locked_port_support || return 0 + + ping_do $h1 192.0.2.2 + check_err $? "MAB: Ping did not work before locking port" + + bridge link set dev $swp1 locked on + bridge link set dev $swp1 learning on + + bridge fdb del `mac_get $h1` dev $swp1 master + + ping_do $h1 192.0.2.2 + check_fail $? "MAB: Ping worked on locked port without FDB entry" + + bridge fdb show | grep `mac_get $h1` | grep -q "locked" + check_err $? "MAB: No locked fdb entry after ping on locked port" + + bridge fdb replace `mac_get $h1` dev $swp1 master static + + ping_do $h1 192.0.2.2 + check_err $? "MAB: Ping did not work with fdb entry without locked flag" + + bridge fdb del `mac_get $h1` dev $swp1 master + bridge link set dev $swp1 locked off + + log_test "Locked port MAB" +} trap cleanup EXIT
setup_prepare
On Tue, May 24, 2022 at 05:21:44PM +0200, Hans Schultz wrote:
Verify that the MAC-Auth mechanism works by adding a FDB entry with the locked flag set. denying access until the FDB entry is replaced with a FDB entry without the locked flag set.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
.../net/forwarding/bridge_locked_port.sh | 42 ++++++++++++++++--- 1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh index 5b02b6b60ce7..50b9048d044a 100755 --- a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh +++ b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -ALL_TESTS="locked_port_ipv4 locked_port_ipv6 locked_port_vlan" +ALL_TESTS="locked_port_ipv4 locked_port_ipv6 locked_port_vlan locked_port_mab" NUM_NETIFS=4 CHECK_TC="no" source lib.sh @@ -94,13 +94,13 @@ locked_port_ipv4() ping_do $h1 192.0.2.2 check_fail $? "Ping worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 master static
- bridge fdb replace `mac_get $h1` dev $swp1 master static
ping_do $h1 192.0.2.2 check_err $? "Ping did not work after locking port and adding FDB entry" bridge link set dev $swp1 locked off
- bridge fdb del `mac_get $h1` dev $swp1 master static
- bridge fdb del `mac_get $h1` dev $swp1 master
ping_do $h1 192.0.2.2 check_err $? "Ping did not work after unlocking port and removing FDB entry." @@ -124,13 +124,13 @@ locked_port_vlan() ping_do $h1.100 198.51.100.2 check_fail $? "Ping through vlan worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 vlan 100 master static
- bridge fdb replace `mac_get $h1` dev $swp1 master static
ping_do $h1.100 198.51.100.2 check_err $? "Ping through vlan did not work after locking port and adding FDB entry" bridge link set dev $swp1 locked off
- bridge fdb del `mac_get $h1` dev $swp1 vlan 100 master static
- bridge fdb del `mac_get $h1` dev $swp1 vlan 100 master
ping_do $h1.100 198.51.100.2 check_err $? "Ping through vlan did not work after unlocking port and removing FDB entry" @@ -153,7 +153,8 @@ locked_port_ipv6() ping6_do $h1 2001:db8:1::2 check_fail $? "Ping6 worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 master static
- bridge fdb replace `mac_get $h1` dev $swp1 master static
- ping6_do $h1 2001:db8:1::2 check_err $? "Ping6 did not work after locking port and adding FDB entry"
@@ -166,6 +167,35 @@ locked_port_ipv6() log_test "Locked port ipv6" }
Why did you change s/add/replace/? Also, from the subject and commit message I understand the patch is about adding a new test, not changing existing ones.
+locked_port_mab() +{
- RET=0
- check_locked_port_support || return 0
- ping_do $h1 192.0.2.2
- check_err $? "MAB: Ping did not work before locking port"
- bridge link set dev $swp1 locked on
- bridge link set dev $swp1 learning on
- bridge fdb del `mac_get $h1` dev $swp1 master
Why the delete is needed? Aren't you getting errors on trying to delete a non-existing entry? In previous test cases learning is disabled and it seems the FDB entry is cleaned up.
- ping_do $h1 192.0.2.2
- check_fail $? "MAB: Ping worked on locked port without FDB entry"
- bridge fdb show | grep `mac_get $h1` | grep -q "locked"
- check_err $? "MAB: No locked fdb entry after ping on locked port"
- bridge fdb replace `mac_get $h1` dev $swp1 master static
- ping_do $h1 192.0.2.2
- check_err $? "MAB: Ping did not work with fdb entry without locked flag"
- bridge fdb del `mac_get $h1` dev $swp1 master
bridge link set dev $swp1 learning off
- bridge link set dev $swp1 locked off
- log_test "Locked port MAB"
+} trap cleanup EXIT setup_prepare -- 2.30.2
On tor, maj 26, 2022 at 17:27, Ido Schimmel idosch@idosch.org wrote:
On Tue, May 24, 2022 at 05:21:44PM +0200, Hans Schultz wrote:
Verify that the MAC-Auth mechanism works by adding a FDB entry with the locked flag set. denying access until the FDB entry is replaced with a FDB entry without the locked flag set.
Signed-off-by: Hans Schultz schultz.hans+netdev@gmail.com
.../net/forwarding/bridge_locked_port.sh | 42 ++++++++++++++++--- 1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh index 5b02b6b60ce7..50b9048d044a 100755 --- a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh +++ b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -ALL_TESTS="locked_port_ipv4 locked_port_ipv6 locked_port_vlan" +ALL_TESTS="locked_port_ipv4 locked_port_ipv6 locked_port_vlan locked_port_mab" NUM_NETIFS=4 CHECK_TC="no" source lib.sh @@ -94,13 +94,13 @@ locked_port_ipv4() ping_do $h1 192.0.2.2 check_fail $? "Ping worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 master static
- bridge fdb replace `mac_get $h1` dev $swp1 master static
ping_do $h1 192.0.2.2 check_err $? "Ping did not work after locking port and adding FDB entry" bridge link set dev $swp1 locked off
- bridge fdb del `mac_get $h1` dev $swp1 master static
- bridge fdb del `mac_get $h1` dev $swp1 master
ping_do $h1 192.0.2.2 check_err $? "Ping did not work after unlocking port and removing FDB entry." @@ -124,13 +124,13 @@ locked_port_vlan() ping_do $h1.100 198.51.100.2 check_fail $? "Ping through vlan worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 vlan 100 master static
- bridge fdb replace `mac_get $h1` dev $swp1 master static
ping_do $h1.100 198.51.100.2 check_err $? "Ping through vlan did not work after locking port and adding FDB entry" bridge link set dev $swp1 locked off
- bridge fdb del `mac_get $h1` dev $swp1 vlan 100 master static
- bridge fdb del `mac_get $h1` dev $swp1 vlan 100 master
ping_do $h1.100 198.51.100.2 check_err $? "Ping through vlan did not work after unlocking port and removing FDB entry" @@ -153,7 +153,8 @@ locked_port_ipv6() ping6_do $h1 2001:db8:1::2 check_fail $? "Ping6 worked after locking port, but before adding FDB entry"
- bridge fdb add `mac_get $h1` dev $swp1 master static
- bridge fdb replace `mac_get $h1` dev $swp1 master static
- ping6_do $h1 2001:db8:1::2 check_err $? "Ping6 did not work after locking port and adding FDB entry"
@@ -166,6 +167,35 @@ locked_port_ipv6() log_test "Locked port ipv6" }
Why did you change s/add/replace/? Also, from the subject and commit message I understand the patch is about adding a new test, not changing existing ones.
Sorry, I might have lost a bit track of the kernel selftests, as for internal reasons there has been a pause in the work. I will remove the changes to the previous tests, and I hope it will be fine.
+locked_port_mab() +{
- RET=0
- check_locked_port_support || return 0
- ping_do $h1 192.0.2.2
- check_err $? "MAB: Ping did not work before locking port"
- bridge link set dev $swp1 locked on
- bridge link set dev $swp1 learning on
- bridge fdb del `mac_get $h1` dev $swp1 master
Why the delete is needed? Aren't you getting errors on trying to delete a non-existing entry? In previous test cases learning is disabled and it seems the FDB entry is cleaned up.
I guess you are right.
- ping_do $h1 192.0.2.2
- check_fail $? "MAB: Ping worked on locked port without FDB entry"
- bridge fdb show | grep `mac_get $h1` | grep -q "locked"
- check_err $? "MAB: No locked fdb entry after ping on locked port"
- bridge fdb replace `mac_get $h1` dev $swp1 master static
- ping_do $h1 192.0.2.2
- check_err $? "MAB: Ping did not work with fdb entry without locked flag"
- bridge fdb del `mac_get $h1` dev $swp1 master
bridge link set dev $swp1 learning off
noted.
- bridge link set dev $swp1 locked off
- log_test "Locked port MAB"
+} trap cleanup EXIT setup_prepare -- 2.30.2
linux-kselftest-mirror@lists.linaro.org