Recently, I ran into an intermittent network issue across the whole school: all the computers would go offline and then recover on their own. We monitored the issue with multiping and found that all core network devices were affected. The details:
- Network size: 1800 students, 200 faculty members.
- All network connectivity (LAN or internet) stopped for about 30 seconds and then recovered by itself. In the beginning this happened every 4 hours (a spanning tree issue); later it happened mostly in the morning, when people start work.
Find the ownership
Because the school network was a flat design, with all the devices in one subnet, I suspected either a traffic storm (multicast or broadcast) or a spanning tree issue (recalculating the root switch freezes the whole network).
We set up multiping to ping the switches' management interfaces across the school, to find out whether it was a truly global issue or a partial one. We made sure the time on all network devices was accurate by double-checking the NTP settings. We also forwarded syslog to a syslog server, so that we could find correlated events when the issue happened.
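The "global or partial" decision from one round of pings can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how multiping works internally; the function name `classify_outage` and the switch names are made up for the example.

```python
def classify_outage(reachable: dict[str, bool]) -> str:
    """Classify one round of ping results as 'none', 'partial' or 'global'.

    `reachable` maps a switch name (hypothetical names here) to whether
    its management interface answered the ping in this round.
    """
    down = [host for host, ok in reachable.items() if not ok]
    if not down:
        return "none"      # every switch answered
    if len(down) == len(reachable):
        return "global"    # no switch answered: the whole network is affected
    return "partial"       # only some switches are unreachable


# Illustrative round: every monitored switch stops answering at once.
round_results = {"core-sw": False, "bldg-a-sw": False, "bldg-b-sw": False}
print(classify_outage(round_results))
```

If every round during an outage comes back as "global", the cause is more likely something shared by the whole flat network (a storm or an STP recalculation) than a single faulty uplink.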
In the first stage, we found that the spanning tree topology changed every 4 hours. The fix is documented in the case study section here: https://frankfu.click/cisco/ccna/spaning-tree-protocol.html/2/
After we changed the root switch priority, the issue was gone for a few days. When the new semester started, the issue came back, and no longer at a regular 4-hour interval. After comparing the outage times with the STP recalculation times, we found they were no longer related.
We then monitored the network traffic with Wireshark and analysed the traffic statistics. Multicast and broadcast traffic made up almost 50% of the total traffic, and most of it was mDNS targeting UDP port 5353. This may have been caused by the students' mobile devices and smart TVs, which generate a large amount of multicast traffic for service discovery.
The basic idea here is to reduce the size of the broadcast/multicast domain. The preferred solution is to put different types of devices on different VLANs, or to subnet them and route between the subnets. An alternative is to use ACLs to isolate the traffic in different areas, since in most cases people don't need to discover devices in other buildings.
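The comparison of outage times against STP recalculation times can be sketched as follows. It assumes both lists of timestamps have already been extracted (for example from the multiping history and the syslog server); the function name, the 60-second window, and the sample timestamps are all illustrative and not taken from the real logs.

```python
from datetime import datetime, timedelta


def outages_explained_by_stp(outages, stp_changes, window=timedelta(seconds=60)):
    """Return the outages that have an STP topology change within `window`.

    `outages` and `stp_changes` are lists of datetime objects. The default
    window of 60 seconds is an assumption; adjust it to the clock accuracy
    of the devices.
    """
    return [
        outage
        for outage in outages
        if any(abs(outage - change) <= window for change in stp_changes)
    ]


# Made-up timestamps, for illustration only.
outages = [datetime(2020, 1, 1, 8, 0, 0), datetime(2020, 1, 1, 9, 30, 0)]
stp_changes = [datetime(2020, 1, 1, 8, 0, 20)]

matched = outages_explained_by_stp(outages, stp_changes)
print(f"{len(matched)} of {len(outages)} outages line up with an STP change")
```

When most outages have no topology change nearby, as in the second stage described above, spanning tree is unlikely to be the cause. This comparison is only meaningful if the device clocks are accurate.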
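Wireshark's statistics views give this breakdown directly, but the underlying check is simple enough to sketch. The example below assumes the capture has been exported as (destination MAC, frame length) pairs; the function names and sample frames are hypothetical. It measures the share by bytes, and counting frames instead would work the same way.

```python
def classify_dst_mac(mac: str) -> str:
    """Classify an Ethernet destination address as broadcast, multicast or unicast."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    # The least significant bit of the first octet is the group bit:
    # when it is set, the frame is addressed to a group (multicast).
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


def non_unicast_share(frames) -> float:
    """Return the fraction of bytes sent to broadcast or multicast addresses.

    `frames` is an iterable of (dst_mac, length_in_bytes) pairs.
    """
    total = 0
    non_unicast = 0
    for mac, length in frames:
        total += length
        if classify_dst_mac(mac) != "unicast":
            non_unicast += length
    return non_unicast / total if total else 0.0


# Made-up sample: 01:00:5e:00:00:fb and 33:33:00:00:00:fb are the Ethernet
# addresses used by mDNS over IPv4 (224.0.0.251) and IPv6 (ff02::fb).
sample = [
    ("ff:ff:ff:ff:ff:ff", 100),
    ("01:00:5e:00:00:fb", 100),
    ("00:1a:2b:3c:4d:5e", 200),
]
print(non_unicast_share(sample))
```

In a flat network every switch port receives this broadcast and multicast traffic, which is why a high share matters more here than it would in a segmented design.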
Extended IP access list DENY_MDNS_IPv4
    10 deny udp any any eq 5353
    100 permit ip any any

IPv6 access list DENY_MDNS_IPv6
    deny udp any any eq 5353 sequence 10
    permit ipv6 any any sequence 100

interface GigabitEthernet1/0/1
    ip access-group DENY_MDNS_IPv4 in
    ipv6 traffic-filter DENY_MDNS_IPv6 in
We monitored the school network for another few days, and the issue was resolved.