Recently, we had a network issue across the whole school: all the computers would go offline and then recover by themselves. We monitored the issue with MultiPing and found that all core network devices were affected by it, see below:

SYMPTOM
  • Network size: 1800 students, 200 faculty members.
  • All network connectivity (both LAN and internet) stopped for about 30 seconds and then recovered by itself. At first this happened every 4 hours (a spanning tree issue); later it happened mostly in the morning, when people started work.
Find the ownership

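Because the school network had a flat design, with all devices in one subnet, I suspected either a traffic storm (multicast or broadcast) or a spanning tree issue (a root switch re-election and topology recalculation that freezes the whole network). The roughly 30-second outages also fit the second theory: with classic 802.1D spanning tree and default timers, a port that has to change state spends 2 × 15 seconds (the forward delay) in the listening and learning states before it forwards traffic again.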

We set up MultiPing to ping the switches' management interfaces across the school, to find out whether it was a real global issue or a partial one. We also made sure the time on all network devices was accurate, double-checked the STP settings, and forwarded syslog to a syslog server so we could find correlated events when the issue happened.
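For the syslog correlation to work, the switch timestamps need to be precise and consistent across devices. The following is a minimal Cisco IOS sketch of such a logging setup, not the exact configuration used here; the NTP and syslog server addresses (10.0.0.10 and 10.0.0.20) are hypothetical placeholders.

```
! Hypothetical addresses: replace with your own NTP and syslog servers
ntp server 10.0.0.10
! Millisecond timestamps make it easier to line up events across switches
service timestamps log datetime msec localtime show-timezone
! Forward messages of severity informational (6) and more severe to the syslog server
logging host 10.0.0.20
logging trap informational
```

On each switch, `show ntp status` should then report the clock as synchronized.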

 

Fix

In the first stage, we found that the spanning tree topology changed every 4 hours. The fix is documented in the case study section here: https://frankfu.click/cisco/ccna/spaning-tree-protocol.html/2/
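One way to spot this kind of pattern from the switch itself is the topology-change counter in the spanning tree detail output. The filter below is a common idiom, not necessarily what was used in this case; for each VLAN it prints how many topology changes have occurred, how long ago the last one was, and the port it came from.

```
! Privileged EXEC mode, on the root switch or any switch in the tree
show spanning-tree detail | include is exec|occurred|from
```

Running it a few times and watching whether the count increases, and which port the last change came from, lets you follow the changes back towards the device that triggers them.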

After we changed the root switch priority, the issue was gone for a few days. When the new semester started, however, the issue came back, and no longer at a regular 4-hour interval. Comparing the outage times with the STP recalculation times showed that the two were no longer related.
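Changing the root switch priority, as mentioned above, usually means pinning the root bridge to a core switch by giving it a lower bridge priority. A minimal sketch, assuming PVST+ or Rapid PVST+ and that the flat network runs in VLAN 1 (both assumptions); see the linked case study for the fix that was actually applied.

```
! On the core switch that should become the root bridge (VLAN 1 is an assumption)
! Lower priority wins the root election; values must be multiples of 4096
spanning-tree vlan 1 priority 4096
```

A second core switch could be given priority 8192 to act as the backup root. Afterwards, `show spanning-tree root` should list the intended core switch as the root for the VLAN.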

We then monitored the network traffic with Wireshark and analysed the traffic statistics. Multicast and broadcast traffic made up almost 50% of the total traffic, and most of it was mDNS (UDP port 5353). This is probably caused by the students' mobile devices and smart TVs, which generate a large amount of multicast traffic for service discovery.
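A Wireshark capture only shows what reaches the capturing machine's port. As a rough cross-check on the switch side, the per-port counters on Catalyst switches split packets into unicast, multicast and broadcast, so a port where the multicast count is close to the unicast count points at the same problem. The interface name below is a placeholder.

```
! Reset the counters first to measure a specific time window (asks for confirmation)
clear counters GigabitEthernet1/0/1
! Per-port packet counters, split into unicast, multicast and broadcast
show interfaces GigabitEthernet1/0/1 counters
```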

Solution:

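The basic idea here is to shrink the broadcast/multicast domain. mDNS is sent to the link-local group 224.0.0.251 (FF02::FB for IPv6), which switches flood to every port in the VLAN even with IGMP snooping enabled, so in a flat network every device receives every announcement. The ideal solution would be to put different types of devices on different VLANs, or to subnet the network and route between the subnets. An alternative is to use ACLs to isolate the traffic in different areas; besides, in most cases people don't need to discover a device in another building.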
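For comparison with the ACL approach used here, the VLAN approach described above could look roughly like this on a Layer 3 core switch. This is only a sketch: the VLAN IDs, names, addressing and port number are hypothetical, and DHCP relay, uplink trunks and inter-VLAN filtering are left out.

```
! Hypothetical VLANs: one for staff computers, one for student devices
vlan 10
 name STAFF
vlan 20
 name STUDENT_DEVICES
!
! Route between the VLANs; link-local multicast such as mDNS is not routed,
! so it stays inside each VLAN
ip routing
interface Vlan10
 ip address 10.10.0.1 255.255.252.0
 no shutdown
interface Vlan20
 ip address 10.20.0.1 255.255.248.0
 no shutdown
!
! Example access port placed in the student VLAN
interface GigabitEthernet1/0/10
 switchport mode access
 switchport access vlan 20
```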

Create the ACLs:

ip access-list extended DENY_MDNS_IPv4
 ! Drop mDNS (UDP 5353) queries and announcements
 10 deny udp any any eq 5353
 ! Let all other IPv4 traffic through
 100 permit ip any any

 

ipv6 access-list DENY_MDNS_IPv6
 ! Same filter for mDNS over IPv6 (FF02::FB, UDP 5353)
 sequence 10 deny udp any any eq 5353
 ! Let all other IPv6 traffic through
 sequence 100 permit ipv6 any any

Apply them inbound on the interface:

interface GigabitEthernet1/0/1
 ip access-group DENY_MDNS_IPv4 in
 ipv6 traffic-filter DENY_MDNS_IPv6 in
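If the goal is to keep mDNS working inside each building but stop it from crossing between buildings, the filters belong on the inter-building links. Assuming, hypothetically, that GigabitEthernet1/0/1 to 1/0/4 are the core switch's uplinks to the other buildings, the same ACLs could be applied to all of them at once. Support for IPv4 and IPv6 ACLs on Layer 2 ports depends on the platform.

```
! Hypothetical: Gi1/0/1-4 are the uplinks towards the other buildings
interface range GigabitEthernet1/0/1 - 4
 ip access-group DENY_MDNS_IPv4 in
 ipv6 traffic-filter DENY_MDNS_IPv6 in
```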

We monitored the school network for another few days, and the issue was resolved.
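To confirm that the filters are doing their job, the per-entry match counters can be checked on the switch; a steadily growing count on the deny entries means mDNS packets are being dropped at that point. Whether match counts are reported for ACLs on Layer 2 ports depends on the platform.

```
! Per-entry match counters for both ACLs
show access-lists DENY_MDNS_IPv4
show ipv6 access-list DENY_MDNS_IPv6
```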