Steve's Blog: multicast

Monday, 9 June 2014

IGMP Query Solicitation

Once again, I've come back to IGMP/Multicast/STP. This is a topic that, while reasonably well documented, can be hard to find clear explanations of.

In this case, I saw a network where a switch was running a lot of VLANs (300+) and we were regularly seeing traffic from the same MAC in all of those VLANs. This behaviour was intriguing, so I did some digging.

The switch in question was an Allied Telesyn x610 switch. Looking at the packets in Wireshark, it turns out they all originate from the switch and they are IGMP Query Solicitation messages. So...

What is IGMP Query Solicitation?

Remember one of my earliest posts, where IGMP and Spanning-Tree interacted in such a way that we had floods of traffic on the network? This is very much related. In that scenario, our switches were flooding multicast traffic to all ports whenever there was a Spanning-Tree topology change. "IGMP Query Solicitation" is another behaviour that can occur at exactly the same time, when a topology change is seen. As well as flooding the multicast traffic to ensure it reaches the correct destination, the switch can also send an "IGMP Query Solicitation" message to everybody. This message essentially "resets" IGMP by prompting the querier to immediately send out a General Query. This, in turn, means that all clients re-affirm their interest in the appropriate multicast groups by sending a Membership Report. The IGMP snoopers (switches) can then rebuild their snooping databases and once again send traffic to all the right places. So it provides a way of restoring order to your network after Spanning-Tree announces a topology change.
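
To make the message itself a little more concrete, here is a minimal Python sketch of what such a packet could look like. It assumes the solicitation is encoded as an IGMPv2 Leave Group message for group 0.0.0.0, which is how RFC 4541 and vendor documentation describe it as far as I can tell; that encoding is not something taken from the capture described above. The function names are made up for illustration, and the sketch only builds the 8-byte IGMP payload, it does not send anything.

```python
import struct

IGMP_LEAVE_GROUP = 0x17  # IGMPv2 Leave Group message type (RFC 2236)


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum, as used by IGMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_query_solicit() -> bytes:
    """Build an 8-byte IGMPv2 Leave for group 0.0.0.0 (assumed encoding)."""
    group = bytes(4)  # 0.0.0.0
    # First pass with a zero checksum field, then fill the checksum in.
    unchecked = struct.pack("!BBH4s", IGMP_LEAVE_GROUP, 0, 0, group)
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBH4s", IGMP_LEAVE_GROUP, 0, checksum, group)


message = build_query_solicit()
print(message.hex())  # 1700e8ff00000000
```

Running the checksum over the finished message gives zero, which is the usual sanity check for this kind of header.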

So do I want this behaviour? How do I turn it off?

Yes. It's good. It helps your network to get back to normal after any changes in topology. In our case there was a device that was not coping with seeing the same MAC in 300+ VLANs at the same time. However, there may be legitimate reasons to turn this on or off. In the Cisco switches I tested, this was disabled by default. In order for it to function, it required:

An IGMP querier active in a VLAN (any VLAN with no querier got no query solicitation)
IGMP Query Solicitation enabled ("ip igmp snooping tcn query solicit")

However, in the Allied Telesyn x610 switch that I was using, this behaviour is enabled by default. In addition, IGMP snooping is enabled by default. This is sort of good because if you don't need it, it is unlikely to cause harm, yet not having it when you need it can cause your network to flood.

In the case of the AT switches, there are two options. The default is that only the STP root bridge will send the query solicitation packets. You can turn that off, or you can optionally enable it for all switches, not just the root. The commands are:

(no) ip igmp snooping tcn query solicit root
(no) ip igmp snooping tcn query solicit

The variation in implementations and defaults across switch manufacturers is interesting. Allied Telesyn seem to be protecting people, whereas Cisco are assuming you should know what you're doing at least a little bit!

As always, I hope this has been of use to somebody!

Monday, 3 September 2012

Other Switches

Just another quick update with information on other switches' behaviour when it comes to spanning-tree and multicast...

HP Switches

I tested this with a pair of HP 2610 edge switches aggregated on an HP 2520.

HP's "auto-edge-port" functionality reduces the number of TCNs on the network - and is enabled by default! I had to do a fair bit of tweaking to get TCNs when I plugged and unplugged normal devices.

Even once I got TCNs occurring on-demand, the HP switches do not seem to flood multicast when TCNs are seen. The only undesirable behaviour I did see was that multicast traffic was flooded for a few seconds when a device leaves a multicast group. This is documented by HP:

On switches that do not support Data-Driven IGMP, unregistered multicast groups are flooded to the VLAN rather than pruned. In this scenario, Fast-Leave IGMP can actually increase the problem of multicast flooding by removing the IGMP group filter before the Querier has recognized the IGMP leave. The Querier will continue to transmit the multicast group during this short time, and because the group is no longer registered the switch will then flood the multicast group to all ports.

Because of the multicast flooding problem mentioned above, the IGMP Fast-Leave feature is disabled by default on all ProCurve switches that do not support Data-Driven IGMP.

Thanks HP... this is basically their way of saying "If you want multicast, don't use our 2600 switches."

Allied Telesyn Switches

I tested this with a pair of AT-8100S switches aggregated on an AT-x610.

Allied Telesyn switches do seem to suffer from similar problems, and their ports are not portfast by default. Enabling portfast on edge ports does reduce TCNs, so you should be able to significantly reduce the prevalence of these issues.

There is an option to disable TCN flooding, but oddly it has nothing related to TCN in the name. In global configuration mode use this command:

no ip igmp snooping flood-unknown-mcast

Since implementing this in my lab, I have not been able to get any floods to occur, no matter what spanning-tree is doing. I like the fact that it is a single global command to turn it off - much quicker than doing it per-port on a Cisco - but that means there is a lack of flexibility. If you did want this behaviour on one port and not on another, then you can't have it!

Thursday, 30 August 2012

Playing with Multicast

Just an addendum to my previous entry - a few useful tricks I picked up whilst setting up a lab for playing with multicast.

Generating lots of multicast traffic

A dead quick and easy way of producing multicast traffic is to use iperf - normally a bandwidth-testing tool. It will happily send to multicast addresses, just give it a multicast IP as a destination. Using UDP we can send traffic aimlessly without worrying whether there is an iperf server on the other end. Use the syntax below!

iperf -u -p 1234 -c 224.1.1.1 -b 50M -t 86400

I actually used multiple instances simultaneously, to produce a few variable sized multicast streams to different multicast IPs.
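
If iperf is not to hand, roughly the same thing can be sketched in Python. This is only an approximation of the iperf command above, not a replacement for it: the function names, the 1470-byte payload and the pacing loop are all assumptions made for illustration, and a plain Python loop will not reliably reach high bitrates.

```python
import socket
import time


def packet_interval(bitrate_bps: float, payload_bytes: int) -> float:
    """Seconds between datagrams to approximate the target payload bitrate."""
    return (payload_bytes * 8) / bitrate_bps


def send_multicast(group: str, port: int, bitrate_bps: float,
                   duration_s: float, payload_bytes: int = 1470,
                   ttl: int = 1) -> int:
    """Send zero-filled UDP datagrams to a multicast group; returns the count sent."""
    payload = bytes(payload_bytes)
    interval = packet_interval(bitrate_bps, payload_bytes)
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # TTL 1 keeps the stream on the local subnet; raise it to cross routers.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        deadline = time.monotonic() + duration_s
        next_send = time.monotonic()
        while time.monotonic() < deadline:
            sock.sendto(payload, (group, port))
            sent += 1
            # Pace against an absolute schedule so sleep jitter does not accumulate.
            next_send += interval
            delay = next_send - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    return sent
```

For example, `send_multicast("224.1.1.1", 1234, 1_000_000, 10)` would send about 1 Mbit/s to the same group and port as the iperf example for ten seconds. Running several of these with different groups gives the same "multiple streams" effect.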

Receiving multicast traffic

Also easy, using socat, a somewhat more versatile version of netcat. Syntax below!

socat STDIO UDP4-RECV:1234,ip-add-membership=224.1.1.1:eth0 > /dev/null

Obviously you may want to send the traffic somewhere other than /dev/null - but if you've generated it from iperf with the above syntax, there will be quite a lot of it!
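
The receiving side can be sketched in Python too, which is handy if you want to count or inspect the traffic rather than throw it away. This is a minimal sketch under a few assumptions: the function names are invented, the interface is selected by its IP address (socat takes an interface name such as eth0 instead), and the default of 0.0.0.0 leaves the choice of interface to the kernel.

```python
import socket
import struct


def membership_request(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip))


def receive_multicast(group: str, port: int, interface_ip: str = "0.0.0.0",
                      max_packets: int = 10) -> int:
    """Join the group, read max_packets datagrams, leave, and return the bytes received."""
    total = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        mreq = membership_request(group, interface_ip)
        # Joining makes the kernel send the IGMP Membership Report
        # that the snooping switches are listening for.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        try:
            for _ in range(max_packets):
                data, _sender = sock.recvfrom(65535)
                total += len(data)
        finally:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    return total
```

Calling `receive_multicast("224.1.1.1", 1234)` blocks until ten datagrams have arrived on the group used in the examples above.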

Have fun playing with multicast in your lab!

Multicast, IGMP and Spanning-Tree

I've come across this problem a lot of times, so I thought I'd try and write a post to help others in the same situation.

The situation is this - you have a large network of switches, using spanning-tree to prevent loops, but you are also using the network for multicast streaming. If you have any significant amount of multicast going on (maybe an IPTV system) then you'll be using IGMP snooping on all the switches to make sure that you don't have traffic going where it's not required. You set it up, and everything is working fine.

But then... it breaks. Badly. Your network starts flooding occasionally, for a couple of minutes at a time. During that time, all traffic on the network is delayed at best, and often dropped.

The interactions of IGMP, STP and your large amounts of multicast traffic are killing your network.

Let's break it down to explain the different things that are happening here:

Why does my network grind to a halt?

It's flooded! When using IGMP snooping, the multicast traffic on your network is normally only sent to those people who want to receive it. However, in this situation, your switches are momentarily sending traffic to all ports. There is so much traffic that your switch ports may be running at capacity, or the end-hosts are getting sent so much unwanted multicast traffic that they can't keep up.

So why does IGMP snooping suddenly stop working?

It doesn't. It is choosing to flood your multicast traffic because it thinks that is the best course of action in the given situation. If we look at the debug messages for IGMP snooping:

00:08:15: IGMPSN: mgt: Received topology change on vlan 1
00:08:15: IGMPSN: mgt: Updating all GCEs with flood portset for in Vlan 1

When spanning-tree protocol tells the switch that a topology change has occurred (more on this below), IGMP snooping will flood your multicast traffic to all ports, assuming that if the topology has changed and your traffic is mission-critical, then it had better send it to all ports to make sure it gets to your end user!
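
The difference between the two modes can be shown with a toy model in Python. This is a deliberately simplified sketch, not how any real switch is implemented: the class and method names are invented, router ports and timers are ignored, and groups nobody has joined are simply dropped.

```python
from typing import Dict, Set


class SnoopingSwitch:
    """Toy model of IGMP snooping forwarding decisions on one VLAN."""

    def __init__(self, ports: Set[str]) -> None:
        self.ports = set(ports)
        self.members: Dict[str, Set[str]] = {}  # group -> ports that joined
        self.tcn_flooding = False

    def join(self, group: str, port: str) -> None:
        self.members.setdefault(group, set()).add(port)

    def topology_change(self) -> None:
        # Mirrors the debug output: every group entry gets the flood portset.
        self.tcn_flooding = True

    def flooding_finished(self) -> None:
        self.tcn_flooding = False

    def egress_ports(self, group: str, ingress: str) -> Set[str]:
        """Ports a frame for this group is sent out of, excluding where it came in."""
        targets = self.ports if self.tcn_flooding else self.members.get(group, set())
        return set(targets) - {ingress}


switch = SnoopingSwitch({"Fa0/1", "Fa0/2", "Fa0/3", "Gi0/1"})
switch.join("224.1.1.1", "Fa0/2")

normal = switch.egress_ports("224.1.1.1", ingress="Gi0/1")
switch.topology_change()
flooded = switch.egress_ports("224.1.1.1", ingress="Gi0/1")

print(sorted(normal))   # ['Fa0/2']
print(sorted(flooded))  # ['Fa0/1', 'Fa0/2', 'Fa0/3']
```

With one interested port the stream goes to one place. After the topology change it goes everywhere, which is exactly what hurts when the stream is large.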

But I don't want that...

Ok, no problem - you can turn it off. In Cisco switches, you need to add this command to every interface you want to stop the flooding on.

no ip igmp snooping tcn flood

That probably means all your edge ports, and potentially some of your uplink trunks, although these should probably be high enough bandwidth to be able to cope with all your multicast! This command is basically telling your switches "Don't flood traffic when you receive a topology-change notification (TCN)".

What is this topology change anyway? I didn't change anything!

Spanning-tree protocol, although very useful, can be very tricky to get configured correctly, and can cause you a lot of problems. When any switch believes a topology change has occurred, it will send a notification to the root bridge. When the root bridge receives this, it sets the topology-change (TC) bit in its BPDUs, to notify the whole of the rest of the network that a topology change has occurred.

So why are they happening if my network isn't changing?

Spanning tree will send a topology-change-notification (TCN) whenever it believes a topology change has occurred. If you already understand spanning-tree, you will know that any port, as it comes up, will go through two different states, "listening" and then "learning", before finally entering the "forwarding" state and starting to operate normally. Any port transitioning in or out of this forwarding state will trigger a TCN. However, if a port is configured with "portfast" it will skip the "listening" and "learning" states and jump straight to "forwarding", without triggering a TCN. So, put simply, any port going up or down, anywhere on your network, that is not in portfast mode, will trigger a TCN, as shown below:

Without Portfast:

02:30:04: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
02:30:05: set portid: VLAN0001 Fa0/1: new port id 8001
02:30:05: STP: VLAN0001 Fa0/1 -> listening
02:30:06: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up
02:30:20: STP: VLAN0001 Fa0/1 -> learning
02:30:35: STP: VLAN0001 sent Topology Change Notice on Gi0/1
02:30:35: STP: VLAN0001 Fa0/1 -> forwarding

With Portfast:

02:29:10: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
02:29:11: set portid: VLAN0001 Fa0/1: new port id 8001
02:29:11: STP: VLAN0001 Fa0/1 ->jump to forwarding from blocking
02:29:12: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up

As you can see, portfast is very desirable, because not only does it stop unwanted TCNs, but it also means your ports will come up much faster. If you're anything like me, you already put all your access ports into portfast, just because you want them to come up fast. However, you may not put your edge trunk ports (perhaps for a server, wireless AP or VoIP phone) into portfast.
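
If you have a lot of debug output to wade through, a few lines of Python can pull out the interesting parts. This is a rough sketch that assumes the log lines look like the two samples above; the regular expressions and the function name are invented for illustration and may need adjusting for other IOS versions.

```python
import re
from typing import Dict, List, Tuple

STATE_RE = re.compile(r"STP: (\S+) (\S+) ->\s*(.+)$")
TCN_RE = re.compile(r"STP: (\S+) sent Topology Change Notice on (\S+)")


def summarise(lines: List[str]) -> Tuple[Dict[str, List[str]], int]:
    """Return per-port STP state transitions and the number of TCNs sent."""
    transitions: Dict[str, List[str]] = {}
    tcn_count = 0
    for line in lines:
        if TCN_RE.search(line):
            tcn_count += 1
            continue
        match = STATE_RE.search(line)
        if match:
            _vlan, port, state = match.groups()
            transitions.setdefault(port, []).append(state.strip())
    return transitions, tcn_count


without_portfast = [
    "02:30:04: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up",
    "02:30:05: set portid: VLAN0001 Fa0/1: new port id 8001",
    "02:30:05: STP: VLAN0001 Fa0/1 -> listening",
    "02:30:06: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up",
    "02:30:20: STP: VLAN0001 Fa0/1 -> learning",
    "02:30:35: STP: VLAN0001 sent Topology Change Notice on Gi0/1",
    "02:30:35: STP: VLAN0001 Fa0/1 -> forwarding",
]
with_portfast = [
    "02:29:10: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up",
    "02:29:11: set portid: VLAN0001 Fa0/1: new port id 8001",
    "02:29:11: STP: VLAN0001 Fa0/1 ->jump to forwarding from blocking",
    "02:29:12: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up",
]

print(summarise(without_portfast))
print(summarise(with_portfast))
```

Any port that shows up with a "listening" transition is one that is not in portfast mode, and is a candidate for the configuration change described here.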

And this will fix all my problems?

Not necessarily. It's possible that you are getting legitimate topology changes within your network. For example, I have seen an occasion where a faulty fibre link was causing an interface flap for an unused switch on the edge of a network. You can track down the source of your TCNs by using "debug spanning-tree events" on your switches. Start with the root bridge, and when your TCN occurs, you should see something like this:

02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/24

So, work out which switch is on Fa0/24, log into that and run debugs there. Repeat the process until you find the port that is flapping. A quicker way of doing this is to set up all your switches to log debug messages to a syslog server and turn on spanning-tree event debugs on all the switches at the same time. Then you only have to see a single TCN, rather than having to keep waiting for it to occur. I'll put up a post about syslogs on Cisco another time.
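
Once the debugs from every switch are in one place, the hop-by-hop tracing can be done by counting where each switch received its topology changes. The sketch below is only an illustration: it assumes you have already split the collected log into (switch name, message) pairs, the switch names and the second port are hypothetical, and only the message format is taken from the debug line above.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

RCVD_RE = re.compile(r"STP: (\S+) Topology Change rcvd on (\S+)")


def tcn_sources(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count received topology changes per (switch, VLAN, port)."""
    counts: Counter = Counter()
    for switch, message in records:
        match = RCVD_RE.search(message)
        if match:
            vlan, port = match.groups()
            counts[(switch, vlan, port)] += 1
    return counts


# Hypothetical records: "root-sw" and "edge-sw" are made-up switch names.
records = [
    ("root-sw", "02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/24"),
    ("edge-sw", "02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/3"),
    ("root-sw", "02:41:10: STP: VLAN0001 Topology Change rcvd on Fa0/24"),
    ("root-sw", "02:41:11: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up"),
]

for (switch, vlan, port), count in tcn_sources(records).most_common():
    print(switch, vlan, port, count)
```

Following the busiest port on each switch in turn leads you towards the edge of the network, where the flapping port should be.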

I'm still confused, how do I stop this flooding happening?!

The quickest way is to add the "no ip igmp snooping tcn flood" command to all your interfaces. If you want to stop the underlying cause, make sure all ports where a single device is connected are set up with "spanning-tree portfast" for access ports or "spanning-tree portfast trunk" for trunk ports. Don't do this for links to switches - they should be set up as part of your spanning tree.
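
With more than a handful of ports, it may be easier to generate the configuration than to type it. Here is a minimal sketch that emits the commands mentioned above for a list of edge ports. The function name and the interface names are hypothetical, and the output should be reviewed before pasting it into a switch.

```python
from typing import Iterable, List, Tuple


def edge_port_config(ports: Iterable[Tuple[str, bool]]) -> List[str]:
    """Build IOS config lines for edge ports given (interface name, is_trunk) pairs."""
    lines: List[str] = []
    for name, is_trunk in ports:
        lines.append(f"interface {name}")
        # Trunk edge ports (servers, APs, phones) need the "trunk" keyword.
        lines.append(" spanning-tree portfast trunk" if is_trunk else " spanning-tree portfast")
        lines.append(" no ip igmp snooping tcn flood")
    return lines


# Hypothetical ports: one access port and one edge trunk.
config = edge_port_config([("FastEthernet0/1", False), ("FastEthernet0/2", True)])
print("\n".join(config))
```

Only list ports where a single device is connected. Links to other switches should be left out, for the reason given above.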

I hope this is useful to some people. I've dealt with this situation quite a few times, and the first few times it took me a while to figure out what was happening. If you want to understand this further, Cisco has a very helpful page about this here.

Edit: See my next entry for info on some of the Linux commands I used to test multicast in my lab.
Update: See my newer post about IGMP Query Solicitation