Steve's Blog: 2012

Thursday, 13 September 2012

Irritating Cisco bug

Took me a long time to figure this one out... On a Cisco router, specifically Cisco 2900 running early versions of IOS 15.2, you may be completely baffled by access-lists on your interfaces failing to work! If you do a "debug ip access-list data-plane" you see the following error:

Sep 13 09:42:45: IPACL-DP: Pkt Matched against EPM list, Action: Deny
Sep 13 09:42:45: IPACL-DP: Pkt matched punt/drop it
Sep 13 09:42:45: IPACL-DP: Pkt is dropped in cef path: interface GigabitEthernet0/1 inbound direction

If this is the case for you, it's a bug in IOS. See here. Bug is number CSCtt19027. Cisco say:

ACL applied on serial or Gi interface drops all packets with permit any
Symptoms: When ACL is applied to the serial interface or Gigabit interface,
ping failure seen even though the permit statement is there.

Conditions: The symptom is observed when ACL is configured on the serial
interface or Gigabit interface.

Workaround: Enable EPM by installing the security license.

Further Problem Description: This is seen with those images where EPM is not
supported and because of that an EPM call always gives a return value
as "deny" due to registry call.

Hope this helps someone, because when I searched, I couldn't get any hits for "Matched against EPM list" at all!

Monday, 3 September 2012

Other Switches

Just another quick update with information on other switches behaviour when it comes to spanning-tree and multicast...

HP Switches

I tested this with a pair of HP 2610 edge switches aggregated on an HP 2520.

HP's "auto-edge-port" functionality reduces the number of TCNs on the network - and is enabled by default! I had to do a fair bit of tweaking to get TCNs when I plugged and unplugged normal devices.

Even once I got TCNs occurring on-demand, the HP switches do not seem to flood multicast when TCNs are seen. The only undesirable behaviour I did see was that multicast traffic was flooded for a few seconds when a device leaves a multicast group. This is documented by HP:

On switches that do not support Data-Driven IGMP, unregistered multicast groups are flooded to the VLAN rather than pruned. In this scenario, Fast-Leave IGMP can actually increase the problem of multicast flooding by removing the IGMP group filter before the Querier has recognized the IGMP leave. The Querier will continue to transmit the multicast group during this short time, and because the group is no longer registered the switch will then flood the multicast group to all ports.

Because of the multicast flooding problem mentioned above, the IGMP Fast-Leave feature is disabled by default on all ProCurve switches that do not support Data-Driven IGMP.

Thanks HP... this is basically their way of saying "If you want multicast, don't use our 2600 switches."

Allied Telesyn Switches

I tested this with a pair of AT-8100S switches aggregated on an AT-x610.

Allied Telesyn switches do seem to suffer from similar problems, and their ports are not portfast by default. Enabling portfast on edge ports does reduce TCNs, so you should be able to significantly reduce the prevalence of these issues.

There is an option to disable TCN flooding, but oddly it has nothing related to TCN in the name. In global configuration mode use this command:

no ip igmp snooping flood-unknown-mcast

Since implementing this in my lab, I have not been able to get any floods to occur, no matter that spanning-tree is doing. I like the fact that it is a single global command to turn it off - much quicker than doing it per-port on a Cisco - but that means there is a lack of flexibility. If you did want this behaviour on one port and not on another, then you can't have it!

Thursday, 30 August 2012

Playing with Multicast

Just an addendum to my previous entry - a few useful tricks I picked up whilst setting up a lab for playing with multicast.

Generating lots of multicast traffic

A dead quick and easy way of producing multicast traffic is to use iperf - normally a bandwidth-testing tool. It will happily send to multicast addresses, just give it a multicast IP as a destination. Using UDP we can send traffic aimlessly without worrying whether there is an iperf server on the other end. Use the syntax below!

iperf -u -p 1234 -c 224.1.1.1 -b 50M -t 86400

I actually used multiple instances simultaneously, to produce a few variable sized multicast streams to different multicast IPs.

Receiving multicast traffic

Also easy, using socat, a somewhat more versatile version of netcat. Syntax below!

socat STDIO UDP4-RECV:1234,ip-add-membership=224.1.1.1:eth0 > /dev/null

Obviously you may want to send the traffic somewhere other than /dev/null - but if you've generated it from iperf with the above syntax, there will be quite a lot of it!

Have fun playing with multicast in your lab!

Multicast, IGMP and Spanning-Tree

So, I've come across this problem a lot of times, so I thought I'd try and write a post to help others in the same situation.

The situation is this - you have a large network of switches, using spanning-tree to prevent loops, but you are also using the network for multicast streaming. If you have any significant amount of multicast going on (maybe an IPTV system) then you'll be using IGMP snooping on all the switches to make sure that you don't have traffic going where it's not required. You set it up, and everything is working fine.

But then... it breaks. Badly. Your network starts flooding occasionally, for a couple of minutes at a time. During that time, all traffic on the network is delayed at best, and often dropped.

The interactions of IGMP, STP and your large amounts of multicast traffic are killing your network.

Let's break it down to explain the different things that are happening here:

Why does my network grind to a halt?

It's flooded! When using IGMP snooping, the multicast traffic on your network is normally only sent to those people who want to receive it. However, in this situation, your switches are momentarily sending traffic to all ports. There is so much traffic that your switch ports may be running at capacity, or the end-hosts are getting sent so much unwanted multicast traffic that they can't keep up.

So why does IGMP snooping suddenly stop working?

It doesn't. It is choosing to flood your multicast traffic because it thinks that is the best course of action in the given situation. If we look at the debug messages for IGMP snooping:

00:08:15: IGMPSN: mgt: Received topology change on vlan 1
00:08:15: IGMPSN: mgt: Updating all GCEs with flood portset for in Vlan 1

When spanning-tree protocol tells the switch that a topology change has occurred (more on this below), IGMP snooping will flood your multicast traffic to all ports, assuming that if the topology has changed and your traffic is mission-critical, then it had better send it to all ports to make sure it gets to your end user!

But I don't want that...

Ok, no problem - you can turn it off. In Cisco switches, you need to add this command to every interface you want to stop the flooding on.

no ip igmp snooping tcn flood

That probably means all your edge ports, and potentially some of your uplink trunks, although these should probably be high enough bandwidth to be able to cope with all your multicast! This command is basically telling your switches "Don't flood traffic when you receive a topology-change notification (TCN)".

What is this topology change anyway? I didn't change anything!

Spanning-tree protocol, although very useful, can be very tricky to get configured correctly, and can cause you a lot of problems. When any switch believes a topology change has occurred, it will send a notification to the root bridge. When the root bridge receives this, it sets the topology-change (TC) bit in its BPDUs, to notify the whole of the rest of the network that a topology change has occurred.

So why are they happening if my network isn't changing?

Spanning tree will send a topology-change-notification (TCN) whenever it believes a topology change has occurred. If you already understand spanning-tree, you will know that any port, as it comes up, will go through two different states, "learning" and "listening", before finally entering the "forwarding" state and starting to operate normally. Any port transitioning in or out of this forwarding state will trigger a TCN. However, if a port is configured with "portfast" it will skip the "listening" and "learning" states and jump straight to "forwarding", without triggering a TCN. So, put simply, any port going up or down, anywhere on your network, that is not in portfast mode, will trigger a TCN, as shown below:

Without Portfast:

02:30:04: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
02:30:05: set portid: VLAN0001 Fa0/1: new port id 8001
02:30:05: STP: VLAN0001 Fa0/1 -> listening
02:30:06: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up
02:30:20: STP: VLAN0001 Fa0/1 -> learning
02:30:35: STP: VLAN0001 sent Topology Change Notice on Gi0/1
02:30:35: STP: VLAN0001 Fa0/1 -> forwarding

With Portfast:

02:29:10: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
02:29:11: set portid: VLAN0001 Fa0/1: new port id 8001
02:29:11: STP: VLAN0001 Fa0/1 ->jump to forwarding from blocking
02:29:12: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up

As you can see, portfast is very desirable, because not only does it stop unwanted TCNs, but it also means your ports will come up much faster. If you're anything like me, you already put all your access ports into portfast, just because you want them to come up fast. However, you may not put your edge trunk ports (perhaps for a server, wireless AP or VoIP phone) into portfast.

And this will fix all my problems?

Not necessarily. It's possible that you are getting legitimate topology changes within your network. For example, I have seen an occasion where a faulty fibre link was causing an interface flap for an unused switch on the edge of a network. You can track down the source of your TCNs by using "debug spanning-tree events" on your switches. Start with the root bridge, and when your TCN occurs, you should see something like this:

02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/24

So, work out which switch is on Fa0/24, log into that and run debugs there. Repeat the process until you find the port that is flapping. A quicker way of doing this is to set up all your switches to log debug messages to a syslog server, and turn on spanning-tree event debugs on all the switches at the same time, and then you only have to see a single TCN, rather than having to keep waiting for it to occur. I'll put up another post about syslogs on Cisco another time.

I'm still confused, how do I stop this flooding happening?!

The quickest way is to add the "no ip igmp snooping tcn flood" command to all your interfaces. If you want to stop the underlying cause, make sure all ports where a single device is connected are set up with "spanning-tree portfast" for access ports or "spanning-tree portfast trunk" for trunk ports. Don't do this for links to switches - they should be set up as part of your spanning tree.

I hope this is useful to some people. I've dealt with this situation quite a few times, and the first few it took me a while to figure out what was happening. If you want to understand this further, Cisco has a very helpful page about this here.

Edit: See my next entry for info on some of the Linux commands I used to test multicast in my lab.
Update: See my newer post about IGMP Query Solicitation

Monday, 23 July 2012

Blogs Worth Reading

Seeing as I haven't actually posted anything on this blog since creating it in 2008, I thought I would give you some links to some blogs I read that you may like:

Krebs On Security
A great blog about security that I read regularly. Brian is great at investigating cybercrime and some of his most interesting articles are the ones where he links various online criminals back to real people.

Errata Security
Another good blog, supposedly about security, but more often covering other interesting subjects within IT. Going into more technical detail than Krebs On Security, this can give great insight into how some of the security news actually works - Rob's article about password cracking after the LinkedIn password breach was really informative.

Kaspersky Labs
From one of the top security labs in Europe, another good in-depth blog about a variety of malware and threats. These are the guys who have been heavily involved in analysis of Flame, Stuxnet and Madi. Definitely worth a read.

Hopefully at some point I will put up some interesting articles of my own, but for now, go and read what these guys are writing!