Steve's Blog

Friday, 1 February 2013

IRC DOS Bot

So, if you've read my previous post, you will see that I was recently looking into a Denial-of-Service attack. Once I eventually tracked down the host it was coming from, I needed to identify what was sending all the packets.

Thankfully, the owner of the server was very interested in my offer of assistance with their server, so I was able to do more digging!

So, it was a Debian Linux server sending a lot of source-spoofed packets onto the network. My first step was to regain remote access to the server, to make life easier. This was simple: two iptables rules to block all packets to our two destination hosts:

iptables -I OUTPUT -d aaa.bbb.37.10 -j DROP
iptables -I OUTPUT -d aaa.bbb.38.140 -j DROP

This quickly stopped the outgoing packets and made the server remotely accessible.

Next up, checking netstat gave no obvious connections to these IPs. However, since the source addresses were spoofed, this was actually the wrong place to look: spoofed packets cannot come from an ordinary connected socket, so the offending process was almost certainly using a raw socket. So:

~# netstat -anp | grep raw
raw     0   0 0.0.0.0:1       0.0.0.0:*     7      2666/dhcpd
raw     0   0 0.0.0.0:255     0.0.0.0:*     7      9060/0]
raw     0   0 0.0.0.0:255     0.0.0.0:*     7      28477/0]

There were three processes with raw sockets open: dhcpd and two unnamed ones. Knowing that dhcpd was legitimate, the two other processes needed looking at. A quick run of "top" showed me that one of them was consuming *way* too much CPU!

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9060 root 20 0 1836 300 196 R 101 0.0 642:44.59 migration

It seemed a bit strange that the process was called "migration", as this is the name of a standard Linux kernel thread. However, the real migration threads have low PIDs, as they are started at boot. The other process was also called migration and also had a high PID, but this time without the high CPU usage. So, I paused both processes, without terminating them, to enable further analysis.

kill -STOP 9060
kill -STOP 28477

Another new trick I learned, which to many of you will be no surprise, is that you can simply and easily access a copy of the running application even if it has removed itself from the disk. Looking in /proc/<pid> I could see:

~# cd /proc/28477
/proc/28477# ls -l
total 0
dr-xr-xr-x 2 root root 0 Jan 28 12:12 attr
-r-------- 1 root root 0 Jan 28 12:12 auxv
-r--r--r-- 1 root root 0 Jan 28 12:12 cgroup
--w------- 1 root root 0 Jan 28 12:12 clear_refs
-r--r--r-- 1 root root 0 Jan 27 01:27 cmdline
-rw-r--r-- 1 root root 0 Jan 28 12:12 coredump_filter
-r--r--r-- 1 root root 0 Jan 28 12:12 cpuset
lrwxrwxrwx 1 root root 0 Jan 28 11:30 cwd -> /usr/sbin/ttyload
-r-------- 1 root root 0 Jan 28 12:12 environ
lrwxrwxrwx 1 root root 0 Jan 26 06:25 exe -> /usr/sbin/ttyload/migration (deleted)
dr-x------ 2 root root 0 Jan 28 11:30 fd
dr-x------ 2 root root 0 Jan 28 11:30 fdinfo
-r-------- 1 root root 0 Jan 28 12:12 io
-r-------- 1 root root 0 Jan 28 12:12 limits
-rw-r--r-- 1 root root 0 Jan 28 12:12 loginuid
-r--r--r-- 1 root root 0 Jan 28 11:30 maps
-rw------- 1 root root 0 Jan 28 12:12 mem
-r--r--r-- 1 root root 0 Jan 28 12:12 mountinfo
-r--r--r-- 1 root root 0 Jan 28 12:12 mounts
-r-------- 1 root root 0 Jan 28 12:12 mountstats
dr-xr-xr-x 8 root root 0 Jan 28 12:12 net
-rw-r--r-- 1 root root 0 Jan 28 12:12 oom_adj
-r--r--r-- 1 root root 0 Jan 28 12:12 oom_score
-r--r--r-- 1 root root 0 Jan 28 12:12 pagemap
-r--r--r-- 1 root root 0 Jan 28 12:12 personality
lrwxrwxrwx 1 root root 0 Jan 28 11:30 root -> /
-rw-r--r-- 1 root root 0 Jan 28 12:12 sched
-r--r--r-- 1 root root 0 Jan 28 12:12 sessionid
-r--r--r-- 1 root root 0 Jan 28 12:12 smaps
-r--r--r-- 1 root root 0 Jan 28 12:12 stack
-r--r--r-- 1 root root 0 Jan 27 01:26 stat
-r--r--r-- 1 root root 0 Jan 27 01:26 statm
-r--r--r-- 1 root root 0 Jan 27 01:27 status
-r--r--r-- 1 root root 0 Jan 28 12:12 syscall
dr-xr-xr-x 3 root root 0 Jan 28 11:43 task
-r--r--r-- 1 root root 0 Jan 28 12:12 wchan

So the "exe" symlink pointed to the original file on disk, but told me that it had now been deleted. However, the kernel keeps a deleted file's contents available for as long as a running process is still using it, so I was able to copy it through the symlink and start looking at it!
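The same check can be run across every process on a box. Below is a rough Python sketch, not what I ran on the day: it walks /proc and reports any process whose "exe" link ends in " (deleted)". The function name and the `proc_root` parameter are made up for illustration. Bear in mind that a hit is only a hint, since a daemon that has been upgraded but not restarted will show up too.

```python
import os

def deleted_executables(proc_root="/proc"):
    """Yield (pid, target) for processes whose exe link points at a deleted file."""
    if not os.path.isdir(proc_root):
        return  # not a Linux-style /proc, nothing to scan
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "net" or "cpuinfo"
        try:
            target = os.readlink(os.path.join(proc_root, name, "exe"))
        except OSError:
            # Kernel threads have no exe link, other users' processes need
            # root, and short-lived processes may vanish mid-scan.
            continue
        if target.endswith(" (deleted)"):
            yield int(name), target

for pid, target in deleted_executables():
    print(pid, target)
```

Run it as root, otherwise the exe links of other users' processes cannot be read and will be silently skipped.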

/proc/28477# cat exe > /tmp/thing
/proc/28477# file /tmp/thing
/tmp/thing: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

Looking further at what strings it contained (IP address hidden!):

# strings /tmp/thing
--snip--
aa.bb.ccc.251
ircd.exampledomain.net
privmsg %s :CHECKSUM <on/off>
privmsg %s :Checksum has been turned on.
privmsg %s :Checksum has been turned off.
privmsg %s :DNS <host>
privmsg %s :Resolving %s...
privmsg %s :Unable to resolve.
privmsg %s :Resolved to %s.
privmsg %s :%d.%d.%d.%d
privmsg %s :%d.%d.%d.%d - %d.%d.%d.%d
privmsg %s :White Knight 2012
privmsg %s :NICK <nick>
privmsg %s :Nick cannot be larger than 50 characters.
NICK %s
privmsg %s :Removed all spoofs
privmsg %s :What kind of subnet address is that? Do something like: 169.40
privmsg %s :BYSIN <target> <port>
privmsg %s :Packeting %s.
privmsg %s :SLICE <destination> <lowport> <highport>
privmsg %s :Error in SLICE2: Low > high.
privmsg %s :PAN <target> <lowport> <highport>
privmsg %s :Talking %s ...
privmsg %s :This knight accepts the following commands via ctcp (with a #):
privmsg %s :NICK <nick>                                      = Changes the nick of the knight
privmsg %s :GETSPOOF                                         = Gets the current spoofing
privmsg %s :SPOOFS <subnet>                                  = Changes spoofing to a subnet
privmsg %s :DNS <host>                                       = DNSs a host
privmsg %s :CHECKSUM <on/off>                                = Turns checksum on or off
privmsg %s :IRC <command>                                    = Sends this command to the server
privmsg %s :SH <command>                                     = Executes a command
privmsg %s :KILLALL                                          = Kills all current packeting
privmsg %s :KILL                                             = Kills the knight
privmsg %s :DISABLE                                          = Disables all packeting from this knight
privmsg %s :ENABLE                                           = Enables all packeting from this knight
privmsg %s :VERSION                                          = Requests version of knight
privmsg %s :HELP                                             = Displays this
kill -9 %d;kill -9 0
privmsg %s :Killing pid %d.
--snip--

So it was fairly obviously an IRC bot, with the main goal of sending spoofed DoS attacks. It also had the ability for the controller to execute arbitrary shell commands, which in this case would be run with root privileges. Therefore, it could be used to install further malware.
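If you don't have "strings" to hand, the idea is easy to reproduce. This is a minimal Python sketch of a basic strings-style extractor (printable ASCII runs of four or more bytes), plus a crude filter for IRC-looking lines. The function names and the sample bytes are invented for illustration, and the real tool has more options than this.

```python
import re

def extract_strings(data, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, like a basic 'strings'."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def irc_indicators(strings):
    """Keep only strings that look like IRC protocol lines (a crude heuristic)."""
    prefixes = ("privmsg ", "nick ")
    return [s for s in strings if s.lower().startswith(prefixes)]

# Made-up bytes standing in for a chunk of the binary.
sample = (
    b"\x7fELF\x01\x01\x00privmsg %s :Packeting %s.\x00\x00"
    b"NICK %s\x00ab\x00kill -9 %d;kill -9 0\x00"
)
for line in irc_indicators(extract_strings(sample)):
    print(line)
```

To try it on a real file, read it in binary mode, for example `extract_strings(open("/tmp/thing", "rb").read())`.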

I have since been playing a little with this bot in a sandbox, but that is for a later post.

Hope you found that interesting, even though it was a bit of a rushed post. I hope you find it an encouragement to do some of your own malware analysis!

DOS Attack

So recently I had the interesting experience of helping track down a Denial-of-Service attack coming from a network. Thankfully, due to the egress filtering policies, the attack didn't make it onto the Internet.

The problem was first discovered when a Netflow collector started having resource problems. Due to the massive number of SYN packets in the DoS attack, Netflow was creating a new flow in memory for each new SYN packet, and having to keep it there until the flow timed out. Not good! After having figured out what was going on, I was able to capture a sample of traffic and start working out where it was coming from.
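To see why spoofed SYNs hurt a flow collector so much, here is a rough Python sketch of a flow cache. It is a simplification (I am assuming a plain 5-tuple key, whereas real Netflow keys on a few more fields, and all the names and addresses are made up), but it shows that every packet with a new random source becomes a new entry.

```python
import random

def flow_key(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    """Simplified flow key; real Netflow also keys on fields such as ToS and input interface."""
    return (src_ip, src_port, dst_ip, dst_port, proto)

def simulate_flow_cache(packets):
    """Return the flow cache (key -> packet count) after seeing all packets."""
    cache = {}
    for pkt in packets:
        key = flow_key(*pkt)
        cache[key] = cache.get(key, 0) + 1
    return cache

def spoofed_syn_flood(n, dst_ip="192.0.2.10", dst_port=80, seed=1):
    """Generate n SYNs with random (spoofed) source addresses and ports."""
    rng = random.Random(seed)
    for _ in range(n):
        src_ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        yield (src_ip, rng.randint(1024, 65535), dst_ip, dst_port)

# 1000 packets of one ordinary connection versus 1000 spoofed SYNs.
normal = [("198.51.100.7", 40000, "192.0.2.10", 80)] * 1000
flood = list(spoofed_syn_flood(1000))
print("normal traffic flows:", len(simulate_flow_cache(normal)))
print("spoofed flood flows: ", len(simulate_flow_cache(flood)))
```

The ordinary connection collapses into a single flow, while the flood needs roughly one cache entry per packet until the entries time out.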

Here is a small sample, with the destinations partially obscured:

25771 11:48:32.584857 136.102.253.14 -> aaa.bbb.38.140 TCP 51515 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25772 11:48:32.584866 110.93.51.90 -> aaa.bbb.38.140 TCP 27433 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25773 11:48:32.584868 177.88.230.116 -> aaa.bbb.37.10 TCP 3177 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25774 11:48:32.584876 130.136.233.47 -> aaa.bbb.37.10 TCP 21115 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25775 11:48:32.584889 174.230.64.48 -> aaa.bbb.37.10 TCP 27701 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25776 11:48:32.584896 73.217.102.69 -> aaa.bbb.38.140 TCP 43969 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25777 11:48:32.584902 221.233.146.91 -> aaa.bbb.37.10 TCP 42246 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25778 11:48:32.584910 123.83.83.9 -> aaa.bbb.38.140 TCP 40649 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25779 11:48:32.584915 140.101.194.71 -> aaa.bbb.37.10 TCP 1129 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25780 11:48:32.584922 193.82.83.24 -> aaa.bbb.38.140 TCP 39930 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25781 11:48:32.584929 177.36.15.25 -> aaa.bbb.37.10 TCP 43185 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25782 11:48:32.584936 148.154.218.100 -> aaa.bbb.38.140 TCP 4283 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25783 11:48:32.584942 140.41.59.48 -> aaa.bbb.37.10 TCP 47192 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25784 11:48:32.584950 89.134.150.35 -> aaa.bbb.38.140 TCP 25345 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25785 11:48:32.584955 207.236.20.32 -> aaa.bbb.37.10 TCP 22479 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25786 11:48:32.584963 240.16.136.1 -> aaa.bbb.37.10 TCP 44095 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25787 11:48:32.584970 70.40.185.17 -> aaa.bbb.38.140 TCP 46254 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25788 11:48:32.584975 82.195.26.42 -> aaa.bbb.37.10 TCP 61179 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25789 11:48:32.584983 130.184.250.86 -> aaa.bbb.38.140 TCP 15603 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25790 11:48:32.584989 62.108.248.77 -> aaa.bbb.37.10 TCP 61996 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25791 11:48:32.584996 244.193.231.42 -> aaa.bbb.38.140 TCP 27477 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25792 11:48:32.585003 254.119.67.64 -> aaa.bbb.37.10 TCP 46181 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25793 11:48:32.585009 171.252.91.108 -> aaa.bbb.38.140 TCP 22252 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25794 11:48:32.585016 61.79.106.102 -> aaa.bbb.37.10 TCP 41133 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25795 11:48:32.585023 142.113.79.114 -> aaa.bbb.38.140 TCP 32631 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25796 11:48:32.585029 99.7.239.111 -> aaa.bbb.37.10 TCP 29999 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25797 11:48:32.585037 5.107.140.57 -> aaa.bbb.38.140 TCP 2415 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25798 11:48:32.585043 16.197.131.47 -> aaa.bbb.37.10 TCP 21038 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25799 11:48:32.585050 109.10.184.48 -> aaa.bbb.38.140 TCP 41172 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25800 11:48:32.585057 183.145.182.22 -> aaa.bbb.37.10 TCP 7899 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25801 11:48:32.585063 116.183.190.106 -> aaa.bbb.38.140 TCP 18717 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25802 11:48:32.585070 52.217.83.29 -> aaa.bbb.37.10 TCP 32629 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25803 11:48:32.585077 31.215.117.74 -> aaa.bbb.38.140 TCP 19412 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25804 11:48:32.585084 48.113.61.11 -> aaa.bbb.37.10 TCP 25432 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25805 11:48:32.585090 161.46.43.79 -> aaa.bbb.38.140 TCP 35808 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25806 11:48:32.585097 232.129.112.88 -> aaa.bbb.37.10 TCP 1616 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25807 11:48:32.585103 152.25.15.45 -> aaa.bbb.38.140 TCP 10276 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25808 11:48:32.585110 133.225.67.50 -> aaa.bbb.37.10 TCP 43895 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25809 11:48:32.585116 19.124.117.101 -> aaa.bbb.38.140 TCP 52242 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25810 11:48:32.585123 134.83.167.21 -> aaa.bbb.37.10 TCP 16292 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25811 11:48:32.585129 228.41.163.17 -> aaa.bbb.38.140 TCP 8024 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25812 11:48:32.585135 53.115.220.107 -> aaa.bbb.37.10 TCP 46765 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25813 11:48:32.585142 142.7.225.83 -> aaa.bbb.38.140 TCP 32227 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25814 11:48:32.585151 68.104.42.36 -> aaa.bbb.37.10 TCP 63759 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25815 11:48:32.585156 142.113.105.51 -> aaa.bbb.38.140 TCP 23125 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
25816 11:48:32.585169 163.167.156.8 -> aaa.bbb.38.140 TCP 37433 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460

So, a very large number of packets! The main things to notice are that they are coming from a massive variety of source addresses (none of which are valid for a packet going out of this network) and that they are destined for only two hosts. These two hosts are the targets, and the packets are being sent with spoofed source addresses, which makes tracking down the actual source quite challenging!
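A quick way to confirm that pattern on a bigger capture is to count distinct sources per destination. This is a minimal Python sketch that assumes the one-line-per-packet text format shown above; the regular expression and function name are my own, so adjust them to whatever your capture tool prints.

```python
import re
from collections import defaultdict

# Matches the summary lines shown in the sample above.
LINE_RE = re.compile(
    r"^\s*\d+\s+\S+\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+TCP\s+\d+\s+>\s+"
    r"(?P<dport>\d+)\s+\[SYN\]"
)

def summarise_syns(lines):
    """Map (destination, port) -> set of source addresses seen sending SYNs."""
    sources = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            sources[(m["dst"], int(m["dport"]))].add(m["src"])
    return sources

sample = [
    "25771 11:48:32.584857 136.102.253.14 -> aaa.bbb.38.140 TCP 51515 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460",
    "25772 11:48:32.584866 110.93.51.90 -> aaa.bbb.38.140 TCP 27433 > 22 [SYN] Seq=0 Win=65535 Len=0 MSS=1460",
    "25773 11:48:32.584868 177.88.230.116 -> aaa.bbb.37.10 TCP 3177 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460",
]
for (dst, port), srcs in sorted(summarise_syns(sample).items()):
    print(f"{dst}:{port} <- {len(srcs)} distinct sources")
```

A handful of destinations each with thousands of one-packet sources is a strong sign of a spoofed flood rather than real clients.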

I was able to track down the source of the flows by adding a simple ACL (blocking packets to the two destinations) to different routers in the network, and seeing which ones scored the most hits. For example, this ACL was applied:

ip access-list extended Temp
deny ip any host aaa.bbb.37.10
permit ip any any

and then when reviewed a little while later:

# show access-list
Extended IP access list Temp
10 deny ip any host aaa.bbb.37.10 (2029998454 matches)
20 permit ip any any (42805196 matches)

Using this method I was quickly able to identify roughly where in the network the packets were coming from, but, to cut a long story short, it took a while to identify the source router.
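If you are comparing counters from several routers, it helps to turn them into a single figure per router. Here is a small Python sketch that parses "show access-list" output in the format above and works out what share of matched packets hit the deny entry. The function names are hypothetical, and the pattern only handles the "(N matches)" form shown here.

```python
import re

ENTRY_RE = re.compile(r"^\s*(\d+)\s+(permit|deny)\s+(.+?)\s+\((\d+) matches\)\s*$")

def acl_hits(show_output):
    """Return a list of (sequence, action, rule, matches) from 'show access-list' text."""
    hits = []
    for line in show_output.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            hits.append((int(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
    return hits

def deny_share(hits):
    """Fraction of all matched packets that hit a deny entry."""
    total = sum(h[3] for h in hits)
    denied = sum(h[3] for h in hits if h[1] == "deny")
    return denied / total if total else 0.0

output = """Extended IP access list Temp
10 deny ip any host aaa.bbb.37.10 (2029998454 matches)
20 permit ip any any (42805196 matches)"""
hits = acl_hits(output)
print(f"{deny_share(hits):.1%} of matched packets hit the deny entry")
```

The router where this share is highest is the one closest to the source, which is where to dig next.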

From there, it was a case of identifying the sending host on the LAN. This can be done with mac-address accounting:

Configure:
(config-if)# ip accounting mac-address input
Check:
#sh interface F0/0 mac-accounting
FastEthernet0/0
Input (509 free)
    00aa.ccf2.7c51(153): 5 packets, 411 bytes, last: 1380ms ago
    00aa.ddcd.4a27(163): 1 packets, 103 bytes, last: 9236ms ago
    00bb.ee85.2569(192): 6 packets, 1573 bytes, last: 4580ms ago
                  Total: 12 packets, 2087 bytes

Unfortunately the above output was an afterthought, and I don't have the "live" version. However, it quickly gave me the MAC of the offending host.
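With a busy LAN the list gets long, so sorting it helps. This is a rough Python sketch that parses the mac-accounting lines above and puts the busiest sender first. The function name and pattern are my own, and I am only assuming the line format shown in this post.

```python
import re

ENTRY_RE = re.compile(
    r"^\s*(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\(\d+\):\s*"
    r"(?P<packets>\d+) packets, (?P<bytes>\d+) bytes"
)

def top_talkers(text):
    """Return (mac, packets, bytes) tuples, busiest sender first."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append((m["mac"], int(m["packets"]), int(m["bytes"])))
    return sorted(entries, key=lambda e: e[1], reverse=True)

output = """FastEthernet0/0
Input (509 free)
    00aa.ccf2.7c51(153): 5 packets, 411 bytes, last: 1380ms ago
    00aa.ddcd.4a27(163): 1 packets, 103 bytes, last: 9236ms ago
    00bb.ee85.2569(192): 6 packets, 1573 bytes, last: 4580ms ago
                  Total: 12 packets, 2087 bytes"""
for mac, packets, nbytes in top_talkers(output):
    print(mac, packets, nbytes)
```

During a flood the offending MAC should be at the top by a very wide margin.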

Due to the vast traffic flood, I had to wait until the next day before I could get physically on the console of the hacked PC and do some more digging!

More to follow very soon...

Thursday, 13 September 2012

Irritating Cisco bug

Took me a long time to figure this one out... On a Cisco router, specifically Cisco 2900 running early versions of IOS 15.2, you may be completely baffled by access-lists on your interfaces failing to work! If you do a "debug ip access-list data-plane" you see the following error:

Sep 13 09:42:45: IPACL-DP: Pkt Matched against EPM list, Action: Deny
Sep 13 09:42:45: IPACL-DP: Pkt matched punt/drop it
Sep 13 09:42:45: IPACL-DP: Pkt is dropped in cef path: interface GigabitEthernet0/1 inbound direction

If this is the case for you, it's a bug in IOS. See here. Bug is number CSCtt19027. Cisco say:

ACL applied on serial or Gi interface drops all packets with permit any
Symptoms: When ACL is applied to the serial interface or Gigabit interface,
ping failure seen even though the permit statement is there.

Conditions: The symptom is observed when ACL is configured on the serial
interface or Gigabit interface.

Workaround: Enable EPM by installing the security license.

Further Problem Description: This is seen with those images where EPM is not
supported and because of that an EPM call always gives a return value
as "deny" due to registry call.

Hope this helps someone, because when I searched, I couldn't get any hits for "Matched against EPM list" at all!

Monday, 3 September 2012

Other Switches

Just another quick update with information on other switches' behaviour when it comes to spanning-tree and multicast...

HP Switches

I tested this with a pair of HP 2610 edge switches aggregated on an HP 2520.

HP's "auto-edge-port" functionality reduces the number of TCNs on the network - and is enabled by default! I had to do a fair bit of tweaking to get TCNs when I plugged and unplugged normal devices.

Even once I got TCNs occurring on-demand, the HP switches do not seem to flood multicast when TCNs are seen. The only undesirable behaviour I did see was that multicast traffic was flooded for a few seconds when a device leaves a multicast group. This is documented by HP:

On switches that do not support Data-Driven IGMP, unregistered multicast groups are flooded to the VLAN rather than pruned. In this scenario, Fast-Leave IGMP can actually increase the problem of multicast flooding by removing the IGMP group filter before the Querier has recognized the IGMP leave. The Querier will continue to transmit the multicast group during this short time, and because the group is no longer registered the switch will then flood the multicast group to all ports.

Because of the multicast flooding problem mentioned above, the IGMP Fast-Leave feature is disabled by default on all ProCurve switches that do not support Data-Driven IGMP.

Thanks HP... this is basically their way of saying "If you want multicast, don't use our 2600 switches."

Allied Telesyn Switches

I tested this with a pair of AT-8100S switches aggregated on an AT-x610.

Allied Telesyn switches do seem to suffer from similar problems, and their ports are not portfast by default. Enabling portfast on edge ports does reduce TCNs, so you should be able to significantly reduce the prevalence of these issues.

There is an option to disable TCN flooding, but oddly it has nothing related to TCN in the name. In global configuration mode use this command:

no ip igmp snooping flood-unknown-mcast

Since implementing this in my lab, I have not been able to get any floods to occur, no matter what spanning-tree is doing. I like the fact that it is a single global command to turn it off - much quicker than doing it per-port on a Cisco - but that means there is a lack of flexibility. If you did want this behaviour on one port and not on another, then you can't have it!

Thursday, 30 August 2012

Playing with Multicast

Just an addendum to my previous entry - a few useful tricks I picked up whilst setting up a lab for playing with multicast.

Generating lots of multicast traffic

A dead quick and easy way of producing multicast traffic is to use iperf - normally a bandwidth-testing tool. It will happily send to multicast addresses, just give it a multicast IP as a destination. Using UDP we can send traffic aimlessly without worrying whether there is an iperf server on the other end. Use the syntax below!

iperf -u -p 1234 -c 224.1.1.1 -b 50M -t 86400

I actually used multiple instances simultaneously, to produce a few variable sized multicast streams to different multicast IPs.
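If iperf isn't available, plain sockets will do for a simple test. Below is a minimal Python sketch of a multicast sender; the function name, payload size and defaults are assumptions of mine, and unlike iperf it makes no attempt to control the sending rate.

```python
import ipaddress
import socket

def send_multicast(group, port, payload=b"x" * 1000, count=10, ttl=1):
    """Send count UDP datagrams to a multicast group and return how many were sent."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # TTL 1 keeps the traffic on the local segment; raise it to cross routers.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        for _ in range(count):
            sock.sendto(payload, (group, port))
    finally:
        sock.close()
    return count
```

For example, `send_multicast("224.1.1.1", 1234, count=1000)` sends a short burst to the same group and port used in the iperf command above.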

Receiving multicast traffic

Also easy, using socat, a somewhat more versatile version of netcat. Syntax below!

socat STDIO UDP4-RECV:1234,ip-add-membership=224.1.1.1:eth0 > /dev/null

Obviously you may want to send the traffic somewhere other than /dev/null - but if you've generated it from iperf with the above syntax, there will be quite a lot of it!
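The same thing can be done from a script, which is handy if you want to count packets rather than throw them away. This is a rough Python sketch of a receiver that joins the group with IP_ADD_MEMBERSHIP. The function names are made up, and where socat takes an interface name, this version takes the interface's IP address (0.0.0.0 leaves the choice to the kernel).

```python
import socket
import struct

def membership_request(group, interface_ip="0.0.0.0"):
    """Build the ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip))

def open_receiver(group, port, interface_ip="0.0.0.0"):
    """Open a UDP socket bound to port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what makes the host send an IGMP membership report.
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group, interface_ip),
    )
    return sock

def count_datagrams(sock, limit):
    """Block until limit datagrams have arrived, then return the count."""
    received = 0
    while received < limit:
        sock.recvfrom(65535)
        received += 1
    return received
```

Something like `count_datagrams(open_receiver("224.1.1.1", 1234), 100)` will return once 100 datagrams have arrived. Closing the socket leaves the group again, which is useful for testing what your switches do on an IGMP leave.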

Have fun playing with multicast in your lab!

Multicast, IGMP and Spanning-Tree

I've come across this problem many times, so I thought I'd try and write a post to help others in the same situation.

The situation is this - you have a large network of switches, using spanning-tree to prevent loops, but you are also using the network for multicast streaming. If you have any significant amount of multicast going on (maybe an IPTV system) then you'll be using IGMP snooping on all the switches to make sure that you don't have traffic going where it's not required. You set it up, and everything is working fine.

But then... it breaks. Badly. Your network starts flooding occasionally, for a couple of minutes at a time. During that time, all traffic on the network is delayed at best, and often dropped.

The interactions of IGMP, STP and your large amounts of multicast traffic are killing your network.

Let's break it down to explain the different things that are happening here:

Why does my network grind to a halt?

It's flooded! When using IGMP snooping, the multicast traffic on your network is normally only sent to those people who want to receive it. However, in this situation, your switches are momentarily sending traffic to all ports. There is so much traffic that your switch ports may be running at capacity, or the end-hosts are getting sent so much unwanted multicast traffic that they can't keep up.

So why does IGMP snooping suddenly stop working?

It doesn't. It is choosing to flood your multicast traffic because it thinks that is the best course of action in the given situation. If we look at the debug messages for IGMP snooping:

00:08:15: IGMPSN: mgt: Received topology change on vlan 1
00:08:15: IGMPSN: mgt: Updating all GCEs with flood portset for in Vlan 1

When spanning-tree protocol tells the switch that a topology change has occurred (more on this below), IGMP snooping will flood your multicast traffic to all ports, assuming that if the topology has changed and your traffic is mission-critical, then it had better send it to all ports to make sure it gets to your end user!

But I don't want that...

Ok, no problem - you can turn it off. In Cisco switches, you need to add this command to every interface you want to stop the flooding on.

no ip igmp snooping tcn flood

That probably means all your edge ports, and potentially some of your uplink trunks, although these should probably be high enough bandwidth to be able to cope with all your multicast! This command is basically telling your switches "Don't flood traffic when you receive a topology-change notification (TCN)".

What is this topology change anyway? I didn't change anything!

Spanning-tree protocol, although very useful, can be very tricky to get configured correctly, and can cause you a lot of problems. When any switch believes a topology change has occurred, it will send a notification to the root bridge. When the root bridge receives this, it sets the topology-change (TC) bit in its BPDUs, to notify the whole of the rest of the network that a topology change has occurred.

So why are they happening if my network isn't changing?

Spanning tree will send a topology-change-notification (TCN) whenever it believes a topology change has occurred. If you already understand spanning-tree, you will know that any port, as it comes up, will go through two different states, "listening" and "learning", before finally entering the "forwarding" state and starting to operate normally. Any port transitioning in or out of this forwarding state will trigger a TCN. However, if a port is configured with "portfast" it will skip the "listening" and "learning" states and jump straight to "forwarding", without triggering a TCN. So, put simply, any port going up or down, anywhere on your network, that is not in portfast mode, will trigger a TCN, as shown below:

Without Portfast:

02:30:04: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
02:30:05: set portid: VLAN0001 Fa0/1: new port id 8001
02:30:05: STP: VLAN0001 Fa0/1 -> listening
02:30:06: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up
02:30:20: STP: VLAN0001 Fa0/1 -> learning
02:30:35: STP: VLAN0001 sent Topology Change Notice on Gi0/1
02:30:35: STP: VLAN0001 Fa0/1 -> forwarding

With Portfast:

02:29:10: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
02:29:11: set portid: VLAN0001 Fa0/1: new port id 8001
02:29:11: STP: VLAN0001 Fa0/1 ->jump to forwarding from blocking
02:29:12: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up

As you can see, portfast is very desirable, because not only does it stop unwanted TCNs, but it also means your ports will come up much faster. If you're anything like me, you already put all your access ports into portfast, just because you want them to come up fast. However, you may not put your edge trunk ports (perhaps for a server, wireless AP or VoIP phone) into portfast.
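To put a number on "much faster", the two debug excerpts above can be compared directly. Here is a small Python sketch that measures the gap between the link-up message and the first "forwarding" message. The function name is mine, and it assumes the hh:mm:ss uptime timestamps and message texts shown above.

```python
import re

TS_RE = re.compile(r"^(\d+):(\d+):(\d+): (.*)$")

def seconds_to_forwarding(log_lines):
    """Seconds between the first link-up message and the first 'forwarding' message."""
    link_up = forwarding = None
    for line in log_lines:
        m = TS_RE.match(line.strip())
        if not m:
            continue
        stamp = int(m.group(1)) * 3600 + int(m.group(2)) * 60 + int(m.group(3))
        text = m.group(4)
        if link_up is None and "%LINK-3-UPDOWN" in text and "changed state to up" in text:
            link_up = stamp
        elif forwarding is None and "STP:" in text and "forwarding" in text:
            forwarding = stamp
    if link_up is None or forwarding is None:
        return None
    return forwarding - link_up

without_portfast = [
    "02:30:04: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up",
    "02:30:05: STP: VLAN0001 Fa0/1 -> listening",
    "02:30:20: STP: VLAN0001 Fa0/1 -> learning",
    "02:30:35: STP: VLAN0001 sent Topology Change Notice on Gi0/1",
    "02:30:35: STP: VLAN0001 Fa0/1 -> forwarding",
]
with_portfast = [
    "02:29:10: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up",
    "02:29:11: STP: VLAN0001 Fa0/1 ->jump to forwarding from blocking",
]
print(seconds_to_forwarding(without_portfast))  # 31
print(seconds_to_forwarding(with_portfast))     # 1
```

The 31 seconds without portfast is essentially the 15 seconds of listening plus 15 seconds of learning visible in the log.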

And this will fix all my problems?

Not necessarily. It's possible that you are getting legitimate topology changes within your network. For example, I have seen an occasion where a faulty fibre link was causing an interface flap for an unused switch on the edge of a network. You can track down the source of your TCNs by using "debug spanning-tree events" on your switches. Start with the root bridge, and when your TCN occurs, you should see something like this:

02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/24

So, work out which switch is on Fa0/24, log into that and run debugs there. Repeat the process until you find the port that is flapping. A quicker way of doing this is to set up all your switches to log debug messages to a syslog server, and turn on spanning-tree event debugs on all the switches at the same time, and then you only have to see a single TCN, rather than having to keep waiting for it to occur. I'll put up another post about syslogs on Cisco another time.
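Once the debugs from every switch land in one file, the hunt can be scripted. The Python sketch below rests on a few assumptions of mine: that each syslog line starts with the switch's hostname, that the message texts match the ones in this post, and that a switch which sent a TCN without ever receiving one is where the change started (intermediate switches receive and then relay). The hostnames and function name are invented.

```python
import re
from collections import Counter

RCVD_RE = re.compile(
    r"^(?P<host>\S+) .*STP: (?P<vlan>VLAN\d+) Topology Change rcvd on (?P<port>\S+)"
)
SENT_RE = re.compile(
    r"^(?P<host>\S+) .*STP: (?P<vlan>VLAN\d+) sent Topology Change Notice on (?P<port>\S+)"
)

def tcn_summary(lines):
    """Return (receive counts per (host, port), hosts that sent but never received)."""
    received = Counter()
    senders = set()
    for line in lines:
        m = RCVD_RE.match(line)
        if m:
            received[(m["host"], m["port"])] += 1
            continue
        m = SENT_RE.match(line)
        if m:
            senders.add(m["host"])
    receivers = {host for host, _ in received}
    return received, sorted(senders - receivers)

# Hypothetical syslog lines: hostname first, then the switch's own message.
syslog = [
    "core1 02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/24",
    "dist2 02:38:48: STP: VLAN0001 Topology Change rcvd on Fa0/3",
    "dist2 02:38:48: STP: VLAN0001 sent Topology Change Notice on Gi0/1",
    "edge7 02:38:48: STP: VLAN0001 sent Topology Change Notice on Gi0/1",
]
received, likely_origins = tcn_summary(syslog)
print("likely origin switches:", likely_origins)
```

It only narrows things down to a switch; you still need the local debugs there to find the actual flapping port.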

I'm still confused, how do I stop this flooding happening?!

The quickest way is to add the "no ip igmp snooping tcn flood" command to all your interfaces. If you want to stop the underlying cause, make sure all ports where a single device is connected are set up with "spanning-tree portfast" for access ports or "spanning-tree portfast trunk" for trunk ports. Don't do this for links to switches - they should be set up as part of your spanning tree.
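Typing that into every interface by hand gets old quickly, so here is a minimal Python sketch that generates the config to paste in. It only uses the commands mentioned in this post; the function name and the port list are hypothetical, and you should check the output against your own IOS version before applying it.

```python
def edge_port_config(ports):
    """Build IOS config lines for edge ports.

    ports is an iterable of (interface_name, mode) pairs, where mode is
    'access' or 'trunk'. Switch-to-switch links should not be included.
    """
    lines = []
    for name, mode in ports:
        if mode not in ("access", "trunk"):
            raise ValueError(f"unknown port mode for {name}: {mode}")
        lines.append(f"interface {name}")
        # Trunk edge ports (servers, APs, phones) need the 'trunk' keyword.
        lines.append(" spanning-tree portfast" + (" trunk" if mode == "trunk" else ""))
        lines.append(" no ip igmp snooping tcn flood")
    return "\n".join(lines)

# Hypothetical port list for one edge switch.
print(edge_port_config([
    ("FastEthernet0/1", "access"),
    ("FastEthernet0/2", "trunk"),
]))
```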

I hope this is useful to some people. I've dealt with this situation quite a few times, and the first few times it took me a while to figure out what was happening. If you want to understand this further, Cisco has a very helpful page about this here.

Edit: See my next entry for info on some of the Linux commands I used to test multicast in my lab.
Update: See my newer post about IGMP Query Solicitation

Monday, 23 July 2012

Blogs Worth Reading

Seeing as I haven't actually posted anything on this blog since creating it in 2008, I thought I would give you some links to some blogs I read that you may like:

Krebs On Security
A great blog about security that I read regularly. Brian is great at investigating cybercrime and some of his most interesting articles are the ones where he links various online criminals back to real people.

Errata Security
Another good blog, supposedly about security, but more often covering other interesting subjects within IT. Going into more technical detail than Krebs On Security, this can give great insight into how some of the security news actually works - Rob's article about password cracking after the LinkedIn password breach was really informative.

Kaspersky Labs
From one of the top security labs in Europe, another good in-depth blog about a variety of malware and threats. These are the guys who have been heavily involved in analysis of Flame, Stuxnet and Madi. Definitely worth a read.

Hopefully at some point I will put up some interesting articles of my own, but for now, go and read what these guys are writing!