Lots of organizations are deploying Security Information and Event Management (SIEM) systems either to do their due diligence or because it’s part of a regulatory requirement. One of the misconceptions that typically is derived from marketing material is that you plug it in, turn it on, and voila, instant security. This couldn’t be further from the truth.
I look at SIEM like a meta-IDS (Intrusion Detection System). It is attempting to find those needles in the haystack. Most of the deployments I’ve worked on receive millions of events per day. Many of the events are informational. Sometimes it is mandatory to send those events to the SIEM because of regulatory requirements, so my goal is always to maximize our resources and make the best of the situation. When you’re getting millions of firewall events per day, for example, you can either have them take up space on your SAN uselessly or you can try to detect misuse with them.
The first thing you need to do is identify which systems will be forwarding events, typically all switches, routers, servers, applications, and security systems (Network/Host Intrusion Prevention, firewalls, anti-malware, etc.). The number of devices you forward events from to the SIEM will depend on how much money you are willing to spend on event collectors that receive and normalize events, and on the storage necessary to keep all of this data around.
Deciding what events to send to your SIEM is often challenging. The system you are investigating is going to have two capacity limits to be aware of:
- Storage. How much space will your events take? To get a rough estimate I would go to every system that will be forwarding events, report on how much space it logged in a day, multiply that by the number of days in your retention policy, and add the results together. So for instance, with a 90-day retention policy, take (firewall logs for the day * 90) + (IPS logs for the day * 90) = required storage.
- Events per second. At the very least it is recommended to go to all of the devices that will be forwarding events and report on how many they generated in a day and divide that by 86,400 (number of seconds in a day). This will get an approximate number of total events per second which will determine the number and size of event collectors.
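As a rough sketch, the two estimates above could be computed like this in Python. The per-source figures are made-up placeholders, not measurements, and the function names are only for illustration.

```python
SECONDS_PER_DAY = 86_400  # number of seconds in a day


def required_storage_gb(daily_log_gb, retention_days):
    """Add up each source's daily log volume and multiply by the retention period."""
    return sum(daily_log_gb.values()) * retention_days


def average_events_per_second(daily_event_counts):
    """Total events generated in a day across all sources, divided by 86,400."""
    return sum(daily_event_counts.values()) / SECONDS_PER_DAY


# Hypothetical numbers for two event sources.
daily_log_gb = {"firewall": 12.0, "ips": 3.5}
daily_event_counts = {"firewall": 8_640_000, "ips": 432_000}

print(required_storage_gb(daily_log_gb, retention_days=90))  # 1395.0
print(average_events_per_second(daily_event_counts))         # 105.0
```

Keep in mind that this is a daily average. Busy periods will run above it, so treat the result as a floor when sizing event collectors.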
The purpose of this post is to help you develop ideas for custom correlation rule use cases. Maybe a SIEM sizing and requirements guide can come later. So for now let’s assume that you already have a SIEM in place and you want to get started with it.
Vendor-Provided Correlation Rules
My general methodology with SIEM (and any Intrusion Prevention System for that matter) is to enable everything to see what happens and tune back what you are not interested in. In many cases you have paid for the content, and what better way to get the best bang for your buck than to see how it works in your environment? The idea would be to enable the correlation rules once your events are being forwarded, to see how they react.
- For example, if your network monitoring system polls devices for system information via SNMP (UDP port 161) and the resulting firewall events trigger a port scanning detection rule, you would not turn off the entire correlation rule. The idea would be to find the mechanism to ignore that specific traffic for that specific rule.
I have seen rules that need to be modified slightly to become effective.
- For example, a backdoor rule that monitors for TCP port 31337 will occasionally be triggered by accident by firewall events for ordinary outbound connections. Not to get too detailed here, but when a computer initiates a connection to a web server on TCP port 80, it has to open a random local port between 1024 and 65535 as the source port, and that port could happen to be 31337. Modifying the rule to monitor for 31337 as a destination port only may be a good way to tune this rule.
- Using the same example, McAfee Rogue System Detector scans hosts for TCP 31337 during service discovery of the network. Even though internal firewalls/routers may be permitting and logging this traffic the target hosts may not (hopefully not) be running these services. In this case you may want to ignore the Rogue System Detectors with a destination TCP port of 31337.
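The two tuning steps above can be sketched as a pair of predicates over firewall events. This is only an illustration in Python, not any vendor's rule syntax; the event fields and the scanner address are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallEvent:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


BACKDOOR_PORT = 31337
# Hypothetical address of a Rogue System Detector sensor.
SCANNER_IPS = frozenset({"10.0.5.20"})


def untuned_rule(e: FirewallEvent) -> bool:
    """Fires when 31337 appears as either the source or the destination port."""
    return e.protocol == "tcp" and BACKDOOR_PORT in (e.src_port, e.dst_port)


def tuned_rule(e: FirewallEvent) -> bool:
    """Fires only on 31337 as the destination port, ignoring known scanners."""
    return (
        e.protocol == "tcp"
        and e.dst_port == BACKDOOR_PORT
        and e.src_ip not in SCANNER_IPS
    )


# A normal web request that happened to pick 31337 as its source port.
browsing = FirewallEvent("10.0.2.15", 31337, "203.0.113.9", 80, "tcp")
# Service discovery from the scanner.
discovery = FirewallEvent("10.0.5.20", 40000, "10.0.2.15", 31337, "tcp")
# An unexplained connection to port 31337.
suspicious = FirewallEvent("10.0.2.15", 40001, "203.0.113.9", 31337, "tcp")

for event in (browsing, discovery, suspicious):
    print(untuned_rule(event), tuned_rule(event))
```

The untuned rule fires on all three events, while the tuned rule fires only on the last one.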
Potential Malware Calling Home
The way malware behaves in our networks is a moving target, but it does tend to move like cars on a highway rather than at light speed. So today there are several indicators we can monitor that would allow us to infer that there is either an infection or misuse internally by an employee or contractor.
- Resolving domain names can be important to keep the malware stable and allow for quick changes of IP addresses. For example, if I program my malware to connect to a web server at pwnd.example.net, I as the malware administrator can change the IP of my web server in the event that someone pulls the plug on the one I’m using. If the malware is programmed to connect to a static IP, I will lose that malware network. If I use DNS I may be able to mitigate some of this risk by getting a new web server, setting up shop, and changing the IP of pwnd.example.net to the new IP. In most environments I’ve been in, there are only a handful of DNS servers that all internal systems are configured to use. Part of this correlation rule would be: if the source or destination port is UDP or TCP 53 and neither the source nor the destination IP is in your list of approved DNS servers, trigger the alert.
- Another stanza to add to this rule could be approved proxy servers, if you are using one that is not in transparent mode. On your border firewalls, the only traffic from the LAN subnet to anywhere on TCP port 80 should be coming from the proxy server. Anything else could be an attempt to subvert this control by an employee or contractor, or malware configured to do so. In addition to the above rule, if the source IP is NOT your proxy and the destination is TCP port 80, trigger the alert. You may also want to include an AND operator requiring the logging device to be the border firewall, to reduce the number of logs that need to be investigated.
- Another way may be to monitor for IRC traffic. If IRC is permitted you will see pretty quickly how many people are using it (it won’t be many) and can hopefully tune the rule to only trigger when a certain number of events are found in a certain amount of time. Then you could look for a source or destination port of TCP 6666, 6667, 7777 and a few others. Another thing I like to do with this is configure a rule on my Network Intrusion Prevention System to look for any packets with IRC as the protocol and trigger an IPS event. Then look for that IPS event in this stanza of the rule too, which should make sure you catch anything at your egress point.
- Yet another stanza could be hosts attempting to use an SMTP server other than yours.
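Put together, the stanzas above might look something like the following Python sketch. All addresses, the device name, and the event fields are hypothetical. The IRC stanza here fires on a single event, so the count-over-time threshold mentioned above is left out, and the IPS-based IRC detection is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str      # name of the device that logged the event
    protocol: str    # "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Hypothetical approved infrastructure; substitute your own lists.
APPROVED_DNS = frozenset({"10.0.0.53", "10.0.1.53"})
APPROVED_PROXIES = frozenset({"10.0.0.80"})
APPROVED_SMTP = frozenset({"10.0.0.25"})
BORDER_FIREWALLS = frozenset({"fw-border-1"})
IRC_PORTS = frozenset({6666, 6667, 7777})


def calling_home_reasons(e: Event) -> list:
    """Return the name of every stanza that the event matches."""
    reasons = []
    endpoints = {e.src_ip, e.dst_ip}
    ports = {e.src_port, e.dst_port}

    # DNS traffic where neither endpoint is an approved DNS server.
    if 53 in ports and not (endpoints & APPROVED_DNS):
        reasons.append("dns-to-unapproved-server")

    # Web traffic seen at the border that did not come from the proxy.
    if (e.device in BORDER_FIREWALLS and e.protocol == "tcp"
            and e.dst_port == 80 and e.src_ip not in APPROVED_PROXIES):
        reasons.append("http-bypassing-proxy")

    # Traffic on common IRC ports, as source or destination.
    if e.protocol == "tcp" and (ports & IRC_PORTS):
        reasons.append("irc-port")

    # SMTP where neither endpoint is an approved mail server.
    if e.protocol == "tcp" and e.dst_port == 25 and not (endpoints & APPROVED_SMTP):
        reasons.append("smtp-to-unapproved-server")

    return reasons


print(calling_home_reasons(
    Event("fw-internal-1", "udp", "10.0.2.15", 40000, "8.8.8.8", 53)))
print(calling_home_reasons(
    Event("fw-border-1", "tcp", "10.0.0.80", 40001, "203.0.113.9", 80)))
```

The first event is a workstation resolving names through an unapproved server, so it matches the DNS stanza. The second is the proxy doing its job, so it matches nothing.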
Misuse of Administration Account
Every environment I have been in has Windows and *nix servers. These systems have default administration accounts, administrator and root respectively. It is best practice to provide actual system administrators with dedicated administration user accounts so that there is accountability during administration. If someone were to log in as root and shut down a service, how would you know who it was? You may be able to track it back by IP, but not with certainty. Typically, organizations also don’t want administrators’ regular user accounts to have administrative privileges, which helps mitigate mistakes. Administrators will typically have a separate user account for administration, for example username_a, to provide a certain level of assurance that changes are deliberate. The default administration account credentials are then printed and locked in a fireproof box somewhere and used for emergencies only.
That means that if someone is logging into a system with the username administrator or root, either an administrator is misusing the default account or it may have been compromised. It is important to alert specifically when the login was successful. This rule can easily be tested. Most environments will have systems and/or scripts that automate administration tasks so you will need to filter those out of the correlation rule. This does leave residual risk, but we are doing the most with what we have available to us. If you don’t like the risk with that, then do the right thing and change the user account.
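A minimal sketch of this rule in Python, assuming login events that have already been normalized into a username, an outcome, and a source IP (all hypothetical field names). The automation exception list is a placeholder for whatever systems and scripts you need to filter out.

```python
DEFAULT_ADMIN_ACCOUNTS = frozenset({"administrator", "root"})

# Hypothetical (source IP, username) pairs used by automated administration jobs.
AUTOMATION_EXCEPTIONS = frozenset({("10.0.9.5", "root")})


def default_admin_login_alert(event: dict) -> bool:
    """Alert on a successful login with a default administration account."""
    username = event["username"].lower()
    return (
        event["outcome"] == "success"
        and username in DEFAULT_ADMIN_ACCOUNTS
        and (event["src_ip"], username) not in AUTOMATION_EXCEPTIONS
    )


logins = [
    {"username": "root", "outcome": "success", "src_ip": "10.0.2.15"},
    {"username": "root", "outcome": "success", "src_ip": "10.0.9.5"},
    {"username": "Administrator", "outcome": "failure", "src_ip": "10.0.2.15"},
    {"username": "jsmith_a", "outcome": "success", "src_ip": "10.0.2.15"},
]
for login in logins:
    print(login["username"], login["src_ip"], default_admin_login_alert(login))
```

Only the first login alerts. The second is a filtered automation source, the third failed, and the fourth uses a dedicated administration account.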
The next rule, which looks for other protocols tunneled over HTTP ports, is similar to the malware calling home rule in the sense that we are looking for potential misuse by first looking at strange behavior. If a network is enforcing least privilege, the user network will only be able to send HTTP and HTTPS from the inside network out to the Internet. All of its SMTP traffic should go to the internal mail relay. If users are tunneling other protocols through HTTP they are likely attempting to evade controls, or it could be malware attempting to evade controls. This rule requires a Network Intrusion Detection/Prevention System or Application Layer Firewall. You will need to create a rule on that device that monitors for TCP port 80 OR 443 traffic that is NOT HTTP protocol. On the SIEM you would just have to monitor for one of these events to be received to trigger the alert. Again, when you first create this rule you may need to tune the rule on the log generating device(s) and/or filter certain hosts from triggering the correlation rule.
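The logic could be sketched in Python as below. The `app_protocol` field stands in for whatever protocol label your IPS or application layer firewall attaches to a flow. That label, the exempt host, and the treatment of "https" as an expected value on port 443 are all assumptions, since labeling varies by vendor.

```python
HTTP_PORTS = frozenset({80, 443})
EXPECTED_PROTOCOLS = frozenset({"http", "https"})

# Hypothetical host filtered out after tuning.
EXEMPT_HOSTS = frozenset({"10.0.3.40"})


def is_tunneling_suspect(event: dict) -> bool:
    """Flag traffic to TCP 80/443 whose detected protocol is not HTTP(S)."""
    return (
        event["dst_port"] in HTTP_PORTS
        and event["app_protocol"].lower() not in EXPECTED_PROTOCOLS
        and event["src_ip"] not in EXEMPT_HOSTS
    )


print(is_tunneling_suspect(
    {"src_ip": "10.0.2.15", "dst_port": 80, "app_protocol": "ssh"}))    # True
print(is_tunneling_suspect(
    {"src_ip": "10.0.2.15", "dst_port": 443, "app_protocol": "https"})) # False
```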
Potential Server Compromise
This rule can be time consuming to create for your environment, but I have to say that this is one of my favorites. It could be that you create this type of rule only for critical hosts. Here is the concept. We will use a public facing web server as the example but this obviously applies to any server.
A typical web server is listening for connections on TCP port 80. The only connections you should see in firewall logs are random source IP addresses being permitted to access TCP port 80 on your server as the destination. When you open up a web browser and connect to a website, your computer opens a local port between 1024 and 65535 and makes a connection to TCP port 80 on the web server. So if you see a firewall log that shows your web server making a connection from a high source port to any other system, someone is initiating a connection from that web server. If they are browsing websites or hopping to other systems from there, that should be frowned upon and corrected. Maybe this is someone who has already compromised the system and is sending information back to their website or FTP server. Similarly, if you see someone connect to a port other than 80 on that web server, then you have another service running. Either someone set something new up, or maybe this is a backdoor running.
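Here is one way the concept could be expressed in Python. It assumes the firewall logs one event per connection with the initiator as the source, so reply traffic from the server does not show up as a separate event. The server address and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WEB_SERVER_IP = "192.0.2.10"             # hypothetical public-facing web server
ALLOWED_INBOUND_PORTS = frozenset({80})  # services the server should be offering


@dataclass(frozen=True)
class FirewallEvent:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def server_compromise_reason(e: FirewallEvent) -> Optional[str]:
    """Return why the event looks suspicious, or None if it looks normal."""
    # The server should only receive connections, never initiate them.
    if e.src_ip == WEB_SERVER_IP:
        return "server-initiated-connection"
    # A connection to any port other than 80 suggests an unexpected service.
    if e.dst_ip == WEB_SERVER_IP and e.dst_port not in ALLOWED_INBOUND_PORTS:
        return "unexpected-listening-port"
    return None


events = [
    FirewallEvent("198.51.100.7", 51000, WEB_SERVER_IP, 80),    # normal visitor
    FirewallEvent(WEB_SERVER_IP, 49152, "203.0.113.9", 21),     # server reaching out
    FirewallEvent("198.51.100.7", 51001, WEB_SERVER_IP, 4444),  # odd inbound port
]
for event in events:
    print(server_compromise_reason(event))
```

In a real environment you would likely need exceptions for legitimate outbound traffic from the server, such as DNS lookups or patch downloads, which is part of why this rule takes time to tune.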
In conclusion, these are some ideas to get you started with developing correlation rules. Be creative. When building these rules you are always going to get a lot of false positives in the beginning. Do not get discouraged. Create your rule, then either replay several weeks’ worth of data through it or let it run and keep an eye on it.
Other Things to Consider
There are many other things to consider when deploying a SIEM. One of the things that senior engineers should be doing with the SIEM at least a couple of times per week is perusing the base events to look at the logs that are NOT getting correlated. There could be a lot of things happening that you don’t want to have happen but just don’t have a correlation rule yet. Importing Vulnerability Assessment results can really help to increase effectiveness and efficiency. Events need to be monitored to ensure that they are getting normalized correctly. Perhaps we will dig into some of these issues another time.
Strange Bandwidth Utilization
There are a couple of ways to look at this: potential DDoS detection and potential exfiltration. The most common way to get this data would be to use switch and router flow events. There may be other ways depending on the environment, such as forwarding Arbor Networks events or Network Intrusion Prevention events to the SIEM. Regardless, this can take some time to benchmark and tune because bandwidth utilization is typically somewhat sporadic.
To detect potential DDoS attacks, a good start would be to monitor ingress traffic targeted at a handful of critical system assets whose inaccessibility would prevent the organization from functioning. The rule would look something like this: if the bandwidth directed to my web servers is greater than 40Mb/s for 10 minutes or more, trigger an alert.
Exfiltration is the act of pulling data out of the network after it has been compromised. As an example, egress bandwidth utilization from a file share server may increase. The rule would look similar to the DDoS rule: if traffic leaving an asset is greater than 3Mb/s for 10 minutes or more, trigger an event.
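Both bandwidth rules boil down to "above a threshold for N consecutive minutes". A minimal Python sketch, assuming your flow data has already been aggregated into one Mb/s reading per minute for the asset in question (the sample values below are invented):

```python
def sustained_over_threshold(samples_mbps, threshold_mbps, min_minutes):
    """True if the rate stays above the threshold for min_minutes consecutive samples.

    Assumes one sample per minute, in chronological order.
    """
    run = 0
    for rate in samples_mbps:
        # Extend the current run, or reset it when the rate drops back down.
        run = run + 1 if rate > threshold_mbps else 0
        if run >= min_minutes:
            return True
    return False


# Ingress to the web servers: a brief spike, then a sustained flood.
ingress = [12, 55, 14, 13] + [60] * 10
# Egress from a file share server: steady, moderately elevated traffic.
egress = [1, 1, 2] + [4] * 12

print(sustained_over_threshold(ingress, threshold_mbps=40, min_minutes=10))  # True
print(sustained_over_threshold(egress, threshold_mbps=3, min_minutes=10))    # True
print(sustained_over_threshold([55, 14] * 10, threshold_mbps=40, min_minutes=10))  # False
```

Requiring consecutive samples is what keeps short, sporadic bursts from triggering the rule. The thresholds themselves need to come from benchmarking your own network.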
The purpose of these rules is to provide you with some guidance on how to further leverage your SIEM solution. Even if they do not apply to your network specifically, I hope they help you to think about some custom correlation events you can create to fit your environment.