J Wolfgang Goerlich's thoughts on Information Security
Incident Management in PowerShell: Recovery, Lessons Learned

By wolfgang. 14 June 2013 11:00

A week ago, we released PoshSec at BSides Detroit. This is the Steele release (0.1), named in memory of Will Steele. Will, who launched the PoshSec project, passed away last year. PoshSec is available for download on GitHub.

This is the final part in our series on Incident Management. Incident Management consists of the following stages: Preparation, Identification, Containment, Eradication, Recovery, and Lessons Learned. Today, we will look at the long tail of recovery and at lessons learned.

Recovery -- Monitoring
The immediate and most visible aspect of recovery is resuming services on the breached system. Resumption gets us back in business. But monitoring keeps us in business.
When you listen to Josh Little’s BSides Detroit presentation (A Cascade of Pebbles: How Small Incident Response Mistakes Make for Big Compromises), note how many times the recovery was executed without follow-up monitoring. At each pebble, the responsible parties cleaned up what they saw without raising the alarm. This allowed the attackers to remain in the network for weeks.

The first lesson is to communicate identified security breaches. The second lesson is to maintain a high degree of vigilance for at least two weeks following any incident.

The team’s schedule must be re-prioritized to allocate more time for monitoring post-breach. The bad guys have appeared, and stared into our souls. The attackers potentially have all the information they need for subsequent attacks, such as phishing or social engineering. We also do not know for certain that we have not restored infected data. Therefore, plan to spend more time with PowerShell reviewing the logs.
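One way to make that log review routine is to count events per day and watch for spikes. The following is a minimal sketch, not part of PoshSec: the function name Get-EventCountByDay is hypothetical, and it assumes records that carry TimeCreated and Id properties, as the objects returned by Get-WinEvent do.

```powershell
# Summarize event records per day and event ID so that spikes stand out.
# Get-EventCountByDay is a hypothetical helper name, not a PoshSec cmdlet.
function Get-EventCountByDay {
    param(
        [Parameter(Mandatory)]
        [object[]]$Record   # objects with TimeCreated and Id properties
    )

    $Record |
        ForEach-Object {
            # Reduce each record to the two fields we group on.
            [pscustomobject]@{
                Day = $_.TimeCreated.ToString('yyyy-MM-dd')
                Id  = $_.Id
            }
        } |
        Group-Object -Property Day, Id |
        ForEach-Object {
            [pscustomobject]@{
                Day   = $_.Group[0].Day
                Id    = $_.Group[0].Id
                Count = $_.Count
            }
        } |
        Sort-Object -Property Day, Id
}

# Usage on a Windows host (4624 = successful logon, 4625 = failed logon):
# $since  = (Get-Date).AddDays(-14)
# $events = Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4624, 4625; StartTime = $since }
# Get-EventCountByDay -Record $events | Format-Table
```

Comparing these daily counts against the days before the incident gives the team a quick, repeatable check during the period of heightened vigilance.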

With the exception of remedial changes, put in place a change freeze. Remedial changes include closing the vulnerability that was used in the security breach, resetting passwords to void any potentially captured hashes, and hardening along the kill chain. Other than these, reduce changes to reduce the likelihood of the security team seeing a legitimate change as an attack, or vice versa.

In sum, the last phase in the recovery stage is increased monitoring. Watch our baselines and our honey tokens closely, and be prepared for subsequent attacks.

Lessons Learned
And this brings us to the Lessons Learned stage. Ideally, this stage has three outputs: a root cause analysis (RCA) document with suggestions for improvements; a threat scenario write-up; and an indicators of compromise (IOC) document.

One source of information for these outputs is the PowerShell transcripts and the logs. The PowerShell transcripts must be enabled during the incident (Start-Transcript, Stop-Transcript). Together with the logs, they allow the incident to be pieced back together.

The RCA can be created during the review. Hold a minimum of two review meetings. In the first, do a table-top exercise and walk through the incident. Capture all the key facts on a timeline. Between the first and second meeting, circulate this timeline and solicit feedback and additional detail. Then hold a second review meeting, and identify at least one improvement for each stage: Preparation, Identification, Containment, Eradication, Recovery.
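The timeline itself can be assembled with a few lines of PowerShell. This is a sketch under stated assumptions: the function name New-IncidentTimeline and the property names Time, Source, and Detail are illustrative, and the entries would come from whatever we collected during the incident (transcript notes, event logs, IDS alerts).

```powershell
# Merge entries from several sources into one chronological timeline.
# The function and property names are assumptions made for this sketch.
function New-IncidentTimeline {
    param(
        [Parameter(Mandatory)]
        [object[]]$Entry   # objects with Time ([datetime]), Source, and Detail
    )

    $Entry |
        Sort-Object -Property Time |
        Select-Object -Property @{
            Name       = 'Time'
            Expression = { $_.Time.ToString('yyyy-MM-dd HH:mm:ss') }
        }, Source, Detail
}

# Usage: combine the collected entries and write a file to circulate
# between the two review meetings.
# New-IncidentTimeline -Entry ($logEntries + $transcriptNotes) |
#     Export-Csv -Path .\timeline.csv -NoTypeInformation
```

A CSV is easy for reviewers to annotate, and the annotated copy can be merged back in before the second meeting.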

Depending on the attack, the situation, and the likelihood of reoccurrence, we may want to create a threat scenario. A threat scenario is a sanitized version of the attack that highlights the tactics an attacker uses. The scenario covers the vulnerabilities, threats, and the business impact. Such documents can then be used by the security and operations teams for training purposes.

Finally, we generate an IOC to be shared with the wider security community. As mentioned in the Identification article, groups such as Information Sharing and Analysis Centers (ISACs) have been set up to share IOCs. As attackers often reuse the same modus operandi, sharing IOCs with our peers allows us to build better defenses.

We take from the community in Identification by leveraging others’ IOCs. We give back to the community in Lessons Learned by sharing our IOCs.

With that, we have learned from our mistakes, implemented the learning in our training programs, and fed the information back to our peers. Only then can we say that we have completed the security incident.


Summary
Incident Management is a formal process like Business Continuity. The objective of both is reducing the impact on the organization. To do this, we plan and prepare for failures and for breaches. We train for success. We automate key tasks to reduce errors and speed up our response. When an event does occur, we respond by following our training and by implementing our plans. Once done, we pause, reflect, and find ways to improve our game.

Done right, Incident Management is a measured response that deflects the attack and leaves our organization in a stronger position afterwards. Let’s do it right.

This article series is cross-posted on the PoshSec blog.

Tags:

Incident Response

Incident Management in PowerShell: Containment

By wolfgang. 12 June 2013 11:00

Welcome to part three of our Incident Management series. On Monday, we reviewed preparing for incidents. Yesterday, we reviewed identifying indicators of compromise. Today’s article will cover containing the breach.

The PoshSec Steele release (0.1) is available for download on GitHub.

At this stage in the security incident, we have verified a security breach is in effect. We did this by noticing changes in the state and behavior of the system. Perhaps group memberships have changed, suspicious software has been installed, or unrecognized services are now listening on new ports. Fortunately, during the preparation phase we integrated the system into our Disaster Recovery plan.

Containment
There are two concepts behind successful containment. First, use a measured response in order to minimize the impact on the organization. Second, leverage the disaster recovery program and execute the runbook to maintain services.

When a breach is identified, kill all services and processes that are not in the baseline (Stop-Process). Oftentimes attackers have employed persistence techniques, so we must set up the computer to prevent new processes from spawning (see @obscuresec’s Invoke-ProcessLock script). This stops the breach in progress and prevents the attacker from continuing on this machine.
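As a rough illustration of the first step, the sketch below lists running processes whose names are missing from a baseline and hands them to Stop-Process. The function name Get-ProcessNotInBaseline and the baseline file are hypothetical. Matching on name alone is a simplification, since an attacker can reuse a legitimate process name.

```powershell
# Return the processes whose names do not appear in the baseline list.
# Get-ProcessNotInBaseline is a hypothetical helper, not a PoshSec cmdlet.
function Get-ProcessNotInBaseline {
    param(
        [Parameter(Mandatory)]
        [string[]]$BaselineName,   # process names recorded during Preparation

        [Parameter(Mandatory)]
        [object[]]$Process         # objects with a Name property, e.g. from Get-Process
    )

    # -notcontains compares case-insensitively, which suits Windows process names.
    $Process | Where-Object { $BaselineName -notcontains $_.Name }
}

# Usage: review the -WhatIf output first, then remove -WhatIf to act.
# $baseline = Import-Clixml -Path .\process-baseline.xml   # hypothetical file of names
# Get-ProcessNotInBaseline -BaselineName $baseline -Process (Get-Process) |
#     Stop-Process -Force -WhatIf
```

Running with -WhatIf first keeps the response measured: we see exactly what would be killed before we touch a production service.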

We now need to execute a disaster recovery runbook to resume services. Data files can be moved to a backup server using file replication services (New-DfsnFolderTarget). Services and software can be moved by replaying the build scripts on the backup server. The success metric here is minimizing downtime and data loss, thereby minimizing and potentially avoiding any business impact.

We can now move onto the network layer. If necessary, QoS and other NAC services can be set during the initial transfer. We then can move the compromised system onto a quarantine network. This VLAN should contain systems with the forensics and imaging tools necessary for the recovery process.

The switch commands for QoS, NAC, and VLAN vary by manufacturer. It is a good idea to determine ahead of time what these commands are and how to execute them. A better idea is to automate these with PowerShell, leveraging the .NET Framework and libraries like SSH.NET and SharpSSH.
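To make this concrete, here is a sketch of moving a switch port to a quarantine VLAN over SSH. Several things are assumptions: the command syntax is Cisco IOS style and will differ on other vendors, the function names are hypothetical, and the SSH.NET assembly (Renci.SshNet.dll) is assumed to be loaded already with Add-Type -Path. The fixed delay between commands is a shortcut; a production script would wait for the switch prompt.

```powershell
# Build the command sequence that moves a port to the quarantine VLAN.
# The syntax is Cisco IOS style and is an assumption; other vendors differ.
function Get-QuarantineCommand {
    param(
        [Parameter(Mandatory)][string]$Interface,
        [Parameter(Mandatory)][int]$VlanId
    )

    @(
        'configure terminal'
        "interface $Interface"
        "switchport access vlan $VlanId"
        'end'
    )
}

# Send the commands through one interactive SSH session using SSH.NET.
# Assumes Renci.SshNet.dll was loaded beforehand with Add-Type -Path.
function Send-SwitchCommand {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][pscredential]$Credential,
        [Parameter(Mandatory)][string[]]$Command
    )

    $password = $Credential.GetNetworkCredential().Password
    $client = New-Object -TypeName Renci.SshNet.SshClient `
        -ArgumentList $ComputerName, $Credential.UserName, $password
    try {
        $client.Connect()
        # A shell stream keeps the switch in configuration mode between commands.
        $stream = $client.CreateShellStream('vt100', 80, 24, 800, 600, 4096)
        foreach ($line in $Command) {
            $stream.WriteLine($line)
            Start-Sleep -Milliseconds 500   # crude pacing, see the note above
        }
        $stream.Read()   # return whatever the switch echoed back
    }
    finally {
        if ($client.IsConnected) { $client.Disconnect() }
        $client.Dispose()
    }
}

# Usage (hypothetical switch name, port, and VLAN):
# $cmds = Get-QuarantineCommand -Interface 'GigabitEthernet0/12' -VlanId 999
# Send-SwitchCommand -ComputerName 'switch01' -Credential (Get-Credential) -Command $cmds
```

Keeping the command builder separate from the transport means the vendor-specific syntax can be reviewed and tested without touching a live switch.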

For more information about the network side of incident containment, please see Mick Douglas’s talk: Automating Incident Response. The concepts Mick discusses can be executed manually, automated with switch scripts, or automated with PowerShell and SSH libraries.

To summarize Containment, we respond in a measured way based on the value the system delivers to the organization. Containment begins with disaster recovery: fail over the services and data and minimize the business impact. We can then move the affected system to a quarantine network, and move on to the next stage: Eradication. The value PowerShell delivers is in automating the Containment process. When minutes count and time is expensive, automation lowers the impact of a breach.


Tags:

Incident Response | Security

Incident Management in PowerShell: Identification

By wolfgang. 11 June 2013 11:00

The practice of Incident Management consists of six stages: Preparation, Identification, Containment, Eradication, Recovery, and Lessons Learned. Yesterday, we reviewed the Preparation stage.

During preparation, we automate the build process. We then baseline key areas of the system that are likely to change in a security incident. We enable logging and, to the greatest extent possible, set up the system to log everything. All of these steps can be automated with native PowerShell and with PoshSec.

Some portions of the preparation process are procedural, of course. For example, new systems go through a risk analysis. What is the value of the IT asset to the organization when it is available, and what is the impact when the asset is not available? New systems are also integrated within the Disaster Recovery program. This answers the question: How might a system be recovered in the event of a breach or outage?

Once in production, little occurs until the identification of a breach. That is the stage we will consider today.

Identification
Consider the security breach described in Josh Little’s BSides Detroit presentation, A Cascade of Pebbles: How Small Incident Response Mistakes Make for Big Compromises.

At several points, the security team has an opportunity to identify a breach. There is a suspicious phishing email. A user logs into computers that they normally are not on; not once but twice. Software is installed on these computers. An intrusion detection system on the local network detects malicious activity between the computer and the SQL servers. New user accounts are added to privileged groups. In sum, all of these constitute indicators of compromise (IOCs).

Fundamentally, the identification stage is about detecting changes in state and behavior. Oftentimes, these changes are missed and the breach persists. The ideal scenario is that the changes are detected, reported, and investigated. Even then, however, there is a risk. As we see with Josh Little’s talk, the changes may be dismissed and the incident therefore continues.

For state changes, PowerShell can be used to detect several ways an attacker may gain persistence. PoshSec contains scripts, for example, that detect if new TCP/UDP ports begin listening on the system. Scripts can also detect if software is installed, services added, or devices connected. PoshSec can also be used to monitor local computer and Active Directory group memberships.

Each one of these states is stored as a baseline XML file. When the script runs again, it compares the current state against the baseline. Any changes are reported to the security administrator for investigation.

Behavior changes are a bit trickier than state changes due to the flexible nature in which many employees use their computers. Our best defenses, at this layer, come from analyzing the logs. For example, in the case of Josh’s talk, the security team could have noted the user logging into a computer that they do not normally use. Another common example is DNS: a lookup for a new domain may indicate the computer is infected.
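The logon example can be sketched as a comparison between recent logons and a history of known user and computer pairs. The function name Find-UnusualLogon and the User and Computer property names are assumptions for illustration. In practice the records would be extracted from the Security event log or a central log store.

```powershell
# Return recent logons whose user/computer pair never appears in the history.
# Find-UnusualLogon and the property names are assumptions for this sketch.
function Find-UnusualLogon {
    param(
        [Parameter(Mandatory)]
        [object[]]$History,   # past logons, objects with User and Computer

        [Parameter(Mandatory)]
        [object[]]$Recent     # logons to evaluate, same shape
    )

    # Index the known pairs for quick lookup.
    $known = @{}
    foreach ($h in $History) {
        $known["$($h.User)|$($h.Computer)".ToLowerInvariant()] = $true
    }

    $Recent | Where-Object {
        -not $known.ContainsKey("$($_.User)|$($_.Computer)".ToLowerInvariant())
    }
}

# Usage:
# Find-UnusualLogon -History $lastQuarterLogons -Recent $todaysLogons
```

The same pattern applies to the DNS example: index the domains seen before, then report lookups for domains that are not in the index.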

How do we know if a change in state or behavior is actually a compromise? One way is leveraging the IOCs shared by Information Sharing and Analysis Center (ISAC) groups. (For example, please see FS-ISAC.) Another way is to leverage OpenIOC files. These are XML files that, along with PowerShell’s baselining, can be used to compare the state and behavior changes to previously spotted attack patterns.
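Because OpenIOC files are XML, PowerShell can read them directly. The sketch below uses a tiny OpenIOC-style document that is made up for illustration; real files carry more metadata and nested logic. The function name Get-IocItem is hypothetical, and the sketch ignores the AND/OR operators and treats every item as a simple value to look for.

```powershell
# A tiny OpenIOC-style document, made up for illustration only.
$iocXml = [xml]@'
<ioc xmlns="http://schemas.mandiant.com/2010/ioc" id="example">
  <short_description>Illustrative indicator</short_description>
  <definition>
    <Indicator operator="OR">
      <IndicatorItem condition="is">
        <Context document="ServiceItem" search="ServiceItem/name" type="mir" />
        <Content type="string">EvilSvc</Content>
      </IndicatorItem>
      <IndicatorItem condition="is">
        <Context document="PortItem" search="PortItem/localPort" type="mir" />
        <Content type="int">4444</Content>
      </IndicatorItem>
    </Indicator>
  </definition>
</ioc>
'@

# Flatten every IndicatorItem into a search term, condition, and value.
# Get-IocItem is a hypothetical helper name.
function Get-IocItem {
    param(
        [Parameter(Mandatory)]
        [xml]$Ioc
    )

    # local-name() sidesteps the document's default XML namespace.
    foreach ($item in $Ioc.SelectNodes("//*[local-name()='IndicatorItem']")) {
        [pscustomobject]@{
            Search    = $item.Context.search
            Condition = $item.condition
            Value     = $item.Content.InnerText
        }
    }
}

# Usage: pull out the service names and check them against the local system.
# $names = Get-IocItem -Ioc $iocXml |
#     Where-Object { $_.Search -eq 'ServiceItem/name' } |
#     Select-Object -ExpandProperty Value
# Get-Service | Where-Object { $names -contains $_.Name }
```

Once the indicators are flattened this way, they can be compared against the same state snapshots we use for baselining.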

In summary, PowerShell provides two defensive techniques. The first is that it enables us to quickly see changes that attackers make by baselining and comparing. The second is that it enables us to parse and browse the logs to spot the attacker’s behavior. In this stage, logs are critical. It is PowerShell’s scripting that turns logging into a strategic defensive tool. It is then up to us to thoroughly investigate these indications.

If the investigation leads us to conclude there is a breach, we move to the next stage: Containment. We will review containing the breach tomorrow.


Bonus tip: When investigating security incidents, always record your session using the Start-Transcript and Stop-Transcript cmdlets. The incident transcript will be required when we get to lessons learned. The transcript will also be necessary should the incident move to law enforcement.
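A small wrapper makes it harder to forget. In this sketch the function names, the file naming scheme, and the incident ID are all assumptions; only Start-Transcript and Stop-Transcript are the built-in cmdlets. The try/finally ensures the transcript is closed even if the investigation script throws.

```powershell
# Build a timestamped transcript path, one file per incident session.
# The naming scheme is an assumption made for this sketch.
function New-TranscriptPath {
    param(
        [Parameter(Mandatory)][string]$Directory,
        [Parameter(Mandatory)][string]$IncidentId,
        [datetime]$Timestamp = (Get-Date)
    )

    $name = '{0}_{1}_{2:yyyyMMdd-HHmmss}.txt' -f $IncidentId, [Environment]::MachineName, $Timestamp
    Join-Path -Path $Directory -ChildPath $name
}

# Run an investigation script block with the session recorded.
function Invoke-RecordedSession {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][scriptblock]$Action
    )

    Start-Transcript -Path $Path | Out-Null
    try {
        & $Action
    }
    finally {
        # Always close the transcript, even if the action fails.
        Stop-Transcript | Out-Null
    }
}

# Usage (hypothetical directory and incident ID):
# $path = New-TranscriptPath -Directory 'C:\IR' -IncidentId 'INC-001'
# Invoke-RecordedSession -Path $path -Action { Get-Process; Get-Service }
```

For interactive work, calling Start-Transcript with the generated path at the beginning of the session and Stop-Transcript at the end achieves the same result.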


Tags:

Incident Response

Incident Management in PowerShell: Preparation

By wolfgang. 10 June 2013 11:30

We released PoshSec last Friday at BSides Detroit. We have named v0.1 the Steele release in honor of Will Steele. Will recognized PowerShell’s potential for improving an organization’s security posture early on. Last year, Matt Johnson -- founder of the Michigan PowerShell User Group -- joined Will and launched the PoshSec project. Sadly, Will passed away on Christmas Eve of 2012. A number of us have picked up the banner.

The Steele release team was led by Matt Johnson and included Rich Cassara (@rjcassara), Nick Jacob (@mortiousprime), Michael Ortega (@securitymoey), and J Wolfgang Goerlich (@jwgoerlich). You can download the code from GitHub. In memory of Will Steele.

This is the first of a five-part series exploring PowerShell as it applies to Incident Management.

So what is Incident Management? Incident Management is a practice comprised of six stages. We prepare for the incident with automation and application of controls. We identify when an incident occurs. Believe it or not, this is where most organizations fall down. If you look at the Verizon Data Breach Investigations Report, companies can go weeks, months, sometimes even years before they identify that a breach has occurred. So we prepare for it, we identify it when it happens, we contain it so that it doesn’t spread to other systems, and then we clean up and recover. Finally, we figure out what happened and apply the lessons learned to reduce the risk of a re-occurrence.

Formally, IM consists of the following stages: Preparation, Identification, Containment, Eradication, Recovery, and Lessons Learned. We will explore these stages this week and examine the role PowerShell plays in each.

Preparation
The key practice in the Preparation stage is leveraging the time that you have on a project, before the system goes live. If time is money, the preparation time is the cheapest time.

Our most expensive time is later on, in the middle of a breach, or in a disaster recovery scenario. The server is in operation, the workflow is going on, and we are breaking the business by having that server asset unavailable. There is a material impact to the organization. It is very visible, from our management up to the CEO level. Downtime is our most expensive time.

The objective in Preparation is to bankroll as much time as possible. We want to ensure, therefore, that extra time is allocated during pre-launch for automating the system build, hardening the system, and implementing security controls. Then, when an incident does occur, we can identify and recover quickly.

System build is where PowerShell shines the brightest. As the DevOps saying goes, infrastructure is code. PowerShell was conceived of as a task framework and admin automation tool, and it can be used to script the entire Windows server build process. Take the time to automate the process and, once done, place the build scripts in a CVS (code versioning software) to track changes. When an incident occurs, we can then pull on these scripts to reduce our time to recover.

Once built, we can harden to increase the time it will take an attacker to breach our defense. CIS Security Benchmarks (Center for Internet Security) provides guidance on settings and configurations. As with the build, the focus is on scripting each step in hardening. And again, we will want to store these scripts in a CVS for ready replays during an incident.

Finally, we implement security controls to detect and correct changes that may be indicators of compromise. For a breakdown of the desired controls, we can follow the CSIS 20 Critical Security Controls matrix. The Steele release of PoshSec automates (1) Inventory of Authorized and Unauthorized Devices; (2) Inventory of Authorized and Unauthorized Software; (11) Limitation and Control of Network Ports, Protocols, and Services; (12) Controlled Use of Administrative Privileges; and (16) Account Monitoring and Control.

The bottom line is we baseline areas of the system that attackers will change, store those baselines as XML files in a CVS, and check regularly for changes against the baseline. We use the Export-Clixml and Compare-Object cmdlets to simplify the process.
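The baseline-and-compare pattern can be sketched in two small functions. This is an illustration of the technique using Export-Clixml, Import-Clixml, and Compare-Object, not the PoshSec implementation. The function names Save-Baseline and Compare-Baseline are hypothetical.

```powershell
# Store a snapshot of system state as a baseline XML file.
# Save-Baseline and Compare-Baseline are hypothetical helper names.
function Save-Baseline {
    param(
        [Parameter(Mandatory)][object[]]$State,
        [Parameter(Mandatory)][string]$Path
    )

    $State | Export-Clixml -Path $Path
}

# Compare the current state with the stored baseline.
# In the output, '=>' marks items that are new since the baseline and
# '<=' marks items that were in the baseline but are now missing.
function Compare-Baseline {
    param(
        [Parameter(Mandatory)][object[]]$State,
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string[]]$Property
    )

    $baseline = @(Import-Clixml -Path $Path)
    Compare-Object -ReferenceObject $baseline -DifferenceObject $State -Property $Property
}

# Usage on a Windows host, baselining listening TCP ports:
# $ports = Get-NetTCPConnection -State Listen | Select-Object LocalAddress, LocalPort
# Save-Baseline -State $ports -Path .\ports-baseline.xml
# ...later, on a schedule...
# $now = Get-NetTCPConnection -State Listen | Select-Object LocalAddress, LocalPort
# Compare-Baseline -State $now -Path .\ports-baseline.xml -Property LocalAddress, LocalPort
```

Because the baseline is a plain XML file, it can be checked into the same repository as the build and hardening scripts.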

At this point in the process, we are treating our systems like code. The setup and securing is completed using PowerShell scripts. The final state is baselined. The baselines, along with the scripts, are stored in a CVS. We are now prepared for a security incident to occur.

Next step: Identification
Tomorrow, we will cover the Identification stage. What happens when something changes against the baseline? Say, a new user with a suspicious name added to a privileged group. Maybe it is a new Windows service that is looking up suspect domain names and sending traffic out over the Internet. Whatever it is, we have a change. That change is an indicator of compromise. Tomorrow, we will review finding and responding to IOCs.


Tags:

Incident Response | Security
