Security Chaos Engineering 101: Getting Your Hands Dirty

2.4.2023

Kennedy Torkura, Co-Founder & CTO

Security Chaos Engineering (SCE) is a novel approach to cyber security. It is built on the principles of chaos engineering, but its objective is cyber resiliency. Chaos engineering helps enterprises survive outages caused by availability- and performance-related faults. Similarly, by adopting SCE, enterprises can become resilient to cyber attacks such as ransomware.

This article presents SCE as a security engineering practice that is achievable rather than esoteric. The primary motivation is to help security engineers, and security professionals in general, view SCE like any other security engineering effort and to demystify the effort and impact of adopting it. It also addresses several misconceptions about SCE to provide objective and clear information.

This article follows up on an earlier article that discussed several fundamental aspects of Security Chaos Engineering, including some common misconceptions. Here, the focus is on the practical side: conducting an SCE experiment hands-on.

Hello World SCE

Most of us already practice some form of SCE without knowing it. Creating, modifying, or deleting cloud resources are the foundational techniques on which SCE experiments are built. So why isn't every such change an SCE experiment? Two things make the difference: mindset and intent.

Adopting the Right Mindset

The mindset adopted for SCE is critical for crafting the right hypothesis and conducting successful experiments. A mindset is a habitual way of thinking or attitude. For SCE, an `assume-breach` mindset is essential; otherwise, a conflicting hypothesis might be crafted that does not support effective experiments. The mindset must also be able to challenge existing beliefs and culture. An `assume-breach` mindset starts from the assumption that an attacker can gain access to your cloud environment. This is the first hurdle a conflicting mindset will encounter: accepting that attackers can bypass even iron-clad preventive defenses. Examples of such compromises abound, e.g., the LastPass data breaches.

Adopting the Right Intent

The intent of an SCE experiment is to learn from failures and to be proactive. You want evidence about specific assumptions before drawing conclusions. Ideally, security decisions should be based not on gut feelings or vendor promises but on experiments, facts and data. There is room for knowledge that comes from experience; however, it has to be balanced with empirical analysis. Also, as discussed in the previous blog article about SCE misconceptions, the intention is not to overwhelm the environment with a barrage of attacks. Attempting this results in burnout, stress, and displeasure from management and other security folks. The key is to start small, learn, and improve your strategies gradually.

***High-Level Illustration of the Public S3 Bucket SCE Experiment***

SCE Experiment - Public S3 Bucket

The experiment is based on a public S3 bucket. The aim is to gather evidence about the events and reactions that would unfold if an S3 bucket became public, whether intentionally, by mistake, or through adversarial action. We will use an existing bucket for this experiment, but feel free to create a new one. Using an existing bucket lets us observe whether the security controls already in place work effectively.

***Complete Workflow of The AWS S3 Bucket SCE Experiment***

Step 1: Establish The Steady State

Once the target bucket has been selected, the steady state has to be established. For this example, the steady state can be as simple as the configuration of the target S3 bucket, i.e., a private bucket with public access blocked. Infrastructure-as-Code (IaC) tools such as Terraform can be used to declare and record this steady state.

***Terraform file for deploying a private AWS S3 Bucket***
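Before injecting anything, it helps to capture the steady state in a form you can check again later. The sketch below assumes the JSON printed by `aws s3api get-public-access-block --bucket sce-experiment` (the boto3 `get_public_access_block` response has the same shape); the function name `steady_state_holds` is hypothetical and only checks the four Block Public Access flags.

```python
import json

# The four Block Public Access settings that together keep a bucket private.
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def steady_state_holds(pab_response: dict) -> bool:
    """Return True only if every Block Public Access flag is present and enabled."""
    config = pab_response.get("PublicAccessBlockConfiguration", {})
    return all(config.get(flag) is True for flag in REQUIRED_FLAGS)


if __name__ == "__main__":
    # Example output of: aws s3api get-public-access-block --bucket sce-experiment
    sample = json.loads("""
    {"PublicAccessBlockConfiguration": {
        "BlockPublicAcls": true, "IgnorePublicAcls": true,
        "BlockPublicPolicy": true, "RestrictPublicBuckets": true}}
    """)
    print(steady_state_holds(sample))  # True
```

If the configuration has been deleted, the CLI returns a `NoSuchPublicAccessBlockConfiguration` error instead of JSON; treat that case as a deviation from the steady state.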

Step 2: Make the S3 Bucket Public

There are several ways to make an S3 bucket public; two of them are shown below using the AWS CLI. The first command applies the `public-read-write` canned ACL, which grants everyone on the internet (the `AllUsers` group) permission to list the bucket's objects and to write, overwrite, or delete objects in it.

aws s3api put-bucket-acl --bucket sce-experiment --acl public-read-write

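The second command deletes the bucket's public access block configuration, i.e., it disables all four Block Public Access settings at once. Note that while `BlockPublicAcls` is enabled, S3 rejects requests that set a public ACL, so on a bucket with Block Public Access in place, run this command before the first one. Block Public Access settings configured at the account level still apply.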

aws s3api delete-public-access-block --bucket sce-experiment
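To confirm that the ACL change actually took effect, you can inspect the bucket ACL. This minimal sketch assumes the JSON returned by `aws s3api get-bucket-acl --bucket sce-experiment` and flags grants to the two predefined S3 groups that expose a bucket: `AllUsers` (anyone) and `AuthenticatedUsers` (any AWS account). The helper name `public_grants` and the owner ID in the sample are hypothetical.

```python
# URIs of the predefined S3 groups that expose a bucket beyond its owner.
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers": "AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers": "AuthenticatedUsers",
}


def public_grants(acl_response: dict) -> list:
    """Return (group, permission) pairs for every grant made to a public group."""
    found = []
    for grant in acl_response.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            found.append((PUBLIC_GROUPS[grantee["URI"]], grant["Permission"]))
    return found


# Shape of the ACL after applying the public-read-write canned ACL
# (the owner ID is a placeholder).
after_step_2 = {
    "Owner": {"ID": "example-owner-id"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "WRITE"},
    ],
}

print(public_grants(after_step_2))  # [('AllUsers', 'READ'), ('AllUsers', 'WRITE')]
```

An empty list means the ACL grants nothing to the public groups.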

Steps 3 & 4: Observe

When the bucket is made public, a well-secured AWS account should produce a few events. Security controls are typically deployed to prevent, detect, or recover from security events, and which ones apply depends on the environment and its security architecture. Let's assume GuardDuty is deployed as a detective control; it is then expected to raise a finding. We also assume GuardDuty has been configured to send alerts matching certain rules to a Slack channel; for details on sending GuardDuty alerts to Slack, refer to the AWS documentation. Depending on your setup, you might instead see the alerts only as GuardDuty findings.

Some key questions to consider:

Can you identify the exact GuardDuty finding?
Does the GuardDuty finding make sense to you, i.e. can you interpret it?
Is the GuardDuty finding actionable?
How long did the GuardDuty finding take to arrive from when the bucket was made public?

Many more questions could be asked, but let's keep it simple.

**GuardDuty Slack Notification Received When Bucket `sce-experiment` Is Made Public**
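The last question, how long the finding took to arrive, is easiest to answer if you record the time of the fault injection and compare it with the time the alert arrived. A minimal sketch, assuming both times are available as ISO-8601 strings in UTC; the function names, the example timestamps, and the 15-minute target are hypothetical.

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is treated as UTC."""
    # datetime.fromisoformat only accepts a 'Z' suffix from Python 3.11 on.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def detection_latency(injected_at: str, detected_at: str) -> timedelta:
    """Time between making the bucket public and receiving the alert."""
    return parse_utc(detected_at) - parse_utc(injected_at)


def within_target(latency: timedelta, target: timedelta = timedelta(minutes=15)) -> bool:
    """Compare the measured latency with a (hypothetical) acceptable delay."""
    return latency <= target


latency = detection_latency("2024-01-01T10:00:00Z", "2024-01-01T10:07:30.500Z")
print(latency, within_target(latency))  # 0:07:30.500000 True
```

Recording this number for every run turns "is the delivery time acceptable?" into a comparison you can track across follow-up experiments.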

Step 5: Recover

Finally, we would like to return the bucket to its steady state. This can be done using the AWS CLI or Terraform. Other strategies that persist the state of cloud resources can also be adopted; e.g., an agile cloud inventory and asset management system can be used to roll back the earlier changes.

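aws s3api put-public-access-block --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true --bucket sce-experiment

Restoring the public access block does not remove the public ACL granted in Step 2 (it is only ignored), so also reset the bucket ACL to `private`:

aws s3api put-bucket-acl --bucket sce-experiment --acl private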
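Recovery is most reliable when it cannot be skipped. A common chaos-engineering pattern is to wrap the injection in `try`/`finally` so that the rollback runs even if an observation step fails. The sketch below illustrates that pattern with the Step 2 and Step 5 commands; `run_experiment` and its parameters are hypothetical. With `dry_run=True` it only logs the commands; with `dry_run=False` it assumes the AWS CLI is installed and configured with permissions on `sce-experiment`, and that ACLs are enabled on the bucket (S3 Object Ownership not set to *Bucket owner enforced*).

```python
import shlex
import subprocess
from typing import Callable, List

BUCKET = "sce-experiment"

# Step 2: fault injection. The public access block is deleted first because
# S3 rejects a public ACL while BlockPublicAcls is enabled.
INJECT = [
    ["aws", "s3api", "delete-public-access-block", "--bucket", BUCKET],
    ["aws", "s3api", "put-bucket-acl", "--bucket", BUCKET, "--acl", "public-read-write"],
]

# Step 5: recovery. Restore Block Public Access, then reset the ACL.
ROLLBACK = [
    ["aws", "s3api", "put-public-access-block", "--bucket", BUCKET,
     "--public-access-block-configuration",
     "BlockPublicAcls=true,IgnorePublicAcls=true,"
     "BlockPublicPolicy=true,RestrictPublicBuckets=true"],
    ["aws", "s3api", "put-bucket-acl", "--bucket", BUCKET, "--acl", "private"],
]


def run(commands: List[List[str]], dry_run: bool, log: List[str]) -> None:
    """Log each command and, unless dry_run is set, execute it (raising on failure)."""
    for cmd in commands:
        log.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


def run_experiment(observe: Callable[[], None], dry_run: bool = True) -> List[str]:
    """Inject the fault, observe, and always roll back, even if observe() raises."""
    log: List[str] = []
    try:
        run(INJECT, dry_run, log)
        observe()  # e.g. wait for the GuardDuty/Slack alert and record its time
    finally:
        run(ROLLBACK, dry_run, log)
    return log


for line in run_experiment(observe=lambda: None):
    print(line)
```

Starting in dry-run mode lets you review the exact commands before touching a real bucket.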

Step 6: Analysis and Planning

The observations and results from the experiment are critical for making fact-based decisions that improve security and cyber-resiliency. Several approaches can be adopted. For example, in our S3 experiment, we expected notifications from GuardDuty via the Slack integration. However, notifications are most useful when they are timely, since they can bridge the gap between a successful attack and a stopped one. Hence, one lesson is to measure how long it takes for the GuardDuty notification to arrive and decide whether that delivery time is acceptable.

One improvement is to implement S3 bucket events with accompanying Lambda functions, which allows events to be triggered and reported almost immediately. See the AWS documentation for details on implementing S3 bucket events. Note that S3 Event Notifications cover object-level operations; bucket-level changes such as `PutBucketAcl` or `DeletePublicAccessBlock` are recorded by AWS CloudTrail and can be routed to a Lambda function via Amazon EventBridge. After this improvement, a follow-up experiment can verify its effectiveness and reveal any further improvements needed. This is just one dimension of improvement: the answers to the questions posed above are contextual, and answering them guides the choice of the right improvements. The key is to quickly evaluate your security controls and investments and make informed, evidence-based decisions.

***A Possible Security Improvement: Leveraging S3 Events and Lambda***
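A minimal sketch of such a Lambda function in Python. One assumption is worth stating plainly: because S3 Event Notifications cover object-level events, this sketch assumes the function is triggered by an Amazon EventBridge rule matching CloudTrail "AWS API Call via CloudTrail" events from `aws.s3`, which typically requires a CloudTrail trail logging management events. The watched event names, the `SLACK_WEBHOOK_URL` environment variable, and the message format are hypothetical; the Slack call uses an incoming webhook, which accepts a JSON body with a `text` field.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical watch list of bucket-level API calls that can expose a bucket.
WATCHED_EVENTS = {"PutBucketAcl", "DeletePublicAccessBlock", "PutBucketPolicy"}


def build_alert(event: dict) -> Optional[str]:
    """Return an alert message for a watched S3 API call, or None to ignore it."""
    detail = event.get("detail", {})
    name = detail.get("eventName")
    if name not in WATCHED_EVENTS:
        return None
    # requestParameters can be null in CloudTrail records, hence the "or {}".
    bucket = (detail.get("requestParameters") or {}).get("bucketName", "unknown")
    actor = detail.get("userIdentity", {}).get("arn", "unknown")
    return f"{name} on bucket {bucket} by {actor} at {detail.get('eventTime')}"


def lambda_handler(event, context):
    """Entry point: post the alert to Slack if a webhook URL is configured."""
    message = build_alert(event)
    webhook = os.environ.get("SLACK_WEBHOOK_URL")
    if message and webhook:
        request = urllib.request.Request(
            webhook,
            data=json.dumps({"text": message}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request, timeout=5)
    return {"alerted": message is not None, "message": message}
```

The matching EventBridge rule would filter on `source` `aws.s3`, `detail-type` `AWS API Call via CloudTrail`, and the watched `eventName` values, so the function only runs for relevant changes. Re-running the public-bucket experiment afterwards shows whether this path actually beats the GuardDuty notification time.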

The Mitigant SCE Platform

The Mitigant SCE platform aims to make cyber-resiliency a first-class citizen in cloud-native infrastructure. Implementing an SCE strategy can be costly and daunting for most enterprises; Mitigant addresses this with a SaaS platform that is suitable for companies of all sizes and allows quick and safe adoption of SCE without the cost and resource overhead.

**S3 Public Bucket Experiment Easily Conducted With Mitigant SCE Platform**

The Mitigant SCE platform provides a set of cloud attacks that can be used as building blocks for constructing complex attack scenarios against AWS infrastructure. The platform enables safe and controlled SCE experiments: attacks can be started and stopped with a button click, and all changes made to the cloud infrastructure are rolled back and restored seamlessly. Additionally, all attacks are mapped to the MITRE ATT&CK framework, so experiments reflect attacks observed in the wild.

Sign up today for a free trial of the Mitigant SCE platform at https://mitigant.io/sign-up to start building cyber-resiliency for your cloud infrastructure.
