

Drift Management in Cloud Infrastructure

Sep 05, 2022 - 5 min read

Managing cloud drifts is one of the challenges faced by cloud-native enterprises. While it might be easy to orchestrate complex cloud infrastructure via a single command, keeping track of the resulting resources can be taxing. Enterprises do not want to implement gatekeeper structures, which often hinder agility, yet unattended cloud drifts can introduce huge costs and security issues. This post gives an overview of cloud drifts and different drift management techniques.

Mind the Gap - How Wide Has Your Cloud Drifted

The cloud has made enterprises agile; infrastructure can be launched quickly with a few clicks or commands. Features like autoscaling allow infrastructure to grow in response to traffic and other factors. While this might help a business achieve its objectives, it could bite back if proper measures are not implemented. The phenomenon where a cloud infrastructure moves away from its planned or perceived state is called cloud drift.

Drifts occur when cloud resources, including their configurations, move away from the desired state. Unfortunately, most enterprises feel the pain of cloud drift before implementing countermeasures. Implementing countermeasures is also a challenge, given that they could themselves hinder the agility of engineering and product teams.

Types of Cloud Drifts

Drifts can occur at different levels in cloud infrastructure, mainly due to resource or configuration changes.

Cloud Resource Drifts

Every asset in the cloud is a resource; thus, the number of resources in a cloud infrastructure changes owing to several factors, both intentional and unintentional. Resource drifts occur when cloud resources are created, modified, or deleted.

Cloud Configuration Drifts

Most cloud resources have some configuration that is critical for expressing desired behavior, including security, performance, and availability. Changes to these configurations can therefore have considerable implications, to varying degrees. For example, a change to an AWS S3 access policy could make the bucket publicly accessible.
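As a rough illustration of why such a configuration change matters, the sketch below checks a bucket policy document for Allow statements that apply to every principal. The function name and the sample policies are hypothetical, and the check is deliberately simplified: it ignores Condition blocks, ACLs, and account-level Block Public Access settings, all of which affect real-world exposure.

```python
import json


def allows_public_access(policy_json: str) -> bool:
    """Return True if any Allow statement grants access to every principal."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            return True
        if isinstance(principal, dict):
            aws = principal.get("AWS")
            values = aws if isinstance(aws, list) else [aws]
            if "*" in values:
                return True
    return False


# Hypothetical desired policy: only one account may read objects.
desired = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})

# Hypothetical drifted policy: the principal was widened to everyone.
drifted = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})

print(allows_public_access(desired), allows_public_access(drifted))
```

A single-field change separates the two policies, which is why configuration drifts are easy to miss without automated checks.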

Implications of Cloud Drifts

Cost Implications

Every cloud asset is essentially a resource, and most cloud resources are billed by the cloud providers via a pay-as-you-go (PAYG) model, so untracked resources translate directly into cost. Hence, keeping track of how cloud infrastructure evolves requires efficient mechanisms that identify when resources are orchestrated and determine whether they were deployed correctly.

Security Implications

Change management has traditionally been considered a critical aspect of a robust security architecture. Though managing infrastructure changes is not strictly a security responsibility, changes turn out to be critical for identifying security events, especially in cloud environments. However, as with most events that might fire alerts, there is a high chance of false positives; the challenge is to identify the changes that are security-relevant.

Reconciler Pattern Showing Its Four Methods

Drift Management Lifecycle

Most drift management mechanisms detect drifts by employing the reconciler pattern, a software engineering pattern that aims to solve the issue of drifts. It achieves this by establishing two states: the desired state (also known as the expected state) and the actual state (also known as the real-world state). The desired state is defined at orchestration time via different methods, e.g., a DSL or IaC, and persisted in some kind of data store (e.g., file-based, in-memory, object storage, or an RDBMS). The drift is computed by comparing the desired state with the actual state; the differences are the changes resulting from the creation, modification, or deletion of cloud resources and configurations. The reconciler pattern has four standard methods: getActual(), getExpected(), reconcile(), and destroy(). These methods are used for drift detection and resolution.

It is critical to understand that drift management is the broader umbrella that encapsulates different aspects related to cloud drifts: drift detection, drift analysis, and drift resolution (reconciliation). Let's examine these aspects briefly:
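Before turning to those aspects, the four reconciler methods named above can be sketched in a few lines. This is a minimal in-memory illustration, not a reference implementation: the class name, the snake_case method names, and the use of a plain dictionary as a stand-in for the cloud provider are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict

State = Dict[str, dict]  # resource id -> configuration


@dataclass
class InMemoryReconciler:
    expected: State                              # the persisted desired state
    cloud: State = field(default_factory=dict)   # stand-in for the real provider

    def get_expected(self) -> State:
        """Load the desired state from the (here: in-memory) data store."""
        return {rid: dict(cfg) for rid, cfg in self.expected.items()}

    def get_actual(self) -> State:
        """Enumerate what currently exists in the (simulated) cloud."""
        return {rid: dict(cfg) for rid, cfg in self.cloud.items()}

    def reconcile(self) -> None:
        """Make the actual state match the desired state."""
        expected = self.get_expected()
        actual = self.get_actual()
        for rid in actual.keys() - expected.keys():
            del self.cloud[rid]                  # remove unexpected resources
        for rid, cfg in expected.items():
            if actual.get(rid) != cfg:
                self.cloud[rid] = dict(cfg)      # create or repair the resource

    def destroy(self) -> None:
        """Tear down everything in the simulated cloud."""
        self.cloud.clear()


reconciler = InMemoryReconciler(
    expected={"bucket-a": {"public": False}},
    cloud={"bucket-a": {"public": True}, "vm-x": {"size": "large"}},
)
reconciler.reconcile()
print(reconciler.get_actual())
```

In a real system, get_actual() and reconcile() would call the cloud provider's APIs, and deleting resources automatically would usually require explicit approval.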


Drift Detection

This involves steps to identify drifts and, in most cases, inform a cloud administrator via a CLI or user interface. Drift detection is the most common form of drift management. However, it addresses only a small fraction of the implications and leaves other issues unaddressed.
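At its core, detection is a comparison of two state maps. The sketch below classifies differences into created, modified, and deleted resources; the Drift type, the function name, and the sample resource identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple

State = Dict[str, dict]  # resource id -> configuration


class Drift(NamedTuple):
    created: List[str]   # present in the cloud but not in the expected state
    modified: List[str]  # present in both, with differing configuration
    deleted: List[str]   # expected but missing from the cloud


def detect_drift(expected: State, actual: State) -> Drift:
    created = sorted(actual.keys() - expected.keys())
    deleted = sorted(expected.keys() - actual.keys())
    modified = sorted(
        rid for rid in expected.keys() & actual.keys()
        if expected[rid] != actual[rid]
    )
    return Drift(created, modified, deleted)


expected = {
    "bucket-a": {"public": False},
    "vm-1": {"size": "small"},
    "db-1": {"encrypted": True},
}
actual = {
    "bucket-a": {"public": True},    # configuration changed
    "vm-1": {"size": "small"},       # unchanged
    "vm-2": {"size": "large"},       # created outside the expected state
}

drift = detect_drift(expected, actual)
print(drift)
```

The result only says what changed; it does not say why or what to do about it, which is where analysis and resolution come in.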

Drift Analysis

Beyond detecting drifts, it is sometimes critical to understand their root cause, as this might help in proactively preventing future drifts or in better understanding a cloud system. Drift analysis often involves different mechanisms, including log event analysis. An important aspect, which deserves to be practiced more commonly, is the analysis of the security events that led to a drift, as this might reveal Indicators of Compromise (IoCs).
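One simple form of log event analysis is to pull the audit events that touched a drifted resource and read them in chronological order. The sketch below assumes a simplified, hypothetical event shape (time, resource_id, event_name, actor); real audit logs such as AWS CloudTrail use a richer schema that would need to be mapped onto it first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def events_behind_drift(
    drifted_ids: Iterable[str], events: Iterable[dict]
) -> Dict[str, List[dict]]:
    """Group audit events by drifted resource, oldest first."""
    wanted = set(drifted_ids)
    grouped: Dict[str, List[dict]] = defaultdict(list)
    # ISO 8601 timestamps in the same timezone sort correctly as plain strings.
    for event in sorted(events, key=lambda e: e["time"]):
        if event["resource_id"] in wanted:
            grouped[event["resource_id"]].append(event)
    return dict(grouped)


# Hypothetical, hand-written audit records.
audit_log = [
    {"time": "2022-09-01T10:05:00Z", "resource_id": "bucket-a",
     "event_name": "PutBucketPolicy", "actor": "alice"},
    {"time": "2022-09-01T09:00:00Z", "resource_id": "bucket-a",
     "event_name": "DeleteBucketPolicy", "actor": "ci-role"},
    {"time": "2022-09-01T11:00:00Z", "resource_id": "vm-1",
     "event_name": "StopInstances", "actor": "bob"},
]

trail = events_behind_drift(["bucket-a"], audit_log)
for event in trail["bucket-a"]:
    print(event["time"], event["actor"], event["event_name"])
```

An unfamiliar actor or an unusual sequence of events in such a trail is the kind of signal that may warrant a closer security investigation.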

Drift Resolution

The aim of drift resolution is to reconcile the differences between the expected state and the actual state. This is one of the most challenging aspects of drift management as it might involve the deletion or creation of resources.

Drift Management Techniques

Drift management techniques are divided into two main categories: static and dynamic. Let's examine them briefly.

Static Drift Management

The most popular drift management techniques are static techniques based on IaC systems, e.g., Terraform. For example, the Terraform commands terraform refresh, terraform plan, and terraform apply are used to compute the drift and update the desired state (the Terraform state files).

The Output of Terraform Plan Command Showing Detected Drifts
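A plan can also be inspected programmatically. The sketch below assumes the plan was saved with terraform plan -out=tfplan and exported with terraform show -json tfplan, and it reads the resource_changes entries of that JSON document. The function name is hypothetical, and the sample document is hand-written and heavily trimmed rather than real Terraform output.

```python
import json
from typing import Dict, List


def pending_changes(plan_json: str) -> Dict[str, List[str]]:
    """Map each resource address to its planned actions, skipping no-ops."""
    plan = json.loads(plan_json)
    changes: Dict[str, List[str]] = {}
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            changes[rc["address"]] = actions
    return changes


# Hand-written, trimmed sample in the shape of the plan JSON representation.
sample_plan = json.dumps({
    "format_version": "1.0",
    "resource_changes": [
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["no-op"]}},
        {"address": "aws_security_group.web",
         "change": {"actions": ["update"]}},
        {"address": "aws_instance.worker",
         "change": {"actions": ["delete", "create"]}},
    ],
})

print(pending_changes(sample_plan))
```

An empty result means the plan found nothing to change for the resources Terraform manages; it says nothing about resources created outside Terraform.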

Dynamic Drift Management

In this approach, the desired state is established by directly enumerating cloud accounts at a specific point in time and persisting the result as a state. After that, the established state can be compared directly with the actual state to determine drifts. This approach is more comprehensive, given that some resources might not be provisioned via IaC and are therefore not covered by IaC-based drift detection. Furthermore, reconciliation is more efficient since it is done directly via the cloud provider SDKs. Essentially, dynamic drift management leverages an asset management system or CMDB to maintain a detailed form of the expected state and allow for advanced operations such as CRUD and versioning. This approach is also known as Infrastructure-as-Software.

Mitigant's Dynamic Drift Management Displaying Detected Drifts
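The enumerate, persist, and compare cycle of the dynamic approach can be sketched as a small versioned snapshot store. Everything here is an assumption made for illustration: the SnapshotStore class, its method names, and the injected enumerate_account callable, which in practice would wrap cloud provider SDK calls. It does not describe how any particular product is implemented.

```python
import copy
from typing import Callable, Dict, List

State = Dict[str, dict]  # resource id -> configuration


class SnapshotStore:
    """Keeps versioned baselines obtained by enumerating an account."""

    def __init__(self, enumerate_account: Callable[[], State]):
        self._enumerate = enumerate_account
        self._versions: List[State] = []

    def capture_baseline(self) -> int:
        """Persist the current account state and return its version number."""
        self._versions.append(copy.deepcopy(self._enumerate()))
        return len(self._versions) - 1

    def drifted_ids(self, version: int = -1) -> List[str]:
        """List resources that differ from the chosen baseline."""
        baseline = self._versions[version]
        current = self._enumerate()
        ids = baseline.keys() | current.keys()
        return sorted(rid for rid in ids if baseline.get(rid) != current.get(rid))


# A dictionary stands in for the cloud account in this sketch.
account = {"sg-1": {"ingress": ["443"]}}
store = SnapshotStore(lambda: account)
v0 = store.capture_baseline()

account["sg-1"]["ingress"].append("22")      # configuration drift
account["bucket-console"] = {"public": True}  # resource created via the console

print(store.drifted_ids(v0))
```

Because the baseline comes from enumerating the account itself, the console-created bucket is detected even though no IaC tool ever knew about it.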

Static Versus Dynamic Drift Management

As with other computing mechanisms, there is a continuing discussion about the pros and cons of static and dynamic drift management techniques. A significant limitation of the static approach is that it covers only the infrastructure orchestrated via the established desired state. It has no knowledge of infrastructure deployed via other means. For example, Terraform has no awareness of infrastructure deployed via AWS CDK, Pulumi, cloud APIs, or cloud web consoles. Dynamic drift management approaches, in contrast, scan the entire cloud infrastructure regardless of orchestration source and can thereafter resolve the drifts. On the other hand, dynamic drift management systems are not easily implemented: they are traditional software applications and therefore take much longer to build. An alternative is to use managed dynamic drift systems, e.g., AWS Config.

Slack Notification of Mitigant's Drift Management

Mitigant’s Drift Management System

Mitigant's drift management system uses a dynamic drift management approach to allow for continuous detection, analysis, and resolution of drifts. Because it is a SaaS platform, engineering teams do not need to spend time building a system; all the features come out of the box. Drifts in AWS infrastructure are automatically tracked, and notifications are sent to enable prompt response. In addition, Mitigant's drift analysis focuses on investigating the security events that might have led to drifts, which allows for proactive security countermeasures.

Overcome the pain of cloud drifts once and for all by signing up for Mitigant. We are also excited to give you a demo; just ask!


Kennedy Torkura

Co-Founder & CTO, Mitigant. | Contributing Author - O'Reilly Security Chaos Engineering Book. | AWS Community Builder
