How to build resilience through the cloud in a post-pandemic world
Article by Gigamon CTO Shehzad Merchant.
As we look to a post-pandemic world, a key area of investment we can expect to see is the building of resilience to destructive attacks.
Last year saw a record number of distributed denial-of-service (DDoS) and ransomware attacks, which are expected to escalate throughout the rest of this decade.
Many organisations are looking to the cloud to help achieve resilience against these attacks. But what is it about the cloud and cloud-native architectures that makes them resilient to attacks of this kind?
Three attributes come to mind: distributed, immutable and ephemeral.
Distributed – Applications and services
If applications are leveraging a distributed delivery model — for example, leveraging cloud-based services such as content delivery networks (CDNs) — then organisations have to worry less about DDoS attacks, as these attacks work best by concentrating their firepower on a single target.
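The dilution effect can be sketched in a few lines. The sketch below is a toy round-robin dispatcher, not a real CDN: the node names are hypothetical, and a production CDN would route on geography and load rather than a simple rotation. The point it illustrates is that when requests are spread across many edge nodes, no single node absorbs the full volume.

```python
from itertools import cycle

# Hypothetical edge endpoints standing in for CDN points of presence.
EDGE_NODES = ["edge-us-east", "edge-eu-west", "edge-ap-south"]

def make_dispatcher(nodes):
    """Rotate requests across edge nodes so no single node
    absorbs the full request volume."""
    ring = cycle(nodes)

    def dispatch(request):
        node = next(ring)
        return f"{node} served {request}"

    return dispatch

dispatch = make_dispatcher(EDGE_NODES)
dispatch("GET /index.html")  # handled by the first node in the rotation
dispatch("GET /index.html")  # handled by the next node, and so on
```

An attacker who could saturate one node in this model still leaves the other nodes serving traffic, which is precisely why concentrated firepower loses its effectiveness.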
Immutable – Data sets
Suppose applications leverage solutions that never modify records in place but instead are ‘append-on-write’ (in other words, the data set is immutable). In that case, organisations have to worry less about attacks on the integrity of that data, as such attacks are easier to detect and surface.
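One common way to make tampering detectable in an append-only data set is to chain each entry to its predecessor with a hash. The sketch below is a minimal illustration of that idea, not any particular product's storage engine: if an attacker modifies a record in place, the recomputed hashes no longer match and the violation surfaces immediately.

```python
import hashlib
import json

class AppendOnlyLog:
    """Records are only ever appended; each entry carries the hash
    of the previous entry, so in-place tampering breaks the chain."""

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": prev, "hash": digest})

    def verify(self):
        """Walk the chain and report whether every hash still matches."""
        prev = "genesis"
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Appending new records keeps `verify()` returning `True`; silently editing an old record makes it return `False`, which is the detection property the article is describing.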
Ephemeral – Workloads
Finally, if applications are ephemeral in nature, then organisations may worry less about attackers establishing persistence and moving laterally. The value of confidential information, such as tokens associated with that application instance, is reduced as those assets simply get decommissioned and new ones get instantiated within a relatively short time frame.
So by leveraging modern cloud-native architectures that are distributed, immutable, and ephemeral, organisations help address confidentiality, integrity, and availability, which have been the foundational triad of cybersecurity.
So, how are companies manifesting these attributes in their applications? Modern cloud architectures are moving from monolithic, tiered models to distributed microservices-based architectures, where each microservice can scale independently, within a geographic region or across regions.
Each microservice can have its own storage and database optimised for that service, allowing that service to run stateless (or perhaps more accurately, using a shared-state model where the state is shared amongst the running instances via the storage/database layer). This allows those services to become truly ephemeral and distributed.
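The shared-state model described above can be sketched with a toy service. The class names and the in-memory store below are hypothetical stand-ins (in practice the store would be something like Redis or a per-service database); the point is that because all state lives outside the service instances, any instance can serve any request, and instances can be killed and replaced freely.

```python
class SharedStore:
    """Stand-in for an external database or cache shared by all instances."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

class CartService:
    """A stateless service instance: all state lives in the shared
    store, so instances are interchangeable and ephemeral."""

    def __init__(self, store):
        self.store = store

    def add_item(self, user, item):
        cart = self.store.get(f"cart:{user}", [])
        cart.append(item)
        self.store.set(f"cart:{user}", cart)

    def view(self, user):
        return self.store.get(f"cart:{user}", [])

store = SharedStore()
a, b = CartService(store), CartService(store)  # two interchangeable instances
a.add_item("alice", "book")
b.add_item("alice", "pen")  # a different instance sees the same state
```

Because neither instance holds state of its own, either one can be decommissioned mid-session and a fresh instance picks up exactly where it left off — the property that makes services truly ephemeral.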
Pets vs cattle
This brings us to a concept that has been discussed for some time in the context of the cloud — pets versus cattle.
Pets have cute names, and they can be recognised individually. If a pet falls ill, the owner takes it to the vet. Owners give them a lifetime of caring and make sure the pet lives a healthy life for as long as possible.
Traditional applications are like pets. Each instance is unique. If the application gets infected, it is taken to the cyber-vet. ‘Patch in place’ is standard practice with traditional applications, and it is this patching that makes the instances unique. The job of IT is to keep the applications up and running for as long as possible.
Cattle, on the other hand, don’t have names. They have an obscure number; businesses generally cannot distinguish the cattle in the herd and don’t build relationships with them. If cattle fall ill or get infected, they are culled from the herd.
Modern cloud applications are like cattle. Many running instances of the services are created, and each instance is indistinguishable from the other. They are all manifested from a golden repository.
IT never patches in place — that is, never makes the instances bespoke. Their job is to make the instances ephemeral, killing the instances quickly and creating new ones. In doing so, they build resilient systems, which in many ways is the opposite of keeping applications up for as long as possible — these latter systems tend to be more fragile.
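The cattle workflow reduces to a simple rule: patch the golden image, never the running instances, then recycle the fleet. The sketch below illustrates that rule with hypothetical names and a dictionary standing in for a machine image; a real pipeline would rebuild an AMI or container image and roll the deployment.

```python
import itertools

# Hypothetical golden image: the single source every instance is stamped from.
GOLDEN_IMAGE = {"version": "1.4.2", "packages": ["nginx", "app"]}
_ids = itertools.count(1)

def launch():
    """Every instance is an identical copy of the golden image;
    no instance is ever modified after launch."""
    return {"id": next(_ids), "image": dict(GOLDEN_IMAGE)}

def recycle(fleet):
    """Instead of patching in place, kill the fleet and relaunch
    fresh instances from the (possibly updated) golden image."""
    return [launch() for _ in fleet]

fleet = [launch() for _ in range(3)]
GOLDEN_IMAGE["version"] = "1.4.3"  # patch the image, not the instances
fleet = recycle(fleet)             # every instance is now on 1.4.3
```

Note that the "upgrade" never touches a running instance: the old ones are simply discarded, which is what keeps every member of the herd indistinguishable.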
Benefits of the cloud
The cloud offers many tools to help build systems that follow this paradigm. For example, Amazon recently announced ‘chaos engineering’ as-a-service. This allows organisations to introduce elements of chaos into their production workloads, such as taking down running instances, to ensure that overall performance isn’t impacted and that, over time, the workloads become resilient to these kinds of operational setbacks.
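The core loop of a chaos experiment can be illustrated in miniature. The toy fleet below is an assumption for illustration, not Amazon's service: each cycle kills a random instance (as a chaos tool would), asserts the service is still healthy, and lets a mock autoscaler replace the casualty — the kill/verify/recover loop that chaos engineering automates.

```python
import random

random.seed(7)  # deterministic runs for this sketch

class Fleet:
    """A toy service fleet: healthy as long as one instance survives,
    with an autoscaler that replaces killed instances each cycle."""

    def __init__(self, size):
        self.instances = set(range(size))
        self._next_id = size

    def kill_random(self):
        """Inject failure: terminate one randomly chosen instance."""
        if self.instances:
            self.instances.discard(random.choice(sorted(self.instances)))

    def autoscale(self, target):
        """Recover: launch fresh instances until the target size is met."""
        while len(self.instances) < target:
            self.instances.add(self._next_id)
            self._next_id += 1

    def healthy(self):
        return len(self.instances) > 0

fleet = Fleet(size=5)
for _ in range(20):            # twenty chaos cycles
    fleet.kill_random()        # inject a failure
    assert fleet.healthy()     # the service must keep serving
    fleet.autoscale(target=5)  # replacements come up automatically
```

If any cycle left the service unhealthy, the assertion would flag it — surfacing the fragility in a controlled experiment rather than during a real attack.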
Getting to this point is a journey, which may be accomplished in multiple steps. For example, organisations may move their ‘pets’ — their traditional applications and workloads — from an on-premises world to the cloud world without significantly altering the architecture of the applications. The common term for this is ‘lift and shift’.
Once the applications are in the cloud and organisations have started building familiarity with cloud-native tools, they can work on re-architecting their traditional applications (pets) into modern architectures that are distributed, immutable, and ephemeral (cattle).
In other words, they can move from pets-in-the-cloud to cattle-in-the-cloud. Yet organisations need to make sure that they don’t regress to pet creation once they reach this point: for example, they should not patch in place or keep instances up and running for longer than necessary.
Maintaining real-time or near real-time visibility at each step of the journey is critical to ensuring early detection of pets or pet-like behaviour. As new workloads are moved to the cloud in a lift-and-shift model, understanding the internal and external dependencies is vital to enforce the right policies and disincentivise pet creation.
While there are many ways to do this, looking to the network activity footprint of these applications provides a ground-truth approach to mapping this out.