Disaster Recovery in the AWS Cloud


When you are building applications in the AWS cloud, you have to go to painstaking lengths to make your applications durable, resilient and highly available.

Whilst AWS can help you with much of this, it is nearly impossible to envision a situation in which you will not need some kind of Disaster Recovery plan.

An organization’s Business Continuity and Disaster Recovery (BCDR) program is a set of approaches and processes used to recover from a disaster and resume regular business operations. A disaster might be a natural calamity, a power outage, an employee mistake, a hardware failure, or a cyberattack.

With the implementation of a BCDR plan, businesses can operate as close to normal as possible after an unexpected interruption, and with the least possible loss of data.

In this blog post, we will explore three notable disaster recovery strategies, each with its own merits, drawbacks, and way of restoring a workload once it has been lost. Before we can appreciate these methods, however, we need to break down some key Disaster Recovery terminology. We will examine all of these strategies through the lens of AWS infrastructure.

What is Disaster Recovery?

The following definition provides an excellent summary of disaster recovery, which is an extremely broad term.

“Disaster recovery involves a set of policies, tools, and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.”

This definition emphasizes the necessity of recovering systems, tools, etc. after a disaster. Disaster Recovery depends on many factors, including:

Financial plan
Competence in technology
Use of tools
The Cloud Provider used

To evaluate how effective a disaster recovery strategy is, it is essential to understand two key terms: RPO and RTO.

How do RPOs and RTOs differ?

RPO (Recovery Point Objective)

The Recovery Point Objective (RPO) is the maximum acceptable amount of data loss after an unplanned data-loss incident, expressed as an amount of time. Because it is a maximum, achieving a low RPO requires a highly available solution that captures data often, for example through frequent backups or continuous replication.

RTO (Recovery Time Objective)

The Recovery Time Objective (RTO) is the maximum tolerable length of time that a computer, system, network or application can be down after a failure or disaster occurs. It is usually measured in minutes or hours, and achieving a low RTO depends on how quickly you can get your application back online.
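To make the two definitions concrete, here is a minimal Python sketch that measures the data-loss window and the downtime of a single incident against RPO and RTO targets. The `Incident` class, the timestamps, and the target values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_recovery_point: datetime  # most recent backup or replicated write that survived
    failure_time: datetime         # when the disaster struck
    restored_time: datetime        # when the service was back online

    @property
    def data_loss_window(self) -> timedelta:
        # Everything written after the last recovery point is lost.
        return self.failure_time - self.last_recovery_point

    @property
    def downtime(self) -> timedelta:
        # Time the service was unavailable.
        return self.restored_time - self.failure_time


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> dict:
    """Compare the actual data loss and downtime with the stated objectives."""
    return {
        "rpo_met": incident.data_loss_window <= rpo,
        "rto_met": incident.downtime <= rto,
    }


# Hypothetical incident: last backup at 02:00, failure at 05:30, recovery at 07:00.
incident = Incident(
    last_recovery_point=datetime(2024, 1, 1, 2, 0),
    failure_time=datetime(2024, 1, 1, 5, 30),
    restored_time=datetime(2024, 1, 1, 7, 0),
)
result = meets_objectives(incident, rpo=timedelta(hours=4), rto=timedelta(hours=1))
print(result)
```

In this made-up case the 3.5 hours of lost data fit within the 4-hour RPO, but the 1.5 hours of downtime exceed the 1-hour RTO.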

Disaster Recovery Methods

Now that we understand these key concepts, we can break down three popular disaster recovery methods, namely Backup and Restore, Pilot Light, and Multi-Site Active/Active.

Backup and Restore

Backup and restore mitigates data loss or corruption. Replicating data to other data centers can also mitigate the effects of a disaster. In addition to restoring the data, you must redeploy the infrastructure, configuration, and application code in the recovery data center.

Backup and restore has a higher recovery time objective (RTO) and recovery point objective (RPO) than the other strategies. The result is longer downtime and greater data loss between the disaster event and recovery. Even so, backup and restore may still be the most cost-effective and easiest strategy for your workload, since not all workloads require an RTO and RPO of minutes or less.

RPO is dependent on how frequently you take snapshots, and RTO is dependent on how long it takes to restore snapshots.
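The relationship above can be sketched in a few lines of Python. This is a rough model under stated assumptions, not a sizing tool: the function names are hypothetical, the recovery steps are assumed to run one after another, and the durations in the example are invented.

```python
from datetime import timedelta


def worst_case_rpo(snapshot_interval: timedelta) -> timedelta:
    # A failure just before the next snapshot loses almost a full
    # interval of data, so the interval bounds the data loss.
    return snapshot_interval


def estimated_rto(detect: timedelta, redeploy: timedelta, restore: timedelta) -> timedelta:
    # Assumption: detection, infrastructure redeployment, and snapshot
    # restore happen sequentially, so their durations add up.
    return detect + redeploy + restore


# Hypothetical workload: snapshots every 6 hours, 15 minutes to detect the
# failure, 45 minutes to redeploy infrastructure, 2 hours to restore data.
rpo = worst_case_rpo(timedelta(hours=6))
rto = estimated_rto(
    detect=timedelta(minutes=15),
    redeploy=timedelta(minutes=45),
    restore=timedelta(hours=2),
)
print(f"Worst-case RPO: {rpo}, estimated RTO: {rto}")
```

Halving the snapshot interval halves the worst-case RPO, while the RTO only improves if detection, redeployment, or restore gets faster.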

Pilot Light

Pilot Light strikes a balance between affordability and reliability. The key difference from Backup and Restore is that Pilot Light always has its core functionality running somewhere, either in another region or in another account and region.

With Backup and Restore, for example, you might have all of your data synced into an S3 bucket so that you can retrieve it in case of a disaster. With Pilot Light, by contrast, the data is synchronized with an always-on, always-available database replica.

Other core services, such as an EC2 instance with all of the necessary software already installed, will also be available and ready to use at the touch of a button. An Auto Scaling policy would be in place for these EC2 instances so that they scale out quickly to meet your production needs. This strategy reduces overall downtime, and it depends on a small part of your architecture running all of the time.
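As a toy illustration of the idea, the sketch below models a recovery region whose database replica is always running while the application tier stays small until a failover scales it out. It is plain Python and makes no AWS calls; the `RecoveryRegion` class, the `fail_over` function, and the instance counts are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecoveryRegion:
    database_replica_running: bool  # core data layer, always on in Pilot Light
    app_instances: int              # application servers currently running


def fail_over(region: RecoveryRegion, production_instances: int) -> RecoveryRegion:
    """Scale the small application tier out to production capacity."""
    if not region.database_replica_running:
        raise RuntimeError("Pilot Light needs the database replica to be running")
    # Never scale in during a failover; only grow to the production size.
    target = max(region.app_instances, production_instances)
    return replace(region, app_instances=target)


# Hypothetical standby: replica running, a single preconfigured app server.
standby = RecoveryRegion(database_replica_running=True, app_instances=1)
active = fail_over(standby, production_instances=6)
print(active.app_instances)
```

In a real deployment the scale-out would be carried out by the Auto Scaling policy rather than by a function like this one.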

Multi-Site Active/Active

Having an exactly mirrored application across multiple AWS regions or data centers is the most resilient cloud disaster recovery strategy.

In the multi-site active/active strategy, you will be able to achieve the lowest RTO (recovery time objective) and RPO (recovery point objective). However, it is important to take into account the potential cost and complexity of operating active stacks in multiple locations.

Each region runs a multi-AZ workload stack to ensure high availability. Data is replicated live between the data stores in each Region, and it is also backed up. Data backups remain crucial to protect against disasters that lose or corrupt data, since replication alone would copy that loss or corruption to every Region.

This DR method has the lowest RTO and RPO of any DR technique, but given its cost and complexity, only the most demanding applications should use it.
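The trade-off between recovery objectives and cost across the three strategies can be sketched as a simple selection rule: pick the cheapest strategy whose typical RPO and RTO satisfy the workload's requirements. The figures in the table below are illustrative placeholders, not measurements or AWS guarantees, and the `Strategy` class and `cheapest_strategy` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_rpo: timedelta
    typical_rto: timedelta
    relative_cost: int  # 1 = cheapest, higher = more expensive


# Placeholder values for illustration only; real numbers depend on the workload.
STRATEGIES = [
    Strategy("Backup and Restore", timedelta(hours=24), timedelta(hours=24), 1),
    Strategy("Pilot Light", timedelta(minutes=10), timedelta(hours=1), 2),
    Strategy("Multi-Site Active/Active", timedelta(seconds=5), timedelta(minutes=1), 3),
]


def cheapest_strategy(required_rpo: timedelta, required_rto: timedelta) -> Optional[Strategy]:
    """Return the lowest-cost strategy that meets both objectives, if any."""
    candidates = [
        s for s in STRATEGIES
        if s.typical_rpo <= required_rpo and s.typical_rto <= required_rto
    ]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


choice = cheapest_strategy(required_rpo=timedelta(minutes=15), required_rto=timedelta(hours=2))
print(choice.name if choice else "No strategy meets these objectives")
```

With these placeholder numbers, a workload that tolerates 15 minutes of data loss and 2 hours of downtime lands on Pilot Light, while tighter requirements push it toward Multi-Site Active/Active.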

Conclusion

It is impossible to build a Disaster Recovery plan that fits all circumstances; no “one size fits all” approach exists. Budget ahead of time, and ensure that you don’t spend more than you can afford. It may seem like a lot of money is being spent on “what ifs”, but if your applications cannot go down, these strategies give you the capability to keep them running.

