Amazon EC2 Auto Scaling is an AWS service that automatically launches and terminates Amazon EC2 instances.
EC2 Auto Scaling is used for automatic, horizontal scaling of Amazon EC2 instances.
EC2 Auto Scaling uses scaling policies and metrics to determine when and how to scale out (launching instances) or in (terminating instances).
The Amazon EC2 Auto Scaling service performs the following three main functions:
- Monitors the health of running Amazon EC2 instances – uses health checks to identify whether instances are healthy and able to receive traffic.
- Replaces impaired instances automatically – EC2 Auto Scaling automatically replaces instances that fail health checks.
- Balances capacity (number of instances) across Availability Zones (AZs) – when launching instances EC2 Auto Scaling tries to balance the instances across AZs.
Scaling based on a schedule allows you to scale your application ahead of known load changes.
For example, every week the traffic to your web application starts to increase on Wednesday, remains high on Thursday, and starts to decrease on Friday.
You can plan your scaling activities based on the known traffic patterns of your web application.
Amazon EC2 Auto Scaling enables you to follow the demand curve for your applications closely, reducing the need to manually provision Amazon EC2 capacity in advance.
For example, you can use target tracking scaling policies to select a load metric for your application, such as CPU utilization. Or, you could set a target value using the “Request Count Per Target” metric from Application Load Balancer, a load balancing option for the Elastic Load Balancing service.
Amazon EC2 Auto Scaling will then automatically adjust the number of EC2 instances as needed to maintain your target.
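As a sketch, a target tracking policy of this kind can be expressed as the request payload for the `PutScalingPolicy` API. The group name `web-asg` and the 50% CPU target below are illustrative assumptions, not values from the source:

```python
# Hypothetical target tracking scaling policy payload; group name and
# target value are illustrative assumptions.
def target_tracking_policy(group_name, target_cpu_percent):
    """Build a PutScalingPolicy request body that keeps average CPU
    utilization of the group near the given target value."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Predefined metric: average CPU across the group's instances
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu_percent,
        },
    }

policy = target_tracking_policy("web-asg", 50.0)
```

With this policy attached, the service adds or removes instances to keep the metric near the target; no step thresholds need to be defined.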
Predictive Scaling, a feature of AWS Auto Scaling, uses machine learning to schedule the right number of EC2 instances in anticipation of approaching traffic changes.
Predictive Scaling predicts future traffic, including regularly-occurring spikes, and provisions the right number of EC2 instances in advance.
Predictive Scaling’s machine learning algorithms detect changes in daily and weekly patterns, automatically adjusting their forecasts.
This removes the need for manual adjustment of Auto Scaling parameters as cyclicality changes over time, making Auto Scaling simpler to configure.
Auto Scaling enhanced with Predictive Scaling delivers faster, simpler, and more accurate capacity provisioning, resulting in lower costs and more responsive applications.
Deployment and Provisioning
EC2 Auto Scaling includes the following components:
EC2 instances are organized into groups so that they can be treated as a logical unit for the purposes of scaling and management. When you create a group, you can specify its minimum, maximum, and desired number of EC2 instances.
Groups use a launch template or a launch configuration as a configuration template for their EC2 instances. You can specify information such as the AMI ID, instance type, key pair, security groups, and block device mapping for your instances.
Amazon EC2 Auto Scaling provides several ways for you to scale your Auto Scaling groups. For example, you can configure a group to scale based on the occurrence of specified conditions (dynamic scaling) or on a schedule. For more information, see “Scaling Options and Scaling Policies” below.
Accessing Amazon EC2 Auto Scaling
You can use the Amazon EC2 management console to manage EC2 Auto Scaling.
You can also access Amazon EC2 Auto Scaling using the Amazon EC2 Auto Scaling API.
Amazon EC2 Auto Scaling provides a Query API. These requests are HTTP or HTTPS requests that use the HTTP verbs GET or POST and a Query parameter named Action.
You can also use the AWS Command Line Interface (AWS CLI) or the AWS Tools for Windows PowerShell.
Scaling Options and Scaling Policies
The scaling options define the triggers that determine when instances should be provisioned or de-provisioned.
There are four scaling options:
- Maintain – keep a specific or minimum number of instances running.
- Manual – manually change the maximum, minimum, or desired number of instances.
- Scheduled – increase or decrease the number of instances based on a schedule.
- Dynamic – scale based on real-time system metrics (e.g. CloudWatch metrics).
The following table describes the scaling options that are available and when you should use them:
The scaling options are configured through scaling policies, which determine when, if, and how the ASG scales out and in.
The following table describes the scaling policy types available for dynamic scaling policies and when to use them:
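As an illustration of the scheduled option, a `PutScheduledUpdateGroupAction` request can ramp capacity up ahead of a known weekly peak. The group name, action name, and cron expression below are illustrative assumptions:

```python
# Hypothetical scheduled scaling action; names and schedule are
# illustrative assumptions.
def scheduled_action(group_name, action_name, recurrence, desired):
    """Build a PutScheduledUpdateGroupAction request that changes the
    group's desired capacity on a cron-style schedule (UTC)."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": action_name,
        "Recurrence": recurrence,  # cron format; 3 = Wednesday
        "DesiredCapacity": desired,
    }

# Scale out every Wednesday at 08:00 UTC ahead of the known midweek peak
ramp_up = scheduled_action("web-asg", "midweek-ramp-up", "0 8 * * 3", 8)
```

A matching action with a lower desired capacity would scale back in on Friday as traffic subsides.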
Launch Templates vs Launch Configurations
A launch template is similar to a launch configuration, in that it specifies instance configuration information. Included are the ID of the Amazon Machine Image (AMI), the instance type, a key pair, security groups, and the other parameters that you use to launch EC2 instances.
However, defining a launch template instead of a launch configuration allows you to have multiple versions of a template. With versioning, you can create a subset of the full set of parameters and then reuse it to create other templates or template versions. For example, you can create a default template that defines common configuration parameters and allow the other parameters to be specified as part of another version of the same template.
If you plan to continue to use launch configurations with Amazon EC2 Auto Scaling, be aware that not all Auto Scaling group features are available. For example, you cannot create an Auto Scaling group that launches both Spot and On-Demand Instances or that specifies multiple instance types.
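The Spot plus On-Demand capability mentioned above is configured through a mixed instances policy on the group. A sketch of that section of a `CreateAutoScalingGroup` request, with illustrative template ID, instance types, and percentages:

```python
# Hypothetical MixedInstancesPolicy section; template ID, instance
# types, and distribution values are illustrative assumptions.
def mixed_instances_policy(launch_template_id, instance_types,
                           on_demand_base, spot_percentage):
    """Build a MixedInstancesPolicy combining Spot and On-Demand
    purchase options across several instance types -- something
    launch configurations cannot do."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            # Overrides let one group launch multiple instance types
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Always-On-Demand floor for baseline capacity
            "OnDemandBaseCapacity": on_demand_base,
            # Share of capacity above the base that is On-Demand;
            # the remainder is fulfilled with Spot Instances
            "OnDemandPercentageAboveBaseCapacity": 100 - spot_percentage,
        },
    }

mixed = mixed_instances_policy(
    "lt-0abc123", ["m5.large", "m5a.large", "m4.large"],
    on_demand_base=2, spot_percentage=70)
```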
EC2 Auto Scaling Lifecycle Hooks
Lifecycle hooks enable you to perform custom actions by pausing instances as an Auto Scaling group launches or terminates them.
When an instance is paused, it remains in a wait state either until you complete the lifecycle action using the complete-lifecycle-action command or the CompleteLifecycleAction operation, or until the timeout period ends (one hour, or 3,600 seconds, by default).
Adding lifecycle hooks to your Auto Scaling group gives you greater control over how instances launch and terminate.
You can configure notifications for when an instance enters a wait state. You can use Amazon EventBridge, Amazon SNS, or Amazon SQS to receive the notifications.
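A launch lifecycle hook of this kind can be sketched as a `PutLifecycleHook` request. The group and hook names are illustrative assumptions:

```python
# Hypothetical PutLifecycleHook request; names are illustrative
# assumptions.
def launch_lifecycle_hook(group_name, hook_name, timeout_seconds=3600):
    """Build a PutLifecycleHook request that pauses newly launched
    instances until CompleteLifecycleAction is called or the
    heartbeat timeout (one hour by default) expires."""
    return {
        "AutoScalingGroupName": group_name,
        "LifecycleHookName": hook_name,
        # Pause on launch; use EC2_INSTANCE_TERMINATING for terminate hooks
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
        "HeartbeatTimeout": timeout_seconds,
        # ABANDON terminates the instance if the hook times out;
        # CONTINUE would put it into service anyway
        "DefaultResult": "ABANDON",
    }

hook = launch_lifecycle_hook("web-asg", "install-agents")
```

A bootstrap script on the instance would call CompleteLifecycleAction once its custom setup finishes, releasing the instance into service.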
High Availability
EC2 Auto Scaling can be used to implement high availability when you launch instances into at least two Availability Zones.
Use an Elastic Load Balancing load balancer or Amazon Route 53 to direct incoming connections to your EC2 instances.
EC2 Auto Scaling is a regional service, so it cannot provide high availability (HA) across multiple AWS Regions.
Monitoring and Reporting
When you enable Auto Scaling group metrics, your Auto Scaling group sends sampled data to CloudWatch every minute. There is no charge for enabling these metrics.
You can enable and disable Auto Scaling group metrics using the AWS Management Console, AWS CLI, or AWS SDKs.
The AWS/AutoScaling namespace includes the following metrics, which are sent to CloudWatch every 1 minute:
Metrics are also sent from the Amazon EC2 instances to Amazon CloudWatch:
- Basic monitoring sends EC2 metrics to CloudWatch about ASG instances every 5 minutes.
- Detailed monitoring can be enabled and sends metrics every 1 minute (chargeable).
- When the launch configuration is created from the console, basic monitoring of EC2 instances is enabled by default.
- When the launch configuration is created from the CLI, detailed monitoring of EC2 instances is enabled by default.
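Turning on the group-level metrics described above maps to an `EnableMetricsCollection` request. The group name and metric subset below are illustrative assumptions:

```python
# Hypothetical EnableMetricsCollection request; group name and metric
# list are illustrative assumptions.
def enable_group_metrics(group_name, metrics=None):
    """Build an EnableMetricsCollection request; group metrics are
    sampled and sent to CloudWatch every minute at no charge."""
    body = {
        "AutoScalingGroupName": group_name,
        "Granularity": "1Minute",  # the only supported granularity
    }
    if metrics:
        body["Metrics"] = metrics  # omit to enable all group metrics
    return body

req = enable_group_metrics(
    "web-asg", ["GroupDesiredCapacity", "GroupInServiceInstances"])
```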
EC2 Auto Scaling uses health checks to ensure instances are healthy and available.
- By default, Auto Scaling uses EC2 status checks.
- You can use ELB health checks and custom health checks in addition to the EC2 status checks.
- If any health check returns an unhealthy status, the instance will be terminated.
- With ELB an instance is marked as unhealthy if ELB reports it as OutOfService.
- A healthy instance enters the InService state.
- If an instance is marked as unhealthy it will be scheduled for replacement.
- If connection draining is enabled, Auto Scaling waits for in-flight requests to complete or timeout before terminating instances.
- The health check grace period allows a period of time for a new instance to warm up before performing a health check (300 seconds by default).
Note: If using an ELB, it is a best practice to enable ELB health checks, as otherwise EC2 status checks may show an instance as healthy that the ELB has determined is unhealthy. In this case, the instance will be removed from service by the ELB but will not be terminated by Auto Scaling.
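The health-check settings above correspond to two fields on the group, sketched here as the relevant portion of an `UpdateAutoScalingGroup` request (the parameter choices are illustrative):

```python
# Hypothetical health-check portion of an UpdateAutoScalingGroup
# request; parameter choices are illustrative assumptions.
def health_check_settings(use_elb_checks, grace_period_seconds=300):
    """Build the health-check fields of an UpdateAutoScalingGroup
    request. Choosing "ELB" makes the group replace instances the
    load balancer reports as unhealthy, not just ones failing EC2
    status checks."""
    return {
        "HealthCheckType": "ELB" if use_elb_checks else "EC2",
        # Warm-up window before health checks count against a new instance
        "HealthCheckGracePeriod": grace_period_seconds,
    }

settings = health_check_settings(use_elb_checks=True)
```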
Logging and Auditing
Amazon CloudTrail captures all API calls for AWS Auto Scaling as events.
The calls captured include calls from the AWS Auto Scaling console and code calls to the AWS Auto Scaling API.
If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket, including events for AWS Auto Scaling.
If you don’t configure a trail, you can still view the most recent events in the CloudTrail console in Event history.
You can determine the requests that were made to AWS Auto Scaling, the IP address from which the requests were made, who made the requests, when they were made, and additional details.
Authorization and Access Control
EC2 Auto Scaling supports identity-based IAM policies.
Amazon EC2 Auto Scaling does not support resource-based policies.
Amazon EC2 Auto Scaling uses service-linked roles for the permissions that it requires to call other AWS services on your behalf. A service-linked role is a unique type of IAM role that is linked directly to an AWS service.
There is a default service-linked role for your account, named AWSServiceRoleForAutoScaling. This role is automatically assigned to your Auto Scaling groups unless you specify a different service-linked role.
Amazon EC2 Auto Scaling also does not support Access Control Lists (ACLs).
You can apply tag-based, resource-level permissions in the identity-based policies that you create for Amazon EC2 Auto Scaling. This gives you better control over which resources a user can create, modify, use, or delete.
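A tag-based, resource-level permission of this kind might look like the following identity-based policy document. The tag key `environment` and value `test` are illustrative assumptions:

```python
import json

# Hypothetical identity-based policy that only allows updating or
# deleting Auto Scaling groups tagged environment=test; the tag key
# and value are illustrative assumptions.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "autoscaling:UpdateAutoScalingGroup",
            "autoscaling:DeleteAutoScalingGroup",
        ],
        "Resource": "*",
        # Resource-level condition matched against the group's tags
        "Condition": {
            "StringEquals": {"autoscaling:ResourceTag/environment": "test"}
        },
    }],
}
document = json.dumps(policy, indent=2)
```

Attached to a user or role, this statement denies the two actions on any group that is not tagged `environment=test`.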