The Amazon Simple Storage Service (Amazon S3) is an object-based storage system.
Amazon S3 is built to store and retrieve any amount of data from anywhere on the Internet.
Amazon S3 stores objects and objects are stored in buckets.
It’s a simple storage service that offers an extremely durable, highly available, and infinitely scalable data storage infrastructure at very low costs.
Amazon S3 has a distributed architecture, and objects are redundantly stored on multiple devices across multiple facilities (AZs) in an Amazon S3 region.
Amazon S3 is a simple key-based object store.
Amazon S3 data is made up of:
- Key (name).
- Value (data).
- Version ID.
- Access Control Lists.
Keys can be any string, and they can be constructed to mimic hierarchical attributes.
Alternatively, you can use S3 Object Tagging to organize your data across all of your S3 buckets and/or prefixes.
Event notifications for specific actions can send alerts or trigger actions.
Notifications can be sent to:
- SNS Topics.
- SQS Queue.
- Lambda functions.
You need to configure SNS/SQS/Lambda before S3.
There are no extra charges from S3, but you pay for SNS, SQS and Lambda.
S3 provides strong read-after-write consistency for PUTs of new objects, as well as for overwrite PUTs and DELETEs.
Before December 2020, overwrite PUTs and DELETEs were only eventually consistent (changes took time to propagate).
HTTP 200 code indicates a successful write to S3.
Additional capabilities offered by Amazon S3 include:
Objects are stored in buckets:
- A bucket can be viewed as a container for objects.
- A bucket is a flat container of objects.
- It does not provide a hierarchy of objects.
- You can use an object key name (prefix) to mimic folders.
You can create folders in your buckets (only available through the Console).
You cannot create nested buckets.
Bucket names are part of the URL used to access the bucket.
An S3 bucket is region specific.
S3 is a universal namespace so names must be unique globally.
Can enable logging to a bucket.
Bucket naming rules:
- Bucket names must be at least 3 and no more than 63 characters in length.
- Bucket names must start and end with a lowercase character or a number.
- Bucket names must be a series of one or more labels which are separated by a period.
- Bucket names can contain lowercase letters, numbers and hyphens.
- Bucket names cannot be formatted as an IP address.
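The naming rules above can be checked before attempting to create a bucket. The following is a minimal sketch using only the Python standard library; the function name `is_valid_bucket_name` is a hypothetical helper, and it covers only the rules listed here, not every restriction AWS may enforce.

```python
import ipaddress
import re

# One label: lowercase letters, digits and hyphens, starting and ending
# with a lowercase letter or a digit.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate name against the rules listed above (illustrative only)."""
    # Length must be between 3 and 63 characters.
    if not 3 <= len(name) <= 63:
        return False
    # Names formatted as an IP address (e.g. 192.168.5.4) are rejected.
    try:
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        pass
    # The name is a series of one or more labels separated by periods.
    return all(_LABEL.match(label) is not None for label in name.split("."))


if __name__ == "__main__":
    for candidate in ("my-bucket.logs", "ab", "192.168.5.4", "My-Bucket"):
        print(candidate, is_valid_bucket_name(candidate))
```

Passing this check does not mean the name is available, because bucket names must also be globally unique.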
For better performance, lower latency, and lower cost, create the bucket closer to your clients.
Each object is stored and retrieved by a unique key (ID or name).
An object in S3 is uniquely identified and addressed through:
- Service end-point.
- Bucket name.
- Object key (name).
- Optionally, an object version.
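To illustrate how these parts combine into an address, here is a small sketch that assembles a virtual-hosted-style URL. The helper name `object_url`, the default region and the example bucket and key are assumptions made for illustration; the URL only resolves if the object exists and the caller is permitted to read it.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def object_url(bucket: str, key: str, region: str = "us-east-1",
               version_id: Optional[str] = None) -> str:
    """Build a virtual-hosted-style URL for an object (illustrative helper)."""
    # Bucket name plus the regional service endpoint form the host.
    host = f"{bucket}.s3.{region}.amazonaws.com"
    # Keep "/" unescaped so key prefixes still read like folders in the path.
    url = f"https://{host}/{quote(key, safe='/')}"
    # The optional version ID is passed as a query parameter.
    if version_id is not None:
        url += "?" + urlencode({"versionId": version_id})
    return url


if __name__ == "__main__":
    print(object_url("my-bucket", "photos/2024/cat 1.jpg"))
    print(object_url("my-bucket", "report.pdf", version_id="abc123"))
```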
Objects stored in a bucket will never leave the region in which they are stored unless you move them to another region or enable cross-region replication.
You can define permissions on objects when uploading and at any time afterwards using the AWS Management Console.
Sub-resources are subordinate to buckets and objects; they do not exist independently but are always associated with another entity such as an object or bucket.
Sub-resources (configuration containers) associated with buckets include:
- Lifecycle – define an object’s lifecycle.
- Website – configuration for hosting static websites.
- Versioning – retain multiple versions of objects as they are changed.
- Access Control Lists (ACLs) – control access permissions for the bucket.
- Bucket Policies – control access to the bucket.
- Cross Origin Resource Sharing (CORS).
Sub-resources associated with objects include:
- ACLs – define permissions to access the object.
- Restore – restoring an archive.
CORS is used to allow requests to a different origin when connected to the main origin.
The request will fail unless the origin allows the requests using CORS headers (e.g. Access-Control-Allow-Origin).
Must enable the correct CORS headers.
Specify a CORS configuration on the S3 bucket.
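As a rough illustration, a CORS configuration can be expressed as a list of rules. The sketch below uses the dictionary shape that the AWS SDKs accept for a bucket CORS configuration; the origin `https://www.example.com` is a placeholder, and `allow_origin_header` is a hypothetical helper that only mimics exact-match evaluation (wildcards are deliberately not handled).

```python
import json

# Example rule set; the origin and max-age values are placeholders.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET", "PUT"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def allow_origin_header(config: dict, origin: str, method: str):
    """Return the Access-Control-Allow-Origin value for a request, or None.

    Simplified: only exact origin and method matches are considered.
    """
    for rule in config["CORSRules"]:
        if origin in rule["AllowedOrigins"] and method in rule["AllowedMethods"]:
            return origin
    return None


if __name__ == "__main__":
    print(json.dumps(cors_configuration, indent=2))
    print(allow_origin_header(cors_configuration, "https://www.example.com", "GET"))
```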
There are six S3 storage classes.
- S3 Standard (durable, immediately available, frequently accessed).
- S3 Intelligent-Tiering (automatically moves data to the most cost-effective tier).
- S3 Standard-IA (durable, immediately available, infrequently accessed).
- S3 One Zone-IA (lower cost for infrequently accessed data with less resilience).
- S3 Glacier (archived data, retrieval times in minutes or hours).
- S3 Glacier Deep Archive (lowest cost storage class for long term retention).
The table below provides the details of each Amazon S3 storage class:
Objects stored in the S3 One Zone-IA storage class are stored redundantly within a single Availability Zone in the AWS Region you select.
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and your Amazon S3 bucket.
S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations.
Used to accelerate object uploads to S3 over long distances (latency).
Transfer acceleration is as secure as a direct upload to S3.
You are charged only if there was a benefit in transfer times.
Need to enable transfer acceleration on the S3 bucket.
Cannot be disabled, can only be suspended.
May take up to 30 minutes to implement.
URL is: <bucketname>.s3-accelerate.amazonaws.com.
S3 can be used to host static websites.
Cannot use dynamic content such as PHP, .Net etc.
You can use a custom domain name with S3 using a Route 53 Alias record.
When using a custom domain name the bucket name must be the same as the domain name.
Can enable redirection for the whole domain, pages or specific objects.
URL is: <bucketname>.s3-website-.amazonaws.com.
Pre-signed URLs can be used to provide temporary access to a specific object to those who do not have AWS credentials.
By default all objects are private and can only be accessed by the owner.
To share an object you can either make it public or generate a pre-signed URL.
Expiration date and time must be configured.
These can be generated using SDKs for Java and .Net and AWS explorer for Visual Studio.
Can be used for downloading and uploading S3 objects.
MFA delete forces the user to generate a code on a device before performing operations on S3.
You must enable versioning on the bucket.
MFA delete can be required for the following operations:
- Permanently delete an object version.
- Suspend versioning on the bucket.
Only the bucket owner (root account) can enable / disable MFA-delete.
Versioning stores all versions of an object (including all writes and even if an object is deleted).
Versioning protects against accidental object/data deletion or overwrites.
Enables “roll-back” and “un-delete” capabilities.
Versioning can also be used for data retention and archive.
Old versions count as billable size until they are permanently deleted.
Enabling versioning does not replicate existing objects.
Can be used for backup.
Once enabled, versioning cannot be disabled, only suspended.
Can be integrated with lifecycle rules.
Multi-factor authentication (MFA) delete can be enabled.
MFA delete can also be applied to changing versioning settings.
MFA delete applies to:
- Changing the bucket’s versioning state.
- Permanently deleting an object.
Cross Region Replication requires versioning to be enabled on the source and destination buckets.
Reverting to previous versions isn’t replicated.
By default, an HTTP GET retrieves the most recent version.
Only the S3 bucket owner can permanently delete objects once versioning is enabled.
When you try to delete an object with versioning enabled a DELETE marker is placed on the object.
You can delete the DELETE marker and the object will be available again.
Deletion with versioning replicates the delete marker. But deleting the delete marker is not replicated.
Bucket versioning states:
- Unversioned (the default).
- Versioning-enabled.
- Versioning-suspended.
Objects that existed before enabling versioning will have a version ID of NULL.
If you suspend versioning:
- Existing objects remain as they are; however, new versions will not be created.
- New objects will have a version ID of NULL and uploaded objects of the same name will overwrite the existing object.
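The delete-marker behaviour described in this section can be sketched with a toy in-memory model. This is not the S3 API: `VersionedBucket` and its methods are hypothetical names, and the model only covers a versioning-enabled bucket.

```python
import itertools
from typing import Optional


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data is None for a delete marker.
        self._versions = {}
        self._ids = itertools.count(1)

    def put(self, key: str, data: bytes) -> str:
        # Every write adds a new version instead of overwriting.
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key: str) -> str:
        # A DELETE without a version ID only places a delete marker on top.
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, None))
        return version_id

    def delete_version(self, key: str, version_id: str) -> None:
        # Permanently removes one specific version (or a delete marker).
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]

    def get(self, key: str) -> Optional[bytes]:
        # A plain GET returns the most recent version; a delete marker hides it.
        versions = self._versions.get(key, [])
        return versions[-1][1] if versions else None


if __name__ == "__main__":
    bucket = VersionedBucket()
    bucket.put("notes.txt", b"first")
    bucket.put("notes.txt", b"second")
    marker = bucket.delete("notes.txt")
    print(bucket.get("notes.txt"))          # hidden by the delete marker
    bucket.delete_version("notes.txt", marker)
    print(bucket.get("notes.txt"))          # latest real version is visible again
```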
Object Lifecycle Management
Used to optimize storage costs, adhere to data retention policies and keep S3 buckets well maintained.
A lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions:
- Transition actions – define when objects transition to another storage class. For example, you might choose to transition objects to the STANDARD_IA storage class 30 days after you created them, or archive objects to the GLACIER storage class one year after creating them. There are costs associated with the lifecycle transition requests. For pricing information, see Amazon S3 Pricing.
- Expiration actions – define when objects expire. Amazon S3 deletes expired objects on your behalf.
Lifecycle configuration is an XML file applied at the bucket level as a subresource.
Can be used in conjunction with versioning or independently.
Can be applied to current and previous versions.
Can be applied to specific objects within a bucket: objects with a specific tag or objects with a specific prefix.
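As an illustration of what such a configuration might look like, the sketch below generates a lifecycle XML document with one rule that combines two transition actions and an expiration action for a prefix. The helper name `lifecycle_xml`, the rule ID and the day counts are assumptions for the example, and the XML namespace attribute used by the real API is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def lifecycle_xml(prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> str:
    """Build a one-rule lifecycle configuration as an XML string (illustrative)."""
    config = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(config, "Rule")
    ET.SubElement(rule, "ID").text = "archive-then-expire"  # hypothetical rule name
    # Limit the rule to objects whose keys start with the given prefix.
    ET.SubElement(ET.SubElement(rule, "Filter"), "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    # Transition actions: move objects to cheaper storage classes over time.
    for days, storage_class in ((ia_days, "STANDARD_IA"), (glacier_days, "GLACIER")):
        transition = ET.SubElement(rule, "Transition")
        ET.SubElement(transition, "Days").text = str(days)
        ET.SubElement(transition, "StorageClass").text = storage_class
    # Expiration action: S3 deletes the objects after this many days.
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(expire_days)
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    print(lifecycle_xml("logs/", ia_days=30, glacier_days=365, expire_days=730))
```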
You can securely upload/download your data to Amazon S3 via SSL endpoints using the HTTPS protocol (In Transit – SSL/TLS).
Server side encryption options:
- SSE-S3 – Server Side Encryption with S3 managed keys.
  - Each object is encrypted with a unique key.
  - Encryption key is encrypted with a master key.
  - AWS regularly rotates the master key.
  - Uses AES 256.
- SSE-KMS – Server Side Encryption with AWS KMS keys.
  - KMS uses Customer Master Keys (CMKs) to encrypt.
  - Can use the automatically created CMK.
  - OR you can select your own key (gives you control for management of keys).
  - An envelope key protects your keys.
- SSE-C – Server Side Encryption with client provided keys.
  - Client manages the keys, S3 manages encryption.
  - AWS does not store the encryption keys.
  - If keys are lost data cannot be decrypted.
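On the wire, the three options are selected through request headers on the upload. The following sketch builds those headers for each mode; `sse_headers` is a hypothetical helper, the KMS key alias in the usage example is a placeholder, and a real request would also need to be signed and sent over HTTPS.

```python
import base64
import hashlib
import os
from typing import Optional


def sse_headers(mode: str, kms_key_id: Optional[str] = None,
                customer_key: Optional[bytes] = None) -> dict:
    """Return the upload headers that request a server-side encryption mode."""
    if mode == "SSE-S3":
        # S3 manages the keys and encrypts with AES-256.
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # omit to fall back to the automatically created key
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        # The client supplies the key with every request; AWS does not store it.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_b64 = base64.b64encode(customer_key).decode()
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode()
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(sse_headers("SSE-S3"))
    print(sse_headers("SSE-KMS", kms_key_id="alias/my-key"))
    print(sorted(sse_headers("SSE-C", customer_key=os.urandom(32))))
```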
The following diagram depicts the options for enabling encryption and shows you where the encryption is applied and where the keys are managed:
Amazon S3 event notifications can be sent in response to actions in Amazon S3 like PUTs, POSTs, COPYs, or DELETEs.
Amazon S3 event notifications enable you to run workflows, send alerts, or perform other actions in response to changes in your objects stored in S3.
To enable notifications, you must first add a notification configuration that identifies the events you want Amazon S3 to publish and the destinations where you want Amazon S3 to send the notifications.
You can configure notifications to be filtered by the prefix and suffix of the key name of objects.
Amazon S3 can publish notifications for the following events:
- New object created events.
- Object removal events.
- Restore object events.
- Reduced Redundancy Storage (RRS) object lost events.
- Replication events.
Amazon S3 can send event notification messages to the following destinations:
- Publish event messages to an Amazon Simple Notification Service (Amazon SNS) topic.
- Publish event messages to an Amazon Simple Queue Service (Amazon SQS) queue.
- Publish event messages to AWS Lambda by invoking a Lambda function and providing the event message as an argument.
Need to grant Amazon S3 permissions to post messages to an Amazon SNS topic or an Amazon SQS queue.
You also need to grant Amazon S3 permission to invoke an AWS Lambda function on your behalf.
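A notification configuration with prefix and suffix filtering might look like the sketch below, which uses the dictionary shape accepted by the AWS SDKs. The Lambda function ARN and account ID are placeholders, and `key_matches` is a hypothetical helper that only imitates how the filter rules select object keys.

```python
# Example configuration: invoke a Lambda function for new .jpg objects
# under the images/ prefix. The ARN below is a placeholder.
notification_configuration = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:make-thumbnail",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "images/"},
                        {"Name": "suffix", "Value": ".jpg"},
                    ]
                }
            },
        }
    ]
}


def key_matches(config_entry: dict, key: str) -> bool:
    """Return True if an object key passes the entry's prefix/suffix rules."""
    rules = config_entry.get("Filter", {}).get("Key", {}).get("FilterRules", [])
    for rule in rules:
        if rule["Name"] == "prefix" and not key.startswith(rule["Value"]):
            return False
        if rule["Name"] == "suffix" and not key.endswith(rule["Value"]):
            return False
    return True


if __name__ == "__main__":
    entry = notification_configuration["LambdaFunctionConfigurations"][0]
    for key in ("images/cat.jpg", "images/cat.png", "docs/cat.jpg"):
        print(key, key_matches(entry, key))
```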
S3 object tags are key-value pairs applied to S3 objects which can be created, updated or deleted at any time during the lifetime of the object.
Object tags allow you to create Identity and Access Management (IAM) policies, set up S3 Lifecycle policies, and customize storage metrics.
Up to ten tags can be added to each S3 object and you can use either the AWS Management Console, the REST API, the AWS CLI, or the AWS SDKs to add object tags.
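As a small sketch of working with tags programmatically, the helper below enforces the ten-tag limit and produces two common representations: the `TagSet` structure used by the tagging API and the URL-encoded string used for the `x-amz-tagging` upload header. The function name `tagging_payloads` and the example tags are assumptions for illustration.

```python
from urllib.parse import urlencode

MAX_TAGS_PER_OBJECT = 10  # limit stated above


def tagging_payloads(tags: dict):
    """Return (tag_set, header_value) for a dict of object tags."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(
            f"at most {MAX_TAGS_PER_OBJECT} tags per object, got {len(tags)}"
        )
    # Structure used when setting tags on an existing object.
    tag_set = {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}
    # URL-encoded form, e.g. for the x-amz-tagging header on upload.
    header_value = urlencode(tags)
    return tag_set, header_value


if __name__ == "__main__":
    tag_set, header = tagging_payloads({"project": "blue", "classification": "internal"})
    print(tag_set)
    print(header)
```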
Cross Region Replication
CRR is an Amazon S3 feature that automatically replicates data across AWS Regions.
With CRR, every object uploaded to an S3 bucket is automatically replicated to a destination bucket in a different AWS Region that you choose.
Provides automatic, asynchronous copying of objects between buckets in different regions.
CRR is configured at the S3 bucket level.
You enable a CRR configuration on your source bucket by specifying a destination bucket in a different Region for replication.
You can use either the AWS Management Console, the REST API, the AWS CLI, or the AWS SDKs to enable CRR.
Versioning must be enabled for both the source and destination buckets.
Source and destination buckets must be in different regions.
With CRR you can only replicate between regions, not within a region (see SRR below for single region replication).
Replication is 1:1 (one source bucket, to one destination bucket).
You can configure separate S3 Lifecycle rules on the source and destination buckets.
You can replicate KMS-encrypted objects by providing a destination KMS key in your replication configuration.
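For orientation, a replication configuration could be assembled roughly as follows. This sketch uses the dictionary shape accepted by the AWS SDKs for a bucket replication configuration; `replication_configuration` is a hypothetical helper, and the role ARN, rule ID and bucket names are placeholders. It does not create the IAM role or enable versioning, both of which replication depends on.

```python
import json


def replication_configuration(role_arn: str, destination_bucket: str,
                              prefix: str = "") -> dict:
    """Build a single-rule replication configuration (illustrative only)."""
    return {
        # IAM role that S3 assumes to copy objects on your behalf.
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-to-second-region",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    config = replication_configuration(
        "arn:aws:iam::111122223333:role/s3-replication-role",
        "my-destination-bucket",
    )
    print(json.dumps(config, indent=2))
```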
Triggers for replication are:
- Uploading objects to the source bucket.
- DELETE of objects in the source bucket.
- Changes to the object, its metadata, or ACL.
What is replicated:
- New objects created after enabling replication.
- Changes to objects.
- Objects created using SSE-S3 using the AWS managed key.
- Object ACL updates.
What isn’t replicated:
- Objects that existed before enabling replication (can use the copy API).
- Objects created with SSE-C, and objects created with SSE-KMS unless you explicitly enable their replication.
- Objects to which the bucket owner does not have permissions.
- Updates to bucket-level subresources.
- Actions from lifecycle rules are not replicated.
- Objects in the source bucket that are replicated from another region are not replicated.
Delete behavior:
- If a DELETE request is made without specifying an object version ID, a delete marker is added and replicated.
- If a DELETE request is made specifying an object version ID, that version is deleted in the source bucket but the deletion is not replicated.
Same Region replication (SRR)
As the name implies, you can use SRR to replicate objects to a destination bucket within the same region as the source bucket.
This feature was released in September 2019.
Replication is automatic and asynchronous.
New objects uploaded to an Amazon S3 bucket are configured for replication at the bucket, prefix, or object tag levels.
Replicated objects can be owned by the same AWS account as the original copy or by different accounts, to protect from accidental deletion.
Replication can be to any Amazon S3 storage class, including S3 Glacier and S3 Glacier Deep Archive to create backups and long-term archives.
When an S3 object is replicated using SRR, the metadata, Access Control Lists (ACL), and object tags associated with the object are also part of the replication.
Once SRR is configured on a source bucket, any changes to the object, metadata, ACLs, or object tags trigger a new replication to the destination bucket.
Can run analytics on data stored on Amazon S3.
This includes data lakes, IoT streaming data, machine learning, and artificial intelligence.
The following strategies can be used:
You can use S3 Inventory to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs.
Amazon S3 inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC) or Apache Parquet (Parquet) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
Monitoring and Reporting
Amazon CloudWatch metrics for Amazon S3 can help you understand and improve the performance of applications that use Amazon S3. There are several ways that you can use CloudWatch with Amazon S3.
- Daily storage metrics for buckets – Monitor bucket storage using CloudWatch, which collects and processes storage data from Amazon S3 into readable, daily metrics. These storage metrics for Amazon S3 are reported once per day and are provided to all customers at no additional cost.
- Request metrics – Monitor Amazon S3 requests to quickly identify and act on operational issues. The metrics are available at 1-minute intervals after some latency to process. These CloudWatch metrics are billed at the same rate as the Amazon CloudWatch custom metrics.
- Replication metrics – Monitor the total number of S3 API operations that are pending replication, the total size of objects pending replication, and the maximum replication time to the destination Region. Only replication rules that have S3 Replication Time Control (S3 RTC) enabled will publish replication metrics.
Logging and Auditing
You can record the actions that are taken by users, roles, or AWS services on Amazon S3 resources and maintain log records for auditing and compliance purposes.
AWS recommends that you use AWS CloudTrail for logging bucket and object-level actions for your Amazon S3 resources.
Server access logging provides detailed records for the requests that are made to a bucket. This information can be used for auditing. You must not set the bucket being logged to be the destination for the logs as this creates a logging loop and the bucket will grow in size exponentially.
Authorization and Access Control
By default, only the resource owner can access buckets and objects. The resource owner refers to the AWS account that creates the resource.
Access policy describes who has access to what. You can associate an access policy with a resource (bucket and object) or a user.
You can categorize the available Amazon S3 access policies as follows:
- Resource-based policies – Bucket policies and access control lists (ACLs) are resource-based because you attach them to your Amazon S3 resources.
- User policies – You can use IAM to manage access to your Amazon S3 resources. You can create IAM users, groups, and roles in your account and attach access policies to them granting them access to AWS resources, including Amazon S3.
IAM policies (user policies) can be used to apply permissions (allow/deny) for specific API actions to users and groups. Resources to apply the policy to can be buckets and objects.
Bucket policies can be used to grant public access to the bucket, force objects to be encrypted at upload, and to grant access to another AWS account (cross-account access).
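As one example of the encryption use case, a bucket policy can deny uploads that do not request server-side encryption. The sketch below generates such a policy document as JSON; `require_encryption_policy` is a hypothetical helper, the bucket name is a placeholder, and the policy should be reviewed against your own requirements before being attached to a bucket.

```python
import json


def require_encryption_policy(bucket: str) -> str:
    """Return a bucket policy that rejects unencrypted uploads (illustrative)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                # Applies to every object key in the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Matches requests where the encryption header is absent.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(require_encryption_policy("my-bucket"))
```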
ACLs can be applied at the object-level and bucket-level:
- Object ACL – apply permissions at the individual object level (finer level of granularity).
- Bucket ACL – apply permissions at the bucket level.
Public access to Amazon S3 buckets is blocked by default (block public access feature).
An IAM principal can access an S3 object only if the user permissions allow it or the resource policy allows it (and there’s no explicit deny).
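The rule above can be written as a tiny decision function. This is a simplified sketch for a principal and bucket in the same account; `is_allowed` is a hypothetical helper, and it ignores other factors that real evaluation takes into account, such as permissions boundaries, organization-level policies and cross-account requirements.

```python
from typing import Iterable


def is_allowed(identity_effects: Iterable[str], resource_effects: Iterable[str]) -> bool:
    """Decide access from the effects of the matching policy statements.

    Each argument lists "Allow" or "Deny" for every statement that matches
    the request, from the user (IAM) policies and the resource policy.
    """
    effects = list(identity_effects) + list(resource_effects)
    # An explicit deny anywhere always wins.
    if "Deny" in effects:
        return False
    # Otherwise a single allow from either side is enough.
    return "Allow" in effects


if __name__ == "__main__":
    print(is_allowed(["Allow"], []))        # allowed by the user policy
    print(is_allowed([], ["Allow"]))        # allowed by the bucket policy
    print(is_allowed(["Allow"], ["Deny"]))  # explicit deny overrides
    print(is_allowed([], []))               # nothing allows it
```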
You can use Access Analyzer for S3 to review all buckets that have bucket access control lists (ACLs), bucket policies, or access point policies that grant public or shared access.
Access Analyzer for S3 alerts you to buckets that are configured to allow access to anyone on the internet or other AWS accounts, including AWS accounts outside of your organization.