AWS Solutions Architect Study Guide
Table of Contents
- AWS Solutions Architect Study Guide
- Table of Contents
- Database Services
- Amazon RDS (Relational Database Service)
- Amazon Aurora
- Amazon Aurora Serverless
- Amazon Aurora Serverless v2
- Amazon DynamoDB
- DynamoDB Integration Patterns
- Amazon DynamoDB Capacity Management
- Amazon Neptune
- Amazon Pinpoint
- Database Migration
- Amazon DocumentDB
- Caching Services
- Amazon ElastiCache
- Messaging and Queuing
- Amazon MQ
- Amazon SQS (Simple Queue Service)
- Amazon SES (Simple Email Service)
- SNS and SQS Integration
- Communication Patterns for Microservices
- Compute Services
- AWS Elastic Beanstalk
- AWS Fargate
- AWS Lambda
- AWS Step Functions
- EC2 Instance Management
- EC2 Auto Scaling
- EC2 Placement Groups
- EC2 Instance Types
- EC2 Instance Purchase Options
- Amazon EC2
- Storage Services
- Amazon S3 (Simple Storage Service)
- Amazon FSx
- Amazon Elastic File System (EFS)
- Amazon Elastic Block Store (EBS)
- AWS Transfer Family
- AWS Storage Gateway
- AWS DataSync
- AWS Backup
- S3 Lifecycle Management
- Container and Kubernetes Services
- Amazon ECS (Elastic Container Service)
- Amazon EKS (Elastic Kubernetes Service)
- Networking and Content Delivery
- VPC CIDR Selection
- Network Load Balancer for UDP Applications
- Amazon API Gateway
- AWS PrivateLink
- AWS Transit Gateway
- AWS Direct Connect
- AWS Global Accelerator
- Amazon Route 53
- Amazon CloudFront
- Elastic Load Balancing
- Amazon VPC
- VPC Security Best Practices
- Security Best Practices
- Security Group Configurations
- VPC Peering
- Network Load Balancer
- Route 53 Geolocation Routing
- Network Redundancy
- Security and Identity Services
- AWS KMS (Key Management Service)
- AWS Secrets Manager
- AWS Systems Manager
- AWS Firewall Manager
- AWS Network Firewall
- AWS WAF (Web Application Firewall)
- AWS Shield Advanced
- AWS Regional WAF Web ACL
- AWS IAM Identity Center (AWS Single Sign-On)
- AWS Certificate Manager (ACM)
- AWS Organizations
- AWS CloudFormation
- AWS Control Tower
- Identity Federation
- AWS Identity and Access Management (IAM)
- AWS CloudTrail
- Amazon Macie
- Data Analytics and Visualization
- Amazon Kinesis
- Amazon Timestream for LiveAnalytics
- Amazon OpenSearch Service
- Amazon QuickSight
- Amazon Security Lake
- Amazon Athena
- AWS Glue
- AWS Lake Formation
- Data Analytics Solutions
- Analysis and ETL Solutions
- Big Data Processing
- Amazon EMR (Elastic MapReduce)
- Amazon Transcribe
- Machine Learning Services
- Machine Learning Solutions
- Amazon Rekognition
- Encryption and Data Protection
- EBS Encryption by Default
- S3 Bucket Encryption
- Cloud Financial Management
- AWS Cost Anomaly Detection
- AWS Cost Explorer
- Resource Tagging and Cost Allocation
- Amazon DynamoDB Capacity Management
- Configuration and Infrastructure Management
- AWS Config
- AWS Resource Access Manager (RAM)
- Auto Scaling and Capacity Management
- Amazon EC2 Auto Scaling
- EventBridge for Resource Scheduling
- AWS EC2 Capacity Management
- Migration and Data Transfer
- AWS Snowball with Tape Gateway
- AWS Snowball Edge for Data Transfer
- AWS EC2 Capacity Management
- Encrypted AMI Sharing
- Monitoring and Logging
- VPC Flow Logs and Monitoring
- High Availability and Disaster Recovery
- High Availability Architectures
- Backup Strategy for Stateless Applications
- AWS Backup
- Aurora Database Scaling
- Serverless Architectures
- Best Practices and Key Concepts
- AWS PrivateLink
- AWS Resource Access Manager (RAM)
- Machine Learning Solutions for Healthcare
- Amazon Kinesis
- AWS Solution Architect Associate (SAA-C03) Exam Keyword Matrix
- Domain 1: Design Secure Architectures (30%)
- Key Phrase: "Secure access to AWS resources"
- Key Phrase: "Secure workloads and applications"
- Key Phrase: "Data security controls"
- Domain 2: Design Resilient Architectures (26%)
- Key Phrase: "Scalable and loosely coupled"
- Key Phrase: "Highly available" or "Fault-tolerant"
- Domain 3: Design High-Performing Architectures (24%)
- Key Phrase: "High-performing storage"
- Key Phrase: "High-performing compute"
- Key Phrase: "High-performing database"
- Key Phrase: "High-performing network"
- Key Phrase: "Data ingestion and transformation"
- Domain 4: Design Cost-Optimized Architectures (20%)
- Key Phrase: "Cost-optimized storage"
- Key Phrase: "Cost-optimized compute"
- Key Phrase: "Cost-optimized database"
- Key Phrase: "Cost-optimized network"
- Question Type Indicators
- Service Selection Strategy
- Content Weighting Strategy
Database Services
Amazon RDS (Relational Database Service)
- Performance Improvement Options:
- Read replicas in the same region are cost-effective for serving reports
- Cross-region replicas incur higher costs
- Verified that read replicas help offload read traffic effectively
- For significant improvement in insert operations performance under heavy write loads, changing to Provisioned IOPS SSD (io1) storage is an effective solution
- Particularly effective when storage performance is identified as the bottleneck for databases with millions of daily updates
- More targeted approach for addressing I/O bottleneck than simply increasing instance memory or using burstable performance instances
- Multi-AZ Deployments:
- Primarily for high availability and disaster recovery
- Not intended for scaling workloads
- The standby instance cannot serve read or write traffic (it is not a read replica)
- Simplest and most effective way to reduce single points of failure
- Provides automatic failover to a standby instance in a different AZ
- Minimal implementation effort when modifying an existing instance
- Confirmed that Multi-AZ is the simplest method to reduce single points of failure
- RDS Proxy:
- Use to reduce failover times
- Helps prevent connection overload
- Creating read replicas alone doesn't reduce failover time
- Manages database connections, allowing applications to scale more effectively by pooling and sharing connections
- Makes applications more resilient to database failures by automatically connecting to the standby DB instance
- Can enforce IAM authentication for databases and securely store credentials in AWS Secrets Manager
- Handles unpredictable surges in database traffic and prevents oversubscription
- Queues or throttles application connections that can't be served immediately, helping applications scale without overwhelming databases
- Confirmed the importance of RDS Proxy in reducing failover times and managing connection pools
- Particularly effective for serverless applications using Lambda that may create many database connections
- Acts as an intermediary layer that sits between applications and databases to manage connection pools
- Significantly reduces timeouts during sudden traffic surges by efficiently managing database connections
- Addresses issues of high CPU and memory utilization caused by large numbers of open connections
- Resilience During Failovers:
- Minimizes application timeouts during database failovers
- Maintains connection pooling during failover events by automatically rerouting connections
- Reduces failover time for RDS instances configured with Multi-AZ deployments
- Provides significant improvement to application availability for database maintenance windows and failover scenarios
- Handles connection management automatically without application changes
- More effective for reducing failover impact than read replicas or Performance Insights
- Connection Management for Serverless Applications:
- Use RDS Proxy when applications experience database connection rejection errors during high demand
- Particularly valuable for serverless applications with highly variable connection rates
- Pools and shares database connections efficiently between application instances
- Reduces connection overhead and minimizes connection timeouts during traffic spikes
- Requires minimal configuration - create a proxy and update application connection strings
- More effective solution than increasing database instance size or implementing Multi-AZ for handling connection issues
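A minimal sketch of the Lambda-to-RDS Proxy pattern described above, assuming a MySQL-compatible database behind a proxy. The proxy endpoint, credentials, and the `orders` table are hypothetical and supplied via environment variables; in practice the credentials would come from AWS Secrets Manager or IAM authentication on the proxy.

```python
import os
import pymysql  # packaged with the Lambda deployment; not in the base runtime

# Hypothetical proxy endpoint and credentials supplied via environment variables.
PROXY_ENDPOINT = os.environ["DB_PROXY_ENDPOINT"]
DB_USER = os.environ["DB_USER"]
DB_PASSWORD = os.environ["DB_PASSWORD"]

def handler(event, context):
    # Each invocation opens a lightweight connection to the proxy; the proxy
    # multiplexes these onto a small pool of connections to the RDS instance.
    conn = pymysql.connect(host=PROXY_ENDPOINT, user=DB_USER,
                           password=DB_PASSWORD, database="orders",
                           connect_timeout=5)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM orders WHERE status = %s", ("PENDING",))
            (pending,) = cur.fetchone()
        return {"pending_orders": pending}
    finally:
        conn.close()
```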
- Read Replicas:
- Read-only copy of a DB instance
- Reduces load on primary DB instance by routing queries to the read replica
- Enables elastically scaling beyond capacity constraints of a single DB instance for read-heavy workloads
- Emphasized that read replicas are cost-effective for scaling read workloads
- Additional Note on Reducing Operational Overhead: Creating a read replica and directing heavy read queries or development/reporting workloads to it can significantly improve performance with minimal changes, reducing load on the primary
- Additional Update for Serverless:
- In architectures using AWS Lambda, using RDS Proxy is essential to reduce connection overhead
- Serverless use of RDS Proxy has been validated to improve performance
- Snapshot and Restore:
- Creating a snapshot of an RDS instance after testing and terminating it is a cost-effective approach for infrequently used databases
- Restore from snapshot when needed for the next test cycle
- Significantly reduces costs for databases that are only needed periodically (e.g., monthly testing)
- More cost-efficient than keeping the instance running continuously or modifying instance types repeatedly
- Snapshot and Restore for Disaster Recovery:
- For disaster recovery with RPO and RTO of 24 hours, copying automatic snapshots to another region every 24 hours is the most cost-effective solution
- This approach is more cost-effective than cross-region read replicas which involve continuous replication
- More straightforward than using AWS DMS to create cross-region replication
- Simpler than manually setting up cross-region replication of native backups to an S3 bucket
- Provides a managed solution that aligns perfectly with 24-hour RPO/RTO requirements without paying for continuous replication
- For SQL Server Enterprise Edition databases, this is particularly cost-effective compared to maintaining continuous read replicas
- Amazon RDS Multi-AZ DB Cluster Deployment:
- Deploys a writer DB instance with two readable standby DB instances in separate Availability Zones
- Provides both high availability and increased capacity for read workloads in a single solution
- Offers more operational efficiency than standard Multi-AZ deployments with separate read replicas
- Automatically fails over to a standby instance during planned maintenance or unexpected outages
- Particularly effective for PostgreSQL workloads needing both resilience and read scaling
- Provides better operational efficiency than cross-region read replicas for regional high availability
- Reader Endpoints:
- For offloading read traffic while maintaining high availability:
- Use RDS Multi-AZ DB cluster deployment (not just a Multi-AZ DB instance)
- Point read workloads to the reader endpoint
- This configuration provides automatic failover support in under 40 seconds
- Allows read traffic to be offloaded from the primary instance
- More cost-effective than creating and managing separate read replicas
- Simplifies the architecture compared to creating multiple standalone read replicas
- Point-in-Time Recovery:
- Automated backups enable point-in-time recovery within the retention period (up to 35 days)
- Allows restoration to any moment within the retention window (e.g., 5 minutes before an accidental change)
- More precise than manual snapshots for recovering from recent data corruption or accidental changes
- Essential for maintaining business continuity after administrative errors
- Provides a continuous backup capability with transaction logs for fine-grained recovery points
- Snapshot and Restore for Cross-Account Migration:
- To move databases between AWS accounts:
- Create a snapshot of the source database
- Share the snapshot with the target account
- In the target account, restore a new database instance from the shared snapshot
- Provides a simple and secure method to transfer databases across accounts
- More efficient than exporting/importing database dumps for large databases
- Retains all database configuration settings and data in a consistent state
- Can be automated with AWS Backup for regular cross-account replication
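A hedged boto3 sketch of the cross-account snapshot flow above. All identifiers and account IDs are placeholders; an encrypted snapshot would additionally require sharing the KMS key with the target account.

```python
import boto3

rds = boto3.client("rds")

# In the source account: create a manual snapshot and share it with the target account.
rds.create_db_snapshot(
    DBInstanceIdentifier="prod-db",             # hypothetical identifiers
    DBSnapshotIdentifier="prod-db-migration",
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier="prod-db-migration")
rds.modify_db_snapshot_attribute(
    DBSnapshotIdentifier="prod-db-migration",
    AttributeName="restore",
    ValuesToAdd=["444455556666"],               # target AWS account ID
)

# In the target account (using that account's credentials): restore from the shared snapshot.
target_rds = boto3.client("rds")
target_rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="prod-db-copy",
    DBSnapshotIdentifier="arn:aws:rds:us-east-1:111122223333:snapshot:prod-db-migration",
    DBInstanceClass="db.m6g.large",
)
```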
- Encryption at Rest:
- To encrypt data at rest in RDS DB instances:
- Create a key in AWS Key Management Service (AWS KMS)
- Enable encryption for the DB instances during creation
- Provides secure, managed service for creating and controlling encryption keys
- More appropriate than using Secrets Manager which is designed for storing credentials, not encrypting databases
- More suitable than using AWS Certificate Manager which secures data in transit, not at rest
- Ensures compliance with security requirements for data protection with minimal configuration
- Query Performance Optimization:
- For database experiencing timeouts due to long-running reporting queries:
- Create a read replica and direct all reporting queries to the replica
- Keep order processing workloads on the primary database instance
- Effectively separates read-heavy reporting from critical transaction processing
- Eliminates timeout issues without restricting employee query capabilities
- More appropriate than migrating to DynamoDB which would require application changes
- Better than scheduling queries for non-peak hours which limits business flexibility
- Provides immediate benefit with minimal architectural change
- Encryption for Existing Databases:
- To encrypt an unencrypted RDS database in a Multi-AZ deployment:
- Encrypt a copy of the latest DB snapshot
- Replace existing DB instance by restoring from the encrypted snapshot
- This is the only way to transition an existing unencrypted RDS instance to an encrypted one
- Ensures both the database and all subsequent snapshots are encrypted moving forward
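A boto3 sketch of the encrypt-by-snapshot-copy approach above; snapshot names and the KMS key alias are hypothetical.

```python
import boto3

rds = boto3.client("rds")

# Copy the latest snapshot, encrypting the copy with a KMS key.
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="mydb-latest-snapshot",
    TargetDBSnapshotIdentifier="mydb-latest-snapshot-encrypted",
    KmsKeyId="alias/rds-at-rest",
)
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="mydb-latest-snapshot-encrypted"
)

# Restore a new, encrypted instance from the encrypted copy; applications are
# then repointed to it and the unencrypted instance is retired.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-encrypted",
    DBSnapshotIdentifier="mydb-latest-snapshot-encrypted",
    MultiAZ=True,
)
```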
- Data Migration for Complex File Systems:
- For migrating intricate directory structures with millions of small files:
- Use AWS DataSync to migrate data to Amazon FSx for Windows File Server
- Preserves SMB-based file storage compatibility
- Maintains complex directory structures and unstructured data
- Provides better automation than AWS Direct Connect alone
- More suitable than Amazon FSx for Lustre which is optimized for high-performance workloads
- Better than AWS Storage Gateway volume gateway which is designed for hybrid cloud storage rather than full-scale migrations
- Storage Autoscaling for RDS:
- For MySQL databases running out of disk space without allowing downtime:
- Enable storage autoscaling for the RDS instance
- Set maximum storage limit appropriate for anticipated growth
- Allows disk space to be increased automatically when available space is low
- Requires no manual intervention or downtime to expand storage
- More efficient than increasing instance size which affects compute resources unnecessarily
- Better than changing storage type to Provisioned IOPS which doesn't directly address space issues
- Simpler than backup/restore operations which cause application downtime
- Provides the most straightforward solution with minimal effort
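A one-call boto3 sketch of enabling storage autoscaling as described above; the instance identifier and the 1,000 GiB ceiling are illustrative.

```python
import boto3

rds = boto3.client("rds")

# Enable storage autoscaling by setting a maximum storage threshold (in GiB).
# RDS grows the volume automatically when free space runs low, with no downtime.
rds.modify_db_instance(
    DBInstanceIdentifier="mysql-prod",   # hypothetical instance name
    MaxAllocatedStorage=1000,            # upper bound for automatic growth
    ApplyImmediately=True,
)
```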
Amazon Aurora
- Overview:
- A MySQL- or PostgreSQL-compatible relational database engine
- Delivers performance and availability at scale
- Confirmed enhanced performance metrics as per official benchmarks
- Read Replicas:
- Offload read queries from the writer DB instance
- Can help alleviate performance degradation on the primary DB during peak load
- Use a separate reader endpoint for your read queries
- Validated the use of reader endpoints to isolate read workloads
- Database Cloning for Development Environments:
- Use database cloning to create staging/development databases on-demand from production
- Creates a new database quickly with minimal additional storage at the time of cloning
- Provides development teams immediate access to current database copies without impacting production performance
- Eliminates application latency issues associated with full database export processes
- Much more efficient than using mysqldump utility or other backup/restore processes
- Allows development teams to continue working without delays waiting for database copies
- Perfect for environments where development teams need frequent refreshes of production data
- More operationally efficient than traditional database copying methods that cause production slowdowns
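A boto3 sketch of Aurora fast cloning for a development environment; cluster identifiers and the instance class are hypothetical.

```python
import boto3

rds = boto3.client("rds")

# Clone the production cluster using copy-on-write; the clone shares storage
# with the source, and only pages that diverge consume new space.
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="prod-aurora-cluster",
    DBClusterIdentifier="dev-aurora-clone",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# A clone has no instances by default; add one so developers can connect.
rds.create_db_instance(
    DBClusterIdentifier="dev-aurora-clone",
    DBInstanceIdentifier="dev-aurora-clone-instance-1",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
)
```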
- PostgreSQL for Reporting:
- For improving reporting performance with minimal code changes:
- Set up an Aurora PostgreSQL DB cluster that includes an Aurora Replica
- Direct reporting queries to the Aurora Replica instead of the primary instance
- Provides complete PostgreSQL compatibility, requiring minimal application changes
- Efficiently offloads read-heavy reporting workloads from the primary database
- Prevents reporting processes from impacting document modifications or additions
- Superior to Multi-AZ RDS deployments whose secondary instances aren't accessible for queries
- More compatible with existing PostgreSQL code than migrating to NoSQL solutions
Amazon Aurora Serverless
- Cost-Effective Database for Sporadic Usage:
- Ideal for applications with unpredictable usage patterns (heavy at month start, moderate at week start)
- Automatically adjusts database capacity based on application needs
- Pay only for database resources consumed on a per-second basis
- Supports MySQL compatibility without requiring database modifications
- More suitable than RDS for MySQL for sporadic workloads without manual scaling interventions
- Better than deploying MySQL on EC2 Auto Scaling groups which would require significant management overhead
- Perfect solution for migrating from on-premises MySQL with variable workloads
Amazon Aurora Serverless v2
Key Features:
- Automatically scales database capacity based on application demand
- Provides fine-grained scaling that adjusts capacity in increments
- Ideal for handling unpredictable workloads like monthly sales events
- More cost-effective than provisioning for peak capacity as you only pay for what you use
- Particularly effective for applications with fluctuating database usage patterns
- Best solution for maintaining database performance during unexpected traffic increases
- Eliminates connection issues caused by database resource constraints
- Supports PostgreSQL compatibility with automatic scaling capabilities
Amazon Aurora Auto Scaling with Replicas:
- Automatically scales database capacity based on application demand
- Ideal for read-heavy workloads with unpredictable traffic patterns
- Provides multi-AZ deployment for high availability
- Aurora Replicas serve read traffic to improve database performance
- Number of replicas adjusts automatically based on actual workload
- More suitable than Redshift for transactional database workloads
- Better than Single-AZ RDS deployments for high availability requirements
- More appropriate than ElastiCache alone for scaling database performance
Custom Endpoints:
- Allow directing workloads to specific Aurora Replicas within a DB cluster
- Enable fine-grained control over query distribution to instances with specific compute and memory specifications
- Perfect for isolating reporting workloads to replicas optimized for those workloads
- More targeted than reader endpoints which distribute read queries across all replicas
- Simplify database access management for workloads with specific resource requirements
Amazon DynamoDB
- Key Features:
- Fully managed NoSQL database service
- Designed for low-latency, high-throughput workloads
- Single-digit millisecond response time at any scale
- Can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second
- Suitable for applications with unpredictable request patterns
- Key-value store structure ideal for simple query patterns
- Auto-scaling feature adjusts a table's throughput capacity based on incoming traffic
- Confirmed performance numbers and scalability thresholds
- DynamoDB Streams:
- Ordered flow of information about item changes
- Captures every modification to data items
- Each stream record appears exactly once
- Stream records appear in the same sequence as the actual modifications
- Writes stream records in near-real time
- Can be configured to capture "before" and "after" images of modified items
- Emphasized near-real time stream capture accuracy
- Global Tables:
- Provide fully managed, multi-Region replicas so globally distributed applications can read with local latency
- Global tables verified for cross-region read scalability
- Time to Live (TTL):
- Automatically removes items after a specified time
- Reduces storage costs and overhead for cleaning up stale data
- Ideal for data with a known expiration requirement (e.g., 30 days)
- Expired items are auto-deleted (within ~24-48 hours) without consuming write throughput
- TTL feature confirmed as an effective, cost-saving mechanism
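A short boto3 sketch of the TTL pattern above, assuming a hypothetical `sessions` table with an `expires_at` epoch-seconds attribute.

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Turn on TTL for the table, naming the attribute that holds the expiry time.
dynamodb.update_time_to_live(
    TableName="sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Items carry an epoch-seconds expiry; here, 30 days from now. DynamoDB removes
# expired items in the background without consuming write throughput.
dynamodb.put_item(
    TableName="sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 30 * 24 * 3600)},
    },
)
```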
- DynamoDB Accelerator (DAX):
- Fully managed, highly available, in-memory cache for DynamoDB
- Delivers microsecond read latency (up to 10× performance improvement from milliseconds to microseconds)
- Ideal for read-intensive workloads that require extremely low latency
- Requires minimal application changes (compatible with existing DynamoDB API calls via the DAX client)
- The most operationally efficient solution for reducing DynamoDB latency compared to ElastiCache alternatives
- Simpler implementation than setting up DynamoDB Streams with Lambda and ElastiCache
- Perfect for applications handling millions of requests per day with increasing request volumes
- Specifically designed to integrate seamlessly with DynamoDB's programming model and API
- Performance Improvement:
- Delivers up to 10x performance improvement for read-intensive DynamoDB applications
- Reduces response times from milliseconds to microseconds even at millions of requests per second
- Requires minimal application changes as it's compatible with existing DynamoDB API calls
- Provides performance benefits without adding operational overhead
- Perfect solution for applications experiencing delays due to read-intensive workloads
- More appropriate than ElastiCache solutions for DynamoDB as it doesn't require additional caching logic in applications
- Superior to global tables for optimizing read performance of applications in a single region
- Operates as a fully managed, highly available in-memory cache specifically designed for DynamoDB
- Particularly valuable for organizations with limited staff to handle additional operational overhead
- DynamoDB + AWS Backup:
- Use AWS Backup for fully managed backup/restore solutions with long-term retention (e.g., 7 years)
- Confirmed as best practice for compliance archiving
- Point-in-Time Recovery (PITR):
- Provides continuous backups of your tables, allowing restore to any second within the retention window (configurable from 1 to 35 days)
- Can easily meet an RPO of 15 minutes and an RTO of 1 hour by enabling PITR
- Allows quick recovery from accidental writes or deletes to a precise point in time
- Ideal solution for meeting RPO of 15 minutes and RTO of 1 hour requirements
- Enables recovery to any point in time within the last 35 days with second-level precision
- Superior to global tables for addressing data corruption scenarios that require point-in-time restoration
- More efficient and less time-consuming than exporting to S3 Glacier for recovery purposes
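A boto3 sketch of enabling PITR and later restoring to a precise moment; the table names and timestamp are placeholders.

```python
from datetime import datetime, timezone
import boto3

dynamodb = boto3.client("dynamodb")

# Enable continuous backups / point-in-time recovery on the table.
dynamodb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Later, restore to a precise instant (e.g., just before an accidental write).
# The restore creates a new table from the chosen point in time.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="orders",
    TargetTableName="orders-restored",
    RestoreDateTime=datetime(2024, 6, 1, 11, 55, tzinfo=timezone.utc),
)
```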
- Gaming Application Architecture:
- For multiplayer gaming with sub-millisecond data access requirements:
- Use Amazon DynamoDB with DynamoDB Accelerator (DAX) for frequently accessed data
- Export historical data to Amazon S3 using DynamoDB table export
- Use Amazon Athena for one-time queries on historical data in S3
- This provides the least operational overhead while meeting both low-latency access and analytics needs
- Better suited for gaming applications than RDS with custom export scripts
- More direct than solutions using Kinesis Data Streams for data export
- Eliminates the need to maintain complex streaming pipelines for simple historical data access
- Especially valuable for browser-based gaming applications that may have millions of global users
- When combined with CloudFront for content delivery, creates a comprehensive solution for global gaming platforms
- DynamoDB Point-in-Time Recovery:
- Provides continuous backups for DynamoDB tables
- Allows restoration to any second within the last 35 days
- Enables recovery from accidental writes or deletes with second-level precision
- Requires minimal operational overhead - simply enable the feature
- More granular than periodic backups for meeting specific recovery time requirements
- Most operationally efficient solution for enabling 24-hour recovery window
- Superior to streams-based solutions which require custom implementation
- Time to Live (TTL) for Data Removal:
- For personal data deletion requirements:
- Enable TTL in DynamoDB
- Set the expiration date as an attribute
- Create an AWS Lambda function to set the TTL based on the expiration date value when deletion is requested
- Provides automatic deletion of expired items without manual intervention
- Offers the least operational overhead compared to using EventBridge rules, DynamoDB streams, or AWS Config
- Ensures compliance with data deletion requirements within specified timeframes
- Particularly useful for GDPR and other regulatory compliance requiring data deletion
- More efficient than manual deletion processes which require tracking and execution
- Real-time Notifications:
- For alerting teams when new items are added to DynamoDB with minimal operational overhead:
- Enable DynamoDB Streams on the table
- Use triggers to write to a single Amazon Simple Notification Service (Amazon SNS) topic
- Have internal teams subscribe to this SNS topic
- This provides a seamless, efficient, and operationally light solution
- DynamoDB Streams captures item-level changes in real-time
- Lambda can be triggered by these changes to process and publish messages to SNS
- Ensures the new notification service does not impact the performance of existing applications
- More efficient than using DynamoDB transactions or publishing to multiple SNS topics
- Better than adding custom attributes and scanning the table with cron jobs
DynamoDB Integration Patterns
- Real-time Notifications:
- For alerting teams when new items are added to DynamoDB with minimal operational overhead:
- Enable DynamoDB Streams on the table
- Create a Lambda function triggered by the stream events
- Configure the Lambda to publish to an SNS topic
- Have team members subscribe to the SNS topic
- This approach has less operational overhead than scanning tables with cron jobs
- More efficient than modifying application code to publish to multiple SNS topics
- Prevents performance impact on the main application by using a separate notification path
- Leverages managed services to minimize maintenance requirements
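A sketch of the Lambda function in this notification pattern, assuming the function is wired to the table's stream via an event source mapping with the NEW_IMAGE view type; the SNS topic ARN is hypothetical and passed in an environment variable.

```python
import json
import os
import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["NOTIFY_TOPIC_ARN"]   # hypothetical environment variable

def handler(event, context):
    """Invoked by a DynamoDB Streams event source mapping."""
    for record in event.get("Records", []):
        # Only notify on newly inserted items.
        if record.get("eventName") != "INSERT":
            continue
        new_image = record["dynamodb"].get("NewImage", {})
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="New item added",
            Message=json.dumps(new_image),
        )
```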
- Buffering Writes for High-Traffic APIs:
- For asynchronous APIs experiencing availability issues due to DynamoDB throughput limitations:
- Use Amazon SQS queues and Lambda to buffer writes to DynamoDB
- Configure Lambda to process SQS messages at a rate that matches DynamoDB capacity
- Helps manage load and prevent loss of user requests due to exceeding provisioned DynamoDB throughput
- Handles traffic spikes without losing requests as SQS can hold messages until they can be processed
- Improves system availability and resilience during unexpected demand surges
- Better solution than adding throttling on API Gateway which may reject user requests
- More effective than using DynamoDB Accelerator (DAX) which improves read performance but doesn't buffer writes
- More appropriate than creating secondary indexes which don't address write throughput limitations
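A sketch of the consumer side of this buffering pattern: a Lambda function triggered by the SQS queue that writes batched items into a hypothetical `requests` table.

```python
import json
import boto3

table = boto3.resource("dynamodb").Table("requests")   # hypothetical table name

def handler(event, context):
    """Invoked by an SQS event source mapping; each record body is one write."""
    with table.batch_writer() as batch:
        for record in event["Records"]:
            item = json.loads(record["body"])
            batch.put_item(Item=item)
    # Successfully processed messages are deleted from the queue by Lambda;
    # failed batches become visible again (or move to a dead-letter queue) for retry.
```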
- Serverless Real-time Analytics:
- For web applications with real-time analytics from online games:
- Use Amazon DynamoDB for low-latency database needs with single-digit millisecond response times
- Implement Amazon Kinesis for streaming data from online games in real-time
- This combination provides scalable performance for unpredictable user counts
- More suitable than CloudFront which is optimized for content delivery rather than real-time data streaming
- Better than RDS which doesn't provide the same scalability for unpredictable workloads
- More appropriate than Global Accelerator which focuses on routing traffic rather than data streaming or storage
- Ideal for applications requiring immediate insights from rapidly changing user behavior data
- Supports processing millions of events per second with minimal operational overhead
- Gaming Application Performance:
- For multiplayer gaming applications requiring sub-millisecond data access:
- Use DynamoDB with DynamoDB Accelerator (DAX)
- Export historical data to S3 for one-time queries using Athena
- DAX provides in-memory caching with microsecond response times
- Requires minimal code changes to implement (just use the DAX client)
- More suitable than RDS or direct S3 storage for sub-millisecond latency requirements
- Provides simpler architecture than Kinesis-based streaming solutions
Amazon DynamoDB Capacity Management
- On-Demand Capacity Mode:
- Ideal for tables with unpredictable traffic patterns
- Automatically adjusts capacity to maintain performance as application traffic changes
- Best for tables not used during extended periods that experience quick traffic spikes
- Pay-per-request model eliminates the need to provision capacity in advance
- More cost-effective than provisioned capacity for variable workloads
- More efficient than adding global secondary indexes (which improve query efficiency but not capacity management)
- Better than auto scaling for very quick, unpredictable spikes
- More appropriate than global tables when multi-region replication isn't required
- On-Demand Capacity for Unpredictable Traffic:
- Particularly cost-effective for tables not used during specific periods (e.g., most mornings)
- Ideal when traffic spikes occur very quickly and unpredictably (e.g., evenings)
- Eliminates the need for capacity planning or management even for rapid traffic changes
- More flexible than provisioned capacity with auto scaling for handling very quick traffic spikes
- Better than global tables when multi-region replication isn't required for the workload
- Particularly suitable for tables with periods of inactivity followed by sudden, unpredictable usage
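A minimal boto3 sketch of switching an existing table (hypothetical name) to on-demand capacity.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Switch an existing table from provisioned throughput to on-demand billing;
# capacity then adjusts automatically to the incoming traffic.
dynamodb.update_table(
    TableName="game-events",
    BillingMode="PAY_PER_REQUEST",
)
```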
Amazon Neptune
- Use Cases:
- Fully managed graph database service
- Ideal for storing and querying complex relationships, such as an IT infrastructure map
- Supports SPARQL and Gremlin for graph-based queries
- Minimal operational overhead for highly interconnected data
- Neptune confirmed as best for graph-based queries in complex relational data
Amazon Pinpoint
- Marketing Communications Service:
- Designed specifically for customer engagement through channels like email, SMS, push notifications, and voice messages
- Supports two-way SMS messaging allowing users to reply to messages sent by the service
- Can create automated communication workflows through journey creation
- When configured with Kinesis data streams, enables collection, processing, and analysis of SMS responses
- Perfect for marketing communications requiring user response capture and analysis
- More suitable than Amazon Connect for marketing communications and automated SMS response handling
- More appropriate than SQS which doesn't natively support sending SMS messages to users
- Superior to SNS FIFO topics for two-way SMS communication and response analysis requirements
- Can archive data to solutions like Amazon S3 through Kinesis for long-term storage (e.g., 1+ years)
Database Migration
- AWS Database Migration Service (DMS):
- Replicates data from one database to another
- Supports homogeneous or heterogeneous migrations
- Migrating from Microsoft SQL Server to Amazon RDS for SQL Server provides a managed service with significantly reduced operational overhead
- Allows automated database setup, maintenance, and scaling tasks
- Provides managed backups, patching, and monitoring
- DMS confirmed for reducing manual migration efforts
- Minimal-Change Oracle Migrations:
- Use DMS to migrate from on-premises Oracle to Oracle on Amazon RDS
- Retains the same database engine for minimal code changes
- Multi-AZ RDS deployment ensures high availability
- Oracle migration process validated as minimal-impact with Multi-AZ support
- Large Database Migration:
- For migrating large (e.g., 20 TB) databases with minimal downtime:
- Use AWS Snowball Edge Storage Optimized device for initial data transfer
- Implement AWS DMS with AWS SCT for schema conversion and replication of ongoing changes
- Continue replication after Snowball data is loaded to sync any changes made during transfer
- More cost-effective than Snowmobile for databases under 100 TB
- More suitable than Compute Optimized Snowball for straightforward data migrations
- Less expensive and faster to set up than dedicated Direct Connect for one-time migrations
- Provides a balanced approach between cost and minimizing downtime
- Oracle to Aurora PostgreSQL Migration:
- For migrating on-premises Oracle databases to Aurora PostgreSQL while capturing ongoing changes:
- Use AWS Schema Conversion Tool (SCT) to convert Oracle schema to Aurora PostgreSQL schema
- Use AWS Database Migration Service (DMS) to migrate existing data and replicate ongoing changes
- This approach ensures seamless migration with minimal downtime
- Captures all changes occurring to the source database during migration process
- More complete than using DMS with full-load task only (which doesn't capture ongoing changes)
- More suitable than AWS DataSync which is designed for file transfers, not database migrations
- Better than using Snowball devices which don't support ongoing replication during migration
Amazon DocumentDB
- MongoDB Compatibility:
- Fully managed MongoDB-compatible database service
- Ideal for migrating existing MongoDB workloads to AWS without code changes
- When paired with Amazon EKS and Fargate, provides the least disruptive migration path for containerized MongoDB applications
- Allows organizations to move from self-managed MongoDB to a fully managed service while maintaining application compatibility
- Supports existing MongoDB drivers and tools for smooth transition
- Eliminates the operational overhead of managing database infrastructure
Caching Services
Amazon ElastiCache
- Memcached:
- Ideal for simple caching where the dataset is small and requires simple key-value access
- Used for ephemeral, high-speed data access
- Memcached confirmed for scenarios with lightweight caching needs
- Redis:
- Supports more complex data structures (lists, sets, sorted sets, etc.)
- Provides persistence options, Pub/Sub functionality, and advanced features
- Redis features validated for applications needing complex data types and persistent caching
- Redis for Session Management:
- Perfect for storing user session information in distributed web applications
- Provides high-performance, scalable in-memory cache with Multi-AZ support
- When deployed in multiple Availability Zones, offers enhanced high availability
- Can be configured in cluster mode for additional scalability and fault tolerance
- Essential component for building resilient architectures with PHP applications using sessions
- Prevents session loss during application server failures by externalizing session state
- Allows seamless user experience when application servers are replaced or scaled
- Gaming Leaderboards with Redis:
- For near-real-time top-10 scoreboards in video games:
- Set up Amazon ElastiCache for Redis to compute and cache scores
- Use Redis sorted sets data structure to maintain leaderboards efficiently
- Leverage Redis persistence to enable game state preservation
- Supports features required for game state maintenance and restoration
- Offers better performance than RDS read replicas for scoreboard computation
- More suitable than Memcached which lacks sorted data structures and persistence
- Provides built-in functionality for leaderboards compared to custom CloudFront or RDS solutions
- Perfect for maintaining real-time leaderboards in multiplayer games with concurrent online users
- Enables stopping and restoring games while preserving current scores through Redis persistence
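A redis-py sketch of the sorted-set leaderboard described above, assuming a hypothetical ElastiCache for Redis endpoint (TLS and auth omitted for brevity).

```python
import redis  # redis-py client

# Hypothetical ElastiCache for Redis endpoint.
r = redis.Redis(host="leaderboard.xxxxxx.use1.cache.amazonaws.com", port=6379)

def record_score(player_id: str, score: float) -> None:
    # ZADD keeps the set ordered by score; repeat calls overwrite the member's score.
    r.zadd("game:leaderboard", {player_id: score})

def top_10():
    # Highest scores first, returned with their values.
    return r.zrevrange("game:leaderboard", 0, 9, withscores=True)
```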
- Write-Through Caching Strategy:
- Ideal when cache data must always match database data
- Ensures data consistency by writing to both the database and cache simultaneously when updates occur
- Perfect for multi-tier applications where changes in RDS databases must be immediately reflected in ElastiCache
- More effective than lazy loading or TTL strategies when strict data consistency is required
- Ensures that the cache always contains the latest data from the database
- Particularly suitable for financial applications or systems where data accuracy is critical
- Superior to other caching approaches for applications where stale data cannot be tolerated
- Database Read Load Reduction:
- For resolving performance issues with RDS databases experiencing heavy read loads:
- Create an Amazon ElastiCache cluster
- Configure the application to cache query results in the ElastiCache cluster
- This approach reduces the number of direct database read requests
- Results in faster data retrieval times during peak traffic periods
- Fully managed in-memory caching service providing microsecond read and write latencies
- Can be used alongside read replicas for comprehensive read scaling strategy
- Particularly effective when the same queries are executed frequently
- More appropriate than turning on auto scaling for RDS which doesn't provide the same kind of read performance benefits
- Better solution than Multi-AZ deployment which focuses on availability rather than performance
- Superior to placing EC2 instances in the same AZ as the database which could reduce high availability
- Can be configured directly from the Amazon RDS console for easier implementation
- In-Memory Caching Benefits:
- ElastiCache can be used as a primary data store for use cases that don't require data durability, such as gaming leaderboards, streaming, and data analytics
- Provides microsecond read and write latencies that support flexible, real-time use cases
- Helps accelerate application and database performance by reducing load on primary databases
- Removes complexity associated with deploying and managing a distributed computing environment
- Perfect solution for enhancing response times during peak traffic periods
- When properly configured to cache query results, significantly improves application responsiveness
- Can be created using RDS console options for easier integration with database workloads
Messaging and Queuing
Amazon MQ
- Key Features:
- Managed message broker service for Apache ActiveMQ or RabbitMQ
- Good fit for migrating from existing message broker solutions where native APIs/protocols are required
- Verified MQ as the optimal solution for legacy broker migration
Amazon SQS (Simple Queue Service)
Standard Queues:
- Near-limitless throughput
- At-least-once delivery
- Best-effort ordering
FIFO Queues:
- First-In-First-Out ordering to ensure messages are processed in order
- Exactly-once processing prevents duplicates
- Ideal for scenarios requiring strict message ordering and guaranteed single processing
- Essential for systems like E-commerce applications that need to process orders in the exact sequence they were received
- Provides guaranteed ordering compared to SNS topics which cannot guarantee message sequence preservation
- Perfect for integration with API Gateway when sequential processing is critical
- More suitable than standard queues when the order of processing is a business requirement
- Works well with API Gateway integrations to ensure commerce orders are processed in the exact sequence they are received
- Ensures all messages are processed in the order they arrive, unlike SNS topics which don't preserve message order
- Differs from standard queues which don't guarantee processing order
Message Retention:
- Messages are kept in queues for up to 14 days (4 days default)
- Allows delayed processing or retries without losing messages
Dead Letter Queue:
- Stores messages that can't be processed (e.g., by a Lambda function) for further analysis
- Prevents failed messages from blocking the processing of other messages
Decoupling for Resilience:
- Use Amazon SNS and SQS to create a buffering layer between clients and backend processors (e.g., EC2 instances)
- If an EC2 instance fails, messages remain in the queue until another instance can process them
- Verified that decoupling via SQS enhances system resiliency
Additional Update for Microservices:
- For microservices-based architectures transitioning from monolithic, SQS is recommended for asynchronous communication
- Confirms SQS improves decoupling and scalability
Event Handling:
- Serves as an effective buffer for message processing during network failures
- When combined with SNS and Lambda, creates resilient data processing workflows
- Can be configured as an on-failure destination to preserve messages that fail initial processing
- Enables eventual processing of all messages without manual intervention
Microservices Communication:
- Ideal for decoupling components in microservices architectures
- Perfect for sequential data processing where order of results is not important
- Allows producer and consumer services to scale independently
- Provides reliable, highly scalable hosted message queuing
- Enables asynchronous communication between microservices
- More effective than SNS for microservices that need to process data sequentially
- Better solution than using Lambda functions or DynamoDB Streams for basic inter-service communication
- Particularly useful when migrating from monolithic to microservices architectures
FIFO Queues for Ordering:
- For payment processing systems requiring strict message ordering:
- Use SQS FIFO queues with message group ID set to the payment ID
- Messages in the same message group are delivered in the exact order they are sent
- Essential for financial applications where processing sequence impacts results
- Prevents payment errors by ensuring transaction order integrity
- Differs from standard queues which don't guarantee processing order
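A boto3 sketch of the FIFO ordering pattern above; the queue name and payment events are illustrative.

```python
import boto3

sqs = boto3.client("sqs")

# FIFO queue (the name must end in .fifo); content-based deduplication avoids
# having to supply an explicit MessageDeduplicationId per message.
queue_url = sqs.create_queue(
    QueueName="payments.fifo",
    Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
)["QueueUrl"]

# All messages for the same payment share a MessageGroupId, so they are
# delivered and processed strictly in the order they were sent.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"payment_id": "p-123", "event": "authorized"}',
    MessageGroupId="p-123",
)
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"payment_id": "p-123", "event": "captured"}',
    MessageGroupId="p-123",
)
```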
Asynchronous Image Processing Pattern:
- For multi-tier applications with time-consuming backend processes (like image thumbnail generation):
- Create an SQS queue to decouple frontend from backend processing
- When users upload images, immediately acknowledge receipt and place message on queue
- Process the queue messages asynchronously to generate thumbnails
- Allows providing fast response to users while longer processing continues in background
- Improves user experience by not making users wait for time-consuming operations
- More suitable than direct Lambda triggers or Step Functions for optimizing initial response time
- Enables efficient workflow management between application tiers
Document Processing with SQS:
- For ensuring reliable processing of documents uploaded to S3:
- Configure an SQS queue as an event source for Lambda functions
- Set up S3 event notifications to send upload events to the SQS queue
- Ensures every document is processed even with transient errors
- Provides built-in retry mechanisms for asynchronous event processing
- More reliable than direct Lambda triggers from S3 which have limited retry capabilities
- Better than API Gateway integration which would require application modifications
- More appropriate than S3 replication to staging buckets which introduces delays
- Superior to Application Load Balancer for asynchronous document processing workflows
- Creates a durable system that guarantees exactly-once processing for each document
Amazon SQS with Dead-Letter Queues:
- Message Processing Resilience:
- For handling large volumes of messages that may take days to process:
- Create an SQS queue to decouple sender and processor applications
- Configure dead-letter queues to collect messages that fail processing
- Set appropriate message retention period (up to 14 days) to handle long processing times
- Ensures failed messages don't block processing of other messages
- Durably stores incoming messages (for example, workloads of up to 1,000 messages per hour) until consumers can process them
- Can handle messages that take up to 2 days to process
- Retains failed messages for troubleshooting and reprocessing
- More operationally efficient than self-managed message brokers on EC2
- Better suited for long processing times than Kinesis which is optimized for real-time streaming
- More appropriate than SNS for message queuing as SNS doesn't inherently support message retention
- Requires no infrastructure management while providing guaranteed message delivery
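A boto3 sketch of the queue-plus-dead-letter-queue setup described above, with 14-day retention and a redrive policy; the queue names and the maxReceiveCount of 5 are hypothetical choices.

```python
import json
import boto3

sqs = boto3.client("sqs")

# Dead-letter queue for messages that repeatedly fail processing.
dlq_url = sqs.create_queue(QueueName="documents-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: 14-day retention and a redrive policy that moves a message to
# the DLQ after 5 failed receives.
sqs.create_queue(
    QueueName="documents",
    Attributes={
        "MessageRetentionPeriod": str(14 * 24 * 3600),
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        ),
    },
)
```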
Auto Scaling with SQS Queue Depth:
- For applications experiencing delays in processing messages:
- Configure an Auto Scaling group for the EC2 instances processing SQS messages
- Use queue depth as the scaling metric to automatically add more processing capacity
- This approach addresses delayed message processing by dynamically scaling based on workload
- More effective than solutions focusing on database performance (like DAX) when the bottleneck is message processing
- More targeted than adding API Gateway or CloudFront when the issue is backend processing capacity
- Perfect for finance applications where timely processing of customer requests is critical
Lambda Integration for Database Operations:
- For improving scalability when Lambda functions need to load high volumes into databases:
- Set up two Lambda functions with one receiving information and another loading to database
- Use SQS queue to integrate the Lambda functions and decouple the operations
- Creates a buffer for incoming data to handle varying loads without impacting database insertion
- More scalable than using SNS which is better for fan-out messaging to multiple subscribers
- Better approach than refactoring to EC2-based solutions which introduces operational overhead
- Preferable to platform changes (like switching from Aurora to DynamoDB) which require significant code rework
SQS Visibility Timeout for Preventing Duplicate Processing:
- Increase visibility timeout to exceed function timeout plus batch window timeout
- Prevents messages from becoming visible again before processing completes
- Reduces likelihood of duplicate message delivery and processing
- Requires minimal administrative changes to implement
- More effective than long polling for preventing duplicate processing
- More operationally efficient than switching to FIFO queues in many cases
- Better practice than manually deleting messages before processing completes
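A small sketch of applying this guidance with boto3. The queue URL and timeout values are hypothetical; the six-times multiplier reflects common AWS guidance for Lambda event source mappings, while the minimum requirement is simply to exceed the function timeout plus the batch window.

```python
import boto3

sqs = boto3.client("sqs")

FUNCTION_TIMEOUT = 120   # seconds, Lambda function timeout (hypothetical)
BATCH_WINDOW = 30        # seconds, maximum batching window (hypothetical)

# Keep the visibility timeout comfortably longer than the time a batch can be
# in flight, so messages are not redelivered while still being processed.
visibility = 6 * FUNCTION_TIMEOUT + BATCH_WINDOW

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/111122223333/orders",  # hypothetical
    Attributes={"VisibilityTimeout": str(visibility)},
)
```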
Amazon SES (Simple Email Service)
- Email Delivery for Web Applications:
- Fully managed email service for sending transactional emails, marketing communications, and notifications
- More operationally efficient than setting up dedicated email processing infrastructure
- Handles complexities of email delivery including bounce management and complaint handling
- Reduces overhead for managing email servers and delivery issues
- Perfect solution for e-commerce applications needing to send order confirmations and marketing emails
- Scales automatically to handle increasing email volumes as application traffic grows
- More appropriate than SNS for dedicated email communication workflows
- Email Delivery Optimization:
- For web applications experiencing email delivery delays:
- Configure web instances to send email through Amazon SES
- Eliminates need for dedicated email processing infrastructure
- Handles critical email infrastructure components including:
- IP address management
- Sender reputation maintenance
- Email delivery standards compliance
- More cost-effective than creating separate application tier for email processing
- Better suited for marketing and transactional emails than SNS
- Reduces operational overhead compared to self-managed email solutions
- Automatically scales to handle increasing email volumes without additional configuration
SNS and SQS Integration
- Message Routing Architecture:
- For separating and processing different message types efficiently:
- Create a single Amazon SNS topic
- Subscribe multiple Amazon SQS queues to the topic
- Configure SNS message filtering to route messages to appropriate queues based on message attributes
- Ensures messages are separated by type and processed appropriately
- Prevents message loss with SQS's durable message storage
- Maximizes operational efficiency with automated message routing
- More efficient than creating multiple separate SNS topics
- Simpler than using Kinesis streams for basic message routing scenarios
- Perfect for use cases where messages must be guaranteed processing within specific timeframes
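A boto3 sketch of the single-topic, filtered fan-out pattern above; the topic name, queue ARNs, and the `event_type` attribute are hypothetical.

```python
import json
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Each SQS queue subscribes with a filter policy, so SNS routes messages to the
# right queue based on a message attribute.
subscriptions = {
    "arn:aws:sqs:us-east-1:111122223333:new-orders": ["created"],
    "arn:aws:sqs:us-east-1:111122223333:returns": ["returned"],
}
for queue_arn, event_types in subscriptions.items():
    sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sqs",
        Endpoint=queue_arn,
        Attributes={"FilterPolicy": json.dumps({"event_type": event_types})},
    )

# Publishers set the attribute that the filter policies match on.
sns.publish(
    TopicArn=topic_arn,
    Message='{"order_id": "o-42"}',
    MessageAttributes={
        "event_type": {"DataType": "String", "StringValue": "created"}
    },
)
```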
- Scalable Event Processing:
- For event-based applications where events are generated from S3 bucket file uploads:
- Create an SNS subscription that sends events to an SQS queue
- Configure the SQS queue to trigger a Lambda function
- This pattern provides better decoupling of event generation from event processing
- Enables scalable and reliable event processing through automatic scaling of Lambda
- More straightforward and efficient than using ECS or EKS as intermediaries
- Prevents event loss during processing spikes with SQS's message retention capabilities
- Creates a resilient architecture that can handle varying event volumes
- Enhancing Reliability with SQS as a Buffer:
- Create an SQS queue and subscribe it to SNS topics to add resilience to data ingestion workflows
- Decouples notification mechanisms from processing mechanisms
- Enables message retention until successful processing, even during network connectivity issues
- Configure Lambda functions to poll messages from SQS queues for reliable processing
- Ensures all messages are processed even after temporary failures
- More effective than increasing Lambda CPU/memory for handling connectivity problems
- Better solution than attempting to deploy Lambda functions across multiple AZs (which is handled automatically by AWS)
- Handling Duplicate Message Processing:
- Increase the SQS queue visibility timeout to be greater than the total of the Lambda function timeout and the batch window timeout
- Prevents messages from becoming visible to another consumer while processing is still in progress
- Addresses issues with multiple processing of the same message when Lambda functions take longer than expected
- More effective than long polling (which helps with empty responses but not duplicate processing)
- More operationally efficient than switching to FIFO queues for this particular use case
- More reliable than manually deleting messages immediately after reading (before processing)
- Image Processing Workflow:
- For serverless image processing with automatic scaling:
- Configure S3 bucket to send notifications to SQS queue on image uploads
- Use Lambda function with SQS queue as invocation source
- Process messages from queue and delete successfully processed messages
- Creates a durable, stateless system that ensures processing requests aren't lost
- Enables the system to automatically scale to handle varying loads
- Removes direct dependencies between components
- Provides a more resilient solution than direct Lambda triggers from S3
- Eliminates need for tracking processed files in memory which introduces statefulness
- More efficient than using EC2 instances to monitor queues or manual processing steps
- Message Ordering for Payment Processing:
- For payment systems requiring precise message ordering:
- Use Amazon Kinesis Data Streams with payment ID as the partition key, or
- Use Amazon SQS FIFO queue with message group set to payment ID
- Both approaches ensure messages with the same payment ID are processed in the exact order sent
- Essential for financial transactions where order of operations affects outcomes
- SQS FIFO guarantees exactly-once processing and strict ordering
- Kinesis maintains order within each shard for messages sharing the same partition key
- SNS with SQS for Reliable Data Ingestion:
- Create SQS queue and subscribe to SNS topics to add resilience to data workflows
- Provides buffer between notification mechanism and processing function
- Retains messages until successfully processed, even during connectivity issues
- Configure Lambda functions to poll from SQS queue for reliable processing
- Ensures messages aren't lost during temporary network failures
- More effective than increasing Lambda resources for handling connectivity issues
- Automatically deployed across multiple Availability Zones by AWS
Communication Patterns for Microservices
- Decoupled Processing:
- For migrating monolithic applications to microservices architecture:
- Use Amazon SQS queues between producer and consumer services
- Implement asynchronous processing where order of results doesn't matter
- Better than SNS for sequential processing requirements
- More efficient than Lambda or DynamoDB Streams for general inter-service communication
- Enables independent scaling of producer and consumer components
- Provides reliable message delivery with built-in redundancy
- Creates natural decoupling between application components
Compute Services
AWS Elastic Beanstalk
- Overview:
- Easiest way to deploy and scale web applications developed in .NET, Java, PHP, Node.js, Python, Ruby, Go, or Docker
- Automatically handles capacity provisioning, load balancing, and auto scaling
- Provides a fully managed platform while still allowing customization of underlying resources
- Multi-Environment & URL Swapping:
- Create multiple environments for staging and production
- Use URL swapping (CNAME swap) to promote changes from staging to production with minimal downtime
- .NET on Windows Server:
- Supports deploying .NET applications with minimal code changes
- Can be configured in a Multi-AZ setup for high availability
- Ideal for rehosting on-premises .NET applications with minimal rework
- Integrates with Amazon RDS for relational database requirements
AWS Fargate
- Serverless container compute for Amazon ECS and Amazon EKS
- Eliminates the need to manage EC2 instances
- Ideal for workloads needing uninterrupted execution (e.g., longer batch processes beyond Lambda's 15-minute limit)
- Often combined with Amazon EventBridge for scheduled tasks
- Better suited for applications that require more granular control over environments and longer running processes
- More cost-effective than EC2-based solutions for infrequent or sporadic workloads
- Scheduled Tasks for Daily Processing:
- For scheduled daily jobs processing large files (up to 10 GB) with constant CPU/memory requirements:
- Create an Amazon ECS cluster with AWS Fargate launch type
- Configure an Amazon EventBridge scheduled rule to launch the ECS task (see the sketch after this list)
- Provides serverless compute engine that eliminates need to provision and manage servers
- Automatically manages task's CPU and memory allocation based on defined configuration
- Ideal for jobs taking up to an hour to complete
- Minimizes operational effort compared to Lambda (which has 15-minute timeout limit)
- More cost-effective than maintaining EC2 instances for periodic tasks
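A minimal boto3 sketch of scheduling a Fargate task with EventBridge. The cluster, task definition, role ARNs, and subnet ID are hypothetical placeholders that must already exist:

```python
import boto3

events = boto3.client("events")

cluster_arn = "arn:aws:ecs:us-east-1:123456789012:cluster/batch"
task_def_arn = "arn:aws:ecs:us-east-1:123456789012:task-definition/nightly-job:1"
events_role_arn = "arn:aws:iam::123456789012:role/eventbridge-ecs-run-task"

# Run every day at 03:00 UTC.
events.put_rule(Name="nightly-batch", ScheduleExpression="cron(0 3 * * ? *)")

events.put_targets(
    Rule="nightly-batch",
    Targets=[{
        "Id": "fargate-task",
        "Arn": cluster_arn,
        "RoleArn": events_role_arn,
        "EcsParameters": {
            "TaskDefinitionArn": task_def_arn,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0abc1234"],
                    "AssignPublicIp": "DISABLED",
                }
            },
        },
    }],
)
```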
AWS Lambda
- Key Constraints:
- Stateless, short-lived functions
- 15-minute max execution time
- Ideal for event-driven, short-duration tasks
- Can be triggered by Amazon S3 Event Notifications, Amazon SQS, Amazon SNS, etc.
- Provides automatic scaling capabilities for unpredictable request patterns
- Can scale from a few requests per day to thousands per second
- Efficiently serves varying levels of traffic without manual intervention
- Well-suited for handling serverless API backends with Amazon API Gateway
- Ideal for scenarios with unpredictable request patterns that may change suddenly
- Lambda Execution Role:
- IAM role assumed by Lambda function at runtime
- Determines what AWS services and resources the function can access
- Permissions associated with this role control the function's access to AWS resources
- Best practice is to grant only necessary permissions following the principle of least privilege
- Can be configured with permissions to decrypt data using AWS KMS keys
- Accessing On-Premises Resources:
- Configure Lambda to run in a private subnet of your VPC
- Ensure proper route via AWS Direct Connect or Site-to-Site VPN
- Assign appropriate security groups/NACLs so Lambda can communicate with on-prem resources
- Processing Images:
- Ideal for processing user-uploaded images stored in S3
- Automatically scales to handle varying numbers of concurrent users
- Can be triggered by S3 event notifications when new objects are uploaded
- Efficient for serverless architectures requiring automatic scaling in response to workload demands
- Lambda SnapStart:
- Feature designed to reduce cold start latency for Java functions
- Creates and caches a snapshot of the initialized execution environment
- Significantly reduces startup time by eliminating initialization overhead
- More effective at reducing cold start times than simply increasing function timeout
- Better solution than just increasing memory, which improves execution speed but doesn't directly address cold start latency
- Perfect for Java applications that experience long cold start times due to JVM initialization and class loading
- Provides initialization performance improvements without the additional costs of provisioned concurrency
- File Processing Workflow:
- For immediate processing of uploaded files:
- Design Lambda functions triggered by S3 upload events
- Process files and store results back to S3 or other destinations
- More cost-effective than running EC2 instances for sporadic processing needs
- Serverless approach eliminates the need to manage underlying infrastructure
- Automatically scales with the number of incoming files
- For cost-optimized storage, implement S3 Lifecycle rules to transition rarely accessed files to Glacier
- Secure Database Access:
- For granting Lambda functions access to DynamoDB tables:
- Create an IAM role with Lambda as a trusted service (see the sketch after this list)
- Attach a policy allowing read/write access to the DynamoDB table
- Set this role as the Lambda function's execution role
- Follows AWS best practices by using roles instead of stored credentials
- Avoids storing sensitive access keys in Lambda environment variables
- More secure than using IAM users with programmatic access
- Simpler than retrieving credentials from Parameter Store during function execution
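A minimal boto3 sketch of creating such an execution role. The role name, table ARN, and account ID are hypothetical placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: let the Lambda service assume this role at runtime.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
role = iam.create_role(
    RoleName="orders-fn-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Inline policy: least-privilege read/write access to one DynamoDB table.
table_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:PutItem",
                   "dynamodb:UpdateItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
    }],
}
iam.put_role_policy(
    RoleName="orders-fn-role",
    PolicyName="orders-table-access",
    PolicyDocument=json.dumps(table_policy),
)
# Use role["Role"]["Arn"] as the function's execution role when creating or updating it.
```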
- Provisioned Concurrency for Reduced Latency:
- Configure provisioned concurrency for Lambda functions that load many libraries or have significant initialization overhead
- Keeps specified number of function instances initialized and ready to respond immediately
- Eliminates cold start latency by pre-warming the Lambda runtime environment
- Perfect for functions that are part of user-facing API responses where latency is critical
- More effective for reducing response times than increasing function timeout or memory allocation
- SQS Queue Processing:
- For processing SQS queues more efficiently than EC2-based solutions:
- Migrate script logic from EC2 instance to a Lambda function
- Configure Lambda to poll the SQS queue and process messages
- Automatically scales based on queue depth to handle growing number of messages
- Pay only for compute time used during actual message processing
- Eliminates need to provision and maintain EC2 instances for queue processing
- Significantly reduces operational costs compared to always-running EC2 solution
- Provides better cost optimization for unpredictable or growing workloads
- AWS Lambda with S3 and KMS:
- Secure File Processing:
- For Lambda functions that download and decrypt files from S3 using KMS keys:
- Create an IAM role with the kms:Decrypt permission and attach it as the Lambda execution role (see the sketch after this list)
- Grant decrypt permission for the Lambda IAM role in the KMS key's policy
- This approach properly delegates decryption capability to Lambda through IAM
- Ensures proper security boundaries between Lambda's invocation permissions and its operational permissions
- More correct than attaching permissions to the Lambda function's resource-based policy, which controls who can invoke the function rather than what the function itself can do
- Provides proper access control by managing permissions at both the IAM role and KMS key policy levels
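A minimal boto3 sketch of granting decrypt access on both the identity side and the key side. The role, bucket, and KMS key ARNs are hypothetical placeholders; a key-policy statement can be used instead of the grant shown here:

```python
import json
import boto3

iam = boto3.client("iam")
kms = boto3.client("kms")

role_arn = "arn:aws:iam::123456789012:role/file-decryptor-role"  # hypothetical execution role
key_arn = "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"

# 1. Identity side: let the execution role read the object and call kms:Decrypt.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::secure-files/*"},
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": key_arn},
    ],
}
iam.put_role_policy(
    RoleName="file-decryptor-role",
    PolicyName="s3-kms-read",
    PolicyDocument=json.dumps(policy),
)

# 2. Key side: a grant (or a key-policy statement) must also allow the role to decrypt.
kms.create_grant(KeyId=key_arn, GranteePrincipal=role_arn, Operations=["Decrypt"])
```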
- IoT Data Processing:
- For IoT data processing with relatively small data size (e.g., 2 MB per night) and modest processing requirements (1 GB memory, 30 second execution time)
- Cost-effective for processing and summarizing data without the need for managing servers or clusters
- More appropriate than AWS Glue or Amazon EMR for smaller-scale data processing tasks
AWS Step Functions
- Application Orchestration:
- Ideal for building distributed applications involving multiple serverless functions and AWS services
- Coordinates workflows across Lambda functions, EC2 instances, containers, and on-premises servers
- Supports manual approval steps as part of workflows - critical for business processes requiring human intervention
- Manages state transitions, error handling, and retries automatically
- Provides visualization of complex workflows for easier monitoring and troubleshooting
- More suitable for complex workflow orchestration than SQS, Glue, or combinations of Lambda and EventBridge
- Manual Approval Workflows:
- Enables the integration of human approval steps within automated workflows
- Perfect for business processes that require human review or decision-making
- Can coordinate both serverless functions and traditional compute resources in the same workflow
- Handles state persistence automatically during long wait periods for human interaction
- More suitable than alternative orchestration solutions for workflows requiring human approvals
- Modernizing Scheduled Jobs:
- For hourly jobs running for short durations (e.g., 10 seconds):
- Convert code into Lambda functions with appropriate memory allocation
- Create Amazon EventBridge scheduled rules to trigger functions
- This approach eliminates the need to maintain EC2 instances that remain idle between job executions
- More cost-effective than containerization on ECS/Fargate for very short-duration tasks
- Particularly valuable for jobs with low memory requirements (e.g., 1 GB or less)
- Cost-Optimized Lambda with EC2 Integration:
- For applications using both EC2 instances and Lambda functions:
- Purchase a Compute Savings Plan instead of EC2 Instance Savings Plan
- Optimize Lambda function duration, memory usage, and number of invocations
- Connect Lambda functions to the private subnet containing EC2 instances
- Compute Savings Plan covers both EC2 and Lambda usage for maximum savings
- VPC integration ensures minimal network latency between services
- More cost-effective than EC2-only Savings Plans when using serverless components
- Better performance than keeping Lambda in service VPC when direct EC2 communication is needed
- Event-Driven Architecture Transition:
- For transitioning from monolithic applications to event-driven serverless architecture:
- Build workflows in AWS Step Functions to create state machines
- Use state machines to invoke AWS Lambda functions that process workflow steps
- Leverages serverless concepts while minimizing operational overhead
- Provides process orchestration with comprehensive state management
- Enables coordination across distributed application components
- Superior to AWS Glue for general application workflow orchestration
- More suitable than EC2-based deployments which require infrastructure management
- Better than EventBridge alone for complex multi-step workflows requiring state management
EC2 Instance Management
- Hibernation and Warm Pools:
- EC2 hibernation preserves the in-memory state of an instance
- Allows faster restart by saving RAM contents to the EBS root volume
- Warm pools maintain a group of pre-initialized EC2 instances ready for quick use
- Significantly reduces application launch times for memory-intensive applications
- More effective than Capacity Reservations for reducing application startup time
- Better than simply launching additional instances which would still face the same startup delays
- Perfect for applications that take a long time to initialize or load large datasets into memory
- Can be combined with Auto Scaling for both fast startup and scalability
EC2 Auto Scaling
- Lifecycle Hooks for Auditing and Reporting:
- Use Auto Scaling lifecycle hooks to run custom scripts when instances launch or terminate (see the sketch after this list)
- Enables automated reporting to external systems about infrastructure changes
- Perfect for integrating with centralized auditing systems that track all EC2 instance creations and terminations
- More reliable than scheduled Lambda functions checking for instance state changes
- Provides real-time notifications of instance lifecycle events without polling or manual intervention
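A minimal boto3 sketch of a launch lifecycle hook that notifies an auditing pipeline through SNS. The group name, topic, and role ARNs are hypothetical placeholders:

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_lifecycle_hook(
    LifecycleHookName="report-launch",
    AutoScalingGroupName="web-asg",
    LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
    NotificationTargetARN="arn:aws:sns:us-east-1:123456789012:instance-audit",
    RoleARN="arn:aws:iam::123456789012:role/asg-notify-role",
    HeartbeatTimeout=300,       # how long the instance waits in the Pending:Wait state
    DefaultResult="CONTINUE",   # proceed with the launch if no completion signal arrives
)
# A matching hook on autoscaling:EC2_INSTANCE_TERMINATING captures terminations.
```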
EC2 Placement Groups
- Spread Placement Group:
- Places instances on distinct underlying hardware
- Reduces risk of simultaneous failures across instances
- Perfect for applications requiring high availability where instances should not share hardware
- Recommended for applications with a small number of critical instances that must be isolated
- More effective for hardware isolation than simply grouping instances in separate accounts
- Provides better isolation than dedicated tenancy which only separates from other customers
- Best practice for workloads processing large quantities of data in parallel with strict isolation requirements
- Ensures instances are distributed across different racks with separate power and network sources
EC2 Instance Types
- Instance Type Selection for SAP Workloads:
- For SAP applications and databases with high memory utilization, use memory-optimized instance families for both tiers
- Memory-optimized instances are designed for workloads that process large datasets in memory
- More suitable than compute-optimized instances for SAP applications with high memory demands
- Superior to storage-optimized instances which are designed for high sequential read/write access rather than memory-intensive operations
- Better fit than HPC-optimized instances which focus on compute-bound applications with high network performance
- Memory-optimized instance families align with SAP's typical resource consumption patterns and high memory requirements
EC2 Instance Purchase Options
- Compute Savings Plans:
- Offer flexibility to change EC2 instance types and families while still reducing costs
- Provide reduced prices in exchange for commitment to a consistent amount of usage (measured in $/hour)
- Available in 1-year or 3-year terms
- Automatically apply to EC2 instance usage regardless of region, instance family, operating system, or tenancy
- Perfect for environments where compute needs change frequently (every 2-3 months)
- More flexible than Reserved Instances for organizations that need to change instance types regularly
- Offer better cost optimization than On-Demand instances for predictable workloads
- Allow for changes in instance type without losing the cost benefits of a committed usage plan
- Reserved Instances for 24/7 Workloads:
- For applications running continuously (24/7/365), EC2 Reserved Instances provide the most cost-effective solution
- Significantly less expensive than On-Demand Instances for predictable, constant workloads
- When combined with Aurora Reserved Instances for database layer, provides comprehensive cost optimization for always-on applications
- Better solution than Spot Instances which aren't suitable for applications requiring constant availability due to potential interruptions
- Most suitable for migrating legacy applications from on-premises that need to run continuously and have growing storage needs
- When combined with security groups that allow database access only from the application tier's security group, this architecture accommodates customers with dynamic IP addresses while ensuring the database is reachable only through the application tier
- Bastion Host Configuration:
- For Linux-based bastion hosts providing access to application instances in private subnets:
- Configure the bastion host security group to allow inbound SSH access only from the company's external IP range
- Configure application instances security group to allow inbound SSH access only from the private IP address of the bastion host
- This ensures only connections from known company locations can reach the bastion host
- Enhances security by limiting direct access to application instances in private subnets
- Auto Scaling with Mixed Instance Types:
- For optimizing costs without long-term commitments in applications with variable demand:
- Use a mix of On-Demand Instances and Spot Instances in Auto Scaling groups
- On-Demand Instances ensure baseline capacity without commitment
- Spot Instances provide cost savings for variable workloads (up to 90% discount)
- Better than using only On-Demand Instances which would be more expensive
- More flexible than Reserved Instances which require long-term commitments
- More reliable than using only Spot Instances which can be interrupted
Amazon EC2
- Instance Purchasing Options:
- For production environments running 24/7, Reserved Instances provide the most cost-effective option
- For development and test environments running at least 8 hours daily with periods of inactivity, On-Demand Instances offer the best balance of flexibility and cost
- Not recommended to use Spot Instances for production workloads that require constant availability
- Spot blocks (defined-duration Spot Instances) are less suitable than Reserved Instances for production due to potential interruptions
- License-Compatible Instance Options:
- For commercial off-the-shelf applications with socket/core licensing models:
- Use Dedicated Reserved Hosts
- Allows use of existing licenses that require specific physical infrastructure
- Provides host-level control for software that counts sockets and cores
- More cost-effective than Dedicated On-Demand Hosts for predictable workloads
- Better than Dedicated Instances which don't provide visibility into physical cores/sockets
- Ideal for software with licensing requirements tied to physical hardware characteristics
- Provides the most cost-effective option when combining existing licenses with reserved pricing
- EC2 Spot Instances for Batch Processing:
- Ideal for highly dynamic, stateless batch processing jobs
- Cost-effective solution (up to 90% discount compared to On-Demand)
- Suitable for workloads that can be interrupted without negative impact
- Perfect for jobs that take 60+ minutes to complete
- More appropriate than Reserved Instances for dynamic, interruptible workloads
- More cost-efficient than On-Demand Instances for stateless batch jobs
- Better than Lambda for jobs exceeding Lambda's 15-minute execution limit
- Target Tracking Scaling Policies:
- Dynamically adjust capacity to maintain a specific target metric (e.g., 40% CPU utilization); see the sketch after this list
- Automatically scales out when metric rises above target and scales in when below target
- Ensures optimal application performance across all instances
- More effective than simple scaling policies for maintaining specific metric targets
- Less complex than using Lambda functions to update Auto Scaling group capacity
- More dynamic than scheduled scaling actions for unpredictable workloads
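A minimal boto3 sketch of a target tracking scaling policy. The Auto Scaling group name and target value are hypothetical examples:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep average CPU utilization across the group at roughly 40%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="target-40-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 40.0,
    },
)
```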
- Detailed Monitoring:
- Enables metrics at 1-minute granularity (compared to 5-minute intervals with basic monitoring)
- Provides near real-time visibility into resource performance
- Essential for applications that experience rapid changes in utilization
- Enables more responsive and accurate Auto Scaling actions
- Perfect for analyzing application performance during anticipated traffic increases
- Meets requirements for monitoring with granularity of 2 minutes or less
- Expense Tracking Application Cost Optimization:
- For applications with increasing compute and storage costs:
- Purchase a Compute Savings Plan for the minimum number of necessary EC2 instances
- Use On-Demand Instances for peak scaling
- Set up S3 Lifecycle policies to archive raw images to lower-cost storage tiers after 30 days
- This approach balances cost savings with application performance
- More cost-effective than purchasing Savings Plans for maximum instance count
- Better than using Amazon EFS for storing large amounts of raw image data
- More suitable than reducing instance counts which could impact application performance
- Provides flexibility to handle both consistent baseline loads and variable traffic patterns
- Creates a comprehensive solution addressing both compute and storage cost optimization needs
- Legacy .NET Framework Application Migration:
- For migrating legacy .NET Framework applications to AWS without code changes:
- Use an Amazon Machine Image (AMI) to deploy EC2 instances running Windows Server
- Allows running legacy applications without modification
- More suitable than containerization with Migration Hub Orchestrator which requires application modifications
- Better than AWS Lambda which doesn't support full .NET Framework applications
- More appropriate than AWS Application Migration Service which doesn't convert applications to newer frameworks
- Provides the least administrative overhead while maintaining application compatibility
- Enables running Windows Server 2012 applications on newer Windows Server versions in AWS
- Secure S3 Access from Private Subnets:
- For EC2 instances in private subnets accessing S3 without traversing the public internet:
- Deploy an S3 gateway endpoint
- Ensures data travels only through the AWS private network
- Free to use (pay only for S3 requests and data transfer)
- More cost-effective than using NAT gateways
- Simpler than AWS Storage Gateway for basic S3 access
- More appropriate than S3 interface endpoints which incur additional costs
Storage Services
Amazon S3 (Simple Storage Service)
- Storage Classes:
- S3 Standard: Frequent access, high durability
- S3 Standard-IA: Infrequent access, cost savings for lower access frequency, still requires instant accessibility
- S3 One Zone-IA: Stores data in a single AZ, poses higher risk of data loss in the event of an AZ failure
- S3 Intelligent-Tiering:
- Designed for data with unknown, changing, or unpredictable access patterns
- Monitors access patterns and automatically moves objects that have not been accessed for 30 consecutive days to the infrequent access tier, and back to the frequent access tier when accessed again
- No retrieval fees when objects are accessed; charges a small monthly monitoring and automation fee per object
- Maintains S3 Standard durability and availability across multiple Availability Zones, so data remains resilient to the loss of an Availability Zone
- Ideal for long-lived data such as digital media files whose access patterns are difficult to predict; no performance impact or operational overhead when access patterns change
- Different from S3 Standard (which never moves data between tiers), more flexible than S3 Standard-IA (designed for data known to be infrequently accessed), and superior to S3 One Zone-IA (which lacks multi-AZ resilience)
- When access patterns cannot be predicted or controlled, use S3 Lifecycle rules to transition objects from S3 Standard to S3 Intelligent-Tiering for the most cost-effective storage solution
- S3 Glacier Instant Retrieval: For archive data that needs immediate access (retrieval in milliseconds)
- S3 Glacier Flexible Retrieval: For rarely accessed long-term data with retrieval times from minutes to hours
- S3 Glacier Deep Archive: Lowest-cost storage for long-term archiving with retrieval times within 12 hours
- Data Transfer:
- Snowball: Physical devices to transfer large data volumes offline
- S3 Transfer Acceleration:
- Recommended for global data ingestion from multiple continents, combined with multipart upload to speed transfers
- Turning on S3 Transfer Acceleration with multipart uploads is optimal for aggregating large data files (hundreds of GB) from global locations
- Leverages CloudFront's edge network to route data more efficiently to S3
- Provides faster transfer speeds without introducing operational complexity of multiple region buckets or physical transfer devices
- Most effective solution for aggregating data from global sites when minimizing operational complexity is a requirement
- Superior to using Snowball Edge or EC2-based solutions for regular transfers of moderate data volumes (e.g., 500GB per site)
- Encryption:
- Client-Side Encryption: Data is encrypted before upload (the client manages encryption)
- Server-Side Encryption (SSE-S3): Amazon S3 manages encryption keys
- Server-Side Encryption with AWS KMS (SSE-KMS): Uses AWS KMS for key management
- Server-Side Encryption with Customer-Provided Keys (SSE-C): Customer provides the encryption keys
- SSE-KMS with Automatic Key Rotation:
- Ideal for storing confidential data with encryption at rest requirements
- Provides automatic key rotation for yearly rotation requirements
- Includes detailed logging of key usage in CloudTrail for auditing purposes
- Most operationally efficient solution when compliance mandates encryption key logging
- Eliminates manual intervention needed for key rotation processes
- Automatically rotates the cryptographic material while maintaining the same CMK
- More efficient than SSE-C which requires customer-managed keys for each S3 object request
- Superior to SSE-S3 when detailed key usage auditing is required for compliance
- Better than manual KMS key rotation which introduces unnecessary operational overhead
- Static Website Hosting:
- You can host static websites on an S3 bucket
- Typically combined with Amazon CloudFront for edge caching
- Provides a scalable, cost-effective solution for hosting websites with minimal operational overhead
- Eliminates the need to maintain and patch web servers or content management systems
- S3 Access Points:
- Provide separate custom-hosted endpoints with distinct access policies
- Simplify managing access to shared datasets
- Multi-Region Access Points:
- Active-active S3 configuration with a single global endpoint
- Intelligent routing to the closest bucket for performance
- Supports S3 Cross-Region Replication for durability and failover
- S3 Storage Lens:
- Cloud-storage analytics feature for organization-wide visibility into object storage and activity
- Analyzes metrics to deliver contextual recommendations for optimizing storage costs
- Can identify buckets that don't have S3 Lifecycle rules to abort incomplete multipart uploads
- Helps identify cost-optimization opportunities and implement data-protection best practices
- Provides dashboards and metrics directly within AWS Management Console without custom configuration
- Versioning & MFA Delete:
- Enable versioning and MFA delete to add extra protection against accidental/malicious deletions of objects
- When an object is deleted from a versioned bucket, a delete marker is placed on the current version, while older versions remain, enabling recovery of the prior version
- Versioning in S3 keeps multiple variants of an object in the same bucket
- With versioning, you can preserve, retrieve, and restore every version of every object stored
- After versioning is enabled, if S3 receives multiple write requests for the same object simultaneously, it stores all of those objects
- MFA Delete requires the bucket owner to include two forms of authentication in any request to delete a version or change the versioning state
- MFA Delete provides additional authentication for: changing the versioning state of your bucket and permanently deleting an object version
- While bucket policies can restrict access based on conditions, they don't directly protect against accidental deletion like versioning and MFA Delete do
- Default encryption ensures objects are encrypted at rest but doesn't protect against deletion
- Lifecycle policies help manage objects and storage costs but don't protect against accidental deletion
- The combination of versioning and MFA Delete is a more effective protection strategy than bucket policies, default encryption, or lifecycle policies alone
- Versioning-enabled buckets allow you to recover from both unintended user actions and application failures
- Most effective combination for protecting sensitive audit documents and confidential information
- For confidential audit documents, combining versioning and MFA Delete provides strong protection against accidental or malicious deletion; when strict regulatory compliance is required, S3 Object Lock in compliance mode is the better option
- This approach directly addresses security concerns by making deletion of documents more secure, requiring a physical MFA device to confirm such actions
- Lifecycle Policies for Versioned Objects:
- Use S3 Lifecycle policies to automatically manage retention of specific numbers of object versions
- Can be configured to retain only a certain number of recent versions (e.g., two most recent versions) while deleting older versions
- More operationally efficient than Lambda functions or S3 Batch Operations for managing object versions
- Provides automated, rules-based version cleanup without manual intervention
- Significantly reduces storage costs in versioned buckets by removing unnecessary older versions
- S3 Object Lock:
- Enables write-once-read-many (WORM) protection for S3 objects
- Compliance mode prevents objects from being deleted or modified by anyone including root users
- Can set retention periods (e.g., 365 days) to meet regulatory requirements
- Ideal for medical data, financial records, and other information requiring immutability
- Ensures regulatory compliance for data that cannot be modified or deleted
- For Public Information Sharing: Can be combined with S3 static website hosting and a bucket policy allowing read-only access to create a secure way to share information with the public that cannot be modified or deleted before a specific date
- Applying to Existing Data:
- For meeting legal data retention requirements with minimal operational overhead:
- Turn on S3 Object Lock with compliance retention mode (see the sketch after this list)
- Set the retention period to match legal requirements (e.g., 7 years)
- Use S3 Batch Operations to apply these settings to existing data
- This approach is more efficient than manually recopying existing objects
- Governance mode is less suitable for strict legal requirements as it allows privileged users to override settings
- S3 Batch Operations significantly reduces operational overhead for applying retention settings to large datasets
- Compliance mode prevents deletion by any user, including administrators, until the retention period expires
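A minimal boto3 sketch of a compliance-mode default retention rule, assuming a hypothetical bucket that was created with Object Lock enabled:

```python
import boto3

s3 = boto3.client("s3")

# Object Lock must have been enabled when the bucket was created
# (ObjectLockEnabledForBucket=True); versioning is enabled automatically.
s3.put_object_lock_configuration(
    Bucket="legal-records",  # hypothetical bucket
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}},
    },
)
# Existing objects still need retention applied, for example via an S3 Batch
# Operations job that copies them in place with the new Object Lock settings.
```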
- Restricting Bucket Access to VPC:
- You can configure an S3 VPC Gateway Endpoint and apply a bucket policy restricting access to only your VPC or VPC endpoint
- This ensures all traffic stays within AWS without traversing the public internet
- Commonly uses conditions such as "aws:SourceVpc" or "aws:SourceVpce" in the bucket policy (see the sketch after this list)
- Provides a secure and reliable method for applications in private subnets to access S3 buckets
- Eliminates the need for NAT gateways or VPN for private access to S3
- Creates a more secure solution for file transfers between applications and storage services
- Enable EC2 instances in private subnets to use AWS services without internet access
- Bucket Policy with VPC Endpoint: Adding a bucket policy that restricts access to only traffic coming from the VPC endpoint ensures data never traverses the public internet
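A minimal boto3 sketch of a bucket policy that denies any request not arriving through the gateway endpoint. The bucket name and endpoint ID are hypothetical placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

bucket = "internal-data"            # hypothetical bucket
vpce_id = "vpce-0a1b2c3d4e5f67890"  # hypothetical S3 gateway endpoint ID

# Deny any access that does not arrive through the VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowOnlyFromVpcEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```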
- S3 Cross-Region Replication:
- Automatically and asynchronously copies objects across S3 buckets in different AWS Regions
- Most operationally efficient way to maintain copies of data in multiple regions
- Requires versioning to be enabled on both source and destination buckets
- Only replicates new objects created after replication is configured (existing objects must be copied manually)
- Can be configured at the bucket level or with specific prefix filters
- Provides a managed solution for regulatory compliance requiring geographic data redundancy
- More efficient than custom Lambda solutions for copying objects between regions
- Secure Content Distribution:
- For distributing copyrighted or sensitive content globally while restricting access by country, combine S3 with CloudFront geographic restrictions
- CloudFront's geographic restrictions feature can deny access to users in specific countries
- Use signed URLs to provide secure, time-limited access to authorized customers
- More effective for geographic restrictions than MFA and public bucket access
- More scalable than creating IAM users for each customer
- More efficient than deploying EC2 instances with ALBs in specific countries
- Provides both security and performance for global content delivery with granular access control
- Preventing Public Access:
- To ensure all S3 objects in an AWS account remain private:
- Enable the S3 Block Public Access feature at the account level (see the sketch after this list)
- Use AWS Organizations to create SCPs that prevent IAM users from changing this setting
- Apply the SCP to the account or organizational units
- More effective than monitoring solutions like GuardDuty or Trusted Advisor that require remediation
- Prevents accidental exposure with minimal administrative overhead
- More appropriate than using Resource Access Manager which is designed for sharing resources, not monitoring
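A minimal boto3 sketch of the account-level setting, assuming a hypothetical account ID; the SCP guarding the setting would be attached separately through AWS Organizations:

```python
import boto3

s3control = boto3.client("s3control")

account_id = "123456789012"  # hypothetical account

# Turn on all four Block Public Access settings for every bucket in the account.
s3control.put_public_access_block(
    AccountId=account_id,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
# An Organizations SCP denying s3:PutAccountPublicAccessBlock (except for admins)
# keeps IAM users from switching the setting back off.
```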
- Large-Scale Document Storage:
- Ideal for storing extremely large document repositories (e.g., 900+ TB of text documents)
- Provides highly available, reliable, and low-latency access at a significantly lower cost than alternatives
- Automatically scales to meet application demand without any provisioning required
- Perfect for web applications that need to provide access to large document collections
- Offers various storage classes to optimize costs based on access patterns
- More cost-effective than EBS for large-scale storage accessible across multiple instances
- More affordable than EFS for storing massive document repositories
- Superior to OpenSearch Service (Elasticsearch) when primary need is document storage rather than search functionality
- Allows for global distribution of content with minimal operational overhead
- S3 Storage Lens:
- Provides organization-wide visibility into S3 usage and activity metrics
- Can identify S3 buckets that are not versioning-enabled across all AWS Regions
- Aggregates metrics for multiple accounts, making it efficient for identifying buckets without versioning enabled
- More effective for versioning status monitoring than CloudTrail, IAM Access Analyzer, or Multi-Region Access Points
- Secure Upload Access for External Users:
- For allowing external users (like artists) to upload files to S3 without AWS credentials:
- Use an IAM role with upload permissions for the S3 bucket to generate presigned URLs
- Generate presigned URLs specific to each user's allowed S3 prefixes (see the sketch after this list)
- Provides temporary, limited access to specific objects or prefixes
- More secure than turning off block public access
- Better than enabling cross-origin resource sharing (CORS) alone which doesn't handle authentication
- More streamlined than creating a custom web interface which introduces additional complexity
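A minimal boto3 sketch of generating an upload presigned URL. The bucket and object key are hypothetical; the code runs under the role that has upload permissions, and only the URL is shared with the external user:

```python
import boto3

s3 = boto3.client("s3")

# The artist receives only the URL -- no AWS credentials are shared.
url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "artist-submissions", "Key": "artist-42/track01.wav"},
    ExpiresIn=3600,  # URL is valid for one hour
)
print(url)  # the artist uploads with a plain HTTP PUT to this URL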
- WORM Protection for Medical Results:
- For confidential medical results that must be retained for a minimum of 1 year:
- Configure S3 Object Lock in compliance mode with a 1-year retention period
- Use IAM policies to grant specific approved users permission to add new files
- Ensures write once, read many (WORM) protection while allowing approved users to add new files
- More effective than MFA Delete which doesn't enforce retention periods or provide WORM capability
- Superior to IAM roles alone which don't provide the same level of enforcement as compliance mode
- More robust than Lambda-based tracking solutions that introduce unnecessary complexity
- Directly addresses regulatory requirements for medical data retention and protection
- Meets requirements with minimal implementation effort compared to alternative approaches
- Secure S3 Access from Private Subnets:
- For EC2 instances in private subnets accessing S3 without traversing the public internet:
- Deploy an S3 gateway endpoint
- Ensures data travels only through the AWS private network
- Free to use (pay only for S3 requests and data transfer)
- More cost-effective than using NAT gateways
- Simpler than AWS Storage Gateway for basic S3 access
- More appropriate than S3 interface endpoints which incur additional costs
- Eliminates data transfer charges that would occur when accessing S3 via NAT gateway
- Improves security by keeping data traffic entirely within the AWS network
- Log File Retention Strategy:
- For applications generating 10+ TB of logs per month with 10-year retention requirements:
- Store logs in Amazon S3
- Use S3 Lifecycle policies to move logs older than 1 month to S3 Glacier Deep Archive (see the sketch after this list)
- Most cost-effective solution for long-term retention of rarely accessed logs
- More efficient than using AWS Backup for S3 object management
- Better than storing logs in CloudWatch Logs which would be significantly more expensive
- More appropriate than using CloudWatch with S3 Lifecycle policies which adds unnecessary complexity
- Provides immediate access to recent logs (past month) for troubleshooting
- Ensures compliance with long-term retention requirements while optimizing costs
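A minimal boto3 sketch of the lifecycle rule described above, assuming a hypothetical bucket and prefix:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="app-logs",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Move logs to Glacier Deep Archive after the first month...
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            # ...and delete them once the 10-year retention period has passed.
            "Expiration": {"Days": 3650},
        }]
    },
)
```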
- Object Lock for Regulatory Compliance:
- For meeting legal requirements to retain data for specific periods (e.g., 7 years):
- Turn on S3 Object Lock with compliance retention mode
- Set the retention period to match required duration (e.g., 7 years)
- Use S3 Batch Operations to apply these settings to existing data
- Compliance mode ensures no user (including administrators) can delete protected objects
- S3 Batch Operations significantly reduces operational overhead compared to manual recopy methods
- More effective than S3 Versioning with MFA Delete which doesn't enforce retention periods
- Superior to governance mode which allows privileged users to override retention settings
- Provides immutable storage with minimal operational overhead
- Lifecycle Rules for Mixed Data Types:
- For optimizing storage costs of different file types:
- Create S3 Lifecycle rules based on file extensions or prefixes
- Transition rarely accessed files (e.g., .csv files) to S3 Glacier after short periods (e.g., 1 day)
- Configure expiration rules for temporary files (e.g., image files) after their useful life (e.g., 30 days)
- Combines with serverless processing (Lambda) for comprehensive data management
- Optimizes storage costs by matching storage class to actual access patterns
- Better than keeping all data in S3 Standard regardless of access frequency
- More cost-effective than using S3 One Zone-IA or S3 Standard-IA for very rarely accessed data
- S3 Object Lock for Public Information Sharing:
- When combined with S3 Versioning enabled and configured for static website hosting, provides a secure way to share files with the public that cannot be modified or deleted
- Can set retention periods in accordance with designated future dates to prevent modifications or deletions
- Perfect solution for legal or compliance scenarios requiring immutable public records
- Setting an S3 bucket policy to allow read-only access ensures the public can view files without modifying them
- More secure and efficient than using IAM permissions for public access control
- Processing Data with PII Requirements:
- For processing data containing PII where only one application needs access to the full data:
- Store the data in an Amazon S3 bucket
- Use S3 Object Lambda to process and transform data before returning to applications
- Automatically removes PII for applications that shouldn't receive sensitive information
- Eliminates need to create and manage separate copies of data
- Ensures data consistency while applying transformations on-the-fly
- Minimizes operational overhead compared to managing multiple S3 buckets
- More efficient than proxy layers or custom transformation processes
Amazon FSx
FSx for NetApp ONTAP:
- Managed storage for SMB/NFS
- Multi-protocol access
- Supports cross-region replication with NetApp SnapMirror technology
- Provides a seamless way to replicate data across AWS Regions
- Maintains ability to access data using the same CIFS and NFS protocols as the primary region
- Offers least operational overhead for disaster recovery purposes
- Leverages built-in replication capabilities of FSx for ONTAP
- Ensures replicated data can be accessed using the same file-sharing protocols as in the primary region
- Ideal for DR solutions requiring minimal management complexity
FSx for Windows File Server:
- Uses SMB protocol for Windows-based file shares
- Recommended for Windows-based applications needing shared file systems across multiple AZs
- Provides a fully managed native Microsoft Windows file system
- Ideal for migrating Windows file shares to AWS while maintaining the same access methods
- Supports Multi-AZ configuration for high availability and durability
- Eliminates the need to manually synchronize data between EC2 instances
- Preserves the way users access files through Windows file shares
- Provides built-in integration with Microsoft Active Directory for authentication and access control
- Specifically designed for Microsoft Windows-based workloads like SharePoint that require Windows-native file system features
- Perfect for migrations of on-premises SharePoint deployments requiring shared file storage
- Superior to other storage options like EFS (which uses NFS protocol) for Windows workloads
- FSx for Windows File Server is the recommended storage solution for SharePoint in AWS, as it natively supports the SMB protocol and Active Directory integration required by SharePoint
- Compared to alternatives, it offers the best combination of performance, native Windows compatibility, and simplified management for SharePoint workloads
- For disaster recovery across regions, can be paired with AWS Backup to create and copy backups to secondary regions
- Works with AWS Backup Vault Lock to ensure replicated backups cannot be deleted for specified retention periods
- Multi-AZ deployments provide higher resilience than Single-AZ for business-critical applications
- When RPO of minutes is required, should use Multi-AZ deployment with frequent backups scheduled through AWS Backup
- Can replicate file systems to other regions to protect against regional outages or for disaster recovery
- Amazon FSx for Windows File Server:
- Specifically designed for Microsoft Windows-based workloads like SharePoint
- Provides native integration with Active Directory for access control
- Supports SMB protocol required for Windows shared file storage
- Offers high availability through automatic replication across Availability Zones
- More suitable than Amazon EFS (which uses NFS protocol) for Windows workloads
- Better than AWS Storage Gateway for dedicated SharePoint deployments
- Superior to S3 for Windows file system operations requiring native Windows features
- Preserving File Permissions:
- To migrate and consolidate Windows file servers while preserving permissions:
- Deploy AWS DataSync agents on-premises
- Schedule DataSync tasks to transfer data to FSx for Windows File Server
- Alternatively, order an AWS Snowcone device when available network bandwidth makes an online transfer impractical
- Launch DataSync agents on the Snowcone device for efficient transfer
- DataSync preserves NTFS permissions during migration
- More effective than using S3 as an intermediate storage
- Better than shipping drives or using Snowball Edge with S3 import
- Microsoft SharePoint Integration:
- Specifically designed to support Windows-based workloads including SharePoint
- Provides fully managed, highly reliable and scalable file storage
- Built-in integration with Microsoft Active Directory for managing file access permissions
- Supports the SMB protocol used by Windows for shared file storage
- Allows active-active deployments across multiple Availability Zones for high availability
- Superior to EFS (which uses NFS protocol) for Windows workloads
- Better than S3 (object storage) for file system operations required by SharePoint
- More suitable than Storage Gateway for SharePoint's specialized needs
Database Performance Optimization (Amazon ElastiCache):
- For multi-tier applications experiencing slowdowns due to repetitive database queries:
- Implement ElastiCache to cache frequently accessed, identical datasets
- Significantly reduces load on primary database servers
- Delivers in-memory performance for frequently accessed data
- More effective than read replicas for caching identical query results
- Better solution than SNS or Kinesis which aren't designed for caching database responses
- Ideal for ecommerce applications where the same product information is repeatedly requested
FSx for OpenZFS:
- For NFS-based Unix/Linux file workloads
- Does not support SMB
- Designed specifically for open-source ZFS file system compatibility
- Optimized for Linux and Unix environments requiring NFS access
- Provides high performance with microsecond latencies
- Offers snapshot capabilities for point-in-time data recovery
FSx for Lustre:
- Fully managed file system optimized for compute-intensive workloads
- Designed for high-performance computing (HPC), machine learning, and media processing
- Available in scratch and persistent deployment types:
- Scratch: For temporary storage and shorter-term processing, data not replicated
- Persistent: For longer-term storage and throughput-focused workloads, data replicated within AZ
- Can seamlessly process data directly from S3
- Provides high levels of throughput, IOPS, and sub-millisecond latencies
- Enables thousands of compute instances to access data simultaneously
- Tailored for workloads that demand both high sustained throughput and data durability
- Supports sub-millisecond latency for HPC environments
- Ideal for scenarios like weather forecasting companies processing hundreds of gigabytes of data with sub-millisecond latency
- Provides a high-performance file system for large-scale data processing with parallel access requirements
- Additionally, FSx for Lustre can serve on-premises data center workloads, such as gaming applications, that require Lustre clients to access HPC-level shared file systems while keeping the file system fully managed
- The only fully managed AWS storage service compatible with the Lustre client protocol
- Superior to AWS Storage Gateway and S3 File Gateway (which support NFS/SMB), EC2 Windows file shares (SMB), and EFS (NFS) when Lustre client support is specifically required
- Provides cost-effective, high-performance, scalable file storage for HPC workloads including data analytics and gaming applications
Amazon Elastic File System (EFS)
- Key Features:
- Fully managed NFS file system
- Grows/shrinks automatically
- Multi-AZ by default
- Excellent for Linux-based applications needing shared file access
- Throughput Modes:
- Bursting Throughput: Automatically scales throughput to accommodate spikes
- Provisioned Throughput: Allows specifying throughput levels independent of storage amount
- Document Storage for Multi-Instance Deployments:
- For web applications needing shared document access across multiple instances:
- Store documents in Amazon EFS instead of EBS volumes
- Mount the EFS file system to all EC2 instances
- Configure the application to save and retrieve documents from EFS
- This ensures all instances can access all documents simultaneously
- Solves issues with Application Load Balancer routing to instances with different document subsets
- More scalable than copying data between EBS volumes
- Better than configuring ALB for session stickiness which limits scalability
- Shared Storage for Application Clusters:
- For applications requiring concurrent read/write access to shared storage across multiple EC2 instances:
- Create an Amazon EFS file system and mount it from each EC2 instance
- Configure applications to use the shared file system for data storage
- Provides seamless, concurrent access to the same files from multiple instances
- Supports rapid read/write operations with high throughput capabilities
- Ideal for applications with hierarchical directory structures
- Scales on-demand without disrupting applications
- Better solution than using S3 for file system-like operations requiring low latency
- Superior to attaching a single EBS volume to multiple instances (not supported)
- More efficient than synchronizing data across multiple EBS volumes
Amazon Elastic Block Store (EBS)
- Key Features:
- Persistent block-level storage for use with EC2
- "EBS encryption by default" can be configured at the EC2 account attribute level
- Ensures all newly created EBS volumes are encrypted automatically
- Prevents creation of unencrypted volumes
- Volume Types:
- General Purpose SSD (gp3):
- Baseline performance of 3,000 IOPS and up to 16,000 IOPS
- Can provision IOPS independently of storage capacity
- Most cost-effective when specific IOPS requirements must be met
- Suitable for transactional workloads, virtual desktops, and medium-sized databases
- General Purpose SSD (gp2):
- IOPS performance linked directly to storage capacity
- Requires provisioning larger volume sizes to achieve higher IOPS
- Less cost-effective for specific IOPS requirements
- Provisioned IOPS SSD (io1):
- Designed for I/O-intensive database workloads
- For workloads requiring sustained IOPS performance above 16,000 IOPS
- More expensive than gp3 for similar performance levels
- Magnetic (Standard):
- Lowest per-GB cost but does not support provisioning of IOPS
- Best suited for workloads with infrequently accessed data
- Cannot meet high IOPS requirements reliably or cost-effectively
- Throughput Optimized HDD (st1):
- Designed for sequential read and write operations on large files, such as log processing
- Cost-effective solution for workloads requiring high throughput
- Maximum throughput of 500 MBps depending on volume size
- Ideal for applications that process large, sequential log files
- Better choice than Cold HDD (sc1) for workloads requiring sustained throughput of 500 MBps
- More cost-effective than General Purpose SSD (gp3) or Provisioned IOPS (io1) for sequential processing tasks
- Multi-Attach EBS Volumes:
- For allowing simultaneous write access to block storage from multiple instances:
- Use Provisioned IOPS SSD (io2) EBS volumes with Amazon EBS Multi-Attach feature
- Attach to multiple Nitro-based EC2 instances within the same Availability Zone
- Enables up to 16 Nitro-based EC2 instances to simultaneously access the same volume
- Improves application availability for clustered applications
- Not supported on General Purpose SSD (gp2/gp3) or Throughput Optimized HDD (st1) volumes
- Perfect for applications requiring simultaneous block-level storage access from multiple instances
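A rough boto3 sketch of the Multi-Attach pattern above; the Availability Zone, instance IDs, and device name are placeholders, and a cluster-aware file system is still required on top of the shared volume:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Multi-Attach is only supported on Provisioned IOPS SSD volumes.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # all attached instances must be in this AZ
    VolumeType="io2",
    Size=100,                        # GiB
    Iops=10000,
    MultiAttachEnabled=True,
)

# Attach the same volume to several Nitro-based instances (placeholder IDs).
for instance_id in ["i-0123456789abcdef0", "i-0fedcba9876543210"]:
    ec2.attach_volume(
        VolumeId=volume["VolumeId"],
        InstanceId=instance_id,
        Device="/dev/sdf",
    )
```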
- Snapshots:
- You can lock EBS snapshots to protect against accidental or malicious deletion
- Locked snapshots can't be deleted by any user regardless of their IAM permissions
- Snapshots can be stored in WORM (write-once-read-many) format for a specific duration
- Can continue to use a locked snapshot just like any other snapshot
- Lifecycle Management:
- Amazon Data Lifecycle Manager (DLM) automates creation, retention, and deletion of EBS snapshots
- Helps enforce regular backup schedules
- Creates standardized AMIs that can be refreshed at regular intervals
- Retains backups as required by auditors or internal compliance
- Reduces storage costs by deleting outdated backups
- Can create disaster recovery backup policies that back up data to isolated regions or accounts
- GP3 Volume Advantages:
- Allows provisioning disk performance (IOPS) independent of storage capacity
- Provides baseline performance of 3,000 IOPS at any volume size
- Can scale up to 16,000 IOPS with additional cost
- Most cost-effective option for workloads requiring up to 16,000 IOPS
- More efficient than io1/io2 volumes for similar performance requirements
- Better choice than GP2 when specific IOPS requirements must be met
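A short boto3 sketch showing gp3 IOPS and throughput being provisioned independently of volume size; the values and volume ID are illustrative only:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# gp3 lets you dial IOPS and throughput without resizing the volume.
ec2.create_volume(
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
    Size=200,           # GiB
    Iops=6000,          # above the 3,000 IOPS baseline, independent of size
    Throughput=250,     # MiB/s, also provisioned independently
)

# Later, raise IOPS in place instead of migrating to a larger volume:
# ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Iops=12000)
```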
- EBS Data Encryption at Rest:
- For ensuring all data written to EBS volumes is encrypted:
- Create EBS volumes as encrypted volumes when attaching to EC2 instances
- Uses transparent encryption/decryption process requiring no additional action from users
- Encrypts both data at rest on the volume and all snapshots created from it
- More effective than IAM roles which don't enforce encryption by themselves
- More direct than instance tagging which doesn't affect volume encryption state
- Simpler than KMS key policies which control access to keys but don't automatically enable encryption
AWS Transfer Family
- AS2 Protocol Support:
- Supports Applicability Statement 2 (AS2) protocol for secure file transfers
- Can be integrated with custom identity providers through AWS Lambda functions
- Provides secure and reliable transfer of data over the Internet
- More suitable for AS2 protocol requirements than DataSync, AppFlow, or Storage Gateway
- Particularly valuable for legacy systems requiring AS2 protocol for EDI transfers
- Enables application users to authenticate with company's own identity provider
- SFTP Solution with EFS:
- For implementing a serverless high-IOPS SFTP solution:
- Create an encrypted Amazon EFS volume
- Create an AWS Transfer Family SFTP service with elastic IP addresses and a VPC endpoint
- Attach a security group that allows only trusted IP addresses
- Attach the EFS volume to the SFTP service endpoint
- This provides a highly available SFTP service with integrated AWS authentication
- More suitable for high IOPS workloads than S3-based solutions
- Offers better throughput for file systems requiring high performance
- Allows maintaining control over user permissions with the same serverless benefits
- SFTP for S3 with Active Directory Authentication:
- For enabling customers to download S3 files using existing SFTP clients:
- Set up AWS Transfer Family with SFTP for Amazon S3
- Configure integrated Active Directory authentication
- Provides fully managed SFTP service that scales automatically with demand
- Leverages existing on-premises Active Directory for user authentication
- Requires no changes to customer applications that use SFTP clients
- More appropriate than AWS DMS or DataSync for file transfer workflows
- More operationally efficient than setting up and managing SFTP servers on EC2 instances
- Eliminates need to manage server infrastructure, patching, or scaling
- Immediate File Processing Patterns:
- For efficient processing of files received via FTP:
- Use AWS Transfer Family to create an FTP server storing files in Amazon S3 Standard
- Create Lambda functions triggered by S3 event notifications to process files
- Set up automatic deletion of files after successful processing
- Processes files as soon as they arrive without manual intervention
- More operationally efficient than storing on EBS volumes and processing nightly
- More appropriate than using S3 Glacier Flexible Retrieval for files requiring immediate processing
- Eliminates the need for separate FTP servers and processing infrastructure
- Perfect for companies transitioning from legacy batch processing to more responsive workflows
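A minimal sketch of the Lambda handler in the pattern above, assuming the function is subscribed to the bucket's s3:ObjectCreated notifications; the process step is a placeholder for the actual business logic:

```python
import boto3
import urllib.parse

s3 = boto3.client("s3")

def handler(event, context):
    # Each record corresponds to a file uploaded through the Transfer Family endpoint.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Fetch and process the file as soon as it arrives.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        process(body)  # placeholder for the real processing logic

        # Delete the object once it has been processed successfully.
        s3.delete_object(Bucket=bucket, Key=key)

def process(data: bytes) -> None:
    # Placeholder: parse, transform, or load the file contents here.
    print(f"processed {len(data)} bytes")
```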
AWS Storage Gateway
- AWS S3 File Gateway for Medical Research Data:
- Ideal for providing low-latency access to S3 data for on-premises file-based applications
- Deployed as a virtual machine on premises at each location
- Allows clinics to access data in S3 buckets as if it were a local file system
- Caches frequently accessed data to minimize latency
- Works with read-only permissions on S3 buckets to ensure data security and immutability
- More appropriate than DataSync which is primarily for moving data, not providing continuous access
- Better solution than Volume Gateway for file-based applications that need access to S3 data
- More suitable than EFS which cannot be directly attached to on-premises servers without Direct Connect or VPN
- S3 File Gateway for Hybrid Storage:
- Provides a seamless way to extend on-premises file storage to AWS
- Supports industry-standard file protocols like SMB and NFS
- Combines with S3 Lifecycle policies for automatic tiering of older data
- Perfect for organizations needing to reduce on-premises storage footprint
- Offers a solution for extending on-premises SMB file servers with cloud storage
- Enables continued low-latency access to recently created/accessed files
- Files stored in S3 can be automatically transitioned to lower-cost storage classes:
- After 7 days, files can be moved to S3 Glacier Deep Archive for cost savings
- Recently accessed files remain available with low latency
- Preserves existing file access methods and user experience
- Users continue to access files through familiar SMB protocols
- More suitable than DataSync which is primarily for migration rather than extending storage
- Superior to installing S3 utilities on end-user computers which changes user experience
- More comprehensive than FSx for Windows File Server for lifecycle management purposes
- Provides automatic lifecycle management capabilities missing from pure file server solutions
- Extremely effective for managing files with predictable access patterns (frequently accessed when new, rarely accessed when older)
- File Gateway Integration with Active Directory:
- Allows creation of SMB file shares with Active Directory integration
- Provides user authentication and file permissions using existing AD infrastructure
- Primarily used for secure connection between on-premises environments and AWS Cloud storage
- Not recommended for Microsoft SharePoint workloads requiring specialized Windows file storage
- S3 File Gateway for SMB File Servers:
- For file servers with increasing storage needs and files rarely accessed after 7 days:
- Create an Amazon S3 File Gateway to extend on-premises storage
- Configure S3 Lifecycle policy to transition files to S3 Glacier Deep Archive after 7 days
- Provides seamless extension of storage capacity into the cloud
- Offers familiar SMB access protocol for users
- Automatically manages frequently vs rarely accessed files with lifecycle policies
- More suitable than DataSync which is primarily for migration rather than ongoing storage extension
- Better than FSx for Windows File Server which doesn't provide automatic tiering to archive storage
- Avoids changing user experience compared to installing S3 utilities on client computers
- Virtual Tape Library (VTL) for Backup Migration:
- For eliminating physical backup tapes while preserving investments in backup applications:
- Deploy AWS Storage Gateway with the iSCSI-virtual tape library (VTL) interface
- Connect existing backup applications to the VTL interface
- Presents itself as a traditional tape library to backup software
- Eliminates costs associated with managing and storing physical tapes
- Simplifies backup infrastructure while maintaining compatibility with existing processes
- More suitable than NFS interfaces for physical tape replacement
- Better than Amazon EFS which doesn't provide tape library emulation
- Disaster Recovery for On-premises Storage:
- For implementing disaster recovery of on-premises iSCSI storage volumes:
- Provision AWS Storage Gateway Volume Gateway in stored volume configuration
- Mount the Volume Gateway stored volume to the existing file server using iSCSI
- Configure scheduled snapshots of the storage volume
- Allows entire dataset to remain on-premises for immediate, latency-free access
- Primary data storage remains on-premises with asynchronous backup to AWS
- Requires minimal modifications to existing infrastructure
- Leverages familiar iSCSI connection method
- Provides reliable recovery points through scheduled snapshots
- Recovery involves restoring a snapshot to an EBS volume attached to an EC2 instance
- More suitable than Volume Gateway cached volumes which primarily store data in AWS
AWS DataSync
- Primary Use:
- Automates data transfers between on-premises storage and AWS or between different AWS storage services
- Not for running data analysis jobs or containerized workloads
- Confirmed use of DataSync for efficient bulk data transfers
- Enhanced Data Transfer:
- Provides a secure and reliable method for transferring large volumes of data
- Can be used in conjunction with AWS Direct Connect for dedicated network connection
- Offers higher throughput and security compared to transfers over public internet
- Particularly suited for transferring instrumentation data (JSON files) to S3
- Minimizes risk of data exposure and provides dedicated network connection
- AWS DataSync for Windows File Server Migration:
- Ideal for migrating on-premises Windows file shares to FSx for Windows File Server
- Provides bandwidth throttling capabilities to minimize impact on shared network links
- Preserves file metadata and permissions during transfer
- Accelerates migration by maximizing available network bandwidth
- More suitable than Snowcone, AWS Transfer Family, or manual methods for network-based migrations
- Perfect for situations where controlling bandwidth usage is critical
- Continuous File Synchronization:
- For copying files between S3 buckets and EFS file systems continuously:
- Create DataSync locations for source and destination storage
- Configure tasks with transfer mode set to transfer only changed data
- Schedule tasks to run at appropriate intervals
- Provides automated data movement with minimal operational overhead
- Only copies changed files, optimizing transfer efficiency and reducing costs
- More efficient than Lambda functions mounting file systems or EC2-based solutions
- Includes automatic encryption and data integrity validation for secure transfers
- Supports AWS security mechanisms including IAM roles and VPC endpoints
- Can use a purpose-built network protocol and parallel, multi-threaded architecture
- Reduces operational costs with flat per-gigabyte pricing compared to custom scripts
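A rough boto3 sketch of the scheduled, changed-data-only task described above; the location ARNs, task name, and schedule are placeholders, and the source/destination locations are assumed to already exist:

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Source and destination (for example, an S3 bucket and an EFS file system) are assumed
# to have been registered already with create_location_s3 / create_location_efs.
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-source",
    DestinationLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-dest",
    Name="s3-to-efs-sync",
    Options={
        "TransferMode": "CHANGED",               # copy only data that differs
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # integrity check on transferred files
    },
    Schedule={"ScheduleExpression": "rate(1 hour)"},
)
print(task["TaskArn"])
```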
- Data Migration for Complex File Systems:
- For migrating intricate directory structures with millions of small files:
- Use AWS DataSync to migrate data to Amazon FSx for Windows File Server
- Preserves SMB-based file storage compatibility
- Maintains complex directory structures and unstructured data
- Provides better automation than AWS Direct Connect alone
- More suitable than Amazon FSx for Lustre which is optimized for high-performance workloads
- Better than AWS Storage Gateway volume gateway which is designed for hybrid cloud storage rather than full-scale migrations
- Handles permissions and metadata preservation during the migration process
- Offers efficient transfer with bandwidth throttling capabilities for minimal impact on production environments
AWS Backup
- Cross-Region Backup:
- Provides centralized, fully managed backup service across AWS services
- Can easily copy EC2 and RDS backups to a separate region
- Creates backup policies to automate backup tasks and enforce retention periods
- Ensures data is backed up to different regions for disaster recovery with minimal manual intervention
- Offers the least operational overhead compared to managing individual service backups
- More comprehensive than DLM for handling different AWS service backups
- Simpler than managing multiple manual snapshot processes and copies
- Can automate the entire backup workflow with far fewer manual steps than alternative approaches
- Ideal for long-term data retention requirements (e.g., 7 years) for compliance purposes
- More operationally efficient than custom scripts or manual backup processes for DynamoDB and other services
- The most operationally efficient solution for maintaining long-term data retention such as DynamoDB tables that must be kept for 7+ years
- Vault Lock:
- Offers two modes for protecting backups: governance mode and compliance mode
- Governance mode allows privileged users to delete or modify protected backups if needed
- Compliance mode enforces immutability where no user (including administrators) can delete protected backups
- When configured with a minimum retention period, ensures backups cannot be deleted until the period expires
- Critical for meeting regulatory requirements for immutable data retention
- Perfect for ensuring replicated backups (like FSx for Windows File Server) remain unmodified for specific durations
- Can be configured to retain backups for multiple years to meet long-term compliance requirements
- More restrictive than standard retention policies as it prevents override by administrators
- Especially valuable for financial and healthcare data that must be preserved unchanged for regulatory purposes
- Disaster Recovery Implementation:
- For Windows Server workloads on EC2 requiring cross-region disaster recovery with RPO of 24 hours:
- Create a backup vault using AWS Backup
- Create a backup plan for EC2 instances based on tag values
- Define the destination for the copy as another region (e.g., us-west-2)
- Specify the backup schedule to run twice daily
- Alternatively, create an EBS-backed AMI lifecycle policy (Amazon Data Lifecycle Manager) with:
- Backups based on tags
- Schedule to run twice daily
- Automatic copy to the destination region
- Both options provide automated cross-region backup with minimal administrative effort
- More efficient than manually copying images or using Lambda functions for orchestration
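A hedged boto3 sketch of the AWS Backup option above; the vault names, destination Region, IAM role, and tag values are placeholders:

```python
import boto3

backup = boto3.client("backup", region_name="us-east-1")

# Backup plan: run twice daily and copy each recovery point to a vault in another Region.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "windows-dr-plan",
        "Rules": [
            {
                "RuleName": "twice-daily",
                "TargetBackupVaultName": "primary-vault",
                "ScheduleExpression": "cron(0 0,12 * * ? *)",  # 00:00 and 12:00 UTC
                "Lifecycle": {"DeleteAfterDays": 35},
                "CopyActions": [
                    {"DestinationBackupVaultArn":
                     "arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault"}
                ],
            }
        ],
    }
)

# Select EC2 instances by tag so newly launched instances are picked up automatically.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "dr-tagged-instances",
        "IamRoleArn": "arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole",
        "ListOfTags": [
            {"ConditionType": "STRINGEQUALS", "ConditionKey": "backup", "ConditionValue": "dr"}
        ],
    },
)
```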
- Cross-Region EC2 and EBS Backup:
- For backing up EC2 instances with attached EBS volumes and enabling cross-region recovery:
- Create a backup plan in AWS Backup
- Configure nightly backups of application EBS volumes
- Set up cross-region copy to a secondary region
- More operationally efficient than custom Lambda-based snapshot solutions
- Provides fully managed, centralized backup service across multiple AWS resources
- Ensures recoverability in different AWS regions for disaster recovery
- Offers simpler management than copying snapshots manually or with custom scripts
- Vault Lock in Compliance Mode:
- Enforces regulatory compliance requirements that prevent deletion of backup data for specific durations
- Ensures backup files are immutable for the retention period specified
- Once enabled, cannot be disabled or modified, providing strong protection for regulated data
- More appropriate than vault lock in governance mode when strict immutability is required
- Essential for industries with regulatory requirements for backup retention
- Prevents any user (including administrators) from deleting protected backups
- Creates an air-tight barrier against accidental or malicious deletion
S3 Lifecycle Management
- IoT Data Management Strategy:
- For IoT applications generating trillions of objects annually:
- Use S3 Standard storage class for data less than 30 days old
- Create lifecycle policies to transition objects to S3 Standard-IA after 30 days
- Move data to S3 Glacier Deep Archive after 1 year for archival purposes
- This approach optimizes storage costs based on access patterns
- Provides appropriate performance for daily ML model retraining with recent data
- Supports periodic analysis (e.g., quarterly) with data up to 1 year old
- Ensures cost-effective long-term retention while maintaining appropriate access times for different use cases
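An illustrative boto3 sketch of the tiering policy above; the bucket name and key prefix are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# S3 Standard for the first 30 days, Standard-IA afterwards, Deep Archive after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-iot-telemetry",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-iot-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": "telemetry/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```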
- Long-term Data Archiving:
- For data requiring long-term retention (25+ years) with first 2 years needing immediate access:
- Keep data in S3 Standard for first 2 years
- Set up lifecycle policies to transition to S3 Glacier Deep Archive after 2 years
- Provides optimal balance between accessibility for recent data and cost-effective long-term storage
- Ensures high availability and immediate retrieval for frequently accessed recent data
- Significantly reduces storage costs for seldom-accessed older data
- Cost-Effective Storage Tiering for Large Datasets:
- For IoT or streaming data producing trillions of S3 objects yearly:
- Start with S3 Standard for initial 30 days of frequent access
- Transition to S3 Standard-IA after 30 days
- Move to S3 Glacier Deep Archive after 1 year for long-term archival
- This approach optimizes costs based on changing access patterns while maintaining availability requirements
- Perfect for data that requires immediate availability initially but becomes less frequently accessed over time
- DynamoDB Backup Plans for Compliance:
- For maintaining database backups with specific retention requirements:
- Create an AWS Backup plan to back up DynamoDB tables on a schedule (e.g., monthly)
- Configure lifecycle policy to transition backups to cold storage after specified period
- Set appropriate retention period (e.g., 7 years) for compliance requirements
- Provides centralized, fully managed backup service across AWS services
- Simplifies backup management and compliance adherence
- More efficient than developing custom scripts for backup management
- Enables automated lifecycle transitions to optimize storage costs
- Superior to on-demand backups without lifecycle management
Container and Kubernetes Services
Amazon ECS (Elastic Container Service)
- Launch Types:
- Fargate:
- Serverless compute engine for containers
- No need to provision or manage servers
- Ideal for cloud-native workloads needing minimal infrastructure management
- Cost-effective for running tasks that exceed the maximum execution duration limit of AWS Lambda functions (15 minutes)
- Only pay for the vCPU and memory resources your containerized application uses while it's running
- Suitable for data processing jobs that run once daily and can take up to 2 hours to complete
- Supports ECS Service Auto Scaling (target tracking on metrics like CPU utilization) to automatically adjust the number of running tasks based on demand, ensuring high availability under heavy load
- Fargate confirmed for its serverless efficiency and simplified management
- EC2:
- You manage and patch the underlying Amazon EC2 instances
- Allows more granular control over infrastructure
- Requires managing server infrastructure, including capacity, provisioning, and scaling
- Recommended when specialized AMIs or hardware is required
- ECS Anywhere:
- Extend ECS to on-premises hardware or other clouds
- Use an external launch type for hybrid container management
- Scheduling:
- Can schedule ECS tasks on a recurring basis with Amazon EventBridge (e.g., weekly batch jobs)
- Load Balancing:
- Typically use Application Load Balancer (ALB) for HTTP/HTTPS traffic
- Network Load Balancer (NLB) can be used for TCP/UDP pass-through
- Application Migration:
- Ideal for migrating monolithic applications to microservices
- Combined with ALB provides a scalable solution for breaking applications into smaller, independently managed services
- Minimizes operational overhead through managed container orchestration
- ALB ensures high availability and distributes traffic to containers based on demand
- Perfect solution for containerized workloads when breaking down monolithic applications
- Provides better container orchestration than EC2-based deployment for modernizing applications
- Allows preservation of front-end and back-end code while enabling decomposition into smaller components
- Reduces operational complexity when migrating from on-premises container environments
- Containerized Web Application Migration:
- When migrating on-premises containerized applications to AWS with minimal changes:
- Use AWS Fargate on Amazon ECS with Service Auto Scaling
- Configure an Application Load Balancer to distribute incoming requests
- Provides serverless compute engine for containers that eliminates need to provision and manage servers
- Automatically scales application in response to changing demand without manual intervention
- Requires minimal code changes as containers can use the same images from on-premises environment
- Significantly reduces operational overhead compared to managing EC2 instances for container hosting
- More suitable than Lambda for containerized applications as it doesn't require application redesign
- Better than HPC solutions which are designed for compute-intensive workloads rather than web applications
- Creates a highly available, scalable architecture with minimal development effort and operational overhead
- AWS Application Auto Scaling for Fargate:
- For optimizing costs while maintaining performance for ECS with Fargate:
- Use AWS Application Auto Scaling with target tracking policies
- Configure to scale based on CPU and memory usage metrics
- Set CloudWatch alarms to trigger scaling actions
- Automatically scales in when utilization decreases to reduce costs
- More appropriate than EC2 Auto Scaling for Fargate tasks
- More direct and easier to manage than custom Lambda-based scaling solutions
- Provides responsive scaling based on actual resource utilization rather than time-based scaling
- Windows Batch Jobs:
- Perfect for modernizing Windows batch jobs that currently run on-premises
- When combined with AWS Batch, provides fully managed infrastructure for batch processing
- Automates deployment, scheduling, and scaling for jobs running up to 1 hour
- Eliminates the need to manage EC2 instances in Auto Scaling groups
- More suitable than Lambda which doesn't natively support Windows workloads
- Less complex than EKS which requires Kubernetes cluster management
Amazon EKS (Elastic Kubernetes Service)
- Key Features:
- Managed service for Kubernetes clusters
- Lowers operational overhead for running upstream Kubernetes
- Integrates well with AWS networking, security, and scaling services
- EKS confirmed for providing managed Kubernetes with robust integration features
- Perfect for migrating containerized workloads from on-premises Kubernetes environments
- Allows migration without code changes or deployment method modifications
- When paired with AWS Fargate and Amazon DocumentDB (with MongoDB compatibility), provides the least disruptive path for migrating MongoDB-based containerized applications
- Fargate for EKS:
- Combining EKS with Fargate can minimize overhead while running containerized workloads, if using Kubernetes
- Amazon CloudWatch Container Insights:
- Purpose-built for collecting, aggregating, and summarizing metrics and logs from containerized applications
- Provides comprehensive monitoring for Amazon EKS, Amazon ECS, and Kubernetes on EC2
- Automatically collects detailed performance data at every layer of the container stack
- Offers a centralized view for monitoring container performance metrics and logs
- Specifically designed for monitoring microservices architectures in EKS clusters
- Integrates seamlessly with CloudWatch alarms and dashboards
- More tailored to containerized environments than standard CloudWatch agents
- Offers application-level insights without requiring custom instrumentation
- Provides more container-aware monitoring than CloudTrail or AWS App Mesh
- EKS Connector:
- Allows registering and connecting any conformant Kubernetes cluster to AWS
- Enables viewing all Kubernetes clusters (both AWS and on-premises) from a central location in the Amazon EKS console
- Provides a unified view of all connected clusters with minimal operational overhead
- More efficient than CloudWatch Container Insights or Systems Manager for centralized cluster management
- Purpose-built for viewing and connecting to Kubernetes clusters across different environments
- The most operationally efficient way to view all clusters and workloads from a central location
- Secrets Encryption:
- Create a new AWS KMS key and enable Amazon EKS KMS secrets encryption on the EKS cluster
- This ensures all secrets stored in the Kubernetes etcd key-value store are encrypted at rest
- More secure than using the default EKS configuration without encryption
- Required when handling sensitive information in Kubernetes secrets
- Amazon EBS CSI driver add-ons do not address etcd secret encryption requirements
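A minimal boto3 sketch of enabling envelope encryption for Kubernetes secrets on an existing cluster; the cluster name and KMS key ARN are placeholders:

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")

# Associate a KMS key with the cluster so secrets stored in etcd are encrypted at rest.
eks.associate_encryption_config(
    clusterName="example-cluster",
    encryptionConfig=[
        {
            "resources": ["secrets"],
            "provider": {"keyArn": "arn:aws:kms:us-east-1:111122223333:key/example-key-id"},
        }
    ],
)
```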
- Kubernetes Auto Scaling:
- For scaling EKS clusters based on workload with minimal operational overhead:
- Use Kubernetes Metrics Server to activate horizontal pod autoscaling
- Implement Kubernetes Cluster Autoscaler to manage node count
- Metrics Server collects resource metrics from Kubelets for use by autoscalers
- Cluster Autoscaler adjusts the number of nodes based on pending pods and resource utilization
- More efficient than using Lambda functions, API Gateway, or App Mesh for scaling
Networking and Content Delivery
VPC CIDR Selection
- For VPC Peering Connections:
- When creating a new VPC to peer with an existing VPC (e.g., 192.168.0.0/24), select a non-overlapping CIDR block
- A CIDR block like 10.0.1.0/24 is appropriate as it doesn't overlap with 192.168.0.0/24
- Avoid using the same CIDR block (e.g., 192.168.0.0/24) as the existing VPC
- Don't use /32 subnet masks (e.g., 10.0.1.0/32 or 192.168.1.0/32) as they represent single IP addresses, not usable ranges
- Ensure the CIDR block is large enough to support your VPC infrastructure needs
- Select the smallest viable CIDR range to conserve IP address space within your organization
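A small boto3 sketch of the non-overlapping CIDR guidance above; the peer VPC ID is a placeholder, and routes to the peer CIDR must still be added to each VPC's route tables:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# New VPC uses 10.0.1.0/24, which does not overlap the existing 192.168.0.0/24 VPC.
new_vpc = ec2.create_vpc(CidrBlock="10.0.1.0/24")

# Request a peering connection with the existing VPC (placeholder ID).
peering = ec2.create_vpc_peering_connection(
    VpcId=new_vpc["Vpc"]["VpcId"],
    PeerVpcId="vpc-0existing1234567890",
)

# The owner of the peer VPC (here, the same account) accepts the request.
ec2.accept_vpc_peering_connection(
    VpcPeeringConnectionId=peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
)
```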
Network Load Balancer for UDP Applications
- Gaming Applications with UDP Traffic:
- For running gaming applications that transmit data using UDP packets in Auto Scaling groups:
- Attach a Network Load Balancer (NLB) to the Auto Scaling group
- NLB supports UDP traffic, which is critical for many gaming applications
- Enables the application to scale out and in as traffic fluctuates
- Provides high throughput with ultra-low latencies for UDP traffic
- Not possible with Application Load Balancer which only supports HTTP/HTTPS traffic
- More effective than Route 53 weighted policy routing which doesn't provide dynamic scaling
- Better solution than NAT instances with port forwarding which introduce complexity and bottlenecks
Amazon API Gateway
- Key Features:
- Provides a fully managed service for creating, publishing, maintaining, monitoring, and securing APIs
- Can integrate with various AWS services including Lambda, DynamoDB, and Kinesis
- Supports REST and WebSocket APIs
- Offers features like request throttling, API keys, and monitoring
- Handles API versioning and different stages (dev, test, prod)
- Provides a scalable and elastic solution for handling increased inquiries during peak periods like holiday seasons
- When combined with Lambda, automatically scales execution in response to incoming requests without manual intervention
- Modernizing Multi-Tier Applications:
- For migrating on-premises applications to AWS:
- Use API Gateway to direct transactions to AWS Lambda functions as the application layer
- Use Amazon SQS as the communication tier between services
- This provides a highly operationally efficient and modern solution by leveraging serverless technologies
- Eliminates the need to manage servers or instances
- Prevents dropped transactions between tiers by decoupling components
- More effective than simply increasing EC2 instance sizes or using EC2 with Auto Scaling
- Real-time Data Ingestion:
- Configure API Gateway to send data to an Amazon Kinesis data stream
- Create an Amazon Kinesis Data Firehose delivery stream using the Kinesis data stream as a data source
- Use AWS Lambda functions to transform the data in transit
- Direct the Kinesis Data Firehose delivery stream to send data to Amazon S3
- Provides fully managed, real-time data processing with minimal operational overhead
- Lambda Authorizers:
- Provide custom authorization for API Gateway endpoints
- Enable authorization through a Lambda function
- Can validate tokens or perform complex authorization logic
- Particularly effective for applications with unpredictable traffic patterns
- When combined with Kinesis Data Firehose, creates a scalable solution for capturing and storing customer activity data
- More secure and flexible than authorization at load balancer or network level
- Ideal for web applications that need to capture analytics data with proper authorization
- Custom Domain Names:
- For providing individual secure URLs for multiple customers:
- Register a domain in a registrar
- Create a wildcard custom domain in Route 53 with a record pointing to API Gateway
- Request a wildcard certificate in AWS Certificate Manager in the same region as API Gateway
- Import the certificate into API Gateway and create a custom domain name
- This provides the most operationally efficient way to manage individual secure customer URLs
- Creating separate hosted zones or API endpoints for each customer introduces unnecessary complexity
- Accessing Private VPC Services:
- To expose REST APIs that access backend services in private subnets:
- Design a REST API using Amazon API Gateway
- Host the backend application in Amazon ECS in a private subnet
- Create a private VPC link for API Gateway to access the ECS services
- More secure than using security groups to connect API Gateway to ECS
- Provides private connectivity without exposing backend services to the internet
- Supports REST API requirements better than WebSocket APIs for standard request-response patterns
- Subscription Management:
- For restricting access to premium content with minimal operational overhead:
- Implement API usage plans and API keys in API Gateway
- Assign API keys to subscribed users
- Create usage plans that specify access levels for premium content
- More efficient than implementing custom authorization logic
- Simpler than using AWS WAF rules for subscription filtering
- More appropriate than fine-grained IAM permissions at the database level
- Leverages built-in API Gateway features for access control
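A hedged boto3 sketch of the usage-plan approach above for a REST API; the API ID, stage, and names are placeholders, and the premium methods themselves must also have "API key required" enabled:

```python
import boto3

apigw = boto3.client("apigateway", region_name="us-east-1")

# Usage plan scoped to the API stage that serves premium content.
plan = apigw.create_usage_plan(
    name="premium-subscribers",
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],
    throttle={"rateLimit": 100.0, "burstLimit": 200},
)

# API key issued to a subscribed user.
key = apigw.create_api_key(name="subscriber-1234", enabled=True)

# Associate the key with the usage plan so only subscribers can call premium routes.
apigw.create_usage_plan_key(
    usagePlanId=plan["id"],
    keyId=key["id"],
    keyType="API_KEY",
)
```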
- REST API with Lambda Integration:
- For scalable tax computation services with variable traffic:
- Design a REST API using API Gateway to accept parameters like item prices
- Configure Lambda integration to perform tax calculations when requests are received
- Set appropriate throttling and scaling policies to handle traffic spikes
- Provides serverless architecture that minimizes operational overhead
- Automatically scales to handle holiday season traffic spikes without manual intervention
- More scalable and elastic than single EC2 instances or fixed-size instance pools
- Eliminates the need to manage and patch underlying infrastructure
AWS PrivateLink
- Overview:
- Private connectivity between VPCs and AWS services or SaaS providers
- Traffic remains on the AWS global network, avoiding public internet exposure
- Eliminates the need for NAT gateways or VPN for private access to supported services
- Interface Endpoints:
- Enable private connections between a VPC and AWS services
- Use private IP addresses within your VPC to access services
- When combined with AWS Direct Connect, ensures data from on-premises to AWS doesn't traverse the public internet
- VPC Endpoints:
- Allow applications to access S3 buckets through a private network path within AWS
- Enable EC2 instances in private subnets to use AWS services without internet access
- Create a more secure solution for file transfers between applications and storage services
- Provide better security compared to using NAT gateways for S3 access
- Significantly reduce data transfer costs when accessing S3 from within the same AWS Region
- When properly configured with bucket policies using the aws:SourceVpce condition, ensures data never traverses the public internet
- Perfect solution for applications that process sensitive information from S3 when compliance requirements prohibit internet transit
- Eliminate data transfer fees between EC2 instances and S3 within the same region when using gateway endpoints
- Can be combined with bucket policies to restrict S3 access to only traffic coming from specific VPC endpoints
- More cost-effective than using NAT gateways or internet gateways for S3 access from private subnets
- Gateway VPC endpoints provide private connectivity to S3 without requiring internet access
- Perfect solution for EC2 instances that need to access S3 without internet connectivity
- Enables applications to process S3 data securely within a VPC's private network
- Gateway endpoints appear as a target in your route tables
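A hedged sketch of the gateway endpoint plus aws:SourceVpce bucket policy described above; the VPC ID, route table ID, and bucket name are placeholders:

```python
import json
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3")

# Gateway endpoint for S3; S3 routes are added to the listed route tables automatically.
endpoint = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
endpoint_id = endpoint["VpcEndpoint"]["VpcEndpointId"]

# Bucket policy that denies any access not arriving through this endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-sensitive-bucket",
                "arn:aws:s3:::example-sensitive-bucket/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": endpoint_id}},
        }
    ],
}
s3.put_bucket_policy(Bucket="example-sensitive-bucket", Policy=json.dumps(policy))
```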
- Gateway Endpoints:
- Provide private connectivity to services like S3 and DynamoDB
- Appear as a target in your route tables
- Enable applications to access AWS services without going through the internet
- The most direct and secure solution for accessing S3 from private subnets
- Keep traffic within AWS network without traversing the internet
- No additional cost for using gateway endpoints
- Interface Endpoints (AWS PrivateLink):
- Enable private connectivity to services using private IP addresses
- Powered by AWS PrivateLink
- Require an Elastic Network Interface with a private IP address in your subnet
- Security Benefits:
- VPC endpoint for S3 enables EC2 instances in private subnets to access S3 through a private network path
- Removes need for internet gateway, NAT device, VPN connection, or Direct Connect
- Provides most secure connection solution for medical record applications requiring private data transfer
- Eliminates exposure to the public internet for sensitive file transfers
- More secure than using NAT gateways which still route traffic over the internet
- More cost-effective and appropriate than Direct Connect for internal AWS service communications
- Simple to implement by moving instances to private subnets and creating the VPC endpoint
AWS Transit Gateway
- Key Features:
- Hub-and-spoke model for connecting multiple VPCs and on-premises networks
- Reduces the need for numerous VPC peering connections
- Ideal for large multi-VPC or multi-account architectures
- Simplifies routing and management at scale
- Confirmed as an effective solution for centralizing network management
- Multi-Region VPC Communication:
- For connecting VPCs across all regions with minimal administrative effort:
- Use AWS Transit Gateway for VPC communication within a single region
- Implement Transit Gateway peering across regions for cross-region communication
- This provides centralized management of interconnections
- Significantly reduces complexity compared to managing individual VPC peering connections
- More scalable than VPC peering for environments with many VPCs
- More efficient than using Direct Connect gateways or AWS PrivateLink for VPC-to-VPC connectivity
AWS Direct Connect
- Redundancy Best Practices:
- For mission-critical workloads, use multiple connections at separate locations/devices
- Ensure path and device redundancy from each data center (to mitigate single points of failure)
- Validated the importance of multiple connections for high availability
- Data Transfer Benefits:
- Provides a dedicated network connection from on-premises to AWS
- Offers more reliable and consistent network experience than internet-based connections
- Reduces bandwidth limitations and improves performance for large data transfers
- Ideal for transferring time-sensitive data to Amazon S3
- Separates backup traffic from regular internet usage to minimize impact on users
- Cost Optimization Options:
- For connections with low utilization (e.g., less than 10% usage of a 1 Gbps connection), consider a hosted connection with lower capacity
- Contact an AWS Direct Connect Partner to order a hosted connection (e.g., 200 Mbps) tailored to actual usage requirements
- Hosted connections provide the same security benefits as dedicated connections but at lower costs when full capacity isn't needed
- More cost-effective than maintaining a full 1 Gbps connection when only a fraction of the capacity is utilized
- Better than connection sharing, which doesn't fundamentally address low utilization issues
- Data Transfer Cost Optimization:
- For minimizing data transfer egress costs when accessing data warehouse from corporate offices:
- Host visualization tools in the same AWS Region as the data warehouse
- Access the visualization tool over Direct Connect from corporate locations
- Process queries (e.g., 50 MB results) within AWS network
- Only transfer processed visualization data (e.g., 500 KB per webpage) over Direct Connect
- Significantly reduces egress costs by processing data within AWS before transferring
- More cost-effective than querying data warehouse directly over the internet
- Better than hosting visualization tools on-premises which would require transferring large query results
- More efficient than accessing AWS-hosted visualization tools over the internet
- Direct Connect with Transit Gateway:
- For connecting on-premises data centers to multiple VPCs:
- Set up Direct Connect from on-premises to AWS
- Create a transit gateway and attach each VPC
- Establish connectivity between Direct Connect and transit gateway
- Creates a hub-and-spoke network architecture with minimal management overhead
- More efficient than setting up individual Direct Connect connections to each VPC
- Enables secure private connection without traffic traversing the public internet
- Simplifies network management for organizations with many VPCs
- Provides consistent connectivity across the entire AWS environment
- Reduces operational complexity compared to multiple peering connections
AWS Global Accelerator
- Key Features:
- Improves availability and performance of applications for users worldwide
- Provides static IP addresses as fixed entry points to applications
- Directs traffic to optimal endpoints based on health, geographic location, and routing policies
- Offers automated failover across AWS regions
- Routes traffic to the nearest healthy application endpoint
- Bypasses DNS caching issues that can lead to outdated routing
- Particularly useful for applications requiring real-time communication (e.g., VoIP)
- Gaming Applications:
- Ideal for TCP and UDP multiplayer gaming capabilities that require low latency
- Can be placed in front of Network Load Balancers to improve global performance
- Routes traffic to the nearest AWS edge location and then to application endpoints over the AWS global network
- Superior to CloudFront for applications requiring both TCP and UDP support
- Unlike Application Load Balancers, supports the UDP protocol essential for real-time gaming communication
- More effective than API Gateway for reducing latency in multiplayer gaming architectures
- UDP Application Performance:
- For UDP-based gaming applications requiring global performance optimization:
- Configure Network Load Balancers in multiple regions pointing to on-premises endpoints
- Create an AWS Global Accelerator and register the NLBs as endpoints
- Use a CNAME record pointing to the accelerator DNS name
- Provides improved performance by routing traffic over AWS global network
- Essential for UDP applications since CloudFront doesn't support UDP traffic
- Better than Application Load Balancers which don't support UDP protocols
- Perfect for applications that must remain on-premises but need global performance
- Specifically designed for applications requiring low latency, edge location routing, and static IP addresses
- Best solution for modified Linux kernels that only support UDP-based traffic
- Global Accelerator Protection:
- For protecting self-managed DNS services running on EC2 instances with Global Accelerator:
- Subscribe to AWS Shield Advanced
- Add the Global Accelerator as a protected resource (not the EC2 instances)
- Provides enhanced protection against DDoS attacks at the network and application layers
- More effective than protecting individual EC2 instances since Global Accelerator is the entry point
- Superior to WAF web ACLs with rate-limiting rules for comprehensive DDoS protection
- Offers the most complete protection for Global Accelerator endpoints distributing traffic across regions
- Optimizing UDP Traffic:
- Provides static IP addresses for application endpoints across AWS regions
- Routes traffic to the nearest edge location for improved performance
- Works well with Network Load Balancer for applications supporting only UDP traffic
- Especially valuable for gaming applications requiring low latency
- Superior to Route 53 with ALB for UDP traffic (ALBs don't support UDP)
- Better than CloudFront for non-HTTP traffic
- More suitable than API Gateway which primarily handles HTTP/S-based traffic
Amazon Route 53
- Multi-Value Routing Policy:
- Returns IP addresses of all healthy EC2 instances in response to DNS queries
- When used with health checks, only returns addresses for healthy instances
- Provides a form of basic DNS-based load balancing
- More suitable than Simple routing policy which returns only one randomly selected value
- Different from Latency routing policy which routes based on network performance
- Not the same as Geolocation routing policy which routes based on user geographic location
- Perfect for applications requiring clients to receive addresses of all healthy backend instances
- Can return up to 8 healthy records in response to each DNS query
- Particularly valuable for distributing traffic across multiple endpoints (e.g., a set of seven EC2 instances)
- Different from using an ELB as it returns multiple IP addresses directly to the client for client-side load balancing
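A rough boto3 sketch of multivalue answer records tied to health checks; the hosted zone ID, record name, IP addresses, and health check IDs are placeholders:

```python
import boto3

route53 = boto3.client("route53")

# One record per instance, each tied to a health check; only healthy IPs are returned.
instances = [
    ("web-1", "203.0.113.10", "hc-id-1"),
    ("web-2", "203.0.113.11", "hc-id-2"),
]
changes = []
for set_id, ip, health_check_id in instances:
    changes.append({
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "A",
            "TTL": 60,
            "SetIdentifier": set_id,
            "MultiValueAnswer": True,
            "HealthCheckId": health_check_id,
            "ResourceRecords": [{"Value": ip}],
        },
    })

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",
    ChangeBatch={"Changes": changes},
)
```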
- Migrating from Another DNS Provider:
- To migrate from a DNS provider experiencing outages to Route 53:
- Create an Amazon Route 53 public hosted zone for the domain name
- Import the zone file containing the domain records hosted by the previous provider
- This approach ensures reliable DNS hosting with high availability
- More suitable than private hosted zones which are designed for use within specific VPCs
- Better than using AWS Directory Service for Microsoft Active Directory which is for directory services, not public DNS hosting
- More appropriate than Route 53 Resolver inbound endpoints which are for forwarding DNS queries between on-premises networks and VPCs
Amazon CloudFront
- Key Features:
- Content Delivery Network (CDN) for caching and global distribution
- Low latency, high transfer speeds for delivering content to users
- Integrates with Amazon S3 for static website hosting
- Can integrate with AWS WAF for security at the edge
- Provides geographic restrictions (geo-blocking) to prevent users in specific locations from accessing content
- Helps comply with content distribution rights by restricting access based on the user's location
- Field-Level Encryption:
- Adds an additional security layer beyond HTTPS for protecting sensitive user information
- Encrypts specified form fields at the edge location before forwarding to the origin
- Ensures sensitive data remains encrypted throughout the application stack
- Only applications with the correct decryption keys can access the protected data
- Can specify up to 10 data fields in POST requests to be encrypted
- Reduces risk of data breaches by keeping sensitive information encrypted even when moving through internal systems
- More secure than relying solely on HTTPS which only protects data in transit
- Provides more granular protection than signed URLs or cookies which control access to content rather than encrypting specific data
- Can use different public keys for encrypting different data fields for additional security separation
- Signed URLs/Cookies:
- Restrict access to private content by requiring a signed URL or cookie (often with an expiration time)
- Commonly used to securely serve content to authorized users (e.g., subscriber-only videos or downloads)
- For high-speed, secure global distribution, use CloudFront with S3. S3 Transfer Acceleration can also help for uploads from remote locations
- DDoS Protection:
- Caches content at edge locations around the world, absorbing traffic before it reaches origin servers
- Helps mitigate DDoS attacks by distributing traffic across multiple edge locations
- Integrates with AWS Shield (including standard protection at no extra cost)
- Provides a global distribution network for both static and dynamic website content
- When configured for both static and dynamic content, absorbs DDoS traffic at edge locations
- Prevents attack traffic from reaching origin servers directly
- Distributes traffic across globally distributed edge locations
- Includes integration with standard AWS Shield at no additional cost
- Particularly effective for Windows-based web server environments
- Can protect both static assets and dynamic application content
- Creates a buffer zone between internet users and origin infrastructure
- When combined with Shield Advanced, creates comprehensive DDoS protection solution
- More effective than attempting to block attacker IPs with network ACLs
- Better than relying on Auto Scaling alone which could lead to excessive costs during attacks
- Content Delivery Optimization:
- Ideal for caching and delivering static content like HTML pages to global audiences
- Provides efficient and effective solution for delivering media files stored in S3 globally
- Caches content at edge locations nearest to end users for maximum performance
- Can handle millions of views from around the world without impacting origin S3 buckets
- More effective for global content delivery than presigned URLs, cross-region replication, or Route 53 geoproximity
- Securely delivers confidential media files with options for signed URLs/cookies for protected content
- Offers significant performance advantages over accessing S3 buckets directly
- Essential for serving static content like event reports that will receive millions of global views
- The most efficient solution for globally distributing S3-hosted static websites with high traffic volume
- Perfect for serving daily static HTML reports expected to generate millions of views from users around the world
- Eliminates the need for customers to maintain their own content distribution infrastructure while providing global reach
- Significantly reduces origin load by caching content at edge locations, improving performance for both static and dynamic content
- Cost Optimization for Static Content:
- Most cost-effective solution for reducing load on EC2 instances serving static website content
- Caches static files at edge locations closest to users, reducing origin server load
- Automatically scales to handle increasing website traffic without proportional cost increases
- Decreases content delivery latency while reducing data transfer costs from EC2 instances
- More suitable than ElastiCache, WAF, or multi-region ALB deployments for static content delivery
- Provides global content distribution with minimal operational overhead and infrastructure costs
- Cache Invalidation for Content Updates:
- When updates to static websites hosted in S3 are not appearing:
- Invalidate the CloudFront cache to force retrieval of the latest content
- Ensures updates from CI/CD pipelines are immediately visible to users
- More appropriate than adding Application Load Balancers for static content delivery
- More effective than adding ElastiCache which addresses database performance, not content delivery
- Better solution than modifying SSL certificates which don't affect content caching
- Essential operation when content updates must be immediately available to all users
- Can be performed through the console, API, or AWS CLI
- Typical request syntax: aws cloudfront create-invalidation --distribution-id DISTRIBUTION_ID --paths "/*"
- Most efficient approach when specific paths or files need to be immediately updated
- More targeted than waiting for TTL expiration which could result in stale content being served
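The same invalidation as the CLI example above, sketched with boto3; the distribution ID is a placeholder:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate every cached path so the latest S3 content is fetched on the next request.
cloudfront.create_invalidation(
    DistributionId="E1234567890ABC",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # must be unique per invalidation request
    },
)
```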
- Downloadable Content Distribution:
- Ideal for websites that provide downloadable historical reports or large files
- Combined with S3, provides a highly scalable, durable, and globally available solution
- Offers the fastest possible response times by serving content from edge locations closest to users
- More cost-effective than EC2 or Lambda-based solutions for serving static content
- Requires minimal infrastructure management as both services scale automatically
- Particularly effective for websites with unpredictable global traffic patterns
- Works seamlessly with S3 to deliver high-performance downloads to users worldwide
- Edge caching significantly reduces origin load and improves download speeds
- Optimizes bandwidth usage and reduces latency for global user base
- Provides better global performance than any single-region solution could offer
- Securing S3 Content:
- Create an origin access identity (OAI) and assign it to the CloudFront distribution
- Configure S3 bucket permissions to grant read permission only to the OAI
- Blocks direct access to files via S3 URLs while allowing access through CloudFront
- More effective than writing individual bucket policies
- Better than creating IAM users for CloudFront access
- More accurate than setting the CloudFront distribution ID as Principal in bucket policies
- Static Website Optimization:
- Caching static content at edge locations significantly reduces load on origin EC2 instances
- Improves performance by decreasing content delivery times to end users
- More cost-effective than adding more EC2 instances to handle increased static content requests
- Particularly valuable for applications with unpredictable usage spikes
- Reduces origin server load by serving cached content from edge locations
- Creates better user experience with lower latency content delivery
- Provides built-in DDoS protection as part of AWS's global infrastructure
Elastic Load Balancing
- Types of Load Balancers:
- Application Load Balancer (ALB): For HTTP/HTTPS (layer 7) traffic routing with advanced rule-based routing
- Network Load Balancer (NLB): For TCP/UDP (layer 4) traffic routing
- Supports extremely high throughput (millions of requests per second)
- Provides ultra-low latencies (ideal for real-time or gaming applications using UDP)
- Efficiently distributes UDP traffic across multiple targets
- Gateway Load Balancer: For deploying and managing third-party virtual network appliances
- Classic Load Balancer: Legacy load balancer (Elastic Load Balancing version 1; not recommended for new designs)
- Application Load Balancer Features:
- Distributes incoming traffic across multiple targets (e.g., EC2 instances, containers)
- Monitors the health of targets and only routes traffic to healthy ones
- Supports multi-AZ deployments for high availability
- Can integrate with Auto Scaling groups to scale the target fleet as traffic changes
- Designed to work with HTTP and HTTPS traffic, allowing for advanced routing decisions
- Can perform health checks based on HTTP response content to detect application-level errors
- Automatically removes unhealthy instances from the target group
- When combined with Auto Scaling, can replace unhealthy instances to maintain application availability
- Application Load Balancer for HTTP Error Detection:
- When an HTTP application behind a Network Load Balancer (NLB) experiences undetected HTTP errors:
- Replace the NLB with an Application Load Balancer (ALB)
- Enable HTTP health checks by supplying the application's health URL
- Configure an Auto Scaling action to replace unhealthy instances automatically
- ALBs are designed specifically for HTTP/HTTPS traffic with content-based health checks
- NLBs operate at Layer 4 (transport) and cannot detect application-level HTTP errors
- ALBs operate at Layer 7 (application) and can evaluate HTTP response content
- This approach improves application availability without requiring custom scripts or code
- More effective than adding cron jobs to instances which would introduce operational complexity
- Superior to CloudWatch alarms monitoring UnHealthyHostCount which might not capture HTTP-specific errors
- Allows automatic replacement of instances experiencing HTTP errors without manual intervention
- Creates a self-healing application architecture that maintains high availability
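A minimal sketch of the replacement ALB's target group with an HTTP health check; the VPC ID, health path, and names are hypothetical placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Target group whose health check inspects the application's /health endpoint, so
# instances returning HTTP errors are marked unhealthy and replaced by Auto Scaling.
target_group = elbv2.create_target_group(
    Name="web-app-tg",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",      # placeholder VPC ID
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",           # the application's health URL
    HealthCheckIntervalSeconds=15,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=3,
    Matcher={"HttpCode": "200"},         # only a 200 response counts as healthy
)
print(target_group["TargetGroups"][0]["TargetGroupArn"])
```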
- Gateway Load Balancer:
- Ideal for integrating third-party virtual appliances for packet inspection
- Deploys in an inspection VPC to analyze traffic before it reaches application servers
- Gateway Load Balancer endpoints receive incoming packets and forward them to security appliances
- Provides the least operational overhead for implementing traffic inspection with third-party appliances
- Makes it easy to deploy, scale, and manage third-party virtual appliances like firewalls
- Enables transparent network traffic inspection without complex routing configurations
- Simplifies integration of security appliances from AWS Marketplace into existing architectures
- Most efficient solution for inspecting traffic with third-party security appliances before it reaches application servers
- Particularly useful when integrating virtual firewall appliances from AWS Marketplace that are configured with IP interfaces
- Can be deployed in a dedicated inspection VPC to centralize security inspection for multiple application VPCs
Amazon VPC
- Internet Gateways:
- A single Internet Gateway can route traffic for the entire VPC (across all AZs)
- Internet Gateways provide a target in VPC route tables for internet-routable traffic
- Do not require redundancy across Availability Zones (they are a managed service)
- Managed Prefix Lists:
- Sets of one or more CIDR blocks to simplify security group and route table configurations
- Customer-managed prefix lists can be shared across AWS accounts via AWS Resource Access Manager
- Helps centrally manage allowed IP ranges across an organization
- Makes it easier to update and maintain security groups and route tables
- Can consolidate multiple security group rules (with different CIDRs but same port/protocol) into a single rule
- NAT Gateways:
- Managed service provided by AWS allowing instances in private subnets to connect to the internet or other AWS services
- Do not require patching, are automatically scalable, and provide built-in redundancy for high availability
- Should be placed in different Availability Zones for fault tolerance
- Replacing NAT instances with NAT gateways in different AZs ensures high availability and automatic scaling
- For multi-AZ high availability, create a NAT gateway in each public subnet for each Availability Zone. Then the route tables for private subnets route traffic to their local NAT gateway
- Best practice for high availability is to create one NAT gateway in each AZ where you have private subnets
- This design ensures that if one AZ becomes unavailable, instances in other AZs can still access the internet
- Using NAT gateways is preferred over NAT instances as they're managed services that automatically scale and are more fault-tolerant
- NAT instances require more management and manual intervention for high availability and scaling
- A VPC can only have one internet gateway attached at any time
- Egress-only internet gateways are specifically designed for outbound-only IPv6 traffic, not for IPv4 traffic
- For multi-AZ redundancy, it's recommended to create NAT gateways in each public subnet and configure private subnet route tables to forward traffic to the NAT gateway in the same AZ
- This approach maintains high availability by ensuring if one AZ becomes unavailable, the other AZs can still provide internet access for EC2 instances
- Security Groups:
- For bastion host setups, the security group of the bastion host should only allow inbound access from the external IP range of the company
- For application instances in private subnets, the security group should allow inbound SSH access only from the private IP address of the bastion host
- This configuration ensures that only connections from company locations can reach the bastion host, and only the bastion host can access application servers
- For web tiers in multi-tier architectures, configure the security group to allow inbound traffic on port 443 from 0.0.0.0/0 (HTTPS from the internet)
- For database tiers, configure the security group to allow inbound traffic only on the specific database port (e.g., 1433 for SQL Server) from the web tier's security group
- These configurations follow the principle of least privilege and enhance overall security posture
- VPC Connectivity Options:
- For connecting two VPCs in the same region within the same AWS account, VPC peering is the most cost-effective solution
- VPC peering provides direct network connectivity that enables inter-VPC communication with minimal operational overhead
- More cost-effective than Transit Gateway for simpler networking scenarios with moderate data transfer (e.g., 500 GB monthly)
- Simpler than Site-to-Site VPN which introduces unnecessary complexity and additional costs
- Not suited for Direct Connect which is designed for connecting on-premises networks to AWS
- Requires updating route tables in each VPC to use the peering connection for inter-VPC communication
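A condensed boto3 sketch of the peering setup for two same-account VPCs; all IDs and CIDR ranges below are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Request and accept a peering connection between two VPCs in the same account.
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaa1111aaaa1111a",        # requester VPC (placeholder)
    PeerVpcId="vpc-0bbb2222bbbb2222b",    # accepter VPC (placeholder)
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Each VPC's route table needs a route to the other VPC's CIDR via the peering connection.
ec2.create_route(RouteTableId="rtb-0aaa1111aaaa1111a",
                 DestinationCidrBlock="10.1.0.0/16", VpcPeeringConnectionId=pcx_id)
ec2.create_route(RouteTableId="rtb-0bbb2222bbbb2222b",
                 DestinationCidrBlock="10.0.0.0/16", VpcPeeringConnectionId=pcx_id)
```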
- Expanding VPC Address Space:
- To resolve insufficient IP addresses in a VPC with minimal operational overhead:
- Add an additional IPv4 CIDR block to the existing VPC
- Create additional subnets using the new CIDR range
- Place new resources in the new subnets
- This approach is more efficient than:
- Creating a second VPC with peering connection
- Using Transit Gateway to connect multiple VPCs
- Setting up VPN connections between VPCs
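For example, with boto3 the secondary CIDR and a new subnet might be added as follows; the VPC ID, CIDR ranges, and Availability Zone are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Associate an additional IPv4 CIDR block with the existing VPC.
ec2.associate_vpc_cidr_block(VpcId="vpc-0123456789abcdef0", CidrBlock="10.1.0.0/16")

# Carve new subnets out of the secondary range and place new resources there.
ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",
    CidrBlock="10.1.0.0/24",
    AvailabilityZone="us-east-1a",
)
```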
- Gateway Endpoints for S3 Cost Reduction:
- For EC2 instances in private subnets that access S3 frequently, provisioning a VPC gateway endpoint eliminates data transfer costs
- Configure route tables for private subnets to use the gateway endpoint for S3 traffic instead of NAT gateways
- Eliminates data processing and transfer costs associated with routing S3 traffic through NAT gateways
- More cost-effective than using NAT instances or multiple NAT gateways for S3 access
- Provides direct, private connectivity to S3 without traversing the public internet or NAT devices
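A small sketch of provisioning the gateway endpoint and attaching it to the private subnets' route tables; the region, VPC ID, and route table IDs are placeholders. AWS adds the S3 prefix-list route to the listed route tables automatically.

```python
import boto3

ec2 = boto3.client("ec2")

# Gateway endpoint for S3; S3 traffic from the listed route tables bypasses the NAT gateway.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",   # service name is region-specific
    RouteTableIds=["rtb-0aaa1111aaaa1111a", "rtb-0bbb2222bbbb2222b"],
)
```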
- NAT Gateways for Internet Access from Private Subnets:
- For EC2 instances in private subnets that need to communicate with external services (like license servers):
- Provision a NAT gateway in a public subnet
- Modify each private subnet's route table with a default route pointing to the NAT gateway
- This is a managed solution that minimizes operational maintenance compared to NAT instances
- NAT gateways must be placed in public subnets, not private subnets
- Creates a scalable, highly available solution that doesn't require patching or maintenance
- Appropriate for three-tier web applications where application and database tiers need outbound internet access
- Public Subnet Placement:
- NAT gateways must be provisioned in a public subnet to provide internet access for resources in private subnets
- Placing NAT gateways in a private subnet would prevent them from routing traffic to the internet
- For three-tier applications, a NAT gateway enables application servers in private subnets to access external services
- Always configure NAT gateways in public subnets with routing to an Internet Gateway
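A sketch of creating one NAT gateway in a public subnet and pointing a private subnet's route table at it; repeating this per AZ gives the multi-AZ design described above. All subnet and route table IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# NAT gateways need an Elastic IP and must live in a public subnet.
eip = ec2.allocate_address(Domain="vpc")
nat = ec2.create_nat_gateway(
    SubnetId="subnet-0public111111111",          # public subnet in this AZ
    AllocationId=eip["AllocationId"],
)
nat_id = nat["NatGateway"]["NatGatewayId"]
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

# Default route for the private subnet in the same AZ points at the local NAT gateway.
ec2.create_route(
    RouteTableId="rtb-0private11111111",
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_id,
)
```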
VPC Security Best Practices
- Multi-tier Security Group Configuration:
- For securing two-tier architectures with web and database layers:
- Create a security group for web servers allowing inbound traffic from 0.0.0.0/0 on port 443
- Create a security group for database instances allowing inbound traffic only from the web server security group on the database port
- References security groups rather than CIDR blocks for finer control of database access
- More secure than allowing database access from the entire public subnet CIDR
- Avoids using network ACLs which are stateless and require separate inbound/outbound rules
- Follows principle of least privilege by limiting database access to only web servers
- Security groups don't support deny rules; they're stateful and only need allow rules
- Simplifies security management as new web servers automatically get database access
Security Best Practices
Security Group Configurations
- For Two-Tier Web Applications:
- Web Tier in Public Subnets:
- Configure security groups to allow inbound traffic on port 443 (HTTPS) from 0.0.0.0/0 for public web access
- This enables secure web application access from the internet using SSL/TLS
- Allows customers with dynamic IP addresses to access the web application
- Ensures proper encryption of data in transit between users and the web tier
- Database Tier in Private Subnets:
- Configure security groups to allow inbound traffic on the database port (e.g., 1433 for SQL Server) only from the web tier's security group
- Restricts database access to only the application servers in the web tier
- Enhances security by preventing direct database access from the internet or unauthorized sources
- Follows the principle of least privilege by exposing only necessary resources
- This configuration creates a properly secured multi-tier architecture where:
- Users from the internet can only access the web tier through HTTPS
- Only the web tier can access the database tier
- The database tier is completely isolated from direct internet access
- Each tier has precisely the access controls needed for its function
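A minimal sketch of the two security groups described above, using a hypothetical VPC and SQL Server's port 1433; note that the database rule references the web tier's security group instead of a CIDR block.

```python
import boto3

ec2 = boto3.client("ec2")
vpc_id = "vpc-0123456789abcdef0"   # placeholder

# Web tier: HTTPS open to the internet.
web_sg = ec2.create_security_group(
    GroupName="web-tier-sg", Description="Web tier", VpcId=vpc_id)["GroupId"]
ec2.authorize_security_group_ingress(
    GroupId=web_sg,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)

# Database tier: SQL Server port open only to members of the web tier security group.
db_sg = ec2.create_security_group(
    GroupName="db-tier-sg", Description="Database tier", VpcId=vpc_id)["GroupId"]
ec2.authorize_security_group_ingress(
    GroupId=db_sg,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 1433, "ToPort": 1433,
        "UserIdGroupPairs": [{"GroupId": web_sg}],
    }],
)
```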
VPC Peering
- Secure Cross-VPC Database Access:
- For EC2 instances in one VPC accessing databases in another VPC (same account):
- Configure a VPC peering connection between the VPCs
- Update route tables in both VPCs to route traffic to the peered VPC
- Configure security groups to allow traffic between the instances
- Provides secure private connectivity without exposing resources to the internet
- Uses private IP addresses which enhances security of database access
- More secure than allowing database access using public IP addresses via security groups
- Better than making database instances publicly accessible with public IP addresses
- More efficient and secure than proxy solutions through intermediate EC2 instances
- Cross-Account VPC Access:
- For applications in VPC-A accessing files in EC2 instances in VPC-B across different AWS accounts:
- Set up VPC peering between the VPCs across accounts
- Configure appropriate route tables and security groups in both VPCs
- Provides private connectivity between VPCs in different accounts using AWS global infrastructure
- Eliminates single points of failure with AWS's highly available networking infrastructure
- Offers direct network routes using private IP addresses with no bandwidth constraints
- More appropriate than VPC gateway endpoints which are for connecting to AWS services, not EC2 instances
- Superior to virtual private gateway solutions, which are typically used for connecting to on-premises networks
- Better than private virtual interfaces (VIFs) which are used with Direct Connect for on-premises connectivity
- Cross-Account Database Access:
- Enables private networking communication between VPCs leveraging AWS global infrastructure
- Allows applications in one VPC to securely access databases in another VPC without exposing to the internet
- VPC peering supports inter-VPC connectivity without requiring gateways, VPN connections, or separate hardware
- Uses private IP addresses, enhancing security of database access compared to public IP connections
- For applications across AWS accounts that need to access files in EC2 instances:
- Set up VPC peering between VPCs across accounts
- Configure route tables and security groups in both VPCs
- Provides secure, direct network routes using private IP addresses with no bandwidth constraints
- No single points of failure as peering connections use AWS's highly available networking infrastructure
- More suitable than gateway endpoints which are designed for AWS services, not EC2 instances
- Better than virtual private gateway solutions which typically connect to on-premises networks
- Superior to proxy solutions through intermediate EC2 instances with Elastic IP addresses
- Cross-Account Data Sharing with Lake Formation:
- For sharing data across multiple AWS accounts within an organization:
- Use Lake Formation tag-based access control permissions
- Define tags for specific data resources requiring access
- Grant permissions based on these tags to users in other accounts
- Provides granular access control without copying data between accounts
- More efficient than creating a common account with IAM roles
- Reduces operational overhead compared to copying data or creating custom solutions
- Perfect for organizations with central data teams serving multiple departments with separate AWS accounts
- Maintains security while enabling cross-account collaboration
- Multi-Tier Application Security Groups:
- For securing two-tiered architectures with web servers and databases:
- Create a security group for web servers allowing traffic from the internet on port 443
- Create a security group for the database allowing traffic only from the web servers' security group on port 3306
- Creates secure separation between application tiers
- Follows principle of least privilege by restricting database access
- More secure than using network ACLs alone which operate at subnet level
- Better than allowing database access from the entire subnet CIDR block
- Provides stateful access control with automatic return traffic allowance
Network Load Balancer
- TLS for Data in Transit:
- For securing data transmission in multi-tier applications:
- Configure a TLS listener on the Network Load Balancer
- Deploy the server certificate on the NLB
- Ensures encryption of data between clients and the load balancer
- Essential for applications processing sensitive information like sensor data
- Protects data in transit across all tiers of the application
- More directly addresses transport security than WAF or Shield (which focus on different security aspects)
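A brief sketch of adding a TLS listener with an ACM-managed server certificate to an existing NLB; the load balancer, target group, and certificate ARNs are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

# TLS listener terminates encryption at the NLB using a server certificate from ACM.
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/sensor-nlb/abc123",
    Protocol="TLS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:us-east-1:111122223333:certificate/example"}],
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/sensor-tg/def456",
    }],
)
```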
Route 53 Geolocation Routing
- Key Features:
- Routes traffic based on the geographic location of users
- Optimizes website load times by directing users to the nearest geographic infrastructure
- Can direct traffic near specific AWS regions to on-premises data centers in the same area
- More precise for geographic traffic management than latency or weighted routing policies
- Ideal for minimizing load times when hosting infrastructure in multiple geographic locations
- Can be configured to route users based on continent, country, or US state
- Different from latency-based routing which focuses on network performance rather than geographic location
- Perfect for hybrid infrastructure with both cloud and on-premises hosting in different geographic regions
- More effective than simple or weighted routing policies for globally distributed applications
Network Redundancy
- Eliminating Single Points of Failure:
- For Management VPC connected to on-premises via VPN:
- Add a second set of VPNs from a second customer gateway device
- This provides redundancy for the on-premises connection
- More effective than adding a second virtual private gateway to the VPC
- Better than adding redundant VPC peering or VPNs between VPCs
- Addresses the specific single point of failure in the customer gateway
Security and Identity Services
AWS KMS (Key Management Service)
- Customer Managed Keys:
- KMS keys that you create, own, and manage in your AWS account
- You have full control over these keys, including establishing and maintaining their key policies
- Can enable/disable keys, rotate cryptographic material, add tags, create aliases, and schedule deletion
- Customer managed keys appear on the Customer managed keys page of the AWS Management Console
- Can be identified by the KeyManager field value of "CUSTOMER" in the DescribeKey response
- Usable in cryptographic operations with usage tracked in AWS CloudTrail logs
- Many AWS services allow specifying customer managed keys to protect stored and managed data
- Incur monthly fees and usage fees beyond the free tier
- Counted against AWS KMS quotas for your account
- Support optional automatic key rotation for symmetric encryption keys with AWS KMS-created key material
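For instance, a customer managed key with automatic rotation and an alias might be created as in the sketch below; the description and alias name are placeholders.

```python
import boto3

kms = boto3.client("kms")

# Symmetric customer managed key; you own the key policy and can enable annual rotation.
key = kms.create_key(Description="Application data encryption key")
key_id = key["KeyMetadata"]["KeyId"]

kms.enable_key_rotation(KeyId=key_id)                         # automatic yearly rotation
kms.create_alias(AliasName="alias/app-data", TargetKeyId=key_id)
```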
- AWS Managed Keys:
- Automatically rotated every year by AWS (the interval was every 3 years prior to May 2022)
- Less control over rotation schedule
- AWS Owned Keys:
- Entirely controlled by AWS
- No visibility or control for customers
- External Keys:
- Allows key material import
- Provides highest level of control but more operational overhead
- Key Rotation Benefits:
- Key rotation changes only the key material (cryptographic secret), not the KMS key itself
- The key ID, ARN, region, policies, and permissions remain unchanged during rotation
- No need to update applications or aliases that reference the key ID or ARN
- Rotating key material doesn't affect the use of the KMS key in any AWS service
- AWS KMS automatically rotates customer managed keys yearly when enabled
- Properties of the KMS key remain constant regardless of key material changes
- AWS always rotates AWS managed keys yearly (rotation interval changed in May 2022)
- CloudTrail Integration:
- All key usage operations are logged in CloudTrail
- Provides detailed audit trail for compliance requirements
- Essential for regulated industries that need to track all encryption/decryption operations
- Helps organizations demonstrate compliance with key management policies
- Benefits for Scalable Key Management:
- Reduces operational burden of managing encryption keys
- Provides a managed service for creating and controlling encryption keys
- Automatically scales to meet application demands
- Highly available and secure solution for scalable key management
- More comprehensive than using MFA for protecting encryption keys
- Better suited than AWS Certificate Manager (ACM) for application data encryption
- Complements IAM policies which limit access but don't reduce operational burden
- AWS KMS (Key Management Service) for Application Encryption:
- Provides centralized control over encryption keys used in applications
- Reduces operational burden of managing encryption keys across scalable infrastructure
- Integrates with other AWS services for easier data encryption
- Offers highly available and secure solution for key management
- Better option than using MFA, ACM, or IAM policies alone for encryption key management
- Cross-Account S3 Access:
- For allowing team members to access S3 buckets across multiple AWS accounts:
- Create an IAM role in the production account with access to target S3 bucket
- Add the development account as a principal in the trust policy of the role
- Allow users in development account to assume the production role as needed
- Follows principle of least privilege by granting only necessary permissions
- Reduces administrative overhead compared to creating duplicate users in production
- More secure than modifying S3 Block Public Access settings
- Provides centralized access control through IAM roles and policies
AWS Secrets Manager
- Key Features:
- Secure storage of credentials, API keys, etc.
- Automatic secret rotation (e.g., for RDS databases)
- Integrated with AWS Lambda for custom rotation logic
- Helps manage, retrieve, and rotate database credentials, application credentials, OAuth tokens, API keys, and other secrets
- Has native integration with Amazon RDS for automatic password rotation
- Improves security posture by eliminating hard-coded credentials in application code
- Credentials are retrieved dynamically via API when needed
- Supports automatic rotation scheduling for secrets
- Alternatives:
- AWS KMS: Manages encryption keys, not arbitrary secrets
- AWS Systems Manager Parameter Store: Can store secrets, but has no built-in rotation
- Reducing Credential Management Overhead:
- Designed for secure storage, management, and rotation of credentials
- Supports automatic rotation of database credentials without manual intervention
- Particularly valuable for EC2 instances connecting to databases like Aurora
- Enhances security while minimizing operational overhead
- Superior to Systems Manager Parameter Store for automatic credential rotation
- More secure than storing credentials in S3 buckets even with KMS encryption
- More efficient than using encrypted EBS volumes for credential storage
- Database Credential Management:
- Best solution for storing and automatically rotating database credentials with minimal operational overhead
- Enables applications to retrieve credentials at runtime instead of hardcoding them
- Can be integrated with EC2 instances through IAM roles for secure access
- Automatically handles credential updates in RDS databases during rotation
- Particularly effective for connecting EC2 instances to Aurora databases for automated credential management
- Minimizes the operational overhead of credential management by enabling automatic rotation of database credentials
- AWS Secrets Manager for Database Credentials:
- Securely stores, manages, and rotates database credentials
- Supports automatic rotation of credentials to meet security requirements
- Integrates with IAM for controlled access to secrets
- Provides direct integration with RDS and other AWS database services
- More appropriate than AWS Systems Manager OpsCenter for credential management
- More secure than storing credentials in S3 buckets
- Less operational overhead than manually managing encrypted files or EBS volumes
- Migrating On-Premises Applications to AWS:
- For securing credentials when migrating Node.js applications to Lambda:
- Store database credentials as secrets in AWS Secrets Manager
- Configure automatic rotation every 30 days
- Update Lambda functions to retrieve credentials from Secrets Manager
- More operationally efficient than Parameter Store which lacks native rotation capabilities
- More secure than storing credentials as encrypted Lambda environment variables which requires custom rotation logic
- Better aligned with best practices than using KMS for credential storage
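A minimal Lambda handler sketch retrieving the database secret at runtime; the secret name is hypothetical, and in practice the value would be cached between invocations.

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

def handler(event, context):
    # Fetch the current credentials; rotation updates the secret without code changes.
    secret = secrets.get_secret_value(SecretId="prod/app/db-credentials")
    creds = json.loads(secret["SecretString"])
    # ... connect to the database using creds["username"] / creds["password"] ...
    return {"statusCode": 200}
```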
AWS Systems Manager
- Session Manager Logging:
- To send Session Manager logs to S3 for archival with minimal operational overhead:
- Enable S3 logging directly from the Systems Manager console
- Choose an S3 bucket as the destination for session data
- This approach is more operationally efficient than using CloudWatch agents with export to S3
- Eliminates the need for custom script development or additional services
- Provides the most direct path for archiving Session Manager logs
- Replacing SSH Keys:
- Provides secure EC2 instance access without the need to manage SSH keys
- Allows connecting to instances via AWS Management Console or AWS CLI without:
- Opening inbound ports
- Managing SSH keys
- Setting up bastion hosts
- Centralizes access control through IAM policies
- Records session activity for audit purposes
- Significantly reduces administrative overhead compared to traditional SSH key management
- Perfect solution for meeting security requirements to remove all shared SSH keys
- More secure and manageable than STS-generated temporary credentials or bastion host approaches
AWS Firewall Manager
- Key Benefits:
- Centrally configure AWS WAF rules across accounts
- Manage AWS Shield Advanced, AWS WAF, and even security groups in one place
- Automatically protect new resources (e.g., CloudFront distributions, Application Load Balancers) with common security rules
AWS Network Firewall
- Key Features:
- Protects your VPCs with stateful traffic filtering and intrusion detection
- Not used for automatically auditing or managing security groups
- Supports deep packet inspection for advanced filtering
- Can replicate on-premises inspection use cases, performing traffic flow inspection and filtering in the AWS environment
- Makes it easy to deploy essential network protections for all VPCs
- Provides customizable rules for stateful traffic inspection and filtering
- Scales automatically with network traffic without provisioning or managing infrastructure
- Ideal for replacing on-premises inspection servers that perform traffic flow inspection and filtering
- More appropriate for traffic inspection and filtering than GuardDuty (which focuses on threat detection)
- Provides direct traffic filtering capabilities that Traffic Mirroring alone doesn't offer
- Traffic Control for Private Subnets:
- Managed service that makes it easier to deploy essential network protections for VPCs
- Can be configured with domain list rule groups to allow specific outbound traffic
- Perfect solution for allowing EC2 instances to access only approved third-party software repositories for updates
- More appropriate than WAF for managing outbound traffic based on domain names/URLs
- Superior to security groups which can't filter traffic based on domain names
- Unlike Application Load Balancers, specifically designed to control outbound internet access
- Can be implemented by updating private subnet route tables to route outbound traffic through the firewall
- Network Inspection Capabilities:
- Enables traffic flow inspection across VPC environments
- Provides customizable traffic filtering similar to traditional firewalls
- Can be deployed as a replacement for on-premises traffic inspection solutions
- More appropriate than Traffic Mirroring which only copies traffic but doesn't filter it
- More suitable than GuardDuty which detects threats but doesn't actively filter traffic
- Superior to Firewall Manager alone, which manages firewall rules but doesn't perform inspection
- Firewall Manager can be used in conjunction with Network Firewall for centralized management
- Essential component for organizations transitioning from on-premises network security controls
- Provides a complete replacement for traditional inspection servers that monitor traffic flows
- Helps maintain consistent security posture between on-premises and AWS environments
- On-premises Firewall Replacement:
- Best solution for replacing on-premises inspection servers that perform traffic flow inspection and filtering
- Creates highly customizable rules for stateful traffic inspection and filtering
- Scales automatically with network traffic without provisioning or managing infrastructure
- More suitable than GuardDuty (which detects threats but doesn't filter traffic) or Traffic Mirroring (which copies traffic but doesn't filter)
- More appropriate than Firewall Manager which manages policies but doesn't perform inspection itself
AWS WAF (Web Application Firewall)
- Key Attachment Points:
- Application Load Balancer
- Amazon CloudFront distribution
- Amazon API Gateway
- AWS AppSync
- Helps protect web applications against common exploits (e.g., SQL injection, XSS)
- Can create custom rules to block or allow traffic based on attributes (e.g., IP, geolocation)
- Provides a robust solution for filtering web traffic and enforcing access rules
- Cannot be directly attached to Network Load Balancers (NLBs) - use Shield Advanced for NLB protection
- Rate-Limiting:
- To block illegitimate traffic with minimal impact on legitimate users:
- Deploy AWS WAF and associate it with an Application Load Balancer
- Configure rate-limiting rules to restrict requests per client
- More effective than network ACLs for handling attacks from changing IP addresses
- Provides granular control over traffic patterns without impacting legitimate users
- Better solution than GuardDuty which focuses on detection rather than prevention
- Superior to Amazon Inspector which is for security assessment rather than traffic filtering
- API Gateway Protection:
- Regional AWS WAF web ACLs with rate-based rules can protect API Gateway endpoints from HTTP flood attacks
- Automatically tracks requests per IP address and blocks IPs that exceed defined thresholds
- Provides immediate protection against DDoS attacks with minimal operational overhead
- More effective than CloudWatch monitoring which only provides detection but not prevention
- Simpler to implement than custom Lambda@Edge solutions for rate limiting
- SQL Injection Protection:
- For applications vulnerable to SQL injection attacks:
- Deploy AWS WAF in front of Application Load Balancer
- Associate appropriate web ACLs with SQL injection detection patterns
- Provides real-time protection against common web exploits
- More effective than security group rules which operate at network level, not application level
- More appropriate than AWS Shield Advanced which focuses on DDoS protection, not application attacks
- Superior to Amazon Inspector which assesses vulnerabilities but doesn't actively block attacks
- Essential for applications with database backends accessible through web interfaces
AWS Shield Advanced
- Key Features:
- Provides enhanced protection against Distributed Denial of Service (DDoS) attacks
- Offers additional detection and mitigation capabilities against large-scale and sophisticated DDoS attacks
- Can protect Amazon EC2 instances, Elastic Load Balancing, CloudFront distributions, and more
- Ensures website remains available during DDoS attacks
- Integrates with CloudFront for edge-based protection
- Provides 24×7 access to the AWS DDoS Response Team (DRT) for assistance during attacks
- Offers protection against DDoS-related charges that might result from attacks
- Particularly important for Network Load Balancers which cannot use AWS WAF directly
- Critical for protecting NLBs in API-driven cloud communication platforms against DDoS attacks
- Should be combined with AWS WAF on API Gateway for comprehensive protection of API architectures
- Global Accelerator Protection:
- For protecting self-managed DNS services running on EC2 instances with Global Accelerator:
- Subscribe to AWS Shield Advanced
- Add the Global Accelerator as a protected resource (not the EC2 instances)
- Provides enhanced protection against DDoS attacks at the network and application layers
- More effective than protecting individual EC2 instances since Global Accelerator is the entry point
- Superior to WAF web ACLs with rate-limiting rules for comprehensive DDoS protection
- Offers the most complete protection for Global Accelerator endpoints distributing traffic across regions
- DDoS Protection:
- Provides comprehensive protection against Distributed Denial of Service (DDoS) attacks for applications running on AWS
- Offers enhanced detection and mitigation capabilities for EC2 instances, Elastic Load Balancing (ELB), CloudFront distributions, and Route 53
- Protects Application Load Balancers (ALBs) from web attacks and DDoS threats
- Includes cost protection in the event of DDoS attacks that result in increased usage charges
- More effective for DDoS mitigation than Amazon Inspector, Macie, or GuardDuty which focus on different security aspects
- Amazon Inspector is designed for automated security assessments, not DDoS protection
- Amazon Macie is focused on sensitive data discovery and protection rather than attack prevention
- Amazon GuardDuty detects malicious activity but does not provide DDoS attack mitigation
- Works with CloudFront to create a comprehensive solution that protects websites both at edge locations and origin servers
- Specific Protection Benefits:
- Essential component for protecting Windows web server infrastructure against large-scale DDoS attacks
- Can mitigate attacks originating from thousands of IP addresses simultaneously
- Provides early detection and mitigation of sophisticated network and application layer attacks
- Works with CloudFront to create multilayered defense against DDoS traffic
- Offers protection for mission-critical websites where downtime is not acceptable
- More effective than GuardDuty which focuses on detection rather than attack mitigation
- Superior to network ACL modification via Lambda which cannot scale to thousands of IP addresses
- More appropriate than Spot Instance scaling which doesn't address the attack vector directly
- Protection Against DDoS Attacks:
- Provides enhanced protection beyond standard AWS Shield for ELB, CloudFront, and Route 53
- Offers detection and mitigation against large, sophisticated DDoS attacks
- Includes 24/7 access to AWS DDoS Response Team
- Provides financial protection against DDoS-related usage spikes
- More specialized than GuardDuty (threat detection) for DDoS protection
- More comprehensive than Inspector (vulnerability assessment) for attack mitigation
- More effective than simply enabling AWS Shield without specific resource assignment
- Web Application DDoS Protection:
- For protecting public-facing web applications behind Elastic Load Balancers:
- Enable AWS Shield Advanced and assign the ELB to it
- Provides comprehensive protection against large-scale DDoS attacks
- More appropriate than GuardDuty which focuses on threat detection but not DDoS mitigation
- More effective than Amazon Inspector which is for vulnerability assessment not attack protection
- Better than just enabling standard AWS Shield without specific resource assignment
- Essential for applications using third-party DNS services that still need AWS DDoS protection
AWS Regional WAF Web ACL
- HTTP Flood Protection:
- For protecting API Gateway endpoints from HTTP flood attacks:
- Create a Regional AWS WAF web ACL with rate-based rules
- Associate the web ACL with the API Gateway stage
- Tracks requests per client IP and automatically blocks when thresholds are exceeded
- Provides immediate protection against DDoS attacks with minimal operational overhead
- More proactive than CloudWatch monitoring which only alerts after detection
- Less complex to maintain than custom solutions using Lambda@Edge
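A rough wafv2 sketch of a regional web ACL with a rate-based rule associated with an API Gateway stage; the request limit, names, API ID, and stage are placeholders.

```python
import boto3

wafv2 = boto3.client("wafv2")

# Regional web ACL that blocks any client IP exceeding 2,000 requests per 5 minutes.
acl = wafv2.create_web_acl(
    Name="api-flood-protection",
    Scope="REGIONAL",
    DefaultAction={"Allow": {}},
    Rules=[{
        "Name": "rate-limit-per-ip",
        "Priority": 1,
        "Statement": {"RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}},
        "Action": {"Block": {}},
        "VisibilityConfig": {"SampledRequestsEnabled": True,
                             "CloudWatchMetricsEnabled": True,
                             "MetricName": "RateLimitPerIp"},
    }],
    VisibilityConfig={"SampledRequestsEnabled": True,
                      "CloudWatchMetricsEnabled": True,
                      "MetricName": "ApiFloodProtection"},
)

# Associate the web ACL with the API Gateway stage (placeholder API ID and stage name).
wafv2.associate_web_acl(
    WebACLArn=acl["Summary"]["ARN"],
    ResourceArn="arn:aws:apigateway:us-east-1::/restapis/a1b2c3d4e5/stages/prod",
)
```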
- DDoS Protection with Rate-Based Rules:
- Regional AWS WAF web ACLs with rate-based rules can protect API Gateway endpoints from HTTP flood attacks
- Automatically tracks requests per IP address and blocks IPs that exceed defined thresholds
- Provides immediate protection against DDoS attacks with minimal operational overhead
- More effective than CloudWatch monitoring which only provides detection but not prevention
- Creates a protective barrier against excessive requests from single sources
- Helps maintain application availability during attempted denial-of-service attacks
- Particularly valuable for public-facing APIs with variable legitimate traffic patterns
AWS IAM Identity Center (AWS Single Sign-On)
- Key Features:
- Centrally manage SSO access to multiple AWS accounts and business applications
- Enables unified management of users and their access
- Can be integrated with existing identity providers (e.g., Active Directory, Okta)
- Users sign in to a portal with a single set of credentials to access assigned accounts and applications
- Centralized Access Management:
- For providing access to multiple AWS accounts for thousands of employees:
- Configure AWS IAM Identity Center (formerly AWS Single Sign-On)
- Connect IAM Identity Center to the existing identity provider (IdP)
- Provision users and groups from the existing IdP
- Provides centralized access management with minimal operational overhead
- More scalable than creating individual IAM users in each AWS account
- More secure than using AWS account root users with synchronized passwords
- Better than AWS Resource Access Manager (RAM) which is for resource sharing, not identity federation
- Simplifies permission management across multiple AWS accounts in an organization
- Supports customizable permission sets that define the level of access users have to AWS accounts
- Integrates seamlessly with existing corporate directories and identity providers
- Cross-Account AMI Sharing with Encryption:
- When sharing an encrypted AMI with another AWS account:
- Modify the launchPermission property of the AMI to share it with the specific account
- Modify the KMS key policy to allow the target account to use the key
- This maintains security while enabling the other account to launch instances from the AMI
- More secure than making AMIs publicly available even with key restrictions
- Avoids unnecessary complexity of re-encrypting with a different key
- Eliminates need for exporting/importing AMIs through S3 buckets
- Follows least privilege principle by providing targeted access only to required resources
- Preserves the encryption integrity of the original AMI
- Essential best practice for organizations working with Managed Service Providers (MSPs)
- Works seamlessly with EBS-backed AMIs that use KMS customer managed keys
- Provides an auditable and controlled method for sharing secure machine images
- More straightforward than alternative approaches involving multiple encryption steps
- Required when AMIs contain sensitive or proprietary configurations
- Creates a direct, secure sharing mechanism without exposing the AMI to unauthorized accounts
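A sketch of the two steps with boto3: grant launch permission on the AMI and allow the target account to use the customer managed key. The account ID, AMI ID, and key ID are placeholders, and a KMS grant is shown here as one way to authorize the key; updating the key policy is the equivalent alternative.

```python
import boto3

ec2 = boto3.client("ec2")
kms = boto3.client("kms")

TARGET_ACCOUNT = "444455556666"   # placeholder account ID

# 1. Share the AMI itself by adding a launch permission for the target account.
ec2.modify_image_attribute(
    ImageId="ami-0123456789abcdef0",
    LaunchPermission={"Add": [{"UserId": TARGET_ACCOUNT}]},
)

# 2. Allow the target account to use the KMS key that encrypts the AMI's snapshots.
kms.create_grant(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
    GranteePrincipal=f"arn:aws:iam::{TARGET_ACCOUNT}:root",
    Operations=["Decrypt", "DescribeKey", "CreateGrant",
                "GenerateDataKeyWithoutPlaintext", "ReEncryptFrom", "ReEncryptTo"],
)
```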
AWS Certificate Manager (ACM)
- Key Features:
- Request public SSL/TLS certificates for domains and subdomains
- Supports wildcard certificates (e.g., *.example.com) to cover all subdomains
- Validates domain ownership via DNS (or email) for certificate issuance
- Supports automatic certificate renewal
- Integrates with AWS services like CloudFront and Application Load Balancer
- Provides an easy way to implement HTTPS for websites and applications
- Custom Domain Integration:
- Can be used to secure custom domain names for API Gateway endpoints
- Certificates should be imported into ACM in the same region as the API Gateway endpoint
- Enables HTTPS communications for third-party services consuming APIs
- Provides certificate management for secure API URLs using company domain names
- SSL/TLS Offloading with Application Load Balancer:
- For improving application performance with SSL/TLS:
- Import existing SSL certificates into AWS Certificate Manager (ACM)
- Create an Application Load Balancer with HTTPS listener using the ACM certificate
- Configure EC2 instances to accept traffic from the ALB
- Offloads CPU-intensive SSL/TLS encryption/decryption from application servers
- Significantly improves application performance when SSL processing is a bottleneck
- Centralizes certificate management through ACM
- More efficient than installing certificates directly on EC2 instances
- Automatic certificate renewal when using ACM-generated certificates
- Eliminates need for load balancing proxy servers with their own overhead
- Perfect for applications experiencing performance issues due to SSL processing
- Superior to instance-level SSL termination when traffic increases
- More scalable than adding proxy instances as traffic grows
- Provides integrated certificate management with load balancing
- More appropriate than S3-based or proxy server solutions for SSL termination
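A condensed sketch of importing an existing certificate into ACM and attaching it to an HTTPS listener on the ALB; the file paths and ARNs are placeholders.

```python
import boto3

acm = boto3.client("acm")
elbv2 = boto3.client("elbv2")

# Import the existing certificate so ACM manages it centrally.
with open("cert.pem", "rb") as cert, open("key.pem", "rb") as key, open("chain.pem", "rb") as chain:
    cert_arn = acm.import_certificate(
        Certificate=cert.read(), PrivateKey=key.read(), CertificateChain=chain.read()
    )["CertificateArn"]

# HTTPS listener on the ALB terminates TLS, offloading it from the EC2 instances.
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/web-alb/abc123",
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": cert_arn}],
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web-tg/def456",
    }],
)
```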
AWS Organizations
- Service Control Policies (SCPs):
- Can be used to prevent modification of mandatory CloudTrail configurations across member accounts
- Effectively restrict actions even for users with root-level access within member accounts
- Apply permission guardrails at the account, organizational unit, or organization root level
- Perfect for enforcing standard security controls when providing developers with individual AWS accounts
- More effective than IAM policies for restricting root user actions (which cannot be limited by IAM policies)
- Ensure security standards are maintained across all accounts regardless of individual user permissions
- When created at the root organizational unit level, provide centralized control over maximum available permissions for all accounts
- Offer a scalable solution with a single management point for permissions across many accounts
- More suitable than ACLs, security groups, or cross-account roles for organization-wide permission management
- Account Notification Management:
- For ensuring root user email notifications are properly managed:
- Configure AWS account root user email addresses as distribution lists that include multiple administrators
- Set up AWS account alternate contacts in the AWS Organizations console or programmatically
- This approach ensures notifications are distributed to appropriate personnel and not missed
- Provides redundancy by involving multiple administrators rather than relying on a single point of contact
- More secure than forwarding notifications to all organization users, which could expose sensitive information
- More manageable than using the same root email for all accounts, which complicates account management
- Particularly important for accounts managing critical infrastructure or sensitive data
- Distribution lists should be limited to administrators who need the information to maintain security
- When combined with proper IAM user setup for day-to-day operations, creates a comprehensive account governance system
- Service Control Policies for Billing Access Control:
- Can restrict access to billing information even from root users in member accounts
- Effectively implements organization-wide governance for sensitive financial data
- More restrictive than identity-based policies which cannot limit root user access
- Essential for financial controls in multi-account environments
AWS CloudFormation
- Least Privilege Access Control:
- For deployment engineers working with CloudFormation:
- Create IAM users with membership in groups that have policies allowing only CloudFormation actions
- Create dedicated IAM roles with explicit permissions for specific CloudFormation stack operations
- Use these IAM roles when launching stacks instead of user credentials
- Follows principle of least privilege by restricting permissions to only necessary CloudFormation operations
- More secure than using PowerUsers or Administrator policies for deployment activities
- Never use root user credentials for operational CloudFormation tasks
- Infrastructure as Code for Validated Prototypes:
- After manually validating infrastructure prototypes, define them as CloudFormation templates
- Enables consistent, automated deployment of validated configurations across development and production environments
- More suitable than AWS Systems Manager for replicating complex infrastructure across Availability Zones
- More appropriate than AWS Config, which is designed for compliance auditing rather than deployment
- Provides more detailed infrastructure control than Elastic Beanstalk for complex multi-component architectures
- Perfect for deploying identical infrastructure components (Auto Scaling groups, ALBs, RDS) across multiple environments
AWS Control Tower
- Governance Features:
- Implements governance at scale across AWS accounts
- Can enforce data residency guardrails for compliance requirements
- Restricts operations to specific AWS Regions
- Creates a secure and compliant environment while maintaining network isolation
- Helps meet regulatory requirements for data handling and access
- AWS Control Tower Account Drift Notifications:
- Automatically identifies changes to the organizational unit (OU) hierarchy
- Alerts operations teams when organizational structure changes occur
- Offers the least operational overhead for monitoring organizational changes
- More efficient than using AWS Config aggregated rules for monitoring OU hierarchy
- Superior to using CloudTrail organization trails for detecting organizational structure changes
- Provides automated monitoring without manual intervention
- More effective than CloudFormation drift detection for organizational structure monitoring
Identity Federation
- SAML 2.0-based Federation:
- Enables single sign-on (SSO) between on-premises Active Directory (AD) and AWS
- Leverages existing AD identities to authenticate to AWS
- Maps AD groups to AWS IAM roles for permission management
- Minimizes administrative overhead by maintaining a single identity source
- Users don't need separate AWS credentials to access AWS resources
AWS Identity and Access Management (IAM)
- Best Practices:
- Create IAM policies that grant least privilege permissions and attach to IAM groups
- Group users by department and manage permissions at the group level for operational efficiency
- This approach aligns with security best practices by ensuring users have only the permissions necessary for their job functions
- More effective for departmental permission management than using SCPs or permissions boundaries
- IAM roles are intended for delegation scenarios, not for direct attachment to IAM groups
- For cross-account or service-based permissions, roles are more appropriate than group-based policies
- Identity-Based Policies:
- Can be attached to:
- IAM roles
- IAM groups
- Cannot be directly attached to:
- AWS Organizations (which use Service Control Policies instead)
- Amazon ECS resources (which use IAM roles for tasks)
- Amazon EC2 resources (which use instance profiles/roles)
- Understanding proper policy attachment points helps ensure effective permission management
- Following this guidance helps maintain proper security boundaries between service roles
- Secure DynamoDB Access Across Accounts:
- For a central application to access DynamoDB tables in multiple accounts:
- Create an IAM role in each business account with DynamoDB access permissions
- Configure trust policies to allow specific roles from the central account
- Create an application role in the central account with AssumeRole permissions
- Configure the application to use STS AssumeRole to access cross-account resources
- More secure than using IAM users with access keys
- More appropriate than using Secrets Manager or Certificate Manager
- Follows IAM best practices for cross-account access
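A minimal sketch of the central application assuming the per-account role before reading DynamoDB; the role ARN, session name, and table name are hypothetical.

```python
import boto3

sts = boto3.client("sts")

# Assume the role defined in the business account; its trust policy must allow this caller.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::222233334444:role/central-dynamodb-access",
    RoleSessionName="central-app",
)["Credentials"]

# Use the temporary credentials to access the table in the business account.
dynamodb = boto3.client(
    "dynamodb",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
item = dynamodb.get_item(TableName="orders", Key={"order_id": {"S": "12345"}})
```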
- IP-Based Access Control:
- IAM policies can use IP address conditions to restrict access:
- Use NotIpAddress condition with aws:SourceIp to deny actions from unauthorized IP ranges
- Will deny actions (like EC2 termination) when requests originate from IP addresses outside specified ranges
- Particularly useful for administrative actions that should only be performed from secure networks
- Enforces security at the API action level rather than just network level
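As an illustration, a policy statement of this kind might look like the sketch below; the CIDR range, policy name, and action are placeholders.

```python
import json
import boto3

iam = boto3.client("iam")

# Deny instance termination unless the request comes from the approved corporate IP range.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "ec2:TerminateInstances",
        "Resource": "*",
        "Condition": {"NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
    }],
}

iam.create_policy(PolicyName="DenyTerminateOutsideOffice",
                  PolicyDocument=json.dumps(policy))
```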
- AWS IAM Password Policies:
- Can define password complexity requirements at the account level that apply to all IAM users
- Password policies can specify minimum length, required character types, and expiration periods
- Setting an overall password policy for the entire AWS account simplifies security management
- More efficient than configuring password requirements individually
- Helps enforce organizational security standards consistently across all IAM users
- Essential component of account-level security best practices
- Allows organizations to align AWS credential policies with their internal security requirements
- EC2 Instance Access to S3:
- For granting EC2 instances access to S3 buckets:
- Create an IAM role with appropriate S3 access permissions
- Attach the role to the EC2 instances
- This approach follows AWS best practices by using temporary credentials that rotate automatically
- More secure than using IAM users with access keys which involve long-term credentials
- Cannot directly attach IAM policies to EC2 instances - policies must be attached to roles first
- IAM groups cannot be attached to EC2 instances as they're designed for organizing IAM users
- IAM users should not be attached to EC2 instances - roles are the proper mechanism
- Centralized Access Management:
- For providing access to multiple AWS accounts for thousands of employees:
- Configure AWS IAM Identity Center (formerly AWS Single Sign-On)
- Connect IAM Identity Center to the existing identity provider (IdP)
- Provision users and groups from the existing IdP
- Provides centralized access management with minimal operational overhead
- More scalable than creating individual IAM users in each AWS account
- More secure than using AWS account root users with synchronized passwords
- Better than AWS Resource Access Manager (RAM) which is for resource sharing, not identity federation
AWS CloudTrail
- Tracking User Actions:
- Records API calls made by users, roles, or AWS services
- Essential for auditing changes to AWS resources like security group configurations
- Can identify which IAM user made specific changes to resources
- Provides detailed information about actions including time, parameters, and response elements
- More appropriate than AWS Config for determining who made configuration changes
- Better than GuardDuty or Inspector for historical audit trails of administrative actions
- Records both console actions and programmatic API calls
- Helps organizations meet compliance requirements by providing comprehensive activity logs
- Tracking Instance Sizing and Security Group Changes:
- For monitoring oversized EC2 instances and unauthorized security group modifications:
- Enable AWS CloudTrail to track user activity and API usage
- Enable AWS Config and create rules for auditing and compliance
- CloudTrail provides detailed audit trails of who made specific changes
- AWS Config continuously monitors resource configurations and evaluates against desired states
- Together they provide comprehensive tracking and auditing capabilities
- More effective for detailed change tracking than Trusted Advisor or CloudFormation
- Essential for enforcing proper change control processes across AWS accounts
Amazon Macie
- Key Features:
- Fully managed data security and privacy service
- Uses machine learning and pattern matching to discover sensitive data like PII
- Can analyze data stored in Amazon S3 buckets to identify sensitive information
- Provides detailed reports on where sensitive data exists
- More suitable for PII discovery than Security Hub, Inspector, or GuardDuty
- Requires configuration in each region where data needs to be analyzed
- Creates automated discovery jobs that can be scheduled to run regularly
- Helps organizations meet compliance requirements for data protection
- Works with EventBridge to create automated notification workflows when PII is detected
- Can be used to scan S3 buckets to ensure compliance with regulations that prohibit storage of PII
- Perfect solution for organizations that need to automatically scan storage locations for sensitive information
- Data Discovery Capabilities:
- Perfect for discovering personally identifiable information (PII) or financial information in S3 buckets
- Can be configured to scan data lakes managed by AWS Lake Formation
- Uses managed identifiers to detect sensitive data types like passport numbers and credit card numbers
- More effective for sensitive data discovery than AWS Audit Manager, S3 Inventory, or S3 Select
- Provides comprehensive reporting on where sensitive information exists in your storage
- Essential for internal audits and compliance verification
- Data Discovery for Lake Formation:
- Can scan data lakes managed by AWS Lake Formation to detect sensitive information
- Runs data discovery jobs using managed identifiers for sensitive data types
- Essential for ensuring compliance with data privacy regulations
- More effective than manual inspection or custom scripts for detecting PII
- Can identify sensitive customer or employee data stored in S3 buckets
- Particularly valuable for financial information like credit card numbers
- Helps organizations maintain compliance with regulations like GDPR, HIPAA
- Superior to S3 Inventory or Athena queries for sensitive data discovery
Data Analytics and Visualization
Amazon Kinesis
- Kinesis Data Streams:
- Real-time ingestion of streaming data at massive scale
- Low-latency, high-throughput streaming data processing
- Often used with Amazon OpenSearch Service for real-time analytics and search
- Kinesis Data Firehose:
- Fully managed service to reliably load streaming data into data lakes, data stores, and analytics services
- Can be used to send VPC Flow Logs to Amazon CloudWatch Logs or S3
- Often used for streaming logs into Amazon OpenSearch Service for analysis
- Provides a simple way to capture, transform, and load streaming data (near real-time)
- Real-Time Ingestion Architecture:
- API Gateway can send incoming data to Kinesis Data Streams
- Kinesis Data Firehose can automatically load that data to S3
- Use AWS Lambda for on-the-fly transformations
- Minimizes operational overhead by avoiding self-managed EC2 ingestion hosts
- Ordered Message Processing:
- For applications requiring strict message ordering (e.g., payment processing):
- Use Kinesis Data Streams with payment ID as partition key
- Messages with the same partition key are processed in the exact order they are received
- Particularly important for financial transactions where order affects outcomes
- Provides guaranteed ordering with high throughput for streaming data
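A minimal boto3 sketch of the ordering pattern just described: records that share a partition key (here a hypothetical payment ID) land on the same shard and are read back in the order they were written. The stream name and payload are assumptions.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

payment = {"payment_id": "pay-12345", "amount": "42.50", "status": "AUTHORIZED"}

# Records sharing a PartitionKey go to the same shard, preserving their order.
kinesis.put_record(
    StreamName="payments-stream",           # placeholder stream name
    Data=json.dumps(payment).encode(),
    PartitionKey=payment["payment_id"],
)
```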
- Near Real-Time Data Querying:
- For handling high-rate data ingestion (up to 1 MB/s) with near real-time querying:
- Publish data to Amazon Kinesis Data Streams
- Use Kinesis Data Analytics to query the data in near real-time
- Ensures minimal data loss even when ingestion instances are rebooted
- More suitable for real-time analytics than Firehose with Redshift which introduces latency
- More durable than solutions using instance store or EBS volumes for temporary storage
- Provides scalable solution with built-in redundancy and fault tolerance
- Near Real-Time Data Processing for Financial Transactions:
- For handling millions of financial transactions with near-real-time processing requirements:
- Stream transaction data into Amazon Kinesis Data Streams
- Use AWS Lambda integration to process data (e.g., remove sensitive information)
- Store processed data in Amazon DynamoDB for low-latency retrieval
- Allow other applications to consume data directly from the Kinesis stream
- Provides scalable, resilient architecture for high-volume transaction processing
- Enables multiple applications to access the same data stream without affecting original processing
- Better solution than storing directly in DynamoDB when data needs preprocessing
- More suitable than batch processing with S3 for real-time transaction sharing
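One way the processing step of this pipeline could look in a Lambda function subscribed to the stream: decode each Kinesis record, strip a sensitive field, and write the result to DynamoDB. The table and field names are illustrative only, not part of any prescribed solution.

```python
import base64
import json
import boto3

table = boto3.resource("dynamodb").Table("transactions")  # placeholder table name

def handler(event, context):
    """Invoked by the Kinesis event source mapping with a batch of records."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        payload.pop("card_number", None)     # remove sensitive information before storage
        table.put_item(Item=payload)
    return {"processed": len(event["Records"])}
```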
- Serverless Real-time Analytics:
- For web applications with real-time analytics from online games:
- Use Amazon DynamoDB for low-latency database needs with single-digit millisecond response times
- Implement Amazon Kinesis for streaming data from online games in real-time
- This combination provides scalable performance for unpredictable user counts
- More suitable than CloudFront which is optimized for content delivery rather than real-time data streaming
- Better than RDS which doesn't provide the same scalability for unpredictable workloads
- More appropriate than Global Accelerator which focuses on routing traffic rather than data streaming or storage
Amazon Timestream for LiveAnalytics
- Key Features:
- Purpose-built time-series database service for storing and analyzing trillions of events per day
- Ideal for telemetry data from connected devices and vehicles
- Automatically scales up or down to adjust capacity
- Provides built-in time-series analytics functions
- When combined with SageMaker and QuickSight, creates a complete pipeline for telemetry processing and visualization
- Perfect for IoT applications generating millions of data points at regular intervals
- More suitable for time-series data than DynamoDB or Neptune
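A small boto3 sketch of writing one telemetry point into Timestream, matching the connected-vehicle use case above. The database, table, dimension, and measure names are assumptions.

```python
import time
import boto3

tsw = boto3.client("timestream-write")

tsw.write_records(
    DatabaseName="vehicle-telemetry",          # placeholder database
    TableName="sensor-readings",               # placeholder table
    Records=[{
        "Dimensions": [{"Name": "vehicle_id", "Value": "veh-001"}],
        "MeasureName": "engine_temp_c",
        "MeasureValue": "92.4",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # milliseconds since epoch (default TimeUnit)
    }],
)
```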
Amazon OpenSearch Service
- Key Features:
- Search, analyze, and visualize data in near real-time
- Used for log analytics, full-text search, and operational analytics
- Integrates with Kinesis Data Streams and Kinesis Data Firehose for ingesting data
- Can be queried from Amazon QuickSight for visualization
Amazon QuickSight
- Key Features:
- Fully managed Business Intelligence (BI) service
- Create dashboards and visualizations from multiple data sources
- Can connect to data in Amazon OpenSearch Service, Redshift, Athena, RDS, etc.
- Access Control:
- Provides fine-grained access control through its own system of users and groups
- Enables sharing dashboards with specific users and groups based on organizational roles
- Allows different levels of access (e.g., full access for management team, limited access for others)
- More flexible for dashboard sharing than relying solely on IAM roles
- Data Lake Visualization:
- Ideal for creating reports from data spread across S3 and RDS
- Connect to all data sources within the data lake to create comprehensive datasets
- Publish dashboards for data visualization with differentiated access controls
- Share dashboards with appropriate users and groups to control access levels
- Share full access with management while providing limited access to other employees
- More suitable than Glue+Athena solutions for interactive data visualization
Amazon Security Lake
- Key Features:
- Purpose-built service to centralize and aggregate security data across AWS
- Automates data collection from various AWS services (like CloudTrail, VPC Flow Logs, etc.)
- Simplifies organization and analysis of security information (ready for query or third-party tools)
Amazon Athena
- Key Features:
- Interactive query service to analyze data in Amazon S3 using standard SQL
- Serverless (no infrastructure to manage), you pay only for the queries you run
- Works with structured, semi-structured, and unstructured data (supports CSV, JSON, Parquet, etc.)
- Can be combined with AWS Glue for data cataloging (schema discovery)
- Cost-effective for on-demand data analysis that occurs sporadically
- Eliminates the need to set up and manage database or Hadoop clusters for querying data
- Ideal solution for analyzing log files in JSON format with minimal operational overhead
- No data movement required - simply point Athena at your S3 data, define schema, and run SQL queries
- Log File Analysis:
- Enables direct SQL queries against data stored in Amazon S3 with no infrastructure to manage
- Ideal for analyzing log files in JSON format with minimal operational overhead
- No need to provision or manage servers
- Define a table and its schema over your data in S3, then start querying with standard SQL
- No data movement required compared to loading into Redshift
- More efficient than redirecting logs to CloudWatch Logs for SQL querying
- Less complex than using AWS Glue with EMR for simple on-demand queries
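A hedged boto3 sketch of the "point Athena at S3 and query" flow described above. The database, table, and results location are placeholders, and the table is assumed to already exist in the Glue Data Catalog (for example, created by a crawler).

```python
import boto3

athena = boto3.client("athena")

query = "SELECT status_code, COUNT(*) AS hits FROM app_logs GROUP BY status_code"

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "logs_db"},                           # placeholder
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},  # placeholder
)
# Poll get_query_execution / get_query_results with this ID to retrieve the output.
print(execution["QueryExecutionId"])
```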
AWS Glue
- Key Features:
- Fully managed Extract, Transform, Load (ETL) service
- Discovers and catalogs metadata about data sources (with Glue Crawlers)
- Creates a data catalog that makes data searchable and queryable
- Provides a managed Spark environment to run ETL jobs
- Integrates with Athena for serverless querying of data
- Glue crawlers can automatically create or update the metadata catalog for data in S3
- Data Processing Workflow:
- Can be used to catalog clickstream data in S3 making it queryable
- Works with Athena to enable SQL-based analysis without managing infrastructure
- Provides serverless, minimal-overhead solution for analyzing data stored in S3
- Creates schema-on-read capabilities for JSON and other semi-structured data
- Job Bookmarks:
- Enable AWS Glue jobs to track previously processed data
- Avoids reprocessing old data and focuses on newly added or changed data in S3
- Saves time and resources by eliminating repeated work each time the job runs
- Increases efficiency for daily or periodic ETL jobs by only processing incremental updates
- Essential for jobs that run on a schedule and process files added daily
- More efficient than deleting processed data
- Better approach than modifying worker count (NumberOfWorkers=1)
- Not related to FindMatches ML transform (which is for identifying duplicate records)
- Prevents reprocessing of old data in ETL jobs that run regularly
- Tracks data that has already been processed in previous runs
- Enables processing of only new or changed data since the last execution
- Optimizes ETL processes by saving on processing time and resources
- More efficient than deleting processed data or modifying worker count
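Job bookmarks are switched on through the job's default arguments. The boto3 sketch below shows the relevant flag when creating a scheduled Glue ETL job; the job name, IAM role, and script location are assumptions.

```python
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="daily-incremental-etl",                              # placeholder job name
    Role="arn:aws:iam::111122223333:role/GlueJobRole",         # placeholder role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/etl.py",  # placeholder script
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Enables job bookmarks so only new or changed S3 data is processed each run.
        "--job-bookmark-option": "job-bookmark-enable",
    },
    GlueVersion="4.0",
)
```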
- CSV to Redshift ETL Processing:
- For processing legacy application CSV data for use with COTS applications:
- Create AWS Glue ETL job running on a schedule
- Configure job to process CSV files and load data into Amazon Redshift
- Provides least operational overhead compared to custom EC2 scripts or EMR clusters
- Fully managed service eliminates need to provision and maintain infrastructure
- More efficient than custom Lambda functions for complex data transformations
- Perfect for making legacy data available to applications that require SQL-compatible formats
- Most cost-effective solution when original format cannot be changed
- Processing Legacy Data Formats:
- Ideal for transforming legacy data formats (like CSV) to formats compatible with modern analytics tools
- Can be scheduled to run automatically at specified intervals
- Supports writing directly to Amazon Redshift for SQL-based analysis
- Provides the least operational overhead compared to custom EC2 scripts, EMR clusters, or Lambda functions
- Perfect for integrating legacy applications that can't be modified with modern analytics platforms
- Particularly valuable when source data format can't be changed but must be analyzed with COTS applications
AWS Lake Formation
- Centralized Data Lake Management:
- Ideal for companies with data distributed across S3 and RDS needing centralized analytics access
- Creates a unified data catalog with AWS Glue JDBC connections to RDS
- Provides fine-grained access control capabilities for multiple teams
- Minimizes operational overhead compared to manual ETL with Lambda or Redshift
- Enables centralized governance of data across multiple storage services
- Particularly valuable for retail companies with large customer datasets (50M+ customers)
- Simplifies compliance with data access policies across organization
Data Analytics Solutions
- Clickstream and Ad-hoc Analysis:
- Use Amazon Athena for one-time queries against data in S3 with minimal operational overhead
- Combine with Amazon QuickSight for creating KPI dashboards and visualizations
- Use AWS Lake Formation blueprints to simplify data ingestion into data lakes
- AWS Glue can crawl sources, extract data, and transform it to analytics-friendly formats like Parquet
- This serverless approach requires less operational overhead than custom Lambda functions or Redshift
- Provides a more cost-effective solution than running Kinesis Data Analytics for one-time queries
- Enables SQL-based analysis without managing infrastructure
- Ideal for consolidating batch data from databases and streaming data from sensors for business analytics
- Most effective combination for producing KPI dashboards with minimal operational overhead
Analysis and ETL Solutions
- For Clickstream Data in S3:
- Use AWS Glue crawler to catalog data and make it queryable
- Configure Amazon Athena for SQL-based analysis of the data
- Minimal operational overhead as both services are serverless and fully managed
- Enables quick analysis for decision-making about further processing
- Confirmed as a best practice for real-time web analytics
- Document Ingestion and Transformation:
- Store large volumes of documents in Amazon S3
- Use AWS Lambda triggers on object upload
- Perform OCR with Amazon Textract or Rekognition
- Use Amazon Comprehend (or Comprehend Medical) to extract relevant information
- Store extracted data in S3 or a database, queryable via Athena
- Validated for transforming scanned documents into searchable data
Big Data Processing
- Amazon EMR (Elastic MapReduce):
- Industry-leading cloud big data platform for processing vast amounts of data
- Supports open source tools like Apache Spark, Apache Hive, Apache Flink
- Cluster configuration options for cost-optimization:
- Transient clusters: Terminated after completing tasks, most cost-effective for workloads with known duration
- Long-running clusters: Stay active continuously, less efficient for workloads with specific durations
- Primary node and core nodes on On-Demand Instances: Ensures reliability for critical parts of the workload
- Task nodes on Spot Instances: Cost-effective for compute-intensive portions of workload that can tolerate interruptions
- EMR configurations are optimized for both transient and long-running workloads
Amazon EMR (Elastic MapReduce)
- Industry-Leading Big Data Platform:
- Supports open source tools like Apache Spark, Apache Hive, Apache Flink
- Cluster configuration options for cost-optimization:
- Transient clusters: Terminated after completing tasks, most cost-effective for workloads with known duration
- Long-running clusters: Stay active continuously, less efficient for workloads with specific durations
- Primary node and core nodes on On-Demand Instances: Ensures reliability for critical parts of the workload
- Task nodes on Spot Instances: Cost-effective for compute-intensive portions of the workload that can tolerate interruptions
- EMR configurations are optimized for both transient and long-running workloads
Amazon Transcribe
- Audio Processing with PII Redaction:
- Ideal for transcribing audio files stored in S3 buckets while protecting sensitive information
- Provides built-in PII redaction capabilities to automatically remove personal information from transcripts
- Can be triggered by Lambda functions when new audio files are uploaded to S3
- More appropriate for voice transcription than Amazon Textract (which is designed for documents)
- Superior to using Kinesis Video Streams for batch processing of stored audio files
- More efficient than creating custom PII detection logic with Lambda functions
- Perfect for organizations that need to transcribe customer conversations while maintaining compliance
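A minimal boto3 sketch of the approach above: start a transcription job with built-in PII redaction on an audio file already in S3. The job name, bucket names, and object key are assumptions.

```python
import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="call-0001-redacted",                    # placeholder job name
    Media={"MediaFileUri": "s3://example-calls/call-0001.mp3"},   # placeholder audio object
    LanguageCode="en-US",
    OutputBucketName="example-transcripts",                       # placeholder output bucket
    ContentRedaction={
        "RedactionType": "PII",           # built-in PII redaction
        "RedactionOutput": "redacted",    # store only the redacted transcript
    },
)
```

This call could be placed inside a Lambda function triggered by the S3 upload event, as the notes above suggest.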
Machine Learning Services
Machine Learning Solutions
- AI for Call Analysis:
- To analyze customer service calls in multiple languages:
- Use Amazon Transcribe to convert audio recordings into text in various languages
- Use Amazon Translate to translate the text into a standard language (e.g., English)
- Use Amazon Comprehend for sentiment analysis and report generation
- This combination provides automated multilingual support without maintaining ML models
- More effective than using Amazon Lex, Polly, or custom solutions
Amazon Rekognition
- Content Moderation:
- Purpose-built service for identifying objects, people, text, scenes, and activities in images and videos
- Can automatically detect and filter potentially unsafe, inappropriate, or objectionable content
- Uses deep learning-based image and video analysis to examine visual content
- When confidence level is low, content can be routed for human review for accurate moderation
- Requires minimal development effort compared to building custom solutions
- More appropriate for image content moderation than Amazon Comprehend (which is designed for text analysis)
- Better than SageMaker which would require significant development effort to train custom models
- More suitable than AWS Fargate which is a compute engine for containers, not a machine learning service
- Particularly valuable for applications and platforms hosting user-generated visual content
- Perfect for enforcing content standards and ensuring compliance with policies against inappropriate content
- Helps meet legal requirements and community guidelines for visual content platforms
- Image Content Moderation:
- Ideal for detecting inappropriate content in user-uploaded images on websites and social media platforms
- Provides pre-built machine learning models that detect unsafe or objectionable content
- Requires minimal development effort compared to building and training custom models
- Can be configured to send low-confidence predictions for human review
- More appropriate than Amazon Comprehend which is designed for text analysis, not images
- More efficient than Amazon SageMaker or custom ML models which require significant development effort
- Perfect solution for social media platforms, content sharing sites, and applications with user-generated images
- Helps companies enforce content policies and ensure safe user experiences with minimal implementation effort
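A small boto3 sketch of image moderation as described above. The bucket and object key are placeholders; in practice, low-confidence or borderline results would be routed to a human review step.

```python
import boto3

rekognition = boto3.client("rekognition")

resp = rekognition.detect_moderation_labels(
    Image={"S3Object": {"Bucket": "example-uploads", "Name": "user-image.jpg"}},  # placeholders
    MinConfidence=80,   # only return labels at or above this confidence
)

for label in resp["ModerationLabels"]:
    print(label["Name"], label["Confidence"])
# An empty label list suggests the image passed moderation; borderline cases
# can be queued for human review before publishing.
```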
Encryption and Data Protection
EBS Encryption by Default
- EC2 Account Attributes:
- You can enable EBS volume encryption by default at the account level
- All newly created EBS volumes will be encrypted automatically
- Prevents creation of unencrypted volumes, ensuring compliance with security policies
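This account attribute is set per region; a minimal boto3 sketch of enabling and verifying it:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # the setting applies per region

ec2.enable_ebs_encryption_by_default()              # all new EBS volumes are now encrypted
print(ec2.get_ebs_encryption_by_default())          # {'EbsEncryptionByDefault': True, ...}
```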
S3 Bucket Encryption
- Client-Side Encryption:
- Data is encrypted before upload (the client manages encryption)
- Provides encryption in transit and at rest (since it's encrypted when stored)
- Server-Side Encryption:
- SSE-S3: S3 handles key management and encryption for you
- SSE-KMS: S3 uses AWS KMS keys for encryption (gives you control over the keys and an audit trail of key usage via KMS)
- SSE-C: You provide the encryption keys for S3 to use (keys are not stored by AWS)
- SSE-KMS with Automatic Key Rotation:
- Ideal for storing confidential data with encryption at rest requirements
- Provides automatic key rotation for yearly rotation requirements
- Includes detailed logging of key usage in CloudTrail for auditing purposes
- Most operationally efficient solution when compliance mandates encryption key logging
- Eliminates manual intervention needed for key rotation processes
- Automatically rotates the cryptographic material while maintaining the same CMK
- More efficient than SSE-C which requires customer-managed keys for each S3 object request
- Superior to SSE-S3 when detailed key usage auditing is required for compliance
- Better than manual KMS key rotation which introduces unnecessary operational overhead
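A hedged boto3 sketch combining the two pieces above: turn on automatic rotation for a customer managed KMS key and set it as the bucket's default encryption. The bucket name and key ARN are placeholders.

```python
import boto3

kms = boto3.client("kms")
s3 = boto3.client("s3")

key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # placeholder key

kms.enable_key_rotation(KeyId=key_arn)   # yearly automatic rotation of the key material

s3.put_bucket_encryption(
    Bucket="example-confidential-data",   # placeholder bucket
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": key_arn,
            },
            "BucketKeyEnabled": True,     # reduces the number of KMS requests (and cost)
        }]
    },
)
```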
Cloud Financial Management
AWS Cost Anomaly Detection
- Key Features:
- Monitors AWS costs and usage to detect unusual spending patterns
- Allows setting up monitors that analyze historical spending to identify anomalies
- Notifies stakeholders through alerts (e.g., email or SNS) when unexpected spending is detected
- A proactive solution for monitoring costs without manual intervention
AWS Cost Explorer
- Key Features:
- Analyze cost and usage data with interactive graphs, filtering, and grouping
- Forecast future costs based on trends
- Create custom reports (e.g., cost per service, per account, per tag) to break down spending
Resource Tagging and Cost Allocation
- AWS Lambda with EventBridge:
- Can be used to automatically tag resources with cost center IDs
- EventBridge detects resource creation events via CloudTrail
- Lambda function queries databases for appropriate cost center information
- Enables automatic tagging based on the user who created the resource
- Provides a proactive and dynamic approach to resource tagging
- Ensures consistent tagging and cost allocation across resources
Amazon DynamoDB Capacity Management
- On-Demand Capacity Mode:
- Ideal for tables with unpredictable traffic patterns
- Automatically adjusts capacity to maintain performance as application traffic changes
- Best for tables not used during extended periods that experience quick traffic spikes
- Pay-per-request model eliminates the need to provision capacity in advance
- More cost-effective than provisioned capacity for variable workloads
- More efficient than adding global secondary indexes (which improve query efficiency but not capacity management)
- Better than provisioned capacity with auto scaling for very quick, unpredictable spikes
- More appropriate than global tables when multi-region replication isn't required
- On-Demand Capacity for Unpredictable Traffic:
- Particularly cost-effective for tables not used during specific periods (e.g., most mornings)
- Ideal when traffic spikes occur very quickly and unpredictably (e.g., evenings)
- Eliminates the need for capacity planning or management even for rapid traffic changes
- More flexible than provisioned capacity with auto scaling for handling very quick traffic spikes
- Better than global tables when multi-region replication isn't required for the workload
- Particularly suitable for tables with periods of inactivity followed by sudden, unpredictable usage
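Switching an existing table to on-demand is a one-line change in boto3 (the table name below is a placeholder); new tables can pass the same BillingMode at creation time.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Move an existing table from provisioned capacity to on-demand (pay per request).
dynamodb.update_table(
    TableName="example-orders",      # placeholder table name
    BillingMode="PAY_PER_REQUEST",
)
```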
Configuration and Infrastructure Management
AWS Config
- Key Features:
- Monitors and records configurations of AWS resources continuously
- Evaluates recorded configurations against desired configurations (compliance as code)
- Provides a detailed view of how resource configurations change over time
- Can track changes at the resource level and trigger alerts for non-compliance
- Resource Tagging Compliance:
- Can define rules to detect AWS resources that are not properly tagged
- Continuously evaluates resources like EC2 instances, RDS DB instances, and Redshift clusters for tag compliance
- Provides automated detection with minimal operational effort compared to custom solutions
- More efficient than using manual checks through Cost Explorer which requires additional manual tagging effort
- Superior to writing custom API calls and running them on EC2 or Lambda, which introduces unnecessary development and maintenance overhead
- Perfect for organizations requiring consistent tagging for cost allocation and resource governance
- Automatically assesses resources against your defined tagging requirements without manual intervention
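One way to express the tagging check above is the AWS managed Config rule REQUIRED_TAGS. A minimal boto3 sketch with an assumed tag key and resource scope (the rule name and tag key are placeholders):

```python
import json
import boto3

config = boto3.client("config")

config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "required-cost-center-tag",               # placeholder rule name
        "Source": {"Owner": "AWS", "SourceIdentifier": "REQUIRED_TAGS"},
        "InputParameters": json.dumps({"tag1Key": "CostCenter"}),   # tag that must be present
        "Scope": {
            "ComplianceResourceTypes": [
                "AWS::EC2::Instance",
                "AWS::RDS::DBInstance",
                "AWS::Redshift::Cluster",
            ]
        },
    }
)
```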
AWS Resource Access Manager (RAM)
- Key Features:
- Securely share AWS resources across AWS accounts
- Share resources within your organization or organizational units (OUs) in AWS Organizations
- Supports sharing of transit gateways, subnets, license manager configurations, Route 53 Resolver rules, and more
- Reduces operational overhead of duplicating resources in multiple accounts
- Enables sharing of customer-managed prefix lists across accounts (for consistent network ACLs and security groups)
- Organization-Based Access Control:
- Add the aws:PrincipalOrgID global condition key to S3 bucket policies to limit access to users within your AWS Organization
- Provides the least operational overhead for restricting S3 bucket access to organization members
- Automatically applies to all accounts within the organization without manual policy updates
- More efficient than creating OUs for each department and using aws:PrincipalOrgPaths
- More automated than monitoring CloudTrail events to update policies
- More scalable than tagging each user and using aws:PrincipalTag in policies
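A sketch of the bucket policy pattern above, applied with boto3. The bucket name and organization ID are placeholders; the statement shown grants read access only, as an example.

```python
import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowOrgMembersOnly",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-shared-bucket/*",                            # placeholder
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},       # placeholder org ID
    }],
}

s3.put_bucket_policy(Bucket="example-shared-bucket", Policy=json.dumps(policy))
```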
Auto Scaling and Capacity Management
Amazon EC2 Auto Scaling
- Scheduled Scaling:
- Configure automatic scaling based on predictable load changes
- Create scheduled actions to increase or decrease capacity at specific times
- Can proactively adjust the number of EC2 instances in anticipation of known load changes
- Optimizes costs and performance by ensuring sufficient capacity during peak times
- Eliminates lag in scaling up resources, addressing slow application performance issues under sudden load
- Scaling for Unpredictable Traffic Patterns:
- For applications experiencing sudden traffic increases on random days, dynamic scaling is the most cost-effective solution
- Dynamic scaling automatically adjusts capacity based on real-time demand by monitoring metrics like CPU utilization
- More responsive than manual scaling which requires human intervention
- More appropriate than predictive scaling for truly random patterns that don't follow historical trends
- Better than scheduled scaling which works only for known, time-based patterns
- Ensures application performance is maintained during unexpected traffic spikes while optimizing costs during normal periods
- Scheduled Scaling:
- Configure Auto Scaling groups to automatically scale at predetermined times
- Perfect for predictable workload patterns (e.g., scaling up every Friday evening)
- Minimizes operational overhead compared to manual scaling for recurring patterns
- More efficient than using CloudWatch Events/EventBridge with Lambda for simple scheduled scaling
- Allows precise specification of desired capacity at specific times
- Can be configured to scale both up and down according to schedule
- Ideal for workloads with known time-based patterns like batch processing jobs
- Provides the simplest solution for workloads with regular, predictable traffic patterns
- Can be combined with dynamic scaling policies to handle both predictable and unexpected load changes
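A boto3 sketch of a recurring scheduled action along the lines of the Friday-evening example above. The Auto Scaling group name and capacity values are assumptions; a second action would scale the group back down afterwards.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale the group up every Friday at 18:00 UTC.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",                 # placeholder group name
    ScheduledActionName="friday-evening-scale-up",
    Recurrence="0 18 * * 5",                        # cron expression, evaluated in UTC by default
    MinSize=4,
    MaxSize=12,
    DesiredCapacity=8,
)
```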
- Spot Instances for Scheduled Workloads:
- Using Spot Instances for nightly batch jobs (e.g., between 12:00 AM - 6:00 AM) significantly reduces costs compared to On-Demand instances
- Particularly cost-effective for workloads that can be reprocessed if interrupted
- Works well with Auto Scaling groups that can scale based on CPU usage
- Suitable for batch processing jobs where occasional interruptions can be tolerated
- More cost-effective than Savings Plans or Reserved Instances for time-limited workloads that don't run continuously
- Better for off-peak processing jobs when Spot availability is typically higher
EventBridge for Resource Scheduling
- Start/Stop EC2 and RDS Instances:
- Create Lambda functions to start/stop EC2 instances and RDS instances on a schedule
- Configure EventBridge (formerly CloudWatch Events) to invoke these functions based on cron expressions
- More cost-effective than running a dedicated EC2 instance to manage the scheduling
- Easier to maintain than shell scripts and crontab
- Fully serverless solution with minimal operational overhead
- Ideal for non-production environments that only need to run during business hours
AWS EC2 Capacity Management
- On-Demand Capacity Reservation:
- Guarantees EC2 capacity in specific Availability Zones for any duration
- Ideal for ensuring capacity for short-term events (like a one-week requirement)
- More appropriate than Reserved Instances for guaranteed capacity in specific AZs
- Region-only specifications don't ensure capacity in specific Availability Zones
- Provides the assurance of resource availability without long-term commitments
- Short-term Capacity Guarantees:
- For guaranteeing EC2 capacity in specific Availability Zones for short durations (e.g., 1 week):
- Create On-Demand Capacity Reservation specifying both Region and required Availability Zones
- Ensures capacity is available in the exact locations needed for time-limited events
- More appropriate than Reserved Instances which provide billing discounts but don't guarantee capacity
- Region-only specifications without AZ details won't ensure capacity in specific zones
- Most precise way to reserve capacity for short-term events with specific location requirements
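A minimal boto3 sketch of a one-week capacity reservation pinned to a specific Availability Zone. The instance type, AZ, count, and end date are assumptions.

```python
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2")

ec2.create_capacity_reservation(
    InstanceType="m5.large",                # placeholder instance type
    InstancePlatform="Linux/UNIX",
    AvailabilityZone="us-east-1a",          # capacity is guaranteed in this specific AZ
    InstanceCount=20,
    EndDateType="limited",
    EndDate=datetime.now(timezone.utc) + timedelta(days=7),  # released after the one-week event
)
```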
Migration and Data Transfer
AWS Snowball with Tape Gateway
- Large Data Migration:
- Ideal for migrating petabyte-scale tape data to AWS
- Overcomes bandwidth limitations with physical data transfer
- Snowball devices equipped with Tape Gateway functionality allow copying physical tapes to virtual tapes
- Most cost-effective solution for migrating huge archive data (e.g., 5 PB) within limited timeframes
- Virtual tapes can be archived to S3 Glacier Deep Archive for long-term retention
- Suitable for compliance requirements demanding data preservation for many years
- More efficient than solutions relying on internet transfers for massive datasets
- Creates a pathway for modernizing legacy tape infrastructure
AWS Snowball Edge for Data Transfer
- Large Data Migration Use Cases:
- Most cost-effective solution for transferring petabytes of data to AWS within tight timeframes
- Ideal when internet bandwidth constraints would make online transfer take months or years
- Securely transfers large datasets using physical devices with encryption
- More practical than Direct Connect or internet-based transfers for one-time large migrations
- Particularly valuable when transfer must be completed within a specific timeframe (e.g., 2 weeks)
- Encrypted-by-default for sensitive data protection during transit
- Cost Optimization for Tight Timeframes:
- For transferring massive datasets (e.g., 5 PB) to AWS with bandwidth limitations:
- AWS Snowball/Snowball Edge devices provide offline data transfer capability
- Significantly reduces transfer time compared to internet-based transfers
- Ensures transfer can complete within required business timeframes (e.g., 2 weeks)
- More efficient than Direct Connect for one-time transfers of multi-petabyte scale
- Most appropriate option when physical shipping is faster than network transfer
- Essential when business requirements demand transfer completion by specific dates
- More reliable than attempting to maximize internet bandwidth for extremely large datasets
- For extremely large datasets (5+ PB), often the only practical method to meet tight migration deadlines
- Large Data Migration with Limited Bandwidth:
- For migrating multiple terabytes of data with limited network bandwidth:
- Use AWS Snowball devices for physical data transfer
- Most efficient solution for large data transfers with bandwidth constraints
- Faster than transferring over networks with limited bandwidth (e.g., 15 Mbps)
- More suitable than AWS DataSync or VPN connections for bandwidth-limited scenarios
- Provides secure, physical data transport with encryption
- Essential for meeting tight migration timeframes when network transfer would take too long
AWS EC2 Capacity Management
- On-Demand Capacity Reservation:
- Guarantees EC2 capacity in specific Availability Zones for any duration
- Ideal for ensuring capacity for short-term events (like a one-week requirement)
- More appropriate than Reserved Instances for guaranteed capacity in specific AZs
- Region-only specifications don't ensure capacity in specific Availability Zones
- Provides the assurance of resource availability without long-term commitments
Encrypted AMI Sharing
- Secure AMI Sharing Across Accounts:
- Modify the launchPermission property of the AMI to share with specific AWS accounts
- Modify the key policy to allow the target AWS account to use the KMS key
- Maintains security by controlling access to both the AMI and its encrypted EBS snapshots
- More secure than making AMIs publicly available
- More straightforward than creating new KMS keys in target accounts
- More direct than exporting to S3 and re-importing in the target account
- Sharing KMS-Encrypted AMIs Between Accounts:
- For securely sharing AMIs backed by encrypted EBS volumes with another AWS account:
- Modify the launchPermission property of the AMI to share with specific target AWS account
- Update the KMS key policy to allow the target AWS account to use the key
- This approach maintains security by controlling access to both the AMI and its encrypted snapshots
- More secure than making AMIs or snapshots publicly available even with restricted key access
- Simpler than creating trust for new KMS keys owned by the target account
- More efficient than exporting to S3 and re-importing in the target account
- Eliminates unnecessary complexity while maintaining strong security controls
- Most direct method for securely transferring encrypted AMIs between accounts
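A boto3 sketch of the two steps above: grant launch permission on the AMI, then let the target account use the KMS key that encrypts its snapshots via a grant (updating the key policy directly is an equivalent alternative). All IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")
kms = boto3.client("kms")

TARGET_ACCOUNT = "444455556666"   # placeholder target account ID

# 1) Share the AMI itself with the target account.
ec2.modify_image_attribute(
    ImageId="ami-0123456789abcdef0",                     # placeholder AMI ID
    LaunchPermission={"Add": [{"UserId": TARGET_ACCOUNT}]},
)

# 2) Allow the target account to use the KMS key protecting the AMI's EBS snapshots.
kms.create_grant(
    KeyId="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder key ARN
    GranteePrincipal=f"arn:aws:iam::{TARGET_ACCOUNT}:root",
    Operations=["Decrypt", "DescribeKey", "CreateGrant"],
)
```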
Monitoring and Logging
VPC Flow Logs and Monitoring
- VPC Flow Logs:
- Feature to capture information about IP traffic going to and from network interfaces in a VPC
- Flow log data can be published to CloudWatch Logs, Amazon S3, or Amazon Kinesis Data Firehose
- Helps diagnose overly restrictive security group rules
- Monitors traffic reaching instances
- Determines the direction of traffic to/from network interfaces
- Additional logging details help in proactive network troubleshooting
- Log Capture Configuration:
- VPC Flow Logs can be published to Amazon CloudWatch Logs, Amazon S3, or Amazon Kinesis Data Firehose
- To analyze flow logs with OpenSearch Service in near real-time with minimal overhead:
- Create a log group in Amazon CloudWatch Logs
- Configure VPC Flow Logs to send data to the log group
- Use Amazon Kinesis Data Firehose to stream logs from CloudWatch Logs to OpenSearch Service
- This approach provides more direct integration than using Kinesis Data Streams which introduces additional complexity
- More appropriate than AWS CloudTrail which is designed for recording API calls, not network traffic data
- Flow logs help with diagnosing overly restrictive security group rules, monitoring traffic reaching instances, and determining traffic direction
- Provides centralized visibility across all network interfaces in your VPC
- Can be used for both traffic analysis and security monitoring with minimal operational overhead
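A sketch of the flow-log capture step above using boto3; the VPC ID, log group, and IAM role are placeholders, and the Firehose-to-OpenSearch streaming described earlier would be configured separately.

```python
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],       # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",                           # ACCEPT, REJECT, or ALL
    LogDestinationType="cloud-watch-logs",
    LogGroupName="/vpc/flow-logs",               # placeholder CloudWatch Logs log group
    DeliverLogsPermissionArn="arn:aws:iam::111122223333:role/FlowLogsRole",  # placeholder role
)
```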
High Availability and Disaster Recovery
High Availability Architectures
- For MySQL and stateless Python web applications:
- Migrate the database to Amazon RDS for MySQL with Multi-AZ deployment for high availability
- Use an Application Load Balancer with Auto Scaling groups of EC2 instances across multiple AZs for the web/application tier
- RDS Multi-AZ provides automatic failover to a standby instance in a different AZ, improving resiliency
- Confirmed as the preferred architecture for high availability
- For Global Applications Needing Multi-Region Redundancy:
- Deploy workload components in a secondary region (including data stores and compute)
- Use Route 53 or AWS Global Accelerator to route users to the nearest healthy endpoint or fail over if a region goes down
- Verified multi-region designs to maintain global availability
- For Applications Requiring Zero Downtime and No Data Loss:
- Consider active-active multi-region architectures (using global databases like DynamoDB global tables or Aurora Global Database)
- Implement replication and health checks to enable instant failover across regions
- Active-active configurations validated for mission-critical applications
- Minimal Intervention Solutions:
- For ecommerce applications requiring high availability:
- Deploy Amazon RDS in Multi-AZ mode for the database tier
- Use Amazon ECS with Fargate for container-based applications
- RDS Multi-AZ provides automatic failover during outages with synchronous replication
- Fargate eliminates need to manage EC2 instances for container workloads
- More automated than read replicas which require manual promotion
- More hands-off than EC2-based Docker solutions that require infrastructure management
- Combined, they create a comprehensive solution for highly available applications with minimal operational overhead
Backup Strategy for Stateless Applications
- Best Practices:
- For stateless web applications in Auto Scaling groups:
- Retain the latest Amazon Machine Images (AMIs) of the web and application tiers
- Enable automated backups in RDS and use point-in-time recovery to meet RPO requirements
- This approach efficiently leverages the stateless nature of the application
- No need for continuous snapshot backups of EC2 instances' EBS volumes since critical data resides in the database
- Provides quick and efficient method to restore components if necessary
- More cost-effective than taking EBS snapshots for stateless applications
- Avoids unnecessary storage costs and management overhead
AWS Backup
- Vault Lock in Compliance Mode:
- Enforces regulatory compliance requirements that prevent deletion of backup data for specific durations
- Ensures backup files are immutable for the retention period specified
- Once enabled, cannot be disabled or modified, providing strong protection for regulated data
- More appropriate than vault lock in governance mode when strict immutability is required
- Essential for industries with regulatory requirements for backup retention
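A short boto3 sketch of locking a backup vault. My understanding is that supplying ChangeableForDays is what makes the lock a compliance-mode lock once the grace period expires; the vault name and retention values are assumptions.

```python
import boto3

backup = boto3.client("backup")

backup.put_backup_vault_lock_configuration(
    BackupVaultName="regulated-backups",   # placeholder vault name
    MinRetentionDays=365,                  # recovery points cannot be deleted before this
    MaxRetentionDays=3650,
    ChangeableForDays=3,                   # after 3 days the lock becomes immutable (compliance mode)
)
```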
Aurora Database Scaling
- Read Workload Management:
- Use Amazon Aurora with Multi-AZ deployment and Aurora Auto Scaling with Aurora Replicas
- Automatically handles failover to replicas in different AZs for high availability
- Aurora Replicas offer identical read performance to the primary instance
- Auto Scaling adds or removes replicas based on actual workload
- Ideal for unpredictable read-heavy workloads like ecommerce applications
- Better than Redshift for transactional database workloads
- Superior to Single-AZ RDS deployments with read replicas for high availability
- More effective than ElastiCache with EC2 Spot Instances for database scaling
- Aurora Replica for Reporting:
- Ideal for offloading reporting queries from the primary database instance
- Significantly improves application performance when reporting processes are CPU-intensive
- Provides a cost-effective way to generate reports without impacting production workloads
- Superior to using RDS Multi-AZ secondary nodes which are not accessible for read queries
- Database Cloning for Development Environment:
- Use Amazon Aurora database cloning to create staging databases from production
- Creates database clone in minutes regardless of size
- Eliminates latency issues associated with full export processes
- Does not impact production database performance during clone creation
- More efficient than using mysqldump utility which adds load to production
- Better than using standby instances in Multi-AZ deployments (which aren't available for staging use)
- Superior to backup and restore processes for large databases
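Aurora cloning is exposed through the point-in-time restore API with a copy-on-write restore type. A hedged boto3 sketch with placeholder cluster identifiers:

```python
import boto3

rds = boto3.client("rds")

rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="prod-aurora-cluster",   # placeholder production cluster
    DBClusterIdentifier="staging-aurora-clone",        # placeholder clone name
    RestoreType="copy-on-write",                       # clone shares storage until pages diverge
    UseLatestRestorableTime=True,
)
# DB instances still need to be added to the new cluster before it can serve queries.
```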
Serverless Architectures
- Minimizing Server Maintenance and Scaling:
- For highly available dynamic websites that need to scale quickly:
- Host static content in Amazon S3
- Deploy CloudFront to deliver content globally with low latency
- Use API Gateway and AWS Lambda for backend APIs
- Store data in Amazon DynamoDB with on-demand capacity
- This architecture eliminates need for server maintenance and patching
- Provides automatic scaling for both compute and database resources
- More operationally efficient than EC2-based solutions that require manual intervention
- More efficient than hosting the full application on EC2 instances or using container services like EKS
- Each component automatically scales to meet demand without provisioning
- DynamoDB with on-demand capacity offers faster scaling for unpredictable workloads than Aurora with Auto Scaling
- Serverless approach significantly reduces operational overhead compared to managed database solutions
- Perfect solution for high-traffic applications requiring millisecond response times like ecommerce sites
- AWS Lambda Resource-Based Policies:
- EventBridge Integration:
- For Lambda functions triggered by EventBridge (CloudWatch Events) rules:
- Apply resource-based policies to the Lambda function
- Use lambda:InvokeFunction as the specific action permission
- Specify Service: events.amazonaws.com as the principal
- Follows least privilege principle by limiting exactly what service can invoke the function
- More secure than using wildcard (*) permissions which grant excessive access
- More appropriate than execution roles which control what the function can access, not what can invoke it
- Avoids overly permissive lambda:* action which violates least privilege principle
- Creates proper security boundary between the event source and Lambda function
- Essential for secure serverless architectures with event-driven workflows
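The resource-based policy described above is typically attached with a single add_permission call. The function name, statement ID, and rule ARN below are placeholders.

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.add_permission(
    FunctionName="nightly-report-fn",                  # placeholder function name
    StatementId="AllowEventBridgeInvoke",
    Action="lambda:InvokeFunction",                    # least privilege: invoke only, not lambda:*
    Principal="events.amazonaws.com",                  # only EventBridge may invoke
    SourceArn="arn:aws:events:us-east-1:111122223333:rule/nightly-report-rule",  # placeholder rule ARN
)
```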
- Serverless File Processing:
- Configure S3 to send event notifications to SQS when files are uploaded
- Use Lambda to process data from the SQS queue and store results in DynamoDB
- Provides scalable solution for processing files with minimal operational overhead
- Automatically scales to handle variable demand of uploads
- More efficient than Amazon EMR for simple file processing workflows
- Better than EC2-based processing for unpredictable workloads
- More straightforward than EventBridge with Kinesis Data Streams for simple use cases
Best Practices and Key Concepts
- Use read replicas in Aurora (or RDS) to offload read traffic and improve performance for primary databases.
- Enable EBS encryption by default at the EC2 account attribute level to ensure all new volumes are encrypted (enhancing data security).
- Use an Auto Scaling group spanning multiple AZs with an Application Load Balancer to improve availability and automatically scale out/in based on demand.
- Attach AWS WAF to the ALB or CloudFront for web application protection against common exploits.
- Use S3 Access Points for granular access to subsets of data in a shared S3 bucket (simplifies managing data access for multiple applications or tenants).
- Use S3 Event Notifications + Amazon SQS + AWS Lambda to trigger short-lived image processing tasks without needing dedicated EC2 instances (serverless on-demand processing).
- Amazon ECS + AWS Fargate for scheduled container-based workloads (use EventBridge for cron-like scheduling of tasks without managing servers).
- AWS Transit Gateway is recommended for scaling to 100+ VPCs or large multi-account environments (simplifies network management with a hub-and-spoke model).
- Kinesis Data Streams + Amazon OpenSearch Service for near real-time data ingestion, analysis, search, and building dashboards (e.g., with QuickSight) on streaming data.
- Use Amazon EventBridge to capture specific AWS API events and trigger automated actions: Configure an EventBridge rule (fed by CloudTrail events) for critical operations (e.g., an EC2 CreateImage API call) to send an SNS alert or invoke a Lambda. This provides real-time notifications of important activities with minimal operational overhead (no manual log polling).
- AWS Config can monitor resource configurations but does not enforce encryption or block resource creation; it is primarily for auditing and compliance checks.
- Use AWS Organizations and Service Control Policies (SCPs) for centralized governance across multiple AWS accounts (apply guardrails and manage account policies from a single place).
- Use Amazon EFS for a shared POSIX-compliant file system across multiple EC2 instances (and across multiple AZs) when applications require common file storage.
- For DR scenarios with minimal downtime:
- Pre-provision resources in a secondary region (e.g., have an Auto Scaling group, load balancer, and a global DynamoDB table in the DR region).
- Use DNS failover to direct traffic to the DR region's load balancer if the primary region fails.
- For S3 multi-region active-active design with minimal management:
- Use S3 Multi-Region Access Points with a single global endpoint.
- Configure Cross-Region Replication for durability and cross-region failover of data.
- Amazon Security Lake for centralized aggregation of security data across accounts and regions (makes it easier to run analysis or threat detection on combined logs).
- Use Amazon Neptune for applications that need to store and navigate highly connected data (e.g., social graphs or network topologies) in a graph database.
- AWS KMS Customer Managed Keys give you control over key rotation schedules, key usage policies, and full key lifecycle management (use when you need explicit control beyond AWS-managed keys).
- For a serverless architecture behind API Gateway:
- Use AWS Lambda for handling unpredictable or spiky request patterns (scales automatically with demand).
- Use DynamoDB for a fully managed NoSQL database with auto-scaling throughput.
- This combination yields a completely serverless stack that scales with demand and has minimal maintenance.
- Hospital Scanning & Document Processing:
- Store scanned documents in Amazon S3.
- Use S3 event notifications to trigger a Lambda function.
- Extract text with Amazon Textract or Amazon Rekognition (for images), then use Amazon Comprehend (or Comprehend Medical) for deeper NLP analysis.
- Query the results with Amazon Athena for on-demand analytics.
- Static + Dynamic Website:
- Host static content in Amazon S3 (for scalability and low cost).
- Use Amazon API Gateway + AWS Lambda for dynamic requests (serverless backend).
- Store dynamic data in Amazon DynamoDB (on-demand capacity for unpredictable traffic).
- Use Amazon CloudFront to deliver the entire website globally with low latency.
- Eliminates the need for patching and maintaining web servers
- Provides high availability, scalability, and enhanced security with minimal operational overhead
- DynamoDB Accelerator (DAX):
- In-memory cache for DynamoDB
- Significantly improves read performance (microsecond latency) without major application rework (just use the DAX client)
- Ideal for read-intensive workloads that require microsecond response times
- This update increases overall system responsiveness in high-traffic scenarios
- AWS Shield Advanced + CloudFront:
- Combined solution for protecting websites against DDoS attacks
- Shield Advanced provides enhanced protection against large-scale and sophisticated DDoS attacks
- CloudFront caches content at global edge locations, absorbing attack traffic before reaching origin servers
- Ensures websites remain available even during DDoS attacks
- AWS Shield Standard protection is included with CloudFront at no extra cost
- Serverless Daily Deal Website Architecture:
- Use Amazon S3 bucket to host static content with CloudFront distribution
- Use API Gateway and AWS Lambda for backend APIs
- Store data in Amazon DynamoDB
- This combination provides minimal operational overhead while handling millions of requests each hour
- More efficient than hosting the full application on EC2 instances or using container services like EKS
AWS PrivateLink
- VPC Endpoints:
- Allow applications to access S3 buckets through a private network path within AWS
- More cost-effective than using NAT gateways or internet gateways for S3 access from private subnets
- Gateway VPC endpoints provide private connectivity to S3 without requiring internet access
- Perfect solution for EC2 instances that need to access S3 without internet connectivity
- Enables applications to process S3 data securely within a VPC's private network
- Gateway endpoints appear as a target in your route tables
- Gateway Endpoints:
- Provide private connectivity to services like S3 and DynamoDB
- Appear as a target in your route tables
- Enable applications to access AWS services without going through the internet
- The most direct and secure solution for accessing S3 from private subnets
AWS Resource Access Manager (RAM)
- Organization-Based Access Control:
- Add the aws:PrincipalOrgID global condition key to S3 bucket policies to limit access to users within your AWS Organization
- Provides the least operational overhead for restricting S3 bucket access to organization members
- Automatically applies to all accounts within the organization without manual policy updates
- More efficient than creating OUs for each department and using aws:PrincipalOrgPaths
- More automated than monitoring CloudTrail events to update policies
- More scalable than tagging each user and using aws:PrincipalTag in policies
Machine Learning Solutions for Healthcare
- PHI Identification:
- Use Amazon Textract to extract text from medical reports in PDF or JPEG format
- Use Amazon Comprehend Medical to identify protected health information (PHI) in the extracted text
- Provides purpose-built solution for PHI detection with minimal operational overhead
- Most operationally efficient solution for extracting and identifying PHI in healthcare documents
- More efficient than using custom Python libraries for text extraction and PHI identification
- More straightforward than using SageMaker to build custom ML models for PHI detection
- More appropriate than Amazon Rekognition which is designed for image analysis, not document text processing
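A hedged sketch of the two-step pipeline above: pull text from a scanned report with Textract, then detect PHI entities with Comprehend Medical. The bucket and object key are placeholders.

```python
import boto3

textract = boto3.client("textract")
cm = boto3.client("comprehendmedical")

# 1) Extract raw text lines from a single-page scanned report stored in S3.
doc = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "example-medical-reports", "Name": "report-001.png"}}  # placeholders
)
text = "\n".join(b["Text"] for b in doc["Blocks"] if b["BlockType"] == "LINE")

# 2) Identify protected health information in the extracted text.
phi = cm.detect_phi(Text=text)
for entity in phi["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))
```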
Amazon Kinesis
- Near Real-Time Data Processing for Financial Transactions:
- For handling millions of financial transactions with near-real-time processing requirements:
- Stream transaction data into Amazon Kinesis Data Streams
- Use AWS Lambda integration to process data (e.g., remove sensitive information)
- Store processed data in Amazon DynamoDB for low-latency retrieval
- Allow other applications to consume data directly from the Kinesis stream
- Provides scalable, resilient architecture for high-volume transaction processing
- Enables multiple applications to access the same data stream without affecting original processing
- Better solution than storing directly in DynamoDB when data needs preprocessing
- More suitable than batch processing with S3 for real-time transaction sharing
AWS Solution Architect Associate (SAA-C03) Exam Keyword Matrix
This matrix organizes key phrases from the official SAA-C03 exam guide along with their definitions and associated AWS services to help you quickly identify what the question is asking for.
Domain 1: Design Secure Architectures (30%)
This domain focuses on secure access to AWS resources, designing secure workloads and applications, and determining appropriate data security controls.
Key Phrase: "Secure access to AWS resources"
Definition: Creating identity and access management strategies that follow the principle of least privilege. Services: IAM, IAM Identity Center (SSO), AWS Control Tower, SCPs, AWS STS
Key Phrase: "Secure workloads and applications"
Definition: Protecting applications from threats and securing network architecture. Services: Security groups, Network ACLs, VPC, AWS Shield, AWS WAF, AWS Secrets Manager
Key Phrase: "Data security controls"
Definition: Encryption, governance, and compliance for data at rest and in transit. Services: AWS KMS, AWS Certificate Manager, data backup solutions
Domain 2: Design Resilient Architectures (26%)
This domain focuses on scalable, loosely coupled architectures and highly available/fault-tolerant architectures.
Key Phrase: "Scalable and loosely coupled"
Definition: Architectures that can handle growth and where components can change independently. Services: Amazon SQS, API Gateway, Lambda, containers (ECS/EKS), microservices
Key Phrase: "Highly available" or "Fault-tolerant"
Definition: Systems that continue to function despite component failures. Services: Multi-AZ, Auto Scaling, Load Balancers, Route 53, disaster recovery strategies
Domain 3: Design High-Performing Architectures (24%)
This domain focuses on high-performing storage, compute, database, network, and data processing solutions.
Key Phrase: "High-performing storage"
Definition: Storage solutions that meet performance requirements. Services: S3, EFS, EBS with appropriate configurations
Key Phrase: "High-performing compute"
Definition: Elastic compute resources that can scale to meet demand. Services: EC2 with appropriate instance types, Auto Scaling, Lambda, containers
Key Phrase: "High-performing database"
Definition: Database solutions optimized for specific workloads. Services: RDS, Aurora, DynamoDB, ElastiCache, database proxies
Key Phrase: "High-performing network"
Definition: Network architectures designed for optimal throughput and low latency. Services: CloudFront, Global Accelerator, Direct Connect, VPC design, load balancers
Key Phrase: "Data ingestion and transformation"
Definition: Solutions for efficiently handling large data sets. Services: Kinesis, Glue, DataSync, Lake Formation, Athena
Domain 4: Design Cost-Optimized Architectures (20%)
This domain focuses on cost-effective storage, compute, database, and network architectures.
Key Phrase: "Cost-optimized storage"
Definition: Storage solutions that minimize costs while meeting requirements. Services: S3 storage classes, lifecycle policies, EBS volume types
Key Phrase: "Cost-optimized compute"
Definition: Compute resources that minimize costs while meeting requirements. Services: EC2 purchasing options (Spot, Reserved Instances), right-sized instances
Key Phrase: "Cost-optimized database"
Definition: Database solutions that minimize costs while meeting requirements. Services: RDS instance sizing, read replicas, serverless options
Key Phrase: "Cost-optimized network"
Definition: Network architectures that minimize data transfer costs. Services: VPC endpoints, NAT gateway strategies, Direct Connect vs VPN
Question Type Indicators
The SAA-C03 exam contains:
- Multiple choice questions: One correct answer out of four options
- Multiple response questions: Two or more correct answers out of five or more options
When you encounter a question and aren't sure of the answer, look for these key phrases to help identify what the question is asking for:
- If the question mentions "least expensive," "cost-effective," or "minimize costs," it's testing Domain 4 knowledge.
- If the question mentions "secure," "protect," or "compliance," it's testing Domain 1 knowledge.
- If the question mentions "highly available," "fault-tolerant," or "disaster recovery," it's testing Domain 2 knowledge.
- If the question mentions "performance," "latency," or "throughput," it's testing Domain 3 knowledge.
Service Selection Strategy
When multiple services could potentially solve a problem:
- Look for keywords related to AWS security best practices (least privilege, multi-factor authentication)
- Consider if the question emphasizes loose coupling or microservices
- Check if high availability across AZs/Regions is required
- Determine if performance optimization is the priority
- Assess if cost optimization is the main concern
Content Weighting Strategy
Since the exam has specific weightings for each domain, prioritize your study time accordingly:
- Design Secure Architectures: 30% (heaviest weight)
- Design Resilient Architectures: 26%
- Design High-Performing Architectures: 24%
- Design Cost-Optimized Architectures: 20%
This matrix should help you quickly identify what the question is asking for and narrow down the correct answer based on the key phrases used in the question. Remember that the exam tests your ability to apply AWS services to solve business problems based on the Well-Architected Framework principles.
AWS FSx Services Cheat Sheet
FSx for Windows File Server (Remember as: "Windows SMB")
- Protocol: SMB (Server Message Block), the native Windows protocol
- Use case: Windows applications, file shares, domain integration
- When to choose: Microsoft applications, Active Directory integration
- Key feature: Fully compatible with Windows environments
FSx for NetApp ONTAP (Remember as: "The Multi-Protocol Swiss Army Knife")
- Protocols: Supports BOTH Windows (SMB) AND Linux (NFS) workloads
- Use case: Hybrid environments, multi-protocol access, enterprise apps
- When to choose: Need cross-platform compatibility or advanced features
- Key feature: Storage efficiency with snapshots, cloning, and replication
FSx for Lustre (Remember as: "HPC Linux Performance")
- Protocol: Linux (uses Lustre - "Linux + Cluster")
- Use case: High-performance computing, big data processing
- When to choose: Need massive throughput for data processing
- Key feature: Hundreds of GB/s throughput and sub-millisecond latencies
FSx for OpenZFS (Remember as: "ZFS Economy")
- Protocol: Linux (NFS)
- Use case: Data analytics, web serving, general Linux applications
- When to choose: Need good Linux file system performance at lower cost
- Key feature: Fast snapshots and low-cost SSD storage
Memory Aid: "WOLEZ"
- Windows = Windows applications (SMB)
- ONTAP = Omni-protocol (both SMB and NFS)
- Lustre = Lightning fast for HPC/Linux
- Economy = Economical Linux (OpenZFS)
- ZFS = ZFS file system (OpenZFS)
Quick Decision Tree:
- Need Windows file shares? → FSx for Windows File Server
- Need both Windows AND Linux access? → FSx for NetApp ONTAP
- Need maximum performance for Linux/HPC? → FSx for Lustre
- Need an economical Linux file system? → FSx for OpenZFS