In the previous subchapter, we looked at the six pillars of the Well-Architected Framework. But reading best practices is one thing, and rigorously evaluating your specific architecture against them is another. How do you systematically review, without missing anything, whether your system meets those best practices? For that, AWS offers the Well-Architected Tool: a free tool that guides you through a structured review of your architecture.
The problem: reviewing an architecture is hard to do well
Imagine you want to check if your system is well designed according to the six pillars. You could do it “from memory,” but it’s easy to:
- Forget to check important aspects.
- Be too optimistic (“I’m sure it’s fine...”) without evidence.
- Not have a record of what you reviewed and what you decided to improve.
- Do it inconsistently (each person reviews different things).
You need a structured, guided way to evaluate your architecture: one that asks you the right questions and records the results.
What the Well-Architected Tool is
The AWS Well-Architected Tool is a free tool within AWS that guides you through evaluating your architecture against the six pillars by means of a structured questionnaire. It asks you questions about your system, identifies risks, and gives you recommendations for improvement.
Well-Architected Tool:
1. You define your "workload" (workload = the application you want to review)
2. You answer a guided questionnaire, pillar by pillar
3. The tool identifies RISKS (high/medium)
4. It gives you specific RECOMMENDATIONS for improvement
5. You save the result and measure progress over time
Analogy: the Well-Architected Tool is like a vehicle technical inspection (MOT) guided by an official checklist. You don’t check the car “by eye”: you follow a structured list of checks (brakes, lights, emissions, steering...), each point is evaluated, and at the end you get a clear report of what’s good and what needs fixing, with its severity level. The tool does the same with your architecture: a rigorous and systematic review, not improvised.
How the review works
- You define your workload
You indicate which system or application you want to review. In the framework’s vocabulary, a specific application or system being evaluated is called a workload.
- You answer the questionnaire, pillar by pillar
The tool asks you a series of questions for each pillar. For example, for the reliability pillar it might ask: “How do you handle component failures?” or “Do you have a disaster recovery strategy?” (remember Chapter 26). Answer honestly, according to how your system is actually designed, not how you wish it were.
- It identifies risks
Based on your answers, the tool detects risks and classifies them by severity (for example, high risk or medium risk). A high risk is something important you should fix soon; a medium one, something to improve when you can.
Review result:
🔴 High risks: 3 (address soon)
🟡 Medium risks: 7 (improve when possible)
✓ Best practices met: 45
- It gives recommendations for improvement
For each risk, the tool suggests how to improve it, often linking to specific AWS documentation and best practices. You get a prioritized action list to improve your architecture.
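The four steps above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's implementation: the `Answer` class, the question texts, and the `HIGH`/`MEDIUM`/`NONE` labels are hypothetical stand-ins for what you would see in the console. In particular, the real tool derives the risk level from the best practices you select, whereas this sketch simply takes the risk as input.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity order: lower number = address first.
SEVERITY = {"HIGH": 0, "MEDIUM": 1, "NONE": 2}


@dataclass(frozen=True)
class Answer:
    pillar: str    # e.g. "reliability"
    question: str  # question text from the questionnaire
    risk: str      # "HIGH", "MEDIUM" or "NONE" (illustrative labels)


def summarize(answers):
    """Count answers per risk level, always reporting all three levels."""
    counts = Counter(a.risk for a in answers)
    return {level: counts.get(level, 0) for level in SEVERITY}


def action_list(answers):
    """Return only the risky answers, high risks first, then by pillar."""
    risky = [a for a in answers if a.risk != "NONE"]
    return sorted(risky, key=lambda a: (SEVERITY[a.risk], a.pillar))


# Made-up answers for an imaginary workload.
review = [
    Answer("reliability", "How do you handle component failures?", "MEDIUM"),
    Answer("reliability", "Do you have a disaster recovery strategy?", "HIGH"),
    Answer("security", "How do you protect data at rest?", "HIGH"),
    Answer("cost optimization", "How do you monitor usage and cost?", "NONE"),
]

print(summarize(review))
for item in action_list(review):
    print(f"[{item.risk}] {item.pillar}: {item.question}")
```

Sorting by severity first is what turns a flat list of answers into the prioritized action list described above: the two high risks come out before the medium one.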
Why it’s valuable: an honest snapshot and an improvement plan
The great value of the Well-Architected Tool is that it gives you an honest and structured evaluation of your architecture, with a concrete action plan. Instead of “we think it’s fine,” you get: “these are your 3 high risks and this is what you should do.” Additionally:
- Record and tracking: you save the reviews and can repeat them over time to see your progress (have we reduced high risks compared to the previous review?).
- Common language: it gives the team a shared framework to talk about architecture quality.
- It’s free: it costs nothing to use, so there’s no excuse not to review.
Real-world example: before launching an important application to production, a team does a review with the Well-Architected Tool. The questionnaire makes them realize, in the reliability pillar, that they have no disaster recovery strategy (a high risk they had overlooked in the rush). It also detects a security risk: certain data is unencrypted. Thanks to the review, before launching, they add a DR plan (Chapter 26) and enable encryption (Chapter 23). Three months later they repeat the review: the high risks have disappeared. The tool turned “we think it’s ready” into “we know it’s well designed, and we’ve verified it.”
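The "repeat the review and compare" idea can be sketched as a small comparison of risk counts between two saved reviews. This is only an illustration: the dictionary keys are an assumption rather than the tool's export format, and the numbers are made up (the first set reuses the sample review result shown earlier).

```python
def risk_delta(previous, current):
    """Return the change in each risk count (negative means fewer risks)."""
    levels = sorted(set(previous) | set(current))
    return {lvl: current.get(lvl, 0) - previous.get(lvl, 0) for lvl in levels}


def has_regressed(previous, current, level="HIGH"):
    """True if the given risk level has more open items than before."""
    return current.get(level, 0) > previous.get(level, 0)


# Illustrative counts for a pre-launch review and a follow-up review.
before_launch = {"HIGH": 3, "MEDIUM": 7}
three_months_later = {"HIGH": 0, "MEDIUM": 4}

print(risk_delta(before_launch, three_months_later))
print(has_regressed(before_launch, three_months_later))
```

A check like `has_regressed` is the kind of question a periodic review answers: not only whether risks went down, but whether any new high risk has crept in since the last review.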
When to use the Well-Architected Tool
- Before launching an important system to production (to catch problems in time).
- Periodically on existing systems (architectures degrade over time; regular reviews maintain quality).
- After major changes in the architecture.
- As a learning exercise: the questionnaire itself teaches you best practices you might not have known.
What you should remember
- Reviewing an architecture “from memory” is prone to oversights, optimism, and lack of record; you need a structured and guided approach.
- The AWS Well-Architected Tool is a free tool that guides a structured evaluation of your architecture (your “workload”) against the six pillars, through a questionnaire. Like an MOT guided by a checklist.
- The process: define the workload → answer the questionnaire pillar by pillar (honestly) → it identifies risks (high/medium) → gives prioritized recommendations for improvement.
- Its value: an honest evaluation with a concrete action plan, a record for tracking progress, a common language for the team, and no cost to use.
- Use it before launching to production, periodically, after major changes, and as a learning exercise.
In the final subchapter of this chapter, we’ll see how to apply the framework and the tool in your day-to-day work, so that they don’t end up as a one-off review.
