Data sources let Terraform read information from your cloud provider or external systems without creating or managing resources. Use them to look up AMIs, reference existing VPCs, read secrets, and query anything you didn’t create with Terraform.
Basic Syntax
# Data source: reads, never creates
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    # Standard image only; the broader al2023-ami-* pattern also matches the minimal and ECS-optimized AMIs
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# Use the result
resource "aws_instance" "web" {
  ami           = data.aws_ami.al2023.id # ← Read from data source
  instance_type = "t3.micro"
}
Data Source vs Resource
| Feature | resource | data |
|---|---|---|
| Action | Create, update, destroy | Read only |
| State | Tracked in state | Result stored in state, re-read on every plan |
| Lifecycle | Full (create → destroy) | None (read each time) |
| Purpose | Manage infrastructure | Reference existing infrastructure |
| Prefix | resource "aws_vpc" | data "aws_vpc" |
| Reference | aws_vpc.main.id | data.aws_vpc.main.id |
Common Data Sources
Look Up the Latest AMI
# Amazon Linux 2023
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    # Standard image only; the broader al2023-ami-* pattern also matches the minimal and ECS-optimized AMIs
    values = ["al2023-ami-2023.*-x86_64"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

# Ubuntu 24.04
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*"]
  }
}
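The same lookup pattern extends to other architectures, and exposing a few attributes as an output makes it easy to see which image was resolved. The following is a minimal sketch, not a definitive recipe: the names al2023_arm64 and al2023_arm64_ami are hypothetical, and the name pattern assumes the arm64 images follow the same naming scheme as the x86_64 ones.

```hcl
# Hypothetical example: latest standard Amazon Linux 2023 image for arm64 (Graviton)
data "aws_ami" "al2023_arm64" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-arm64"] # assumed naming scheme
  }

  filter {
    name   = "architecture"
    values = ["arm64"]
  }
}

# Show what the lookup resolved to after plan/apply
output "al2023_arm64_ami" {
  value = {
    id            = data.aws_ami.al2023_arm64.id
    name          = data.aws_ami.al2023_arm64.name
    creation_date = data.aws_ami.al2023_arm64.creation_date
  }
}
```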
Reference an Existing VPC
# Look up VPC by tag
data "aws_vpc" "existing" {
  filter {
    name   = "tag:Name"
    values = ["production-vpc"]
  }
}

# Look up subnets in that VPC
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.existing.id]
  }

  filter {
    name   = "tag:Tier"
    values = ["private"]
  }
}

# Use them
resource "aws_ecs_service" "app" {
  # ...
  network_configuration {
    subnets = data.aws_subnets.private.ids
  }
}
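The aws_subnets data source returns only IDs. When per-subnet details such as CIDR blocks are needed, one common approach is to feed those IDs into the singular aws_subnet data source with for_each. A short sketch, assuming the data.aws_subnets.private lookup above is in the same module; the output name private_subnet_cidrs is hypothetical.

```hcl
# Assumes data.aws_subnets.private from the lookup above
data "aws_subnet" "private" {
  for_each = toset(data.aws_subnets.private.ids)
  id       = each.value
}

# Map of subnet ID => CIDR block
output "private_subnet_cidrs" {
  value = { for id, s in data.aws_subnet.private : id => s.cidr_block }
}
```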
Get Current AWS Account Info
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id # "123456789012"
  region     = data.aws_region.current.name                # "us-east-1" (AWS provider v6 deprecates .name in favor of .region)
}

# Use the account ID in a resource name inside an ARN
resource "aws_iam_policy" "app" {
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "s3:GetObject"
      Resource = "arn:aws:s3:::${local.account_id}-data/*"
    }]
  })
}
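Account metadata is most useful for ARNs that embed the account ID itself. The sketch below is one possible way to do that, using the aws_partition and aws_iam_policy_document data sources; the role prefix app-*, the policy name, and the "this" labels are hypothetical and chosen only for illustration.

```hcl
data "aws_caller_identity" "this" {}
data "aws_partition" "this" {}

# aws_iam_policy_document is a data source that renders JSON locally
data "aws_iam_policy_document" "pass_app_roles" {
  statement {
    effect  = "Allow"
    actions = ["iam:PassRole"]
    resources = [
      # Hypothetical role prefix; the partition and account ID come from the lookups above
      "arn:${data.aws_partition.this.partition}:iam::${data.aws_caller_identity.this.account_id}:role/app-*"
    ]
  }
}

resource "aws_iam_policy" "pass_app_roles" {
  name   = "pass-app-roles" # hypothetical name
  policy = data.aws_iam_policy_document.pass_app_roles.json
}
```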
Get Available AZs
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "private" {
  count             = 3 # assumes the region has at least three available AZs
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
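A fixed count of 3 fails with an index error in a region that offers fewer zones. One way to guard against that, shown here as a variant of the block above rather than an addition to it, is to cap the list with slice and min. The local name az_names is hypothetical, and aws_vpc.main is assumed to be defined elsewhere, as in the original example.

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # Take at most three zones, fewer if the region offers fewer
  az_names = slice(
    data.aws_availability_zones.available.names,
    0,
    min(3, length(data.aws_availability_zones.available.names))
  )
}

resource "aws_subnet" "private" {
  count             = length(local.az_names)
  vpc_id            = aws_vpc.main.id # assumed to exist elsewhere in the configuration
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, count.index)
  availability_zone = local.az_names[count.index]
}
```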
Read from Secrets Manager
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/database/password"
}

resource "aws_db_instance" "main" {
  # ...
  password = data.aws_secretsmanager_secret_version.db.secret_string
  # ⚠️ Note: this stores the secret in Terraform state!
  # Consider ephemeral resources for secrets (Terraform 1.10+)
}
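For comparison, the ephemeral alternative mentioned in the comment might look like the sketch below. It rests on two assumptions: Terraform 1.11 or later (write-only arguments arrived after ephemeral resources), and an AWS provider release that offers both the ephemeral aws_secretsmanager_secret_version resource and the password_wo argument on aws_db_instance. Check the provider documentation for your version before relying on it.

```hcl
# Ephemeral values are read during the run and are not written to state or plan files
ephemeral "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/database/password"
}

resource "aws_db_instance" "main" {
  # ...
  # Write-only argument: sent to the provider, never persisted in state
  password_wo         = ephemeral.aws_secretsmanager_secret_version.db.secret_string
  password_wo_version = 1 # increment to push a changed password
}
```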
Look Up an IAM Role
data "aws_iam_role" "existing" {
  name = "AWSServiceRoleForECS"
}

resource "aws_ecs_service" "app" {
  # ...
  iam_role = data.aws_iam_role.existing.arn
}
Read SSM Parameters
data "aws_ssm_parameter" "db_endpoint" {
  name = "/prod/database/endpoint"
}

resource "aws_ecs_task_definition" "app" {
  # ...
  container_definitions = jsonencode([{
    name = "app"
    environment = [{
      name  = "DB_HOST"
      value = data.aws_ssm_parameter.db_endpoint.value
    }]
  }])
}
Get Route53 Zone
data "aws_route53_zone" "main" {
  name = "example.com"
}

resource "aws_route53_record" "www" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "www.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}
Data Sources with for_each
variable "secret_names" {
  type    = list(string)
  default = ["db-password", "api-key", "jwt-secret"]
}

# One data source instance per secret name
data "aws_secretsmanager_secret_version" "secrets" {
  for_each  = toset(var.secret_names)
  secret_id = each.key
}

# Access: data.aws_secretsmanager_secret_version.secrets["db-password"].secret_string
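Because for_each turns the data source into a map of instances keyed by secret name, a for expression can collapse it into a plain name-to-value map. A small sketch, assuming the data source above is in the same module; the local name secret_values is hypothetical.

```hcl
locals {
  # Map of secret name => secret string, built from the for_each instances above.
  # The values stay marked as sensitive, so Terraform redacts them in plan output.
  secret_values = {
    for name, v in data.aws_secretsmanager_secret_version.secrets :
    name => v.secret_string
  }
}

# Usage: local.secret_values["api-key"]
```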
Remote State as Data Source
Read outputs from another Terraform project:
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "tf-state-bucket"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  subnet_id = data.terraform_remote_state.networking.outputs.public_subnet_ids[0]
  # ...
}
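Note: Prefer Terraform Stacks or module outputs over remote state when possible. Only the root module's outputs are exposed, but the reader still needs access to the entire state snapshot, which can contain sensitive values.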
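For the lookup to work, the networking project has to declare the value as a root-module output. A sketch of what that side might contain, assuming (hypothetically) that its public subnets are defined as aws_subnet.public with count:

```hcl
# In the networking project, whose state is being read.
# aws_subnet.public is an assumed resource name, created with count.
output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}
```

Outputs declared only inside child modules are not visible to terraform_remote_state unless the root module re-exports them.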
External Data Source
Run a script and use its output:
data "external" "git_hash" {
  # The program must print a JSON object whose values are all strings
  program = ["bash", "-c", "echo '{\"hash\": \"'$(git rev-parse --short HEAD)'\"}'"]
}

resource "aws_instance" "web" {
  # ...
  tags = {
    GitCommit = data.external.git_hash.result["hash"]
  }
}
Tips
- Data sources refresh every plan: they always show current state
- Use filters, not IDs: filter blocks are more maintainable than hardcoded IDs
- Data sources can fail: if the resource doesn’t exist, terraform plan errors
- Secrets in state: data sources store results in state; use ephemeral resources for secrets
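When a missing resource should not break the plan, one option is to query a plural data source, which returns a list of matches, and check whether that list is empty. The sketch below assumes a recent AWS provider in which aws_vpcs returns an empty list instead of an error when nothing matches; the names candidates, vpc_found, and vpc_id are hypothetical.

```hcl
# Plural lookup: yields a list of IDs (assumed to be empty when nothing matches)
data "aws_vpcs" "candidates" {
  tags = {
    Name = "production-vpc"
  }
}

locals {
  vpc_found = length(data.aws_vpcs.candidates.ids) > 0
  vpc_id    = local.vpc_found ? data.aws_vpcs.candidates.ids[0] : null
}
```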
Conclusion
Data sources bridge Terraform-managed and non-Terraform infrastructure. Use them to look up AMIs, reference existing VPCs, read secrets, and query account metadata. The most common pattern is data "aws_ami" for dynamic AMI lookup and data "aws_vpc" / data "aws_subnets" for referencing networking created by another team. Remember: data sources read, never create.