Learn how to use Terraform data sources to query existing resources, look up AMIs, reference remote state, and build dynamic configurations.
Not everything in your infrastructure is managed by Terraform. Legacy resources, manually created configurations, resources managed by other teams, or information from external services all exist outside your Terraform state. Data sources are Terraform's mechanism for reading information from these external sources and using it in your configurations.
Unlike resources, data sources don't create, update, or delete anything. They perform read-only queries that return information you can reference elsewhere in your code. This makes them essential for building configurations that integrate with existing infrastructure rather than starting from scratch.
A data source is defined using a data block:
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}

In this example, instead of hardcoding an AMI ID (which changes across regions and over time), we query for the latest Ubuntu 22.04 AMI dynamically. Every time you run terraform plan, it fetches the current AMI ID.
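One consequence of a dynamic lookup is that a newly published image changes the ami argument, and changing ami forces the instance to be replaced. A minimal sketch of one way to keep running instances stable, assuming you prefer to roll out new images deliberately (the resource name web_pinned is hypothetical):

```hcl
# Hypothetical variant of the instance above. The AMI is resolved when the
# instance is first created; later changes to the lookup result are ignored.
resource "aws_instance" "web_pinned" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    # Do not plan a replacement just because a newer AMI was published.
    ignore_changes = [ami]
  }
}
```

Whether this trade-off is right depends on how you patch instances; without it, the plan will show a replacement whenever the lookup returns a different image.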
When your networking is managed by a different team or Terraform configuration:
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["production-vpc"]
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  filter {
    name   = "tag:Tier"
    values = ["private"]
  }
}

resource "aws_instance" "app" {
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = "t3.medium"
  subnet_id              = data.aws_subnets.private.ids[0]
  vpc_security_group_ids = [aws_security_group.app.id]
}

resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [data.aws_vpc.main.cidr_block]
  }
}

One of the most powerful data sources lets you read outputs from another Terraform state:
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"
  subnet_id     = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
}

This pattern enables separation of concerns: the networking team manages VPCs and subnets, and application teams reference them via remote state outputs.
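For the lookup above to work, the networking configuration has to publish the value as a root-module output, since terraform_remote_state only exposes root-level outputs. A minimal sketch of the producing side, assuming the networking team creates its subnets with a counted aws_subnet.private resource (that resource name is an assumption, not something shown in this article):

```hcl
# In the networking configuration (the one whose state lives at
# networking/terraform.tfstate). aws_subnet.private is assumed to exist there.
output "private_subnet_ids" {
  description = "IDs of the private subnets, consumed by application stacks"
  value       = aws_subnet.private[*].id
}
```

The output name is the contract between the two configurations: renaming it on the producing side breaks every consumer that reads outputs.private_subnet_ids.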
Reference AWS-managed or existing custom policies:
data "aws_iam_policy" "admin" {
  name = "AdministratorAccess"
}

data "aws_iam_policy_document" "custom" {
  statement {
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:ListBucket",
    ]
    resources = [
      "arn:aws:s3:::my-bucket",
      "arn:aws:s3:::my-bucket/*",
    ]
  }

  statement {
    actions   = ["logs:*"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "app" {
  name   = "app-policy"
  policy = data.aws_iam_policy_document.custom.json
}

The aws_iam_policy_document data source is particularly valuable because it generates valid JSON policy documents with proper syntax, reducing the chance of policy errors.
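The same data source can also produce the trust policy for a role. The following sketch shows how the app policy above might be attached to an EC2 role; the names assume_ec2 and app-role are hypothetical and not part of the original example:

```hcl
# Trust policy generated with the same data source (hypothetical name).
data "aws_iam_policy_document" "assume_ec2" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Hypothetical role that EC2 instances can assume.
resource "aws_iam_role" "app" {
  name               = "app-role"
  assume_role_policy = data.aws_iam_policy_document.assume_ec2.json
}

# Attach the policy defined above to the role.
resource "aws_iam_role_policy_attachment" "app" {
  role       = aws_iam_role.app.name
  policy_arn = aws_iam_policy.app.arn
}
```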
Build region-agnostic configurations:
data "aws_availability_zones" "available" {
  state = "available"

  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

resource "aws_subnet" "private" {
  count             = min(length(data.aws_availability_zones.available.names), 3)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-${data.aws_availability_zones.available.names[count.index]}"
    Tier = "private"
  }
}

This creates subnets in up to 3 availability zones, regardless of which region you deploy to.
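To make the cidrsubnet arithmetic concrete, here is a small worked example. It assumes a VPC CIDR of 10.0.0.0/16, which is an illustrative value and not taken from the configuration above:

```hcl
locals {
  # Assumed VPC range, for illustration only.
  example_vpc_cidr = "10.0.0.0/16"

  # Adding 8 bits to a /16 prefix yields /24 networks; the third argument
  # selects which /24 within the VPC range.
  example_subnet_cidrs = [
    for i in range(3) : cidrsubnet(local.example_vpc_cidr, 8, i)
  ]
  # => ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
}
```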
Access metadata about your current context:
data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

data "aws_partition" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name # AWS provider v6+ deprecates "name" in favor of "region"
  partition  = data.aws_partition.current.partition
}

output "account_info" {
  value = "Running in account ${local.account_id} in region ${local.region}"
}

Reference existing DNS zones:
data "aws_route53_zone" "main" {
  name         = "example.com"
  private_zone = false
}

resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.app.dns_name
    zone_id                = aws_lb.app.zone_id
    evaluate_target_health = true
  }
}

external Data Source

For data that doesn't have a native Terraform provider, use the external data source to call any script:
data "external" "git_info" {
  program = ["bash", "-c", <<-EOT
    echo '{"commit":"'$(git rev-parse --short HEAD)'","branch":"'$(git rev-parse --abbrev-ref HEAD)'"}'
  EOT
  ]
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    GitCommit = data.external.git_info.result.commit
    GitBranch = data.external.git_info.result.branch
  }
}

The script must output valid JSON to stdout with string values only.
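Because every value in the result map is a string, anything numeric has to be converted on the Terraform side. A minimal sketch, assuming a hypothetical script that reports a replica count (the names replica_hint and replicas are illustrative):

```hcl
# Hypothetical program that returns a number encoded as a JSON string.
data "external" "replica_hint" {
  program = ["bash", "-c", <<-EOT
    echo '{"replicas":"3"}'
  EOT
  ]
}

locals {
  # result.replicas is the string "3"; tonumber converts it to the number 3.
  replicas = tonumber(data.external.replica_hint.result.replicas)
}
```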
http Data Source

Fetch data from HTTP endpoints:

data "http" "my_ip" {
  url = "https://ipv4.icanhazip.com"
}

resource "aws_security_group_rule" "ssh_from_my_ip" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["${chomp(data.http.my_ip.response_body)}/32"]
  security_group_id = aws_security_group.bastion.id
}

Understanding when data sources are evaluated is crucial. Terraform reads a data source during terraform plan when all of its arguments are already known; if an argument depends on a value that is only known after apply, the read is deferred to the apply phase. This means data source results can change between runs if the underlying data changes. For example, a new AMI might be published, or a VPC might be modified.
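Timing also depends on whether a data source's arguments are known at plan time. As a sketch of the deferred case: when an argument refers to an attribute of a resource that has not been created yet, Terraform cannot read the data source during plan and postpones the read until apply. The resource names and CIDR below are hypothetical:

```hcl
# A VPC created in this same configuration (illustrative CIDR).
resource "aws_vpc" "new" {
  cidr_block = "10.1.0.0/16"
}

# On the first run, aws_vpc.new.id is unknown during plan, so this read is
# deferred to apply and anything derived from it shows as "known after apply".
data "aws_route_tables" "for_new_vpc" {
  vpc_id = aws_vpc.new.id
}
```

On later runs, once the VPC exists and its ID is in state, the same data source is read during plan.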
Avoid broad queries that might return unexpected results:
# Bad: might match unintended VPCs
data "aws_vpc" "main" {
  default = true
}

# Good: specific filter
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["production-vpc"]
  }

  filter {
    name   = "tag:Environment"
    values = ["production"]
  }
}

Some data sources can return multiple results. Use most_recent for AMIs or ensure your filters are specific enough to return exactly one result.
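One way to make the "exactly one result" expectation explicit is a postcondition on a plural data source. This is a sketch, assuming Terraform 1.2 or later (where data blocks accept postcondition checks); the name production and the tag values mirror the example above but are otherwise illustrative:

```hcl
# Plural lookup that returns every matching VPC ID.
data "aws_vpcs" "production" {
  tags = {
    Name        = "production-vpc"
    Environment = "production"
  }

  lifecycle {
    # Fail with a clear message if the tags match zero or several VPCs.
    postcondition {
      condition     = length(self.ids) == 1
      error_message = "Expected exactly one VPC tagged production-vpc/production."
    }
  }
}
```

The singular aws_vpc data source already errors when a query is ambiguous; the postcondition mainly lets you control the message and document the expectation in code.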
When both teams use Terraform, terraform_remote_state is more reliable than querying resources by tags:
# Better: explicit contract via outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config  = { ... }
}

# Riskier: depends on tags being correct
data "aws_vpc" "main" {
  filter { ... }
}

aws_iam_policy_document for IAM

Always prefer the aws_iam_policy_document data source over inline JSON. It generates syntactically valid policy JSON, accepts references to other resources in place of hand-written ARNs, and can merge existing policy documents through its source_policy_documents argument.

The external data source should be a last resort. It introduces dependencies on local tools and reduces portability. Check if a native provider or data source exists first.
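For something like the Git metadata shown earlier, one portable alternative is to pass the value in as an input variable and let the caller (for example a CI job) supply it. This is a sketch of that idea, not a drop-in replacement; the variable name git_commit and the resource name app_tagged are hypothetical:

```hcl
# Supplied by the caller, e.g. through the TF_VAR_git_commit environment
# variable or a -var flag, so Terraform itself never shells out to git.
variable "git_commit" {
  description = "Short commit hash of the code being deployed"
  type        = string
  default     = "unknown"
}

resource "aws_instance" "app_tagged" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    GitCommit = var.git_commit
  }
}
```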
Data sources are the bridge between Terraform-managed infrastructure and everything else. They enable you to build configurations that reference existing resources, query dynamic information, and integrate with external systems — all without duplicating resource management. Master data sources, and you'll write more flexible, maintainable, and collaborative Terraform code.