TerraformPilot


Terraform External Data Source - Run Scripts and Commands

Use the Terraform external data source to run scripts and fetch data from APIs, with shell and Python examples, JSON output rules, and common use cases.

Luca Berton · 1 min read

Quick Answer

data "external" "ip" {
  program = ["bash", "-c", "echo '{\"ip\": \"'$(curl -s ifconfig.me)'\"}'" ]
}
 
output "my_ip" {
  value = data.external.ip.result.ip
}

How It Works


The external data source:

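  1. Runs the command in program (the first element is the executable, the rest are its arguments)
  2. Writes the query map to the program's stdin as a single JSON object
  3. Reads a single JSON object from the program's stdout and exposes it as the result map
  4. Requires every value, in both query and result, to be a string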
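To make that contract concrete, here is a minimal Python sketch that plays Terraform's side of the exchange: it runs a command, writes the query as JSON to stdin, and checks that stdout is a flat object of strings. It is a simulation for experimenting with scripts locally, not the hashicorp/external provider's actual code; the names run_external and ExternalProgramError are hypothetical.

```python
import json
import subprocess
import sys


class ExternalProgramError(RuntimeError):
    """Raised when the program fails or breaks the output contract."""


def run_external(program: list[str], query: dict[str, str]) -> dict[str, str]:
    """Simulate the external data source contract (a sketch, not the provider)."""
    # Run the command and write the query to its stdin as one JSON object
    proc = subprocess.run(
        program,
        input=json.dumps(query),
        capture_output=True,
        text=True,
    )
    # A non-zero exit is a failure; stderr carries the error message
    if proc.returncode != 0:
        raise ExternalProgramError(proc.stderr.strip())
    # stdout must contain a single JSON object
    try:
        result = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise ExternalProgramError(f"invalid JSON on stdout: {exc}") from exc
    if not isinstance(result, dict):
        raise ExternalProgramError("program must print a JSON object")
    # Every value in the result must be a string
    for key, value in result.items():
        if not isinstance(value, str):
            raise ExternalProgramError(f"value for {key!r} is not a string")
    return result


# Demo: a tiny inline Python program that echoes the region back
echo_region = [
    sys.executable,
    "-c",
    "import json, sys; q = json.load(sys.stdin); "
    "json.dump({'region': q['region']}, sys.stdout)",
]
print(run_external(echo_region, {"region": "eu-west-1"}))
```

Pointing program at one of your own scripts lets you check it against the same rules before wiring it into a Terraform configuration.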

Shell Script Example

data "external" "latest_ami" {
  program = ["bash", "${path.module}/scripts/get-latest-ami.sh"]
 
  query = {
    region = var.region
    os     = "ubuntu"
  }
}
 
output "ami_id" {
  value = data.external.latest_ami.result.ami_id
}
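#!/bin/bash
# scripts/get-latest-ami.sh
set -euo pipefail
 
# Read JSON input from stdin
eval "$(jq -r '@sh "REGION=\(.region) OS=\(.os)"')"
 
# Query AWS for the newest image owned by Canonical (099720109477)
AMI_ID=$(aws ec2 describe-images \
  --region "$REGION" \
  --owners 099720109477 \
  --filters "Name=name,Values=${OS}/images/hvm-ssd/*" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
  --output text)
 
# Fail loudly if nothing matched (text output prints None for a null result)
if [ -z "$AMI_ID" ] || [ "$AMI_ID" = "None" ]; then
  echo "Error: no image found for ${OS} in ${REGION}" >&2
  exit 1
fi
 
# Output JSON (all values must be strings)
jq -n --arg ami_id "$AMI_ID" '{"ami_id": $ami_id}'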
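The --query argument in the script is a JMESPath expression: sort_by(Images, &CreationDate)[-1].ImageId sorts the matching images by creation date and takes the ImageId of the last, i.e. newest, one. The same selection in plain Python looks roughly like this; the function name latest_image_id and the sample image list are hypothetical, shaped like the Images field that describe-images returns.

```python
from typing import Any


def latest_image_id(images: list[dict[str, Any]]) -> str:
    """Return the ImageId of the most recently created image.

    Mirrors the JMESPath query sort_by(Images, &CreationDate)[-1].ImageId.
    CreationDate values share one ISO 8601 format, so comparing them as
    strings orders them chronologically.
    """
    if not images:
        # Unlike the CLI, refuse to return an empty value
        raise ValueError("no images matched the filter")
    return sorted(images, key=lambda image: image["CreationDate"])[-1]["ImageId"]


# Hypothetical sample shaped like the "Images" list from describe-images
images = [
    {"ImageId": "ami-aaa", "CreationDate": "2024-01-10T08:00:00.000Z"},
    {"ImageId": "ami-ccc", "CreationDate": "2024-03-02T08:00:00.000Z"},
    {"ImageId": "ami-bbb", "CreationDate": "2024-02-20T08:00:00.000Z"},
]
print(latest_image_id(images))  # ami-ccc
```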

Python Script Example

data "external" "config" {
  program = ["python3", "${path.module}/scripts/get-config.py"]
 
  query = {
    environment = var.environment
    service     = "api"
  }
}
#!/usr/bin/env python3
# scripts/get-config.py
import json
import sys
 
# Read query from stdin
query = json.load(sys.stdin)
env = query["environment"]
service = query["service"]
 
# Your logic here
config = {
    "endpoint": f"https://{service}.{env}.example.com",
    "replicas": "3" if env == "production" else "1",
    "log_level": "warn" if env == "production" else "debug",
}
 
# Output JSON (all values must be strings!)
json.dump(config, sys.stdout)

Fetch from API

data "external" "github_release" {
  program = ["bash", "-c", <<-EOF
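    set -e
    RELEASE=$(curl -sSf https://api.github.com/repos/hashicorp/terraform/releases/latest)
    # Quote "$RELEASE": unquoted, characters such as * in the release notes would be glob-expanded
    VERSION=$(echo "$RELEASE" | jq -r '.tag_name')
    DATE=$(echo "$RELEASE" | jq -r '.published_at')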
    jq -n --arg v "$VERSION" --arg d "$DATE" '{"version": $v, "date": $d}'
  EOF
  ]
}
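If you would rather not chain curl and jq, the same lookup can be written with only the Python standard library. This is a sketch, not a drop-in replacement: it reuses the endpoint and the tag_name and published_at fields from the bash example, while the function names summarize_release and fetch_latest_release are hypothetical. The fetching function needs network access, so the flattening step is kept separate and can be tested on its own.

```python
import json
import urllib.request

# Same endpoint as the bash example
LATEST_RELEASE_URL = "https://api.github.com/repos/hashicorp/terraform/releases/latest"


def summarize_release(release: dict) -> dict[str, str]:
    """Reduce the GitHub release payload to a flat map of strings."""
    return {
        "version": str(release["tag_name"]),
        "date": str(release["published_at"]),
    }


def fetch_latest_release(url: str = LATEST_RELEASE_URL, timeout: float = 10.0) -> dict:
    """Download and decode the release JSON (requires network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    # urlopen raises HTTPError for 4xx/5xx responses instead of returning a body
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To use it as the program, finish the script by printing json.dump(summarize_release(fetch_latest_release()), sys.stdout). An uncaught exception makes Python exit with status 1 and print the traceback to stderr, which Terraform then reports as a data source failure.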

Read from Vault/SSM

data "external" "secret" {
  program = ["bash", "-c", <<-EOF
    set -e
    SECRET=$(aws ssm get-parameter \
      --name "/myapp/api-key" \
      --with-decryption \
      --query 'Parameter.Value' \
      --output text)
    jq -n --arg s "$SECRET" '{"value": $s}'
  EOF
  ]
}

Important Rules

  1. Output must be valid JSON: a single flat object whose values are all strings
  2. No nested objects or arrays: flatten everything to strings
  3. Stderr is for error messages: when the program exits non-zero, Terraform includes its stderr in the error
  4. It runs on every plan: avoid slow or rate-limited operations
  5. A non-zero exit code is an error: Terraform fails the data source and ignores anything on stdout
# ❌ Wrong — nested object
echo '{"config": {"port": 8080}}'
 
# ❌ Wrong — non-string value  
echo '{"port": 8080}'
 
# ✅ Correct — flat object with strings
echo '{"port": "8080", "host": "localhost"}'
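When the data you start from is nested, one option is to flatten it before printing. The sketch below joins keys with dots and turns every leaf into a string; the helper names flatten and to_terraform_string, the dotted-key scheme, and mapping null to an empty string are illustrative choices, not anything Terraform requires.

```python
import json
from typing import Any


def to_terraform_string(value: Any) -> str:
    """Convert a JSON scalar to a string value."""
    if isinstance(value, bool):  # check bool before numbers: bool is a subclass of int
        return "true" if value else "false"
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    return json.dumps(value)  # numbers keep their JSON spelling, e.g. 8080 -> "8080"


def flatten(obj: Any, prefix: str = "") -> dict[str, str]:
    """Flatten nested objects and arrays into one level with dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # array elements get their index as the key
    else:
        return {prefix: to_terraform_string(obj)}
    flat: dict[str, str] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


print(json.dumps(flatten({"config": {"port": 8080, "tls": True}, "hosts": ["a", "b"]})))
# {"config.port": "8080", "config.tls": "true", "hosts.0": "a", "hosts.1": "b"}
```

If you need to keep the structure, the alternative is to emit the nested part as one JSON-encoded string, for example {"config": json.dumps(nested)}, and decode it in Terraform with jsondecode(data.external.NAME.result.config).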

Error Handling

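#!/bin/bash
set -e
 
# Validate inputs. jq renders a missing key as the string null,
# so default it to "" for the emptiness check below to work.
eval "$(jq -r '@sh "REGION=\(.region // "")"')"
if [ -z "$REGION" ]; then
  echo "Error: region is required" >&2
  exit 1
fi
 
# Your logic... then print one flat JSON object of strings to stdout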
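The same validation pattern in Python, as a sketch: reject bad input with a one-line message on stderr and a non-zero exit code, and print JSON only on success. The main function and its injectable stream parameters are a hypothetical structure chosen so the logic can be tested without a real stdin.

```python
import json
import sys
from typing import TextIO


def main(
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
    stderr: TextIO = sys.stderr,
) -> int:
    """Validate the query and return the process exit code."""
    try:
        query = json.load(stdin)
    except json.JSONDecodeError as exc:
        print(f"Error: query is not valid JSON: {exc}", file=stderr)
        return 1
    # A missing, empty, or non-object query is treated as a missing region
    region = query.get("region", "") if isinstance(query, dict) else ""
    if not region:
        print("Error: region is required", file=stderr)
        return 1
    # Your logic... then print one flat JSON object of strings
    json.dump({"region": region}, stdout)
    return 0
```

In the script that Terraform runs, finish with if __name__ == "__main__": sys.exit(main()) so the return value becomes the exit code Terraform checks.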

When NOT to Use

  • AWS/GCP/Azure data → Use native data sources (aws_ami, etc.)
  • HTTP APIs → Use the http data source
  • Secrets → Use provider-specific data sources (vault_generic_secret, aws_ssm_parameter)
  • Slow operations → Runs on every plan, will slow your workflow

Conclusion


The external data source is an escape hatch for data Terraform can't fetch natively. Use it for custom scripts, legacy APIs, or complex data lookups. Keep scripts fast, handle errors properly, and always output flat JSON with string values. Prefer native Terraform data sources when available.

Tags: Terraform, DevOps, Infrastructure as Code
