TerraformPilot

Fix Terraform Error - Timeout Waiting for State

Fix the Terraform timeout waiting for state error for RDS, EKS, CloudFront, and other slow resources. Increase timeouts and debug stuck resources.

Luca Berton · 2 min read

Quick Answer

The resource is taking longer to create, update, or delete than the provider's default timeout for that operation allows. Add a timeouts block to the resource with longer values. If the resource is stuck, check it in the cloud console: it may have failed or be waiting on a dependency.

The Error

Error: timeout while waiting for state to become 'ACTIVE'
  (last state: 'CREATING', timeout: 20m0s)

Error: waiting for RDS DB Instance (mydb) create: timeout while
  waiting for state to become 'available'

Error: error waiting for EKS Cluster (production) to create:
  timeout while waiting for state to become 'ACTIVE'
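
When scanning a long apply log, the last observed state and the timeout that was exceeded can be pulled out mechanically. This is a minimal sketch that assumes the "(last state: ..., timeout: ...)" format of the first message above, which can differ between provider versions; timeout_summary is a hypothetical helper name, not part of Terraform.

```shell
# timeout_summary (hypothetical helper): read Terraform output on stdin and
# print "<last state> <timeout>" for every line that looks like
#   (last state: 'CREATING', timeout: 20m0s)
timeout_summary() {
  sed -n "s/.*last state: '\([^']*\)', timeout: \([0-9a-z.]*\)).*/\1 \2/p"
}
```

One way to use it is to save the output with terraform apply -no-color 2>&1 | tee apply.log and then run timeout_summary < apply.log. Messages that do not include the parenthesised detail are skipped.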

What Causes This Error

1. Slow Resource Creation

Some AWS resources genuinely take a long time:

Resource                    Typical Create Time   Default Create Timeout
RDS Instance                10-30 min             40 min
RDS Cluster (Aurora)        15-45 min             120 min
EKS Cluster                 15-25 min             30 min
CloudFront Distribution     15-30 min             70 min
Elasticsearch/OpenSearch    20-60 min             60 min
Redshift Cluster            10-20 min             75 min
NAT Gateway                 2-10 min              10 min
VPN Gateway Attachment      5-15 min              30 min

2. Resource Is Stuck

The resource hit an error during creation but hasn't transitioned to a failure state yet.

3. AWS Service Degradation

Regional outages or service issues can slow down resource provisioning.

4. Configuration Issues

Misconfigured VPC, subnet, or security group settings can cause resources to hang during creation.

How to Fix It

Solution 1: Increase Timeouts

resource "aws_db_instance" "main" {
  identifier        = "production-db"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 100
  # Credentials and networking arguments omitted for brevity
 
  timeouts {
    create = "60m"   # Default: 40m
    update = "80m"   # Default: 80m
    delete = "60m"   # Default: 60m
  }
}
 
resource "aws_eks_cluster" "main" {
  name     = "production"
  role_arn = aws_iam_role.eks.arn
 
  vpc_config {
    subnet_ids = var.subnet_ids
  }
 
  timeouts {
    create = "45m"   # Default: 30m
    update = "60m"   # Default: 60m
    delete = "30m"   # Default: 15m
  }
}
 
resource "aws_elasticsearch_domain" "main" {
  domain_name = "search-prod"
 
  timeouts {
    create = "90m"
    update = "90m"
    delete = "60m"
  }
}

Solution 2: Check Resource Status in the Console or CLI

# Check RDS instance status
aws rds describe-db-instances --db-instance-identifier mydb \
  --query 'DBInstances[0].{Status:DBInstanceStatus,Pending:PendingModifiedValues}'
 
# Check EKS cluster status
aws eks describe-cluster --name production \
  --query 'cluster.{Status:status,Issues:health.issues}'
 
# Check CloudFront distribution
aws cloudfront get-distribution --id E1234 \
  --query 'Distribution.Status'
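
When the status alone is inconclusive, the service's own event log often explains a stall. The following is a sketch for RDS only, assuming the instance identifier mydb from the commands above; recent_db_events is a hypothetical helper name wrapping the AWS CLI's describe-events command.

```shell
# recent_db_events (hypothetical helper): list RDS events for one DB instance
# from the last N minutes (default: 60).
recent_db_events() {
  db_id="$1"
  minutes="${2:-60}"
  aws rds describe-events \
    --source-identifier "$db_id" \
    --source-type db-instance \
    --duration "$minutes" \
    --query 'Events[].[Date,Message]' \
    --output text
}
```

For example, recent_db_events mydb 120 prints the events from the last two hours. An empty result while the instance is still "creating" suggests it is simply slow rather than failing.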

Solution 3: Fix Underlying Configuration Issues

RDS stuck in "creating" often means VPC/subnet issues:

# Ensure DB subnet group spans multiple AZs
resource "aws_db_subnet_group" "main" {
  name       = "production-db"
  subnet_ids = var.private_subnet_ids  # Must span 2+ AZs
 
  tags = {
    Name = "Production DB subnet group"
  }
}
 
# Ensure security group allows the DB port
resource "aws_security_group" "db" {
  vpc_id = var.vpc_id
 
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
  }
}
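
The "2+ AZs" requirement on the subnet group can be checked from the command line before the next apply. This is a sketch under the assumption that the subnet group already exists under the name production-db used above; both function names are hypothetical helpers.

```shell
# count_unique_azs (hypothetical helper): read whitespace-separated AZ names
# on stdin and print how many distinct ones there are.
count_unique_azs() {
  tr -s '[:space:]' '\n' | sort -u | grep -c .
}

# subnet_group_az_count (hypothetical helper): number of distinct AZs covered
# by an existing DB subnet group.
subnet_group_az_count() {
  aws rds describe-db-subnet-groups \
    --db-subnet-group-name "$1" \
    --query 'DBSubnetGroups[0].Subnets[].SubnetAvailabilityZone.Name' \
    --output text | count_unique_azs
}
```

Running subnet_group_az_count production-db should print 2 or more. A result of 1 means every subnet in the group sits in the same Availability Zone.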

Solution 4: Resume After Timeout

If Terraform times out but the resource is still creating:

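# Wait for the resource to finish in the console, then sync state
terraform apply -refresh-only
 
# Review what Terraform wants to do next
terraform plan
 
# A create that timed out is usually marked as tainted, so the plan may
# propose replacing it. If the resource finished successfully, clear the taint:
terraform untaint aws_db_instance.main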
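
Instead of watching the console, the AWS CLI waiters can block until the resource is ready. This is a sketch for RDS, assuming the instance identifier is passed in; wait_then_refresh is a hypothetical helper name.

```shell
# wait_then_refresh (hypothetical helper): block until the RDS instance
# reports "available", then ask Terraform to sync its state.
wait_then_refresh() {
  db_id="$1"
  # "aws rds wait" polls on its own and exits non-zero if it gives up,
  # so it may need to be re-run for very slow instances.
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
  terraform apply -refresh-only
}
```

Call it as wait_then_refresh mydb. Comparable waiters exist for other services, such as aws eks wait cluster-active --name production and aws cloudfront wait distribution-deployed --id E1234.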

Solution 5: Use -parallelism to Reduce Load

# Reduce parallel operations when creating many slow resources
terraform apply -parallelism=5  # Default: 10
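
To avoid retyping the flag, the same setting can be supplied through the TF_CLI_ARGS_apply environment variable, which Terraform reads for the apply subcommand. A minimal sketch; the value 5 is only the example figure used above.

```shell
# Apply a lower parallelism to every "terraform apply" run from this shell.
export TF_CLI_ARGS_apply="-parallelism=5"

# Later runs then behave as if the flag had been passed explicitly:
#   terraform apply
```

Unset the variable with unset TF_CLI_ARGS_apply to return to the default.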

Suggested Timeouts by Resource

Starting values for each resource's timeouts block. HCL does not accept semicolons, so write each argument on its own line inside the block, and confirm in the provider documentation that the resource supports a timeouts block before adding one.

Resource                     create   update   delete
RDS                          60m      80m      60m
EKS Cluster                  45m      60m      30m
EKS Node Group               60m      60m      60m
OpenSearch / Elasticsearch   90m      90m      60m
CloudFront                   90m      60m      30m
Redshift                     75m      75m      40m

Troubleshooting Checklist

  1. ✅ What state is the resource stuck in? (Check cloud console)
  2. ✅ Is the resource actually still creating, or has it failed?
  3. ✅ Are VPC/subnet/security group settings correct?
  4. ✅ Is there an AWS service incident? (Check status.aws.amazon.com)
  5. ✅ Have you increased the timeout in the timeouts block?
  6. ✅ Can you resume with terraform apply -refresh-only?

Prevention Tips

  • Set explicit timeouts for RDS, EKS, OpenSearch, and CloudFront resources
  • Check AWS service health before large deployments
  • Use terraform plan to preview which resources will be created
  • Reduce parallelism when creating many slow resources simultaneously
  • Monitor resource creation in the console during long applies

Conclusion

Timeout errors mean the resource took longer to provision than the configured timeout allows. Add a timeouts block with generous values for slow resources like RDS, EKS, and OpenSearch. If the resource is stuck, check the cloud console for underlying issues; VPC misconfigurations and service degradation are common culprits.
