Terraform for AI Companions: Real-Time Voice and Chat Backends

Provision AI companion infrastructure with Terraform: real-time inference APIs, voice infrastructure, user data stores, moderation, and scaling policies.

Luca Berton · 1 min read

AI companions — voice and chat agents with persistent memory — are one of the breakout 2026 product categories. They look simple from the outside and daunting from the inside: real-time inference, low-latency voice, durable per-user memory, content moderation, and aggressive autoscaling. Terraform turns the whole backend into a single deployable unit.

This guide shows how to provision an AI companion backend on AWS.

Architecture

| Layer | AWS service |
| --- | --- |
| Real-time edge | API Gateway WebSocket, CloudFront |
| Voice | Amazon Polly + Transcribe Streaming |
| LLM inference | Bedrock or SageMaker real-time endpoint |
| Per-user memory | DynamoDB + OpenSearch (vectors) |
| Moderation | Bedrock Guardrails + Comprehend |
| Scaling | Application Auto Scaling, SQS backpressure |

Low-Latency WebSocket Front Door

resource "aws_apigatewayv2_api" "companion_ws" {
  name                       = "companion-ws"
  protocol_type              = "WEBSOCKET"
  route_selection_expression = "$request.body.action"
}
 
resource "aws_apigatewayv2_route" "send" {
  api_id    = aws_apigatewayv2_api.companion_ws.id
  route_key = "send"
  target    = "integrations/${aws_apigatewayv2_integration.send.id}"
}
 
resource "aws_apigatewayv2_integration" "send" {
  api_id           = aws_apigatewayv2_api.companion_ws.id
  integration_type = "AWS_PROXY"
  integration_uri  = aws_lambda_function.turn.invoke_arn
}
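The route and integration above are not enough on their own: API Gateway needs permission to invoke the Lambda, and a stage to deploy through. A minimal sketch, assuming the `aws_lambda_function.turn` handler referenced above is defined elsewhere:

```hcl
# Let the WebSocket API invoke the turn-handling Lambda.
resource "aws_lambda_permission" "ws_invoke" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.turn.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.companion_ws.execution_arn}/*/*"
}

# Auto-deploy a stage so route changes ship on every apply.
resource "aws_apigatewayv2_stage" "prod" {
  api_id      = aws_apigatewayv2_api.companion_ws.id
  name        = "prod"
  auto_deploy = true
}
```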

Per-User Memory (Hot + Vector)

resource "aws_dynamodb_table" "memory" {
  name         = "companion_memory"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "user_id"
  range_key    = "ts"
 
  attribute {
    name = "user_id"
    type = "S"
  }
  attribute {
    name = "ts"
    type = "N"
  }
 
  ttl {
    attribute_name = "expires_at"
    enabled        = true
  }
 
  point_in_time_recovery { enabled = true }
}
 
resource "aws_opensearch_domain" "memory_vectors" {
  domain_name    = "companion-vectors"
  engine_version = "OpenSearch_2.15"
 
  cluster_config {
    instance_type          = "r7g.large.search"
    instance_count         = 3
    zone_awareness_enabled = true
    zone_awareness_config {
      availability_zone_count = 3
    }
  }
 
  ebs_options {
    ebs_enabled = true
    volume_size = 200
    volume_type = "gp3"
  }
 
  encrypt_at_rest { enabled = true }
  node_to_node_encryption { enabled = true }
  domain_endpoint_options {
    enforce_https       = true
    tls_security_policy = "Policy-Min-TLS-1-2-2019-07"
  }
}

Moderation (Bedrock Guardrails)


Bedrock Guardrails are managed through Terraform's aws_bedrock_guardrail resource:

resource "aws_bedrock_guardrail" "companion" {
  name                      = "companion-safety"
  blocked_input_messaging   = "I can't help with that."
  blocked_outputs_messaging = "I can't help with that."
 
  content_policy_config {
    filters_config {
      input_strength  = "HIGH"
      output_strength = "HIGH"
      type            = "SEXUAL"
    }
    filters_config {
      input_strength  = "HIGH"
      output_strength = "HIGH"
      type            = "VIOLENCE"
    }
    filters_config {
      input_strength  = "HIGH"
      output_strength = "HIGH"
      type            = "HATE"
    }
  }
 
  sensitive_information_policy_config {
    pii_entities_config {
      action = "ANONYMIZE"
      type   = "EMAIL"
    }
    pii_entities_config {
      action = "BLOCK"
      type   = "CREDIT_DEBIT_CARD_NUMBER"
    }
  }
}
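To keep the safety policy from drifting, the guardrail can be snapshotted into an immutable version that inference calls reference. A sketch, assuming the aws_bedrock_guardrail_version resource available in recent AWS provider releases:

```hcl
# Pin the guardrail so the safety policy can't drift under live traffic;
# inference should reference this version, not DRAFT.
resource "aws_bedrock_guardrail_version" "companion" {
  guardrail_arn = aws_bedrock_guardrail.companion.guardrail_arn
  description   = "Pinned companion safety policy"
}
```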

Auto-Scaling Inference

resource "aws_appautoscaling_target" "endpoint" {
  service_namespace  = "sagemaker"
  resource_id        = "endpoint/${aws_sagemaker_endpoint.companion.name}/variant/AllTraffic"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  min_capacity       = 2
  max_capacity       = 50
}
 
resource "aws_appautoscaling_policy" "endpoint_invocations" {
  name               = "companion-invocations"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.endpoint.scalable_dimension
 
  target_tracking_scaling_policy_configuration {
    target_value = 70
    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
    }
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
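The SQS backpressure layer from the architecture table sits between the WebSocket handler and the inference workers, so a viral spike queues up instead of overwhelming the endpoint while it scales out. A hedged sketch with a dead-letter queue; the queue names are illustrative:

```hcl
# Dead-letter queue for turns that repeatedly fail inference.
resource "aws_sqs_queue" "turns_dlq" {
  name = "companion-turns-dlq"
}

# Buffer queue between the WebSocket Lambda and the inference workers.
resource "aws_sqs_queue" "turns" {
  name                       = "companion-turns"
  visibility_timeout_seconds = 120

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.turns_dlq.arn
    maxReceiveCount     = 5
  })
}
```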

Best Practices

  • Pin moderation versions in Terraform — don't let safety policy drift through console clicks.
  • TTL old memories to limit privacy risk and cost.
  • Stream audio over WebSockets with Polly streaming for sub-300 ms turn latency.
  • Backpressure with SQS between WebSocket and inference so a viral spike degrades gracefully.
#Terraform #AI #Voice #AWS #Real-Time
