TerraformPilot

Terraform for Embryo Scoring and Reproductive Genomics

Provision reproductive-genomics ML infrastructure with Terraform: secure compute, data governance, ML pipelines, privacy controls, and regulated storage.

Luca Berton

Embryo scoring and reproductive genomics are among the most ethically loaded trends of 2026, and among the most data-sensitive. Polygenic risk scoring (PRS) of embryos requires reproducible ML pipelines, locked-down PHI handling, careful auditing, and per-clinic isolation. Terraform makes those guarantees executable rather than aspirational.

This guide shows how to provision a reproductive-genomics scoring backend on AWS.

Architecture

| Layer | AWS service |
| --- | --- |
| Sequencing intake | HealthOmics Sequence Store + S3 |
| Variant calling | HealthOmics Workflows |
| PRS scoring | SageMaker batch transform |
| Reporting | Lambda + signed PDFs in S3 |
| Per-clinic isolation | Account-per-clinic + Organizations |
| Consent ledger | DynamoDB + KMS-signed digests |

Per-Clinic Account Vending

resource "aws_organizations_account" "clinic" {
  for_each = var.clinics
 
  name      = each.value.name
  email     = each.value.ops_email
  parent_id = aws_organizations_organizational_unit.clinics.id
 
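  # role_name cannot be read back from the Organizations API after creation,
  # so ignore it to avoid spurious diffs on imported accounts.
  lifecycle {
    ignore_changes = [role_name]
  }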
}

Each clinic gets its own AWS account, baselined by Terraform, with no cross-clinic IAM trust.
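The account resource above depends on `var.clinics` and on an organizational unit that the snippet does not define. A minimal sketch of both follows, with one example guardrail attached to the OU. The variable shape, the OU name, and the policy name are assumptions for illustration, and the service control policy only applies if SCPs are enabled for the organization.

```hcl
# Hypothetical input shape for var.clinics; the map key is a short clinic slug.
variable "clinics" {
  type = map(object({
    name      = string
    ops_email = string
  }))
}

# Look up the organization so the OU can be placed under its root.
data "aws_organizations_organization" "this" {}

# OU that holds every clinic account (used as parent_id above).
resource "aws_organizations_organizational_unit" "clinics" {
  name      = "clinics"
  parent_id = data.aws_organizations_organization.this.roots[0].id
}

# Example guardrail: clinic accounts cannot detach themselves from the org.
resource "aws_organizations_policy" "clinic_guardrails" {
  name = "clinic-guardrails"
  type = "SERVICE_CONTROL_POLICY"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyLeaveOrganization"
      Effect   = "Deny"
      Action   = "organizations:LeaveOrganization"
      Resource = "*"
    }]
  })
}

resource "aws_organizations_policy_attachment" "clinic_guardrails" {
  policy_id = aws_organizations_policy.clinic_guardrails.id
  target_id = aws_organizations_organizational_unit.clinics.id
}
```

Attaching guardrails at the OU level means a newly vended clinic account inherits them without any per-account wiring.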

Consent Ledger

resource "aws_dynamodb_table" "consent" {
  name         = "consent_ledger"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "patient_id"
  range_key    = "consent_version"
 
  attribute {
    name = "patient_id"
    type = "S"
  }
  attribute {
    name = "consent_version"
    type = "S"
  }
 
  stream_enabled   = true
  stream_view_type = "NEW_IMAGE"
 
  server_side_encryption {
    enabled     = true
    kms_key_arn = aws_kms_key.consent.arn
  }
 
  point_in_time_recovery { enabled = true }
}
 
resource "aws_kms_key" "consent" {
  description             = "Consent ledger CMK"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

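A scheduled Lambda hashes the day's consent records and signs the digest with KMS, which makes later edits to the ledger detectable. Signing needs an asymmetric KMS key with SIGN_VERIFY usage, separate from the symmetric encryption key above, and the tamper-evidence only holds against an administrator who cannot use that signing key.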
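The digest job can be sketched in Terraform as a dedicated signing key plus a daily EventBridge schedule. This is an illustration under assumptions, not the article's actual setup: the key spec, the 01:00 UTC schedule, and the `aws_lambda_function.consent_digest` function (not defined here) are all hypothetical.

```hcl
# Assumption: a dedicated asymmetric key used only for signing digests.
# Automatic rotation is not available for asymmetric KMS keys, so
# enable_key_rotation is left unset.
resource "aws_kms_key" "consent_signing" {
  description              = "Consent ledger digest signing key"
  key_usage                = "SIGN_VERIFY"
  customer_master_key_spec = "ECC_NIST_P256"
  deletion_window_in_days  = 30
}

# Run the digest job once a day at 01:00 UTC.
resource "aws_cloudwatch_event_rule" "consent_digest" {
  name                = "consent-digest-daily"
  schedule_expression = "cron(0 1 * * ? *)"
}

resource "aws_cloudwatch_event_target" "consent_digest" {
  rule = aws_cloudwatch_event_rule.consent_digest.name
  arn  = aws_lambda_function.consent_digest.arn # hypothetical function
}

# Allow EventBridge to invoke the function.
resource "aws_lambda_permission" "consent_digest" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.consent_digest.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.consent_digest.arn
}
```

Keeping `kms:Sign` on this key limited to the Lambda's execution role is what gives the digests their value; anyone else who can sign can also forge.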

Variant-Calling Workflow

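# Private HealthOmics workflow for secondary analysis (variant calling).
# Check that the provider version you pin exposes this resource type; if it
# does not, awscc_omics_workflow from the AWS Cloud Control provider manages
# the same AWS::Omics::Workflow type.
resource "aws_omics_workflow" "embryo_secondary" {
  name             = "embryo-secondary"
  engine           = "NEXTFLOW"
  storage_capacity = 1200 # GiB of run storage
  definition_uri   = "s3://${aws_s3_bucket.workflows.bucket}/embryo.zip"

  parameter_template = jsonencode({
    sample_id  = { description = "Embryo sample id", optional = false }
    fastq_uris = { description = "Sample FASTQ", optional = false }
  })
}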
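The workflow definition points at `aws_s3_bucket.workflows`, which the snippet above leaves undefined. One way to fill that gap is to manage the bucket and the uploaded archive in the same configuration. The bucket name and the local path to `embryo.zip` below are assumptions.

```hcl
# Bucket that holds packaged workflow definitions.
resource "aws_s3_bucket" "workflows" {
  bucket = "acme-embryo-workflows" # hypothetical name
}

# Upload the Nextflow archive referenced by definition_uri.
resource "aws_s3_object" "embryo_workflow" {
  bucket = aws_s3_bucket.workflows.id
  key    = "embryo.zip"
  source = "${path.module}/workflows/embryo.zip" # hypothetical local path

  # Re-upload whenever the archive contents change.
  source_hash = filemd5("${path.module}/workflows/embryo.zip")
}
```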

PRS Scoring Pipeline

resource "aws_sagemaker_pipeline" "prs" {
  pipeline_name         = "embryo-prs"
  pipeline_display_name = "embryo-prs" # required by the provider
  role_arn              = aws_iam_role.sagemaker.arn
 
  pipeline_definition = jsonencode({
    Version = "2020-12-01"
    Steps = [
      {
        Name = "BatchTransform"
        Type = "Transform"
        Arguments = {
          ModelName = aws_sagemaker_model.prs.name
          TransformInput = {
            DataSource = {
              S3DataSource = {
                S3DataType = "S3Prefix"
                S3Uri      = "s3://${aws_s3_bucket.variants.bucket}/incoming/"
              }
            }
            ContentType = "text/csv"
          }
          TransformOutput = {
            S3OutputPath = "s3://${aws_s3_bucket.scores.bucket}/output/"
          }
          TransformResources = {
            InstanceType  = "ml.m5.4xlarge"
            InstanceCount = 4
          }
        }
      }
    ]
  })
}
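The pipeline refers to `aws_sagemaker_model.prs` without showing it. A sketch of a version-pinned model follows, assuming the inference image is addressed by digest and the model artifact lives under a versioned prefix. The two variables and the `aws_s3_bucket.models` bucket are hypothetical names introduced for illustration.

```hcl
variable "prs_image" {
  description = "Inference image URI, pinned by digest rather than a mutable tag"
  type        = string
}

variable "prs_model_version" {
  description = "Immutable version label of the clinically approved PRS model"
  type        = string
}

resource "aws_sagemaker_model" "prs" {
  # The version is part of the name, so a model change shows up as a new
  # resource in the plan instead of a silent in-place swap.
  name               = "embryo-prs-${var.prs_model_version}"
  execution_role_arn = aws_iam_role.sagemaker.arn

  primary_container {
    image          = var.prs_image
    model_data_url = "s3://${aws_s3_bucket.models.bucket}/prs/${var.prs_model_version}/model.tar.gz"
  }
}
```

With this shape, moving to a new model is a reviewed change to `prs_model_version`, which leaves a record of which model scored which batch.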

Signed Report Delivery

resource "aws_s3_bucket" "reports" {
  bucket              = "acme-embryo-reports"
  object_lock_enabled = true
}
 
resource "aws_s3_bucket_object_lock_configuration" "reports" {
  bucket = aws_s3_bucket.reports.id
  rule {
    default_retention {
      mode  = "COMPLIANCE"
      years = 25
    }
  }
}
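Object Lock depends on bucket versioning, and a reports bucket should never be publicly reachable. The sketch below makes both explicit for the bucket above; it is a suggested hardening baseline rather than part of the original configuration.

```hcl
# Object Lock requires versioning; declaring it keeps the state explicit.
resource "aws_s3_bucket_versioning" "reports" {
  bucket = aws_s3_bucket.reports.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Block every form of public access to patient reports.
resource "aws_s3_bucket_public_access_block" "reports" {
  bucket                  = aws_s3_bucket.reports.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Note that COMPLIANCE mode retention cannot be shortened or removed by any user, including the account root, so it is worth rehearsing the 25-year setting in a disposable bucket first.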

Best Practices

  • Account-per-clinic, never namespace-per-clinic. Blast radius matters more than ergonomics here.
  • Ledger every consent change with KMS-signed digests; data deletion must be cross-checked against consent withdrawal.
  • Pin the model version in the SageMaker pipeline — embryo selection from a year-old model under a new clinical guideline is a serious problem.
  • Object Lock the reports — patients return for them years later, and they must be untampered.
  • Restrict outbound network from compute — variant data should never be able to leave.
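The last point, restricting outbound network access, can be sketched as a security group whose only egress path is an S3 gateway endpoint. The three variables and the resource names are assumptions; the VPC and its route tables are expected to exist already.

```hcl
variable "region" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_route_table_ids" {
  type = list(string)
}

# Gateway endpoint so S3 traffic stays on the AWS network.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# Scoring compute may only open HTTPS connections to the S3 prefix list.
# Because an egress rule is declared, no allow-all egress rule is present.
resource "aws_security_group" "scoring" {
  name        = "prs-scoring-restricted-egress"
  description = "PRS scoring compute, egress limited to S3 endpoint"
  vpc_id      = var.vpc_id

  egress {
    description     = "HTTPS to S3 via the gateway endpoint only"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    prefix_list_ids = [aws_vpc_endpoint.s3.prefix_list_id]
  }
}
```

The prefix list covers all of S3 in the region, so an endpoint policy limited to the variants and scores buckets would still be needed to stop data being copied to a bucket outside the organization.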