Architect · DevOps

Infrastructure as Code

If your test environment is a snowflake that only one person can recreate, you're not testing your application. You're testing someone's memory. Learn how to codify environments.

Architect ISTQB CTAL-TAE v2.0 — Chapter 2 ~15 min read + exercise

1 The Hook — Why This Matters

A Christchurch logistics company had a staging environment that "worked on Tuesdays." Seriously. Every Tuesday, a cron job restarted the database with fresh seed data. From Wednesday onward, accumulated test data caused timeout errors. One engineer knew the restart schedule, the seed scripts, and the manual fix for when it broke. When he went on holiday for two weeks, the staging environment failed three times and blocked two releases. The environment was not documented, not automated, and not reproducible.

Infrastructure as Code (IaC) is not a DevOps luxury. It is the prerequisite for test environments that you can trust. If you can't recreate your test environment from a Git repo in under 30 minutes, you don't have a test environment. You have a pet.

2 The Rule — The One-Sentence Version

Every test environment must be provisionable from version-controlled code, destroyed and recreated on demand, and identical in configuration to production minus scale and secrets.

"Works on my machine" is a developer joke. "Works on our one staging server that Dave set up" is a production incident waiting to happen. Codify everything.

3 The Analogy — Think Of It Like...

A restaurant that only has one chef who knows the secret recipe.

When the chef is sick, the restaurant closes. When the chef leaves, the restaurant dies. A real restaurant writes down recipes, trains multiple cooks, and standardises portions. Infrastructure as Code is writing down the recipe for your test environment. Docker is the standardised kitchen. Terraform is the blueprint for building identical kitchens. Ephemeral environments are pop-up restaurants that appear when needed and vanish when done.

4 Watch Me Do It — Step by Step

Here is how to implement Infrastructure as Code for test environments.

  1. Docker for consistent test environments
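    # Official Playwright image (browsers and OS dependencies preinstalled), pinned to an exact version
    FROM mcr.microsoft.com/playwright:v1.42.0-jammy
    WORKDIR /tests
    # Copy the manifests first so the dependency layer stays cached until they change
    COPY package*.json ./
    RUN npm ci
    # Copy the test sources last; edits here do not invalidate the npm ci layer
    COPY . .
    ENTRYPOINT ["npx", "playwright", "test"]

    Best practice: Pin base image versions, as the v1.42.0-jammy tag above does (node:20.11.0-slim, not node:20-slim), to prevent drift. Keep the image version in step with the Playwright version in package.json.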
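
The pinning rule can be enforced mechanically rather than by review. The following is a minimal sketch of such a check, not an established tool: the function name `find_unpinned_images` and the rule "a tag is pinned if it contains a full major.minor.patch version or a digest" are assumptions made for illustration.

```python
import re

# Assumed heuristic: a pinned tag contains a full major.minor.patch version.
_FULL_VERSION = re.compile(r"\d+\.\d+\.\d+")


def find_unpinned_images(dockerfile_text: str) -> list[str]:
    """Return image references on FROM lines that look unpinned."""
    unpinned = []
    stages = set()  # names of earlier build stages (FROM ... AS name)
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64
        image = next((p for p in parts[1:] if not p.startswith("--")), None)
        if len(parts) >= 4 and parts[-2].upper() == "AS":
            stages.add(parts[-1])
        if image is None or image.lower() == "scratch" or image in stages:
            continue
        if "@sha256:" in image:
            continue  # digest-pinned
        _name, sep, tag = image.rpartition(":")
        if not sep or "/" in tag:
            unpinned.append(image)  # no tag at all, so it floats on "latest"
        elif not _FULL_VERSION.search(tag):
            unpinned.append(image)  # floating tag such as 20-slim
    return unpinned
```

Run against the Dockerfile in CI, a non-empty result would fail the build before a floating tag reaches the main branch.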

  2. Terraform for test infrastructure
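    variable "test_run_id" {
      type = string
    }

    resource "aws_db_instance" "test" {
      identifier        = "test-${var.test_run_id}"
      engine            = "postgres" # assumption: match the engine production runs
      instance_class    = "db.t3.micro"
      allocated_storage = 20
      username          = "test_admin" # hypothetical name
      # Let RDS generate the password and hold it in Secrets Manager, out of code
      manage_master_user_password = true
      # Allow terraform destroy to remove the instance without a final snapshot
      skip_final_snapshot = true
      tags = {
        Environment = "ephemeral-test"
        TTL         = "2h"
      }
    }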

    Use separate Terraform workspaces per environment to prevent state collisions.
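
A TTL tag like the one in the Terraform example is only metadata; nothing is destroyed unless a scheduled job reads the tag and acts on it. Below is a minimal sketch of the expiry decision such a job might make. The tag format (a number followed by m, h, or d) and the function names are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed tag format: "<number><unit>", for example "2h", "30m", "1d".
_TTL_PATTERN = re.compile(r"^(\d+)([mhd])$")
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    """Convert a TTL tag value such as '2h' into a timedelta."""
    match = _TTL_PATTERN.match(ttl.strip())
    if not match:
        raise ValueError(f"unsupported TTL format: {ttl!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(created_at: datetime, ttl: str, now: datetime) -> bool:
    """True once the environment has outlived its TTL."""
    return now >= created_at + parse_ttl(ttl)


created = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(is_expired(created, "2h", created + timedelta(hours=3)))
```

A cleanup job would list resources tagged Environment = "ephemeral-test", apply this check, and run the destroy for each expired one.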

  3. Ephemeral environments per PR

    Workflow:

    1. Developer opens PR
    2. CI triggers Terraform/Helm to provision environment
    3. Automated smoke tests run against preview URL
    4. Reviewer clicks preview link for manual validation
    5. PR merges or closes → environment auto-destroyed

    Benefits: true isolation, production parity, faster feedback, cost efficiency. Use shared infrastructure (ALB, ECS cluster) with per-PR namespaces to minimise cost.
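
The key property of the workflow is that teardown always happens, even when provisioning or the tests fail. Here is a minimal sketch of that guarantee as a context manager. It models the shorter-lived variant in which the environment lasts only for one CI job; in the per-PR workflow above, the destroy step would instead be triggered by the PR close event. The `provision` and `destroy` callables are hypothetical stand-ins for whatever wraps Terraform or Helm in your pipeline.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_environment(
    pr_number: int,
    provision: Callable[[str], str],   # hypothetical: creates the env, returns its URL
    destroy: Callable[[str], None],    # hypothetical: tears the env down
) -> Iterator[str]:
    """Provision an environment for a PR and always destroy it afterwards."""
    env_name = f"pr-{pr_number}"
    try:
        url = provision(env_name)
        yield url
    finally:
        # Runs on success, on test failure, and on a failed or partial provision.
        destroy(env_name)
```

Usage would look like `with ephemeral_environment(42, provision, destroy) as url:` followed by the smoke tests against `url`.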

  4. Environment parity spectrum
    Aspect    Dev                 Test                         Prod
    Code      Feature branch      Main branch                  Main branch
    Config    Local overrides     Production-like              Production
    Data      Synthetic/minimal   Masked subset or synthetic   Live
    Scale     Minimal             Production-like sizing       Full scale
    Services  Mocks               Real integrations            Real integrations

Pro tip: Under NZ Privacy Act 2020, production data in test environments requires careful handling. Default to synthetic data for all automated tests. Use masked production subsets only for manual exploratory testing with access controls. Pairing Terraform with test data tooling (for example GenRocket for synthetic generation or Delphix for masking) gives you compliance-friendly test data at scale.
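
For small suites, synthetic data does not need a commercial tool. The sketch below shows the core idea, a seeded generator that yields the same rows on every run, so tests stay deterministic and no real customer data is involved. The field names and value lists are invented for illustration.

```python
import random

# Invented sample values; no real customer data is involved.
FIRST_NAMES = ["Aroha", "Liam", "Mere", "Noah", "Priya", "Sam"]
SUBURBS = ["Riccarton", "Sumner", "Addington", "Papanui"]


def generate_customers(count: int, seed: int = 1234) -> list[dict]:
    """Generate deterministic synthetic customer rows for seeding a test database."""
    rng = random.Random(seed)  # local generator: same seed gives the same rows
    customers = []
    for i in range(1, count + 1):
        name = rng.choice(FIRST_NAMES)
        customers.append({
            "id": i,
            "name": name,
            # example.test is a reserved domain, so no mail can reach a real person
            "email": f"{name.lower()}.{i}@example.test",
            "suburb": rng.choice(SUBURBS),
            "parcels_shipped": rng.randint(0, 500),
        })
    return customers
```

Committing the seed alongside the Terraform code means a failing test can be replayed against exactly the same data.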

5 When to Use It / When NOT to Use It

✅ Use IaC when...

  • Environment drift causes "works on my machine" bugs
  • Multiple engineers need identical environments
  • CI requires reproducible test beds
  • Compliance requires documented environment configs

❌ Skip IaC when...

  • Single static environment that never changes
  • Team lacks DevOps expertise
  • Provisioning takes longer than the test it supports

6 Common Mistakes — Don't Do This

🚫 Environment snowflakes

I used to think: Our staging environment is stable; we don't need to recreate it often.
Actually: If you can't recreate it from code, it's a snowflake. When it breaks — and it will — you'll spend days debugging configuration drift instead of testing. Every environment should be destroyable and rebuildable on demand.

🚫 Ignoring data privacy in test environments

I used to think: Test environments are internal; privacy rules don't apply.
Actually: NZ Privacy Act 2020 requires reasonable safeguards for personal information in all environments. Using unmasked production data in automated tests is a compliance risk. Synthetic data is safer, faster, and more deterministic.

🚫 Perfect parity obsession

I used to think: Test environments must be identical to production in every way.
Actually: Perfect parity is impossible and unnecessary. Target "production-like" for the dimensions that matter: code version, config schema, service topology, and auth patterns. Scale and data volume can differ. Focus on catching the bugs that matter, not achieving theoretical perfection.
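
"Config schema" parity is one of the dimensions that is cheap to check automatically: the keys should match even though the values differ. A minimal sketch, assuming both configurations have already been loaded into nested dictionaries; the function names are illustrative.

```python
def key_paths(config: dict, prefix: str = "") -> set[str]:
    """Flatten a nested config into dotted key paths, ignoring the values."""
    paths = set()
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            paths |= key_paths(value, path)
        else:
            paths.add(path)
    return paths


def schema_diff(test_config: dict, prod_config: dict) -> dict[str, list[str]]:
    """Report keys present in one environment's config but not the other."""
    test_keys, prod_keys = key_paths(test_config), key_paths(prod_config)
    return {
        "missing_in_test": sorted(prod_keys - test_keys),
        "extra_in_test": sorted(test_keys - prod_keys),
    }


prod = {"db": {"host": "prod-db", "pool_size": 50}, "auth": {"issuer": "prod-idp"}}
test = {"db": {"host": "test-db"}, "auth": {"issuer": "test-idp"}, "debug": True}
print(schema_diff(test, prod))
```

Values such as hostnames are expected to differ, which is why only the key structure is compared.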

7 Now You Try — Interview Warm-Up

🎯 Interactive Exercise

Scenario: Your staging environment fails every two weeks with mysterious database connection errors. The "fix" is always the same: restart the database server and clear logs. No one knows why it fails. The environment was set up by a contractor two years ago. There is no documentation.

What is your remediation approach?

The approach:

  1. Audit: Document the current environment state: OS versions, installed packages, config files, database settings, network topology. Treat this as archaeological work.
  2. Codify: Write Terraform/Docker configurations that reproduce the current environment. Test by spinning up a new instance and verifying tests pass.
  3. Replace: Once codified, destroy the old snowflake and use the new IaC-managed environment exclusively.
  4. Prevent: Add CI checks that verify environment configs match the repo. Alert if drift is detected. Never allow manual changes to running environments.
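
For the Prevent step, one common approach is a scheduled CI job that runs `terraform plan -detailed-exitcode`, which exits with 0 when nothing would change, 2 when the plan contains changes, and 1 on error. The sketch below wraps that convention; the function names and the injectable `runner` parameter are assumptions added so the logic can be tested without Terraform installed.

```python
import subprocess
from typing import Callable, Sequence

PLAN_COMMAND = ["terraform", "plan", "-detailed-exitcode", "-input=false"]


def run_plan(command: Sequence[str]) -> int:
    """Run the plan in the current directory and return its exit code."""
    return subprocess.run(command, check=False).returncode


def detect_drift(runner: Callable[[Sequence[str]], int] = run_plan) -> bool:
    """Return True if the live environment differs from the code in the repo."""
    code = runner(PLAN_COMMAND)
    if code == 0:
        return False  # infrastructure matches the configuration
    if code == 2:
        return True   # plan succeeded and found differences
    raise RuntimeError(f"terraform plan failed with exit code {code}")
```

A nightly job could call `detect_drift()` and raise an alert whenever it returns True, which turns "never allow manual changes" into something that is actually enforced.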

8 Self-Check — Can You Actually Do This?

If you got all three, you're ready to practice.

Q1. What are three key benefits of ephemeral environments?

True isolation (no cross-PR contamination), production parity (same IaC definitions), and cost efficiency (environments exist only when needed, typically 8-24 hours).

Q2. Why should you pin Docker base image versions instead of using floating tags?

Floating tags like node:20-slim can change silently when the vendor updates the image, causing "works on my machine" bugs. Pinning to exact versions (node:20.11.0-slim) ensures reproducible builds.

Q3. How does Infrastructure as Code help with NZ Privacy Act 2020 compliance?

IaC enables documented, reproducible environments where synthetic data can be provisioned consistently. This reduces reliance on production data in tests. Because the environment definitions live in version control, every configuration change also leaves an audit trail.

9 Interview Prep — Architect Q&A

Infrastructure and environment questions test your understanding of reproducibility, compliance, and system design. These are critical architect skills.

Q. "How does IaC change your approach to test environment provisioning?"

Without IaC, provisioning is manual, error-prone, and slow. With IaC, it is automated, reproducible, and fast. That changes the testing strategy: you can spin up a production-like environment per PR from the same definitions that build production, run tests, and destroy it. Because test and production are built from the same code, configuration drift between them shrinks, and with it the class of bug that passes in test but fails in prod. Each PR gets its own isolated test bed, so there is no cross-test contamination, and you can provision 10 identical environments for parallel test execution. The cost is learning Terraform or CloudFormation. The benefit is a marked improvement in test reliability.

Q. "What's the relationship between IaC and test data management?"

IaC codifies the environment. Test data management codifies the state. Together, they enable reproducible testing: spin up the environment from Terraform, seed it with test data from a deterministic script, run tests, destroy. For NZ teams under the Privacy Act 2020, IaC plus synthetic data generation is powerful: instead of copying production data (a privacy risk), you generate synthetic data that mimics production structure without exposing real customer information. Tools like GenRocket (synthetic generation) or Delphix (masking) can be invoked from the same pipeline that runs Terraform, so masked or synthetic data is provisioned on demand. This is the future: Infrastructure as Code + Data as Code.

Q. "How do you handle secrets and credentials in IaC test environments?"

Never store secrets in Terraform code or version control. Use a secrets manager: AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault. IaC reads from the secrets manager at runtime, not from code. Values that Terraform reads or generates can still be written to the state file, so encrypt remote state and restrict access to it. For test environments, use separate secrets from production: test API keys, test database passwords, test OAuth credentials. Limit access: only the CI/CD system and the QA team can read test secrets. Rotate them monthly. For NZ deployments, keep secrets on-shore if sensitive. Document your secrets strategy in the CoE runbook. Auditors love seeing this documented.
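
A pre-commit or CI check can catch the most obvious violation, a literal secret assigned in a .tf file. The following is a rough sketch only, based on an assumed naming heuristic (attribute names containing password, secret, token, or api_key); it will produce false positives and is no substitute for a dedicated secret scanner.

```python
import re

# Assumed heuristic: a secret-looking attribute assigned a quoted literal.
_SECRET_ASSIGNMENT = re.compile(
    r'^\s*(?P<key>\w*(?:password|secret|token|api_key)\w*)\s*=\s*"(?P<value>[^"]+)"',
    re.IGNORECASE,
)


def find_hardcoded_secrets(terraform_text: str) -> list[tuple[int, str]]:
    """Return (line number, attribute name) for literals that look like secrets."""
    findings = []
    for number, line in enumerate(terraform_text.splitlines(), start=1):
        match = _SECRET_ASSIGNMENT.match(line)
        # Interpolations such as "${var.db_password}" are references, not literals.
        if match and "${" not in match.group("value"):
            findings.append((number, match.group("key")))
    return findings


sample = 'username = "app"\npassword = "hunter2"\napi_token = var.token\n'
print(find_hardcoded_secrets(sample))
```

References such as `var.token` are not quoted literals, so they pass; only the inline password is reported.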

Q. "How would you test infrastructure code itself?"

Treat Terraform/CloudFormation like application code: version it, review it, test it. Test types: 1) Linting (terraform validate for syntax and internal consistency, tflint for style and provider-specific mistakes), 2) Policy checks (Checkov or Sentinel to enforce security policies: "no public databases," "all disks encrypted"), 3) Integration tests (spin up a test environment, verify services communicate), and 4) Compliance checks (ensure the provisioned environment meets regulatory requirements). Run these in CI before terraform apply runs against production. For critical environments, require human approval after all tests pass.
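
To make the policy-check idea concrete, here is a hand-rolled sketch of the two example policies, evaluated against the JSON that `terraform show -json` produces for a saved plan. Tools such as Checkov do this far more thoroughly; the function name and message wording here are illustrative, and only the `aws_db_instance` resource type is inspected.

```python
def check_database_policies(plan: dict) -> list[str]:
    """Check planned aws_db_instance resources against two example policies."""
    violations = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_db_instance":
            continue
        after = (change.get("change") or {}).get("after")
        if after is None:
            continue  # resource is being deleted; nothing to check
        address = change.get("address", "<unknown>")
        if after.get("publicly_accessible"):
            violations.append(f"{address}: database must not be publicly accessible")
        if not after.get("storage_encrypted"):
            violations.append(f"{address}: storage must be encrypted")
    return violations


plan = {
    "resource_changes": [
        {
            "address": "aws_db_instance.test",
            "type": "aws_db_instance",
            "change": {"after": {"publicly_accessible": True, "storage_encrypted": False}},
        }
    ]
}
for violation in check_database_policies(plan):
    print(violation)
```

In CI, the plan dictionary would come from parsing the `terraform show -json` output with `json.loads`, and a non-empty result would block the apply.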