Architect · Infrastructure

Scalability Patterns

A suite that takes 4 hours to run is a suite that doesn't run. Learn how to scale automation infrastructure from tens to thousands of tests without linear cost growth.

Architect ISTQB CTAL-TAE v2.0 — Chapter 3 ~15 min read + exercise

1 The Hook — Why This Matters

A global SaaS company with a New Zealand development office had a 3,000-test suite that ran sequentially in 6 hours. Developers committed code at 9am and got feedback at 3pm. By then, they had moved on to other tasks. The context-switching cost alone was estimated at 2 hours per developer per day. When they implemented sharding across 20 parallel workers, the suite dropped to 18 minutes. Developer productivity improved 15%. The infrastructure cost increase was $400/month. The ROI was visible in the first sprint.

Scalability is not a luxury. It's the difference between automation that enables speed and automation that becomes a bottleneck.

2 The Rule — The One-Sentence Version

Test execution time must scale sub-linearly with test count. If doubling your tests doubles your runtime, your architecture is broken.

Parallel execution, intelligent test selection, and distributed grids are not optimisations. They are requirements for any suite with more than 100 tests. Without them, your feedback loop dies.
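
To make the rule concrete, here is a minimal sketch of the idealised arithmetic. It assumes the work divides evenly across workers and that each run pays a fixed setup overhead; the function name and the overhead parameter are illustrative, not taken from any tool.

```python
def wall_clock_minutes(serial_minutes: float, workers: int,
                       overhead_minutes: float = 0.0) -> float:
    """Idealised wall-clock time when work divides evenly across workers."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return serial_minutes / workers + overhead_minutes


# The suite from the hook: 6 hours sequentially, spread over 20 workers.
print(wall_clock_minutes(6 * 60, 20))   # 18.0

# Doubling the tests while doubling the workers keeps the runtime flat.
print(wall_clock_minutes(12 * 60, 40))  # 18.0
```

Real suites fall short of this ideal because test durations are uneven and setup is never free, which is why the sharding strategy matters.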

3 The Analogy — Think Of It Like...

Analogy

A supermarket with one checkout lane for 1,000 customers.

It doesn't matter how fast the cashier is. The queue will wrap around the building. Parallel execution is adding lanes. Sharding is pre-sorting customers by basket size. Containerisation is ensuring every lane has identical equipment. A scalable checkout system handles 1,000 customers in the time one lane handles 50. That's sub-linear scaling.

4 Watch Me Do It — Step by Step

Here is how to scale test execution from tens to thousands of tests.

Choose a sharding strategy

Approach	How It Works	Best For
File-based	Split test files across workers	Large suites with file-level independence
Spec-based	Split by individual test/spec	Granular control, uneven durations
Dynamic	Orchestrator assigns to idle workers	Maximising resource utilisation
Browser-based	Each shard runs different browser	Cross-browser matrix coverage

Playwright native sharding: the --shard=<index>/<total> flag (for example, --shard=1/4) works seamlessly with a GitHub Actions matrix strategy.
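
The difference between a static split and dynamic assignment shows up when test durations are uneven. The sketch below is a simplified model, not how Playwright or any grid schedules work internally: it compares a fixed round-robin split against a scheduler that hands the next test to whichever worker becomes idle first. The durations are hypothetical.

```python
import heapq


def round_robin_makespan(durations: list[float], workers: int) -> float:
    """Static split: test i goes to worker i % workers, decided up front."""
    loads = [0.0] * workers
    for i, d in enumerate(durations):
        loads[i % workers] += d
    return max(loads)


def dynamic_makespan(durations: list[float], workers: int) -> float:
    """Dynamic split: each test goes to whichever worker frees up first."""
    free_at = [0.0] * workers  # min-heap of the times each worker goes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


durations = [10, 1, 10, 1, 10, 1, 10, 1]  # minutes; deliberately uneven
print(round_robin_makespan(durations, 2))  # 40.0
print(dynamic_makespan(durations, 2))      # 22.0
```

With this input the static split puts every slow test on the same worker, while dynamic assignment lands on the ideal 22 minutes (44 minutes of work across 2 workers).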

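Containerise with Docker

```
FROM mcr.microsoft.com/playwright:v1.42.0-jammy
# Work in a dedicated directory instead of the image root
WORKDIR /app
# Copy manifests first so the dependency layer is cached between builds
COPY package*.json ./
RUN npm ci
# Copy the test sources last; they change most often
COPY . .
CMD ["npx", "playwright", "test"]
```
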
Benefits: consistent environment regardless of host OS, isolated dependencies, reproducible failures, scalable execution.

Scale with Kubernetes

- Namespace per test run: Full isolation, no cross-test contamination
- Helm charts: Standardised test bed deployment
- HPA: Scale workers based on queue depth, exposed to the autoscaler as a custom or external metric
- Ephemeral browser pods: Spin up browsers on demand and discard them after the session

Tools: Moon (Aerokube), Selenium Grid 4 with native K8s support.
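
To see how queue-driven autoscaling behaves, the sketch below applies the replica formula from the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric). It deliberately omits the autoscaler's tolerance band and stabilisation window, and the metric (queued tests per worker) and the replica limits are assumptions chosen for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 50) -> int:
    """Simplified HPA calculation, clamped to the configured replica range."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 workers, each seeing 30 queued tests on average, with a target of 10.
print(desired_replicas(4, 30, 10))  # 12

# The queue drains to 5 per worker: scale back down.
print(desired_replicas(4, 5, 10))   # 2
```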

Cloud provider comparison

Provider	Strength	Best For
BrowserStack	Massive device/browser matrix	Cross-browser web, mobile web
Sauce Labs	Unified platform, compliance	Regulated enterprises
LambdaTest	High concurrency	High-volume parallel execution
Self-hosted (Moon/Grid)	Full control, no per-minute costs	Data-sensitive, predictable workloads

Pro tip: For teams running >500 hours/month of tests, self-hosted Kubernetes grids often beat cloud per-minute pricing. Break-even analysis typically shows payoff at 12-18 months for dedicated infrastructure. For NZ-based teams, running workers in AWS's Auckland-based New Zealand region also reduces latency.

5 When to Use It / When NOT to Use It

✅ Scale infrastructure when...

Suite exceeds 30 minutes runtime
Multiple teams share execution resources
Cross-browser matrix is required
CI queue times are blocking developers

❌ Don't over-scale when...

Suite runs in <10 minutes already
Tests have shared state preventing parallelism
Budget doesn't cover infrastructure maintenance

6 Common Mistakes — Don't Do This

🚫 Parallelising without test isolation

I used to think: More workers means faster tests.
Actually: If tests share data or state, parallel execution creates race conditions and flaky failures. Guarantee isolation before adding workers. Isolation is a prerequisite, not an afterthought.
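
One common way to get isolation is to give every worker its own data namespace. The helper below is a hypothetical sketch: the naming scheme, the per-worker schema, and the function itself are assumptions, and a real suite would take the worker index from its test runner.

```python
import uuid


def isolated_user(worker_index: int, test_name: str) -> dict[str, str]:
    """Build test data that other workers and other runs cannot collide with."""
    run_id = uuid.uuid4().hex[:8]  # random suffix, different on every call
    slug = test_name.lower().replace(" ", "-")
    return {
        "email": f"w{worker_index}-{slug}-{run_id}@example.test",
        "schema": f"test_w{worker_index}",  # hypothetical per-worker DB schema
    }


a = isolated_user(0, "checkout flow")
b = isolated_user(1, "checkout flow")
print(a["email"] != b["email"])  # True
print(a["schema"], b["schema"])  # test_w0 test_w1
```

Because no two workers ever write to the same record or schema, adding workers cannot introduce new race conditions through shared data.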

🚫 Ignoring per-minute cloud costs

I used to think: Cloud grids are cheap because we only pay for what we use.
Actually: At high volume, per-minute pricing compounds. A team running 1,000 hours/month at $0.20/minute spends $12,000/month. Self-hosted infrastructure often breaks even at 12-18 months. Model your costs before committing.
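
The cost arithmetic is simple enough to model before committing. In the sketch below, the cloud figures come from the example above; the self-hosted upfront cost and monthly running cost are purely hypothetical placeholders to be replaced with real quotes.

```python
from typing import Optional


def cloud_monthly_cost(hours_per_month: float, price_per_minute: float) -> float:
    """Per-minute pricing: hours x 60 minutes x price per minute."""
    return hours_per_month * 60 * price_per_minute


def break_even_months(upfront: float, self_hosted_monthly: float,
                      cloud_monthly: float) -> Optional[float]:
    """Months until the upfront spend is repaid by monthly savings, or None."""
    saving = cloud_monthly - self_hosted_monthly
    if saving <= 0:
        return None  # self-hosting never pays back at this volume
    return upfront / saving


cloud = cloud_monthly_cost(1000, 0.20)
print(round(cloud, 2))  # 12000.0

# Hypothetical self-hosted figures: $90,000 upfront, $6,000/month to run.
months = break_even_months(upfront=90_000, self_hosted_monthly=6_000,
                           cloud_monthly=cloud)
print(round(months, 1))  # 15.0
```

The monthly running cost should include the people who maintain the grid, not only the hardware.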

🚫 One-size-fits-all scaling

I used to think: We should use the same infrastructure for smoke tests and full regression.
Actually: Smoke tests need fast, small runners. Full regression needs large, parallel grids. Performance tests need dedicated, isolated environments. Design tiered infrastructure rather than forcing everything through the same pipe.

7 Now You Try — Interview Warm-Up

🎯 Interactive Exercise

Scenario: Your 500-test suite takes 45 minutes on 4 workers. Management wants it under 10 minutes. Your tests are well-isolated. Budget for infrastructure is $2,000/month.

What is your scaling strategy?

The strategy:

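Shard aggressively: 45 minutes on 4 workers is about 180 worker-minutes, or roughly 22 seconds per test. Split evenly, 20 workers (25 tests each) would still take about 9 minutes before setup overhead, which leaves no margin. Use 30 workers (about 17 tests each) for roughly 6 minutes of test time plus overhead.
Use GitHub Actions matrix: Define a matrix with 30 shards and pass each shard's index to Playwright with --shard=<index>/30.
Cache everything: Node modules, browser binaries, and build artifacts. Caching can cut setup time by as much as 70%.
Cost check: 30 concurrent runners on GitHub Actions (Linux) cost approximately $0.008/minute x 30 x ~8 billed minutes (6 of tests plus setup) x ~20 runs/day x ~22 working days = ~$845/month. Well within budget.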
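
The scenario's numbers can be checked with a short calculation. This sketch assumes tests split evenly across shards, a hypothetical 2 minutes of setup per shard, and 22 working days per month; the per-minute rate is the approximate figure quoted in the cost check.

```python
import math


def shard_minutes(worker_minutes: float, workers: int,
                  setup_minutes: float) -> float:
    """Wall-clock time per shard under an even split."""
    return worker_minutes / workers + setup_minutes


def min_workers(worker_minutes: float, target_minutes: float,
                setup_minutes: float) -> int:
    """Smallest worker count that meets the target, setup included."""
    budget = target_minutes - setup_minutes
    if budget <= 0:
        raise ValueError("setup alone exceeds the target")
    return math.ceil(worker_minutes / budget)


def monthly_cost(rate_per_minute: float, workers: int, minutes_per_run: float,
                 runs_per_day: int, days_per_month: int) -> float:
    return rate_per_minute * workers * minutes_per_run * runs_per_day * days_per_month


total = 45 * 4  # 45 minutes on 4 workers = 180 worker-minutes
print(shard_minutes(total, 20, 0))  # 9.0
print(min_workers(total, 10, 2))    # 23
print(shard_minutes(total, 30, 2))  # 8.0
print(round(monthly_cost(0.008, 30, 8, 20, 22), 2))  # 844.8
```

Under these assumptions, 23 workers is the minimum that meets the 10-minute target, and 30 workers leaves headroom while costing well under the $2,000 budget.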

8 Self-Check — Can You Actually Do This?

If you can answer all three, you're ready to practice.

Q1. What is the difference between file-based and dynamic sharding?

File-based splits test files evenly across workers. Dynamic sharding assigns individual tests to idle workers based on real-time queue depth, maximising resource utilisation when test durations vary widely.

Q2. Why is containerisation valuable for test execution?

Containers provide consistent environments regardless of host OS, isolate dependencies (browser versions, drivers), enable reproducible failures, and allow each container to act as an independent test worker.

Q3. When does self-hosted infrastructure beat cloud per-minute pricing?

Typically when running >500 hours/month consistently. Break-even is usually 12-18 months for dedicated infrastructure. For variable or lower-volume workloads, cloud grids offer better cost flexibility.

9 Interview Prep — Architect Q&A

Scalability questions test your understanding of performance, cost, and infrastructure. These are core architect conversations.

Q. "How do you scale test execution from 100 to 10,000 test cases?"

Linear scaling is broken: if one test takes 1 minute, 10,000 tests take 10,000 minutes sequentially, which is roughly a week. Sub-linear scaling requires: 1) Parallelisation (shard tests across workers), 2) Isolation (tests don't share state), 3) Containerisation (workers are identical, reduce setup overhead), and 4) Intelligent selection (not all 10,000 tests run on every change). Practically: start with 4-8 workers (GitHub Actions matrix), implement file-based sharding, add caching for dependencies. At 10,000 tests, you'll need 20-50 workers and dynamic sharding. For the cost model, estimate infrastructure cost at scale, then choose between cloud per-minute pricing and self-hosted Kubernetes. For NZ teams with predictable workloads, self-hosted often wins at 500+ hours/month.

Q. "What's the cost of maintaining highly scalable test infrastructure?"

Hidden costs beyond tooling: 1) DevOps expertise (someone needs to maintain Kubernetes, Terraform, grid orchestrators), 2) Observability (instrumenting so you catch failures fast), 3) Test maintenance (more tests = more flakiness to debug), 4) Vendor lock-in (if you pick a cloud grid, switching is expensive). For a 10-person team: expect 0.5 FTE for infrastructure maintenance. For 50-person team: 2 FTE. Don't under-invest in the platform; under-investment means flaky tests and frustrated teams. Model the total cost: tooling + people + infrastructure. If it's >20% of engineering salaries, reconsider. If it enables 30% faster feedback, it's worth it.

Q. "How do you balance test coverage with execution time at scale?"

You can't run all tests on every change; it's too slow. Use risk-based selection: identify which tests are most likely to catch regressions given the code change. Run those on PR. Run full regression nightly. For 10,000 tests, this might mean: run 100 tests on every PR (5 minutes), 500 tests every 4 hours (20 minutes), 10,000 tests nightly. This maximises feedback speed while maintaining confidence. Measure: escape rate (do we miss bugs?), false positive rate (do we block valid changes?). Adjust selection criteria quarterly.
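
A minimal sketch of change-based selection, assuming a mapping from source files to the tests that exercise them already exists (for example, derived from coverage data). The file paths, test names, and the always-run smoke set are hypothetical.

```python
def select_tests(changed_files: list[str],
                 coverage_map: dict[str, set[str]],
                 smoke_tests: set[str]) -> list[str]:
    """Return the smoke tests plus every test mapped to a changed file."""
    selected = set(smoke_tests)
    for path in changed_files:
        selected |= coverage_map.get(path, set())
    return sorted(selected)


coverage_map = {
    "src/cart.py": {"test_cart_total", "test_checkout"},
    "src/auth.py": {"test_login"},
}
print(select_tests(["src/cart.py"], coverage_map, {"test_homepage_loads"}))
# ['test_cart_total', 'test_checkout', 'test_homepage_loads']
```

A changed file with no entry in the map selects nothing here; a production version would more safely fall back to the full regression suite in that case.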

Q. "What patterns would you use for distributed test execution?"

Common patterns: 1) Master-agent (one orchestrator assigns tests to workers), 2) Peer-to-peer (workers negotiate among themselves), 3) Message queue (tests are queued; workers pull them on demand). For most teams, master-agent is simplest. Tools: Kubernetes with Helm (each worker runs in its own pod), BrowserStack/LambdaTest (cloud grid), self-hosted Selenium Grid (open source) or Moon (Aerokube's Kubernetes-native browser grid). Key principle: workers should be stateless and identical. If worker-1 fails, worker-2 can pick up its tests. Use persistent storage only for results; never for test state. Monitor worker health aggressively; remove unhealthy workers from the pool.
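
The message-queue pattern can be sketched in a single process with threads standing in for workers. This is an illustration of the pull model only: a real grid would use a network queue and separate machines, and the "execution" here is a placeholder string.

```python
import queue
import threading


def run_distributed(tests: list[str], worker_count: int) -> dict[str, str]:
    """Stateless workers pull test names from a shared queue until it is empty."""
    pending = queue.Queue()
    for name in tests:
        pending.put(name)
    results: dict[str, str] = {}
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pull; the worker exits
            outcome = f"passed on worker-{worker_id}"  # placeholder for a real run
            with lock:
                results[name] = outcome  # only results are shared, never test state

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_distributed([f"test_{i}" for i in range(10)], worker_count=3)
print(len(results))  # 10
```

Because workers hold no state between pulls, any worker can take any test, which is what makes it safe to remove an unhealthy worker from the pool.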
