Test Strategy · Junior through Lead

Regression Testing

Every change to working software is a risk. Regression testing confirms that changes haven’t broken what was already working. It’s the safety net that enables teams to move fast with confidence.

1 The Hook

A Dunedin fintech ships a small, “obviously safe” change: they tidy up how dates are formatted on a customer statement. The developer tests the statement screen, it looks perfect, and it goes out on a Friday afternoon.

Over the weekend the nightly direct-debit batch fails silently. The date-formatting helper that draws the statement also builds the bank file, and the new format breaks the parser at the bank’s end. Nobody re-ran the payments tests, because “we only touched the statement page.” By Monday hundreds of customers have missed payments and face a pile of dishonour fees. The change worked exactly as intended, and it quietly broke something nobody thought to re-check.

This is the pattern behind regression defects: the new thing works, but a shared component, side effect, or hidden dependency breaks something that used to work. A change is only safe once you have re-checked what it could have affected — not just the part you changed.
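The failure in the story can be reproduced in a few lines. This is a minimal sketch, not the fintech's real code: the helper names, the 18-character bank record layout, and the two date formats are all assumptions made up for illustration. It shows one shared formatter feeding two consumers, and a regression check that exercises both.

```python
from datetime import date
from typing import Callable

DateFormatter = Callable[[date], str]


def format_date_before(d: date) -> str:
    """Hypothetical original helper: compact digits, e.g. 01032024."""
    return d.strftime("%d%m%Y")


def format_date_after(d: date) -> str:
    """Hypothetical 'tidied' helper: nicer for humans, e.g. 01/03/2024."""
    return d.strftime("%d/%m/%Y")


def statement_line(fmt: DateFormatter, d: date, amount: str) -> str:
    # Consumer 1: the screen the developer actually looked at.
    return f"Payment due {fmt(d)}: {amount}"


def bank_file_record(fmt: DateFormatter, d: date, amount_cents: int) -> str:
    # Consumer 2: an assumed fixed-width record, 8-digit date + 10-digit amount.
    return f"{fmt(d)}{amount_cents:010d}"


def bank_record_is_valid(record: str) -> bool:
    # Stand-in for the bank's parser: exactly 18 characters, digits only.
    return len(record) == 18 and record.isdigit()


def regression_check(fmt: DateFormatter) -> dict[str, bool]:
    """Re-check every consumer of the shared helper, not just the changed screen."""
    d = date(2024, 3, 1)
    return {
        "statement": fmt(d) in statement_line(fmt, d, "$120.00"),
        "bank_file": bank_record_is_valid(bank_file_record(fmt, d, 12000)),
    }
```

Running `regression_check(format_date_after)` passes the statement check and fails the bank-file check, which is exactly the gap the developer's statement-only test left open.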

2 The Rule

Every change can break something that already worked, so after any change you must re-run a set of previously passing tests — scoped to the change’s blast radius — to confirm nothing regressed, not just that the new thing works.

3 The Analogy

Renovating one room in an old villa.

You renovate the kitchen of a 1920s Kiwi villa. The new kitchen looks great — but a sensible builder then checks the rest of the house, because the wiring and plumbing are shared. Did the new oven trip the old fuse box? Did moving the sink drop the water pressure in the bathroom? The renovation “works”, but a good builder confirms the lights still come on in the lounge and the shower still runs hot before calling it done.

Regression testing is walking the rest of the house after the reno. You do not re-test every room every time — you check the ones on the same circuit as what you changed. A full rewire warrants checking the whole house; swapping a tap warrants a quick look at the rooms downstream.

What it is

Regression testing re-executes previously passing tests to verify that new code changes haven’t introduced regressions — bugs in functionality that was already working. It’s run after every change: bug fixes, new features, refactors, infrastructure updates.

The challenge: the regression suite grows with every release, but testing time stays fixed. This forces decisions about which tests to run.

Selection strategies

  • Full regression: run everything. Thorough but slow. Used for major releases or when the change scope is large or unknown.
  • Risk-based selection: run tests for the changed areas plus any areas they interact with. Balances coverage and speed.
  • Change-impact analysis: trace which code changed and which tests cover that code, then run only those tests. Requires reliable traceability between code and tests.
  • Core / smoke regression: run only the most critical tests. Used for quick confidence checks after small changes.
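Change-impact analysis from the list above can be sketched as a lookup from changed files to the tests that cover them. This is an illustrative sketch under assumptions: the coverage map, file paths, and test names are hypothetical, and a real team would generate the map from a coverage tool or traceability matrix rather than write it by hand.

```python
from dataclasses import dataclass

# Hypothetical traceability data: test name -> source files it exercises.
COVERAGE: dict[str, set[str]] = {
    "test_bill_display": {"billing/bill.py", "shared/currency.py"},
    "test_dashboard_totals": {"dashboard/totals.py", "shared/currency.py"},
    "test_bank_file_export": {"export/bank_file.py", "shared/currency.py"},
    "test_login": {"auth/login.py"},
}


@dataclass(frozen=True)
class Selection:
    tests: frozenset[str]
    untraced_files: frozenset[str]

    @property
    def needs_full_regression(self) -> bool:
        # If a changed file is covered by no known test, traceability is
        # too weak to trust a narrow selection; fall back to full regression.
        return bool(self.untraced_files)


def select_by_impact(changed_files: set[str],
                     coverage: dict[str, set[str]]) -> Selection:
    """Pick every test whose covered files overlap the change."""
    traced = set().union(*coverage.values())
    tests = frozenset(t for t, files in coverage.items() if files & changed_files)
    return Selection(tests=tests, untraced_files=frozenset(changed_files - traced))
```

A change to `shared/currency.py` selects the bill, dashboard, and bank-file tests but not the login test. A change to a file that no test traces to sets `needs_full_regression`, which reflects the point that this strategy is only as good as its traceability.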

In practice: a common setup is a tiered regression suite, with smoke tests on every PR, a targeted suite nightly, and a full regression weekly or before release. The test lead defines the tiers based on risk and run time.

Automation and regression

Regression testing is the strongest argument for test automation. Manual regression on a large suite is unsustainable — it’s repetitive, error-prone, and gets skipped under time pressure.

Good automation candidates for regression: stable features, critical paths, high-risk areas, tests that run on every build. Poor candidates: tests for UI that changes frequently, and exploratory or investigation testing.

Maintaining the regression suite

A regression suite that’s never pruned becomes a liability. Signs it needs attention:

  • Flaky tests (intermittent failures) erode confidence in the whole suite
  • Tests for deleted features still run and waste time
  • The suite takes so long to run that teams skip it
  • Duplicate tests cover the same path

Review and prune the regression suite at least quarterly. Delete tests that no longer add value; fix or quarantine flaky tests.
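Two of the warning signs above, flaky tests and duplicates, can be surfaced from data most teams already have. The sketch below assumes a simple run history of (test, commit, passed) records and a map of which paths each test covers; both structures and all names are hypothetical. It treats a test as flaky when it both passed and failed on the same commit, which is one reasonable definition among several.

```python
from collections import defaultdict


def find_flaky(runs: list[tuple[str, str, bool]]) -> set[str]:
    """Flag tests that both passed and failed against the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return {test for (test, _), seen in outcomes.items() if len(seen) == 2}


def find_duplicates(paths_by_test: dict[str, set[str]]) -> list[list[str]]:
    """Group tests that cover exactly the same set of paths."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for test, paths in paths_by_test.items():
        groups[frozenset(paths)].append(test)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

A test that fails on one commit and passes on a later one is not flagged, because the code changed between runs. Identical coverage is only a hint that two tests overlap; they still need a human look before one is deleted.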

Regression vs confirmation testing

  • Confirmation testing (retest) — verify a specific defect has been fixed. Directly re-run the test case that failed.
  • Regression testing — verify nothing else broke as a side effect of the fix. Broader in scope.

Every defect fix should trigger both: a retest of the original bug, and a regression of the surrounding area.
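The "both, every time" rule can be written down as a small planning function. This is a sketch under assumptions: the component names, the neighbour map that defines the "surrounding area", and the test names are invented for illustration.

```python
def plan_for_fix(failed_test: str,
                 fixed_component: str,
                 tests_by_component: dict[str, set[str]],
                 neighbours: dict[str, set[str]]) -> dict[str, list[str]]:
    """Plan a retest of the original failure plus regression of the surrounding area."""
    # Surrounding area = the fixed component plus the components it interacts with.
    area = {fixed_component} | neighbours.get(fixed_component, set())
    regression: set[str] = set()
    for component in area:
        regression |= tests_by_component.get(component, set())
    # The failed test is reported under "retest", so keep it out of the regression list.
    regression.discard(failed_test)
    return {"retest": [failed_test], "regression": sorted(regression)}
```

For example, fixing a rounding bug in a `pricing` component whose neighbour is `checkout` yields one retest (the test that originally failed) and a regression list covering the remaining pricing tests and the checkout tests.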

Practice this technique: Try Senior Practice 09 — Cross-level regression.

4 Now You Try

Three graded exercises: spot, fix, then build. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot: pick the regression strategy

A NZ utility’s self-service portal has an 1,100-test regression suite. A change has come in: the shared currency-formatting helper (used on bills, the dashboard, and the bank-file export) was modified to add a thousands separator. Which regression strategy fits — full, risk-based selection, change-impact, or smoke only — and why? Name the specific areas you would re-test.

Model answer:
Chosen strategy: risk-based selection driven by change-impact analysis (not smoke only, and not necessarily the full 1,100 tests).

Why: a change to a SHARED helper has a wide blast radius — every place that uses currency formatting is at risk, not just the one screen the developer was thinking about. Smoke only is too shallow; it would likely miss the bank-file break. Full regression is defensible if traceability is poor, but if you can trace usages you can scope it precisely.

Specific areas to re-test: every consumer of the currency helper — bill display, the dashboard totals, and crucially the bank-file export (machine-read formats break easily on a new separator). Also any CSV/PDF outputs and any downstream parser that reads those amounts. The bank-file export is the highest-risk consumer because a human eyeballing a bill would forgive a comma, but a parser will not.

🔧 Exercise 2 of 3 — Fix: repair a broken regression decision

A team lead made the call below for a release. It is wrong on several counts. Rewrite the regression approach with the right strategy for each change and fix the reasoning errors.

Flawed call:
“We refactored the whole pricing engine but no new features were added, so smoke tests are enough. We also fixed a typo in a tooltip — run the full 1,100-test suite to be safe. The pricing fix passed its own retest, so we don’t need any regression around it. And we’ll skip the flaky payment tests since they keep failing.”

Rewrite the regression approach:

Model answer:
Pricing engine refactor — needs FULL (or broad risk-based) regression, not smoke. "No new features" does not mean low risk; an internal refactor of a core engine can break behaviour anywhere it is used. This is the highest-risk change in the release.

Tooltip typo — needs SMOKE only, not the full 1,100 tests. A static text change has essentially no blast radius. Running the full suite here wastes hours for no risk reduction; the lead has the two changes exactly backwards.

"Retest is enough, no regression needed" — wrong. Confirmation testing (retest) only proves the specific bug is fixed. Regression checks nothing ELSE broke as a side effect of the fix. Every fix should trigger both: a retest of the bug and a regression of the surrounding area.

Skipping the flaky payment tests — do not just skip them. Flaky tests on a critical path (payments) hide real failures. Quarantine and FIX them, or replace them with reliable tests; silently skipping payment regression is how a real payment defect escapes. Flakiness is a maintenance problem to solve, not a reason to stop testing payments.

🏗️ Exercise 3 of 3 — Build: design a tiered regression suite

You are the test lead for a NZ online grocery service. Design a tiered regression suite: define the tiers (e.g. smoke, targeted, full), say what runs in each tier and when it is triggered, name candidates for automation vs manual, and describe how you would keep the suite healthy over time (flaky tests, retired features, duplicates).

Model answer:
A strong tiered design:

Tier 1 — Smoke / core: ~20-40 critical-path tests (log in, add to cart, checkout, payment, place order). Triggered on every PR / every build. Fast, fully automated. Purpose: quick confidence that the app is not fundamentally broken.

Tier 2 — Targeted / risk-based: tests for the changed areas plus their interactions and shared components. Triggered nightly and on every change before merge to the release branch. Mostly automated, scoped by change-impact analysis.

Tier 3 — Full regression: the entire suite. Triggered weekly and before every production release, plus after any large or high-risk change (e.g. a payment-gateway switch or a core-engine refactor). Automated where stable; some exploratory/manual checks for areas automation cannot cover well.

Automation candidates: stable features, critical paths, high-risk areas, anything run on every build. Keep manual: frequently changing UI, exploratory and investigation testing, one-off checks.

Keeping it healthy: review and prune at least quarterly. Quarantine and fix flaky tests rather than ignoring them (flakiness on payments is dangerous). Delete tests for retired features and de-duplicate tests covering the same path. The goal is a suite the team trusts and actually runs — an unpruned suite that takes too long gets skipped under pressure, which defeats the point.

Self-Check

Q1: What is the difference between regression testing and confirmation (retest)?

Confirmation testing re-runs the specific failed test to verify one defect is now fixed. Regression testing checks that nothing else broke as a side effect of the change — it is broader in scope. Every defect fix should trigger both: a retest of the original bug and a regression of the surrounding area.

Q2: Why is “no new features were added” a poor reason to skip deep regression on a refactor?

Internal changes to a core component can break behaviour anywhere that component is used, even with no visible feature change. A refactor of a shared engine often has a wide blast radius, so it warrants broad or full regression — it can be one of the highest-risk changes in a release precisely because it touches so much.

Q3: How does the size of a change’s blast radius map to the regression strategy you choose?

Isolated static changes (a text/typo fix) need smoke only. A change with a known, bounded blast radius (a new feature on one screen) suits risk-based or change-impact selection — the changed area plus its interactions. Broad or unknown-impact changes (a major release, a core refactor) need full regression.
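The mapping in this answer can be expressed as a short decision function. It is a sketch only: the two boolean fields are an assumed simplification of "blast radius", and judging them for a real change still takes human analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    description: str
    static_only: bool           # text or typo change with no logic touched
    blast_radius_bounded: bool  # affected areas are known and limited


def choose_strategy(change: Change) -> str:
    """Map a change's blast radius to a regression strategy."""
    if change.static_only:
        return "smoke"
    if change.blast_radius_bounded:
        return "risk-based"  # changed area plus its interactions
    return "full"            # broad or unknown impact
```

A tooltip typo comes out as `smoke`, a new feature confined to one screen as `risk-based`, and a core refactor with unknown reach as `full`.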

Q4: What makes a test a good automation candidate for regression, and what makes a poor one?

Good candidates: stable features, critical paths, high-risk areas, and tests that run on every build — the repetitive checks humans skip under pressure. Poor candidates: UI that changes frequently, and exploratory or investigation testing, where the cost of maintaining brittle automation outweighs the benefit.

Q5: Why must a regression suite be pruned, and what are the warning signs it needs attention?

An unpruned suite becomes a liability: it gets so slow teams skip it, which defeats its purpose. Warning signs: flaky tests eroding trust, tests for deleted features still running, duplicate tests covering the same path. Review at least quarterly — delete dead tests and fix or quarantine flaky ones.

Try It — Select the right regression strategy

A NZ insurance portal has a regression suite of 800 tests. Four different change scenarios have come in. For each one, choose the most appropriate regression strategy.

Change scenarios (choose the best regression strategy for each):

  • A typo was fixed in the "Thank you" confirmation email template, with no code logic changed
  • The premium calculation engine was refactored, with no new features but significant internal changes
  • A new "add vehicle" feature was added to the policy management screen
  • Major release: payment gateway switched from Stripe to POLi, plus 40 other features shipped simultaneously