Test Management · Senior & Lead

Test Estimation

How long will testing take? How many people do you need? Estimation is a core lead skill — and being able to defend your numbers is as important as getting them right.

Senior Test Lead ISTQB CTAL-TM Ch. 2

1 The Hook

A Wellington fintech needs sign-off on a new payments release. The project manager corners the test lead in a stand-up: "Testing's quick, right? Call it two days." The lead, not wanting to look slow, nods. Two days goes in the plan.

Day three arrives and testing is nowhere near done. The environment was unstable for half of day one, the build had to be re-tested twice after fixes, and nobody had counted the regression suite. The release slips, and because the lead "owned" the estimate, the lead owns the blame — for a number someone else pulled out of the air.

The trap was accepting a figure with no working behind it. An estimate is not a guess you agree to under social pressure; it is a calculation you can show. Had the lead broken the work into tasks and put a number on each, "two days" would have been visibly wrong before it was ever written down — and the conversation would have been about evidence, not optimism.

2 The Rule

An estimate is only as good as the working behind it — break the testing into tasks, put a number on each (ideally a range, not a point), state your assumptions, and never accept someone else's figure for your own work.

3 The Analogy

Getting a builder's quote for a deck.

A trustworthy builder quoting a deck in Tauranga does not just say "about five grand." They itemise: timber, fixings, labour days, council consent, a contingency for rotten piles they might find once they lift the old boards. You can read the quote, challenge a line, and see exactly what assumption the price rests on. The builder who says "five grand, trust me" is the one who hits you with a variation halfway through.

Test estimation is writing the itemised quote, not the back-of-the-napkin number. The work breakdown is your line items, the three-point range is your contingency for the rotten piles you cannot see yet, and the stated assumptions are the fine print that protects you when the ground turns out to be soft.

What it is

Test estimation is the process of predicting how much effort, time, and resources testing will require. No estimate is exact — the goal is a defensible range with known assumptions, not a precise number.

Good estimates account for: test design, test execution, defect management, test environment setup, reporting, and risk buffers.

Work Breakdown Structure (WBS)

Break the testing work into the smallest estimable tasks, estimate each, then sum them up. The WBS approach makes assumptions visible and allows estimates to be challenged at the task level.

WBS extract — login feature testing

Task	Hours
Review requirements and write test cases	3
Set up test environment and test data	2
Execute test cases (happy path)	1.5
Execute test cases (negative / boundary)	2
Exploratory testing session	1.5
Log and manage defects	1
Retest after fixes	1.5
Total estimate	12.5 hours
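
The roll-up in the table above can be checked with a short script. This is only a sketch: the task names and hours are copied from the login-feature extract, and the `wbs` list and `wbs_total` helper are illustrative names, not part of any tool.

```python
# WBS line items copied from the login-feature table: (task, hours).
wbs = [
    ("Review requirements and write test cases", 3.0),
    ("Set up test environment and test data", 2.0),
    ("Execute test cases (happy path)", 1.5),
    ("Execute test cases (negative / boundary)", 2.0),
    ("Exploratory testing session", 1.5),
    ("Log and manage defects", 1.0),
    ("Retest after fixes", 1.5),
]


def wbs_total(items):
    """Sum the hours across all WBS line items."""
    return sum(hours for _task, hours in items)


total_hours = wbs_total(wbs)

# Print the itemised estimate, one line per task, then the total.
for task, hours in wbs:
    print(f"{task:<45}{hours:>5.1f} h")
print(f"{'Total estimate':<45}{total_hours:>5.1f} h")
```

Keeping the line items as data means a challenged task can be changed in one place and the total recalculated, which is the same property that makes the WBS defensible on paper.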

Three-point estimation

For each task, estimate three scenarios:

Optimistic (O) — best case, everything goes smoothly
Most Likely (M) — realistic expectation
Pessimistic (P) — worst case, things go wrong

Calculate the weighted average (PERT formula): E = (O + 4M + P) ÷ 6

This gives a more realistic estimate than single-point guessing and surfaces the uncertainty explicitly.
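
As a minimal sketch, the formula can be written as a small function. The function name `pert_estimate`, the input check, and the example task (3 h, 4 h, 8 h) are assumptions made for illustration.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Three-point (PERT) weighted average: E = (O + 4M + P) / 6."""
    # Guard against inputs entered in the wrong order.
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Hypothetical task: 3 h best case, 4 h most likely, 8 h worst case.
estimate = pert_estimate(3, 4, 8)
print(f"PERT estimate: {estimate:.2f} h")  # (3 + 16 + 8) / 6 = 4.50
```

The result (4.5 h) lands slightly above the most likely value because the pessimistic tail is longer than the optimistic one.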

Metrics-based estimation

If you have historical data from similar projects, use it. Common metrics-based approaches:

Test cases per story point: if you average 3 test cases per story point and a sprint has 40 points, estimate ~120 test cases.
Time per test case: if each test case takes 30 minutes to design and 15 minutes to execute, budget 45 minutes per test case and multiply by the test case count.
Defect rate: if historically 20% of test cases find defects and each defect takes 45 minutes to log and retest, add that defect-handling time on top of design and execution.
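
The three metrics above can be combined into one rough calculation. The sketch below reuses the example figures from the list; treating them as a single project, and assuming each defect-finding test case raises exactly one defect, are simplifications made here for illustration. The function name `metrics_based_estimate` is hypothetical.

```python
def metrics_based_estimate(story_points, cases_per_point, design_min,
                           execute_min, defect_rate, defect_min):
    """Return (test_case_count, total_hours) from historical averages."""
    test_cases = story_points * cases_per_point
    design_and_run_min = test_cases * (design_min + execute_min)
    # Assumption: each defect-finding test case yields exactly one defect.
    defect_handling_min = test_cases * defect_rate * defect_min
    return test_cases, (design_and_run_min + defect_handling_min) / 60


cases, hours = metrics_based_estimate(
    story_points=40, cases_per_point=3,   # 3 test cases per story point
    design_min=30, execute_min=15,        # minutes per test case
    defect_rate=0.20, defect_min=45,      # 20% find defects, 45 min each
)
print(f"{cases} test cases, about {hours:.0f} hours")
```

With these inputs the design and execution time is 90 hours and defect handling adds 18 hours, for roughly 108 hours in total. The output is only as trustworthy as the historical averages fed in.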

Defending your estimate

Estimates will be challenged. Make yours defensible by:

Showing your WBS — "here are the tasks I’ve included"
Stating your assumptions — "this assumes the environment is ready and stable"
Giving a range, not a point — "8–12 hours depending on defect volume"
Tracking actuals — compare estimates to reality to improve future estimates
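
Tracking actuals can be as simple as comparing totals after each release. The sketch below uses a made-up history purely for illustration; the task names, the hours, and the `overrun_ratio` helper are hypothetical, not data from a real project.

```python
# Hypothetical history of (task, estimated hours, actual hours).
history = [
    ("Test case design", 4.0, 5.0),
    ("Test execution", 6.0, 7.5),
    ("Retest after fixes", 2.0, 2.5),
]


def overrun_ratio(records):
    """Total actual hours divided by total estimated hours."""
    estimated = sum(est for _task, est, _act in records)
    actual = sum(act for _task, _est, act in records)
    return actual / estimated


ratio = overrun_ratio(history)
print(f"Actuals ran at {ratio:.2f}x the estimate")
```

A ratio consistently above 1.0 suggests the estimates are optimistic, and it is evidence you can bring to the next estimation conversation.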

Never accept someone else’s estimate for your work. If a PM says "testing should take a day," ask what they’re basing that on. Your WBS is the evidence. A bad estimate accepted silently becomes your problem later.

ISTQB mapping

Ref	Topic	Level
CTAL-TM 2.3	Test estimation — effort, duration, resources	Lead
CTAL-TM 2.3	Estimation techniques: metrics-based, expert judgement, WBS, three-point	Lead

4 Now You Try

Three graded exercises: spot, fix, then build. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot: defend a challenged estimate

You estimated 14 hours to test a new IRD myIR payment screen. A project manager says, "That's too much — make it a day." Write the three things you would put on the table to defend your number, and explain why giving a single point figure of "8 hours" would be a weaker answer than a range.

Model answer

Three things to put on the table:
1. The work breakdown structure (WBS) — the itemised list of tasks (test design, environment setup, execution, defect logging, retest, reporting) with hours against each. This is the evidence that the 14 hours is built from real work, not a feeling.
2. The assumptions — e.g. "this assumes the test environment is available and stable from day one, and that the build is delivered defect-light." If the PM wants a day, ask which assumption they want to change.
3. A range, not a point — "12 to 16 hours depending on defect volume." This signals honestly where the uncertainty is.

Why a range beats a single "8 hours": a single point implies a precision you do not have and sets you up to be "wrong" the moment reality differs by an hour. A range communicates the uncertainty truthfully, invites a conversation about what drives the high end (usually defect volume and environment stability), and protects you because you committed to a band, not a false certainty.

🔧 Exercise 2 of 3 — Fix: repair a flawed PERT calculation

A junior estimated a task with Optimistic = 2 h, Most Likely = 6 h, Pessimistic = 10 h and wrote: "PERT estimate = (2 + 6 + 10) ÷ 3 = 6 hours." The formula is wrong. Correct it, show the right calculation, and explain what the mistake does to the estimate.

Flawed working:
PERT estimate = (O + M + P) ÷ 3 = (2 + 6 + 10) ÷ 3 = 6 hours

Model answer

Correct PERT formula: E = (O + 4M + P) ÷ 6

Correct calculation: (2 + 4×6 + 10) ÷ 6 = (2 + 24 + 10) ÷ 6 = 36 ÷ 6 = 6 hours

In this particular case the answer happens to be the same (6 hours) because the values are symmetric, but the working is wrong and would give a different answer for almost any other input.

What the mistake changes: a plain average (O + M + P) ÷ 3 treats best case, likely case and worst case as equally probable. PERT does not — it weights the Most Likely value four times as heavily, which is the whole point of the technique. With skewed inputs (say O=2, M=3, P=20) the plain average gives 8.3 h but PERT gives (2 + 12 + 20) ÷ 6 = 5.7 h. Using the wrong formula on skewed tasks systematically over- or under-estimates and discards PERT's main benefit: anchoring on the realistic case while still accounting for the tail risk.
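
The difference between the two formulas can be checked with a few lines of code. This is a sketch using the symmetric inputs from the exercise and the skewed inputs from the explanation above; the function names `plain_average` and `pert` are illustrative.

```python
def plain_average(o, m, p):
    """The flawed formula: weights all three scenarios equally."""
    return (o + m + p) / 3


def pert(o, m, p):
    """The PERT formula: weights the Most Likely value four times."""
    return (o + 4 * m + p) / 6


cases = {"symmetric": (2, 6, 10), "skewed": (2, 3, 20)}
for label, (o, m, p) in cases.items():
    print(f"{label}: plain = {plain_average(o, m, p):.1f} h, "
          f"PERT = {pert(o, m, p):.1f} h")
```

The symmetric inputs give 6.0 h under both formulas, while the skewed inputs give 8.3 h for the plain average and 5.7 h for PERT.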

🏗️ Exercise 3 of 3 — Build: a WBS estimate from scratch

A NZ council is releasing an online dog-registration renewal form (payment, owner details, multiple dogs per household). Build a work breakdown structure estimate: list at least six testing tasks, put an hour figure on each, give a total, and state two assumptions your estimate depends on.

Model answer

A defensible WBS for the dog-registration renewal form:
- Review requirements and acceptance criteria, write test cases — 4 h
- Set up test environment and seed test data (owners, multiple dogs, payment accounts) — 2 h
- Execute happy-path tests (single dog renewal, payment success) — 2 h
- Execute negative / boundary tests (invalid card, multiple dogs, missing fields) — 3 h
- Exploratory session on the payment and multi-dog logic — 2 h
- Log and manage defects + retest after fixes — 3 h
- Regression check on the existing registration flow — 2 h
- Test report and sign-off — 1 h
Total estimate: ~19 hours (a sensible band would be 17–22 h)

Two assumptions the estimate depends on:
1. The test environment and payment sandbox are available and stable from day one.
2. The build is delivered defect-light — heavy defect volume on first execution would push the retest and defect-management line well past 3 hours, which is why the upper end of the range exists.

A senior would note the multi-dog-per-household rule is the riskiest area (combinations of dogs, part-payments, removals) and would weight exploratory and negative testing there, and would give the total as a range with the defect-volume assumption called out explicitly.

Self-Check

Q1: Why should you never silently accept a PM's estimate for your own testing work?

Because the moment you accept it, you own it — including the blame when it proves wrong. A figure with no working behind it is a guess. Ask what it is based on, and put your work breakdown on the table as the evidence for your own number.

Q2: Write out the three-point (PERT) formula and explain why the Most Likely value is weighted.

E = (O + 4M + P) ÷ 6. The Most Likely value is multiplied by four because it is the realistic case and should dominate the estimate, while the optimistic and pessimistic values pull the result slightly toward whichever tail is longer. This anchors the estimate on reality while still accounting for uncertainty.

Q3: What makes a Work Breakdown Structure estimate more defensible than a single gut figure?

It exposes every task and its hours, so the estimate can be challenged at the line-item level rather than as a whole. If someone disagrees, they have to point at a specific task — which turns the argument into evidence about the work rather than opinion about the total.

Q4: When is metrics-based estimation the right choice, and what does it require?

When you have reliable historical data from similar past projects — test cases per story point, time per test case, defect rates. It is fast and objective, but it is only as good as your records; without trustworthy historical data it becomes guesswork dressed up as a metric.

Q5: Why should every estimate be accompanied by its assumptions?

Because the assumptions are the conditions the number depends on — "assumes a stable environment from day one." Stating them tells everyone immediately when the estimate needs revising (the moment an assumption is violated) and protects you when reality changes the conditions you priced against.

Try It — Three-point PERT estimation

A NZ e-commerce checkout feature needs testing. Three tasks have been broken down. Apply the PERT formula E = (O + 4M + P) ÷ 6 to calculate the weighted estimate for each task, then sum the total.

Task	Optimistic (O, hrs)	Most Likely (M, hrs)	Pessimistic (P, hrs)
Test case design for checkout flow	2	4	8
Execute checkout tests + defect logging	3	5	10
Retest after fixes	1	2	6
Total estimate (sum of E values)

PERT calculations

Task	Formula	E (hrs)
Test case design	(2 + 4×4 + 8) ÷ 6 = (2+16+8) ÷ 6 = 26 ÷ 6	4.33
Execute + log defects	(3 + 4×5 + 10) ÷ 6 = (3+20+10) ÷ 6 = 33 ÷ 6	5.5
Retest after fixes	(1 + 4×2 + 6) ÷ 6 = (1+8+6) ÷ 6 = 15 ÷ 6	2.5
Total estimate		12.33 hours

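Answers within ±0.2 of each value are correct; rounding to 1 decimal place is fine in practice. Notice that the pessimistic values pull each estimate above its "most likely" value. This is intentional: PERT accounts for uncertainty. If everything goes perfectly you finish in 6 hours (the sum of the optimistic values), and if everything goes wrong you need 24 hours (the sum of the pessimistic values). The PERT total of 12.33 hours sits just above the most-likely sum of 11 hours, so it carries a modest buffer for tail risk rather than covering the full worst case.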
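
The worked answers can be reproduced with a short script. This is a sketch that takes the three tasks and their O, M, P values straight from the table above; the `tasks` dictionary and the `pert` helper are illustrative names.

```python
# (O, M, P) in hours for each task, copied from the Try It table.
tasks = {
    "Test case design": (2, 4, 8),
    "Execute + log defects": (3, 5, 10),
    "Retest after fixes": (1, 2, 6),
}


def pert(o, m, p):
    """Three-point weighted average: E = (O + 4M + P) / 6."""
    return (o + 4 * m + p) / 6


estimates = {name: pert(*omp) for name, omp in tasks.items()}
total = sum(estimates.values())

# Column sums show the best case, likely case and worst case overall.
optimistic_sum = sum(o for o, _m, _p in tasks.values())
most_likely_sum = sum(m for _o, m, _p in tasks.values())
pessimistic_sum = sum(p for _o, _m, p in tasks.values())

for name, e in estimates.items():
    print(f"{name:<24}{e:>6.2f} h")
print(f"{'Total estimate':<24}{total:>6.2f} h")
print(f"Best case {optimistic_sum} h, most likely {most_likely_sum} h, "
      f"worst case {pessimistic_sum} h")
```

Printing the column sums next to the PERT total makes it easy to see where the weighted estimate falls between the best and worst cases.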