Defect Management
From the moment you find a bug to the moment it’s confirmed fixed — defect management is the process that keeps bugs tracked, prioritised, and resolved systematically.
1 The Hook
A tester at a Hamilton software house finds a real problem: when a customer pays for an order, the money is debited but the order sometimes never gets created. They fire off a Slack message — “hey, payment thing is broken again” — and move on to the next screen.
Three weeks later it surfaces in a production incident. The developer asks: which payment method? Which browser? What was the order total? Was the customer logged in? Nobody knows. The message had no steps, no environment, no evidence, and was never logged as a defect, so it never reached triage and was never prioritised. A genuine money-losing bug was found and then quietly lost, because finding a defect is worthless if it is not captured, classified, and tracked through to a verified fix.
This is the pattern behind a surprising number of escaped defects: the bug was caught in testing and then leaked out of the process. A defect that lives in a chat message instead of a tracked report is a defect nobody owns — and an unowned defect ships.
2 The Rule
A defect is not managed until it is captured in a tracker with enough detail to reproduce it, classified by severity and priority, and moved through a defined lifecycle to a verified, closed state — finding the bug is only the first step.
3 The Analogy
An ACC claim, not a complaint to a mate.
If you hurt your back, telling a friend “my back’s sore” gets you sympathy and nothing else. Lodging an ACC claim is different: it records what happened, when, and how, gives it a claim number, routes it to the right assessor, tracks it through review and treatment, and closes it only when you are signed off. The structure is what turns “something is wrong” into “something is being dealt with, and we can see where it is up to.”
Defect management is lodging the claim, not moaning to a mate. A good bug report is the claim form — specific enough to act on without a phone call — and the lifecycle states are the claim status, from lodged to assigned to resolved to closed.
The defect lifecycle
| State | Who owns it | What happens |
|---|---|---|
| New | Tester | Bug reported, not yet reviewed |
| Assigned | Lead / Manager | Triaged, assigned to developer |
| Open / In Progress | Developer | Being investigated and fixed |
| Fixed | Developer | Code change made, ready for retest |
| Retest | Tester | Verify the fix works |
| Closed | Tester | Verified fixed in all affected environments |
| Rejected | Developer / Lead | Not a bug — by design, duplicate, or not reproducible |
| Deferred | Lead / PM | Won’t fix this release, scheduled for later |
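The table lists the states but not every permitted move between them. The sketch below models the lifecycle as a small state machine in Python. The transition set is an assumption for illustration, not an ISTQB-defined workflow: it includes a failed retest sending the defect back to In Progress and a deferred defect returning to Assigned. The property it makes concrete is that Closed can only be reached through Retest.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "Open / In Progress"
    FIXED = "Fixed"
    RETEST = "Retest"
    CLOSED = "Closed"
    REJECTED = "Rejected"
    DEFERRED = "Deferred"


# Assumed transitions: illustrative only, not a standard workflow.
TRANSITIONS = {
    State.NEW: {State.ASSIGNED, State.REJECTED},
    State.ASSIGNED: {State.IN_PROGRESS, State.REJECTED, State.DEFERRED},
    State.IN_PROGRESS: {State.FIXED, State.REJECTED},
    State.FIXED: {State.RETEST},
    State.RETEST: {State.CLOSED, State.IN_PROGRESS},  # failed retest reopens the work
    State.DEFERRED: {State.ASSIGNED},                 # picked up in a later release
    State.CLOSED: set(),                              # terminal
    State.REJECTED: set(),                            # terminal
}


class Defect:
    def __init__(self, title: str) -> None:
        self.title = title
        self.state = State.NEW
        self.history = [State.NEW]

    def move_to(self, target: State) -> None:
        """Change state, refusing any move not listed in TRANSITIONS."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {target.value}")
        self.state = target
        self.history.append(target)


bug = Defect("Payment debited but order not created")
for step in (State.ASSIGNED, State.IN_PROGRESS, State.FIXED,
             State.RETEST, State.CLOSED):
    bug.move_to(step)
print(" -> ".join(s.value for s in bug.history))
```

Calling `move_to(State.CLOSED)` on a brand-new defect raises a `ValueError`. That is the tracker-level version of "finding the bug is only the first step": no defect is closed without passing through a retest.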
Writing good bug reports
A good bug report lets a developer reproduce the issue without asking you anything. It contains:
- Title — one sentence: what breaks, where, and under what condition. Not "Login broken" — "Login fails with HTTP 500 when password contains special characters".
- Steps to reproduce — numbered, specific, reproducible steps from a known starting state.
- Expected result — what should happen.
- Actual result — what actually happens. Include error messages verbatim.
- Environment — browser, OS, version, test environment.
- Severity and priority — set as two separate fields (see below).
- Evidence — screenshot, video, log snippet.
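As a rough sketch, the required fields above can be expressed as a Python dataclass, with a check that flags anything left blank. The names `BugReport` and `missing_fields` are hypothetical and not part of any tracker's API. A field being non-empty does not prove it is useful: "Doesn't work" would pass as an actual result, so this is a floor, not a quality test.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class BugReport:
    title: str
    steps_to_reproduce: list[str]
    expected_result: str
    actual_result: str
    environment: str
    severity: str
    priority: str
    evidence: list[str] = field(default_factory=list)


def missing_fields(report: BugReport) -> list[str]:
    """Names of fields left empty: blank strings or empty lists."""
    missing = []
    for f in fields(report):
        value = getattr(report, f.name)
        empty = not value.strip() if isinstance(value, str) else not value
        if empty:
            missing.append(f.name)
    return missing


# The Slack message from the opening story, forced into the report shape.
slack_message = BugReport(
    title="payment thing is broken again",
    steps_to_reproduce=[],
    expected_result="",
    actual_result="",
    environment="",
    severity="",
    priority="",
)
print(missing_fields(slack_message))
```

Run against the Slack message from the hook, the check flags every field except the title. That list is effectively the set of questions the developer ended up asking three weeks too late.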
Severity vs priority
These are independent dimensions. Both must be set, and they frequently diverge.
| Severity | Definition | Example |
|---|---|---|
| Critical | System crash, data loss, security breach | Payment processed but order not created |
| High | Major function broken, no workaround | Users can’t submit the main form |
| Medium | Function broken, workaround exists | Export to CSV fails but PDF works |
| Low | Minor issue, cosmetic, or rarely hit | Tooltip text has a typo |
Priority is a business decision: how urgently does this need to be fixed, relative to everything else? A low-severity cosmetic bug on the homepage might be high priority if a CEO demo is tomorrow. A high-severity crash in a rarely used report might be deferred.
Testers set severity; product/business sets priority. This distinction matters — don’t conflate them.
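To make the independence concrete, here is a hedged Python sketch. `classify_severity` simplifies the severity table: it treats any broken function with no workaround as High, where the table says "major function". The ordering rule in `fix_order` (sort by priority, with severity only as a tie-breaker) is an assumed triage convention, not a standard. Priority is deliberately an input and is never computed from severity, because it is a business decision.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def classify_severity(crash_data_loss_or_breach: bool,
                      function_broken: bool,
                      workaround_exists: bool) -> Severity:
    """Simplified mapping of the severity table (technical impact only)."""
    if crash_data_loss_or_breach:
        return Severity.CRITICAL
    if function_broken and not workaround_exists:
        return Severity.HIGH
    if function_broken:
        return Severity.MEDIUM
    return Severity.LOW  # cosmetic, minor, or rarely hit


@dataclass
class TriagedDefect:
    title: str
    severity: Severity  # set by the tester from technical impact
    priority: int       # set by product/business: 1 = P1-Urgent ... 4 = P4-Low


def fix_order(defects: list[TriagedDefect]) -> list[str]:
    """Order work by business priority first; severity only breaks ties."""
    ranked = sorted(defects, key=lambda d: (d.priority, -d.severity))
    return [d.title for d in ranked]


backlog = [
    TriagedDefect("Crash in rarely used report",
                  classify_severity(True, True, False), priority=3),
    TriagedDefect("Homepage typo before CEO demo",
                  classify_severity(False, False, False), priority=1),
    TriagedDefect("CSV export fails, PDF works",
                  classify_severity(False, True, True), priority=3),
]
print(fix_order(backlog))
```

The low-severity homepage typo goes first because the business set it to P1, while the crash in the rarely used report waits behind it despite being Critical. Severity still matters, but only to order defects that share a priority.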
Root cause analysis
For high-severity or recurring defects, root cause analysis (RCA) asks not just "what broke?" but "why did it break?" Common RCA techniques include the 5 Whys, fishbone (Ishikawa) diagrams, and fault tree analysis. The goal is to prevent the same class of bug from recurring. RCA findings feed into process improvements and updated testing checklists.
Defect metrics
- Defect density — defects per unit of code or per feature area. Identifies unstable modules.
- Defect removal efficiency (DRE) — defects found before release as a percentage of all defects found (pre-release plus post-release). Target: > 90%.
- Mean time to detect (MTTD) — average time from defect introduction to detection.
- Defect escape rate — the share of all defects found that reached production, i.e. were found only after release. A key quality signal for leads.
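The metrics above reduce to simple arithmetic. This is a minimal Python sketch under stated assumptions: defect density is measured per thousand lines of code (KLOC), escape rate is taken as post-release defects divided by all defects found, and each defect's introduction time is known (in practice it is usually estimated, for example from the change that introduced it). Under these definitions the escape rate is simply 100% minus DRE; some teams instead track it per release or per severity band. All figures in the usage lines are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def defect_density(defects: int, kloc: float) -> float:
    """Defects per thousand lines of code (one common 'unit of code')."""
    return defects / kloc


def _total(pre_release: int, post_release: int) -> int:
    total = pre_release + post_release
    if total == 0:
        raise ValueError("no defects recorded")
    return total


def defect_removal_efficiency(pre_release: int, post_release: int) -> float:
    """Percentage of all known defects that were found before release."""
    return 100.0 * pre_release / _total(pre_release, post_release)


def escape_rate(pre_release: int, post_release: int) -> float:
    """Percentage of all known defects that were found in production."""
    return 100.0 * post_release / _total(pre_release, post_release)


def mean_time_to_detect(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (detected - introduced) over (introduced, detected) pairs."""
    total = sum((detected - introduced for introduced, detected in pairs),
                timedelta())
    return total / len(pairs)


# Hypothetical release: 180 defects found in testing, 20 reported from production.
print(f"DRE: {defect_removal_efficiency(180, 20):.1f}%")
print(f"Escape rate: {escape_rate(180, 20):.1f}%")
print(f"Density: {defect_density(45, 12.5):.1f} defects/KLOC")
print(mean_time_to_detect([
    (datetime(2024, 3, 1), datetime(2024, 3, 5)),
    (datetime(2024, 3, 10), datetime(2024, 3, 12)),
]))
```

In this made-up release the DRE sits exactly on the 90% target, which also means one defect in ten escaped to production.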
ISTQB mapping
| Ref | Topic | Level |
|---|---|---|
| CTFL Ch. 5 | Defect management — defect reports, lifecycle, workflow | Foundation |
| CTAL-TA Ch. 6 | When defects can be detected, defect report fields, classification | Advanced |
| CTAL-TM Ch. 4 | Defect management process, metrics, root cause analysis | Lead |
Practice this technique: Try Test Lead Practice 08 — Defect triage, Test Lead Practice 09 — Coaching scenario.
4 Now You Try
Three graded exercises: spot, fix, then build. Write your answer, then compare it to the model answer.
A defect on a NZ supermarket online-shopping site: “On the checkout page, the ‘Pay now’ button is mislabelled ‘Pat now’. The Pak’nSave board is being shown the live site in a press demo tomorrow morning.” Assign a severity and a priority to this bug, and explain why they differ.
Model answer
Severity: Low. It is a cosmetic typo — nothing is broken, no data is lost, payment still works. Priority: P1 (Urgent). A board press demo tomorrow makes a visible typo on the checkout page embarrassing and time-critical, so it must be fixed today. Why they differ: severity measures technical impact (how badly is the system broken?), priority measures business urgency (how soon must we fix it relative to everything else?). They are independent dimensions. This is the textbook low-severity / high-priority case: trivial bug, but a business deadline drives the urgency. Testers own severity; product/business owns priority.
The bug report below is the kind a developer cannot act on. Rewrite it into a proper report with a one-sentence title, numbered steps to reproduce, expected result, actual result, environment, and what evidence you would attach. Invent reasonable specifics for a NZ context.
“Login is broken. Doesn’t work for me. Pls fix.”
Model answer
A complete report (specifics invented but plausible):
Title: Customer portal login fails with HTTP 500 when the password contains an ampersand (&).
Steps to reproduce:
1. Go to https://portal.example.co.nz/login in Chrome.
2. Enter the email of a registered test account whose password contains an ampersand (e.g. test.user@example.co.nz).
3. Enter that password, e.g. "Pass&word1".
4. Click "Log in".
Expected result: User is authenticated and lands on the account dashboard.
Actual result: Page returns an HTTP 500 error and a generic "Something went wrong" message, and the user is not logged in. Reproduces every time with an ampersand in the password; logins without special characters succeed.
Environment: Chrome 124 on Windows 11; UAT environment, build 2.14.0.
Evidence to attach: screenshot of the 500 page, the server log snippet showing the unescaped-character exception, and a screen recording of the reproduction.
Why this is better than the original: the original had no steps, no environment, no expected vs actual, and no evidence, so the developer would have to chase the tester for everything. A good report lets the developer reproduce the bug without asking a single question.
A high-severity defect at a NZ payroll provider: holiday-pay calculations are wrong for staff who changed hours mid-year, under-paying them. Walk the defect through the full lifecycle (New → Closed), naming who owns each state and what happens, then run a 5 Whys root cause analysis and name one process improvement that would stop this class of bug recurring.
Model answer
Lifecycle walkthrough:
- New — Tester: logs the defect with steps, expected vs actual, severity High, evidence (a worked example of an under-paid employee).
- Assigned — Lead/Manager: triages, confirms severity and priority (compliance + staff under-payment makes it P1), assigns to a developer.
- Open / In Progress — Developer: investigates and fixes the mid-year hours-change calculation.
- Fixed — Developer: code change made, ready for retest; notes which build contains the fix.
- Retest — Tester: confirms the original case now pays correctly, and runs a regression on the surrounding payroll logic.
- Closed — Tester: verified fixed in all affected environments; defect closed with a note linking the fix.
5 Whys (example chain):
1. Why was holiday pay wrong? Because the calculation used the latest hours, not the average across the period.
2. Why did it use the latest hours? Because the requirement for mid-year hours changes was never specified.
3. Why was it never specified? Because the Holidays Act averaging rule was not captured in the acceptance criteria.
4. Why was it not captured? Because no one with payroll-compliance knowledge reviewed the story.
5. Why was there no compliance review? Because the team has no checklist step for legislative requirements.
Root cause: a missing compliance-review step in the requirements process, not a coding slip.
Process improvement: add a mandatory legislative/compliance review (and a Holidays Act test checklist) to the definition of done for any payroll-calculation story, and add boundary test cases for mid-year changes to the regression suite.
The point of RCA is to prevent the class of bug, not just patch the one instance.
Self-Check
Q1: Why is finding a bug not the same as managing a defect?
A defect is only managed once it is captured in a tracker with enough detail to reproduce it, classified by severity and priority, and moved through a defined lifecycle to a verified, closed state. A bug that lives in a chat message is unowned and untracked — it never reaches triage and tends to escape to production.
Q2: What is the single test of a good bug report?
A developer can reproduce the issue without asking you anything. That means a precise title, numbered steps from a known starting state, expected and actual results (with error messages verbatim), environment details, and evidence such as a screenshot, video, or log snippet.
Q3: How do severity and priority differ, and who owns each?
Severity is the technical impact — how badly the system is broken (crash, data loss vs cosmetic). Priority is the business urgency — how soon it must be fixed relative to everything else. They are independent; a low-severity typo can be high priority before a demo. Testers set severity; product/business sets priority.
Q4: What does the “Deferred” state mean, and how is it different from “Rejected”?
Deferred means it is a real defect but won’t be fixed this release — it is consciously scheduled for later. Rejected means it is not a defect at all: by design, a duplicate, or not reproducible. Deferred keeps the bug alive in the backlog; rejected closes it as a non-issue.
Q5: What is the purpose of root cause analysis, and when is it worth doing?
RCA asks “why” not just “what broke” — using techniques like 5 Whys or a fishbone diagram — so you can prevent the whole class of bug recurring, not just patch one instance. It is worth doing for high-severity or recurring defects, and its findings feed into process improvements and updated test checklists.
Try It — Severity or priority?
Five bugs have been reported on a NZ banking portal. Assign the correct Severity (Critical/High/Medium/Low) and Priority (P1-Urgent/P2-High/P3-Medium/P4-Low) to each.
| Bug | Severity | Priority |
|---|---|---|
| Payment is processed and money debited, but no confirmation email is sent and the transaction doesn’t appear in transaction history | | |
| Bank logo in the footer is pixelated — CEO demo to the board is tomorrow | | |
| Bulk payment CSV export fails — workaround is to export individual payments one by one | | |
| Session expires after 15 minutes with no warning — RBNZ security guidelines require a 20-minute timeout with a warning | | |
| "Help" tooltip on the password field has a typo on a rarely visited screen | | |
Answers
| Bug | Severity | Priority | Why |
|---|---|---|---|
| Payment debited, no record | Critical | P1-Urgent | Data loss and a financial-integrity breach. Severity and priority are both at maximum: fix immediately. |
| Pixelated logo, CEO demo tomorrow | Low | P1-Urgent | Classic low-severity / high-priority. A cosmetic bug, but a business deadline makes it urgent today. |
| CSV export fails, workaround exists | Medium | P3-Medium | Function broken but not blocked. Workaround reduces urgency — plan for next sprint. |
| Session timeout non-compliant | High | P1-Urgent | Regulatory compliance failure (RBNZ). Severity is High, but compliance risk makes it P1. |
| Tooltip typo, rarely visited | Low | P4-Low | Cosmetic, low-traffic, no workaround needed. Backlog item. |
The key insight: the logo bug (Low severity, P1) and the compliance bug (High severity, raised to P1 by regulatory risk) both show that severity and priority are independent. Business context, not just technical impact, drives priority. Testers own severity; product/business owns priority.