Metrics & Dashboards
What gets measured gets managed. But what gets gamed gets useless. Learn how to build metrics that drive real behaviour change.
1 The Hook — Why This Matters
A large NZ bank tracked "automation coverage percentage" as a KPI for every team. By quarter three, teams had achieved 90% coverage. They had also written hundreds of tests with no assertions, duplicated test logic to inflate line counts, and disabled flaky tests rather than fixing them. The metric looked green. The quality was red. When a production incident investigation revealed that a covered path had no real validation, the KPI was scrapped and the test lead was reassigned.
Bad metrics are worse than no metrics. They incentivise the wrong behaviour and give leadership false confidence. Good metrics are tied to outcomes, resistant to gaming, and paired with accountability.
2 The Rule — The One-Sentence Version
Every metric must answer: What is the trend? What is the threshold? Who acts on it? What action do they take?
If you can't answer all four questions, you don't have a metric. You have a number. Numbers don't change behaviour. Metrics with owners and playbooks do.
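One way to make the four questions enforceable is to store each metric as a record and refuse to publish any metric with a blank answer. The sketch below is illustrative only: `MetricDefinition`, its field names, and the example values are assumptions, not part of any particular dashboard tool.

```python
from dataclasses import dataclass


@dataclass
class MetricDefinition:
    """One dashboard metric plus the four answers that make it actionable."""
    name: str
    trend: str      # What is the trend?
    threshold: str  # What is the threshold?
    owner: str      # Who acts on it? (a named person, not a team alias)
    action: str     # What action do they take?

    def missing_answers(self) -> list[str]:
        """Return which of the four questions are still unanswered."""
        required = ("trend", "threshold", "owner", "action")
        return [field for field in required if not getattr(self, field).strip()]

    def is_actionable(self) -> bool:
        """A metric is actionable only when all four questions are answered."""
        return not self.missing_answers()


# Hypothetical examples: one complete metric, one bare number.
flake_rate_metric = MetricDefinition(
    name="flake rate",
    trend="rising over the last three sprints",
    threshold="more than 2% of test runs",
    owner="named QA lead",
    action="quarantine the flakiest tests and schedule fixes this sprint",
)
coverage_metric = MetricDefinition(
    name="coverage %", trend="rising", threshold="", owner="", action=""
)
```

Here `flake_rate_metric.is_actionable()` is true, while `coverage_metric.missing_answers()` lists the threshold, owner, and action as gaps: it is a number, not yet a metric.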
3 The Analogy — Think Of It Like...
A speedometer with no speed limit signs and no brakes.
You know you're going 120 km/h, but you don't know if that's too fast, who should slow down, or how. A dashboard without thresholds, owners, and actions is just a speedometer in a car with no brakes. Useful for curiosity. Useless for safety.
4 Watch Me Do It — Step by Step
Here is how to design metrics that drive improvement instead of gaming.
- Distinguish leading from lagging indicators
| Type | Definition | Examples |
| --- | --- | --- |
| Leading | Predict future performance | Test coverage %, build time, flake rate, PR cycle time |
| Lagging | Measure past outcomes | Production defects, escape rate, customer-reported bugs |

Optimise leading indicators to influence lagging outcomes. If flake rate rises (leading), expect escape rate to follow (lagging) unless corrected.
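As a rough sketch of watching a leading indicator, the code below computes a weekly flake rate and checks whether it is rising. The definition of a flaky run (failed, then passed on retry with no code change), the three-week window, and the sample counts are all assumptions chosen for illustration.

```python
def flake_rate(flaky_runs: int, total_runs: int) -> float:
    """Share of test runs that were flaky (assumed: failed, then passed on retry)."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return flaky_runs / total_runs


def is_rising(series: list[float], window: int = 3) -> bool:
    """True when the mean of the latest `window` points exceeds the mean of the `window` before it."""
    if len(series) < 2 * window:
        return False  # not enough history to call a trend
    recent = series[-window:]
    previous = series[-2 * window:-window]
    return sum(recent) / window > sum(previous) / window


# Hypothetical weekly counts: (flaky runs, total runs).
weekly_counts = [(4, 400), (5, 400), (4, 400), (8, 400), (10, 400), (12, 400)]
weekly_rates = [flake_rate(flaky, total) for flaky, total in weekly_counts]
rising = is_rising(weekly_rates)
```

Comparing two windows rather than two single points keeps one noisy week from triggering the alarm. When `rising` is true, that is the cue to act before the lagging indicators move.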
- Track DORA metrics (2024-2026)
| Metric | Elite Benchmark |
| --- | --- |
| Deployment Frequency | Multiple times/day |
| Lead Time for Changes | <1 hour |
| Failed Deployment Recovery Time | <1 hour |
| Change Failure Rate | 0-5% |
| Rework Rate | Minimal unplanned bug-fix deploys |

- Prevent metric gaming
| Anti-Pattern | Fix |
| --- | --- |
| Coverage gaming (no assertions) | Require meaningful assertions; measure critical path coverage |
| Flaky test suppression | Track "disabled test count"; require justification |
| Deploy frequency inflation | Correlate with story points or value delivered |
| Escape rate hiding | Tie defect logging to support tickets; independent audit |
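Three of the DORA metrics from step 2 reduce to simple arithmetic over a deployment log. The sketch below assumes a hypothetical `Deployment` record with a commit time, a production deploy time, and a failure flag; real pipelines expose this data in different shapes, and the definitions here are simplified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # assumed: needed a rollback, hotfix, or similar


def deployment_frequency(deploys: list[Deployment], days: int) -> float:
    """Average production deployments per day over a period of `days` days."""
    return len(deploys) / days


def median_lead_time(deploys: list[Deployment]) -> timedelta:
    """Median time from commit to production."""
    return median(d.deployed_at - d.committed_at for d in deploys)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments flagged as failed."""
    return sum(d.failed for d in deploys) / len(deploys)


# Hypothetical two-day log with four deployments, one of which failed.
start = datetime(2025, 1, 6, 9, 0)
log = [
    Deployment(start, start + timedelta(minutes=30), False),
    Deployment(start + timedelta(hours=2), start + timedelta(hours=2, minutes=45), False),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, minutes=20), True),
    Deployment(start + timedelta(days=1, hours=3), start + timedelta(days=1, hours=3, minutes=50), False),
]
frequency = deployment_frequency(log, days=2)
lead_time = median_lead_time(log)
failure_rate = change_failure_rate(log)
```

The median is used for lead time so that one unusually slow change does not dominate the figure. Because every value is derived from the same log, a team cannot inflate deployment frequency without the change failure rate being visible alongside it.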
5 When to Use It / When NOT to Use It
✅ Build dashboards when...
- Teams need shared visibility into quality
- Leadership asks for automation status
- You need to identify trends before they become crises
- Multiple teams contribute to the same pipeline
❌ Skip dashboards when...
- The team has fewer than three people (conversation is faster)
- No one has time to act on the data
- Metrics would be used punitively rather than supportively
6 Common Mistakes — Don't Do This
🚫 Incentivising metric achievement over quality
I used to think: Setting ambitious targets motivates teams.
Actually: People optimise for what you measure. If you reward coverage %, you'll get worthless tests. If you reward speed, you'll get skipped validation. Pair every metric with a quality guardrail and an independent audit mechanism.
🚫 Dashboards without owners
I used to think: Publishing metrics is enough.
Actually: A metric with no owner is a decoration. Every metric needs a named person who reviews it, explains anomalies, and triggers action when thresholds breach. Without ownership, metrics become wallpaper.
🚫 Too many metrics
I used to think: More data means more insight.
Actually: Cognitive overload means less action. Cap the dashboard at 5-7 metrics, for example three leading, three lagging, and one strategic. Anything more dilutes attention and creates analysis paralysis.
7 Now You Try — Interview Warm-Up
Scenario: Management introduces a bonus tied to "automation coverage percentage." Within two sprints, coverage jumps from 45% to 82%. You inspect the new tests and find that most have no assertions, many are duplicates with different names, and several exercise code paths that don't exist in production.
What do you do?
Your action plan:
- Immediate: Halt the bonus programme before more damage is done. Document the gaming behaviour with specific examples.
- Replace the metric: Switch from "coverage %" to "meaningful coverage of critical paths" with automated assertion quality checks (e.g., SonarQube rules).
- Add guardrails: Require code review for all tests. Flag tests with no assertions or duplicate logic in CI.
- Educate leadership: Explain that metrics drive behaviour. The wrong metric creates the wrong behaviour. Propose a balanced scorecard: coverage + flake rate + escape rate + deployment frequency.
8 Self-Check — Can You Actually Do This?
If you got all three, you're ready to practice.
Q1. What is the difference between a leading and lagging indicator?
Leading indicators predict future performance (e.g., flake rate, coverage growth). Lagging indicators measure past outcomes (e.g., production defects, escape rate). Optimise leading indicators to influence lagging outcomes.
Q2. What are the five DORA metrics?
Deployment Frequency, Lead Time for Changes, Failed Deployment Recovery Time (formerly Time to Restore Service, often reported as MTTR), Change Failure Rate, and Rework Rate. Together these measure both the speed and the stability of software delivery.
Q3. How do you prevent metric gaming?
Pair metrics with quality guardrails, require independent audit, tie rewards to outcomes not activities, and use multiple balanced metrics rather than a single target. Never incentivise a metric without verifying the behaviour it produces.