
Metrics & Dashboards

What gets measured gets managed. But what gets gamed gets useless. Learn how to build metrics that drive real behaviour change.

Test Lead Automation · ISTQB CTAL-TAE v2.0, Chapter 6 · ~12 min read + exercise

1 The Hook — Why This Matters

A large NZ bank tracked "automation coverage percentage" as a KPI for every team. By quarter three, teams had achieved 90% coverage. They had also written hundreds of tests with no assertions, duplicated test logic to inflate line counts, and disabled flaky tests rather than fixing them. The metric looked green. The quality was red. When a production incident investigation revealed that a covered path had no real validation, the KPI was scrapped and the test lead was reassigned.

Bad metrics are worse than no metrics. They incentivise the wrong behaviour and give leadership false confidence. Good metrics are tied to outcomes, resistant to gaming, and paired with accountability.

2 The Rule — The One-Sentence Version

Every metric must answer: What is the trend? What is the threshold? Who acts on it? What action do they take?

If you can't answer all four questions, you don't have a metric. You have a number. Numbers don't change behaviour. Metrics with owners and playbooks do.
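As a rough illustration, the four questions can be treated as required fields on a metric definition. The sketch below is one possible shape, not a prescribed format: the class name, field names, owner, and example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricDefinition:
    """One dashboard metric plus the four answers the rule asks for.

    All field names are hypothetical; adapt them to your own dashboard config.
    """
    name: str
    trend: Optional[str] = None       # how the value is tracked over time
    threshold: Optional[str] = None   # the level that demands a response
    owner: Optional[str] = None       # a named person, not a team alias
    action: Optional[str] = None      # what the owner does on a breach


def missing_answers(metric: MetricDefinition) -> list[str]:
    """Return the names of the four questions that are still unanswered."""
    required = ("trend", "threshold", "owner", "action")
    return [f for f in required if not (getattr(metric, f) or "").strip()]


def is_metric(metric: MetricDefinition) -> bool:
    """A definition counts as a metric only when all four answers exist."""
    return not missing_answers(metric)


# Illustrative values only.
flake_rate = MetricDefinition(
    name="flake rate",
    trend="weekly, compared with the previous 4 weeks",
    threshold="above 2% of test runs",
    owner="Priya (test lead)",
    action="Quarantine the top offenders and schedule fixes this sprint",
)
coverage = MetricDefinition(name="coverage %", trend="per release")

print(is_metric(flake_rate))      # True
print(missing_answers(coverage))  # ['threshold', 'owner', 'action']
```

Here "coverage %" is still just a number: it has a trend but no threshold, owner, or action.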

3 The Analogy — Think Of It Like...


A speedometer with no speed limit signs and no brakes.

You know you're going 120 km/h, but you don't know if that's too fast, who should slow down, or how. A dashboard without thresholds, owners, and actions is just a speedometer in a car with no brakes. Useful for curiosity. Useless for safety.

4 Watch Me Do It — Step by Step

Here is how to design metrics that drive improvement instead of gaming.

  1. Distinguish leading from lagging indicators
    Type    | Definition                 | Examples
    Leading | Predict future performance | Test coverage %, build time, flake rate, PR cycle time
    Lagging | Measure past outcomes      | Production defects, escape rate, customer-reported bugs

    Optimise leading indicators to influence lagging outcomes. If flake rate rises (leading), expect escape rate to follow (lagging) unless corrected.

  2. Track DORA metrics (2024-2026)
    Metric                          | Elite Benchmark
    Deployment Frequency            | Multiple times/day
    Lead Time for Changes           | <1 hour
    Failed Deployment Recovery Time | <1 hour
    Change Failure Rate             | 0-5%
    Rework Rate                     | Minimal unplanned bug-fix deploys

  3. Prevent metric gaming
    Anti-Pattern                    | Fix
    Coverage gaming (no assertions) | Require meaningful assertions; measure critical path coverage
    Flaky test suppression          | Track "disabled test count"; require justification
    Deploy frequency inflation      | Correlate with story points or value delivered
    Escape rate hiding              | Tie defect logging to support tickets; independent audit
Pro tip: The "So What?" test for every dashboard widget: 1) What is the trend? 2) What is the threshold? 3) Who acts on this? 4) What action do they take? If any answer is missing, remove the widget.
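To make step 2 concrete, here is a minimal sketch of how four of the DORA numbers might be derived from deployment records. The Deployment record and its fields are assumptions for illustration; real pipelines expose this data in different shapes. Rework Rate is left out because it depends on how your team labels unplanned bug-fix deploys.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; map your pipeline's fields onto it."""
    committed_at: datetime                   # first commit of the change
    deployed_at: datetime                    # change reached production
    failed: bool = False                     # deployment caused a failure
    recovered_at: Optional[datetime] = None  # service restored, if it failed


def dora_summary(deploys: list[Deployment], days: int) -> dict:
    """Summarise four DORA-style numbers over a window of `days` days."""
    if not deploys or days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.recovered_at - d.deployed_at
                  for d in failures if d.recovered_at is not None]
    return {
        "deploys_per_day": len(deploys) / days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


# Illustrative data: four deployments in a four-day window, one of which failed.
t0 = datetime(2025, 1, 6, 9, 0)
day, minute = timedelta(days=1), timedelta(minutes=1)
deploys = [
    Deployment(t0, t0 + 30 * minute),
    Deployment(t0 + day, t0 + day + 50 * minute, failed=True,
               recovered_at=t0 + day + 90 * minute),
    Deployment(t0 + 2 * day, t0 + 2 * day + 40 * minute),
    Deployment(t0 + 3 * day, t0 + 3 * day + 20 * minute),
]
summary = dora_summary(deploys, days=4)
print(summary["deploys_per_day"])       # 1.0
print(summary["median_lead_time"])      # 0:35:00
print(summary["change_failure_rate"])   # 0.25
print(summary["median_recovery_time"])  # 0:40:00
```

Medians are used rather than means so that one unusually slow change does not dominate the picture. That is a design choice for this sketch, not a DORA requirement.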

5 When to Use It / When NOT to Use It

✅ Build dashboards when...

  • Teams need shared visibility into quality
  • Leadership asks for automation status
  • You need to identify trends before they become crises
  • Multiple teams contribute to the same pipeline

❌ Skip dashboards when...

  • The team has fewer than 3 people (conversation is faster)
  • No one has time to act on the data
  • Metrics would be used punitively rather than supportively

6 Common Mistakes — Don't Do This

🚫 Incentivising metric achievement over quality

I used to think: Setting ambitious targets motivates teams.
Actually: People optimise for what you measure. If you reward coverage %, you'll get worthless tests. If you reward speed, you'll get skipped validation. Pair every metric with a quality guardrail and an independent audit mechanism.

🚫 Dashboards without owners

I used to think: Publishing metrics is enough.
Actually: A metric with no owner is a decoration. Every metric needs a named person who reviews it, explains anomalies, and triggers action when thresholds breach. Without ownership, metrics become wallpaper.
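One way to keep ownership from being forgotten is to store the owner and the expected action next to the threshold, so a breach always names someone. The sketch below assumes a hypothetical Rule record and made-up readings; it is not tied to any particular dashboard tool.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical alert rule: who owns a metric and when they must act."""
    metric: str
    threshold: float
    owner: str
    action: str
    higher_is_worse: bool = True


def breaches(rules: list[Rule], history: dict[str, list[float]]) -> list[dict]:
    """Return one entry per rule whose latest reading crosses its threshold."""
    found = []
    for rule in rules:
        readings = history.get(rule.metric, [])
        if not readings:
            continue  # no data yet, nothing to judge
        latest = readings[-1]
        if rule.higher_is_worse:
            crossed = latest > rule.threshold
        else:
            crossed = latest < rule.threshold
        if crossed:
            found.append({
                "owner": rule.owner,
                "metric": rule.metric,
                "latest": latest,
                "threshold": rule.threshold,
                "action": rule.action,
            })
    return found


# Illustrative rules and readings (oldest first).
rules = [
    Rule("flake_rate", 0.02, "Priya",
         "Quarantine top offenders and fix within the sprint"),
    Rule("disabled_test_count", 10, "Sam",
         "Review justifications; re-enable or delete"),
]
history = {
    "flake_rate": [0.012, 0.018, 0.035],
    "disabled_test_count": [4, 6, 7],
}

for b in breaches(rules, history):
    print(f"{b['owner']}: {b['metric']} = {b['latest']} "
          f"(threshold {b['threshold']}). {b['action']}")
```

With these readings only the flake rate breaches, so only Priya is asked to act. The disabled test count stays below its threshold and produces no entry.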

🚫 Too many metrics

I used to think: More data means more insight.
Actually: Cognitive overload means less action. Focus on 5-7 metrics at most, for example three leading, three lagging, and one strategic. Anything more dilutes attention and creates analysis paralysis.

7 Now You Try — Interview Warm-Up

🎯 Interactive Exercise

Scenario: Management introduces a bonus tied to "automation coverage percentage." Within two sprints, coverage jumps from 45% to 82%. You inspect the new tests and find that most have no assertions, many are duplicates with different names, and several exercise code paths that don't exist in production.

What do you do?

Your action plan:

  1. Immediate: Halt the bonus program before more damage is done. Document the gaming behaviour with specific examples.
  2. Replace the metric: Switch from "coverage %" to "meaningful coverage of critical paths" with automated assertion quality checks (e.g., SonarQube rules).
  3. Add guardrails: Require code review for all tests. Flag tests with no assertions or duplicate logic in CI.
  4. Educate leadership: Explain that metrics drive behaviour. The wrong metric creates the wrong behaviour. Propose a balanced scorecard: coverage + flake rate + escape rate + deployment frequency.

8 Self-Check — Can You Actually Do This?

If you got all three, you're ready to practice.

Q1. What is the difference between a leading and lagging indicator?

Leading indicators predict future performance (e.g., flake rate, coverage growth). Lagging indicators measure past outcomes (e.g., production defects, escape rate). Optimise leading indicators to influence lagging outcomes.

Q2. What are the five DORA metrics?

Deployment Frequency, Lead Time for Changes, Failed Deployment Recovery Time (formerly Time to Restore Service, often reported as MTTR), Change Failure Rate, and Rework Rate. These measure both speed and stability of software delivery.

Q3. How do you prevent metric gaming?

Pair metrics with quality guardrails, require independent audit, tie rewards to outcomes not activities, and use multiple balanced metrics rather than a single target. Never incentivise a metric without verifying the behaviour it produces.