Junior Automation · Core Skill

Running & Reading Results

Tests that run locally but not in CI are vanity tests. Learn how to execute suites, read failures like a detective, and turn raw output into actionable fixes.

Junior Automation · ISTQB CTAL-TAE v2.0, Chapters 5 & 6 · ~10 min read + exercise

1 The Hook — Why This Matters

A Christchurch e-commerce team had 200 automated tests. Every morning, a junior engineer ran them manually from their IDE and pasted screenshots of the green bar into Slack. When they finally integrated the suite into CI, forty-seven tests failed — not because the code was broken, but because the tests had never been run headless, in a clean environment, from the command line.

Running tests from the CLI is the gateway to automation actually being automated. If you only know the green Play button in your IDE, you're not an automation engineer yet.

2 The Rule — The One-Sentence Version

A test you cannot run from the command line with a single command is a test that does not exist in CI.

CI servers don't have IDEs. They have terminals, environment variables, and exit codes. Your tests must be executable without human intervention, and their results must be interpretable by both humans and machines.

3 The Analogy — Think Of It Like...

Analogy

A recipe that only works when the head chef cooks it.

The restaurant can't scale if every dish requires the chef's personal touch. A good recipe works in any kitchen with standard equipment. A good test runs in any environment with standard commands. Headless execution, clean output, and documented CLI flags are your recipe's standard equipment.

4 Watch Me Do It — Step by Step

Here is how to run, filter, and report tests with four common test runners: pytest, Jest, Playwright, and Maven.

  1. Run the full suite
    # pytest
    pytest tests/ -v --tb=short
    
    # Jest
    jest --verbose
    
    # Playwright
    npx playwright test
  2. Filter to specific tests
    # pytest: by keyword
    pytest -k "login or logout"
    
    # Playwright: by tag
    npx playwright test --grep "@smoke"
    
    # Maven: by class and method
    mvn test -Dtest=LoginTest#shouldDisplayError
  3. Generate an HTML report
    # pytest-html
    pytest tests/ --html=report.html --self-contained-html
    
    # Playwright built-in
    npx playwright test --reporter=html,line,junit
  4. Read a failure
    Start with the assertion message (expected vs. actual), then the stack trace (which line in your code failed), then screenshots or traces.
Exit codes decoded

Exit Code   Meaning
0           All tests passed
1           Tests failed (pytest reports collection errors as 2, not 1)
2           pytest: Execution interrupted (Ctrl+C or an error during collection)
3           pytest: Internal error
4           pytest: Usage error (bad CLI args)
5           pytest: No tests were collected
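If you want a CI log to say more than "exit code 2", a small lookup helps. This is a minimal sketch, not part of pytest itself: the meanings follow pytest's documented exit codes, while the names `PYTEST_EXIT_CODES`, `describe_pytest_exit`, and `build_should_fail` are made up for illustration.

```python
# Meanings follow pytest's documented exit codes.
# The names in this sketch are hypothetical helpers, not a pytest API.
PYTEST_EXIT_CODES = {
    0: "all tests passed",
    1: "some tests failed",
    2: "execution interrupted (Ctrl+C or collection error)",
    3: "internal error",
    4: "usage error (bad CLI args)",
    5: "no tests collected",
}


def describe_pytest_exit(code: int) -> str:
    """Return a human-readable meaning for a pytest exit code."""
    return PYTEST_EXIT_CODES.get(code, f"unknown exit code {code}")


def build_should_fail(code: int) -> bool:
    """CI treats any non-zero exit code as a failed step."""
    return code != 0


if __name__ == "__main__":
    for code in (0, 1, 5):
        print(code, describe_pytest_exit(code), build_should_fail(code))
```

Note that code 5 fails the build too: a filter that matches nothing is reported as a failure, which is usually what you want.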
Pro tip: Tag your tests with markers like @smoke, @regression, and @flaky. This lets CI run smoke tests on every PR (fast feedback) while scheduling the full regression suite nightly. Without tags, every PR takes 40 minutes and developers start skipping CI.
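One way to wire tags into CI is to pick the command from the pipeline trigger. The sketch below assumes pytest, where tags are markers applied with `@pytest.mark.smoke` and selected with the `-m` option. The trigger names (`pull_request`, `nightly`) and the function `pytest_command` are hypothetical; your CI system will have its own names.

```python
def pytest_command(trigger: str) -> list[str]:
    """Build the pytest command for a CI trigger.

    The trigger names and the 'smoke' marker are assumptions for
    illustration; register your real markers in pytest.ini.
    """
    if trigger == "pull_request":
        # Fast feedback: only tests marked @pytest.mark.smoke.
        return ["pytest", "tests/", "-m", "smoke", "--tb=short"]
    if trigger == "nightly":
        # Full regression: no marker filter, run everything.
        return ["pytest", "tests/", "--tb=short"]
    raise ValueError(f"unknown trigger: {trigger}")


if __name__ == "__main__":
    print(" ".join(pytest_command("pull_request")))
    print(" ".join(pytest_command("nightly")))
```

Raising on an unknown trigger is deliberate: a pipeline that silently runs nothing is worse than one that fails loudly.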

5 When to Use It / When NOT to Use It

✅ Generate reports when...

  • Running in CI for stakeholder visibility
  • Debugging flaky tests (trace + screenshots)
  • Auditing coverage for compliance
  • Sharing results with non-technical team members

❌ Skip reports when...

  • Running a single test locally during development
  • You need immediate terminal feedback
  • CI already parses JUnit XML and shows results in its own UI

Before you apply this technique, ask:

  • Can you run tests locally and reproduce the failure?
  • Are you checking both exit codes and test output?
  • Does your CI report show where tests failed, not just how many?
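To check the last question yourself, you can pull the "where" out of a JUnit XML report. This is a rough sketch using only the standard library. It assumes the common JUnit layout (`testcase` elements with a `failure` or `error` child carrying a `message` attribute); the sample report and the name `failed_tests` are invented for illustration, and real reports may differ in detail.

```python
import xml.etree.ElementTree as ET

# Invented sample in the common JUnit XML layout.
SAMPLE = """<testsuite name="pytest" tests="2" failures="1">
  <testcase classname="tests.test_login" name="test_valid_login" time="0.41"/>
  <testcase classname="tests.test_login" name="test_shows_error" time="1.02">
    <failure message="assert 'Welcome' == 'Invalid password'">traceback text</failure>
  </testcase>
</testsuite>"""


def failed_tests(junit_xml: str) -> list[tuple[str, str]]:
    """Return (test id, message) for every failed or errored test case."""
    root = ET.fromstring(junit_xml)
    failures = []
    for case in root.iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            test_id = f"{case.get('classname')}::{case.get('name')}"
            failures.append((test_id, problem.get("message", "")))
    return failures


if __name__ == "__main__":
    for test_id, message in failed_tests(SAMPLE):
        print(f"{test_id}: {message}")
```

The output names the failing test and its assertion message, which is the difference between "1 failed" and something you can act on.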

6 Common Mistakes — Don't Do This

🚫 Ignoring exit codes in CI

I used to think: If the console output says "PASSED," the CI step succeeded.
Actually: CI systems decide pass/fail based on exit codes, not console text. If your test runner returns 0 even on failure (some misconfigured wrappers do this), CI will mark the build green while your tests are red. Always verify: echo $? after a local run.
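You can see the difference between console text and exit codes with a tiny experiment. The sketch below stands in for a CI step: it runs a command and decides pass/fail from the return code alone. The "runner" here is a hypothetical one-line Python script that prints PASSED but exits with 1; `step_passed` is an illustrative name, not a real CI API.

```python
import subprocess
import sys


def step_passed(command: list[str]) -> bool:
    """Decide pass/fail the way a CI step does: by exit code only."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


# Hypothetical misbehaving runner: prints PASSED, exits with code 1.
MISLEADING = [sys.executable, "-c", "import sys; print('PASSED'); sys.exit(1)"]

# Hypothetical wrapper with the opposite bug: prints FAILED, exits 0.
MASKED = [sys.executable, "-c", "print('FAILED')"]

if __name__ == "__main__":
    print("misleading step passed?", step_passed(MISLEADING))
    print("masked step passed?", step_passed(MASKED))
```

The second case is the dangerous one: the text says FAILED, but a CI step would go green.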

🚫 Running the full suite for every change

I used to think: More tests running means more confidence.
Actually: A 40-minute PR pipeline trains developers to bypass it. Use filtering: run smoke tests on PR, full regression nightly. Fast feedback beats exhaustive feedback if developers ignore the slow one.

🚫 Not archiving artifacts on failure

I used to think: The failure message in the console is enough to debug.
Actually: "Element not found" tells you nothing about why. Screenshots, trace files, and HAR logs show the actual page state at failure. Configure your CI to upload these on failure only — it saves storage and gives you forensics.
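Most CI systems have a built-in "upload artifacts on failure" setting, and that is usually the right tool. To show the logic behind it, here is a minimal sketch that copies debugging files only when the run failed. The directory layout, the file patterns, and the name `archive_on_failure` are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Assumed artifact types: screenshots, trace archives, HAR logs.
ARTIFACT_PATTERNS = ("*.png", "*.zip", "*.har")


def archive_on_failure(exit_code: int, results_dir: Path, archive_dir: Path) -> list[Path]:
    """Copy debugging artifacts to archive_dir only when the run failed."""
    if exit_code == 0:
        return []  # Green run: nothing to investigate, save the storage.
    copied = []
    for pattern in ARTIFACT_PATTERNS:
        for artifact in sorted(results_dir.rglob(pattern)):
            # Keep the per-test subfolder so same-named files don't collide.
            target = archive_dir / artifact.relative_to(results_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(artifact, target)
            copied.append(target)
    return copied
```

A typical call would pass the test runner's exit code straight in, for example `archive_on_failure(result.returncode, Path("test-results"), Path("ci-artifacts"))`, where both paths are placeholders.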

When this technique fails

If a test passes locally but fails in CI, the console message alone rarely shows the real cause. CI often runs a different browser version, on different hardware, with different timings. Exit code 1 tells you that something failed; you need the HTML report and screenshots to see what actually broke.

7 Now You Try — Interview Warm-Up

🎯 Interactive Exercise

Scenario: Your CI pipeline runs pytest tests/ --tb=short with automatic retries enabled (for example, --reruns 2 from the pytest-rerunfailures plugin). The build is marked as passed, but when you open the HTML report, you see three tests failed and then passed on retry. No one on the team has investigated the original failures.

What is the hidden risk, and what should your team do?

The hidden risk:

Flaky tests that pass on retry mask real instability. The three failures could be timing issues or race conditions that will eventually fail consistently, or that real users will hit in production. By ignoring them, the team has normalized flakiness. The fix: treat every retry as a bug. Investigate the root cause (selector stability, test data isolation, environment drift), fix it, and consider a flag such as --fail-on-flaky-tests (Playwright) so that retries cannot mask failures.
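The idea behind a "fail on flaky" gate can be sketched in a few lines. This is an illustration of the logic only, not how any particular runner implements it. It assumes you already have each test's attempt outcomes in order (first attempt first); the data shape and the names `classify`, `flaky_tests`, and `gate_exit_code` are hypothetical.

```python
def classify(attempts: list[str]) -> str:
    """Classify one test from its ordered attempt outcomes.

    Each entry is "passed" or "failed", first attempt first.
    """
    if not attempts:
        return "not run"
    if attempts[-1] == "failed":
        return "failed"
    if "failed" in attempts:
        return "flaky"  # Failed at least once, then passed on retry.
    return "passed"


def flaky_tests(results: dict[str, list[str]]) -> list[str]:
    """Return the names of tests that only passed after a retry."""
    return sorted(name for name, attempts in results.items()
                  if classify(attempts) == "flaky")


def gate_exit_code(results: dict[str, list[str]]) -> int:
    """Return 1 if any test failed or was flaky, else 0."""
    statuses = {classify(attempts) for attempts in results.values()}
    return 1 if statuses & {"failed", "flaky"} else 0
```

With this gate, a run where `test_checkout` goes failed then passed returns 1, so the retry can no longer turn the build green on its own.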

8 Self-Check — Can You Actually Do This?

If you can answer all three, you're ready to practice.

Q1. Why is pytest -k "login" useful?

It runs only the tests whose names (including their class and module names) contain "login", matched case-insensitively. This gives fast feedback when you're working on authentication logic without running the entire suite.

Q2. What does exit code 1 mean from a test runner?

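At least one test failed. Some runners also return 1 for errors such as a test file that fails to load; pytest reports collection errors as exit code 2 instead. CI systems treat any non-zero exit code as a failure and will fail the build step.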

Q3. What three things should you check when a UI test fails?

1) The assertion message (expected vs. actual), 2) The stack trace (which line in your code failed), 3) Screenshots or trace files (what did the page actually look like at the moment of failure).

Interview prep

Interview questions explore how you debug, interpret results, and handle CI-specific issues.

Q1: Your test passes locally with `npm test` but fails in CI with exit code 1. What's the first thing you check?
Reproduce the failure locally using the exact CI command: `npm run test:ci`. This runs the tests with the same configuration as CI, though not necessarily the same environment. If it fails locally too, you have a reproducible failure in the code or the test that you can debug directly. If it passes, the cause is CI-specific (different browser version, network, timing).
Q2: Your Playwright test report shows 'Test flaky: failed then passed on retry.' You're running CI with automatic retries. Should you merge?
No. Flakiness is a real bug that retries are hiding. Investigate why it's flaky: is a selector timing out? Is a network call slow? Is a mock inconsistent? Fix the root cause, don't rely on retries. Retries mask instability.
Q3: A test fails in the HTML report, but you need to know exactly what assertion broke. Where do you look?
Open the HTML report, find the failed test, and read the error message first. Playwright shows the line number of the failing assertion and usually the expected and actual values. Then look at the failure screenshot to see the page state. If that's not enough, open the test's trace file for a step-by-step replay.
Q4: Your Trade Me checkout test reports exit code 0 but the team says 'the checkout feature is broken.' How?
Exit code 0 means every test that ran passed, not that the feature works. If your test coverage is incomplete or your assertions are weak (not actually checking the right behaviour), a broken feature can hide behind a green build. Review what your test actually asserts: is it checking that money moved, that an order was created, or just that a button was clicked?
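Here is a small illustration of how a weak assertion lets a broken checkout through. Everything in it is hypothetical: `CheckoutResult`, `broken_checkout`, and the two check functions are invented for this sketch and do not model any real checkout system. Amounts are in integer cents to avoid float comparisons.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckoutResult:
    button_clicked: bool
    order_id: Optional[str]
    amount_charged_cents: int


def broken_checkout(cart_total_cents: int) -> CheckoutResult:
    """Hypothetical buggy checkout: the click registers, nothing else happens."""
    return CheckoutResult(button_clicked=True, order_id=None, amount_charged_cents=0)


def weak_check(result: CheckoutResult) -> bool:
    """Weak assertion: only verifies the UI interaction happened."""
    return result.button_clicked


def strong_check(result: CheckoutResult, cart_total_cents: int) -> bool:
    """Stronger assertion: verifies the business outcome."""
    return (result.order_id is not None
            and result.amount_charged_cents == cart_total_cents)


if __name__ == "__main__":
    result = broken_checkout(4990)
    print("weak check passes:", weak_check(result))
    print("strong check passes:", strong_check(result, 4990))
```

A suite built on `weak_check` exits 0 against this broken checkout; one built on `strong_check` would fail and exit non-zero.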