Running & Reading Results
Tests that run locally but not in CI are vanity tests. Learn how to execute suites, read failures like a detective, and turn raw output into actionable fixes.
1 The Hook — Why This Matters
A Christchurch e-commerce team had 200 automated tests. Every morning, a junior engineer ran them manually from their IDE and pasted screenshots of the green bar into Slack. When they finally integrated the suite into CI, forty-seven tests failed — not because the code was broken, but because the tests had never been run headless, in a clean environment, from the command line.
Running tests from the CLI is what makes automation actually automated. If you only know the green Play button in your IDE, you're not an automation engineer yet.
2 The Rule — The One-Sentence Version
A test you cannot run from the command line with a single command is a test that does not exist in CI.
CI servers don't have IDEs. They have terminals, environment variables, and exit codes. Your tests must be executable without human intervention, and their results must be interpretable by both humans and machines.
3 The Analogy — Think Of It Like...
A recipe that only works when the head chef cooks it.
The restaurant can't scale if every dish requires the chef's personal touch. A good recipe works in any kitchen with standard equipment. A good test runs in any environment with standard commands. Headless execution, clean output, and documented CLI flags are your recipe's standard equipment.
4 Watch Me Do It — Step by Step
Here is how to run, filter, and report on tests with pytest, Jest, Playwright, and Maven.
- Run the full suite

      # pytest
      pytest tests/ -v --tb=short
      # Jest
      jest --verbose
      # Playwright
      npx playwright test

- Filter to specific tests

      # pytest: by keyword
      pytest -k "login or logout"
      # Playwright: by tag
      npx playwright test --grep "@smoke"
      # Maven: by class and method
      mvn test -Dtest=LoginTest#shouldDisplayError

- Generate an HTML report

      # pytest-html
      pytest tests/ --html=report.html --self-contained-html
      # Playwright built-in
      npx playwright test --reporter=html,line,junit

- Read a failure: start with the assertion message (expected vs. actual), then the stack trace (which line in your code failed), then screenshots or traces.
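To make the "read a failure" step concrete, here is a minimal sketch of a pytest-style test whose assertion message names both the expected and the actual value. The function `cart_total` and its numbers are hypothetical and exist only for illustration.

```python
def cart_total(prices, discount=0.0):
    """Hypothetical function under test: sum prices, then apply a fractional discount."""
    return round(sum(prices) * (1 - discount), 2)


def test_cart_total_applies_discount():
    expected = 27.0
    actual = cart_total([10.0, 20.0], discount=0.1)
    # The message is the first thing you read in the failure output,
    # so it states both values explicitly.
    assert actual == expected, f"expected total {expected}, got {actual}"
```

pytest usually shows both sides of a failed comparison on its own; an explicit message mainly helps when the values need context, or when the same assertion style is reused in a runner that prints less.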
| Exit Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | One or more tests failed |
| 2 | pytest: Execution interrupted (Ctrl+C, or errors during collection) |
| 3 | pytest: Internal error |
| 4 | pytest: Usage error (bad CLI args) |
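If a script needs to react to these codes, a small lookup keeps the meaning next to the number. The sketch below is an assumption-laden mirror of pytest's documented exit codes, written as a local enum rather than imported from pytest; `PytestExit` and `describe_exit` are hypothetical names. It also includes code 5 (no tests collected), which pytest documents but the table above does not list.

```python
from enum import IntEnum


class PytestExit(IntEnum):
    """Local mirror of pytest's documented exit codes (not imported from pytest)."""
    OK = 0
    TESTS_FAILED = 1
    INTERRUPTED = 2
    INTERNAL_ERROR = 3
    USAGE_ERROR = 4
    NO_TESTS_COLLECTED = 5


def describe_exit(code: int) -> str:
    """Return a short human-readable label for a pytest exit code."""
    try:
        return PytestExit(code).name.replace("_", " ").lower()
    except ValueError:
        # Anything outside the documented range is reported as-is.
        return f"unknown exit code {code}"
```

Other runners use their own conventions beyond 0 and 1, so treat this mapping as pytest-specific.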
Tag your tests with @smoke, @regression, and @flaky. This lets CI run smoke tests on every PR (fast feedback) while scheduling the full regression suite nightly. Without tags, every PR takes 40 minutes and developers start skipping CI.
5 When to Use It / When NOT to Use It
✅ Generate reports when...
- Running in CI for stakeholder visibility
- Debugging flaky tests (trace + screenshots)
- Auditing coverage for compliance
- Sharing results with non-technical team members
❌ Skip reports when...
- Running a single test locally during development
- You need immediate terminal feedback
- CI already parses JUnit XML
Before you apply this technique, ask:
- Can you run tests locally and reproduce the failure?
- Are you checking both exit codes and test output?
- Does your CI report show where tests failed, not just how many?
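For the last question, one way to see where tests failed is to read the JUnit XML that most runners can emit (pytest via `--junitxml`, Playwright via its `junit` reporter). The following is a rough sketch using only the standard library; `failed_tests` and the `SAMPLE` document are hypothetical, and real reports carry more attributes than shown here.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml: str) -> list[tuple[str, str]]:
    """Return (test id, failure message) for each failed or errored test case."""
    root = ET.fromstring(junit_xml)
    results = []
    # iter() finds <testcase> elements whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
            results.append((test_id, problem.get("message", "")))
    return results


# Hypothetical report with one passing and one failing test.
SAMPLE = """<testsuite name="pytest" tests="2" failures="1">
  <testcase classname="tests.test_login" name="test_valid_login"/>
  <testcase classname="tests.test_login" name="test_shows_error">
    <failure message="expected error banner, got none">traceback here</failure>
  </testcase>
</testsuite>"""
```

Calling `failed_tests(SAMPLE)` yields one entry naming `test_shows_error` and its message, which is the "where" a bare failure count leaves out.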
6 Common Mistakes — Don't Do This
🚫 Ignoring exit codes in CI
I used to think: If the console output says "PASSED," the CI step succeeded.
Actually: CI systems decide pass/fail based on exit codes, not console text. If your test runner returns 0 even on failure (some misconfigured wrappers do this), CI will mark the build green while your tests are red. Always verify: echo $? after a local run.
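A wrapper script is the usual place where an exit code gets lost. As a minimal sketch, assuming a Python wrapper around the test command, the helper below returns the child's exit code unchanged; `run_and_propagate` is a hypothetical name.

```python
import subprocess
import sys


def run_and_propagate(cmd: list[str]) -> int:
    """Run a command and return its exit code unchanged."""
    # No check=True: we want the code itself, not an exception.
    completed = subprocess.run(cmd)
    return completed.returncode


# Stand-in for a failing test runner: a child process that exits with 1.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
code = run_and_propagate(failing)

# A real wrapper script would finish with sys.exit(code) so that CI sees
# the failure; printing "done" and exiting 0 here is the bug to avoid.
```

The design point is that the wrapper never decides pass or fail itself. It only forwards what the runner reported.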
🚫 Running the full suite for every change
I used to think: More tests running means more confidence.
Actually: A 40-minute PR pipeline trains developers to bypass it. Use filtering: run smoke tests on PR, full regression nightly. Fast feedback beats exhaustive feedback if developers ignore the slow one.
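The smoke-on-PR, regression-nightly split can be sketched as a function that picks the command from the pipeline trigger. The trigger names `pull_request` and `nightly` are hypothetical, and the example assumes your suite defines a `smoke` marker; pytest's `-m` option selects tests by marker expression.

```python
def pytest_command(trigger: str) -> list[str]:
    """Pick a pytest invocation based on what started the pipeline."""
    base = ["pytest", "tests/", "--tb=short"]
    if trigger == "pull_request":
        # -m selects tests by marker expression; "smoke" must be a marker
        # that your suite actually defines.
        return base + ["-m", "smoke"]
    if trigger == "nightly":
        # Full regression run: no marker filter.
        return base
    raise ValueError(f"unknown trigger: {trigger!r}")
```

Raising on an unknown trigger is a deliberate choice in this sketch: a typo in the pipeline config should fail loudly instead of silently running nothing.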
🚫 Not archiving artifacts on failure
I used to think: The failure message in the console is enough to debug.
Actually: "Element not found" tells you nothing about why. Screenshots, trace files, and HAR logs show the actual page state at failure. Configure your CI to upload these on failure only — it saves storage and gives you forensics.
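Most CI systems have a built-in upload step for this, so the following is only a sketch of the "on failure only" logic, assuming the runner writes screenshots and traces under one results directory. `archive_on_failure` and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def archive_on_failure(exit_code: int, results_dir: Path, archive_dir: Path) -> list[Path]:
    """Copy result files into archive_dir only when the run failed."""
    if exit_code == 0:
        return []  # green run: nothing to keep
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materialises the listing first and gives a stable order.
    for item in sorted(results_dir.rglob("*")):
        if item.is_file():
            target = archive_dir / item.relative_to(results_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)
            copied.append(target)
    return copied
```

Keeping the directory layout intact means a screenshot can still be matched to the test that produced it.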
When this technique fails
If a test passes locally but fails in CI, the console output alone rarely shows the real cause. CI runs different browser versions, on different hardware, with different timings. Exit code 1 tells you that something failed; you need the HTML report and screenshots to know what actually broke.
7 Now You Try — Interview Warm-Up
Scenario: Your CI pipeline runs pytest tests/ --tb=short with retries enabled through a rerun plugin. The build is marked as passed, but when you open the HTML report, you see three tests failed and then passed on retry. No one on the team has investigated the original failures.
What is the hidden risk, and what should your team do?
The hidden risk:
Flaky tests that pass on retry mask real instability. The three failures could be timing issues that will eventually fail consistently, possibly in production. The team's behaviour has normalized flakiness.
The fix: treat every retry as a bug. Investigate root cause (selector stability, test data isolation, environment drift), fix it, and consider using --fail-on-flaky-tests (Playwright) or similar flags to prevent retry masking.
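Where a runner has no such flag, the same policy can be approximated in a post-processing step. The sketch below assumes you can extract per-test attempt outcomes as a dictionary of lists; that data shape, and the names `find_flaky` and `exit_code_for`, are hypothetical and do not come from any specific runner.

```python
def find_flaky(attempts: dict[str, list[str]]) -> list[str]:
    """Return tests that failed at least once but passed on their final attempt."""
    flaky = []
    for test_name, outcomes in attempts.items():
        if outcomes and outcomes[-1] == "passed" and "failed" in outcomes:
            flaky.append(test_name)
    return sorted(flaky)


def exit_code_for(attempts: dict[str, list[str]]) -> int:
    """Strict policy: a hard failure or any flaky test yields a non-zero exit code."""
    hard_failures = [name for name, outcomes in attempts.items()
                     if outcomes and outcomes[-1] == "failed"]
    return 1 if hard_failures or find_flaky(attempts) else 0


# Hypothetical attempt history for three tests.
history = {
    "test_login": ["passed"],
    "test_checkout": ["failed", "passed"],
    "test_search": ["failed", "failed"],
}
```

With this history, `find_flaky` reports only `test_checkout`, and the strict exit code is non-zero even if `test_search` were fixed, which is what stops a retry from hiding the original failure.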
8 Self-Check — Can You Actually Do This?
If you got all three, you're ready to practice.
Q1. Why is pytest -k "login" useful?
It filters the test suite to run only tests whose names contain "login." This gives fast feedback when you're working on authentication logic without running the entire suite.
Q2. What does exit code 1 mean from a test runner?
At least one test failed. Some runners also use 1 for collection or setup errors, while pytest reports collection errors as exit code 2. CI systems treat any non-zero exit code as a failure and will block the build.
Q3. What three things should you check when a UI test fails?
1) The assertion message (expected vs. actual), 2) The stack trace (which line in your code failed), 3) Screenshots or trace files (what did the page actually look like at the moment of failure).