I Gave Claude Code My Entire QA Job

Claude Code shipped workflows. So I gave it my entire QA job.

Not "help me write a test case." The whole loop: read the requirement, design the coverage, run it on the platform, file the defect, report back with evidence. The kind of thing a QE does in a day, compressed into one skill you trigger with a sentence.

Here's the whole thing. 🧵

The setup

A QE's real job isn't writing test scripts. It's a sequence of decisions:

What changed, and what's the risk?
What's the smallest set of tests that covers that risk?
Which way do I actually run them?
Did it pass — and can I prove it?

Every one of those is judgment. None of them is "type the Selenium." So the question I cared about wasn't "can AI write a test". It was "can AI hold the decisions and drive the platform?"

Turns out: yes, if you give it the workflow instead of the keystrokes.

The trick: don't automate the clicks, automate the decision tree

The old way to "automate QA" was to record clicks. Brittle, and it skips the only part that matters — the thinking.

The new way: write the decision tree down once, hand it to the agent, and let it route every task through it.

The whole skill is built on one decision — which lane runs this test:

   New behavior,    ┌──────────────────────────────────┐
   no code yet,  ─► │ LANE A - Manual + Run with AI    │
   exploratory      │ design cases → AI executes them  │
                    └──────────────────────────────────┘
   Automation       ┌──────────────────────────────────┐
   already in    ─► │ LANE B - Automated / TestCloud   │
   the repo         │ schedule the suite on the grid   │
                    └──────────────────────────────────┘
   Team lives       ┌──────────────────────────────────┐
   in code,      ─► │ LANE C - Playwright              │
   wants CI         │ real .spec.ts → results upload   │
                    └──────────────────────────────────┘

That's it. That's the brain. Everything else — requirement analysis, ISTQB coverage, suite building, reporting — hangs off this one branch.
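
If it helps to see that branch as code, here is a minimal TypeScript sketch of the router. The `TaskSignals` fields and the `routeLane` name are illustrative assumptions, not part of the skill or of any Katalon API. The order in which Lane B is checked before Lane C is also an assumption; in practice the agent weighs these signals from context.

```typescript
// Illustrative only: these signal names are assumptions, not a Katalon API.
type Lane = "A" | "B" | "C";

interface TaskSignals {
  isNewOrUnknownBehavior: boolean;     // nothing automated yet, exploratory
  hasKatalonAutomationInRepo: boolean; // existing suites ready to schedule
  teamWantsCodeAndCI: boolean;         // team lives in code, wants CI gating
}

function routeLane(signals: TaskSignals): Lane {
  // New or unknown behavior goes to Lane A (Manual + Run with AI).
  if (signals.isNewOrUnknownBehavior) return "A";
  // Existing Katalon automation goes to Lane B (Automated / TestCloud).
  if (signals.hasKatalonAutomationInRepo) return "B";
  // Code-first teams go to Lane C (Playwright).
  if (signals.teamWantsCodeAndCI) return "C";
  // Nothing matched: fall back to the default lane.
  return "A";
}

// The post's example: a freshly shipped product filter with no automation yet.
const lane = routeLane({
  isNewOrUnknownBehavior: true,
  hasKatalonAutomationInRepo: false,
  teamWantsCodeAndCI: false,
});
console.log(`Lane ${lane}`); // prints "Lane A"
```

The point of writing it down this plainly is that the routing becomes reviewable: a teammate can disagree with a branch instead of guessing why the agent picked a lane.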

What it actually does, start to finish

I type: "We shipped a new product filter on the storefront. Cover it."

The agent:

Reads the intent. Pulls the requirement (Jira/Azure sync), or analyzes the one-liner. Spits out personas, main flow, alternate flows, negative flows, risk areas.

Designs the coverage — risk-based, not maximal. It picks real ISTQB techniques: equivalence partitions for filter values, boundary analysis for the price slider, a decision table for filter + stock + sort, state transitions for the result list. Then it writes a coverage note — what it's testing, what it's deliberately skipping, and why. The "why" is the part juniors skip and seniors live by.

Routes to a lane. New behavior, no automation yet → Lane A. It drafts human-readable cases, imports them into Katalon True Platform, links them to the requirement for traceability, builds a suite, picks the AUT environment, creates a manual run, and kicks off Run with AI — the platform's agent executes the cases against the live site.

Reports with evidence. Pass/fail/blocked counts. Each failure with the verbatim error, a screenshot, a trace. No "looks broken." If it can't produce the proof, it downgrades the claim. Then it offers to file the defect against the failed result.
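
The "downgrade the claim" rule is easy to state and easy to forget, so here is a small sketch of it. The `Finding` shape and the two claim labels are hypothetical, and the exact threshold (a verbatim error plus at least one artifact) is my assumption about a reasonable reading of the rule, not something the platform enforces.

```typescript
// Hypothetical shape of a failed result; field names are illustrative only.
interface Finding {
  caseName: string;
  error?: string;      // verbatim error text, if captured
  screenshot?: string; // path or URL to the screenshot, if captured
  trace?: string;      // path or URL to the trace, if captured
}

type Claim = "FAILED_WITH_EVIDENCE" | "SUSPECTED_NEEDS_REPRO";

function classify(finding: Finding): Claim {
  // Assumed rule: a failure claim needs the error text plus at least one artifact.
  const hasError = Boolean(finding.error && finding.error.trim());
  const hasArtifact = Boolean(finding.screenshot || finding.trace);
  return hasError && hasArtifact ? "FAILED_WITH_EVIDENCE" : "SUSPECTED_NEEDS_REPRO";
}

console.log(classify({ caseName: "Price filter resets on sort", error: "Timeout waiting for results" }));
// prints "SUSPECTED_NEEDS_REPRO" because no screenshot or trace is attached
```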

One sentence in. A traceable, executed, evidenced run out.
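
To make the coverage step less abstract, here is what boundary analysis for the price slider might look like as code. This is a sketch under assumptions: the `boundaryValues` helper is hypothetical, and the 0 to 500 range is a made-up example, since the post doesn't say what the real slider allows.

```typescript
// Boundary value analysis for a numeric range such as a price slider.
// Returns the values just outside, on, and just inside each edge.
function boundaryValues(min: number, max: number, step: number = 1): number[] {
  if (min >= max) throw new RangeError("min must be below max");
  const candidates = [min - step, min, min + step, max - step, max, max + step];
  // De-duplicate (very narrow ranges overlap) and return in ascending order.
  return candidates
    .filter((value, index) => candidates.indexOf(value) === index)
    .sort((a, b) => a - b);
}

// Made-up slider range of 0 to 500 in whole units.
console.log(boundaryValues(0, 500)); // prints [ -1, 0, 1, 499, 500, 501 ]
```

Six values instead of five hundred. The two out-of-range values (-1 and 501) are the negative tests; the other four check that the edges themselves behave.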

The part I'm proud of: Playwright as a first-class lane

Most "AI QA" demos stop at the manual lane. But half the teams I talk to live in code — they want Playwright, cross-browser, CI gating merges.

Katalon has a real integration for this (@katalon/playwright-reporter): you run your actual @playwright/test specs anywhere, and the results — status, duration, screenshots, videos, traces, browser metadata — upload straight into True Platform's Test Runs.

So I made it Lane C. Same skill. The agent will:

npm install --save-dev @katalon/playwright-reporter
# wire the reporter into playwright.config.ts
KATALON_API_KEY=… KATALON_PROJECT_ID=… npx playwright test   # runs + uploads

And the best bit is the promotion pattern:

Explore the live site with a browser → cover new behavior fast in Lane A → once the happy paths are stable, port them to Lane C Playwright specs so CI gates every future merge.

One product. Three lanes. One place to read the results. The agent knows when to use which — and tells you when it's recommending a promotion vs a one-off.

What surprised me

Honesty beats capability. The most useful thing I wrote into the skill wasn't a feature — it was the boundary table. "You cannot create requirements in Katalon — they sync from Jira. Here's the workaround." "You cannot guarantee Run-with-AI finishes — report the blocked state and the exact fixture that's missing." An agent that knows what it can't do is worth ten that bluff.

The decision tree is the product. OpenAI called it harness engineering; in QA it's the same move. You're not writing tests anymore. You're building the harness that lets an agent decide which tests matter, run them, and know when the result can be trusted.

Fewer, stronger tests. Left alone, an agent will happily generate 200 shallow cases. The ISTQB guardrails (prefer fewer strong tests, mark P0 for revenue/checkout/data-loss) are what make the output something a QE would actually sign off on.
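
One way to picture that guardrail is as a filter over the agent's draft cases. The sketch below is hypothetical throughout: the `DraftCase` shape, the risk tags, the P1/P2 split, and the budget are my assumptions for illustration. Only the P0 rule (revenue, checkout, data loss) comes from the skill itself.

```typescript
type Priority = "P0" | "P1" | "P2";

// Hypothetical draft-case shape; tags are free-form labels the agent assigns.
interface DraftCase {
  title: string;
  riskTags: string[];  // e.g. "revenue", "checkout", "data-loss", "cosmetic"
  isCoreFlow: boolean; // assumed signal for separating P1 from P2
}

const P0_TAGS = ["revenue", "checkout", "data-loss"];

function prioritize(c: DraftCase): Priority {
  const touchesP0Risk = c.riskTags.some((tag) => P0_TAGS.indexOf(tag) !== -1);
  if (touchesP0Risk) return "P0";
  return c.isCoreFlow ? "P1" : "P2";
}

// Keep every P0, then fill the remaining budget with P1s before P2s.
function selectCases(cases: DraftCase[], budget: number): DraftCase[] {
  const order: Priority[] = ["P0", "P1", "P2"];
  const ranked = cases
    .map((c, index) => ({ c, index, rank: order.indexOf(prioritize(c)) }))
    .sort((a, b) => a.rank - b.rank || a.index - b.index); // ties keep input order
  const p0Count = ranked.filter((r) => r.rank === 0).length;
  // Never drop a P0 just to fit the budget.
  return ranked.slice(0, Math.max(budget, p0Count)).map((r) => r.c);
}
```

Calling `selectCases(drafts, 10)` on 200 drafts would return every P0 plus whatever P1s fit, which is roughly the cut a senior QE makes by hand.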

Why workflows change this

Before workflows, this was a prompt you re-typed and re-tuned every time. Now it's a skill: written once, versioned, triggered by intent. The agent doesn't improvise the process — it follows it, and improvises only the judgment inside each step. That's the difference between a clever demo and something you'd let near a release.

I gave Claude Code my QA job. It didn't take it. It gave me back the eight hours I spent on the mechanical 80% — and handed me the 20% that's actually judgment.

That's the trade I'll take every time.

Get it running yourself

The whole thing is three moving parts, and two of them are public:

1. Connect the Katalon True Platform MCP server. This is what gives the agent the platform tools: create_test_case, read_auts, create_manual_ai_session, schedule_test_run, and the rest. It ships an OAuth authenticate flow; you connect it once and complete sign-in in the browser. (In Claude Code, that's a claude mcp add for the Katalon True Platform MCP server; the first tool call triggers the auth handshake.)

2. Install the Playwright reporter — only if you want Lane C:

npm install --save-dev @katalon/playwright-reporter
# wire the reporter into playwright.config.ts, then:
KATALON_API_KEY=… KATALON_PROJECT_ID=… npx playwright test   # runs + uploads
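
For the "wire the reporter" step, a playwright.config.ts along these lines is a reasonable starting point. Treat it as a sketch: Playwright lets you register a reporter by npm package name, but whether @katalon/playwright-reporter takes extra options is not covered here, so this version passes none and assumes the credentials come from the KATALON_API_KEY and KATALON_PROJECT_ID environment variables shown in the command above. Check the package README before relying on it.

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [
    ["list"], // keep readable console output for local runs
    // Assumption: registered by package name with no options;
    // credentials are read from the environment at run time.
    ["@katalon/playwright-reporter"],
  ],
  use: {
    // Standard Playwright settings that produce the artifacts worth uploading.
    screenshot: "only-on-failure",
    video: "retain-on-failure",
    trace: "retain-on-failure",
  },
});
```

Capturing artifacts only on failure keeps runs light while still giving every failed case the screenshot, video, and trace the report needs.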

3. Give the agent the harness. This is the part that turns the tools into a QE. Drop it in as a skill (.claude/skills/katalon-trueplatform-testing/SKILL.md) or just paste it into the chat as your first message. It's the lane router, the ISTQB coverage guide, and the capability-boundary table — the actual brain, not a paraphrase:

You are a QA engineer driving Katalon True Platform. Given ANY testing task —
a Jira story, a one-line request ("cover the new checkout filter"), or "just
poke the live site" — figure out the lane, design risk-based coverage, run it
on the platform, and report back with evidence.

GOLDEN RULE
- Use Katalon MCP tools for anything on the platform (projects, requirements,
  cases, suites, runs, results, defects).
- Use a browser / Playwright for touching the live app (exploring it, or BEING
  the automation). Never scrape the platform with a browser when an MCP tool exists.

STEP 0 — 30-SECOND INTAKE (infer from context; ask only if truly ambiguous)
  1. What's the AUT (url / app)?         -> check the requirement, chat, or read_auts
  2. What changed / what's the intent?   -> read the requirement, or ask one line of scope
  3. Which lane (below)?                  -> default to Lane A for new/unknown behavior
  4. Where do results go?                 -> a True Platform project + repository
Then state the automation boundary in one line so expectations are honest, and go.

LANE ROUTING — the one decision that matters
  New behavior, no code yet, exploratory      -> LANE A  Manual + Run with AI
  Katalon automation already in the repo       -> LANE B  Automated / TestCloud
  Team lives in code, wants cross-browser + CI -> LANE C  Playwright
Combine when it helps: explore with a browser -> design in A -> once the happy
paths are stable, port them into C so CI gates every future merge. Say so when you do.

COVERAGE — the thinking, before the typing
- Get the requirement (find_requirements / read_requirement), or analyze the
  one-liner. You CANNOT create requirements in Katalon — they sync from Jira/Azure.
- Produce: intent · personas · main flow · alternate flows · negative flows ·
  data & env assumptions · risk areas.
- Design with ISTQB techniques, risk-based not maximal: equivalence partitioning,
  boundary values, decision tables, state transitions, scenario/use-case,
  error guessing, pairwise.
- Write a one-screen coverage note: techniques used · selected P0/P1/P2 cases ·
  what you're deliberately skipping and WHY. Mark P0 for revenue/checkout/data-loss.
- Prefer fewer strong tests over many shallow ones.

LANE A — Manual + Run with AI (default for new behavior)
  draft cases -> create_test_case -> link_requirements_to_test_case -> group into a suite
  -> read_auts (ALWAYS right before creating the run) -> create_manual_test_run
  -> create_manual_ai_session -> poll read_manual_ai_session until nothing is
  TODO / IN_TESTING -> report. Start Run-with-AI automatically unless told "manual only".

LANE B — Automated / TestCloud (existing Katalon automation)
  find_test_suites -> find_execution_profiles -> list_test_cloud_environments
  -> build_run_configuration -> schedule_test_run -> read_execution*.
  schedule_test_run is for automated suites only — never a single manual case.

LANE C — Playwright (code-first -> into True Platform)
  Install @katalon/playwright-reporter, wire it into playwright.config.ts, run
  with platform creds:
      npm install --save-dev @katalon/playwright-reporter
      KATALON_API_KEY=... KATALON_PROJECT_ID=... npx playwright test
  Results (status, duration, screenshots, videos, traces) upload into Test Runs.

CAPABILITY BOUNDARIES — say these out loud, up front
  - Can't create a requirement       -> create it in the ALM, sync, then link.
  - Can't guarantee Run-with-AI ends -> report the BLOCKED state + the exact
                                        fixture / account / env that's missing.
  - Can't file a defect with no fail -> only file on a real failed test-result ID.
  An agent that knows what it can't do beats one that bluffs.

REPORT — every run
  Run:      name · link · status
  Summary:  passed / failed / blocked / not-run
  Findings: each failed case -> concise reason, WITH screenshot / trace / error
  Defects:  created or recommended
  Next:     gaps, skipped items, manual follow-up
EVIDENCE DISCIPLINE: a failure claim carries its proof. No "looks broken" without
the artifact. If you can't produce the evidence, downgrade the claim.

AUTONOMY: run Analyze -> Cases -> Suite -> Run -> Execute -> Report without pausing
on optional decisions. Ask only when a required value can't be resolved safely, a
choice is genuinely 50/50, creds are missing, or the next action is destructive.
Otherwise pick the best match, say what you assumed, and keep moving.

That block is the entire decision tree. Paste it, connect the MCP, and type "we shipped a new product filter — cover it." The lane router does the rest.

Or just download the skill

If you run Claude Code, skip the copy-paste and grab the real thing — both skills, all the reference files, ready to drop into .claude/skills/:

⬇ Download the skill bundle (.zip) — katalon-trueplatform-testing/ (the lane router + ISTQB guide + boundary table + per-lane references) and auto-test/ (the one-command wrapper: Auto test <STORY-ID> → pull requirement → reconcile tests → run → HTML report).
Prefer to read before you download? The raw files: katalon-trueplatform-testing/SKILL.md · auto-test/SKILL.md

Unzip into .claude/skills/, connect the Katalon True Platform MCP server, and trigger it with /auto-test BSD-13 or just plain language. Not on Claude Code? The SKILL.md files are plain Markdown system prompts: paste the body into Codex, Cursor, or whatever agent you drive, and it works the same way.

It's built against Katalon True Platform (a public product) and the public @katalon/playwright-reporter package — nothing vendor-internal. Take it, change it, ship it.