mirror of
https://github.com/EveryInc/compound-engineering-plugin.git
synced 2026-06-19 15:41:46 +02:00
239 lines
13 KiB
Markdown
239 lines
13 KiB
Markdown
# `ce-test-browser`
|
|
|
|
> Run end-to-end browser tests on pages affected by current PR or branch — uses `agent-browser` exclusively.
|
|
|
|
`ce-test-browser` is the **end-to-end browser testing** skill. It maps changed files to testable routes, starts (or verifies) the dev server, navigates to each affected page via `agent-browser`, captures snapshots and screenshots, exercises critical interactions, pauses for human verification on flows that require external interaction (OAuth, email, payments, SMS), and produces a structured test summary. Headed mode lets you watch tests run; headless is faster and runs in the background.
|
|
|
|
---
|
|
|
|
## TL;DR
|
|
|
|
| Question | Answer |
|
|
|----------|--------|
|
|
| What does it do? | Maps changed files to routes, navigates each via agent-browser, captures snapshots and screenshots, asks for human verification on external-flow steps |
|
|
| When to use it | After UI changes, before opening a PR, when verifying page behavior on a branch or PR |
|
|
| What it produces | Per-page status table, console errors, human verifications confirmed, screenshots, overall result (PASS / FAIL / PARTIAL) |
|
|
| Modes | Manual (default; user controls server), Pipeline (`mode:pipeline` — auto-starts server, scans for free port) |
|
|
|
|
---
|
|
|
|
## The Problem
|
|
|
|
End-to-end browser testing is fragmented across tools and easy to skip:
|
|
|
|
- **Wrong browser tool** — Playwright, Puppeteer, MCP Chrome, IDE built-ins; each works differently
|
|
- **Manual test mapping** — figuring out "which routes did this PR affect" is its own task
|
|
- **Server orchestration** — tests fail because the dev server wasn't running, or the wrong port, or stale state
|
|
- **Console errors silently slip through** — the page renders fine but JS errors pile up unnoticed
|
|
- **External flows skipped** — OAuth, payments, email delivery need a human; without a structured pause, they get marked "pass" without actually being checked
|
|
- **No artifact** — screenshots end up in the developer's filesystem, not the PR description
|
|
|
|
## The Solution
|
|
|
|
`ce-test-browser` runs end-to-end tests as a structured flow:
|
|
|
|
- **`agent-browser` exclusively** — one tool, predictable behavior; never falls back to Chrome MCP or IDE-specific browser tools
|
|
- **File-to-route mapping** translates changed files into the URLs that need testing
|
|
- **Server orchestration** — manual mode requires the user-started server; pipeline mode auto-starts and scans for a free port
|
|
- **Per-page test loop** — navigate, snapshot, verify elements, exercise critical interactions, capture screenshots
|
|
- **Human verification step** for flows that require external interaction
|
|
- **Failure handling asks how to proceed** — fix now (debug + retest) or skip (continue)
|
|
- **Structured test summary** suitable for PR descriptions
|
|
|
|
---
|
|
|
|
## What Makes It Novel
|
|
|
|
### 1. `agent-browser` exclusively
|
|
|
|
The skill enforces a single browser-automation substrate: **the `agent-browser` CLI**. Not Chrome MCP, not IDE built-ins, not alternative browser-control tools. Specific reasons:
|
|
|
|
- Predictable behavior — one tool's quirks, not three
|
|
- Same commands work in headed and headless modes
|
|
- Same snapshot/click/screenshot pattern across all tests
|
|
- Platform-specific hints (e.g., "in Claude Code, do not use `mcp__claude-in-chrome__*`") are explicit
|
|
|
|
When `agent-browser` isn't installed, the skill stops with `/ce-setup` as the install path — it doesn't try to fall back.
|
|
|
|
### 2. File-to-route mapping table
|
|
|
|
Mapping changed files to URLs that need testing is a recurring task. The skill carries an explicit mapping table:
|
|
|
|
| File pattern | Routes |
|
|
|--------------|--------|
|
|
| `app/views/users/*` | `/users`, `/users/:id`, `/users/new` |
|
|
| `app/controllers/settings_controller.rb` | `/settings` |
|
|
| `app/javascript/controllers/*_controller.js` | Pages using that Stimulus controller |
|
|
| `app/components/*_component.rb` | Pages rendering that component |
|
|
| `app/views/layouts/*` | All pages (test homepage at minimum) |
|
|
| `app/assets/stylesheets/*` | Visual regression on key pages |
|
|
| `src/app/*` _(Next.js)_ | Corresponding routes |
|
|
| `src/components/*` | Pages using those components |
|
|
|
|
This is a starting point, not exhaustive — the skill applies judgment for project-specific layouts.
|
|
|
|
### 3. Two modes — Manual (default) and Pipeline
|
|
|
|
| Mode | Server | Port | Browser default |
|
|
|------|--------|------|-----------------|
|
|
| **Manual** _(default)_ | User-started | Use preferred port as-is; user controls | Asks: headed or headless |
|
|
| **Pipeline** _(`mode:pipeline`)_ | Auto-started in background | Scans for free port; never assumes 3000 is free | Defaults to headless |
|
|
|
|
Pipeline mode exists for LFG and other automated runners where multiple agents may be on the same machine and 3000 might be claimed.
|
|
|
|
### 4. Port detection cascade
|
|
|
|
The preferred port comes from a priority list:
|
|
|
|
1. Explicit argument (`--port 5000`)
|
|
2. Project instructions (`AGENTS.md`, `CLAUDE.md`)
|
|
3. `package.json` (dev/start scripts)
|
|
4. Environment files (`.env`, `.env.local`, `.env.development`)
|
|
5. Default `3000`
|
|
|
|
In pipeline mode, the skill verifies that port is actually free and scans upward if not. In manual mode, it uses the preferred port as-is — the user controls their own server.
|
|
|
|
### 5. Headed vs headless choice
|
|
|
|
In manual mode, the skill asks whether to run **headed** (visible browser, watch tests run) or **headless** (faster, runs in background). Headed mode is useful when you're iterating on a tricky interaction and need to see what's happening. Headless is faster for routine sweeps.
|
|
|
|
### 6. Human verification for external flows
|
|
|
|
Some flows can't be automated:
|
|
|
|
| Flow | What human verification asks |
|
|
|------|------------------------------|
|
|
| OAuth | "Please sign in with [provider] and confirm it works" |
|
|
| Email | "Check your inbox for the test email and confirm receipt" |
|
|
| Payments | "Complete a test purchase in sandbox mode" |
|
|
| SMS | "Verify you received the SMS code" |
|
|
| External APIs | "Confirm the [service] integration is working" |
|
|
|
|
The skill pauses with a blocking question, the user does the thing, then answers yes (continue) or no (describe issue). External flows become explicit rather than silently skipped.
|
|
|
|
### 7. Failure handling — fix now or skip
|
|
|
|
When a route fails (console error, missing element, broken interaction), the skill captures error state (screenshot + reproduction steps) and asks: fix now (debug, propose fix, retest) or skip (continue testing other pages). Either path is valid; the choice is explicit.
|
|
|
|
### 8. Structured test summary
|
|
|
|
After all routes are tested, a markdown summary lands:
|
|
|
|
- Test scope (PR / branch)
|
|
- Server URL
|
|
- Per-route status table (Pass / Fail / Skip with notes)
|
|
- Console errors found
|
|
- Human verifications completed
|
|
- Failures (route + issue description)
|
|
- Overall result (PASS / FAIL / PARTIAL)
|
|
|
|
Suitable for pasting into a PR description as test evidence.
|
|
|
|
---
|
|
|
|
## Quick Example
|
|
|
|
You finish a notification settings page and a layout change. You invoke `/ce-test-browser`.
|
|
|
|
The skill verifies `agent-browser` is installed. Asks whether to run headed or headless — you pick headed (you want to watch). Determines test scope from `git diff --name-only main...HEAD`: `app/views/layouts/application.html.erb`, `app/views/settings/notifications.html.erb`, `app/javascript/controllers/notification_toggle_controller.js`.
|
|
|
|
Maps to routes: `/` (layout change affects every page; test homepage), `/settings/notifications` (the new page), and other pages that render the toggle controller. Detects port 3000 from `bin/dev` config; verifies the user's dev server is running on that port.
|
|
|
|
Tests each route: opens with `agent-browser open`, calls `agent-browser snapshot -i` for the interactive element list, verifies primary content rendered. Takes screenshots. Exercises the toggle on `/settings/notifications` (`agent-browser click @e3`).
|
|
|
|
The settings flow includes an OAuth sign-in step in this app — when the test reaches a protected route, the skill pauses for human verification: "Please sign in with Google and confirm the redirect back works." You do it on the visible browser; answer yes.
|
|
|
|
All routes pass. Summary surfaces: 4 routes tested, 0 console errors, 1 human verification confirmed, overall PASS.
|
|
|
|
---
|
|
|
|
## When to Reach For It
|
|
|
|
Reach for `ce-test-browser` when:
|
|
|
|
- You changed views, components, controllers, layouts, or stylesheets and want to verify pages still work
|
|
- You want to exercise the actual UI before opening the PR
|
|
- The change touches OAuth, payments, or other external flows that need human-in-the-loop verification
|
|
- You want test evidence (per-page status + screenshots) for the PR description
|
|
|
|
Skip `ce-test-browser` when:
|
|
|
|
- The change is backend-only (no observable browser-visible behavior)
|
|
- `agent-browser` isn't installed → run `/ce-setup` first
|
|
- You want unit / integration tests, not E2E → use the project's test runner
|
|
- The dev server can't be brought up locally (cloud-only setup) → use a different testing approach
|
|
|
|
---
|
|
|
|
## Use as Part of the Workflow
|
|
|
|
`ce-test-browser` is invoked at the verification side of the chain:
|
|
|
|
- **`/ce-code-review` Tier 2** — for browser-affecting PRs, can spawn this skill to verify behavior in addition to static review
|
|
- **`/ce-work` Phase 3** — appropriate before opening the PR for UI-heavy work; the test summary becomes part of the PR description's verification narrative
|
|
|
|
`mode:report-only` for `ce-code-review` is the only review mode safe to run concurrently with this skill on the same checkout — other modes mutate, which would interfere with the running dev server's state.
|
|
|
|
---
|
|
|
|
## Use Standalone
|
|
|
|
The skill works directly:
|
|
|
|
- **Current branch** — `/ce-test-browser`
|
|
- **Specific PR** — `/ce-test-browser 847`
|
|
- **Specific branch** — `/ce-test-browser feature/new-dashboard`
|
|
- **Custom port** — `/ce-test-browser --port 5000`
|
|
- **Pipeline mode** — `/ce-test-browser mode:pipeline` (auto-starts server, scans for free port)
|
|
|
|
When the dev server isn't running in manual mode, the skill informs the user with the right start command and stops. In pipeline mode, the skill auto-starts via `bin/dev`, `bin/rails server`, or `npm run dev` (whichever the project uses).
|
|
|
|
---
|
|
|
|
## Reference
|
|
|
|
| Argument | Effect |
|
|
|----------|--------|
|
|
| _(empty)_ | Tests current branch's changes |
|
|
| `<PR number>` | Tests that PR's affected routes |
|
|
| `<branch name>` | Tests that branch's affected routes |
|
|
| `current` | Tests current branch (explicit) |
|
|
| `--port <number>` | Override port detection |
|
|
| `mode:pipeline` | Auto-start server, scan for free port, default headless |
|
|
|
|
Required: `agent-browser` CLI installed (run `/ce-setup` if missing). Local dev server running (manual mode) or available start command (pipeline mode).
|
|
|
|
Key `agent-browser` commands the skill uses: `open <url>`, `snapshot -i` (interactive elements with refs `@e1`, `@e2`), `click @ref`, `fill @ref "text"`, `screenshot out.png`, `screenshot --full`, `--headed` flag for visible browser.
|
|
|
|
---
|
|
|
|
## FAQ
|
|
|
|
**Why `agent-browser` exclusively?**
|
|
Predictable behavior across platforms and modes. Falling back to Chrome MCP or IDE built-ins means three tools' quirks instead of one. The skill is explicit about it: do not use `mcp__claude-in-chrome__*` in Claude Code; do not substitute unrelated browsing tools in Codex.
|
|
|
|
**Headed or headless?**
|
|
Headed when you're iterating on a tricky interaction and need to see what's happening. Headless when you want speed for a routine sweep. Manual mode asks; pipeline mode defaults to headless.
|
|
|
|
**What does pipeline mode do differently?**
|
|
Pipeline mode is for automated runners (LFG, multi-agent on the same machine) where 3000 might be claimed. It scans for a free port starting from the preferred one, auto-starts the dev server in the background, defaults to headless, and skips the headed/headless question.
|
|
|
|
**What if my project layout doesn't match the file-to-route table?**
|
|
The mapping table is a starting point. The skill applies judgment for project-specific layouts. You can also test specific routes directly by adjusting the test scope detection — e.g., reviewing a known-affected route by passing the branch name.
|
|
|
|
**What if the dev server isn't running?**
|
|
Manual mode informs you with the right start command and stops. Pipeline mode auto-starts it via `bin/dev`, `bin/rails server`, or `npm run dev` (project-detected) and waits up to 30 seconds for the server to come up.
|
|
|
|
**Can it run concurrent with `ce-code-review`?**
|
|
Only when code review uses `mode:report-only` (read-only). Other review modes mutate the checkout, which would break the running dev server's state. Pair browser tests with read-only review, or run code review separately in an isolated worktree.
|
|
|
|
---
|
|
|
|
## See Also
|
|
|
|
- [`ce-code-review`](./ce-code-review.md) — can spawn this skill for browser-affecting PRs (use `mode:report-only` for concurrent runs on the same checkout)
|
|
- [`ce-test-xcode`](./ce-test-xcode.md) — sibling skill for iOS simulator testing
|
|
- [`ce-demo-reel`](./ce-demo-reel.md) — captures visual evidence for PR descriptions; complementary to test summary
|
|
- [`ce-work`](./ce-work.md) — orchestrator that may invoke this skill during Phase 3 verification
|
|
- [`ce-setup`](./ce-setup.md) — installs `agent-browser` and other dependencies
|