Admin: skills smoke test
/admin/skills-smoke is an internal page that runs every built-in skill against a stock prompt and reports pass/fail. Useful when verifying a deployment, debugging a regression, or sanity-checking a model swap.
#Who can run it
The page is gated behind admin access. Workspace owners on Max and Enterprise plans can see it. If you don't see it in the sidebar, you don't have access.
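The access rule above can be read as a simple predicate. The sketch below is only an illustration: the function name, the role string "owner", and the lowercase plan identifiers are assumptions, not the product's actual authorization code.

```python
# Plans that expose the smoke test page, per the rule above.
# The lowercase identifiers are an assumption made for this sketch.
ADMIN_PLANS = {"max", "enterprise"}


def can_see_smoke_page(role: str, plan: str) -> bool:
    """Hypothetical check: workspace owners on Max or Enterprise plans only."""
    return role == "owner" and plan.lower() in ADMIN_PLANS


print(can_see_smoke_page("owner", "Max"))          # owner on an eligible plan
print(can_see_smoke_page("member", "Enterprise"))  # eligible plan, but not an owner
```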
#Running the test
Open the page
Navigate to /admin/skills-smoke. The page lists every built-in skill grouped by pack, with a status pill per row (idle / running / pass / fail).
Pick a scope
Three buttons at the top:
- Run all — every skill in batches of 4.
- Run failed — re-run only previously failed rows.
- Run pack — pick a single pack (Roast, Marketing, Career, Motion, Augmenters).
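To make the three scopes concrete, here is a minimal sketch of how each button could narrow the row list, and how the selection could be chunked into batches of 4. The helper names, the row dictionaries, and every alias other than /roast are hypothetical, and applying the same batch size to all three scopes is an assumption.

```python
BATCH_SIZE = 4  # "Run all" runs skills in batches of 4


def select_rows(rows, scope, pack=None):
    """Return the rows a scope button would run (hypothetical helper)."""
    if scope == "all":
        return list(rows)
    if scope == "failed":
        return [r for r in rows if r["status"] == "fail"]
    if scope == "pack":
        return [r for r in rows if r["pack"] == pack]
    raise ValueError(f"unknown scope: {scope}")


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Illustrative rows; only /roast and the pack names come from this page.
rows = [
    {"alias": "/roast", "pack": "Roast", "status": "pass"},
    {"alias": "/headline", "pack": "Marketing", "status": "fail"},
    {"alias": "/resume", "pack": "Career", "status": "pass"},
    {"alias": "/storyboard", "pack": "Motion", "status": "idle"},
    {"alias": "/expand", "pack": "Augmenters", "status": "fail"},
    {"alias": "/tagline", "pack": "Marketing", "status": "pass"},
]

failed = select_rows(rows, "failed")
marketing = select_rows(rows, "pack", pack="Marketing")
all_batches = list(batches(select_rows(rows, "all")))
```

With six rows, "Run all" produces one batch of four followed by one batch of two.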
Watch the run
Skills run in parallel batches. Each row updates live with a spinner during the call, then turns green (pass) or red (fail). Click any row to expand the full request/response payload — useful for debugging.
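The batch behavior described above could look roughly like the following asyncio sketch. It is not the page's real implementation: `fake_skill_call` stands in for the actual skill call, the /broken alias exists only to force a failure, and the row dictionaries are an assumed shape.

```python
import asyncio


async def fake_skill_call(alias):
    """Stand-in for the real skill call; fails for one alias on purpose."""
    await asyncio.sleep(0)
    if alias == "/broken":
        raise RuntimeError("simulated error event")
    return f"output for {alias}"


async def run_row(row):
    """Run one skill and record the outcome on its row."""
    row["status"] = "running"  # the UI would show a spinner here
    try:
        row["output"] = await fake_skill_call(row["alias"])
        row["status"] = "pass"
    except Exception as exc:
        row["error"] = str(exc)
        row["status"] = "fail"


async def run_in_batches(rows, batch_size=4):
    """Run rows concurrently within a batch, one batch after another."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        await asyncio.gather(*(run_row(r) for r in batch))


aliases = ["/roast", "/broken", "/a", "/b", "/c"]  # hypothetical, except /roast
rows = [{"alias": a, "status": "idle"} for a in aliases]
asyncio.run(run_in_batches(rows))
statuses = [r["status"] for r in rows]
```

Catching the exception inside `run_row` keeps one failing skill from cancelling the rest of its batch.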
Investigate failures
A failure means the skill returned an error event, produced output that could not be parsed, or timed out. Expand the row to see the raw response. The most common causes:
- LLM provider is rate-limiting (transient).
- System prompt drifted and now produces invalid JSON (augmenters).
- Stock prompt no longer triggers the expected skill behavior.
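The three failure conditions (error event, unparseable output, timeout) can be sketched as a small classifier. The event dictionaries with "type" and "text" keys, the reason strings, and the rule that a critique only needs non-empty text are all assumptions made for illustration.

```python
import json


def classify_failure(kind, events, timed_out):
    """Return a failure reason, or None if the row would pass.

    `events` is an assumed shape: dicts with a "type" of "chunk" or
    "error", where chunks carry a "text" field.
    """
    if timed_out:
        return "timeout"
    if any(e.get("type") == "error" for e in events):
        return "error_event"
    text = "".join(e.get("text", "") for e in events if e.get("type") == "chunk")
    if kind == "augmenter":
        # Augmenters are expected to stream JSON.
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return "parse_error"
    elif not text.strip():
        # Critiques stream markdown; treat empty output as unparseable.
        return "parse_error"
    return None


ok = classify_failure("critique", [{"type": "chunk", "text": "Nice work."}], False)
bad_json = classify_failure("augmenter", [{"type": "chunk", "text": "{not json"}], False)
```

A `parse_error` on an augmenter row is the pattern to expect when a system prompt has drifted.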
#What's in a row
| Field | Type | Description |
|---|---|---|
| alias | string | The skill's slash command (e.g. /roast). |
| kind | critique \| augmenter | What the skill is supposed to do. |
| status | idle \| running \| pass \| fail | Current state of the smoke run. |
| duration | number (ms) | How long the call took, in milliseconds. |
| output | expandable | The full streamed response (markdown for critique, JSON for augmenter). |
| error | expandable | Error message + stack trace if failed. |
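For readers who think in types, the row fields can be modeled as a small data class. This is a sketch only: the class name `SmokeRow`, the `duration_ms` field name, the default values, and treating /roast as a critique skill are assumptions.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Kind = Literal["critique", "augmenter"]
Status = Literal["idle", "running", "pass", "fail"]


@dataclass
class SmokeRow:
    """One row of the smoke test table (hypothetical model)."""
    alias: str                         # slash command, e.g. "/roast"
    kind: Kind                         # what the skill is supposed to do
    status: Status = "idle"            # current state of the smoke run
    duration_ms: Optional[int] = None  # call duration in milliseconds
    output: Optional[str] = None       # full streamed response
    error: Optional[str] = None        # message and stack trace on failure


row = SmokeRow(alias="/roast", kind="critique")
row.status = "pass"
row.duration_ms = 1200
```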
#When to run
- After a deployment — verify nothing broke in the LLM pipeline.
- After a model swap — confirm the new model handles all skill prompts correctly.
- After editing a built-in's system prompt — sanity check before merging.
- On a regression report — narrow the scope to the affected pack first.
#It's not a test suite
The smoke test is not a substitute for unit or integration tests. Skills are non-deterministic by design, so a "pass" means "produced output of the right shape", not "produced the right answer". For deeper guarantees, write actual tests against the skill engine.
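To illustrate the difference between shape and correctness, here is a minimal sketch of a shape-only check. The function name, the rule that augmenter output must be valid JSON while critique output must be non-empty, and the sample payloads are assumptions.

```python
import json


def has_right_shape(kind, output):
    """Shape-only check: says nothing about whether the answer is good."""
    if kind == "augmenter":
        try:
            json.loads(output)
        except json.JSONDecodeError:
            return False
        return True
    return bool(output.strip())


# Two different answers to the same prompt both count as a pass.
first = has_right_shape("augmenter", '{"tone": "formal"}')
second = has_right_shape("augmenter", '{"tone": "casual"}')
# Well-meaning prose where JSON was expected does not.
chatty = has_right_shape("augmenter", "Sure! Here is your JSON.")
```

Both JSON payloads pass even though they disagree, which is why a green row cannot stand in for an assertion about content.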