SEO/GEO Agent
Active
A technical auditing agent that crawls sites, extracts metadata and structured data, and produces scoring signals for search and generative-engine visibility.
- Automation
- SEO
- GEO
- Crawling
- Structured data
Updated 2026-05-21

Overview
The SEO/GEO Agent is a free-by-default audit toolkit for classic SEO and
Generative Engine Optimization (GEO): visibility in ChatGPT, Perplexity, and
Google AI Overviews. It is a pnpm monorepo with a polite, robots-aware
crawler, a JS-render fallback through Puppeteer, seven collectors across SEO
and GEO signals, deterministic scoring, and three Claude Code skills
(/seo-audit, /geo-audit, /full-audit) that drive the CLI and
synthesize executive summaries.
Phase 1 is shipped, dogfooded against three real brands, and runs against any URL.
The problem
Two adjacent disciplines, no good single tool:
- Classic SEO — title/description quality, headings, schema, robots, performance, content depth, keyword coverage. A dozen paid SaaS products cover this for marketing teams, but each is rented, opinionated, and hostile to scripting.
- GEO (Generative Engine Optimization) — citability in AI search: whether the page is rendered without JavaScript, whether it has structured data the model can cite, whether the brand has external authority a model can ground against. Practically no off-the-shelf tooling exists yet.
The opportunity was a single toolkit that runs both passes against any URL, writes deterministic reports an LLM can summarize cleanly, and stays free for the operator by default. Paid providers (DataForSEO, real AI-platform citation checking) are stubbed for a Phase 2 that only matters if free-tier limits actually bite.
Audience
- Operators running multiple brands who need a repeatable audit workflow that doesn't depend on a paid dashboard.
- Developers who want the audit primitives composable, not locked behind a UI — the CLI emits structured JSON; Claude writes the prose.
- Anyone validating AI-search readiness for content that's currently invisible to ChatGPT or Perplexity because the page renders in JS or has no useful structured data.
What I built
A pnpm monorepo with three packages and three skill orchestrators:
- @seo-agent/core: collectors (SEO technical, SEO content, SEO keywords; GEO citability, GEO schema, GEO technical, GEO authority), a polite fetcher (robots-aware, disk-cached, retry), a BFS crawler, a Puppeteer JS-render fallback, scoring, and the orchestrator.
- @seo-agent/render: markdown and PDF renderers (PDF via a Puppeteer template).
- @seo-agent/cli: the audit-cli Commander binary.
- Three skills in skills/: /seo-audit, /geo-audit, /full-audit. The CLI emits deterministic report.json; Claude synthesizes the executive summary and top-priorities block from it.
Invocation is either through Claude Code directly:
/full-audit https://spiceshelf.app
/seo-audit https://kingrove.corycopeland.dev --pdf
/geo-audit https://example.com --explain
…or through the CLI binary against any URL.
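As a rough illustration of how independent collectors might compose into one report, the sketch below assumes a hypothetical Collector interface, a runCollectors helper, and a made-up seo.missing-title finding id. None of these names come from the repo, and real collectors would be asynchronous.

```typescript
// Hypothetical shapes for illustration; the real @seo-agent/core types may differ.
interface Finding {
  id: string;
  severity: "critical" | "high" | "medium" | "low";
}

interface PageSnapshot {
  url: string;
  html: string;
}

interface Collector {
  key: string; // name of the slice this collector owns in the report
  collect(page: PageSnapshot): Finding[];
}

type Report = Record<string, Finding[]>;

// Run every collector against the same snapshot; each writes exactly one slice.
function runCollectors(page: PageSnapshot, collectors: Collector[]): Report {
  const report: Report = {};
  for (const c of collectors) {
    report[c.key] = c.collect(page);
  }
  return report;
}

// Toy collector: flags a page whose HTML has no non-empty <title> element.
const titleCollector: Collector = {
  key: "seo.technical",
  collect: (page) =>
    /<title>[^<]+<\/title>/i.test(page.html)
      ? []
      : [{ id: "seo.missing-title", severity: "high" }],
};
```

Because each collector only sees a snapshot and returns plain data, it can be unit-tested without a network or a browser.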
Product decisions
- Deterministic data, LLM-written prose. The CLI's job is to be boring and correct: emit the same report.json for the same URL every run. The skill's job is to be readable. Mixing those produces unreproducible outputs and brittle scoring.
- Free by default; paid providers are stubs. DataForSEO and real AI-platform citation checking are stubbed in packages/core/src/providers/. Phase 2 wires them in if and when the free path is provably insufficient. Default behavior should never require a paid API key.
- Polite fetcher, not a scraper. Robots-aware, disk-cached, retry with backoff. The crawler defaults to a 50-page cap. The toolkit should be something I'd run against someone else's site without thinking about it.
- NAS-backed reports, not in-repo storage. Reports land at /mnt/CodingTest/projects/seo-agent/reports/<host>/<date>/<hash>/ and are swept into Kopia backups. The repo stays clean; the audit history stays durable.
- System Chrome, not bundled Chromium. PUPPETEER_SKIP_DOWNLOAD=true with auto-detection of the system Chrome path. This dropped the install footprint significantly and made deployment simpler.
- Skills as the primary UI. A CLI is necessary but not friendly. The three slash commands turn the toolkit into something you'd actually reach for in a Claude Code session: /full-audit <url> is fast to type and drops a markdown summary in front of you.
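One way to keep report.json byte-identical across runs is to serialize with a fixed key order. The sketch below is an assumption about how that could be done, not the CLI's actual serializer; stableStringify and sortKeys are hypothetical helpers.

```typescript
// Hypothetical helper: serialize with recursively sorted object keys so that
// two reports holding the same data produce identical text.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function sortKeys(value: Json): Json {
  if (Array.isArray(value)) {
    // Array order is part of the data, so only the elements are normalized.
    return value.map((item) => sortKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value; // primitives pass through unchanged
}

function stableStringify(value: Json): string {
  return JSON.stringify(sortKeys(value), null, 2);
}
```

With this in place, the order in which collectors happen to attach their slices no longer affects the file on disk, which keeps diffs between audit runs meaningful.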
Technical architecture
- Language / runtime: Node 20+, TypeScript, pnpm monorepo.
- Crawler: BFS, polite (robots.txt aware), disk-cached, retry with backoff, 50-page default cap (configurable via --max-pages).
- JS render fallback: Puppeteer using system Chrome (PUPPETEER_EXECUTABLE_PATH autodetected for the common locations on Linux + macOS).
- Collectors: seven distinct collectors split across SEO and GEO. Each is independently testable and contributes a typed slice to the final report.json.
- Scoring: deterministic, with thresholds calibrated against real-world dogfooding (more on this below). --explain prints the scoring math to stderr.
- Output: report.json (machine-readable, the source of truth), report.md (Claude-synthesized summary on top of the JSON), report.pdf (optional, Puppeteer template).
- Brand resolution: brands.json at the repo root maps domains → brand name + topics; topics seed keyword scoring, brand name seeds the Wikipedia authority probe.
- Tests: 45 unit tests, CI green; the install script symlinks skills into ~/.claude/skills/ and runs pnpm install + build.
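The crawler's breadth-first traversal with a page cap can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the code in the repo: crawlBfs, getLinks, and isAllowed are hypothetical names, and the real fetcher and robots.txt check would be asynchronous and cached.

```typescript
// Hypothetical sketch of a breadth-first crawl with a page cap.
// getLinks and isAllowed are injected stand-ins for the real fetcher and
// robots.txt check.
function crawlBfs(
  start: string,
  getLinks: (url: string) => string[],
  isAllowed: (url: string) => boolean,
  maxPages = 50, // mirrors the documented default cap
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0 && visited.length < maxPages) {
    const url = queue.shift() as string;
    if (!isAllowed(url)) continue; // disallowed pages are never fetched
    visited.push(url);
    for (const link of getLinks(url)) {
      if (!seen.has(link)) {
        seen.add(link); // mark on enqueue so a URL is queued at most once
        queue.push(link);
      }
    }
  }
  return visited;
}
```

Injecting the link extractor and the robots check keeps the traversal logic testable against an in-memory link graph, with no network access.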
Real-world calibration
Phase 1's scoring thresholds were originally calibrated against the plan's
own test fixtures. After dogfooding against three brands — Spiceshelf 63,
SetOff 55, KinGrove 49 — geo-citability was restructured. The original
single geo.js-rendered finding was split into:
- geo.no-static-render: the raw HTML is empty, but the JS render saves it (high severity).
- geo.thin-content-rendered: sparse content even after JS render (critical or high by length).
This is the kind of detail that only surfaces by running the toolkit against
sites you understand and reading the results carefully. The
geo-schema per-type weight is still a first guess and is the next thing to
calibrate.
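The split between the two findings can be pictured as a small decision function. The sketch below is illustrative only: the word-count thresholds, the use of word counts at all, and the classifyRender name are assumptions, not the toolkit's calibrated values.

```typescript
// Hypothetical thresholds, chosen only for illustration.
const THIN_WORDS = 150; // below this, rendered content counts as sparse
const CRITICAL_WORDS = 50; // below this, sparse content is treated as critical

interface RenderFinding {
  id: "geo.no-static-render" | "geo.thin-content-rendered";
  severity: "critical" | "high";
}

function classifyRender(staticWords: number, renderedWords: number): RenderFinding | null {
  // Sparse even after the JS render: severity depends on how little there is.
  if (renderedWords < THIN_WORDS) {
    return {
      id: "geo.thin-content-rendered",
      severity: renderedWords < CRITICAL_WORDS ? "critical" : "high",
    };
  }
  // Raw HTML is empty, but the JS render recovers real content.
  if (staticWords === 0) {
    return { id: "geo.no-static-render", severity: "high" };
  }
  return null; // static HTML already carries the content
}
```

Keeping the two cases separate lets a report distinguish "a crawler without JavaScript sees nothing" from "there is little to see either way", which call for different fixes.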
Current status
- Phase 1 shipped. Three packages, seven collectors, three skills, 45 unit tests, CI green, NAS-backed report storage, system Chrome integration.
- Real-world calibration pass done against three brands; geo-citability restructured as a result.
- Phase 2 stubs in place: DataForSEO and real AI-platform citation checking sit in packages/core/src/providers/, off by default.
What I would do next
- Calibrate geo-schema weights against more real brand data.
- Fix the trailing-slash duplicate-title finding in crawl.ts: host and host/ are treated as separate URLs today, which inflates the duplicate-titles report when both are reachable.
- Wire the Phase 2 paid providers behind a --paid (or per-env) flag, defaulting off.
- Expand the brand list and run a scheduled monthly audit, with diffs from the prior month.
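For the trailing-slash item, one possible approach is to normalize URLs before they enter the crawl queue. The sketch below is a guess at what such a fix might look like, not the planned change to crawl.ts; normalizeUrl is a hypothetical helper built on the standard URL class.

```typescript
// Hypothetical normalizer: collapse URLs that differ only by a trailing slash
// or a fragment, so the crawler counts them as one page.
function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = ""; // fragments never change the fetched document
  // Strip a trailing slash from every path except the root "/".
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}
```

Treating /about and /about/ as the same page is itself an assumption: some servers return different content for the two, so a stricter variant might merge them only when both resolve to the same canonical URL.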
Proof
- Repo: spirix/seo-agent on self-hosted Forgejo.
- Live skills: /seo-audit, /geo-audit, /full-audit available in Claude Code on the dev environment.
- An example report.md and the crawl-flow diagram are still being prepared for this page; they will land here without changing the status above.