Cory Copeland
SEO/GEO Agent

Active

A technical auditing agent that crawls sites, extracts metadata and structured data, and produces scoring signals for search and generative-engine visibility.

  • Automation
  • SEO
  • GEO
  • Crawling
  • Structured data

Updated 2026-05-21

Overview

The SEO/GEO Agent is a free-by-default audit toolkit for classic SEO and Generative Engine Optimization (GEO): visibility in ChatGPT, Perplexity, and Google AI Overviews. It is a pnpm monorepo with a polite, robots-aware crawler, a JS-render fallback through Puppeteer, seven collectors across SEO and GEO signals, deterministic scoring, and three Claude Code skills (/seo-audit, /geo-audit, /full-audit) that drive the CLI and synthesize executive summaries.

Phase 1 is shipped, dogfooded against three real brands, and runs against any URL.

The problem

Two adjacent disciplines, no good single tool:

  • Classic SEO — title/description quality, headings, schema, robots, performance, content depth, keyword coverage. A dozen paid SaaS products cover this for marketing teams, but each is rented, opinionated, and hostile to scripting.
  • GEO (Generative Engine Optimization) — citability in AI search: whether the page is rendered without JavaScript, whether it has structured data the model can cite, whether the brand has external authority a model can ground against. Practically no off-the-shelf tooling exists yet.

The opportunity was a single toolkit that runs both passes against any URL, writes deterministic reports an LLM can summarize cleanly, and stays free for the operator by default. Paid providers (DataForSEO, real AI-platform citation checking) are stubbed for a Phase 2 that only matters if free-tier limits actually bite.

Audience

  • Operators running multiple brands who need a repeatable audit workflow that doesn't depend on a paid dashboard.
  • Developers who want the audit primitives composable, not locked behind a UI — the CLI emits structured JSON; Claude writes the prose.
  • Anyone validating AI-search readiness for content that's currently invisible to ChatGPT or Perplexity because the page renders in JS or has no useful structured data.

What I built

A pnpm monorepo with three packages and three skill orchestrators:

  • @seo-agent/core — collectors (SEO technical, SEO content, SEO keywords; GEO citability, GEO schema, GEO technical, GEO authority), a polite fetcher (robots-aware, disk-cached, retry), a BFS crawler, a Puppeteer JS-render fallback, scoring, and the orchestrator.
  • @seo-agent/render — markdown and PDF renderers (PDF via a Puppeteer template).
  • @seo-agent/cli — the audit-cli Commander binary.
  • Three skills in skills/: /seo-audit, /geo-audit, /full-audit. The CLI emits deterministic report.json; Claude synthesizes the executive summary and top-priorities block from it.
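
As a rough illustration of how a collector might contribute a typed slice that the orchestrator merges into report.json, here is a minimal sketch. The names Collector, PageSnapshot, Finding, runCollectors, and the seo.missing-title finding id are hypothetical and not taken from @seo-agent/core; the real types may look quite different.

```typescript
// Hypothetical shapes; the real @seo-agent/core types may differ.
interface PageSnapshot {
  url: string;
  html: string;
}

interface Finding {
  id: string;
  severity: "critical" | "high" | "medium" | "low";
  message: string;
}

interface Collector {
  name: string;
  collect(page: PageSnapshot): Finding[];
}

// Example collector (illustrative only): flags a page without a non-empty <title>.
const titleCollector: Collector = {
  name: "seo-technical",
  collect(page) {
    const hasTitle = /<title>[^<]+<\/title>/i.test(page.html);
    return hasTitle
      ? []
      : [{ id: "seo.missing-title", severity: "high", message: `No <title> on ${page.url}` }];
  },
};

// Run collectors in a fixed (name-sorted) order and sort each slice by finding id,
// so the same input always produces the same report object.
function runCollectors(page: PageSnapshot, collectors: Collector[]): Record<string, Finding[]> {
  const report: Record<string, Finding[]> = {};
  const ordered = [...collectors].sort((a, b) => a.name.localeCompare(b.name));
  for (const c of ordered) {
    report[c.name] = c.collect(page).sort((a, b) => a.id.localeCompare(b.id));
  }
  return report;
}
```

Keeping each collector a pure function of the page snapshot is what makes it testable in isolation; the sorting step is one simple way to keep the merged output stable between runs.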

Invocation is either through Claude Code directly:

/full-audit https://spiceshelf.app
/seo-audit https://kingrove.corycopeland.dev --pdf
/geo-audit https://example.com --explain

…or through the CLI binary against any URL.

Product decisions

  • Deterministic data, LLM-written prose. The CLI's job is to be boring and correct — emit the same report.json for the same URL every run. The skill's job is to be readable. Mixing those produces unreproducible outputs and brittle scoring.
  • Free by default; paid providers are stubs. DataForSEO and real AI-platform citation checking are stubbed in packages/core/src/providers/. Phase 2 wires them in if and when the free path is provably insufficient. Default behavior should never require a paid API key.
  • Polite fetcher, not a scraper. Robots-aware, disk-cached, retry with backoff. The crawler defaults to a 50-page cap. The toolkit should be something I'd run against someone else's site without thinking about it.
  • NAS-backed reports, not in-repo storage. Reports land at /mnt/CodingTest/projects/seo-agent/reports/<host>/<date>/<hash>/ and are swept into Kopia backups. The repo stays clean; the audit history stays durable.
  • System Chrome, not bundled Chromium. PUPPETEER_SKIP_DOWNLOAD=true with auto-detection of the system Chrome path. This dropped the install footprint significantly and made deployment simpler.
  • Skills as the primary UI. A CLI is necessary but not friendly. The three slash commands turn the toolkit into something you'd actually reach for in a Claude Code session — /full-audit <url> is fast to type and drops a markdown summary in front of you.
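
The "retry with backoff" behavior of the polite fetcher could be sketched roughly as follows. This assumes exponential backoff with a cap; the function names, the 500 ms base, the 8 s cap, and the three-attempt default are all illustrative assumptions, not the toolkit's actual values.

```typescript
// Delay before the retry that follows attempt n (0-based): base * 2^n, capped at maxMs.
// All numeric defaults here are assumptions for illustration.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Run `task`, retrying on failure with growing delays. `sleep` is injectable
// so tests can skip real waiting.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt remains.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A wrapper like this would sit around the HTTP request itself, after the robots.txt check and the disk-cache lookup, so that cached or disallowed URLs never trigger network retries.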

Technical architecture

  • Language / runtime: Node 20+, TypeScript, pnpm monorepo.
  • Crawler: BFS, polite (robots.txt aware), disk-cached, retry with backoff, 50-page default cap (configurable via --max-pages).
  • JS render fallback: Puppeteer using system Chrome (PUPPETEER_EXECUTABLE_PATH autodetected for the common locations on Linux + macOS).
  • Collectors: seven distinct collectors split across SEO and GEO. Each is independently testable and contributes a typed slice to the final report.json.
  • Scoring: deterministic, with thresholds calibrated against real-world dogfooding (more on this below). --explain prints the scoring math to stderr.
  • Output: report.json (machine-readable, the source of truth), report.md (Claude-synthesized summary on top of the JSON), report.pdf (optional, Puppeteer template).
  • Brand resolution: brands.json at the repo root maps domains → brand name + topics; topics seed keyword scoring, brand name seeds the Wikipedia authority probe.
  • Tests: 45 unit tests, CI green.
  • Install: the install script symlinks skills into ~/.claude/skills/ and runs pnpm install + build.
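
The BFS crawl with a page cap can be sketched as below. This is a simplified illustration, not the code in crawl.ts: crawlBfs and getLinks are hypothetical names, getLinks stands in for fetching a page and extracting its links, the same-origin restriction is an assumption, and robots.txt checks and caching are left out for brevity.

```typescript
// Breadth-first crawl that stops once maxPages pages have been visited.
// `getLinks` is a hypothetical stand-in for "fetch the page and extract its links".
async function crawlBfs(
  startUrl: string,
  getLinks: (url: string) => Promise<string[]>,
  maxPages = 50,
): Promise<string[]> {
  const start = new URL(startUrl).href;
  const origin = new URL(start).origin;
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  const order: string[] = [];

  while (queue.length > 0 && order.length < maxPages) {
    const url = queue.shift() as string;
    order.push(url);
    for (const link of await getLinks(url)) {
      const absolute = new URL(link, url).href; // resolve relative links
      if (new URL(absolute).origin !== origin) continue; // assumption: stay on the audited site
      if (visited.has(absolute)) continue;
      visited.add(absolute);
      queue.push(absolute);
    }
  }
  return order;
}
```

Because the queue is first-in, first-out, pages closer to the start URL are audited before deeper ones, which is usually what you want when the cap cuts the crawl short.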

Real-world calibration

Phase 1's scoring thresholds were originally calibrated against the plan's own test fixtures. After dogfooding against three brands — Spiceshelf 63, SetOff 55, KinGrove 49 — geo-citability was restructured. The original single geo.js-rendered finding was split into:

  • geo.no-static-render — the raw HTML is empty, but the JS render saves it (high severity).
  • geo.thin-content-rendered — sparse content even after JS render (critical or high by length).

This is the kind of detail that only surfaces by running the toolkit against sites you understand and reading the results carefully. The geo-schema per-type weight is still a first guess and is the next thing to calibrate.
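
To make the split concrete, here is a minimal sketch of how the two findings might be distinguished from the word counts of the static and rendered text. The finding ids and severities come from the list above; the function names and both word-count thresholds are assumptions chosen for illustration, not the calibrated values.

```typescript
type Severity = "critical" | "high";

interface GeoFinding {
  id: string;
  severity: Severity;
}

// Illustrative thresholds only; the toolkit's calibrated values are not shown here.
const MIN_WORDS = 150; // below this, content counts as thin
const CRITICAL_WORDS = 50; // below this after JS render, severity is critical

function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// staticText: visible text from the raw HTML; renderedText: visible text after JS render.
function classifyRender(staticText: string, renderedText: string): GeoFinding | null {
  const staticWords = countWords(staticText);
  const renderedWords = countWords(renderedText);
  if (renderedWords < MIN_WORDS) {
    // Sparse even after rendering: severity depends on how short the content is.
    return {
      id: "geo.thin-content-rendered",
      severity: renderedWords < CRITICAL_WORDS ? "critical" : "high",
    };
  }
  if (staticWords < MIN_WORDS) {
    // Raw HTML is (near) empty, but the JS render recovers the content.
    return { id: "geo.no-static-render", severity: "high" };
  }
  return null; // content is present without JavaScript
}
```

Checking the rendered text first means a page that is thin in both states is reported once, under the more serious finding.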

Current status

  • Phase 1 shipped. Three packages, seven collectors, three skills, 45 unit tests, CI green, NAS-backed report storage, system Chrome integration.
  • Real-world calibration pass done against three brands; geo-citability restructured as a result.
  • Phase 2 stubs in place: DataForSEO and real AI-platform citation checking sit in packages/core/src/providers/, off by default.

What I would do next

  • Calibrate geo-schema weights against more real brand data.
  • Fix the trailing-slash duplicate-title finding in crawl.ts: host and host/ are treated as separate URLs today, which inflates the duplicate-titles report when both are reachable.
  • Wire the Phase 2 paid providers behind a --paid (or per-env) flag, defaulting off.
  • Expand the brand list and run a scheduled monthly audit, with diffs from the prior month.
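
For the trailing-slash item above, one possible approach is to normalize every URL before it is used as a crawl key. This is a sketch under assumptions, not the planned fix: normalizeCrawlUrl is a hypothetical helper, and stripping trailing slashes from non-root paths assumes the server serves the same page at both forms, which is not true everywhere.

```typescript
// Collapse `host` and `host/`, and `/path/` and `/path`, into a single crawl key.
function normalizeCrawlUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = ""; // fragments are never sent to the server
  // The URL parser already turns an empty path into "/"; only trim deeper paths.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.href;
}
```

Using the normalized string for the visited set and for grouping titles would keep a page reachable at both forms from being counted as two pages with a duplicate title.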

Proof

  • Repo: spirix/seo-agent on self-hosted Forgejo.
  • Live skills: /seo-audit, /geo-audit, /full-audit available in Claude Code on the dev environment.
  • An example report.md and the crawl-flow diagram are still being prepared for this page; they will land here without changing the status above.