
seobuild-verify

Automated fact-checking for AI-generated SEO pages. Resolves {{VERIFY}}, {{RESEARCH NEEDED}}, {{SOURCE NEEDED}}, and {{MANUAL CHECK}} tags with real, sourced data -- so nothing publishes with a guess in it. Any uppercase-label tag ({{FACT CHECK}}, {{CITATION NEEDED}}, etc.) also works.

Built to run as the final step in the SEO-AGI pipeline, or standalone on any page that uses the verification tag format.


The Problem

LLMs are confident liars. Ask one to write a page about airport parking and it will invent a $20/day rate that hasn't been accurate since 2019. SEO-AGI solves this by refusing to commit -- it writes the page but wraps every specific claim in a verification tag:

The garage daily rate is {{VERIFY: $20 | County Parking Rates PDF}}.

That tag is a contract: "I think this is right, here's where to check, but don't publish until someone confirms it."

seobuild-verify is the "someone." It reads every tag, searches for the real source, and either confirms the claim, corrects it, or flags it for a human.


Where This Fits

This is one piece of a four-stage pipeline for generating SEO content that ranks on Google and gets cited by LLMs.

                        THE PIPELINE

  1. KEYWORD DISCOVERY          2. CONTENT ANGLE PREDICTION
  google-keyword-planner        swarm-mcp
  gbessoni/keyword-seo-agent    Content Angle Swarm Service
  ──────────────────────        ───────────────────────────
  Find keywords scored by       25 AI personas debate which
  Estimated Monthly Revenue     content angle will resonate.
  (EMR), not just volume.       Predicts engagement before
  Intent classification,        a single word is written.
  AI visibility scoring,        Outputs: winning angle,
  hyper-local expansion.        audience fit, risk factors.
          │                               │
          └──────────┐    ┌───────────────┘
                     ▼    ▼
          3. PAGE GENERATION
          seo-agi (v1.3+)
          gbessoni/seo-agi
          ──────────────────
          Generates full SEO pages using
          live SERP data, 500-token chunk
          architecture, Reddit Test quality
          gate, and 34-point checklist.

          Output: publish-ready HTML with
          {{VERIFY}}, {{RESEARCH NEEDED}},
          and {{SOURCE NEEDED}} tags on
          every unconfirmed claim.
                     │
                     ▼
          4. CLAIM VERIFICATION    ◄── YOU ARE HERE
          seobuild-verify
          gbessoni/seobuild-verify
          ────────────────────────
          Searches real sources for every
          tagged claim. Confirms, corrects,
          or flags for manual review.
          Replaces tags inline with verified
          data and source URLs.

Each stage is independent. You can use seobuild-verify on any HTML/Markdown file that contains verification tags, even if it wasn't generated by SEO-AGI.


How SEO-AGI Works (and Why Verification Tags Exist)

SEO-AGI is a Claude Code skill that generates pages designed to rank on Google AND get cited by AI assistants (ChatGPT, Perplexity, Gemini, Claude). It is not a template filler. It runs live competitive intelligence before writing a single word.

What makes it different

Real data in, real content out. Before writing, SEO-AGI pulls SERP data via DataForSEO (or Ahrefs/SEMRush MCP), analyzes the top 10 competitors, extracts People Also Ask questions, and identifies content gaps. The page it produces is informed by what actually ranks, not by what an LLM thinks should rank.

The 500-token chunk architecture. Google's AI retrieval works in ~500-token chunks. Every H2 section is a self-contained answer to a real search query. The strongest content goes in the first three chunks, not buried at the bottom. This is how pages get into AI Overviews.

34-point quality checklist. Every page is scored before delivery. Among the checks:

  • Does it pass the Reddit Test? (Would a knowledgeable practitioner upvote it, or call it AI slop?)
  • Core answer in the first 150 words?
  • Contains 2+ hard operational facts with specific numbers?
  • Real HTML comparison tables, not bullet lists?
  • "Not For You" block with honest negative recommendations?
  • All specific numbers tagged with {{VERIFY}}?
  • Entity Consensus validation -- every claim checked against 2+ sources?
  • Original research or first-hand data experiment included?

Pages scoring below 27/34 get revised before delivery.

The verification tag system

SEO-AGI is forbidden from inventing statistics. Instead, it inserts tags:

{{VERIFY}}
    When used: Any specific price, rate, capacity, schedule, distance, or operational claim.
    Example:   {{VERIFY: Garage daily rate $20 | County Parking Rates PDF}}

{{RESEARCH NEEDED}}
    When used: A section that needs hard data the agent couldn't find.
    Example:   {{RESEARCH NEEDED: Garage total capacity | check master plan PDF}}

{{SOURCE NEEDED}}
    When used: A claim that needs a traceable citation before publishing.
    Example:   {{SOURCE NEEDED: shuttle frequency | check ground transportation page}}

{{MANUAL CHECK}} (v1.1.0)
    When used: A claim that cannot be machine-verified: subjective, local-knowledge, time-sensitive, or already failed to resolve. It serves both as an input (inserted deliberately, with tried: by-design) and as an output (written after auto-verification is exhausted).
    Example:   {{MANUAL CHECK: terminal pickup is faster than ride-share queue | tried: by-design (experiential)}}

Every tag SEO-AGI writes includes the claim and a suggested source (or, for MANUAL CHECK, a tried: note). A typical SEO-AGI page has 10-25 tags. Publishing with tags still in the HTML is the SEO equivalent of shipping with TODO comments in production.

{{MANUAL CHECK}} tags are re-parsed and re-attempted on every run. The tried: notes are appended across runs (separated by ;), never overwritten -- so a tag that's been retried 3 times shows the cumulative search history. After two failed runs, it surfaces in the Manual Follow-ups Required section of the verification report. See SKILL.md Section 1 for full semantics.
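The append-not-overwrite rule for tried: notes can be sketched as below. This is a minimal illustration, assuming notes are joined with "; "; the helpers append_tried_note and render_manual_check are hypothetical names, not functions from scripts/verify.py.

```python
from typing import Optional


def append_tried_note(existing: Optional[str], new_note: str) -> str:
    """Return a `tried:` field with new_note appended after any prior notes."""
    prior = ""
    if existing:
        prior = existing.strip()
        # Drop the leading "tried:" label so it is not repeated.
        if prior.lower().startswith("tried:"):
            prior = prior[len("tried:"):].strip()
    # Earlier notes are kept in order; the new note goes last.
    notes = [n.strip() for n in prior.split(";") if n.strip()]
    notes.append(new_note.strip())
    return "tried: " + "; ".join(notes)


def render_manual_check(claim: str, tried: str) -> str:
    # Concatenation avoids escaping literal braces in an f-string.
    return "{{MANUAL CHECK: " + claim + " | " + tried + "}}"


run1 = append_tried_note(None, "searched county master plan PDF")
run2 = append_tried_note(run1, "airport website, no capacity data found")
print(render_manual_check("Garage total capacity", run2))
```

Because each run only appends, the final tag carries the full search history from every earlier attempt.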


What seobuild-verify Does

1. Parse

Extracts every verification tag from the target file(s) using scripts/verify.py. Skips tags inside code blocks and <pre> elements. Groups tags by suggested source to minimize redundant lookups.
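A minimal sketch of the parse step, for illustration only. It reuses the regex from the Tag Format Reference section and assumes "code blocks" means <pre> and <code> elements plus triple-backtick fences. extract_tags and group_by_source are hypothetical names; the real logic lives in scripts/verify.py and may differ.

```python
import re
from collections import defaultdict

# Tag pattern documented in the Tag Format Reference section.
TAG_RE = re.compile(r"\{\{([A-Z][A-Z0-9 _\-]*?):\s*(.+?)\s*(?:\|\s*(.+?)\s*)?\}\}")

# Assumed skip regions: <pre>, <code>, and triple-backtick fences.
SKIP_RE = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>|`{3}.*?`{3}",
                     re.DOTALL | re.IGNORECASE)


def extract_tags(text):
    """Return (label, claim, source) tuples found outside skipped regions."""
    visible = SKIP_RE.sub("", text)
    return [m.groups() for m in TAG_RE.finditer(visible)]


def group_by_source(tags):
    """Group claims by suggested source so each source is looked up once."""
    groups = defaultdict(list)
    for label, claim, source in tags:
        groups[source or "(no source)"].append((label, claim))
    return dict(groups)


page = """
<p>Daily rate {{VERIFY: $20 | County Parking Rates PDF}}.</p>
<p>Valet {{VERIFY: $35 | County Parking Rates PDF}}.</p>
<pre>Example only: {{VERIFY: $99 | ignored}}</pre>
<p>Shuttle {{VERIFY: every 10 minutes}}.</p>
"""
for source, claims in group_by_source(extract_tags(page)).items():
    print(source, claims)
```

The two tags pointing at the same PDF end up in one group, and the tag inside <pre> is never returned.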

2. Search

For each tag, runs a 5-step resolution protocol:

Step  Tool            When
1     WebSearch       Always first. Searches for the suggested source document, prioritizing .gov, .edu, and .org domains.
2     WebFetch        When Step 1 finds a promising URL. Pulls the page and extracts the specific data point.
3     DataForSEO      When the source is a competitor page or involves content-structure analysis.
4     Firecrawl       When the source is JavaScript-heavy or needs deeper scraping.
5     Broader search  When Steps 1-4 fail. Tries alternative phrasings, site-scoped searches, and multiple corroborating sources.

Stops at the first step that produces a confident answer.
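The stop-at-first-confident-answer loop can be sketched as below. Finding, resolve, and the stub step functions are hypothetical; the stubs return canned values in place of real WebSearch or WebFetch calls, and modeling "confident" as a plain boolean is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Finding:
    value: str        # data point extracted from the source
    url: str          # fetchable URL where the value appears
    confident: bool   # whether this step treats the match as conclusive


# A step takes (claim, suggested_source) and returns a Finding or None.
StepFn = Callable[[str, Optional[str]], Optional[Finding]]


def resolve(claim: str, source_hint: Optional[str],
            steps: List[Tuple[str, StepFn]]) -> Tuple[Optional[Finding], List[str]]:
    """Run steps in order and stop at the first confident finding."""
    tried: List[str] = []
    for name, step in steps:
        tried.append(name)
        finding = step(claim, source_hint)
        if finding is not None and finding.confident:
            return finding, tried
    return None, tried  # nothing conclusive: caller marks the tag UNVERIFIED


# Stub steps for illustration only.
def web_search(claim, hint):
    return Finding("$25", "https://example.com/rates", confident=False)


def web_fetch(claim, hint):
    return Finding("$25", "https://example.com/rates", confident=True)


def data_for_seo(claim, hint):
    raise AssertionError("not reached: WebFetch already answered")


finding, tried = resolve("$20", "County Parking Rates PDF",
                         [("WebSearch", web_search), ("WebFetch", web_fetch),
                          ("DataForSEO", data_for_seo)])
print(tried)  # ['WebSearch', 'WebFetch']
print(finding.value, finding.url)
```

Returning the list of tried steps alongside the result is what lets an unresolved tag carry a meaningful tried: note.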

3. Resolve

Every tag gets exactly one of three outcomes:

CONFIRMED   The claim matches a fetchable source.
            Tag replaced with verified data + source URL in an HTML comment.

CORRECTED   The topic is right but the number is wrong.
            Tag replaced with corrected data + source URL + correction note.

UNVERIFIED  No reliable source found after all 5 steps.
            Tag replaced with {{MANUAL CHECK}} so a human knows what to do.

4. Replace

Edits the file inline:

<!-- CONFIRMED -->
Before: The garage daily rate is {{VERIFY: $20 | County Parking Rates PDF}}.
After:  The garage daily rate is $20<!-- source: https://broward.org/airport/parking -->.

<!-- CORRECTED -->
Before: The garage daily rate is {{VERIFY: $20 | County Parking Rates PDF}}.
After:  The garage daily rate is $25<!-- source: https://broward.org/airport/parking | corrected from: $20 -->.

<!-- UNVERIFIED -->
Before: {{RESEARCH NEEDED: Garage total capacity | check master plan PDF}}
After:  {{MANUAL CHECK: Garage total capacity | tried: searched county master plan PDF, airport website, no capacity data found}}

Always creates a .pre-verify backup before editing.
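A minimal sketch of the replace step, mirroring the Before/After examples above. render and replace_in_file are hypothetical names, and the backup naming (<file>.pre-verify next to the original) is an assumption about how the .pre-verify copy is stored.

```python
import shutil
import tempfile
from pathlib import Path


def render(outcome: str, claim: str, value: str = "", url: str = "",
           original: str = "", tried: str = "") -> str:
    """Build the text that replaces a tag for each of the three outcomes."""
    if outcome == "CONFIRMED":
        return f"{value}<!-- source: {url} -->"
    if outcome == "CORRECTED":
        return f"{value}<!-- source: {url} | corrected from: {original} -->"
    if outcome == "UNVERIFIED":
        # Concatenation avoids escaping literal braces in an f-string.
        return "{{MANUAL CHECK: " + claim + " | tried: " + tried + "}}"
    raise ValueError(f"unknown outcome: {outcome}")


def replace_in_file(path: Path, tag_text: str, replacement: str) -> None:
    """Back up the file once, then replace every occurrence of tag_text."""
    backup = path.with_name(path.name + ".pre-verify")  # assumed naming
    if not backup.exists():
        shutil.copy2(path, backup)
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace(tag_text, replacement), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"
    tag = "{{VERIFY: $20 | County Parking Rates PDF}}"
    page.write_text(f"The garage daily rate is {tag}.", encoding="utf-8")
    new = render("CONFIRMED", "$20", value="$20",
                 url="https://broward.org/airport/parking")
    replace_in_file(page, tag, new)
    print(page.read_text(encoding="utf-8"))
```

Because str.replace rewrites every occurrence, a claim that appears several times on the page is updated everywhere in one pass.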

5. Report

Outputs a verification report with counts, a resolution table, all sources used, and a list of manual follow-ups:

Verification Report
─────────────────────────────────────────
File: fll-airport-parking.html
Total tags: 14
Confirmed:  9 (64%)
Corrected:  3 (21%)
Unverified: 2 (14%)

Sources Used:
  https://broward.org/airport/parking -- daily rates, valet pricing
  https://faa.gov/air_traffic/... -- 2023 passenger volume
  ...

Manual Follow-ups:
  - Garage total capacity (no data in master plan PDF)
  - Peak season occupancy % (BCAD annual report not publicly available)
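The count lines of the report can be reproduced with a short function. summarize is a hypothetical name, and rounding each share to the nearest whole percent is an assumption that happens to match the sample report above.

```python
from collections import Counter
from typing import List


def summarize(outcomes: List[str]) -> str:
    """Render the count lines of a verification report."""
    counts = Counter(outcomes)
    total = len(outcomes)
    lines = [f"Total tags: {total}"]
    for name in ("Confirmed", "Corrected", "Unverified"):
        n = counts[name.upper()]
        pct = round(100 * n / total) if total else 0
        label = name + ":"
        lines.append(f"{label:<11} {n} ({pct}%)")
    return "\n".join(lines)


outcomes = ["CONFIRMED"] * 9 + ["CORRECTED"] * 3 + ["UNVERIFIED"] * 2
print(summarize(outcomes))
```

Rounding each line independently is why the sample percentages add up to 99% (64 + 21 + 14) rather than 100%.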

Installation

Claude Code / Codex

For Codex, clone into your skills directory:

git clone https://github.com/gbessoni/seobuild-verify.git ~/.codex/skills/seobuild-verify

Or for Claude Code:

git clone https://github.com/gbessoni/seobuild-verify.git ~/.claude/skills/seobuild-verify

No dependencies. The parser script is pure Python 3.8+ stdlib.

Verify it works

python3 ~/.codex/skills/seobuild-verify/scripts/verify.py summary ~/Documents/SEO-AGI/pages/

Usage

As a Claude Code skill

/seobuild-verify ~/Documents/SEO-AGI/pages/fll-parking.html

Or after generating a page:

/seo-agi write a page for "FLL airport parking"
/seobuild-verify

When invoked without arguments, it scans ~/Documents/SEO-AGI/pages/ and ~/Documents/SEO-AGI/rewrites/ for files containing unresolved tags.

Batch mode

/seobuild-verify ~/Documents/SEO-AGI/pages/*.html

Parser script standalone

# Extract all tags as JSON
python3 scripts/verify.py parse path/to/page.html

# Quick count summary
python3 scripts/verify.py summary path/to/page.html

# Apply replacements from a JSON file
python3 scripts/verify.py replace path/to/page.html replacements.json

Tag Format Reference

The parser recognizes any uppercase-label tag via this regex:

\{\{([A-Z][A-Z0-9 _\-]*?):\s*(.+?)\s*(?:\|\s*(.+?)\s*)?\}\}

Breaking it down:

{{TAG_TYPE: claim text | suggested source}}
  │          │            │
  │          │            └─ Optional. Where to look for confirmation
  │          │               (or, for MANUAL CHECK, a `tried:` note).
  │          └────────────── Required. The specific claim to verify.
  └───────────────────────── Required. Any uppercase label: VERIFY,
                              RESEARCH NEEDED, SOURCE NEEDED, MANUAL CHECK,
                              FACT CHECK, CITE, CITATION NEEDED, etc.

The parser also skips an allowlist of non-verification labels: TOC, TABLE OF CONTENTS, INCLUDE, TEMPLATE. If you use other custom tags, add their labels to this list in scripts/verify.py.

The pipe | separator is optional. Tags without a suggested source are valid:

{{VERIFY: shuttle runs every 10 minutes}}
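The capture groups and the label allowlist can be exercised directly with Python's re module. SKIP_LABELS and verification_tags are hypothetical names for illustration; the actual allowlist lives in scripts/verify.py.

```python
import re

TAG_RE = re.compile(r"\{\{([A-Z][A-Z0-9 _\-]*?):\s*(.+?)\s*(?:\|\s*(.+?)\s*)?\}\}")

# Labels that match the pattern but are not verification tags.
SKIP_LABELS = {"TOC", "TABLE OF CONTENTS", "INCLUDE", "TEMPLATE"}


def verification_tags(text):
    """Return (label, claim, source) for each non-allowlisted tag."""
    found = []
    for m in TAG_RE.finditer(text):
        label, claim, source = m.group(1), m.group(2), m.group(3)
        if label.strip() in SKIP_LABELS:
            continue
        found.append((label, claim, source))  # source is None without a pipe
    return found


sample = (
    "{{VERIFY: $20 | County Parking Rates PDF}} "
    "{{VERIFY: shuttle runs every 10 minutes}} "
    "{{INCLUDE: footer.html}} "
    "{{FACT CHECK: claim text | suggested source}}"
)
for tag in verification_tags(sample):
    print(tag)
```

Bare mentions such as {{VERIFY}} have no colon, so the pattern does not match them at all.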

Edge Cases

Situation                           Behavior
Tag inside an HTML attribute        Flagged UNVERIFIED with a note. Not safe to replace inline.
Tag inside <pre> or code block      Skipped entirely. These are examples, not real tags.
Same claim appears multiple times   Resolved once, applied to all instances.
Source URL returns 404              Noted in the report. Alternative sources are tried.
Two sources disagree on the data    Marked UNVERIFIED with both sources cited. A human decides.
File has 50+ tags                   Processed in batches of 20, with progress saved after each batch.
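The deduplication and batching rows can be sketched together. This assumes "same claim" means identical claim text and that progress is checkpointed to a JSON file; process, dedupe_by_claim, batches, and the progress-file format are all hypothetical, and fake_resolve is a stub standing in for the real search protocol.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Tuple

Tag = Tuple[str, str, str]  # (label, claim, source)


def dedupe_by_claim(tags: List[Tag]) -> List[Tag]:
    """Keep the first tag for each distinct claim, preserving order."""
    unique: Dict[str, Tag] = {}
    for tag in tags:
        unique.setdefault(tag[1].strip(), tag)
    return list(unique.values())


def batches(items: List[Tag], size: int) -> Iterator[List[Tag]]:
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process(tags: List[Tag], resolve_batch: Callable[[List[Tag]], Dict[str, str]],
            progress: Path, size: int = 20) -> Dict[str, str]:
    """Resolve unique claims in batches, saving results after each batch."""
    done: Dict[str, str] = json.loads(progress.read_text()) if progress.exists() else {}
    pending = [t for t in dedupe_by_claim(tags) if t[1].strip() not in done]
    for batch in batches(pending, size):
        done.update(resolve_batch(batch))
        progress.write_text(json.dumps(done, indent=2))  # checkpoint
    return done


# Demo: 50 distinct claims plus one duplicate.
tags = [("VERIFY", f"claim {i}", "") for i in range(50)] + [("VERIFY", "claim 0", "")]
calls: List[int] = []


def fake_resolve(batch: List[Tag]) -> Dict[str, str]:
    calls.append(len(batch))
    return {t[1].strip(): "UNVERIFIED" for t in batch}


with tempfile.TemporaryDirectory() as tmp:
    result = process(tags, fake_resolve, Path(tmp) / "progress.json")
print(calls)        # [20, 20, 10]
print(len(result))  # 50
```

Because finished claims are skipped on the next call, an interrupted run can resume from the last saved batch instead of starting over.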

Safety Guarantees

  • Never fabricates a source URL
  • Never marks CONFIRMED without a fetchable URL containing the data
  • Never silently drops a tag -- every tag appears in the report
  • Always backs up the original file before editing
  • Files with 50+ tags are batched to avoid context overflow

File Structure

seobuild-verify/
  SKILL.md              Skill definition (agent instructions)
  scripts/
    verify.py           Tag parser + replacement engine

Related Projects

keyword-seo-agent   (gbessoni/keyword-seo-agent)
    Keyword research scored by Estimated Monthly Revenue. Intent classification, AI visibility scoring, hyper-local expansion.

seo-agi   (gbessoni/seo-agi)
    Page generation with live SERP data, 500-token chunks, Reddit Test, 34-point checklist. Produces the verification tags this skill resolves.

swarm-mcp   (in development)
    Content angle prediction via 25-persona AI debate simulation. Predicts which angle resonates before writing.

License

MIT

About

Verification agent skill for SEO-AGI: resolves {{VERIFY}}, {{RESEARCH NEEDED}}, and {{SOURCE NEEDED}} tags with real sources
