🔄 Complete Keyword Generation Pipeline Flow

📊 Full Pipeline Overview

┌─────────────────────────────────────────────────────────────────┐
│                    KEYWORD GENERATION PIPELINE                   │
└─────────────────────────────────────────────────────────────────┘

STEP 1: DEEP RESEARCH (60-90s)
├─ Reddit discussions
├─ Quora questions  
├─ Forum posts
└─ Output: Hyper-niche keywords from real user discussions

STEP 2: SE RANKING GAP ANALYSIS (if enabled)
├─ Analyzes company URL vs competitors
├─ Finds keywords competitors rank for but company doesn't
├─ Provides volume & difficulty data
└─ Output: Gap keywords with SEO metrics

STEP 3: AI KEYWORD GENERATION
├─ Gemini generates keywords based on company context
├─ Multiple parallel batches
└─ Output: AI-generated keywords

STEP 3.5: HYPER-NICHE VARIATIONS
├─ Adds geo/industry/size targeting
├─ Long-tail variations (4+ words)
└─ Output: Hyper-niche long-tail keywords

STEP 4: KEYWORD SCORING
├─ AI scores each keyword for company-fit (0-100)
└─ Output: Scored keywords

STEP 5: SEMANTIC DEDUPLICATION
├─ AI removes near-duplicates
└─ Output: Unique keywords

STEP 6: FILTERING
├─ Filter by min_score
├─ Filter by min_word_count
└─ Output: Filtered keywords

STEP 7: CLUSTERING
├─ AI groups keywords into semantic clusters
└─ Output: Clustered keywords

STEP 8: SERP ANALYSIS (45-60s) ← AFTER RESEARCH
├─ Analyzes top 15 keywords
├─ Gets AEO opportunity scores
├─ Detects featured snippets
├─ Detects PAA (People Also Ask)
├─ Extracts bonus keywords from PAA
└─ Output: SERP-enriched keywords + bonus keywords

STEP 9: LIMIT TO TARGET COUNT
└─ Output: Final 30 keywords

STEP 10: VOLUME LOOKUP (optional)
├─ DataForSEO Keywords Data API
└─ Output: Exact search volumes
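Steps 6 and 9 are the most mechanical steps in the overview. The sketch below shows one way to express them. It assumes a hypothetical `Keyword` record with `text` and `score` fields, and it assumes the limit keeps the highest-scoring keywords. The real `min_score` and `min_word_count` defaults are not stated here, and the target of 30 only mirrors the example output.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    # Hypothetical record; the pipeline's real keyword model may differ.
    text: str
    score: int  # company-fit score from Step 4 (0-100)


def filter_keywords(keywords, min_score, min_word_count):
    """Step 6: drop keywords below the score or word-count thresholds."""
    return [
        kw for kw in keywords
        if kw.score >= min_score and len(kw.text.split()) >= min_word_count
    ]


def limit_to_target(keywords, target_count=30):
    """Step 9 (assumed ordering): keep the top `target_count` by score."""
    return sorted(keywords, key=lambda kw: kw.score, reverse=True)[:target_count]


if __name__ == "__main__":
    pool = [
        Keyword("chatgpt ranking report", 85),
        Keyword("seo", 90),
        Keyword("how to track chatgpt rankings", 72),
        Keyword("ai search visibility tools for agencies", 40),
    ]
    kept = filter_keywords(pool, min_score=50, min_word_count=3)
    print([kw.text for kw in limit_to_target(kept, target_count=2)])
    # ['chatgpt ranking report', 'how to track chatgpt rankings']
```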

🔍 Where Each Component Fits

1. SE Ranking (Gap Analysis) - STEP 2

When: Early in pipeline (after research, before AI generation)

What it does:

  • Analyzes the company's website against competitors' sites
  • Finds keywords competitors rank for but company doesn't
  • Provides search volume and difficulty scores
  • Identifies content gaps

Code location:

# Step 2: SE Ranking gap analysis (if available and company has URL)
if self.seranking_client and company_info.url:
    gap_keywords = await self._get_gap_keywords(company_info, config)

Output:

  • Keywords with source: "gap_analysis"
  • Includes volume and difficulty from SE Ranking
  • Pre-scored with aeo_score

Requirements:

  • SERANKING_API_KEY environment variable
  • Company must have url in CompanyInfo
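At its core, the gap idea is a set difference: keep the keywords that at least one competitor ranks for and the company does not. The sketch below assumes the SE Ranking results have already been fetched into plain dicts. The function name and the `keyword`, `volume`, and `difficulty` keys are hypothetical and do not reflect SE Ranking's actual response schema.

```python
def find_gap_keywords(company_keywords, competitor_results):
    """Return keywords competitors rank for but the company does not.

    company_keywords: iterable of keyword strings the company ranks for.
    competitor_results: dict mapping competitor domain -> list of dicts
        with hypothetical keys "keyword", "volume", "difficulty".
    """
    owned = {kw.strip().lower() for kw in company_keywords}
    gaps = {}
    for rows in competitor_results.values():
        for row in rows:
            key = row["keyword"].strip().lower()
            if key in owned:
                continue  # company already ranks for it: not a gap
            # Keep the first competitor row seen for each keyword.
            gaps.setdefault(key, {
                "keyword": key,
                "volume": row["volume"],
                "difficulty": row["difficulty"],
                "source": "gap_analysis",
            })
    # Highest-volume gaps first.
    return sorted(gaps.values(), key=lambda g: g["volume"], reverse=True)
```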

2. Google Trends - ❌ NOT CURRENTLY INTEGRATED

Status: ⚠️ EXISTS BUT NOT USED

What exists:

  • GoogleTrendsAnalyzer class in google_trends_analyzer.py
  • Can analyze trends, rising queries, seasonality
  • But it's NOT called in the main generation flow

What it would do if integrated:

  • Analyze trending keywords
  • Find rising queries (trending up)
  • Detect seasonality patterns
  • Provide relative volume (0-100 scale)

Where it SHOULD fit:

STEP 1.5: GOOGLE TRENDS ANALYSIS (if enabled)
├─ Analyze generated keywords for trends
├─ Find rising queries
├─ Detect seasonality
└─ Output: Trend-enriched keywords

To enable:

  • Add enable_trends: bool to GenerationConfig
  • Call GoogleTrendsAnalyzer after keyword generation
  • Enrich keywords with trend data

Current status: Available but not integrated into pipeline
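The three "To enable" items could be wired together roughly as follows. In this hedged sketch, `GenerationConfig` shows only the proposed `enable_trends` field. The real method names of `GoogleTrendsAnalyzer` are not documented here, so the sketch stands in a hypothetical `TrendSource` protocol with an `interest(keyword)` method that returns a 0-100 relative score and a rising flag. Seasonality is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class GenerationConfig:
    # Only the proposed flag is shown; the real config has more fields.
    enable_trends: bool = False


class TrendSource(Protocol):
    # Hypothetical interface standing in for GoogleTrendsAnalyzer.
    def interest(self, keyword: str) -> tuple[int, bool]:
        """Return (relative interest 0-100, is_rising)."""
        ...


def enrich_with_trends(keywords, config: GenerationConfig, analyzer: TrendSource):
    """Attach trend data to keyword dicts when enable_trends is set."""
    if not config.enable_trends:
        return keywords  # step is a no-op unless explicitly enabled
    for kw in keywords:
        relative, rising = analyzer.interest(kw["keyword"])
        kw["trend_interest"] = relative  # Google Trends-style 0-100 scale
        kw["trend_rising"] = rising
    return keywords
```

Injecting the analyzer keeps the step testable without network calls, and a no-op default means existing runs behave the same until the flag is turned on.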


3. SERP Analysis - STEP 8 (AFTER RESEARCH)

When: Late in pipeline (after clustering, before final limit)

What it does:

  1. Analyzes top 15 keywords (configurable via serp_sample_size)
  2. Gets AEO opportunity scores (0-100)
    • How good is this keyword for Answer Engine Optimization?
    • Based on SERP features (snippets, PAA, etc.)
  3. Detects featured snippets
    • Is there a featured snippet in SERP?
    • Opportunity to win "position zero" (#0)
  4. Detects PAA (People Also Ask)
    • Are there "People Also Ask" questions?
    • Indicates question-based search intent
  5. Extracts bonus keywords from PAA
    • Finds related questions from PAA sections
    • Adds them as bonus keywords

Code location:

# Step 8: SERP Analysis (if enabled) - enriches with AEO scores
if config.enable_serp_analysis:
    serp_analyses, bonus_keywords = await self._analyze_serp(
        all_keywords, config
    )

Output enrichments:

{
  "aeo_opportunity": 80,        // ← AEO score (0-100)
  "has_featured_snippet": false, // ← Featured snippet detected?
  "has_paa": true,              // ← PAA section detected?
  "serp_analyzed": true         // ← Was this keyword analyzed?
}
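Only the sampled keywords receive real SERP values. The other keywords can be marked `serp_analyzed: false`, which lets downstream code tell "not analyzed" apart from "no snippet". The sketch below shows that merge. It assumes keywords are plain dicts and that the per-keyword analyses arrive as a dict keyed by keyword text. Both are assumptions, and `apply_serp_enrichment` is a hypothetical name.

```python
def apply_serp_enrichment(keywords, analyses):
    """Merge SERP fields into keyword dicts.

    analyses: hypothetical dict mapping keyword text -> dict with
        "aeo_opportunity", "has_featured_snippet", "has_paa".
    """
    for kw in keywords:
        result = analyses.get(kw["keyword"])
        if result is None:
            kw["serp_analyzed"] = False  # outside the SERP sample
            continue
        kw.update(
            aeo_opportunity=result["aeo_opportunity"],
            has_featured_snippet=result["has_featured_snippet"],
            has_paa=result["has_paa"],
            serp_analyzed=True,
        )
    return keywords
```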

Bonus keywords:

  • Extracted from PAA sections
  • Added with source: "serp_paa"
  • Assigned a fixed placeholder score of 60 (proper scoring not yet implemented)
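Turning PAA questions into bonus keywords mostly means three things: normalizing each question, dropping anything already in the list, and tagging the result. The sketch below assumes keywords are dicts with `keyword`, `score`, and `source` keys (a hypothetical shape). Stripping the trailing "?" is also an assumption, chosen to match the example bonus keywords later in this document. The fixed score of 60 mirrors the placeholder noted above.

```python
PAA_PLACEHOLDER_SCORE = 60  # fixed until bonus keywords are scored properly


def paa_bonus_keywords(paa_questions, existing_keywords):
    """Convert PAA questions into new, de-duplicated bonus keywords."""
    seen = {kw["keyword"].strip().lower() for kw in existing_keywords}
    bonus = []
    for question in paa_questions:
        # Normalize: trim whitespace, drop trailing "?", lowercase.
        text = question.strip().rstrip("?").strip().lower()
        if not text or text in seen:
            continue
        seen.add(text)
        bonus.append({
            "keyword": text,
            "score": PAA_PLACEHOLDER_SCORE,
            "source": "serp_paa",
        })
    return bonus
```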

Why AFTER research:

  • Research finds keywords first
  • SERP analysis enriches them with SERP intelligence
  • Helps prioritize which keywords have best AEO opportunity
  • Extracts additional keywords from PAA

📊 Complete Data Flow

Input Sources:

  1. Deep Research → Reddit/Quora/Forums keywords
  2. SE Ranking → Gap analysis keywords (with volume/difficulty)
  3. AI Generation → Gemini-generated keywords
  4. Hyper-Niche → Long-tail variations

Enrichment Steps:

  1. Scoring → Company-fit scores (0-100)
  2. Clustering → Semantic groups
  3. SERP Analysis → AEO scores, snippets, PAA, bonus keywords

Output:

  • 30 keywords with full metadata
  • SERP enrichments (15 analyzed)
  • Clusters (6 groups)
  • Statistics
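This document does not specify which statistics are reported. As a hedged illustration, counts like those in the output list can be derived from the final keyword dicts. The sketch assumes hypothetical `source`, `cluster`, and `serp_analyzed` fields.

```python
from collections import Counter


def summarize(keywords):
    """Build simple summary statistics for the final keyword list."""
    return {
        "total_keywords": len(keywords),
        "serp_analyzed": sum(1 for kw in keywords if kw.get("serp_analyzed")),
        "clusters": len({kw["cluster"] for kw in keywords if kw.get("cluster")}),
        "by_source": dict(Counter(kw.get("source", "unknown") for kw in keywords)),
    }
```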

🔧 Current Integration Status

Component       Status              Location   When Called
Deep Research   ✅ Integrated       Step 1     First
SE Ranking      ✅ Integrated       Step 2     Early (if URL provided)
AI Generation   ✅ Integrated       Step 3     After research and gap analysis
Hyper-Niche     ✅ Integrated       Step 3.5   After AI generation
SERP Analysis   ✅ Integrated       Step 8     After clustering
Google Trends   ❌ Not integrated   N/A        Should be Step 1.5
Autocomplete    ❌ Not integrated   N/A        Should be Step 0.5

💡 What SERP Analysis Does After Research

Purpose:

Enriches keywords found by research with SERP intelligence

Process:

  1. Takes top 15 keywords (after scoring/clustering)
  2. For each keyword:
    • Queries Google Search (via Gemini)
    • Analyzes SERP features
    • Calculates AEO opportunity score
    • Detects featured snippets
    • Detects PAA sections
    • Extracts bonus keywords from PAA
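Each sampled keyword needs its own search call, so one plausible structure for this loop is bounded concurrency. The sketch below assumes a hypothetical async `fetch_serp(keyword)` that returns a dict with `aeo_opportunity`, `has_featured_snippet`, `has_paa`, and `paa_questions`. The actual Google Search query via Gemini is not shown. `sample_size` mirrors `serp_sample_size`, and the concurrency cap of 5 is arbitrary.

```python
import asyncio


async def analyze_top_keywords(keywords, fetch_serp, sample_size=15, max_concurrency=5):
    """Analyze the first `sample_size` keywords; return (analyses, paa_questions).

    keywords: list of keyword strings, already ordered by priority.
    fetch_serp: hypothetical async callable, keyword -> SERP feature dict.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def analyze(keyword):
        async with semaphore:  # cap simultaneous search calls
            return keyword, await fetch_serp(keyword)

    # gather() preserves input order, so results line up with the sample.
    results = await asyncio.gather(*(analyze(kw) for kw in keywords[:sample_size]))
    analyses = dict(results)
    paa_questions = [q for _, serp in results for q in serp.get("paa_questions", [])]
    return analyses, paa_questions
```

A semaphore keeps the request rate predictable while still overlapping network latency. In this sketch, any exception raised by `fetch_serp` propagates out of `gather()`.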

Why This Matters:

  • Research finds keywords → "client asking for chatgpt ranking report"
  • SERP analysis enriches → Tells you:
    • AEO opportunity: 80/100 (high!)
    • Has PAA: Yes (question intent)
    • Featured snippet: No (opportunity!)
    • Bonus keywords: ["how to track chatgpt rankings", "chatgpt ranking tools"]

Example:

Research finds: "client asking for chatgpt ranking report"
                ↓
SERP Analysis enriches:
  - AEO opportunity: 80/100
  - Has PAA: true
  - Featured snippet: false
  - Bonus keywords: ["how to track chatgpt rankings", ...]

🎯 SE Ranking Integration

What SE Ranking Provides:

  1. Gap Analysis

    • Keywords competitors rank for but company doesn't
    • Content opportunities
  2. Volume & Difficulty

    • Exact search volumes
    • SEO difficulty scores
    • Competition level

When It Runs:

  • Step 2 (early in pipeline)
  • Only if:
    • SERANKING_API_KEY is set
    • Company has url in CompanyInfo
    • Competitors are provided

Output:

  • Keywords with source: "gap_analysis"
  • Pre-scored with aeo_score from SE Ranking
  • Includes volume and difficulty
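The run conditions listed under "When It Runs" reduce to a simple guard. The sketch below assumes competitors arrive as a list of URLs, and `should_run_gap_analysis` is a hypothetical name. The environment variable name comes from this document. The Step 2 code excerpt in section 1 checks only the client and the URL, so the competitor check here follows the list rather than that excerpt.

```python
import os


def should_run_gap_analysis(company_url, competitors, env=None):
    """Return True only when all Step 2 preconditions hold."""
    env = os.environ if env is None else env  # injectable for testing
    has_key = bool(env.get("SERANKING_API_KEY"))
    return has_key and bool(company_url) and bool(competitors)
```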

📝 Summary

Current Pipeline:

  1. Deep Research (finds keywords)
  2. SE Ranking Gap Analysis (finds competitor gaps)
  3. AI Generation (fills remaining slots)
  3.5. Hyper-Niche Variations (adds long-tail)
  4. Scoring (company fit)
  5. Deduplication
  6. Filtering
  7. Clustering
  8. SERP Analysis ← Enriches with AEO scores, snippets, PAA
  9. Final limit (target count)
  10. Volume Lookup (optional)

Missing:

  • ❌ Google Trends (exists but not integrated)
  • ❌ Autocomplete (exists but not integrated)

SERP Analysis Purpose:

  • Enriches keywords with SERP intelligence
  • Calculates AEO opportunity
  • Finds bonus keywords from PAA
  • Helps prioritize which keywords to target

SE Ranking Purpose:

  • Finds content gaps vs competitors
  • Provides volume/difficulty data
  • Early in pipeline (Step 2)