Author: Mihai Criveti
The ultimate RSS feed parser and search server for the Model Context Protocol (MCP)
Advanced RSS feed parsing, searching, filtering, and statistical analysis server built with FastMCP. Features AI-powered semantic search, hybrid retrieval (BM25 + semantic), multi-schema podcast support, and comprehensive RSS analytics.
Warning: This is an unsupported sample server for demonstration and testing only. Never run untrusted MCP servers directly on your local filesystem — always use a sandbox, container, or microVM (e.g. Docker, gVisor, Firecracker) with restricted capabilities. Perform your own security evaluation before registering any remote MCP server, including servers from public catalogs.
- Quick Start
- Features
- Installation
- Quick Examples
- All Tools (25 Total)
- Advanced Topics
- JSON-RPC Usage
- Configuration
- Development & Testing
- Troubleshooting
- Example RSS Feeds
- Performance
- Contributing
- License
```bash
# Basic installation
cd mcp-servers/python/mcp-rss-search
make install

# With AI similarity and hybrid search (recommended!)
make install-similarity
```

```bash
# Stdio mode (for Claude Desktop, Cursor, etc.)
make dev

# HTTP mode (REST API on port 9100)
make serve-http
```

```python
import asyncio
from mcp_rss_search.server_fastmcp import rss_parser

async def main():
    # Fetch NPR News feed
    feed = await rss_parser.fetch_feed('https://feeds.npr.org/1001/rss.xml')
    print(f"Feed: {feed['metadata']['title']}")
    print(f"Entries: {feed['entry_count']}")

asyncio.run(main())
```

Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "rss-search": {
      "command": "python",
      "args": ["-m", "mcp_rss_search.server_fastmcp"]
    }
  }
}
```

- Hybrid Search: Combines BM25 (keyword) and semantic similarity for best results
- Document-Wide Search: Search across all fields automatically
- Configurable Models: Change embedding models at runtime (6+ models supported)
- Similarity Search: Find content by meaning, not just keywords using sentence embeddings
- Duplicate Detection: Identify duplicate or near-duplicate content semantically
- Related Content: Discover "read more like this" recommendations
- Topic Clustering: Automatically group entries into topic clusters
- BM25 + Semantic: Best of both worlds - exact matching and meaning-based search
- iTunes Podcasts: Full support for the `itunes:*` namespace (subtitle, summary, episode, season, etc.)
- Google Play: Support for the `googleplay:*` namespace
- Standard RSS/Atom: Universal RSS 2.0 and Atom feed support
- Smart Field Detection: Automatically extracts subtitles, summaries, descriptions across all formats
- Schema Inspector: Discover available fields in any feed
- Semantic Search in Any Field: Search subtitles, summaries, descriptions, or custom field combinations
- Title Search: Find entries by title with regex support
- Content Search: Search descriptions and full content
- Multi-Field Search: Search across titles, descriptions, authors, and categories
- Author/Speaker Search: Find all episodes by specific authors or podcast speakers
- Regex Support: Use powerful regular expressions for complex queries
- Case-Sensitive Options: Control search precision
- Feed Statistics: Comprehensive feed analytics (entry counts, date ranges, etc.)
- Author Analytics: Count and distribution of authors/speakers
- Category Analytics: Topic distribution and tag analysis
- Content Metrics: Average content length, total entries, etc.
- Media Analytics: Track audio/video content in feeds
- Date Range Filtering: Filter entries by publication date
- Latest Entries: Get N most recent entries
- Author Filtering: Find all content from specific authors
- Category Browsing: Explore content by tags and categories
- Automatic HTML Removal: Strips all HTML tags from content
- XML Noise Filtering: Clean, structured data without XML clutter
- Entity Decoding: Handles HTML entities correctly
- Smart Caching: Configurable caching for performance
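The cleaning steps above (tag removal, entity decoding) can be sketched in a few stdlib lines. This is a simplified illustration of the idea, not the server's actual implementation:

```python
import html
import re

def clean_html(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", "", raw)  # drop tags (adequate for short feed snippets)
    text = html.unescape(text)          # decode entities like &amp; and &#8217;
    return re.sub(r"\s+", " ", text).strip()

print(clean_html("<p>AI &amp; ML: <b>what&#8217;s next?</b></p>"))
```

A regex tag-stripper is fine for feed descriptions; a full HTML parser would be needed for arbitrary markup.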
```bash
cd mcp-servers/python/mcp-rss-search

# Basic installation (keyword search only)
make install

# With AI similarity and hybrid search
make install-similarity

# With development tools
make dev-install
```

```bash
# Basic installation
pip install -e .

# With AI similarity and hybrid search features
pip install -e ".[similarity]"

# Full installation (similarity + dev tools)
pip install -e ".[full]"

# With uv (faster)
uv pip install -e ".[similarity]"
```

Core Dependencies (always installed):

- `fastmcp` >= 3.0.2
- `pydantic` >= 2.5.0
- `feedparser` >= 6.0.0
- `httpx` >= 0.27.0
- `python-dateutil` >= 2.8.0

Similarity & Hybrid Search (optional, install with `[similarity]`):

- `sentence-transformers` >= 2.2.0 (~80MB model, auto-downloaded)
- `numpy` >= 1.24.0
- `scikit-learn` >= 1.3.0
- `rank-bm25` >= 0.2.2 (for hybrid search with BM25 keyword matching)
Environment Variables:
- `RSS_EMBEDDING_MODEL` - Set the default embedding model (default: `all-MiniLM-L6-v2`)
The server automatically detects and extracts fields from:
- iTunes Podcasts: subtitle, summary, episode, season, duration, explicit flag
- Google Play: author, description, image
- Standard RSS/Atom: All standard fields
```python
import asyncio
from mcp_rss_search.server_fastmcp import rss_parser

async def fetch_example():
    feed = await rss_parser.fetch_feed('https://feeds.npr.org/1001/rss.xml')
    print(f"Feed: {feed['metadata']['title']}")
    print(f"Entries: {feed['entry_count']}")

asyncio.run(fetch_example())
```

```python
async def search_example():
    feed = await rss_parser.fetch_feed('https://feeds.npr.org/1001/rss.xml')
    results = rss_parser.search_entries(feed, 'climate', fields=['title'])
    print(f"Found {len(results)} climate-related articles")
    for result in results[:3]:
        print(f"  - {result['title']}")

asyncio.run(search_example())
```

```python
from mcp_rss_search.server_fastmcp import rss_parser, similarity_engine

async def semantic_search_example():
    feed = await rss_parser.fetch_feed('https://feeds.npr.org/1001/rss.xml')
    # Finds articles about AI, machine learning, automation, etc.
    results = similarity_engine.similarity_search(
        query="artificial intelligence and automation",
        entries=feed['entries'],
        top_k=5,
        threshold=0.5
    )
    for result in results:
        print(f"{result['entry']['title']} (score: {result['similarity']:.2f})")

asyncio.run(semantic_search_example())
```

```python
from mcp_rss_search.server_fastmcp import rss_parser, hybrid_engine

async def hybrid_search_example():
    feed = await rss_parser.fetch_feed('https://podcast-feed.xml')
    # Combines keyword matching (BM25) with semantic similarity
    results = hybrid_engine.hybrid_search(
        query="machine learning ethics",
        entries=feed['entries'],
        semantic_weight=0.6,  # 60% semantic, 40% keyword
        bm25_weight=0.4,
        top_k=10
    )
    for result in results:
        print(f"{result['entry']['title']}")
        print(f"  Hybrid: {result['hybrid_score']:.2f} "
              f"(Semantic: {result['semantic_score']:.2f}, "
              f"BM25: {result['bm25_score']:.2f})")

asyncio.run(hybrid_search_example())
```

```python
async def podcast_example():
    feed = await rss_parser.fetch_feed('https://podcast-feed.xml')
    # Find episodes with a specific guest
    results = rss_parser.find_by_author(
        feed,
        author="Neil deGrasse Tyson",
        exact_match=False
    )
    print(f"Found {len(results)} episodes with Neil deGrasse Tyson")

asyncio.run(podcast_example())
```

```python
async def stats_example():
    feed = await rss_parser.fetch_feed('https://blog-feed.xml')
    stats = rss_parser.get_statistics(feed)
    print(f"Total entries: {stats['total_entries']}")
    print(f"Unique authors: {stats['authors']['count']}")
    print(f"Date range: {stats['date_range']['earliest']} to {stats['date_range']['latest']}")

asyncio.run(stats_example())
```

The server provides 25 MCP tools organized into 5 categories:
Get information about the current embedding model configuration.
Returns: Model name, status, embedding dimensions, max sequence length
Example: Check which model is currently configured and loaded.
Configure/change the embedding model at runtime.
Parameters:

- `model_name` (string, required): Sentence transformer model to use

Common models:

- `all-MiniLM-L6-v2` (default) - Fast, lightweight (80MB)
- `all-mpnet-base-v2` - Higher quality (420MB)
- `multi-qa-mpnet-base-dot-v1` - Best for Q&A
- `paraphrase-multilingual-MiniLM-L12-v2` - Multilingual

Example: `configure_model("all-mpnet-base-v2")` - Switch to the higher-quality model
Combine BM25 (keyword) and semantic similarity for robust search.
Parameters:
- `url` (string, required): RSS feed URL
- `query` (string, required): Search query
- `fields` (list, optional): Fields to search (default: title, description)
- `top_k` (integer, default: 10): Number of results
- `semantic_weight` (float, default: 0.5): Weight for semantic score (0-1)
- `bm25_weight` (float, default: 0.5): Weight for BM25 score (0-1)
- `threshold` (float, default: 0.0): Minimum hybrid score
Returns: Results with hybrid_score, semantic_score, and bm25_score
Why Hybrid?: Combines the best of both worlds - exact keyword matching (BM25) and meaning-based matching (semantic).
Document-wide search across ALL fields with automatic hybrid retrieval.
Parameters:
- `url` (string, required): RSS feed URL
- `query` (string, required): Search query
- `top_k` (integer, default: 10): Number of results
- `use_semantic` (boolean, default: true): Enable semantic search
- `use_bm25` (boolean, default: true): Enable BM25 keyword search
Searches: title, subtitle, summary, description, author
Perfect for: When you don't know which field contains your information.
Inspect feed to discover available fields and podcast schema type.
Returns:
- Detected schemas (iTunes, Google Play, etc.)
- Available fields and coverage
- Recommended search fields
- Sample values
Search specifically in podcast episode subtitles (iTunes, Google Play formats).
Perfect for: Finding episodes by topic when subtitles are well-maintained.
Search in episode summaries and descriptions (handles all podcast formats).
Perfect for: Deep content search across iTunes summary, descriptions, etc.
Custom multi-field semantic search - choose exactly which fields to search.
Fields available: title, subtitle, summary, description, author, categories, etc.
Semantic similarity search using AI embeddings. Finds content by meaning, not just keywords.
Example: Query "climate crisis" finds articles about "global warming", "environmental catastrophe", etc.
Parameters:
- `url` (string, required): RSS feed URL
- `query` (string, required): Natural language search query
- `top_k` (integer, default: 10): Number of results
- `threshold` (float, default: 0.0): Minimum similarity (0-1)
- `use_cache` (boolean, default: true)
Find duplicate or near-duplicate entries using semantic similarity.
Use cases: Cross-posted content, republished articles, feed deduplication
Parameters:
- `url` (string, required): RSS feed URL
- `similarity_threshold` (float, default: 0.85): Duplicate threshold
- `use_cache` (boolean, default: true)
Find entries related to a specific entry - perfect for "read more like this".
Parameters:
- `url` (string, required): RSS feed URL
- `entry_index` (integer, required): Index of source entry
- `top_k` (integer, default: 5): Number of related entries
- `use_cache` (boolean, default: true)
Automatically cluster entries into topic groups using K-means.
Parameters:
- `url` (string, required): RSS feed URL
- `n_clusters` (integer, default: 5): Number of clusters (2-20)
- `use_cache` (boolean, default: true)
Fetch and parse an RSS feed from URL.
Parameters:
- `url` (string, required): RSS feed URL
- `use_cache` (boolean, default: true): Use cached feed if available
Returns: Complete feed data with metadata and entries
Search RSS feed entries by title.
Parameters:
- `url` (string, required): RSS feed URL
- `query` (string, required): Search query
- `case_sensitive` (boolean, default: false)
- `regex` (boolean, default: false): Use regex pattern matching
- `use_cache` (boolean, default: true)
Search in entry descriptions and content.
Parameters: Same as `search_titles`
Search across all fields (title, description, author, categories).
Parameters: Same as `search_titles`
Find all entries by a specific author or speaker.
Parameters:
- `url` (string, required): RSS feed URL
- `author` (string, required): Author/speaker name
- `exact_match` (boolean, default: false): Require exact name match
- `use_cache` (boolean, default: true)
List all unique authors/speakers with their entry counts.
Parameters:
- `url` (string, required): RSS feed URL
- `min_count` (integer, default: 1): Minimum entries per author
- `use_cache` (boolean, default: true)
List all unique categories/tags with counts.
Parameters:
- `url` (string, required): RSS feed URL
- `min_count` (integer, default: 1): Minimum entries per category
- `use_cache` (boolean, default: true)
Get comprehensive feed statistics and analysis.
Returns:
- Total entry count
- Date range (earliest to latest)
- Author statistics (count and distribution)
- Category statistics
- Media statistics (for podcasts)
- Content length statistics
Filter entries by date range.
Parameters:
- `url` (string, required): RSS feed URL
- `start_date` (string, optional): Start date (ISO format or natural language)
- `end_date` (string, optional): End date
- `use_cache` (boolean, default: true)
Example Dates:
- ISO format: "2024-01-01"
- Natural: "last week", "January 2024"
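For ISO-format dates, the filtering logic reduces to parsing and comparing timestamps. The sketch below is an illustrative stand-in for the tool's internals, using only stdlib `datetime.fromisoformat` (the server itself depends on python-dateutil for more flexible parsing, and natural-language phrases like "last week" require additional handling beyond either):

```python
from datetime import datetime

def filter_by_date(entries, start=None, end=None):
    """Keep entries whose ISO-format 'published' date falls within [start, end]."""
    start_dt = datetime.fromisoformat(start) if start else None
    end_dt = datetime.fromisoformat(end) if end else None
    kept = []
    for entry in entries:
        try:
            published = datetime.fromisoformat(entry["published"])
        except (KeyError, ValueError):
            continue  # skip entries without a parseable date
        if start_dt and published < start_dt:
            continue
        if end_dt and published > end_dt:
            continue
        kept.append(entry)
    return kept

entries = [
    {"title": "Old post", "published": "2023-06-01"},
    {"title": "New post", "published": "2024-02-15"},
]
recent = filter_by_date(entries, start="2024-01-01")
print([e["title"] for e in recent])  # ['New post']
```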
Extract feed-level metadata (title, description, author, etc.).
Get N most recent entries from the feed.
Parameters:
- `url` (string, required): RSS feed URL
- `count` (integer, default: 10, range: 1-100): Number of entries
- `use_cache` (boolean, default: true)
Comprehensive feed analysis with insights and recommendations.
Returns:
- Feed type detection (Podcast, Blog, News, etc.)
- Update frequency analysis
- Content patterns
- Automated insights
- Recommendations
Clear the internal RSS feed cache.
Semantic similarity search uses AI embeddings to find content by meaning, not just keywords.
- Text is converted to high-dimensional vectors (embeddings)
- Similarity is computed using cosine similarity
- Results are ranked by semantic similarity score (0-1)
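Conceptually, the ranking step is just cosine similarity between embedding vectors. A stdlib-only sketch with toy 3-dimensional vectors (real sentence-transformer embeddings have hundreds of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the query points mostly along the first axis
query = [0.9, 0.1, 0.0]
doc_on_topic = [0.8, 0.2, 0.1]   # similar direction, so high score
doc_off_topic = [0.0, 0.1, 0.9]  # different direction, so low score

print(round(cosine_similarity(query, doc_on_topic), 2))   # 0.98
print(round(cosine_similarity(query, doc_off_topic), 2))  # 0.01
```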
| Use Semantic Search When | Use Keyword Search When |
|---|---|
| Looking for concepts/ideas | Need exact phrase/term |
| Exploring related topics | Know specific keyword |
| Finding similar articles | Filtering by author/date |
| Duplicate detection | Precise field matching |
| Content recommendations | Fast, simple queries |
| Score | Meaning | Use Case |
|---|---|---|
| 0.9-1.0 | Nearly identical | Duplicate detection |
| 0.7-0.9 | Very similar | Same topic/event |
| 0.5-0.7 | Related | Recommended reading |
| 0.3-0.5 | Loosely related | Topic exploration |
| < 0.3 | Minimal relation | Filter out |
```python
# High precision search
results = similarity_engine.similarity_search(
    query="quantum computing applications in drug discovery",
    entries=feed['entries'],
    threshold=0.7,  # High threshold
    top_k=5
)

# Discovery mode (find related concepts)
results = similarity_engine.similarity_search(
    query="climate change",
    entries=feed['entries'],
    threshold=0.3,  # Lower threshold
    top_k=20  # More results
)

# Find duplicates
duplicates = similarity_engine.find_duplicates(
    entries=feed['entries'],
    similarity_threshold=0.85
)

# Topic clustering
clusters = similarity_engine.cluster_entries(
    entries=feed['entries'],
    n_clusters=5
)
```

Hybrid search combines traditional keyword-based ranking (BM25) with semantic similarity for robust search results.
Problem with Pure Semantic Search:
- May miss exact keyword matches
- Can be too "fuzzy" for specific queries
Problem with Pure Keyword Search:
- Misses synonyms and related concepts
- Requires exact word matches
Solution: Hybrid Search:
- Gets both exact matches AND semantic matches
- Configurable weights let you tune precision vs. recall
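At its core, the combination is a weighted sum of the two per-entry scores. This is a hedged sketch of the idea rather than the server's exact scoring code; note that raw BM25 scores are unbounded, so in practice they must be normalized to [0, 1] before being combined with the cosine-based semantic score:

```python
def hybrid_score(semantic, bm25, semantic_weight=0.5, bm25_weight=0.5):
    """Weighted sum of a semantic score and a pre-normalized BM25 score, both in [0, 1]."""
    return semantic_weight * semantic + bm25_weight * bm25

# A result with a strong exact-keyword match but only a moderate semantic match
score = hybrid_score(semantic=0.55, bm25=0.90, semantic_weight=0.4, bm25_weight=0.6)
print(round(score, 2))  # 0.76
```

Raising `bm25_weight` pulls exact-keyword matches up the ranking; raising `semantic_weight` favors conceptually related entries.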
| Use Case | Semantic Weight | BM25 Weight | Why |
|---|---|---|---|
| General search | 0.5 | 0.5 | Balanced |
| Concept discovery | 0.7 | 0.3 | Find related topics |
| Exact term search | 0.3 | 0.7 | Precise keyword matching |
| Technical docs | 0.4 | 0.6 | Technical terms matter |
| News articles | 0.6 | 0.4 | Concepts > exact words |
```python
# Balanced hybrid search
results = hybrid_engine.hybrid_search(
    query="artificial intelligence regulation",
    entries=feed['entries'],
    semantic_weight=0.5,
    bm25_weight=0.5,
    top_k=10
)

# Favor semantic (find related concepts)
results = hybrid_engine.hybrid_search(
    query="climate change policy",
    entries=feed['entries'],
    semantic_weight=0.7,
    bm25_weight=0.3,
    top_k=5
)

# Favor keyword (precise matching)
results = hybrid_engine.hybrid_search(
    query="Python 3.12 new features",
    entries=feed['entries'],
    semantic_weight=0.3,
    bm25_weight=0.7,
    fields=["title", "description"],
    top_k=10
)

# Document-wide search (all fields)
results = hybrid_engine.document_search(
    query="quantum computing interview",
    entries=feed['entries'],
    top_k=10
)
```

```json
{
  "success": true,
  "query": "machine learning ethics",
  "semantic_weight": 0.5,
  "bm25_weight": 0.5,
  "match_count": 5,
  "matches": [
    {
      "hybrid_score": 0.82,
      "semantic_score": 0.75,
      "bm25_score": 0.89,
      "entry": {
        "title": "AI Ethics and Responsible Development",
        "description": "...",
        ...
      }
    }
  ]
}
```

| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| `all-MiniLM-L6-v2` | 80MB | ⚡⚡⚡ | Good | Default, fast |
| `all-MiniLM-L12-v2` | 120MB | ⚡⚡ | Better | Balanced |
| `all-mpnet-base-v2` | 420MB | ⚡ | Best | High quality |
| `multi-qa-mpnet-base-dot-v1` | 420MB | ⚡ | Best | Q&A, search |
| `all-distilroberta-v1` | 290MB | ⚡⚡ | Better | Fast, good quality |
| `paraphrase-multilingual-MiniLM-L12-v2` | 470MB | ⚡ | Good | Multilingual |
```python
# Get current model info
info = similarity_engine.get_model_info()
print(f"Current model: {info['configured_model']}")

# Switch to a higher-quality model
similarity_engine.model_name = "all-mpnet-base-v2"
similarity_engine.model = None  # Force reload

# Or via environment variable
import os
os.environ['RSS_EMBEDDING_MODEL'] = "all-mpnet-base-v2"
```

Use Larger Models When:

- Search quality is critical
- You have time for slower processing
- You're doing offline batch processing
- You need the best possible results

Use Smaller Models When:

- Speed is important
- You're running on limited hardware
- You need real-time responses
- Default quality is sufficient

Use Specialized Models When:

- Q&A format: `multi-qa-mpnet-base-dot-v1`
- Multilingual content: `paraphrase-multilingual-MiniLM-L12-v2`
- Code search: `all-mpnet-base-v2`
The server automatically detects and extracts fields from multiple podcast formats.
- iTunes Podcast - Full `itunes:*` namespace support
- Google Play Podcasts - `googleplay:*` namespace
- Standard RSS 2.0 - All standard fields
- Atom Feeds - Summary, content, author
- Media RSS - Thumbnails, media content
Subtitle: `itunes:subtitle` → `subtitle` → `googleplay:description` (if < 200 chars)

Summary: `itunes:summary` → `content` → `summary` → `description` → `googleplay:description`

Author: `itunes:author` → `googleplay:author` → `author`

Image: `itunes:image` → `googleplay:image` → `media:thumbnail` → `image`
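Each fallback chain amounts to "first non-empty field wins". A sketch of that logic, where the flattened `itunes_*` key names are an assumption about the raw entry layout (feedparser exposes namespaced fields this way) rather than the server's confirmed internals:

```python
def extract_field(entry, fallback_chain):
    """Return the first non-empty value found along a priority chain of keys."""
    for key in fallback_chain:
        value = entry.get(key)
        if value:
            return value
    return None

# Hypothetical raw entry: no iTunes subtitle, but a plain summary is present
entry = {"itunes_subtitle": "", "summary": "Episode notes on AI safety"}

subtitle = extract_field(entry, ["itunes_subtitle", "subtitle", "googleplay_description"])
summary = extract_field(entry, ["itunes_summary", "content", "summary", "description"])
print(subtitle)  # None, since every key in the chain is missing or empty
print(summary)   # Episode notes on AI safety
```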
- title, subtitle, summary, description
- link, author, published, updated
- categories, episode, season
- media_url, media_type, media_duration, media_size
- image, explicit, guid
```python
# 1. Inspect feed schema
schema = rss_parser.inspect_feed_schema(feed_url)
print(f"Schemas: {schema['detected_schemas']}")
print(f"Best fields: {schema['recommended_search_fields']}")

# 2. Search subtitles (topic-focused)
if 'subtitle' in schema['available_fields']:
    results = similarity_engine.similarity_search(
        query="machine learning applications",
        entries=feed['entries'],
        fields=["subtitle"]
    )

# 3. Search summaries (deep content)
results = similarity_engine.similarity_search(
    query="discussion about quantum computing",
    entries=feed['entries'],
    fields=["summary", "description"]
)

# 4. Custom field search
results = similarity_engine.similarity_search(
    query="climate change policy",
    entries=feed['entries'],
    fields=schema['recommended_search_fields']
)
```

When running in HTTP mode, the server supports the JSON-RPC 2.0 protocol.
```bash
make serve-http
# Server runs on http://0.0.0.0:9100/mcp/
```

```bash
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
  http://0.0.0.0:9100/mcp/ | python3 -m json.tool
```

```bash
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "fetch_rss",
      "arguments": {
        "url": "https://feeds.npr.org/1001/rss.xml",
        "use_cache": false
      }
    }
  }' \
  http://0.0.0.0:9100/mcp/ | python3 -m json.tool
```

```bash
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "search_titles",
      "arguments": {
        "url": "https://feeds.npr.org/1001/rss.xml",
        "query": "climate",
        "case_sensitive": false,
        "regex": false
      }
    }
  }' \
  http://0.0.0.0:9100/mcp/ | python3 -m json.tool
```

```bash
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "hybrid_search",
      "arguments": {
        "url": "https://feeds.npr.org/1001/rss.xml",
        "query": "artificial intelligence ethics",
        "semantic_weight": 0.6,
        "bm25_weight": 0.4,
        "top_k": 5
      }
    }
  }' \
  http://0.0.0.0:9100/mcp/ | python3 -m json.tool
```

```bash
# Set the default embedding model
export RSS_EMBEDDING_MODEL="all-mpnet-base-v2"

# Set the cache directory (optional)
export RSS_CACHE_DIR="/path/to/cache"
```

Add to `~/.config/claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "rss-search": {
      "command": "python",
      "args": ["-m", "mcp_rss_search.server_fastmcp"],
      "env": {
        "RSS_EMBEDDING_MODEL": "all-MiniLM-L6-v2"
      }
    }
  }
}
```

The server supports the standard MCP protocol via stdio. Refer to your client's documentation for configuration.
```bash
# Run all tests with coverage
make test

# Run a specific test
pytest tests/test_server.py::TestRSSParser::test_fetch_feed_success -v

# Check the coverage report
pytest --cov=mcp_rss_search --cov-report=html
open htmlcov/index.html
```

```bash
# Format code
make format

# Lint code
make lint

# Type checking
mypy src/mcp_rss_search
```

```bash
# Create a virtual environment
make venv

# Install development dependencies
make dev-install

# Run the development server with auto-reload
make dev

# Run all quality checks
make test lint format
```

Solution: Make sure the virtual environment is activated and the package is installed:
```bash
. .venv/bin/activate
cd mcp-servers/python/mcp-rss-search
pip install -e .
```

Error: "sentence-transformers not installed"

Solution: Install the similarity dependencies:

```bash
pip install -e ".[similarity]"
# Or
make install-similarity
```

Reason: The model (~80MB) needs to download on first use.

Solution: Wait 30-60 seconds on first use. The model is cached after that.

Solution: Process fewer entries at once:

```python
# Process in batches
batch_size = 50
for i in range(0, len(entries), batch_size):
    batch = entries[i:i+batch_size]
    results = similarity_engine.similarity_search(query, batch)
```

Solution: Check whether port 9100 is already in use:

```bash
lsof -i :9100

# Kill any process using the port
kill <PID>
```

Solution: Reinstall the dev dependencies:

```bash
pip install -e ".[dev]"
pytest -v
```

- NPR News: https://feeds.npr.org/1001/rss.xml
- NY Times: https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml
- BBC News: http://feeds.bbci.co.uk/news/rss.xml
- Reuters: http://feeds.reuters.com/reuters/topNews
- The Guardian: https://www.theguardian.com/world/rss
- Hacker News: https://news.ycombinator.com/rss
- TechCrunch: https://techcrunch.com/feed/
- Ars Technica: http://feeds.arstechnica.com/arstechnica/index
- Wired: https://www.wired.com/feed/rss
- The Verge: https://www.theverge.com/rss/index.xml
- The Daily (NYT): https://feeds.simplecast.com/54nAGcIl
- Planet Money: https://feeds.npr.org/510289/podcast.xml
- Radiolab: http://feeds.wnyc.org/radiolab
- This American Life: http://feed.thisamericanlife.org/talpodcast
- Feeds are cached by URL
- Cache persists for the server lifetime
- Manual cache clearing with the `clear_cache` tool
- Configurable via the `use_cache` parameter
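The lifecycle above corresponds to a simple in-memory map keyed by feed URL. This is a minimal sketch of the behavior, not the server's actual cache implementation:

```python
class FeedCache:
    """Minimal in-memory feed cache keyed by URL, cleared on demand."""

    def __init__(self):
        self._store = {}

    def get(self, url):
        return self._store.get(url)  # None on a cache miss

    def put(self, url, feed):
        self._store[url] = feed

    def clear(self):
        self._store.clear()

cache = FeedCache()
cache.put("https://feeds.npr.org/1001/rss.xml", {"title": "NPR News"})
print(cache.get("https://feeds.npr.org/1001/rss.xml"))  # {'title': 'NPR News'}
cache.clear()
print(cache.get("https://feeds.npr.org/1001/rss.xml"))  # None
```

Passing `use_cache=False` to a tool corresponds to bypassing `get()` and fetching fresh, while still calling `put()` with the new result.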
- Async HTTP requests
- Efficient feedparser usage
- Minimal memory footprint
- Fast regex-based text processing
- Use caching: Set `use_cache=True` for frequently accessed feeds
- Filter early: Use specific tools instead of fetching everything
- Regex carefully: Simple string searches are faster than regex
- Clear cache: Use `clear_cache()` when feeds update frequently
- Batch processing: For large feeds, process entries in batches
Contributions welcome! Areas for enhancement:
- Persistent caching (Redis, SQLite)
- Feed autodiscovery from website URLs
- OPML import/export
- Multi-feed aggregation and deduplication
- Custom feed filters and transformations
- RSS feed creation/generation
- Webhook support for feed updates
- More embedding models support
- GraphQL API support
Apache-2.0
Built with:
- FastMCP - MCP protocol implementation
- feedparser - RSS/Atom parsing
- httpx - HTTP client
- sentence-transformers - Semantic embeddings
- rank-bm25 - BM25 ranking
- ContextForge - ContextForge ecosystem
Happy RSS parsing with AI-powered search! 🎉🧠🔍✨