This document explains how to customize the Writer Context Tool, whether you want to pull more content or change how the tool behaves.
The tool now implements permanent caching to disk and uses embeddings for semantic search. Here's how you can customize this behavior:
The default configuration now fetches up to 100 posts from each platform. If you want to adjust this:
```json
{
  "platforms": [
    ...
  ],
  "max_posts": 200,                 // Increase from the default of 100
  "cache_duration_minutes": 10080,  // One week (7 days)
  "similar_posts_count": 15         // Increase from the default of 10
}
```

You can set `max_posts` as high as needed, but be aware that:
- Fetching many posts will take longer during refresh
- Each post requires storage for both content and embeddings
- Very large numbers of posts may impact Claude's resource selection UI
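As a rough back-of-the-envelope estimate of the embedding storage cost, assuming the default all-MiniLM-L6-v2 model (which produces 384-dimensional float32 vectors):

```python
# Rough storage estimate for cached embeddings, assuming the default
# all-MiniLM-L6-v2 model: 384-dimensional float32 vectors.
EMBEDDING_DIM = 384
BYTES_PER_FLOAT32 = 4

def embedding_storage_bytes(num_posts: int) -> int:
    """Approximate raw bytes needed to store one embedding per post."""
    return num_posts * EMBEDDING_DIM * BYTES_PER_FLOAT32

# 1,000 posts -> about 1.5 MB of raw embedding data (cache overhead extra)
print(embedding_storage_bytes(1000))  # -> 1536000
```

Embedding storage is usually dwarfed by the post content itself, so content size is the more important factor when raising `max_posts`.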
The tool now includes a `similar_posts_count` parameter that controls how many related essays are returned for each query:
```json
{
  "platforms": [
    ...
  ],
  "similar_posts_count": 15  // Return the 15 most relevant essays instead of the default 10
}
```

This parameter affects:
- The number of essays shown in search results
- The amount of context Claude can reference when answering your questions
- The processing time needed for each search (higher values might be slightly slower)
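Conceptually, once the search has scored every post, `similar_posts_count` simply takes a larger slice of the ranked list. A toy illustration (the `(title, similarity)` pairs below are made up for demonstration):

```python
# Toy illustration: once posts are scored by similarity,
# similar_posts_count just controls the size of the top slice returned.
ranked = [("essay-a", 0.91), ("essay-c", 0.77), ("essay-b", 0.84), ("essay-d", 0.60)]
ranked.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first

similar_posts_count = 2
top = ranked[:similar_posts_count]
print(top)  # -> [('essay-a', 0.91), ('essay-b', 0.84)]
```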
The tool now automatically preloads all content and generates all embeddings at startup. This means:
- Your content will be refreshed when you start the tool
- All embeddings will be generated immediately, making searches faster
- Claude will have access to your entire writing corpus from the start
You can modify this behavior by editing the `preload_all_content` function in `writer_tool.py` if you want to:
- Make preloading optional
- Run it on a periodic schedule instead of just at startup
- Add filtering to only preload certain platforms
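For example, here is a minimal sketch of making preloading optional via an environment variable. Both the `PRELOAD_CONTENT` variable name and the gating function are hypothetical; `preload_all_content()` stands in for the actual function in `writer_tool.py`:

```python
import os

def should_preload() -> bool:
    """Return True unless preloading is explicitly disabled.

    PRELOAD_CONTENT is an assumed environment variable name; adapt it
    to whatever convention your deployment uses.
    """
    return os.environ.get("PRELOAD_CONTENT", "true").lower() != "false"

if should_preload():
    print("preloading")  # here you would: await preload_all_content()
```

Run with `PRELOAD_CONTENT=false` to skip the startup preload during development.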
The tool uses the all-MiniLM-L6-v2 model from sentence-transformers for embeddings. Advanced users can modify this in the code:
- Open `writer_tool.py` and find the `get_embedding_model` function
- Change the model name to any valid sentence-transformers model:
```python
def get_embedding_model():
    global model
    if model is None:
        logger.info("Loading embedding model...")
        # Change to a different model if needed
        model = SentenceTransformer('all-mpnet-base-v2')  # Higher quality but slower
        logger.info("Embedding model loaded")
    return model
```

Some alternative models:
- `all-mpnet-base-v2`: Higher quality embeddings but slower
- `all-MiniLM-L12-v2`: Better quality than L6 with moderate speed
- `paraphrase-multilingual-MiniLM-L12-v2`: For multi-language support
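Whichever model you pick, the comparison step works the same way: two embedding vectors are typically scored with cosine similarity. A small NumPy sketch of that comparison (independent of the tool's actual helper functions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; higher means more semantically similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical vectors score (approximately) 1.0
v = np.array([0.1, 0.2, 0.3])
print(cosine_similarity(v, v))
```

Because embedding dimensions differ between models (e.g. 384 for the MiniLM family vs. 768 for `all-mpnet-base-v2`), you must clear the embeddings cache after switching models so old and new vectors are never compared.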
The tool currently supports Substack and Medium. To add support for other platforms:
- Study how the existing scrapers work in `writer_tool.py`
- Implement a new scraper function similar to `fetch_substack_posts`
- Update the `get_all_content` function to support your new platform type
Example for adding WordPress support:
```python
async def fetch_wordpress_posts(url: str, max_posts: int, platform_name: str) -> List[Post]:
    """Fetch posts from a WordPress blog via its RSS feed."""
    try:
        logger.info(f"Fetching WordPress posts from: {url}")
        # Ensure the URL ends with a slash before appending the feed path
        if not url.endswith("/"):
            url += "/"
        rss_url = f"{url}feed"
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.get(rss_url)
            response.raise_for_status()
        feed = feedparser.parse(response.text)
        logger.info(f"Found {len(feed.entries)} WordPress posts")
        # Rest of implementation: process feed.entries into Post objects,
        # mirroring the other scrapers...
    except Exception as e:
        logger.error(f"Error fetching WordPress feed: {str(e)}")
        return []
```

Then add it to the `get_all_content` function:
```python
if platform_type == "substack":
    posts = await fetch_substack_posts(platform_url, max_posts, platform_name)
elif platform_type == "medium":
    posts = await fetch_medium_posts(platform_url, max_posts, platform_name)
elif platform_type == "wordpress":  # Add this section
    posts = await fetch_wordpress_posts(platform_url, max_posts, platform_name)
else:
    logger.warning(f"Unknown platform type: {platform_type}")
    continue
```

The tool now uses semantic search instead of keyword matching. You can customize the search behavior:
In the `find_similar_posts` function, you can adjust the number of results or add a similarity threshold:

```python
def find_similar_posts(query: str, all_posts: List[Post], top_n: int = 5, min_similarity: float = 0.3) -> List[Tuple[Post, float]]:
    """Find posts similar to the query using embeddings."""
    # ... existing code ...
    # Add a minimum similarity threshold
    results = [(post, sim) for post, sim in results if sim >= min_similarity]
    # Sort by similarity (highest first)
    results.sort(key=lambda x: x[1], reverse=True)
    return results[:top_n]
```

For advanced users, you could implement a hybrid search that combines semantic search with keyword matching:
```python
def hybrid_search(query: str, all_posts: List[Post], top_n: int = 5) -> List[Tuple[Post, float]]:
    """Combine semantic search with keyword matching."""
    # Semantic search results (fetch extra so exact matches can displace some)
    semantic_results = find_similar_posts(query, all_posts, top_n=top_n * 2)
    # Keyword matching (simple substring implementation)
    query_lower = query.lower()
    keyword_matches = []
    for post in all_posts:
        if query_lower in post.title.lower() or query_lower in post.content.lower():
            # Give exact matches a high score
            keyword_matches.append((post, 0.95))
    # Combine results, removing duplicates
    seen_ids = set()
    combined_results = []
    # First add exact matches
    for post, score in keyword_matches:
        if post.id not in seen_ids:
            combined_results.append((post, score))
            seen_ids.add(post.id)
    # Then add semantic matches
    for post, score in semantic_results:
        if post.id not in seen_ids:
            combined_results.append((post, score))
            seen_ids.add(post.id)
    return combined_results[:top_n]
```

For large collections of essays or improved performance:
The tool already implements disk caching, but you might want to customize its behavior:
```python
# Initialize with a custom max_size
embeddings_cache = Cache(str(cache_dir / "embeddings"), size_limit=1_000_000_000)  # 1 GB limit
```

You could implement a background refresh mechanism:
```python
async def background_refresh():
    """Refresh content in the background."""
    while True:
        try:
            await get_all_content(refresh=True)
            logger.info("Background refresh completed successfully")
        except Exception as e:
            logger.error(f"Error in background refresh: {str(e)}")
        # Wait for the next refresh cycle
        await asyncio.sleep(3600)  # 1 hour

# Start the background refresh
asyncio.create_task(background_refresh())
```

If you're customizing the tool and need to debug:
- **Increase logging detail:**

  ```python
  logging.basicConfig(
      level=logging.DEBUG,  # Change from INFO to DEBUG
      format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
  )
  ```

- **Inspect cache contents:** Iterate over the cache keys:

  ```python
  for key in posts_cache:
      print(f"Cache key: {key}")
  ```

- **Test embeddings:** To check whether embeddings are being generated correctly:

  ```python
  test_embedding = calculate_embedding("This is a test")
  print(f"Embedding shape: {test_embedding.shape}")
  ```