The Newsletter Summarizer is a Python tool designed to automatically retrieve, analyze, and summarize newsletters from a user's Gmail account for any topic domain. Originally built for AI newsletters (still the default), it can now analyze Finance, Sports, Technology, Politics, or any other topic by providing custom analysis guidance. It distills key developments and actionable insights from multiple newsletters into a concise, easy-to-understand report targeted at regular users.
- Reliable newsletter fetching - Sequential processing with robust retry logic and error handling
- Resilient error handling - Continues processing even if individual newsletters fail to fetch or parse
- Robust HTML parsing - Multiple fallback strategies ensure content extraction even from malformed emails
- Automatically fetches emails tagged with a specified label from your Gmail account (default: "ai-newsletter")
- Extracts and analyzes content from multiple newsletter sources
- Identifies key topics and trends across newsletters using advanced LLM techniques
- Uses OpenRouter to route requests to Google Gemini 2.5 Flash (default), OpenAI GPT-4.1, or Anthropic's Claude 3.7 Sonnet for cost-efficient API usage and tracking
- Prioritizes recent content and breaking news (configurable)
- Outputs a markdown report with the top AI developments, why they matter, and actionable insights
- Includes links to newsletter sources and a brief methodology section
- Modular codebase: Authentication, fetching, LLM analysis, and reporting are in separate modules for easier maintenance and extension
- Python 3.11 (recommended) or 3.10-3.13 (also supported)
- Gmail account with newsletters tagged/labeled (default: `ai-newsletter`, customizable for other topics)
- Google API credentials (`credentials.json`) - see setup instructions below
- OpenRouter API key (set as `OPENROUTER_API_KEY` environment variable) - required as the default API provider
- OpenAI API key (set as `OPENAI_API_KEY` environment variable) - only needed if not using OpenRouter
- Anthropic API key (set as `ANTHROPIC_API_KEY` environment variable) - only needed if not using OpenRouter
1. Clone the repository

   ```bash
   git clone https://github.com/saadiq/newsletter_summary.git
   cd newsletter_summary
   ```

2. Set up a virtual environment (Recommended)

   ```bash
   # Use Python 3.11 (recommended)
   python3.11 -m venv venv
   source venv/bin/activate   # On macOS/Linux
   # venv\Scripts\activate    # On Windows
   ```

3. Install dependencies

   ```bash
   pip install --upgrade pip setuptools wheel
   pip install -r requirements.txt
   ```
4. Set up Google OAuth credentials
You can set up Gmail API credentials using either the Google Cloud Console (web UI) or the gcloud CLI.
- Go to the Google Cloud Console
- Create a new project or select an existing one
- Enable the Gmail API for your project
- Go to "Credentials", click "Create Credentials", and select "OAuth client ID"
- Choose "Desktop application" as the application type
- Download the credentials JSON file
- Rename and save the downloaded file as `credentials.json` in the project's root directory
If you have the gcloud CLI installed:
```bash
# Authenticate with Google Cloud
gcloud auth login

# Create a new project (or use existing)
gcloud projects create my-newsletter-summarizer --name="Newsletter Summarizer"
gcloud config set project my-newsletter-summarizer

# Enable Gmail API
gcloud services enable gmail.googleapis.com

# Create OAuth 2.0 credentials for desktop app
# Note: This creates the OAuth consent screen and client
gcloud auth application-default login --scopes=https://www.googleapis.com/auth/gmail.readonly

# Download the credentials
# You'll need to create the OAuth client ID through the Console for desktop apps,
# as gcloud doesn't directly support creating desktop OAuth clients
```
After using gcloud to set up the project and enable APIs, you'll still need to:
- Go to Google Cloud Console
- Navigate to "APIs & Services" > "Credentials"
- Click "Create Credentials" > "OAuth client ID"
- Select "Desktop app" and download the JSON
- Save as `credentials.json` in the project directory
Note: The gcloud commands help with project setup and API enabling, but OAuth client creation for desktop apps still requires the Console.
5. Get an OpenRouter API key
- Sign up at OpenRouter (if you don't have an account).
- Obtain an API key from your account dashboard.
6. Create a `.env.local` file

   Create a file named `.env.local` in the project directory and add your API keys:

   ```bash
   # Required - default API provider
   OPENROUTER_API_KEY=your_openrouter_api_key_here

   # Optional - OpenRouter configuration
   USE_OPENROUTER=true
   OPENROUTER_COST_LOG=openrouter_costs.json

   # Optional - only needed if bypassing OpenRouter with USE_OPENROUTER=false
   ANTHROPIC_API_KEY=your_anthropic_api_key_here
   OPENAI_API_KEY=your_openai_api_key_here
   ```

   (Note: Ensure this file is included in your `.gitignore` if you plan to commit the code.)
7. Validate your configuration

   Before running the main application, validate all your settings:

   ```bash
   python config_validator.py
   ```
This will check:
- API keys are properly set
- Gmail credentials are valid
- Newsletter websites JSON is properly formatted
- No duplicate or invalid URLs in the cache
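The duplicate/invalid URL checks can be sketched in a few lines (an illustrative simplification, not the actual `config_validator.py` logic; the cache shape, newsletter name mapped to website URL, is an assumption):

```python
def find_duplicate_urls(cache):
    """Return URLs that appear under more than one newsletter name.

    `cache` is assumed to map newsletter names to website URLs.
    """
    seen = {}
    duplicates = []
    for name, url in cache.items():
        # Normalize so "https://a.com/" and "https://a.com" count as the same site
        normalized = url.rstrip("/").lower()
        if normalized in seen:
            duplicates.append(url)
        else:
            seen[normalized] = name
    return duplicates


def find_invalid_urls(cache):
    """Return cached entries that are not http(s) URLs."""
    return [u for u in cache.values() if not u.startswith(("http://", "https://"))]
```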
After installation:
```bash
# Validate configuration
python config_validator.py

# Run with defaults (7 days, Google Gemini via OpenRouter)
python main.py

# Use Claude instead
python main.py --llm-provider claude

# Use OpenAI GPT-4.1 instead
python main.py --llm-provider openai

# Analyze last 14 days
python main.py --days 14
```

While the tool defaults to analyzing AI newsletters for backward compatibility, it can analyze newsletters for any topic domain by using the `--topic` parameter along with custom analysis guidance.
```bash
# Analyze Finance newsletters
python main.py --topic Finance --label finance-newsletter

# Analyze Sports newsletters
python main.py --topic Sports --label sports-digest

# Default behavior (AI newsletters)
python main.py  # Same as --topic AI --label ai-newsletter
```

The tool's analysis can be customized using guidance to focus on what matters for your specific domain:
```bash
python main.py --topic Finance --label finance-news \
  --analysis-guidance "Focus on market movements, regulatory changes, and investment opportunities. Include specific ticker symbols and fund names when relevant. Emphasize risk considerations."
```

```bash
# Create a guidance file
cat > finance_guidance.txt << EOF
Prioritize the following in order of importance:
1. Market-moving events and their percentage impacts
2. Regulatory changes affecting individual investors
3. New investment products or opportunities
4. Economic indicators and their implications
5. Company earnings that beat/miss expectations

For each topic, include:
- Specific ticker symbols (e.g., AAPL, SPY)
- Percentage changes when discussing market movements
- Time horizons for investments (short/medium/long term)
- Risk level (low/medium/high)
EOF

# Use the guidance file
python main.py --topic Finance --label finance-news \
  --guidance-file finance_guidance.txt
```

```bash
python main.py --topic Finance --label finance-newsletter \
  --analysis-guidance "Focus on market movements, regulatory changes, and investment opportunities. Prioritize actionable insights for individual investors. Include specific ticker symbols, fund names, and percentage changes. Emphasize risk considerations and portfolio implications. Highlight opportunities in both bull and bear scenarios."
```

What this produces:
- Headlines focus on market movements with specific percentages
- Includes ticker symbols (AAPL, TSLA, SPY) and fund names (ARKK, VTI)
- "Why It Matters" sections explain impact on portfolios and retirement accounts
- "Practical Impact" includes specific actions like "Consider dollar-cost averaging into index funds"
```bash
python main.py --topic Sports --label sports-news \
  --analysis-guidance "Focus on game results, player statistics, trades, and injury reports. Include team standings and playoff implications. Highlight fantasy sports relevance. Mention specific player names and statistics. Focus on major leagues (NFL, NBA, MLB, NHL) but include notable stories from other sports."
```

What this produces:
- Headlines about major games, trades, and player achievements
- Specific statistics (e.g., "LeBron James scores 45 points")
- "Why It Matters" explains playoff implications or record-breaking achievements
- "Practical Impact" includes fantasy sports recommendations
```bash
python main.py --topic Technology --label tech-digest \
  --analysis-guidance "Focus on product launches, security vulnerabilities, developer tools, and industry trends. Include version numbers and release dates. Emphasize practical implications for both consumers and developers. Mention compatibility requirements and migration paths for updates."
```

What this produces:
- Headlines about new products, updates, and security issues
- Specific version numbers and compatibility requirements
- "Why It Matters" explains impact on workflows and security
- "Practical Impact" includes update recommendations and migration strategies
```bash
python main.py --topic Politics --label politics-digest \
  --analysis-guidance "Focus on policy changes, legislation, and regulatory updates that affect citizens directly. Avoid partisan framing - stick to factual implications. Include effective dates and compliance requirements. Emphasize practical impacts on taxes, healthcare, education, and civil rights."
```

What this produces:
- Headlines about legislation and policy changes
- Specific dates when changes take effect
- "Why It Matters" explains real-world impact on citizens
- "Practical Impact" includes actions like registering to vote or filing for benefits
1. Be Specific About Priorities
Good: "Prioritize market-moving events over analyst opinions"
Poor: "Focus on important stuff"
2. Define Output Expectations
Good: "Include ticker symbols and percentage changes"
Poor: "Add some details"
3. Set Clear Scope
Good: "Focus on US markets with brief mentions of major international events"
Poor: "Cover everything about finance"
4. Specify Actionability
Good: "Practical impacts should include specific investment actions or portfolio adjustments"
Poor: "Make it useful"
Here's a template you can adapt for any domain:
```text
Prioritize these topics in order:
1. [Most important category]
2. [Second priority]
3. [Third priority]

For each topic, include:
- [Specific detail type 1]
- [Specific detail type 2]
- [Specific detail type 3]

Focus on practical implications for [target audience].
Emphasize [key aspect] over [less important aspect].
When discussing [topic], always include [specific information].
```
| Without Custom Guidance | With Custom Guidance |
|---|---|
| Generic topic selection | Domain-specific prioritization |
| General insights | Targeted, actionable information |
| Broad audience appeal | Specific audience focus |
| Standard formatting | Domain-appropriate details |
The `--topic` parameter controls how the content is analyzed and presented, while `--label` controls which emails are fetched:
```bash
# Fetch emails labeled "investment-news" but analyze as Finance
python main.py --topic Finance --label investment-news

# Fetch emails labeled "newsletter" but analyze as Technology
python main.py --topic Technology --label newsletter --from-email [email protected]
```
1. Set up a Gmail label

   In your Gmail account, create a label for your newsletters. The default is `ai-newsletter`, but you can use any label that matches your topic:
   - `ai-newsletter` for AI newsletters (default)
   - `finance-newsletter` for Finance newsletters
   - `sports-digest` for Sports newsletters
   - Or any custom label you prefer
Apply this label to all the newsletters you want the script to process.
2. Activate the virtual environment

   ```bash
   source venv/bin/activate   # On macOS/Linux
   # venv\Scripts\activate    # On Windows
   ```
3. Run the tool

   The entry point is `main.py`:

   ```bash
   python main.py
   ```
By default, this analyzes newsletters from the past 7 days using OpenRouter to connect to Google Gemini 2.5 Flash. See Command-line Options below to customize.
Example: Use Claude 3.7 Sonnet instead of Google Gemini (default):
python main.py --llm-provider claude
Example: Use OpenAI GPT-4.1 instead of default:
python main.py --llm-provider openai
Example: Use a specific custom OpenRouter model:
python main.py --model google/gemini-2.5-flash-preview:thinking
Example: Specify the number of topics to extract and analyze:
python main.py --num-topics 7
4. First-time Authentication

   The very first time you run the tool, it will open a browser window. You'll need to:
   - Log in to the Google account associated with the Gmail inbox you want to analyze.
   - Grant the tool permission to view your email messages and settings (this is the `gmail.readonly` scope).

   After successful authentication, the tool will create a `token.json` file to store the authorization credentials, so you won't need to authenticate via the browser on subsequent runs (unless the token expires or is revoked).
5. View the Results

   The tool will output progress messages to the console. Once finished, it will generate a markdown file in the `docs/_posts/` directory with Jekyll-compatible naming: `YYYY-MM-DD-label-summary-Xd.md` (where X is the number of days covered by the report). The file includes Jekyll frontmatter for GitHub Pages integration. Open this file to view your summarized report, or visit your GitHub Pages site after pushing the changes.
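The naming convention can be expressed in one small helper (a sketch of the pattern described above, not the tool's actual code):

```python
from datetime import date


def report_filename(label: str, days: int, on: date) -> str:
    """Build the Jekyll-style post name: YYYY-MM-DD-label-summary-Xd.md."""
    return f"{on.isoformat()}-{label}-summary-{days}d.md"
```

For example, a 7-day `ai-newsletter` run on 2024-01-15 would produce `2024-01-15-ai-newsletter-summary-7d.md`.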
By default, reports are saved to `docs/_posts/` for GitHub Pages integration. You can customize this:

- CLI flag (overrides default): `python main.py --output /custom/path`
- Environment variable (overrides default if no CLI flag specified): `export NEWSLETTER_SUMMARY_OUTPUT_DIR=/custom/path`
- To use the current directory (old behavior): `python main.py --output .`
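The precedence described above (CLI flag first, then the environment variable, then the default) can be sketched as follows; this mirrors the documented behavior but is not the tool's actual code:

```python
import os


def resolve_output_dir(cli_output=None, default="docs/_posts/"):
    """Resolve the report directory: CLI flag wins, then the env var, then the default."""
    if cli_output:
        return cli_output
    return os.environ.get("NEWSLETTER_SUMMARY_OUTPUT_DIR", default)
```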
For development or testing, you can inject mock newsletter data by setting the `NEWSLETTER_SUMMARY_MOCK_DATA` environment variable to a JSON array of newsletter objects. This will bypass Gmail fetching:

```bash
export NEWSLETTER_SUMMARY_MOCK_DATA='[{"subject": "Test Subject", "date": "2024-01-01", "sender": "[email protected]", "body": "Test body."}]'
```

You can modify the tool's behavior using these optional flags:
- `--days N`: Specify the number of past days to retrieve emails from (default: `7`).
  `python main.py --days 14`
- `--topic TOPIC`: Specify the topic domain for analysis (default: `AI`).
  `python main.py --topic Finance` or `python main.py --topic Sports`
- `--analysis-guidance TEXT`: Provide custom analysis instructions inline.
  `python main.py --topic Finance --analysis-guidance "Focus on market movements and include ticker symbols"`
- `--guidance-file FILE`: Load custom analysis instructions from a file.
  `python main.py --topic Sports --guidance-file sports_analysis_guide.txt`
- `--label LABEL`: Specify the Gmail label to filter newsletters (default: `ai-newsletter`).
  `python main.py --label my-custom-label`
- `--no-label`: Do not use any Gmail label as a search criterion (useful if you want to search by other criteria like sender).
  `python main.py --no-label --from-email [email protected]`
- `--from-email EMAIL`: Only include emails from the specified sender.
  `python main.py --from-email [email protected]`
- `--to-email EMAIL`: Only include emails sent to the specified recipient.
  `python main.py --to-email [email protected]`
- `--llm-provider PROVIDER`: Choose between `google` (default), `claude`, or `openai`.
  `python main.py --llm-provider claude` or `python main.py --llm-provider openai`
- `--model MODEL`: Specify a custom OpenRouter model to use (overrides `--llm-provider`). This allows using any model available on OpenRouter.
  `python main.py --model google/gemini-2.5-flash-preview:thinking`
- `--num-topics N`: Specify the number of topics to extract and summarize (default: `10`).
  `python main.py --num-topics 7`
- `--no-prioritize-recent`: Disable higher weighting for recent newsletters.
- `--no-breaking-news-section`: Disable the separate "Just In" section for latest developments.
- `-h` / `--help`: Show all available command-line options and usage examples.
The tool uses different models depending on whether you're using OpenRouter (default) or direct API calls:
When `USE_OPENROUTER=true` (default), the presets map to these OpenRouter models:

- `claude` → `anthropic/claude-sonnet-4`
- `openai` → `openai/gpt-4.1-mini`
- `google` → `google/gemini-2.5-flash`

When `USE_OPENROUTER=false`, the presets map to these direct API models:

- `claude` → `claude-3-7-sonnet-20250219`
- `openai` → `gpt-4.1-2025-04-14`
- `google` → `gemini-2.5-flash-preview`
You can also use the --model parameter to specify any custom OpenRouter model directly, which overrides the preset mappings.
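Putting the two tables and the `--model` override together, the selection logic presumably looks something like this (the mappings come from the tables above; the function itself is an illustrative assumption, not the tool's code):

```python
OPENROUTER_MODELS = {
    "claude": "anthropic/claude-sonnet-4",
    "openai": "openai/gpt-4.1-mini",
    "google": "google/gemini-2.5-flash",
}

DIRECT_MODELS = {
    "claude": "claude-3-7-sonnet-20250219",
    "openai": "gpt-4.1-2025-04-14",
    "google": "gemini-2.5-flash-preview",
}


def resolve_model(provider="google", custom_model=None, use_openrouter=True):
    """--model overrides the preset; otherwise pick by provider and routing mode."""
    if custom_model:
        return custom_model
    table = OPENROUTER_MODELS if use_openrouter else DIRECT_MODELS
    return table[provider]
```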
This project uses OpenRouter by default for all LLM API calls, providing:
- Competitive pricing
- Detailed usage tracking
- Access to both Claude and OpenAI models through a single API
To check your OpenRouter setup:

```bash
python verify_openrouter.py
```

To analyze request costs:

```bash
python analyze_costs.py
```

The tool uses a single, streamlined approach to generating summaries:
- Sends newsletter content directly to the LLM in one step
- LLM identifies the most significant topics and generates summaries simultaneously
- Produces coherent, ranked topics with actionable insights for regular users
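In outline, this single-step approach concatenates every newsletter into one prompt (a hypothetical sketch, not the actual `analyze_newsletters_unified` implementation in `llm.py`):

```python
def build_unified_prompt(newsletters, topic="AI", num_topics=10, guidance=None):
    """Combine all newsletter bodies into one analysis prompt for the LLM."""
    sections = [
        f"### {n['subject']} ({n['date']}, from {n['sender']})\n{n['body']}"
        for n in newsletters
    ]
    prompt = (
        f"Identify the {num_topics} most significant {topic} topics across the "
        "newsletters below. For each, explain why it matters and give a "
        "practical, actionable insight for regular users.\n\n"
    )
    if guidance:
        # Custom --analysis-guidance / --guidance-file text slots in here
        prompt += f"Additional analysis guidance: {guidance}\n\n"
    return prompt + "\n\n".join(sections)
```

Because topic identification and summarization happen in one call, the ranked topics stay consistent with the summaries written for them.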
The codebase is organized into the following modules for clarity and maintainability:
- `auth.py` — Gmail authentication
- `fetch.py` — Email fetching
- `llm.py` — LLM analysis
- `report.py` — Report generation
- `main.py` — Entry point (run this file to use the tool)
For more advanced modifications:
- To modify the number of key topics extracted, adjust the `num_topics` argument.
- To change the direct-LLM prompt or model, edit the `analyze_newsletters_unified` function in `llm.py`.
- To customize the final report format or content, modify the `generate_report` function in `report.py`.
The tool caches detected newsletter websites for each source and marks them as verified or unverified:
- Verified: Trusted and used for future runs.
- Unverified: Used as a fallback, but will be replaced if a better guess or curated mapping is found.
- Curated mapping: Always takes precedence and is always trusted.
1. After running the tool, review the detected websites for accuracy:

   ```bash
   python review_newsletter_websites.py
   ```

   For each unverified entry, you can:
   - `[a]ccept` to mark as verified
   - `[e]dit` to correct the website and mark as verified
   - `[d]elete` to remove the entry (it will be re-guessed next run)
   - `[s]kip` to leave it unverified for now
2. Why review?
- Ensures your report always links to the correct main site for each newsletter.
- Prevents bad guesses (e.g., tracking links, forms) from persisting in your reports.
- Lets you maintain high-quality, human-verified source links.
3. How to extend the curated mapping:

   Create or edit `curated_websites.json` (key: newsletter name, value: URL). Entries override guesses and are treated as verified.
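The precedence rules above (curated mapping first, then the cached guess with its verified flag) can be sketched as follows; the data shapes here are assumptions, not the tool's actual cache format:

```python
def resolve_website(name, curated, cache):
    """Return (url, verified) for a newsletter, honoring curated > cached precedence."""
    if name in curated:
        return curated[name], True  # curated entries are always trusted
    entry = cache.get(name)
    if entry:
        return entry["url"], entry.get("verified", False)
    return None, False  # no known website; a guess would be made on the next run
```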
- Reliable newsletter fetching with automatic retry logic and exponential backoff
- Graceful error handling - individual newsletter failures don't crash the entire process
- Enhanced HTML parsing with multiple fallback strategies for malformed content
- Detailed error reporting with troubleshooting tips for common issues
- The tool fetches newsletters sequentially with automatic retry on failures
- Built-in exponential backoff prevents rate limiting issues
- Failed fetches are reported but don't stop processing of successful ones
- For very large volumes (50+ newsletters), consider using `--days` to limit the date range
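The retry-with-exponential-backoff pattern described above follows a standard shape (an illustrative sketch; the tool's actual attempt counts and delays are not documented here):

```python
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0):
    """Call `fetch`, retrying on failure with a delay that doubles each attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; report this newsletter as failed
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

A caller would wrap each newsletter fetch in this helper so one persistent failure raises without affecting the other newsletters.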
1. Authentication Failed:
   - Check that `credentials.json` exists in the project directory
   - Delete `token.json` and re-authenticate if the token has expired
   - Ensure the Gmail API is enabled in the Google Cloud Console
2. No Newsletters Found:
   - Verify the Gmail label exists and is spelled correctly (default: `ai-newsletter`)
   - Check if there are emails within the specified date range (`--days`)
   - Try using `--no-label` with `--from-email` to search by sender instead
3. Rate Limiting:
   - The tool automatically handles rate limits with retry logic
   - If persistent, wait a few minutes and try again
4. HTML Parsing Errors:
   - The tool has multiple fallback strategies and will extract text even from malformed HTML
   - Check the report for `[Plain text extraction]` or `[Fallback text extraction]` markers
5. NumPy Build Errors / Python Version:
   - Use Python 3.11 (recommended) or 3.10-3.13
   - Some scientific packages may have issues with the latest Python versions
6. OpenRouter API Issues:
   - Run `python verify_openrouter.py` to check your setup
   - If problems persist, set `USE_OPENROUTER=false` in `.env.local` to use direct API calls
For detailed error traces, set the `DEBUG` environment variable:

```bash
export DEBUG=1
python main.py
```

The project includes comprehensive test coverage:

- `test_fetch_api.py`: Unit tests for email fetching, retry logic, and error handling
- `test_e2e_cli.py`: End-to-end tests for the CLI workflow and report generation
- `test_llm.py`: Tests for LLM integration and OpenRouter functionality
- `test_report.py`: Tests for report generation and formatting
- `test_utils.py`: Tests for HTML parsing and text extraction
- `test_auth.py`: Tests for Gmail authentication
- `test_config_validator.py`: Tests for configuration validation
To run all tests:

```bash
pytest
```

To run specific test files:

```bash
pytest test_fetch_api.py  # Test fetching with retry logic
pytest test_e2e_cli.py    # Test full workflow
```

To run with coverage:

```bash
pytest --cov=. --cov-report=html
```

This project includes full GitHub Actions automation for weekly newsletter summaries published to GitHub Pages.
1. Enable GitHub Pages
   - Go to your repository on GitHub
   - Navigate to Settings → Pages
   - Under "Source", select "Deploy from a branch"
   - Choose the `gh-pages` branch and `/ (root)` folder
   - Click Save
2. Prepare Gmail Secrets

   First, generate properly formatted secrets from your local credentials:

   ```bash
   python prepare_github_secrets.py
   ```

   This creates a `github_secrets.txt` file with your credentials properly formatted for GitHub.
Add GitHub Secrets
Go to your repository → Settings → Secrets and variables → Actions, then add these repository secrets:
- GMAIL_CREDENTIALS: Copy the entire JSON content from
github_secrets.txt(the part between the dashed lines for GMAIL_CREDENTIALS) - GMAIL_TOKEN: Copy the entire JSON content from
github_secrets.txt(the part between the dashed lines for GMAIL_TOKEN) - OPENROUTER_API_KEY: Your OpenRouter API key
Important: Make sure you're adding "Repository secrets" not "Environment secrets"
- GMAIL_CREDENTIALS: Copy the entire JSON content from
4. Delete Sensitive Files

   After adding secrets to GitHub, delete the temporary file:

   ```bash
   rm github_secrets.txt
   ```
The included workflow (`.github/workflows/generate-summary.yml`) automatically:

- Runs weekly: Every Sunday at 10 AM UTC
- Fetches newsletters: From the past 7 days with the `ai-newsletter` label
- Generates summaries: Using your configured LLM provider
- Commits reports: To `docs/_posts/` with proper Jekyll frontmatter
- Publishes to GitHub Pages: Automatically deploys to your site
You can also trigger the workflow manually:
- Go to Actions tab in your repository
- Select "Generate Newsletter Summary"
- Click "Run workflow"
- Optionally customize:
- days: Number of days to look back (default: 7)
- label: Gmail label to filter (default: ai-newsletter)
- topic: Topic domain for analysis (default: AI)
To configure the GitHub Action for a different topic, edit `.github/workflows/generate-summary.yml`:

```yaml
# For Finance newsletters
- name: Generate summary
  run: |
    python main.py --days 7 --topic Finance --label finance-newsletter

# Or with custom guidance
- name: Generate summary with guidance
  run: |
    python main.py --days 7 --topic Sports --label sports-digest \
      --analysis-guidance "Focus on game results and player statistics"
```

You can also add the topic as a workflow input for manual runs:
```yaml
workflow_dispatch:
  inputs:
    topic:
      description: 'Topic domain for analysis'
      required: false
      default: 'AI'
```

After the workflow runs successfully:

- Your site will be available at: `https://[username].github.io/newsletter_summary/`
- Reports are automatically organized by date
- The site includes:
- Dark/light mode toggle
- Filter by newsletter labels
- Clean, professional styling
- Mobile-responsive design
Common Issues:
1. "Permission denied" errors
- Already fixed in the workflow with proper permissions
- If issues persist, check Settings → Actions → General → Workflow permissions
2. "Invalid JSON" in secrets
   - Use `prepare_github_secrets.py` to ensure proper formatting
   - Secrets must be valid JSON without any extra characters
3. "No newsletters found"
- Check that your Gmail token hasn't expired
- Verify the label exists in your Gmail account
- Try running locally first to ensure authentication works
4. Memory/Segmentation faults
   - The workflow is optimized for GitHub Actions with:
     - Sequential processing to ensure reliability
     - Memory limits to prevent crashes
     - Automatic retry logic with exponential backoff
Edit `.github/workflows/generate-summary.yml` to:
- Change schedule (modify the cron expression)
- Adjust default parameters
- Add additional steps
- Configure different deployment targets
Test the full automation locally:
```bash
# Generate a report with commit flag
python main.py --commit

# Push to trigger GitHub Pages build
git push origin main
```

Contributions are welcome! Feel free to submit a Pull Request.
This project is available under the MIT License.