Nethra1/website_summarizer

Recipe Summarizer (FastAPI + OpenAI/Gemini)

This project scrapes a food recipe web page and summarizes its content with your configured LLM (OpenAI or Gemini). It is specialized for cooking recipes and produces chef-level, easy-to-understand summaries; if the provided website is not a food recipe, the service will tell you it cannot summarize non-recipe content.

What's included:

  • A production-ready FastAPI service with Basic Authentication
  • A reusable summarization module that summarizes recipes with a "multicuisine chef" persona
  • A Jupyter notebook for experimentation

Project Structure

  • api_service.py: FastAPI app exposing POST /v1/summarize.
  • website_summarizer.py: Orchestrates recipe scraping and calls the LLM.
  • scraper.py: Extracts website text (Selenium/BS4 based).
  • practice.ipynb: Sample notebook usage.
  • pyproject.toml: Dependencies managed by uv.

Requirements

  • Python 3.12+
  • uv (recommended) or pip
  • Chrome/Chromedriver available for Selenium (if using Selenium paths)

Setup

  1. Install dependencies:
uv sync
  2. Create a .env in the project root and set your keys:
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
API_USERNAME=your_user
API_PASSWORD=your_pass

Notes:

  • You can use OpenAI or Gemini via the existing code paths; make sure the corresponding API key is set.
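As a rough sketch of this kind of key-based provider selection (the pick_provider helper below is purely illustrative, not the actual API of website_summarizer.py):

```python
import os


def pick_provider() -> str:
    """Pick an LLM provider based on which API key is present.

    Prefers OpenAI when both keys are set; raises if neither is set.
    This mirrors the idea of 'set the key for the provider you want',
    not the project's exact implementation.
    """
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("GEMINI_API_KEY"):
        return "gemini"
    raise RuntimeError("Set OPENAI_API_KEY or GEMINI_API_KEY in .env")
```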

Run the API

uv run uvicorn api_service:app --host 0.0.0.0 --port 8000

Health check:

curl -s http://localhost:8000/health

Summarize a recipe (POST JSON, Basic Auth):

curl -X POST "http://localhost:8000/v1/summarize" \
  -H "Authorization: Basic $(printf '%s' 'your_user:your_pass' | base64)" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.food.com/recipe/creamy-garlic-penne-pasta-43023"}'

Response:

{
  "url": "https://www.food.com/recipe/creamy-garlic-penne-pasta-43023",
  "summary": "..."
}
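For a scripted client, the same request can be made from Python with only the standard library. The summarize and basic_auth_header helpers below are illustrative, not part of the project; they assume the server from the curl example is listening on localhost:8000:

```python
import base64
import json
from urllib import request


def basic_auth_header(user: str, password: str) -> str:
    """Build the same header value the curl base64 substitution produces."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def summarize(url: str, user: str, password: str,
              endpoint: str = "http://localhost:8000/v1/summarize") -> dict:
    """POST {"url": ...} to the summarize endpoint and return the parsed JSON."""
    req = request.Request(
        endpoint,
        data=json.dumps({"url": url}).encode(),
        headers={
            "Authorization": basic_auth_header(user, password),
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

basic_auth_header("your_user", "your_pass") yields exactly the value the curl example builds with $(printf ... | base64).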

Notebook Usage

  • Launch Jupyter with the same environment:
uv run jupyter notebook
  • In the notebook, avoid print(display(Markdown(...))); use display(Markdown(...)) or return Markdown(...) as the last cell expression.
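For example, however you obtain the summary text in the notebook, the last cell could look like this (the summary string here is a placeholder):

```python
from IPython.display import Markdown, display

# Placeholder summary text; in practice this would come from the
# project's summarization call.
summary = "## Creamy Garlic Penne\n\n1. Boil the penne until al dente."

# display(...) renders the markdown; wrapping it in print() would
# print None instead, since display() returns nothing.
display(Markdown(summary))
```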

Troubleshooting

  • If curl appears to hang:
    • Ensure the server is running and reachable: lsof -i :8000
    • Try IPv4 loopback explicitly: curl -v http://127.0.0.1:8000/health
    • Kill previous processes on the same port: kill -9 $(lsof -t -i :8000)
  • Selenium timeouts: ensure Chrome is installed and accessible; adjust waits in scraper.py.

Security

  • Basic Auth is required when API_USERNAME/API_PASSWORD are set.
  • Tighten CORS and logging in production; use a process manager (systemd, Docker, or similar).
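A common pattern for the Basic Auth credential check (a sketch only; api_service.py's actual implementation may differ) uses constant-time comparison to avoid timing leaks:

```python
import secrets


def credentials_ok(user: str, password: str,
                   expected_user: str, expected_password: str) -> bool:
    """Compare submitted Basic Auth credentials in constant time.

    secrets.compare_digest does not short-circuit on the first
    mismatched byte, so it avoids leaking timing information about
    how much of the credential was correct.
    """
    user_ok = secrets.compare_digest(user, expected_user)
    pass_ok = secrets.compare_digest(password, expected_password)
    return user_ok and pass_ok
```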

License

MIT (or your preferred license).
