This project automatically generates a short company brochure from a website.
It scrapes the company’s landing page and relevant links (About, Careers, etc.), then summarizes them into a structured Markdown brochure.
The system supports two LLM backends:
- Ollama (local model inference, e.g., LLaMA 3.2)
- OpenAI API (cloud-based model, e.g., GPT-4o-mini)
├── website.py # Scrapes webpages and extracts text + links
├── ollamaLinksExtractor.py # Extracts relevant links using Ollama
├── openAILinksExtractor.py # Extracts relevant links using OpenAI
├── run_w_ollama.py # Full pipeline with Ollama
├── run_w_open_api.py # Full pipeline with OpenAI
├── requirements.txt # Python dependencies
├── pipeline_diagram_two_llms.png # Pipeline diagram (with 2 LLM stages)
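As a rough illustration of the "extracts text + links" role that website.py plays, the sketch below parses an HTML string with the standard library only. The names `PageParser` and `parse_page` are hypothetical and are not taken from this repository; the actual module may well use a different parser and may fetch pages over HTTP, which this sketch leaves out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and absolute link URLs from an HTML document."""

    SKIP_TAGS = {"script", "style"}  # content that is not visible page text

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html: str, base_url: str):
    """Return (text, links) for one page. Hypothetical helper."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.links


sample = (
    "<html><body><h1>Acme</h1><script>var x=1;</script>"
    '<a href="/about">About us</a></body></html>'
)
text, links = parse_page(sample, "https://example.com")
print(text)
print(links)
```

Resolving every link to an absolute URL up front keeps the later link-filtering stage simple, since the LLM then only has to pick from complete URLs.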
- Clone this repository
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate   # macOS/Linux
    venv\Scripts\activate      # Windows
- Install requirements:
pip install -r requirements.txt
- (For OpenAI only) Add your API key to a .env file:
    OPENAI_API_KEY=sk-proj-xxxx
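For reference, here is a minimal sketch of how a script could pick up `OPENAI_API_KEY` from that file. It uses only the standard library and assumes simple `KEY=VALUE` lines; the helper names `load_env_file` and `get_openai_key` are hypothetical, and the scripts in this repository may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Return the API key, preferring a real environment variable over the file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
    return key
```

Failing early with a clear message is usually friendlier than letting the first API call fail with an authentication error.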
Before running with Ollama, make sure you have pulled the model you want to use (for example, LLaMA 3.2):
    ollama pull llama3.2
Then run:
    python run_w_ollama.py
To use the OpenAI backend instead, run:
    python run_w_open_api.py
Both scripts will:
- Scrape the target website
- Use LLM #1 to filter the relevant links (About, Careers, etc.)
- Fetch the contents of those links
- Use LLM #2 to generate a company brochure in Markdown format
- Save the result as <CompanyName>_brochure.md
The workflow uses two LLM stages: one for link filtering and another for brochure generation.
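The two-stage flow can be sketched as a single function that takes each stage as a callable. This is an illustration under stated assumptions, not the code in run_w_ollama.py or run_w_open_api.py: `build_brochure`, `brochure_filename`, and the `fake_*` stand-ins are hypothetical, the LLM calls are replaced by plain functions, and dropping non-alphanumeric characters from the company name is an assumed naming rule.

```python
import re
from typing import Callable, List, Tuple


def brochure_filename(company_name: str) -> str:
    """Build '<CompanyName>_brochure.md' (assumed rule: keep letters and digits only)."""
    safe = re.sub(r"[^A-Za-z0-9]+", "", company_name)
    return f"{safe}_brochure.md"


def build_brochure(
    company_name: str,
    url: str,
    fetch: Callable[[str], Tuple[str, List[str]]],
    select_links: Callable[[List[str]], List[str]],
    write_brochure: Callable[[str, str], str],
) -> str:
    landing_text, links = fetch(url)        # scrape the target website
    relevant = select_links(links)          # stage 1: LLM #1 filters links
    pages = [landing_text] + [fetch(link)[0] for link in relevant]  # fetch link contents
    return write_brochure(company_name, "\n\n".join(pages))  # stage 2: LLM #2 writes Markdown


# Stand-ins so the sketch runs without network access or a model.
def fake_fetch(url):
    pages = {
        "https://example.com": (
            "Landing page text",
            ["https://example.com/about", "https://example.com/login"],
        ),
        "https://example.com/about": ("About page text", []),
    }
    return pages[url]


def fake_select_links(links):
    return [link for link in links if link.endswith(("/about", "/careers"))]


def fake_write_brochure(name, content):
    return f"# {name}\n\n{content}\n"


markdown = build_brochure(
    "Example Co", "https://example.com", fake_fetch, fake_select_links, fake_write_brochure
)
print(brochure_filename("Example Co"))
print(markdown)
```

Passing the stages in as callables means the same flow could run against either backend by swapping only `select_links` and `write_brochure`; saving the returned Markdown under `brochure_filename(...)` is left to the caller.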
After running the pipeline on https://huggingface.co, you’ll get a Markdown file like:
=== Generated Company Brochure ===
# Hugging Face
## About
...
## Careers
...
