automated-product-research

Discover and analyze online products or services using keyword-driven search, automated scraping, NLP-based feature extraction, and LLM-generated descriptions.

P.S.: This tool is not fully built yet.

Todo

Python Installation


pip install google-api-python-client beautifulsoup4 pandas requests spacy
python -m spacy download en_core_web_sm
pip install isort black
pip install tldextract
pip install transformers torch
pip install playwright
playwright install

Project Structure

automated-product-research/
│
├── config/
│   └── settings.py             # API keys, search config, paths
│
├── core/
│   ├── search_google.py        # Google CSE logic
│   ├── extractor.py            # NLP, metadata, contact extraction
│   ├── scraper.py              # Page scraping logic
│   └── csv_writer.py           # CSV and progress tracking
│
├── run/
│   └── run_scraper.py          # Entrypoint script
│
├── data/
│   ├── ai_edtech_results.csv   # Output
│   └── scrape_progress.csv     # Progress tracker
│
├── app/                        # Future Streamlit UI
│   └── streamlit_ui.py
│
├── tests/                      # Unit tests
│
├── requirements.txt
└── README.md
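
The tree above lists data/scrape_progress.csv as a progress tracker, which suggests runs can be resumed. The following is a minimal sketch of how such tracking could work, not the project's actual csv_writer.py. The function names (load_done_urls, mark_done, pending_urls) and the two-column layout (website_url, status) are assumptions made for illustration.

```python
import csv
from pathlib import Path

# Hypothetical column layout for the progress file (an assumption).
PROGRESS_FIELDS = ["website_url", "status"]


def load_done_urls(progress_path):
    """Return the set of URLs already recorded in the progress file."""
    path = Path(progress_path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as fh:
        return {row["website_url"] for row in csv.DictReader(fh)}


def mark_done(progress_path, url, status="ok"):
    """Append one URL to the progress file, writing the header on first use."""
    path = Path(progress_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=PROGRESS_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"website_url": url, "status": status})


def pending_urls(candidates, progress_path):
    """Keep only the URLs that were not scraped in an earlier run."""
    done = load_done_urls(progress_path)
    return [u for u in candidates if u not in done]
```

Appending one row per scraped URL means an interrupted run loses at most the page in progress; the next run can call pending_urls to skip everything already recorded.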

Google Search Engine:

Keyword searches go through the Google Custom Search API (CSE); the logic lives in core/search_google.py.
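
As a rough illustration of a Custom Search query with google-api-python-client (installed above), here is a minimal sketch. It is not the contents of core/search_google.py: the function names build_service and search_keyword, the parameter names, and the placeholder credentials are assumptions. The API returns at most 10 results per request, so num is kept at 10.

```python
def build_service(api_key):
    """Create a Custom Search client (requires google-api-python-client)."""
    from googleapiclient.discovery import build  # imported lazily

    return build("customsearch", "v1", developerKey=api_key)


def search_keyword(service, keyword, engine_id, country_code="us", num=10):
    """Run one query and return (title, url) pairs from the first result page."""
    response = (
        service.cse()
        .list(q=keyword, cx=engine_id, gl=country_code.lower(), num=num)
        .execute()
    )
    # The "items" key is absent from the response when nothing matches.
    return [
        (item.get("title", ""), item["link"])
        for item in response.get("items", [])
    ]


# Example call (needs a real API key and search engine ID):
# service = build_service(api_key="YOUR_API_KEY")
# results = search_keyword(service, "ai tutoring platform", engine_id="YOUR_CSE_ID")
```

Passing the service object in as an argument keeps the query logic testable with a stub, so no network access or API quota is needed for unit tests.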

Potential Columns

 {
    "keyword_category": keyword_category,
    "keyword": keyword,
    "last_updated": datetime.now(timezone.utc).isoformat(),  # timezone-aware; needs `from datetime import datetime, timezone`
    "website_title": website_title,
    "website_url": url,
    "search_country": country_code.upper(),
    # "email": email,
    **seo,
    # "raw_text": raw_text,
    # "address": address,
    # "phone_number": phone,
    # "target_audience": ", ".join(audience),
    # "delivery_platform": ", ".join(
    #     [f for f in features if f in ["web app", "LMS", "plugin", "mobile app"]]
    # ),
    # "integrations": ", ".join(
    #     [f for f in features if f not in ["web app", "LMS", "plugin", "mobile app"]]
    # ),
    # "raw_homepage_text": raw_text,
    # "llm_summary": "",
    # "business_description_point_1": "",
    # "business_description_point_2": "",
    # "business_description_point_3": "",
    # "business_category_tags": "",
    # "pricing_info": "",
    # "product_stage": "",
    # "funding_info": "",
    # "partner_names": "",
    # "scrape_notes": "",
    # "has_product_signals": signals_flag,
    # "spacy_product_score_flag": spacy_flag,
    # "is_potential_product": signals_flag,
    # "website_summary": website_summary,
    # "website_classification": website_classification,
}
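
The dictionary above can be turned into CSV output along these lines. This is a minimal sketch covering only the active (uncommented) columns; the helper names build_row and rows_to_csv are assumptions, as is the meta_description key used in the usage example to stand in for whatever the seo dictionary actually contains.

```python
import csv
import io
from datetime import datetime, timezone


def build_row(keyword_category, keyword, website_title, url, country_code, seo):
    """Assemble one output row from the active columns."""
    return {
        "keyword_category": keyword_category,
        "keyword": keyword,
        "last_updated": datetime.now(timezone.utc).isoformat(),
        "website_title": website_title,
        "website_url": url,
        "search_country": country_code.upper(),
        **seo,  # SEO fields are merged in as additional columns
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text; the header is the union of all keys."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills columns that a given row does not provide.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with a hypothetical SEO field:
row = build_row(
    "edtech", "ai tutor", "Acme", "https://acme.example", "us",
    {"meta_description": "Adaptive tutoring"},
)
print(rows_to_csv([row]))
```

Building the header from the union of keys means that enabling one of the commented-out columns later only adds a column and does not require changing the writer.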

Repository: reachusama/automated-product-research