Artificial-Intelligence-Based-Web-Scraper

This project demonstrates an AI-powered web scraper using Selenium, BeautifulSoup, LangChain, and Ollama to scrape and parse website content. It leverages the capabilities of the llama3 model to extract specific information based on user-provided descriptions.

Features

Web Scraping: Scrapes websites using Selenium and BeautifulSoup to extract the DOM content.
Content Cleaning: Removes unnecessary scripts and styles, providing clean and readable content.
Content Parsing: Uses LangChain with the llama3 model, served through Ollama, to extract specific information from the scraped content based on user-defined descriptions.
Interactive UI: Built with Streamlit, allowing users to interactively scrape and parse content from any website.
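
To illustrate the Content Cleaning step, here is a minimal sketch of stripping scripts and styles from scraped HTML. It deliberately uses the standard library's html.parser as a stand-in for BeautifulSoup so it runs without extra installs; the names TextExtractor and clean_html are hypothetical and are not taken from scrape.py.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text while skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Put each text fragment on its own line and drop empty lines.
        lines = (line.strip() for line in "\n".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def clean_html(html):
    """Return the readable text of an HTML document (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling clean_html on a page's HTML source returns one text fragment per line, which is a convenient shape for passing to the parsing step.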

.
├── main.py           # Streamlit app for interacting with the scraper and parser
├── scrape.py         # Web scraping logic (Selenium, BeautifulSoup)
├── parse.py          # Content parsing logic using LangChain and Ollama
├── sample.env        # Environment variables (SBR_WEBDRIVER)
└── requirements.txt  # Required Python libraries
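
As a rough illustration of the parsing logic that parse.py is described as holding, the sketch below combines scraped content with a user description into a prompt and hands it to a model. Everything here is an assumption: PROMPT_TEMPLATE, build_prompt, and parse_content are hypothetical names, and call_model is a plain callable standing in for the llama3 model that the project reaches through LangChain and Ollama.

```python
# Hypothetical prompt wording; the project's actual prompt is not shown in this README.
PROMPT_TEMPLATE = (
    "Extract only the information that matches this description: {description}\n"
    "Return an empty string if nothing matches.\n\n"
    "Content:\n{content}"
)


def build_prompt(content, description):
    """Fill the template with the scraped content and the user's description."""
    return PROMPT_TEMPLATE.format(description=description, content=content)


def parse_content(content, description, call_model):
    """Send the prompt to the model and return its trimmed reply.

    call_model is any callable that maps a prompt string to a response
    string, so a stub can replace the real model during testing.
    """
    return call_model(build_prompt(content, description)).strip()
```

Keeping the model behind a callable means the prompt logic can be exercised without a running Ollama server.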

Technologies Used

Selenium: Web browser automation tool for scraping websites.
BeautifulSoup: Web scraping library for parsing and extracting data from HTML documents.
LangChain: A framework for building language model-powered applications.
Ollama: A tool for running large language models locally; it serves the pre-trained llama3 model used for content parsing.
Streamlit: Framework for building interactive web applications.
python-dotenv: For loading environment variables from a .env file.
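
The sketch below shows, in simplified form, how a setting such as SBR_WEBDRIVER can be read from a .env file. It is a standard-library stand-in for what python-dotenv does, not the library itself, and the helper names read_env_file and get_webdriver_endpoint are hypothetical.

```python
import os


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def get_webdriver_endpoint(env_path=".env"):
    """Return SBR_WEBDRIVER from the environment, else from the .env file."""
    # A variable already set in the real environment takes priority.
    if "SBR_WEBDRIVER" in os.environ:
        return os.environ["SBR_WEBDRIVER"]
    if os.path.exists(env_path):
        return read_env_file(env_path).get("SBR_WEBDRIVER")
    return None
```

A typical setup would copy sample.env to .env, fill in the value, and let the scraper read it at startup.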

Chukwuemeka-James/Artificial-Intelligence-Based-Web-Scraper
