
Scrapely is a customizable web scraper built using Java and Jsoup that allows users to extract structured data from web pages. The program takes a website URL, CSS query, target elements, and optional attributes as inputs and saves the extracted data in a JSON file.


adarshpandey18/java-web-scrapper


Scrapely - A Web Scraping Tool

Scrapely is a simple and customizable web scraper built using Java and Jsoup. It allows users to extract structured data from web pages based on CSS selectors and stores the results in a JSON file.

🚀 Features

  • Extract Specific Elements – Users can define which elements to scrape using CSS selectors.
  • Attribute Extraction – Option to extract specific attributes (e.g., href, src, alt).
  • Automatic JSON Output – Saves extracted data in a structured JSON file.
  • User Input Handling – Fully interactive, taking inputs via the command line.
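The element and attribute extraction described above can be sketched with Jsoup's selector API. This is a minimal illustration, not the project's actual code: the class name ExtractSketch, the extract method, and the sample HTML are assumptions made for this example. It relies only on Jsoup.parse, Document.select, Element.text, and Element.attr.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not taken from the Scrapely source.
class ExtractSketch {

    // Returns the text of every element matching cssQuery, or the value of
    // `attribute` when one is given (null or empty means "use the text").
    static List<String> extract(Document doc, String cssQuery, String attribute) {
        boolean wantsAttribute = attribute != null && !attribute.isEmpty();
        List<String> values = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            values.add(wantsAttribute ? element.attr(attribute) : element.text());
        }
        return values;
    }

    public static void main(String[] args) {
        // Inline HTML keeps the example offline; a live page could be
        // loaded with Jsoup.connect(url).get() instead.
        String html = "<ul>"
                + "<li class='book'><a href='/gatsby'>The Great Gatsby</a></li>"
                + "<li class='book'><a href='/1984'>1984</a></li>"
                + "</ul>";
        Document doc = Jsoup.parse(html);

        System.out.println(extract(doc, "li.book a", null));   // element text
        System.out.println(extract(doc, "li.book a", "href")); // attribute values
    }
}
```

Passing an empty attribute falls back to the element text, which mirrors the optional attribute input described in this README.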

🛠️ Tech Stack

  • Java – Core programming language
  • Jsoup – HTML parsing & web scraping library

📌 How to Use

1️⃣ Clone the Repository

git clone https://github.com/adarshpandey18/java-web-scrapper.git
cd java-web-scrapper

2️⃣ Compile and Run

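Make sure you have Java 8+ installed and that jsoup-1.13.1.jar is in the project directory. The commands below use the Unix classpath separator (:); on Windows, use a semicolon (;) instead.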

javac -cp .:jsoup-1.13.1.jar Main.java WebScraper.java
java -cp .:jsoup-1.13.1.jar Main

3️⃣ Provide Inputs

When prompted, enter the required details:

  • Website URL
  • CSS Query (class/tag)
  • Element to extract
  • (Optional) Attribute to extract
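The prompts listed above could be read with a java.util.Scanner along these lines. This is a sketch under assumptions: the class PromptSketch, the Inputs holder, and the prompt wording are hypothetical, and the real Main.java may collect or name these values differently. A blank line for the attribute is treated as "no attribute".

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical input reader; the names here are not from the Scrapely source.
class PromptSketch {

    // Simple immutable holder for the four values the README lists.
    static final class Inputs {
        final String url;
        final String cssQuery;
        final String element;
        final String attribute; // empty string when the user skips it

        Inputs(String url, String cssQuery, String element, String attribute) {
            this.url = url;
            this.cssQuery = cssQuery;
            this.element = element;
            this.attribute = attribute;
        }
    }

    // Reads the inputs line by line; taking an InputStream instead of
    // System.in directly makes the method easy to exercise in tests.
    static Inputs readInputs(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");

        System.out.print("Website URL: ");
        String url = scanner.nextLine().trim();

        System.out.print("CSS query (class/tag): ");
        String cssQuery = scanner.nextLine().trim();

        System.out.print("Element to extract: ");
        String element = scanner.nextLine().trim();

        System.out.print("Attribute to extract (optional, Enter to skip): ");
        String attribute = scanner.hasNextLine() ? scanner.nextLine().trim() : "";

        return new Inputs(url, cssQuery, element, attribute);
    }

    public static void main(String[] args) {
        Inputs inputs = readInputs(System.in);
        System.out.println();
        System.out.println("Scraping " + inputs.url + " with query " + inputs.cssQuery);
    }
}
```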

4️⃣ Check Output

The extracted data is stored in a JSON file named after the website title. Example output:

[
  { "text": "The Great Gatsby" },
  { "text": "1984" },
  { "text": "To Kill a Mockingbird" }
]
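One way to produce output in the shape shown above is sketched below using only the Java standard library. The README does not say how the project builds its JSON or derives the file name, so everything here is an assumption for illustration: the class JsonOutputSketch, the hand-written escaping, and the rule that turns a page title into a file name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical JSON writer; not taken from the Scrapely source.
class JsonOutputSketch {

    // Escapes the characters that JSON string literals do not allow raw.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds an array of { "text": ... } objects, one per extracted value.
    static String toJson(List<String> texts) {
        StringBuilder sb = new StringBuilder("[\n");
        for (int i = 0; i < texts.size(); i++) {
            sb.append("  { \"text\": \"").append(escape(texts.get(i))).append("\" }");
            sb.append(i < texts.size() - 1 ? ",\n" : "\n");
        }
        return sb.append("]").toString();
    }

    // Assumed naming rule: replace runs of unsafe characters with "_".
    static String fileNameFor(String title) {
        String safe = title.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
        return (safe.isEmpty() ? "output" : safe) + ".json";
    }

    public static void main(String[] args) throws IOException {
        List<String> texts = Arrays.asList("The Great Gatsby", "1984");
        Path out = Paths.get(fileNameFor("Books to Scrape"));
        Files.write(out, toJson(texts).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out);
    }
}
```

Sanitizing the title matters because page titles often contain characters such as / or : that are not valid in file names on every platform.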

📜 License

This project is licensed under the MIT License.
