Scrapely is a simple and customizable web scraper built using Java and Jsoup. It allows users to extract structured data from web pages based on CSS selectors and stores the results in a JSON file.
✅ Extract Specific Elements – Users can define which elements to scrape using CSS selectors.
✅ Attribute Extraction – Option to extract specific attributes (e.g., href, src, alt).
✅ Automatic JSON Output – Saves extracted data in a structured JSON file.
✅ User Input Handling – Fully interactive, taking inputs via the command line.
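The selector and attribute features above map onto a handful of Jsoup calls. The following is a minimal sketch, not Scrapely's actual source: the class name `SelectorSketch`, the method `extract`, and the sample HTML are assumptions made for illustration. It parses an in-memory string so it runs without network access; for a live page, `Jsoup.connect(url).get()` would supply the document instead.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper for illustration; not part of Scrapely's source.
public class SelectorSketch {

    // Returns the text of each element matching cssQuery, or the value of
    // the named attribute when attribute is non-null and non-empty.
    public static List<String> extract(String html, String cssQuery, String attribute) {
        Document doc = Jsoup.parse(html);
        List<String> results = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            if (attribute == null || attribute.isEmpty()) {
                results.add(element.text());
            } else {
                results.add(element.attr(attribute));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample markup invented for this example.
        String html = "<ul>"
                + "<li class=\"book\"><a href=\"/gatsby\">The Great Gatsby</a></li>"
                + "<li class=\"book\"><a href=\"/1984\">1984</a></li>"
                + "</ul>";
        System.out.println(extract(html, "li.book a", null));   // element text
        System.out.println(extract(html, "li.book a", "href")); // attribute values
    }
}
```

Passing `null` or an empty string as the attribute falls back to the element text, which mirrors the optional attribute input.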
- Java – Core programming language
- Jsoup – HTML parsing & web scraping library
git clone https://github.com/yourusername/scrapely.git
cd scrapely
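Make sure you have Java 8+ installed and that jsoup-1.13.1.jar is in the project directory. The commands below use the Unix classpath separator (:); on Windows, use ; instead.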
javac -cp .:jsoup-1.13.1.jar Main.java WebScraper.java
java -cp .:jsoup-1.13.1.jar Main
When prompted, enter the required details:
- Website URL
- CSS Query (class/tag)
- Element to extract
- (Optional) Attribute to extract
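The prompt sequence above can be sketched with `java.util.Scanner`. This is an illustration under assumptions, not the project's `Main` class: `PromptSketch`, `ScrapeRequest`, and the exact prompt wording are hypothetical, and the sketch only collects the four answers without interpreting them.

```java
import java.util.Scanner;

// Hypothetical sketch of the interactive prompts; not Scrapely's Main class.
public class PromptSketch {

    // Simple holder for the four answers. Field names are illustrative.
    public static class ScrapeRequest {
        public final String url;
        public final String cssQuery;
        public final String element;
        public final String attribute; // null when the user skips it

        ScrapeRequest(String url, String cssQuery, String element, String attribute) {
            this.url = url;
            this.cssQuery = cssQuery;
            this.element = element;
            this.attribute = attribute;
        }
    }

    // Reads one answer per line, in the order listed in the README.
    public static ScrapeRequest readRequest(Scanner in) {
        System.out.print("Website URL: ");
        String url = in.nextLine().trim();
        System.out.print("CSS query (class/tag): ");
        String cssQuery = in.nextLine().trim();
        System.out.print("Element to extract: ");
        String element = in.nextLine().trim();
        System.out.print("Attribute to extract (press Enter to skip): ");
        // The last answer is optional, so tolerate end of input as well.
        String attribute = in.hasNextLine() ? in.nextLine().trim() : "";
        return new ScrapeRequest(url, cssQuery, element,
                attribute.isEmpty() ? null : attribute);
    }

    public static void main(String[] args) {
        ScrapeRequest request = readRequest(new Scanner(System.in));
        System.out.println();
        System.out.println("Scraping " + request.url + " with query " + request.cssQuery);
    }
}
```

Taking the `Scanner` as a parameter keeps the prompt logic testable with canned input instead of a live terminal.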
The extracted data is stored in a JSON file named after the page title. Example output:
[
{ "text": "The Great Gatsby" },
{ "text": "1984" },
{ "text": "To Kill a Mockingbird" }
]
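Output in the shape shown above can be produced with the standard library alone. The sketch below is an assumption about one way to do it, not a description of how Scrapely writes its file: `JsonOutputSketch`, `toJson`, `escape`, and `fileNameFor` are hypothetical names, the filename sanitizing rule is invented here, and the real project may rely on a JSON library instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the JSON output step; not Scrapely's source.
public class JsonOutputSketch {

    // Escapes the characters that must be escaped inside a JSON string.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds an array of { "text": ... } objects, one per extracted value.
    public static String toJson(List<String> texts) {
        StringBuilder sb = new StringBuilder("[\n");
        for (int i = 0; i < texts.size(); i++) {
            sb.append("{ \"text\": \"").append(escape(texts.get(i))).append("\" }");
            sb.append(i < texts.size() - 1 ? ",\n" : "\n");
        }
        return sb.append("]").toString();
    }

    // Turns a page title into a safe file name (assumed rule: runs of other
    // characters become a single underscore).
    public static String fileNameFor(String title) {
        String safe = title.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
        return (safe.isEmpty() ? "output" : safe) + ".json";
    }

    public static void main(String[] args) throws IOException {
        List<String> titles = Arrays.asList("The Great Gatsby", "1984", "To Kill a Mockingbird");
        Path out = Paths.get(fileNameFor("Books to Scrape")); // sample title
        Files.write(out, toJson(titles).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out);
    }
}
```

Sanitizing the title matters because page titles often contain characters such as `/`, `:`, or `|` that are not valid in file names on every platform.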
This project is licensed under the MIT License.