Awesome Web Scraping 2026

A curated list of web scraping tools, frameworks, libraries, and APIs for 2026. Maintained weekly.

⭐ Star this repo to keep it in your bookmarks — new tools added every week.

📖 Need a custom scraper? Get a production-grade scraper built in 48 hours — $250 flat rate. Get a free quote →

🚀 Skip the scraping — I've built 78+ ready-made scrapers for Reddit, HN, Google, LinkedIn, Amazon, and more. Browse Apify actors → | Need something custom? Email spinov001@gmail.com

Contents

Frameworks & Libraries
- Python
- JavaScript / TypeScript
- Go
- Ruby
- Rust
- PHP
Browser Automation
Headless Browsers
Anti-Detection & Stealth
Proxy Services
CAPTCHA Solving
Cloud Scraping Platforms
AI-Powered Scraping
E-Commerce & Price Monitoring
Free APIs (No Scraping Needed)
Pre-Built Scrapers (Apify Store)
Job Boards & Company Data
Government & Public Data
Data Parsing & Extraction
Anti-Bot Detection
Scraping Infrastructure
Legal & Ethics
Tutorials & Articles
Related Awesome Lists

Quick Comparison: Which Tool Should You Use?

Need	Best Tool	Why
Simple HTML parsing	BeautifulSoup	Easiest API, handles broken HTML
Large-scale crawling	Scrapy	Built-in queuing, middlewares, pipelines
JavaScript-rendered pages	Playwright	Cross-browser automation with auto-waiting
Full scraping framework (JS)	Crawlee	Handles browser + HTTP, auto-scaling
Speed over everything	spider (Rust)	20-100x faster than Python alternatives
No-code scraping	Apify or Portia	Visual tools, no programming needed
LLM-ready data	Firecrawl or Crawl4AI	Output as markdown for AI pipelines
Avoid scraping entirely	Free APIs	Structured JSON, no parsing, no breakage

Python Framework Comparison

Feature	Scrapy	BeautifulSoup	Requests-HTML	Crawlee (Python)
Async	✅ Twisted	❌	✅	✅ asyncio
JS Rendering	Plugin	❌	✅ built-in	✅ Playwright
Rate Limiting	✅ built-in	Manual	Manual	✅ built-in
Export (JSON/CSV)	✅ built-in	Manual	Manual	✅ built-in
Learning Curve	Medium	Low	Low	Medium
Best For	Production crawlers	Quick scripts	Simple pages + JS	Modern async scraping

Browser Automation Comparison

Feature	Playwright	Puppeteer	Selenium
Languages	Python, JS, Java, C#	JS only	All major
Browsers	Chromium, Firefox, WebKit	Chrome, Firefox	All
Speed	Fast	Fast	Slower
Anti-Detection	Via plugins (playwright-stealth, Rebrowser)	Via plugins (puppeteer-extra-stealth)	Via undetected-chromedriver
Mobile Testing	✅	Limited	✅
Auto-Wait	✅	Manual	Manual
Community	Growing fast	Large	Largest
Best For	Modern scraping	Chrome-focused projects	Legacy systems

Frameworks & Libraries

Python

Tool	Stars	Description
Scrapy	53k+	The most popular Python scraping framework. Async, middlewares, pipelines, built-in export.
BeautifulSoup	—	HTML/XML parser. Simple API, forgiving of bad markup. Use with `requests`.
Requests-HTML	13k+	Pythonic HTML parsing with JS rendering support via Chromium.
httpx	13k+	Modern HTTP client with async and HTTP/2 support, neither of which `requests` offers.
Parsel	1k+	CSS + XPath selector library extracted from Scrapy.
MechanicalSoup	4k+	Stateful web browsing (form submission, cookies) — like a human clicking.
Grab	2k+	Web scraping framework. Network requests, DOM parsing, spider.
Selectolax	1k+	Fast HTML parser (10-20x faster than lxml). C-level speed.
gazpacho	700+	Simple, modern web scraping. Minimal API surface.
Crawlee (Python)	5k+	Apify's scraping framework for Python. BeautifulSoup + Playwright crawlers.
curl_cffi	3k+	Python bindings for curl-impersonate. TLS fingerprint impersonation.
botasaurus	4k+	All-in-one scraping framework: browser, anti-detect, caching, parallel.
Playwright for Python	12k+	Official Playwright Python bindings. Cross-browser automation.
aiohttp	15k+	Async HTTP client/server. Great for high-concurrency scraping.
Scrapling	20k+	Adaptive parsing — auto-relocates elements after page updates. 10x faster JSON.

JavaScript / TypeScript

Tool	Stars	Description
Crawlee	15k+	Full-featured scraping framework by Apify. Cheerio, Playwright, Puppeteer crawlers.
Cheerio	28k+	Fast jQuery-like HTML parser for Node.js. No browser needed.
node-crawler	7k+	HTTP crawler with jQuery-style selectors, rate limiting, retries.
x-ray	6k+	Declarative web scraping with schema definitions.
Apify SDK	4k+	Toolkit for building Apify actors — storage, proxies, queue.
got-scraping	600+	HTTP client with anti-fingerprinting. Built-in header generation.
Axios	106k+	Promise-based HTTP client. Great for API-based scraping.

Go

Tool	Stars	Description
Colly	23k+	Fast and elegant scraping framework for Go.
goquery	14k+	jQuery-like HTML selector in Go.
Ferret	6k+	Declarative web scraping with FQL query language.
Geziyor	2k+	Fast web scraping with concurrent requests and caching.
chromedp	11k+	Chrome DevTools Protocol client for Go. Headless browser control.

Ruby

Tool	Stars	Description
Nokogiri	6k+	HTML/XML parser, industry standard for Ruby.
Mechanize	4k+	Automated web interaction (clicks, forms, cookies).
Kimurai	1k+	Modern Ruby web scraping framework.

Rust

Tool	Stars	Description
spider	3k+	Web crawler written in Rust. Claimed 20-100x faster than Python alternatives.
reqwest	10k+	Ergonomic HTTP client for Rust with async support.
scraper	2k+	CSS selector-based HTML parser for Rust.

PHP

Tool	Stars	Description
Goutte	9k+	Screen scraping and web crawling library for PHP. Deprecated in favor of Symfony's HttpBrowser.
Roach	2k+	Scrapy-inspired web scraping for PHP.
Panther	3k+	Browser testing and scraping with real browsers in PHP.

Browser Automation

Tool	Stars	Description
Playwright	68k+	Cross-browser automation by Microsoft. Chromium, Firefox, WebKit. Stealth via plugins.
Puppeteer	89k+	Chrome automation by Google. Mature ecosystem.
Selenium	31k+	The OG browser automation. Supports all browsers.
Cypress	47k+	Testing-focused but works for scraping interactive SPAs.
Rod	5k+	Playwright/Puppeteer alternative for Go. DevTools Protocol.
Splash	4k+	Lightweight browser as a service. JS rendering via HTTP API.

Headless Browsers

Tool	Description
Browserless	Chrome as a service. Docker-ready. Free self-hosted.
chrome-headless-shell	Official Google headless Chrome. Smallest footprint.
Playwright Docker	Official Playwright Docker images with all browsers.

Anti-Detection & Stealth

Tool	Stars	Description
undetected-chromedriver	10k+	Patched ChromeDriver that passes bot detection.
puppeteer-extra-stealth	12k+	Plugin bundle to evade detection (WebGL, navigator, etc.)
curl-impersonate	13k+	curl that impersonates Chrome/Firefox TLS fingerprint.
Camoufox	5k+	Anti-detect Firefox browser for scraping.
playwright-stealth	1k+	Stealth plugin for Playwright Python. Evade fingerprinting.
nodriver	3k+	Next-gen undetected browser automation. Successor to undetected-chromedriver.
Rebrowser	1k+	Patches for Playwright/Puppeteer to fix automation leaks.

Proxy Services

Service	Free Tier	Description
Bright Data	Trial	72M+ residential IPs. Enterprise grade.
Oxylabs	Trial	Residential and datacenter proxies.
ScraperAPI	1000 free	API that handles proxies and CAPTCHAs.
Smartproxy (now Decodo)	Trial	65M+ residential proxies.
IPRoyal	—	Budget residential proxies from $1.75/GB.
Proxy-Seller	—	Datacenter & residential proxies in 220+ countries. IPv4/IPv6, SOCKS5. Use code `SPINOV15` for 15% off.

CAPTCHA Solving

Service	Price	Description
2Captcha	$1-3/1000	Human-powered CAPTCHA solving API.
Anti-Captcha	$1-2/1000	reCAPTCHA, hCaptcha, image CAPTCHA.
CapSolver	$0.8/1000	AI-powered CAPTCHA solving.

Cloud Scraping Platforms

Platform	Free Tier	Description
Apify	$5/mo free	Run scrapers in cloud. 2000+ pre-built actors. Proxies included.
ScrapingBee	1000 free	API: send URL, get HTML. JS rendering, proxies.
Firecrawl	500 free	Turn websites into LLM-ready markdown. Great for AI.
Crawl4AI	Open source	LLM-friendly web crawler. Markdown extraction.
ScrapeGraphAI	Open source	AI-powered scraping — just describe what you want.
Browserbase	Free tier	Headless browser infrastructure. API-based.
Zyte (Scrapy Cloud)	Free tier	Cloud-based Scrapy deployment + smart proxy. By Scrapy creators.
Agenty	Free tier	No-code cloud scraping. Point-and-click extractors.

AI-Powered Scraping (2026 Trend)

Tools that use LLMs to extract data — describe what you want, get structured output:

Tool	Stars	Description
ScrapeGraphAI	18k+	Describe extraction in plain English. Uses LLMs to parse HTML.
Crawl4AI	50k+	LLM-friendly crawler. Outputs clean markdown. Async, fast.
Firecrawl	70k+	Turn any website into LLM-ready markdown. API + self-hosted.
Jina Reader	8k+	Convert URLs to LLM-friendly text. Free API: `r.jina.ai/URL`.
Scrapfly	—	Web scraping API with AI extraction, anti-bot bypass.
Browserless	8k+	Chrome as a service. Great for LLM agent workflows.

The trend: In 2026, more developers use LLMs to extract data instead of writing CSS selectors. These tools bridge the gap.
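
The Jina Reader row above describes a URL-prefix pattern (`r.jina.ai/URL`). A minimal sketch of building such a request URL is below. The helper name `reader_url` is hypothetical, and the `https://` scheme in front of the prefix is an assumption; check the service's own documentation for current limits and terms.

```python
from urllib.parse import urlsplit

# Assumption: the r.jina.ai/URL pattern is served over https.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Prefix an absolute http(s) URL with the reader endpoint."""
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    return READER_PREFIX + target


if __name__ == "__main__":
    print(reader_url("https://example.com/article"))
```

The returned URL can be fetched with any HTTP client from the lists above.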

E-Commerce & Price Monitoring

Tool	Target	Description
Amazon Product API	Amazon	Official Product Advertising API. Requires affiliate account.
Keepa	Amazon	Price history tracking. API available ($20/mo).
CamelCamelCamel	Amazon	Free price tracker, browser extension.
PriceAPI	Multi	Product data from 1000+ retailers. Enterprise.
Diffbot	Any	AI-powered product extraction. Free tier.
Amazon Scraper (Apify)	Amazon	750K+ users. Product data, reviews, prices.
Walmart Scraper (Apify)	Walmart	Products, prices, reviews.

Tip: For price monitoring, combine scraping with cron jobs (GitHub Actions = free) and alert via email/Slack when prices change.
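
The tip above can be sketched as a small compare-and-save step that a scheduled job runs once per invocation. This is a minimal sketch, not a full monitor: the function names and the JSON state file are hypothetical, and fetching the current prices and sending the alert are left to whichever scraper and notification channel you use.

```python
import json
from pathlib import Path


def detect_price_changes(previous: dict, current: dict) -> list:
    """Return one message per product whose price differs from the last run."""
    messages = []
    for product, new_price in sorted(current.items()):
        old_price = previous.get(product)
        # Products seen for the first time have no baseline, so they are skipped.
        if old_price is not None and old_price != new_price:
            messages.append(f"{product}: {old_price:.2f} -> {new_price:.2f}")
    return messages


def run_once(state_file: Path, current: dict) -> list:
    """Compare against the saved snapshot, then overwrite it with the new one."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    changes = detect_price_changes(previous, current)
    state_file.write_text(json.dumps(current, indent=2))
    return changes
```

In a cron or GitHub Actions job, the list returned by `run_once` would be the body of the email or Slack message; an empty list means nothing changed.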

Free APIs (No Scraping Needed)

Why scrape when you can use an API? These require no API key:

IP-API: IP geolocation (country, city, ISP)
Open-Meteo: weather forecasts and historical data
ExchangeRate-API: currency conversion rates for 160+ currencies

API	Data	Rate Limit
Reddit JSON	Posts, comments, subreddits	~60/min
Hacker News	Stories, comments, users	~1/sec
YouTube Innertube	Comments, transcripts, channels	No hard limit
Wikipedia	Articles, summaries, media	200/sec
arXiv	2M+ research papers	1 request per 3 sec
npm Registry	Package metadata	No hard limit
PyPI JSON	Python package info	No hard limit
GitHub REST	Repos, users, issues	60/hr unauth
Open-Meteo	Weather forecasts	Unlimited
CoinGecko	Crypto prices	30/min
Crossref	150M+ academic papers	50/sec
RDAP	Domain WHOIS data	Varies

📚 Full list: 300+ Free APIs →
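
As an illustration of the keyless APIs in the table, here is a minimal sketch for the Hacker News row using only the standard library. The helper names are hypothetical; the endpoint and the `id`, `title`, `url`, and `score` fields follow the public Hacker News Firebase API. Parsing is kept separate from fetching so it can be tried without network access.

```python
import json
from urllib.request import urlopen

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def item_url(item_id: int) -> str:
    """Build the item endpoint URL for one story or comment id."""
    return HN_ITEM_URL.format(item_id=item_id)


def parse_item(raw_json: str) -> dict:
    """Keep a few commonly used fields from an item payload."""
    item = json.loads(raw_json)
    if item is None:  # a JSON null payload means there is no such item
        raise LookupError("item not found")
    return {
        "id": item["id"],
        "title": item.get("title", ""),
        "url": item.get("url"),
        "score": item.get("score", 0),
    }


def fetch_item(item_id: int, timeout: float = 10.0) -> dict:
    """Fetch and parse one item (requires network access)."""
    with urlopen(item_url(item_id), timeout=timeout) as response:
        return parse_item(response.read().decode("utf-8"))
```

When looping over many ids, keep to the rate limit listed in the table by pausing between calls.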

Pre-Built Scrapers (Apify Store)

Ready-to-use scrapers — no code required. Run on Apify free tier.

Scraper	Method	Data
Reddit Scraper	JSON API	Posts, comments, scores
YouTube Comments	Innertube	Comments without API key
YouTube Transcript	Captions XML	Subtitles and captions
Hacker News	Firebase	Stories and comments
Trustpilot Reviews	JSON-LD	Reviews via structured data
Google News	RSS	15 languages
SEO Audit	Multi	50+ on-page factors
Email Extractor	HTML	Emails, phones, socials
Tech Stack Detector	Headers+JS	80+ technologies
Bluesky Scraper	AT Protocol	Profiles and posts

🔍 All 78 scrapers →

Job Boards & Company Data

Tool	Target	Description
LinkedIn Scraper (Apify)	LinkedIn	Profiles, companies, jobs. Requires login.
Indeed Scraper (Apify)	Indeed	Job listings, salary data, company reviews.
Glassdoor Scraper (Apify)	Glassdoor	Reviews, salaries, interviews.
Google Maps Scraper (Apify)	Google Maps	Business data, reviews, phone, hours. 500K+ users.
Crunchbase API	Crunchbase	Startup data, funding, investors. Paid.
Hunter.io	Any domain	Find email addresses. 25 free/mo.
Apollo.io	Any company	Contact data, org charts. Free tier.

Government & Public Data

Source	Data	Access
data.gov	US government datasets	Free API + bulk download
EU Open Data	EU datasets	Free API
SEC EDGAR	Company filings	Free API
USPTO	Patent data	Free API
OpenStreetMap	Geographic data	Free API
World Bank	Economic indicators	Free API
FRED	Economic data	Free API key

Data Parsing & Extraction

Tool	Stars	Description
lxml	2k+	Fastest XML/HTML parser for Python. XPath + XSLT.
Readability	8k+	Firefox's reader mode as a library. Extract article content.
Trafilatura	3k+	Extract main text from web pages. Removes boilerplate.
newspaper3k	14k+	Article scraping and NLP. Titles, authors, text, images.
extruct	800+	Extract JSON-LD, Microdata, OpenGraph from HTML.
markdownify	1k+	Convert HTML to Markdown. Great for LLM pipelines.
html2text	2k+	Convert HTML to clean Markdown. Handles complex layouts.
jusText	500+	Remove boilerplate from HTML. Extract just article text.
dateparser	2k+	Parse dates in any format/language. Essential for scraping.
price-parser	300+	Extract price and currency from any string. By Zyte.
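
To show what structured-data extraction involves, here is a minimal standard-library sketch that pulls JSON-LD blocks out of a page. It is a stand-in for illustration only, not how extruct is implemented, and the class and function names are hypothetical. A dedicated library also handles Microdata, OpenGraph, and many malformed-markup cases this sketch ignores.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip a malformed block instead of failing the whole page


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD object found in the HTML, in document order."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Product pages and review sites often expose names, prices, and ratings this way, which tends to be more stable than CSS selectors.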

Anti-Bot Detection

Tools to test your scraper against detection (for authorized testing only):

Tool	Description
CreepJS	Browser fingerprint test — see what sites see about you.
Fingerprint.com	Browser fingerprinting service.
BotD	Open-source bot detection library by FingerprintJS (2k+ stars).
Sannysoft Test	Check what automation signals your browser leaks.
Incolumitas Bot Test	Advanced bot detection test — TLS, JS, canvas fingerprint.

Scraping Infrastructure

Tool	Stars	Description
Scrapyd	3k+	Deploy and run Scrapy spiders as a service.
Gerapy	3k+	Distributed Scrapy management with Django UI.
Portia	9k+	Visual scraping tool: point and click, no code. No longer actively maintained.
Scrapy-Redis	5k+	Distributed Scrapy with Redis. Scale to millions of pages.
Frontera	1k+	Large-scale web crawling frontier. URL management and scheduling.
Scrapy-Splash	2k+	Scrapy + Splash integration for JS rendering in pipelines.
Scrapy-Playwright	1k+	Playwright integration for Scrapy. Modern JS rendering.

Legal & Ethics

Before scraping, know the rules:

Topic	Key Points
robots.txt	Always check. Respect `Disallow` directives. Not legally binding but shows good faith.
Rate Limiting	Never DDoS. Add delays between requests. 1 req/sec is a safe default.
Terms of Service	Some sites explicitly prohibit scraping. Violating ToS can have legal consequences.
Personal Data (GDPR)	Scraping personal data in the EU requires a lawful basis. Be careful with names, emails, etc.
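CFAA (US)	The Computer Fraud and Abuse Act can apply. Key case: hiQ v. LinkedIn, where the Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA. Breach-of-contract claims can still succeed.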
Copyright	Scraped content may be copyrighted. Extraction is usually OK; republishing is not.
API Terms	Even free APIs have terms. Read them — especially about commercial use.

Rule of thumb: If the data is publicly available, not behind a login, and you respect rate limits — you're probably fine. When in doubt, use the official API.
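
The robots.txt and rate-limiting rows above can be sketched with Python's standard `urllib.robotparser`. This is a minimal sketch under stated assumptions: the user agent string and helper names are hypothetical, the robots.txt text is assumed to be downloaded already, and the `fetch` callable is whatever HTTP client you use.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper"  # hypothetical user agent name
DEFAULT_DELAY = 1.0  # seconds, the "1 req/sec" default from the table above


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def is_allowed(rules: RobotFileParser, url: str) -> bool:
    """True when no Disallow rule blocks this URL for our user agent."""
    return rules.can_fetch(USER_AGENT, url)


def request_delay(rules: RobotFileParser) -> float:
    """Use the site's Crawl-delay when present, else the default delay."""
    delay = rules.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else DEFAULT_DELAY


def polite_crawl(rules, urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each allowed URL, pausing after every request."""
    results = []
    for url in urls:
        if not is_allowed(rules, url):
            continue  # respect Disallow rather than fetching anyway
        results.append(fetch(url))
        sleep(request_delay(rules))
    return results
```

Passing `sleep` as a parameter keeps the loop easy to test; in production the default `time.sleep` applies the delay.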

Tutorials & Articles

GitHub Actions for Scheduled Scraping — Run scrapers for free on a schedule
Docker for Web Scraping — Containerize your scrapers for consistency
SQLite for Scraped Data — Lightweight storage for scraped datasets
Google Dorking Cheatsheet — Advanced search operators for research
Hetzner Cloud — Affordable servers for running scrapers at scale
Neon Serverless Postgres — Free tier database for storing scraped data
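
For the SQLite entry in the list above, a minimal storage sketch with the standard `sqlite3` module might look like this. The table layout, file name, and function names are hypothetical. Rows are keyed by URL so that re-running a scraper updates existing records instead of duplicating them; the upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    url        TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    scraped_at TEXT NOT NULL DEFAULT (datetime('now'))
)
"""

UPSERT = """
INSERT INTO items (url, title) VALUES (:url, :title)
ON CONFLICT(url) DO UPDATE SET
    title = excluded.title,
    scraped_at = datetime('now')
"""


def open_store(path: str = "scraped.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_items(conn: sqlite3.Connection, items: list) -> None:
    """Insert new rows or refresh existing ones, all in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, items)
```

Each item is a dict with `url` and `title` keys, for example `save_items(conn, [{"url": "https://example.com/a", "title": "A"}])`.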

Related Awesome Lists

awesome-web-scraping — The original awesome web scraping list
awesome-crawler — Web crawler tools by language
awesome-free-apis-2026 — 300+ free APIs, no key needed
awesome-data-engineering-2026 — 150+ data engineering tools
awesome-mcp-servers-2026 — MCP servers for AI agents
ai-market-research-reports — 506 AI-generated market research reports (1,600+ clones)
sqlite-vector-search-tutorial — Semantic search with SQLite + vectors (no server needed)
openalex-python-tutorial — Search 250M+ academic papers via API (no key needed)

Starter Template

python-web-scraping-starter — Clone → install → scrape in 5 minutes. API-first with Playwright fallback.

Need Custom Scraping?

I've built 78+ production scrapers. I can extract data from any website — e-commerce, real estate, job boards, social media — with anti-detection, proxy rotation, and structured JSON/CSV output.

What you get: Working scraper in 24-48h, hosted on Apify (free tier available), with monitoring and auto-retry.

📧 Spinov001@gmail.com — describe your data need, get a free quote within 2 hours. First 3 clients this month get priority delivery.

💳 Pay securely via Payoneer → — custom scraper $250 flat rate. Delivered in 48 hours, no hourly surprises.

🔧 Browse 78+ ready-made scrapers → — Reddit, HN, Google, Amazon, and more. Deploy in 1 click, no coding required.

License

MIT
