Releases: brightdata/sdk-python
v2.3.0 — Scraper Studio, Cleanup & Test Suite Rewrite
We are excited to announce the latest release of the Bright Data SDK! This update brings major new capabilities to our Web Scraper API, introduces new supported targets, and includes a massive under-the-hood cleanup to improve stability, maintainability, and test coverage.
New Features
- Scraper Studio Integration: You can now seamlessly trigger and fetch results directly from your custom scrapers built within Bright Data's IDE.
- New Built-in Scrapers: Added official out-of-the-box support for DigiKey and Reddit scrapers.
Bug Fixes
- Scraping Reliability: Resolved a critical issue that caused a crash when calling `ScrapeJob.to_result()`, ensuring smoother data extraction workflows.
Maintenance & Code Quality
- Codebase Cleanup: We've performed a major spring cleaning, removing dead code and deprecating legacy modules. This resulted in a significantly leaner SDK with a net reduction of 12,000 lines of code.
- Enhanced Test Coverage: Added 365 new unit tests utilizing shared fixtures. Our key modules now boast robust test coverage ranging from 87% to 98%, ensuring greater reliability for future updates.
v2.2.1 — Datasets API with 100+ Integrations
What's New
Datasets API
Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms.
- Callable datasets — trigger snapshots directly: await client.datasets.imdb_movies(filter=..., records_limit=5)
- sample() method — quick data sampling without specifying filters
- get_metadata() — discover available fields and types per dataset
- Export utilities — `export_json()`, `export_csv()`, `export_jsonl()`
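Putting those pieces together, a minimal exploration flow might look like the sketch below. Treat it as a hedged illustration: whether `get_metadata()` and `sample()` hang off the dataset object, and whether the export utilities live on the returned result, are assumptions based only on the bullets above.

```python
# Hedged sketch of the Datasets API surface named in these notes; the exact
# ownership of these methods (dataset vs. result object) is an assumption.
import asyncio

async def explore_imdb(client):
    # Discover the fields and types this dataset exposes.
    meta = await client.datasets.imdb_movies.get_metadata()
    print(meta)

    # Grab a quick sample without building a filter.
    sample = await client.datasets.imdb_movies.sample()

    # Trigger a small snapshot and export it.
    result = await client.datasets.imdb_movies(records_limit=5)
    result.export_json("imdb_movies.json")
    return sample, result
```

Run it inside an async context with an authenticated `BrightDataClient`.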
Supported Categories
E-commerce (Amazon, Walmart, Shopee, Zalando, Zara, H&M, IKEA, Shein, Sephora), Social media (Instagram, TikTok, Pinterest,
YouTube, Facebook), Business intelligence (ZoomInfo, PitchBook, Owler, Slintel, G2, Trustpilot), Jobs & HR (Glassdoor, Indeed,
Xing), Real estate (Zillow, Airbnb + 8 regional), Luxury brands (Chanel, Dior, Prada, Hermes, YSL), Entertainment (IMDB, NBA,
Goodreads), and more.
Fixes
- LinkedIn search tests updated to match pythonic parameter names (`first_name`/`last_name` instead of `firstName`/`lastName`)
v2.1.1 - Instagram Scrapers & Version Centralization
What's New
Instagram Scraper Support
| Method | Description |
|---|---|
| client.scrape.instagram.profiles/posts/reels/comments(url) | Extract data from URL |
| client.search.instagram.profiles(user_name) | Find profile by username |
| client.search.instagram.posts/reels/reels_all(url, ...) | Discover content with filters |
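As a quick illustration, the table's methods compose like this. This is a sketch: only `url` and `user_name` are parameter names taken from these notes, and the literal URLs and username are placeholders.

```python
# Sketch of the Instagram methods tabled above; the example URLs and
# username are placeholders, not real targets.
def instagram_demo(client):
    # Extract data from known URLs.
    profile = client.scrape.instagram.profiles("https://instagram.com/example")
    comments = client.scrape.instagram.comments("https://instagram.com/p/abc123")

    # Discover a profile by username.
    found = client.search.instagram.profiles(user_name="example")
    return profile, comments, found
```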
Improvements
- Version centralization - Single source of truth in pyproject.toml
- Bug fix - Discovery endpoints now correctly include type=discover_new&discover_by=... query params
Full Changelog: v2.1.0...v2.1.1
v2.1.0 - Async Mode for SERP and Web Unlocker
What's New
Async Mode
Non-blocking async mode for SERP and Web Unlocker APIs using mode="async":
SERP

```python
result = await client.search.google(query="python", mode="async")
```

Web Unlocker

```python
result = await client.scrape_url(url="https://example.com", mode="async")
```
How it works: Triggers request → gets response_id → polls until ready
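The SDK drives that loop for you, but the flow is easy to picture. Here is a self-contained, simplified version of the poll-until-ready pattern — an illustration only, not the SDK's actual implementation:

```python
import asyncio

async def poll_until_ready(check_status, fetch_result, interval=1.0, timeout=60.0):
    """Poll check_status() until it returns "ready", then fetch the result.

    Mirrors the trigger -> response_id -> poll flow described above, with the
    two remote steps abstracted into callables.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while await check_status() != "ready":
        if loop.time() >= deadline:
            raise TimeoutError("async job did not complete within the timeout")
        await asyncio.sleep(interval)
    return await fetch_result()
```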
Bug Fixes
- Fix SyncBrightDataClient: remove unused customer_id parameter
- Fix default poll_timeout for Web Unlocker async mode
API Changes
- Remove _async suffix from method names (products() instead of products_async())
- Remove GenericScraper - use client.scrape_url() directly
Documentation
- Added docs/async_mode_guide.md
Full Changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md
v2.0.0 - Breaking Changes
🚀 v2.0.0 - Complete Architecture Rewrite
⚠️ Breaking Changes - Migration Required
This is a major breaking release requiring code changes. Python 3.9+ now required.
Client Initialization
```python
# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")
```

API Structure - Hierarchical Methods
```python
# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")
```

Platform-Specific Scraping
```python
# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)
```

Search Operations
```python
# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)
```

Async Support (New)
```python
# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)

# ✅ Async batch operations
import asyncio

async def scrape_multiple():
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        results = await asyncio.gather(*tasks)
```

Manual Job Control (New)
```python
# ✅ Fine-grained control
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()
```

Type-Safe Payloads (New)
```python
# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload

payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)
```

Return Types
```python
# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)         # Actual scraped data
print(result.timing)       # Performance metrics
print(result.cost)         # Cost tracking
print(result.snapshot_id)  # Job identifier
```

CLI Tool (New)
```shell
# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3
```

Configuration Changes
```python
# ❌ Old
client = bdclient(
    api_token="token",                # Changed parameter name
    auto_create_zones=True,           # Default changed to False
    web_unlocker_zone="sdk_unlocker", # Default changed
    serp_zone="sdk_serp",             # Default changed
    browser_zone="sdk_browser"        # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                      # Renamed from api_token
    auto_create_zones=False,            # New default
    web_unlocker_zone="web_unlocker1",  # New default name
    serp_zone="serp_api1",              # New default name
    browser_zone="browser_api1",        # New default name
    timeout=30,                         # New parameter
    rate_limit=10,                      # New parameter (optional)
    rate_period=1.0                     # New parameter
)
```

✨ New Features
Platform Coverage
| Platform | Status | Methods |
|---|---|---|
| Amazon | ✅ NEW | products(), reviews(), sellers() |
| Instagram | ✅ NEW | profiles(), posts(), comments(), reels() |
| Facebook | ✅ NEW | posts(), comments(), groups() |
| LinkedIn | ✅ Enhanced | Full scraping and search |
| ChatGPT | ✅ Enhanced | Improved interaction |
| Google/Bing/Yandex | ✅ Enhanced | Dedicated services |
Performance
- ⚡ 10x better concurrency - Event loop-based architecture
- 🔌 Advanced connection pooling - 100 total, 30 per host
- 🎯 Built-in rate limiting - Configurable request throttling
✅ Upgrade Checklist
- Update Python to 3.9+
- Change imports: `bdclient` → `BrightDataClient`
- Update parameter: `api_token=` → `token=`
- Migrate method calls to the hierarchical structure
- Handle new `ScrapeResult`/`SearchResult` return types
- Review zone configuration defaults
- Consider async for better performance
- Test in a staging environment
📚 Resources
Full Changelog: v1.1.3...v2.0.0
v1.1.3
New Features:
- Added url parameter to extract function for direct URL specification
- Added output_scheme parameter for OpenAI Structured Outputs support
- Enhanced parse_content to auto-detect multiple results from batch operations
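Combined, the two new parameters might be used like this. Treat it as a sketch, since the exact `extract()` signature is not shown in these notes; in particular, `query` as the name of the natural-language argument is an assumption.

```python
# Hypothetical usage of the v1.1.3 parameters named above.
def extract_product_price(client):
    # JSON Schema in OpenAI Structured Outputs form.
    scheme = {
        "type": "object",
        "properties": {"price": {"type": "string"}},
        "required": ["price"],
        "additionalProperties": False,
    }
    return client.extract(
        url="https://example.com/product",  # direct URL (new in v1.1.3)
        query="What is the product's price?",
        output_scheme=scheme,               # OpenAI Structured Outputs (new)
    )
```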
Improvements:
- Added user-agent headers to all dataset API requests for better tracking
- Improved schema validation for OpenAI Structured Outputs compatibility
- Updated examples with proper formatting
Bug Fixes:
- Fixed parse_content handling of multiple scraping results
- Fixed OpenAI schema validation requirements
v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements
New Features
- AI-Powered Extract Function: New `extract()` function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural-language queries
- LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval
Improvements
- Set sync=True as default for all LinkedIn scraping methods for better user experience
- Improved unit test coverage
- Enhanced error handling for LinkedIn API responses
Examples
- Added `extract_example.py` demonstrating AI-powered content extraction capabilities
- Updated LinkedIn examples to showcase sync functionality
Technical Changes
- Use the correct `/scrape` endpoint for synchronous LinkedIn requests
- Pass `dataset_id` as a URL parameter with the proper flags
- Handle both 200 and 202 status codes appropriately
- Maintain backward compatibility for async operations
v1.1.1: Documentation Updates & Bug Fixes
Updates
- Enhanced README with examples for the `crawl()`, `parse_content()`, and `connect_browser()` functions
- Added complete client parameter documentation
- Fixed browser connection example import issues
- Improved CI workflow for PyPI package testing
Bug Fixes
- Fixed missing Playwright import in browser example
- Corrected example URL typo
- Updated test workflow to prevent PyPI race conditions
v1.1.0: Web Crawling, Content Parsing & Browser Automation
New Features
🕷️ Web Crawling
- crawl() function for discovering and scraping multiple pages from websites
- Advanced filtering with regex patterns for URL inclusion/exclusion
- Configurable crawl depth and sitemap handling
- Custom output schema support
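A sketch of how those options might combine — parameter names other than `url` are assumptions drawn from the bullets above, not the documented signature:

```python
# Hedged sketch of crawl(): regex include/exclude filters, bounded depth.
def crawl_docs(client):
    return client.crawl(
        url="https://example.com",
        include_filter=r"/docs/.*",  # regex for URLs to include
        exclude_filter=r"\.pdf$",    # regex for URLs to skip
        depth=2,                     # how many link levels to follow
    )
```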
🔍 Content Parsing
- parse_content() function for extracting useful data from API responses
- Support for text extraction, link discovery, and image URL collection
- Handles both JSON responses and raw HTML content
- Structured data extraction from various content formats
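To make the idea concrete, here is a standalone, stdlib-only illustration of the kind of extraction `parse_content()` performs — text, links, and image URLs from raw HTML. It is not the SDK's implementation, which builds on BeautifulSoup4.

```python
# Minimal illustration of text/link/image extraction from raw HTML,
# using only the standard library's HTMLParser.
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text, self.links, self.images = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

def parse_html(html):
    """Return the text fragments, link hrefs, and image srcs in `html`."""
    extractor = ContentExtractor()
    extractor.feed(html)
    return {"text": extractor.text, "links": extractor.links, "images": extractor.images}
```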
🌐 Browser Automation
- connect_browser() function for Playwright/Selenium integration
- WebSocket endpoint generation for scraping browser connections
- Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
- Seamless authentication with Bright Data's browser service
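Wiring `connect_browser()` into Playwright might look like the sketch below. Both the CDP connection style and the exact return value of `connect_browser()` are assumptions; consult the browser-automation examples for the authoritative setup.

```python
# Hedged sketch; requires `pip install playwright` and an authenticated client.
def fetch_page_html(client, url):
    from playwright.sync_api import sync_playwright

    ws_endpoint = client.connect_browser()  # authenticated WebSocket endpoint
    with sync_playwright() as pw:
        # Attach to the remote scraping browser over CDP.
        browser = pw.chromium.connect_over_cdp(ws_endpoint)
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html
```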
Improvements
📡 Better Async Handling
- Enhanced download_snapshot() with improved 202 status code handling
- Friendly status messages instead of exceptions for pending snapshots
- Better user experience for asynchronous data processing
🔧 Robust Error Handling
- Fixed zone creation error handling with proper exception propagation
- Added retry logic for network failures and temporary errors
- Improved zone management reliability
🐍 Python Support Update
- Updated to support Python 3.8+ (removed Python 3.7)
- Updated CI/CD pipeline for modern Python versions
- Added BeautifulSoup4 as core dependency
Dependencies
- Added: beautifulsoup4>=4.9.0 for content parsing
- Updated: Python compatibility to >=3.8
Examples
New example files demonstrate the enhanced functionality:
- `examples/crawl_example.py` - Web crawling usage
- `examples/browser_connection_example.py` - Browser automation setup
- `examples/parse_content_example.py` - Content parsing workflows
Release v1.0.7: LinkedIn Integration & Enhanced APIs
🚀 Major Features
LinkedIn Data Integration
- New `scrape_linkedin` class: Comprehensive LinkedIn data scraping for profiles, companies, jobs, and posts
- New `search_linkedin` class: Advanced LinkedIn content discovery with keyword- and URL-based search
- Production-ready examples: Ready-to-use examples for all LinkedIn functionality
Enhanced ChatGPT API
- Renamed to `search_chatGPT`: More intuitive naming for ChatGPT interactions
- Sync/Async support: Choose between immediate results or background processing
- Improved NDJSON parsing: Better handling of multi-response data
Improved Architecture
- Modular design: Separated download functionality into dedicated module
- Better code organization: Specialized API modules for different services
- Production optimizations: Cleaner code with improved performance
🔧 API Enhancements
New LinkedIn Methods
```python
# Scrape LinkedIn data
client.scrape_linkedin.profiles(urls)
client.scrape_linkedin.companies(urls)
client.scrape_linkedin.jobs(urls)
client.scrape_linkedin.posts(urls)

# Search LinkedIn content
client.search_linkedin.profiles(first_name, last_name)
client.search_linkedin.jobs(location="Paris", keyword="developer")
client.search_linkedin.posts(company_url="https://linkedin.com/company/bright-data")
```

Enhanced ChatGPT API
```python
# Synchronous (immediate results)
result = client.search_chatGPT(prompt="Your question", sync=True)

# Asynchronous (background processing)
result = client.search_chatGPT(prompt="Your question", sync=False)
```

🛠️ Technical Improvements
- Better error handling: Enhanced validation and error messages
- Backward compatibility: All existing code continues to work
- Performance optimizations: Faster processing and reduced memory usage
- Production-ready code: Clean, efficient, and maintainable codebase
📝 Breaking Changes
- `scrape_chatGPT()` renamed to `search_chatGPT()` (maintains the same functionality)
- Added `sync` parameter to the ChatGPT API (defaults to `True`)
🐛 Bug Fixes
- Fixed NDJSON response parsing for multi-line JSON data
- Improved parameter validation across all APIs
- Enhanced timeout handling for long-running requests
📚 Documentation
- Updated examples with new LinkedIn functionality
- Enhanced docstrings for all new methods
- Added comprehensive usage examples