Releases: brightdata/sdk-python
v2.3.0 — Scraper Studio, Cleanup & Test Suite Rewrite
We are excited to announce the latest release of the Bright Data SDK! This update brings major new capabilities to our Web Scraper API, introduces new supported targets, and includes a massive under-the-hood cleanup to improve stability, maintainability, and test coverage.
New Features
- Scraper Studio Integration: You can now seamlessly trigger and fetch results directly from your custom scrapers built within Bright Data's IDE.
- New Built-in Scrapers: Added official out-of-the-box support for DigiKey and Reddit scrapers.
Bug Fixes
- Scraping Reliability: Resolved a critical issue that caused a crash when calling `ScrapeJob.to_result()`, ensuring smoother data extraction workflows.
Maintenance & Code Quality
- Codebase Cleanup: We've performed a major spring cleaning, removing dead code and deprecating legacy modules. This resulted in a significantly leaner SDK with a net reduction of 12,000 lines of code.
- Enhanced Test Coverage: Added 365 new unit tests utilizing shared fixtures. Our key modules now boast robust test coverage ranging from 87% to 98%, ensuring greater reliability for future updates.
v2.2.1 — Datasets API with 100+ Integrations
What's New
Datasets API
Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms.
- Callable datasets — trigger snapshots directly: await client.datasets.imdb_movies(filter=..., records_limit=5)
- sample() method — quick data sampling without specifying filters
- get_metadata() — discover available fields and types per dataset
- Export utilities — `export_json()`, `export_csv()`, `export_jsonl()`
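Putting those pieces together, a minimal exploration flow might look like the sketch below. Treat it as a hedged illustration: whether `get_metadata()` and `sample()` hang off the dataset object, and whether the export utilities live on the returned result, are assumptions based only on the bullets above.

```python
# Hedged sketch of the Datasets API surface named in these notes; the exact
# ownership of these methods (dataset vs. result object) is an assumption.
import asyncio

async def explore_imdb(client):
    # Discover the fields and types this dataset exposes.
    meta = await client.datasets.imdb_movies.get_metadata()
    print(meta)

    # Grab a quick sample without building a filter.
    sample = await client.datasets.imdb_movies.sample()

    # Trigger a small snapshot and export it.
    result = await client.datasets.imdb_movies(records_limit=5)
    result.export_json("imdb_movies.json")
    return sample, result
```

Run it inside an async context with an authenticated `BrightDataClient`.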
Supported Categories
E-commerce (Amazon, Walmart, Shopee, Zalando, Zara, H&M, IKEA, Shein, Sephora), Social media (Instagram, TikTok, Pinterest,
YouTube, Facebook), Business intelligence (ZoomInfo, PitchBook, Owler, Slintel, G2, Trustpilot), Jobs & HR (Glassdoor, Indeed,
Xing), Real estate (Zillow, Airbnb + 8 regional), Luxury brands (Chanel, Dior, Prada, Hermes, YSL), Entertainment (IMDB, NBA,
Goodreads), and more.
Fixes
- LinkedIn search tests updated to match pythonic parameter names (`first_name`/`last_name` instead of `firstName`/`lastName`)
v2.1.1 - Instagram Scrapers & Version Centralization
What's New
Instagram Scraper Support
| Method | Description |
|---|---|
| client.scrape.instagram.profiles/posts/reels/comments(url) | Extract data from URL |
| client.search.instagram.profiles(user_name) | Find profile by username |
| client.search.instagram.posts/reels/reels_all(url, ...) | Discover content with filters |
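As a quick illustration, the table's methods compose like this. This is a sketch: only `url` and `user_name` are parameter names taken from these notes, and the literal URLs and username are placeholders.

```python
# Sketch of the Instagram methods tabled above; the example URLs and
# username are placeholders, not real targets.
def instagram_demo(client):
    # Extract data from known URLs.
    profile = client.scrape.instagram.profiles("https://instagram.com/example")
    comments = client.scrape.instagram.comments("https://instagram.com/p/abc123")

    # Discover a profile by username.
    found = client.search.instagram.profiles(user_name="example")
    return profile, comments, found
```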
Improvements
- Version centralization - Single source of truth in pyproject.toml
- Bug fix - Discovery endpoints now correctly include type=discover_new&discover_by=... query params
Full Changelog: v2.1.0...v2.1.1
v2.1.0 - Async Mode for SERP and Web Unlocker
What's New
Async Mode
Non-blocking async mode for SERP and Web Unlocker APIs using mode="async":
SERP

```python
result = await client.search.google(query="python", mode="async")
```

Web Unlocker

```python
result = await client.scrape_url(url="https://example.com", mode="async")
```
How it works: Triggers request → gets response_id → polls until ready
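The SDK drives that loop for you, but the flow is easy to picture. Here is a self-contained, simplified version of the poll-until-ready pattern — an illustration only, not the SDK's actual implementation:

```python
import asyncio

async def poll_until_ready(check_status, fetch_result, interval=1.0, timeout=60.0):
    """Poll check_status() until it returns "ready", then fetch the result.

    Mirrors the trigger -> response_id -> poll flow described above, with the
    two remote steps abstracted into callables.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while await check_status() != "ready":
        if loop.time() >= deadline:
            raise TimeoutError("async job did not complete within the timeout")
        await asyncio.sleep(interval)
    return await fetch_result()
```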
Bug Fixes
- Fix SyncBrightDataClient: remove unused customer_id parameter
- Fix default poll_timeout for Web Unlocker async mode
API Changes
- Remove _async suffix from method names (products() instead of products_async())
- Remove GenericScraper - use client.scrape_url() directly
Documentation
- Added docs/async_mode_guide.md
Full Changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md
v2.0.0 - Breaking Changes
🚀 v2.0.0 - Complete Architecture Rewrite
⚠️ Breaking Changes - Migration Required
This is a major breaking release requiring code changes. Python 3.9+ now required.
Client Initialization
```python
# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")
```

API Structure - Hierarchical Methods
```python
# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")
```

Platform-Specific Scraping
```python
# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)
```

Search Operations
```python
# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)
```

Async Support (New)
```python
# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)

# ✅ Async batch operations
import asyncio

async def scrape_multiple():
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        results = await asyncio.gather(*tasks)
```

Manual Job Control (New)
```python
# ✅ Fine-grained control
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()
```

Type-Safe Payloads (New)
```python
# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload

payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)
```

Return Types
```python
# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)         # Actual scraped data
print(result.timing)       # Performance metrics
print(result.cost)         # Cost tracking
print(result.snapshot_id)  # Job identifier
```

CLI Tool (New)
```shell
# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3
```

Configuration Changes
```python
# ❌ Old
client = bdclient(
    api_token="token",                # Changed parameter name
    auto_create_zones=True,           # Default changed to False
    web_unlocker_zone="sdk_unlocker", # Default changed
    serp_zone="sdk_serp",             # Default changed
    browser_zone="sdk_browser"        # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                      # Renamed from api_token
    auto_create_zones=False,            # New default
    web_unlocker_zone="web_unlocker1",  # New default name
    serp_zone="serp_api1",              # New default name
    browser_zone="browser_api1",        # New default name
    timeout=30,                         # New parameter
    rate_limit=10,                      # New parameter (optional)
    rate_period=1.0                     # New parameter
)
```

✨ New Features
Platform Coverage
| Platform | Status | Methods |
|---|---|---|
| Amazon | ✅ NEW | products(), reviews(), sellers() |
| Instagram | ✅ NEW | profiles(), posts(), comments(), reels() |
| Facebook | ✅ NEW | posts(), comments(), groups() |
| LinkedIn | ✅ Enhanced | Full scraping and search |
| ChatGPT | ✅ Enhanced | Improved interaction |
| Google/Bing/Yandex | ✅ Enhanced | Dedicated services |
Performance
- ⚡ 10x better concurrency - Event loop-based architecture
- 🔌 Advanced connection pooling - 100 total, 30 per host
- 🎯 Built-in rate limiting - Configurable request throttling
✅ Upgrade Checklist
- Update Python to 3.9+
- Change imports: `bdclient` → `BrightDataClient`
- Update parameter: `api_token=` → `token=`
- Migrate method calls to the hierarchical structure
- Handle new `ScrapeResult`/`SearchResult` return types
- Review zone configuration defaults
- Consider async for better performance
- Test in a staging environment
📚 Resources
Full Changelog: v1.1.3...v2.0.0
v1.1.3
New Features:
- Added url parameter to extract function for direct URL specification
- Added output_scheme parameter for OpenAI Structured Outputs support
- Enhanced parse_content to auto-detect multiple results from batch operations
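Combined, the two new parameters might be used like this. Treat it as a sketch, since the exact `extract()` signature is not shown in these notes; in particular, `query` as the name of the natural-language argument is an assumption.

```python
# Hypothetical usage of the v1.1.3 parameters named above.
def extract_product_price(client):
    # JSON Schema in OpenAI Structured Outputs form.
    scheme = {
        "type": "object",
        "properties": {"price": {"type": "string"}},
        "required": ["price"],
        "additionalProperties": False,
    }
    return client.extract(
        url="https://example.com/product",  # direct URL (new in v1.1.3)
        query="What is the product's price?",
        output_scheme=scheme,               # OpenAI Structured Outputs (new)
    )
```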
Improvements:
- Added user-agent headers to all dataset API requests for better tracking
- Improved schema validation for OpenAI Structured Outputs compatibility
- Updated examples with proper formatting
Bug Fixes:
- Fixed parse_content handling of multiple scraping results
- Fixed OpenAI schema validation requirements
v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements
New Features
- AI-Powered Extract Function: New `extract()` function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural-language queries
- LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval
Improvements
- Set sync=True as default for all LinkedIn scraping methods for better user experience
- Improved unit test coverage
- Enhanced error handling for LinkedIn API responses
Examples
- Added `extract_example.py` demonstrating AI-powered content extraction capabilities
- Updated LinkedIn examples to showcase sync functionality
Technical Changes
- Use the correct `/scrape` endpoint for synchronous LinkedIn requests
- Pass `dataset_id` as a URL parameter with the proper flags
- Handle both 200 and 202 status codes appropriately
- Maintain backward compatibility for async operations
v1.1.1: Documentation Updates & Bug Fixes
Updates
- Enhanced README with examples for the `crawl()`, `parse_content()`, and `connect_browser()` functions
- Added complete client parameter documentation
- Fixed browser connection example import issues
- Improved CI workflow for PyPI package testing
Bug Fixes
- Fixed missing Playwright import in browser example
- Corrected example URL typo
- Updated test workflow to prevent PyPI race conditions
v1.1.0: Web Crawling, Content Parsing & Browser Automation
New Features
🕷️ Web Crawling
- crawl() function for discovering and scraping multiple pages from websites
- Advanced filtering with regex patterns for URL inclusion/exclusion
- Configurable crawl depth and sitemap handling
- Custom output schema support
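A sketch of how those options might combine — parameter names other than `url` are assumptions drawn from the bullets above, not the documented signature:

```python
# Hedged sketch of crawl(): regex include/exclude filters, bounded depth.
def crawl_docs(client):
    return client.crawl(
        url="https://example.com",
        include_filter=r"/docs/.*",  # regex for URLs to include
        exclude_filter=r"\.pdf$",    # regex for URLs to skip
        depth=2,                     # how many link levels to follow
    )
```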
🔍 Content Parsing
- parse_content() function for extracting useful data from API responses
- Support for text extraction, link discovery, and image URL collection
- Handles both JSON responses and raw HTML content
- Structured data extraction from various content formats
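To make the idea concrete, here is a standalone, stdlib-only illustration of the kind of extraction `parse_content()` performs — text, links, and image URLs from raw HTML. It is not the SDK's implementation, which builds on BeautifulSoup4.

```python
# Minimal illustration of text/link/image extraction from raw HTML,
# using only the standard library's HTMLParser.
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text, self.links, self.images = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

def parse_html(html):
    """Return the text fragments, link hrefs, and image srcs in `html`."""
    extractor = ContentExtractor()
    extractor.feed(html)
    return {"text": extractor.text, "links": extractor.links, "images": extractor.images}
```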
🌐 Browser Automation
- connect_browser() function for Playwright/Selenium integration
- WebSocket endpoint generation for scraping browser connections
- Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
- Seamless authentication with Bright Data's browser service
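Wiring `connect_browser()` into Playwright might look like the sketch below. Both the CDP connection style and the exact return value of `connect_browser()` are assumptions; consult the browser-automation examples for the authoritative setup.

```python
# Hedged sketch; requires `pip install playwright` and an authenticated client.
def fetch_page_html(client, url):
    from playwright.sync_api import sync_playwright

    ws_endpoint = client.connect_browser()  # authenticated WebSocket endpoint
    with sync_playwright() as pw:
        # Attach to the remote scraping browser over CDP.
        browser = pw.chromium.connect_over_cdp(ws_endpoint)
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html
```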
Improvements
📡 Better Async Handling
- Enhanced download_snapshot() with improved 202 status code handling
- Friendly status messages instead of exceptions for pending snapshots
- Better user experience for asynchronous data processing
🔧 Robust Error Handling
- Fixed zone creation error handling with proper exception propagation
- Added retry logic for network failures and temporary errors
- Improved zone management reliability
🐍 Python Support Update
- Updated to support Python 3.8+ (removed Python 3.7)
- Updated CI/CD pipeline for modern Python versions
- Added BeautifulSoup4 as core dependency
Dependencies
- Added: beautifulsoup4>=4.9.0 for content parsing
- Updated: Python compatibility to >=3.8
Examples
New example files demonstrate the enhanced functionality:
- `examples/crawl_example.py` - Web crawling usage
- `examples/browser_connection_example.py` - Browser automation setup
- `examples/parse_content_example.py` - Content parsing workflows
Release v1.0.7: LinkedIn Integration & Enhanced APIs
🚀 Major Features
LinkedIn Data Integration
- New `scrape_linkedin` class: Comprehensive LinkedIn data scraping for profiles, companies, jobs, and posts
- New `search_linkedin` class: Advanced LinkedIn content discovery with keyword- and URL-based search
- Production-ready examples: Ready-to-use examples for all LinkedIn functionality
Enhanced ChatGPT API
- Renamed to `search_chatGPT`: More intuitive naming for ChatGPT interactions
- Sync/Async support: Choose between immediate results or background processing
- Improved NDJSON parsing: Better handling of multi-response data
Improved Architecture
- Modular design: Separated download functionality into dedicated module
- Better code organization: Specialized API modules for different services
- Production optimizations: Cleaner code with improved performance
🔧 API Enhancements
New LinkedIn Methods
```python
# Scrape LinkedIn data
client.scrape_linkedin.profiles(urls)
client.scrape_linkedin.companies(urls)
client.scrape_linkedin.jobs(urls)
client.scrape_linkedin.posts(urls)

# Search LinkedIn content
client.search_linkedin.profiles(first_name, last_name)
client.search_linkedin.jobs(location="Paris", keyword="developer")
client.search_linkedin.posts(company_url="https://linkedin.com/company/bright-data")
```

Enhanced ChatGPT API
```python
# Synchronous (immediate results)
result = client.search_chatGPT(prompt="Your question", sync=True)

# Asynchronous (background processing)
result = client.search_chatGPT(prompt="Your question", sync=False)
```

🛠️ Technical Improvements
- Better error handling: Enhanced validation and error messages
- Backward compatibility: All existing code continues to work
- Performance optimizations: Faster processing and reduced memory usage
- Production-ready code: Clean, efficient, and maintainable codebase
📝 Breaking Changes
- `scrape_chatGPT()` renamed to `search_chatGPT()` (maintains the same functionality)
- Added `sync` parameter to the ChatGPT API (defaults to `True`)
🐛 Bug Fixes
- Fixed NDJSON response parsing for multi-line JSON data
- Improved parameter validation across all APIs
- Enhanced timeout handling for long-running requests
📚 Documentation
- Updated examples with new LinkedIn functionality
- Enhanced docstrings for all new methods
- Added comprehensive usage examples