
Releases: brightdata/sdk-python

v2.3.0 — Scraper Studio, Cleanup & Test Suite Rewrite

09 Mar 15:23
43504e6


Bright Data SDK Release Notes (v2.3.0)

We are excited to announce the latest release of the Bright Data SDK! This update brings major new capabilities to our Web Scraper API, introduces new supported targets, and includes a massive under-the-hood cleanup to improve stability, maintainability, and test coverage.

New Features

  • Scraper Studio Integration: You can now seamlessly trigger and fetch results directly from your custom scrapers built within Bright Data's IDE.
  • New Built-in Scrapers: Added official out-of-the-box support for DigiKey and Reddit scrapers.

Bug Fixes

  • Scraping Reliability: Resolved a critical issue that caused a crash when calling ScrapeJob.to_result(), ensuring smoother data extraction workflows.

Maintenance & Code Quality

  • Codebase Cleanup: We've performed a major spring cleaning, removing dead code and deprecating legacy modules. This resulted in a significantly leaner SDK with a net reduction of 12,000 lines of code.
  • Enhanced Test Coverage: Added 365 new unit tests utilizing shared fixtures. Our key modules now boast robust test coverage ranging from 87% to 98%, ensuring greater reliability for future updates.

v2.2.1 — Datasets API with 100+ Integrations

23 Feb 09:38


What's New

Datasets API

Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms.

  • Callable datasets — trigger snapshots directly: `await client.datasets.imdb_movies(filter=..., records_limit=5)`
  • `sample()` method — quick data sampling without specifying filters
  • `get_metadata()` — discover available fields and types per dataset
  • Export utilities — `export_json()`, `export_csv()`, `export_jsonl()`
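The export utilities presumably serialize result records to files. As a stand-alone illustration of the JSON Lines format that `export_jsonl()` targets — the records and helper below are stand-ins, not SDK code:

```python
import json

def to_jsonl(records):
    """Serialize a list of dicts to JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

records = [
    {"title": "The Matrix", "year": 1999},
    {"title": "Inception", "year": 2010},
]
jsonl = to_jsonl(records)

# Each line parses back to the original record
parsed = [json.loads(line) for line in jsonl.splitlines()]
```

JSONL is convenient for large snapshots because each record can be streamed and parsed independently.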

Supported Categories

E-commerce (Amazon, Walmart, Shopee, Zalando, Zara, H&M, IKEA, Shein, Sephora), Social media (Instagram, TikTok, Pinterest,
YouTube, Facebook), Business intelligence (ZoomInfo, PitchBook, Owler, Slintel, G2, Trustpilot), Jobs & HR (Glassdoor, Indeed,
Xing), Real estate (Zillow, Airbnb + 8 regional), Luxury brands (Chanel, Dior, Prada, Hermes, YSL), Entertainment (IMDB, NBA,
Goodreads), and more.

Fixes

  • LinkedIn search tests updated to match pythonic parameter names (`first_name`/`last_name` instead of `firstName`/`lastName`)

v2.1.1 - Instagram Scrapers & Version Centralization

20 Jan 11:58


What's New

Instagram Scraper Support

| Method | Description |
| --- | --- |
| `client.scrape.instagram.profiles/posts/reels/comments(url)` | Extract data from URL |
| `client.search.instagram.profiles(user_name)` | Find profile by username |
| `client.search.instagram.posts/reels/reels_all(url, ...)` | Discover content with filters |

Improvements

  • Version centralization - Single source of truth in `pyproject.toml`
  • Bug fix - Discovery endpoints now correctly include `type=discover_new&discover_by=...` query params

Full Changelog: v2.1.0...v2.1.1

v2.1.0 - Async Mode for SERP and Web Unlocker

07 Jan 11:53


What's New

Async Mode

Non-blocking async mode for SERP and Web Unlocker APIs using `mode="async"`:

SERP

result = await client.search.google(query="python", mode="async")

Web Unlocker

result = await client.scrape_url(url="https://example.com", mode="async")

How it works: Triggers request → gets response_id → polls until ready
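The trigger-then-poll flow can be sketched generically. The `trigger`, `check_ready`, and `fetch` functions below are local stubs standing in for the SDK's API calls; only the control flow is the point:

```python
import asyncio

async def trigger(url):
    # Stub: a real call would POST the request and return a response_id
    return "resp-123"

_polls = {"resp-123": 0}

async def check_ready(response_id):
    # Stub: a real call would query the result endpoint; here it becomes
    # ready after three polls to simulate a job in progress
    _polls[response_id] += 1
    return _polls[response_id] >= 3

async def fetch(response_id):
    # Stub: a real call would download the finished result
    return {"response_id": response_id, "data": "..."}

async def scrape_async(url, poll_interval=0.01, poll_timeout=1.0):
    """Trigger a job, then poll until the result is ready or the timeout hits."""
    response_id = await trigger(url)
    deadline = asyncio.get_event_loop().time() + poll_timeout
    while not await check_ready(response_id):
        if asyncio.get_event_loop().time() > deadline:
            raise TimeoutError(f"{response_id} not ready after {poll_timeout}s")
        await asyncio.sleep(poll_interval)
    return await fetch(response_id)

result = asyncio.run(scrape_async("https://example.com"))
```

Because the wait happens in `asyncio.sleep`, many such jobs can be polled concurrently on one event loop.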

Bug Fixes

  • Fix SyncBrightDataClient: remove unused customer_id parameter
  • Fix default poll_timeout for Web Unlocker async mode

API Changes

  • Remove _async suffix from method names (products() instead of products_async())
  • Remove GenericScraper - use client.scrape_url() directly

Documentation

  • Added docs/async_mode_guide.md

Full Changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md

v2.0.0 - Breaking Changes

01 Dec 17:51
4108b23


🚀 v2.0.0 - Complete Architecture Rewrite

⚠️ Breaking Changes - Migration Required

This is a major breaking release requiring code changes. Python 3.9+ now required.

Client Initialization

# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")

API Structure - Hierarchical Methods

# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")

Platform-Specific Scraping

# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)

Search Operations

# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)

Async Support (New)

# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)
    
# ✅ Async batch operations
async def scrape_multiple():
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        results = await asyncio.gather(*tasks)

Manual Job Control (New)

# ✅ Fine-grained control
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()

Type-Safe Payloads (New)

# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)

Return Types

# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)        # Actual scraped data
print(result.timing)      # Performance metrics
print(result.cost)        # Cost tracking
print(result.snapshot_id) # Job identifier

CLI Tool (New)

# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3

Configuration Changes

# ❌ Old
client = bdclient(
    api_token="token",              # Changed parameter name
    auto_create_zones=True,          # Default changed to False
    web_unlocker_zone="sdk_unlocker", # Default changed
    serp_zone="sdk_serp",            # Default changed
    browser_zone="sdk_browser"       # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                   # Renamed from api_token
    auto_create_zones=False,         # New default
    web_unlocker_zone="web_unlocker1", # New default name
    serp_zone="serp_api1",           # New default name
    browser_zone="browser_api1",     # New default name
    timeout=30,                      # New parameter
    rate_limit=10,                   # New parameter (optional)
    rate_period=1.0                  # New parameter
)

✨ New Features

Platform Coverage

| Platform | Status | Methods |
| --- | --- | --- |
| Amazon | ✅ NEW | products(), reviews(), sellers() |
| Instagram | ✅ NEW | profiles(), posts(), comments(), reels() |
| Facebook | ✅ NEW | posts(), comments(), groups() |
| LinkedIn | ✅ Enhanced | Full scraping and search |
| ChatGPT | ✅ Enhanced | Improved interaction |
| Google/Bing/Yandex | ✅ Enhanced | Dedicated services |

Performance

  • 10x better concurrency - Event loop-based architecture
  • 🔌 Advanced connection pooling - 100 total, 30 per host
  • 🎯 Built-in rate limiting - Configurable request throttling
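The `rate_limit`/`rate_period` parameters suggest a sliding-window throttle. A minimal sketch of that idea — not the SDK's actual implementation:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `rate_limit` calls per `rate_period` seconds."""

    def __init__(self, rate_limit=10, rate_period=1.0):
        self.rate_limit = rate_limit
        self.rate_period = rate_period
        self.calls = deque()  # timestamps of recent calls

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window
        while self.calls and now - self.calls[0] >= self.rate_period:
            self.calls.popleft()
        if len(self.calls) >= self.rate_limit:
            # Window is full: wait until the oldest call expires, then retry
            time.sleep(self.rate_period - (now - self.calls[0]))
            return self.acquire()
        self.calls.append(time.monotonic())

limiter = RateLimiter(rate_limit=2, rate_period=0.1)
start = time.monotonic()
for _ in range(4):
    limiter.acquire()
elapsed = time.monotonic() - start  # third call has to wait out the window
```

The real client applies this kind of throttle transparently before each request, so callers never exceed the configured rate.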

✅ Upgrade Checklist

  • Update Python to 3.9+
  • Change imports: bdclient → BrightDataClient
  • Update parameter: api_token= → token=
  • Migrate method calls to hierarchical structure
  • Handle new ScrapeResult/SearchResult return types
  • Review zone configuration defaults
  • Consider async for better performance
  • Test in staging environment

📚 Resources

Full Changelog: v1.1.3...v2.0.0

v1.1.3

07 Sep 18:20


New Features:

  • Added url parameter to extract function for direct URL specification
  • Added output_scheme parameter for OpenAI Structured Outputs support
  • Enhanced parse_content to auto-detect multiple results from batch operations
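OpenAI Structured Outputs requires a strict JSON Schema: an object type, every property listed in `required`, and `additionalProperties` disabled. A hedged example of the shape an `output_scheme` argument would take — the field names here are illustrative, not part of the SDK:

```python
# A JSON Schema in the strict form OpenAI Structured Outputs accepts
output_scheme = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["product_name", "price", "in_stock"],
    "additionalProperties": False,
}

# The improved schema validation presumably enforces invariants like these
is_strict = (
    output_scheme["type"] == "object"
    and set(output_scheme["required"]) == set(output_scheme["properties"])
    and output_scheme["additionalProperties"] is False
)
```

Schemas that omit a property from `required` or leave `additionalProperties` unset are rejected by the Structured Outputs endpoint, which is why validating them client-side is useful.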

Improvements:

  • Added user-agent headers to all dataset API requests for better tracking
  • Improved schema validation for OpenAI Structured Outputs compatibility
  • Updated examples with proper formatting

Bug Fixes:

  • Fixed parse_content handling of multiple scraping results
  • Fixed OpenAI schema validation requirements

v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements

04 Sep 14:53


New Features

  • AI-Powered Extract Function: New extract() function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural language queries
  • LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval

Improvements

  • Set sync=True as default for all LinkedIn scraping methods for better user experience
  • Improved unit test coverage
  • Enhanced error handling for LinkedIn API responses

Examples

  • Added extract_example.py demonstrating AI-powered content extraction capabilities
  • Updated LinkedIn examples to showcase sync functionality

Technical Changes

  • Use correct /scrape endpoint for synchronous LinkedIn requests
  • Pass dataset_id as URL parameter with proper flags
  • Handle both 200 and 202 status codes appropriately
  • Maintain backward compatibility for async operations

v1.1.1: Documentation Updates & Bug Fixes

03 Sep 10:22


Updates

  • Enhanced README with examples for crawl(), parse_content(), and connect_browser() functions
  • Added complete client parameter documentation
  • Fixed browser connection example import issues
  • Improved CI workflow for PyPI package testing

Bug Fixes

  • Fixed missing Playwright import in browser example
  • Corrected example URL typo
  • Updated test workflow to prevent PyPI race conditions

v1.1.0: Web Crawling, Content Parsing & Browser Automation

01 Sep 14:31


New Features

🕷️ Web Crawling

  • crawl() function for discovering and scraping multiple pages from websites
  • Advanced filtering with regex patterns for URL inclusion/exclusion
  • Configurable crawl depth and sitemap handling
  • Custom output schema support

🔍 Content Parsing

  • parse_content() function for extracting useful data from API responses
  • Support for text extraction, link discovery, and image URL collection
  • Handles both JSON responses and raw HTML content
  • Structured data extraction from various content formats
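As a rough, stdlib-only illustration of what extracting text, links, and image URLs from raw HTML involves — the SDK's `parse_content()` is built on BeautifulSoup and will differ in detail:

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collect visible text, link hrefs, and image srcs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text, self.links, self.images = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

html = '<p>Hello <a href="https://example.com">world</a></p><img src="/logo.png">'
parser = ContentExtractor()
parser.feed(html)
```

The same traversal works whether the input is raw HTML or HTML embedded in a JSON API response, which is why a single parsing entry point can cover both.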

🌐 Browser Automation

  • connect_browser() function for Playwright/Selenium integration
  • WebSocket endpoint generation for scraping browser connections
  • Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
  • Seamless authentication with Bright Data's browser service
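The WebSocket endpoint that `connect_browser()` produces embeds zone credentials in a CDP-style URL. A sketch of that construction — the host, port, and URL shape below are assumptions for illustration; the SDK returns the correct endpoint for you:

```python
def build_browser_endpoint(username, password,
                           host="brd.superproxy.io", port=9222):
    """Build a CDP-style WebSocket endpoint with inline credentials.

    Host, port, and format are illustrative, not a guaranteed contract.
    """
    return f"wss://{username}:{password}@{host}:{port}"

endpoint = build_browser_endpoint("brd-customer-XXXX-zone-browser", "secret")
# The endpoint would then be handed to the automation tool, e.g.
# Playwright's browser = chromium.connect_over_cdp(endpoint)
```

Because Playwright, Puppeteer, and Selenium all accept a remote debugging endpoint, one authenticated URL serves all three tools.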

Improvements

📡 Better Async Handling

  • Enhanced download_snapshot() with improved 202 status code handling
  • Friendly status messages instead of exceptions for pending snapshots
  • Better user experience for asynchronous data processing

🔧 Robust Error Handling

  • Fixed zone creation error handling with proper exception propagation
  • Added retry logic for network failures and temporary errors
  • Improved zone management reliability

🐍 Python Support Update

  • Updated to support Python 3.8+ (removed Python 3.7)
  • Updated CI/CD pipeline for modern Python versions
  • Added BeautifulSoup4 as core dependency

Dependencies

  • Added: beautifulsoup4>=4.9.0 for content parsing
  • Updated: Python compatibility to >=3.8

Examples

New example files demonstrate the enhanced functionality:

  • examples/crawl_example.py - Web crawling usage
  • examples/browser_connection_example.py - Browser automation setup
  • examples/parse_content_example.py - Content parsing workflows

v1.0.7: LinkedIn Integration & Enhanced APIs

27 Aug 15:14


🚀 Major Features

LinkedIn Data Integration

  • New scrape_linkedin class: Comprehensive LinkedIn data scraping for profiles, companies, jobs, and posts
  • New search_linkedin class: Advanced LinkedIn content discovery with keyword and URL-based search
  • Production-ready examples: Ready-to-use examples for all LinkedIn functionality

Enhanced ChatGPT API

  • Renamed to search_chatGPT: More intuitive naming for ChatGPT interactions
  • Sync/Async support: Choose between immediate results or background processing
  • Improved NDJSON parsing: Better handling of multi-response data

Improved Architecture

  • Modular design: Separated download functionality into dedicated module
  • Better code organization: Specialized API modules for different services
  • Production optimizations: Cleaner code with improved performance

🔧 API Enhancements

New LinkedIn Methods

# Scrape LinkedIn data
client.scrape_linkedin.profiles(urls)
client.scrape_linkedin.companies(urls)
client.scrape_linkedin.jobs(urls)
client.scrape_linkedin.posts(urls)

# Search LinkedIn content
client.search_linkedin.profiles(first_name, last_name)
client.search_linkedin.jobs(location="Paris", keyword="developer")
client.search_linkedin.posts(company_url="https://linkedin.com/company/bright-data")

Enhanced ChatGPT API

# Synchronous (immediate results)
result = client.search_chatGPT(prompt="Your question", sync=True)

# Asynchronous (background processing)
result = client.search_chatGPT(prompt="Your question", sync=False)

🛠️ Technical Improvements

  • Better error handling: Enhanced validation and error messages
  • Backward compatibility: All existing code continues to work
  • Performance optimizations: Faster processing and reduced memory usage
  • Production-ready code: Clean, efficient, and maintainable codebase

📝 Breaking Changes

  • scrape_chatGPT() renamed to search_chatGPT() (maintains same functionality)
  • Added sync parameter to ChatGPT API (defaults to True)

🐛 Bug Fixes

  • Fixed NDJSON response parsing for multi-line JSON data
  • Improved parameter validation across all APIs
  • Enhanced timeout handling for long-running requests

📚 Documentation

  • Updated examples with new LinkedIn functionality
  • Enhanced docstrings for all new methods
  • Added comprehensive usage examples