
Commit 2e6f916

Fix client for current Meta API and harden detection avoidance

- Fix __hsi regex extraction (key changed from "hsi" to "__hsi")
- Add retry logic to challenge POST with backoff
- Update Chrome versions to 140-145 in fingerprint pool
- Add curl_cffi TLS fingerprint impersonation (optional stealth dep)
- Add dynamic extraction for v parameter and x-asbd-id
- Fix typeahead endpoint: add required adType/isMobile variables, handle new dict response structure, update field name mapping
- Fix cookie iteration for curl_cffi compatibility
- Update fallback __rev token to current value
- Update docs: add stealth install option, fix architecture details, correct SEARCH_KEYWORD constant value, fix contributing clone URL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 0ac7ba1 commit 2e6f916
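The commit message lists retry logic with backoff for the challenge POST, but that change is not among the file diffs shown on this page. As a minimal sketch of the technique only, assuming exponential backoff with random jitter and a caller-supplied `send` callable that raises on failure (the helper `post_with_retry` and all of its parameters are hypothetical, not the library's API):

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def post_with_retry(
    send: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call ``send`` until it succeeds, backing off exponentially between failures.

    Hypothetical helper: ``send`` stands in for the challenge POST and is
    expected to raise on failure (network error, unexpected status, and so on).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Delays of base, 2*base, 4*base, ... capped at max_delay,
            # plus up to 50% random jitter so retries do not line up.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + rng.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing `sleep` and `rng` explicitly keeps the helper deterministic in tests; otherwise the defaults (`time.sleep` and a fresh `random.Random`) apply.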


10 files changed (+173, -61 lines changed)

README.md

Lines changed: 8 additions & 2 deletions
@@ -55,12 +55,18 @@ With async support (requires [httpx](https://www.python-httpx.org/)):
 pip install meta-ads-collector[async]
 ```
 
+With stealth TLS fingerprinting (requires [curl_cffi](https://github.com/lexiforest/curl_cffi)):
+
+```bash
+pip install meta-ads-collector[stealth]
+```
+
 From source:
 
 ```bash
 git clone https://github.com/promisingcoder/MetaAdsCollector.git
 cd meta-ads-collector
-pip install -e ".[dev]"
+pip install -e ".[dev,async,stealth]"
 ```
 
 **Requirements:** Python 3.9+
@@ -79,7 +85,7 @@ pip install -e ".[dev]"
 - **Collection Reporting** -- summary statistics with throughput metrics
 - **Export Formats** -- JSON, CSV, JSONL
 - **Stream Mode** -- yield lifecycle events alongside ads through a single iterator
-- **Detection Avoidance** -- browser fingerprint randomization, dynamic token extraction, session management
+- **Detection Avoidance** -- browser fingerprint randomization, TLS fingerprint impersonation (via `curl_cffi`), dynamic token extraction, session management
 
 ---
 
docs/api-reference.md

Lines changed: 3 additions & 3 deletions
@@ -62,7 +62,7 @@ Search for ads and yield results as an iterator.
 | `country` | `str` | `"US"` | ISO 3166-1 alpha-2 country code |
 | `ad_type` | `str` | `"ALL"` | Ad type filter (`"ALL"`, `"POLITICAL_AND_ISSUE_ADS"`, `"HOUSING_ADS"`, `"EMPLOYMENT_ADS"`, `"CREDIT_ADS"`) |
 | `status` | `str` | `"ACTIVE"` | Status filter (`"ACTIVE"`, `"INACTIVE"`, `"ALL"`) |
-| `search_type` | `str` | `"KEYWORD_EXACT_PHRASE"` | Search type (`"KEYWORD_EXACT_PHRASE"`, `"KEYWORD_UNORDERED"`, `"PAGE"`) |
+| `search_type` | `str` | `"KEYWORD_UNORDERED"` | Search type (`"KEYWORD_EXACT_PHRASE"`, `"KEYWORD_UNORDERED"`, `"PAGE"`) |
 | `page_ids` | `list[str] \| None` | `None` | Filter by specific page IDs |
 | `sort_by` | `str \| None` | `"SORT_BY_TOTAL_IMPRESSIONS"` | Sort order (`"SORT_BY_TOTAL_IMPRESSIONS"` or `None` for relevancy) |
 | `max_results` | `int \| None` | `None` | Maximum ads to collect (`None` = no limit) |
@@ -160,7 +160,7 @@ Close the collector and release resources.
 | `STATUS_ACTIVE` | `"ACTIVE"` |
 | `STATUS_INACTIVE` | `"INACTIVE"` |
 | `STATUS_ALL` | `"ALL"` |
-| `SEARCH_KEYWORD` | `"KEYWORD_EXACT_PHRASE"` |
+| `SEARCH_KEYWORD` | `"KEYWORD_UNORDERED"` |
 | `SEARCH_EXACT` | `"KEYWORD_EXACT_PHRASE"` |
 | `SEARCH_UNORDERED` | `"KEYWORD_UNORDERED"` |
 | `SEARCH_PAGE` | `"PAGE"` |
@@ -722,7 +722,7 @@ Format a CollectionReport as a JSON string.
 
 | Constant | Value |
 |---|---|
-| `SEARCH_KEYWORD` | `"KEYWORD_EXACT_PHRASE"` |
+| `SEARCH_KEYWORD` | `"KEYWORD_UNORDERED"` |
 | `SEARCH_EXACT` | `"KEYWORD_EXACT_PHRASE"` |
 | `SEARCH_UNORDERED` | `"KEYWORD_UNORDERED"` |
 | `SEARCH_PAGE` | `"PAGE"` |

docs/architecture.md

Lines changed: 12 additions & 6 deletions
@@ -90,10 +90,12 @@ MetaAdsClient.initialize()
 | `lsd` | CSRF protection (mandatory) | `"LSD",[],{"token":"..."}`, `name="lsd" value="..."` |
 | `__rev` / `__spin_r` | Build revision | `"__spin_r":12345`, `"server_revision":12345` |
 | `__spin_t` | Timestamp | `"__spin_t":12345` |
-| `__hsi` | Session ID | `"hsi":"12345"` |
+| `__hsi` | Session ID | `"__hsi":"12345"` (with fallback to `"hsi":"12345"`) |
 | `fb_dtsg` | DTSG token | `"DTSGInitialData",[],{"token":"..."}` |
-| `__dyn` | Dynamic modules hash | `"__dyn":"..."` |
-| `__csr` | CSR hash | `"__csr":"..."` |
+| `__dyn` | Dynamic modules hash | `"__dyn":"..."` (no longer reliably present in HTML; fallback values used) |
+| `__csr` | CSR hash | `"__csr":"..."` (no longer reliably present in HTML; fallback values used) |
+| `v` | Version parameter | `"v":"fbece7"` (dynamically extracted when available) |
+| `x-asbd-id` | ASBD identifier | `"asbd_id":"359341"` (dynamically extracted when available) |
 | `jazoest` | Anti-abuse token | `"jazoest":12345` or computed from LSD |
 
 Fallback values from `constants.py` are used when extraction fails.
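The token table pairs each token with the HTML pattern the client searches for, and notes that `constants.py` fallbacks apply when extraction fails. A minimal sketch of that lookup order, assuming plain regular expressions over the raw page HTML: the pattern list, `FALLBACKS` and the `extract_token` helper are illustrative, not the library's actual code, and `"fbece7"` is the fallback `v` value visible in the `async_client.py` diff on this page.

```python
import re
from typing import Dict, Optional, Tuple

# Patterns are tried in order; the first match wins. Illustrative only.
TOKEN_PATTERNS: Dict[str, Tuple[str, ...]] = {
    # Newer pages use "__hsi"; the older "hsi" key is kept as a fallback.
    "__hsi": (r'"__hsi":"(\d+)"', r'"hsi":"(\d+)"'),
    "v": (r'"v":"([^"]+)"',),
    "x-asbd-id": (r'"asbd_id":"(\d+)"',),
}

# Hypothetical stand-in for the fallback values kept in constants.py.
FALLBACKS: Dict[str, str] = {"v": "fbece7"}


def extract_token(html: str, name: str) -> Optional[str]:
    """Return the first pattern match for ``name``, else its fallback (or None)."""
    for pattern in TOKEN_PATTERNS.get(name, ()):
        match = re.search(pattern, html)
        if match:
            return match.group(1)
    return FALLBACKS.get(name)
```

Because `"hsi"` is matched with its leading quote, the fallback pattern cannot accidentally match inside a `"__hsi"` key.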
@@ -170,9 +172,9 @@ Sessions become stale after `MAX_SESSION_AGE` (30 minutes by default). Before ea
 
 `_refresh_session()` performs a full re-initialization:
 
-1. Close the old `requests.Session`
+1. Close the old session (`requests.Session` or `curl_cffi.requests.Session`)
 2. Generate a new `BrowserFingerprint`
-3. Create a fresh session with new headers
+3. Create a fresh session with new headers (preferring `curl_cffi` when available)
 4. Re-run `initialize()` to get new tokens
 5. Track consecutive refresh failures to prevent infinite loops
 
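The five refresh steps, together with the `MAX_SESSION_AGE` staleness check and the `max_refresh_attempts` guard named in the surrounding hunk headers, can be sketched as a small state holder. Only `MAX_SESSION_AGE`, `max_refresh_attempts` and `SessionExpiredError` come from the docs; `SessionManager`, its methods, and the injected `make_session`/`initialize` callables are hypothetical, and the sketch assumes session objects expose `close()` (as `requests.Session` does).

```python
import time
from typing import Any, Callable, Optional

MAX_SESSION_AGE = 30 * 60  # seconds (30 minutes by default, per the docs)


class SessionExpiredError(Exception):
    """Raised after too many consecutive refresh failures (name taken from the docs)."""


class SessionManager:
    """Hypothetical sketch of stale-session detection and bounded refresh."""

    def __init__(
        self,
        make_session: Callable[[], Any],
        initialize: Callable[[Any], None],
        max_refresh_attempts: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._make_session = make_session  # steps 2-3: new fingerprint, fresh session
        self._initialize = initialize  # step 4: fetch new tokens
        self._max_refresh_attempts = max_refresh_attempts
        self._clock = clock
        self._failures = 0
        self._created_at = 0.0
        self.session: Optional[Any] = None

    def is_stale(self) -> bool:
        if self.session is None:
            return True
        return self._clock() - self._created_at > MAX_SESSION_AGE

    def refresh(self) -> None:
        old, self.session = self.session, None
        if old is not None:
            old.close()  # step 1: close the old session
        new = None
        try:
            new = self._make_session()
            self._initialize(new)
        except Exception as exc:
            if new is not None:
                new.close()  # do not leak a half-initialized session
            self._failures += 1  # step 5: count consecutive failures
            if self._failures >= self._max_refresh_attempts:
                raise SessionExpiredError(
                    f"{self._failures} consecutive refreshes failed"
                ) from exc
            raise
        self.session = new
        self._created_at = self._clock()
        self._failures = 0  # any success resets the counter

    def ensure_fresh(self) -> Any:
        """Return a usable session, refreshing it first if it is missing or stale."""
        if self.is_stale():
            self.refresh()
        return self.session
```

Injecting `clock` lets a test advance time past `MAX_SESSION_AGE` without waiting 30 minutes.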
@@ -192,14 +194,18 @@ If `max_refresh_attempts` consecutive refreshes fail, `SessionExpiredError` is r
 
 `fingerprint.py` generates randomized but internally-consistent browser identities. Each session gets a unique combination of:
 
-- **Chrome version**: Randomly selected from 8 recent versions (125--132)
+- **Chrome version**: Randomly selected from 6 recent versions (140--145)
 - **Platform**: Windows or macOS with matching User-Agent OS string and `sec-ch-ua-platform`
 - **Viewport**: 8 common screen resolutions
 - **DPR**: 5 device pixel ratio values
 - **"Not A Brand" hint**: 4 variations matching real Chrome behavior
 
 All headers derived from the fingerprint are self-consistent -- the Chrome version in the User-Agent matches `sec-ch-ua`, the platform in the UA matches `sec-ch-ua-platform`, etc.
 
+### TLS Fingerprint Impersonation
+
+When the optional `curl_cffi` dependency is installed (`pip install meta-ads-collector[stealth]`), the client uses `curl_cffi.requests.Session(impersonate="chrome")` instead of `requests.Session`. This provides Chrome-like TLS fingerprints (JA3/JA4) at the connection level, making requests indistinguishable from a real Chrome browser to TLS-based bot detection systems. If `curl_cffi` is not installed, the library falls back to `requests.Session` transparently.
+
 ### Request Mimicry
 
 The client replicates the exact request patterns of a real browser:
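The fingerprint bullets in this hunk stress self-consistency: a single randomly chosen Chrome major version and platform drive both the User-Agent and the client-hint headers. A minimal sketch of that idea, assuming the 140-145 range from this commit; the UA template, the single `"Not_A Brand"` brand string and the `Fingerprint`/`make_fingerprint` names are illustrative assumptions, not the contents of `fingerprint.py` (which, per the docs, also varies viewport, DPR and four brand-hint variants).

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional

CHROME_VERSIONS = (140, 141, 142, 143, 144, 145)
PLATFORMS = {
    # sec-ch-ua-platform value -> OS token in the User-Agent (illustrative)
    "Windows": "Windows NT 10.0; Win64; x64",
    "macOS": "Macintosh; Intel Mac OS X 10_15_7",
}


@dataclass(frozen=True)
class Fingerprint:
    chrome_major: int
    platform: str

    def headers(self) -> Dict[str, str]:
        """Derive every header from the same version and platform."""
        v = self.chrome_major
        ua = (
            f"Mozilla/5.0 ({PLATFORMS[self.platform]}) AppleWebKit/537.36 "
            f"(KHTML, like Gecko) Chrome/{v}.0.0.0 Safari/537.36"
        )
        # Illustrative brand list with one placeholder "Not A Brand" entry.
        sec_ch_ua = f'"Chromium";v="{v}", "Google Chrome";v="{v}", "Not_A Brand";v="99"'
        return {
            "User-Agent": ua,
            "sec-ch-ua": sec_ch_ua,
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": f'"{self.platform}"',
        }


def make_fingerprint(rng: Optional[random.Random] = None) -> Fingerprint:
    """Pick one version and one platform; headers() keeps them consistent."""
    rng = rng or random.Random()
    return Fingerprint(rng.choice(CHROME_VERSIONS), rng.choice(sorted(PLATFORMS)))
```

Generating all headers from one frozen object, rather than randomizing each header independently, is what prevents mismatches such as a Chrome 142 User-Agent next to a Chrome 140 `sec-ch-ua`.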

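The TLS Fingerprint Impersonation paragraph describes an optional-dependency fallback: use `curl_cffi.requests.Session(impersonate="chrome")` when `curl_cffi` is importable, otherwise `requests.Session`. A minimal sketch of that pattern follows. The helper names are hypothetical, and `cookie_dict` rests on an assumption about what the commit's "fix cookie iteration" entry addresses: mapping-style cookie containers yield names when iterated, while `http.cookiejar`-style jars (such as the one `requests` uses) yield cookie objects with `.name` and `.value`.

```python
from typing import Any, Dict, Tuple


def make_session() -> Tuple[Any, str]:
    """Return (session, backend_name), preferring curl_cffi's Chrome impersonation."""
    try:
        from curl_cffi import requests as curl_requests
    except ImportError:
        import requests  # core dependency, used when the stealth extra is absent

        return requests.Session(), "requests"
    return curl_requests.Session(impersonate="chrome"), "curl_cffi"


def cookie_dict(cookies: Any) -> Dict[str, str]:
    """Flatten a cookie container into {name: value} regardless of iteration style."""
    result: Dict[str, str] = {}
    for item in cookies:
        if isinstance(item, str):
            # Mapping-style iteration yields cookie names.
            result[item] = cookies[item]
        else:
            # CookieJar-style iteration yields Cookie objects.
            result[item.name] = item.value
    return result
```

For example, `session, backend = make_session()` followed by `cookie_dict(session.cookies)` gives callers the same dictionary shape whichever backend was selected.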
docs/contributing.md

Lines changed: 4 additions & 3 deletions
@@ -13,16 +13,16 @@ Contributions to `meta-ads-collector` are welcome. This guide covers the develop
 ### Clone and Install
 
 ```bash
-git clone https://github.com/Yossef/meta-ads-collector.git
-cd meta-ads-collector
+git clone https://github.com/promisingcoder/MetaAdsCollector.git
+cd MetaAdsCollector
 
 # Create a virtual environment
 python -m venv .venv
 source .venv/bin/activate # Linux/macOS
 # .venv\Scripts\activate # Windows
 
 # Install in editable mode with all dev dependencies
-pip install -e ".[dev,async]"
+pip install -e ".[dev,async,stealth]"
 ```
 
 Or use the Makefile:
@@ -37,6 +37,7 @@ make install-dev
 |---|---|---|
 | (none) | `requests>=2.28.0` | Always (core dependency) |
 | `async` | `httpx>=0.24.0` | When working on async modules |
+| `stealth` | `curl_cffi>=0.7.0` | For TLS fingerprint impersonation (recommended) |
 | `dev` | `pytest`, `pytest-cov`, `pytest-asyncio`, `ruff`, `mypy`, `types-requests` | Always for development |
 
 ## Running Tests
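The extras table maps each optional install target to the package it pulls in. When reproducing a bug report it can help to confirm which of those backends are importable in the current environment; a small standard-library sketch (the `available_extras` helper and the `"core"` label for the no-extra row are hypothetical; `requests`, `httpx` and `curl_cffi` are the import names of the packages listed in the table):

```python
import importlib.util
from typing import Dict

# Extra name -> top-level module provided by the package in the table.
EXTRA_MODULES: Dict[str, str] = {
    "core": "requests",
    "async": "httpx",
    "stealth": "curl_cffi",
}


def available_extras() -> Dict[str, bool]:
    """Report which optional backends are importable, without importing them."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:8} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module, so the check stays cheap even for packages with heavy native extensions such as `curl_cffi`.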

docs/quickstart.md

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ Get from zero to collecting ads in under 60 seconds.
 pip install meta-ads-collector
 ```
 
+For stealth TLS fingerprinting (recommended for production use):
+
+```bash
+pip install meta-ads-collector[stealth]
+```
+
 ## Python API
 
 ```python

meta_ads_collector/async_client.py

Lines changed: 2 additions & 2 deletions
@@ -478,7 +478,7 @@ async def search_ads(
 "sessionID": session_id,
 "source": None,
 "startDate": None,
-"v": "fbece7",
+"v": self._tokens.get("v", "fbece7"),
 "viewAllPageID": "0",
 }
 
@@ -597,7 +597,7 @@ async def search_pages(
 
 from urllib.parse import quote
 
-variables = {"queryString": query, "country": country}
+variables = {"queryString": query, "country": country, "adType": "ALL", "isMobile": False}
 typeahead_doc_id = self._doc_ids.get(
 "useAdLibraryTypeaheadSuggestionDataSourceQuery", DOC_ID_TYPEAHEAD,
 )
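In `search_pages`, the typeahead query now sends `adType` and `isMobile` alongside `queryString` and `country`, and the hunk shows `urllib.parse.quote` imported just above. A sketch of building and URL-encoding those variables, assuming they are serialized as compact JSON before quoting: the serialization step and the helper names are assumptions, while the four keys and their values come from the diff.

```python
import json
from typing import Any, Dict
from urllib.parse import quote


def typeahead_variables(query: str, country: str) -> Dict[str, Any]:
    """Variables for the typeahead query, mirroring the updated dict in the diff."""
    return {
        "queryString": query,
        "country": country,
        "adType": "ALL",  # newly required
        "isMobile": False,  # newly required
    }


def encode_variables(variables: Dict[str, Any]) -> str:
    """Serialize to compact JSON, then percent-encode every reserved character."""
    return quote(json.dumps(variables, separators=(",", ":")), safe="")
```

For example, `encode_variables(typeahead_variables("nike", "US"))` yields one percent-encoded string in which `False` appears as JSON `false`, ready to be placed in a form body or query string.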
