Vermittler-Scraper

Scrape broker registration data from the IHK Vermittlerregister by registration number.

What it does

Takes an IHK broker registration number (e.g. D-21RP-R1O5O-37) and returns structured broker details as JSON. The site uses Friendly Captcha (PoW-based) — this scraper bypasses it automatically.

Two approaches

| Script | Method | Speed | Dependencies |
| --- | --- | --- | --- |
| scrape.py | Direct PDF endpoint fetch → parse PDF | Fast (~2s) | requests, pypdf, scrapling, playwright |
| scrape.js | Stealth Playwright browser + auto captcha solve | Slower (~10s) | playwright-extra, puppeteer-extra-plugin-stealth |

The Python script (scrape.py) tries the direct PDF download first. If that's captcha-blocked, it falls back to browser-based extraction via Scrapling/Playwright.
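
The try-then-fall-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in scrape.py: the names `CaptchaBlocked`, `scrape_with_fallback`, `fast` and `slow` are hypothetical, and the two fetchers are passed in as plain callables so the control flow can be shown without touching the network.

```python
from typing import Callable, Dict

BrokerDetails = Dict[str, str]
Fetcher = Callable[[str], BrokerDetails]


class CaptchaBlocked(Exception):
    """Hypothetical signal: the direct request was answered with a captcha."""


def scrape_with_fallback(reg_no: str, fast: Fetcher, slow: Fetcher) -> BrokerDetails:
    """Try the fast path first; use the slow path only if it is captcha-blocked."""
    try:
        # Assumed to stand in for the direct PDF download and parse.
        return fast(reg_no)
    except CaptchaBlocked:
        # Assumed to stand in for the browser-based extraction.
        return slow(reg_no)
```

Only the captcha case triggers the fallback in this sketch; any other error from the fast path (a network failure, a malformed PDF) propagates to the caller rather than silently starting a browser.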

Usage

Python (recommended)

pip install -r requirements.txt
playwright install chromium

python scrape.py D-21RP-R1O5O-37

Node.js (fallback)

npm install
npx playwright install chromium

node scrape.js D-21RP-R1O5O-37

Both output JSON to stdout.
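
Because the result arrives on stdout, either script can be driven from another program. The sketch below shows one way to do that from Python; `run_and_parse` is a hypothetical helper, and it assumes stdout contains nothing but the JSON document (no log lines mixed in).

```python
import json
import subprocess
import sys
from typing import Any, List


def run_and_parse(command: List[str]) -> Any:
    """Run a command and decode its stdout as JSON."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Example call, assuming scrape.py is in the current directory
# and its dependencies are installed:
#   broker = run_and_parse([sys.executable, "scrape.py", "D-21RP-R1O5O-37"])
```

The same helper works for the Node.js script by passing `["node", "scrape.js", "D-21RP-R1O5O-37"]` as the command.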

License

ISC