Task: Scrape vermittlerregister.info broker data

Target URL

https://www.vermittlerregister.info/recherche?a=pdf&registernummer=D-21RP-R1O5O-37

Goal

Build a Python script that:

  1. Accepts an IHK registernummer (e.g. D-21RP-R1O5O-37) as input
  2. Returns broker details from the Vermittlerregister
  3. Works consistently and fast (seconds, not minutes)
  4. Does NOT require manual captcha solving
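A minimal sketch of how the input from item 1 might be checked before any request is made. The pattern is an assumption inferred only from the single example D-21RP-R1O5O-37 (a "D-" prefix, four alphanumerics, five alphanumerics, two digits); the register may accept other layouts. The name normalize_registernummer is hypothetical.

```python
import re

# ASSUMPTION: layout inferred from the example D-21RP-R1O5O-37 only.
# "D-", 4 alphanumerics, "-", 5 alphanumerics, "-", 2 digits.
REGISTERNUMMER_RE = re.compile(r"D-[A-Z0-9]{4}-[A-Z0-9]{5}-[0-9]{2}")


def normalize_registernummer(raw: str) -> str:
    """Trim and upper-case the input, then reject anything off-pattern."""
    candidate = raw.strip().upper()
    if not REGISTERNUMMER_RE.fullmatch(candidate):
        raise ValueError(f"unexpected registernummer format: {raw!r}")
    return candidate
```

Rejecting malformed input locally keeps obviously bad lookups from costing a network round trip.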

The Challenge

The site uses Friendly Captcha (https://friendlycaptcha.com). This is a proof-of-work (PoW) captcha, not an image-based one, so:

  • The browser solves a puzzle client-side in JavaScript before the form is submitted
  • The server validates the puzzle solution token submitted with the form

Approach to Investigate

Priority 1: Check if the PDF endpoint bypasses captcha

The URL carries an a=pdf parameter. Test whether a direct HTTP GET or POST returns data without captcha validation.

Try:

curl -L "https://www.vermittlerregister.info/recherche?a=pdf&registernummer=D-21RP-R1O5O-37" -o test.pdf

Also try POST:

curl -X POST "https://www.vermittlerregister.info/recherche" -d "a=pdf&registernummer=D-21RP-R1O5O-37" -o test2.pdf
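The curl commands write whatever the server returns into test.pdf and test2.pdf, even if that is an HTML captcha page. The sketch below shows one way to tell the two apart by looking at the first bytes of the file. The function name classify_response is hypothetical, and the HTML check is a rough heuristic rather than a full content sniffer.

```python
def classify_response(body: bytes) -> str:
    """Return "pdf", "html", or "unknown" based on the leading bytes."""
    head = body[:1024].lstrip()
    # PDF files begin with the "%PDF-" header.
    if head.startswith(b"%PDF-"):
        return "pdf"
    lowered = head.lower()
    # Heuristic only: an HTML page here most likely means the captcha form
    # or an error page was served instead of the document.
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, classify_response(open("test.pdf", "rb").read()) returning "html" would suggest the endpoint did not bypass the captcha.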

Priority 2: Intercept real browser network traffic

Use Playwright or Scrapling to intercept the actual form submission and identify which tokens and headers are sent, then replay them.

Priority 3: Scrapling library

Check whether Scrapling (https://github.com/D4Vinci/Scrapling) can handle this. Install and test it:

pip install scrapling

Priority 4: Friendly Captcha puzzle solver

Friendly Captcha uses a hashcash-style PoW puzzle. Research whether an open-source solver exists that can complete the puzzle programmatically, with no human involved.
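For background on what "hashcash-style" means, here is a generic toy illustration: search for a nonce so that the hash of challenge plus nonce starts with a given number of zero bits. This is not Friendly Captcha's actual puzzle format, hash function, or token encoding. SHA-256, the 8-byte nonce, and every name below are assumptions chosen only to show the principle.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap check: a single hash, as a verifying server would do."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Expensive search: about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point of the scheme: solving costs many hashes, verifying costs one, so the work is bounded by CPU time rather than by human interaction.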

Deliverables

  • scrape.py — main script, takes registernummer as CLI arg, prints JSON with broker details
  • README.md — how it works, what was found
  • Commit everything
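A possible skeleton for the scrape.py deliverable, covering only the CLI argument and JSON output. The retrieval step is deliberately left as a stub because it depends on which of the priorities above works out. The names fetch_broker_details and main, and the shape of the returned dict, are hypothetical.

```python
import argparse
import json
import sys


def fetch_broker_details(registernummer: str) -> dict:
    # HYPOTHETICAL placeholder: the real retrieval strategy is still open.
    raise NotImplementedError("retrieval strategy not decided yet")


def main(argv=None, fetch=fetch_broker_details) -> int:
    parser = argparse.ArgumentParser(
        description="Print Vermittlerregister broker details as JSON."
    )
    parser.add_argument("registernummer", help="e.g. D-21RP-R1O5O-37")
    args = parser.parse_args(argv)
    try:
        details = fetch(args.registernummer)
    except NotImplementedError as exc:
        # Report failures as JSON on stderr so stdout stays machine-readable.
        error = {"registernummer": args.registernummer, "error": str(exc)}
        print(json.dumps(error), file=sys.stderr)
        return 1
    # ensure_ascii=False keeps German umlauts readable in the output.
    print(json.dumps(details, ensure_ascii=False, indent=2))
    return 0
```

Ending the script with raise SystemExit(main()) would make it usable as python scrape.py D-21RP-R1O5O-37. The fetch parameter exists so the retrieval step can be swapped or faked in tests.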

When done, run: openclaw system event --text "Done: vermittler-scraper ready. Check ~/Code/vermittler-scraper/" --mode now