🧪 PoC: Crawler + IAB Taxonomy Classification using OpenAI

I built a simple headless-browser crawler that extracts content + internal links from web pages and integrates IAB taxonomy classification via OpenAI.
It’s not production-ready, but useful as a proof of concept—especially if you’re working on automated tagging, contextual ad targeting, or content classification pipelines.

GitHub 👉 https://github.com/hanishi/pekko-playwright

Highlights:
	•	Reactive architecture using Apache Pekko (Akka) + Playwright for DOM-aware extraction
	•	Starts from a target element and gathers clean text + filtered internal links
	•	IAB taxonomy classification using OpenAI’s API (currently via pageContent → OpenAI → taxonomy_id)
	•	Practical motivations: improve contextual tagging for better CPM and cleaner ad delivery environments

Why?

As someone working in AdTech, I’ve seen how poor or missing taxonomy tagging leads to:
	•	Lower CPMs due to mismatched bids
	•	Unwanted ads on sensitive content
	•	Frustration on both publisher and buyer side

So this PoC is my small step toward cleaner, better targeted, and more trustworthy ad environments.
Hope it’s useful to someone—feedback or collaboration welcome!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🧪 PoC: Crawler + IAB Taxonomy Classification using OpenAI #59

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

🧪 PoC: Crawler + IAB Taxonomy Classification using OpenAI #59

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions