A crowdsourced moderation and anti-spam system for Launchpad bugs and questions.
bifrost scrapes Launchpad bug comments, applies lightweight heuristic spam scoring, and presents suspicious content to human reviewers in a Tinder-style moderation queue.
The system is intentionally simple:
- Python
- Flask
- SQLite
- No ORM
- No JavaScript framework
- Single-container deployment
scraper.py:
- Scrapes Launchpad bugs via the public API
- Fetches bug metadata and comments
- Tracks previously scraped bug IDs
- Avoids duplicate comment imports
- Applies heuristic filtering before queueing content for review
Suspicious comments are stored with:
is_review_candidate = 1
Likely-safe comments are stored with:
is_review_candidate = 0
This reduces reviewer noise while still preserving data for future analysis.
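A minimal sketch of how the scraper might persist comments with the `is_review_candidate` flag while skipping duplicates. The `comments` table, its columns, and the `store_comment` helper are assumptions made for illustration; the real schema is defined in init_db.py and may differ.

```python
import sqlite3

# Hypothetical schema for illustration only; see init_db.py for the real one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    id INTEGER PRIMARY KEY,
    bug_id INTEGER NOT NULL,
    message_link TEXT NOT NULL UNIQUE,  -- uniqueness blocks duplicate imports
    content TEXT NOT NULL,
    is_review_candidate INTEGER NOT NULL
)
"""


def store_comment(conn, bug_id, message_link, content, suspicious):
    """Insert a comment once; return True if a new row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO comments "
        "(bug_id, message_link, content, is_review_candidate) "
        "VALUES (?, ?, ?, ?)",
        (bug_id, message_link, content, 1 if suspicious else 0),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_comment(conn, 1, "https://example.invalid/msg/1", "Buy cheap coins now", True)
store_comment(conn, 1, "https://example.invalid/msg/2", "Patch attached, thanks", False)
# Importing the same message again is silently ignored.
duplicate_written = store_comment(
    conn, 1, "https://example.invalid/msg/1", "Buy cheap coins now", True
)

# Only flagged comments reach the review queue; the rest stay in the table.
queue = conn.execute(
    "SELECT content FROM comments WHERE is_review_candidate = 1"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Keeping the likely-safe rows in the same table, rather than discarding them, is what preserves the data for later analysis.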
Reviewers visit:
/review
Reviewers are shown:
- Bug title
- Link to Launchpad bug
- Comment content
They can classify content as:
- Not Spam
- Not Sure
- Spam
Each review:
- is recorded in SQLite
- advances automatically to the next random item
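The record-then-advance step could look roughly like the sketch below. The table layout, the verdict strings, and the choice to skip items the reviewer has already classified are all assumptions; app.py may handle these differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample rows for illustration.
conn.executescript("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY, content TEXT, is_review_candidate INTEGER
);
CREATE TABLE reviews (
    user_id INTEGER, comment_id INTEGER, verdict TEXT,
    UNIQUE (user_id, comment_id)
);
INSERT INTO comments VALUES
    (1, 'cheap pills', 1), (2, 'win crypto', 1), (3, 'fixed in 2.1', 0);
""")

# Assumed internal names for "Not Spam", "Not Sure", and "Spam".
VERDICTS = {"not_spam", "not_sure", "spam"}


def record_review(conn, user_id, comment_id, verdict):
    """Store one reviewer's classification of one comment."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    conn.execute(
        "INSERT OR REPLACE INTO reviews (user_id, comment_id, verdict) "
        "VALUES (?, ?, ?)",
        (user_id, comment_id, verdict),
    )
    conn.commit()


def next_item(conn, user_id):
    """Pick a random review candidate this user has not classified yet."""
    return conn.execute(
        "SELECT id, content FROM comments "
        "WHERE is_review_candidate = 1 "
        "AND id NOT IN (SELECT comment_id FROM reviews WHERE user_id = ?) "
        "ORDER BY RANDOM() LIMIT 1",
        (user_id,),
    ).fetchone()
```

`next_item` returns `None` once the reviewer has worked through every candidate, which a view can use to show an "all done" page.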
Reviewers also see:
- progress bar
- leaderboard
- personal review progress
Moderators visit:
/moderator
Moderators:
- see reviewer vote counts
- make the final decision
- apply canonical verdicts
Reviewer accuracy is later compared against moderator verdicts.
Reviewers cannot moderate. Moderators cannot review.
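As an illustration of the moderator flow, the sketch below tallies reviewer votes for one comment and then records a canonical verdict. The `reviews` and `verdicts` tables and both helper functions are hypothetical, and it assumes moderators choose only between spam and not spam.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample votes for illustration.
conn.executescript("""
CREATE TABLE reviews (user_id INTEGER, comment_id INTEGER, verdict TEXT);
CREATE TABLE verdicts (
    comment_id INTEGER PRIMARY KEY, verdict TEXT, moderator_id INTEGER
);
INSERT INTO reviews VALUES
    (1, 10, 'spam'), (2, 10, 'spam'), (3, 10, 'not_sure');
""")


def vote_counts(conn, comment_id):
    """Return {verdict: count} across all reviewer votes for a comment."""
    rows = conn.execute(
        "SELECT verdict, COUNT(*) FROM reviews "
        "WHERE comment_id = ? GROUP BY verdict",
        (comment_id,),
    ).fetchall()
    return dict(rows)


def apply_verdict(conn, comment_id, verdict, moderator_id):
    """Record the moderator's final decision, replacing any earlier one."""
    if verdict not in ("spam", "not_spam"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    conn.execute(
        "INSERT OR REPLACE INTO verdicts (comment_id, verdict, moderator_id) "
        "VALUES (?, ?, ?)",
        (comment_id, verdict, moderator_id),
    )
    conn.commit()
```

The vote counts inform the decision but do not make it; the moderator's row in `verdicts` is what counts as canonical in this sketch.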
The leaderboard ranks reviewers based on:
- Total reviews submitted
- Correct reviews
- Incorrect reviews
Moderators are excluded.
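One way to compute such a ranking is a single aggregate query, sketched below. The schema and the sort order are illustrative assumptions, and so is the scoring rule: a "Not Sure" vote counts toward the total but is treated as neither correct nor incorrect. The source does not say how the application handles that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample data for illustration.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, role TEXT);
CREATE TABLE reviews (user_id INTEGER, comment_id INTEGER, verdict TEXT);
CREATE TABLE verdicts (comment_id INTEGER PRIMARY KEY, verdict TEXT);
INSERT INTO users VALUES
    (1, 'alice', 'reviewer'), (2, 'bob', 'reviewer'), (3, 'mod', 'moderator');
INSERT INTO reviews VALUES
    (1, 10, 'spam'), (1, 11, 'not_spam'),
    (2, 10, 'not_spam'), (2, 11, 'not_sure');
INSERT INTO verdicts VALUES (10, 'spam'), (11, 'not_spam');
""")

LEADERBOARD_SQL = """
SELECT u.username,
       COUNT(r.comment_id) AS total,
       SUM(CASE WHEN v.verdict IS NOT NULL
                 AND r.verdict = v.verdict THEN 1 ELSE 0 END) AS correct,
       SUM(CASE WHEN v.verdict IS NOT NULL
                 AND r.verdict <> 'not_sure'
                 AND r.verdict <> v.verdict THEN 1 ELSE 0 END) AS incorrect
FROM users u
JOIN reviews r ON r.user_id = u.id
LEFT JOIN verdicts v ON v.comment_id = r.comment_id  -- unjudged items still count toward total
WHERE u.role = 'reviewer'                            -- moderators are excluded
GROUP BY u.id
ORDER BY correct DESC, total DESC
"""

leaderboard = conn.execute(LEADERBOARD_SQL).fetchall()
```

Each row is `(username, total, correct, incorrect)`. Reviews of comments that have no moderator verdict yet raise only the total.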
Authentication is intentionally minimal.
Users:
- create a username
- receive a UUID
- use the UUID as a login token
Sessions are maintained using cookies.
This is NOT secure authentication.
The UI intentionally warns users:
- not to use real secrets
- not to reuse credentials
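The token scheme can be sketched with the standard library alone. The `users` table and the `register` and `login` helpers are hypothetical names; in the Flask application the returned token would presumably be placed in a cookie, which this sketch leaves out.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration.
conn.execute(
    "CREATE TABLE users (username TEXT UNIQUE, token TEXT UNIQUE, role TEXT)"
)


def register(conn, username, role="reviewer"):
    """Create a user and return the UUID that doubles as the login token."""
    token = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO users (username, token, role) VALUES (?, ?, ?)",
        (username, token, role),
    )
    conn.commit()
    return token


def login(conn, token):
    """Return (username, role) for a known token, or None otherwise."""
    return conn.execute(
        "SELECT username, role FROM users WHERE token = ?", (token,)
    ).fetchone()


token = register(conn, "alice")
```

Anyone who obtains the token can act as that user, which is why the scheme is suitable only for low-stakes use.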
| File | Purpose |
|---|---|
| app.py | Flask web application |
| scraper.py | Launchpad scraper |
| extract.py | Export moderator-confirmed spam |
| admin.py | Create moderator accounts |
| init_db.py | Initialize SQLite schema |
| scoring.py | Heuristic spam scoring |
| templates/ | HTML templates |
| static/ | Images and assets |
SQLite database path:
/data/spam.db
The database can be externalized using a Docker bind mount.
docker build -t bifrost .
docker run \
-p 8000:8000 \
-v $(pwd)/data:/data \
bifrost
The application will be available at:
http://localhost:8000
If needed, initialize the SQLite schema:
docker run \
-v $(pwd)/data:/data \
bifrost \
python init_db.py
Scrape random Launchpad bugs:
docker run \
-v $(pwd)/data:/data \
bifrost \
python scraper.py --count 100
The scraper:
- skips already-scraped bugs
- logs progress
- applies heuristic pre-filtering
- stores suspicious comments for review
Create moderator accounts with:
docker run -it \
-v $(pwd)/data:/data \
bifrost \
python admin.py
This creates users with role:
moderator
Exporting Spam - Planned, prompted, and generated, but not reviewed or tested (insufficient time); included for completeness
Export moderator-confirmed spam:
docker run \
-v $(pwd)/data:/data \
bifrost \
python extract.py
Output format:
JSONL
Example (pretty-printed here for readability; in a JSONL file each record occupies a single line):
{
"bug_id": 123,
"bug_title": "Example bug",
"bug_url": "https://bugs.launchpad.net/...",
"message_link": "https://api.launchpad.net/...",
"content": "spam message",
"label": "spam"
}
This format is convenient for:
- ML pipelines
- anti-spam systems
- LLM evaluation
- analytics
- classifier training
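A downstream consumer might read the export as sketched here. The `read_jsonl` helper is hypothetical, and the sample record reuses the field names from the example above with an in-memory stream in place of a real export file.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an opened export file.
sample = io.StringIO(
    '{"bug_id": 123, "bug_title": "Example bug", '
    '"content": "spam message", "label": "spam"}\n'
    "\n"
)
records = list(read_jsonl(sample))
spam_texts = [r["content"] for r in records if r["label"] == "spam"]
```

Because each line is an independent JSON document, large exports can be processed as a stream without loading the whole file.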
The current scoring model is intentionally simple.
Signals include:
- suspicious domains
- crypto keywords
- SEO spam phrases
- excessive links
- marketing language
- repeated spam patterns
The heuristic gate exists to reduce reviewer fatigue.
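To make the idea concrete, here is a small scoring sketch covering three of the listed signals: suspicious domains, keywords, and excessive links. The domain list, keyword list, weights, and threshold are invented for illustration and are not taken from scoring.py.

```python
import re

# All values below are illustrative assumptions, not the project's real lists.
SUSPICIOUS_DOMAINS = {"example-casino.invalid"}
KEYWORDS = ("crypto", "bitcoin", "backlinks", "buy now")
MAX_LINKS = 3   # more links than this counts as "excessive"
THRESHOLD = 2   # minimum score that sends a comment to review

URL_RE = re.compile(r"https?://([^/\s]+)")


def spam_score(text):
    """Add up simple signal weights; a higher score is more suspicious."""
    lowered = text.lower()
    hosts = URL_RE.findall(lowered)
    score = 0
    # Links to known-bad hosts weigh the most.
    score += 2 * sum(1 for host in hosts if host in SUSPICIOUS_DOMAINS)
    # Whole-word keyword matches, so "cryptography" does not match "crypto".
    score += sum(
        1 for kw in KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )
    if len(hosts) > MAX_LINKS:
        score += 1
    return score


def is_review_candidate(text):
    """Map the score onto the 0/1 flag stored with each comment."""
    return 1 if spam_score(text) >= THRESHOLD else 0
```

A low threshold favors recall, which suits a gate whose misses are never seen by a human, while the whole-word matching keeps ordinary technical vocabulary from inflating the queue.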
This MVP is designed to evolve toward:
- reviewer trust weighting
- disagreement analysis
- ML-assisted spam classification
- LLM-resistant moderation workflows
- coordinated spam campaign detection
- deterministic queue assignment
- semi-supervised training pipelines
This project intentionally uses:
- UUID-based login
- SQLite
- simple cookies
- no CSRF protection
- no hardened authentication
It is suitable for:
- prototypes
- internal tools
- experimentation
It is NOT suitable for production deployment without additional hardening.
MIT