TriviaAdvisor helps you find and track pub quiz nights and trivia events in your area. Think of it as a "Yelp for pub quizzes" - helping trivia enthusiasts discover new venues and keep track of their favorite quiz nights.
- GitHub Repository: holden/trivia_advisor
- Note: This project uses an underscore in its name (`trivia_advisor`), not a hyphen.
- Find trivia nights near you
- Track recurring events by venue
- Aggregates data from multiple trivia providers
- Map integration for easy venue discovery
- Mobile-friendly interface
lib/trivia_advisor/scraping/scrapers/[source_name]/
├── scraper.ex          # Main scraper module
└── venue_extractor.ex  # HTML extraction logic
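The venue_extractor module is typically a thin HTML-parsing layer. As a rough, hypothetical sketch (assuming Floki is available; the module name and CSS selectors are illustrative, not the real extractor):

# Hypothetical sketch of a venue extractor built on Floki; selectors are illustrative.
defmodule TriviaAdvisor.Scraping.Scrapers.ExampleSource.VenueExtractor do
  @doc "Extracts raw venue fields from a venue detail page."
  def extract(html) do
    {:ok, document} = Floki.parse_document(html)

    %{
      title: document |> Floki.find("h1.venue-title") |> Floki.text() |> String.trim(),
      address: document |> Floki.find(".venue-address") |> Floki.text() |> String.trim(),
      phone: document |> Floki.find(".venue-phone") |> Floki.text() |> String.trim(),
      website: document |> Floki.find("a.venue-website") |> Floki.attribute("href") |> List.first()
    }
  end
end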
# 1. Index job to fetch the venue list
def perform(%Oban.Job{id: job_id}) do
  start_time = DateTime.utc_now()

  # Source-specific fetching, e.g.:
  #   Question One: RSS feed pagination
  #   Inquizition: API endpoint
  venues = fetch_venues()

  # 2. Process venues by scheduling detail jobs
  total_venues = length(venues)
  processed_venues = schedule_detail_jobs(venues)

  # 3. Update job metadata
  JobMetadata.update_index_job(job_id, %{
    total_venues: total_venues,
    enqueued_jobs: processed_venues,
    metadata: %{
      started_at: start_time,
      completed_at: DateTime.utc_now()
    }
  })

  :ok
end
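schedule_detail_jobs/1 is not shown above; a minimal sketch of how it might enqueue one detail job per venue is below (the worker module name and args shape are assumptions):

# Hypothetical sketch: enqueue one detail job per venue and return the count enqueued.
defp schedule_detail_jobs(venues) do
  venues
  |> Enum.map(fn venue ->
    TriviaAdvisor.Scraping.Oban.ExampleSourceDetailJob.new(%{"venue_data" => venue})
  end)
  |> Oban.insert_all()
  |> length()
end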
# 1. Extract basic venue data
venue_data = %{
  name: extracted_data.title,
  address: extracted_data.address,
  phone: extracted_data.phone,
  website: extracted_data.website
}

# 2. Process through VenueStore
{:ok, venue} = VenueStore.process_venue(venue_data)
# This handles:
#   - Google Places API lookup
#   - Country/City creation
#   - Venue creation/update

# 3. Process the event
event_data = %{
  name: "#{source.name} at #{venue.name}",
  venue_id: venue.id,
  day_of_week: day,
  start_time: time,
  frequency: frequency,
  description: description,
  entry_fee_cents: parse_currency(fee_text)
}

# 4. Create/update the event
{:ok, event} = EventStore.process_event(venue, event_data, source.id)
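parse_currency/1 is a project helper not shown here; a hedged sketch of what such a helper might do, turning fee text like "£2.50" or "Free" into integer cents (currency symbols are ignored for simplicity):

# Hypothetical sketch: convert a fee string such as "£2.50" or "Free" into integer cents.
defp parse_currency(nil), do: 0

defp parse_currency(fee_text) do
  case Regex.run(~r/(\d+)(?:[.,](\d{1,2}))?/, fee_text) do
    [_, whole] ->
      String.to_integer(whole) * 100

    [_, whole, cents] ->
      String.to_integer(whole) * 100 + String.to_integer(String.pad_trailing(cents, 2, "0"))

    nil ->
      0
  end
end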
# 1. Top-level job error handling
def perform(%Oban.Job{id: job_id}) do
  try do
    # Main scraping logic
    :ok
  rescue
    e ->
      JobMetadata.update_error(job_id, Exception.format(:error, e, __STACKTRACE__))
      Logger.error("Scraper failed: #{Exception.message(e)}")
      {:error, e}
  end
end

# 2. Individual venue rescue
try do
  # Venue processing
rescue
  e ->
    Logger.error("Failed to process venue: #{inspect(e)}")
    nil # Skip this venue but continue with others
end
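In practice the per-venue rescue usually wraps each item of a map over the venue list, so one bad venue does not abort the batch. A minimal sketch (process_single_venue/1 is an assumed helper name):

# Hypothetical sketch: process each venue independently and drop the failures.
processed =
  venues
  |> Enum.map(fn venue ->
    try do
      process_single_venue(venue)
    rescue
      e ->
        Logger.error("Failed to process venue: #{inspect(e)}")
        nil
    end
  end)
  |> Enum.reject(&is_nil/1)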
# 1. Start of scrape
Logger.info("Starting #{source.name} scraper")

# 2. Venue count
Logger.info("Found #{venue_count} venues")

# 3. Individual venue processing
Logger.info("Processing venue: #{venue.name}")

# 4. VenueHelpers.log_venue_details for a consistent format
VenueHelpers.log_venue_details(%{
  raw_title: raw_title,
  title: clean_title,
  address: address,
  time_text: time_text,
  day_of_week: day_of_week,
  start_time: start_time,
  frequency: frequency,
  fee_text: fee_text,
  phone: phone,
  website: website,
  description: description,
  hero_image_url: hero_image_url,
  url: source_url
})
- Venue must have:
  - Valid name
  - Valid address
  - Day of week
  - Start time
- Event must have:
  - Valid venue_id
  - Valid day_of_week
  - Valid start_time
  - Valid frequency
- Country (find or create)
- City (find or create, linked to country)
- Venue (find or create, linked to city)
- Event (find or create, linked to venue)
- EventSource (find or create, linked to event and source)
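Each step in the chain above follows the same find-or-create shape. A minimal sketch for the City step, assuming an Ecto schema named TriviaAdvisor.Locations.City (the schema and field names are assumptions):

# Hypothetical sketch of the find-or-create pattern used at each step of the chain.
defp find_or_create_city(name, country_id) do
  case TriviaAdvisor.Repo.get_by(TriviaAdvisor.Locations.City, name: name, country_id: country_id) do
    nil ->
      %TriviaAdvisor.Locations.City{}
      |> Ecto.Changeset.change(name: name, country_id: country_id)
      |> TriviaAdvisor.Repo.insert()

    city ->
      {:ok, city}
  end
end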
lib/trivia_advisor/scraping/oban/[source_name]_index_job.ex # Lists venues and schedules detail jobs
lib/trivia_advisor/scraping/oban/[source_name]_detail_job.ex # Processes individual venues/events
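Both files define ordinary Oban workers. A skeleton might look like the following (the queue name and max_attempts are assumed values, not the project's actual configuration):

# Hypothetical worker skeleton; queue name and max_attempts are assumed values.
defmodule TriviaAdvisor.Scraping.Oban.ExampleSourceIndexJob do
  use Oban.Worker, queue: :scraper, max_attempts: 3

  require Logger

  @impl Oban.Worker
  def perform(%Oban.Job{id: job_id, args: args}) do
    Logger.info("Starting index job #{job_id} with args: #{inspect(args)}")
    # Fetch venues and schedule detail jobs here
    :ok
  end
end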
All scrapers should use the centralized JobMetadata module for updating job metadata:
# In detail jobs:
def perform(%Oban.Job{args: args, id: job_id}) do
  # `source` is resolved from the job args (lookup omitted here for brevity)
  # Process the venue and event
  result = process_venue(args["venue_data"], source)

  # Handle the result and update metadata
  handle_processing_result(result, job_id, source)
end

# Handle the processing result uniformly
defp handle_processing_result(result, job_id, source) do
  case result do
    {:ok, %{venue: venue, event: event}} ->
      # Update metadata with the JobMetadata helper
      metadata = %{
        "venue_name" => venue.name,
        "venue_id" => venue.id,
        "event_id" => event.id
        # Additional fields...
      }

      JobMetadata.update_detail_job(job_id, metadata, %{venue_id: venue.id, event_id: event.id})

      {:ok, %{venue_id: venue.id, event_id: event.id}}

    {:error, reason} ->
      # Update error metadata
      JobMetadata.update_error(job_id, reason)
      {:error, reason}
  end
end
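JobMetadata itself is not shown here; one plausible implementation, assuming metadata is written to the Oban job's meta column (the module name, internals, and use of the Oban.Job meta field are assumptions):

# Hypothetical sketch only: writes metadata to the oban_jobs `meta` column.
# Assumes an Oban version that exposes the `meta` field on Oban.Job.
defmodule TriviaAdvisor.Scraping.Helpers.JobMetadata do
  alias TriviaAdvisor.Repo

  def update_detail_job(job_id, metadata, _extra \\ %{}) do
    Oban.Job
    |> Repo.get!(job_id)
    |> Ecto.Changeset.change(meta: metadata)
    |> Repo.update!()
  end

  def update_error(job_id, reason) do
    update_detail_job(job_id, %{"error" => inspect(reason)})
  end
end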
For consistently handling venue/event images:
# Download and attach hero images for events
hero_image_url = venue_data["image_url"]

if hero_image_url && hero_image_url != "" do
  # Pass the force_refresh_images flag to control image refresh
  force_refresh_images = Process.get(:force_refresh_images, false)

  case ImageDownloader.download_event_hero_image(hero_image_url, force_refresh_images) do
    {:ok, upload} ->
      Logger.info("✅ Successfully downloaded hero image")
      Map.put(event_data, :hero_image, upload)

    {:error, reason} ->
      Logger.warning("⚠️ Failed to download hero image: #{inspect(reason)}")
      event_data
  end
else
  event_data
end
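What the force_refresh_images flag bypasses inside ImageDownloader can be pictured as a simple cache check. The sketch below is hypothetical; cached_path_for/1 and download_to/2 are assumed helper names, not the real implementation:

# Hypothetical sketch of a cache check that the force_refresh flag bypasses.
def download_event_hero_image(url, force_refresh \\ false) do
  path = cached_path_for(url)

  if File.exists?(path) and not force_refresh do
    {:ok, %{path: path, filename: Path.basename(path)}}
  else
    # Re-download and overwrite any stale cached copy
    download_to(url, path)
  end
end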
All scrapers support the force_refresh_images flag, which forces images to be re-downloaded instead of reusing cached copies:
- How It Works:
  - When enabled, existing images are deleted before downloading new ones
  - Bypasses image caching in ImageDownloader
  - Propagates through the entire process, from index job to detail job to EventStore
- Usage in Jobs:
# Through Oban job args
{:ok, _job} =
  Oban.insert(
    TriviaAdvisor.Scraping.Oban.PubquizIndexJob.new(%{
      "force_refresh_images" => true,
      "limit" => 5
    })
  )

# Through mix task flags
mix scraper.test_pubquiz_index --limit=3 --force-refresh-images
- Implementation (see the sketch after this list):
  - Index job passes the flag to detail jobs
  - Detail job sets Process.put(:force_refresh_images, true)
  - ImageDownloader checks the flag to force redownload
  - EventStore explicitly deletes existing images when the flag is true
- Supported Scrapers:
  - Question One
  - Quizmeisters
  - Geeks Who Drink
  - PubQuiz
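A condensed sketch of the propagation path described under Implementation above (module names such as PubquizDetailJob and the exact arg keys are assumptions based on the patterns shown earlier):

# Hypothetical sketch: the flag travels via job args, then a process flag, then the downloader.

# 1. Index job forwards the flag in each detail job's args
TriviaAdvisor.Scraping.Oban.PubquizDetailJob.new(%{
  "venue_data" => venue,
  "force_refresh_images" => force_refresh_images
})
|> Oban.insert()

# 2. Detail job exposes the flag to downstream helpers via the process dictionary
def perform(%Oban.Job{args: args}) do
  Process.put(:force_refresh_images, args["force_refresh_images"] || false)

  # 3. ImageDownloader and EventStore read it with
  #    Process.get(:force_refresh_images, false)
  :ok
end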
- NEVER make DB migrations without asking first
- Always follow the existing pattern for consistency
- Maintain comprehensive logging
- Handle errors gracefully
- Use the VenueHelpers module for common functionality
- NEVER write repetitive case statements that do the same thing with different data structures - see Scraping Best Practices for details
- NEVER hardcode Unsplash or other external image URLs directly in the code - use database or configuration
- Prefer database-backed data over static lists when possible
- Focus on optimizing queries rather than replacing with hardcoded data
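On the repetitive case statement rule: the intent is to normalise differing data shapes once and funnel them through a single code path. A hedged before/after sketch (normalize_keys/1 is an illustrative helper, not an existing module function):

# Avoid: duplicated bodies that differ only in how fields are pulled out
case payload do
  %{"name" => name, "address" => address} -> VenueStore.process_venue(%{name: name, address: address})
  %{name: name, address: address} -> VenueStore.process_venue(%{name: name, address: address})
end

# Prefer: normalise the shape once, then keep a single call site
payload
|> normalize_keys()
|> VenueStore.process_venue()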