Prague residential asking-price prediction service. The repository contains a Python data pipeline for scraping, curation, model training, and export, plus a Cloudflare Worker frontend/API for predictions, auth, billing, and the premium market-opportunity dashboard.
- pipeline/ - Python package for source adapters, quality checks, feature engineering, model training, and dashboard feed generation.
- worker-app/ - Cloudflare Worker API, static frontend, D1 migrations, and Worker tests.
- shared/ - JSON configuration shared by the pipeline and Worker.
- ops/ - Operational scripts for local runners, scheduled pipelines, Cloudflare publishing, and housekeeping.
- .github/workflows/ - Scheduled and manual pipeline workflows.
Runtime data and model artifacts are intentionally ignored. On macOS the pipeline defaults to ~/Library/Application Support/HousesPredict-v2; override it with HOUSESPREDICT_RUNTIME_DIR, or set HOUSESPREDICT_USE_REPO_RUNTIME=1 to use repo-local data/ and artifacts/.
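The runtime-directory resolution described above can be sketched in Python; the precedence order (repo-local opt-in first, then the explicit override, then the macOS default) is an assumption based on this README, not the pipeline's actual code.

```python
import os
from pathlib import Path

def resolve_runtime_dir(repo_root: Path) -> Path:
    """Sketch of the runtime-directory resolution described above."""
    if os.environ.get("HOUSESPREDICT_USE_REPO_RUNTIME") == "1":
        # Repo-local mode: data/ and artifacts/ live under the checkout.
        return repo_root
    override = os.environ.get("HOUSESPREDICT_RUNTIME_DIR")
    if override:
        return Path(override)
    # macOS default from this README; other platforms would need their own.
    return Path.home() / "Library" / "Application Support" / "HousesPredict-v2"
```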
Requirements:
- Python 3.14
- Node.js 22
- npm
- Cloudflare Wrangler for Worker development and publishing
Install dependencies:
python3 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ./pipeline[dev]
npm install

Create local Worker environment variables from the example:
cp worker-app/.dev.vars.example worker-app/.dev.vars

Fill in the private values in worker-app/.dev.vars. Never commit that file.
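For illustration, a filled-in worker-app/.dev.vars might look like the sketch below. The variable names come from the required and optional lists in this README; every value shown is a placeholder, not a real credential.

```shell
# worker-app/.dev.vars - placeholder values for local development only
SUPABASE_URL=https://YOUR-PROJECT.supabase.co
SUPABASE_ANON_KEY=YOUR_ANON_KEY
SUPABASE_SERVICE_ROLE_KEY=YOUR_SERVICE_ROLE_KEY
STRIPE_SECRET_KEY=sk_test_YOUR_KEY
STRIPE_PRICE_ID=price_YOUR_PRICE_ID
STRIPE_WEBHOOK_SECRET=whsec_YOUR_SECRET
APP_BASE_URL=http://localhost:8787
```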
Run the Worker locally:
npm run dev --workspace worker-app

Run checks:
npm run typecheck:worker
npm run test:worker
npm run test:python
npm test

Useful pipeline commands:
npm run probe:sources
npm run scrape
npm run refresh
npm run train
npm run backfill
npm run status
npm run run-all

Cloudflare bootstrap notes live in ops/cloudflare-bootstrap.md.
Publish refreshed artifacts and dashboard rows:
./ops/publish-cloudflare.sh

Scheduled production jobs are split by responsibility:
- ./ops/run-scrape-publish.sh - every 6 hours for fresh listings and dashboard feed publishing.
- ./ops/run-train-publish.sh - nightly for training, promotion gating, and publishing.
- ./ops/run-backfill.sh - manually for dataset growth.
- ./ops/run-housekeeping.sh - weekly for D1 retention cleanup.
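On a plain cron-based runner, the cadences above could be expressed as the crontab sketch below. The install path and clock times are assumptions; the repository may instead drive these through its GitHub Actions workflows.

```shell
# Illustrative crontab for the scheduled jobs (paths/times are assumptions)
0 */6 * * *  /srv/housespredict/ops/run-scrape-publish.sh
30 2 * * *   /srv/housespredict/ops/run-train-publish.sh
0 4 * * 0    /srv/housespredict/ops/run-housekeeping.sh
# run-backfill.sh is triggered manually and has no schedule
```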
GitHub Actions require these repository secrets:
- CLOUDFLARE_API_TOKEN
- CLOUDFLARE_ACCOUNT_ID
- ALERT_WEBHOOK_URL (optional)
Worker API endpoints:
- GET /api/config
- GET /api/me
- POST /api/predict
- POST /api/prefill-listing
- GET /api/dashboard/teaser
- GET /api/dashboard/opportunities
- POST /api/billing/create-checkout-session
- POST /api/billing/create-portal-session
- POST /api/account/delete
- POST /api/billing/webhook
- GET /api/health
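As a sketch of calling POST /api/predict against a local dev Worker: the request schema is not documented in this README, so the payload field names below are assumptions for illustration only (port 8787 is Wrangler's default local port).

```python
import json
import urllib.request

# Hypothetical payload - the real predict schema is not documented here.
payload = {"district": "Praha 5", "layout": "2+kk", "area_m2": 62}

req = urllib.request.Request(
    "http://localhost:8787/api/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With `npm run dev --workspace worker-app` running, this would send it:
# urllib.request.urlopen(req)
```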
Required Worker environment variables:
- SUPABASE_URL
- SUPABASE_ANON_KEY
- SUPABASE_SERVICE_ROLE_KEY
- STRIPE_SECRET_KEY
- STRIPE_PRICE_ID
- STRIPE_WEBHOOK_SECRET
- APP_BASE_URL
Optional:
- PREMIUM_PLAN_CODE (defaults to premium_monthly)
- PREMIUM_PRICE_LABEL (defaults to Měsíční předplatné, Czech for "monthly subscription")
- Predictions estimate typical Prague asking prices, not final transaction prices.
- The Worker is inference-only; scraping, quality checks, and training run outside Cloudflare.
- Candidate models are versioned, and the active model changes only after promotion gates pass.
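The promotion gate mentioned above can be sketched as a simple comparison between a candidate's and the active model's evaluation metrics. The metric names and thresholds here are assumptions; the repository's actual gating criteria are not documented in this README.

```python
def passes_promotion_gates(candidate: dict, active: dict) -> bool:
    """Minimal sketch of a promotion gate, assuming each model version
    carries held-out evaluation metrics (hypothetical keys)."""
    return (
        # Candidate must not regress on held-out asking-price error.
        candidate["mae_czk"] <= active["mae_czk"]
        # And must have been evaluated on enough listings to be trusted.
        and candidate["eval_rows"] >= 500  # assumed minimum sample size
    )
```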