AI/ML/DS Ticket: Baseline Pricing Engine v0 (Perplexity + LLM + Heuristics)
Summary
- Build a minimal but real pricing engine that combines:
  - Simple, explainable heuristics on top of historical features (day-of-week, seasonality, optional occupancy/pickup proxies).
  - External signals via the Perplexity Search API to derive a daily event impact score.
  - An optional LLM summarizer to produce short human-readable reasons.
- Package it under ml/ with clean function boundaries so the backend can later call it via MLService or a thin adapter.
- Must run locally without external keys (falls back to heuristics and templated reasons).
Why Now
- Unblocks an initial “real” engine beyond a pure mock while keeping scope tractable. Establishes extensible interfaces for future model swaps (XGBoost/LightGBM), PMS-fed features, and richer signals.
Acceptance Criteria
- ml/ package exists and is importable from backend (local dev path is fine).
- ml/inference/predict.py exposes predict_price(features: dict) -> { price_rec, price_min, price_max, drivers }.
- A batch function score_dates(hotel_id, room_type_code, dates) returns a list of items with non-flat prices, weekend uplift, and reasonable bounds.
- Perplexity adapter can fetch events for a location and date range, map them into a per-day impact_score in [0, 1], and cache responses locally. If PERPLEXITY_API_KEY is missing, it returns empty events gracefully.
- LLM summarizer produces a ≤ 160 char reason string from drivers if OPENAI_API_KEY is set; otherwise, it returns a deterministic templated string.
- Unit tests cover: weekend uplift, month seasonality, bounding, and adapter fallbacks.
Proposed Directory Structure
ml/
  requirements-ml.txt
  __init__.py
  features/
    __init__.py
    schema.py               # Pydantic schemas for feature rows
    make_features.py        # placeholder to compute features from raw (later)
  inference/
    __init__.py
    predict.py              # baseline heuristics + optional signals
  external/
    __init__.py
    perplexity_adapter.py   # search → normalized events per day + caching
    llm_reasoner.py         # optional OpenAI call; templated fallback
  utils/
    __init__.py
    dates.py                # date helpers
    caching.py              # simple file cache
  cache/                    # gitignored JSON cache files
  artifacts/                # future model files (gitignored)
Detailed Implementation Plan
- Create ml/requirements-ml.txt
  - Contents (pin reasonably):
    pandas>=2.1
    numpy>=1.26
    pydantic>=2.7
    python-dotenv>=1.0
    requests>=2.32
    perplexityai>=0.17.0
    openai>=1.0.0
- Define feature schemas (ml/features/schema.py)
    from pydantic import BaseModel, Field

    class FeatureRow(BaseModel):
        date: str                            # YYYY-MM-DD
        hotel_id: int
        room_type_code: str
        published_rate: float | None = None  # if provided (e.g., from PMS)
        occupancy_pct: float | None = None   # 0..1 if available
        pickup_24h: int | None = None        # new bookings in last 24h
        month: int | None = None
        dow: int | None = None               # 0=Mon..6=Sun
        event_impact: float | None = None    # 0..1 (filled by adapter)
- Perplexity adapter (ml/external/perplexity_adapter.py)
  - Responsibility: given (location: str, from_date: str, to_date: str), return a dict mapping date -> impact_score and a list of raw sources. (The parameters are named from_date/to_date, matching score_dates, because from is a reserved word in Python.)
  - Strategy:
    - Query Perplexity with: "events in {location} between {from_date} and {to_date} that impact hotel demand".
    - Request max_results=5..10, extract dates if present, otherwise heuristically map to nearest relevant days.
    - Score each result (e.g., concerts/sports: 0.6–0.9; conferences: 0.3–0.6) and clamp to [0, 1].
    - Cache JSON responses under ml/cache/perplexity_{hash}.json keyed by (location, from_date, to_date).
    - If PERPLEXITY_API_KEY is missing or the request fails, return an empty mapping and empty sources.
  Example shape:
    {
      "daily": {"2025-11-12": 0.7, "2025-11-13": 0.4},
      "sources": [{"title": "Taylor Swift @ Aviva Stadium", "url": "https://...", "date": "2025-11-12"}]
    }
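The scoring and fallback steps above could be sketched as follows. This is a minimal illustration, not the adapter itself: search_fn is a hypothetical injectable callable standing in for the real Perplexity call, the "category" field on each event is an assumed normalization step, and the per-category scores are simply midpoints of the ranges listed above.

```python
import os
from datetime import date

# Assumed per-category scores (midpoints of the ranges in the ticket).
CATEGORY_SCORES = {"concert": 0.75, "sports": 0.75, "conference": 0.45}
DEFAULT_SCORE = 0.3  # assumption: score for events with no recognised category
EMPTY = {"daily": {}, "sources": []}


def clamp01(x: float) -> float:
    """Clamp a score into [0, 1]."""
    return max(0.0, min(1.0, x))


def daily_impact(events: list, from_date: str, to_date: str) -> dict:
    """Map dated events to {"daily": {date: score}, "sources": [...]}.

    When several events fall on one day, the strongest score wins.
    Undated or out-of-range events are skipped in this sketch.
    """
    start, end = date.fromisoformat(from_date), date.fromisoformat(to_date)
    daily, sources = {}, []
    for ev in events:
        try:
            day = date.fromisoformat(ev.get("date"))
        except (TypeError, ValueError):
            continue  # missing or unparseable date
        if not (start <= day <= end):
            continue
        score = clamp01(CATEGORY_SCORES.get(ev.get("category", ""), DEFAULT_SCORE))
        key = day.isoformat()
        daily[key] = max(daily.get(key, 0.0), score)
        sources.append({"title": ev.get("title", ""), "url": ev.get("url", ""), "date": key})
    return {"daily": daily, "sources": sources}


def fetch_events(location: str, from_date: str, to_date: str, search_fn=None) -> dict:
    """Return the empty shape when the key is missing or the search fails."""
    if not os.environ.get("PERPLEXITY_API_KEY") or search_fn is None:
        return {"daily": {}, "sources": []}
    try:
        events = search_fn(location, from_date, to_date)
    except Exception:
        return {"daily": {}, "sources": []}
    return daily_impact(events, from_date, to_date)
```

Keeping the API call behind an injected callable means the fallback path can be unit-tested without network access or a key.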
- LLM reasoner (ml/external/llm_reasoner.py)
  - Input: drivers: list[str], date, optional extra context (event title snippets).
  - Output: short reason (≤160 chars). If OPENAI_API_KEY is missing, fall back to ", ".join(drivers) with a prefix like "Drivers: ...".
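One way the deterministic fallback might look is sketched below. The names templated_reason and build_reason are illustrative, llm_fn is a hypothetical callable wrapping whatever OpenAI call is eventually chosen, and the text used for an empty driver list is an assumption.

```python
import os


def templated_reason(drivers: list, max_len: int = 160) -> str:
    """Deterministic reason string, truncated to max_len characters."""
    if not drivers:
        return "Drivers: baseline rate"  # assumption: wording for the no-driver case
    text = "Drivers: " + ", ".join(drivers)
    if len(text) <= max_len:
        return text
    # Leave room for the ellipsis so the result never exceeds max_len.
    return text[: max_len - 3].rstrip() + "..."


def build_reason(drivers: list, llm_fn=None, max_len: int = 160) -> str:
    """Use the LLM when a key and a callable are available, else the template."""
    if llm_fn is not None and os.environ.get("OPENAI_API_KEY"):
        try:
            return str(llm_fn(drivers))[:max_len]
        except Exception:
            pass  # any LLM failure falls through to the template
    return templated_reason(drivers, max_len)
```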
- Baseline heuristics (ml/inference/predict.py)
  - Rules:
    - Start with base = published_rate if provided else 150.0.
    - Weekend uplift: +20 for Fri/Sat (dow 4/5).
    - Midweek softness: -10 for Tue/Wed (dow 1/2).
    - Seasonality: monthly map {6: +10, 7: +15, 8: +10, 12: +5}.
    - Event impact: base += round(25 * event_impact, 2) if provided.
    - Occupancy/pickup (if available): base += min(15, (occupancy_pct or 0)*10 + min(10, (pickup_24h or 0))).
    - Bounds: price_min = base - 20, price_max = base + 20 (then round to 2 decimals; ensure min < rec < max by adjusting if needed).
  - Drivers: collect labels for each adjustment applied (e.g., "Weekend uplift", "Seasonality", "High pickup", "Event impact").
  API:
    from typing import Dict
    from ml.features.schema import FeatureRow

    def predict_price(model: object | None, row: Dict) -> Dict:
        # model is reserved for future use; ignored in v0
        f = FeatureRow(**row)
        # compute base using rules above → return dict with keys:
        # price_rec, price_min, price_max, drivers
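As a rough illustration of the rules above, the sketch below applies them to a plain dict instead of FeatureRow so that it runs with the standard library only. Deriving dow and month from the date when they are absent is an assumption, as is the "High occupancy" label (the ticket only names "High pickup").

```python
from datetime import date

# Monthly seasonality bumps, copied from the rules above.
SEASONALITY = {6: 10.0, 7: 15.0, 8: 10.0, 12: 5.0}


def predict_price(model, row: dict) -> dict:
    """Apply the v0 heuristics to one feature row (model is ignored in v0)."""
    day = date.fromisoformat(row["date"])
    # Assumption: fall back to the calendar when dow/month are not supplied.
    dow = row["dow"] if row.get("dow") is not None else day.weekday()
    month = row["month"] if row.get("month") is not None else day.month

    published = row.get("published_rate")
    base = float(published) if published is not None else 150.0
    drivers = []

    if dow in (4, 5):  # Fri/Sat
        base += 20.0
        drivers.append("Weekend uplift")
    elif dow in (1, 2):  # Tue/Wed
        base -= 10.0
        drivers.append("Midweek softness")

    bump = SEASONALITY.get(month, 0.0)
    if bump:
        base += bump
        drivers.append("Seasonality")

    impact = row.get("event_impact")
    if impact:
        base += round(25 * impact, 2)
        drivers.append("Event impact")

    occupancy = row.get("occupancy_pct") or 0.0
    pickup = row.get("pickup_24h") or 0
    demand = min(15.0, occupancy * 10 + min(10, pickup))
    if demand > 0:
        base += demand
        # "High occupancy" is a hypothetical label used when only occupancy contributes.
        drivers.append("High pickup" if pickup > 0 else "High occupancy")

    rec = round(base, 2)
    return {
        "price_rec": rec,
        "price_min": round(rec - 20.0, 2),
        "price_max": round(rec + 20.0, 2),
        "drivers": drivers,
    }
```

For Friday 2025-11-14 with no other inputs, this returns price_rec 170.0 with bounds 150.0 and 190.0; the Wednesday of the same week comes out at 140.0. Because the bounds are a fixed ±20 around the recommendation, min < rec < max holds without further adjustment in this sketch.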
- Batch scoring helper
    from typing import List, Tuple
    from datetime import date, timedelta

    def score_dates(*, hotel_id: int, room_type_code: str, from_date: str, to_date: str, location: str | None = None) -> Tuple[list[dict], dict]:
        # Build feature rows for each date
        # If location provided and PERPLEXITY_API_KEY available → fetch daily impact
        # Call predict_price for each row
        # Return (items, metadata) where metadata contains sources and parameters
        raise NotImplementedError  # stub: a comment-only body is a syntax error
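The first step of score_dates, building one feature row per date, might be sketched like this. build_feature_rows is a hypothetical helper name, and treating the range as inclusive on both ends is an assumption the ticket does not settle.

```python
from datetime import date, timedelta


def build_feature_rows(hotel_id: int, room_type_code: str,
                       from_date: str, to_date: str,
                       daily_impact: dict = None) -> list:
    """Return one feature dict per day in [from_date, to_date] (inclusive)."""
    start, end = date.fromisoformat(from_date), date.fromisoformat(to_date)
    if end < start:
        raise ValueError("to_date must not be earlier than from_date")
    daily_impact = daily_impact or {}
    rows = []
    day = start
    while day <= end:
        key = day.isoformat()
        rows.append({
            "date": key,
            "hotel_id": hotel_id,
            "room_type_code": room_type_code,
            "month": day.month,
            "dow": day.weekday(),                   # 0=Mon..6=Sun, as in FeatureRow
            "event_impact": daily_impact.get(key),  # None when no event is known
        })
        day += timedelta(days=1)
    return rows
```

Each row can then be passed to predict_price, with the adapter's sources and the call parameters collected into the metadata dict.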
- Caching utilities (ml/utils/caching.py)
  - Minimal JSON read/write with file lock to avoid corruption.
  - Hash key: sha1(json.dumps(params, sort_keys=True)).
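A possible shape for these utilities is sketched below. Note one substitution: instead of a file lock, this sketch writes to a temporary file and renames it into place with os.replace, which also prevents half-written cache files but does not serialize concurrent writers. The function names are illustrative.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def cache_key(params: dict) -> str:
    """Stable SHA-1 hex digest of the parameters, independent of key order."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def cache_path(cache_dir, prefix: str, params: dict) -> Path:
    """Build a path such as ml/cache/perplexity_{hash}.json."""
    return Path(cache_dir) / f"{prefix}_{cache_key(params)}.json"


def read_cache(path):
    """Return the cached JSON value, or None if missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def write_cache(path, payload) -> None:
    """Write JSON via a temp file plus atomic rename (swapped in for a file lock)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```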
- Tests
  - Add lightweight unit tests (pytest) for:
    - Weekend uplift on a known Friday vs. a known Wednesday.
    - Seasonality bump in July.
    - Bounds are always 40 wide and centered on the recommendation (rec ± 20).
    - Perplexity adapter fallback without key.
Optional (Stretch)
- Provide a thin adapter in backend/app/services/ml_service.py that, if USE_ML_BASELINE=true, proxies quote() to ml.inference.predict.score_dates for now. Keep it behind a flag; default continues to mock.
Environment Variables
PERPLEXITY_API_KEY — required to fetch events; otherwise adapter no-ops.
OPENAI_API_KEY — required for LLM reasons; otherwise templated reasons.
USE_ML_BASELINE — optional boolean to toggle backend to this engine later.
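Reading these variables could look like the sketch below. The helper names and the set of accepted truthy strings are assumptions; the ticket only says USE_ML_BASELINE is an optional boolean.

```python
import os

# Assumption: which strings count as "true" for boolean flags.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable, case-insensitively."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def has_key(name: str) -> bool:
    """True when an API key variable is set to a non-blank value."""
    return bool(os.environ.get(name, "").strip())
```

With these, the adapter and reasoner can branch on has_key("PERPLEXITY_API_KEY") and has_key("OPENAI_API_KEY"), and the backend on env_flag("USE_ML_BASELINE").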
Local Run and Manual Test
# 1) Install DS deps
pip install -r ml/requirements-ml.txt
# 2) Quick check (Python REPL)
from ml.inference.predict import score_dates
items, meta = score_dates(hotel_id=1, room_type_code="DLX-QUEEN", from_date="2025-11-10", to_date="2025-11-20", location="Dublin, Ireland")
len(items), items[0], list(meta.keys())
Out of Scope (for this ticket)
- True model training, feature stores, or PMS ingestion.
- Persistence of predictions or integration into API routes (covered by separate backend ticket).
- Advanced explanations (SHAP, feature importances).
Risks & Mitigations
- API rate limits or missing keys → design graceful fallbacks and caching.
- Event date extraction ambiguity → start with manual date fields from results; extend with light NLP later.
- Noisy heuristics → keep drivers explicit and explainable; will be replaced by a trained model.
Resources
Definition of Done
- ml/ package created with modules listed above.
- score_dates returns valid items for a 10-day range with non-flat prices and sensible drivers.
- Perplexity and LLM integration works when keys set; otherwise code falls back without errors.
- Unit tests for heuristics and adapter fallback pass locally.
- Strong documentation of what was built, how it works, and why each decision was made.