Skip to content

feat: add explore() destination discovery (supersedes #20)#28

Merged
saraswatayu merged 36 commits into
mainfrom
worktree-enchanted-napping-crescent
Jun 3, 2026
Merged

feat: add explore() destination discovery (supersedes #20)#28
saraswatayu merged 36 commits into
mainfrom
worktree-enchanted-napping-crescent

Conversation

@saraswatayu

Copy link
Copy Markdown
Owner

Summary

Adds explore() — swoop's fourth primitive, "where could I go from here?" — via the Google Flights GetExploreDestinations RPC. This reshapes #20 from first principles after live de-risking, and supersedes it.

The headline finding from a live trace: GetExploreDestinations returns no price (the map's prices come from a separate client-side path). So explore is a metadata/discovery primitive, and pricing composes via a bridge — mirroring how deals() pairs with price_deal(). explore answers the same question as deals() ("where can I go from X?") but with a different selection algorithm: inspiration/relevance (scenic + leisure destinations) vs. bargains. At JFK they overlap on only ~11 of ~24 destinations.

Public API

  • explore(origin, *, cabin, one_way, max_stops, passengers, transport) -> ExploreResult — one-way or roundtrip (one-way verified supported, unlike deals).
  • price_explore(destination) -> PriceResult | None — prices a chosen destination's cheapest itinerary through the existing search/price pipeline.
  • ExploreDestination / ExploreResult dataclasses; ExploreDestination.to_search_kwargs() as the composition seam. origin + query context are denormalized so the bridge is self-contained.
  • swoop explore JFK CLI with table/JSON/CSV/brief output, --one-way, and deals-style client-side filters (--destination, --exclude-destination, --region, --trip-length). Origin validated via the Click IATA type for the rich error.

Design vs. PR #20 (the delta)

  • Metadata-only (price call traced and found not reliably extractable) → price_explore bridge, no --prices flag (consistent with deals).
  • One-way support (PR hardcoded roundtrip).
  • destination_* field naming aligned with Deal; dropped always-null distance/parent_place_id.
  • Reuses rpc._post_with_retry; anti-bot detection; in-code response-schema map; max_stops validated in the public function.
  • Live canary added to tests/test_live_contract.py (the PR shipped mock-only tests).

Full rationale: docs/superpowers/specs/2026-06-02-explore-endpoint-design.md.

Test plan

  • make check (pyright + offline suite) — 1442 passed, 54 skipped, 0 type errors.
  • New tests/test_explore.py covers payload building (one-way + roundtrip), parsing against fixtures, to_search_kwargs(), price_explore wiring, and CLI for all four formats + the rich IATA error.
  • tests/test_api_surface.py updated for 2 new public types + 2 functions + __all__.
  • Frozen-surface, README, CHANGELOG, and CLAUDE.md (gotchas, architecture, file map) updated.
  • Live canary not yet run live — the probe endpoint soft-blocked (HTTP 400) after heavy use during development; runs in live-canary.yml / can be run once unthrottled.

Known follow-ups (documented, non-blocking)

  • place_id full set: the IATA-origin form returns a region-scoped subset (~24); a Google place_id origin + flag 4 returns the full worldwide set (~85). Shipping the IATA form per the spec's D8 fallback; place_id resolution is additive and won't change the public API.

Also included

  • chore: remove dead code and unused imports across codebase — a repo-wide pyright unused-symbol sweep (3 dead functions, dead imports, unused locals). Project gate stays clean; intentional cross-module re-exports left in place.

🤖 Generated with Claude Code

saraswatayu and others added 30 commits June 2, 2026 12:05
Design for swoop's fourth primitive (destination discovery), reshaping
PR #20 from first principles after live de-risking:

- metadata-only — browser-traced the Explore map's price call; prices are
  not reliably extractable from GetExploreDestinations, so prices come from
  composition via a price_explore() bridge
- one-way + roundtrip (verified the RPC supports both)
- place_id origin form returns the full ~85-destination set vs 24 for IATA
- destination_* model aligned with Deal; drop always-null fields
- live-canary gate (tests/test_live_contract.py) as the credibility bar
- CLI surface matches the shared search/deals/price vocabulary

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove unused functions: _filter_by_flight_number (__init__),
  _stderr_console + _trip_lines + _trip_leg_line (cli/formatters)
- Remove unused typing imports: Union (builders, decoder), datetime (cli/utils)
- Drop/underscore unused loop variables (_selection, cli/commands)

make typecheck clean; full offline suite (1422 tests) passes. Intentional
cross-module re-exports (rpc.py _booking re-exports consumed via the rpc.*
namespace by tests/scripts, _regions._airportsdata_available, utils.format_duration)
are left in place since removing them breaks consumers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The =/+/-/@/tab/CR formula-injection guard was copy-pasted in three CSV
formatters (price, deals, explore). Promote it to a module-level
_csv_safe() / _DANGEROUS_PREFIXES so the security control has one
definition; each formatter now aliases _s = _csv_safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The ~15-line MIN-MAX nights parse/validate block was a verbatim copy in
deals_cmd and explore_cmd. Extract _parse_trip_length() so the flag's
syntax, 0..365 bounds, and error wording live in one place.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xplore

Both bridges ran an identical 'search then price the cheapest itinerary'
body. Hoist it into _price_cheapest() so the cheapest-selection contract
has one definition.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the repeated `list(xs[:limit]) if limit else list(xs)` idiom with
`list(xs[:limit])` in all four explore formatters. A slice already honors
limit=None (xs[:None] returns the whole list), so the conditional was
redundant; as a bonus --limit 0 now yields zero rows instead of all.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_build_explore_payload hardcoded the passenger slot to [1,0,0,0], so
explore(passengers=...) was silently ignored upstream while destinations
were still labeled query_adults=N — and price_explore then priced a party
size the explore query never performed. Thread Passengers through to slot
6 ([adults, children, infants_in_seat, infants_on_lap]), mirroring
rpc._build_request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_parse_destination read return_date from row [12] unconditionally, so the
one-way 'no return date' invariant depended on the server omitting [12]. If
Google ever returns it, price_explore would inject return_date and price a
roundtrip for a one-way request. Thread one_way through the parser and null
return_date for one-way queries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
to_search_kwargs guarded a missing destination but not a missing
departure_date, so it emitted date=None and price_explore crashed deep in
validate_date with a confusing 'date must be a date string, got NoneType'.
Guard departure_date too and document it; covered by a new test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
--trip-length needs a return date, so combining it with --one-way made
_nights() drop every destination, leaving the user with a misleading 'No
destinations found' and exit 1. Reject the combination up front with a
clear error and exit 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… consent check

_extract_inner only parsed the first [[ line and raised on any failure, so a
reordered/empty leading chunk or a garbled first chunk could yield empty or
missing destinations. Now scan every wrb.fr frame, skip unparseable ones, and
prefer the chunk carrying destinations. Also test the consent host on the
lowercased text so a mixed-case 'Consent.Google.com' is still detected as a
block instead of a confusing parse error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Row index [17] is a ground/drive time for nearby drivable suggestions (it
correlates with the 'NNN mi' distance at [8] and is null for fly-to
destinations), not a flight duration — the Explore RPC returns none. The
field claimed flight duration and the CLI showed it under 'Duration', which
is misleading. Rename the public field to drive_minutes, document it, and
relabel the table column 'Drive'. Updates the frozen API-surface test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Explore RPC returns no flight duration; the field surfaces drive time
to nearby drivable destinations (now drive_minutes). Fix the changelog
description to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Filtering (destination whitelist/blacklist, region, trip_length) lived only
in explore_cmd, so library callers of explore() got the full set with no way
to narrow it — inconsistent with deals(). Add the same filter params to
explore(), centralize the logic in a new _explore_filter module (mirroring
_deals_filter), and have the CLI delegate to it. explore() now also rejects
trip_length + one_way at the right altitude (the library), which the CLI
surfaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fetch_explore did a full page GET on every call just to re-scrape the stable
bl/f.sid identifiers. Cache them per (proxy, impersonate, country) so repeated
explore() calls skip the page fetch, and invalidate + refetch once on a parse
failure so a genuinely stale session still recovers. A looped explore() now
pays one page GET instead of one per call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
price_explore prices one destination (search + price_selector, two round
trips), so pricing a whole explore page meant 2N sequential calls. Add
price_explore_all(destinations) which prices them in parallel via a thread
pool (mirroring deals' multi-origin fetch), preserves input order, and yields
None for unpriceable entries instead of raising — one bad entry never aborts
the batch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hing

Reflect the new public surface in CLAUDE.md (architecture tree, flows, file
map, public-API line), README (filter + bulk-pricing examples), and CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rors

The bl/f.sid cache only invalidated inside the _extract_inner try, so a stale
session that produced a 4xx (raised by _post_with_retry before parsing) left
the bad params cached permanently — every later call re-failed with no
self-heal. Invalidate on any rejection (parse error OR HTTP error), retry once
only when the failed attempt actually used cached params (no pointless double
fetch on a cold-cache failure), and let rate-limit errors propagate untouched
since the per-request backoff already ran.
…_all batch

price_explore_all only caught ValueError, so a transport/parse failure on a
single destination propagated out of pool.map() — which re-raises on the first
erroring item and abandons every other already-computed price. Catch SwoopError
per item, log it at WARNING, and return None for that entry so partial results
survive (mirroring _fetch_deals_per_origin).
Three related defects in the bl/f.sid session handling, all in _explore.py:

- Cache poisoning: _get_browser_params cached whatever _browser_params()
  returned — including an empty {} (both regexes miss on a markup change) or a
  partial dict (only bl, or only f.sid) — and the read guard treated {} as a
  hit (`if cached is not None`). A single bad page fetch then pinned a broken,
  session-less request for every later call, process-wide, until something
  raised. Now only complete params (both tokens) are cached, and the read
  guard is truthy, so an incomplete extraction can never poison the cache.

- Cold-miss stampede + racy retry: the lock was dropped between the cache read
  and the page GET, so N concurrent callers each fetched the page; and the
  retry decision used a pre-snapshot (had_cached_params) that a concurrent
  populate could invalidate. Now a per-key lock makes the fetch single-flight,
  and the retry keys off whether the failing attempt actually used cached
  params (returned as `served_from_cache`), set at the moment of use.

- Wasted retry on hard blocks: every parse/HTTP failure was treated as a stale
  session and retried with a fresh page. An HTML/consent interstitial is a hard
  block a fresh session can't clear, so it now raises _ExploreBlockedError (a
  private SwoopParseError subclass) and propagates without the pointless second
  round trip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The batch-survival handler caught SwoopRateLimitError (a SwoopError subclass)
the same as any per-item failure: it logged, returned None, and kept dispatching
all workers — so the moment Google returned 429, swoop fired up to max_workers
more concurrent multi-RPC searches at the limiter and handed the caller a
quietly-incomplete list of mostly None with no signal to back off.

Now the first 429 sets an event that short-circuits the not-yet-started tasks
(stop adding load) and re-raises SwoopRateLimitError after the pool drains, so
the caller knows to wait and retry. Non-rate-limit transport/parse failures
still yield None per destination and preserve the rest of the batch (now via
order-stable index assignment rather than pool.map's return order). The default
max_workers drops 8 -> 4 to match deals's multi-origin cap, since each pricing
call is itself several RPCs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ExploreDestination only persisted query_cabin and query_adults, so
price_explore() silently re-priced against a default search:

- max_stops was dropped — exploring with --nonstop (max_stops=0) and then
  pricing could pick a cheaper 1-2 stop connection, returning a price for a
  trip the user had constrained away.
- children / infants influenced which destinations and dates Google suggested
  (they are wired into the discovery payload) but were not recorded, so pricing
  used a smaller, cheaper party than the one explored.

Add query_children, query_infants_in_seat, query_infants_on_lap, and
query_max_stops to ExploreDestination (frozen surface test updated), thread the
full Passengers + max_stops through parse_explore_payload/_parse_destination,
and emit them from to_search_kwargs (max_stops via `is not None` so nonstop
survives; passengers only when the party is non-default). Also tidies
destination_country to use the _opt_str helper its sibling fields already use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
saraswatayu and others added 6 commits June 2, 2026 16:00
A negative TransportConfig.retries made `range(1 + retries)` empty, so the
loop never ran, the function fell off the end and returned None, and callers
that dereference `res.text` (e.g. fetch_explore) crashed with AttributeError
instead of a clean error. The CLI --retries option has no lower bound, so
`swoop explore JFK --retries -1` reached this. Clamp the loop to at least one
iteration; negative retries now behave like zero retries (one attempt).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two output-rendering bugs in the explore formatters:

- All four formatters sliced `result.destinations[:limit]` unconditionally, so
  `swoop explore JFK -l 0` showed ZERO destinations while the sibling
  `swoop deals JFK -l 0` shows all. Match deals: `[:limit] if limit else all`.

- The table and brief formatters gated the drive column on `if d.drive_minutes`,
  so a real drive_minutes == 0 ("you're basically there") rendered as the "—"/""
  no-data placeholder. Use `is not None` so 0 renders as 0m.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- One-way coverage: the existing tests only asserted return_date is None,
  which the parser hardcodes regardless of the wire — so a one-way response
  layout drift would go undetected. Add an offline parser test asserting the
  other wire-index fields (name [2], country [4], departure [11], IATA [15])
  parse on the one-way path, and strengthen the live one-way canary to assert a
  real departure date + name/place id (not just the forced-None return date).

- Cache isolation: add an autouse conftest fixture that clears the module-global
  _browser_params_cache and per-key locks around every test. Tests seed/populate
  that process-wide cache, which would otherwise leak into later tests (and live
  runs) and make behavior order-dependent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Small, behavior-preserving cleanups from the review:

- Drop the speculative `origin_flag` parameter from _build_explore_payload and
  inline the single-use _origin_block helper. The flag was never passed
  non-zero (the place_id + flag-4 worldwide form is an unimplemented future
  enhancement); inline it with a comment until that lands (YAGNI).

- Remove the dead `return` statements after `ctx.exit()` in explore_cmd, which
  raises SystemExit — matching deals_cmd, which already omits them. To let the
  same cleanup apply in the shared _parse_trip_length (where the exit is
  mid-function before lo/hi are used), type its `ctx` parameter as
  click.Context so the type checker can see ctx.exit is NoReturn; the returns
  were otherwise load-bearing only to satisfy pyright with an untyped ctx.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The review flagged that _extract_inner might handle only the length-prefixed
line framing (the fixtures' shape) and silently return empty on the flat-array
framing _deals documents. It actually handles both — its per-line scan
json.loads()es each `[[`-prefixed line and walks every entry for wrb.fr frames,
so a single flat array of frames parses fine. Pin that with a regression test
rather than refactor three subtly-different batchexecute parsers (explore reads
[3,0], deals [3,9]) into one shared helper across shipped code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bump __version__ to 0.6.0 and finalize the CHANGELOG section: explore() +
price_explore()/price_explore_all() + ExploreDestination/ExploreResult, with the
session's hardening folded in (carried query context incl. max_stops, rate-limit
propagation in price_explore_all, complete-params-only single-flight session
cache). No MIGRATION entry — explore is purely additive, matching how the 0.5
deals feature was handled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@saraswatayu saraswatayu merged commit 887929a into main Jun 3, 2026
5 checks passed
@saraswatayu saraswatayu deleted the worktree-enchanted-napping-crescent branch June 3, 2026 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant