Skip to content

Various tweaks to our new API#679

Open
edavey wants to merge 12 commits into
mainfrom
tweaks-to-api
Open

Various tweaks to our new API#679
edavey wants to merge 12 commits into
mainfrom
tweaks-to-api

Conversation

@edavey
Copy link
Copy Markdown
Collaborator

@edavey edavey commented May 21, 2026

Please see individual commits for details. The most visible change is the inclusion of the RSwag UI library to serve up our OpenAPI spec at /api-docs:

open_api_spec_served

It's important to be able to see the documentation. It's now available
at /api-docs
edavey added 2 commits May 22, 2026 07:49
We disable SWAGGER_DRY_RUN in the generate_swagger rake task so that
rswag executes requests and captures real responses.

And we use a TimePeriod block with realistic organisation names to produce
a meaningful exampleshowing formats, embed codes, and organisation name.
The generated OpenAPI spec is JSON, not YAML. The misleading .yaml
extension was inherited from rswag defaults. We rename to .json and
update the swagger_helper and rswag_ui config to match.
edavey added 9 commits May 22, 2026 08:05
The organisation factory generates random UUIDs, causing the OpenAPI
example to change on every test run. This causes the OpenAPI spec's example
to have a different Organisation "content_id" on each run
(rake api:generate_swagger) and fail as the test output must match the
committed spec.

The spec will now consistently generate:

"organisation": {
  "name": "HM Revenue & Customs",
  "content_id": "3aa1b2c3d-1234-5678-abcd-000000000001"
},
Each call to page_href was mutating the single @request_url instance
via .tap.

This happens to work today but would silently produce
wrong URLs if call order changed or additional query params were
introduced.

We now use .dup to produce an independent URI for each link.
This extracts query_with_page to separate the concern of building the query
string from constructing the full URL. Each step is now explicit: we
duplicate the URL, build the query with the target page, then assign it.
BlockPresenter previously called lead_organisation.name without
guarding against nil, which would produce "whiny nil"
if an edition's lead_organisation_id didn't resolve to a known
organisation.

This can happen because Organisation is not an ActiveRecord model —
it wraps a cached Publishing API response. The OrganisationValidator
only checks presence of the UUID, not that it maps to an actual
organisation. An org could be removed from the API, or the cache
could be stale.

Options to consider for stronger enforcement:

1. Validate the UUID resolves at write time (adds API dependency)
2. Defensive nil handling in the presenter (return nil for org)
3. Both — validate on write, defend on read

For now we raise a descriptive error to surface the problem clearly.
We replace the two-step approach that materialises IDs in Ruby with a
single SQL subquery, letting Postgres optimise as one operation.

Before (two round-trips, IDs materialised in Ruby):

  SELECT id FROM documents WHERE ...keyword match...
  SELECT * FROM documents JOIN editions ... WHERE documents.id IN ('uuid1', 'uuid2', ...)

After (single query with subquery):

  SELECT * FROM documents JOIN editions ...
  WHERE documents.id IN (SELECT id FROM documents WHERE ...keyword match...)

This avoids passing potentially large UUID arrays over the wire and
allows the query planner to use a semi-join optimisation. A semi-join
returns rows from the outer table where at least one match exists in
the inner table, without duplicating rows. Crucially, the database
can stop scanning as soon as it finds the first match for each outer
row, rather than materialising the full inner result set.

Reference: https://www.navicat.com/en/company/aboutus/blog/2783-the-sql-semi-join
The test previously created 3 published editions on the same document.
This is an impossible state — a document can only have one published
edition at a time (publishing a new edition supersedes the previous).

The test passed by coincidence: most_recent_for_document filters by
MAX(updated_at) per document_id, and all 3 editions shared the same
timestamp, so all matched. This masked the fact that pagination wasn't
being tested with realistic data.

Create a separate document for each edition, representing 3 distinct
blocks in search results — the scenario pagination actually serves.

Note: the constraint that a document may have at most one published
edition is currently unenforced in the model layer and should be
addressed separately.
The stub previously matched any argument, so the test would pass even
if the query forgot to forward the keyword. Add .with('keyword') to
verify the contract between Query and Document.with_keyword. The
scope itself is unit tested elsewhere (document_filter_spec.rb).
We now return 400 Bad Request with a descriptive error message when the page
parameter is zero, negative, or non-numeric. Out-of-range pages
(e.g. page=999) continue to return 200 with empty results, which is
reasonable for paginated APIs and allows consumers to detect the end
of results without a separate error path.
The step definition created all editions with identical created_at
timestamps (factory default: Time.zone.now). Since the query orders
by editions.created_at DESC, results appeared in an unpredictable
order, causing the pagination scenario to fail intermittently.

We now give each edition a distinct created_at (index.days.ago) so the
first row in the feature table is the most recently created and
appears first in results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant