Update documentation #502

Open
hannako wants to merge 1 commit into main from engine_customisation_docs

hannako commented Jul 9, 2025

No description provided.

hannako force-pushed the engine_customisation_docs branch from 3aa5463 to 123c60f on July 9, 2025 20:56
hannako requested a review from csutter on July 9, 2025 20:57
hannako force-pushed the engine_customisation_docs branch from 92436fe to 123c60f on July 10, 2025 11:33
- We capture interaction data from opted in users in Google Analytics. This data is processed and
- ingested into Discovery Engine in bulk on a daily basis to help train the model on what content
- users are most likely to be looking for.
+ We capture User event data from opted in users in Google Analytics. This is ingested into Discovery Engine in bulk on a daily basis to help train the model on what content users are most likely to be looking for.
Contributor

How about:

We capture interaction data from opted in users in Google Analytics. This is then ingested into Discovery Engine as user events in bulk [...]

Contributor

Because "user event" is a VAIS-specific term, not a GA4 one really


- This is orchestrated by a set of serverless GCP Cloud Functions and associated plumbing in
- [search-v2-infrastructure][search-v2-infrastructure].
+ The bigquery tables from which we obtain our User event data are configured in [govuk-infrastructure][event-ingestion-link].
Contributor

It's probably less important to specify where the tables are created, than the Dataform pipeline that actually populates them?

+ To reduce the latency of User Event data being made available to the models, it's processed and ingested into Discovery Engine as follows:
+
+ - [Once per day at midday][link-to-cron-task-1], we import data for the previous day
Contributor

Worth specifying that this is complete data, and that the intraday data is not guaranteed to be perfect
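
To make the distinction concrete, here is a minimal sketch of the daily schedule described in the diff, assuming the midday run imports the whole previous calendar day and that a day's data only counts as complete once that run is due. The function names, the naive (timezone-free) datetimes, and the exact 12:00 cutoff are illustrative assumptions, not taken from the actual pipeline.

```python
from datetime import date, datetime, time, timedelta

# Assumption: "midday" means 12:00 in whatever timezone the scheduler uses.
MIDDAY = time(12, 0)


def previous_day_import_date(run_at: datetime) -> date:
    """Return the calendar day that a daily run starting at `run_at` would import."""
    return run_at.date() - timedelta(days=1)


def is_complete(event_date: date, now: datetime) -> bool:
    """Treat a day's data as complete once midday on the following day has passed.

    Before that cutoff only intraday data would be available, which is not
    guaranteed to be perfect.
    """
    cutoff = datetime.combine(event_date + timedelta(days=1), MIDDAY)
    return now >= cutoff
```

For example, under these assumptions a run at midday on 10 July 2025 imports data for 9 July 2025, and data for 10 July itself is still treated as intraday until midday on 11 July.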
