Google sheets integration #2112

Open
chigby wants to merge 9 commits into develop from google-sheets-integration

Conversation

Contributor

@chigby chigby commented Mar 19, 2026

Description

Fixes #2074

This pull request:

  • Adds 3 models that pertain to Prepublication incidents: 2 for the prepub incident data itself and its relationships, and 1 for the "Sync" which holds the status, date, and human-readable messages about the last time the sync command was completed.
  • Speaking of the command, I've added a new command sync_prepubs. Conceptually, this command is pretty simple: it connects to a specific Google sheet, fetches the rows, selects the ones that refer to pre-published incidents, and saves the data as a Django model: PrepublicationIncident. We're interested in 3 pieces of data: a date, a location, and one or more categories.
      • The date is expected to be in a particular format, and gets saved as a Python date.
      • The location is (mostly) in the format of City and State Abbreviation. We use this to look up the GeoName for that location. If there isn't a GeoName for that location, the row will be marked as invalid. There is a special case of "Washington, D.C." that we have to handle separately.
      • The categories are formatted on the Google sheet as comma-separated names. We would like to know which of our CategoryPage objects these names correspond to. The names do not necessarily match the titles of any existing category pages, so I added a field on that model (accessible from the Settings tab) to store the exact string that is used in the Google sheet. These have to be updated manually. The choices for these names are found in the Categories tab on the Google Sheet.
  • The sync_prepubs command analyzes the rows, and if it finds invalid data it will not create or delete any Django objects. All data must be valid before the sync process will actually sync.
  • It is currently not possible to view the PrepublicationIncident objects in the admin or anywhere on the site. I recommend using the ./manage.py shell or viewing the database directly to see the data in there.
  • However, I have also updated the Dashboard, visible at the /admin home page, to display the outcome of the most recent sync. If there are errors, it will put the actual error message in a <details> element. This part of the site could use some nicer design, I think, but I have run out of time for that.
  • I've tried to add unit tests for the sync command. I hope reading these will explain some more about the various requirements of the shape of the data and the other behaviors of the command.
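
To make the all-or-nothing validation described above concrete, here is a minimal, framework-free sketch. It is not the code in this PR: `DATE_FORMAT`, `KNOWN_LOCATIONS`, `KNOWN_CATEGORIES`, `parse_row`, and `validate_all` are hypothetical names, the date format is a placeholder (the real format is not stated here), and the in-memory sets stand in for the GeoName lookup and the `google_sheets_name` values on `CategoryPage`.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Assumption: the sheet's real date format is not stated in this PR.
DATE_FORMAT = "%m/%d/%Y"

# Hypothetical stand-ins for the GeoName lookup and the category names.
KNOWN_LOCATIONS = {("Portland", "OR"), ("Washington", "DC")}
KNOWN_CATEGORIES = {"Arrest", "Assault"}


@dataclass
class ParsedRow:
    date: date
    city: str
    state: str
    categories: list


def parse_location(raw):
    """Split 'City, ST' into (city, state), special-casing Washington, D.C."""
    if raw.strip() == "Washington, D.C.":
        return ("Washington", "DC")
    city, _, state = raw.rpartition(",")
    return (city.strip(), state.strip())


def parse_row(raw_date, raw_location, raw_categories):
    """Return (ParsedRow, []) if the row is valid, else (None, [errors])."""
    errors = []
    parsed_date = None
    try:
        parsed_date = datetime.strptime(raw_date.strip(), DATE_FORMAT).date()
    except ValueError:
        errors.append(f"bad date: {raw_date!r}")

    location = parse_location(raw_location)
    if location not in KNOWN_LOCATIONS:
        errors.append(f"unknown location: {raw_location!r}")

    categories = [c.strip() for c in raw_categories.split(",") if c.strip()]
    unknown = [c for c in categories if c not in KNOWN_CATEGORIES]
    if not categories:
        errors.append("no categories")
    if unknown:
        errors.append(f"unknown categories: {unknown}")

    if errors:
        return None, errors
    return ParsedRow(parsed_date, location[0], location[1], categories), []


def validate_all(rows):
    """All-or-nothing: return (parsed_rows, []) only if every row is valid."""
    parsed, errors = [], []
    for index, row in enumerate(rows, start=1):
        result, row_errors = parse_row(*row)
        if row_errors:
            errors.extend(f"row {index}: {e}" for e in row_errors)
        else:
            parsed.append(result)
    return ([], errors) if errors else (parsed, [])
```

If any row fails, `validate_all` returns no parsed rows at all, which mirrors the rule that nothing is created or deleted until every row is valid.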

Type of change

  • Bug fix
  • New feature
  • Vulnerabilities update
  • Config changes
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires an admin update after deploy
  • Includes a database migration removing or renaming a field

Testing

  1. Credentials. In order to run this command and have it actually connect to Google Drive, you must have the credentials. These take the form of a JSON document that the Wagtail app reads from an environment variable. I've added this to the "Engineering - Web" vault on 1Password under the name "Prepublication Incidents Service Account Credentials". To get this into an environment variable, I am using the shell command
export GOOGLE_SHEETS_CREDS=`cat google_creds.json`

but if you have any better ideas for this, let me know.

  2. Geonames fixtures. The command won't be able to look up the location names unless the geonames fixtures are loaded with ./manage.py loaddata cities5000-us-only.json.xz

  3. Once you have the credentials set up, test this command with ./manage.py sync_prepubs. Ideally there should be no output. The results of the command should be visible at /admin. You can check the actual data in the ./manage.py shell or in the DB.
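
For the credentials step above, a small sketch of how the JSON in `GOOGLE_SHEETS_CREDS` could be read and sanity-checked before use. The function name and the list of required keys are illustrative assumptions, not taken from this PR. The returned dict is the kind of value that `google.oauth2.service_account.Credentials.from_service_account_info(info, scopes=[...])` from the google-auth package accepts; whether the PR builds its credentials exactly this way is not shown here.

```python
import json
import os

# A few of the fields present in a Google service-account key file.
REQUIRED_KEYS = ("type", "client_email", "private_key")


def load_service_account_info(environ=os.environ, var="GOOGLE_SHEETS_CREDS"):
    """Parse the service-account JSON stored in an environment variable."""
    raw = environ.get(var)
    if not raw:
        raise RuntimeError(f"{var} is not set")
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"{var} does not contain valid JSON") from exc
    # Fail early with a readable message instead of deep inside the client.
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise RuntimeError(f"{var} is missing keys: {missing}")
    return info
```

Failing early like this gives a clearer error on a misconfigured deploy than a traceback from inside the Google client.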

Pre-deployment actions

The env var GOOGLE_SHEETS_CREDS needs to be populated.

Post-deployment actions

  1. Category pages need to be updated to have the google_sheets_name filled correctly
  2. The sync_prepubs command should be added to a recurring task.

Checklist

General checks

  • Linting and tests pass locally
  • The website and the changes are functional in Tor Browser
  • There are no conflicting migrations
  • Any required CSP-related changes have been made (check at least both Firefox and Chrome)
  • The changes are accessible using keyboard and screenreader

If you made changes to API flow:

  • Verify that API responses are correct
  • Verify that visualizations using the API endpoints are functional

If you made changes to incident model metadata

  • Verify incident export works correctly
  • Verify incident filters are rendered correctly
  • Verify incident filters show correct incidents
  • Verify categories work
  • Verify incidents are discoverable by search

If you made changes to blog

  • Verify that the blog index page renders correctly
  • Verify that the individual blogs show all the information correctly

If you made changes to shared templates (e.g. card design, lead media, etc.)

  • Verify that it renders correctly in homepage, if applicable
  • Verify that it renders correctly in incident index page, if applicable
  • Verify that it renders correctly in individual incident page, if applicable
  • Verify that it renders correctly in blog index page, if applicable
  • Verify that it renders correctly in individual blog page, if applicable
  • Verify that it renders correctly in individual special blog page, if applicable

If you made changes to email signup flow

  • Verify that the email signup form in the footer renders and works
  • Verify that the individual email signup pages work

If you made changes to "Submit an Incident" form

  • Verify that the form renders correctly and submits correctly as well

If it's a major change

  • Do the changes need to be tested in a separate staging instance?

If you made any frontend change

If the PR involves some visual changes in the frontend, it is recommended to add a screenshot of the new visual.

@chigby chigby force-pushed the google-sheets-integration branch 5 times, most recently from d25e67f to 22f55b4 Compare March 20, 2026 01:32
@chigby chigby marked this pull request as ready for review March 20, 2026 02:30
@chigby chigby requested a review from a team as a code owner March 20, 2026 02:30
@redshiftzero redshiftzero requested a review from willbarton March 23, 2026 18:44
Comment thread requirements.in
wagtail-inventory>=3.1
unittest-xml-reporting
whitenoise
google-api-python-client
Contributor

From the google-api-python-client README:

The maintainers of this repository recommend using Cloud Client Libraries for Python, where possible, for new code development due to the following reasons:

Was that considered? Would the Cloud Client Libraries be suitable here?

Comment thread requirements.in Outdated
unittest-xml-reporting
whitenoise
google-api-python-client
google-auth-httplib2
Contributor

  1. Is this used? I don't see it in the code.
  2. Google no longer recommends using this library:

For any new usages please see provided transport layers by google-auth library.

Contributor Author

I think you're right, we can remove this requirement. Although it looks like google-auth-httplib2 is still a requirement of google-api-python-client.

@chigby chigby force-pushed the google-sheets-integration branch 2 times, most recently from 6b9eaf1 to 2166c09 Compare April 8, 2026 16:19
Comment thread common/models/pages.py
default='other_incident',
help_text='Please check the styleguide to associate the icons with their name'
)
google_sheets_name = models.TextField(
Contributor

Suggested change
- google_sheets_name = models.TextField(
+ google_sheets_category_name = models.TextField(

Comment thread common/models/pages.py
FieldPanel('plural_name'),
FieldPanel('page_symbol'),
FieldPanel(
'google_sheets_name',
Contributor

Suggested change
- 'google_sheets_name',
+ 'google_sheets_category_name',

Comment on lines +38 to +39
spreadsheetId=SPREADSHEET_ID,
range="A2:Z5299",
Contributor

question: can you get the whole sheet without a range? Hard-coding this range seems clunky. There's already code to skip rows in sync_prepubs().
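
On the range question: A1 notation in the Sheets API allows a range with no end row (the official quickstart reads a range of the form `Sheet!A2:E`), so the hard-coded row limit could probably be dropped. A hedged sketch of a helper that builds such a range; the helper name and the sheet title in the usage line are made up for illustration.

```python
def open_ended_range(first_row=2, first_col="A", last_col="Z", sheet=None):
    """Build an A1 range with no end row, e.g. 'A2:Z'.

    If a sheet (tab) title is given, it is wrapped in single quotes and any
    single quote inside it is doubled, as A1 notation requires.
    """
    cells = f"{first_col}{first_row}:{last_col}"
    if sheet is None:
        return cells
    escaped = sheet.replace("'", "''")
    return f"'{escaped}'!{cells}"


# Usage with a hypothetical tab title:
print(open_ended_range(sheet="Prepub Incidents"))
```

The result would be passed as the `range` argument in place of "A2:Z5299"; this has not been checked against the actual spreadsheet.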

class ShortcutsPanel(Component):
    # This is an ordering number that is the minimum multiple of 10 to place
    # this panel underneath the built-in Wagtail panels.
    order = 100
Contributor

question: is there a constant we can use, such as WAGTAIL_MAX_ORDERING + 10?

scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"],
)

return build("sheets", "v4", credentials=creds)
Contributor

suggestion: put v4 in a constant

Contributor

@harrislapiroff harrislapiroff left a comment

Just a few comments—will leave final review to others.

Comment thread incident/models/prepub.py
Comment on lines +31 to +42
class PrepublicationIncidentSync(models.Model):
    class Status(models.TextChoices):
        SUCCESS = 'SUCCESS'
        INVALID_DATA = 'INVALID_DATA'
        FAILED = 'FAILED'

    status = models.CharField(max_length=255, choices=Status.choices)
    completed_at = models.DateTimeField(auto_now=True)
    message = models.TextField(default='')

    class Meta:
        ordering = ["-completed_at"]
Contributor

Might be worth noting in a comment that this is intended to be a singleton

Comment on lines +10 to +17
    def handle(self, *args, **options):
        try:
            status, message = sync_prepubs()
        except Exception as e:
            message = f'Prepub sync failed: {e}'
            status = PrepublicationIncidentSync.Status.FAILED
            self.stdout.write(f'Prepub sync failed: {e}')
            raise
Contributor

Ideally if sync fails, we should ensure the failure gets logged to kibana and marked as an error in some way. Then we can work with infra on what level of notification we want on these sort of failures.

)


SPREADSHEET_ID = "1PeMPpol5d0MrF0KH36ZviN7Z4PipK6ZeSDh9AlJ3-eA"
Contributor

As discussed in today's meeting, we should figure out what to do about this ID. If our auth credentials are tied to the specific spreadsheet, it's possible we should put them in the same place (both env vars?)

Alternatively, if it's possible to update the permissions for our auth credentials dynamically, it might make sense to make this configurable in the wagtail admin as a site setting.

Maybe discuss with infra before settling on a decision?

else:
sync.message = message
sync.status = status
sync.save()
Contributor

Should we also log here-ish the results of the sync if it's not SUCCESS? That way we'll capture INVALID_DATA in the logs too.
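
One possible shape for the logging suggested in this thread, using only the standard library. This is a sketch, not the PR's code: the logger name and `run_and_log` are hypothetical, and `sync_fn` is assumed to return a `(status, message)` pair the way `sync_prepubs()` does in the diff above. How records reach Kibana depends on the deployment's logging configuration, which is not covered here.

```python
import logging

logger = logging.getLogger("prepub_sync")  # hypothetical logger name


def run_and_log(sync_fn):
    """Call sync_fn() and log any outcome other than success.

    sync_fn is assumed to return a (status, message) pair.
    """
    try:
        status, message = sync_fn()
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("Prepub sync failed")
        raise
    if status != "SUCCESS":
        # Covers INVALID_DATA, so bad sheet rows also show up in the logs.
        logger.warning("Prepub sync finished with status %s: %s", status, message)
    return status, message
```

Because `TextChoices` members are `str` subclasses, comparing against the plain string "SUCCESS" should also work with `PrepublicationIncidentSync.Status` values.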

@chigby chigby force-pushed the google-sheets-integration branch from 2166c09 to 410343d Compare April 9, 2026 16:39
@chigby chigby force-pushed the google-sheets-integration branch from 410343d to 031efa9 Compare April 9, 2026 20:45
Development

Successfully merging this pull request may close these issues.

Sync unconfirmed incidents data from editorial Google Sheet to Wagtail Database

5 participants