Open
0 of 3 issues completed
Description
User Story or Problem Statement
When the CMS is unavailable, Next Build should not attempt to build and deploy static content from CMS data.
Description or Additional Context
This is a follow-up to #20594. The situation is:
- Next Build makes calls to the CMS to retrieve content data as it is building static content
- The CMS goes through a daily deploy, during which the CMS Content API is unavailable
- If the content data cannot be retrieved, Next Build does not build the page
- The page is then absent from the static content deploy, so it is removed from the production site.
Next Build currently retries each Content API request 6 times before giving up, but it does not fail the build when a request still fails after those retries.
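A minimal TypeScript sketch of the proposed behavior, in which exhausting the retries throws instead of quietly skipping the page. It assumes "6 retries" means six retries after the first attempt. `fetchWithRetries`, `ContentApiError`, and the injected `request` function are hypothetical names, not Next Build's actual API, and the backoff values are placeholders.

```typescript
// Hypothetical error thrown when a Content API request exhausts its retries.
class ContentApiError extends Error {
  readonly url: string;
  readonly attempts: number;
  readonly lastError: unknown;

  constructor(url: string, attempts: number, lastError: unknown) {
    super(`Content API request failed after ${attempts} attempts: ${url}`);
    this.name = "ContentApiError";
    this.url = url;
    this.attempts = attempts;
    this.lastError = lastError;
  }
}

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Make one attempt plus up to `maxRetries` retries, doubling the delay after
// each failure. If every attempt fails, throw rather than returning an empty
// result, so a failed request can never be mistaken for missing content.
async function fetchWithRetries<T>(
  url: string,
  request: (url: string) => Promise<T>, // hypothetical injected fetcher
  maxRetries = 6,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await request(url);
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw new ContentApiError(url, maxRetries + 1, lastError);
}
```

Because the error propagates instead of being swallowed, a build step that awaits page data this way would exit non-zero, and the deploy that follows it would not run.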
#20594 was a first step that makes Next Build content release wait until the CMS is available before proceeding with static content build. However, there are still cases where content could be inadvertently removed:
- transient failures in a Content API request or response could prevent individual pages from being built
- a CMS deploy can still start while a Next Build content release is underway, making the Content API unavailable mid-build.
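For illustration, a pre-build wait of the kind described for #20594 might look roughly like the sketch below. This is not the #20594 code: `waitForCms` and the injected `probe` (for example, a request to some CMS health endpoint) are hypothetical, and the interval and timeout values are placeholders.

```typescript
// Poll `probe` until it reports that the CMS is available, pausing
// `intervalMs` between checks. Once `timeoutMs` has elapsed, throw so that a
// long outage fails the release instead of leaving it waiting indefinitely.
async function waitForCms(
  probe: () => Promise<boolean>, // hypothetical availability check
  intervalMs = 30_000,
  timeoutMs = 30 * 60_000,
  now: () => number = Date.now,
): Promise<void> {
  const deadline = now() + timeoutMs;
  for (;;) {
    let available = false;
    try {
      available = await probe();
    } catch {
      available = false; // an error from the probe counts as "not available"
    }
    if (available) {
      return;
    }
    if (now() >= deadline) {
      throw new Error(`CMS still unavailable after ${timeoutMs} ms; aborting content release`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A check like this runs only before the build starts, so it cannot catch either case listed above: a single transient failure, or a CMS deploy that begins after the check has passed.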
Considerations
A mixture of the following could help this situation:
- Next Build can and should fail the build when a Content API request fails 6 times; however, this will require shoring up the Content API connections for CI, lower-environment builds, etc.
- Next Build could be prevented from starting a content release around the time of the CMS deploy; this is fuzzy and brittle but may be sufficient until we get to on-demand publishing
  - This was already completed in "Prevent Next Build production Content Release while CMS Deploy is happening" #20644
- Next Build could be prevented from deleting content upon sync; in that case, however, we would need a different mechanism to delete pages from production when they are archived in the CMS
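A sketch of the third option, assuming the sync can be driven by three sets of page paths: what is live, what the new build produced, and an explicit list of archived content. `planSync`, `SyncPlan`, and `archivedPaths` are hypothetical; how the CMS would supply the archived list is left open.

```typescript
// Hypothetical sync plan: what to upload and what to remove from production.
interface SyncPlan {
  upload: string[];
  remove: string[];
  keptDespiteMissing: string[];
}

// Never delete a live page just because the new build lacks it. Remove a page
// only when it appears in `archivedPaths`. Live pages that are neither built
// nor archived are kept and reported, since their absence may mean a Content
// API request failed rather than that the content was archived.
function planSync(
  livePaths: Set<string>,
  builtPaths: Set<string>,
  archivedPaths: Set<string>,
): SyncPlan {
  const upload = Array.from(builtPaths).sort();
  const remove = Array.from(archivedPaths)
    .filter((p) => livePaths.has(p))
    .sort();
  const keptDespiteMissing = Array.from(livePaths)
    .filter((p) => !builtPaths.has(p) && !archivedPaths.has(p))
    .sort();
  return { upload, remove, keptDespiteMissing };
}
```

For example, with live pages `/a`, `/b`, `/c`, a build that produced only `/a`, and `/c` marked archived, the plan removes `/c` but keeps `/b`, reporting it as missing from the build instead of deleting it.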
There may be other solutions that would help the issue as well.
Steps for Implementation
- Review content release logs for network failures to see how often this happens
- Push logs into DataDog for easier visibility
- Use the logs to confirm that failing the build on network failures will not have a serious detrimental impact (e.g., are we seeing failures multiple times a day, once a week, never?)
- Consider increasing the number of retries when the network request fails outright, which was also discussed (see detailed discussion notes below)
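One possible reading of the last step is a separate, larger retry budget for attempts that never got a response, while HTTP error responses keep the existing limit. The names below are hypothetical, and the budget of 10 used in the example policy is a placeholder, not a figure from the discussion.

```typescript
// Hypothetical classification of a failed Content API attempt.
type FailureKind = "network" | "http";

// Illustrative retry budgets per failure kind.
interface RetryPolicy {
  httpRetries: number; // the API responded, but with an error status
  networkRetries: number; // no response at all (connection refused, reset, DNS, timeout)
}

// A failure with no HTTP status never reached the API; one with a status did.
function classifyFailure(status: number | undefined): FailureKind {
  return status === undefined ? "network" : "http";
}

// Decide whether to retry, given how many retries have already been used.
function shouldRetry(kind: FailureKind, retriesUsed: number, policy: RetryPolicy): boolean {
  const budget = kind === "network" ? policy.networkRetries : policy.httpRetries;
  return retriesUsed < budget;
}
```

Whether a CMS deploy actually shows up as outright network failures or as error statuses (for example, 503) is something the log review above should reveal before budgets like these are chosen.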
Acceptance Criteria
- Next Build content release is modified to fail upon multiple Content API failures for a given request
- Other solutions are also evaluated and implemented (this is not a great AC; we may want to split it into multiple tickets)