docs(outage): resolved

StealthyCoder · StealthyCoder · commit 42d13f42e0a3 · 2025-06-13T06:03:35.000+01:00
Signed-off-by: Eric Bode <eric.bode@foundries.io>
diff --git a/outage/2025-06-12-gcs.md b/outage/2025-06-12-gcs.md
@@ -10,6 +10,18 @@ We use CloudFlare to make sure end users can access our Web UI over at, amongst
 We use Google Cloud Storage to store all our data in the cloud, amongst others, artifacts from our CI.
 The latter is causing our core infrastructure to be down.
 
+### CloudFlare
+
+The root cause analysis for CloudFlare seems to be:
+
+> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.
+
+This could be because they run parts of their stack on Google Cloud Platform as well, or it could just be a coincidence.
+
+### Google
+
+Google has not released a root cause analysis report as of this time, but when they do it will be included here as well.
+
 ### Timeline of Events
 
 - **18:31 UTC** - It is noticed that CloudFlare has an incident
@@ -18,3 +30,13 @@ The latter is causing our core infrastructure to be down.
 - **19:13 UTC** - CloudFlare states that services are recovering
 - **19:30 UTC** - Google states that services have recovered except us-central1 region
 - **19:30 UTC** - Our CI status page is active again, service seems to be recovering
+- **19:45 UTC** - Temporary measure removed
+- **21:31 UTC** - CloudFlare reports to be fully operational
+- **21:35 UTC** - Fix for improved resiliency added to our CI codebase
+- **01:27 UTC** - Google reports to be fully operational
+
+## Lessons Learned
+
+We identified a non-integral part of our CI codebase that had the potential to cause huge fallout if a third-party service is down or suffers a partial outage.
+A fix is now in place to make us more resilient when a similar outage happens again. Work is also ongoing to scan all of our services for similar points of failure
+and address them accordingly.