Add blog post: Moving local flag evaluation from Django to Rust #16882
jina-yoon wants to merge 5 commits into
Conversation
…ation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deploy preview
Vale prose linter → found 2 errors, 19 warnings, 0 suggestions in your markdown Full report → Copy the linter results into an LLM to batch-fix issues. Linter being weird? Update the rules!
| Line | Severity | Message | Rule |
|---|---|---|---|
| 2:20 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'feature flags' for the general industry concept. | PostHogBase.ProductNames |
| 18:27 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'feature flags' for the general industry concept. | PostHogBase.ProductNames |
| 24:2 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'Feature flags' for the general industry concept. | PostHogBase.ProductNames |
| 26:95 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'feature flags' for the general industry concept. | PostHogBase.ProductNames |
| 52:15 | warning | 'axum' is a possible misspelling. | PostHogBase.Spelling |
| 54:46 | warning | 'axum' is a possible misspelling. | PostHogBase.Spelling |
| 54:126 | error | Hi, Andy here... use an en dash ( – ) with spaces. On Mac, holding down the Option and hyphen key will give you an en dash. | PostHogBase.EnDash |
| 95:39 | error | Hi, Andy here... use an en dash ( – ) with spaces. On Mac, holding down the Option and hyphen key will give you an en dash. | PostHogBase.EnDash |
| 107:47 | warning | 'achilles' is a possible misspelling. | PostHogBase.Spelling |
| 115:85 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'Feature flags' for the general industry concept. | PostHogBase.ProductNames |
contents/blog/local-flag-evals-rust.md — 0 errors, 11 warnings, 0 suggestions
| Line | Severity | Message | Rule |
|---|---|---|---|
| 25:346 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'feature flags' for the general industry concept. | PostHogBase.ProductNames |
| 31:174 | warning | Capitalize 'Feature Flags' for PostHog's product. Use 'feature flags' for the general industry concept. | PostHogBase.ProductNames |
| 37:161 | warning | 'mixin' is a possible misspelling. | PostHogBase.Spelling |
| 39:3 | warning | 'ETags' is a possible misspelling. | PostHogBase.Spelling |
| 39:170 | warning | 'ETags' is a possible misspelling. | PostHogBase.Spelling |
| 41:79 | warning | 'allowlist' is a possible misspelling. | PostHogBase.Spelling |
| 41:122 | warning | 'reimplemented' is a possible misspelling. | PostHogBase.Spelling |
| 45:16 | warning | 'bursty' is a possible misspelling. | PostHogBase.Spelling |
| 70:3 | warning | 'PGBouncer' is a possible misspelling. | PostHogBase.Spelling |
| 81:140 | warning | 'Karpenter' is a possible misspelling. | PostHogBase.Spelling |
| 85:21 | warning | 'Tokio' is a possible misspelling. | PostHogBase.Spelling |
@@ -0,0 +1,85 @@
---
title: 'Moving local flag evaluation from Django to Rust: 24x less CPU, 56x less memory'
date: 2026-05-18
Suggested change:
- date: 2026-05-18
+ date: 2026-05-19
- Feature flags
seo: {
metaTitle: "Moving local flag evaluation from Django to Rust: 24x less CPU, 56x less memory",
metaDescription: "How we moved PostHog's feature flags local evaluation endpoint from Django to Rust, dropping CPU usage by 24x and memory by 56x."
Suggested change:
- metaDescription: "How we moved PostHog's feature flags local evaluation endpoint from Django to Rust, dropping CPU usage by 24x and memory by 56x."
+ metaDescription: "How we moved PostHog's feature flags local evaluation endpoint from Django to Rust and dropped CPU usage by 24x and memory by 56x."
---

I reloaded Grafana three times before I trusted what I was looking at, because the gap between the old numbers and the new ones was bigger than I was expecting. p50 latency had gone from 40ms down to 4ms, CPU usage was sitting at a small fraction of where it used to be, and memory was barely registering at all. We had just finished moving the feature flags local evaluation endpoint from Django to Rust, and despite knowing that it was going to be better – that was the whole reason for this migration – I still wasn't ready for how the comparison ended up looking.
Suggested change:
- I reloaded Grafana three times before I trusted what I was looking at, because the gap between the old numbers and the new ones was bigger than I was expecting. p50 latency had gone from 40ms down to 4ms, CPU usage was sitting at a small fraction of where it used to be, and memory was barely registering at all. We had just finished moving the feature flags local evaluation endpoint from Django to Rust, and despite knowing that it was going to be better – that was the whole reason for this migration – I still wasn't ready for how the comparison ended up looking.
+ I reloaded Grafana three times before I could trust what I was looking at. I knew it was going to be better – that was the whole reason for this migration – but I still wasn't ready for how big of a difference it would make. p50 latency had gone from 40ms down to 4ms, CPU usage was sitting at a small fraction of where it used to be, and memory was barely registering at all.
+ Here's how we moved our local flag evaluation from Django to Rust.
| PGBouncer sidecars | Yes | No |
| Dedicated node pool | Yes (memory-optimized) | No (shared pool) |

Latency improved too. These numbers are measured at the Envoy layer (the time between Envoy receiving the request and getting a response back from the pod), so they don't include client-to-Envoy network time. A customer's SDK sees this plus its own network round-trip on top. Same Prometheus metric (`envoy_cluster_upstream_rq_time`) for both services, so it's apples to apples.
Suggested change:
- Latency improved too. These numbers are measured at the Envoy layer (the time between Envoy receiving the request and getting a response back from the pod), so they don't include client-to-Envoy network time. A customer's SDK sees this plus its own network round-trip on top. Same Prometheus metric (`envoy_cluster_upstream_rq_time`) for both services, so it's apples to apples.
+ Latency improved, too. These numbers are measured at the Envoy layer (the time between Envoy receiving the request and getting a response back from the pod), so they don't include client-to-Envoy network time. A customer's SDK sees this plus its own network round-trip on top. Since it's the same Prometheus metric (`envoy_cluster_upstream_rq_time`) for both services, it's apples to apples.
## What local evaluation is
Suggested change:
- ## What local evaluation is
+ ## Local evaluation explained
The cache hit rate sits at 99.98%, which means almost every request is served from Redis with zero Postgres on the hot path. The dedicated Karpenter pool of memory-optimized instances is gone.

I keep coming back to the rollout. Being able to shift 10% of traffic, sit with the metrics for a day, and roll back instantly if something looked off is what made this safe to ship fast. If we'd cut over in one step on day one, customer support tickets would have done the testing for me, and I would have been firefighting instead of writing this.
Suggested change:
- I keep coming back to the rollout. Being able to shift 10% of traffic, sit with the metrics for a day, and roll back instantly if something looked off is what made this safe to ship fast. If we'd cut over in one step on day one, customer support tickets would have done the testing for me, and I would have been firefighting instead of writing this.
+ ---
+ The metrics are definitely eye-catching, but what I keep coming back to is the rollout. Being able to shift 10% of traffic, sit with the metrics for a day, and roll back instantly if something looked off is what made it safe to get there. It's also what let us catch three bugs while Django was still handling most of the load. If we'd cut over in one step on day one, it would have been customer support tickets doing the testing for me, and I would have been firefighting incidents left and right instead of writing this.
There's [Untangling Tokio and Rayon in production](/blog/untangling-rayon-and-tokio) if you want to read about an earlier optimization on the same Rust service, and [How we built the Rust feature flag service](/blog/feature-flags-service) for the original Django-to-Rust migration that this one builds on.
Suggested change:
- There's [Untangling Tokio and Rayon in production](/blog/untangling-rayon-and-tokio) if you want to read about an earlier optimization on the same Rust service, and [How we built the Rust feature flag service](/blog/feature-flags-service) for the original Django-to-Rust migration that this one builds on.
+ _If you want to read more about this topic, check out our blog on [Untangling Tokio and Rayon in production](/blog/untangling-rayon-and-tokio) which covers an earlier optimization on the same service, or [How we made feature flags even faster and more reliable](/blog/even-faster-more-reliable-flags) for the original migration that this one builds on._
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
"name": "Patricio Tarantino",
"role": "Product Engineer",
"link_type": "github",
"link_url": "https://github.com/patricio-posthog",
I noticed other people have a link to their LinkedIn. I don't mind either way, but if you want to keep it standard, this is my LinkedIn: https://www.linkedin.com/in/ptarantino/
Maybe @haacked wants to proofread the technical parts? Also, @jina-yoon, when I go to the Blog in the deploy preview, it's broken (blue screen).
Summary
- authors.json as a Product Engineer
- /blog/feature-flags-definitions-rust

To do before merging
- featuredImage URL in frontmatter with the actual Cloudinary image
- /blog/feature-flags-service link at the bottom is correct (no matching post found in the repo; may need a redirect or URL update)

🤖 Generated with Claude Code