diff --git a/.prettierignore b/.prettierignore
index fa636d58d2..b19a8b647d 100644
--- a/.prettierignore
+++ b/.prettierignore
@@ -2,4 +2,5 @@ public/
 pnpm-lock.yaml
 *.mdx
 !src/pages/blog/2024-04-11-announcing-new-graphql-website/index.mdx
+!src/pages/blog/2024-08-14-exploring-true-nullability.mdx
 *.jpg
diff --git a/src/pages/blog/2024-08-14-exploring-true-nullability.mdx b/src/pages/blog/2024-08-14-exploring-true-nullability.mdx
new file mode 100644
index 0000000000..97c6ce6246
--- /dev/null
+++ b/src/pages/blog/2024-08-14-exploring-true-nullability.mdx
@@ -0,0 +1,284 @@
+---
+title: "Exploring 'True' Nullability in GraphQL"
+tags: ["spec"]
+date: 2024-08-14
+byline: Benjie Gillam
+---
+
+One of GraphQL's early decisions was to allow "partial success"; this was a
+critical feature for Facebook - if one part of their backend infrastructure
+became degraded they wouldn't want to just render an error page, instead they
+wanted to serve the user a page with as much working data as they could.
+
+## Null propagation
+
+To accomplish this, if an error occurred within a resolver, the resolver's value
+would be replaced with a `null`, and an error would be added to the `errors`
+array in the response. However, what if that field was marked as non-null? To
+solve that apparent contradiction, GraphQL introduced the "error propagation"
+behavior (also known colloquially as "null bubbling") - when a `null` (from an
+error or otherwise) occurs in a non-nullable position, the parent position
+(either a field or a list item) is made `null` instead. This behavior would
+repeat if the parent position was also non-nullable, and this could cascade (or
+"bubble") all the way up to the root of the query if everything in the path is
+non-nullable.
+
+This solved the issue, and meant that GraphQL's nullability promises were still
+honoured; but it wasn't without complications.
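The bubbling behavior described above can be illustrated with a small TypeScript sketch. This is a simplified model rather than how any real GraphQL executor is written: `PositionNode`, `complete` and `execute` are hypothetical names, list positions are left out, and resolvers are synchronous.

```typescript
// Hypothetical, simplified model of error propagation ("null bubbling").
type Path = ReadonlyArray<string | number>;

interface FieldError {
  message: string;
  path: Path;
}

interface PositionNode {
  nonNull: boolean; // true if the position is declared with `!`
  resolve?: () => unknown; // leaf resolver; may throw
  fields?: Record<string, PositionNode>; // children of an object position
}

// Internal marker: "a null reached a non-nullable position, null the parent".
const BUBBLE = Symbol("bubble");

function complete(node: PositionNode, path: Path, errors: FieldError[]): unknown {
  let value: unknown;
  if (node.fields) {
    const obj: Record<string, unknown> = {};
    value = obj;
    for (const [name, child] of Object.entries(node.fields)) {
      const childValue = complete(child, [...path, name], errors);
      if (childValue === BUBBLE) {
        // A non-nullable child was null, so this whole object becomes null.
        value = null;
        break;
      }
      obj[name] = childValue;
    }
  } else {
    try {
      value = node.resolve ? (node.resolve() ?? null) : null;
      if (value === null && node.nonNull) {
        throw new Error("null returned for a non-nullable position");
      }
    } catch (e) {
      // The error is recorded once, at the position where it occurred.
      errors.push({ message: e instanceof Error ? e.message : String(e), path });
      value = null;
    }
  }
  // Keep bubbling while the current position is itself non-nullable.
  return value === null && node.nonNull ? BUBBLE : value;
}

function execute(root: PositionNode): { data: unknown; errors: FieldError[] } {
  const errors: FieldError[] = [];
  const data = complete(root, [], errors);
  return { data: data === BUBBLE ? null : data, errors };
}

// `avatar` is non-nullable and throws, so the nullable `user` becomes null.
const result = execute({
  nonNull: false,
  fields: {
    user: {
      nonNull: false,
      fields: {
        name: { nonNull: true, resolve: () => "Ada" },
        avatar: {
          nonNull: true,
          resolve: () => {
            throw new Error("avatar service down");
          },
        },
      },
    },
    version: { nonNull: true, resolve: () => "1.0" },
  },
});
```

In this sketch the error at `user.avatar` removes the perfectly good `user.name` from the response as well, while the sibling `version` field survives because `user` is nullable and stops the cascade.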
+
+### Complication 1: partial success
+
+We want to be resilient to systems failing; but errors that occur in
+non-nullable positions cascade to surrounding parts of the query, making less
+and less data available to be rendered. This seems contrary to our "partial
+success" aim, but it's easy to solve - we just make sure that the positions
+where we expect errors to occur are nullable so that errors don't propagate
+further. Clients now needed to ensure they handle any nulls that occur in these
+positions; but that seemed like a fair trade.
+
+### Complication 2: nullable epidemic
+
+Almost any field in your GraphQL schema could raise an error - errors might not
+only be caused by backend services becoming unavailable or responding in
+unexpected ways; they can also be caused by simple programming errors in your
+business logic, data consistency errors (e.g. expecting a boolean but receiving
+a float), or any other cause.
+
+Since we don't want to "blow up" the entire response if any such issue occurred,
+we've moved to strongly encourage nullable usage throughout a schema, only
+adding the non-nullable `!` marker to positions where we're truly sure that
+field is extremely unlikely to error. This has the effect of meaning that
+developers consuming the GraphQL API have to handle potential nulls in more
+positions than they would expect, making for additional work.
+
+### Complication 3: normalized caching
+
+Many modern GraphQL clients use a "normalized" cache, such that updates pulled
+down from the API in one query can automatically update all the previously
+rendered data across the application. This helps ensure consistency for users,
+and is a powerful feature.
+
+However, if an error occurs in a non-nullable position, it's
+[no longer safe](https://github.com/graphql/nullability-wg/issues/20) to store
+the data to the normalized cache.
+
+## The Nullability Working Group
+
+At first, we thought the solution to this was to give clients control over the
+nullability of a response, so we set up the Client-Controlled Nullability (CCN)
+Working Group. Later, we renamed the working group to the Nullability WG to show
+that it encompassed all potential solutions to this problem.
+
+### Client-controlled nullability
+
+The first Nullability WG proposal came from a collaboration between Yelp and
+Netflix, with contributions from GraphQL WG regulars Alex Reilly, Mark Larah,
+and Stephen Spalding among others. They proposed we could adorn the queries we
+issue to the server with sigils indicating our desired nullability overrides for
+the given fields - client-controlled nullability.
+
+A `?` would be added to fields where we don't mind if they're null, but we
+definitely want errors to stop there; and add a `!` to fields where we
+definitely don't want a null to occur (whether or not there is an error). This
+would give consumers control over where errors/nulls were handled.
+
+However, after much exploration of the topic over years we found numerous issues
+that traded one set of concerns for another. We kept iterating whilst we looked
+for a solution to these tradeoffs.
+
+### True nullability schema
+
+Jordan Eldredge
+[proposed](https://github.com/graphql/nullability-wg/discussions/22) that making
+fields nullable to handle error propagation was hiding the "true" nullability of
+the data. Instead, he suggested, we should have the schema represent the true
+nullability, and put the responsibility on clients to use the `?` CCN operator
+to handle errors in the relevant places.
+
+However, this would mean that clients such as Relay would want to add `?` in
+every position, causing an "explosion" of question marks, because really what
+Relay desired was to disable null propagation entirely.
+
+### A new type
+
+Getting the relevant experts together at GraphQLConf 2023 re-energized the
+discussions and sparked new ideas. After seeing Stephen's "Nullability Sandwich"
+talk and chatting with Jordan, Stephen and others in the corridor, Benjie Gillam
+was inspired to [propose](https://github.com/graphql/graphql-spec/pull/1046) a
+"null only on error" type. This type would allow us to express the "true"
+nullability of a field whilst also indicating that errors may happen that should
+be handled, but would not "blow up" the response.
+
+To maintain backwards compatibility, clients would need to opt in to seeing this
+new type (otherwise it would masquerade as nullable). It would be up to the
+client how to handle the nullability of this position knowing that a "null only
+on error" position would only contain a `null` if a matching error existed in
+the `errors` list.
+
+A
+[number of alternative syntaxes](https://gist.github.com/benjie/19d784721d1658b89fd8954e7ee07034)
+were suggested for this new type, but none were well liked.
+
+### A new approach to client error handling
+
+Also around the time of GraphQLConf 2023 the Relay team shared
+[a presentation](https://docs.google.com/presentation/u/2/d/1rfWeBcyJkiNqyxPxUIKxgbExmfdjA70t/edit?pli=1#slide=id.p8)
+on some of the things they were thinking around errors. In particular they
+discussed the `@catch` directive which would give users control over how errors
+were represented in the data being rendered, allowing the client to
+differentiate an error from a legitimate null. Over the coming months, many
+behaviors were discussed at the Nullability WG; one particularly compelling one
+was that clients could throw the error when an errored field was read, and rely
+on framework mechanics (such as React's
+[error boundaries](https://legacy.reactjs.org/docs/error-boundaries.html)) to
+handle them.
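The "throw when an errored field is read" idea can be sketched in TypeScript with a `Proxy`. This is only an illustration under stated assumptions, not how Relay or any other client implements it: `guard`, `samePath` and `ResponseError` are hypothetical names, and the sketch assumes every error carries a `path` that points at the `null` it caused.

```typescript
// Hypothetical sketch: reads of errored positions throw; true nulls read as null.
type PathSegment = string | number;

interface ResponseError {
  message: string;
  path: ReadonlyArray<PathSegment>;
}

function samePath(a: ReadonlyArray<PathSegment>, b: ReadonlyArray<PathSegment>): boolean {
  return a.length === b.length && a.every((segment, i) => segment === b[i]);
}

function guard<T extends object>(
  target: T,
  errors: ReadonlyArray<ResponseError>,
  basePath: ReadonlyArray<PathSegment> = [],
): T {
  return new Proxy(target, {
    get(obj, key, receiver) {
      const value: unknown = Reflect.get(obj, key, receiver);
      if (typeof key === "symbol") return value;
      // Error paths use numbers for list indices, so convert array keys.
      const segment: PathSegment =
        Array.isArray(obj) && /^\d+$/.test(key) ? Number(key) : key;
      const path = [...basePath, segment];
      if (value === null) {
        const error = errors.find((e) => samePath(e.path, path));
        if (error) {
          throw new Error(`Field ${path.join(".")} errored: ${error.message}`);
        }
        return null; // a legitimate null with no matching error
      }
      // Wrap nested objects and lists so deeper reads are checked too.
      return typeof value === "object" ? guard(value as object, errors, path) : value;
    },
  });
}

const guarded = guard(
  { user: { name: "Ada", avatar: null, bio: null } },
  [{ message: "avatar service down", path: ["user", "avatar"] }],
);
```

Reading `guarded.user.bio` yields `null`, whereas reading `guarded.user.avatar` throws, which is the kind of exception a framework-level error boundary could then catch.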
+
+### Strict semantic nullability
+
+GraphQL Foundation director Lee Byron
+[proposed](https://github.com/graphql/graphql-wg/discussions/1410) that we
+introduce a schema directive, `@strictNullability`, whereby we would change what
+the syntax meant - `Int?` for nullable, `Int` for null-only-on-error, and `Int!`
+for never-null. This proposal was well liked, but wasn't a clear win; it
+introduced many complexities including migration costs and concerns over schema
+evolution.
+
+### A pivotal discussion
+
+Lee and Benjie had a call where they discussed the history of GraphQL
+nullability and all the relevant proposals in depth, including their two
+respective solutions. It was clear that though no solution was quite there, the
+convergence of the solutions hinted that we were getting closer to an answer.
+This long, detailed and highly technical discussion inspired
+[a new proposal](https://github.com/graphql/nullability-wg/discussions/58),
+which has since been iterated on further and which we describe below.
+
+## Our latest proposal
+
+We're now proposing a new opt-in execution mode to solve the nullability
+problem. It's important to note that both the client and the server must opt in
+to this new mode for it to take effect, otherwise the traditional execution mode
+will be used.
+
+### No-error-propagation mode
+
+The new proposal centers around the premise of allowing clients to disable the
+"error propagation" behavior discussed above.
+
+Clients that opt in to this behavior take responsibility for interpreting the
+response as a whole, correlating the `data` and `errors` properties of the
+response. With error propagation disabled, and given that any field could
+potentially throw an error, all positions in `data` can potentially contain a
+`null` value. Clients in this mode must cross-check any `null` values against
+`errors` to determine whether each one represents a true `null` or an error.
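The cross-check described above could look something like the following TypeScript sketch. The names (`classify`, `GraphQLResponse`, `Position`) are hypothetical, and the sketch assumes the given path matches the shape of `data` and that each error's `path` points at the exact position that was set to `null`.

```typescript
// Hypothetical sketch: decide whether a null in `data` is a true null or an error.
type PathSegment = string | number;

interface ResponseError {
  message: string;
  path?: ReadonlyArray<PathSegment>;
}

interface GraphQLResponse {
  data: unknown;
  errors?: ReadonlyArray<ResponseError>;
}

type Position =
  | { kind: "value"; value: unknown }
  | { kind: "null" }
  | { kind: "error"; error: ResponseError };

function samePath(a: ReadonlyArray<PathSegment>, b: ReadonlyArray<PathSegment>): boolean {
  return a.length === b.length && a.every((segment, i) => segment === b[i]);
}

function classify(response: GraphQLResponse, path: ReadonlyArray<PathSegment>): Position {
  // Walk `data` along the path, stopping at the first null we meet.
  const walked: PathSegment[] = [];
  let value: unknown = response.data;
  for (const segment of path) {
    if (value === null || value === undefined) break;
    value = (value as Record<PathSegment, unknown>)[segment];
    walked.push(segment);
  }
  if (value !== null && value !== undefined) {
    return { kind: "value", value };
  }
  // A null is an error only if `errors` has an entry for this exact position.
  const error = (response.errors ?? []).find(
    (e) => e.path !== undefined && samePath(e.path, walked),
  );
  return error ? { kind: "error", error } : { kind: "null" };
}

const response: GraphQLResponse = {
  data: { user: { name: null, nickname: null, age: 36 } },
  errors: [{ message: "name service down", path: ["user", "name"] }],
};

const name = classify(response, ["user", "name"]); // kind: "error"
const nickname = classify(response, ["user", "nickname"]); // kind: "null"
const age = classify(response, ["user", "age"]); // kind: "value"
```

Both `name` and `nickname` are `null` in `data`; only the `errors` entry tells the client that the first is a failure and the second is a legitimate absence.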
+
+### "Smart" clients
+
+The no-error-propagation mode is intended for use by "smart" clients such as
+Relay, Apollo Client, URQL and others which understand GraphQL deeply and are
+responsible for the storage and retrieval of fetched GraphQL data. These clients
+are well positioned to handle the responsibilities outlined above.
+
+By disabling error propagation, these clients will be able to safely update
+their stores (including normalized stores) even when errors occur. They can also
+re-implement traditional GraphQL error propagation on top of these new
+foundations, shielding application developers from needing to learn this new
+behavior (whilst still allowing them to reap the benefits!). They can even take
+on advanced behaviors, such as throwing the error when the application developer
+attempts to read from an errored field, allowing the developer to handle errors
+with their system's native error boundaries.
+
+### True nullability
+
+Just like in traditional mode, for clients operating in no-error-propagation
+mode fields are either nullable or non-nullable. However, unlike in traditional
+mode, no-error-propagation mode allows for errors to be represented in any
+position:
+
+- nullable (e.g. `Int`): a value, an error, or a true `null`;
+- non-nullable (e.g. `Int!`): a value, **or an error**.
+
+_(In traditional mode, non-nullable fields cannot represent an error because the
+error propagates to the nearest nullable position.)_
+
+Since this mode allows every field, whether nullable or non-nullable, to
+represent an error, the schema can safely indicate to clients in this mode the
+true intended nullability of a field. If the schema designer knows that a field
+should never be null unless an error occurs, they can mark the field as
+"non-nullable for clients in no-error-propagation mode" (see "schema developers"
+below).
+
+### Client reflection of true nullability
+
+Smart clients can ask the schema about the "true" nullability of each field via
+introspection, and can generate a derived SDL by combining that information with
+their knowledge of how the client handles errors. This derived SDL, dependent on
+client behavior, would look like the traditional representation of the schema,
+but with more fields potentially marked as non-nullable where the true
+nullability of the underlying schema has been reflected. Application developers
+would issue queries and mutations in the same way they always had, but now their
+generated types may not need to handle `null` in as many positions as before,
+increasing developer happiness.
+
+### Schema developers
+
+Schemas that wish to add support for indicating the "true nullability" of a
+field in no-error-propagation mode need to be able to discern which types show
+up as non-nullable in both modes (traditional non-null types), and which types
+show up as non-nullable only in no-error-propagation mode. For this latter
+concern we've introduced the concept of a "semantic" non-null type:
+
+- "strict" (traditional) non-nullable - shows up as non-nullable in both
+  traditional mode and no-error-propagation mode
+- "semantic" non-nullable, aka "null only on error" - shows up as non-nullable
+  in no-error-propagation mode and masquerades as nullable in traditional mode
+
+Only clients that opt in to seeing the "true" nullability will see these two
+different types of nullability, otherwise the nullability of the chosen mode
+(traditional or no-error-propagation) will be reflected by introspection.
+
+### Representation in SDL
+
+Application developers will only need to deal with traditional SDL that
+represents traditional nullability concerns.
+If these developers are using
+"smart" clients then they should source this SDL from the client rather than
+from the server; this allows them to see the nullability that the client
+guarantees based on how it will handle the "true" nullability of the schema, how
+it handles errors, and factoring in any local schema extensions that may have
+been added.
+
+Client-derived SDL (see "client reflection of true nullability" above) can be
+used for concerns such as code generation, which will work in the traditional
+way with no need for changes (but happier developers if there are fewer nullable
+positions!).
+
+Schema developers and people working on "smart" clients may need to represent
+the differences between "strict" and "semantic" non-nullable in SDL. For these
+people, we're introducing the `@extendedNullability` document directive. When
+this directive is present at the top of a document, the `!` symbol means that a
+type will appear as non-nullable only in no-error-propagation mode, and a new
+`!!` symbol will represent that a type will appear as non-nullable in both
+traditional and no-error-propagation mode.
+
+| Traditional Mode | No-error-propagation mode | Example |
+| ---------------- | ------------------------- | ------- |
+| Nullable         | Nullable                  | `Int`   |
+| Nullable         | Non-nullable              | `Int!`  |
+| Non-nullable\*   | Non-nullable              | `Int!!` |
+
+The `!!` symbol is designed to look a little scary - it should be used with
+caution (like `!` in traditional schemas) because it is the symbol that means
+that errors will propagate in traditional mode, "blowing up" parent selection
+sets.
+
+## Get involved
+
+Like all GraphQL Working Groups, the Nullability Working Group is open to all.
+Whether you work on a GraphQL client or are just a GraphQL user with thoughts on
+nullability, we want to hear from you - add yourself to an
+[upcoming working group](https://github.com/graphql/nullability-wg/) or chat
+with us in the #nullability-wg channel in
+[the GraphQL Discord](https://discord.graphql.org). This solution is not yet
+merged into the specification, so there's still time for iteration and
+alternative ideas!