Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make "processing a response" non-normative and add a note for clients #304

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
28 changes: 21 additions & 7 deletions spec/GraphQLOverHTTP.md
Original file line number Diff line number Diff line change
Expand Up @@ -539,7 +539,8 @@ others) could originate from intermediary servers; since the client cannot
determine if an `application/json` response with arbitrary status code is a
well-formed _GraphQL response_ (because it cannot trust the source) the server
must use `200` status code to guarantee to the client that the response has not
been generated or modified by an intermediary.
been generated or modified by an intermediary. See
[processing a response](#sec-Processing-a-response) for more details.

If the _GraphQL response_ contains a non-null {data} entry then the server MUST
use the `200` status code.
Expand Down Expand Up @@ -671,6 +672,12 @@ pass validation, then the server SHOULD reply with `400` status code.
If the client is not permitted to issue the GraphQL request then the server
SHOULD reply with `403`, `401` or similar appropriate status code.

Note: When the response media type is `application/graphql-response+json`,
clients can rely on the response being a well-formed _GraphQL response_
regardless of the status code. The intermediary servers that do not understand
GraphQL may use the status code to get some information about the shape of the
_GraphQL response_.
martinbonnin marked this conversation as resolved.
Show resolved Hide resolved

#### Examples

The following examples provide guidance on how to deal with specific error cases
Expand Down Expand Up @@ -738,10 +745,17 @@ Note: The GraphQL specification
and refers to the situation wherein a _GraphQL field error_ occurs as a partial
response; it still indicates successful execution.

## Processing the response
# Non-normative notes

## Processing a response

In certain environments, the source of a response may be an intermediary server,
such as an API gateway, proxy, or firewall. In the case of an error, they may
return their own non-GraphQL `application/json` response with a non-`200` status
code.

For this reason, a client application can rely on the response being a
well-formed _GraphQL response_ if any of the following cases is true:

If the response uses a non-`200` status code and the media type of the response
payload is `application/json` then the client MUST NOT rely on the body to be a
well-formed _GraphQL response_ since the source of the response may not be the
server but instead some intermediary such as API gateways, proxies, firewalls,
etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of this normative paragraph is, in my opinion, wrong. The client must not rely on it being a GraphQL response; if they want to treat it as a GraphQL response they must perform their own validations in order to do so (e.g. assert that it conforms to the correct shape/types), and must know that what they're doing is outside of the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjie , the way I see it, this is a server spec, not a client spec. A server has no control over whether or not a client processes the response correctly, incorrectly, or at all. I think that's why this should be a non-normative paragraph. And the modified language is, I feel, much easier to understand.

Now we could write a client spec, defining what features a client must and must not have. For example, we might require the client send the request with a specific content type, or perhaps to support automatic fallback to a known content type if a server responds with an unsupported media type error. We might define the exact mechanism by which the response shape is verified for known response content types, etc. While it seems much of this is implicit based on the server requirements, the existing specification is not currently written as a client spec and does not contain any requirement written from the perspective of the client (apart from the little written in this paragraph). As such, I feel it is improper for it to have a normative note about processing a response in this spec.

If this paragraph were to be kept as a normative paragraph, I feel it should at minimum be in a section labeled for client requirements. But even then, the existing paragraph says "MUST NOT rely on the body", it doesn't explain what TO do in that situation. Should it look for a certain shape? Should it ignore all data in the response? In other words, it's says "client MUST NOT count on it being well-formed, but it MIGHT be", which is not particularly helpful for a normative note.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay here. +1 to @Shane32 comment.

As a client developer relatively new to this spec but pretty familiar with GraphQL, it took me way longer than I'd like to admit to understand the nuances there.

Having the language from the point of view of the client is easier to process in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a server spec, not a client spec

That does not seem accurate. This spec specifies:

  • how a client should form a request including what details the client must put in the request for it to be likely to be executed
  • how a server should check these details, and handle when the client has not conformed
  • how a server should form a response including what details to include
  • how and when a client should process a response, how to know it originated from the GraphQL service or not (and thus whether it can "trust" that the response is GraphQL or not)

It's a protocol spec that defines the protocol for communicating between a spec-compliant client and server. I believe @enisdenjo implemented both a client and a server based on this spec in graphql-http.

A server has no control over whether or not a client processes the response correctly, incorrectly, or at all.

Indeed; but the client does; hence why we state:

the client MUST NOT rely on the body to be a
well-formed GraphQL response since the source of the response may not be the
server but instead some intermediary such as API gateways, proxies, firewalls,
etc.

It is the client's responsibility to handle this concern. The server may not even be the one supplying this response (it may have come from a gateway, proxy, etc), so of course the server cannot influence this - it is entirely the client's responsibility.

the existing specification is not currently written as a client spec and does not contain any requirement written from the perspective of the client (apart from the little written in this paragraph)

It has an entire chapter dedicated to how to form requests to the server. It also has loads of explicit conformance requirement statements for the client:

The role of a client is to issue HTTP requests to a server in order to interact with a GraphQL service.

Servers and clients MUST support JSON and MAY support other, additional serialization formats.

A client SHOULD indicate the media types that it supports in responses using the Accept HTTP header as specified in RFC7231.

If the client supplies an Accept header, the client SHOULD include the media type application/graphql-response+json in the Accept header.

Before 2025-01-01T00:00:00Z, if the client supplies an Accept header, the header SHOULD include the application/json media type. After this watershed, this is no longer necessary.

It is RECOMMENDED that a client set the Accept header to application/graphql-response+json; charset=utf-8, application/json; charset=utf-8.

A client MUST indicate the media type of a request body using the Content-Type header as specified in RFC7231.

There are many more.

Copy link
Member

@benjie benjie Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this paragraph were to be kept as a normative paragraph, I feel it should at minimum be in a section labeled for client requirements.

We don't currently label client and server requirements separately since both are intermingled; such as:

A client MUST indicate the media type of a request body using the Content-Type header as specified in RFC7231.

A server MUST support POST requests encoded with the application/json media type (as indicated by the Content-Type header) encoded with UTF-8.

To do this in one place but not the rest of the spec would be a little strange. And doing it across the spec would make the spec a lot less readable, in my opinion.

But even then, the existing paragraph says "MUST NOT rely on the body", it doesn't explain what TO do in that situation. Should it look for a certain shape? Should it ignore all data in the response?

Indeed; the client must not rely on the body. What it does with the body is up to it, but it cannot be trusted. Throwing an error is the most reliable way of handling it; but we're not going to require that (though we could RECOMMEND it, I suppose). It could also choose to run its own validations, but at that point it's up to the client to figure out what those should be - we've already told it that the response cannot be trusted, so if it feels it should push ahead anyway, that's its responsibility to figure out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the conformance requirements need to write a compliant client and server, and the wire format that they use between them.

Fair enough 👍

Clients have to live in the real world where any arbitrary intermediary can intercept the result and return some nonsense

Same is true for every networking protocol. HTTP doesn't warn about a TCP router swapping bytes around. TCP doesn't warn about an ethernet card rewriting bytes, etc... It's a given.

We cannot enforce any rules on HTTP

We kind of do though? What I understand from the spec is that we're "allocating the graphql-response+json media type": if HTTP returns a graphql-response+json response then it MUST be a valid GraphQL response. I think we expect that from the HTTP layer? It's a given.

I very much feel that this particular rule should not be outside of the spec. It's incredibly important, the abundance of "200 is not okay" talks at conferences over the past 9 years I think adds a lot of evidence that this rule in particular is necessary.

Agree that this must be called out in the spec. Just I wouldn't make it normative as it's already contained in the other parts. Maybe a separate paragraph explaining the rationale for graphql-response+json? Basically this note but elevating it to a more preminent position in the document and explaining about client processing there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP doesn't warn about a TCP router swapping bytes around.

As a counter-example, HTTP details a Content-Length header and states:

When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.

This effectively protects the client against the connection being terminated (as would be expected by Connection: close) before all the data is received - one of the expected failure modes for HTTP.

Protocol generally have to somewhat trust the underlying protocol (which is why we have not detailed how HTTP works, and why HTTP doesn't detail how TCP works); however they also need to take into account how, even in a trusted network, they may fail - especially when the failure modes can cause ambiguity.

Our failure mode here is that there's an intermediary that bails out the request early (permissions, failed cache, netsplit, timeout, etc etc etc) and returns a response using a common media type application/json. We trust these intermediaries to never do this with a 2xx status code; just as we trust them to not use application/graphql-response+json if they're not speaking GraphQL. But we must convey this important information to the client implementers: if they get a non-2xx and it uses this common media type then they must not trust it to be GraphQL even if they trust the underlying network - this is an expected failure mode.

I see what you mean about it effectively being covered by other parts of the spec, but I'm not convinced that it doesn't bear repeating - what problem are we trying to solve by removing this or demoting it to a non-normative note?

Copy link
Contributor Author

@martinbonnin martinbonnin Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our failure mode here is that there's an intermediary that bails out the request early (permissions, failed cache, netsplit, timeout, etc etc etc) and returns a response using a common media type application/json. We trust these intermediaries to never do this with a 2xx status code; just as we trust them to not use application/graphql-response+json if they're not speaking GraphQL.

I love this paragraph. I would love to see that kind of context added to the spec (permissions, failed cache, netsplit, timeout, etc...trust that if this happens it uses non-2xx, etc..)

they must not trust it to be GraphQL even if they trust the underlying network - this is an expected failure mode.

When I read that as a client developer, "I MUST NOT trust something", I'm not sure what this concretely mean. If I get a 2xx or a graphql-response+json then I know what to do (parse the GraphQL response), if not then 🤷 I must not trust it but how does that concretely materialize? I'm not sure.

what problem are we trying to solve by removing this or demoting it to a non-normative note?

2 things (and one extra):

  1. Guide client implementers by using positive, actionable wording:
// This is very close to what a client would write 
// (and also very close to how this PR is worded)
if (status == 2xx || content-type == "graphql-response+json") { 
  parseResponse() 
}

// I find this harder to read
if (status != 2xx && content-type == "application/json") { 
  // What to do here? I don't know, it depends.
} else {
  // This is actually the important branch
}
  1. Keep the normative parts of the spec DRY by not repeating them (repetition in non-normative notes is ok). Took me a while to process that MUST NOT because I was trying to understand if this was new information or just a repetition.
  2. Be more explicit about the rationale behind the 2 content-types. This is partially adressed in this PR by adding an intro sentence (In certain environments,...). Overall, I feel like graphql-response+json is a very important design choice. There are different notes in to the spec that address this but I found it hard to reconciliate all of them. By adding the intro sentence, I can link client implementers to that section and it's relatively self-contained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problem are we trying to solve by removing this or demoting it to a non-normative note?

I'll grant that the spec does contain a lot of client specifications, and written as such.

However, I feel the requirement in question simply ** is ** non-normative by nature, and probably not within scope of this document.

Let's compare to CORS issues. Is CORS part of the GraphQL-over-http protocol? No. Is it part of the bigger picture that a client/server should consider when designing a server? Yes. How do we deal with it? In #303 we add a non-normative note that mentions that if CORS issues are not considered, there may be security issues with the implementation. Why isn't it normative? CORS may not be relevant. CORS may be solved in another fashion. We don't require HTTPS either - for example, many microservices run on HTTP internally and the HTTPS layer is applied at the edge. Perhaps we could add a non-normative note saying that data over HTTP is insecure, although the readers probably already know that and have a plan to keep their data secure.

By comparison, here we have an issue where an intermediary server might respond with failure. Could this be relevant? Yes. Is it always relevant? No. Perhaps it is known that no intermediary servers are in use. Perhaps the intermediary servers are configured to return 500 rather than 400 errors. Perhaps they return xml instead of json responses. Ultimately, I don't think this is part of the GraphQL-over-http protocol. A note is warranted as a reminder to implementers.

Now if it's deemed important enough to add a normative note, then it should be worded as an optional requirement with specific behavior, such as:

  • A client SHOULD ignore any non-200 response that has an application/json content type, treating the response as a network error.

This tells the client exactly what to do, but as an optional requirement. (just an example; it's worded poorly)

When I read that as a client developer, "I MUST NOT trust something", I'm not sure what this concretely mean. If I get a 2xx or a graphql-response+json then I know what to do (parse the GraphQL response), if not then 🤷 I must not trust it but how does that concretely materialize? I'm not sure.

Completely agree.

Guide client implementers by using positive, actionable wording

Agree

Keep the normative parts of the spec DRY by not repeating them

Agree

Be more explicit about the rationale behind the 2 content-types. This is partially adressed in this PR by adding an intro sentence (In certain environments,...). Overall, I feel like graphql-response+json is a very important design choice. There are different notes in to the spec that address this but I found it hard to reconciliate all of them. By adding the intro sentence, I can link client implementers to that section and it's relatively self-contained.

Yes, I think this PR does a better job communicating with the reader the logic/rationale of the altered content type and how to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this concretely mean

I think "do not"'s are okay in specs - especially when you're telling someone to not do something that they typically would do unless told otherwise. But yes; we currently leave them to fend for themselves in this regard because there is no "correct" way to handle this, though there is an "incorrect" way, which is to blindly trust it.

A client SHOULD ignore any non-200 response that has an application/json content type, treating the response as a network error.

I'm happy changing to wording along these lines - it aligns with what I said before: "throwing an error is the most reliable way of handling it; but we're not going to require that (though we could RECOMMEND it, I suppose)" - SHOULD and RECOMMEND have the same normative meaning.

I still think that we should pair these together though: you MUST NOT blindly trust it, and we RECOMMEND that you (i.e. "you SHOULD" - same difference) raise an exception as if it were a network error.

Keep the normative parts of the spec DRY by not repeating them

I agree in general, but this is a conformance requirement specific to application/json so there should be a constraint in the application/json section - is that actually repeated?

- the response media type is `application/graphql-response+json`
- the response media type is `application/json` and the status code is `200`
Loading