---
title: The No-Vary-Search HTTP Response Header Field
abbrev: No-Vary-Search
category: std

docname: draft-ietf-httpbis-no-vary-search-latest
submissiontype: IETF
number:
date:
consensus: true
v: 3
area: "Web and Internet Transport"
workgroup: "HyperText Transfer Protocol"
keyword:

author:
 -
    fullname: Domenic Denicola
    organization: Google LLC
    email: d@domenic.me

normative:
  HTTP: RFC9110
  HTTP-CACHING: RFC9111
  FETCH:
    target: https://fetch.spec.whatwg.org/
    title: Fetch Living Standard
    author:
      - ins: A. van Kesteren
        name: Anne van Kesteren
        org: Apple Inc.
    ann: WHATWG
  STRUCTURED-FIELDS: RFC9651
  WHATWG-ENCODING:
    target: https://encoding.spec.whatwg.org/
    title: Encoding Living Standard
    author:
      - ins: A. van Kesteren
        name: Anne van Kesteren
        org: Apple Inc.
    ann: WHATWG
  WHATWG-INFRA:
    target: https://infra.spec.whatwg.org/
    title: Infra Living Standard
    author:
      - ins: A. van Kesteren
        name: Anne van Kesteren
        org: Apple Inc.
      - ins: D. Denicola
        name: Domenic Denicola
        org: Google LLC
    ann: WHATWG
  WHATWG-URL:
    target: https://url.spec.whatwg.org/
    title: URL Living Standard
    author:
      - ins: A. van Kesteren
        name: Anne van Kesteren
        org: Apple Inc.
    ann: WHATWG

informative:
  HTML:
    target: https://html.spec.whatwg.org/
    title: HTML Living Standard
    author:
      - ins: A. van Kesteren
        name: Anne van Kesteren
        org: Apple Inc.
    ann: WHATWG
  NAV-TRACKING-MITIGATIONS:
    target: https://privacycg.github.io/nav-tracking-mitigations/
    title: Navigational-Tracking Mitigations
    author:
      - ins: P. Snyder
        name: Pete Snyder
        org: Brave Software, Inc.
      - ins: J. Yasskin
        name: Jeffrey Yasskin
        org: Google LLC
    ann: W3C Privacy CG

--- abstract

This specification defines a proposed HTTP response header field for changing how URL search parameters impact caching.

--- middle

Introduction

HTTP caching {{HTTP-CACHING}} is based on reusing resources which match across a number of cache keys. One of the most prominent is the presented target URI ({{Section 7.1 of HTTP}}). However, sometimes multiple URLs can represent the same resource. This leads to caches not always being as helpful as they could be: if the cache contains the resource under one URI, but the resource is then requested under another, the cached version will be ignored.

The No-Vary-Search HTTP header field tackles a specific subset of this general problem, for when a resource has multiple URLs which differ only in certain query components. It allows resources to declare that some or all parts of the query do not semantically affect the served resource, and thus can be ignored for cache matching purposes. For example, if the order of the query parameter keys does not semantically affect the served resource, this is indicated using

No-Vary-Search: key-order

If specific query parameters (e.g., ones indicating something for analytics) do not semantically affect the served resource, this is indicated using

No-Vary-Search: params=("utm_source" "utm_medium" "utm_campaign")

And if the resource instead wants to take an allowlist-based approach, where only certain known query parameters semantically affect the served resource, it can use

No-Vary-Search: params, except=("productId")

{{header-definition}} defines the header field, using the {{STRUCTURED-FIELDS}} framework. {{data-model}} and {{parsing}} illustrate the data model for how the field value can be represented in specifications, and the process for parsing the raw output from the structured field parser into that data model. {{comparing}} gives the key algorithm for comparing if two URLs are equivalent under the influence of the header field; notably, it leans on the decomposition of the query component into keys and values given by the application/x-www-form-urlencoded format specified in {{WHATWG-URL}}. (As such, this header field is not useful for URLs whose query component does not follow that format.) Finally, {{caching}} explains how to modify {{HTTP-CACHING}} to take into account this new equivalence.

Conventions and Definitions

{::boilerplate bcp14-tagged}

This document also adopts some conventions and notation typical in WHATWG and W3C usage, especially as it relates to algorithms. See {{WHATWG-INFRA}}, and in particular:

  • its definition of lists, including the list literal notation « 1, 2, 3 ».
  • its definition of strings, including their representation as code units.

(Other concepts used are called out using inline references.)

HTTP header field definition {#header-definition}

The No-Vary-Search HTTP header field is a structured field {{STRUCTURED-FIELDS}} whose value MUST be a dictionary ({{Section 3.2 of STRUCTURED-FIELDS}}).

It has the following authoring conformance requirements:

  • If present, the key-order entry's value MUST be a boolean ({{Section 3.3.6 of STRUCTURED-FIELDS}}).
  • If present, the params entry's value MUST be either a boolean ({{Section 3.3.6 of STRUCTURED-FIELDS}}) or an inner list ({{Section 3.1.1 of STRUCTURED-FIELDS}}).
  • If present, the except entry's value MUST be an inner list ({{Section 3.1.1 of STRUCTURED-FIELDS}}).
  • The except entry MUST only be present if the params entry is also present, and the params entry's value is the boolean value true.

The dictionary MAY contain entries whose keys are not one of key-order, params, and except, but their meaning is not defined by this specification. Implementations of this specification will ignore such entries (but future documents might assign meaning to such entries).

{:aside}

As always, the authoring conformance requirements are not binding on implementations. Implementations instead need to implement the processing model given by the obtain a URL search variance algorithm ({{obtain-a-url-search-variance}}).
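As an illustration of the authoring requirements above, the following Python sketch shows one way a server might assemble a conforming field value. The function names (`sf_string`, `inner_list`, `build_no_vary_search`) and the parameter `except_` are hypothetical and not part of this specification. The sketch assumes keys are already printable ASCII, with any other characters percent-encoded beforehand.

```python
def sf_string(text: str) -> str:
    """Serialize text as a structured field string (printable ASCII only)."""
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in text):
        raise ValueError("keys must be printable ASCII; percent-encode other characters")
    # Backslash and double quote are the only characters that need escaping.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def inner_list(items) -> str:
    """Serialize a sequence of keys as an inner list of strings."""
    return "(" + " ".join(sf_string(item) for item in items) + ")"


def build_no_vary_search(key_order=False, params=None, except_=None) -> str:
    """Hypothetical helper: params is None (absent), True, or a list of keys;
    except_ is None (absent) or a list of keys."""
    members = []
    if key_order:
        members.append("key-order")  # boolean true is serialized as a bare key
    if params is True:
        members.append("params")
    elif params:
        members.append("params=" + inner_list(params))
    if except_ is not None:
        # The except entry is only allowed when params is the boolean true.
        if params is not True:
            raise ValueError("except requires params to be the boolean true")
        members.append("except=" + inner_list(except_))
    return ", ".join(members)


print(build_no_vary_search(params=["utm_source", "utm_medium", "utm_campaign"]))
print(build_no_vary_search(params=True, except_=["productId"]))
```

An empty result means the header field can simply be omitted.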

Data model {#data-model}

A URL search variance consists of the following:

{: vspace="0"}
no-vary params
: either the special value wildcard or a list of strings

vary params
: either the special value wildcard or a list of strings

vary on key order
: a boolean

(((!default URL search variance))) The default URL search variance is a URL search variance whose no-vary params is an empty list, vary params is wildcard, and vary on key order is true.

*[default URL search variance]:

The obtain a URL search variance algorithm ({{obtain-a-url-search-variance}}) ensures that all URL search variances obey the following constraints:

  • vary params is a list if and only if the no-vary params is wildcard; and
  • no-vary params is a list if and only if the vary params is wildcard.
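The data model and its constraints could be represented roughly as follows. This is a non-normative Python sketch; the class name `SearchVariance` and the sentinel `WILDCARD` are hypothetical names chosen for illustration. Together, the two constraints mean that exactly one of the two fields is the wildcard, which is what the check below enforces.

```python
from dataclasses import dataclass, field

WILDCARD = object()  # hypothetical sentinel for the special value "wildcard"


@dataclass
class SearchVariance:
    """Illustrative stand-in for a URL search variance.

    The field defaults correspond to the default URL search variance.
    """
    no_vary_params: object = field(default_factory=list)  # WILDCARD or list of str
    vary_params: object = WILDCARD                        # WILDCARD or list of str
    vary_on_key_order: bool = True

    def __post_init__(self):
        # Exactly one of the two fields is the wildcard; the other is a list.
        if (self.no_vary_params is WILDCARD) == (self.vary_params is WILDCARD):
            raise ValueError("exactly one of no_vary_params and vary_params must be wildcard")


default_variance = SearchVariance()
allowlist_variance = SearchVariance(no_vary_params=WILDCARD, vary_params=["productId"])
```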

Parsing {#parsing}

Parse a URL search variance {#parse-a-url-search-variance}

*[parse a URL search variance]: #parse-a-url-search-variance

(((!parse a URL search variance))) To parse a URL search variance given value:

  1. If value is null, then return the default URL search variance.
  2. Let result be a new URL search variance.
  3. Set result's vary on key order to true.
  4. If value["key-order"] exists:
    1. If value["key-order"] is not a boolean, then return the default URL search variance.
    2. Set result's vary on key order to the boolean negation of value["key-order"].
  5. If value["params"] exists:
    1. If value["params"] is a boolean:
      1. If value["params"] is true, then:
        1. Set result's no-vary params to wildcard.
        2. Set result's vary params to the empty list.
      2. Otherwise:
        1. Set result's no-vary params to the empty list.
        2. Set result's vary params to wildcard.
    2. Otherwise, if value["params"] is an array:
      1. If any item in value["params"] is not a string, then return the default URL search variance.
      2. Set result's no-vary params to the result of applying parse a key ({{parse-a-key}}) to each item in value["params"].
      3. Set result's vary params to wildcard.
    3. Otherwise, return the default URL search variance.
  6. If value["except"] exists:
    1. If value["params"] is not true, then return the default URL search variance.
    2. If value["except"] is not an array, then return the default URL search variance.
    3. If any item in value["except"] is not a string, then return the default URL search variance.
    4. Set result's vary params to the result of applying parse a key ({{parse-a-key}}) to each item in value["except"].
  7. Return result.

{:aside}

In general, this algorithm is strict and tends to return the default URL search variance whenever it sees something it doesn't recognize. This is because the default URL search variance behavior will just cause fewer cache hits, which is an acceptable fallback behavior.

However, unrecognized keys at the top level are ignored, to make it easier to extend this specification in the future. To avoid misbehavior with existing client software, such extensions will likely expand, rather than reduce, the set of requests that a cached response can match.

{:aside}

The input to this algorithm is generally obtained by parsing a structured field ({{Section 4.2 of STRUCTURED-FIELDS}}) using field_type "dictionary".
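The parse a URL search variance steps can be sketched in Python as below. This is a non-normative illustration under simplifying assumptions: the already-parsed dictionary is modeled as a plain `dict` whose values are `bool`, `list`, or `str`, with any other structured field type (such as a token) represented by some other Python object, and member parameters are ignored. The names `SearchVariance`, `WILDCARD`, `parse_key`, and `parse_search_variance` are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote_to_bytes

WILDCARD = object()  # hypothetical sentinel for the special value "wildcard"


@dataclass
class SearchVariance:
    no_vary_params: object = field(default_factory=list)
    vary_params: object = WILDCARD
    vary_on_key_order: bool = True


def parse_key(key_string):
    # Replace "+" with space, percent-decode, then UTF-8 decode (no BOM handling).
    key_bytes = key_string.encode("latin-1").replace(b"+", b" ")
    return unquote_to_bytes(key_bytes).decode("utf-8", errors="replace")


def _is_string_list(value):
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def parse_search_variance(value):
    default = SearchVariance()
    if value is None:
        return default
    result = SearchVariance()
    if "key-order" in value:
        if not isinstance(value["key-order"], bool):
            return default
        result.vary_on_key_order = not value["key-order"]
    if "params" in value:
        params = value["params"]
        if isinstance(params, bool):
            if params:
                result.no_vary_params, result.vary_params = WILDCARD, []
            else:
                result.no_vary_params, result.vary_params = [], WILDCARD
        elif _is_string_list(params):
            result.no_vary_params = [parse_key(item) for item in params]
            result.vary_params = WILDCARD
        else:
            # Neither a boolean nor an inner list consisting only of strings.
            return default
    if "except" in value:
        if value.get("params") is not True:
            return default
        if not _is_string_list(value["except"]):
            return default
        result.vary_params = [parse_key(item) for item in value["except"]]
    # Unrecognized top-level keys are deliberately ignored.
    return result
```

For instance, `parse_search_variance({"params": True, "except": ["x"]})` yields a variance whose no-vary params is the wildcard and whose vary params is the list containing "x".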

Obtain a URL search variance {#obtain-a-url-search-variance}

*[obtain a URL search variance]: #obtain-a-url-search-variance

(((!obtain a URL search variance))) To obtain a URL search variance given a response response:

  1. Let fieldValue be the result of getting a structured field value {{FETCH}} given `No-Vary-Search` and "dictionary" from response's header list.
  2. Return the result of parsing a URL search variance ({{parse-a-url-search-variance}}) given fieldValue. (((parse a URL search variance)))

Examples

The following illustrates how various inputs are parsed, in terms of their impact on the resulting no-vary params and vary params:

| Input                                  | Result                                                    |
|----------------------------------------+-----------------------------------------------------------|
| No-Vary-Search: params                 | no-vary params: wildcard<br>vary params: (empty list)     |
| No-Vary-Search: params=("a")           | no-vary params: « "a" »<br>vary params: wildcard          |
| No-Vary-Search: params, except=("x")   | no-vary params: wildcard<br>vary params: « "x" »          |

The following inputs are all invalid and will cause the default URL search variance to be returned:

{:compact}

  • No-Vary-Search: unknown-key
  • No-Vary-Search: key-order="not a boolean"
  • No-Vary-Search: params="not a boolean or inner list"
  • No-Vary-Search: params=(not-a-string)
  • No-Vary-Search: params=("a"), except=("x")
  • No-Vary-Search: params=(), except=()
  • No-Vary-Search: params=?0, except=("x")
  • No-Vary-Search: params, except=(not-a-string)
  • No-Vary-Search: params, except="not an inner list"
  • No-Vary-Search: params, except=?1
  • No-Vary-Search: except=("x")
  • No-Vary-Search: except=()

The following inputs are valid, but somewhat unconventional. They are shown alongside their more conventional form.

| Input                                             | Conventional form                                 |
|---------------------------------------------------+---------------------------------------------------|
| No-Vary-Search: params=?1                         | No-Vary-Search: params                            |
| No-Vary-Search: key-order=?1                      | No-Vary-Search: key-order                         |
| No-Vary-Search: params, key-order, except=("x")   | No-Vary-Search: key-order, params, except=("x")   |
| No-Vary-Search: params=?0                         | (omit the header field)                           |
| No-Vary-Search: params=()                         | (omit the header field)                           |
| No-Vary-Search: key-order=?0                      | (omit the header field)                           |

Parse a key {#parse-a-key}

*[parse a key]: #parse-a-key

(((!parse a key))) To parse a key given an ASCII string keyString:

  1. Let keyBytes be the isomorphic encoding {{WHATWG-INFRA}} of keyString.

  2. Replace any 0x2B (+) in keyBytes with 0x20 (SP).

  3. Let keyBytesDecoded be the percent-decoding {{WHATWG-URL}} of keyBytes.

  4. Let keyStringDecoded be the UTF-8 decoding without BOM {{WHATWG-ENCODING}} of keyBytesDecoded.

  5. Return keyStringDecoded.
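The five steps above map closely onto Python's standard library, as in the following non-normative sketch. The function name `parse_key` is hypothetical. `urllib.parse.unquote_to_bytes` is used as an approximation of the percent-decoding algorithm in the URL Standard, and `errors="replace"` as an approximation of the Encoding Standard's handling of malformed UTF-8.

```python
from urllib.parse import unquote_to_bytes


def parse_key(key_string: str) -> str:
    # Step 1: isomorphic encoding (each code point becomes one byte).
    key_bytes = key_string.encode("latin-1")
    # Step 2: replace 0x2B (+) with 0x20 (SP).
    key_bytes = key_bytes.replace(b"+", b" ")
    # Step 3: percent-decode; invalid escape sequences are left untouched.
    key_bytes_decoded = unquote_to_bytes(key_bytes)
    # Steps 4-5: UTF-8 decode without BOM stripping; malformed bytes become U+FFFD.
    return key_bytes_decoded.decode("utf-8", errors="replace")


print(parse_key("%C3%A9+%E6%B0%97"))
```

Note that because the plus sign is replaced before percent-decoding, a literal plus sign in a key has to be written as `%2B`.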

Examples

The parse a key algorithm allows encoding non-ASCII key strings in the ASCII structured header field format, similar to how the application/x-www-form-urlencoded format {{WHATWG-URL}} allows encoding an entire entry list of keys and values in ASCII URL format. For example,

No-Vary-Search: params=("%C3%A9+%E6%B0%97")

will result in a URL search variance whose no-vary params are « "é 気" ». As explained in a later example, the canonicalization process during equivalence testing means this will treat as equivalent URL strings such as:

  • https://example.com/?é 気=1
  • https://example.com/?é+気=2
  • https://example.com/?%C3%A9%20気=3
  • https://example.com/?%C3%A9+%E6%B0%97=4

and so on, since they are all parsed {{WHATWG-URL}} as having the same key "é 気".

Comparing {#comparing}

(((!equivalent modulo search variance))) Two URLs {{WHATWG-URL}} urlA and urlB are equivalent modulo search variance given a URL search variance searchVariance if the following algorithm returns true:

  1. If the scheme, username, password, host, port, or path of urlA and urlB differ, then return false.

  2. If searchVariance is equivalent to the default URL search variance, then:

    1. If urlA's query equals urlB's query, then return true.

    2. Return false.

    In this case, even URL pairs that might appear the same after running the application/x-www-form-urlencoded parser {{WHATWG-URL}} on their queries, such as https://example.com/a and https://example.com/a?, or https://example.com/foo?a=b&&&c and https://example.com/foo?a=b&c=, will be treated as inequivalent.

  3. Let searchParamsA and searchParamsB be empty lists.

  4. If urlA's query is not null, then set searchParamsA to the result of running the application/x-www-form-urlencoded parser {{WHATWG-URL}} given the isomorphic encoding {{WHATWG-INFRA}} of urlA's query.

  5. If urlB's query is not null, then set searchParamsB to the result of running the application/x-www-form-urlencoded parser {{WHATWG-URL}} given the isomorphic encoding {{WHATWG-INFRA}} of urlB's query.

  6. If searchVariance's no-vary params is a list, then:

    1. Set searchParamsA to a list containing those items pair in searchParamsA where searchVariance's no-vary params does not contain pair[0].

    2. Set searchParamsB to a list containing those items pair in searchParamsB where searchVariance's no-vary params does not contain pair[0].

  7. Otherwise, if searchVariance's vary params is a list, then:

    1. Set searchParamsA to a list containing those items pair in searchParamsA where searchVariance's vary params contains pair[0].

    2. Set searchParamsB to a list containing those items pair in searchParamsB where searchVariance's vary params contains pair[0].

  8. If searchVariance's vary on key order is false, then:

    1. Let keyLessThan be an algorithm taking as inputs two pairs (keyA, valueA) and (keyB, valueB), which returns whether keyA is code unit less than {{WHATWG-INFRA}} keyB.

    2. Set searchParamsA to the result of sorting searchParamsA in ascending order with keyLessThan.

    3. Set searchParamsB to the result of sorting searchParamsB in ascending order with keyLessThan.

  9. If searchParamsA's size is not equal to searchParamsB's size, then return false.

  10. Let i be 0.

  11. While i < searchParamsA's size:

    1. If searchParamsA[i][0] does not equal searchParamsB[i][0], then return false.

    2. If searchParamsA[i][1] does not equal searchParamsB[i][1], then return false.

    3. Set i to i + 1.

  12. Return true.
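The comparison algorithm can be sketched in Python as follows. This is a non-normative illustration with several assumptions: `urllib.parse.urlsplit` and `urllib.parse.parse_qsl` only approximate the URL Standard's URL parser and application/x-www-form-urlencoded parser, an empty path is treated as "/", and the names `SearchVariance`, `WILDCARD`, and `equivalent_modulo_search_variance` are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qsl, urlsplit

WILDCARD = object()  # hypothetical sentinel for the special value "wildcard"


@dataclass
class SearchVariance:
    no_vary_params: object = field(default_factory=list)
    vary_params: object = WILDCARD
    vary_on_key_order: bool = True


def _components(url):
    parts = urlsplit(url)
    # urlsplit() cannot distinguish a null query from an empty one, so check for "?".
    has_query = "?" in url.split("#", 1)[0]
    query = parts.query if has_query else None
    return (parts.scheme, parts.netloc, parts.path or "/"), query


def _search_params(query):
    if query is None:
        return []
    return parse_qsl(query, keep_blank_values=True, errors="replace")


def _code_unit_key(pair):
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return pair[0].encode("utf-16-be", "surrogatepass")


def equivalent_modulo_search_variance(url_a, url_b, variance):
    base_a, query_a = _components(url_a)
    base_b, query_b = _components(url_b)
    if base_a != base_b:                          # step 1
        return False
    if variance == SearchVariance():              # step 2: exact query comparison
        return query_a == query_b
    params_a = _search_params(query_a)            # steps 3-5
    params_b = _search_params(query_b)
    if variance.no_vary_params is not WILDCARD:   # step 6: drop ignored keys
        drop = set(variance.no_vary_params)
        params_a = [p for p in params_a if p[0] not in drop]
        params_b = [p for p in params_b if p[0] not in drop]
    elif variance.vary_params is not WILDCARD:    # step 7: keep only listed keys
        keep = set(variance.vary_params)
        params_a = [p for p in params_a if p[0] in keep]
        params_b = [p for p in params_b if p[0] in keep]
    if not variance.vary_on_key_order:            # step 8: stable sort by key
        params_a = sorted(params_a, key=_code_unit_key)
        params_b = sorted(params_b, key=_code_unit_key)
    return params_a == params_b                   # steps 9-12
```

Comparing the two lists of pairs with `==` covers both the size check and the pairwise key and value checks of the final steps.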

Examples

Due to how the application/x-www-form-urlencoded parser canonicalizes query strings, some query strings which do not appear obviously equivalent will end up being treated as equivalent after parsing.

So, for example, given any non-default value for No-Vary-Search, such as No-Vary-Search: key-order, we will have the following equivalences:

{: newline="true"}

https://example.com
https://example.com/?
: A null query is parsed the same as an empty string

https://example.com/?a=x
https://example.com/?%61=%78
: Parsing performs percent-decoding

https://example.com/?a=é
https://example.com/?a=%C3%A9
: Parsing performs percent-decoding

https://example.com/?a=%f6
https://example.com/?a=%ef%bf%bd
: Both values are parsed as U+FFFD (�)

https://example.com/?a=x&&&&
https://example.com/?a=x
: Parsing splits on & and discards empty strings

https://example.com/?a=
https://example.com/?a
: Both parse as having an empty string value for a

https://example.com/?a=%20
https://example.com/?a=+
https://example.com/?a= &
: + and %20 are both parsed as U+0020 SPACE

Caching {#caching}

If a cache {{HTTP-CACHING}} implements this specification, the presented target URI requirement in {{Section 4 of HTTP-CACHING}} is replaced with:

  • one of the following:
    • the presented target URI ({{Section 7.1 of HTTP}}) and that of the stored response match, or
    • the presented target URI and that of the stored response are equivalent modulo search variance ({{comparing}}), given the variance obtained ({{obtain-a-url-search-variance}}) from the stored response.

Cache implementations MAY fail to reuse a stored response whose target URI matches only modulo URL search variance, if the cache has more recently stored a response which:

  • has a target URI which is equal to the presented target URI, excluding the query, and
  • has a non-empty value for the No-Vary-Search field, and
  • has a No-Vary-Search field value different from the stored response being considered for reuse.

{:aside}

In general, caches are not required to reuse stored responses. The allowance above additionally makes explicit that a cache can search a smaller number of stored responses when doing so is advantageous for performance or other reasons.

That is, because caches might store more than one response for a given pathname, they need a way to efficiently look up the No-Vary-Search value without accessing all cached responses. Such a cache might take steps like the following to identify a stored response in a performant way, before checking the other conditions in {{Section 4 of HTTP-CACHING}}:

  1. Let exactMatch be cache[presentedTargetURI]. If it is a stored response that can be reused, return it.
  2. Let targetPath be presentedTargetURI, with query parameters removed.
  3. Let lastNVS be mostRecentNVS[targetPath]. If it does not exist, return null.
  4. Let simplifiedURL be the result of simplifying presentedTargetURI according to lastNVS (by removing query parameters which are not significant, and stable sorting parameters by key, if key order is to be ignored).
  5. Let nvsMatch be cache[simplifiedURL]. If it does not exist, return null. (It is assumed that this was written when storing in the cache, in addition to the exact URL.)
  6. Let searchVariance be obtained ({{obtain-a-url-search-variance}}) from nvsMatch.
  7. If nvsMatch's target URI and presentedTargetURI are not equivalent modulo search variance ({{comparing}}) given searchVariance, then return null.
  8. If nvsMatch is a stored response that can be reused, return it. Otherwise, return null.
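The simplification in step 4 is not defined precisely by this document. One possible reading is sketched below in Python, purely as an illustration: the function `simplify_url` and its parameters are hypothetical, `urllib.parse` only approximates the URL Standard, and re-serializing with `urlencode` is just one way to obtain a stable lookup key. Exactly one of `no_vary_params` and `vary_params` is expected to be a list, with `None` standing in for the wildcard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def simplify_url(url, no_vary_params=None, vary_params=None, ignore_key_order=False):
    """Return a simplified URL string usable as a secondary cache lookup key."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if no_vary_params is not None:
        # Remove parameters declared as not significant.
        pairs = [p for p in pairs if p[0] not in no_vary_params]
    elif vary_params is not None:
        # Keep only parameters declared as significant.
        pairs = [p for p in pairs if p[0] in vary_params]
    if ignore_key_order:
        # list.sort() is stable, so values sharing a key keep their relative order.
        pairs.sort(key=lambda p: p[0].encode("utf-16-be", "surrogatepass"))
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", urlencode(pairs), ""))


key_1 = simplify_url("https://example.com/p?utm_source=x&b=2&a=1",
                     no_vary_params=["utm_source"], ignore_key_order=True)
key_2 = simplify_url("https://example.com/p?a=1&utm_source=y&b=2",
                     no_vary_params=["utm_source"], ignore_key_order=True)
print(key_1, key_1 == key_2)
```

A cache following the steps above would store each response under both its exact URL and such a simplified key, and would still run the full equivalence check in step 7 before reusing a match.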

To aid cache implementation efficiency, servers SHOULD NOT send different non-empty values for the No-Vary-Search field in response to requests for a given pathname over time, unless there is a need to update how they handle the query component. Doing so would cause cache implementations that use a strategy like the above to miss some stored responses that could otherwise have been reused.

Security Considerations

The main risk to be aware of is the impact of mismatched URLs. In particular, this could cause the user to see a response that was originally fetched from a URL different from the one displayed when they hovered a link, or the URL displayed in the URL bar.

However, since the impact is limited to query parameters, this does not cross the relevant security boundary, which is the origin {{HTML}}. (Or perhaps just the host, from the perspective of web browser security UI. {{WHATWG-URL}}) Indeed, we have already given origins complete control over how they present the (URL, response body) pair, including on the client side via technology such as history.replaceState() or service workers.

Privacy Considerations

This proposal is adjacent to the highly-privacy-relevant space of navigational tracking, which often uses query parameters to pass along user identifiers. However, we believe this proposal itself does not have privacy impacts. It does not interfere with existing navigational tracking mitigations, or any known future ones being contemplated. Indeed, if a page were to encode user identifiers in its URL, the only ability this proposal gives is to reduce such user tracking by preventing server processing of such user IDs (since the server is bypassed in favor of the cache). {{NAV-TRACKING-MITIGATIONS}}

IANA Considerations

HTTP Field Names

IANA is requested to enter the following into the Hypertext Transfer Protocol (HTTP) Field Name Registry (https://www.iana.org/assignments/http-fields/http-fields.xhtml):

Field Name:
: No-Vary-Search

Status:
: permanent

Structured Type:
: Dictionary

Reference:
: this document

Comments:
: (none)

--- back

Acknowledgments

{:numbered="false"}

This document benefited from valuable reviews and suggestions by:

  • Adam Rice
  • Julian Reschke
  • Kevin McNee
  • Liviu Tinta
  • Mark Nottingham
  • Martin Thomson
  • Valentin Gosu