Redact ancestorOrigins using iframe referrerpolicy #11560
|
I think using the document's referrer is the wrong model. Because we don't care about the referrer of top-level example.com (how the user got to example.com, perhaps from search.example), we care about example.com's policy (or its |
|
I created a doc to work out how to specify this with the new model: https://docs.google.com/document/d/1TDryRMiw7sVKBfrvnzEQk0PzGQk7_G7ShF4Uy68MvSU/edit?usp=sharing

Copy of the doc's current state

Example 1
Result: ["https://widget.example", "null"]
Example 2
Result: ???

Algo
HTML:
Referrer Policy: To get an origin if the referrer policy allows given an element or null container, a document doc, and a document innerDoc:
|
|
Doc updated. I've changed the algorithm to avoid exposing A's origin in the A->A->C case with the innermost iframe having

Copy of the doc's current state

Algorithm
HTML:
Referrer Policy: To get an ancestor origin if the referrer policy allows given an Element container, a Document doc, and a Document innerDoc:
Note: Since mixed content checks prevent non-secure context documents in secure context documents, there's no need to check secure context for "strict-origin", "strict-origin-when-cross-origin", and "no-referrer-when-downgrade".

Polyfill + demo
https://software.hixie.ch/utilities/js/live-dom-viewer/saved/14030

Examples
iframe referrerpolicy="no-referrer"
Child:
Examples below are from w3c/webappsec-referrer-policy#77 (comment)
top -> sandboxed iframe -> 3rd party iframe (ad)
Grandchild:
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag that's loading b.com -> loads {3, a.com}
Grandchild:
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, a.com}
Grandchild:
{1, a.com} default referrer policy -> loads {2, a.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, c.com}
Grandchild:
{1, a.com} default referrer policy -> loads {2, b.com} with default referrer policy -> loads {3, b.com} with noreferrer attribute on the iframe tag inside {3} -> loads {4, a.com}
Grandgrandchild: |
|
After discussing with @farre we found out that reading the |
This was not correct, the spec computes the ancestor origins list when the
https://html.spec.whatwg.org/#concept-location-ancestor-origins-list
https://html.spec.whatwg.org/#the-location-interface

So no need to store

Copy of the doc's current state

Algorithm
A Location object has an associated ancestor origin objects list. When a Location object is created, its ancestor origins list must be set to the list of origins that the following steps would produce:
Note: Since mixed content checks prevent non-secure context documents in secure context documents, there's no need to check secure context for "strict-origin", "strict-origin-when-cross-origin", and "no-referrer-when-downgrade".
A Location object has an associated ancestor origins list. When a Location object is created, its ancestor origins list must be set to a DOMStringList object whose associated list is the list of strings the following steps would produce:

Polyfill + demo
https://software.hixie.ch/utilities/js/live-dom-viewer/saved/14047

Examples
iframe referrerpolicy="no-referrer"
Child: ["null"]
{1, a.com} default referrer policy -> loads {2, b.com} -> loads {3, a.com}
Grandchild: ["https://b.com","https://a.com"]
Examples below are from w3c/webappsec-referrer-policy#77 (comment)
top -> sandboxed iframe -> 3rd party iframe (ad)
Grandchild: ["null","https://a.com"]
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag that's loading b.com -> loads {3, a.com}
Grandchild: ["https://b.com","null"]
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, a.com}
Grandchild: ["null","https://a.com"]
{1, a.com} default referrer policy -> loads {2, a.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, c.com}
Grandchild: ["null","null"]
{1, a.com} default referrer policy -> loads {2, b.com} with default referrer policy -> loads {3, b.com} with noreferrer attribute on the iframe tag inside {3} -> loads {4, a.com}
Grandgrandchild: ["null","null","https://a.com"]

TODOs |
|
I've updated this PR, the algorithm should be the same as in the doc. |
This implements this attribute on Location and also adheres to the changes to the spec introduced in whatwg/html#11560. Due to the tentative nature of the spec, and details still being hashed out, this PR is subject to minor changes.
This follows the current revamping of the spec found at whatwg/html#11560 and will potentially change with it.
|
I wrote a summary of what this change does in the OP. (To be used in the commit message when squashing.) |
|
cc @domfarolino |
|
Before I forget: we need to patch https://w3c.github.io/ServiceWorker/#client-ancestororigins at the same time. |
This aligns `ancestorOrigins` exposure with referrer policy, so an embedder can prevent revealing its own origin to embedded documents. If an `<iframe>` uses `referrerpolicy="no-referrer"` or `same-origin` (and the parent and child are cross-origin), the parent’s origin and any same-origin ancestors are replaced with opaque origins (until reaching an ancestor that is cross-origin). Other policies continue to expose full origins. If there's no `referrerpolicy` attribute, the embedder document's referrer policy is used. This approach keeps existing behavior by default (for web compat) while addressing privacy concerns with an opt-out.

The algorithm reuses the parent's existing list of ancestor origins, avoiding synchronous cross-process lookups and ensuring a stable snapshot even if ancestors mutate their `referrerpolicy` attributes later.

Fixes #1918. Closes #2480.
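
The redaction described in this summary can be sketched roughly as follows. This is a simplified Python model, not the spec text: the function names `child_ancestor_origins` and `serialize` are made up for illustration, origins are plain strings, `None` stands for an opaque origin, and the sketch assumes `no-referrer` hides the parent regardless of whether parent and child are same-origin. The caller passes in whichever policy applies to the embedding `<iframe>`.

```python
from typing import List, Optional


def child_ancestor_origins(
    parent_origin: str,
    parent_ancestors: List[Optional[str]],
    child_origin: str,
    policy: str = "",
) -> List[Optional[str]]:
    """Build the ancestor list for a new child document (illustrative only).

    The list runs from the nearest ancestor (the parent) to the top-level
    document. None stands for an opaque (redacted) origin.
    """
    # Assumption: "no-referrer" always hides; "same-origin" hides only
    # when the child is cross-origin to its parent.
    hide = policy == "no-referrer" or (
        policy == "same-origin" and child_origin != parent_origin
    )
    if not hide:
        return [parent_origin] + list(parent_ancestors)

    # Reuse the parent's existing list instead of walking live ancestors.
    redacted = list(parent_ancestors)
    for i, origin in enumerate(redacted):
        if origin != parent_origin:
            break  # first cross-origin (or already opaque) ancestor: stop
        redacted[i] = None  # same-origin ancestor of the parent: hide too
    return [None] + redacted


def serialize(origins: List[Optional[str]]) -> List[str]:
    """Serialize opaque origins as the string "null"."""
    return ["null" if o is None else o for o in origins]


A, B = "https://a.com", "https://b.com"
doc2 = child_ancestor_origins(A, [], B)                   # a.com embeds b.com
doc3 = child_ancestor_origins(B, doc2, B)                 # b.com embeds b.com
doc4 = child_ancestor_origins(B, doc3, A, "no-referrer")  # hidden embed of a.com
print(serialize(doc4))
```

The last call mirrors the four-level example from the doc copy earlier in the thread, where the innermost document sees two hidden ancestors followed by the top-level origin.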
3c5e61f to 0004ccb
|
@annevk as far as I can tell, the call sites for Create Window Client use a |
|
What is the reason for using the referrer policy instead of a new attribute / header? Are we not concerned that this would lead to situations where one of the two features is required, making it impossible for sites to opt out of the other? |
|
@theIDinside In https://html.spec.whatwg.org/multipage/browsing-the-web.html#populating-a-session-history-entry:determining-navigation-params-policy-container-2 responsePolicyContainer is passed (defined in step 9).
No, the |
annevk left a comment
This looks good to me, but I do think we should wrap per our current rules before landing. This is also assuming we have test coverage.
|
@annevk I wrote some tests. I know you've written some tests before, but do you think these would do? I guess the only test missing from my suite is the one where a server responds with headers set to something like no-referrer, but it's unclear to me what effect that should have. If A->B->C and A gets served with no-referrer in its referrer policy headers, it's still the iframe attribute (for B) that takes precedence. I take it A's response headers (via policy container) would be the deferred-to value if <iframe src=B> has no attribute? @zcorpan? |
|
I think that A's Referrer-Policy header is indeed supposed to be the fallback, but I'm not entirely sure how this works out in the specification. It almost seems like it's missing? |
Yeah, in my suite it's the thing that's missing; I just needed some clarification from zcorpan on what that would be. By using this the tests should be able to incorporate headers while still looking similar to the many other tests whose general structure I cargo-culted. |
I don't think anything is missing spec-wise, if that's what you mean. The |
Could you make a PR there, so we can review it? It looks like it's just a branch comparison right now, so I can't add any comments. |
source (outdated)
<li><p>If <var>referrerPolicy</var> is the empty string, then set <var>referrerPolicy</var> to
<var>parentDoc</var>'s <span data-x="concept-document-policy-container">policy container</span>'s
<span data-x="policy-container-referrer-policy">referrer policy</span>.</p></li>
Do we have a WPT test where a parent document is served with a loose referrer policy, but later in life sets a tight one via <meta name=referrer content=no-referrer>, and we test that the iframe that's created after this DOES NOT see its parent's origin in the ancestor origins list?
We also want the inverse test where a document sets a looser referrer policy later in life, before an iframe is created; then we assert that the iframe can see the embedder's origin.
This should ensure all of the policy container stuff is wired up correctly in implementations.
|
My latest version of the test covers these cases, but for the referrerpolicy attribute on the <iframe> (checking that changes affect only the direct descendant, not the grandchild). I'll be sure to add two for your <meta> example as well, and submit a PR to WPT with the test for commenting purposes, with the .tentative one in the mozilla repo being the one ultimately upstreamed, if that works? Or maybe @zcorpan can help me out here.
<li><p>If <var>container</var> is an <code>iframe</code> element, then set
<var>referrerPolicy</var> to <var>container</var>'s <code
data-x="attr-iframe-referrerpolicy">referrerpolicy</code> attribute's state's corresponding
keyword.</p></li>
Can I ask how this is implemented in other browsers? Maybe @theIDinside can weigh in. I presume a snapshot of the iframe's referrerpolicy attribute is taken at some point, or things can get racy, right? For example, what if an <iframe referrerpolicy=no-referrer src=slow.html> is created, but before its inner document is created, it loosens or removes its rp attribute? Would the inner doc see the "snapshot" of the rp attribute at creation time? I presume browsers with site isolation would actually implement it that way, instead of live-reaching-up into the embedder's process and querying the attribute.
|
> Can I ask how this is implemented in other browsers? Maybe @theIDinside can weigh in. I presume a snapshot of the iframe's referrerpolicy attribute is taken at some point, or things can get racy, right? For example, what if an <iframe referrerpolicy=no-referrer src=slow.html> is created, but before its inner document is created, it loosens or removes its rp attribute? Would the inner doc see the "snapshot" of the rp attribute at creation time? I presume browsers with site isolation would actually implement it that way, instead of live-reaching-up into the embedder's process and querying the attribute.

This is actually exactly how I've implemented it in Gecko, some minor changes may happen going forward, but the gist will be the same - prepend the metadata at load/navigation. This is how allowfullscreen on <iframe> works too, so there's precedent for this as well, so user agents should not expect/require lazy evaluation of the attribute.
|
> This is actually exactly how I've implemented it in Gecko

The snapshotting approach you mean, right? @zcorpan how do you feel about documenting this as a note below the step that pulls the attribute value from the iframe? Just documenting that implementations will probably implement this as a snapshot at iframe creation time to avoid races, and that the spec doesn't fully account for this kind of thing. We documented a few things like this during the navigation rewrite, since the spec doesn't fully account for all of the cross-process scheduling machinery that browsers usually have. I don't feel strongly.
|
> The snapshotting approach you mean, right?

Right, but technically not at iframe creation. The snapshot I've gone with is taken when the iframe gets navigated (similar to how allowfullscreen works). "Creation" heavily implies that it only happens once. Otherwise we wouldn't be able to create an iframe, change meta, change the attribute on the iframe, then navigate the iframe and have that be seen in the child.
|
https://whatpr.org/html/11560/90c74b7...0307930/nav-history-apis.html#the-location-interface says
Each Window object is associated with a unique instance of a Location object, allocated when the Window object is created.
A Window is created in https://html.spec.whatwg.org/#initialise-the-document-object or https://html.spec.whatwg.org/#creating-a-new-browsing-context
Since the steps to create a Location object are sync, this shouldn't be racy.
|
There's no race between the Window and Location objects being created, but there is a race between the state of the referrerpolicy attribute at navigation time, and the Window/Location objects being created. There's no guarantee that the referrerpolicy attribute at navigation time will be in the same state by the time the navigation completes, https://html.spec.whatwg.org/#initialise-the-document-object runs, and the Window/Location objects are created and finally reach up to the referrerpolicy attribute.
|
Indeed. Currently for the load of iframe src, the referrerpolicy attribute is snapshotted before calling "navigate". If that frame is later navigated by e.g. following a hyperlink, the iframe's referrerpolicy attribute is not used, as far as I can tell. For ancestorOrigins I think we want it to apply even if the iframe navigates itself. If we want the timing to be at the start of the navigation, we need some new plumbing to keep track of the iframe referrerpolicy value.
The sandbox and allow attributes are snapshotted in https://html.spec.whatwg.org/#initialise-the-document-object and https://html.spec.whatwg.org/#creating-a-new-browsing-context . In my mind it makes sense to snapshot the referrerpolicy here as well when it's for ancestorOrigins.
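
To make the race in this thread concrete, here is a toy Python model of snapshotting. The `Iframe` class and its method names are hypothetical and do not correspond to any browser or spec API. The sketch captures the attribute when navigation starts, which is only one of the timings discussed above; snapshotting at document initialisation would move the capture point but keep the same idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Iframe:
    """Hypothetical stand-in for an <iframe> element in the embedder."""

    referrerpolicy: str = ""
    snapshot: Optional[str] = None  # value captured when navigation starts

    def start_navigation(self) -> None:
        # Capture the attribute once, so later mutations in the embedder
        # cannot race with document creation (possibly in another process).
        self.snapshot = self.referrerpolicy

    def policy_for_new_document(self) -> str:
        # The new document reads the captured value, not the live attribute.
        if self.snapshot is None:
            raise RuntimeError("navigation has not started")
        return self.snapshot


frame = Iframe(referrerpolicy="no-referrer")
frame.start_navigation()
frame.referrerpolicy = ""  # embedder loosens the attribute while loading
print(frame.policy_for_new_document())  # still the snapshotted value
```

Under this model a second navigation takes a fresh snapshot, which matches the point made above that "creation" would wrongly imply the value is captured only once.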
|
@zcorpan || @annevk - I had an interesting question during review: "Are there tests when iframe has data-url ancestors (if that is possible)?" What would this imply for this spec and the results? We don't have any tests, and I personally don't know yet whether this scenario is possible. |
|
@theIDinside data: URL Documents can certainly have child iframes, so yes, those iframes can have data: URL ancestors. data: URL documents have opaque origins, but I don't think there's anything too special about this use case or how it interacts with this PR. Is there some result or case you're particularly concerned about or wondering about? We're just dealing with opaque origins... |
Ok, I was asked the question during review (for the gecko work) and I didn't really have a good answer for it so I figured maybe I could gain better clarity here. |
|
Added a PR @domfarolino for the tests. I've also added additional testing for your example of tightening/loosening, so that it only affects the direct child when it's created. |
|
I spoke with some Chrome people about this change, and we're pretty worried about web compatibility. Has there been a compat analysis done? A lot of spam and fraud detection relies on iframes knowing the top frame's origin, so we're worried about breaking those and other use cases randomly by changing what all a Document's referrer policy influences. |
No, but I can take a look at httparchive.
Interesting, can you say more? Are there known common libraries for this? If we provide an opt-out (even if it's a new attribute), then spam and fraud detection scripts would have to blocklist "null" origins. Or maybe we should make it impossible for the top-level frame to hide its origin? |
+1, would prefer for it to be impossible for the top-level frame to hide its origin. This is a valuable signal for ad fraud detection and brand safety, and we'd prefer this change to not happen unless there are alternatives in place. |
|
> If we provide an opt-out (even if it's a new attribute), then spam and fraud detection scripts would have to blocklist "null" origins. Or maybe we should make it impossible for the top-level frame to hide its origin?

Yes, being able to opt out would be helpful. For ads I think it's tricky because they purchase the ad slot before loading on the page. It seems like the ads ecosystem will need to adapt and require that the referrer of the top frame (perhaps the entire tree, not sure) is available, and sellers will need to pass that availability information along to buyers at auction time, or face having unhappy buyers. It's kind of a not-the-browser's problem but I do think it's important to give the ecosystem plenty of time to adjust before making this breaking change. |
|
It seems to me that Dom's #11560 (comment) and Josh's #11560 (comment) are exactly the situation that @johannhof was worried about in #11560 (comment). Josh's use case: an ad network only wants to pay to show an ad if that ad is appearing on the site they thought they were buying space on. Seems reasonable to me.
The financial hit might come in the form of those sites seeming like fraud, or in the form of ad buyers simply refusing to buy ads there (if the ad industry adapts as Josh predicts and ad buyers can know what to expect ahead of time). But in either case, a privacy setting that sites already use today would acquire a new and perhaps undesirable cost. |
This would go counter to the goals I think. In that you should be able to embed a common widget without that widget learning about all the websites it's on. Also, the point of using referrer policy is to align it with the information that was available to websites before |
|
Thanks for that additional context @michaelkleber, @jkarlin, @SpaceGnome! Being able to embed a widget without it knowing where it is embedded, and being able to embed an ad without letting the embedder hide itself, seem to be conflicting requirements.
This seems a bit disruptive, if there are a lot of pages doing this. Checking Chromium use counters:
These are all high, but account for any value. The ones impacted here are
Checking 1% of httparchive, I see this distribution of values:
iframe referrerpolicy attribute
| referrer_policy_value | count_pages |
|---|---|
| no-referrer-when-downgrade | 3439 |
| strict-origin-when-cross-origin | 3161 |
| origin | 52 |
| unsafe-url | 23 |
| no-referrer | 18 |
| strict-origin | 4 |
| strict-origin-when- | 1 |
| no-follow | 1 |
| no-referrer-when- | 1 |
| no-referrer-when-downgrad | 1 |
Referrer-Policy header
| referrer_policy_value | count_pages |
|---|---|
| strict-origin-when-cross-origin | 15468 |
| no-referrer-when-downgrade | 8566 |
| same-origin | 3682 |
| no-referrer | 1497 |
| origin-when-cross-origin | 967 |
| strict-origin | 941 |
| origin | 496 |
| unsafe-url | 374 |
| (empty) | 342 |
| no-referrer, strict-origin-when-cross-origin | 216 |
| origin-when-cross-origin, strict-origin-when-cross-origin | 207 |
| no-referrer-when-downgrade, strict-origin-when-cross-origin | 205 |
| : no-referrer-when-downgrade, strict-origin | 170 |
| same-origin, strict-origin-when-cross-origin | 17 |
| no-referrer, same-origin | 11 |
| ... | ... |
The total number of pages in the dataset is 23,517,510, so 1% of that is 235,175. This means about 2.2% of pages have Referrer-Policy: same-origin or Referrer-Policy: no-referrer. If we assume the numbers for meta are about the same, the impact is about 4.4% of pages.
But for iframe the numbers are more encouraging: same-origin didn't show up at all and no-referrer is 18 pages, or ~0.0077% of pages.
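
As a quick check of the arithmetic above, this Python snippet recomputes the percentages from the quoted counts. It uses only the single-value `same-origin` and `no-referrer` rows of the header table, as the comment does, and the variable names are just for illustration.

```python
# Figures quoted in the comment above.
TOTAL_PAGES = 23_517_510
sample_pages = TOTAL_PAGES // 100  # 1% sample, i.e. 235,175 pages

header_same_origin = 3682  # Referrer-Policy: same-origin
header_no_referrer = 1497  # Referrer-Policy: no-referrer
iframe_no_referrer = 18    # iframe referrerpolicy=no-referrer

header_pct = 100 * (header_same_origin + header_no_referrer) / sample_pages
iframe_pct = 100 * iframe_no_referrer / sample_pages

print(f"Referrer-Policy header: {header_pct:.1f}% of pages")
print(f"header plus meta (assumed equal): {2 * header_pct:.1f}% of pages")
print(f"iframe referrerpolicy: {iframe_pct:.4f}% of pages")
```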
New proposal
If we drop the "fallback" to the policy container's referrer policy (which is set by the Referrer-Policy header or a meta element), we don't change the behavior for ~99.99% of pages, but still allow pages to hide their own origin when embedding an iframe by using the referrerpolicy attribute.
Ad scripts that detect fraud will need to check for "null" in the ancestorOrigins list (either just the top value or maybe any value in the list). I think this should already be the case, because of Content-Security-Policy: sandbox (which can be used by top-level and makes the origin opaque).
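
A check of this kind might look like the sketch below. It is written in Python for illustration, operating on a plain list of strings such as a script might copy out of `location.ancestorOrigins`; the helper names `visible_top_origin` and `has_hidden_ancestor` are hypothetical. It relies on the ordering shown in the examples earlier in the thread, where the list runs from the parent to the top-level document.

```python
from typing import List, Optional


def visible_top_origin(ancestor_origins: List[str]) -> Optional[str]:
    """Return the top-level origin, or None if it is hidden or absent."""
    if not ancestor_origins:
        return None  # no ancestors: the document is itself top-level
    top = ancestor_origins[-1]  # last entry is the top-level document
    return None if top == "null" else top


def has_hidden_ancestor(ancestor_origins: List[str]) -> bool:
    """True if any ancestor was serialized as an opaque origin."""
    return "null" in ancestor_origins


print(visible_top_origin(["null", "https://a.com"]))
print(has_hidden_ancestor(["null", "https://a.com"]))
```

Whether a script should react to a hidden top-level origin only, or to any `"null"` entry, is left open in the comment above; the two helpers cover both readings.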
|
That "New proposal" sounds entirely reasonable to me. Use of If I understand correctly, this would mean that the HTTP |
|
Yes, exactly. Or, the fetch of the iframe |
I've now made this change in this PR. @theIDinside we can keep the tests that exercise |
This aligns `ancestorOrigins` exposure with referrer policy when using the `iframe referrerpolicy` attribute, so an embedder can prevent revealing its own origin to embedded documents. If an `<iframe>` uses `referrerpolicy="no-referrer"` or `same-origin` (and the parent and child are cross-origin), the parent’s origin and any same-origin ancestors are replaced with opaque origins (until reaching an ancestor that is cross-origin). Other policies continue to expose full origins. If there's no `referrerpolicy` attribute, the embedder document's referrer policy is used. This approach keeps existing behavior by default (for web compat) while addressing privacy concerns with an opt-out.

The algorithm reuses the parent's existing list of ancestor origins, avoiding synchronous cross-process lookups and ensuring a stable snapshot even if ancestors mutate their `referrerpolicy` attributes later.

Fixes #1918. Closes #2480.
(See WHATWG Working Mode: Changes for more details.)
/iframe-embed-object.html ( diff )
/infrastructure.html ( diff )
/nav-history-apis.html ( diff )