Immediate token handling and "next token" are incompatible

The spec makes the following two statements:

https://html.spec.whatwg.org/multipage/parsing.html#tokenization

> When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the tokenization stage, and can insert additional characters into the stream. (For example, the script element can result in scripts executing and using the dynamic markup insertion APIs to insert characters into the stream being tokenized.)

https://html.spec.whatwg.org/multipage/parsing.html#next-token

> The next token is the token that is about to be processed by the tree construction dispatcher (even if the token is subsequently just ignored).

These two statements seem to be incompatible with each other. How can the tree constructor know what the "next token" is if the tokenizer is supposed to wait for the tree constructor to finish its steps?

For example, take the following steps from the [in body](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody) insertion mode.

> When the user agent is to apply the rules for the "in body" insertion mode, the user agent must handle the token as follows:
> ...
> **A start tag whose tag name is "textarea"**
> Run these steps:
> 1. Insert an HTML element for the token.
> 2. If the _next token_ is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of textarea elements are ignored as an authoring convenience.)
> 3. Switch the tokenizer to the RCDATA state.
> 4. Let the original insertion mode be the current insertion mode.
> 5. Set the frameset-ok flag to "not ok".
> 6. Switch the insertion mode to "text".

So, if the tokenizer has just emitted the start tag token, it is supposed to wait for the tree constructor to run these steps before producing the next token. How does the tree constructor know whether the next token is a `\n` character token if the tokenizer hasn't produced it yet? When "next token" appears, does that mean the tree constructor is giving the tokenizer permission to tokenize one more token?
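To make the question concrete, here is a minimal sketch of one possible reading, in which the "next token" check is deferred rather than evaluated on the spot: the tree constructor records a flag while handling the `textarea` start tag and resolves it when the following token arrives. This is only an assumption about how the two statements could be reconciled, not something the spec text confirms. `Token`, `TreeBuilder`, `skip_next_lf`, and `run` are hypothetical names, and the tokenizer is a toy that only recognizes `<textarea>`.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str  # "start_tag" or "character" (hypothetical token kinds)
    data: str  # the tag name, or a single character


@dataclass
class TreeBuilder:
    """Toy tree construction stage; every name here is hypothetical."""
    text: list = field(default_factory=list)
    tokenizer_state: str = "data"
    skip_next_lf: bool = False

    def handle(self, token: Token) -> None:
        # A pending "next token" check is resolved when that token arrives.
        if self.skip_next_lf:
            self.skip_next_lf = False
            if token.kind == "character" and token.data == "\n":
                return  # ignore the newline at the start of the textarea
        if token.kind == "start_tag" and token.data == "textarea":
            # Step 2 is recorded as a flag instead of being evaluated now.
            self.skip_next_lf = True
            # Step 3 takes effect before any further input is tokenized.
            self.tokenizer_state = "rcdata"
        elif token.kind == "character":
            self.text.append(token.data)


def run(source: str) -> TreeBuilder:
    """Toy tokenizer loop: each token is handled before more input is read."""
    builder = TreeBuilder()
    pos = 0
    while pos < len(source):
        if builder.tokenizer_state == "data" and source.startswith("<textarea>", pos):
            token = Token("start_tag", "textarea")
            pos += len("<textarea>")
        else:
            # Toy simplification: everything else is a character token.
            token = Token("character", source[pos])
            pos += 1
        builder.handle(token)
    return builder


if __name__ == "__main__":
    print(run("<textarea>\nhi").text)
```

Under this reading the tokenizer never runs ahead of the tree constructor, so the switch to the RCDATA state in step 3 still happens before the character after the start tag is tokenized. Whether this is the intended meaning of "next token" is exactly what the question above asks.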
