Normative: Extend JSON.parse and JSON.stringify for interaction with the source text of primitive values #3714

Open
gibson042 wants to merge 13 commits into tc39:main from gibson042:proposal-json-parse-with-source

@gibson042
Member

@gibson042 gibson042 commented Nov 7, 2025

This implements the JSON.parse source text access proposal, currently at stage 3. Tests have already been merged, and are being further extended by tc39/test262#4682.
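For readers new to the proposal, here is a hedged sketch of its user-facing side as we understand it: the reviver passed to JSON.parse receives a third `context` argument whose `source` property holds the raw text of a primitive value. The helper name `parseWithBigInts` is hypothetical, and the sketch assumes nothing about engine support: engines that predate the proposal call the reviver with only two arguments, so it falls back to the ordinary (possibly rounded) Number.

```javascript
// Hypothetical helper: revive integer literals as BigInts, using the raw
// source text the reviver receives via its third `context` argument.
function parseWithBigInts(text) {
  return JSON.parse(text, (key, value, context) => {
    const source = context === undefined ? undefined : context.source;
    // `source` is only provided for primitive values, and only by engines
    // that implement JSON.parse source text access.
    if (typeof value === "number" && typeof source === "string" && /^-?\d+$/.test(source)) {
      return BigInt(source); // exact, unlike the rounded Number in `value`
    }
    return value;
  });
}

const doc = parseWithBigInts('{"id": 12345678901234567890, "ratio": 0.5}');
// With source access, doc.id is the BigInt 12345678901234567890n;
// without it, doc.id is the nearest Number, which loses precision.
console.log(typeof doc.id, doc.ratio);
```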

@github-actions

github-actions bot commented Nov 7, 2025

The rendered spec for this PR is available as a single page at https://tc39.es/ecma262/pr/3714 and as multiple pages at https://tc39.es/ecma262/pr/3714/multipage .

@ljharb ljharb added the normative change, pending stage 4, and proposal labels Nov 8, 2025
spec.html Outdated
<p>The `rawJSON` function returns an object representing raw JSON text of a string, number, boolean, or null value.</p>
<emu-alg>
1. Let _jsonString_ be ? ToString(_text_).
1. Throw a *SyntaxError* exception if _jsonString_ is the empty String, or if either the first or last code unit of _jsonString_ is any of 0x0009 (CHARACTER TABULATION), 0x000A (LINE FEED), 0x000D (CARRIAGE RETURN), or 0x0020 (SPACE).
Member

I noticed that our algorithms never use the structure "Throw a … exception if …"; it's usually "If …, throw a … exception."

Member

We did prior to #3540, when this text was originally written, but yeah we've done away with that now.

Member Author

Fixed, and added tighter assertions.
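As a rough JavaScript model of the validation discussed here (the helper `checkRawJSONText` is hypothetical, not a spec operation): the text is converted with ToString, rejected with a SyntaxError if it is empty or begins or ends with one of the four JSON whitespace code units, and, per the constraint that raw JSON text must encode a string, number, boolean, or null, rejected if it encodes an object or array.

```javascript
// The four JSON whitespace code units: TAB, LF, CR, SPACE.
const JSON_WHITESPACE = new Set([0x0009, 0x000a, 0x000d, 0x0020]);

// Hypothetical model of the input checks for raw JSON text.
function checkRawJSONText(text) {
  const jsonString = `${text}`; // ToString (throws a TypeError for Symbols)
  if (
    jsonString === "" ||
    JSON_WHITESPACE.has(jsonString.charCodeAt(0)) ||
    JSON_WHITESPACE.has(jsonString.charCodeAt(jsonString.length - 1))
  ) {
    throw new SyntaxError("raw JSON text must be non-empty with no surrounding whitespace");
  }
  // Must be valid JSON text; JSON.parse throws a SyntaxError otherwise.
  const value = JSON.parse(jsonString);
  // Only primitives (string, number, boolean, null) are allowed.
  if (typeof value === "object" && value !== null) {
    throw new SyntaxError("raw JSON text must encode a primitive value");
  }
  return jsonString;
}

console.log(checkRawJSONText(1e3)); // prints 1000
```

In engines that ship the proposal, `JSON.rawJSON(" 1")` is likewise expected to throw a SyntaxError, per the whitespace step quoted above.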

Comment on lines 47301 to 47302
1. Let _internalSlotsList_ be « [[IsRawJSON]] ».
1. Let _obj_ be OrdinaryObjectCreate(*null*, _internalSlotsList_).
Member

Any reason for not inlining _internalSlotsList_, like we do in most OrdinaryObjectCreate calls where we pass extra slots?

Member Author

We've got examples of both, and I personally like having the name as an explicit hint of semantics for an otherwise opaque list of slot references, but am willing to inline it if that's the consensus preference among editors.

Member

I would inline it, but no strong preference.
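In plain JavaScript terms, an object created by OrdinaryObjectCreate(*null*, « [[IsRawJSON]] ») is roughly a null-prototype object carrying a brand that ordinary code cannot forge. The sketch below emulates the internal slot with a private WeakSet; the names `makeRawJSON` and `isRawJSON` are hypothetical, and the `rawJSON` data property and freezing reflect our understanding of the proposal rather than the two steps quoted above.

```javascript
// Emulates the [[IsRawJSON]] internal slot: membership in a private WeakSet.
const rawJSONBrand = new WeakSet();

// Hypothetical analog of the object-creation steps; assumes `jsonString`
// has already been validated as the text of a JSON primitive.
function makeRawJSON(jsonString) {
  const obj = Object.create(null); // OrdinaryObjectCreate(null, ...)
  rawJSONBrand.add(obj);           // ... with the [[IsRawJSON]] "slot"
  obj.rawJSON = jsonString;
  return Object.freeze(obj);
}

// Brand check, analogous to testing for the [[IsRawJSON]] slot.
function isRawJSON(value) {
  return typeof value === "object" && value !== null && rawJSONBrand.has(value);
}

const raw = makeRawJSON("12345678901234567890");
console.log(isRawJSON(raw), isRawJSON({ rawJSON: "1" })); // prints true false
```

Engines that implement the proposal expose this behavior directly as `JSON.rawJSON` and `JSON.isRawJSON`; the WeakSet is only a stand-in for an internal slot.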

spec.html Outdated
1. Else,
1. Assert: _typedValNode_ is an |ObjectLiteral| Parse Node.
1. Let _propertyNodes_ be PropertyDefinitionNodes of _typedValNode_.
1. NOTE: Because _val_ was produced from JSON text and has not been modified, all of its property keys are Strings and will be exhaustively enumerated in source text order.
Member

Is this true? In Object.keys({ "a": 1, "0": 2 }) they are not enumerated in source text order.

Member Author

Nice catch! I removed the "in source text order" part of the note, which isn't important anyway.
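The ordering issue is easy to reproduce: own property keys that are array indices are enumerated first, in ascending numeric order, followed by the remaining String keys in creation order (which, for JSON.parse, is source order). Enumeration is still exhaustive, which is the part of the note that matters.

```javascript
const parsed = JSON.parse('{"a": 1, "0": 2, "b": 3, "1": 4}');

// Array-index keys ("0", "1") come first, then "a" and "b" in source order.
console.log(Object.keys(parsed)); // prints [ '0', '1', 'a', 'b' ]

// Every key from the source text is still present.
console.log(Object.keys(parsed).length); // prints 4
```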

Jack-Works added a commit to engine262/engine262 that referenced this pull request Nov 24, 2025
@michaelficarra michaelficarra added the has stage 4 label and removed the pending stage 4 label Nov 24, 2025
spec.html Outdated
1. Return _unaryExpression_.
1. Else,
1. Return _candidate_.
1. If _queuedChildren_ is *false* and _candidate_ is an instance of a nonterminal and _candidate_ Contains _type_ is *true*, then
Member

What does "is an instance of a nonterminal" mean? I read that as "is a Parse Node". How can this condition ever fail?

Collaborator

The condition fails if _candidate_ is an instance of a terminal symbol.

Maybe such nodes shouldn't be put in _queue_ in the first place.

Member Author

@gibson042 gibson042 Feb 16, 2026

It's consistent with similar use near the definition of Parse Node in The Syntactic Grammar:

When a Parse Node is an instance of a nonterminal, it is also an instance of some production that has that nonterminal as its left-hand side. Moreover, it has zero or more children, one for each symbol on the production's right-hand side: each child is a Parse Node that is an instance of the corresponding symbol.

I'd prefer to continue filtering on _candidate_ rather than on its children, but we probably could drop the middle condition, leaving "If _queuedChildren_ is *false* and _candidate_ Contains _type_ is *true*", by trusting that Contains will never return true for a Parse Node that is an instance of a terminal symbol, because such a Parse Node has no child nodes of its own.
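To make that argument concrete, here is a toy model (not the spec's ShallowestContainedJSONValue; the node shape and the helpers `contains` and `shallowest` are hypothetical): a breadth-first search that only enqueues the children of candidates that Contain a wanted symbol. A terminal is modeled as a node with no children, so `contains` is always false for it and an explicit "is an instance of a nonterminal" test adds nothing.

```javascript
// Toy parse-tree nodes; a terminal (token) is a node with no children.
const node = (symbol, ...children) => ({ symbol, children });

// Like the spec's Contains: does any descendant have this symbol?
function contains(parseNode, symbol) {
  return parseNode.children.some(
    (child) => child.symbol === symbol || contains(child, symbol),
  );
}

// Breadth-first search for the shallowest node whose symbol is in `types`.
function shallowest(root, types) {
  const queue = [root];
  while (queue.length > 0) {
    const candidate = queue.shift();
    if (types.includes(candidate.symbol)) return candidate;
    // No separate "is a nonterminal" test: a terminal has no children, so
    // contains() is false for it and nothing further is enqueued.
    if (types.some((type) => contains(candidate, type))) {
      queue.push(...candidate.children);
    }
  }
  return undefined;
}

// "(1);" reduced to a few nodes, with NumericLiteral as a token.
const statement = node("ExpressionStatement",
  node("ParenthesizedExpression",
    node("("), node("Expression", node("NumericLiteral")), node(")")),
  node(";"));
console.log(shallowest(statement, ["NumericLiteral", "StringLiteral"]).symbol);
// prints NumericLiteral
```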

<emu-clause id="sec-createjsonparserecord" type="abstract operation">
<h1>
CreateJSONParseRecord (
_parseNode_: a Parse Node,
Member

This isn't just any Parse Node, it's one of a handful of kinds of nodes. Let's list them explicitly here.

Member Author

JSON.parse calls CreateJSONParseRecord with the Parse Node returned from ParseJSON, which (due to arbitrary decisions about the definition of that algorithm) is a |Script|. Similarly arbitrary decisions about how the grammar is structured mean that recursive calls provide an |AssignmentExpression|. Neither of those has any obvious connection with JSON parsing, and baking them in risks introducing a spec bug if some future refactoring changes those arbitrary decisions (e.g., if ParseJSON is updated to use goal symbol |ParenthesizedExpression|, |Expression|, or |AssignmentExpression|). So I'd rather keep CreateJSONParseRecord as-is and rely upon ShallowestContainedJSONValue to ensure that consistency.

spec.html Outdated
1. If _candidate_ is an instance of _type_, then
1. NOTE: In the JSON grammar, a <code>number</code> token may represent a negative value. In ECMAScript, negation is represented as a unary operation.
1. If _type_ is |UnaryExpression|, then
1. Set _unaryExpression_ to _candidate_.
Member

Why are we using this alias instead of just returning _candidate_ directly? In fact, _types_ need not even contain NumericLiteral at all. Or am I mistaken?

Member Author

@gibson042 gibson042 Feb 16, 2026

All literal nonterminals appear deeply wrapped inside a |UnaryExpression| (including non-primitive |ArrayLiteral| and |ObjectLiteral|), but only |NumericLiteral|s might need to be associated with a terminal symbol from that level (the leading "-" that negates an unsigned number). So while you're not exactly mistaken, a refactoring of ShallowestContainedJSONValue to return |UnaryExpression| upon encountering one would result in it only returning such nodes and distributing the discovery of the actually relevant literal nodes elsewhere, which I think would be a much worse way to specify things.

Consider the full AST for JSON.parse("[-1]"), paying particular attention to each |UnaryExpression|:

  1. ([-1]);
    1. Script
    2. ScriptBody
    3. StatementList
    4. StatementListItem
    5. Statement
    6. ExpressionStatement
    7. Expression ;
  2. ([-1])
    1. AssignmentExpression
    2. ConditionalExpression
    3. ShortCircuitExpression
    4. LogicalORExpression
    5. LogicalANDExpression
    6. BitwiseORExpression
    7. BitwiseXORExpression
    8. BitwiseANDExpression
    9. EqualityExpression
    10. RelationalExpression
    11. ShiftExpression
    12. AdditiveExpression
    13. MultiplicativeExpression
    14. ExponentiationExpression
    15. UnaryExpression
    16. UpdateExpression
    17. LeftHandSideExpression
    18. NewExpression
    19. MemberExpression
    20. PrimaryExpression
    21. CoverParenthesizedExpressionAndArrowParameterList → ParenthesizedExpression
    22. ( Expression )
  3. [-1]
    1. AssignmentExpression
    2. ConditionalExpression
    3. ShortCircuitExpression
    4. LogicalORExpression
    5. LogicalANDExpression
    6. BitwiseORExpression
    7. BitwiseXORExpression
    8. BitwiseANDExpression
    9. EqualityExpression
    10. RelationalExpression
    11. ShiftExpression
    12. AdditiveExpression
    13. MultiplicativeExpression
    14. ExponentiationExpression
    15. UnaryExpression
    16. UpdateExpression
    17. LeftHandSideExpression
    18. NewExpression
    19. MemberExpression
    20. PrimaryExpression
    21. ArrayLiteral
    22. [ ElementList ]
  4. -1
    1. Elision_opt AssignmentExpression
    2. ConditionalExpression
    3. ShortCircuitExpression
    4. LogicalORExpression
    5. LogicalANDExpression
    6. BitwiseORExpression
    7. BitwiseXORExpression
    8. BitwiseANDExpression
    9. EqualityExpression
    10. RelationalExpression
    11. ShiftExpression
    12. AdditiveExpression
    13. MultiplicativeExpression
    14. ExponentiationExpression
    15. UnaryExpression
    16. - UnaryExpression
  5. 1
    1. UpdateExpression
    2. LeftHandSideExpression
    3. NewExpression
    4. MemberExpression
    5. PrimaryExpression
    6. Literal
    7. NumericLiteral

In fact, that exercise highlighted a mistake on my part in ShallowestContainedJSONValue that I just pushed a fix for (in |UnaryExpression| → - |UnaryExpression|, we must keep holding the outer Parse Node rather than the inner one).
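The fix can be illustrated with another toy model (hypothetical node shape and helper names, covering a single-value chain only, with no arrays or objects): while descending toward the literal, remember the outer `- UnaryExpression` node and report its source text instead of the inner literal's, so that the leading "-" is included.

```javascript
// Toy nodes: nonterminals have children; terminals carry their source text.
const nt = (symbol, ...children) => ({ symbol, children });
const t = (symbol, text) => ({ symbol, children: [], text });

// Source text covered by a node: the concatenated text of its terminals.
const sourceText = (n) =>
  n.children.length === 0 ? n.text : n.children.map(sourceText).join("");

const LITERALS = ["NumericLiteral", "StringLiteral", "BooleanLiteral", "NullLiteral"];

// Descend a single chain of nodes to the literal. For a negated number,
// keep holding the OUTER "- UnaryExpression" node: the JSON number token
// "-1" corresponds to that node, not to the inner one or the literal.
function jsonValueNode(root) {
  let negation = null;
  let candidate = root;
  while (candidate !== undefined) {
    if (LITERALS.includes(candidate.symbol)) {
      return candidate.symbol === "NumericLiteral" && negation !== null ? negation : candidate;
    }
    if (candidate.symbol === "UnaryExpression" && candidate.children[0]?.text === "-") {
      negation ??= candidate; // only the first (outermost) one is kept
    }
    // Move to the next node in the chain, skipping terminals such as "-".
    candidate = candidate.children.find(
      (child) => child.children.length > 0 || LITERALS.includes(child.symbol),
    );
  }
  return undefined;
}

// "-1" as UnaryExpression → - UnaryExpression → … → NumericLiteral.
const negOne = nt("UnaryExpression",
  t("-", "-"),
  nt("UnaryExpression", nt("UpdateExpression", nt("PrimaryExpression",
    nt("Literal", t("NumericLiteral", "1"))))));
console.log(sourceText(jsonValueNode(negOne))); // prints -1
```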

spec.html Outdated
Comment on lines 47500 to 47501
1. Assert: The first code unit of _jsonString_ is an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), 0x0022 (QUOTATION MARK), or 0x002D (HYPHEN-MINUS).
1. Assert: The last code unit of _jsonString_ is an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), or 0x0022 (QUOTATION MARK).
Member

Why does the reader care about this?

Member Author

It's relevant to the constraint on jsonString that it must be the encoding of a JSON primitive.
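The two assertions amount to a cheap consistency check on the serialized text of any JSON primitive: it starts with a lowercase letter (`true`, `false`, `null`), a digit, a quotation mark, or "-", and ends with a lowercase letter, a digit, or a quotation mark. A hypothetical predicate mirroring them, checked against JSON.stringify output:

```javascript
const isLowerAsciiLetter = (c) => c >= 0x61 && c <= 0x7a; // a-z
const isAsciiDigit = (c) => c >= 0x30 && c <= 0x39;       // 0-9

// Hypothetical predicate mirroring the two assertions on _jsonString_.
function hasPrimitiveBoundaries(jsonString) {
  if (jsonString.length === 0) return false;
  const first = jsonString.charCodeAt(0);
  const last = jsonString.charCodeAt(jsonString.length - 1);
  const firstOk = isLowerAsciiLetter(first) || isAsciiDigit(first) ||
    first === 0x22 /* QUOTATION MARK */ || first === 0x2d /* HYPHEN-MINUS */;
  const lastOk = isLowerAsciiLetter(last) || isAsciiDigit(last) || last === 0x22;
  return firstOk && lastOk;
}

// Serialized primitives always pass; objects, arrays, and padded text do not.
const primitives = [true, false, null, 0, -1.5, 1e21, "text"].map((v) => JSON.stringify(v));
console.log(primitives.every(hasPrimitiveBoundaries)); // prints true
console.log(["{}", "[1]", " 1"].some(hasPrimitiveBoundaries)); // prints false
```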

1. Let _elements_ be a new empty List.
1. Let _entries_ be a new empty List.
1. If _val_ is an Object, then
1. Let _isArray_ be ! <emu-meta suppress-effects="user-code">IsArray(_val_)</emu-meta>.
Member

I think we can just inline this.

Member Author

It conforms to the already-common pattern seen in InternalizeJSONProperty, JSON.stringify, SerializeJSONProperty, and (less relevantly) ArraySpeciesCreate and Object.prototype.toString.

1. For each Parse Node _propertyNode_ of _propertyNodes_, do
1. Let _propName_ be PropName of _propertyNode_.
1. If _propName_ is _P_, set _propertyDefinition_ to _propertyNode_.
1. Assert: _propertyDefinition_ is <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
Member

Suggested change
1. Assert: _propertyDefinition_ is <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
1. Assert: _propertyDefinition_ is a <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.

I think I prefer to use "is a" whenever we're describing the type of a value. We could also go with something more verbose like

Suggested change
1. Assert: _propertyDefinition_ is <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
1. Assert: _propertyDefinition_ is a |PropertyDefinition| matching <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.

Do we have editorial precedent here?

Collaborator

The status quo has about 13 occurrences of
[something] is <emu-grammar>production</emu-grammar>
and 1 of
[alias] is an instance of the production <emu-grammar>production</emu-grammar>

<emu-alg>
1. Let _elements_ be a new empty List.
1. If |ElementList| is present, set _elements_ to the list-concatenation of _elements_ and ArrayLiteralContentNodes of |ElementList|.
1. If |Elision| is present, append |Elision| to _elements_.
Member

@michaelficarra michaelficarra Jan 28, 2026

Elision can't be present in JSON source text. Can't we just assert it's not present instead?

edit: We're already assuming we have whole-world knowledge about the calling patterns of these AOs because in CreateJSONParseRecord we do a ! Get that would possibly fail on a read-through to Array.prototype if the input had allowed Elisions. If we weren't assuming whole-world, we'd have to guard that step on whether _contentNodes_[_I_] was an Elision.

Member Author

Whole-world knowledge is appropriate in CreateJSONParseRecord where we're in the JSON section and confined to parsing the JSON grammar. But ArrayLiteralContentNodes is a general syntax-directed operation, and potentially useful outside of that context (where keeping track of elisions would be important). And even beyond that, the current approach falls out very naturally from simple mapping of the productions. Unless you want to make an argument that it's harmful and/or confusing, I much prefer the generality.
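The read-through concern raised above is observable from ordinary JavaScript: an elided element is a hole rather than an own property, so reading it falls back to Array.prototype, whereas JSON text cannot contain elisions at all. A minimal demonstration (it temporarily patches Array.prototype, which real code should not do):

```javascript
// An elision leaves a hole: index 1 is not an own property of the array.
const withHole = [1, , 3];
console.log(Object.prototype.hasOwnProperty.call(withHole, 1)); // prints false

// A read (Get) of a hole falls through to Array.prototype.
Array.prototype[1] = "inherited";
const inheritedRead = withHole[1];
console.log(inheritedRead); // prints inherited
delete Array.prototype[1]; // undo the patch

// JSON text has no elision syntax, so the same shape is rejected.
let rejected = false;
try {
  JSON.parse("[1,,3]");
} catch (error) {
  rejected = error instanceof SyntaxError;
}
console.log(rejected); // prints true
```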

): a JSON Parse Record
</h1>
<dl class="header">
<dt>description</dt>
Member

@jhnaldo Do you know why we're getting parse errors from esmeta CI pointing to (I'm assuming) this line? https://github.com/tc39/ecma262/actions/runs/21416219811/job/61665316539?pr=3714

I don't see any obvious syntax errors.

Collaborator

@jmdyck jmdyck left a comment

(In retrospect, I probably should have filed a PR against the source branch. Ah well.)

@michaelficarra
Member

+1 all of @jmdyck's review comments.

@gibson042
Member Author

I adopted all suggestions from @jmdyck and made updates in response to @michaelficarra's feedback. Ready for further review.

@gibson042 gibson042 requested a review from jmdyck February 19, 2026 22:49

Labels

has stage 4 This PR represents a proposal that has achieved stage 4, and is ready to merge. normative change Affects behavior required to correctly evaluate some ECMAScript source text proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4.

7 participants