Normative: Extend JSON.parse and JSON.stringify for interaction with the source text of primitive values #3714
The rendered spec for this PR is available as a single page at https://tc39.es/ecma262/pr/3714 and as multiple pages at https://tc39.es/ecma262/pr/3714/multipage.
spec.html
Outdated
| <p>The `rawJSON` function returns an object representing raw JSON text of a string, number, boolean, or null value.</p>
| <emu-alg>
| 1. Let _jsonString_ be ? ToString(_text_).
| 1. Throw a *SyntaxError* exception if _jsonString_ is the empty String, or if either the first or last code unit of _jsonString_ is any of 0x0009 (CHARACTER TABULATION), 0x000A (LINE FEED), 0x000D (CARRIAGE RETURN), or 0x0020 (SPACE).
I noticed that we never use the structure "Throw a [type] exception if [condition]" in algorithms; it's usually "If [condition], throw a [type] exception."
We did prior to #3540, when this text was originally written, but yeah we've done away with that now.
Fixed, and added tighter assertions.
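As a rough illustration of the condition discussed in this thread, the following JavaScript sketch mirrors only the empty-string and leading/trailing whitespace rejection. The helper name `hasRejectedEdges` is hypothetical and is not part of the spec or the proposal, and the sketch does not check that the text is a valid JSON primitive.

```javascript
// Hypothetical helper (not spec text): reports whether the early
// SyntaxError condition quoted above would apply to a string.
const REJECTED_EDGE_UNITS = new Set([0x0009, 0x000a, 0x000d, 0x0020]);

function hasRejectedEdges(jsonString) {
  if (jsonString.length === 0) return true; // the empty String
  const first = jsonString.charCodeAt(0);
  const last = jsonString.charCodeAt(jsonString.length - 1);
  return REJECTED_EDGE_UNITS.has(first) || REJECTED_EDGE_UNITS.has(last);
}

console.log(hasRejectedEdges(""));        // true
console.log(hasRejectedEdges(" 1"));      // true
console.log(hasRejectedEdges("1\n"));     // true
console.log(hasRejectedEdges('"a b"'));   // false
```

Interior whitespace, as in the last call, is not covered by this condition; only the first and last code units are inspected.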
| 1. Let _internalSlotsList_ be « [[IsRawJSON]] ».
| 1. Let _obj_ be OrdinaryObjectCreate(*null*, _internalSlotsList_).
Any reason for not inlining _internalSlotsList_, like we do in most OrdinaryObjectCreate calls where we pass extra slots?
We've got examples of both, and I personally like having the name as an explicit hint of semantics for an otherwise opaque list of slot references, but am willing to inline it if that's the consensus preference among editors.
I would inline it, but no strong preference.
spec.html
Outdated
| 1. Else,
| 1. Assert: _typedValNode_ is an |ObjectLiteral| Parse Node.
| 1. Let _propertyNodes_ be PropertyDefinitionNodes of _typedValNode_.
| 1. NOTE: Because _val_ was produced from JSON text and has not been modified, all of its property keys are Strings and will be exhaustively enumerated in source text order.
Is this true? In Object.keys({ "a": 1, "0": 2 }) they are not enumerated in source text order.
Nice catch! I removed the "in source text order" part of the note, which isn't important anyway.
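The ordering point raised above can be checked directly. This short JavaScript sketch shows that integer-like keys are enumerated before other string keys, both for an object literal and for an object produced by JSON.parse; the variable names are illustrative only.

```javascript
// Integer-like keys come first (ascending), then other string keys
// in creation order, so enumeration need not match source text order.
const literalKeys = Object.keys({ "a": 1, "0": 2 });
const parsedKeys = Object.keys(JSON.parse('{"a":1,"0":2}'));

console.log(literalKeys); // [ '0', 'a' ]
console.log(parsedKeys);  // [ '0', 'a' ]
```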
…the source text of primitive values tc39/ecma262#3714
Force-pushed from d7925aa to 477c794.
spec.html
Outdated
| 1. Return _unaryExpression_.
| 1. Else,
| 1. Return _candidate_.
| 1. If _queuedChildren_ is *false* and _candidate_ is an instance of a nonterminal and _candidate_ Contains _type_ is *true*, then
What does "is an instance of a nonterminal" mean? I read that as "is a Parse Node". How can this condition ever fail?
The condition fails if _candidate_ is an instance of a terminal symbol.
Maybe such nodes shouldn't be put in _queue_ in the first place.
It's consistent with similar use near the definition of Parse Node in The Syntactic Grammar:
When a Parse Node is an instance of a nonterminal, it is also an instance of some production that has that nonterminal as its left-hand side. Moreover, it has zero or more children, one for each symbol on the production's right-hand side: each child is a Parse Node that is an instance of the corresponding symbol.
I'd prefer to keep filtering on _candidate_ rather than on its children, but we could probably drop the middle condition, leaving "If _queuedChildren_ is *false* and _candidate_ Contains _type_ is *true*". That relies on Contains never returning *true* when called on an instance of a terminal symbol, because such a Parse Node has no child nodes of its own.
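The reasoning about terminals can be sketched with a toy tree in JavaScript. This is not the spec's Parse Node representation or its Contains operation; `symbol`, `children`, and `contains` are hypothetical names chosen for illustration, and the tree is drastically simplified.

```javascript
// Toy model only: a node "contains" a symbol if any descendant is an
// instance of it. A node without children can never contain anything.
function contains(node, symbol) {
  return node.children.some(
    (child) => child.symbol === symbol || contains(child, symbol)
  );
}

const minus = { symbol: "-", children: [] }; // instance of a terminal
const literal = { symbol: "NumericLiteral", children: [] };
const unary = { symbol: "UnaryExpression", children: [minus, literal] };

console.log(contains(unary, "NumericLiteral")); // true
console.log(contains(minus, "NumericLiteral")); // false: no children to search
```

Under this model an explicit "is an instance of a nonterminal" guard is redundant, which is the simplification suggested above.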
| <emu-clause id="sec-createjsonparserecord" type="abstract operation">
| <h1>
| CreateJSONParseRecord (
| _parseNode_: a Parse Node,
This isn't just any Parse Node, it's one of a handful of kinds of nodes. Let's list them explicitly here.
JSON.parse calls CreateJSONParseRecord with the Parse Node returned from ParseJSON, which (due to arbitrary decisions about the definition of that algorithm) is a |Script|. Because of similarly arbitrary decisions about how the grammar is structured, recursive calls provide an |AssignmentExpression|. Neither of those has any obvious connection with JSON parsing, and baking them in risks introducing a spec bug if some future refactoring changes those arbitrary decisions (e.g., if ParseJSON is updated to use goal symbol |ParenthesizedExpression| or |Expression| or |AssignmentExpression|). So I'd rather keep CreateJSONParseRecord as-is and rely upon ShallowestContainedJSONValue to ensure that consistency.
spec.html
Outdated
| 1. If _candidate_ is an instance of _type_, then
| 1. NOTE: In the JSON grammar, a <code>number</code> token may represent a negative value. In ECMAScript, negation is represented as a unary operation.
| 1. If _type_ is |UnaryExpression|, then
| 1. Set _unaryExpression_ to _candidate_.
Why are we using this alias instead of just returning _candidate_ directly? In fact, _types_ need not even contain NumericLiteral at all. Or am I mistaken?
All literal nonterminals appear deeply wrapped inside a |UnaryExpression| (including non-primitive |ArrayLiteral| and |ObjectLiteral|), but only |NumericLiteral|s might need to be associated with a terminal symbol from that level (the leading - for negating an unsigned number). So while you're not exactly mistaken, a refactoring of ShallowestContainedJSONValue to return |UnaryExpression| upon encountering one would result in it only returning such nodes and distributing the discovery of the actually relevant literal nodes elsewhere, which I think would be a much worse way to specify things.
Consider the full AST for JSON.parse("[-1]"), bolding each |UnaryExpression|:
Source text: ([-1]);
- Script
- ScriptBody
- StatementList
- StatementListItem
- Statement
- ExpressionStatement
- Expression
  ;
  Source text: ([-1])
  - AssignmentExpression
  - ConditionalExpression
  - ShortCircuitExpression
  - LogicalORExpression
  - LogicalANDExpression
  - BitwiseORExpression
  - BitwiseXORExpression
  - BitwiseANDExpression
  - EqualityExpression
  - RelationalExpression
  - ShiftExpression
  - AdditiveExpression
  - MultiplicativeExpression
  - ExponentiationExpression
  - **UnaryExpression**
  - UpdateExpression
  - LeftHandSideExpression
  - NewExpression
  - MemberExpression
  - PrimaryExpression
  - CoverParenthesizedExpressionAndArrowParameterList → ParenthesizedExpression
    ( Expression )
    Source text: [-1]
    - AssignmentExpression
    - ConditionalExpression
    - ShortCircuitExpression
    - LogicalORExpression
    - LogicalANDExpression
    - BitwiseORExpression
    - BitwiseXORExpression
    - BitwiseANDExpression
    - EqualityExpression
    - RelationalExpression
    - ShiftExpression
    - AdditiveExpression
    - MultiplicativeExpression
    - ExponentiationExpression
    - **UnaryExpression**
    - UpdateExpression
    - LeftHandSideExpression
    - NewExpression
    - MemberExpression
    - PrimaryExpression
    - ArrayLiteral
      [ ElementList ]
      Source text: -1
      Elision_opt AssignmentExpression
      - ConditionalExpression
      - ShortCircuitExpression
      - LogicalORExpression
      - LogicalANDExpression
      - BitwiseORExpression
      - BitwiseXORExpression
      - BitwiseANDExpression
      - EqualityExpression
      - RelationalExpression
      - ShiftExpression
      - AdditiveExpression
      - MultiplicativeExpression
      - ExponentiationExpression
      - **UnaryExpression**
        "-" **UnaryExpression**
        Source text: 1
        - UpdateExpression
        - LeftHandSideExpression
        - NewExpression
        - MemberExpression
        - PrimaryExpression
        - Literal
        - NumericLiteral
In fact, that exercise highlighted a mistake on my part in ShallowestContainedJSONValue that I just pushed a fix for (in |UnaryExpression| → - |UnaryExpression|, we must keep holding the outer Parse Node rather than the inner one).
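The user-visible consequence of holding the outer Parse Node can be sketched in JavaScript as follows. This assumes an engine that implements the stage 3 proposal, where the reviver receives a third argument with a `source` property for primitive values; on engines without it the third argument is simply absent, so the sketch guards for that.

```javascript
const seenSources = [];
const parsed = JSON.parse("[-1]", function (key, value, context) {
  // `context` is only passed by engines implementing the proposal.
  if (typeof value === "number") {
    seenSources.push(context ? context.source : undefined);
  }
  return value;
});

console.log(parsed);      // [ -1 ]
console.log(seenSources); // expected [ '-1' ] where supported, else [ undefined ]
```

If the inner |UnaryExpression| were held instead, the reported source text would presumably lose the leading `-`, which is the mistake described above.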
spec.html
Outdated
| 1. Assert: The first code unit of _jsonString_ is an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), 0x0022 (QUOTATION MARK), or 0x002D (HYPHEN-MINUS).
| 1. Assert: The last code unit of _jsonString_ is an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), or 0x0022 (QUOTATION MARK).
Why does the reader care about this?
It's relevant to the constraint on jsonString that it must be the encoding of a JSON primitive.
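To see why those assertions follow from the text being a JSON primitive, here is a small JavaScript sketch that serializes a few primitive values and checks the first and last characters against the same sets. The sample list and the regular expressions are illustrative assumptions, not spec text.

```javascript
// First unit: lowercase letter, digit, quotation mark, or hyphen-minus.
// Last unit: lowercase letter, digit, or quotation mark.
const firstOk = /^[a-z0-9"-]/;
const lastOk = /[a-z0-9"]$/;

const samples = [null, true, false, 0, -1.5, 1e21, "hi", ""];
const results = samples.map((value) => {
  const text = JSON.stringify(value);
  return { text, ok: firstOk.test(text) && lastOk.test(text) };
});

console.log(results.every((result) => result.ok)); // true
```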
| 1. Let _elements_ be a new empty List.
| 1. Let _entries_ be a new empty List.
| 1. If _val_ is an Object, then
| 1. Let _isArray_ be ! <emu-meta suppress-effects="user-code">IsArray(_val_)</emu-meta>.
I think we can just inline this.
It conforms to the already-common pattern seen in InternalizeJSONProperty, JSON.stringify, SerializeJSONProperty, and (less relevantly) ArraySpeciesCreate and Object.prototype.toString.
| 1. For each Parse Node _propertyNode_ of _propertyNodes_, do
| 1. Let _propName_ be PropName of _propertyNode_.
| 1. If _propName_ is _P_, set _propertyDefinition_ to _propertyNode_.
| 1. Assert: _propertyDefinition_ is <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
- 1. Assert: _propertyDefinition_ is <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
+ 1. Assert: _propertyDefinition_ is a <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
I think I prefer to use "is a" whenever we're describing the type of a value. We could also go with something more verbose like
- 1. Assert: _propertyDefinition_ is <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
+ 1. Assert: _propertyDefinition_ is a |PropertyDefinition| matching <emu-grammar>PropertyDefinition : PropertyName `:` AssignmentExpression</emu-grammar>.
Do we have editorial precedent here?
The status quo has about 13 occurrences of
[something] is <emu-grammar>production</emu-grammar>
and 1 of
[alias] is an instance of the production <emu-grammar>production</emu-grammar>
| <emu-alg>
| 1. Let _elements_ be a new empty List.
| 1. If |ElementList| is present, set _elements_ to the list-concatenation of _elements_ and ArrayLiteralContentNodes of |ElementList|.
| 1. If |Elision| is present, append |Elision| to _elements_.
Elision can't be present in JSON source text. Can't we just assert it's not present instead?
edit: We're already assuming we have whole-world knowledge about the calling patterns of these AOs because in CreateJSONParseRecord we do a ! Get that would possibly fail on a read-through to Array.prototype if the input had allowed Elisions. If we weren't assuming whole-world, we'd have to guard that step on whether _contentNodes_[_I_] was an Elision.
Whole-world knowledge is appropriate in CreateJSONParseRecord where we're in the JSON section and confined to parsing the JSON grammar. But ArrayLiteralContentNodes is a general syntax-directed operation, and potentially useful outside of that context (where keeping track of elisions would be important). And even beyond that, the current approach falls out very naturally from simple mapping of the productions. Unless you want to make an argument that it's harmful and/or confusing, I much prefer the generality.
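The difference between the two contexts in this thread can be observed from JavaScript. In this sketch (variable names are illustrative), JSON text with an elision is rejected by JSON.parse, while the same shape in an ECMAScript array literal produces a hole that a general-purpose operation would need to track.

```javascript
let jsonRejectsElision = false;
try {
  JSON.parse("[1,,2]");
} catch (error) {
  jsonRejectsElision = error instanceof SyntaxError;
}

const sparse = [1, , 2]; // array literal with an elision

console.log(jsonRejectsElision); // true
console.log(sparse.length);      // 3
console.log(1 in sparse);        // false: index 1 is a hole
```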
| ): a JSON Parse Record
| </h1>
| <dl class="header">
| <dt>description</dt>
@jhnaldo Do you know why we're getting parse errors from esmeta CI pointing to (I'm assuming) this line? https://github.com/tc39/ecma262/actions/runs/21416219811/job/61665316539?pr=3714
I don't see any obvious syntax errors.
jmdyck left a comment
(In retrospect, I probably should have filed a PR against the source branch. Ah well.)
+1 all of @jmdyck's review comments.
…n with the source text of primitive values tc39#3714 (review)
…n with the source text of primitive values tc39#3749
I adopted all suggestions from @jmdyck and made updates in response to @michaelficarra feedback. Ready for further review.
This implements JSON.parse source text access, currently at stage 3. Tests have already been merged but are being further extended by tc39/test262#4682.
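For readers unfamiliar with the feature, a minimal JavaScript sketch of the JSON.stringify side follows. It assumes an engine that already ships `JSON.rawJSON` from the proposal and falls back to `null` otherwise; the variable names are illustrative.

```javascript
// 9007199254740993 is not exactly representable as a Number, so it
// cannot round-trip through an ordinary numeric property value.
const bigIntegerText = "9007199254740993";

let serialized = null;
if (typeof JSON.rawJSON === "function") {
  // rawJSON wraps the text so that stringify emits it verbatim.
  serialized = JSON.stringify({ value: JSON.rawJSON(bigIntegerText) });
}

console.log(serialized); // '{"value":9007199254740993}' where supported
```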