Add EEP for native records #81

base: master

Conversation
Thank you @bjorng and @garazdawi! The proposal is well written, and building on top of records does a beautiful job of keeping the language changes minimal. Removing the `undefined` default for fields is another great change, as well as the option of having "required fields" (which must be given on creation).
My only criticism of the proposal is that it doesn't discuss maps at all. For the last ~10 years, we have been using maps as replacements for records, and while native records aim to ease the migration from "records -> native records", there are no paths from "maps -> native records". Perhaps this is intentional and you are not expecting anyone to migrate between maps and native records, but then we have to deal with the side effect that, as native records improve, a lot of code (especially in Elixir) will be forever suboptimal.
If I am allowed to spitball a bit, I'd love if "native records" were implemented as "named maps" or "record maps" behind the scenes. The proposal would stay roughly the same, the only differences are that:
- Accessing a field in any record could use the existing map syntax and just work: `Expr#Field`
- Updating any record could use the existing map syntax and just work: `Expr#{Field1=Expr1, ..., FieldN=ExprN}`
You could also support the proposed `#_{...}` syntax and it would additionally check it is a record and not a "plain map". `is_map/1` would obviously return true for records but you could do a more precise check with `is_record`.
Regarding the key-destructive map operations, such as `maps:put/3` or `maps:remove/2`, I'd make them fail unless you "untag" the map (or you could allow them to succeed but the tag would be removed at the end, which I find too subtle).
Overall, the goal would be to effectively unify records and maps, removing the decade-old question "records or maps". This would also provide a path for Erlang and Elixir to unify behind the same constructs, so I'd love to hear your opinions.
eeps/eep-0079.md (outdated)
> If no value is provided for a field, and there is a default field
> value in the native record definition, the default value is used. If
> no value is provided for a field and there is no default field value
> then a native record creation fails with a `badrecord` error.
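A sketch of how creation might behave under the quoted rule, written in the proposed definition syntax (the `#person{}` record and its fields are invented for illustration, not taken from the EEP):

```erlang
%% Proposed native record definition syntax (EEP 79); field `name` has
%% no default, so it becomes a required field at creation time.
-record #person{name,
                age = 0}.

P1 = #person{name = "anna"},        %% ok: age takes its default, 0
P2 = #person{name = "bo", age = 7}, %% ok: both fields given
%% #person{age = 7} would fail at creation: no value for `name`
%% and no default field value.
```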
Perhaps worth adding more context to the errors, such as `{badrecord, {default_missing, key}}`. I know `format_error` can be used (and can have additional side-band information attached), but there may be a benefit in being upfront about it too?
This would also allow distinguishing it from other errors below, such as `{badrecord, not_found}`, etc.
More context is definitely better, even if it gives only one missing key out of several.
Yes please! I think more fine-grained errors would be greatly beneficial, especially for newcomers.
I've changed the error reason to {novalue,Field} (where novalue is a new error reason).
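With that change, a failed creation could presumably be caught like this (hedged sketch: the `#person{}` record is invented, and only the `{novalue, Field}` reason itself is taken from the comment):

```erlang
%% Illustrative only: assumes a record #person{name} where `name`
%% has no default value, and the new {novalue, Field} error reason.
try
    #person{}
catch
    error:{novalue, Field} ->
        io:format("missing value for field ~p~n", [Field])
end
```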
> Accessing a field in any record could use the existing map syntax and just work: `Expr#Field`

In Erlang there is no existing map syntax for accessing a single field today; it is still missing, and was never implemented, I guess because maps can have arbitrary terms as keys.
I guess it could be added for "literal" fields?
And it is really annoying to have to add the record_name when accessing fields in records today, e.g. `Rec#record_name.field`
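For comparison, this is what field access looks like in ordinary Erlang today (the `#state{}` record is a made-up example):

```erlang
-record(state, {count = 0}).

%% Tuple records: the record name must be repeated at every access site.
get_count(Rec) -> Rec#state.count.

%% Maps: no single-field access syntax exists, so you reach for
%% pattern matching (or maps:get/2) instead.
get_count_map(#{count := C}) -> C.
```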
@dgud yes! I keep forgetting that it was part of the EEP but not implemented.

> And it is really annoying to have to add the record_name when accessing fields in records today, e.g. `Rec#record_name.field`

I am assuming that, as long as you pattern match on the named record when the variable is defined, the compiler would be able to optimize "unnamed" field access and updates?
> I am assuming that, as long as you pattern match on the named record when the variable is defined, the compiler would be able to optimize "unnamed" field access and updates?
Yes, that should be possible.
See the discussion in erlang/otp#9174 on a possible `x[y]` notation - @michalmuskala suggested `M#[Field]`.
essen left a comment:
Great work!
Would it make sense to have finer grained control on who can do what? For example restrict creation to the defining module while still providing access to fields; or read-only fields outside the defining module. Probably doesn't matter for 95% of use cases I reckon.
> The following syntax allows accessing field `Field` in any record:
>
> ```erlang
> Expr#_.Field
> ```
This creates a new use case because we could have two records v1 and v2 that share some fields, and use this syntax to access the common fields. It probably needs to be highlighted.
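For instance, anonymous access would let one function serve both versions, as long as the accessed field exists in the value (hedged sketch in the proposed syntax; the two record versions are invented):

```erlang
%% Two record versions sharing the `id` and `name` fields:
-record #user_v1{id, name}.
-record #user_v2{id, name, email}.

%% Works on values of either record, since no record name is given:
get_id(User) -> User#_.id.
```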
We didn't really think about migrations from maps. Your suggestion seems reasonable. We will discuss this in the OTP team.
What a cool EEP! Thank you! I've a handful of questions
Language interop
One aspect of this which the document does not touch on that I think could be highly impactful for the BEAM ecosystem is language interop. Today each major language has a different preference for fixed key-value data structures:
- Erlang: maps and records
- Elixir: maps with a special field containing a module atom
- Gleam: records
This creates some degree of friction when calling code from other BEAM languages. If they all were to largely use native records then this friction would go away, making interop between languages a much better experience.
I'm not immediately seeing any problems for Gleam, as we use records there, but it seems like it would be more challenging for Elixir, where maps are used.
Adoption within existing OTP modules
It seems that, in an ideal world, native records would be the ubiquitous data structure once they are available. Would existing Erlang/OTP modules be updated to work with them?
Functions that expect classic records, tagged tuples, and maps could have new function clauses added to handle native records in a backwards-compatible way, unless I am mistaken. It seems that, due to not being compatible with maps or tuples, there would be very little ability to update existing functions to return native records.
Is there something we could do here? Or is the expectation that Erlang/OTP code will use different data structures depending on how old it is?
Construction syntax
Is the only difference between the native and classic record syntaxes the `#` character in the name of the definition? This seems like it will be very error-prone, and also hard for less familiar people to debug, as the definition will be accepted by the Erlang compiler but their attempts to construct the record will fail.
Thank you all!
> ### Anonymous access of native records
>
> The following syntax allows accessing field `Field` in any record:
Is there a performance difference when using this syntax compared to using the non-anonymous syntax?
Are there situations in which one cannot use the anonymous syntax?
> Is there a performance difference when using this syntax compared to using the non-anonymous syntax?

Yes, certain optimizations we are thinking about implementing will not be applied when using the anonymous syntax.

> Are there situations in which one cannot use the anonymous syntax?

No.
> 1. Creation of native records cannot be done in guards
>
> 2. `element/2` will not accept native records.
Tuple records can be constructed with field names (using the `#blah{a=1,b=2}` syntax) or without field names (using the `{blah, 1, 2}` syntax). Do native records have a field-nameless `#blah{1, 2}` construction syntax?
No.
The nameless syntax for tuple records only works because traditional records are implemented using tuples.
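That equivalence is easy to check in today's Erlang (real, existing behaviour; the `#blah{}` record is taken from the question above):

```erlang
-record(blah, {a, b}).

check() ->
    %% A classic record literal compiles to a plain tagged tuple,
    %% so the two notations denote the same term:
    true = (#blah{a = 1, b = 2} =:= {blah, 1, 2}),
    %% ...which is exactly why the nameless {blah, 1, 2} form works.
    ok.
```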
> (tuples). So, their performance characteristics should align with maps
> (with insignificant overhead for runtime validation). Additionally,
> given that native-records are more specialized versions of maps (with
> all keys being atoms), there is potential for optimizations.
Is the expectation that native records will have inferior performance to tuple records?
We don't know yet, and it also depends on exactly which operations we are talking about. For example, updating multiple elements in a small map (no more than 32 elements) is quite efficient because it can be done in one pass, and similarly matching out multiple elements is also efficient because it can be done in one pass. Accessing one element at a time is more expensive for a map than for a tuple record.
In the first implementation to be released, native records will be implemented similar to how small maps are implemented, with similar performance. We have an idea for an optimization that would make it faster than maps.
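The one-pass behaviour mentioned applies to multi-key map operations in current Erlang, e.g.:

```erlang
M0 = #{a => 1, b => 2, c => 3},

%% Updating several keys in one expression lets the VM rebuild the
%% small map in a single pass:
M1 = M0#{a := 10, b := 20},

%% Likewise, matching out several keys at once is a single pass:
#{a := A, b := B} = M1
```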
Yes. If a function returns a tuple record, all we can do is to create a new function that returns a native record.

To some extent, yes. I think that is already the case.

Agreed.
Thank you @bjorng. To confirm: all fields must have names? There are no positional fields? Can one define a record which does not have any fields?

```erlang
-record #none{}.
-record #some{value = term()}.
-type option() :: #none{} | #some{}.
```
Yes.

Yes.

We are trying to limit the scope here to try to get it into 29. Hmm, this would require some additional syntax, `'private' | 'protected' | 'public'` (borrowing from ets access types), personally
@lpil wrote:

It is a different format, not only the `#` character; it is like record creation:

```erlang
-record(foo, {a, b, c}).
%% vs.
-record #foo{a, b, c}.
```

So there is also a comma and parentheses that are different.
I don't see the concern raised that we'll now have records, native records, and maps. I know they all serve different purposes, but the differences are slight and new users definitely get confused about which to use. I don't think it should be a blocker to adding a new data structure that can help improve devex or performance when choosing the right one, but it does worry me that this can't replace records. Related to not replacing records, I think
Right, it's something that likely doesn't have to be done at runtime, but it is important to consider at least in the documentation part, as internal fields definitely shouldn't be documented. Read-only fields can be marked as such easily in the text. But this depends on what the documentation will look like, I suppose.
Update: there can now be two distinct errors when
@tsloughter wrote:

"Not replacing records" is really phrased as: not "replacing all tuple-record usage scenarios", but "replacing most tuple-record usages without having to update anything but the declaration". With that in mind, I think overloading `-record` may be more good than bad.
@RaimoNiskanen unless it outright replaces, 100%, named tuple records I think using … I take it there are a million reasons … At first I wanted … I can concede there is no good alternative to …
I do share the concern that the similar syntax is confusing, and the difference being two characters of punctuation makes it challenging to differentiate between the two when reviewing code.

```erlang
-struct #state{
    values = [] :: list(number()),
    avg = 0.0 :: float()
}.
```

Elixir does already have `defstruct`, though that construct does seem very similar to this proposal in design and purpose.
Question: Behavior when adding a field across distributed nodes

Consider this scenario. Node A (old code):

```erlang
-record #state{count = 0, name = "default"}.
State = #state{count = 5, name = "server1"}
```

Node B (new code):

```erlang
-record #state{count = 0, name = "default", version = 1}.
```

When Node A sends a

Issue: This seems to check against the definition on Node B, not the fields in the value from Node A. It's unclear whether the update would:
Question: Field renaming across nodes

Consider this scenario where a field is renamed. Node A (old code):

```erlang
-record #user{id, username, city}.
User = #user{id = 1, username = "alice", city = "Stockholm"}
```

Node B (new code, `username` renamed to `name`):

```erlang
-record #user{id, name, city}.
```

When Node A sends

Issue: Field renames appear to be breaking changes in distributed systems. The EEP states:

This means reading works purely based on the value's fields, not the definition. So:
Question: Removing a field

Consider this scenario. Node A (old code):

```erlang
-record #config{host, port, legacy_timeout}.
Config = #config{host = "localhost", port = 8080, legacy_timeout = 5000}
```

Node B (new code, `legacy_timeout` removed):

```erlang
-record #config{host, port}.
```

When Node A sends

Issue: The interaction between field access (which ignores the definition) and pattern matching (which may or may not check the definition) is unclear.
eeps/eep-0079.md (outdated)
> record.
>
> ```erlang
> is_record(Term :: dynamic(), #Module:Name) -> boolean().
> ```
I really don't like the idea of allowing free `#r` or `#m:r` as in `is_record(X, #r)`, because those fragments do not have any semantic meaning on their own, but they will have to be handled as something that may occur in any expression position. I think the rationale further below is wrong in this case: it's much worse to introduce pseudo-expressions that have no defined meaning outside a specific context. And you won't be using that syntax for the API functions `records:create(Module, Name, ...)` etc. anyway.
> I really don't like the idea of allowing free `#r` or `#m:r` as in `is_record(X, #r)`

It turns out you are not alone in not liking it. After some discussions, we have decided to abandon that syntax. Instead, `is_record/3` will be overloaded to test for a record in another module.
> but guaranteed to always work and be more efficient.
>
> TODO: What should the name of the BIF be?
Something like `is_subtype` or `is_compatible`, maybe.
I think `subtype` and `compatible` would make me think the types of the fields are the same and are also checked. Perhaps `has_record_fields`, but there are no guards starting with `has_`. I am not sure how to phrase it using `is_`.
> Note, however, that it is possible to match on the name of a
> non-exported record. Thus, if the `match_name/1` function in the
> following example is called with an instance of record `r` defined in
> `some_module`, it will succeed even if the record is not exported:
What is the motivation for this matching on non-exported records by name?
To have the possibility to use them in an API, as with `socket` for example, where you want to be able to match the record itself but not its content. Today, with opaque types, you have to wrap them in a two-tuple: `{socket, OpaqueStuff}`.
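A minimal sketch of that existing workaround in today's Erlang (the module name, type, and payload are invented for illustration, not taken from OTP):

```erlang
-module(sock).
-export([open/1, is_sock/1]).
-export_type([socket/0]).

%% The payload is opaque to callers; the {socket, _} wrapper exists only
%% so callers can recognize a socket without inspecting its contents.
-opaque socket() :: {socket, map()}.

-spec open(integer()) -> socket().
open(Fd) -> {socket, #{fd => Fd}}.

is_sock({socket, _}) -> true;
is_sock(_) -> false.
```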
Don't allow specification of types for a native record outside of its definition.
It turns out that I had failed to fully update the EEP regarding when a record definition is consulted (or not). In our internal meetings we had decided that the definition for a native record is only used when creating a record. All update and read operations will only refer to the value of the record (which includes the field names, whether it was exported at value-creation time, the name of the record, and of course all values). I've now pushed another commit to hopefully make that clearer.

If the definition is never consulted on update, how do you upgrade values to the new definition? Do you have to explicitly deconstruct & reconstruct, and keep a version of the code doing that for all possible values that are live in the system?
Yes. Upgrades must be done explicitly.
While the design was simple, we discovered that the actual implementation was far from simple, especially when more than one node is involved (or even when loading from disk native records written by a previous instance of the runtime system). Also, we are not sure that it is easy for the user to understand and handle all the implications of automatic upgrades of native records.
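An explicit upgrade would then presumably look something like this deconstruct-and-reconstruct translation (hedged sketch in the proposed syntax; the record names and fields are invented):

```erlang
-record #state_v1{count, name}.
-record #state_v2{count, name, version = 2}.

%% Explicit upgrade: must handle every value shape live in the system.
upgrade(#state_v1{count = C, name = N}) ->
    #state_v2{count = C, name = N};   %% `version` takes its default
upgrade(#state_v2{} = S) ->
    S.                                %% already current
```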
Apply Rickard's suggestions.