- Abstract
- Problem
- Background
- Proposal
- Details
- Rationale
- Future work
- Alternatives considered
- Allow multiple non-owning declarations, remove the import requirement, or both
- Total number of allowed declarations (owning and non-owning)
- Don't require a modifier on the owning declarations
- Only require `extern` on the first owning declaration
- Separate require-direct-import from non-owning declarations
- Other `extern` syntaxes
- Have types with `extern` members re-export them
- Require syntactic matching for `extern library` declarations
An entity may be declared `extern` (such as `extern class Foo;`); this means that its type is only complete if the definition is directly imported. It also allows for a single declaration in a different library, which must be marked as `extern library "<owning_library>"` (such as `extern library "Bar" class Foo;`).
Also, establish a different rule of thumb for when modifier keywords are required: modifier keywords are required when, if prior optional declarations were removed, the lack of the modifier keyword would change behavior.
In the `extern` model from #3762: Merging forward declarations, multiple `extern` declarations are allowed. #3763: Matching redeclarations further evolved the `extern` keyword.
The prior `extern` model assumed that the `extern` and non-`extern` declarations of a class formed two different types, which could be merged.
As discussed on #packages-and-libraries,
this runs into an issue with code such as:
library "a";
class C {}
library "b";
extern class C;
extern fn F() -> C*;
library "c";
import library "a";
extern fn F() -> C*;
Here, the return types of `F` differ: library "b" names the `extern` declaration of `C`, while library "c" names the definition imported from library "a".
This proposal aims to address the differing return types by unifying the type of `C` regardless of whether it's `extern`. This could be done under multiple different approaches, and this proposal aims for one which enables efficient implementation strategies.
Proposals:
Discussions:
- #packages-and-libraries: `extern` type coherency
- #packages-and-libraries: When to allow/disallow redeclarations
- Open discussion 2024-05-09: Number of allowed redeclarations
- Issue #3986: Alternative naming for `has_extern` keyword
- Issue #4025: Handling of indirect access of `extern` types
- #typesystem: Will `&` have an extension point?
A given entity may have up to three declarations:
- An optional, non-owning `extern library "<owning_library>"` declaration
    - It must be in a separate library from the definition.
    - The owning library's API file must import the `extern` declaration, and must also contain a declaration.
- An optional, owning forward declaration
    - This must come before the definition. The API file is considered to be before the implementation file.
- A required, owning definition
The consequential changes to the problem example are then:
library "a";
// This proposal makes the import required.
import library "b";
// This proposal makes `extern` required here.
extern class C {}
library "b";
// This proposal makes `library "a"` required here.
extern library "a" class C;
extern fn F() -> C*;
library "c";
import library "a";
extern fn F() -> C*;
On an owning `extern` declaration, such as `extern class C {}`, there are two key effects:
- The declaration must be explicitly imported in order to be complete.
    - An "explicit import" means some import path exists where the name is available to name lookup, including `export import` and `export <name>`.
- A non-owning `extern library "<owning_library>"` declaration is allowed, but not required.
If either owning declaration has the `extern` modifier, both must have it.
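The declaration rules above can be sketched as a small checker. This is only a toy model in Python, not the toolchain's implementation: the `Decl` record, its field names, and `check_declarations` are hypothetical, and the ordering rule (forward declaration before definition) and the import requirement are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decl:
    library: str                 # library that contains this declaration
    kind: str                    # "non_owning", "forward", or "definition"
    is_extern: bool              # declaration carries the `extern` modifier
    owner: Optional[str] = None  # library named by `extern library "<owner>"`


def check_declarations(decls):
    """Returns a list of rule violations; an empty list means accepted."""
    non_owning = [d for d in decls if d.kind == "non_owning"]
    forwards = [d for d in decls if d.kind == "forward"]
    definitions = [d for d in decls if d.kind == "definition"]

    if len(definitions) != 1:
        return ["exactly one owning definition is required"]
    definition = definitions[0]

    errors = []
    if len(forwards) > 1:
        errors.append("at most one owning forward declaration")
    if len(non_owning) > 1:
        errors.append("at most one non-owning declaration")
    # If either owning declaration has `extern`, both must have it.
    if len({d.is_extern for d in forwards + definitions}) > 1:
        errors.append("`extern` must be on both owning declarations or neither")
    for d in non_owning:
        if d.library == definition.library:
            errors.append("non-owning declaration must be in a separate library")
        if d.owner != definition.library:
            errors.append("`extern library` names the wrong owning library")
        if not definition.is_extern:
            errors.append("non-owning declaration requires an `extern` owner")
    return errors
```

In this model, the revised problem example (a non-owning declaration in "b" naming "a", plus an `extern` definition in "a") produces no errors, while a forward declaration and definition that disagree on `extern` are rejected.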
In the problem example, `C` will produce the same type regardless of whether `C` names the owning or non-owning declaration. This means that both function signatures have identical types.
We do this by only producing a complete type if the owning definition of `C` is imported by name: either directly through `import library "a"`, or indirectly through a chain of `export import library "a"` and `export C;`. Otherwise, an incomplete type is used.
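As a rough illustration of the completeness rule, the following Python sketch models libraries as nodes with plain and exporting imports. It is an assumption-laden toy: `Library`, `reachable`, and `is_complete` are hypothetical names, per-name `export C;` is not modeled (only `export import`), and the non-`extern` branch reflects the behavior decided in Issue #4025.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    name: str
    imports: set = field(default_factory=set)         # `import library "x"`
    export_imports: set = field(default_factory=set)  # `export import library "x"`


def reachable(libs, start, edges):
    """Returns every library reachable from `start` by following `edges`."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(edges(libs[name]))
    return seen


def is_complete(libs, user, owner, is_extern):
    """Whether a type defined in `owner` is complete when used in `user`."""
    if user == owner:
        return True
    direct = libs[user].imports | libs[user].export_imports
    if is_extern:
        # The name must be visible: a direct import, then only export imports.
        follow = lambda lib: lib.export_imports
    else:
        # Non-extern types are complete through any chain of imports.
        follow = lambda lib: lib.imports | lib.export_imports
    return any(owner in reachable(libs, d, follow) for d in direct)
```

With "n" importing "m" and "o" importing "n", an `extern` type owned by "m" is complete in "n" but incomplete in "o"; it becomes complete in "o" if "n" uses an exporting import instead.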
This does mean that adding `extern` to an owning declaration changes the import semantics. As a consequence, it is a potentially breaking change for API consumers that didn't explicitly import the type.
In the presence of `extern library "a" class C;`, the required `import library "b"` means that all owning `extern class C` declarations are able to see the `extern library "a" class C` declaration as a name collision, which is merged. This allows the compiler to easily apply the same type to all declarations. That in turn will be used to ensure libraries which import both understand the type equality.
An entity marked as `extern` is only complete when the definition is explicitly imported. In the following, examples of indirect, non-explicit uses are given inside library "o".
library "m";
extern class C { fn Member(); }
library "n";
import library "m";
fn F() -> C;
var c: C = {};
var pc: C* = &c;
library "o";
import library "n";
// Invalid: The return type `C` of `F` is incomplete here, making the function
// signature invalid.
fn G() { F(); }
// Invalid: Accessing members requires `C` to be complete.
fn UseC() { c.Member(); }
// Valid: Taking the address of `C` doesn't require it to be complete. This is
// possible because `&` doesn't have an extension point.
var indirect_pc: auto = &c;
// Invalid: Copying `C` requires the complete type.
var copy_c: auto = c;
// Valid: Pointer-to-pointer copies are okay.
var copy_pc: auto = pc;
The above rules explicitly do not apply for non-`extern` types, as decided in Issue #4025. In other words:
library "a";
class C { fn F(); }
library "b";
import library "a";
fn G() -> C;
library "c";
import library "b";
// Valid: `C` is complete here, even though it's not in name lookup.
G().F();
Since `extern library "a" class C;` must be imported by the owning library, we now allow uses of the imported name prior to its declaration within the same file. This is a divergence from #3762. It means the following now works:
library "extern";
extern library "use_extern" class MyType;
library "use_extern";
import library "extern";
// Uses the `extern library` declaration.
fn Foo(val: MyType*);
extern class MyType {
fn Bar[addr self: Self*]() { Foo(self); }
}
Previously, in #3762, a non-owning `private extern` was valid to declare something as extern without exposing the name. In this proposal, that would be a non-owning `private extern library "<owning_library>"` for an owning public `extern` declaration. However, rather than supporting this version of the syntax, it will instead be invalid because the name would never be visible to the owning library. Instead, visibility must match between an `extern library "<owning_library>"` declaration and the owning `extern` declaration.
Note, because an owning `extern` declaration can be used independently of `extern library "<owning_library>"`, an owning `private extern` declaration is valid in an API file. It has no special behavior, and is merged as normal.
We should offer some validation that the library in `extern library` is correct. When the owning library is incorrect, it's very likely to be detected in two cases:
- A compile-time error when the owning library imports the non-owning library, when the owning declaration is evaluated.
- A link-time error as a fallback.
Other cases, such as when both libraries are independently imported, may or may not be caught, dependent upon the cost of validation.
The non-owned `extern library` declarations will only use semantic matching for redeclarations, not syntactic matching. Details of syntactic matching laid out in #3763 will only apply to owned declarations in the same library, which may include owned `extern` declarations.
Versus proposal #3762, the `extern` feature is essentially rewritten. No part of the `extern` design from #3762 should be assumed to still apply.
- Software and language evolution
    - Unifying the type of `extern` entities addresses a type coherency issue.
    - The `extern` behavior of requiring an explicit import is intended to assist library authors in carefully managing the dependencies on their API.
- Fast and scalable development
    - Requiring the non-owning `extern library` declaration be imported by the owning library should improve compiler performance.
This proposal makes a trade-off with Interoperability with and migration from existing C++ code. The restriction of a unique `extern` declaration is expected to require additional work in migration, because C++ `extern` declarations will need to be consolidated. This is currently counter-balanced by the trade-offs involved, although it may result in a reevaluation of that aspect of this proposal.
We've only loosely discussed template interactions with `extern`. Right now, what we expect is that when a template declaration uses an `extern` type, the instantiation still occurs in the calling file. Thus, the `extern` type's name would need to be imported in both the file declaring the template, and the file calling the template.
When the template is in the same package as the `extern` type, it could re-export it. However, we don't support re-exporting names cross-package, and something like `let template ExternType:! auto = OwningPackage.ExternType;` would not actually forward the completeness of `ExternType`.
This is expected to be inconvenient, but it may be okay if `extern` sees limited use. It may also be that the template model ends up different from expected.
We limit to one non-owning `extern library` declaration. Continuing to allow multiple `extern library` declarations (the previous state) is feasible. Similarly, we could drop the requirement that the owning `extern` declaration import the non-owning `extern library` declaration; this could be done with or without multiple non-owning `extern library` declarations. For this set of alternatives, the issues which would arise are similar.
In the compiler, we want to be able to determine that two types are equal through a unique identifier, such as a 32-bit integer. When one declaration sees another directly, as through an import, we identify the redeclaration by name, and reuse the unique identifier. This deduplication can occur once per declaration. Indirect imports can continue to use the unique identifier.
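The identifier reuse described above can be sketched as follows. This is a hypothetical Python model, not the toolchain's data structures: `TypeIds` and `declare` are illustrative names, and `visible` stands in for whatever names the library's direct imports contribute to name lookup.

```python
class TypeIds:
    """Toy allocator handing out small integer ids for declared types."""

    def __init__(self):
        self._next = 0

    def fresh(self):
        # Ids are meant to fit in a 32-bit integer.
        assert self._next < 2**32
        self._next += 1
        return self._next - 1


def declare(ids, visible, name):
    """Declares `name` where direct imports expose `visible` (name -> id).

    A visible declaration with the same name is treated as a redeclaration,
    so its id is reused; otherwise a fresh id is allocated.
    """
    if name in visible:
        return visible[name]
    return ids.fresh()


ids = TypeIds()
# Library "b": the non-owning declaration, with nothing imported.
b_c = declare(ids, {}, "C")
# Library "a" imports "b", so its owning declaration sees and reuses b's id.
a_c = declare(ids, {"C": b_c}, "C")
# A declaration that never sees the other one gets an unrelated id.
unseen_c = declare(ids, {}, "C")
```

Type equality then reduces to comparing integers: `a_c == b_c` holds, while `unseen_c` differs even though the name is the same, which is the case the import requirement is meant to rule out.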
We could instead support unifying declarations that did not see each other. However, this would require canonicalizing all types by name instead of by unique identifier. For example, consider:
package Other library "type";
extern class MyType {
fn Print();
}
package Other library "use_type";
import library "type";
fn Make() -> MyType*;
package Other library "extern";
extern library "type" class MyType;
package Other library "use_extern";
import library "extern";
fn Print(val: MyType*);
library "merge";
import Other library "use_type";
import Other library "use_extern";
Other.Print(Other.Make());
Here, the "merge" library doesn't see either declaration of `MyType` directly. However, `Print(Make())` requires that both declarations of `MyType` be determined as equivalent. This particular indirect use also means that the names will not have been added to name lookup, so there is no reason for the two declarations to be associated by name.
In order to merge these declarations, we would need to identify that fully qualified names and other structural details are equivalent when the type is used (including non-explicit uses, such as interface lookup). We could achieve this, for example, by having a name lookup table for in-use types, managed per library. Each library would also need to validate that declarations were semantically equivalent, versus the current approach validating as part of the redeclaration. The per-library approach is expected to significantly increase the amount of work done as part of semantic analysis.
We may end up wanting to do similar work in order to improve diagnostics for invalid cases where the non-owning `extern library` is not correctly declared and imported. However, additional work building good diagnostics for already-identified invalid code is less of a concern than additional work on fully valid code.
In order to maintain a high-performance compiler, we are taking a restrictive approach that makes it simpler to associate type information.
A few options were considered regarding the number of allowed declarations.
We limit to two owning declarations: the optional forward declaration, and required definition. The need to provide interface implementations (for example, `impl MyType as Add`) is considered to constrain this choice.
In this category, alternatives considered were:
- Do not restrict the number of declarations
- Allow up to two declarations total
- Allow up to four declarations total
Details for why each alternative was declined are below.
We could leave the number of forward declarations unrestricted, allowing an arbitrary number, possibly also after the definition. This would be consistent with C++.
One thing to consider here is modifier keyword behavior. If we require modifier keywords to match across all declarations, that could become a maintenance burden for developers. If we don't, it makes the meaning of a given forward declaration more ambiguous.
This option is declined due to the lack of clear benefit.
Under this option, we would only allow one forward declaration, treating the non-owning `extern library` declaration as a forward declaration. This would mean two declarations overall, instead of three.
For this, the main concern was interactions between file placement of the definition, and file placement of interface implementations. Interface implementations must generally be in API files in order to be seen by other libraries.
For example:
library "i";
interface I {}
library "e";
import library "i";
extern library "o" class C;
extern library "o" impl C as I;
library "o";
import library "e";
extern class C { }
extern impl C as I;
impl library "o";
extern impl C as I { }
If the definition is required to be in the API file in order to allow the interface implementations in the API file, the API file would need to import libraries required to construct the definition. That could create issues for separation of build dependencies, and could also make it more difficult to unravel some dependency cycles between libraries.
If the definition was allowed to be in the implementation file even when there were interface implementations in the API file, the ambiguity of seeing a non-owning `extern library` declaration and being unsure of whether this was the owning library could have negative consequences for evaluation of interface constraints.
The purpose of allowing a forward declaration when there is a non-owning `extern` declaration is to make it clear for interface implementations that they exist in the owning library, while processing the API file.
The four declarations would be:
- Non-owning `extern library` declaration
- Forward declaration in API file
- Forward declaration in implementation file
- Definition
The number of forward declarations allowed is consistent with the current state from #3762.
This would allow a definition in the implementation file to be preceded by a forward declaration in the same file for clarity, even when a forward declaration is already pulled in from the API file.
If we're allowing declarations from another file (including the non-owning `extern library` declaration) to be used before an entity is declared in the same file, the motivating factor for allowing a repeat forward declaration in an implementation file is removed. Previously, that was required for an entity to be referenced prior to its definition.
In discussion of this option, it was considered unclear why we would allow two forward declarations, but not allow even more. The more popular choice seemed to be not restricting, which was also declined.
Instead of requiring an `extern` modifier on owning declarations, we could infer from the presence of a non-owning `extern library` declaration.
We had declined allowing a definition to control whether `extern library` was allowed in discussion of #3762, although this is not directly mentioned in the proposal. At the time, it was dropped because the owning library didn't need to include `extern` declarations, and so having the definition opt-in to allowing `extern` was viewed as low benefit. However, now that the owning library must import the `extern` declaration, there is a tighter association and so we reevaluated.
The `extern` modifier offers a benefit for being able to verify the association between non-owning and owning declarations, and offers additional parity in modifiers. It also makes it easy for a tool to know if it's missing a declaration.
At present, we require `extern` on all owning declarations. We could instead only require `extern` on the first owning declaration and, if there's a separate forward declaration and definition, infer it for the definition. For example:
// `extern` on the forward declaration.
extern class C;
// Infer `extern` for the definition.
class C {}
The decision to require `extern` on all owning declarations is based on wanting the forward declaration to be optional. A rule of thumb was discussed wherein if a forward declaration could be removed without breaking the definition (as defined by it being in the same lexical scope), keywords should be duplicated to the definition. This is not proposed as a rule because it's not clear whether we'll generally follow it, but it's why this particular choice is taken.
At present, an `extern` modifier on an owning declaration serves two purposes:
- Indicates that a non-owning `extern library` declaration can exist.
- Indicates the declaration must be directly imported in order to be complete.
This means that:
- The presence of `extern` on an owning declaration cannot be used to determine whether a non-owning declaration exists.
    - Because the location of a non-owning declaration isn't explicit in the owning code, this may lead to a developer failing to find the non-owning declaration and misunderstanding that as the non-existence of a non-owning declaration.
- Libraries which happen to be imported by the owning library may freely add or remove non-owning `extern library` declarations without modifying the owning library.
We could give distinct syntax to the two purposes, so that they could be managed separately. The preference at present is to use a single syntax for both purposes, rather than emphasizing control or correspondence.
Issue #3986 discussed other syntaxes for `extern` + `extern library`. These were mainly `has_extern`/`is_extern`/`externed` + `extern`.
Breaking down `extern`, there are two features which could have been provided separately:
1. Declaring an entity has a forward declaration in a separate library.
    - Also, declaring that forward declaration in a separate library: `extern library "<owning_library>"`.
2. Declaring an entity must be imported directly.
Although (1) must depend on (2), a different design could provide (2) without making (1) possible, for example with different keywords to differentiate between intended usage (`has_extern class C;` meaning (1) and (2), `must_import` meaning (2) only). However, the `extern` keyword approach means developers have all or nothing.
Considering that, the trade-offs are viewed as:
- The primary motivation is to provide feature (1).
- Leads wanted a syntax on the owning declaration that states something positive about the owning declaration itself, rather than expressing that other declarations exist, which suggests that the syntax on the owning declaration should provide feature (2).
- Leads consider it valuable, though secondary, to support (2) separate from (1), and find it acceptable to make (1) optional to achieve this (in other words, making the `extern library "<owning_library>"` declaration optional).
    - It's okay that `extern library "<owning_library>"` can be added and removed from imported libraries without modifying the owning library.
    - If a developer considers it important to disambiguate the intended use of a declaration `extern class C;` and whether there should be a declaration in a separate library, they can add comments.
- `extern` seemed like an acceptable name for this approach, and alternative names seemed significantly less good.
    - Using `extern` for both features still only creates one new keyword, versus multi-keyword approaches.
- Adding the owning library with `extern library "<owning_library>"` will hopefully improve diagnostics and human understandability of the code.
    - It is very verbose, but this verbosity goes on the forward declaration in the non-owning library. When it's read, which will hopefully be less often than the actual declaration, it will provide the reader directions to find the actual declaration.
    - If in practice we find the verbosity becomes a significant issue, we can revisit syntaxes to address that specifically. For example, if we have significant repetition, we might consider a grouping structure such as `extern library "..." { <many forward declarations> }`.
We expect there will be types that have `extern` members; these types are only truly complete if their members are complete.
We discussed having such types automatically re-export the `extern` members, possibly requiring the types to also be `extern` in order to be allowed to have `extern` members. For example:
library "a";
extern library "b" class A;
library "b";
import library "a";
extern class A {}
// B re-exports A so that it's complete on use.
class B { var a: A; }
library "c";
import library "b";
// Importing this function declaration gets B, which again, re-exports A so that
// it's complete on use.
fn F() -> B { ... }
library "d";
// This import loads the incomplete name for A.
import library "a";
// This import loads F, which loads B, which loads the definition of A.
import library "c";
// Because of the import behaviors, this is valid.
var a: A;
We consider this action-at-a-distance. Type coherency means the `A` member of `B` is the same as the `A` in name lookup; we could make them behave slightly differently, but then we get into provenance tracking of type information. Various forms of this have been discussed as part of the `extern` design, and it's something we've decided to avoid.
Although it's more inconvenient, we will require `A` to be deliberately imported in order for `B` to be complete.
We will not require syntactic matching for `extern library` declarations, but we could.
When a redeclaration is in the same library, we've designed name lookup in a way such that syntactic matching is effectively a superset of semantic matching. However, that relies on poisoning entries in name lookup, with later redeclarations seeing identical name lookup data. Because different libraries have different name lookup data, syntactic matching is not a superset of semantic matching cross-library. We address this schism by only requiring semantic matching.
Semantic matching will include parameter names. The difference is primarily in whether different ways of producing the same type information are considered invalid or not.
For example:
library "a";
class A {}
namespace NS;
extern library "c" fn NS.F() -> A;
library "b";
namespace NS;
class A {}
library "c";
import library "a";
import library "b";
extern fn NS.F() -> NS.A {}
Semantically, `NS.F` in libraries "a" and "c" are identical. Syntactically, they differ because of `NS.A` in "c". Writing `A` in "c" is invalid because it would use `NS.A` from "b". But in "a", there is nothing to make the declaration invalid: it would only be invalid after completing cross-library compilation.
However, we could also have code such as:
library "d";
class D {}
namespace NS;
extern library "e" fn NS.G() -> D;
library "e";
namespace NS;
alias NS.D = D;
extern fn NS.G() -> D {}
Here, the semantics and syntax match, but this would be invalid in a normal redeclaration due to the different name lookup result for `D`.
This additionally gets into a different statement made in #3763 to justify syntactic matching: "The intention is that whenever the syntax matches, the semantics must also match." Due to the differences in name lookup, syntax matching does not mean semantics must match; instead of `alias NS.D = D;`, that could have been `alias NS.D = i32;` and the syntax would have still matched.
This only works in a library because "...we persist syntactic information from the API file to implementation files." We cannot persist syntactic information cross-library, across imports.
Due to the differences in the guarantees that syntactic matching provides for owned declarations versus non-owned declarations, we will not enforce syntactic matching on the non-owned `extern library` declarations.