Description
I often get asked why, when you perform a type-test, the variable doesn't magically get that type inside the If.
Dim obj As Object = "A string"
If TypeOf obj Is String AndAlso obj.Length > 0 Then
    obj.StringStuff()
End If
There's a list of reasons why it's not that simple but I think I have a design that addresses them.
Back-compat
This is worth burning an Option statement on. When we added local type inference it would have been a breaking change, so we added Option Infer for back-compat reasons, leaving it Off on project upgrade but On for new projects.
Caveats
- Only works on non-static local variables and value (ByVal) parameters.
This avoids problems where a property would return a different object of a different type on subsequent invocations, or where a field or ByRef parameter is mutated on a different thread or even the same thread. Both the current pattern (first TryCasting the value into a local and then testing it for null) and pattern matching require this too, so it's not a detractor vs. the alternatives.
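For reference, this is roughly the pattern people write today. It's only a sketch in current VB; the module and function names are made up for illustration. Note that the value has to be copied into a strongly typed local before it can be used, which is exactly the restriction described above.

```vb
Option Strict On
Imports System

Module TryCastPattern
    ' Hypothetical helper: returns the length of value if it is a
    ' non-empty String, otherwise -1.
    Function LengthIfString(value As Object) As Integer
        ' Copy into a strongly typed local first, then test it for Nothing.
        Dim text As String = TryCast(value, String)
        If text IsNot Nothing AndAlso text.Length > 0 Then
            Return text.Length
        End If
        Return -1
    End Function
End Module
```

Calling LengthIfString("A string") takes the String path, while LengthIfString(42) falls through and returns -1.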
How does it work under the hood?
I think of it like a leaky binder. When you have constructs which have boolean conditionals there's an opportunity for a binder (context) to "leak" out either on the "true path" or the "false path" depending on the operators involved. In this context there's a sort of shadow variable with the same name as the variable being tested with the type of the type test.
So, for example, take the expression TypeOf obj Is String AndAlso obj.Length > 0 OrElse obj Is Nothing: the binder "leaks" into the right operand of the AndAlso operator, so in that context 'obj' refers to the String-typed 'obj' variable, not the original one. It doesn't leak into the right side of the OrElse because that's not on the "true path". By contrast, in the expression TypeOf obj IsNot String OrElse obj.Length = 0 the binder does leak into the right-hand side of the OrElse because TypeOf ... IsNot ... leaks on the "false path".
This is what lets guard statements work:
If TypeOf obj IsNot String Then Throw New Exception()
' obj has type 'String' here.
The "scope" of the binder is everything after the If statement (within the same block). This means that within that scope overload resolution will always treat obj as a String.
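To make the "true path" and "false path" concrete, here's roughly what you have to write today to get the same effect in each position. This is a sketch in current VB where an explicit DirectCast stands in for the leaked binder; the module and function names are hypothetical.

```vb
Option Strict On
Imports System

Module LeakPaths
    ' "True path": the right operand of AndAlso only runs when the
    ' TypeOf ... Is test succeeded, so the cast cannot fail.
    Function IsNonEmptyString(obj As Object) As Boolean
        Return TypeOf obj Is String AndAlso DirectCast(obj, String).Length > 0
    End Function

    ' "False path": the right operand of OrElse only runs when the
    ' TypeOf ... IsNot test was False, i.e. obj really is a String.
    Function IsMissingOrEmpty(obj As Object) As Boolean
        Return TypeOf obj IsNot String OrElse DirectCast(obj, String).Length = 0
    End Function

    ' Guard statement: everything after the If only runs for Strings.
    Function ShoutOrThrow(obj As Object) As String
        If TypeOf obj IsNot String Then Throw New ArgumentException("Expected a String.")
        Return DirectCast(obj, String).ToUpperInvariant()
    End Function
End Module
```

Under the proposal, each DirectCast(obj, String) above would simply be obj.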
This leaking has to apply to the short-circuiting logical operators, the ternary conditional operator, If, Do, and While statements, and maybe When clauses on exceptions. So, for example:
' This code has a bug in it, I know.
' Or maybe this should have been 'Do While TypeOf node IsNot StatementSyntax'
Do Until TypeOf node Is StatementSyntax
    node = node.Parent
Loop
' At this point, node has the type StatementSyntax.
This all happens during "initial binding"; it's not based on flow-analysis.
What about Where clauses in queries?
We can go one of two ways.
- You only get the strong typing within the Where if the expression is joined with a boolean operator or conditional, because we can't know if the Where clause actually executed the lambda, and the use of this feature should never result in exceptions.
- We could translate the Where into a Let, a Where, and then a Select. It's a bit of a stretch but we're already doing magic on this feature so...
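As a rough idea of what that second option could expand to, here is the hand-written equivalent in today's VB. This is only a sketch: the actual translation isn't specified here, and the module, function, and range variable names are assumptions of mine.

```vb
Option Strict On
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module WhereTranslation
    ' Hypothetical helper: keeps only the non-empty strings in a sequence.
    Function NonEmptyStrings(items As IEnumerable(Of Object)) As List(Of String)
        ' Let introduces the strongly typed alias, Where filters on it,
        ' and Select projects the typed value onward.
        Dim query As IEnumerable(Of String) =
            From element In items
            Let text = TryCast(element, String)
            Where text IsNot Nothing AndAlso text.Length > 0
            Select text
        Return query.ToList()
    End Function
End Module
```

Everything downstream of the Select sees String elements, which is the effect a type test in the Where would need to produce.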
Does it automagically upcast?
This doesn't happen if the type test would widen the type of the variable so:
Dim str As String = ""
If TypeOf str Is Object Then
    ' str is NOT reduced to 'Object' here.
End If
What if the same variable is tested multiple times?
The types are intersected. We actually support intersection types in generic methods today when a type parameter has multiple constraints. It's the one place in the language where you can say something is an IDisposable AND an IComparable, so we should follow all the same rules there.
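Here's a small sketch of that existing intersection behavior in today's VB. The Ticket class and the CompareThenDispose method are hypothetical, made up only to have something that implements both interfaces.

```vb
Option Strict On
Imports System

' Hypothetical type that implements both interfaces.
Class Ticket
    Implements IComparable, IDisposable

    Public ReadOnly Number As Integer
    Public IsDisposed As Boolean

    Public Sub New(ticketNumber As Integer)
        Number = ticketNumber
    End Sub

    Public Function CompareTo(obj As Object) As Integer Implements IComparable.CompareTo
        Return Number.CompareTo(DirectCast(obj, Ticket).Number)
    End Function

    Public Sub Dispose() Implements IDisposable.Dispose
        IsDisposed = True
    End Sub
End Class

Module IntersectionConstraints
    ' Inside this method, value is an IComparable AND an IDisposable,
    ' so members of both interfaces are available without casting.
    Function CompareThenDispose(Of T As {IComparable, IDisposable})(value As T, other As Object) As Integer
        Dim result As Integer = value.CompareTo(other) ' Member of IComparable.
        value.Dispose()                                ' Member of IDisposable.
        Return result
    End Function
End Module
```

A variable that has passed both a TypeOf ... Is IDisposable test and a TypeOf ... Is IComparable test would bind members the way value does here.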
What about value types?
The idea is that this feature creates strongly typed aliases to objects. So the scenario for testing for a value type necessarily requires a boxed value type on the heap. Today when you unbox a value type from the heap we immediately copy it into a local variable, so any mutation to the copy doesn't change the value on the heap. For this feature we want to preserve the idea that it's just a strongly-typed reference, not a copy, and IL lets us do this. The unbox IL instruction actually pushes a managed reference on the stack. Instead of copying the value type we can copy this reference into a "ref local" (and this would be transparent to the user), so a mutation to that value, either through say an interface method or the typed value, will be consistent. It's critical to preserve identity.
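The copy-on-unbox behavior we have today is easy to see in a small sketch. The IIncrementable interface, the Counter structure, and the function name below are all hypothetical; the point is only that the unboxed local and the boxed value drift apart, which is the identity problem a "ref local" would avoid.

```vb
Option Strict On
Imports System

Interface IIncrementable
    Sub Increment()
End Interface

Structure Counter
    Implements IIncrementable

    Public Value As Integer

    Public Sub Increment() Implements IIncrementable.Increment
        Value += 1
    End Sub
End Structure

Module UnboxCopies
    ' Returns {value of the unboxed copy, value still stored in the box}.
    Function MutateCopyAndBox() As Integer()
        Dim boxed As Object = New Counter()

        ' Unboxing into a local copies the value off the heap.
        Dim copy As Counter = DirectCast(boxed, Counter)
        copy.Increment()
        copy.Increment()

        ' Calling through the interface mutates the boxed value in place.
        DirectCast(boxed, IIncrementable).Increment()

        Return New Integer() {copy.Value, DirectCast(boxed, Counter).Value}
    End Function
End Module
```

The two numbers differ (2 for the copy, 1 for the box), so a typed alias implemented as a copy would not stay consistent with mutations made through the original Object variable.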
Are the variables immutable?
No. But here are the rules for mutation:
- Within that scope you can assign the variable a value of the same type or a more derived one, as long as the invariants at that point aren't broken. Under the hood we'd have to reassign every alias up to that point, I guess.
- You can also assign things of a wider type (anything assignable to the original variable). This does not cause an implicit narrowing conversion. Instead, from that point on it's illegal to use that variable in a manner which relies on the type guard having succeeded. That's where flow analysis comes in. So even if you re-assign an Object variable which has been promoted to a String variable with an Integer value, you can still use it like an Object. It's just that any code which used it like a String, including overload resolution, type inference, member accesses, etc., will report an error.
Dim obj As Object = ""
If TypeOf obj Is String Then
    Console.WriteLine(obj) ' Calls String overload.
    GC.KeepAlive(obj) ' Calls Object overload. No error.
    obj = 1
    GC.KeepAlive(obj) ' Calls Object overload. No error.
    Console.WriteLine(obj) ' Still calls String overload but reports an error.
End If
This way flow analysis doesn't have to feed type information back into initial binding. It sort of works on the idea that the Object alias of obj gets re-assigned, but the String alias of obj becomes unassigned. So flow analysis just tracks reads of the String alias while it is unassigned. In theory one could reassign the String alias of obj to fix this. And any usage of obj as an Object (e.g. by calling members of Object or through an implicit widening conversion) really reads from the Object alias, so it doesn't count as a read from unassigned.
The solution in this situation is either to remove the write to the variable, re-guard the code that requires obj to be String, or explicitly cast obj to Object. While all of those workarounds seem ugly, they're also the only legitimate code to write in those situations.
This idea that flow analysis reports an error rather than "downgrading" the type is super important to avoid silently changing the meaning of code with shadowed members:
Class C
    Public Shadows ToString As Integer = 5
End Class
Dim obj As Object = New C
If TypeOf obj Is C Then
    Console.WriteLine(obj.ToString) ' Calls Integer overload.
    obj = New Object
    ' Still calls Integer overload but reports an error.
    ' Doesn't silently start calling Object overload when you
    ' add the line of code above.
    Console.WriteLine(obj.ToString)
End If
What about Gotos?
The same assignment analysis applies. If the reference is reachable at a point where the alias is unassigned, an error is reported and the same solutions apply:
Dim str As Object = ""
If TypeOf str Is String Then
1:
    Console.WriteLine(str) ' Error reported.
End If
Goto 1
Does an assignment cause re-inference if a narrower type is assigned?
That would be madness. We should discuss it!
What about Select Case on type?
I've always thought of the principal function of Select Case as being to use the same "left" operand for multiple tests without repeating the name over and over. So if TypeOf is the operator, the natural syntax for Select Case would look like applying it multiple times.
Select Case TypeOf obj
    Case Is String
        ' obj has String type here.
    Case Is Integer
        ' obj has Integer type here.
End Select
Or
Select Case obj
    Case TypeOf Is String
        ' obj has String type here.
    Case TypeOf Is Integer
        ' obj has Integer type here.
End Select
The advantage of the first form is it has a little less repetition of the TypeOf keyword and reads very straightforwardly--"What's the syntax in VB for doing a Select Case on the type of an object?" Select Case TypeOf obj.
The advantage of the second form is it doesn't put Select Case into any special mode, so you can still use all the other kinds of Case clauses in the same block. I don't know how often that's actually a scenario though.
Both forms reuse a concept already in the language (TypeOf) and don't add a whole new thing (Match) for a common scenario. In a lot of ways the Case s As String design was a consolation prize to semantics like this.
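For comparison, the closest thing you can write in today's VB is the Select Case True idiom with an explicit cast in each branch. This is a sketch only; the module and function names are hypothetical.

```vb
Option Strict On
Imports System

Module SelectCaseOnType
    ' Hypothetical helper: describes obj based on its runtime type.
    Function Describe(obj As Object) As String
        Select Case True
            Case TypeOf obj Is String
                ' The type test passed, but obj is still Object, so cast again.
                Return "String of length " & DirectCast(obj, String).Length.ToString()
            Case TypeOf obj Is Integer
                Return "Integer " & DirectCast(obj, Integer).ToString()
            Case Else
                Return "Something else"
        End Select
    End Function
End Module
```

With either proposed form, the variable name would move back to (or next to) the Select Case line and the DirectCast calls would go away.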
How would this work in the IDE?
I imagine we'd use a slightly different classification to indicate that the variable is "enhanced" at that point in the program. So let's say your identifiers are black by default; in a region where the type has been re-inferred they'll be purple. Then, if you lose the enhancement somehow, they'll go back to black. Maybe if you hover over the variable the quick info will say something like "This variable has been enhanced with String type and can be used like a string here." or something.
Summary
I think this is the most "Visual Basic" feature ever! It's very "Do what I mean" and is fairly intuitive. The last time a developer asked me why, when he checks the type, the variable doesn't automatically get that type, I sat down to write a whole blog essay about all the technical reasons that won't work. For VB, as much as we can, it's nice to avoid a first-time programmer needing to read an essay from some compiler nerd about threading and overload resolution and shadowing (like what are any of those things?) to explain why their very reasonable intuition doesn't work.
And this is nothing particularly innovative or out there; this is actually how TypeScript and other languages work already.
I also like the idea of rehabilitating the very readable TypeOf operator, which I've felt has suffered a lot since the introduction of TryCast. It's like TypeOf is so self-explanatory but we have this sort of inside-baseball gotcha that "Ah-ha, FxCop will tell you that really TypeOf uses the isinst instruction which pushes a casted value on the stack and checks it for null so doing a castclass after that is really just casting twice so you shouldn't do it and instead you should use the TryCast operator and check for null for performance or FxCop and people on forums will laugh at you--THEY'RE ALL GOING TO LAUGH AT YOU!". From the same folks who brought you "Ah-ha! Lists start with 0 here because of pointer arithmetic :)"