dealing with nulls and zero #2532

llfletch · 2019-05-16T08:43:39Z

llfletch
May 16, 2019

There seems to be a problem with how this language deals with nulls and zeros.
I will try to clear this up by explaining what they mean and how they should be used.
Null basically means not applicable (N/A). If we have a field for paint cans, a null value would represent that we do not carry paint cans. So how many paint cans do we have? We have zero paint cans (we don't carry them). Whenever we have a null value, it means we have zero. A zero, however, does not mean the value is null. We could sell paint cans if we had them, but there aren't any at the moment.
Logical values are special and they potentially have one of four values: null, zero, true, and false. If we were to ask if a certain word is usually a noun, the result could be stored in a logical value. If we said "true" then we believe that the word is usually a noun, if we said "false", then we believe that it is usually not a noun. "zero" would mean that we haven't decided and "null" would mean that the question wasn't relevant (and by definition that we haven't decided, since nulls are zeros also).
All variables can be converted to truth values. All variables potentially have a "zero" value and a "true" value. For numbers, positive numbers are "true", negative numbers are "false", and 0 is "zero". The "zero" value means that there is no valid data and the "true" value means there is valid data. For strings, empty strings and null strings should have a "zero" truth value; otherwise they are "true".
Zero is what most variables should be initialized to.

Now that I have outlined the theory of zero, why is it relevant?
I will focus on the ? operator. Let's look at the following statement:

if (Employee?.Name == 0)

Obviously, if there is valid data for Employee.Name, the statement should be false.
If Employee.Name is "" (empty string) it should be true.
What if Employee is null?
A null value is a zero as is an empty string, so the statement should be true in that case.
Dealing with nulls and zeros appropriately should be able to fix some of the main problems with the ? operator.

I would like to get feedback on what other problems this theory might address.

YairHalberstadt · 2019-05-16T08:50:10Z

YairHalberstadt
May 16, 2019
Collaborator

The meanings of null, zero, the empty string, etc. are all highly dependent on context.

As such, unlike C++ and JS, C# requires you to be explicit about how you want to treat them.

Given the sheer scale of the disaster that type coercion is in JS, this is definitely the right decision. Consider the fact that the == operator is now essentially banned from many JS codebases, and all usages are replaced with ===, which has a much stricter definition of equality.

0 replies

llfletch · 2019-05-16T09:00:05Z

llfletch
May 16, 2019
Author

Would there be a way to indicate that you want it to act this way, but have it act normally otherwise?

0 replies

YairHalberstadt · 2019-05-16T09:01:53Z

YairHalberstadt
May 16, 2019
Collaborator

Yes

if(string.IsNullOrEmpty(Employee?.Name))
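
A minimal sketch of how that check behaves in the three cases raised in the original post. The `Employee` class and the `HasNoName` helper are hypothetical and only here for illustration; the thread never defines them. Note that the original `Employee?.Name == 0` would not compile, because C# has no `==` between a string and an int.

```csharp
using System;

// Hypothetical type for illustration; the thread never defines Employee.
public class Employee
{
    public string Name { get; set; }
}

public static class NameCheck
{
    // True when there is no employee, no name, or an empty name.
    public static bool HasNoName(Employee employee)
    {
        // employee?.Name evaluates to null when employee is null,
        // so a missing employee and a missing name take the same path.
        return string.IsNullOrEmpty(employee?.Name);
    }
}
```

The point of the explicit call is that the reader can see exactly which values are being lumped together, and only at this call site.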

0 replies

llfletch · 2019-05-16T09:15:50Z

llfletch
May 16, 2019
Author

You say that the definitions of null, zero, and empty strings are highly dependent on the context. It seems to me that people should "sit down" and figure out exactly what they should mean so that there can be some sort of standard. Having things like that being "highly dependent on the context" is unacceptable. It basically means that we have no idea what it means except by how it happens to be used.

0 replies

YairHalberstadt · 2019-05-16T09:24:15Z

YairHalberstadt
May 16, 2019
Collaborator

We know exactly what an empty string means. It means that a string contains no characters.

What's highly dependent on context is what you do with an empty string.

0 replies

llfletch · 2019-05-16T09:26:13Z

llfletch
May 16, 2019
Author

Then we need to figure out how they should be used.

0 replies

YairHalberstadt · 2019-05-16T09:27:39Z

YairHalberstadt
May 16, 2019
Collaborator

They will be used differently depending on your application, and what the string represents.

0 replies

YairHalberstadt · 2019-05-16T09:29:48Z

YairHalberstadt
May 16, 2019
Collaborator

A string can represent a name, a collection of characters, a piece of text, a byte array etc. All of these cases will need to deal with empty strings differently.

An int can represent an age, the number of items in a collection, an Id, an offset, a port, or a thousand other things. Every single one of those scenarios treats 0 differently.

0 replies

llfletch · 2019-05-16T09:34:37Z

llfletch
May 16, 2019
Author

And yet zero should mean the same thing in almost every case. In almost every case it should be what the value is initialized to and it should mean that there is no valid data.

0 replies

llfletch · 2019-05-16T09:47:05Z

llfletch
May 16, 2019
Author

One can always say things depend on the situation, but there is value in finding commonalities and underlying patterns.

0 replies

YairHalberstadt · 2019-05-16T09:52:02Z

YairHalberstadt
May 16, 2019
Collaborator

Is 0 a valid age?

Is 0 a valid network port?

Is 0 a valid price?

Is 0 a valid weight?

The answer to every single one of those questions is different:

Yes.
Yes but it's special cased.
In some systems yes, in others no, depending on how you represent a free item.
Not if you're measuring something physical, but possibly yes if you're rounding, or measuring something non-physical.
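
To make those four answers concrete, here is a rough sketch with one validation rule per domain. Every name in it (`ZeroRules`, the method names, the `allowFreeItems` parameter) is made up for illustration, and the rules themselves are assumptions about one plausible system, not a standard.

```csharp
using System;

public static class ZeroRules
{
    // A newborn is 0 years old, so 0 is an ordinary age.
    public static bool IsValidAge(int years) => years >= 0;

    // Port 0 is accepted when binding (it asks the OS to choose a port),
    // but it is not a port you can connect to.
    public static bool IsValidBindPort(int port) => port >= 0 && port <= 65535;
    public static bool IsValidConnectPort(int port) => port >= 1 && port <= 65535;

    // Whether a free item is priced at 0 is a policy choice, so it is a parameter here.
    public static bool IsValidPrice(decimal price, bool allowFreeItems) =>
        allowFreeItems ? price >= 0m : price > 0m;

    // A physical object cannot weigh nothing.
    public static bool IsValidPhysicalWeight(double kilograms) => kilograms > 0.0;
}
```

Four ints-or-similar, four different treatments of 0, and none of them could be derived from the type alone.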

0 replies

yaakov-h · 2019-05-16T10:11:13Z

yaakov-h
May 16, 2019

I'm gonna re-post something I put in another GitHub Issue here:

null is an abomination that should never have existed.

null is a lie that people ignore.

null is the computer saying "hey I have that data you want, it's at memory location zero" and all code checking for it saying "yeah, right, I don't believe you, if I go looking at memory location zero I will get my head bitten off."

Because, if it does, it will hit the first page of memory, which is reserved by the operating system, and trigger a page fault. In the CLR, this becomes a NullReferenceException, and in other runtimes you can just crash.

When it comes to most data structures, null is a legitimate nothing. There is nothing there. Not an empty thing, a nothing.

If we have a field for paint cans, a null value would represent that we do not carry paint cans.

If we have a field for paint cans, an empty array or empty Enumerable should represent that we do not carry paint cans. null doesn't mean that we have paint cans, or that we don't have paint cans, rather we have only laid a trap for the caller.

If it's a field for a singular paint can, then null could be indicative of no paint can, and this is its generally accepted use. This does not mean zero.

A zero, however does not mean the value is null. We could sell paint cans if we had them, but there aren't any at the moment.

So if we have none, it could be zero, or it could be null, and if it's null then there are zero, but if there are zero it might not be null?

Using two different values interchangeably to mean the same thing is a 'fun' vector for bugs.

Logical values are special and they potentially have one of four values: null, zero, true, and false. If we were to ask if a certain word is usually a noun, the result could be stored in a logical value. If we said "true" then we believe that the word is usually a noun, if we said "false", then we believe that it is usually not a noun. "zero" would mean that we haven't decided and "null" would mean that the question wasn't relevant (and by definition that we haven't decided, since nulls are zeros also).

This sounds like you need an enum. There is no current type in the BCL that can reflect these four states.
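
As a sketch of what such an enum might look like (the type name, member names, and `Describe` helper are all invented for this example):

```csharp
using System;

public enum NounJudgement
{
    Undecided = 0,   // also default(NounJudgement): "we haven't decided"
    NotApplicable,   // the question isn't relevant for this word
    UsuallyNoun,
    UsuallyNotNoun
}

public static class NounJudgementExtensions
{
    // Spells out each state so no caller has to guess what a value means.
    public static string Describe(this NounJudgement judgement)
    {
        switch (judgement)
        {
            case NounJudgement.Undecided: return "not decided yet";
            case NounJudgement.NotApplicable: return "question not relevant";
            case NounJudgement.UsuallyNoun: return "usually a noun";
            case NounJudgement.UsuallyNotNoun: return "usually not a noun";
            default: throw new ArgumentOutOfRangeException(nameof(judgement));
        }
    }
}
```

Each of the four states gets its own name, so "undecided" and "not applicable" never have to share a representation.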

This also reminds me of the tri-state bool.

All variables can be converted to truth values. All variables potentially have a "zero" value and a "true" value. For numbers, positive numbers are "true", negative numbers are "false", and 0 is "zero". The "zero" value means that there is no valid data and the "true" value means there is valid data.

So what are negative numbers then, if we've already used "zero" for invalid and "true" for valid?

Assigning roles like this may make sense in some limited domains, but this isn't something you can generalize to the language level.

For strings, empty strings and null strings, should have a "zero" truth value, otherwise they are "true".

"null strings" are not a thing. null has no type. If it's null then it's not a string; if it's a string then it's not null.

A lot of the standard library treats null as empty string when performing concatenation and formatting operations, but this does not mean that they are the same thing.
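
A small sketch of that difference, using two throwaway helpers (`Concat` and `SameValue` are names made up for this example):

```csharp
using System;

public static class NullVersusEmpty
{
    // Concatenation treats a null operand as an empty string.
    public static string Concat(string left, string right) => left + right;

    // Equality does not: null and "" are different values.
    public static bool SameValue(string left, string right) => left == right;
}
```

So `Concat("a", null)` gives back `"a"`, while `SameValue(null, "")` is false.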

Zero is what most variables should be initialized to.

All variables are initialized to zero by default, and for reference types, that's a zero pointer, also known as null. An empty string is not a "zero".
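
For illustration, a sketch of what those defaults look like on a freshly created object. The `Defaults` class and its field names are hypothetical:

```csharp
using System;

public class Defaults
{
    // Fields are zero-initialized by the runtime when the object is created.
    public int Count;     // starts as 0
    public bool Flag;     // starts as false
    public string Text;   // reference type: starts as null, not ""
    public int[] Items;   // reference type: starts as null, not an empty array
}
```

After `new Defaults()`, `Text` is null; getting an empty string or an empty array there takes an explicit assignment.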

0 replies

HaloFour · 2019-05-16T11:04:26Z

HaloFour
May 16, 2019

@llfletch

That C# makes you very explicitly deal with the differences in data types is not a problem. It is very intentional and based on the fact that languages that permit such implicit coercion and "truthiness" are loaded with bug vectors.

null is not 0 is not an empty string, nor should any of those things be confused with one another. C# is very intentionally trying to not be JavaScript.

0 replies

llfletch · 2019-05-16T17:18:47Z

llfletch
May 16, 2019
Author

Maybe the solution would be to allow a ~= operator that could make such coercion possible without sacrificing the type strictness of the nominal case.

0 replies

DavidArno · 2019-05-16T17:31:24Z

DavidArno
May 16, 2019

@llfletch,

As others have said, what you are proposing leads to some of the worst bugs in JavaScript, leading many to ban the use of == in that language. Are you therefore able to supply a real-world example of where bypassing the protection from these bugs that C# offers us would be useful?

0 replies

HaloFour · 2019-05-16T20:19:44Z

HaloFour
May 16, 2019

@llfletch

Why is it so bad to have a new concept that there is either meaningful data in a variable or there is not meaningful data in a variable

We already have that concept. We can compare a variable to null or to 0 or to default or to whatever you consider to be not "meaningful data". The language doesn't have or need a way to do this across disparate data types or to consider 0 to be equivalent to null, "", [] or any other combination of values.

0 replies

CyrusNajmabadi · 2019-05-16T20:25:04Z

CyrusNajmabadi
May 16, 2019
Collaborator

Why is it so bad to have a new concept that there is either meaningful data in a variable or there is not meaningful data in a variable (whatever the variable may be).

How do you define what is meaningful? As i showed above it's domain specific. For some domains a string isn't meaningful if it's full of whitespace, in others it would be.

Is it not possible to make such a concept?

Not really no. I recommend you try as it might be very informative.

0 replies

llfletch · 2019-05-16T20:28:13Z

llfletch
May 16, 2019
Author

We do not have that concept. Remember, null, 0 and "" are different and distinct concepts. I disagree, the language does need such a concept and badly because if we don't have this concept, we end up compartmentalizing everything and we end up not being able to generalize to a higher level. By not being able to generalize to higher level concepts, we end up being stuck at a lower level of detail. One of the main problems with computing is having to do everything at a low level of detail and not being able to handle higher level abstractions. This slows down software development.

0 replies

HaloFour · 2019-05-16T20:34:56Z

HaloFour
May 16, 2019

Remember, null, 0 and "" are different and distinct concepts.

Yes, they are, and they should be.

If you want to treat them as equivalent, write a function to do so.

I disagree, the language does need such a concept and badly because if we don't have this concept, we end up compartmentalizing everything and we end up not being able to generalize to a higher level.

I disagree. Generalizing this concept and across data types is one of the leading causes of bugs in the languages that allow it. C# made the decision to not treat null like 0 like false specifically because of learning from the bad experiences from C and C++. Truthy/Falsy is a constant source of bugs in JavaScript.

If you want this generalization, write your own function.

This slows down software development.

I'll take slightly slower software development resulting from having to explicitly consider my data types over chasing down the inevitable flood of bugs any day of the week.

0 replies

llfletch · 2019-05-16T20:36:46Z

llfletch
May 16, 2019
Author

How do I define what is meaningful? It seems like it is domain specific, but it could be generalized possibly with some user preferences. White space is meaningful in general, but there could always be a preference or option to change this for a particular user or circumstance. Just because there are problems with the concept doesn't mean that it couldn't be done - if we allow it to be flexible.

0 replies

spydacarnage · 2019-05-16T20:37:34Z

spydacarnage
May 16, 2019

Remember, C# 8 is doing a lot to deal with the null problem - it's moving towards getting rid of them! The last thing we need is a new feature that makes them more prevalent again...

0 replies

CyrusNajmabadi · 2019-05-16T20:49:18Z

CyrusNajmabadi
May 16, 2019
Collaborator

We do not have that concept. Remember, null, 0 and "" are different and distinct concepts.

I disagree, the language does need such a concept and badly because if we don't have this concept, we end up compartmentalizing everything and we end up not being able to generalize to a higher level. By not being able to generalize to higher level concepts, we end up being stuck at a lower level of detail. One of the main problems with computing is having to do everything at a low level of detail and not being able to handle higher level abstractions. This slows down software development.

What is your concept? Please explain it fully.

You have also avoided the points i've made. Such as the fact that whether or not data is meaningful is unrelated to whether or not it is null, empty or 0. For example, the empty string is very meaningful in many domains. But it's completely non-meaningful in others. How do you create your concept such that it deals with this fact?

0 replies

CyrusNajmabadi · 2019-05-16T20:50:56Z

CyrusNajmabadi
May 16, 2019
Collaborator

White space is meaningful in general, but there could always be a preference or option to change this for a particular user or circumstance.

Yes. We have a way to specify that preference: by using functions like string.IsNullOrWhiteSpace(...). How would you encode all the preferences for strings? What if printable characters are meaningful for one person, but not for another? What if special characters are meaningful for one, but not another? What if only US phone numbers are meaningful for one, but not the other? What if all phone numbers are meaningful for one but not the other?

etc. etc. etc.

How are you actually going to provide a way for people to specify the infinite number of meanings of what is "meaningful" or not?

0 replies

llfletch · 2019-05-16T21:20:41Z

llfletch
May 16, 2019
Author

I will try to clarify what I meant by meaningful data.
For strings, null or "" would not be meaningful, but "Jason", or "Network" would be. Obviously if we have just white space that is a problem that could be addressed with preferences. But if there are any letters, numbers or special characters, they are meaningful (even if they are incorrect or just noise).
For boolean types, false or zero or null (if we have that option) is not meaningful by default, but true values are. There is a problem with this because false is overloaded: it can mean either that we believe the statement is not true or that we don't know. The default should be that only true is meaningful in this concept, with perhaps some options for people who want to deal with this in another way.
For number types, positive numbers should always be meaningful and zero should not be meaningful, and negative numbers should default to being meaningful.
For arrays, if any of the array elements are meaningful then the whole array is meaningful.
For variables of enum types, if the value is null (if that is possible) or converts to zero, as the first member normally does, it should by default be not meaningful; otherwise it is meaningful. I realize this may not be the behavior everyone wants, but that should be the default.
Yes, there is a whole slew of preferences that could be considered and would complicate things greatly if they were implemented, but the idea is to keep things relatively simple so that most things act the way we would want, and if they don't, they can always be addressed the way they are now.

0 replies

CyrusNajmabadi · 2019-05-16T21:33:49Z

CyrusNajmabadi
May 16, 2019
Collaborator

"" would not be meaningful

But empty strings are very meaningful in many domains.

For boolean types, false or zero or null (if we have that option) is not meaningful by default

boolean types only have true or false. Why would it be a good idea to add zero or null into that mix?

For number types, positive numbers should always be meaningful and zero should not be meaningful

Why is zero not meaningful? What about all the domains where 0 is a perfectly expected and natural value?

--

Regardless, here's your solution:

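using System;
using System.Linq;

public static class MeaningfulExtensions
{
    // Returns true when a value counts as "meaningful" under the rules proposed above.
    public static bool IsMeaningful(this object o)
    {
        switch (o)
        {
            case string s: return s.Length > 0;
            case bool b: return b;
            case null: return false;
            // An array is meaningful if any element is; OfType<object>() skips null elements.
            case Array arr: return arr.OfType<object>().Any(a => a.IsMeaningful());
            default:
                // Numeric and enum values: anything that converts to a non-zero integer is meaningful.
                try { return Convert.ToInt64(o) != 0; } catch { }
                try { return Convert.ToUInt64(o) != 0; } catch { }
                return false;
        }
    }
}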

You don't need a language helper for this. You can just define your helper and put it on nuget and people can use it if they want.

0 replies

CyrusNajmabadi · 2019-05-16T21:45:13Z

CyrusNajmabadi
May 16, 2019
Collaborator

so that most things act the way we would want

Here's the crux of the problem: "the way we would want" is person, team, group and domain dependent.

I absolutely do think of empty strings as meaningful in my domains. 0 is also meaningful. Arrays have no concept of meaningfulness for me whatsoever. etc. etc.

You keep saying this as if there's consensus in the c# ecosystem about this, but there isn't. And there's broad belief that languages that have done similar things (i.e. JavaScript) screwed up here as this is not a good concept to have in the language. C# very intentionally and explicitly has not done this because it's actively felt to be a bad enough idea by enough of the LDM and enough of the community.

0 replies

llfletch · 2019-05-16T22:22:00Z

llfletch
May 16, 2019
Author

It is a shame no one can come to a consensus. I believe the idea is good, but obviously the people that have tried similar things like JavaScript have failed to get it right and have spoiled people's opinion of trying such things. Obviously you are right that "the way we want" is person, team, group and domain dependent, but there are also common threads or ideas that could make for a new concept or standard if people were open to it. Thanks everyone for listening to this failed idea because it has helped me to understand the problems we face.

0 replies

CyrusNajmabadi · 2019-05-17T00:14:27Z

CyrusNajmabadi
May 17, 2019
Collaborator

Good luck with your journey! As i mentioned above @llfletch: i highly recommend you actually try this out and create a library you think would be valuable here. You can even host that library on nuget, and if it's positively viewed by the community, it will get used in more and more places. If it turns out that you, along with the community, can form a good set of rules here, it's possible in the future that thoughts on this might be different.

0 replies

llfletch · 2019-05-17T01:54:56Z

llfletch
May 17, 2019
Author

That is exactly what I plan to do. Thank you.

0 replies

DavidArno · 2019-05-17T08:07:55Z

DavidArno
May 17, 2019

@llfletch,

For number types, positive numbers should always be meaningful and zero should not be meaningful...

You have a goal of trying to define a common set of principles around equivalence between different values. But that is not possible as there is no common set of principles. There literally can't be because domains vary so much.

And there's no better example of this than the exit status from commands in Linux. Zero means success. Non-zero means there was an error. This is even embodied in two bash commands, true (returns 0) and false (returns 1, as an arbitrary non-zero value). So the moment my C# app running on Linux collides with your ideas on what is and isn't meaningful and what is and isn't equivalent, all hell breaks loose.

Zero absolutely is meaningful, sometimes. And that's the problem: for any rule you invent, sometimes that rule will be plain wrong for a given domain. "" is data sometimes; sometimes it's an absence of data. The same applies to 0, null, whitespace, comments, -1 etc etc. I might have a system where 0-99 are valid values, and so -n and 100+ all mean "no value". Do we really want some configurable system of "roughly equal to" that would therefore treat 100 ~= false as a true statement? And if it's not configurable, what use is it to anyone that strays from some completely arbitrary set of rules on rough equivalence?
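
A quick sketch of those two domains side by side. The class and method names are invented for the example, and the 0 to 99 range is just the hypothetical system described above:

```csharp
using System;

public static class ExitStatus
{
    // Shell convention: 0 means success, any non-zero value is a failure.
    public static bool Succeeded(int exitCode) => exitCode == 0;
}

public static class Reading
{
    // Hypothetical domain: 0 to 99 are valid values, everything else means "no value".
    public static bool HasValue(int reading) => reading >= 0 && reading <= 99;
}
```

Here 0 is the most meaningful exit code there is, and 100 is "nothing" for a reading, which is the opposite of what a blanket "zero is empty, positive is meaningful" rule would say.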

0 replies

Replies: 41 comments