From 9d6c3248bbf3b89df56bc79edb2c184123cfb2fd Mon Sep 17 00:00:00 2001
From: Salomon BRYS
Date: Thu, 22 Sep 2016 17:02:38 +0200
Subject: [PATCH 1/9] Create const-data-class.md

---
 proposals/const-data-class.md | 150 ++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)
 create mode 100644 proposals/const-data-class.md

diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md
new file mode 100644
index 000000000..edb2ac7da
--- /dev/null
+++ b/proposals/const-data-class.md
@@ -0,0 +1,150 @@
+# Const modifier for data classes
+
+* **Type**: Design proposal
+* **Author**: Salomon BRYS
+* **Contributors**: Salomon BRYS
+* **Status**: Under consideration
+* **Prototype**: Not started
+
+## Feedback
+
+Discussion of this proposal is held in [this issue](TODO).
+
+## Synopsis
+
+Kotlin data classes are natural candidates for `HashMap` composite keys.
+
+Kotlin's data classes can contain either `var` or `val` values.
+This mutability is, of course, a welcomed liberty but it prevents the `hashcode` value of the data class to be cached.
+
+A very simple benchmark (see appendix 1) has shown that the simple optimization of **caching the hashcode value** to divide by 3 the access time of a simple 2-layer data class (See appendix 2).
+
+The [issue KT-12991](https://youtrack.jetbrains.com/issue/KT-12991) introduces the idea to access the `hashcode` function generated for the data class even if the function is overridden.
+This would allow the programmer to manually cache the `hashCode` of a data class but there would be no guarantee of the validity of such a measure other than *convention* (e.g. the programmer could cache the result of the hashcode even if the data class contains a `var`).
+
+We propose the notion of `const data class` that severely limits the possibility of such classes but enables the compiler to generate a `hashcode` function with cached result with the *guarantee* that the result will always reflect the data (e.g. the data can *never* change).
+
+Note that this proposal does NOT substitutes itself to KT-12991 as manually caching can still be very usefull when using data classes that do not (or cannot) comply with const data classes limitations.
+
+## Const data class limitations
+
+A `const data class` has the same limitations as a `data class` with the following additions:
+
+- It can only contain `val` constructor values.
+- It can only contain constructor values of type:
+  - Primitive
+  - String
+  - Const data class
+- Its `hashcode` and `equals` functions cannot be overridden.
+
+## Compiler implementation
+
+#### Definition
+
+The `toString` method is unaffected by the `const` modifier.
+
+The `hashCode` is generated only once and then cached. Each time the function is called, it first checks whether the hash code has already been generated and returns it if it has.
+
+The `equals` function checks `hashcode()` equality *before* checking any other equality.
+Because the hash is most likely already cached, this enables to fail fast.
+*(Note: this assertion should be statistically checked: if most `equals` function call succeed, this will slightly slow down execution instead of speeding it up)*.
+
+#### Example
+
+See appendix 1 implementation of `Optimized` class that features optimized `hashcode` and `equals` functions.
+
+## Alternative approaches
+
+#### Annotation `@CachedHashcode`
+
+This annotation would be allowed only on data classes.
+As stated in the synopsis, there would be no *guarantee* that the data class is effectively constant, and that the cached hash code do represents the current state of the data class.
+
+#### Manually caching the result
+
+See [issue KT-12991](https://youtrack.jetbrains.com/issue/KT-12991).
+This would allow the programmer to achieve the same result but, again, with no strong constant guarantee, other of course than *convention*.
+
+## Arguments against this proposal
+
+- This is an optimization that can be easily implemented by the programmer (at the cost of some boilerplate code that can be reduced with KT-12991).
+
+## Appendix
+
+#### 1: Benchmark code
+
+```kotlin
+import java.util.*
+
+data class Person(val firstName: String, val lastName: String)
+
+@Suppress("EqualsOrHashCode")
+data class Optimized(val id: Int, val person: Person) {
+    private var _hashcode = 0;
+    override fun hashCode(): Int{
+        if (_hashcode == 0)
+            _hashcode = 31 * id + person.hashCode()
+        return _hashcode
+    }
+    override fun equals(other: Any?): Boolean{
+        if (this === other) return true
+        if (other !is Optimized) return false
+
+        if (hashCode() != other.hashCode()) return false
+        if (id != other.id) return false
+        if (person != other.person) return false
+
+        return true
+    }
+}
+
+data class Standard(val id: Int, val person : Person)
+
+const val ITERATIONS = 10000000
+
+inline fun time(name: String, f: () -> Unit) {
+    val start = System.currentTimeMillis()
+    f()
+    val time = System.currentTimeMillis() - start
+    println("$name: $time")
+}
+
+fun main(args: Array<String>) {
+    val s = HashSet<Standard>()
+    val o = HashSet<Optimized>()
+
+    val person = Person("Salomon", "BRYS")
+
+    for (i in 0..ITERATIONS)
+        s.add(Standard(i, person))
+
+    for (i in 0..ITERATIONS)
+        o.add(Optimized(i, person))
+
+    time("Standard") {
+        s.forEach {
+            if (!s.contains(it))
+                throw IllegalStateException("WTF?!?!")
+        }
+    }
+
+    time("Optimized") {
+        o.forEach {
+            if (!o.contains(it))
+                throw IllegalStateException("WTF?!?!")
+        }
+    }
+}
+```
+
+Note: The `Person` class should feature the same optimizations as the `Optimized` class to conforms to const data class limitations.
+In this benchmark, however, it does not affect the results (it would affect them a lot if we benchmarked puts).
+
+#### 2: Benchmark result
+
+I've run the benchmark multiple times on my PC through IDEA and found consistent results (Linux Mint, Oracle JVM 8.0.101).
+
+```
+Standard: 582
+Optimized: 191
+```

From c219f61b0855464437c07ef13fafd2e1a1be1a06 Mon Sep 17 00:00:00 2001
From: Salomon BRYS
Date: Thu, 22 Sep 2016 17:09:03 +0200
Subject: [PATCH 2/9] Update const-data-class.md

---
 proposals/const-data-class.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md
index edb2ac7da..f5d191f51 100644
--- a/proposals/const-data-class.md
+++ b/proposals/const-data-class.md
@@ -68,6 +68,8 @@ This would allow the programmer to achieve the same result but, again, with no s
 ## Arguments against this proposal
 
 - This is an optimization that can be easily implemented by the programmer (at the cost of some boilerplate code that can be reduced with KT-12991).
+- This optimization is useful *if and only if* keys to `HashMap` & `HashSet` are *reused*.
+  Recreating the object everytime (e.g. `map.contains(Key(1, 2))`) renders the optimisation completely useless.
 
 ## Appendix
 

From 4db235256dff7ed466322cc8d90b8ba341a21f33 Mon Sep 17 00:00:00 2001
From: Salomon BRYS
Date: Thu, 22 Sep 2016 17:09:52 +0200
Subject: [PATCH 3/9] Added the issue reference for const-data-class

---
 proposals/const-data-class.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md
index f5d191f51..7fad29783 100644
--- a/proposals/const-data-class.md
+++ b/proposals/const-data-class.md
@@ -8,7 +8,7 @@
 
 ## Feedback
 
-Discussion of this proposal is held in [this issue](TODO).
+Discussion of this proposal is held in [this issue](https://github.com/Kotlin/KEEP/pull/51).
 
 ## Synopsis
 

From e2be487c4680f28442f630e26be5bdd068cc93d5 Mon Sep 17 00:00:00 2001
From: Salomon BRYS
Date: Mon, 26 Sep 2016 11:17:00 +0200
Subject: [PATCH 4/9] English grammar

---
 proposals/const-data-class.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md
index 7fad29783..7cdadcd9e 100644
--- a/proposals/const-data-class.md
+++ b/proposals/const-data-class.md
@@ -51,14 +51,14 @@ Because the hash is most likely already cached, this enables to fail fast.
 
 #### Example
 
-See appendix 1 implementation of `Optimized` class that features optimized `hashcode` and `equals` functions.
+See appendix 1 implementation of the `Optimized` class that features optimized `hashcode` and `equals` functions.
 
 ## Alternative approaches
 
 #### Annotation `@CachedHashcode`
 
 This annotation would be allowed only on data classes.
-As stated in the synopsis, there would be no *guarantee* that the data class is effectively constant, and that the cached hash code do represents the current state of the data class.
+As stated in the synopsis, there would be no *guarantee* that the data class is effectively constant, and that the cached hash code does represent the current state of the data class.
 
 #### Manually caching the result
 
@@ -80,7 +80,6 @@ import java.util.*
 
 data class Person(val firstName: String, val lastName: String)
 
-@Suppress("EqualsOrHashCode")
 data class Optimized(val id: Int, val person: Person) {
     private var _hashcode = 0;
     override fun hashCode(): Int{

From 1655155ca7f5876da787861b928cfd3368902ce0 Mon Sep 17 00:00:00 2001
From: Salomon BRYS
Date: Sat, 10 Dec 2016 08:37:05 +0100
Subject: [PATCH 5/9] Updated proposal following @voddan comments

---
 proposals/const-data-class.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md
index 7cdadcd9e..1efd2aeea 100644
--- a/proposals/const-data-class.md
+++ b/proposals/const-data-class.md
@@ -22,11 +22,13 @@ A very simple benchmark (see appendix 1) has shown that the simple optimization
 The [issue KT-12991](https://youtrack.jetbrains.com/issue/KT-12991) introduces the idea to access the `hashcode` function generated for the data class even if the function is overridden.
 This would allow the programmer to manually cache the `hashCode` of a data class but there would be no guarantee of the validity of such a measure other than *convention* (e.g. the programmer could cache the result of the hashcode even if the data class contains a `var`).
 
-We propose the notion of `const data class` that severely limits the possibility of such classes but enables the compiler to generate a `hashcode` function with cached result with the *guarantee* that the result will always reflect the data (e.g. the data can *never* change).
+We propose the notion of `const data class` that severely limits the possibility of such classes but enables the compiler to generate `hashcode` and `toString` functions with cached values with the *guarantee* that the output will always reflect the data (e.g. the data can *never* change).
 
 Note that this proposal does NOT substitutes itself to KT-12991 as manually caching can still be very usefull when using data classes that do not (or cannot) comply with const data classes limitations.
 
-## Const data class limitations
+## Const data classes
+
+### limitations
 
 A `const data class` has the same limitations as a `data class` with the following additions:
 
@@ -37,13 +39,23 @@ A `const data class` has the same limitations as a `data class` with the followi
   - Const data class
 - Its `hashcode` and `equals` functions cannot be overridden.
 
+### Auto-detection by the compiler
+
+The compiler could detect that a data class complies with all restrictions and apply silently the optimization if it does.
+Using the `const` keyword would then force the programmer to enforce said limitations.
+
+### Keyword
+
+The `const` keyword used in this proposal is proposed because of its semantic.
+It can easily be replaced by a compiler annotation (such as `@CachedHashcode`).
+
 ## Compiler implementation
 
 #### Definition
 
-The `toString` method is unaffected by the `const` modifier.
+The `toString` value is generated only once and then cached. Each time the function is called, it first checks whether the value has already been generated and returns it if it has. Note that the `toString` method can be overridden.
 
-The `hashCode` is generated only once and then cached. Each time the function is called, it first checks whether the hash code has already been generated and returns it if it has.
+The `hashCode` value is generated only once and then cached. Each time the function is called, it first checks whether the hash code has already been generated and returns it if it has.
 
 The `equals` function checks `hashcode()` equality *before* checking any other equality.
 Because the hash is most likely already cached, this enables to fail fast.
From 7716f23b6d8e60740f6b9e7a15230ffdfb6f7525 Mon Sep 17 00:00:00 2001 From: Salomon BRYS Date: Sat, 10 Dec 2016 09:28:34 +0100 Subject: [PATCH 6/9] Cached data must be @Transient --- proposals/const-data-class.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md index 1efd2aeea..d8f764c3f 100644 --- a/proposals/const-data-class.md +++ b/proposals/const-data-class.md @@ -57,6 +57,8 @@ The `toString` value is generated only once and then cached. Each time the funct The `hashCode` value is generated only once and then cached. Each time the function is called, it first checks whether the hash code has already been generated and returns it if it has. +Both cached values are marked `@Transient` to prevent them to be serialized or transfered along with the data. + The `equals` function checks `hashcode()` equality *before* checking any other equality. Because the hash is most likely already cached, this enables to fail fast. *(Note: this assertion should be statistically checked: if most `equals` function call succeed, this will slightly slow down execution instead of speeding it up)*. @@ -93,6 +95,7 @@ import java.util.* data class Person(val firstName: String, val lastName: String) data class Optimized(val id: Int, val person: Person) { + @Transient private var _hashcode = 0; override fun hashCode(): Int{ if (_hashcode == 0) From 0ea7db53956b8bb72363325957fc38a6a528c3ef Mon Sep 17 00:00:00 2001 From: Salomon BRYS Date: Sat, 10 Dec 2016 09:32:27 +0100 Subject: [PATCH 7/9] Cached values should also be marked @Volatile. --- proposals/const-data-class.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md index d8f764c3f..fb91bb26c 100644 --- a/proposals/const-data-class.md +++ b/proposals/const-data-class.md @@ -59,6 +59,8 @@ The `hashCode` value is generated only once and then cached. 
Each time the funct Both cached values are marked `@Transient` to prevent them to be serialized or transfered along with the data. +Both cached values are marked `@Volatile` to allow thread safety: at worst, the value can be computed by multiple threads at the same time, but it cannot be corrupt. + The `equals` function checks `hashcode()` equality *before* checking any other equality. Because the hash is most likely already cached, this enables to fail fast. *(Note: this assertion should be statistically checked: if most `equals` function call succeed, this will slightly slow down execution instead of speeding it up)*. @@ -95,7 +97,7 @@ import java.util.* data class Person(val firstName: String, val lastName: String) data class Optimized(val id: Int, val person: Person) { - @Transient + @Transient @Volatile private var _hashcode = 0; override fun hashCode(): Int{ if (_hashcode == 0) From efdada881f3087e85f6898694bd84e40d322feaa Mon Sep 17 00:00:00 2001 From: Salomon BRYS Date: Sat, 10 Dec 2016 11:00:54 +0100 Subject: [PATCH 8/9] Moving auto-detection to alternatives --- proposals/const-data-class.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md index fb91bb26c..e320540c6 100644 --- a/proposals/const-data-class.md +++ b/proposals/const-data-class.md @@ -39,11 +39,6 @@ A `const data class` has the same limitations as a `data class` with the followi - Const data class - Its `hashcode` and `equals` functions cannot be overridden. -### Auto-detection by the compiler - -The compiler could detect that a data class complies with all restrictions and apply silently the optimization if it does. -Using the `const` keyword would then force the programmer to enforce said limitations. - ### Keyword The `const` keyword used in this proposal is proposed because of its semantic. 
@@ -81,6 +76,13 @@ As stated in the synopsis, there would be no *guarantee* that the data class is
 See [issue KT-12991](https://youtrack.jetbrains.com/issue/KT-12991).
 This would allow the programmer to achieve the same result but, again, with no strong constant guarantee, other of course than *convention*.
 
+### Auto-detection by the compiler
+
+The compiler could detect that a data class complies with all restrictions and apply silently the optimization if it does.
+An annotation would still exist to allow the programmer to force himself to enforce said limitations.
+
+However, this auto-detection would be inconsistent with the rest of Kotlin because it silently adds a field to a class. That may be critical when you optimizing for small footprint.
+
 ## Arguments against this proposal
 
 - This is an optimization that can be easily implemented by the programmer (at the cost of some boilerplate code that can be reduced with KT-12991).

From ce3ddd680b1830659be1b8c358276f2410920f5c Mon Sep 17 00:00:00 2001
From: Salomon BRYS
Date: Sat, 10 Dec 2016 11:01:33 +0100
Subject: [PATCH 9/9] Update const-data-class.md

---
 proposals/const-data-class.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/const-data-class.md b/proposals/const-data-class.md
index e320540c6..6ad300133 100644
--- a/proposals/const-data-class.md
+++ b/proposals/const-data-class.md
@@ -81,7 +81,7 @@ This would allow the programmer to achieve the same result but, again, with no s
 The compiler could detect that a data class complies with all restrictions and apply silently the optimization if it does.
 An annotation would still exist to allow the programmer to force himself to enforce said limitations.
 
-However, this auto-detection would be inconsistent with the rest of Kotlin because it silently adds a field to a class. That may be critical when you optimizing for small footprint.
+However, this auto-detection would be inconsistent with the rest of Kotlin because it silently adds a field to a class. That may be critical when optimizing for small footprint.
 
 ## Arguments against this proposal
 
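
Patches 5 to 7 describe what the compiler would generate for a `const data class`. Both `hashCode` and `toString` are computed once and cached. Both caches are marked `@Transient` and `@Volatile`. The `equals` function compares hashes before it compares fields. The hand-written Kotlin/JVM sketch below shows roughly what that output might look like. It is an assumption for illustration only: neither the proposal nor any Kotlin compiler specifies this code. The class name `Key` and the field names `cachedHash` and `cachedString` are hypothetical. `Person` is the class from appendix 1, and the annotations are the standard `kotlin.jvm` ones.

```kotlin
// Hypothetical approximation of what a `const data class` might compile to.
// It is NOT output of any existing Kotlin compiler; names are illustrative.
data class Person(val firstName: String, val lastName: String)

data class Key(val id: Int, val person: Person) {
    // @Transient keeps the caches out of Java serialization;
    // @Volatile makes a cached value visible to other threads.
    @Transient @Volatile private var cachedHash = 0
    @Transient @Volatile private var cachedString: String? = null

    override fun hashCode(): Int {
        var h = cachedHash
        if (h == 0) {
            // Same formula as the `Optimized` class in appendix 1;
            // 0 doubles as the "not computed yet" marker.
            h = 31 * id + person.hashCode()
            cachedHash = h
        }
        return h
    }

    // Mirrors the default data class format, computed once and then reused.
    override fun toString(): String =
        cachedString ?: "Key(id=$id, person=$person)".also { cachedString = it }

    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is Key) return false
        // Compare the (usually cached) hashes first to fail fast.
        if (hashCode() != other.hashCode()) return false
        return id == other.id && person == other.person
    }
}
```

A `Key(1, person)` instance that is reused as a `HashMap` or `HashSet` key computes its hash only once. This is the reuse case that patch 2 points out. As in `Optimized`, an instance whose real hash is 0 recomputes it on every call. Two racing threads may each fill a cache, but both store the same value, which matches the trade-off described for `@Volatile`.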