Open
Description
I'm considering targeting #232 to a series/2.x.x
branch.
@rossabaker @armanbilge any objections to me creating that branch and moving the code there?
I'm considering targeting #232 to a series/2.x.x
branch.
@rossabaker @armanbilge any objections to me creating that branch and moving the code there?
Activity
armanbilge commentedon May 7, 2022
👍 thanks for all your work on this. I think a series/2.x is the best path forward for the time being.
Since this is an issue about series/2.x in general, I'm thinking we should consider using a newtype for
CIString
(at least on Scala 3) now that we are getting a typeclass-basedMap
in typelevel/cats#4185 which does not depend on universal equals/hashCode.isomarcte commentedon May 7, 2022
@armanbilge Do you just mean
CIString
encoded as an opaque type or are you talking about wrappingCIString
again? And are you foreseeing us havinghashCode
differ fromHash
?armanbilge commentedon May 7, 2022
Yes, this. Which would mean that
hashCode
would differ fromHash
.isomarcte commentedon May 7, 2022
Yeah, so in #232 that changes a bit. I changed the underlying hashCode/equals/compare to be based on a
CanonicalFullCaseFoldedString
.There is still some discussion to have there, but all
CanonicalFullCaseFoldedString
values have the same representation (for the same input) and thus the same universal hashCode and equality. SoCanonicalFullCaseFoldedString
could be an opaque type, though it wouldn't need to for hash/equal reasons. And under this encodingCIString
is effectively a specialized(String, CanonicalCaseFoldedString)
so making it opaque doesn't gain you as much.FWIW, I'm starting to be of the opinion we should be using
CanonicalCaseFoldedString
for most use cases where we useCIString
. That is, I don't know if we need to hold a reference to the original representation as the normative case.isomarcte commentedon May 7, 2022
I hope that makes some amount of sense. Rereading my comment makes me feel like I didn't explain that well.
This is what I'm talking about: https://github.com/typelevel/case-insensitive/pull/232/files#diff-90a144a03717314cdaa1151276da76557be0f1e78c890580b57c9d95406f95f6R80
isomarcte commentedon May 7, 2022
@armanbilge can you make
series/1.x.x
andseries/2.x.x
branches? I don't have access apparently.armanbilge commentedon May 7, 2022
Neither do I :)
armanbilge commentedon May 7, 2022
@isomarcte to clarify, my proposal is we replace:
with
If we do this, then the universal hashCode and equals for
CIString
will be those ofString
itself, because theopaque type
is erased toString
. This is why the instances will be incompatible with them.armanbilge commentedon May 7, 2022
Wait, sorry, I wasn't looking carefully. I see,
CIString
can no longer be represented as an opaque type, sorry! 🤔Edit: wait, nope, I'm confused by all the various implementations 😬
isomarcte commentedon May 7, 2022
Yeah, there are a lot of types in the Unicode standard. For our discussion here we just need to focus on
CanonicalFullCaseFoldedString
andCIString
.The new encoding of
CIString
on my branch is similar to this.To make a cassless string, in general, you need to case fold it, then normalize it (this is simplified, there can be more steps depending on the type of caseless string).
Thus
caselessString
is the result ofnormalize(caseFold(original))
.The reason I changed the representation of
CIString
is because what we are doing in 1.x.x is incorrect for caseless strings. To get to a caseless string is more complicated than justtoUpper.toLower
on each char, so it seemed reasonable to hold a reference to the caseless string inCIString
, rather than computing it each time we needed to do an operation on the class. That also allows us to use theCanonicalFullCaseFoldedString
as the source ofhashCode
andequals
.There are other ways we could do this. We could have the
caselessString
be a def,Then all the methods on
CIString
would need to convert to aCanonicalFullCaseFoldedString
on each operation. We could cache the hash via a null init pattern as is currently done onmain
, but we couldn't easily do that for equality checks and the other methods.All of these gymnastics are the result of us keeping around a reference to the original input string in
CIString
. That's why I'm leaning on makingCanonicalFullCaseFoldedString
(perhaps with a more succinct name) the default type we use, rather thanCIString
.CanonicalFullCaseFoldedString
can be trivial represented as an opaque type, though we don't need to for hash/eq purposes.armanbilge commentedon May 7, 2022
Thanks for the detailed explanation, that makes sense to me. Regarding whether to cache or not, it probably makes sense? Not sure if there are situations where lots of
CIString
s are allocated but no ops are called on them.The FAQ suggests there are Reasons™, but I'm not sure what they are :)
https://typelevel.org/case-insensitive/#why-pay-for-a-wrapper-when-you-can-fold-case-before-storing
rossabaker commentedon May 13, 2022
I added @armanbilge and @isomarcte to the team, so anyone should be able to create the necessary branches.
Not keeping the original would be a non-semantic, but highly visible, change to HTTP rendering. For example, all field names would be rendered lower case.
I don't know how much uptake
CIString
has seen outside http4s to evaluate how bad a binary breakage would be, but we know what immense pain the last source breakage was. I would very much prefer to avoid source-breaking changes to http4s in this.armanbilge commentedon May 13, 2022
Here are steward PRs for case-insensitive:
https://github.com/search?q=case-insensitive+author%3Ascala-steward&type=Issues&ref=advsearch&l=&l=
Besides http4s, I recognize trace4cats.
armanbilge commentedon May 13, 2022
@isomarcte I created a
series/2.x
branch and re-targeted #232 to there.I think some form of source-breaking changes are inevitable (at least in the form of deprecations) since in #232 (comment) we realized many of the exsting APIs are unsound.
Personally I think the best strategy to is keep pushing for http4s 1.x, with better CI strings, better URIs, better entities, etc.
MediaType
/MediaRange
http4s/http4s#5359