When WP1 was rewritten in Python, we inherited a database from 2005 that didn't "trust" MySQL collation and string conversion. Instead of letting the database store raw bytes (the equivalent of Python 3 `bytes`) and translate them back into decoded strings (Python 3 `str`), the design decision was to bypass all automatic conversions and store everything as UTF-8 bytes.
Needless to say, UTF-8 conversion and string processing have improved a great deal since 2005.
However, even during the Python rewrite in 2018, we decided to keep this architecture to minimize the surface area of potentially breaking changes. It is now time to revisit that decision.
Database queries in the app return `bytes`, which then have to be converted to `str` before being sent through the web app to the frontend. The reverse conversion happens when data flows from the frontend back to the database. This not only complicates everything, but also makes reading and writing the code slower and more error-prone.
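To make the round trip concrete, here is a minimal sketch of the kind of boundary code the current design requires. The helper names `decode_row` and `encode_params` and the column names `title` and `project` are hypothetical and not taken from the WP1 codebase; the point is only that every string value crosses the database boundary twice, once in each direction.

```python
from typing import Any, Dict, Mapping


def decode_row(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical read-side helper: turn UTF-8 bytes from the DB into str for the web layer."""
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in row.items()
    }


def encode_params(params: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical write-side helper: turn str from the frontend back into UTF-8 bytes for the DB."""
    return {
        key: value.encode("utf-8") if isinstance(value, str) else value
        for key, value in params.items()
    }


# A row shaped like what a query returns today: string columns arrive as bytes,
# while non-string columns (here an int) pass through untouched.
row = {"title": b"Caf\xc3\xa9", "project": b"Chemistry", "count": 3}

api_row = decode_row(row)  # what the frontend should actually see
assert api_row["title"] == "Café"

# Writing the same data back requires the mirror-image conversion.
assert encode_params(api_row) == row
```

Every call site that touches the database has to remember to use helpers like these in the right direction; if the connection handled encoding instead, values could stay `str` end to end.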
We should standardize on `str` everywhere and let the database handle byte conversions for us (i.e., by setting a proper `encoding` value on the connection).