[proposal] Default database collation to collate=C #100

@guewen

When a database is created by the container's entrypoint, it uses the default collation, which is en_US.utf8.

As discussed here: odoo/odoo#25196 (comment), this may perform poorly for LIKE queries with wildcards (LIKE 'foo%'), because with a non-C collation a plain B-tree index cannot be used for pattern matching.

We have several possible avenues for improvement:

  • add text_pattern_ops indexes, case by case where necessary
  • add trigram indexes using pg_trgm, which also benefit LIKE '%foo%' queries, but again case by case
  • create the databases with a C collation and the en_US.utf8 locale (collate=C)

The last point is the main one to discuss here.
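
For comparison, the two case-by-case options could look roughly like the statements generated below. This is only a sketch: the table and column names (`res_partner`, `name`) and the index names are hypothetical, the column is assumed to be of type text, and the helpers only build SQL strings without running anything.

```python
from typing import List

# Hypothetical table and column, used only for illustration.
TABLE = "res_partner"
COLUMN = "name"


def pattern_ops_index(table: str, column: str) -> str:
    """Build DDL for a B-tree index usable by left-anchored LIKE 'foo%'.

    The identifiers are interpolated as-is, so they must be trusted values.
    """
    return (
        f"CREATE INDEX {table}_{column}_pattern_idx "
        f"ON {table} ({column} text_pattern_ops);"
    )


def trigram_index(table: str, column: str) -> List[str]:
    """Build DDL for a pg_trgm GIN index, which can also serve LIKE '%foo%'."""
    return [
        "CREATE EXTENSION IF NOT EXISTS pg_trgm;",
        f"CREATE INDEX {table}_{column}_trgm_idx "
        f"ON {table} USING gin ({column} gin_trgm_ops);",
    ]


if __name__ == "__main__":
    print(pattern_ops_index(TABLE, COLUMN))
    for statement in trigram_index(TABLE, COLUMN):
        print(statement)
```

Both need to be repeated for every column that is searched, which is why the database-wide collation is the more attractive option.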

Pros:

  • consistent sorting
  • expected general performance improvement

Cons:

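  • sorting of accented characters looks wrong for French: the names Blanche, Béatrice, Claude are sorted as Blanche, Béatrice, Claude instead of Béatrice, Blanche, Claude, because the C collation compares strings byte by byte and the UTF-8 bytes of "é" sort after every unaccented letter. This can be resolved with unaccent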
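
To make the sorting difference concrete, here is a small Python sketch. It only imitates the behaviour: sorting on the UTF-8 bytes stands in for the C collation, and stripping combining marks is a rough approximation of what PostgreSQL's unaccent does, not the extension itself. The helper names are made up for the example.

```python
import unicodedata

# "B\u00e9atrice" is "Béatrice" with a precomposed "é" (assumed storage form).
NAMES = ["Claude", "B\u00e9atrice", "Blanche"]


def c_collation_key(value: str) -> bytes:
    """Sort key imitating collate=C: plain byte-wise comparison of UTF-8."""
    return value.encode("utf-8")


def unaccent_key(value: str) -> str:
    """Sort key with accents stripped, a rough stand-in for unaccent()."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # "é" encodes as bytes 0xC3 0xA9, which sort after "l" (0x6C).
    print(sorted(NAMES, key=c_collation_key))
    # → ['Blanche', 'Béatrice', 'Claude']

    # Sorting on the unaccented form restores the order a French reader expects.
    print(sorted(NAMES, key=unaccent_key))
    # → ['Béatrice', 'Blanche', 'Claude']
```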

My proposal is to change the calls to createdb in the image so that databases are always created with collate=C.
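
As a rough idea of what the changed call could look like, the sketch below builds a createdb command line. The helper name, the database name, and the choice of UTF8 encoding are assumptions for illustration, and the way the image actually invokes createdb is not shown here. template0 is used because PostgreSQL only allows a collation different from the template's when copying from template0.

```python
import shlex
from typing import List


def createdb_command(dbname: str, lc_ctype: str = "en_US.utf8") -> List[str]:
    """Build (but do not run) a createdb call using the C collation.

    Hypothetical helper: only the argument list is assembled here.
    """
    return [
        "createdb",
        "--template=template0",  # required when the collation differs from the template
        "--encoding=UTF8",       # assumed encoding
        "--lc-collate=C",        # byte-wise sorting and comparison
        f"--lc-ctype={lc_ctype}",  # keep character classification from the locale
        dbname,
    ]


if __name__ == "__main__":
    # "odoodb" is a placeholder database name.
    print(shlex.join(createdb_command("odoodb")))
```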
