Java driver doesn't process strings with emojis correctly #699

Open
@farost

Description

The Java driver doesn't process strings containing emojis (e.g. 😎) correctly: the Rust side fails while parsing the received bytes.

Environment

  1. TypeDB distribution: Core
  2. TypeDB version: 3.0.0-alpha-6 and earlier
  3. Environment: Mac

To reproduce, run this BDD test (available in our BDD repo as "cannot create database with an emoji" in connection/database):

  Background:
    Given typedb starts
    Given connection opens with default authentication
    Given connection is open: true
    Given connection has 0 databases

  Scenario: cannot create database with an incorrect name
    Then connection create database: 😎; fails

Parsing the database name fails with the following panic:

thread '<unnamed>' panicked at c/src/memory.rs:109:13:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }

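Printing the received bytes on the Rust side shows eda0bdedb88e, whereas the expected UTF-8 representation of this emoji is f09f988e. The received bytes are the two halves of the UTF-16 surrogate pair (D83D DE0E), each encoded as its own three-byte sequence. That matches the modified UTF-8 produced by JNI string functions, not standard UTF-8, so Rust's UTF-8 validation rejects the very first byte sequence.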
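The two byte sequences can be reproduced in plain Java, without the driver. The sketch below is only an illustration: the class and method names are hypothetical, and it uses `DataOutputStream.writeUTF` as a stand-in for the modified UTF-8 that JNI produces, on the assumption that this is what the SWIG-generated bindings end up passing to Rust.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the driver.
class EmojiEncodingDemo {
    // Lowercase hex dump of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Standard UTF-8: a supplementary code point becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF:
    // each surrogate half is encoded separately as 3 bytes.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buffer.toByteArray();
        // writeUTF prepends a 2-byte length; drop it to keep only the payload.
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE0E"; // U+1F60E as a surrogate pair
        System.out.println(toHex(standardUtf8(emoji))); // f09f988e
        System.out.println(toHex(modifiedUtf8(emoji))); // eda0bdedb88e
    }
}
```

If the bindings do take this path, one possible direction for a fix is to hand the native side bytes obtained via `getBytes(StandardCharsets.UTF_8)` instead of a JNI-converted string.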

Characters inside the Basic Multilingual Plane (BMP), such as Chinese characters, are processed correctly. The issue seems to be exclusive to characters outside the BMP.
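To tell which test inputs fall into the failing category, a string can be checked for code points outside the BMP. This is a minimal sketch with a hypothetical helper name; it relies only on standard `String.codePoints` and `Character.isSupplementaryCodePoint`.

```java
// Hypothetical helper class, not part of the driver.
class BmpCheck {
    // True if the string contains at least one code point above U+FFFF,
    // i.e. one that Java stores as a UTF-16 surrogate pair.
    static boolean hasSupplementaryChars(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String chinese = "\u6570\u636E\u5E93"; // three BMP characters
        String emoji = "\uD83D\uDE0E";         // U+1F60E, outside the BMP
        System.out.println(hasSupplementaryChars(chinese)); // false
        System.out.println(hasSupplementaryChars(emoji));   // true
    }
}
```

Strings for which this returns true are the ones expected to trigger the panic described in this issue.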

We'll need to change how SWIG handles strings for Java, and possibly for other languages: Python works correctly, and the rest will be tested later.
