Skip to content

QUERY --format=json incorrectly double-escapes native JSON column types #426

@orlevyhs

Description

@orlevyhs

What happened?

When using a simple bq query with the --format=json flag, the emulator incorrectly serializes the native BigQuery JSON data type column as a double-escaped string.

This differs from production BigQuery behavior, where a native JSON column should be outputted as a raw, unquoted JSON object within the surrounding JSON structure. This breakage prevents automated pipelines from correctly unmarshaling the data.

Expected vs. Actual Behavior

  • Source Table Schema: (id INT64, metadata JSON)

  • Source Data: {"id": 1, "metadata": {"status": "ok", "version": 1}}

Scenario | Output of metadata Field

What did you expect to happen?

Expected (Production BigQuery) | {"metadata": {"status": "ok", "version": 1}} (Nested JSON Object)
Actual (BigQuery Emulator) | {"metadata": "{\"status\":\"ok\",\"version\":1}"} (Double-escaped JSON String)

How can we reproduce it (as minimally and precisely as possible)?

Reproduction Steps (Using bq CLI)

Please ensure the goccy/bigquery-emulator is running on port 9050.

  1. Define Environment Variables (Needed if running from host/client)

    export BQ_EMULATOR_HOST="[http://0.0.0.0:9050](http://0.0.0.0:9050)"
    PROJECT_ID="test-project-1"
    DATASET_ID="test_ds"
    TABLE_NAME="test_table_json"
    

  2. Create Table with Native JSON Column

    bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID mk --table $DATASET_ID.$TABLE_NAME 
    'id:INT64, metadata:JSON'

  3. Insert Row with JSON Data

    bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID query "INSERT INTO $DATASET_ID.$TABLE_NAME (id, metedata)
    VALUES (
    1,
    JSON {"status": "ok", "version": 1}
    );"

  4. Verify the Broken Output using Simple QUERY

    bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID query --format=json  
    "SELECT * FROM $DATASET_ID.$TABLE_NAME"

Observed Output (Faulty)

The output from the query command shows the metadata field as an escaped string:

[
{
"id": "1",
"metadata": "{"status":"ok","version":1}"
}
]

Expected Output (Correct)

The expected output for a correct JSON serialization should contain the nested object:

[
{
"id": "1",
"metadata": {
"status": "ok",
"version": 1
}
}
]

Anything else we need to know?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions