[destination-snowflake] First sync after upgrading to v4.0.1 is extremely slow (many tiny PUT/COPY batches + Temporal rate limit messages) #67154

@dobsontom

Description

Connector Name

destination-snowflake

Connector Version

4.0.1

What step the error happened?

During the sync

Relevant information

We’re self-hosting Airbyte v1.8.5 on Kubernetes. After upgrading:

Source: source-mysql v3.50.8

Destination: destination-snowflake v4.0.1

My understanding is that these versions enable a newer, faster load mode for Snowflake. Our MySQL source is very large, yet the first sync after upgrading was much slower than before. Symptoms:

The destination logs show thousands of very small batches: a PUT of a single small .csv.gz file to the Snowflake internal stage, followed by a COPY INTO that inserts one row (or a handful of rows) into an airbyte_internal table, repeated over and over.

We also see frequent SHOW COLUMNS statements being issued against the airbyte_internal tables.

Airbyte’s Temporal logs periodically emit: RESOURCE_EXHAUSTED: namespace rate limit exceeded.

Snowflake history (per-minute rollup) shows long stretches where AVG_ROWS_PER_COPY is ~1, with occasional spikes. Net effect: extremely slow end-to-end ingestion.
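
To put rough numbers on the repeated SHOW COLUMNS and tiny COPY statements, they can be counted from Snowflake’s query history. The query below is only a sketch: it assumes read access to the standard SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (which lags real time by up to ~45 minutes), and the 24-hour window is illustrative.

-- Per-minute counts of the connector's repeated statements.
-- Sketch only: assumes read access to SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;
-- the time window and text patterns are illustrative.
SELECT
    DATE_TRUNC('minute', start_time)                          AS minute_bucket,
    SUM(IFF(query_text ILIKE 'SHOW COLUMNS IN TABLE%', 1, 0)) AS show_columns_ops,
    SUM(IFF(query_text ILIKE 'COPY INTO%', 1, 0))             AS copy_ops
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
  AND query_text ILIKE ANY ('SHOW COLUMNS IN TABLE%', 'COPY INTO%')
GROUP BY 1
ORDER BY 1;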

Questions for the team:

  • Is there an initialisation/migration step in the v4 Snowflake destination that causes very small micro-batches during the first run (e.g., schema probing, per-stream bootstrap, or dedupe/state migration)?

  • Are there config knobs on either connector controlling flush behaviour (buffer bytes, record count, or flush interval) that could explain 1-row COPYs?

  • Is the Temporal “namespace rate limit exceeded” error expected during this bootstrap, and could it be throttling file staging / COPY concurrency?

  • Any guidance on recommended staged-file sizes for the new direct-load path, so the destination produces larger files per PUT/COPY?

Relevant log output

(Representative snippets; we can supply full logs if needed.)

INFO SnowflakeInsertBuffer(flush):84 Finished insert of 1 row(s) into "airbyte_internal"."MYSQL_TA_DEFAULTORDER_ITEM..."
INFO SnowflakeDirectLoadSqlGeneratorKt(andLog):32 PUT 'file:///tmp/snowflake....csv.gz' '@"AIRBYTE_DB"."airbyte_internal"."airbyte_stage_..."'
AUTO_COMPRESS = FALSE
SOURCE_COMPRESSION = GZIP
OVERWRITE = TRUE
INFO SnowflakeDirectLoadSqlGeneratorKt(andLog):32 COPY INTO "AIRBYTE_DB"."airbyte_internal"."MYSQL_TA_DEFAULTORDER_ITEM..."
FROM '@"AIRBYTE_DB"."airbyte_internal"."airbyte_stage_..."'
FILE_FORMAT = "AIRBYTE_DB"."airbyte_internal"."airbyte_csv_format"
ON_ERROR = 'ABORT_STATEMENT'
PURGE = TRUE
files = ('snowflake....csv.gz')

INFO SnowflakeDirectLoadSqlGeneratorKt(andLog):32 SHOW COLUMNS IN TABLE "AIRBYTE_DB"."airbyte_internal"."MYSQL_TA_DEFAULTCUSTOMER_EMAIL..."
...
ERROR SelfHealTemporalWorkflows — RESOURCE_EXHAUSTED: namespace rate limit exceeded


Example minute-level Snowflake history (illustrative):

MINUTE_BUCKET              COPY_OPS  ROWS_INSERTED  AVG_ROWS_PER_COPY
2025-10-07 06:33:00.000 Z  56        56             1.000000
2025-10-07 06:44:00.000 Z  57        59             1.035088
2025-10-07 07:31:00.000 Z  52        135            2.596154
2025-10-07 08:06:00.000 Z  47        54             1.148936
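
A rollup like the above can be reproduced roughly as follows. Again a sketch, assuming read access to the standard SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY view (also lagging real time), with the schema name taken from the logs above; an average staged-file-size column is included because it bears on the file-size question.

-- Per-minute rollup of COPY operations into the airbyte_internal schema.
-- Sketch only: assumes read access to SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY.
SELECT
    DATE_TRUNC('minute', last_load_time) AS minute_bucket,
    COUNT(*)                             AS copy_ops,
    SUM(row_count)                       AS rows_inserted,
    SUM(row_count) / COUNT(*)            AS avg_rows_per_copy,
    AVG(file_size)                       AS avg_file_size_bytes
FROM snowflake.account_usage.copy_history
WHERE table_schema_name = 'airbyte_internal'
  AND last_load_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;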

Contribute

  • Yes, I want to contribute
