Connector Name
destination-snowflake
Connector Version
4.0.1
What step the error happened?
During the sync
Relevant information
We’re self-hosting Airbyte v1.8.5 on Kubernetes. After upgrading:
- Source: source-mysql v3.50.8
- Destination: destination-snowflake v4.0.1
My understanding is that these versions enable a newer, faster load mode to Snowflake. Our MySQL source is very large, yet the first sync after upgrading was much slower than before. Symptoms:
- The destination logs show thousands of very small batches: repeated PUTs of a single small .csv.gz file to the Snowflake internal stage, each followed by a COPY INTO that inserts 1 row (or a handful of rows) into an airbyte_internal table, over and over.
- We also see frequent SHOW COLUMNS statements issued against the airbyte_internal tables.
- Airbyte’s Temporal logs periodically emit: RESOURCE_EXHAUSTED: namespace rate limit exceeded.
- Snowflake history (per-minute rollup) shows long stretches where AVG_ROWS_PER_COPY is ~1, with occasional spikes.

Net effect: extremely slow end-to-end ingestion.
Questions for the team:
- Is there an initialisation/migration step in the v4 Snowflake destination that causes very small micro-batches during the first run (e.g., schema probing, per-stream bootstrap, or dedupe/state migration)?
- Are there config knobs on either connector that control the flush size / buffer bytes / records / time that could explain 1-row COPYs?
- Is the Temporal "namespace rate limit exceeded" error expected during this bootstrap, and could it be throttling file staging / COPY concurrency?
- Any guidance on recommended Snowflake staged file sizes (so the destination produces larger staged files) for the new direct-load path? (A sketch of how we're measuring current file sizes follows this list.)
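To put numbers on the last question, we can measure the staged file sizes with a rollup like the one below. This is only a sketch: it assumes access to the SNOWFLAKE.ACCOUNT_USAGE share (which lags real time by up to ~2 hours), and the 6-hour window and airbyte_internal schema filter are our own choices.

```sql
-- Sketch: size of files loaded by COPY into airbyte_internal tables.
-- FILE_SIZE is in bytes; window and schema filter are assumptions.
SELECT
    DATE_TRUNC('hour', LAST_LOAD_TIME) AS hour_bucket,
    COUNT(*)                           AS files_copied,
    AVG(FILE_SIZE)                     AS avg_file_bytes,
    MAX(FILE_SIZE)                     AS max_file_bytes
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE TABLE_SCHEMA_NAME = 'airbyte_internal'
  AND LAST_LOAD_TIME >= DATEADD('hour', -6, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;
```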
Relevant log output
(Representative snippets; can supply full logs if needed.)
INFO SnowflakeInsertBuffer(flush):84 Finished insert of 1 row(s) into "airbyte_internal"."MYSQL_TA_DEFAULTORDER_ITEM..."
INFO SnowflakeDirectLoadSqlGeneratorKt(andLog):32 PUT 'file:///tmp/snowflake....csv.gz' '@"AIRBYTE_DB"."airbyte_internal"."airbyte_stage_..."'
AUTO_COMPRESS = FALSE
SOURCE_COMPRESSION = GZIP
OVERWRITE = TRUE
INFO SnowflakeDirectLoadSqlGeneratorKt(andLog):32 COPY INTO "AIRBYTE_DB"."airbyte_internal"."MYSQL_TA_DEFAULTORDER_ITEM..."
FROM '@"AIRBYTE_DB"."airbyte_internal"."airbyte_stage_..."'
FILE_FORMAT = "AIRBYTE_DB"."airbyte_internal"."airbyte_csv_format"
ON_ERROR = 'ABORT_STATEMENT'
PURGE = TRUE
files = ('snowflake....csv.gz')
INFO SnowflakeDirectLoadSqlGeneratorKt(andLog):32 SHOW COLUMNS IN TABLE "AIRBYTE_DB"."airbyte_internal"."MYSQL_TA_DEFAULTCUSTOMER_EMAIL..."
...
ERROR SelfHealTemporalWorkflows — RESOURCE_EXHAUSTED: namespace rate limit exceeded
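For a sense of the statement churn behind these snippets, a tally over QUERY_HISTORY along these lines should work. Again a sketch: it assumes ACCOUNT_USAGE access, and AIRBYTE_USER is a hypothetical placeholder for whatever user the connector authenticates as.

```sql
-- Sketch: count PUT / COPY INTO / SHOW COLUMNS statements in the sync window.
-- AIRBYTE_USER is hypothetical; substitute the connector's actual user.
SELECT
    CASE
        WHEN QUERY_TEXT ILIKE 'PUT %'         THEN 'PUT'
        WHEN QUERY_TEXT ILIKE 'COPY INTO%'    THEN 'COPY INTO'
        WHEN QUERY_TEXT ILIKE 'SHOW COLUMNS%' THEN 'SHOW COLUMNS'
        ELSE 'other'
    END      AS statement_type,
    COUNT(*) AS statements
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE USER_NAME = 'AIRBYTE_USER'
  AND START_TIME >= DATEADD('hour', -6, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY statements DESC;
```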
Example minute-level Snowflake history (illustrative):
MINUTE_BUCKET             COPY_OPS  ROWS_INSERTED  AVG_ROWS_PER_COPY
2025-10-07 06:33:00.000Z  56        56             1.000000
2025-10-07 06:44:00.000Z  57        59             1.035088
2025-10-07 07:31:00.000Z  52        135            2.596154
2025-10-07 08:06:00.000Z  47        54             1.148936
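A query of roughly this shape reproduces the rollup above (a sketch against COPY_HISTORY under the same assumptions as before; for lower-latency checks, the INFORMATION_SCHEMA.COPY_HISTORY table function is the per-table alternative):

```sql
-- Sketch: per-minute COPY rollup matching the table above.
SELECT
    DATE_TRUNC('minute', LAST_LOAD_TIME) AS minute_bucket,
    COUNT(*)                             AS copy_ops,
    SUM(ROW_COUNT)                       AS rows_inserted,
    AVG(ROW_COUNT)                       AS avg_rows_per_copy
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE TABLE_SCHEMA_NAME = 'airbyte_internal'
  AND LAST_LOAD_TIME >= DATEADD('hour', -6, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;
```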
Contribute
- Yes, I want to contribute