Commit 5a4a77d

Fix atomic_rename double-wrapping bug in transform_conversation (#2163)
Remove unnecessary `atomic_rename()` wrapper around `write_jsonl_file()`. `write_jsonl_file()` already handles atomic writes internally, and the double-wrapping caused the compression check to fail (seeing `.tmp` extension instead of `.gz`).
1 parent 9ff9161 commit 5a4a77d

1 file changed

+2
-3
lines changed

lib/marin/src/marin/transform/conversation/transform_conversation.py

Lines changed: 2 additions & 3 deletions
@@ -38,7 +38,7 @@
 from marin.core.conversation import DolmaConversationOutput, OpenAIChatMessage
 from marin.execution import unwrap_versioned_value
 from marin.utils import fsspec_mkdirs, load_dataset_with_backoff
-from zephyr import Dataset, atomic_rename, flow_backend, load_jsonl, write_jsonl_file
+from zephyr import Dataset, flow_backend, load_jsonl, write_jsonl_file
 
 from .adapters import TransformAdapter
 
@@ -361,8 +361,7 @@ def transform_records():
             if transformed_row is not None:
                 yield transformed_row.model_dump()
 
-    with atomic_rename(output_filename) as tmp_filename:
-        result = write_jsonl_file(transform_records(), tmp_filename)
+    result = write_jsonl_file(transform_records(), output_filename)
 
     logging.info(
         f"Wrote {result['count']} rows to {result['path']} "
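
The failure mode described in the commit message can be reproduced with a small, self-contained sketch. The helpers `atomic_rename_sketch` and `write_jsonl_sketch` below are hypothetical stand-ins, not the `zephyr` implementations. They assume that the temporary path is formed by appending `.tmp` and that compression is inferred from a `.gz` suffix, which is what the commit message suggests but does not spell out.

```python
import contextlib
import gzip
import json
import os
import tempfile


@contextlib.contextmanager
def atomic_rename_sketch(final_path):
    """Hypothetical stand-in: yield a temp path, then rename it onto final_path."""
    tmp_path = final_path + ".tmp"  # assumption: temp name appends ".tmp"
    try:
        yield tmp_path
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial file so a failed write leaves nothing behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def write_jsonl_sketch(records, path):
    """Hypothetical stand-in: infer compression from the suffix, write atomically."""
    use_gzip = path.endswith(".gz")  # assumption: suffix-based compression check
    opener = gzip.open if use_gzip else open
    count = 0
    with atomic_rename_sketch(path) as tmp_path:
        with opener(tmp_path, "wt", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
                count += 1
    return {"path": path, "count": count, "gzipped": use_gzip}


def is_gzip_file(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


rows = [{"id": 1}, {"id": 2}]
workdir = tempfile.mkdtemp()

# Before the fix: the writer receives "wrapped.jsonl.gz.tmp", so the ".gz"
# check fails and plain text ends up under a ".gz" name after the renames.
wrapped_path = os.path.join(workdir, "wrapped.jsonl.gz")
with atomic_rename_sketch(wrapped_path) as tmp_filename:
    wrapped = write_jsonl_sketch(rows, tmp_filename)

# After the fix: the writer receives the real output path and compresses it.
direct_path = os.path.join(workdir, "direct.jsonl.gz")
direct = write_jsonl_sketch(rows, direct_path)

print("wrapped:", wrapped["path"], "gzip on disk:", is_gzip_file(wrapped_path))
print("direct: ", direct["path"], "gzip on disk:", is_gzip_file(direct_path))
```

In this sketch the double-wrapped call also reports the temporary path in `result['path']`, so a log line built from it would name a file that no longer exists after the outer rename.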
