
[nightshift] 20260611 multi-cleanup#6327

Merged
claude-nightshift[bot] merged 5 commits into main from nightshift/cleanup-20260611 on Jun 12, 2026

@claude-nightshift

Contributor

seed 75357b1d
Dead lines fall away —
one helper holds the contract,
clean diffs greet the dawn.

Four independent, behavior-preserving cleanups, one per subproject.

marin/datakit: normalize._make_split_writer duplicated the
part-NNNNN-of-MMMMM.parquet shard-name format string inline twice instead of
using datakit.partition_filename, the canonical helper written for exactly this
purpose. Both the main and dups output paths now route through the helper, so
the naming contract that consolidate's filename-based join depends on lives in
one place. Output paths are identical; all 15 tests in
tests/datakit/test_normalize.py pass.
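
A minimal sketch of why one naming helper matters, assuming a signature of `partition_filename(index, total)`; the real `datakit.partition_filename` signature may differ, and `parse_partition_filename` and `split_writer_paths` are hypothetical names used only to show how a writer and a filename-based join can share one contract.

```python
import re

# Assumed signature; the real datakit.partition_filename may differ.
def partition_filename(index: int, total: int, suffix: str = "parquet") -> str:
    """Return a shard name of the form part-NNNNN-of-MMMMM.<suffix>."""
    if not 0 <= index < total:
        raise ValueError(f"shard index {index} out of range for {total} shards")
    return f"part-{index:05d}-of-{total:05d}.{suffix}"

# Hypothetical inverse, the kind of parsing a filename-based join relies on.
_PARTITION_RE = re.compile(r"part-(\d+)-of-(\d+)\.\w+$")

def parse_partition_filename(name: str) -> tuple[int, int]:
    """Recover (index, total) from a shard path produced by partition_filename."""
    match = _PARTITION_RE.search(name)
    if match is None:
        raise ValueError(f"not a partition filename: {name!r}")
    return int(match.group(1)), int(match.group(2))

# Hypothetical split writer: main and dups outputs both route through the helper,
# so their shard names always line up.
def split_writer_paths(main_dir: str, dups_dir: str, index: int, total: int) -> tuple[str, str]:
    name = partition_filename(index, total)
    return f"{main_dir}/{name}", f"{dups_dir}/{name}"
```

Because both output paths and the parser derive from the same format, changing the naming scheme touches one function instead of several inline format strings.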

levanter/data: Removed a closed cluster of dead code from _preprocessor.py
(_construct_composite_batch_processor, _CompositeBatchProcessor,
as_record_batch — 116 lines). This composite-transform pathway has been
unreferenced since it was added in 2023; the three symbols only referenced one
another and nothing in the repo or downstream marin consumed them. The
transform machinery still used by sharded_datasource (_MapTransform,
_BatchMapTransform, _TransformedDataset) and the still-imported BatchResult /
dict_from_record_batch are untouched.

iris/k8s: Five k8s call sites independently parsed Kubernetes RFC3339
timestamps with datetime.fromisoformat(s.replace("Z", "+00:00")), and the
kubectl log-line parser additionally carried a manual fractional-second
truncation block working around a pre-3.11 fromisoformat limitation. On the
supported Python range (>=3.11,<3.13) fromisoformat truncates sub-microsecond
fractions natively, so that block was dead. Added parse_k8s_timestamp in
k8s/types.py alongside parse_k8s_quantity/parse_k8s_cpu, routed all call sites
through it, dropped the dead truncation block and a now-unused datetime import,
and added parametrized tests (Z suffix, explicit offset, microsecond,
nanosecond truncation, malformed-input rejection).

zephyr/plan: The physical Write op was the only physical op that broke the plan
module's design rule that physical ops encapsulate execution as callables: it
carried a stringly-typed writer_type plus a schema field and forced run_stage into a
4-way if/elif that imported every writer function. A _writer_for(writer_type,
schema) factory now binds the writer to its callable at plan-build time
(functools.partial binds schema for parquet/vortex); Write carries a single
write_fn and run_stage just calls op.write_fn(stream, output_path). The
user-facing WriteOp in dataset.py keeps its Literal-typed writer_type.

Affected test suites pass: iris k8s parsers (18 tests), zephyr
plan/dataset/backends/execution/groupby/writers/optimization (189 tests), datakit
normalize (15 tests), levanter sharded_dataset/newdataset (11 tests).

Nightshift Agent added 4 commits June 11, 2026 14:04
…me helper

normalize._make_split_writer duplicated the part-NNNNN-of-MMMMM.parquet
format string inline twice instead of using datakit.partition_filename,
the canonical helper written for exactly this purpose. Route both the
main and dups output paths through the helper so the naming contract lives
in one place.

Drop _construct_composite_batch_processor, _CompositeBatchProcessor, and
as_record_batch from data/_preprocessor.py. This composite-transform pathway
has been unreferenced since it was added in 2023; the three symbols only
referenced one another and nothing in the repo (or downstream marin) consumed
them. The transform classes still used by sharded_datasource are untouched.

Five k8s call sites independently parsed Kubernetes RFC3339 timestamps with
datetime.fromisoformat(s.replace("Z", "+00:00")). One of them (the kubectl
log-line parser) also carried a manual fractional-second truncation block that
worked around a pre-3.11 fromisoformat limitation. On the supported Python
range (>=3.11,<3.13) fromisoformat truncates sub-microsecond fractions itself,
so that block is dead.

Centralize the parse into parse_k8s_timestamp in k8s/types.py alongside the
existing parse_k8s_quantity/parse_k8s_cpu helpers and route all call sites
through it.

The physical Write op was the only one carrying a stringly-typed writer_type
(plus a schema field) and forcing run_stage to branch on it and import every
writer function, contradicting the plan module's stated design that physical
ops encapsulate execution as callables. Resolve the writer at plan time via
_writer_for() so run_stage just calls op.write_fn, decoupling it from concrete
output formats.

claude-nightshift[bot] added the agent-generated (Created by automation/agent) and nightshift (Automated nightshift fixes) labels Jun 11, 2026
claude-nightshift[bot] requested a review from rjpower June 11, 2026 14:12
claude-nightshift[bot] enabled auto-merge (squash) June 11, 2026 14:12
claude-nightshift[bot] merged commit 92d9542 into main Jun 12, 2026 (33 checks passed)
claude-nightshift[bot] deleted the nightshift/cleanup-20260611 branch June 12, 2026 00:45