Reduce boilerplate: error traits, array accessors, and job helpers#35
Open
tonyalaribe wants to merge 6 commits into
Open
Reduce boilerplate: error traits, array accessors, and job helpers#35tonyalaribe wants to merge 6 commits into
tonyalaribe wants to merge 6 commits into
Conversation
Analysis of src/ identifying cross-cutting consolidation opportunities (shared Arrow array-access layer, error-conversion traits, scheduled-job helpers) plus per-file boilerplate clusters and library/derive/macro strategies. Estimated ~1,000-1,300 LOC reduction. https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df
Introduce src/error_ext.rs with two extension traits that remove repeated
error-mapping boilerplate:
- ArrowResultExt::into_df() replaces
`.map_err(|e| DataFusionError::ArrowError(Box::new(e), None))`
- ExecResultExt::exec_context(msg) replaces
`.map_err(|e| DataFusionError::Execution(format!("msg: {e}")))`
Migrate database.rs: 6 ArrowError sites and 4 Execution-context sites.
First increment of the consolidation work in docs/code-conciseness-research.md.
https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df
…stats_table Migrate ~11 more verbose map_err sites to ArrowResultExt::into_df() and ExecResultExt::exec_context(): - functions.rs: 1 ArrowError + 3 Execution-context sites - mem_buffer.rs: 3 ArrowError sites - stats_table.rs: 1 ArrowError site (drop now-unused DataFusionError import) - dml.rs: 5 Execution-context sites Verified with cargo check (clean, no new warnings). https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df
Add src/arrow_access.rs with StrAccessor, generalizing the existing BinaryAccessor pattern to read uniformly across Utf8/LargeUtf8/Utf8View arrays. Use it to collapse duplicated per-encoding downcast-and-loop blocks in functions.rs: - evaluate_jsonpath_on_json_string: two near-identical loops -> one - extract_scalar_string: two downcast branches -> one StrAccessor also accepts LargeUtf8 (a strict superset of prior behavior). Verified with cargo check (clean). https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df
Both tantivy_index/builder.rs and tantivy_index/udf.rs hand-rolled a near-identical Utf8/Utf8View downcast-to-closure helper. Replace both with StrAccessor (adding a null-aware get() accessor), dropping the now-unused AsArray/Array/StringArray/StringViewArray imports. Behavior preserved (StrAccessor additionally accepts LargeUtf8). Verified with cargo check (clean). https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df
Add two helpers on Database and collapse the six cron jobs in
start_maintenance_schedulers:
- add_async_job(): hides the repeated
`Job::new_async(.., { let db = db.clone(); move |_,_| { let db = db.clone();
Box::pin(async move { .. }) }})` + `scheduler.add(job).await?` plumbing that
every job duplicated. The single remaining Box::pin lives in the helper.
- all_tables(): returns a (project_id, table_name, table) snapshot over both
unified and custom tables, replacing the duplicated "iterate unified, then
iterate custom" double loops in 5 jobs.
- table_label(): one place for the "unified table 'X'" / "custom project 'P'
table 'X'" log wording.
Net -148/+107 lines in database.rs.
Behavior notes (intentional, called out for review):
- all_tables() clones the Arc handles and releases the map read-locks before
per-table work, rather than holding them across the whole maintenance run.
This matches the snapshot pattern the recompress job already used and
reduces lock contention with table-map writers. Per-table ops still take the
individual table's RwLock internally.
- Maintenance log messages are normalized to the table_label() format; the
stats-refresh job previously logged custom tables as "P:X" and now logs
"custom project 'P' table 'X'". Log wording only; no functional change.
Verified with cargo check (clean, no warnings).
https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR reduces cross-cutting boilerplate across the codebase by introducing three reusable abstractions:
error_ext.rs) — eliminate repeatedmap_err(|e| DataFusionError::…)patternsarrow_access.rs) — collapse downcast cascades forUtf8/LargeUtf8/Utf8Viewarraysdatabase.rs— remove closure/Box::pinscaffolding and duplicated unified+custom table loopsKey Changes
New modules:
src/arrow_access.rs—StrAccessorenum that unifies reads across the three Arrow string encodings, replacing ~15 duplicated downcast-and-loop blockssrc/error_ext.rs—ArrowResultExtandExecResultExttraits that convert Arrow and generic errors to DataFusion results in one method calldocs/code-conciseness-research.md— detailed audit of boilerplate patterns and reduction opportunities across the codebasesrc/database.rsrefactoring:add_async_job()helper that wraps theJob::new_async(… { let db=db.clone(); move|_,_| { Box::pin(async move {…}) } })patternall_tables()helper that returns a snapshot of unified + custom tables as(project_id, table_name, table)tuplestable_label()function for consistent maintenance log messages ("unified table 'X'" vs. "custom project 'P' table 'X'")Callsite updates:
src/functions.rs— useStrAccessorinextract_scalar_string()andevaluate_jsonpath_on_json_string(); apply error traits to Arrow casts and JSONPath parsingsrc/tantivy_index/udf.rsandsrc/tantivy_index/builder.rs— replace duplicatedstring_extractor()implementations withStrAccessorsrc/mem_buffer.rs,src/dml.rs— apply error traits to Arrow operationssrc/lib.rs— export new modulesImplementation Details
StrAccessor::try_new()performs a single downcast cascade and returns a unified enum; callers usevalue(),get(),is_null()without branching on the concrete type#[inline]wrappers around existing error constructorsadd_async_job()helper is generic over the closure type and hides themove |_,_| { let db=db.clone(); Box::pin(async move {…}) }pattern entirelyall_tables(), iterate once, and log withtable_label()for consistencyTesting
Existing tests cover all modified code paths; no new test files added. The refactoring is purely mechanical — same error handling, same table iteration logic, just consolidated into reusable helpers.
https://claude.ai/code/session_01SCS7yFmpjgSgdg68UnM7Df