Clean up benchmarking scripts #1382
Signed-off-by: Sarah Yurick <[email protected]>
Greptile Summary
This PR consolidates duplicate utility functions across benchmark scripts into a shared utilities module (shown as `scripts/utils.py` in the sequence diagram).
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Script as Benchmark Script
participant Utils as scripts/utils.py
participant Executor as Executor (Xenna/RayData/RayActors)
participant Pipeline as Pipeline
participant Writer as Result Writer
Script->>Utils: setup_executor(executor_name)
Utils->>Executor: Initialize executor instance
Executor-->>Script: Return executor
alt Scripts with dataset loading
Script->>Utils: load_dataset_files(path, size_gb)
Utils->>Utils: Calculate file subset by size
Utils-->>Script: Return file list
end
Script->>Pipeline: Create pipeline with stages
Script->>Pipeline: run(executor, initial_tasks)
Pipeline->>Executor: Execute pipeline stages
Executor-->>Pipeline: Return output tasks
Pipeline-->>Script: Return results
Script->>Script: Calculate metrics (time, documents, etc.)
Script->>Script: Build results dict (params, metrics, tasks)
Script->>Utils: write_benchmark_results(results, output_path)
Utils->>Utils: Convert str to Path if needed
Utils->>Writer: Create output directory
Utils->>Writer: Write params.json
Utils->>Writer: Write metrics.json
Utils->>Writer: Write tasks.pkl
Writer-->>Script: Results written
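The write step at the end of the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the function name and the three output file names come from the diagram, while the signature, the `params`/`metrics`/`tasks` dictionary keys, and the JSON formatting are assumptions.

```python
import json
import pickle
from pathlib import Path
from typing import Any, Dict, Union


def write_benchmark_results(results: Dict[str, Any], output_path: Union[str, Path]) -> None:
    """Hypothetical sketch: persist benchmark params, metrics, and tasks to a directory."""
    # Accept either a str or a Path (the "Convert str to Path if needed" step).
    output_path = Path(output_path)
    # Create the output directory, including any missing parents.
    output_path.mkdir(parents=True, exist_ok=True)

    # The dictionary keys below are assumed from the diagram's
    # "Build results dict (params, metrics, tasks)" step.
    with (output_path / "params.json").open("w") as f:
        json.dump(results["params"], f, indent=2)
    with (output_path / "metrics.json").open("w") as f:
        json.dump(results["metrics"], f, indent=2)
    # Tasks are pickled because they may not be JSON-serializable.
    with (output_path / "tasks.pkl").open("wb") as f:
        pickle.dump(results["tasks"], f)
```

Normalizing the argument with `Path(output_path)` inside the helper means callers can pass either type, which avoids relying on every script's argument parser to do the conversion.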
9 files reviewed, 1 comment
Signed-off-by: Sarah Yurick <[email protected]>
Additional Comments (2)
- benchmarking/scripts/common_crawl_benchmark.py, line 152: syntax: Missing `type=Path` for this argument. `write_benchmark_results` expects a `Path` object but will receive a string, causing `AttributeError: 'str' object has no attribute 'mkdir'`.
- benchmarking/scripts/dedup_removal_benchmark.py, lines 158-160: syntax: Missing `type=Path` for this argument. `write_benchmark_results` expects a `Path` object but will receive a string, causing `AttributeError: 'str' object has no attribute 'mkdir'`.
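The failure mode described in these two comments can be reproduced with a small `argparse` sketch. The flag name `--benchmark-results-path` is hypothetical and chosen only for illustration; the point is that `argparse` returns a plain `str` unless `type=Path` is given.

```python
import argparse
from pathlib import Path

# Without type=Path, argparse stores the raw string from the command line.
# "--benchmark-results-path" is a hypothetical flag name used for illustration.
untyped = argparse.ArgumentParser()
untyped.add_argument("--benchmark-results-path")
untyped_args = untyped.parse_args(["--benchmark-results-path", "results/run1"])

# With type=Path, argparse converts the string to a pathlib.Path.
typed = argparse.ArgumentParser()
typed.add_argument("--benchmark-results-path", type=Path)
typed_args = typed.parse_args(["--benchmark-results-path", "results/run1"])

# A str has no mkdir method, so calling it raises AttributeError;
# a Path does have one.
has_mkdir_untyped = hasattr(untyped_args.benchmark_results_path, "mkdir")
has_mkdir_typed = hasattr(typed_args.benchmark_results_path, "mkdir")
```

Either fix works: add `type=Path` to the argument, or have the shared helper convert strings to `Path` before use, as the sequence diagram's "Convert str to Path if needed" step indicates.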
9 files reviewed, 3 comments
Signed-off-by: Sarah Yurick <[email protected]>
praateekmahajan left a comment
this is awesome! thank you 🙏
No description provided.