-
Notifications
You must be signed in to change notification settings - Fork 0
Results database with MLFlow #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR integrates MLflow for experiment tracking and result management, addressing issue #1. It refactors the main search function to be callable programmatically and adds database storage capabilities for search results using SQLAlchemy.
Key changes:
- Extracted core search logic into
run_search()function for programmatic access - Added MLflow integration via
run_search_mlflow()for experiment tracking - Implemented database models and result compilation utilities for persistent storage
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_search.py | Added tests for the new run_search and run_search_mlflow functions |
| src/find_asteroids/search.py | Refactored main() to extract run_search() and added run_search_mlflow() wrapper with MLflow tracking |
| src/find_asteroids/results.py | New module for reading and compiling results from local directories or MLflow, with database storage support |
| src/find_asteroids/models.py | New SQLAlchemy models for storing experiments, runs, and search results in a database |
| pyproject.toml | Added sqlalchemy dependency and mlflow as optional pipeline dependency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| with mlflow.start_run() as run: | ||
| run_id = run.info.run_id | ||
| print(f"MLflow Run ID: {run_id}") | ||
| for k, v in tags: |
Copilot
AI
Oct 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop iterates over tags without validation. If tags is an empty list (the default), this will work, but if a tag is not a two-element tuple, this will fail. While the CLI parsing at line 353 ensures tuples, when called directly via run_search_mlflow(), invalid tag formats could cause an unpacking error.
| for k, v in tags: | |
| for tag in tags: | |
| if not (isinstance(tag, (tuple, list)) and len(tag) == 2): | |
| raise ValueError(f"Each tag must be a tuple or list of length 2, got: {tag!r}") | |
| k, v = tag |
|
|
||
| def read_results(results_dir : Path, name : str, output_format='ecsv'): | ||
| import astropy.table | ||
| if type(results_dir) is str: | ||
| results_dir = Path(results_dir) | ||
| for p in sorted(results_dir.glob("*/"+f"{name}.{output_format}"), key=lambda x: int(x.parent.name)): | ||
| print("reading", p) |
Copilot
AI
Oct 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the logging module instead of print statements for consistency with the rest of the codebase. Import and use log.info() or log.debug() instead.
| def read_results(results_dir : Path, name : str, output_format='ecsv'): | |
| import astropy.table | |
| if type(results_dir) is str: | |
| results_dir = Path(results_dir) | |
| for p in sorted(results_dir.glob("*/"+f"{name}.{output_format}"), key=lambda x: int(x.parent.name)): | |
| print("reading", p) | |
| import logging | |
| log = logging.getLogger(__name__) | |
| def read_results(results_dir : Path, name : str, output_format='ecsv'): | |
| import astropy.table | |
| if type(results_dir) is str: | |
| results_dir = Path(results_dir) | |
| for p in sorted(results_dir.glob("*/"+f"{name}.{output_format}"), key=lambda x: int(x.parent.name)): | |
| log.info(f"reading {p}") |
Address #1