Xiaoya update prefect3 #201

Merged
Wiebke merged 21 commits into mlexchange:main from xiaoyachong:xiaoya-update-prefect3
Dec 4, 2025

@xiaoyachong xiaoyachong commented Nov 12, 2025

This PR focuses on the Prefect upgrade. It is compatible with the upcoming changes in mlex_utils and mlex_prefect_worker and includes the following updates:

1. Add algorithm registry:
The application now reads the .json file and saves the algorithm details to MLflow, and subsequently retrieves them from MLflow when needed.
This enables the Prefect worker to access algorithm information directly from MLflow, rather than relying on parameters passed from the application.
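
To make the registry idea concrete, here is a minimal sketch of the read-then-look-up flow. It assumes a JSON layout with a top-level `contents` list whose entries carry a `model_name`, and it uses a hypothetical `InMemoryRegistry` in place of MLflow, so none of the names below are the actual mlex_utils or MLflow API.

```python
import json
import os
import tempfile


class InMemoryRegistry:
    """Hypothetical stand-in for the MLflow-backed registry (not a real API)."""

    def __init__(self):
        self._algorithms = {}

    def register(self, algorithm):
        # Key each entry by model name so a worker can look it up later.
        self._algorithms[algorithm["model_name"]] = algorithm

    def get(self, model_name):
        return self._algorithms[model_name]

    def names(self):
        return sorted(self._algorithms)


def register_from_json(path, registry):
    """Read algorithm definitions from a JSON file and store each one."""
    with open(path) as f:
        content = json.load(f)
    # Assumed layout: {"contents": [{"model_name": ...}, ...]}
    for algorithm in content["contents"]:
        registry.register(algorithm)
    return registry.names()


# Application side: write a sample file, then register its contents.
sample = {"contents": [{"model_name": "example_model", "version": "0.0.1"}]}
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "models.json")
    with open(json_path, "w") as f:
        json.dump(sample, f)
    registry = InMemoryRegistry()
    registered = register_from_json(json_path, registry)

# Worker side: fetch the details by name instead of receiving them as parameters.
worker_view = registry.get("example_model")
```

The point of the sketch is only the division of labor: the application writes once, and the worker reads by name.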

2. Simplify parameters and remove flow_type in segmentation.py:
The structure of parameters sent from the application to the Prefect worker has been simplified as follows:
{
    "model_name": model_name,
    "task_name": "train",
    "params": {
        "io_parameters": io_parameters,
        "model_parameters": model_parameters,
    },
}

Meanwhile, the following credentials have been removed from io_parameters:

data_tiled_api_key
mask_tiled_api_key
seg_tiled_api_key
mlflow_tracking_username
mlflow_tracking_password

These credentials are now stored on the Prefect worker side, eliminating the need to send them from the application to the Prefect worker and improving security and configuration consistency.
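
As an illustration of the simplified payload, the sketch below assembles the structure shown above and filters out the five credential keys. The helper name `build_flow_parameters`, the `data_tiled_uri` key, and the sample values are assumptions made for the example, not code from this PR.

```python
# Credential keys that the PR description says were removed from io_parameters.
CREDENTIAL_KEYS = {
    "data_tiled_api_key",
    "mask_tiled_api_key",
    "seg_tiled_api_key",
    "mlflow_tracking_username",
    "mlflow_tracking_password",
}


def build_flow_parameters(model_name, task_name, io_parameters, model_parameters):
    """Assemble the simplified payload (hypothetical helper)."""
    # Drop any credential that might still be present in the caller's dict.
    safe_io = {k: v for k, v in io_parameters.items() if k not in CREDENTIAL_KEYS}
    return {
        "model_name": model_name,
        "task_name": task_name,
        "params": {
            "io_parameters": safe_io,
            "model_parameters": model_parameters,
        },
    }


payload = build_flow_parameters(
    model_name="example_model",
    task_name="train",
    io_parameters={
        "data_tiled_uri": "http://example.invalid/data",  # assumed key
        "data_tiled_api_key": "secret",
    },
    model_parameters={"num_epochs": 3},
)
```

Filtering on the application side is a belt-and-braces choice for the example; the PR itself states that the worker holds these credentials.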

Companion PRs:
tomo: mlexchange/mlex_tomo_framework#16
prefect worker: mlexchange/mlex_prefect_worker#26
mlex_utils: mlexchange/mlex_utils#5
dlsia_proto: mlexchange/mlex_dlsia_segmentation_prototype#38


@xiaoyachong xiaoyachong requested review from Wiebke and taxe10 November 13, 2025 20:28
@taxe10 taxe10 marked this pull request as ready for review November 14, 2025 17:08
Member

@taxe10 taxe10 left a comment

This is great! I have a couple of comments that should be addressed prior to merging:

  • Enhanced interface startup when no algorithms have been registered - I accidentally started the app without registering the algorithms first, and the application failed to initialize, so I added the following:
% git diff components/control_bar.py 
diff --git a/components/control_bar.py b/components/control_bar.py
index 80e0b59..c39a056 100644
--- a/components/control_bar.py
+++ b/components/control_bar.py
@@ -594,7 +594,7 @@ def layout():
                                         data=models.modelname_list,
                                         value=(
                                             models.modelname_list[0]
-                                            if models.modelname_list[0]
+                                            if len(models.modelname_list) > 0 and models.modelname_list[0]
                                             else None
                                         ),
                                         placeholder="Select a model...",
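
The guard in the diff above can be shown in isolation. This is a minimal sketch with hypothetical function names; it only demonstrates that indexing an empty list raises `IndexError` before the truthiness check runs, which is why the length check has to come first.

```python
def unguarded_default(modelname_list):
    """Original expression: indexes the list before knowing it is non-empty."""
    return modelname_list[0] if modelname_list[0] else None


def guarded_default(modelname_list):
    """Patched expression: the length check short-circuits on an empty list."""
    return (
        modelname_list[0]
        if len(modelname_list) > 0 and modelname_list[0]
        else None
    )


try:
    unguarded_default([])
    unguarded_failed = False
except IndexError:
    # An empty registry makes the original expression fail at layout time.
    unguarded_failed = True

empty_result = guarded_default([])
first_result = guarded_default(["model_a", "model_b"])
```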

In the future - we could add a "refresh" button near the model selection controls to check whether new algorithms have been made available in the application, but I don't think this is needed at this time.

Additionally, I was wondering if we could do a quick model registration at the application startup - something like:

  mlex_segmentation:
    build:
      context: ./mlex_highres_segmentation
      dockerfile: Dockerfile
    command: 'python scripts/save_mlflow_algorithm.py && gunicorn -b 0.0.0.0:8075 --reload app:server'

FYI - I have not been able to test the Prefect 3.x integration locally due to additional comments in the worker's PR

Contributor Author

xiaoyachong commented Nov 19, 2025

Thanks for your revision! I’ve made the changes accordingly in the latest commit. I also changed the command in mlexchange/mlex_tomo_framework#16.

Member

@taxe10 taxe10 left a comment

These changes worked well at my end.

Just a couple of comments for follow-up in the next PR:

  • The status check for the prefect worker switches to ready as soon as the parent flow pool becomes available, even if the individual child worker (conda/docker/etc.) is not actually ready. This can mislead users, so we should refine this logic or clarify the status reporting.
  • For MLflow model registration, we're currently using the Prefect flow run ID. We should consider switching to human-readable names—possibly the job name—but we need to think through a long-term strategy that supports multi-tenant scenarios.

Member

@Wiebke Wiebke left a comment

Thanks for the PR. I think this is pretty ready to merge. I left some minor comments.
Prior to merging we should:

  • first merge the mlex_utils PR and update requirements.txt accordingly,
  • [optionally] still introduce the guarding against no models being found

Comment on lines +597 to +598
if len(models.modelname_list) > 0
and models.modelname_list[0]
Member

I am not sure if this is coming out of the options here, but with no models registered yet, I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/mlex_utils/mlflow_utils/mlflow_algorithm_client.py", line 267, in __getitem__
    return self.algorithms[key]
           ^^^^^^^^^^^^^^^^^^^^
KeyError: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/utils/data_utils.py", line 402, in __getitem__
    return self.mlflow_client[key]
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mlex_utils/mlflow_utils/mlflow_algorithm_client.py", line 269, in __getitem__
    raise KeyError(f"An algorithm with name '{key}' does not exist.")
KeyError: "An algorithm with name 'None' does not exist."

During handling of the above exception, another exception occurred:

KeyError: 'A model with name None does not exist.'

Note that the reason for me starting the application without models registered is that I unintentionally overwrote the command in my docker-compose.override.yaml.
Some error handling for this might be useful though.

Member

One idea could be to update the update_model_parameters callback

https://github.com/xiaoyachong/mlex_highres_segmentation/blob/f742fb97868606da4d7f471b8f1f138fc60832cd/callbacks/control_bar.py#L956-L969

with an additional check:

    if not model_name:
        return html.Div("No model available.")
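
A rough, Dash-free sketch of how such an early return avoids the `KeyError` chain in the traceback above. `ModelStore` and the plain-string return value are stand-ins invented for this example; the real callback would return a Dash component and use the application's own model class.

```python
class ModelStore:
    """Hypothetical stand-in for the application's model lookup."""

    def __init__(self, algorithms):
        self._algorithms = dict(algorithms)

    def __getitem__(self, key):
        try:
            return self._algorithms[key]
        except KeyError:
            # Re-raising inside `except` chains the errors, as in the traceback.
            raise KeyError(f"A model with name {key} does not exist.")


def update_model_parameters(model_name, models):
    """Return the parameters for a model, or a message when none is selected."""
    # Guard first: with no registered models the dropdown value is None.
    if not model_name:
        return "No model available."
    return models[model_name]


empty_store = ModelStore({})
filled_store = ModelStore({"example_model": {"num_epochs": 3}})

no_model_message = update_model_parameters(None, empty_store)
example_parameters = update_model_parameters("example_model", filled_store)
```

Without the guard, `empty_store[None]` would raise, which matches the failure reported when the app starts with nothing registered.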

Contributor Author

Sounds good. I’ve added a safeguard in the update_model_parameters callback in control_bar.py to handle cases where no models are found.

Comment on lines +14 to +21
load_dotenv(dotenv_path="../.env")

# MLflow Configuration from environment variables
MLFLOW_TRACKING_URI = os.getenv("MLFLOW_TRACKING_URI_OUTSIDE", "http://localhost:5000")
MLFLOW_TRACKING_USERNAME = os.getenv("MLFLOW_TRACKING_USERNAME", "")
MLFLOW_TRACKING_PASSWORD = os.getenv("MLFLOW_TRACKING_PASSWORD", "")
# Algorithm JSON path from environment variable
ALGORITHM_JSON_PATH = os.getenv("ALGORITHM_JSON_PATH", "../assets/models.json")
Member

In a future PR (in line with archiving mlex_tomo_framework), it might make sense to convert this script to use typer and read from environment variables.

Contributor Author

Thanks for the kind reminder!

@xiaoyachong
Contributor Author

Thanks for the kind reminder!

@xiaoyachong xiaoyachong requested review from Wiebke and taxe10 December 3, 2025 23:45
Member

@Wiebke Wiebke left a comment

LGTM!

@Wiebke Wiebke merged commit 21c6c87 into mlexchange:main Dec 4, 2025
2 checks passed