Conversation

@apayne97 (Collaborator) commented Jul 1, 2025

Description

Fixes #15

The goal is to make docking not depend on ml at all, so that the docking CLI CI will pass.

As of 2025-07-02 we've decided to go with option 4, but for a future release. We will make the minimal change necessary in a separate PR.

Todos

  • Move ML scorers to the ml subpackage
  • Move the FINT scorer to the dataviz subpackage
  • Move tests to the appropriate subpackages

Status

  • Ready to go

Developer's Certificate of Origin

@apayne97 linked an issue Jul 1, 2025 that may be closed by this pull request
@apayne97 added this to the release 0.1 milestone Jul 1, 2025
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (pydantic_2@9929e64). Learn more about missing BASE report.

@ijpulidos (Contributor)

Do we know how and where these scorers are being used? I ask because I don't really know how they are currently used. Just thinking: while they are indeed ML, we are now effectively making the ml package depend on the docking one (especially with the references to DockingResult). Maybe this is hinting at a deeper API design problem. As in, maybe we need docking-agnostic scorers in the ML package, and an interface to them for docking results in the docking package. Is this something that's worth thinking about?

I know this may make the docking CI pass, but maybe it's just moving the problem somewhere else. If it just moves the issue around without really solving it, I'd suggest that this probably requires further discussion with the users of these scorers and these API points. I hope this makes sense.

@apayne97 (Collaborator, Author) commented Jul 2, 2025

@ijpulidos This is a good point. I think they are basically only used in the small- and large-scale docking workflows (which live in workflows and also depend on dataviz, etc.).

There might be some logic in moving the scorer base class to data, which could then be adopted by the various other subpackages. Then we can also define the metascorer logic there and provide an API to add other scoring functions.

This is maybe worth a larger discussion. An even easier solution is to put the ML scorers in workflows, since basically any ML scorer to be used on a docking result has to depend on both the ml and docking subpackages.

I think the dependency problem does have to do with the way we've written the scorers, with complicated logic like this:
https://github.com/choderalab/asapdiscovery/blob/c430e6f5729a892f1da2f77af6d558fccdc09fa3/asapdiscovery-ml/asapdiscovery/ml/scorers.py#L171C5-L194C23

Perhaps we can keep the ML scorers in the ml subpackage but write them without using the docking package, by entirely removing the dependency on the DockingResult object and moving a base scorer class to data.
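To make that idea concrete, here is a very rough sketch (class and module names are hypothetical, not the current asapdiscovery API) of what a docking-agnostic base scorer living in data might look like, with an ML scorer subclass that never touches DockingResult:

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Score:
    # Hypothetical container for a single score value and its provenance
    score_type: str
    value: float


class ScorerBase(ABC):
    """Hypothetical docking-agnostic base class that could live in the data subpackage."""

    score_type: str = "base"

    @abstractmethod
    def _score(self, inputs: list) -> list[Score]:
        ...

    def score(self, inputs: list) -> list[Score]:
        # Shared bookkeeping (validation, metadata, metascorer composition)
        # could live here, so subclasses only implement _score.
        return self._score(inputs)


class MLModelScorer(ScorerBase):
    """Hypothetical ML scorer operating on plain (target, ligand) inputs,
    with no reference to DockingResult, so it could stay in the ml subpackage."""

    score_type = "ml-model"

    def _score(self, inputs: list) -> list[Score]:
        # Placeholder: a real implementation would featurize each input
        # and evaluate the trained model here.
        return [Score(score_type=self.score_type, value=0.0) for _ in inputs]

A thin adapter in docking could then unpack DockingResult objects into whatever inputs the scorers expect, so the dependency points from docking to data rather than from ml to docking.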

@apayne97 (Collaborator, Author) commented Jul 2, 2025

I do think that, with our stated goals for the next two weeks, making ML depend on docking in order to make docking not depend on ML is worth it, since the primary goals are to make it easy to use spectrum, docking, modeling, and data, and to make the CI pass for all of those. A larger refactor can happen later, but it's definitely worth discussing more (maybe with an eye on how to incorporate free energy calculation "scores" and Boltz-2 scores and whatnot in the future).

@apayne97 (Collaborator, Author) commented Jul 2, 2025

Just to clarify, the options as I see them are:

  1. Finish this PR (i.e. get the dataviz CI to pass) and merge it, letting ML depend on docking for now.
  2. Don't merge this PR, and just xfail the docking scorers that depend on ML stuff.
  3. Move the ML and FINT scorers into the workflows package, making workflows depend on everything (which it already does) instead of placing a burden on the other packages.
  4. Refactor the way scoring works (a base scorer in data instead of docking, then define the API for each of the docking, ml, and dataviz (FINT) scorers such that they only depend on data, and then refactor the workflows to handle them properly).

Option 4 is not mutually exclusive with 1, 2, and 3, but one of these four needs to happen in the next two weeks.

@mariacm12 (Contributor)

Update: I also need the ChemGauss scorer for spectrum (and for the PR I want to merge before the first release), so I'm leaning towards option 1 as a temporary fix. I agree we should copy the whole base of the scorers to data to make it independent of docking (but keep the scorers there), but that should probably go in the next release.

@chrisiacovella (Member)

Since the current plan is to release without ml, I suggest we modify the cli-ci test to xfail ml testing. Just add in

if subcommand == "ml":
    pytest.skip("skipping ML cli testing")

@apayne97 changed the title from "Move docking scorers" to "[DNM] Move docking scorers" Jul 2, 2025
@apayne97 modified the milestones: release 0.1, release 0.2 Jul 2, 2025
@ijpulidos (Contributor) commented Jul 2, 2025

From discussion with @apayne97 and @mariacm12, we thought a fast way forward is to move the ML scorers to their own module, still in the docking subpackage (because they still depend on DockingResult objects). That way the actual scorers they are using in their workflows (namely ChemGaussScorer) should be ready to use. @apayne97 will be making a new branch/PR where we will be testing this.

This branch is a good starting point for a further restructuring of the scorers. As in, having some agnostic base classes for scorers in data, and then subclasses in ml or docking as needed.
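For illustration only (the file names below are assumptions, not the actual layout), the interim split would put the ml-dependent scorers in their own module so that importing the plain docking scorers never triggers an ml import:

# asapdiscovery/docking/scorer.py  (hypothetical path)
# ChemGaussScorer and other scorers that only need docking/data stay here,
# with no imports from the ml subpackage.

# asapdiscovery/docking/ml_scorers.py  (hypothetical path)
# Only this module imports from asapdiscovery.ml, and it guards the import so
# environments without the ml subpackage installed can still use the docking CLI.
try:
    import asapdiscovery.ml  # noqa: F401
    HAS_ML = True
except ImportError:
    HAS_ML = False

The docking CLI and workflows would then only import ml_scorers when ML scoring is actually requested.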

@apayne97 removed this from the release 0.8 milestone Aug 19, 2025

Development

Successfully merging this pull request may close these issues.

Move ML and FINT scorers from docking somewhere else
