feat: initial support for data-platform by sg-s · Pull Request #430 · deeporiginbio/do-dd-client

sg-s · 2026-02-03T18:46:06Z

changes

basic support for data-platform
ability to register a Ligand
ability to register a protein
ability to search for a Ligand
ability to search for a protein
ability to construct a Ligand from ID
ability to construct a Protein from ID

will not implement this pr

ability to list projects
ability to create a project
ability to check if a project exists

Copilot

Pull request overview

This PR adds initial support for the data-platform API, enabling users to search and query various entities (ligands, proteins, etc.) through the DeepOriginClient.

Changes:

Added a new Data API wrapper class with methods for health checks, model listing, and entity searching
Integrated the Data wrapper into DeepOriginClient
Added comprehensive test coverage and mock server endpoints for the new functionality

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
src/platform/data.py	New Data API wrapper class with methods for searching entities, ligands, and proteins, plus health and model listing endpoints
src/platform/client.py	Integration of Data wrapper into DeepOriginClient initialization
tests/test_data.py	Comprehensive test suite for data platform functionality including health checks, search operations, and error handling
tests/mock_server/server.py	Mock endpoints for data platform API including health, search, and model listing
docs/platform/ref/data.md	User-facing documentation for the Data API with usage examples
mkdocs.yaml	Added data reference to documentation navigation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-03T18:51:02Z

tests/test_data.py

+def test_search_proteins_lv1():
+    """Test searching proteins using convenience method."""
+    client = DeepOriginClient()
+    response = client.data.search_proteins()
+
+    assert isinstance(response, dict), "Expected a dictionary response"
+    assert "data" in response, "Expected 'data' key in response"
+    assert isinstance(response["data"], list), "Expected 'data' to be a list"
+
+
+def test_search_proteins_molecular_weight():
+    """Test searching proteins with molecular weight filters."""
+    client = DeepOriginClient()
+    response = client.data.search_proteins(
+        min_molecular_weight=250, max_molecular_weight=550
+    )
+
+    assert isinstance(response, dict), "Expected a dictionary response"
+    assert "data" in response, "Expected 'data' key in response"
+    assert isinstance(response["data"], list), "Expected 'data' to be a list"
+


There is no test coverage for the pdb_id parameter in search_proteins. Consider adding a test that verifies the pdb_id filter works correctly.

src/platform/data.py

Copilot · 2026-02-03T18:51:03Z

src/platform/data.py

+    def search_proteins(
+        self,
+        *,
+        cursor: str | None = None,
+        pdb_id: str | None = None,
+        min_molecular_weight: float | int | None = None,
+        max_molecular_weight: float | int | None = None,
+        sequence: str | None = None,
+        limit: int | None = None,
+        offset: int | None = None,
+        select: list[str] | None = None,
+        sort: dict[str, str] | None = None,
+    ) -> dict:


The search_proteins method does not accept a filter parameter like search_ligands does (line 157). Users cannot pass custom filters to search_proteins, which limits flexibility. Consider adding a filter parameter to maintain consistency with search_ligands and allow users to specify custom filter criteria beyond the convenience parameters (pdb_id, molecular_weight, sequence).

^ maybe, later. not sure if the filter syntax is usable

src/platform/data.py

Copilot · 2026-02-03T18:51:03Z

tests/test_data.py

+def test_search_ligands_lv1():
+    """Test searching ligands using convenience method."""
+    client = DeepOriginClient()
+    response = client.data.search_ligands()
+
+    assert isinstance(response, dict), "Expected a dictionary response"
+    assert "data" in response, "Expected 'data' key in response"
+    assert isinstance(response["data"], list), "Expected 'data' to be a list"
+


There is no test coverage for the filter parameter in search_ligands. The test_search_ligands_lv1 function only tests the basic case without filters, and test_search_ligands_molecular_weight only tests the molecular weight convenience parameters. Consider adding a test that verifies the filter parameter works correctly, similar to how the search method is tested in test_search_entity_lv1.

tests/mock_server/server.py

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-19T19:09:50Z

src/platform/data.py

+        self,
+        *,
+        cursor: str | None = None,
+        filter: dict[str, Any] | None = None,


The parameter name 'filter' in search_ligands is inconsistent with 'filter_dict' used in the base search() method. While both names work, using 'filter_dict' consistently across all methods would improve API consistency and make it clearer that the parameter expects a dictionary. Consider renaming 'filter' to 'filter_dict' for consistency.

Suggested change

filter: dict[str, Any] | None = None,

filter_dict: dict[str, Any] | None = None,

Copilot · 2026-02-19T19:09:50Z

src/platform/data.py

+    def search_proteins(
+        self,
+        *,
+        cursor: str | None = None,
+        pdb_id: str | None = None,
+        file_path: str | None = None,
+        min_molecular_weight: float | int | None = None,
+        max_molecular_weight: float | int | None = None,
+        sequence: str | None = None,
+        limit: int | None = None,
+        offset: int | None = None,
+        select: list[str] | None = None,
+        sort: dict[str, str] | None = None,
+    ) -> dict:


The search_proteins method lacks a 'filter' parameter for additional filtering, while search_ligands has this parameter. This creates an API inconsistency between the two similar convenience methods. Users may need to pass custom filters for proteins similar to how they can for ligands. Consider adding a 'filter' parameter to search_proteins for consistency and flexibility.

Copilot · 2026-02-19T19:09:50Z

src/drug_discovery/structures/protein.py

+        it returns the existing protein data instead of creating a new one.
+
+        Args:
+            client: DeepOriginClient instance. If None, uses DeepOriginClient.get().
+
+        Returns:
+            Dictionary containing the created or existing protein data from the data platform.


The docstring states "Returns: Dictionary containing the created or existing protein data from the data platform" but the method actually returns None. The docstring should be corrected to indicate that this method returns None and instead updates self.id as a side effect.

Suggested change

it returns the existing protein data instead of creating a new one.

Args:

client: DeepOriginClient instance. If None, uses DeepOriginClient.get().

Returns:

Dictionary containing the created or existing protein data from the data platform.

it updates the current instance with the existing protein's ID instead of

creating a new one.

Args:

client: DeepOriginClient instance. If None, uses DeepOriginClient.get().

Returns:

None. As a side effect, uploads the protein (if necessary) and updates

``self.id`` with the ID of the existing or newly created protein record.

Copilot · 2026-02-19T19:09:51Z

src/drug_discovery/structures/ligand.py

+    @beartype
+    def sync(self, client: Optional[DeepOriginClient] = None) -> None:
+        """Sync the ligand to the data platform.
+
+        This method uploads the ligand file to remote storage (if available) and creates a ligand
+        record in the data platform. If a ligand with the same canonical_smiles already exists,
+        it returns the existing ligand data instead of creating a new one.
+
+        Args:
+            client: DeepOriginClient instance. If None, uses DeepOriginClient.get().
+
+        Note:
+            If the ligand was created from a SMILES string without an SDF file, only the SMILES
+            will be used for syncing (no file upload will occur).
+        """
+        if client is None:
+            client = DeepOriginClient.get()
+
+        # If ligand has a file_path, upload it to remote storage
+        # (Note: ligands in the data platform are identified by canonical_smiles, not file_path)
+        if self.file_path is not None:
+            # Upload the ligand file first
+            self.upload(client=client)
+
+        # Search for existing ligands by canonical_smiles
+        response = client.data.search_ligands(canonical_smiles=self.canonical_smiles)
+        data = response["data"]
+
+        # If a ligand with this canonical_smiles already exists, update self.id and return
+        if data:
+            existing_ligand = data[0]
+            if "id" in existing_ligand:
+                self.id = existing_ligand["id"]
+            return
+
+        # No existing ligand found, create a new one
+        # Prepare parameters for create_ligand
+        # Note: canonical_smiles is read-only and computed by the platform
+        kwargs: dict[str, Any] = {
+            "smiles": self.smiles if self.smiles is not None else self.canonical_smiles,
+        }
+
+        # Add optional fields if available
+        if self.name is not None:
+            kwargs["name"] = self.name
+
+        # Add computed molecular properties if mol is available
+        if self.mol is not None:
+            try:
+                kwargs["formal_charge"] = self.formal_charge
+                kwargs["molecular_weight"] = self.molecular_weight
+                kwargs["hbond_donor_count"] = self.hbond_donor_count
+                kwargs["hbond_acceptor_count"] = self.hbond_acceptor_count
+                kwargs["rotatable_bond_count"] = self.rotatable_bond_count
+                kwargs["tpsa"] = self.tpsa
+            except Exception:
+                # If property computation fails, continue without those properties
+                pass
+
+        # Call create_ligand through the client
+        result = client.data.create_ligand(**kwargs)
+
+        # Update self.id with the newly created ligand's ID
+        if "data" in result and "id" in result["data"]:
+            self.id = result["data"]["id"]


The newly added sync() method for Ligand is not covered by tests. Given that other methods in the same file have test coverage, this method should have tests to verify it correctly uploads files, searches for existing ligands by canonical_smiles, creates new ligands when needed, and properly updates the self.id attribute.

Copilot · 2026-02-19T19:09:51Z

src/drug_discovery/structures/protein.py

+    @beartype
+    def sync(self, client: Optional[DeepOriginClient] = None) -> None:
+        """Sync the protein to the data platform.
+
+        This method uploads the protein file to remote storage and creates a protein
+        record in the data platform. If a protein with the same file_path already exists,
+        it returns the existing protein data instead of creating a new one.
+
+        Args:
+            client: DeepOriginClient instance. If None, uses DeepOriginClient.get().
+
+        Returns:
+            Dictionary containing the created or existing protein data from the data platform.
+        """
+        if client is None:
+            client = DeepOriginClient.get()
+
+        # Upload the protein file first
+        self.upload(client=client)
+
+        # Use the remote path as the file_path
+        file_path = self._remote_path
+
+        # Search for existing proteins with the same file_path
+        response = client.data.search_proteins(file_path=file_path)
+        data = response["data"]
+
+        # If a protein with this file_path already exists, return the first one
+        if data:
+            existing_protein = data[0]
+            # Update self.id with the existing protein's ID
+            if "id" in existing_protein:
+                self.id = existing_protein["id"]
+            return
+
+        # No existing protein found, create a new one
+        # Prepare parameters for create_protein
+        kwargs: dict[str, Any] = {
+            "file_path": file_path,
+        }
+
+        # Pass pdb_id if available
+        if self.pdb_id is not None:
+            kwargs["pdb_id"] = self.pdb_id
+
+        kwargs["protein_length"] = self.length
+        kwargs["protein_name"] = self.name
+
+        # Call create_protein through the client
+        result = client.data.create_protein(**kwargs)
+
+        # Update self.id with the newly created protein's ID
+        if "data" in result and "id" in result["data"]:
+            self.id = result["data"]["id"]


The newly added sync() method for Protein is not covered by tests. Given that other methods in the same file have test coverage, this method should have tests to verify it correctly uploads files, searches for existing proteins by file_path, creates new proteins when needed, and properly updates the self.id attribute.

Copilot · 2026-02-19T19:09:51Z

src/drug_discovery/structures/protein.py

+        if self.pdb_id is not None:
+            kwargs["pdb_id"] = self.pdb_id
+
+        kwargs["protein_length"] = self.length


The sync() method unconditionally calls self.length (line 1356), which in turn accesses self.sequence, which requires self.file_path to be non-None. However, the Protein class allows file_path to be Optional[Path]. If a Protein instance is created without a file_path, calling sync() will fail with an error when trying to compute the length. Consider adding a check to ensure file_path is not None before calling self.length, or handle the case where length cannot be computed.

Suggested change

kwargs["protein_length"] = self.length

# Only compute and include protein_length when a local file_path is available

if getattr(self, "file_path", None) is not None:

kwargs["protein_length"] = self.length

Copilot · 2026-02-19T19:09:51Z

docs/platform/ref/data.md

+ligand = client.data.create_ligand(
+    project_id="\\x0011223344556677",
+    canonical_smiles="CCOc1ccc2nc(S(=O)(=O)N3CCN(CC3)C)c(N)c2c1",
+    inchi_key="BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
+    inchi="InChI=1S/C20H24N4O4S/.../h1-4,6-9H,5,10-14H2,(H,22,23)",
+    smiles="CCOc1ccc2nc(S(=O)(=O)N3CCN(CC3)C)c(N)c2c1",
+    name="Compound-12345",
+    formal_charge=0,
+    hbond_donor_count=1,
+    hbond_acceptor_count=6,
+    rotatable_bond_count=5,
+    tpsa=85.12,
+    molecular_weight=447.5,
+)


The documentation example shows parameters that are not accepted by the create_ligand method. The method does not accept 'canonical_smiles' or 'inchi_key' or 'inchi' as parameters - these are computed by the platform and returned in the response. Only 'smiles' should be passed as input. The example should be updated to only include valid parameters.

Copilot · 2026-02-19T19:09:52Z

src/drug_discovery/structures/protein.py

+    @property
+    def length(self) -> int:
+        """get the length of the protein structure"""
+        return sum([len(seq) for seq in self.sequence])


The newly added length property for Protein is not covered by tests. Given that other properties in the same file have test coverage, this property should have tests to verify it correctly sums the lengths of all sequences in the protein structure.

Copilot · 2026-02-19T19:09:52Z

src/platform/data.py

+            filter_dict = {"deleted": False}
+        else:
+            filter_dict = filter_dict.copy()
+            filter_dict["deleted"] = False


The search() method always sets filter_dict["deleted"] = False, which prevents users from searching for deleted entities even if they explicitly pass "deleted": True in their filter. While this might be the intended behavior for most use cases, it removes flexibility and could be unexpected. Consider either documenting this behavior clearly or allowing users to explicitly search for deleted entities when needed.

Suggested change

filter_dict = {"deleted": False}

else:

filter_dict = filter_dict.copy()

filter_dict["deleted"] = False

# By default, exclude deleted entities when no filter is provided.

filter_dict = {"deleted": False}

else:

# Work on a copy and only set a default for "deleted" if the caller

# did not explicitly provide it, allowing explicit searches for

# deleted entities when needed.

filter_dict = filter_dict.copy()

filter_dict.setdefault("deleted", False)

sg-s · 2026-02-19T19:38:51Z

src/drug_discovery/structures/entity.py

    This class manages the remote path and provides an upload method to ensure that the entity's file is uploaded to the remote storage if it does not already exist there. It uses the DeepOrigin FilesClient for remote file operations.
    """

+    id: str | None = field(default=None, kw_only=True)


all entities have a ID now. this is the ID in the data platform

sg-s · 2026-02-19T19:39:12Z

src/drug_discovery/structures/ligand.py

+    def sync(self, client: Optional[DeepOriginClient] = None) -> None:
+        """Sync the ligand to the data platform.
+
+        This method uploads the ligand file to remote storage (if available) and creates a ligand
+        record in the data platform. If a ligand with the same canonical_smiles already exists,


syncing syncs with the data platform/ufa

feat: initial support for data-platform

087a498

sg-s requested a review from a team as a code owner February 3, 2026 18:46

sg-s requested a review from Copilot February 3, 2026 18:46

sg-s self-assigned this Feb 3, 2026

sg-s had a problem deploying to dev February 3, 2026 18:46 — with GitHub Actions Failure

sg-s had a problem deploying to prod February 3, 2026 18:46 — with GitHub Actions Error

sg-s temporarily deployed to docs February 3, 2026 18:46 — with GitHub Actions Inactive

sg-s had a problem deploying to staging February 3, 2026 18:46 — with GitHub Actions Error

Copilot started reviewing on behalf of sg-s February 3, 2026 18:46 View session

Copilot AI reviewed Feb 3, 2026

View reviewed changes

fix: add some tests

2deb710

sg-s temporarily deployed to docs February 8, 2026 22:04 — with GitHub Actions Inactive

sg-s had a problem deploying to dev February 8, 2026 22:04 — with GitHub Actions Error

sg-s had a problem deploying to staging February 8, 2026 22:04 — with GitHub Actions Error

sg-s had a problem deploying to prod February 8, 2026 22:04 — with GitHub Actions Failure

Merge branch 'main' into data-platform-api

8c4ccd3

sg-s had a problem deploying to dev February 9, 2026 00:41 — with GitHub Actions Error

sg-s temporarily deployed to docs February 9, 2026 00:41 — with GitHub Actions Inactive

sg-s had a problem deploying to staging February 9, 2026 00:41 — with GitHub Actions Error

sg-s had a problem deploying to prod February 9, 2026 00:41 — with GitHub Actions Failure

Merge branch 'main' into data-platform-api

320253e

sg-s deployed to docs February 11, 2026 13:50 — with GitHub Actions Active

sg-s had a problem deploying to prod February 11, 2026 13:50 — with GitHub Actions Error

sg-s had a problem deploying to dev February 11, 2026 13:50 — with GitHub Actions Error

sg-s had a problem deploying to staging February 11, 2026 13:50 — with GitHub Actions Failure

Merge branch 'main' into data-platform-api

b576db3

sg-s had a problem deploying to staging February 18, 2026 20:22 — with GitHub Actions Error

sg-s had a problem deploying to prod February 18, 2026 20:22 — with GitHub Actions Failure

sg-s had a problem deploying to dev February 18, 2026 20:22 — with GitHub Actions Error

sg-s had a problem deploying to docs February 18, 2026 20:22 — with GitHub Actions Failure

sg-s had a problem deploying to dev February 19, 2026 18:23 — with GitHub Actions Error

sg-s had a problem deploying to docs February 19, 2026 18:23 — with GitHub Actions Failure

sg-s had a problem deploying to staging February 19, 2026 18:23 — with GitHub Actions Error

sg-s had a problem deploying to prod February 19, 2026 18:23 — with GitHub Actions Failure

feat: ability to register a ligand

2a82f13

sg-s had a problem deploying to staging February 19, 2026 18:53 — with GitHub Actions Error

sg-s had a problem deploying to dev February 19, 2026 18:53 — with GitHub Actions Error

sg-s had a problem deploying to prod February 19, 2026 18:53 — with GitHub Actions Failure

sg-s had a problem deploying to docs February 19, 2026 18:53 — with GitHub Actions Failure

feat: ability to create ligands and proteins

23516bd

sg-s had a problem deploying to staging February 19, 2026 19:02 — with GitHub Actions Failure

sg-s had a problem deploying to docs February 19, 2026 19:02 — with GitHub Actions Failure

sg-s had a problem deploying to dev February 19, 2026 19:02 — with GitHub Actions Error

sg-s had a problem deploying to prod February 19, 2026 19:02 — with GitHub Actions Failure

sg-s requested review from Copilot and maurocolella February 19, 2026 19:03

Copilot started reviewing on behalf of sg-s February 19, 2026 19:03 View session

Copilot AI reviewed Feb 19, 2026

View reviewed changes

feat: ability to create proteins from IDs

18938f8

sg-s deployed to prod February 19, 2026 19:36 — with GitHub Actions Active

sg-s had a problem deploying to prod February 19, 2026 19:36 — with GitHub Actions Failure

sg-s had a problem deploying to docs February 19, 2026 19:36 — with GitHub Actions Failure

sg-s had a problem deploying to staging February 19, 2026 19:36 — with GitHub Actions Error

sg-s had a problem deploying to dev February 19, 2026 19:36 — with GitHub Actions Error

feat: docs

b79fc2f

sg-s had a problem deploying to docs February 19, 2026 19:37 — with GitHub Actions Failure

sg-s had a problem deploying to prod February 19, 2026 19:37 — with GitHub Actions Error

sg-s had a problem deploying to dev February 19, 2026 19:37 — with GitHub Actions Error

sg-s had a problem deploying to staging February 19, 2026 19:37 — with GitHub Actions Failure

sg-s commented Feb 19, 2026

View reviewed changes

	filter: dict[str, Any] \| None = None,
	filter_dict: dict[str, Any] \| None = None,

Conversation

sg-s commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

changes

will not implement this pr

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

sg-s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

sg-s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

sg-s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

sg-s commented Feb 3, 2026 •

edited

Loading