
Training data ontology #76

Merged
PythonFZ merged 23 commits into main from 75-add-pydantic-ontology-for-model-training-data
Jun 9, 2025

Conversation

PythonFZ commented Jun 5, 2025
Collaborator
  • presets for common datasets
  • way to define new datasets
  • warning if values disagree, unless both are None
  • levels of compatibility: different datasets, different DFT settings, same
  • somehow store / pass the data from the calculator into the data that was processed
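
One possible reading of the "warning if values disagree, unless both are None" item, as a minimal sketch rather than the PR's actual implementation. The function name `warn_on_disagreement` and the flat-dict input are assumptions made for illustration; the sketch treats "one side set, the other None" as a disagreement, which may or may not be the intended behavior.

```python
import warnings


def warn_on_disagreement(left: dict, right: dict, label: str = "dataset") -> list[str]:
    """Return the keys whose values differ, emitting one warning per key.

    Hypothetical helper: keys that are None (or missing) on both sides are skipped.
    """
    mismatched = []
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key), right.get(key)
        if a is None and b is None:
            continue  # both unset: nothing to compare, no warning
        if a != b:
            mismatched.append(key)
            warnings.warn(
                f"{label} field {key!r} disagrees: {a!r} != {b!r}",
                stacklevel=2,
            )
    return mismatched
```

Returning the mismatched keys alongside the warnings keeps the helper easy to test and leaves room for the later "levels of compatibility" idea to classify the mismatches.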

instead of developing a new theme, look into

@PythonFZ PythonFZ linked an issue Jun 5, 2025 that may be closed by this pull request

sandipde commented Jun 9, 2025

Member

@PythonFZ, writing down the following points so that we don't forget:

  • Add QuantumEspresso among the code options.
  • We need to add OC dataset descriptions.
  • Some datasets use spin polarization (spinpol) and some don't, so we need to introduce this option. Within the DFT settings, SpinPol=True | False could be one option.
  • Most datasets use one workflow or another, and the full settings can most likely be found in an online repo. We can try to add a setting_ref = URL within the DFT settings as well. This keyword will of course not be used for comparisons.
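
The last two points could be sketched roughly as follows. This is only an illustration using standard-library dataclasses as a stand-in; the PR itself uses pydantic models whose exact fields are not shown in this thread, so the class name `DFTSettings`, the field names, and the example URLs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DFTSettings:
    """Hypothetical stand-in for the DFT settings block of the ontology."""

    code: Optional[str] = None              # e.g. "QuantumEspresso"
    spin_polarized: Optional[bool] = None   # None means "not recorded"
    # Provenance only: compare=False keeps this field out of __eq__,
    # so two settings that differ only in their reference URL compare equal.
    setting_ref: Optional[str] = field(default=None, compare=False)


# Placeholder URLs, not real workflow repositories.
a = DFTSettings("QuantumEspresso", True, "https://example.org/workflow-a")
b = DFTSettings("QuantumEspresso", True, "https://example.org/workflow-b")
c = DFTSettings("QuantumEspresso", False)

print(a == b)  # same physics, different reference URL
print(a == c)  # spin polarization differs
```

Keeping `spin_polarized` optional lets a dataset leave it unrecorded instead of silently defaulting to False.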

@PythonFZ PythonFZ marked this pull request as ready for review June 9, 2025 12:39
@PythonFZ PythonFZ requested a review from Copilot June 9, 2025 12:42

Copilot AI left a comment

Pull Request Overview

This PR introduces a training data ontology for MLIPX by adding new test cases, enhancing spec comparison capabilities, and providing VS Code schema integration. Key changes include:

  • New tests for MLIP spec comparisons and relaxation comparisons.
  • Expanded MLIP specification support with new YAML files and improved spec resolving.
  • A new CLI command to install VS Code schemas and an update to dependency versions.

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_spec.py | Added tests to validate MLIP spec comparisons and dataset resolution. |
| tests/test_relax_compare.py | Introduced tests for comparing relaxation nodes and warning generation. |
| tests/conftest.py | Added a temporary project fixture for DVC integration tests. |
| pyproject.toml | Updated version and added a pydantic dependency. |
| mlipx/spec/spec.py | Expanded MLIP spec definitions and dataset loader functionality. |
| mlipx/spec/compare.py | Implemented recursive spec comparison with metadata stripping. |
| mlipx/nodes/structure_optimization.py | Integrated spec comparison into node comparison logic. |
| mlipx/nodes/generic_ase.py | Added a spec field and get_spec method to support YAML-based MLIP specs. |
| mlipx/cli/main.py | Introduced an install_vscode_schema command for VS Code integration. |
| mlipx/abc.py | Added docstrings for the new get_spec protocol method. |
| .vscode/settings.json | Configured YAML schema paths for MLIP specs. |
| .github/workflows/pytest.yaml | Added a CI workflow for testing across multiple Python versions. |
| docs/source/contributing.rst | Updated documentation to include tips for training data metadata. |
| mlipx/__init__.pyi | Exported the new spec module in the public API. |
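
For readers unfamiliar with the idea behind "recursive spec comparison with metadata stripping", here is a minimal sketch on plain nested dicts. It is not the code in mlipx/spec/compare.py: the function names, the `METADATA_KEYS` set, and the example spec layout are all hypothetical.

```python
from typing import Any

# Hypothetical metadata keys to ignore; the real list lives in the PR's code.
METADATA_KEYS = frozenset({"description", "setting_ref"})


def strip_metadata(value: Any) -> Any:
    """Recursively drop metadata keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_metadata(v) for k, v in value.items() if k not in METADATA_KEYS}
    if isinstance(value, list):
        return [strip_metadata(v) for v in value]
    return value


def _diff(left: Any, right: Any, path: str) -> list[str]:
    """Collect dotted paths at which two already-stripped values differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        diffs: list[str] = []
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}" if path else key
            diffs.extend(_diff(left.get(key), right.get(key), child))
        return diffs
    return [] if left == right else [path or "<root>"]


def diff_specs(left: Any, right: Any) -> list[str]:
    """Strip metadata once, then compare the two specs recursively."""
    return _diff(strip_metadata(left), strip_metadata(right), "")


spec_a = {"data": {"code": "VASP", "description": "first"}, "dft": {"functional": "PBE"}}
spec_b = {"data": {"code": "VASP", "description": "second"}, "dft": {"functional": "PBE0"}}
print(diff_specs(spec_a, spec_b))
```

Reporting dotted paths instead of a single boolean makes it straightforward to turn each difference into a warning.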
Comments suppressed due to low confidence (1)

mlipx/cli/main.py:202

  • The install_vscode_schema command uses json.dumps but does not import the json module. Adding 'import json' at the top of the file should resolve this issue.
mlips_schema_path.write_text(json.dumps(MLIPS.model_json_schema(), indent=2))
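
A minimal sketch of the pattern this comment refers to, with the `json` import in place. The helper `write_schema`, the example schema dict, and the target path used in the usage note are hypothetical; in the PR the dict comes from `MLIPS.model_json_schema()`.

```python
import json
from pathlib import Path


def write_schema(schema: dict, target: Path) -> Path:
    """Serialize a JSON schema dict to pretty-printed JSON on disk."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(schema, indent=2))
    return target


# Hypothetical stand-in for the dict that MLIPS.model_json_schema() returns.
example_schema = {"title": "MLIPS", "type": "object", "properties": {}}
```

A call such as `write_schema(example_schema, Path(".vscode") / "mlips.schema.json")` would then produce a file that a YAML schema setting can point to; the file name here is a placeholder.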

Comment thread mlipx/nodes/generic_ase.py

PythonFZ commented Jun 9, 2025

Collaborator Author

@sandipde At this point the PR does not contain different levels of compatibility. I would like to spend a little more time working out how to add them. Besides this, I think the main structure is there, and the compare functionality for StructureOptimization has been adapted. None of the other compare methods make use of the new feature yet.

I'd like to implement them in separate PRs later on and get this version into main asap.

Please let me know if you think something important is missing that should be added to this version now.

@PythonFZ PythonFZ requested a review from sandipde June 9, 2025 12:46
@PythonFZ PythonFZ linked an issue Jun 9, 2025 that may be closed by this pull request

sandipde commented Jun 9, 2025

Member

@PythonFZ fine by me.

@PythonFZ PythonFZ merged commit 31902fa into main Jun 9, 2025
4 checks passed
@PythonFZ PythonFZ deleted the 75-add-pydantic-ontology-for-model-training-data branch June 9, 2025 19:11
Development

Successfully merging this pull request may close these issues.

  • add pydantic ontology for model training data
  • CI testing

3 participants