Commit a341180: improve readme
1 parent 4dcad10

3 files changed: +73 -113 lines changed

Makefile

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ help: ## Show this help message
 	@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " %-15s %s\n", $$1, $$2}' $(MAKEFILE_LIST)
 
 sync: ## Sync dependencies from lockfile
-	uv sync --prerelease=allow
+	uv sync
 
 format: ## Format code with black and isort
 	uv run black .
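As an aside on the unchanged context line in the hunk above: the `help` target's awk one-liner parses `target: ## comment` annotations by using `":<anything>## "` as the field separator. A standalone sketch of the same parser (run directly in a shell, so `$1`/`$2` replace make's escaped `$$1`/`$$2`; the sample Makefile content and `/tmp/sample.mk` path are illustrative):

```shell
# Write a tiny sample Makefile with "target: ## description" annotations
cat > /tmp/sample.mk <<'EOF'
sync: ## Sync dependencies from lockfile
format: ## Format code with black and isort
EOF

# Same parser as the help target: split each annotated line so that
# $1 is the target name and $2 is the description, printed as a table
awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " %-15s %s\n", $1, $2}' /tmp/sample.mk
```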

README.md

Lines changed: 70 additions & 24 deletions

@@ -1,4 +1,4 @@
-# MLRun Functions Hub
+# MLRun Hub
 
 A centralized repository for open-source MLRun functions, modules, and steps that can be used as reusable components in ML pipelines.
 
@@ -20,7 +20,7 @@ A centralized repository for open-source MLRun functions, modules, and steps tha
 
 Before you begin, ensure you have the following installed:
 
-- **Python 3.10 or 3.11** - Required
+- **Python 3.10 or 3.11 (recommended)** - Required
 - **UV** - Fast Python package manager (required)
 - **Git** - For version control
 - **Make** (optional) - For convenient command shortcuts
@@ -317,7 +317,6 @@ We follow **PEP 8** style guidelines with some modifications:
 
 - **Line length**: 88 characters (Black default)
 - **Imports**: Sorted with isort
-- **Docstrings**: Google style or NumPy style
 - **Type hints**: Encouraged for function signatures
 
 ### Formatting Tools
@@ -351,29 +350,78 @@ uv run isort --check-only .
 
 ### Documentation Standards
 
-- **Docstrings are mandatory** for all public functions, classes, and modules
+- **Docstrings are mandatory** for all public hub items
 - Use clear, concise descriptions
 - Include parameter types and return types
 - Provide usage examples when helpful
 
-**Example:**
+**Example (function `auto_trainer`):**
 ```python
-def train_model(data: pd.DataFrame, target_column: str, model_type: str = "sklearn") -> dict:
+def train(
+    context: MLClientCtx,
+    dataset: DataItem,
+    model_class: str,
+    label_columns: Optional[Union[str, List[str]]] = None,
+    drop_columns: List[str] = None,
+    model_name: str = "model",
+    tag: str = "",
+    sample_set: DataItem = None,
+    test_set: DataItem = None,
+    train_test_split_size: float = None,
+    random_state: int = None,
+    labels: dict = None,
+    **kwargs,
+):
     """
-    Train a machine learning model on the provided dataset.
-
-    Args:
-        data: Input DataFrame containing features and target
-        target_column: Name of the target column
-        model_type: Type of model to train (default: "sklearn")
-
-    Returns:
-        Dictionary containing the trained model and metrics
-
-    Example:
-        >>> result = train_model(df, "label", "sklearn")
-        >>> print(result["accuracy"])
-        0.95
+    Training a model with the given dataset.
+
+    example::
+
+        import mlrun
+        project = mlrun.get_or_create_project("my-project")
+        project.set_function("hub://auto_trainer", "train")
+        trainer_run = project.run(
+            name="train",
+            handler="train",
+            inputs={"dataset": "./path/to/dataset.csv"},
+            params={
+                "model_class": "sklearn.linear_model.LogisticRegression",
+                "label_columns": "label",
+                "drop_columns": "id",
+                "model_name": "my-model",
+                "tag": "v1.0.0",
+                "sample_set": "./path/to/sample_set.csv",
+                "test_set": "./path/to/test_set.csv",
+                "CLASS_solver": "liblinear",
+            },
+        )
+
+    :param context: MLRun context
+    :param dataset: The dataset to train the model on. Can be either a URI or a FeatureVector
+    :param model_class: The class of the model, e.g. `sklearn.linear_model.LogisticRegression`
+    :param label_columns: The target label(s) of the column(s) in the dataset, for Regression or
+        Classification tasks. Mandatory when the dataset is not a FeatureVector.
+    :param drop_columns: A string or list of strings naming the columns to drop
+    :param model_name: The model's name to use for storing the model artifact, defaults to 'model'
+    :param tag: The model's tag to log with
+    :param sample_set: A sample set of inputs for the model, logged with its stats alongside the
+        model in favour of model monitoring. Can be either a URI or a FeatureVector
+    :param test_set: The test set to evaluate the model with.
+    :param train_test_split_size: Ignored if test_set was provided.
+        Should be between 0.0 and 1.0, representing the proportion of the dataset to include
+        in the test split. The size of the training set is set to the complement of this
+        value. Default = 0.2
+    :param random_state: Relevant only when using train_test_split_size.
+        A random state seed to shuffle the data. For more information, see:
+        https://scikit-learn.org/stable/glossary.html#term-random_state
+        Notice that here we only pass integer values.
+    :param labels: Labels to log with the model
+    :param kwargs: Keyword arguments with the following prefixes are parsed and passed
+        to the relevant function:
+        - `CLASS_` - for the model class arguments
+        - `FIT_` - for the `fit` function arguments
+        - `TRAIN_` - for the `train` function (in the xgb or lgbm train function - future)
     """
     # Implementation here
 ```
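The `CLASS_`/`FIT_`/`TRAIN_` prefix convention described in the `:param kwargs:` entry above can be illustrated with a small sketch. The `split_prefixed_kwargs` helper below is purely hypothetical, written to show the routing idea; it is not the hub's actual parsing code:

```python
# Hypothetical sketch: route **kwargs to the model class, fit(), or
# train() by the CLASS_ / FIT_ / TRAIN_ prefixes described above.
def split_prefixed_kwargs(kwargs: dict) -> dict:
    """Split kwargs into per-prefix dicts, stripping the prefix from each key."""
    routed = {"CLASS_": {}, "FIT_": {}, "TRAIN_": {}}
    for key, value in kwargs.items():
        for prefix, bucket in routed.items():
            if key.startswith(prefix):
                bucket[key[len(prefix):]] = value
                break
    return routed

# e.g. the CLASS_solver param from the usage example above becomes a
# constructor argument for the model class:
routed = split_prefixed_kwargs({"CLASS_solver": "liblinear", "FIT_sample_weight": None})
print(routed["CLASS_"])  # {'solver': 'liblinear'}
```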
@@ -392,10 +440,8 @@ def train_model(data: pd.DataFrame, target_column: str, model_type: str = "sklearn") -> dict:
 pwd # Should be the project root
 
 # Ensure dependencies are installed
-make install
+make sync
 
-# Try running with full path
-python -m cli.cli --help
 ```
 
 #### Tests Failing
@@ -404,7 +450,7 @@ python -m cli.cli --help
 
 **Solution:**
 ```bash
-# Install test dependencies if the function has a requirements.txt
+# Install test dependencies if the item has a requirements.txt
 cd functions/src/your_function
 uv pip install -r requirements.txt
 
pyproject.toml

Lines changed: 2 additions & 88 deletions

@@ -4,20 +4,11 @@ version = "0.1.0"
 description = "MLRun Hub - centralized location for open source contributions of mlrun hub components"
 readme = "README.md"
 requires-python = ">=3.10,<3.12"
-license = { text = "Apache-2.0" }
+license = { file = "LICENSE" }
 authors = [
     { name = "MLRun Team" }
 ]
-keywords = ["mlrun", "machine-learning", "serverless", "ml-ops"]
-classifiers = [
-    "Development Status :: 4 - Beta",
-    "Intended Audience :: Developers",
-    "License :: OSI Approved :: Apache Software License",
-    "Operating System :: OS Independent",
-    "Programming Language :: Python :: 3.10",
-    "Programming Language :: Python :: 3.11",
-    "Topic :: Software Development :: Libraries",
-]
+keywords = ["mlrun", "marketplace"]
 
 dependencies = [
     "wheel",
@@ -39,82 +30,9 @@ dependencies = [
     "sphinxcontrib-qthelp==1.0.7",
 ]
 
-[project.optional-dependencies]
-dev = [
-    "urllib3>=1.26.20",
-    "v3io-frames>=0.10.15",
-    "GitPython~=3.1, >=3.1.41",
-    "aiohttp~=3.11",
-    "aiohttp-retry~=2.9",
-    "click~=8.1",
-    "nest-asyncio~=1.0",
-    "ipython~=8.10",
-    "nuclio-jupyter~=0.11.2",
-    "numpy>=1.26.4, <1.27.0",
-    "pandas>=1.2, <2.2",
-    "pyarrow>=10.0, <18",
-    "pyyaml>=6.0.2, <7",
-    "requests~=2.32",
-    "tabulate~=0.8.6",
-    "v3io~=0.7.0",
-    "pydantic>=1.10.15,<2",
-    "protobuf>=3.13.0,<4",
-    "mergedeep~=1.3",
-    "semver~=3.0",
-    "dependency-injector~=4.41",
-    "fsspec>=2025.5.1, <=2025.7.0",
-    "v3iofs~=0.1.17",
-    "storey~=1.10.13",
-    "inflection~=0.5.0",
-    "python-dotenv~=1.0",
-    "setuptools>=75.2",
-    "deprecated~=1.2",
-    "jinja2~=3.1, >=3.1.6",
-    "orjson>=3.9.15, <4",
-    "mlrun-pipelines-kfp-common~=0.5.8",
-    "mlrun-pipelines-kfp-v1-8~=0.5.7",
-    "docstring_parser~=0.16",
-    "aiosmtplib~=3.0",
-    "deepdiff>=8.6.1,<9.0.0",
-]
-
 [project.scripts]
 mlrun-functions = "cli.cli:cli"
 
-[build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
-
-[dependency-groups]
-dev = [
-    "urllib3>=1.26.20",
-    "v3io-frames>=0.10.15",
-    "GitPython~=3.1, >=3.1.41",
-    "aiohttp~=3.11",
-    "aiohttp-retry~=2.9",
-    "click~=8.1",
-    "nest-asyncio~=1.0",
-    "ipython~=8.10",
-    "nuclio-jupyter~=0.11.2",
-    "numpy>=1.26.4, <1.27.0",
-    "pandas>=1.2, <2.2",
-    "pyarrow>=10.0, <18",
-    "requests~=2.32",
-    "tabulate~=0.8.6",
-    "python-dotenv~=1.0",
-    "setuptools>=75.2",
-    "deprecated~=1.2",
-    "orjson>=3.9.15, <4",
-    "docstring_parser~=0.16",
-    "aiosmtplib~=3.0",
-    "deepdiff>=8.6.1,<9.0.0",
-    # The following dependencies will be pulled by mlrun with correct versions
-    # to avoid conflicts, we let mlrun manage them:
-    # pyyaml, semver, v3io, pydantic, protobuf, mergedeep,
-    # fsspec, v3iofs, storey, inflection, jinja2,
-    # mlrun-pipelines-kfp-common, mlrun-pipelines-kfp-v1-8
-]
-
 [tool.black]
 line-length = 88
 target-version = ["py310", "py311"]
@@ -128,7 +46,3 @@ include_trailing_comma = true
 force_grid_wrap = 0
 use_parentheses = true
 ensure_newline_before_comments = true
-
-[tool.hatch.build.targets.wheel]
-packages = ["cli", "functions", "modules", "steps"]
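For readers unfamiliar with the `[project.scripts]` table retained above: `mlrun-functions = "cli.cli:cli"` uses the standard `module:attribute` entry-point form, so the installer generates an `mlrun-functions` executable that imports `cli.cli` and calls its `cli` attribute. A minimal sketch of what such a callable looks like (argparse-based and purely illustrative; the repository's real `cli.cli:cli` is not shown in this diff):

```python
# Hypothetical stand-in for a "cli.cli:cli" console-script entry point:
# a zero-argument-callable that the generated wrapper script invokes.
import argparse

def cli(argv=None):
    """Parse arguments and return a process exit code."""
    parser = argparse.ArgumentParser(prog="mlrun-functions")
    parser.add_argument("--version", action="store_true",
                        help="print the package version and exit")
    args = parser.parse_args(argv)
    if args.version:
        print("0.1.0")  # version value taken from [project] above
    return 0

if __name__ == "__main__":
    raise SystemExit(cli())
```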
