Add missing training support to onert Python API (experimental module) by ragmani · Pull Request #15175 · Samsung/ONE

ragmani · 2025-04-16T08:57:37Z

This commit integrates previously omitted modifications for training support in the onert Python API.

- Expose new experimental training functionalities by updating the package's public API:
  - Modified `__init__.py` to include the `experimental` submodule.
  - Added the `experimental` module, which imports train components and exposes `TrainSession`, `traininfo`, `DataLoader`, `optimizer`, `losses`, and `metrics`.
- Implemented a flexible `DataLoader` in the experimental training module:
  - Supports input from file paths or NumPy arrays.
  - Handles loading of both `.npy` and raw binary files with configurable shapes and data types.
  - Includes batching logic and a split method for training/validation separation.
- Improved training compiler behavior in `TrainingCompiler.cc`:
  - Adjusted the shape validation to accept unspecified dimensions (using `ir::Shape::kUnspecifiedDim`) in addition to dimensions of value 1.

ONE-DCO-1.0-Signed-off-by: ragmani <ragmani0216@gmail.com>
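
For readers unfamiliar with the batching and train/validation split mentioned in the description, here is a minimal sketch of the idea using plain NumPy arrays. The function names `split_arrays` and `iter_batches` are hypothetical and are not the `DataLoader` methods added by this PR. The choices to split without shuffling and to keep a smaller final batch are assumptions made only for illustration.

```python
import numpy as np


def split_arrays(inputs, labels, validation_split):
    """Split paired arrays into (train, validation) parts along axis 0.

    Hypothetical helper: the last `validation_split` fraction of the
    samples becomes the validation set (no shuffling in this sketch).
    """
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    if len(inputs) != len(labels):
        raise ValueError("inputs and labels must have the same length")
    n_train = len(inputs) - int(len(inputs) * validation_split)
    train = (inputs[:n_train], labels[:n_train])
    validation = (inputs[n_train:], labels[n_train:])
    return train, validation


def iter_batches(inputs, labels, batch_size):
    """Yield (input_batch, label_batch) pairs of at most `batch_size` samples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        # Assumption: the final batch may be smaller than batch_size.
        yield inputs[start:end], labels[start:end]
```

In this sketch, 100 samples with `validation_split=0.2` would give 80 training samples and 20 validation samples, and a batch size of 10 would then give 8 training batches. Whether the real `DataLoader` behaves the same way should be checked against `dataloader.py`.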

ragmani · 2025-04-16T08:58:16Z

For #14505
Draft : #14492

ragmani · 2025-04-16T09:00:51Z

I tested with the Python samples.

python3 runtime/onert/sample/minimal-python/experimental/src/train_with_dataset.py -m mobilenetv2 -i out/imagenet_a.test.input.100.bin -l out/imagenet_a.test.output.100.bin --data_length 100 --optimizer adam --loss cce --learning_rate 0.01 --batch_size 10 --validation_split=0.2
Load data
== training parameter ==
- learning_rate        = 0.01
- batch_size           = 10
- loss_info            = {loss = CategoricalCrossentropy, reduction = sum over batch size}
- optimizer            = Adam
- num_of_trainable_ops = -1
========================
Epoch 1/5 - Train time: 704.429ms/step - IO time: 0.067ms/step - Train Loss: 10.7749 - Validation Loss: 10.1255 - CategoricalAccuracy: 0.0000
Epoch 2/5 - Train time: 679.521ms/step - IO time: 0.059ms/step - Train Loss: 6.1418 - Validation Loss: 12.0664 - CategoricalAccuracy: 0.0000
Epoch 3/5 - Train time: 712.286ms/step - IO time: 0.060ms/step - Train Loss: 5.7052 - Validation Loss: 14.5072 - CategoricalAccuracy: 0.0000
Epoch 4/5 - Train time: 741.144ms/step - IO time: 0.056ms/step - Train Loss: 5.4454 - Validation Loss: 15.3301 - CategoricalAccuracy: 0.0000
Epoch 5/5 - Train time: 771.123ms/step - IO time: 0.073ms/step - Train Loss: 6.6274 - Validation Loss: 17.4566 - CategoricalAccuracy: 0.0000
===================================
MODEL_LOAD   takes 6.9752 ms
COMPILE      takes 274.4315 ms
EXECUTE      takes 29863.0246 ms
- Epoch 1      takes 5829.0490 ms
- Epoch 2      takes 5635.3495 ms
- Epoch 3      takes 5896.5318 ms
- Epoch 4      takes 6126.5195 ms
- Epoch 5      takes 6375.5748 ms
===================================
nnpackage mobilenetv2 trains successfully.

python3 runtime/onert/sample/minimal-python/experimental/src/train_step_with_dataset.py -m mobilenetv2 -i out/imagenet_a.test.input.100.bin -l out/imagenet_a.test.output.100.bin --data_length 100 --optimizer adam --loss cce --learning_rate 0.01 --batch_size 10
Load data
== training parameter ==
- learning_rate        = 0.01
- batch_size           = 10
- loss_info            = {loss = CategoricalCrossentropy, reduction = sum over batch size}
- optimizer            = Adam
- num_of_trainable_ops = -1
========================
Step 1/10 - Train time: 704.106 ms/step - Train Loss: 9.0140
Step 2/10 - Train time: 710.967 ms/step - Train Loss: 21.7883
Step 3/10 - Train time: 700.940 ms/step - Train Loss: 6.7287
Step 4/10 - Train time: 701.534 ms/step - Train Loss: 8.3510
Step 5/10 - Train time: 704.686 ms/step - Train Loss: 9.8825
Step 6/10 - Train time: 706.220 ms/step - Train Loss: 8.4540
Step 7/10 - Train time: 705.723 ms/step - Train Loss: 11.6198
Step 8/10 - Train time: 709.600 ms/step - Train Loss: 10.3613
Step 9/10 - Train time: 712.440 ms/step - Train Loss: 9.9402
Step 10/10 - Train time: 712.583 ms/step - Train Loss: 9.4323
===================================
Average Loss: 10.5572
CategoricalAccuracy: 0.0000
Average Time: 706.8799 ms/step
===================================
nnpackage mobilenetv2 trains successfully.

glistening · 2025-04-16T23:04:23Z

@zetwhite Could you please review this PR?

zetwhite

LGTM 👍

ragmani · 2025-04-17T03:03:24Z

+        array = np.frombuffer(data, dtype=dtype)
+        if array.size != expected_elements:
+            raise ValueError(
+                f"Raw data size does not match the expected shape: {shape}. "
+                f"Expected {expected_elements} elements, got {array.size} elements.")
+        return array.reshape(shape)


I keep only `f.read()` inside the `with` block, so the file stays open just long enough to load the bytes. Once the data is in memory, `np.frombuffer`, the size check, and `reshape` all work on that buffer and need no open file, so they live outside the `with` block.
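
To make the layout described above concrete, here is a self-contained sketch of a raw-file loader built around the snippet from the diff. The function name `load_raw`, its signature, and the `float32` default are assumptions for illustration. Only the lines from `np.frombuffer` onward are taken from the diff.

```python
import numpy as np


def load_raw(path, shape, dtype=np.float32):
    """Read a raw binary file and return it as an array with `shape`.

    Hypothetical wrapper; name and signature are not from the PR.
    """
    expected_elements = int(np.prod(shape))

    # Hold the file open only for the read itself.
    with open(path, "rb") as f:
        data = f.read()

    # Everything below works on the in-memory bytes, so no file is needed.
    array = np.frombuffer(data, dtype=dtype)
    if array.size != expected_elements:
        raise ValueError(
            f"Raw data size does not match the expected shape: {shape}. "
            f"Expected {expected_elements} elements, got {array.size} elements.")
    return array.reshape(shape)
```

Note that `np.frombuffer` on a `bytes` object returns a read-only array. A caller that needs to modify the data in place would have to copy it first, for example with `array.copy()`.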

ragmani added the "PR/ready for review" label (It is ready to review. Please review it.) Apr 16, 2025

ragmani requested a review from a team April 16, 2025 08:57

glistening requested a review from zetwhite April 16, 2025 23:04

zetwhite reviewed Apr 17, 2025

Comment thread runtime/onert/api/python/package/experimental/train/dataloader.py Outdated

zetwhite reviewed Apr 17, 2025

Comment thread runtime/onert/api/python/package/experimental/train/dataloader.py

zetwhite previously approved these changes Apr 17, 2025

zetwhite requested a review from a team April 17, 2025 02:39

ragmani dismissed zetwhite’s stale review via 0e99b8e April 17, 2025 02:44

Add typing annotations to DataLoader

c330797

ragmani force-pushed the onert/python/rest_training branch from 0e99b8e to c330797 Compare April 17, 2025 02:53

ragmani commented Apr 17, 2025

zetwhite approved these changes Apr 17, 2025

chunseoklee reviewed Apr 17, 2025

Comment thread runtime/onert/api/python/package/experimental/train/dataloader.py

chunseoklee approved these changes Apr 17, 2025

chunseoklee merged commit 79ab937 into Samsung:master Apr 17, 2025
10 checks passed

ragmani deleted the onert/python/rest_training branch April 17, 2025 08:17

chunseoklee merged 2 commits into Samsung:master from ragmani:onert/python/rest_training