Add missing training support to onert Python API (experimental module) #15175

Merged
chunseoklee merged 2 commits into Samsung:master from ragmani:onert/python/rest_training
Apr 17, 2025

ragmani commented Apr 16, 2025

This commit integrates previously omitted modifications for training support in the onert Python API.

  • Expose new experimental training functionalities by updating the package’s public API:
    • Modified __init__.py to include the experimental submodule.
    • Added the experimental module, which imports train components and exposes TrainSession, traininfo, DataLoader, optimizer, losses, and metrics.
  • Implemented a flexible DataLoader in the experimental training module:
    • Supports input from file paths or NumPy arrays.
    • Handles loading of both .npy and raw binary files with configurable shapes and data types.
    • Includes batching logic and a split method for training/validation separation.
  • Improved training compiler behavior in TrainingCompiler.cc:
    • Adjusted the shape validation to accept unspecified dimensions (using ir::Shape::kUnspecifiedDim) in addition to dimensions of value 1.
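
The batching and train/validation split described above can be pictured with a minimal sketch. This is not the `DataLoader` added by this PR: the helper names `as_array`, `split`, and `iter_batches` are hypothetical, and taking the trailing samples as the validation set is an assumption made only for illustration.

```python
import numpy as np


def as_array(source):
    """Accept an in-memory array or a path to a .npy file (hypothetical helper)."""
    if isinstance(source, np.ndarray):
        return source
    # A raw binary file would additionally need a caller-supplied shape and dtype.
    return np.load(source)


def split(inputs, labels, validation_split):
    """Hold out the trailing fraction of samples for validation (assumed policy)."""
    n_val = int(len(inputs) * validation_split)
    n_train = len(inputs) - n_val
    train = (inputs[:n_train], labels[:n_train])
    val = (inputs[n_train:], labels[n_train:])
    return train, val


def iter_batches(inputs, labels, batch_size):
    """Yield (input, label) batches in order; the last batch may be smaller."""
    for start in range(0, len(inputs), batch_size):
        stop = start + batch_size
        yield inputs[start:stop], labels[start:stop]
```

Under these assumptions, 100 samples with a validation split of 0.2 and a batch size of 10 give 80 training samples, 20 validation samples, and 8 training batches per epoch.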

ONE-DCO-1.0-Signed-off-by: ragmani <ragmani0216@gmail.com>

ragmani added the "PR/ready for review" label Apr 16, 2025
ragmani requested a review from a team April 16, 2025 08:57

ragmani commented Apr 16, 2025

For #14505
Draft : #14492

ragmani commented Apr 16, 2025

I tested this with the Python samples.

python3 runtime/onert/sample/minimal-python/experimental/src/train_with_dataset.py -m mobilenetv2 -i out/imagenet_a.test.input.100.bin -l out/imagenet_a.test.output.100.bin --data_length 100 --optimizer adam --loss cce --learning_rate 0.01 --batch_size 10 --validation_split=0.2
Load data
== training parameter ==
- learning_rate        = 0.01
- batch_size           = 10
- loss_info            = {loss = CategoricalCrossentropy, reduction = sum over batch size}
- optimizer            = Adam
- num_of_trainable_ops = -1
========================
Epoch 1/5 - Train time: 704.429ms/step - IO time: 0.067ms/step - Train Loss: 10.7749 - Validation Loss: 10.1255 - CategoricalAccuracy: 0.0000
Epoch 2/5 - Train time: 679.521ms/step - IO time: 0.059ms/step - Train Loss: 6.1418 - Validation Loss: 12.0664 - CategoricalAccuracy: 0.0000
Epoch 3/5 - Train time: 712.286ms/step - IO time: 0.060ms/step - Train Loss: 5.7052 - Validation Loss: 14.5072 - CategoricalAccuracy: 0.0000
Epoch 4/5 - Train time: 741.144ms/step - IO time: 0.056ms/step - Train Loss: 5.4454 - Validation Loss: 15.3301 - CategoricalAccuracy: 0.0000
Epoch 5/5 - Train time: 771.123ms/step - IO time: 0.073ms/step - Train Loss: 6.6274 - Validation Loss: 17.4566 - CategoricalAccuracy: 0.0000
===================================
MODEL_LOAD   takes 6.9752 ms
COMPILE      takes 274.4315 ms
EXECUTE      takes 29863.0246 ms
- Epoch 1      takes 5829.0490 ms
- Epoch 2      takes 5635.3495 ms
- Epoch 3      takes 5896.5318 ms
- Epoch 4      takes 6126.5195 ms
- Epoch 5      takes 6375.5748 ms
===================================
nnpackage mobilenetv2 trains successfully.
python3 runtime/onert/sample/minimal-python/experimental/src/train_step_with_dataset.py -m mobilenetv2 -i out/imagenet_a.test.input.100.bin -l out/imagenet_a.test.output.100.bin --data_length 100 --optimizer adam --loss cce --learning_rate 0.01 --batch_size 10
Load data
== training parameter ==
- learning_rate        = 0.01
- batch_size           = 10
- loss_info            = {loss = CategoricalCrossentropy, reduction = sum over batch size}
- optimizer            = Adam
- num_of_trainable_ops = -1
========================
Step 1/10 - Train time: 704.106 ms/step - Train Loss: 9.0140
Step 2/10 - Train time: 710.967 ms/step - Train Loss: 21.7883
Step 3/10 - Train time: 700.940 ms/step - Train Loss: 6.7287
Step 4/10 - Train time: 701.534 ms/step - Train Loss: 8.3510
Step 5/10 - Train time: 704.686 ms/step - Train Loss: 9.8825
Step 6/10 - Train time: 706.220 ms/step - Train Loss: 8.4540
Step 7/10 - Train time: 705.723 ms/step - Train Loss: 11.6198
Step 8/10 - Train time: 709.600 ms/step - Train Loss: 10.3613
Step 9/10 - Train time: 712.440 ms/step - Train Loss: 9.9402
Step 10/10 - Train time: 712.583 ms/step - Train Loss: 9.4323
===================================
Average Loss: 10.5572
CategoricalAccuracy: 0.0000
Average Time: 706.8799 ms/step
===================================
nnpackage mobilenetv2 trains successfully.

@glistening

@zetwhite Could you please review this PR?

glistening requested a review from zetwhite April 16, 2025 23:04
Comment thread runtime/onert/api/python/package/experimental/train/dataloader.py Outdated
Comment thread runtime/onert/api/python/package/experimental/train/dataloader.py
zetwhite previously approved these changes Apr 17, 2025

zetwhite left a comment

LGTM 👍

zetwhite requested a review from a team April 17, 2025 02:39
ragmani force-pushed the onert/python/rest_training branch from 0e99b8e to c330797 April 17, 2025 02:53
Comment on lines +142 to +147
array = np.frombuffer(data, dtype=dtype)
if array.size != expected_elements:
    raise ValueError(
        f"Raw data size does not match the expected shape: {shape}. "
        f"Expected {expected_elements} elements, got {array.size} elements.")
return array.reshape(shape)
ragmani commented

I keep only f.read() inside the with block, so the file stays open just long enough to load the bytes. Once the data is in memory, np.frombuffer, the size check, and the reshape operate on that buffer and need no open file, so they live outside the with block.
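
The scoping described in this reply can be sketched as a small standalone function. Only the lines after the `with` block mirror the reviewed snippet; the function name `load_raw`, its signature, and the way `expected_elements` is derived from `shape` are assumptions, not the code in `dataloader.py`.

```python
import numpy as np


def load_raw(path, shape, dtype=np.float32):
    """Load a headerless binary file into an array of the given shape (hypothetical helper)."""
    # The file handle is needed only to pull the bytes into memory.
    with open(path, "rb") as f:
        data = f.read()

    # Everything below works on the in-memory buffer, so it sits outside the with block.
    expected_elements = int(np.prod(shape))  # assumption: derived from the requested shape
    array = np.frombuffer(data, dtype=dtype)
    if array.size != expected_elements:
        raise ValueError(
            f"Raw data size does not match the expected shape: {shape}. "
            f"Expected {expected_elements} elements, got {array.size} elements.")
    return array.reshape(shape)
```

Keeping the `with` block this small closes the file as soon as the read completes, and a failed size check then raises with no file handle still open.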

Comment thread runtime/onert/api/python/package/experimental/train/dataloader.py
chunseoklee merged commit 79ab937 into Samsung:master Apr 17, 2025
10 checks passed
ragmani deleted the onert/python/rest_training branch April 17, 2025 08:17