[onert/python] Introduce eval_step#15198

Merged
hseok-oh merged 1 commit into Samsung:master from ragmani:onert/python/train_flag
Apr 24, 2025

Conversation

Contributor

@ragmani ragmani commented Apr 22, 2025

This commit introduces `eval_step` and factors common batch logic into `_batch_step`.

  • Introduce private helper `_batch_step(inputs, expecteds, update_weights)` to handle binding inputs, running the session, collecting losses/metrics, and timing
  • Update `train_step` to call `_batch_step(..., update_weights=True)` and expose `"train_time"`
  • Add `eval_step` for forward-only inference, calling `_batch_step(..., update_weights=False)` and exposing `"eval_time"`

For #15172 (comment)

ONE-DCO-1.0-Signed-off-by: ragmani ragmani0216@gmail.com
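The refactor described above can be sketched roughly as follows. This is a minimal illustration, not the actual onert code: the `session` attribute, its `run` method, and the shape of its results are placeholder assumptions standing in for the real runtime API.

```python
import time
from typing import Any, Dict, List

import numpy as np


class TrainSession:
    """Illustrative skeleton only; `self.session` stands in for the real
    onert training session, whose actual API is not shown here."""

    def _batch_step(self, inputs: List[np.ndarray], expecteds: List[np.ndarray],
                    update_weights: bool) -> Dict[str, Any]:
        # Common batch logic: bind inputs/expecteds, run the session,
        # collect losses/metrics, and time the run.
        start = time.perf_counter()
        results = self.session.run(inputs, expecteds, update_weights=update_weights)
        return {
            "losses": results["losses"],
            "metrics": results["metrics"],
            "time": time.perf_counter() - start,
        }

    def train_step(self, inputs: List[np.ndarray],
                   expecteds: List[np.ndarray]) -> Dict[str, Any]:
        # Training phase: weights are updated.
        out = self._batch_step(inputs, expecteds, update_weights=True)
        out["train_time"] = out.pop("time")
        return out

    def eval_step(self, inputs: List[np.ndarray],
                  expecteds: List[np.ndarray]) -> Dict[str, Any]:
        # Forward-only inference: weights stay unchanged.
        out = self._batch_step(inputs, expecteds, update_weights=False)
        out["eval_time"] = out.pop("time")
        return out
```

Both public entry points share one code path; only the `update_weights` flag and the name of the timing key differ.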

@ragmani ragmani added the area/onert (ONE runtime) label Apr 22, 2025
@ragmani ragmani requested a review from a team April 22, 2025 04:16
def train_step(self,
               inputs: List[np.ndarray],
               expecteds: List[np.ndarray],
               train: bool = True) -> Dict[str, Any]:
Contributor

Just a question: if `train` is false, what does this function (`train_step`) do?

Contributor Author


It acts like an "eval step": we can get losses and metrics, but the model's weights stay unchanged :)

Contributor


It's a little weird to me, because the train function acts like an eval() function in some cases.
But I'm not sure whether this is normal in this kind of code.

Contributor Author


It might be a bit confusing to have the train_step() function behave differently based on the train flag. In standard practice, it's better to separate the concerns by defining two distinct functions:

  • train_step(): This is for the training phase, where weights are updated.

  • eval_step(): This function is for inference or validation, where no weight updates happen and you likely want to disable gradients (using torch.no_grad()).

This mirrors how frameworks like PyTorch and PyTorch Lightning handle the training and evaluation phases. In PyTorch, model.train() and model.eval() toggle the mode for training and inference, respectively, and it's common to have separate functions for each phase. Instead of toggling behavior with a flag inside train_step(), it's cleaner to have separate functions to keep things clear and consistent.
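As a torch-free illustration of that separation (a toy model with a single scalar weight; the names and the analytic gradient are invented for the example, not part of onert or PyTorch):

```python
class ToyModel:
    """Stand-in for a real network: one scalar weight, squared-error loss."""

    def __init__(self) -> None:
        self.weight = 1.0

    def loss(self, x: float, y: float) -> float:
        return (self.weight * x - y) ** 2


def train_step(model: ToyModel, x: float, y: float, lr: float = 0.1) -> float:
    """Training phase: compute the loss and update the weight."""
    loss = model.loss(x, y)
    grad = 2 * (model.weight * x - y) * x  # analytic gradient of the toy loss
    model.weight -= lr * grad
    return loss


def eval_step(model: ToyModel, x: float, y: float) -> float:
    """Evaluation phase: compute the loss only; the weight stays unchanged."""
    return model.loss(x, y)
```

With two separate functions, the no-update guarantee of `eval_step` is structural rather than dependent on the value of a flag passed into `train_step`.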

Contributor Author

@ragmani ragmani Apr 22, 2025


I modified this PR to introduce eval_step() instead of adding the train flag.

@ragmani ragmani force-pushed the onert/python/train_flag branch from 2f505d0 to c704dc7 on April 22, 2025 05:48
@ragmani ragmani added the PR/ready for review label Apr 22, 2025
@ragmani ragmani changed the title from [onert/python] Add train flag to train_step to [onert/python] Introduce eval_step Apr 22, 2025
@ragmani ragmani force-pushed the onert/python/train_flag branch from c704dc7 to d957418 on April 22, 2025 05:59
This commit introduces `eval_step` and factors common batch logic into `_batch_step`.
- Introduce private helper `_batch_step(inputs, expecteds, update_weights)` to handle binding inputs, running session, collecting losses/metrics, and timing
- Update `train_step` to call `_batch_step(..., update_weights=True)` and expose `"train_time"`
- Add `eval_step` for forward‑only inference, calling `_batch_step(..., update_weights=False)` and exposing `"eval_time"`

ONE-DCO-1.0-Signed-off-by: ragmani <ragmani0216@gmail.com>
@ragmani ragmani force-pushed the onert/python/train_flag branch from d957418 to ed6eb05 on April 22, 2025 06:00
Contributor

@seockho-kim seockho-kim left a comment


LGTM

@hseok-oh hseok-oh merged commit a71f0cd into Samsung:master Apr 24, 2025
10 checks passed
@ragmani ragmani deleted the onert/python/train_flag branch April 24, 2025 04:01