
feat: parallel evaluations #238


Draft

wants to merge 14 commits into base: dev

Conversation

@distributedstatemachine (Member) commented Apr 13, 2025

Description

TODO

  • Live Test

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

@distributedstatemachine changed the base branch from main to dev on Apr 13, 2025, 11:18
Contributor

@epappas left a comment

Much cleaner code than what we had before. I'm not sure if the parallelism will actually happen; for sure the evals will run concurrently, but the device is pinned to one CUDA GPU as I read the code. Will the GPU context-switch? Or am I thinking wrong?

# Process evaluation results and update scores (old evaluation logic)
for uid in evaluation_uids:
    result = eval_results.get(uid)
    if result is not None:
Contributor

Suggestion: this might have reduced some of the git effort if you inverted the check in the for loop

if result is None:
    continue
# ...rest of the loop body goes here, no longer wrapped in the if

Collaborator

Yes. I think this makes the code much more readable as well. I had this in another PR that was not merged, but I think it helps a lot to follow the happy path.

Member Author

Taking it out of evaluations; eventually we want it in something like weights, and to also separate scoring from evals.

f"validator/weights/{eval_uid}": self.weights[
eval_uid
].item(),
f"validator/slash/{uid}/score_before": old_score,
Contributor

I think we're double logging this in line 871?

},
step=self.global_step,
)
tplr.logger.info(
f"{tplr.P(self.sync_window, tplr.T() - scoring_start)} Computed scores and weights"
self.metrics_logger.log(
Contributor

I think we're double logging this in line 879?

@distributedstatemachine
Member Author

> Much cleaner code than what we had before. I'm not sure if the parallelism will actually happen; for sure the evals will run concurrently, but the device is pinned to one CUDA GPU as I read the code. Will the GPU context-switch? Or am I thinking wrong?

@epappas

Yes, you read that right. We currently utilise the H100, so the expectation is that we perform all the evals on one device and don't impose higher infra costs on validators.
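
For illustration, a minimal sketch of what "concurrent evals on one pinned device" can look like, assuming the per-uid evaluation is an asyncio coroutine whose network gathers overlap while the forward passes still serialise on the single CUDA device (gather_batch_for and evaluate_uid are hypothetical stand-ins, not the PR's actual functions):

import asyncio
import torch

DEVICE = torch.device("cuda:0")  # single pinned GPU, as discussed above

async def evaluate_uid(uid: int, model: torch.nn.Module) -> float:
    # The await lets other uids' downloads proceed; the forward pass below
    # still runs on the one device, so kernels are queued, not parallel.
    batch = await gather_batch_for(uid)  # hypothetical I/O call
    with torch.no_grad():
        input_ids = torch.tensor(batch, dtype=torch.long, device=DEVICE)
        loss = model(input_ids=input_ids, labels=input_ids).loss
    return loss.item()

async def evaluate_all(uids, model):
    # Concurrency here overlaps I/O, not GPU compute: there is no
    # context switching between evals on the GPU, just one kernel queue.
    results = await asyncio.gather(*(evaluate_uid(u, model) for u in uids))
    return dict(zip(uids, results))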

Collaborator

@joellidin left a comment

I see some issues. Is this working right now in its current state? Looks like we should run ruff format as well. But when we get this in, it will honestly be so much easier to follow how the evaluation is done.

@@ -42,9 +42,10 @@
 "catch_up_threshold": 15,
 "catch_up_batch_size": 5,
 "catch_up_timeout": 300,
-"uids_per_window": 2,
+"uids_per_window": 4,
+"hparams.parallel_eval_uids": 4,
Collaborator

Why hparams before?

Collaborator

Also curious: what was the reason for lowering the minimum peers?

Member Author

Testing locally; I should have put this in draft, it wasn't ready.

Member Author

> Why hparams before?

I don't understand this; can you please rephrase your question?

Collaborator

My bad. Why prefix parallel_eval_uids with hparams.?

Member Author

It's how it currently works: we have a max number of uids to eval in the window, and this tells us how many of them should run in parallel.

Collaborator

Why not call it parallel_eval_uids instead of hparams.parallel_eval_uids?
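
For reference, a minimal sketch of how the two settings could interact, assuming uids_per_window decides how many uids get evaluated in a window and parallel_eval_uids caps how many of those evals are in flight at once (the semaphore-based runner and evaluate_uid are illustrative, not the PR's implementation):

import asyncio

async def run_window_evals(hparams, candidate_uids, evaluate_uid):
    # uids_per_window: how many uids are evaluated this window
    uids = candidate_uids[: hparams.uids_per_window]

    # parallel_eval_uids: upper bound on concurrent evaluations
    sem = asyncio.Semaphore(hparams.parallel_eval_uids)

    async def bounded(uid):
        async with sem:
            return uid, await evaluate_uid(uid)

    results = await asyncio.gather(*(bounded(u) for u in uids))
    return dict(results)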

input_ids = torch.tensor(batch, dtype=torch.long).to(
    model_own_data_eval.device
# Evaluate sync metrics.
sync_result = await self.evaluate_miner_sync(uid)
Collaborator

We could probably move the sync scoring to the evaluation module as well?

# Process evaluation results and update scores (old evaluation logic)
for uid in evaluation_uids:
    result = eval_results.get(uid)
    if result is not None:
Collaborator

Yes. I think this makes the code much more readable as well. I had this in another PR that was not merged, but I think it helps a lot to follow the happy path.

@distributedstatemachine marked this pull request as draft on Apr 13, 2025, 19:18
Comment on lines +884 to +907
self.weights = torch.zeros_like(self.final_moving_avg_scores)
evaluated_mask = torch.zeros_like(
    self.final_moving_avg_scores, dtype=torch.bool
)
evaluated_mask[list(self.evaluated_uids)] = True
positive_mask = (
    self.final_moving_avg_scores > 0
) & evaluated_mask

if positive_mask.any():
    self.weights[positive_mask] = min_power_normalization(
        self.final_moving_avg_scores[positive_mask],
        power=self.hparams.power_normalisation,
    )
    weight_sum = self.weights.sum().item()
    tplr.logger.debug(f"Weight sum: {weight_sum}")
    if abs(weight_sum - 1.0) > 1e-6:
        tplr.logger.warning(
            f"Weights sum to {weight_sum}, expected close to 1.0"
        )
else:
    tplr.logger.info(
        "No positive scores found, all weights set to 0"
    )
Collaborator

You can use the function you added in evaluation?
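
Purely to illustrate the suggestion, a sketch of what a shared helper in the evaluation module could look like, mirroring the inline block above (the compute_weights name and signature are hypothetical; min_power_normalization is the existing function already used in the block):

import torch

def compute_weights(final_moving_avg_scores, evaluated_uids, power):
    # Same logic as the inline block: zero weights, mark evaluated uids,
    # and normalise only the positive, evaluated scores.
    weights = torch.zeros_like(final_moving_avg_scores)
    evaluated_mask = torch.zeros_like(final_moving_avg_scores, dtype=torch.bool)
    evaluated_mask[list(evaluated_uids)] = True
    positive_mask = (final_moving_avg_scores > 0) & evaluated_mask
    if positive_mask.any():
        weights[positive_mask] = min_power_normalization(
            final_moving_avg_scores[positive_mask], power=power
        )
    return weights

# The validator call site would then reduce to:
# self.weights = compute_weights(
#     self.final_moving_avg_scores,
#     self.evaluated_uids,
#     power=self.hparams.power_normalisation,
# )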

outputs = model(input_ids=input_ids, labels=labels)
total_loss += outputs.loss.item()
count += 1
del input_ids, labels, outputs
Contributor

I don't think the del and torch.cuda.empty_cache() will help with anything practical; memory release doesn't work synchronously.
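
If it is worth settling empirically, here is a small sketch using standard torch APIs to check whether the del changes anything before the next iteration (a rough diagnostic, not part of the PR):

import torch

def report_cuda_mem(tag: str) -> None:
    # memory_allocated: bytes held by live tensors; memory_reserved: what the
    # caching allocator keeps around even after tensors are freed.
    torch.cuda.synchronize()
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")

# Usage around the loop body in question:
# report_cuda_mem("before del")
# del input_ids, labels, outputs
# report_cuda_mem("after del")          # allocated usually drops, reserved usually does not
# torch.cuda.empty_cache()
# report_cuda_mem("after empty_cache")  # reserved may drop, at a performance cost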
