
feat: parallel evaluations #238


Draft

wants to merge 14 commits into base: dev

Conversation

@distributedstatemachine (Member) commented Apr 13, 2025

Description

TODO

  • Live Test

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

@distributedstatemachine changed the base branch from main to dev on Apr 13, 2025, 11:18
Contributor

@epappas left a comment

Much cleaner code than what we had before. I'm not sure if the parallelism will actually happen; for sure the evals will run concurrently, but the device is pinned to one CUDA GPU as I read the code. Will the GPU context-switch? Or am I thinking wrong?

# Process evaluation results and update scores (old evaluation logic)
for uid in evaluation_uids:
    result = eval_results.get(uid)
    if result is not None:
Contributor

Suggestion: this might have reduced some of the git effort if you inverted the check in the for loop

if result is None:
    continue
# ...rest of the loop body goes here, no longer wrapped in the if

Collaborator

Yes. I think this makes the code much more readable as well. I had this in another PR that was not merged, but I think it helps a lot to follow the happy path.

Member Author

Taking it out of evaluations; eventually we want it in something like weights, and to also separate scoring from evals.

f"validator/weights/{eval_uid}": self.weights[
eval_uid
].item(),
f"validator/slash/{uid}/score_before": old_score,
Contributor

I think we're double logging this in line 871?

},
step=self.global_step,
)
tplr.logger.info(
f"{tplr.P(self.sync_window, tplr.T() - scoring_start)} Computed scores and weights"
self.metrics_logger.log(
Contributor

I think we're double logging this in line 879?

@distributedstatemachine
Member Author

> Much cleaner code than what we had before. I'm not sure if the parallelism will actually happen; for sure the evals will run concurrently, but the device is pinned to one CUDA GPU as I read the code. Will the GPU context-switch? Or am I thinking wrong?

@epappas

Yes, you read that right. We currently utilise the H100, so the expectation is that we perform all the evals on one device and don't impose higher infra costs on validators.
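
For illustration, a minimal sketch of what "concurrent evals on one pinned device" can look like, assuming the per-uid evaluation is an asyncio coroutine whose network gathers overlap while the forward passes still serialise on the single CUDA device (gather_batch_for and evaluate_uid are hypothetical stand-ins, not the PR's actual functions):

import asyncio
import torch

DEVICE = torch.device("cuda:0")  # single pinned GPU, as discussed above

async def evaluate_uid(uid: int, model: torch.nn.Module) -> float:
    # The await lets other uids' downloads proceed; the forward pass below
    # still runs on the one device, so kernels are queued, not parallel.
    batch = await gather_batch_for(uid)  # hypothetical I/O call
    with torch.no_grad():
        input_ids = torch.tensor(batch, dtype=torch.long, device=DEVICE)
        loss = model(input_ids=input_ids, labels=input_ids).loss
    return loss.item()

async def evaluate_all(uids, model):
    # Concurrency here overlaps I/O, not GPU compute: there is no
    # context switching between evals on the GPU, just one kernel queue.
    results = await asyncio.gather(*(evaluate_uid(u, model) for u in uids))
    return dict(zip(uids, results))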

Collaborator

@joellidin left a comment

I see some issues. Is this working right now in its current state? Looks like we should run ruff format as well. But when we get this in, it will honestly be so much easier to follow how the evaluation is done.

@@ -42,9 +42,10 @@
 "catch_up_threshold": 15,
 "catch_up_batch_size": 5,
 "catch_up_timeout": 300,
-"uids_per_window": 2,
+"uids_per_window": 4,
+"hparams.parallel_eval_uids": 4,
Collaborator

Why hparams before?

Collaborator

Also curious: what was the reason for lowering the minimum peers?

Member Author

Testing locally; I should have put this in draft, it wasn't ready.

Member Author

> Why hparams before?

I don't understand this; can you please rephrase your question?

Collaborator

My bad. Why prefix parallel_eval_uids with hparams.?

Member Author

It's how it currently works: we have a max number of uids to eval in the window, and this tells us how many of them should run in parallel.

Collaborator

Why not call it parallel_eval_uids instead of hparams.parallel_eval_uids?
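
For reference, a minimal sketch of how the two settings could interact, assuming uids_per_window decides how many uids get evaluated in a window and parallel_eval_uids caps how many of those evals are in flight at once (the semaphore-based runner and evaluate_uid are illustrative, not the PR's implementation):

import asyncio

async def run_window_evals(hparams, candidate_uids, evaluate_uid):
    # uids_per_window: how many uids are evaluated this window
    uids = candidate_uids[: hparams.uids_per_window]

    # parallel_eval_uids: upper bound on concurrent evaluations
    sem = asyncio.Semaphore(hparams.parallel_eval_uids)

    async def bounded(uid):
        async with sem:
            return uid, await evaluate_uid(uid)

    results = await asyncio.gather(*(bounded(u) for u in uids))
    return dict(results)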

input_ids = torch.tensor(batch, dtype=torch.long).to(
    model_own_data_eval.device
# Evaluate sync metrics.
sync_result = await self.evaluate_miner_sync(uid)
Collaborator

We could probably move the sync scoring to the evaluation module as well?

# Process evaluation results and update scores (old evaluation logic)
for uid in evaluation_uids:
    result = eval_results.get(uid)
    if result is not None:
Collaborator

Yes. I think this makes the code much more readable as well. I had this in another PR that was not merged, but I think it helps a lot to follow the happy path.

@distributedstatemachine marked this pull request as draft on Apr 13, 2025, 19:18
Comment on lines +884 to +907
self.weights = torch.zeros_like(self.final_moving_avg_scores)
evaluated_mask = torch.zeros_like(
    self.final_moving_avg_scores, dtype=torch.bool
)
evaluated_mask[list(self.evaluated_uids)] = True
positive_mask = (
    self.final_moving_avg_scores > 0
) & evaluated_mask

if positive_mask.any():
    self.weights[positive_mask] = min_power_normalization(
        self.final_moving_avg_scores[positive_mask],
        power=self.hparams.power_normalisation,
    )
    weight_sum = self.weights.sum().item()
    tplr.logger.debug(f"Weight sum: {weight_sum}")
    if abs(weight_sum - 1.0) > 1e-6:
        tplr.logger.warning(
            f"Weights sum to {weight_sum}, expected close to 1.0"
        )
else:
    tplr.logger.info(
        "No positive scores found, all weights set to 0"
    )
Collaborator

You can use the function you added in evaluation?
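
Purely to illustrate the suggestion, a sketch of what a shared helper in the evaluation module could look like, mirroring the inline block above (the compute_weights name and signature are hypothetical; min_power_normalization is the existing function already used in the block):

import torch

def compute_weights(final_moving_avg_scores, evaluated_uids, power):
    # Same logic as the inline block: zero weights, mark evaluated uids,
    # and normalise only the positive, evaluated scores.
    weights = torch.zeros_like(final_moving_avg_scores)
    evaluated_mask = torch.zeros_like(final_moving_avg_scores, dtype=torch.bool)
    evaluated_mask[list(evaluated_uids)] = True
    positive_mask = (final_moving_avg_scores > 0) & evaluated_mask
    if positive_mask.any():
        weights[positive_mask] = min_power_normalization(
            final_moving_avg_scores[positive_mask], power=power
        )
    return weights

# The validator call site would then reduce to:
# self.weights = compute_weights(
#     self.final_moving_avg_scores,
#     self.evaluated_uids,
#     power=self.hparams.power_normalisation,
# )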

outputs = model(input_ids=input_ids, labels=labels)
total_loss += outputs.loss.item()
count += 1
del input_ids, labels, outputs
Contributor

I don't think the del and torch.cuda.empty_cache() will help with anything practical; memory release doesn't work synchronously.
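
If it is worth settling empirically, here is a small sketch using standard torch APIs to check whether the del changes anything before the next iteration (a rough diagnostic, not part of the PR):

import torch

def report_cuda_mem(tag: str) -> None:
    # memory_allocated: bytes held by live tensors; memory_reserved: what the
    # caching allocator keeps around even after tensors are freed.
    torch.cuda.synchronize()
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")

# Usage around the loop body in question:
# report_cuda_mem("before del")
# del input_ids, labels, outputs
# report_cuda_mem("after del")          # allocated usually drops, reserved usually does not
# torch.cuda.empty_cache()
# report_cuda_mem("after empty_cache")  # reserved may drop, at a performance cost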
