Skip to content

[Perf] Add ti.perf_dispatch#356

Open
hughperkins wants to merge 31 commits intomainfrom
hp/perf-dispatch
Open

[Perf] Add ti.perf_dispatch#356
hughperkins wants to merge 31 commits intomainfrom
hp/perf-dispatch

Conversation

@hughperkins
Copy link
Collaborator

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

else:
assert c[ImplEnum.a_shape0_ge2] == 1
assert c[ImplEnum.a_shape0_lt2] == 0
assert c[ImplEnum.a_shape0_ge2] == 0
Copy link
Collaborator

@erizmr erizmr Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a typo here assert c[ImplEnum.a_shape0_ge2] == 0 and assert c[ImplEnum.a_shape0_ge2] == 1? Is it c[ImplEnum.serial] == 0? As after warming up, we only want to run the fastest one given the geometry hash

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good spot. And also led me to notice this bit of code wasnt being called, and my loop was too short. And when I made it longer, I noticed this PR assumed idmpotent kernels. 😅 Anyway, fixed now, in fec2372

Copy link
Collaborator

@erizmr erizmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some comments.

@hughperkins
Copy link
Collaborator Author

hughperkins commented Feb 5, 2026

oh, hmmm, this is broken, because our functions are not idempotent. Will need to fix the scheduling.

Fixed

least_trials_idx = None
least_trials = None
for underlying_idx in compatible.keys():
trial_count = self._trial_count_by_underlying_idx_by_geometry_hash[geometry_hash].get(underlying_idx, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, self._trial_count_by_underlying_idx_by_geometry_hash[geometry_hash] should be list (initialised with 0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I disagree fairly strongly here, because then if the list mutates in some way, we retrieve the wrong count.

Copy link
Contributor

@duburcqa duburcqa Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I disagree fairly strongly here, because then if the list mutates in some way, we retrieve the wrong count.

You don't. When you register a new function you are supposed to expend this list of course. Same if you delete some (which is not supported but still).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the trial counts, if they are a list, could become out of sync. strongly prefer to make them a dict, indexed by kernel index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the trial counts, if they are a list, could become out of sync.

Why? I don't see any reason for this. You have a register method that is the only entrypoint for users. it is easily to make sure that everything is in sync.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldnt the kernel have to be hashable to use it as a key? 🤔

Copy link
Contributor

@duburcqa duburcqa Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Still, any python native function or custom class instance can be used as dict key:

def fun():
    pass
    
class Class:
    pass

cls = Class()

d = {fun: 2, cls: 1}
assert d[fun] == 2 * d[cls]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. how does this work? using the id? hashing the object? something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the hood, dict is using the output of hash() to build the hashing table when inserting a new item, then equality comparison to make sure it has retrieved the expected object. The default hash for python objects is (id >> 4) | ((id << 60) & 0xffffffffffffffff) (see https://stackoverflow.com/a/38519187/4820605).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so it's hashing on id, which I explicitly would like to avoid please, since multiple objects can have identical ids, over time.

Comment on lines +65 to +66
assert c[ImplEnum.a_shape0_ge2] == 1
assert c[ImplEnum.a_shape0_lt2] == 0
Copy link
Contributor

@duburcqa duburcqa Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend making my_func1_impl_a_shape0_lt_2 a better candidate (faster) than my_func1_impl_a_shape0_ge_2 if it was compatible, just to make sure the compatibility filter is doing the job.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, ok. migrating to time.sleep will make this both easier, and more obvoius.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addresseed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no it is not addressed. I said my_func1_impl_a_shape0_lt_2 a better candidate (faster) than my_func1_impl_a_shape0_ge_2 if it was compatible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in affd537

for b in range(B):
a[b] = a[b] * b
c[ImplEnum.serial] = 1
time.sleep(0.05)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.sleep inside kernels is supported ?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting question. Let me migrate to use linear congruential generator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in affd537

Comment on lines +136 to +148
We collect a single sample from each implementation, and compare that single sample with the samples from the
other implementations.

We are comparing algorithms based on empirical runtime.

Note that for best results, sets of input arguments that have different runtimes should map to different
geometries, otherwise the comparison between runtimes might not be fair, and an inappropriate implementation
kernel might be selected.

We are not implementing an epsilon-greedy algorithm to keep sampling non-fastest variants just in case the
distribution is shifting over time.

It is not possible for you to control exploration vs exploitation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Information is there. Not enjoyable to read due to both formatting and wording but at least it is there x)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants