* WIP loading weights once
* WIP loading weights once
Function correctly runs once
Created GraphModule loads weights properly
TODO: get wrapper function to successfully call generated GraphModule
* Adds necessary files to get load_weights working (fixup)
* Loading weights once works!!!
TODO: clean up before PR
* Clean up for PR part 1
* Add analysis of when to load_weights_once
TODO: clean up unused code
* Training flows now work by not loading weights first
TODO: address any feedback
* fixup
* Get Data Parallel working again
* Reset run_once count every time compilation occurs to fix multiple tests
run in one command
* Remove dead code in backend
Update comment for graph module analysis pass
* Address PR feedback, improving readability
* Updated comments
* Skip Falcon-7b test for now since it fails in Before Merge workflow but
passes in Run Tests
* Only load_once on end-to-end converted models to solve input caching
issue
"""Marks the GraphModule as either training forward, training backward, or inference (forward).
62
+
63
+
This relies on commonalities between training forward, backward, and inference graphs. Namely, backward passes call backward versions of the forward functions to calculate gradients. Training forward passes return inputs unchanged. Inference forward functions do neither of these. It would be cleaner if we could just use something like `torch.is_grad_enabled()` or `gm.training` instead, but these appear to be inaccurate by the time the GraphModule is passed to our backend.
64
+
65
+
:param gm: Graph module for the function being compiled.
66
+
:return: Pass result with the updated graph module with metadata indicating the type of graph being compiled.
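The following is a minimal sketch of how such a classification could be implemented over a torch.fx GraphModule. The GraphType enum, the string match on "backward" in call targets, and the placeholder-in-output check are illustrative assumptions, not necessarily the exact logic used by this pass.

from enum import Enum

import torch


class GraphType(Enum):
    INFERENCE = 0
    TRAINING_FORWARD = 1
    TRAINING_BACKWARD = 2


def classify_graph(gm: torch.fx.GraphModule) -> GraphType:
    # Collect the graph inputs and the flattened outputs.
    placeholders = [n for n in gm.graph.nodes if n.op == "placeholder"]
    output_node = next(n for n in gm.graph.nodes if n.op == "output")
    outputs = output_node.args[0]
    if not isinstance(outputs, (tuple, list)):
        outputs = (outputs,)

    # Backward graphs call backward variants of forward ops to compute gradients.
    if any(n.op == "call_function" and "backward" in str(n.target) for n in gm.graph.nodes):
        return GraphType.TRAINING_BACKWARD

    # Training forward graphs return some inputs unchanged (saved for the backward pass);
    # inference graphs do not.
    if any(out in placeholders for out in outputs):
        return GraphType.TRAINING_FORWARD

    return GraphType.INFERENCE

The result could then be stored as metadata on the GraphModule (for example in gm.meta) so later passes can decide whether the load-weights-once transformation applies.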
# aligned_node_dict maps DataMoveSpec to aligned version of the node to prevent calling the same data movement twice
self.aligned_node_dict = {}
# marshaled_node_dict maps DataMoveSpec to index in the load_weights function that runs once. This is consumed once when adding data movement, and the DataMoveSpec will
# then be populated in the aligned_node_dict for further usage
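A hedged sketch of how these two dictionaries could interact when data movement is added; lookup_load_weights_output and insert_data_movement are hypothetical helper names standing in for whatever the backend actually calls.

def get_aligned_node(self, spec, node, graph):
    # Reuse a data-movement node that was already created for this DataMoveSpec.
    if spec in self.aligned_node_dict:
        return self.aligned_node_dict[spec]

    if spec in self.marshaled_node_dict:
        # The spec was hoisted into the run-once load_weights function; consume
        # its index and fetch the corresponding output node.
        index = self.marshaled_node_dict.pop(spec)
        aligned = self.lookup_load_weights_output(graph, index)  # hypothetical helper
    else:
        aligned = self.insert_data_movement(graph, node, spec)  # hypothetical helper

    # Record the result so the same data movement is never emitted twice.
    self.aligned_node_dict[spec] = aligned
    return aligned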
# This will push all from_torch calls to the top of the forward function. This shouldn't impact performance, but it may impact memory usage since variables will be
# live longer than they would if from_torch calls occurred right before usage. If we start running out of DRAM or need to be more careful about memory usage, this
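As a rough sketch, the hoisting could be done by reordering the fx graph so every from_torch call sits directly after the placeholders. This assumes the arguments of each from_torch call (typically weight get_attr nodes) already appear near the top of the graph, and hoist_from_torch_calls is an illustrative name rather than this PR's actual function.

import torch


def hoist_from_torch_calls(gm: torch.fx.GraphModule) -> None:
    graph = gm.graph
    placeholders = [n for n in graph.nodes if n.op == "placeholder"]
    if not placeholders:
        return
    insertion_point = placeholders[-1]
    for node in list(graph.nodes):
        if node.op == "call_function" and "from_torch" in str(node.target):
            # Move the call right after the previously hoisted node so the
            # relative order of from_torch calls is preserved.
            insertion_point.append(node)
            insertion_point = node
    graph.lint()
    gm.recompile()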