Chakra_FX: Generating a Distributed ML Workload "Representation" from PyTorch Source Code with Only 1 GPU
This project aims to answer the question: ASTRA-sim and many other tools can take in a Chakra graph and do many things with it (simulation, etc.). How do I (easily) obtain these Chakra graphs without having to acquire hundreds of GPUs and run the workload on them?
Here's the key idea: PyTorch's torch.compile traces the Python source code and creates a graph-based representation (called an FX graph) at compile time. We simply take that graph and convert it into a Chakra graph. While the default behavior is to hand this graph to low-level compilers (such as Inductor or NvFuser), PyTorch also exposes an API through which developers can write custom compilers (i.e., custom backends that receive and modify this graph). This repository implements a 'custom compiler' that, instead of compiling the graph and handing the compiled result back to PyTorch, simply takes the FX graph, converts it into a Chakra graph, stores it locally, and gracefully exits.
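As a rough sketch of how this custom-backend hook works (the names `chakra_backend` and `convert_fx_to_chakra` below are illustrative placeholders, not this repository's actual API), a function passed as `backend=` to torch.compile receives the traced `torch.fx.GraphModule` and can inspect or export it before returning a callable:

```python
import torch
import torch.fx

def chakra_backend(gm: torch.fx.GraphModule, example_inputs):
    # torch.compile calls this with the FX graph traced from the model.
    for node in gm.graph.nodes:
        # Each node describes one captured operation (call_function, call_module, ...).
        print(node.op, node.target)

    # Placeholder: this is where the FX graph -> Chakra graph conversion and
    # local serialization would happen (see this repository's src/ directory).
    # convert_fx_to_chakra(gm)

    # Return the unmodified forward so PyTorch can still execute the model.
    return gm.forward

model = torch.nn.Linear(8, 8)
compiled = torch.compile(model, backend=chakra_backend)
compiled(torch.randn(4, 8))  # triggers tracing and invokes chakra_backend
```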
For a detailed setup guide, please refer to USER_GUIDE.md
Here is the directory structure:
```
.
├── profile_fxgraph.py
└── src/
```