- 49 files · ~92,616 words
- Verdict: corpus is large enough that graph structure adds value.
- 285 nodes · 340 edges · 53 communities detected
- Extraction: 81% EXTRACTED · 19% INFERRED · 0% AMBIGUOUS
- Token cost: 6,000 input · 3,500 output
- Top hub nodes by edge count:
  - Value · 15 edges
  - Training Script · 11 edges
  - GPT · 9 edges
  - Layer · 8 edges
  - CharDataset · 7 edges
  - AdditionDataset · 7 edges
  - CfgNode · 7 edges
  - Encoder · 7 edges
  - Neuron · 7 edges
  - FlashAttention Algorithm · 7 edges
- Sample inferred edges (with source → target file provenance):
  - from_pretrained() --calls--> get_default_config() [INFERRED] /home/safi/graphify-benchmark/repos/nanoGPT/model.py → /home/safi/graphify-benchmark/repos/minGPT/mingpt/model.py
  - get_batch() --conceptually_related_to--> get_batch() [INFERRED] /home/safi/graphify-benchmark/repos/nanoGPT/train.py → /home/safi/graphify-benchmark/repos/nanoGPT/bench.py
  - Training Script --produces--> GPTConfig Dataclass [INFERRED] repos/nanoGPT/train.py → repos/nanoGPT/model.py
  - GPT Language Model (minGPT) --conceptually_related_to--> GPT Model Class [INFERRED] repos/minGPT/mingpt/model.py → repos/nanoGPT/model.py
  - CausalSelfAttention (minGPT) --conceptually_related_to--> CausalSelfAttention Module [INFERRED] repos/minGPT/mingpt/model.py → repos/nanoGPT/model.py
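Since 19% of edges are INFERRED rather than EXTRACTED, the provenance tag is what makes them reviewable. A minimal sketch of how provenance-tagged edges could be filtered for verification; the `Edge` dataclass and its field names are illustrative, not the tool's actual export schema:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    source: str          # e.g. "from_pretrained()"
    relation: str        # e.g. "calls", "conceptually_related_to"
    target: str          # e.g. "get_default_config()"
    provenance: str      # "EXTRACTED", "INFERRED", or "AMBIGUOUS"
    source_file: str
    target_file: str

edges = [
    Edge("from_pretrained()", "calls", "get_default_config()", "INFERRED",
         "repos/nanoGPT/model.py", "repos/minGPT/mingpt/model.py"),
    # ... remaining edges would be loaded from the graph export
]

# Pull out the model-reasoned edges that still need human verification.
to_verify = [e for e in edges if e.provenance == "INFERRED"]
for e in to_verify:
    print(f"{e.source} --{e.relation}--> {e.target}  ({e.source_file} → {e.target_file})")
```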
- Cohesion scores for the 53 detected communities:
  - Cohesion: 0.11 · Nodes (12): dataclasses, inspect, Block, CausalSelfAttention, from_pretrained(), get_default_config(), GPT, GPTConfig (+4 more)
  - Cohesion: 0.12 · Nodes (17): batch_end_callback(), eval_split(), get_config(), get_default_config(), get_config(), get_default_config(), collections, mingpt_bpe (+9 more)
  - Cohesion: 0.13 · Nodes (15): get_batch(), contextlib, datasets, math, numpy, os, pickle, tiktoken (+7 more)
  - Cohesion: 0.1 · Nodes (22): Benchmarking Script, Config: Finetune GPT-2-XL on Shakespeare, Config: Train GPT-2 (124M), Config: Train Character-Level Shakespeare, Configurator (exec-based Override System), OpenWebText Data Preparation, Shakespeare Char-Level Data Preparation, Shakespeare (BPE) Data Preparation (+14 more)
  - Cohesion: 0.13 · Nodes (6): micrograd_engine, Layer, MLP, Module, Neuron, random
  - Cohesion: 0.12 · Nodes (21): FlashAttention Algorithm, GPU HBM vs On-Chip SRAM Memory Hierarchy, FlashAttention: Fast Memory-Efficient Attention, Selective Gradient Checkpointing (Recomputation), Result: 15% faster BERT-large vs MLPerf, Result: 3x GPT-2 training speedup, Tiling for Attention Computation, Self-Attention Mechanism (Q, K, V) (+13 more)
  - Cohesion: 0.19 · Nodes (8): BPETokenizer, bytes_to_unicode(), Encoder, get_encoder(), get_file(), get_pairs(), regex, requests
  - Cohesion: 0.12 · Nodes (1): Value
  - Cohesion: 0.18 · Nodes (5): ast, json, sys, CfgNode, setup_logging()
  - Cohesion: 0.15 · Nodes (3): AdditionDataset, CharDataset, Dataset
  - Cohesion: 0.21 · Nodes (11): Value (autograd scalar), Value.backward, Micrograd Computation Graph (operations + gradients), Backpropagation / Reverse-Mode Autodiff, Dynamically Built DAG (computation graph), micrograd, GPT.configure_optimizers, GPT.forward (minGPT) (+3 more)
  - Cohesion: 0.33 · Nodes (7): Block Attention Residuals, Full Attention Residuals, Attention Residuals (AttnRes) - Kimi Team, PreNorm Dilution Problem, Result: AttnRes improves MMLU 73.5→74.6, BBH 76.3→78.0, Result: Block AttnRes matches 1.25x more compute baseline, Residual Connections in Deep Networks
  - Cohesion: 0.33 · Nodes (6): Catastrophic Forgetting Problem, CoLoR Method, Low Rank Adaptation (LoRA), CoLoR: Continual Learning with Low Rank Adaptation, Vision Transformer (ViT-B-16) Backbone, Multi-Head Attention
  - Cohesion: 0.4 · Nodes (1): Trainer
  - Cohesion: 0.4 · Nodes (5): Mamba State Space Model, NeuralWalker Architecture, NeuralWalker: Learning Long Range Dependencies on Graphs, Result: NeuralWalker is strictly more expressive than 1-WL, Result: NeuralWalker +10% PascalVOC-SP, +13% COCO-SP over SOTA
  - Cohesion: 0.67 · Nodes (3): AdditionDataset, CharDataset, GPT.generate (minGPT)
  - Cohesion: 1.0 · Nodes (2): BPETokenizer, BPE Encoder
  - Cohesion: 1.0 · Nodes (2): OpenWebText Dataset, OpenWebText Dataset (~9B tokens, 17GB, 8M documents)
  - Cohesion: 1.0 · Nodes (2): Performance: torch.compile reduces iter time from 250ms to 135ms, torch.compile (PyTorch 2.0)
  - Cohesion: 1.0 · Nodes (2): Behavior Tokens Concept, LCBM: Large Content and Behavior Model
  - Cohesion: 1.0 · Nodes (1): setuptools
  - Cohesion: 1.0 · Nodes (2): GPT Complexity Metaphor: Battleship vs Speedboat, nanogpt_readme_design_simplicity
  - Cohesion: 1.0 · Nodes (2): Design Decision: minGPT prioritizes education (~300 lines), Design Decision: nanoGPT prioritizes speed over education
  - Cohesion: 1.0 · Nodes (2): mingpt_readme_mingpt, Attention Is All You Need (Transformer Paper)
  - Cohesion: 1.0 · Nodes (0): 7 empty communities
  - Cohesion: 1.0 · Nodes (1): LayerNorm with Optional Bias
  - Cohesion: 1.0 · Nodes (1): meta.pkl Vocabulary Schema
  - Cohesion: 1.0 · Nodes (1): Config: Eval GPT-2 (124M)
  - Cohesion: 1.0 · Nodes (1): Config: Eval GPT-2 Medium
  - Cohesion: 1.0 · Nodes (1): Config: Eval GPT-2 Large
  - Cohesion: 1.0 · Nodes (1): Config: Eval GPT-2 XL
  - Cohesion: 1.0 · Nodes (1): NewGELU Activation
  - Cohesion: 1.0 · Nodes (1): GPT.from_pretrained (minGPT)
  - Cohesion: 1.0 · Nodes (1): Trainer (minGPT)
  - Cohesion: 1.0 · Nodes (1): CfgNode Configuration Class
  - Cohesion: 1.0 · Nodes (1): set_seed
  - Cohesion: 1.0 · Nodes (1): setup_logging
  - Cohesion: 1.0 · Nodes (1): get_encoder
  - Cohesion: 1.0 · Nodes (1): GPT-2 Architectural Changes: pre-norm LayerNorm, scaled residual init
  - Cohesion: 1.0 · Nodes (1): Tiny Shakespeare Char Dataset (1M train tokens)
  - Cohesion: 1.0 · Nodes (1): minGPT Adder Project (GPT trained to add numbers)
  - Cohesion: 1.0 · Nodes (1): Tiny Shakespeare Dataset
  - Cohesion: 1.0 · Nodes (1): IO-Aware Attention Computation
  - Cohesion: 1.0 · Nodes (1): Result: FlashAttention memory scales linearly
  - Cohesion: 1.0 · Nodes (1): Result: CoLoR 69.7% on DomainNet (+19% over S-Prompts)
  - Cohesion: 1.0 · Nodes (1): Result: LCBM outperforms GPT-3.5/4 on behavior simulation (10x smaller)
  - Cohesion: 1.0 · Nodes (1): Positional Encoding in Transformers
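The report does not define the cohesion metric; a common reading is the fraction of a community's incident edges that stay inside the community, with empty or isolated communities defaulting to 1.0 (consistent with the listing above, but an assumption). A minimal sketch with networkx under that assumption; the tool may compute it differently:

```python
import networkx as nx

def cohesion(graph: nx.Graph, members: set[str]) -> float:
    """Fraction of edges touching the community that stay inside it (assumed definition)."""
    internal = external = 0
    for u, v in graph.edges(members):   # edges incident to community members
        if u in members and v in members:
            internal += 1
        else:
            external += 1
    total = internal + external
    return internal / total if total else 1.0   # empty/isolated communities default to 1.0

# Toy usage on a stand-in graph; node names are illustrative, not the real export.
g = nx.Graph()
g.add_edges_from([
    ("Layer", "Neuron"), ("Layer", "MLP"), ("MLP", "Module"),
    ("Layer", "Training Script"),   # one edge leaving the community
])
print(cohesion(g, {"Layer", "Neuron", "MLP", "Module"}))   # 0.75
```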
- 65 isolated nodes: MLP Module, LayerNorm with Optional Bias, Checkpoint Data Schema (ckpt.pt), meta.pkl Vocabulary Schema, Sampling/Inference Script (+60 more). These have ≤1 connection - possible missing edges or undocumented components.
- Thin communities (each too small to be a meaningful cluster - may be noise, or may need more connections extracted):
  - BPETokenizer (minGPT) (2 nodes): BPETokenizer, BPE Encoder
  - OpenWebText Dataset (2 nodes): OpenWebText Dataset, OpenWebText Dataset (~9B tokens, 17GB, 8M documents)
  - torch.compile Performance (2 nodes): Performance: torch.compile reduces iter time from 250ms to 135ms, torch.compile (PyTorch 2.0)
  - Behavior Token Paper (2 nodes): Behavior Tokens Concept, LCBM: Large Content and Behavior Model
  - Setup (2 nodes): setup.py, setuptools
  - Nanogpt Complexity Metaphor (2 nodes): GPT Complexity Metaphor: Battleship vs Speedboat, nanogpt_readme_design_simplicity
  - Mingpt Readme Design Education (2 nodes): Design Decision: minGPT prioritizes education (~300 lines), Design Decision: nanoGPT prioritizes speed over education
  - Mingpt Readme Mingpt (2 nodes): mingpt_readme_mingpt, Attention Is All You Need (Transformer Paper)
  - Init (1 node): __init__.py
  - Train Gpt2 (1 node): train_gpt2.py
  - Eval Gpt2 Xl (1 node): eval_gpt2_xl.py
  - Eval Gpt2 (1 node): eval_gpt2.py
  - Eval Gpt2 Large (1 node): eval_gpt2_large.py
  - Train Shakespeare Char (1 node): train_shakespeare_char.py
  - Eval Gpt2 Medium (1 node): eval_gpt2_medium.py
  - Model Layernorm (1 node): LayerNorm with Optional Bias
  - Model Meta Pkl Schema (1 node): meta.pkl Vocabulary Schema
  - Config Eval Gpt2 (1 node): Config: Eval GPT-2 (124M)
  - Config Eval Gpt2 Medium (1 node): Config: Eval GPT-2 Medium
  - Config Eval Gpt2 Large (1 node): Config: Eval GPT-2 Large
  - Config Eval Gpt2 Xl (1 node): Config: Eval GPT-2 XL
  - Mingpt Model Newgelu (1 node): NewGELU Activation
  - Mingpt Model Gpt From Pretrained (1 node): GPT.from_pretrained (minGPT)
  - Mingpt Trainer Trainer (1 node): Trainer (minGPT)
  - Mingpt Utils Cfgnode (1 node): CfgNode Configuration Class
  - Mingpt Utils Set Seed (1 node): set_seed
  - Mingpt Utils Setup Logging (1 node): setup_logging
  - Mingpt Bpe Get Encoder (1 node): get_encoder
  - Mingpt Readme Gpt2 Arch Changes (1 node): GPT-2 Architectural Changes: pre-norm LayerNorm, scaled residual init
  - Shakespeare Char Readme Char Dataset (1 node): Tiny Shakespeare Char Dataset (1M train tokens)
  - Mingpt Readme Adder Project (1 node): minGPT Adder Project (GPT trained to add numbers)
  - Chargpt Readme Tiny Shakespeare (1 node): Tiny Shakespeare Dataset
  - 2205 14135 Io Awareness (1 node): IO-Aware Attention Computation
  - 2205 14135 Result Memory Linear (1 node): Result: FlashAttention memory scales linearly
  - 2311 17601 Result Domainnet (1 node): Result: CoLoR 69.7% on DomainNet (+19% over S-Prompts)
  - 2309 00359 Result Behavior Sim (1 node): Result: LCBM outperforms GPT-3.5/4 on behavior simulation (10x smaller)
  - Concept Positional Encoding (1 node): Positional Encoding in Transformers
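The isolated-node and thin-community flags above reduce to two structural checks: node degree ≤ 1 and community size ≤ 2. A minimal sketch of how such checks could be reproduced over the exported graph with networkx; the function name, thresholds, and the node→community mapping format are illustrative:

```python
import networkx as nx
from collections import defaultdict

def structural_warnings(graph: nx.Graph, community_of: dict[str, str]):
    """Flag weakly connected nodes (degree <= 1) and tiny communities (size <= 2)."""
    isolated = [n for n in graph.nodes if graph.degree(n) <= 1]

    members = defaultdict(list)
    for node, community in community_of.items():
        members[community].append(node)
    thin = {c: nodes for c, nodes in members.items() if len(nodes) <= 2}

    return isolated, thin

# Toy usage; node and community names are stand-ins, not the real export.
g = nx.Graph([("GPT", "Training Script"), ("GPT", "Block")])
g.add_node("setuptools")   # no edges at all
iso, thin = structural_warnings(g, {"GPT": "model", "Training Script": "model",
                                    "Block": "model", "setuptools": "Setup"})
print(iso)    # ['Training Script', 'Block', 'setuptools']
print(thin)   # {'Setup': ['setuptools']}
```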
Questions this graph is uniquely positioned to answer:
- Why does Training Script connect nanoGPT Config + Data Prep to nanoGPT Training Pipeline? High betweenness centrality (0.176) - this node is a cross-community bridge.
- Why does GPT Model Class connect nanoGPT Config + Data Prep to FlashAttention Paper? High betweenness centrality (0.103) - this node is a cross-community bridge.
- Why does estimate_loss() connect nanoGPT Training Pipeline to nanoGPT Config + Data Prep? High betweenness centrality (0.083) - this node is a cross-community bridge.
- Are the 4 inferred relationships involving Value (e.g. with .__add__() and .__mul__()) actually correct? Value has 4 INFERRED edges - model-reasoned connections that need verification.
- Are the 3 inferred relationships involving Training Script (e.g. with GPTConfig Dataclass and Performance: ~2.85 val loss in 4 days on 8xA100) actually correct? Training Script has 3 INFERRED edges - model-reasoned connections that need verification.
- Are the 2 inferred relationships involving Layer (e.g. with .__init__() and .__call__()) actually correct? Layer has 2 INFERRED edges - model-reasoned connections that need verification.
- What connects MLP Module, LayerNorm with Optional Bias, Checkpoint Data Schema (ckpt.pt) to the rest of the system? 65 weakly-connected nodes found - possible documentation gaps or missing edges.
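The bridge questions come from betweenness centrality: the fraction of all-pairs shortest paths that pass through a node, which is high for nodes stitching communities together. A minimal sketch showing how the bridge candidates could be recomputed with networkx, assuming the graph can be loaded from an export file (the file name below is a placeholder, not the tool's real output path):

```python
import networkx as nx

# Load the exported graph; replace with the actual export location.
g = nx.read_graphml("graph_export.graphml")

# Betweenness centrality: share of all-pairs shortest paths passing through each node.
centrality = nx.betweenness_centrality(g)

# Surface likely cross-community bridges, e.g. Training Script (0.176), GPT Model Class (0.103).
bridges = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:10]
for node, score in bridges:
    print(f"{score:.3f}  {node}")
```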