kreasof-ai/flow-matching-course

Advanced AI: Mastering Flow Matching for Generative Modeling

Course Goal: To provide a deep theoretical and practical understanding of Flow Matching (FM), a cutting-edge technique for generative modeling. Learners will gain hands-on experience implementing and extending FM models using PyTorch, building upon their prior knowledge of Transformers and generative models.

Prerequisites:

  • Successful completion of "Modern AI Development: From Transformers to Generative Models" or equivalent knowledge.
  • Strong Python programming skills.
  • Solid understanding of deep learning concepts, including:
    • Neural network architectures (MLPs, CNNs, Transformers)
    • Optimization algorithms (Adam, etc.)
    • Loss functions
    • Backpropagation
  • Familiarity with PyTorch.
  • Comfort with mathematical concepts including:
    • Calculus (derivatives, integrals, gradients)
    • Linear algebra (vectors, matrices, operations)
    • Probability and statistics (distributions, expectations, etc.)
  • Experience with the Hugging Face ecosystem is helpful but not required.

Course Duration: 8 weeks (flexible, can be adjusted to 6-10 weeks)

Tools:

  • Python (>= 3.8)
  • PyTorch (latest stable version)
  • flow_matching library (from the paper's codebase: https://github.com/facebookresearch/flow_matching)
  • Hugging Face Transformers library (for potential connections and comparisons)
  • Jupyter Notebooks/Google Colab
  • Standard Python libraries (NumPy, Matplotlib, etc.)

Curriculum Draft:

Module 1: Recap and Introduction to Flow Matching (Week 1)

  • Topic 1.1: Recap of Generative Models:
    • Review of key generative model types:
      • Autoregressive models (brief)
      • VAEs (brief)
      • Diffusion models (more emphasis)
    • Limitations and challenges of existing methods.
  • Topic 1.2: Introduction to Flow Matching (FM):
    • Intuitive explanation of Flow Matching.
    • The concept of continuous-time flows and velocity fields.
    • How FM addresses limitations of other methods.
    • Relationship to Continuous Normalizing Flows (CNFs).
  • Topic 1.3: Setting up the environment with flow_matching library:
    • Installing and configuring necessary libraries
    • Overview of flow_matching codebase
    • Running basic examples from the paper.
  • Topic 1.4: Mathematical Foundations - Part 1:
    • Random vectors, conditional densities, and expectations
    • Diffeomorphisms and push-forward maps
  • Hands-on Exercises:
    • Exploring the flow_matching library, running pre-built examples
    • Implementing basic operations with probability distributions in PyTorch
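
A minimal warm-up in the spirit of these exercises (the mixture target and all names here are illustrative, not taken from the flow_matching library): sample from a source Gaussian and a toy target, then estimate an expectation by Monte Carlo.

```python
import torch

torch.manual_seed(0)

# Source distribution p0: standard 2-D Gaussian.
x0 = torch.randn(10_000, 2)

# Toy target distribution p1: mixture of two Gaussians.
centers = torch.tensor([[-2.0, 0.0], [2.0, 0.0]])
idx = torch.randint(0, 2, (10_000,))
x1 = centers[idx] + 0.3 * torch.randn(10_000, 2)

# Monte Carlo estimates of E[||x||^2] under each distribution.
e0 = x0.pow(2).sum(dim=1).mean()
e1 = x1.pow(2).sum(dim=1).mean()
print(f"E||x0||^2 ≈ {e0:.2f}")  # ~2.0 for a standard 2-D Gaussian
print(f"E||x1||^2 ≈ {e1:.2f}")  # ~4.18 = ||center||^2 + trace of covariance
```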

Module 2: Flow Models and Probability Paths (Week 2)

  • Topic 2.1: Mathematical Foundations - Part 2:
    • Flows as generative models
    • Probability paths and the Continuity Equation.
    • Instantaneous Change of Variables
  • Topic 2.2: Flow Models in Detail:
    • Defining flows via velocity fields.
    • Solving the flow ODE (Ordinary Differential Equation).
    • Computing target samples from source samples.
    • Implementing a simple flow model in PyTorch.
  • Topic 2.3: Understanding Probability Paths:
    • Different types of probability paths.
    • The concept of "generating" a probability path with a velocity field.
    • Visualizing probability paths.
  • Topic 2.4: Training Flow Models with Simulation (Traditional Approach):
    • Maximizing likelihood using the instantaneous change of variables formula.
    • The need for simulation and its challenges.
  • Hands-on Exercises:
    • Implementing an ODE solver (e.g., Euler, Midpoint) in PyTorch.
    • Training a simple flow model using likelihood maximization.
    • Experimenting with different probability paths using the flow_matching library
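
The ODE-solver exercise can be sketched with a fixed-step Euler integrator; the function name and the test velocity field are illustrative. The field v(x, t) = -x has the closed-form solution x(1) = x(0)·e⁻¹, which gives a cheap correctness check.

```python
import torch

def euler_sample(velocity, x0, steps=1000):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0.clone(), 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity(x, t)
    return x

torch.manual_seed(0)
x0 = torch.randn(5, 2)
x1 = euler_sample(lambda x, t: -x, x0)

# Compare against the analytic solution x0 * exp(-1).
print(torch.allclose(x1, x0 * torch.exp(torch.tensor(-1.0)), atol=1e-3))  # True
```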

Module 3: The Core of Flow Matching (Week 3)

  • Topic 3.1: The Flow Matching Loss:
    • Introducing the FM loss as a regression objective.
    • Why the FM loss is simulation-free.
    • Conditional Flow Matching (CFM) loss.
  • Topic 3.2: Designing Probability Paths:
    • Conditional probability paths.
    • Marginal probability paths.
    • The marginalization trick
  • Topic 3.3: Deriving Conditional Velocity Fields:
    • Constructing conditional velocity fields for specific probability paths.
    • The Optimal Transport (OT) conditional path.
    • Implementing conditional velocity fields in PyTorch
  • Topic 3.4: The Simplest Implementation of Flow Matching:
    • Putting it all together: CFM with OT path.
    • Training a basic FM model using the flow_matching library.
  • Hands-on Exercises:
    • Implementing the CFM loss function.
    • Training an FM model on a simple 2D dataset (e.g., two moons).
    • Visualizing the learned velocity field and generated samples.
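
The full Module 3 pipeline fits in a few dozen lines. The sketch below uses a dependency-free ring-shaped toy target in place of two moons (swap in sklearn.datasets.make_moons if available); the network and hyperparameters are illustrative, not taken from the flow_matching library.

```python
import torch
import torch.nn as nn

# Conditional flow matching with the linear (OT) path:
#   x_t = (1 - t) x0 + t x1,  target velocity u = x1 - x0.
torch.manual_seed(0)

def sample_target(n):
    # Ring-shaped toy data standing in for "two moons".
    theta = 2 * torch.pi * torch.rand(n)
    r = 2.0 + 0.1 * torch.randn(n)
    return torch.stack([r * torch.cos(theta), r * torch.sin(theta)], dim=1)

net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 64),
                    nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x1 = sample_target(256)
    x0 = torch.randn(256, 2)
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1            # point on the conditional path
    target = x1 - x0                      # conditional OT velocity
    pred = net(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()  # CFM regression loss
    if step == 0:
        first_loss = loss.item()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"loss: {first_loss:.3f} -> {loss.item():.3f}")
```

Note that the CFM loss does not go to zero: it retains the irreducible conditional variance of x1 − x0 given x_t, but its minimizer is still the marginal velocity field.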

Module 4: Conditional Flow Matching and Design Choices (Week 4)

  • Topic 4.1: Affine Conditional Flows:
    • Exploring different schedulers (α_t, σ_t) and their effect on the flow
    • Connecting affine flows to known examples from diffusion models
    • Implementing various schedulers from the paper in the flow_matching library
  • Topic 4.2: General Conditioning and the Marginalization Trick:
    • Understanding how conditioning works in FM.
    • Theorem 3 (Marginalization Trick) in detail.
    • The role of regularity assumptions
  • Topic 4.3: Beyond the Basics - Exploring Different Probability Paths:
    • Investigating alternative probability path designs.
    • Trade-offs and considerations when choosing a path.
  • Topic 4.4: Velocity Parameterizations:
    • Different ways to parameterize the velocity field (x1-prediction, x0-prediction).
    • The role of the score function in the Gaussian case.
    • Connection to diffusion models.
  • Hands-on Exercises:
    • Experimenting with different schedulers and their impact on training.
    • Implementing x1-prediction and x0-prediction models.
    • Training FM models on more complex datasets (e.g., MNIST, CIFAR-10).
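
For the linear (conditional-OT) scheduler α_t = t, σ_t = 1 − t, velocity prediction, x1-prediction, and x0-prediction carry the same information: x1 = x_t + (1 − t)·u and x0 = x_t − t·u. A quick numeric check of these identities (a sketch with illustrative names):

```python
import torch

# Affine conditional path x_t = alpha_t * x1 + sigma_t * x0 with the
# linear scheduler alpha_t = t, sigma_t = 1 - t, so u = x1 - x0.
torch.manual_seed(0)
x0, x1 = torch.randn(4, 2), torch.randn(4, 2)
t = torch.rand(4, 1)

xt = t * x1 + (1 - t) * x0   # point on the path
u = x1 - x0                  # conditional velocity

x1_rec = xt + (1 - t) * u    # recover x1 from (x_t, u)
x0_rec = xt - t * u          # recover x0 from (x_t, u)
print(torch.allclose(x1_rec, x1, atol=1e-5),
      torch.allclose(x0_rec, x0, atol=1e-5))  # True True
```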

Module 5: Non-Euclidean Flow Matching (Week 5)

  • Topic 5.1: Introduction to Riemannian Manifolds:
    • Basic concepts of manifolds, tangent spaces, and metrics.
    • Why manifolds are relevant for certain types of data.
  • Topic 5.2: Flows on Manifolds:
    • Defining flows and velocity fields on manifolds.
    • The Riemannian Continuity Equation.
    • The concept of geodesics
  • Topic 5.3: The Riemannian Flow Matching Loss:
    • Extending the FM loss to manifolds.
    • Using Bregman divergences on tangent spaces.
  • Topic 5.4: Conditional Flows Through Premetrics:
    • The idea behind premetrics.
    • Implementing a basic non-Euclidean FM model.
  • Hands-on Exercises:
    • Implementing a simple flow on a sphere (if applicable, depending on the maturity of manifold support in the flow_matching library).
    • Exploring geodesic conditional flows.
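
Geodesic interpolation on the unit sphere (slerp) is the basic ingredient of geodesic conditional paths; a minimal sketch, independent of the flow_matching library's manifold support:

```python
import torch

def slerp(x0, x1, t):
    """Geodesic from x0 to x1 on the unit sphere at time t in [0, 1]."""
    dot = (x0 * x1).sum(-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7)
    omega = torch.acos(dot)  # angle between the two points
    return (torch.sin((1 - t) * omega) * x0
            + torch.sin(t * omega) * x1) / torch.sin(omega)

torch.manual_seed(0)
x0 = torch.nn.functional.normalize(torch.randn(5, 3), dim=-1)
x1 = torch.nn.functional.normalize(torch.randn(5, 3), dim=-1)
xt = slerp(x0, x1, torch.tensor(0.5))

# Interpolants stay on the manifold: ||x_t|| = 1.
print(torch.allclose(xt.norm(dim=-1), torch.ones(5), atol=1e-5))  # True
```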

Module 6: Discrete Flow Matching (Week 6)

  • Topic 6.1: Continuous Time Markov Chains (CTMCs):
    • Introduction to CTMCs as generative models.
    • Rates, velocities, and the Kolmogorov equation
    • Probability paths and mass conservation
  • Topic 6.2: Discrete Flow Matching (DFM):
    • The DFM loss function.
    • Conditional DFM.
    • Training a CTMC model with DFM.
  • Topic 6.3: Factorized Paths and Velocities:
    • Simplifying DFM with factorized paths.
    • Reducing the complexity of the model.
    • Implementing factorized DFM.
  • Topic 6.4: Mixture Paths for DFM:
    • Designing probability paths with mixtures.
    • Deriving conditional velocities.
  • Hands-on Exercises:
    • Implementing a basic DFM model using the flow_matching library (if applicable).
    • Training a DFM model on a simple discrete dataset.
    • Experimenting with factorized paths and mixture paths.
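
A factorized mixture-path corruption for discrete tokens can be sketched in a few lines; here κ_t = t and all sizes are illustrative. Independently per position, x_t equals the data token with probability κ_t and the source token otherwise (κ_0 = 0, κ_1 = 1).

```python
import torch

torch.manual_seed(0)
vocab, batch, seq = 32, 64, 32

x0 = torch.randint(0, vocab, (batch, seq))  # source tokens (uniform noise)
x1 = torch.randint(0, vocab, (batch, seq))  # data tokens
t = torch.rand(batch, 1)                    # one time per sequence

# Independently per position, keep the data token with prob. kappa_t = t.
keep_data = torch.rand(batch, seq) < t
xt = torch.where(keep_data, x1, x0)

# The fraction of positions matching x1 tracks E[t], plus chance collisions.
frac = (xt == x1).float().mean()
print(f"fraction of data tokens: {frac:.2f}")
```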

Module 7: Advanced Topics (Week 7)

  • Topic 7.1: Generator Matching (GM):
    • Introduction to the general framework of Generator Matching
    • Connection to continuous-time Markov processes (CTMPs)
    • Training CTMPs with the Generator Matching loss.
  • Topic 7.2: Combining Models:
    • Markov superpositions.
    • Divergence-free components.
    • Predictor-corrector methods.
    • Building hybrid models (e.g., flow + jump).
  • Topic 7.3: Multimodal Flow Matching:
    • Extending FM to multiple modalities.
    • Factorized conditional probability paths for multimodal data.
  • Topic 7.4: Post-training velocity scheduler change:
    • Adapting the velocity field to a different scheduler.
    • Improving sample quality.
  • Hands-on Exercises:
    • Implementing a simple example of combining models (e.g., flow + jump if supported by flow_matching)
    • Experimenting with different velocity field parameterizations and their effect on performance
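
The post-training scheduler change in Topic 7.4 rests on time reparameterization: if u_t generates a probability path, then for a monotone τ: [0, 1] → [0, 1], the field τ′(t)·u_{τ(t)}(x) traverses the same path on a rescaled clock. A numeric sketch with a field whose solution is known in closed form (all names illustrative):

```python
import torch

def euler(v, x0, steps=4000):
    """Fixed-step Euler integration of dx/dt = v(x, t) over [0, 1]."""
    x, dt = x0.clone(), 1.0 / steps
    for i in range(steps):
        x = x + dt * v(x, torch.tensor(i * dt))
    return x

tau = lambda t: t ** 2   # example reparameterization with tau(1) = 1
dtau = lambda t: 2 * t   # its derivative

torch.manual_seed(0)
x0 = torch.randn(5, 2)

# u(x, t) = -x has closed form x(1) = x0 * exp(-1); the rescheduled
# field dtau(t) * u(x, tau(t)) = -2t * x reaches the same endpoint.
base = euler(lambda x, t: -x, x0)
resched = euler(lambda x, t: dtau(t) * -x, x0)
print(torch.allclose(base, resched, atol=1e-3))  # True
```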

Module 8: Project and Future Directions (Week 8)

  • Topic 8.1: Project Development:
    • Students work on a final project applying Flow Matching to a problem of their choice.
    • Potential project ideas:
      • Image generation with advanced FM architectures.
      • Exploring non-Euclidean FM for specific data types.
      • Implementing and evaluating DFM for discrete data.
      • Building a multimodal generative model using FM.
      • Investigating novel probability path designs.
    • Instructor guidance and feedback.
  • Topic 8.2: Project Presentations:
    • Students present their projects and findings.
    • Peer review and discussion.
  • Topic 8.3: The Future of Flow Matching:
    • Open research questions and potential extensions.
    • Connections to other areas of generative modeling.
  • Topic 8.4: Resources for Continued Learning:
    • Relevant papers, codebases, and online communities.
    • Tips for staying up-to-date with the field.

Assessment:

  • Weekly hands-on exercises (coding implementations, experiments).
  • Short quizzes to test understanding of core concepts (optional).
  • Mid-term assessment (e.g., implementing a specific FM variant from the paper).
  • Final project (implementation, evaluation, and presentation).

Key Pedagogical Considerations:

  • Theory and Practice Balance: The curriculum emphasizes both a deep understanding of the theoretical underpinnings of Flow Matching and hands-on implementation skills.
  • Code-First Approach: Leveraging the flow_matching library to provide a practical and accessible entry point to the concepts.
  • Progressive Complexity: Modules are structured to gradually introduce more advanced concepts, building upon previous knowledge.
  • Mathematical Rigor: The course does not shy away from the mathematical details, but presents them in a clear and digestible manner, with a focus on building intuition.
  • Emphasis on Design Choices: The curriculum highlights the various design choices in FM (probability paths, velocity fields, conditioning, etc.) and their impact on performance.
  • Real-World Datasets: Encouraging experimentation with real-world datasets (images, text, etc.) to demonstrate the practical applicability of FM.
  • Research-Oriented: The course connects to current research in generative modeling and encourages students to explore open questions.
