forked from volcengine/verl
Open
Description
We have the following main tasks:
- A better testing framework: Test each PR with actual training (and record performance). Add unit tests for smaller components.
- More modalities support: Merge the audio support branch. Add support for more modalities.
- Diffusion language model support: Support for RL tuning of diffusion language models.
Each can be broken down into smaller tasks:
- A better testing framework: Test each PR with actual training (and record performance). Add unit tests for smaller components.
  - CI/CD Pipeline Setup
    - Configure GitHub Actions for automated testing on PRs
    - Set up GPU runners for training tests
    - Implement performance benchmarking and tracking
    - Set up automatic performance regression alerts
  - Unit Testing
    - Write unit tests for data loading and preprocessing modules
    - Test reward computation components
    - Test loss calculation functions
    - Validate model initialization and checkpointing
    - Test distributed training utilities
    - Test memory management and cleanup
    - Validate gradient computation and backpropagation
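As a starting point for the "automatic performance regression alerts" item, the check itself can be a small pure function that a CI job calls after a training test. The sketch below is only an illustration: `Regression`, `find_regressions`, the metric names, and the 5% default tolerance are all hypothetical and not part of verl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    """One metric that got worse than the allowed tolerance (hypothetical type)."""
    metric: str
    baseline: float
    current: float
    relative_change: float  # (current - baseline) / |baseline|


def find_regressions(baseline, current, lower_is_better=(), tolerance=0.05):
    """Return metrics in `current` that are worse than `baseline` by more than `tolerance`.

    `baseline` and `current` map metric names to floats. Metrics listed in
    `lower_is_better` (for example a step time) regress when they increase;
    all other metrics regress when they decrease.
    """
    regressions = []
    for name, base_value in baseline.items():
        if name not in current or base_value == 0:
            continue  # no value to compare, or relative change is undefined
        change = (current[name] - base_value) / abs(base_value)
        # Flip the sign so that "worse" is always a negative number.
        signed = -change if name in lower_is_better else change
        if signed < -tolerance:
            regressions.append(Regression(name, base_value, current[name], change))
    return regressions
```

A CI step could load the baseline from the tracked benchmark history, call `find_regressions(baseline, current, lower_is_better={"step_time_s"})`, and fail the job or post a comment when the returned list is non-empty.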
- More modalities support: Merge audio support branch. Add support for more modalities.
  - Audio Integration
    - Merge existing audio support branch
    - Add example training scripts for audio tasks
    - Create audio quality evaluation metrics
  - Arbitrary Modality Framework
    - Design generic modality interface/base classes
    - Create modality fusion layers
    - Implement cross-modal attention mechanisms
    - Build modality-agnostic data loaders
    - Design modality registration system
    - Create modality mixing strategies for training
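One possible shape for the "generic modality interface/base classes" and "modality registration system" items is an abstract base class plus a decorator-based registry. This is a minimal sketch under assumptions: `Modality`, `register_modality`, `get_modality`, and the two toy subclasses are hypothetical names for illustration, not existing verl APIs, and the preprocessing is deliberately trivial.

```python
from abc import ABC, abstractmethod

# Hypothetical global registry: modality name -> Modality subclass.
_MODALITY_REGISTRY = {}


class Modality(ABC):
    """Hypothetical base class: one subclass per input modality."""

    name = None  # filled in by register_modality

    @abstractmethod
    def preprocess(self, raw):
        """Turn one raw sample into model-ready features."""


def register_modality(name):
    """Class decorator that records a Modality subclass under `name`."""
    def decorator(cls):
        if not issubclass(cls, Modality):
            raise TypeError(f"{cls.__name__} must subclass Modality")
        if name in _MODALITY_REGISTRY:
            raise ValueError(f"modality {name!r} is already registered")
        cls.name = name
        _MODALITY_REGISTRY[name] = cls
        return cls
    return decorator


def get_modality(name):
    """Instantiate the modality registered under `name`."""
    if name not in _MODALITY_REGISTRY:
        known = sorted(_MODALITY_REGISTRY)
        raise KeyError(f"unknown modality {name!r}; registered: {known}")
    return _MODALITY_REGISTRY[name]()


@register_modality("text")
class TextModality(Modality):
    def preprocess(self, raw):
        return raw.strip().split()  # toy whitespace tokenizer


@register_modality("audio")
class AudioModality(Modality):
    def preprocess(self, raw):
        # Toy peak normalization of a list of waveform samples.
        peak = max((abs(x) for x in raw), default=0.0)
        return [x / peak for x in raw] if peak else list(raw)
```

With a registry like this, a modality-agnostic data loader could look up the handler for each field by name (`get_modality("audio").preprocess(samples)`) without hard-coding modality types.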
- Diffusion language model support: Support for RL tuning of diffusion language models.
  - Architecture Integration
    - Implement diffusion model base classes
    - Add noise scheduling modules (linear, cosine, etc.)
    - Create denoising loss functions
    - Integrate with existing model registry
    - Implement score matching objectives
    - Add variational bounds computation
  - Training Pipeline
    - Adapt PPO for diffusion models
    - Implement diffusion-specific sampling strategies
    - Create hybrid autoregressive-diffusion training
    - Add diffusion-specific evaluation metrics
    - Implement training stability techniques
    - Create checkpointing for diffusion models
  - RL Algorithms for Diffusion
    - Adapt reward modeling for continuous outputs
    - Implement diffusion-specific PPO variants
    - Create direct preference optimization for diffusion
    - Design curriculum learning strategies
    - Implement trajectory optimization methods
    - Add support for conditional generation with RL
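For the "noise scheduling modules (linear, cosine, etc.)" item, the two named schedules can be sketched as functions returning the cumulative signal level at each step. This assumes the continuous, Gaussian-noise formulation commonly used for image diffusion; a discrete or masked diffusion language model may need a different parameterization. The function names and default constants (`beta_start`, `beta_end`, `offset`) are illustrative choices, not values taken from this issue.

```python
import math


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level when per-step noise (beta) grows linearly.

    Returns a list of length `num_steps` whose i-th entry is the running
    product of (1 - beta) over steps 0..i.
    """
    alpha_bar, product = [], 1.0
    for i in range(num_steps):
        frac = i / (num_steps - 1) if num_steps > 1 else 0.0
        beta = beta_start + frac * (beta_end - beta_start)
        product *= 1.0 - beta
        alpha_bar.append(product)
    return alpha_bar


def cosine_alpha_bar(num_steps, offset=0.008):
    """Cumulative signal level following a squared cosine in t / num_steps.

    Returns values for t = 1..num_steps, normalized so that t = 0 would be 1.
    """
    def f(t):
        angle = (t / num_steps + offset) / (1.0 + offset) * math.pi / 2.0
        return math.cos(angle) ** 2

    return [f(t) / f(0) for t in range(1, num_steps + 1)]
```

Both functions return a strictly decreasing sequence in the interval [0, 1], so a scheduler module could expose them behind one interface and select between them by name in the training config.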