Changelog
- Remove redundant sampling hyperparameters from the algorithms.
- Allow multi-GPU training with model parallelism (MP); see test results in PR #62.
Notes: For now, parallel training is limited to model parallelism, which is sufficient for research use (4x7/15B models should be fine). Ray-based distributed training at larger scale is not currently needed and has not been developed yet.