Releases: ray-project/ray

ray-0.8.6

24 Jun 16:34

Highlights

  • Experimental support for Windows is now available for single node Ray usage. Check out the Windows section below for known issues and other details.
  • Have you had trouble monitoring GPU or memory usage while using Ray? The Ray dashboard now supports GPU monitoring and a memory view.
  • Want to use RLlib with Unity? RLlib officially supports the Unity3D adapter! Please check out the documentation.
  • Ray Serve is ready for feedback! We've gotten feedback from many users, and Ray Serve is already being used in production. Please reach out with your use cases, ideas, and documentation improvements on the Ray Slack in #serve. We'd love to hear from you. See the Serve section below for more details.

Core

  • We’ve introduced a new feature to automatically retry failed actor tasks after an actor has been restarted by Ray (by specifying max_restarts in @ray.remote). Try it out with max_task_retries=-1 where -1 indicates that the system can retry the task until it succeeds.

API Change

  • To enable automatic restarts of a failed actor, you must now use max_restarts in the @ray.remote decorator instead of max_reconstructions. You can use -1 to indicate infinity, i.e., the system should always restart the actor if it fails unexpectedly.
  • We’ve merged the named and detached actor APIs. To create an actor that will survive past the duration of its job (a “detached” actor), specify name=<str> in its remote constructor (Actor.options(name='<str>').remote()). To delete the actor, you can use ray.kill.
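
The restart and retry settings above share one convention: -1 means "no limit". The sketch below models only that convention in plain Python. It does not use Ray; run_with_retries and FlakyTask are hypothetical names introduced for illustration, and treating a non-negative value as "retries allowed after the first attempt" is an assumption.

```python
def run_with_retries(task, max_task_retries=0):
    """Call `task` until it succeeds or the retry budget is spent.

    Assumed convention: a non-negative value is the number of retries
    allowed after the first attempt; -1 means retry until success.
    """
    retries_used = 0
    while True:
        try:
            return task()
        except RuntimeError:
            if max_task_retries != -1 and retries_used >= max_task_retries:
                raise
            retries_used += 1


class FlakyTask:
    """Hypothetical task that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("simulated actor failure")
        return "done"


task = FlakyTask(failures=3)
result = run_with_retries(task, max_task_retries=-1)
print(result, task.calls)
```

With a finite budget smaller than the number of failures, the same helper re-raises the last error instead of looping.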

RLlib

  • PyTorch: Added a PyTorch version of IMPALA. All rllib/examples scripts now work with either TensorFlow or PyTorch (--torch command line option).
  • Switched to using distributed execution API by default (replaces Policy Optimizers) for all algorithms.
  • Unity3D adapter (supports all Env types: multi-agent, external env, vectorized) with example scripts for running locally or in the cloud.
  • Added support for variable length observation Spaces ("Repeated").
  • Added support for arbitrarily nested action spaces.
  • Added experimental GTrXL (Transformer/Attention net) support to RLlib + learning tests for PPO and IMPALA.
  • QMIX now supports complex observation spaces.

API Change

  • Retire use_pytorch and eager flags in configs and replace these with framework=[tf|tfe|torch].
  • Deprecate PolicyOptimizers in favor of the new distributed execution API.
  • Retired support for Model(V1) class. Custom Models should now only use the ModelV2 API. There is still a warning when using ModelV1, which will be changed into an error message in the next release.
  • Retired TupleActions (in favor of arbitrarily nested action Spaces).
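
For configs that still carry the retired flags, a small migration helper might look like the sketch below. It is plain Python and not part of RLlib; migrate_framework_flags is a hypothetical name, and the mapping (use_pytorch to "torch", eager to "tfe", otherwise "tf") is an assumption inferred from the flag names.

```python
def migrate_framework_flags(config):
    """Return a copy of `config` with legacy flags folded into `framework`."""
    migrated = dict(config)
    use_pytorch = migrated.pop("use_pytorch", False)
    eager = migrated.pop("eager", False)
    # An explicit `framework` entry wins over the legacy flags.
    if "framework" not in migrated:
        if use_pytorch:
            migrated["framework"] = "torch"
        elif eager:
            migrated["framework"] = "tfe"
        else:
            migrated["framework"] = "tf"
    return migrated


old_config = {"use_pytorch": True, "lr": 0.0005}
new_config = migrate_framework_flags(old_config)
print(new_config)
```

The helper copies the input rather than mutating it, so the original config can still be inspected after migration.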

Ray Tune / RaySGD

  • There is now a Dataset API for handling large datasets with RaySGD. (#7839)
  • You can now filter by an average of the last results using the ExperimentAnalysis tool (#8445).
  • BayesOptSearch received numerous contributions, enabling preliminary random search and warm starting. (#8541, #8486, #8488)

API Changes

  • tune.report is now the right way to use the Tune function API. tune.track is deprecated (#8388)

Serve

  • New APIs to inspect and manage Serve objects:
    • serve.list_backends and serve.list_endpoints (#8737)
    • serve.delete_backend and serve.delete_endpoint (#8252, #8256)
  • serve.create_endpoint now requires specifying the backend directly. You can remove serve.set_traffic if there's only one backend per endpoint. (#8764)
  • serve.init API cleanup, the following options were removed:
  • serve.init now supports namespacing with name. You can run multiple serve clusters with different names on the same ray cluster. (#8449)
  • You can specify session affinity when splitting traffic with backends using X-SERVE-SHARD-KEY HTTP header. (#8449)
  • Various documentation improvements. Highlights:
    • A new section on how to perform A/B testing and incremental rollout (#8741)
    • Tutorial for batch inference (#8490)
    • Instructions for specifying GPUs and resources (#8495)
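
Session affinity means that requests carrying the same shard key keep landing on the same backend. One way to picture this is a deterministic hash of the key mapped onto the traffic weights. The sketch below is an illustration only and is not Serve's routing code; pick_backend and the backend names are hypothetical.

```python
import hashlib


def pick_backend(shard_key, traffic):
    """Map a shard key to a backend, respecting the traffic weights.

    `traffic` maps backend names to weights that sum to 1.0.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a stable point between 0 and 1.
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    chosen = None
    for backend, weight in traffic.items():
        chosen = backend
        cumulative += weight
        if point < cumulative:
            break
    return chosen


traffic = {"model_v1": 0.8, "model_v2": 0.2}
first = pick_backend("user-42", traffic)
second = pick_backend("user-42", traffic)
print(first == second)
```

Because the choice depends only on the key and the weights, repeated requests with the same key agree as long as the traffic split is unchanged.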

Dashboard / Metrics

  • The Machine View of the dashboard now shows information about GPU utilization such as:
    • Average GPU/GRAM utilization at a node and cluster level
    • Worker-level information about how many GPUs each worker is assigned as well as its GRAM use.
  • The dashboard has a new Memory View tab that should be very useful for debugging memory issues. It has:
    • Information about objects in the Ray object store, including size and call-site
    • Information about reference counts and what is keeping an object pinned in the Ray object store.

Small changes

  • IDLE workers get automatically sorted to the end of the worker list in the Machine View

Autoscaler

  • Improved logging output. Errors are more clearly propagated and excess output has been reduced. (#7198, #8751, #8753)
  • Added support for k8s services.

API Changes

  • ray up accepts remote URLs that point to the desired cluster YAML. (#8279)

Windows support

  • Windows wheels are now available for basic experimental usage (via ray.init()).
  • Windows support is currently unstable. Unusual, unattended, or production usage is not recommended.
  • Various functionality may still lack support, including Ray Serve, Ray SGD, the autoscaler, the dashboard, non-ASCII file paths, etc.
  • Please check the latest nightly wheels & known issues (#9114), and let us know if any issue you encounter has not yet been addressed.
  • Wheels are available for Python 3.6, 3.7, and 3.8. (#8369)
  • redis-py has been patched for Windows sockets. (#8386)

Others

Thanks

We thank the following contributors for their work on this release:
@pcmoritz, @akharitonov, @devanderhoff, @ffbin, @anabranch, @jasonjmcghee, @kfstorm, @mfitton, @alecbrick, @simon-mo, @konichuvak, @aniryou, @wuisawesome, @robertnishihara, @ramanNarasimhan77, @09wakharet, @richardliaw, @istoica, @ThomasLecat, @sven1977, @ceteri, @acxz, @iamhatesz, @JarnoRFB, @rkooo567, @mehrdadn, @thomasdesr, @janblumenkamp, @ujvl, @edoakes, @maximsmol, @krfricke, @amogkam, @gehring, @ijrsvt, @internetcoffeephone, @LucaCappelletti94, @chaokunyang, @WangTaoTheTonic, @fyrestone, @raulchen, @ConeyLiu, @stephanie-wang, @suquark, @ashione, @Coac, @JosephTLucas, @ericl, @AmeerHajAli, @pdames

Ray 0.8.5

07 May 17:35

Highlight

Core

  • Task cancellation is now available for locally submitted tasks. (#7699)
  • Experimental support for recovering objects that were lost from the Ray distributed memory store. You can try this out by setting lineage_pinning_enabled: 1 in the internal config. (#7733)

RLlib

Tune

  • Documentation has improved with a new format. (#8083, #8201, #7716)
  • Search algorithms are refactored to make them easier to extend, deprecating the max_concurrent argument. (#7037, #8258, #8285)
  • TensorboardX errors are now handled safely. (#8174)
  • Bug fix in PBT checkpointing. (#7794)
  • New ZOOpt search algorithm added. (#7960)

Serve

  • Improved APIs.
    • Add delete_endpoint and delete_backend. (#8252, #8256)
    • Use dictionary to update backend config. (#8202)
  • Added overview section to the documentation.
  • Added tutorials for serving models in Tensorflow/Keras, PyTorch, and Scikit-Learn.
  • Made serve clusters tolerant to process failures. (#8116, #8008, #7970, #7936)

SGD

  • New Semantic Segmentation and HuggingFace GLUE Fine-tuning Examples. (#7792, #7825)
  • Fix GPU Reservations in SLURM usage. (#8157)
  • Update learning rate scheduler stepping parameter. (#8107)
  • Make serialization of data creation optional. (#8027)
  • Automatic DDP wrapping is now optional. (#7875)

Others Projects

Thanks

We thank the following contributors for their work on this release:

@simon-mo, @robertnishihara, @BalaBalaYi, @ericl, @kfstorm, @tirkarthi, @nflu, @ffbin, @chaokunyang, @ijrsvt, @pcmoritz, @mehrdadn, @sven1977, @iamhatesz, @nmatthews-asapp, @mitchellstern, @edoakes, @anabranch, @billowkiller, @eisber, @ujvl, @allenyin55, @yncxcw, @deanwampler, @DavidMChan, @ConeyLiu, @micafan, @rkooo567, @datayjz, @wizardfishball, @sumanthratna, @ashione, @marload, @stephanie-wang, @richardliaw, @jovany-wang, @MissiontoMars, @aannadi, @fyrestone, @JarnoRFB, @wumuzi520, @roireshef, @acxz, @gramhagen, @Servon-Lee, @clarkzinzow, @mfitton, @maximsmol, @janblumenkamp, @istoica

Ray 0.8.4

02 Apr 17:44

Highlights

  • Add Python 3.8 support. (#7754)

Core

  • Fix asyncio actor deserialization. (#7806)
  • Fix a segfault caused by a symbol collision when importing pyarrow. (#7568)
  • ray memory will collect statistics from all nodes. (#7721)
  • Pin lineage of plasma objects that are still in scope. (#7690)

RLlib

  • Add contextual bandit algorithms. (#7642)
  • Add parameter noise exploration API. (#7772)
  • Add scaling guide. (#7780)
  • Enable restoring a Keras model from an h5 file. (#7482)
  • Store tf-graph by default when doing Policy.export_model(). (#7759)
  • Fix default policy overriding torch policy. (#7756, #7769)

RaySGD

  • BREAKING: Add new API for tuning TorchTrainer using Tune. (#7547)
  • BREAKING: Convert the head worker to a local model. (#7746)
  • Added a new API for save/restore. (#7547)
  • Add tqdm support to TorchTrainer. (#7588)

Tune

  • Add sorted columns and TensorBoard to Tune tab. (#7140)
  • Tune experiments can now be cancelled via the REST client. (#7719)
  • fail_fast enables experiments to fail quickly. (#7528)
  • Override the IP retrieval process if needed. (#7705)
  • TensorBoardX nested dictionary support. (#7705)

Serve

  • Performance improvements:
    • Push route table updates to HTTP proxy. (#7774)
    • Improve serialization. (#7688)
  • Add async methods support for serve actors. (#7682)
  • Add multiple method support for serve actors. (#7709)
    • You can specify HTTP methods in serve.create_backend(..., methods=["GET", "POST"]).
    • You can specify which actor method to execute over HTTP through the X-SERVE-CALL-METHOD header, or in RayServeHandle through handle.options("method").remote(...).

Others

  • Progress towards highly available control plane. (#7822, #7742)
  • Progress towards Windows compatibility. (#7740, #7739, #7657)
  • Progress towards Ray Streaming library. (#7813)
  • Progress towards metrics export service. (#7809)
  • Basic C++ worker implementation. (#6125)

Thanks

We thank the following contributors for their work on this release:

@carlbalmer, @BalaBalaYi, @saurabh3949, @maximsmol, @SongGuyang, @istoica, @pcmoritz, @aannadi, @kfstorm, @ijrsvt, @richardliaw, @mehrdadn, @wumuzi520, @cloudhan, @edoakes, @mitchellstern, @robertnishihara, @hhoke, @simon-mo, @ConeyLiu, @stephanie-wang, @rkooo567, @ffbin, @ericl, @hubcity, @sven1977

Ray 0.8.3

25 Mar 21:28

Highlights

  • Autoscaler has added Azure Support. (#7080, #7515, #7558, #7494)
    • Ray autoscaler helps you launch a distributed ray cluster using a single command line call!
    • It works on Azure, AWS, GCP, Kubernetes, Yarn, Slurm and local nodes.
  • Distributed reference counting is turned on by default. (#7628, #7337)
    • This means all ray objects are tracked and garbage collected only when all references go out of scope. It can be turned off with: ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0})).
    • When the object store is full with objects that are still in scope, you can turn on least-recently-used eviction to force remove objects using ray.init(lru_evict=True).
  • A new command ray memory is added to help debug memory usage: (#7589)
> ray memory
-----------------------------------------------------------------------------------------------------
 Object ID                                Reference Type       Object Size   Reference Creation Site
=====================================================================================================
; worker pid=51230
ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY            8231   (deserialize task arg) __main__..sum_task
; driver pid=51174
45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK           ?   (task call) memory_demo.py:<module>:13
ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK        8231   (put object) memory_demo.py:<module>:6
ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE                ?   (task call) memory_demo.py:<module>:14
-----------------------------------------------------------------------------------------------------
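
The interplay between reference counting and LRU eviction described in the highlights can be pictured with a toy model. The sketch below is plain Python and is not how the Ray object store is implemented; ToyObjectStore and its methods are hypothetical, and counting capacity in objects rather than bytes is a simplification.

```python
from collections import OrderedDict


class ToyObjectStore:
    """Toy model of a reference-counted store with optional LRU eviction."""

    def __init__(self, capacity, lru_evict=False):
        self.capacity = capacity
        self.lru_evict = lru_evict
        self.objects = OrderedDict()  # object id -> value, oldest first
        self.ref_counts = {}

    def put(self, object_id, value):
        if len(self.objects) >= self.capacity:
            if not self.lru_evict:
                raise MemoryError("store is full of in-scope objects")
            # Force-remove the least recently used object, even though
            # it is still referenced.
            evicted_id, _ = self.objects.popitem(last=False)
            del self.ref_counts[evicted_id]
        self.objects[object_id] = value
        self.ref_counts[object_id] = 1

    def get(self, object_id):
        self.objects.move_to_end(object_id)  # mark as recently used
        return self.objects[object_id]

    def release(self, object_id):
        """Drop one reference; collect the object when none remain."""
        self.ref_counts[object_id] -= 1
        if self.ref_counts[object_id] == 0:
            del self.ref_counts[object_id]
            del self.objects[object_id]


store = ToyObjectStore(capacity=2, lru_evict=True)
store.put("a", 1)
store.put("b", 2)
store.get("a")     # "b" is now the least recently used object
store.put("c", 3)  # the store is full, so "b" is evicted
print(sorted(store.objects))
```

Without lru_evict, the same put on a full store raises instead of evicting, which mirrors the default behaviour of keeping every in-scope object.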

API change

  • Change actor.__ray_kill__() to ray.kill(actor). (#7360)
  • Deprecate use_pickle flag for serialization. (#7474)
  • Remove experimental.NoReturn. (#7475)
  • Remove experimental.signal API. (#7477)

Core

  • Add Apache 2 license header to C++ files. (#7520)
  • Reduce per worker memory usage to 50MB. (#7573)
  • Option to fallback to LRU on OutOfMemory. (#7410)
  • Reference counting for actor handles. (#7434)
  • Reference counting for returning object IDs created by a different process. (#7221)
  • Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper. (#7150)
  • Route asyncio plasma through raylet instead of direct plasma connection. (#7234)
  • Remove static concurrency limit from gRPC server. (#7544)
  • Remove get_global_worker(), RuntimeContext. (#7638)
  • Fix known issues from 0.8.2 release:
    • Fix passing duplicate by-reference arguments. (#7306)
    • Raise gRPC message size limit to 100MB. (#7269)

RLlib

  • New features:
    • Exploration API improvements. (#7373, #7314, #7380)
    • SAC: add discrete action support. (#7320, #7272)
    • Add high-performance external application connector. (#7641)
  • Bug fix highlights:
    • PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238)
    • Rename sample_batch_size => rollout_fragment_length. (#7503)
    • Fix bugs and speed up SegmentTree.

Tune

  • Integrate Dragonfly optimizer. (#5955)
  • Fix HyperBand errors. (#7563)
  • Access Trial Name, Trial ID inside trainable. (#7378)
  • Add a new repeater class for high variance trials. (#7366)
  • Prevent deletion of checkpoint from user-initiated restoration. (#7501)

Libraries

  • [Parallel Iterators] Allow for operator chaining after repartition. (#7268)
  • [Parallel Iterators] Repartition functionality. (#7163)
  • [Serve] @serve.route returns a handle, add handle.scale, handle.set_max_batch_size. (#7569)
  • [RaySGD] PyTorchTrainer --> TorchTrainer. (#7425)
  • [RaySGD] Custom training API. (#7211)
  • [RaySGD] Breaking User API changes: (#7384)
    • data_creator fed to TorchTrainer now must return a dataloader rather than datasets.
    • TorchTrainer automatically sets "DistributedSampler" if a DataLoader is returned.
    • data_loader_config and batch_size are no longer parameters for TorchTrainer.
    • TorchTrainer parallelism is now set by num_workers.
    • All TorchTrainer args now must be named parameters.

Java

  • New Java actor API (#7414)
    • @RayRemote annotation is removed.
    • Instead of Ray.call(ActorClass::method, actor), the new API is actor.call(ActorClass::method).
  • Allow passing internal config from raylet to Java worker. (#7532)
  • Enable direct call by default. (#7408)
  • Pass large object by reference. (#7595)

Others

Known issues

  • Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.

Thanks

We thank the following contributors for their work on this release:
@rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @clarkzinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @tisonkun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz

Ray 0.8.2

24 Feb 19:28

Highlights

  • Pyarrow is no longer vendored. Ray directly uses the C++ Arrow API. You can use any version of pyarrow with ray. (#7233)
  • The dashboard is turned on by default. It shows node and process information, actor information, and Ray Tune trials information. You can also use ray.show_in_webui to display custom messages for actors. Please try it out and send us feedback! (#6705, #6820, #6822, #6911, #6932, #6955, #7028, #7034)
  • We have made progress on distributed reference counting (behind a feature flag). You can try it out with ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 1})). It is designed to help manage memory using precise distributed garbage collection. (#6945, #6946, #7029, #7075, #7218, #7220, #7222, #7235, #7249)

Breaking changes

  • Many experimental Ray libraries are moved to the util namespace. (#7100)
    • ray.experimental.multiprocessing => ray.util.multiprocessing
    • ray.experimental.joblib => ray.util.joblib
    • ray.experimental.iter => ray.util.iter
    • ray.experimental.serve => ray.serve
    • ray.experimental.sgd => ray.util.sgd
  • Tasks and actors are cleaned up if their owner process dies. (#6818)
  • The OMP_NUM_THREADS environment variable defaults to 1 if unset. This improves training performance and reduces resource contention. (#6998)
  • We now vendor psutil and setproctitle to support turning the dashboard on by default. Running import psutil after import ray will use the version of psutil that ships with Ray. (#7031)
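
When updating imports for the namespace move, the renames listed above can be applied mechanically. The sketch below is a hypothetical helper, not a tool shipped with Ray; it rewrites a dotted module path using only the five renames from this list.

```python
# Old prefix -> new prefix, copied from the list of moved libraries.
RENAMED_MODULES = {
    "ray.experimental.multiprocessing": "ray.util.multiprocessing",
    "ray.experimental.joblib": "ray.util.joblib",
    "ray.experimental.iter": "ray.util.iter",
    "ray.experimental.serve": "ray.serve",
    "ray.experimental.sgd": "ray.util.sgd",
}


def migrate_module_path(path):
    """Rewrite a dotted module path if it falls under a renamed package."""
    for old, new in RENAMED_MODULES.items():
        # Match the package itself or any submodule, but not a sibling
        # whose name merely starts with the same characters.
        if path == old or path.startswith(old + "."):
            return new + path[len(old):]
    return path


print(migrate_module_path("ray.experimental.serve"))
print(migrate_module_path("ray.experimental.sgd.pytorch"))
```

Paths outside the renamed packages are returned unchanged, so the helper is safe to run over every import in a codebase.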

Core

  • The Python raylet client is removed. All raylet communication now goes through the core worker. (#6018)
  • Calling delete() will not delete objects in the in-memory store. (#7117)
  • Removed vanilla pickle serialization for task arguments. (#6948)
  • Fix bug passing empty bytes into Python tasks. (#7045)
  • Progress toward next generation ray scheduler. (#6913)
  • Progress toward service based global control store (GCS). (#6686, #7041)

RLlib

  • Improved PyTorch support, including a PyTorch version of PPO. (#6826, #6770)
  • Added distributed SGD for PPO. (#6918, #7084)
  • Added an exploration API for controlling epsilon greedy and stochastic exploration. (#6974, #7155)
  • Fixed schedule values going negative past the end of the schedule. (#6971, #6973)
  • Added support for histogram outputs in TensorBoard. (#6942)
  • Added support for parallel and customizable evaluation step. (#6981)

Tune

  • Improved Ax Example. (#7012)
  • Process saves asynchronously. (#6912)
  • Default to tensorboardx and include it in requirements. (#6836)
  • Added experiment stopping api. (#6886)
  • Expose progress reporter to users. (#6915)
  • Fix directory naming regression. (#6839)
  • Handle NaN case for AsyncHyperBand. (#6916)
  • Prevent memory checkpoints from breaking trial fault tolerance. (#6691)
  • Remove keras dependency. (#6827)
  • Remove unused tf loggers. (#7090)
  • Set correct path when deleting checkpoint folder. (#6758)
  • Support callable objects in variant generation. (#6849)

Autoscaler

  • Ray nodes now respect docker limits. (#7039)
  • Add --all-nodes option to rsync-up. (#7065)
  • Add port-forwarding support for attach. (#7145)
  • For AWS, default to latest deep learning AMI. (#6922)
  • Added 'ray dashboard' command to proxy the Ray dashboard on a remote machine. (#6959)

Utility libraries

  • Support of scikit-learn with Ray joblib backend. (#6925)
  • Parallel iterators support local shuffle. (#6921)
  • [Serve] support headless services without HTTP. (#7010)
  • [Serve] refactor router to use Ray asyncio support. (#6873)
  • [Serve] support composing arbitrary dags. (#7015)
  • [RaySGD] support fp16 via PyTorch apex. (#7061)
  • [RaySGD] refactor PyTorch sgd documentation. (#6910)
  • Improvement in Ray Streaming. (#7043, #6666, #7071)

Other improvements

  • Progress toward Windows compatibility. (#6882, #6823)
  • Ray Kubernetes operator improvements. (#6852, #6851, #7091)
  • Java support for concurrent actor calls API. (#7022)
  • Java support for direct call for normal tasks. (#7193)
  • Java support for cross language Python invocation. (#6709)
  • Java support for cross language serialization for actor handles. (#7134)

Known issues

  • Passing the same ObjectIDs multiple times as arguments currently doesn't work. (#7296)
  • Tasks can exceed gRPC max message size. (#7263)

Thanks

We thank the following contributors for their work on this release:
@mitchellstern, @hugwi, @deanwampler, @alindkhare, @ericl, @ashione, @fyrestone, @robertnishihara, @pcmoritz, @richardliaw, @yutaizhou, @istoica, @edoakes, @ls-daniel, @BalaBalaYi, @raulchen, @justinkterry, @roireshef, @elpollouk, @kfstorm, @Bassstring, @hhbyyh, @Qstar, @mehrdadn, @chaokunyang, @flying-mojo, @ujvl, @AnanthHari, @rkooo567, @simon-mo, @jovany-wang, @ijrsvt, @ffbin, @AmeerHajAli, @gaocegege, @suquark, @MissiontoMars, @zzyunzhi, @sven1977, @stephanie-wang, @amogkam, @wuisawesome, @aannadi, @maximsmol

ray-0.8.1

27 Jan 22:23

Ray 0.8.1 Release Notes

Highlights

  • ObjectIDs corresponding to ray.put() objects and task returns are now reference counted locally in Python and when passed into a remote task as an argument. ObjectIDs that have a nonzero reference count will not be evicted from the object store. Note that references for ObjectIDs passed into remote tasks inside of other objects (e.g., f.remote((ObjectID,)) or f.remote([ObjectID])) are not currently accounted for. (#6554)
  • asyncio actor support: actors can now define async def methods and Ray will run multiple method invocations in the same event loop. The maximum concurrency level can be adjusted with ActorClass.options(max_concurrency=2000).remote().
  • asyncio ObjectID support: Ray ObjectIDs can now be directly awaited using the Python API. await my_object_id is similar to ray.get(my_object_id), but allows context switching to make the operation non-blocking. You can also convert an ObjectID to an asyncio.Future using ObjectID.as_future().
  • Added experimental parallel iterators API (#6644, #6726): ParallelIterators can be used to more conveniently load and process data into Ray actors. See the documentation for details.
  • Added multiprocessing.Pool API (#6194): Ray now supports the multiprocessing.Pool API out of the box, so you can scale existing programs up from a single node to a cluster by only changing the import statement. See the documentation for details.
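
The asyncio actor behaviour in the highlights (several method invocations sharing one event loop, capped by a concurrency limit) can be approximated with the standard library alone. The sketch below is an analogy in plain asyncio and does not use Ray; ToyAsyncActor is a hypothetical class, and the semaphore stands in for the max_concurrency option.

```python
import asyncio


class ToyAsyncActor:
    """Plain-asyncio stand-in for an actor with async methods."""

    def __init__(self, max_concurrency):
        self._limit = asyncio.Semaphore(max_concurrency)
        self.running = 0
        self.peak = 0

    async def work(self, x):
        async with self._limit:
            self.running += 1
            self.peak = max(self.peak, self.running)
            await asyncio.sleep(0.01)  # yields to other invocations
            self.running -= 1
            return x * 2


async def main():
    actor = ToyAsyncActor(max_concurrency=2)
    # Five invocations are submitted at once; at most two run concurrently.
    results = await asyncio.gather(*(actor.work(i) for i in range(5)))
    return results, actor.peak


results, peak = asyncio.run(main())
print(results, peak)
```

All five calls interleave on a single thread; the recorded peak shows that the limit, not the number of submitted calls, bounds how many run at once.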

Core

  • Deprecated Python 2 (#6581, #6601, #6624, #6665)
  • Fixed bug when failing to import remote functions or actors with args and kwargs (#6577)
  • Many improvements to the dashboard (#6493, #6516, #6521, #6574, #6590, #6652, #6671, #6683, #6810)
  • Progress towards Windows compatibility (#6446, #6548, #6653, #6706)
  • Redis now binds to localhost and has a password set by default (#6481)
  • Added actor.__ray_kill__() to terminate actors immediately (#6523)
  • Added 'ray stat' command for debugging (#6622)
  • Added documentation for fault tolerance behavior (#6698)
  • Treat static methods as class methods instead of instance methods in actors (#6756)

RLlib

  • DQN distributional model: Replace all legacy tf.contrib imports with tf.keras.layers.xyz or tf.initializers.xyz (#6772)
  • SAC site changes (#6759)
  • PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch) (#6650)
  • SAC for Mujoco Environments (#6642)
  • Tuple action dist tensors not reduced properly in eager mode (#6615)
  • Changed foreach_policy to foreach_trainable_policy (#6564)
  • Wrapper for the dm_env interface (#6468)

Tune

  • Get checkpoints paths for a trial after tuning (#6643)
  • Async restores and S3/GCP-capable trial FT (#6376)
  • Usability errors PBT (#5972)
  • Demo exporting trained models in pbt examples (#6533)
  • Avoid duplication in TrialRunner execution (#6598)
  • Update params for optimizer in reset_config (#6522)
  • Support Type Hinting for py3 (#6571)

Other Libraries

  • [serve] Pluggable Queueing Policy (#6492)
  • [serve] Added BackendConfig (#6541)
  • [sgd] Fault tolerance support for pytorch + revamp documentation (#6465)

Thanks

We thank the following contributors for their work on this release:

@chaokunyang, @Qstar, @simon-mo, @wlx65003, @stephanie-wang, @alindkhare, @ashione, @harrisonfeng, @JingGe, @pcmoritz, @zhijunfu, @BalaBalaYi, @kfstorm, @richardliaw, @mitchellstern, @michaelzhiluo, @ziyadedher, @istoica, @EyalSel, @ffbin, @raulchen, @edoakes, @chenk008, @frthjf, @mslapek, @gehring, @hhbyyh, @zzyunzhi, @zhu-eric, @MissiontoMars, @sven1977, @walterddr, @micafan, @inventormc, @robertnishihara, @ericl, @ZhongxiaYan, @mehrdadn, @jovany-wang, @ujvl, @bharatpn

Ray 0.8.0 Release Notes

18 Dec 00:01

This is the first release with gRPC direct calls enabled by default for both tasks and actors, which substantially improves task submission performance.

Highlights

  • Enable gRPC direct calls by default (#6367). In this mode, actor tasks are sent directly from actor to actor over gRPC; the Raylet only coordinates actor creation. Similarly, tasks are submitted directly from worker to worker over gRPC; the Raylet only coordinates the scheduling decisions. In addition, small objects (<100KB in size) are no longer placed in the object store. They are inlined into task submissions and returns when possible.

Note: in some cases, reconstruction of large evicted objects is not possible with direct calls. To revert to the 0.7.7 behaviour, you can set the environment variable RAY_FORCE_DIRECT=0.

Core

  • [Dashboard] Add remaining features from old dashboard (#6489)
  • Ray Kubernetes Operator Part 1: readme, structure, config and CRD related files (#6332)
  • Make sure numpy >= 1.16.0 is installed for fast pickling support (#6486)
  • Avoid workers starting with the same random seed (#6471)
  • Properly handle a forwarded task that gets forwarded back (#6271)

RLlib

  • (Bug Fix): Remove the extra 0.5 in the Diagonal Gaussian entropy (#6475)
  • AlphaZero and Ranked reward implementation (#6385)

Tune

  • Add example and tutorial for DCGAN (#6400)
  • Report trials by state fairly (#6395)
  • Fixed bug in PBT where initial trial result is empty. (#6351)

Other Libraries

  • [sgd] Add support for multi-model multi-optimizer training (#6317)
  • [serve] Added deadline awareness (#6442)
  • [projects] Return parameters for a command (#6409)
  • [streaming] Streaming data transfer and python integration (#6185)

Thanks

We thank the following contributors for their work on this release:

@zplizzi, @istoica, @ericl, @mehrdadn, @walterddr, @ujvl, @alindkhare, @timgates42, @chaokunyang, @eugenevinitsky, @kfstorm, @Maltimore, @visatish, @simon-mo, @AmeerHajAli, @wumuzi520, @robertnishihara, @micafan, @pcmoritz, @zhijunfu, @edoakes, @sytelus, @ffbin, @richardliaw, @Qstar, @stephanie-wang, @Coac, @mitchellstern, @MissiontoMars, @deanwampler, @hhbyyh, @raulchen

ray-0.7.7

16 Dec 00:52

Ray 0.7.7 Release Notes

Highlights

  • Remote functions and actors now support kwargs and positionals (#5606).
  • ray.get now supports a timeout argument (#6107). If the object isn't available before the timeout passes, a RayTimeoutError is raised.
  • Ray now supports detached actors (#6036), which persist beyond the lifetime of the script that creates them and can be referred to by a user-defined name.
  • Added documentation for how to deploy Ray on YARN clusters using Skein (#6119, #6173).
  • The Ray scheduler now attempts to schedule tasks fairly to avoid starvation (#5851).

Core

RLlib

  • Now using PyTorch's function to check whether a GPU is available. #5890
  • Fixed APEX priorities returning zero all the time. #5980
  • Fixed leak of TensorFlow assign operations in DQN/DDPG. #5979
  • Fixed choosing the wrong neural network model for Atari in 0.7.5. #6087
  • Added large scale regression test for RLlib. #6093
  • Fixed and added test for LR annealing config. #6101
  • Reduced log verbosity. #6154
  • Added a microbatch optimizer with an A2C example. #6161

Tune

  • Search algorithms now use early stopped trials for optimization. #5651
  • Metrics are now output in a tabular format. Errors are output in a separate table. #5822
  • In the distributed setting, checkpoints are now deleted automatically post-sync using an rsync flag. Checkpoints on the driver are garbage collected according to the policy defined by the user. #5877
  • A much faster ExperimentAnalysis tool. #5962
  • Trial executor callbacks now take in a “Runner” parameter. #5868
  • Fixed queue_trials to enable cluster autoscaling with a CPU-Only Head Node. #5900
  • Added a TensorBoardX logger. #6133

Other Libraries

Thanks

We thank the following contributors for their amazing contributions:

@zhuohan123, @jovany-wang, @micafan, @richardliaw, @waldroje, @mitchellstern, @visatish, @mehrdadn, @istoica, @ericl, @adizim, @simon-mo, @lsklyut, @zhu-eric, @pcmoritz, @hhbyyh, @suquark, @sotte, @hershg, @pschafhalter, @stackedsax, @edoakes, @mawright, @stephanie-wang, @ujvl, @ashione, @couturierc, @AdamGleave, @robertnishihara, @DaveyBiggers, @daiyaanarfeen, @danyangz, @AmeerHajAli, @mimoralea

ray-0.7.6

24 Oct 18:00

Ray 0.7.6 Release Notes

Highlights

  • The Ray autoscaler now supports Kubernetes as a backend (#5492). This makes it possible to start a Ray cluster on top of your existing Kubernetes cluster with a simple shell command.

    • Please see the Kubernetes section of the autoscaler documentation to get started.
    • This is a new feature and may be rough around the edges. If you run into problems or have suggestions for how to improve Ray on Kubernetes, please file an issue.
  • The Ray cluster dashboard has been revamped (#5730, #5857) to improve the UI and include logs and error messages. More improvements will be coming in the near future.

    • You can try out the dashboard by starting Ray with ray.init(include_webui=True) or ray start --include-webui.
    • Please let us know if you have suggestions for what would be most useful to you in the new dashboard.

Core

  • Progress towards refactoring the Python worker on top of the core worker. #5750, #5771, #5752
  • Fix an issue in local mode where multiple actors didn't work properly. #5863
  • Fix class attributes and methods for actor classes. #5802
  • Improvements in error messages and handling. #5782, #5746, #5799
  • Serialization improvements. #5841, #5725
  • Various documentation improvements. #5801, #5792, #5414, #5747, #5780, #5582

RLlib

  • Added a link to BAIR blog posts in the documentation. #5762
  • Tracing for eager tensorflow policies with tf.function. #5705

Tune

  • Improved MedianStoppingRule. #5402
  • Add PBT + Memnn example. #5723
  • Add support for function-based stopping condition. #5754
  • Save/Restore for Suggestion Algorithms. #5719
  • TensorBoard HParams for TF2.0. #5678

Other Libraries

Thanks

We thank the following contributors for their amazing contributions:

@hershg, @JasonWayne, @kfstorm, @richardliaw, @batzner, @vakker, @robertnishihara, @stephanie-wang, @gehring, @edoakes, @zhijunfu, @pcmoritz, @mitchellstern, @ujvl, @simon-mo, @ecederstrand, @mawright, @ericl, @anthonyhsyu, @suquark, @waldroje

ray-0.7.5

25 Sep 00:07

Ray 0.7.5 Release Notes

Ray API

  • Objects created with ray.put() are now reference counted. #5590
  • Add internal pin_object_data() API. #5637
  • Initial support for pickle5. #5611
  • Warm up Ray on ray.init(). #5685
  • redis_address passed to ray.init is now just address. #5602

Core

  • Progress towards a common C++ core worker. #5516, #5272, #5566, #5664
  • Fix log monitor stall with many log files. #5569
  • Print warnings when tasks are unschedulable. #5555
  • Take into account resource queue lengths when autoscaling #5702, #5684

Tune

  • TF2.0 TensorBoard support. #5547, #5631
  • tune.function() is now deprecated. #5601

RLlib

Other Libraries

  • Complete rewrite of experimental serving library. #5562
  • Progress toward Ray projects APIs. #5525, #5632, #5706
  • Add TF SGD implementation for training. #5440
  • Many documentation improvements and bugfixes.