Add Soft Actor-Critic documentation (#87)

Corentin-pro · phillipleblanc · web-flow · commit 46b04a521c45 · 2021-12-08T22:39:34.000+09:00
* Add Soft Actor-Critic documentation

* Update spiceaidocs/content/en/deep-learning-ai/sac.md

* Add more doc updates for SAC

Co-authored-by: Phillip LeBlanc <phillip@spiceai.io>
diff --git a/spiceaidocs/content/en/deep-learning-ai/_index.md b/spiceaidocs/content/en/deep-learning-ai/_index.md
@@ -15,10 +15,11 @@ Spice.ai provides a standard interface that a deep learning algorithm can be imp
 
 By default, Spice.ai will use [Deep Q-Learning]({{<ref "deep-learning-ai/dql">}}). To use a different algorithm, call `spice train` with the parameter `--learning-algorithm` set to one of the following values:
 
-| --learning-algorithm | Algorithm                                                   |
-| -------------------- | ----------------------------------------------------------- |
-| dql                  | [Deep Q-Learning]({{<ref "deep-learning-ai/dql">}})         |
-| vpg                  | [Vanilla Policy Gradient]({{<ref "deep-learning-ai/vpg">}}) |
+| --learning-algorithm | Algorithm                                                        |
+| -------------------- | ---------------------------------------------------------------- |
+| dql                  | [Deep Q-Learning]({{<ref "deep-learning-ai/dql">}})              |
+| vpg                  | [Vanilla Policy Gradient]({{<ref "deep-learning-ai/vpg">}})      |
+| sacd                 | [Soft Actor-Critic (Discrete)]({{<ref "deep-learning-ai/sac">}}) |
 
 **Example**
 
diff --git a/spiceaidocs/content/en/deep-learning-ai/sac.md b/spiceaidocs/content/en/deep-learning-ai/sac.md
@@ -0,0 +1,14 @@
+---
+type: docs
+title: "Soft Actor-Critic"
+linkTitle: "Soft Actor-Critic"
+weight: 50
+description: Spice.ai implementation of the Soft Actor-Critic algorithm (SAC)
+---
+
+The SAC (Soft Actor-Critic) algorithm was developed in 2018. It is an off-policy, model-free reinforcement learning algorithm that aims to maximize not only the expected reward but also the entropy of its policy, meaning it tries to act as randomly as possible while still succeeding at the task. Maximizing entropy encourages exploration and keeps the agent trying actions that seem equally rewarding.
+
+The Spice.ai implementation of Soft Actor-Critic has been modified to work for discrete action sets.
+
+- Berkeley AI Research blog: https://bair.berkeley.edu/blog/2018/12/14/sac/
+- arXiv paper: https://arxiv.org/abs/1801.01290
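The Python sketch below illustrates the entropy-maximizing objective described in the new SAC page. For a discrete action set it computes the policy entropy, the soft state value V(s) = Σ_a π(a|s)[Q(s,a) − α log π(a|s)], the critic's Bellman target, and the actor loss. It makes three simplifying assumptions: a temperature `alpha` weights the entropy bonus, `gamma` is the discount factor, and plain Python lists stand in for network outputs. The function names are hypothetical. The sketch follows a common discrete-action formulation of SAC and is not taken from the Spice.ai code base, whose exact modifications are not described here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn (hypothetical) policy-network logits into action probabilities pi(a|s)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_state_value(q_values: List[float], probs: List[float], alpha: float) -> float:
    """Soft value V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    This equals E_pi[Q(s,a)] + alpha * H(pi(.|s)): expected return plus an
    entropy bonus. Actions with zero probability contribute nothing.
    """
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(probs, q_values)
        if p > 0.0
    )


def soft_q_target(
    reward: float,
    next_q_values: List[float],
    next_probs: List[float],
    alpha: float,
    gamma: float,
    done: bool,
) -> float:
    """Critic target r + gamma * (1 - done) * V(s') using the soft value of s'."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_state_value(next_q_values, next_probs, alpha)


def actor_loss(q_values: List[float], probs: List[float], alpha: float) -> float:
    """Policy loss to minimize: sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s,a)) = -V(s)."""
    return -soft_state_value(q_values, probs, alpha)
```

For example, take two equally rewarding actions (Q = 1.0 each) and α = 0.1. The uniform policy `[0.5, 0.5]` has a soft value of 1 + 0.1·ln 2 ≈ 1.069, and always choosing the first action (`[1.0, 0.0]`) scores exactly 1.0. The entropy bonus therefore favors keeping both options open, which matches the exploration behavior the page describes.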