diff --git a/docs/examples/batch-to-online.ipynb b/docs/examples/batch-to-online.ipynb
index 60f5892f99..3e517665c2 100644
--- a/docs/examples/batch-to-online.ipynb
+++ b/docs/examples/batch-to-online.ipynb
@@ -60,7 +60,7 @@
 " ('lin_reg', linear_model.LogisticRegression(solver='lbfgs'))\n",
 "])\n",
 "\n",
- "# Define a determistic cross-validation procedure\n",
+ "# Define a deterministic cross-validation procedure\n",
 "cv = model_selection.KFold(n_splits=5, shuffle=True, random_state=42)\n",
 "\n",
 "# Compute the MSE values\n",
@@ -356,7 +356,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "The results seem to be exactly the same! The twist is that the running statistics won't be very accurate for the first few observations. In general though this doesn't matter too much. Some would even go as far as to say that this descrepancy is beneficial and acts as some sort of regularization...\n",
+ "The results seem to be exactly the same! The twist is that the running statistics won't be very accurate for the first few observations. In general though this doesn't matter too much. Some would even go as far as to say that this discrepancy is beneficial and acts as some sort of regularization...\n",
 "\n",
 "Now the idea is that we can compute the running statistics of each feature and scale them as they come along. The way to do this with River is to use the `StandardScaler` class from the `preprocessing` module, as so:"
 ]
diff --git a/docs/examples/building-a-simple-nowcasting-model.ipynb b/docs/examples/building-a-simple-nowcasting-model.ipynb
index 80e476d490..aa9b56a669 100644
--- a/docs/examples/building-a-simple-nowcasting-model.ipynb
+++ b/docs/examples/building-a-simple-nowcasting-model.ipynb
@@ -446,7 +446,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "We've managed to get a good looking prediction curve with a reasonably simple model. What's more our model has the advantage of being interpretable and easy to debug. There surely are more rocks to squeeze (e.g. tune the hyperparameters, use an ensemble model, etc.) but we'll leave that as an exercice to the reader.\n",
+ "We've managed to get a good looking prediction curve with a reasonably simple model. What's more our model has the advantage of being interpretable and easy to debug. There surely are more rocks to squeeze (e.g. tune the hyperparameters, use an ensemble model, etc.) but we'll leave that as an exercise to the reader.\n",
 "\n",
 "As a finishing touch we'll rewrite our pipeline using the `|` operator, which is called a \"pipe\"."
 ]
diff --git a/docs/examples/content-personalization.ipynb b/docs/examples/content-personalization.ipynb
index 78dc55b340..f294e96f26 100644
--- a/docs/examples/content-personalization.ipynb
+++ b/docs/examples/content-personalization.ipynb
@@ -319,7 +319,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "A good recommender model should at the very least understand what kind of items each user prefers. One of the simplest and yet performant way to do this is Simon Funk's SGD method he developped for the Netflix challenge and wrote about [here](https://sifter.org/simon/journal/20061211.html). It models each user and each item as latent vectors. The dot product of these two vectors is the expected preference of the user for the item."
+ "A good recommender model should at the very least understand what kind of items each user prefers. One of the simplest and yet most performant ways to do this is Simon Funk's SGD method, which he developed for the Netflix challenge and wrote about [here](https://sifter.org/simon/journal/20061211.html). It models each user and each item as latent vectors. The dot product of these two vectors is the expected preference of the user for the item."
 ]
 },
 {
diff --git a/docs/examples/sentence-classification.ipynb b/docs/examples/sentence-classification.ipynb
index a456ad5299..4ff1f8f412 100644
--- a/docs/examples/sentence-classification.ipynb
+++ b/docs/examples/sentence-classification.ipynb
@@ -814,7 +814,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "The command below allows you to download the pre-trained embeddings that spaCy makes available. More informations about spaCy and its installation may be found here [here](https://spacy.io/usage)."
+ "The command below allows you to download the pre-trained embeddings that spaCy makes available. More information about spaCy and its installation may be found [here](https://spacy.io/usage)."
 ]
 },
 {
diff --git a/docs/faq/index.md b/docs/faq/index.md
index 7218ed5149..564a897452 100644
--- a/docs/faq/index.md
+++ b/docs/faq/index.md
@@ -58,4 +58,4 @@ There are many great open-source libraries for building neural network models. W
 
 ## Who are the authors of this library?
 
-We are research engineers, graduate students, PhDs and machine learning researchers. The members of the develompent team are mainly located in France, Brazil and New Zealand.
+We are research engineers, graduate students, PhDs and machine learning researchers. The members of the development team are mainly located in France, Brazil and New Zealand.
diff --git a/docs/introduction/basic-concepts.md b/docs/introduction/basic-concepts.md
index 669ab4a7d6..ba21f89e06 100644
--- a/docs/introduction/basic-concepts.md
+++ b/docs/introduction/basic-concepts.md
@@ -44,7 +44,7 @@ Dictionaries are therefore a perfect fit. They're native to Python and have exce
 
 In production, you're almost always going to face data streams which you have to react to, such as users visiting your website. The advantage of online machine learning is that you can design models that make predictions as well as learn from this data stream as it flows.
 
-But of course, when you're developping a model, you don't usually have access to a real-time feed on which to evaluate your model. You usually have an offline dataset which you want to evaluate your model on. River provides some datasets which can be read in online manner, one sample at a time. It is however crucial to keep in mind that the goal is to reproduce a production scenario as closely as possible, in order to ensure your model will perform just as well in production.
+But of course, when you're developing a model, you don't usually have access to a real-time feed on which to evaluate your model. You usually have an offline dataset which you want to evaluate your model on. River provides some datasets which can be read in an online manner, one sample at a time. It is however crucial to keep in mind that the goal is to reproduce a production scenario as closely as possible, in order to ensure your model will perform just as well in production.
 
 ## Model evaluation
 
diff --git a/docs/introduction/getting-started/concept-drift-detection.ipynb b/docs/introduction/getting-started/concept-drift-detection.ipynb
index 922936ec1e..1a2fb690b2 100644
--- a/docs/introduction/getting-started/concept-drift-detection.ipynb
+++ b/docs/introduction/getting-started/concept-drift-detection.ipynb
@@ -179,7 +179,7 @@
 }
 },
 "source": [
- "We see that `ADWIN` successfully indicates the presence of drift (red vertical lines) close to the begining of a new data distribution.\n",
+ "We see that `ADWIN` successfully indicates the presence of drift (red vertical lines) close to the beginning of a new data distribution.\n",
 "\n",
 "\n",
 "---\n",
diff --git a/docs/recipes/active-learning.ipynb b/docs/recipes/active-learning.ipynb
index 15d39fa328..c1522146e1 100644
--- a/docs/recipes/active-learning.ipynb
+++ b/docs/recipes/active-learning.ipynb
@@ -196,7 +196,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Active learning is primarly used to label data in an efficient manner. However, in an online setting, active learning can also be used simply to speed up training. The point is that you can achieve a very good performance without training on an entire dataset. Active learning is a powerful way to decide which samples to train on."
+ "Active learning is primarily used to label data in an efficient manner. However, in an online setting, active learning can also be used simply to speed up training. The point is that you can achieve a very good performance without training on an entire dataset. Active learning is a powerful way to decide which samples to train on."
 ]
 },
 {
diff --git a/docs/recipes/cloning-and-mutating.ipynb b/docs/recipes/cloning-and-mutating.ipynb
index ebae777800..bfee592aa0 100644
--- a/docs/recipes/cloning-and-mutating.ipynb
+++ b/docs/recipes/cloning-and-mutating.ipynb
@@ -13,7 +13,7 @@
 "source": [
 "Sometimes you might want to reset a model, or edit (what we call mutate) its attributes. This can be useful in an online environment. Indeed, if you detect a drift, then you might want to mutate a model's attributes. Or if you see that a model's performance is plummeting, then you might to reset it to its \"factory settings\".\n",
 "\n",
- "Anyway, this is not to convince you, but rather to say that a model's attributes don't have be to set in stone throughout its lifetime. In particular, if you're developping your own model, then you might want to have good tools to do this. This is what this recipe is about."
+ "Anyway, this is not to convince you, but rather to say that a model's attributes don't have to be set in stone throughout its lifetime. In particular, if you're developing your own model, then you might want to have good tools to do this. This is what this recipe is about."
 ]
 },
 {
@@ -332,9 +332,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "All attributes are immutable by default. Under the hood, each model can specify a set of mutable attributes via the `_mutable_attributes` property. In theory this can be overriden. But the general idea is that we will progressively add more and more mutable attributes with time.\n",
+ "All attributes are immutable by default. Under the hood, each model can specify a set of mutable attributes via the `_mutable_attributes` property. In theory this can be overridden. But the general idea is that we will progressively add more and more mutable attributes with time.\n",
 "\n",
- "And that concludes this recipe. Arguably, this recipe caters to advanced users, and in particular users who are developping their own models. And yet, one could also argue that modifying parameters of a model on-the-fly is a great tool to have at your disposal when you're doing online machine learning."
+ "And that concludes this recipe. Arguably, this recipe caters to advanced users, and in particular users who are developing their own models. And yet, one could also argue that modifying parameters of a model on-the-fly is a great tool to have at your disposal when you're doing online machine learning."
 ]
 }
 ],
diff --git a/docs/recipes/on-hoeffding-trees.ipynb b/docs/recipes/on-hoeffding-trees.ipynb
index eb0218b68f..f8f4367787 100644
--- a/docs/recipes/on-hoeffding-trees.ipynb
+++ b/docs/recipes/on-hoeffding-trees.ipynb
@@ -26,7 +26,7 @@
 "\n",
 "In this guide, we are going to:\n",
 "\n",
- "1. summarize the differences accross the multiple HT versions available;\n",
+ "1. summarize the differences across the multiple HT versions available;\n",
 "2. learn how to inspect tree models;\n",
 "3. learn how to manage the memory usage of HTs;\n",
 "4. compare numerical tree splitters and understand their impact on the iDT induction process.\n",
@@ -888,7 +888,7 @@
 "- $n$: Number of observations seen so far.\n",
 "- $c$: the number of classes.\n",
 "- $s$: the number of split points to evaluate (which means that this is a user-given parameter).\n",
- "- $h$: the number of histogram bins or hash slots. Tipically, $h \\ll n$.\n",
+ "- $h$: the number of histogram bins or hash slots. Typically, $h \\ll n$.\n",
 "\n",
 "### 4.1. Classification tree splitters\n",
 "\n",
@@ -906,7 +906,7 @@
 "- The number of split points can be configured in the Gaussian splitter. Increasing this number makes this splitter slower, but it also potentially increases the quality of the obtained query points, implying enhanced tree accuracy. \n",
 "- The number of stored bins can be selected in the Histogram splitter. Increasing this number increases the memory footprint and running time of this splitter, but it also potentially makes its split candidates more accurate and positively impacts on the tree's final predictive performance.\n",
 "\n",
- "Next, we provide a brief comparison of the classification splitters using 10K instances of the Random RBF synthetic dataset. Note that the tree equiped with the Exhaustive splitter does not use Naive Bayes leaves."
+ "Next, we provide a brief comparison of the classification splitters using 10K instances of the Random RBF synthetic dataset. Note that the tree equipped with the Exhaustive splitter does not use Naive Bayes leaves."
 ]
 },
 {
diff --git a/docs/releases/0.12.0.md b/docs/releases/0.12.0.md
index 54c75653c2..967a06c3fb 100644
--- a/docs/releases/0.12.0.md
+++ b/docs/releases/0.12.0.md
@@ -29,7 +29,7 @@
 ## drift
 
 - Refactor the concept drift detectors to match the remaining of River's API. Warnings are only issued by detectors that support this feature.
-- Drifts can be assessed via the property `drift_detected`. Warning signals can be acessed by the property `warning_detected`. The `update` now returns `self`.
+- Drifts can be assessed via the property `drift_detected`. Warning signals can be accessed by the property `warning_detected`. The `update` now returns `self`.
 - Ensure all detectors automatically reset their inner states after a concept drift detection.
 - Streamline `DDM`, `EDDM`, `HDDM_A`, and `HDDM_W`. Make the configurable parameters names match their respective papers.
 - Fix bugs in `EDDM` and `HDDM_W`.
diff --git a/docs/releases/0.19.0.md b/docs/releases/0.19.0.md
index 0c41605df4..02f6b2de5c 100644
--- a/docs/releases/0.19.0.md
+++ b/docs/releases/0.19.0.md
@@ -30,7 +30,7 @@ Calling `learn_one` in a pipeline will now update each part of the pipeline in t
 ## forest
 
 - Fixed issue with `forest.ARFClassifier` which couldn't be passed a `CrossEntropy` metric.
-- Fixed a bug in `forest.AMFClassifier` which slightly improves predictive accurary.
+- Fixed a bug in `forest.AMFClassifier` which slightly improves predictive accuracy.
 - Added `forest.AMFRegressor`.
 
 ## multioutput
diff --git a/docs/releases/0.8.0.md b/docs/releases/0.8.0.md
index 43ba8fa30b..8eee7be289 100644
--- a/docs/releases/0.8.0.md
+++ b/docs/releases/0.8.0.md
@@ -28,6 +28,6 @@
 
 ## tree
 
-- Unifed base class structure applied to all tree models.
+- Unified base class structure applied to all tree models.
 - Bug fixes.
 - Added `tree.SGTClassifier` and `tree.SGTRegressor`.