
Commit f85f12c

Fix spelling mistakes in the code
1 parent: 5531ae5

32 files changed (+46, -46 lines)

river/anomaly/lof.py

Lines changed: 5 additions & 5 deletions
@@ -149,17 +149,17 @@ class LocalOutlierFactor(anomaly.base.AnomalyDetector):
 
     The algorithm take into account the following elements:
     - `NewPoints`: new points;
-    - `kNN(p)`: the k-nearest neighboors of `p` (the k-closest points to `p`);
-    - `RkNN(p)`: the reverse-k-nearest neighboors of `p` (points that have `p` as one of their neighboors);
+    - `kNN(p)`: the k-nearest neighbors of `p` (the k-closest points to `p`);
+    - `RkNN(p)`: the reverse-k-nearest neighbors of `p` (points that have `p` as one of their neighbors);
     - `set_upd_lrd`: Set of points that need to have the local reachability distance updated;
     - `set_upd_lof`: Set of points that need to have the local outlier factor updated.
 
     This current implementation within `River`, based on the original one in the paper, follows the following steps:
     1) Insert new data points (`NewPoints`) and calculate its distance to existing points;
-    2) Update the nreaest neighboors and reverse nearest neighboors of all the points;
+    2) Update the nearest neighbors and reverse nearest neighbors of all the points;
     3) Define sets of affected points that required updates;
-    4) Calculate the reachability-distance from new point to neighboors (`NewPoints` -> `kNN(NewPoints)`)
-        and from rev-neighboors to new point (`RkNN(NewPoints)` -> `NewPoints`);
+    4) Calculate the reachability-distance from new point to neighbors (`NewPoints` -> `kNN(NewPoints)`)
+        and from rev-neighbors to new point (`RkNN(NewPoints)` -> `NewPoints`);
     5) Update the reachability-distance for affected points: `RkNN(RkNN(NewPoints))` -> `RkNN(NewPoints)`
     6) Update local reachability distance of affected points: `lrd(set_upd_lrd)`;
     7) Update local outlier factor: `lof(set_upd_lof)`.

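The docstring above describes the update steps of the incremental LOF. As a rough usage sketch, assuming `anomaly.LocalOutlierFactor` follows River's usual `learn_one`/`score_one` anomaly-detector interface (the constructor arguments are not shown by this commit):

    from river import anomaly

    lof = anomaly.LocalOutlierFactor()  # constructor arguments omitted; see the class signature

    # Feed a small stream of "normal" points; each call runs steps 1-7 listed above.
    for x in [{"x": 0.50}, {"x": 0.45}, {"x": 0.52}, {"x": 0.48}]:
        lof.learn_one(x)

    # A far-away point should receive a comparatively high outlier score.
    print(lof.score_one({"x": 9.0}))
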
river/base/base.py

Lines changed: 1 addition & 1 deletion
@@ -415,7 +415,7 @@ def log_method_calls(
 ):
     """A context manager to log method calls.
 
-    All method calls will be logged by default. This behavior can be overriden by passing filtering
+    All method calls will be logged by default. This behavior can be overridden by passing filtering
     functions.
 
     Parameters

river/cluster/dbstream.py

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ class DBSTREAM(base.Clusterer):
 
     DBSTREAM [^1] is a clustering algorithm for evolving data streams.
     It is the first micro-cluster-based online clustering component that
-    explicitely captures the density between micro-clusters via a shared
+    explicitly captures the density between micro-clusters via a shared
     density graph. The density information in the graph is then exploited
     for reclustering based on actual density between adjacent micro clusters.

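To make the docstring above concrete, here is a minimal sketch assuming `cluster.DBSTREAM` exposes River's standard `learn_one`/`predict_one` clustering interface, with the constructor left at its defaults (an assumption):

    from river import cluster

    dbstream = cluster.DBSTREAM()

    # Two well-separated groups of 2D points.
    for p in [{"x": 1.0, "y": 1.1}, {"x": 1.2, "y": 0.9}, {"x": 8.0, "y": 8.1}, {"x": 7.9, "y": 8.2}]:
        dbstream.learn_one(p)  # updates micro-clusters and the shared density graph incrementally

    print(dbstream.predict_one({"x": 1.1, "y": 1.0}))  # index of the closest cluster
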
river/cluster/odac.py

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ class ODAC(base.Clusterer):
     ├── CH1_LVL_3 d1=0.71 [5, 6]
     └── CH2_LVL_3 d1=0.71 [7, 8]
 
-    You can acess some properties of the clustering model directly:
+    You can access some properties of the clustering model directly:
 
     >>> model.n_clusters
     11

river/cluster/streamkmeans.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ class STREAMKMeans(base.Clusterer):
 
     However, instead of using the traditional k-means, which requires a total reclustering
     each time the temporary chunk of data points is full, the implementation of this algorithm
-    uses an increamental k-means.
+    uses an incremental k-means.
 
     At first, the cluster centers are initialized with a `KMeans` instance. For a new point `p`:

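A minimal sketch of the behaviour described above, under the assumption that the constructor accepts `chunk_size`, `n_clusters` and `seed` (hypothetical argument names; check the class signature):

    from river import cluster

    streamkm = cluster.STREAMKMeans(chunk_size=3, n_clusters=2, seed=42)

    for p in [{"x": 1.0}, {"x": 1.2}, {"x": 0.9}, {"x": 8.0}, {"x": 8.2}, {"x": 7.9}]:
        streamkm.learn_one(p)  # each full chunk nudges the centers incrementally instead of reclustering

    print(streamkm.predict_one({"x": 8.1}))
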
river/cluster/textclust.py

Lines changed: 1 addition & 1 deletion
@@ -583,7 +583,7 @@ def merge(self, microcluster, t, omega, fading_factor, term_fading, realtime):
         microcluster.fade(t, omega, fading_factor, term_fading, realtime)
 
         self.time = t
-        # here we merge an existing mc wth the current mc. The tf values as well as the ids have to be transferred
+        # here we merge an existing mc with the current mc. The tf values as well as the ids have to be transferred
         for k in list(microcluster.tf.keys()):
             if k in self.tf:
                 self.tf[k]["tf"] += microcluster.tf[k]["tf"]

river/compose/renamer.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ class Renamer(base.Transformer):
     Parameters
     ----------
     mapping
-        Dictionnary describing substitution rules. Keys in `mapping` that are not a feature's name are silently ignored.
+        Dictionary describing substitution rules. Keys in `mapping` that are not a feature's name are silently ignored.
 
     Examples
     --------

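A small sketch of the `mapping` parameter in action, assuming `Renamer` exposes the usual `transform_one` method of River transformers:

    from river import compose

    renamer = compose.Renamer(mapping={"temp": "temperature"})

    print(renamer.transform_one({"temp": 21.5, "humidity": 0.4}))
    # {'temperature': 21.5, 'humidity': 0.4} (features absent from `mapping` pass through unchanged)
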
river/datasets/restaurants.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 class Restaurants(base.RemoteDataset):
     """Data from the Kaggle Recruit Restaurants challenge.
 
-    The goal is to predict the number of visitors in each of 829 Japanese restaurants over a priod
+    The goal is to predict the number of visitors in each of 829 Japanese restaurants over a period
     of roughly 16 weeks. The data is ordered by date and then by restaurant ID.
 
     References

river/drift/dummy.py

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ class DummyDriftDetector(base.DriftDetector):
     The 'w' value must be greater than zero when 'trigger_method' is 'random'.
 
     Since we set `dynamic_cloning` to `True`, a clone of the periodic trigger will
-    have its internal paramenters changed:
+    have its internal parameters changed:
 
     >>> rtrigger = rtrigger.clone()
     >>> for i, v in enumerate(data):

river/drift/retrain.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ class DriftRetrainingClassifier(base.Wrapper, base.Classifier):
     """Drift retraining classifier.
 
     This classifier is a wrapper for any classifier. It monitors the incoming data for concept
-    drifts and warnings in the model's accurary. In case a warning is detected, a background model
+    drifts and warnings in the model's accuracy. In case a warning is detected, a background model
     starts to train. If a drift is detected, the model will be replaced by the background model,
     and the background model will be reset.

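The wrapper described above pairs a base classifier with a warning/drift detector. A hedged sketch follows, where the argument names `model` and `drift_detector` are assumptions rather than the documented signature:

    from river import drift, linear_model

    clf = drift.DriftRetrainingClassifier(
        model=linear_model.LogisticRegression(),
        drift_detector=drift.binary.DDM(),  # any detector that raises both warnings and drifts
    )

    for x, y in [({"x": 1.0}, True), ({"x": -1.0}, False)]:
        clf.predict_one(x)
        clf.learn_one(x, y)  # a warning trains a background model; a confirmed drift swaps it in
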
river/ensemble/boosting.py

Lines changed: 1 addition & 1 deletion
@@ -303,7 +303,7 @@ def learn_one(self, x, y, **kwargs):
         # the best model's not yet trained will receive lambda values for training from the model's that correctly classified an instance.
         # the values of lambda increase in case a mistake is made and decrease in case a right prediction is made.
         # the worst models are more likely to make mistakes, increasing the value of lambda.
-        # Then, the best's model are likely to receive a high value of lambda and decreasing gradually throughout the remaning models to be trained
+        # Then, the best's model are likely to receive a high value of lambda and decreasing gradually throughout the remaining models to be trained
         # It's similar to a system where the rich get richer.
         for i in range(self.n_models):
             if correct:

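The comment block above describes the familiar online-boosting weight bookkeeping. Purely as an illustration of that idea, and not of this file's exact code, the classic Oza-style rule scales an instance's lambda up after a mistake and down after a correct prediction:

    # Illustration only: assumes 0 < error_rate < 1 for the current ensemble member.
    def update_lambda(lam, correct, error_rate):
        if correct:
            return lam / (2 * (1 - error_rate))  # shrink the weight seen by the remaining members
        return lam / (2 * error_rate)            # inflate it so later members focus on the mistake
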
river/imblearn/random.py

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ class RandomSampler(ClassificationSampler):
     desired_dist
         The desired class distribution. The keys are the classes whilst the values are the desired
         class percentages. The values must sum up to 1. If set to `None`, then the observations
-        will be sampled uniformly at random, which is stricly equivalent to using
+        will be sampled uniformly at random, which is strictly equivalent to using
         `ensemble.BaggingClassifier`.
     sampling_rate
         The desired ratio of data to sample.

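For context, a sketch of how the parameters above fit together; `classifier` and `seed` are assumed argument names here, while `desired_dist` and `sampling_rate` come from the docstring itself:

    from river import imblearn, linear_model

    sampler = imblearn.RandomSampler(
        classifier=linear_model.LogisticRegression(),
        desired_dist={False: 0.5, True: 0.5},  # rebalance a skewed binary target
        sampling_rate=1.0,
        seed=42,
    )

Passing `desired_dist=None` reproduces uniform sampling, i.e. the `ensemble.BaggingClassifier` behaviour mentioned above.
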
river/linear_model/base.py

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ def _fit(self, x, y, w, get_grad):
     def _update_weights(self, x):
         # L1 cumulative penalty helper
 
-        # Apply penalty to each weight iteratively, with the potential of being parrallelized by using VectorDict
+        # Apply penalty to each weight iteratively, with the potential of being parallelized by using VectorDict
         for j, xj in x.items():
             wj_temp = self._weights[j]
river/linear_model/test_glm.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,7 @@ def test_finite_differences(lm, dataset):
8888

8989
# d is a set of weight perturbations
9090
for d in iter_perturbations(weights.keys()):
91-
# Pertubate the weights and obtain the loss with the new weights
91+
# Perturb the weights and obtain the loss with the new weights
9292
lm._weights = utils.VectorDict({i: weights[i] + eps * di for i, di in d.items()})
9393
forward = lm.loss(y_true=y, y_pred=lm._raw_dot_one(x))
9494
lm._weights = utils.VectorDict({i: weights[i] - eps * di for i, di in d.items()})

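The test touched above is a standard finite-difference gradient check: perturb the weights along a direction `d` and compare the centred difference of the loss against the analytic gradient. A generic, self-contained sketch of the idea (not River's test code):

    def finite_difference_check(loss, grad, w, d, eps=1e-6):
        # Compare (loss(w + eps*d) - loss(w - eps*d)) / (2*eps) with dot(grad(w), d).
        w_plus = {i: w[i] + eps * d.get(i, 0.0) for i in w}
        w_minus = {i: w[i] - eps * d.get(i, 0.0) for i in w}
        numeric = (loss(w_plus) - loss(w_minus)) / (2 * eps)
        analytic = sum(grad(w)[i] * d.get(i, 0.0) for i in w)
        return abs(numeric - analytic)

    # Simple quadratic loss whose gradient is known exactly: the gap should be ~0.
    loss = lambda w: 0.5 * sum(v ** 2 for v in w.values())
    grad = lambda w: dict(w)
    print(finite_difference_check(loss, grad, {"a": 1.0, "b": -2.0}, {"a": 1.0}))
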
river/metrics/expected_mutual_info.pyx

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ def expected_mutual_info(confusion_matrix):
     the AMI will be one order of magnitude slower than most other implemented metrics.
 
     Note that, different form most of the implementations of other mutual information metrics,
-    the expected mutual information wil be implemented using numpy arrays. This implementation
+    the expected mutual information will be implemented using numpy arrays. This implementation
     inherits from the implementation of the expected mutual information in scikit-learn.
 
     Parameters

river/metrics/mutual_info.py

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ class NormalizedMutualInfo(metrics.base.MultiClassMetric):
     agreement solely due to chance); as a result, the Adjusted Mutual Info Score will mostly be preferred.
     However, this metric is still symmetric, which means that switching true and predicted labels will not
     alter the score value. This fact can be useful when the metric is used to measure the agreement between
-    two indepedent label solutions on the same dataset, when the ground truth remains unknown.
+    two independent label solutions on the same dataset, when the ground truth remains unknown.
 
     Another advantage of the metric is that as it is based on the calculation of entropy-related measures,
     it is independent of the permutation of class/cluster labels.

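A short usage sketch of the metric discussed above, assuming the usual River metric interface (`update(y_true, y_pred)` and `get()`); the symmetry noted in the docstring means swapping the two label sequences leaves the value unchanged:

    from river import metrics

    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [1, 1, 0, 0, 2, 2]  # same grouping, permuted labels

    nmi = metrics.NormalizedMutualInfo()
    for yt, yp in zip(y_true, y_pred):
        nmi.update(yt, yp)

    print(nmi.get())  # close to 1: label permutations do not change the score
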
river/metrics/rand.py

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ class AdjustedRand(metrics.base.MultiClassMetric):
 
     The Adjusted Rand Index is the corrected-for-chance version of the Rand Index [^1] [^2].
     Such a correction for chance establishes a baseline by using the expected similarity
-    of all pair-wise comparisions between clusterings specified by a random model.
+    of all pair-wise comparisons between clusterings specified by a random model.
 
     Traditionally, the Rand Index was corrected using the Permutation Model for Clustering.
     However, the premises of the permutation model are frequently violated; in many

river/metrics/silhouette.py

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ class Silhouette(metrics.base.ClusteringMetric):
 
     References
     ----------
-    [^1]: Rousseeuw, P. (1987). Silhouettes: a graphical aid to the intepretation and validation
+    [^1]: Rousseeuw, P. (1987). Silhouettes: a graphical aid to the interpretation and validation
     of cluster analysis 20, 53 - 65. DOI: 10.1016/0377-0427(87)90125-7
     [^2]: Bifet, A. et al. (2018). "Machine Learning for Data Streams".
     DOI: 10.7551/mitpress/10654.001.0001.

river/metrics/vbeta.py

Lines changed: 2 additions & 2 deletions
@@ -120,7 +120,7 @@ class Completeness(metrics.base.MultiClassMetric):
     the proposed cluster distribution given the class of the component data points.
     However, in the worst case scenario, each class is represented by every cluster
     with a distribution equal to the distribution of cluster sizes. Therefore,
-    symmetric to the claculation above, we define completeness as:
+    symmetric to the calculation above, we define completeness as:
 
     $$
     c = \begin{cases}
@@ -209,7 +209,7 @@ class VBeta(metrics.base.MultiClassMetric):
     It provides an elegant solution to many problems that affect previously defined
     cluster evaluation measures including
 
-    * Dependance of clustering algorithm or dataset,
+    * Dependence of clustering algorithm or dataset,
 
     * The "problem of matching", where the clustering of only a portion of data
     points are evaluated, and

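The completeness expression is truncated in the hunk above; for reference, the standard definitions from the V-measure literature (quoted from the literature, not from this file) are:

$$
c = \begin{cases}
1 & \text{if } H(K) = 0 \\
1 - \dfrac{H(K \mid C)}{H(K)} & \text{otherwise}
\end{cases}
\qquad
V_\beta = \frac{(1 + \beta)\, h\, c}{\beta\, h + c}
$$

where $h$ is the homogeneity, $H(K)$ the entropy of the cluster assignment and $H(K \mid C)$ its conditional entropy given the classes.
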
river/misc/sdft.py

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ class SDFT(base.Base):
 
     References
     ----------
-    [^1]: [Jacobsen, E. asample_average.pynd Lyons, R., 2003. The sliding DFT. IEEE Signal Processing Magazine, 20(2), pp.74-80.](https://www.comm.utoronto.ca/~dimitris/ece431/slidingdft.pdf)
+    [^1]: [Jacobsen, E. and Lyons, R., 2003. The sliding DFT. IEEE Signal Processing Magazine, 20(2), pp.74-80.](https://www.comm.utoronto.ca/~dimitris/ece431/slidingdft.pdf)
     [^2]: [Understanding and Implementing the Sliding DFT](https://www.dsprelated.com/showarticle/776.php)
 
     """

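The sliding DFT referenced above updates every frequency bin in O(1) per incoming sample via the recurrence X_k <- (X_k + x_new - x_old) * exp(j*2*pi*k/N). A generic sketch of that recurrence, independent of the `SDFT` class's actual API:

    import cmath
    from collections import deque

    def make_sdft(N):
        window = deque([0.0] * N, maxlen=N)  # the last N samples
        bins = [0j] * N                      # running DFT bins
        twiddles = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]

        def update(x_new):
            x_old = window[0]                # sample about to fall out of the window
            window.append(x_new)
            for k in range(N):
                bins[k] = (bins[k] + x_new - x_old) * twiddles[k]
            return bins

        return update
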
river/naive_bayes/bernoulli.py

Lines changed: 1 addition & 1 deletion
@@ -261,7 +261,7 @@ def joint_log_likelihood_many(self, X: pd.DataFrame) -> pd.DataFrame:
         X[missing] = 0
         if is_sparse:
             # The new values need to be converted to preserve the sparseness of the dataframe.
-            # Input values can be intergers or floats, converting all to float preserves the behaviour without the need for complex conversion logic.
+            # Input values can be integers or floats, converting all to float preserves the behaviour without the need for complex conversion logic.
             X = X.astype(pd.SparseDtype(float, 0.0))
 
         index, columns = X.index, X.columns

river/naive_bayes/test_naive_bayes.py

Lines changed: 4 additions & 4 deletions
@@ -113,7 +113,7 @@ def test_learn_one_methods(model):
 def test_learn_many_vs_learn_one(model, batch_model):
     """Assert that the Naive Bayes river models provide the same results when learning in
     incremental and mini-batch modes. The models tested are MultinomialNB, BernoulliNB and
-    ComplementNB with differents alpha parameters..
+    ComplementNB with different alpha parameters..
     """
     for x, y in yield_dataset():
         model.learn_one(x, y)
@@ -134,11 +134,11 @@ def test_learn_many_vs_learn_one(model, batch_model):
         batch_model.predict_proba_many(x_batch)["no"][0]
     )
 
-    # Assert class probabilities are the same when trainig Naive Bayes in pure online and in batch.
+    # Assert class probabilities are the same when training Naive Bayes in pure online and in batch.
     assert model["model"].p_class("yes") == batch_model["model"].p_class("yes")
     assert model["model"].p_class("no") == batch_model["model"].p_class("no")
 
-    # Assert conditionnal probabilities are the same when training Naive Bayes in pure online and
+    # Assert conditional probabilities are the same when training Naive Bayes in pure online and
     # in batch.
     if isinstance(model["model"], naive_bayes.BernoulliNB) or isinstance(
         model["model"], naive_bayes.MultinomialNB
@@ -213,7 +213,7 @@ def test_river_vs_sklearn(model, sk_model, bag):
     """Assert that river Naive Bayes models and sklearn Naive Bayes models provide the same results
     when the input data are the same. Also check that the behaviour of Naives Bayes models are the
     same with dense and sparse dataframe. Models tested are MultinomialNB, BernoulliNB and
-    ComplementNB with differents alpha parameters.
+    ComplementNB with different alpha parameters.
     """
     for x, y in yield_batch_dataset():
         model.learn_many(x, y)

river/neighbors/knn_regressor.py

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ def _check_aggregation_method(self, method):
         Parameters
         ----------
         method
-            The suplied aggregration method.
+            The supplied aggreration method.
         """
         if method not in {self._MEAN, self._MEDIAN, self._WEIGHTED_MEAN}:
             raise ValueError(

river/proba/gaussian.py

Lines changed: 4 additions & 4 deletions
@@ -270,11 +270,11 @@ def __repr__(self):
         return f"𝒩(\n μ=({mu_str}),\n σ^2=(\n{var_str}\n )\n)"
 
     def update(self, x):
-        # TODO: add support for weigthed samples
+        # TODO: add support for weighted samples
         self._var.update(x)
 
     def revert(self, x):
-        # TODO: add support for weigthed samples
+        # TODO: add support for weighted samples
         self._var.revert(x)
 
     def __call__(self, x: dict[str, float]):
@@ -285,11 +285,11 @@ def __call__(self, x: dict[str, float]):
         try:
             pdf_ = multivariate_normal([*self.mu.values()], var).pdf(x_)
             return float(pdf_)
-            # TODO: validate occurence of ValueError
+            # TODO: validate occurrence of ValueError
            # The input matrix must be symmetric positive semidefinite.
         except ValueError:  # pragma: no cover
             return 0.0
-            # TODO: validate occurence of OverflowError
+            # TODO: validate occurrence of OverflowError
         except OverflowError:  # pragma: no cover
             return 0.0
         return 0.0  # pragma: no cover

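The methods patched above belong to an incremental multivariate Gaussian estimator. A hedged usage sketch, assuming the class is exposed as `proba.MultivariateGaussian` and that calling the instance returns the density, as the `__call__` above suggests:

    from river import proba

    mv = proba.MultivariateGaussian()

    for x in [{"x": 1.0, "y": 2.0}, {"x": 1.2, "y": 1.8}, {"x": 0.8, "y": 2.1}]:
        mv.update(x)  # incrementally updates the running means and (co)variances

    print(mv({"x": 1.0, "y": 2.0}))  # density at a point; 0.0 when the covariance is degenerate
    mv.revert({"x": 0.8, "y": 2.1})  # undo a previously seen observation
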
river/sketch/counter.py

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ class Counter(base.Base):
     >>> cms[532]
     15
 
-    Keep in mind that CMS is an approximate sketch algorithm. Couting estimates for unseen values
+    Keep in mind that CMS is an approximate sketch algorithm. Counting estimates for unseen values
     might not be always reliable:
 
     >>> cms[1001]

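The caveat being fixed above, that estimates for unseen values may be non-zero, comes from hash collisions in the Count-Min Sketch. A generic, self-contained sketch (not River's internals) showing where the over-estimation comes from:

    import random

    class TinyCMS:
        def __init__(self, n_rows=4, n_cols=64, seed=42):
            rng = random.Random(seed)
            self.seeds = [rng.randrange(1 << 30) for _ in range(n_rows)]
            self.table = [[0] * n_cols for _ in range(n_rows)]

        def _cols(self, x):
            return [hash((seed, x)) % len(row) for seed, row in zip(self.seeds, self.table)]

        def update(self, x):
            for row, col in zip(self.table, self._cols(x)):
                row[col] += 1

        def __getitem__(self, x):
            # The row-wise minimum can still be positive for an unseen x if it collides
            # with frequent items in every row, hence the occasional over-estimate.
            return min(row[col] for row, col in zip(self.table, self._cols(x)))

    cms = TinyCMS()
    for _ in range(15):
        cms.update(532)
    print(cms[532], cms[1001])  # exact count for 532; 1001 is usually 0, sometimes a collision artefact
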
river/stats/kolmogorov_smirnov.py

Lines changed: 2 additions & 2 deletions
@@ -168,15 +168,15 @@ class KolmogorovSmirnov(stats.base.Bivariate):
     $$
 
     This implementation is the incremental version of the previously mentioned statistics, with the change being in
-    the ability to insert and remove an observation thorugh time. This can be done using a randomized tree called
+    the ability to insert and remove an observation through time. This can be done using a randomized tree called
     Treap (or Cartesian Tree) [^2] with bulk operation and lazy propagation.
 
     The implemented algorithm is able to perform the insertion and removal operations
     in O(logN) with high probability and calculate the Kolmogorov-Smirnov test in O(1),
     where N is the number of sample observations. This is a significant improvement compared
     to the O(N logN) cost of non-incremental implementation.
 
-    This implementation also supports the calculation of the Kuiper statistics. Different from the orginial
+    This implementation also supports the calculation of the Kuiper statistics. Different from the original
     Kolmogorov-Smirnov statistics, Kuiper's test [^3] calculates the sum of the absolute sizes of the most positive and
     most negative differences between the two cumulative distribution functions taken into account. As such,
     Kuiper's test is very sensitive in the tails as at the median.

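For contrast with the incremental treap-based approach described above, the quantity being maintained is the classical two-sample statistic sup |F_a(x) - F_b(x)|. A deliberately naive (quadratic) sketch of that definition, just to show what is computed:

    def ks_statistic(a, b):
        # Two-sample Kolmogorov-Smirnov statistic: sup |F_a(x) - F_b(x)|.
        def ecdf(sample, x):
            return sum(v <= x for v in sample) / len(sample)  # fraction of observations <= x

        return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))

    print(ks_statistic([0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]))  # 1.0: disjoint supports
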
river/stats/test_stats.py

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ def test_bivariate(stat, func):
     ],
 )
 def test_rolling_bivariate(stat, func):
-    # Enough alread
+    # Enough already
 
     def tail(iterable, n):
         return collections.deque(iterable, maxlen=n)

river/stream/iter_vaex.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ def iter_vaex(
     Parameters
     ----------
     X
-        A vaex DataFrame housing the training featuers.
+        A vaex DataFrame housing the training features.
     y
         The column or expression containing the target variable.
     features

river/tree/mondrian/mondrian_tree_classifier.py

Lines changed: 2 additions & 2 deletions
@@ -218,7 +218,7 @@ def _compute_split_time(
         if isinstance(node, MondrianLeafClassifier):
             return split_time
         # Otherwise we apply Mondrian process dark magic :)
-        # 1. We get the creation time of the childs (left and right is the same)
+        # 1. We get the creation time of the children (left and right is the same)
         left, _ = node.children
         child_time = left.time
         # 2. We check if splitting time occurs before child creation time
@@ -400,7 +400,7 @@ def _go_downwards(self, x, y):
                 current_node = left
 
         # This is the leaf containing the sample point (we've just
-        # splitted the current node with the data point)
+        # split the current node with the data point)
         leaf = current_node
         self._update_downwards(x, y, leaf, False)
         return leaf

river/tree/mondrian/mondrian_tree_regressor.py

Lines changed: 2 additions & 2 deletions
@@ -144,7 +144,7 @@ def _compute_split_time(
         if isinstance(node, MondrianLeafRegressor):
             return split_time
         # Otherwise we apply Mondrian process dark magic :)
-        # 1. We get the creation time of the childs (left and right is the same)
+        # 1. We get the creation time of the children (left and right is the same)
         left, _ = node.children
         child_time = left.time
         # 2. We check if splitting time occurs before child creation time
@@ -326,7 +326,7 @@ def _go_downwards(self):
                 current_node = left
 
         # This is the leaf containing the sample point (we've just
-        # splitted the current node with the data point)
+        # split the current node with the data point)
         leaf = current_node
         self._update_downwards(leaf, False)
         return leaf
