Commit df5de7f

misc: add description of algorithm and usage example to README
1 parent 9ee81cf commit df5de7f

1 file changed

+70
-1
lines changed

README.md

Lines changed: 70 additions & 1 deletion
@@ -7,6 +7,74 @@
[![cover.run go](https://cover.run/go/github.com/e-XpertSolutions/go-iforest/iforest.svg)](https://cover.run/go/github.com/e-XpertSolutions/go-iforest/iforest)

Go implementation of the Isolation Forest algorithm.

Isolation Forest is an unsupervised learning algorithm that detects anomalies (data patterns that differ from normal instances). Detection is performed by recursive data partitioning, which can be represented by a tree structure. At each iteration the data is split using a randomly chosen feature and a split value (a random number between the minimum and maximum values of that feature). Because anomalies are rare and different from other instances, fewer partitions are needed to isolate them. The number of partitions is equivalent to the path length in the resulting tree, so a shorter path means that the given instance is more likely to be an anomaly. To improve accuracy, an ensemble of such trees is built and the result is averaged over all trees.

For more information about the algorithm, please refer to this paper: [IFOREST](https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf).

## Installation

    go get -u github.com/e-XpertSolutions/go-iforest

## Usage

This example shows how to use the Isolation Forest. You will need to load the data, initialize iforest with proper parameters and use two functions, Train() and Test(), to create the model. The first one builds the trees; the second one finds a proper "anomaly threshold" and detects anomalies in the given data. After that you can pass new instances to Predict(), which labels them as normal "0" or anomaly "1".
Parallel versions of the testing and detecting functions are also available; they use multiple goroutines to speed up computations.
Created models can be saved to and loaded from files using the Save() and Load() methods.

```go
package main

import (
	"fmt"

	"github.com/e-XpertSolutions/go-iforest/iforest"
)

func main() {
	// Input data must be loaded into a two-dimensional slice of float64.
	// Please note: loadData() is a custom function that you have to provide
	// yourself; it is not included in the library.
	inputData := loadData("filename")

	// input parameters
	treesNumber := 100
	subsampleSize := 256
	outliersRatio := 0.01
	routinesNumber := 10

	// model initialization
	forest := iforest.NewForest(treesNumber, subsampleSize, outliersRatio)

	// training stage: creating trees
	forest.Train(inputData)

	// testing stage: finding anomalies
	// Test or TestParallel can be used; the concurrent version needs one
	// additional parameter. Both calls are shown here for illustration.
	forest.Test(inputData)
	forest.TestParallel(inputData, routinesNumber)

	// after testing it is possible to access the anomaly scores, the anomaly
	// bound and the labels for the input dataset
	threshold := forest.AnomalyBound
	anomalyScores := forest.AnomalyScores
	labelsTest := forest.Labels
	fmt.Println(threshold, anomalyScores, labelsTest)

	// To get information about new instances, pass them to Predict.
	// To speed up computation, use the concurrent version of Predict.
	newData := loadData("someNewInstances")
	labels, scores := forest.Predict(newData)
	fmt.Println(labels, scores)
}
```
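The usage text says that Test() finds an "anomaly threshold" from the data. One plausible way to derive such a bound from an outliers ratio is to sort the scores and take the value that leaves roughly that fraction of instances above it. The sketch below illustrates that idea only; `thresholdFromRatio` and `label` are hypothetical helpers, the scores are made-up numbers, and this is not claimed to be how go-iforest computes `AnomalyBound`.

```go
package main

import (
	"fmt"
	"sort"
)

// thresholdFromRatio is a hypothetical helper. It returns the k-th highest
// score, where k is ratio*len(scores) rounded down and clamped to at least 1.
// An empty input returns 0.
func thresholdFromRatio(scores []float64, ratio float64) float64 {
	if len(scores) == 0 {
		return 0
	}
	sorted := append([]float64(nil), scores...) // copy so the input stays untouched
	sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))
	k := int(ratio * float64(len(sorted)))
	if k < 1 {
		k = 1
	}
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[k-1]
}

// label marks every score at or above bound as an anomaly (1) and all other
// scores as normal (0).
func label(scores []float64, bound float64) []int {
	labels := make([]int, len(scores))
	for i, s := range scores {
		if s >= bound {
			labels[i] = 1
		}
	}
	return labels
}

func main() {
	// Made-up anomaly scores for ten instances.
	scores := []float64{0.41, 0.39, 0.44, 0.82, 0.40, 0.43, 0.38, 0.42, 0.45, 0.37}
	bound := thresholdFromRatio(scores, 0.1)
	fmt.Println("bound: ", bound)
	fmt.Println("labels:", label(scores, bound))
}
```

With a ratio of 0.1 and ten scores, only the single highest-scoring instance ends up labeled as an anomaly.
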
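The parallel variants mentioned in the usage section take a number of goroutines. A common way to structure such work in Go is to split the rows into contiguous chunks and let each goroutine fill its own part of the result slice. The sketch below shows that general pattern under stated assumptions: `scoreParallel` and `sumRow` are hypothetical names, the scoring function is a stand-in, and none of this is the library's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// sumRow is a stand-in scoring function used only for this illustration.
func sumRow(row []float64) float64 {
	total := 0.0
	for _, v := range row {
		total += v
	}
	return total
}

// scoreParallel is a hypothetical helper. It applies score to every row,
// splitting the rows into contiguous chunks so that at most `routines`
// goroutines run, and waits for all of them before returning.
func scoreParallel(rows [][]float64, routines int, score func([]float64) float64) []float64 {
	out := make([]float64, len(rows))
	if routines < 1 {
		routines = 1
	}
	chunk := (len(rows) + routines - 1) / routines // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(rows); start += chunk {
		end := start + chunk
		if end > len(rows) {
			end = len(rows)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				out[i] = score(rows[i]) // each goroutine writes a disjoint index range
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	rows := [][]float64{{1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}}
	fmt.Println(scoreParallel(rows, 2, sumRow))
}
```

Because every goroutine writes to its own index range, no mutex is needed; `sync.WaitGroup` only ensures that all chunks are finished before the result is read.
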
## Contributing

Contributions are greatly appreciated. The project follows the typical
@@ -17,4 +85,5 @@ for contribution.
## License

The sources are released under a BSD 3-Clause License. The full terms of that
license can be found in the `LICENSE` file of this repository.
