opencredo/log-classifier

log-classifier

Scripts:

train.py

  • trains a set of scikit-learn classifiers on the training logs
  • saves the trained models as joblib pickle files
  • reports the accuracy of each trained model on the testing logs
  • takes the following parameters:
    --train_data_dir : location of the training logs (default: data/train/laptop)
    --test_data_dir : location of the testing logs (default: data/test/laptop)
    --save-dir : location where the joblib pickle files are saved (default: save)
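
The three options listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the actual train.py: the function name build_parser and the help strings are assumptions, and only the flag names and defaults come from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented for train.py (sketch)."""
    parser = argparse.ArgumentParser(description="Train log classifiers")
    parser.add_argument("--train_data_dir", default="data/train/laptop",
                        help="location of the training logs")
    parser.add_argument("--test_data_dir", default="data/test/laptop",
                        help="location of the testing logs")
    # argparse exposes --save-dir as args.save_dir (the hyphen becomes an underscore)
    parser.add_argument("--save-dir", default="save",
                        help="location where the joblib pickle files are saved")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv
args = build_parser().parse_args(["--save-dir", "models"])
print(args.train_data_dir)
print(args.test_data_dir)
print(args.save_dir)
```

Any option that is not passed keeps the default shown in the list, so here only the save directory changes.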

Install libraries

Make sure you have a recent Python 2.7 release and pip, then install the required libraries.

pip install numpy scikit-learn

Collect logs

Create data directories.

mkdir -p data/{train,test}/laptop

Create the save directory.

mkdir -p save

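Collect the logs, putting the first 90% of each file's lines in the training set and the last 10% in the test set.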

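# Split every *.log file under /var/log larger than 10 KiB:
# the first 90% of its lines go to training, the last 10% to testing.
find /var/log -type f -size +10k -name "*.log" 2>/dev/null | while IFS= read -r log
do
  rows=$(wc -l < "$log")
  test_rows=$((rows / 10))
  head -n "$((rows - test_rows))" "$log" > data/train/laptop/"${log##*/}"
  tail -n "$test_rows" "$log" > data/test/laptop/"${log##*/}"
done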
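
The arithmetic behind the shell loop above can be checked in isolation. The following sketch reproduces the same integer split in Python; the function names split_rows and split_lines are hypothetical and are not part of the repository.

```python
def split_rows(rows):
    """Return (train_count, test_count) using the shell loop's integer arithmetic."""
    test_count = rows // 10           # same as $((rows / 10)) in the shell
    train_count = rows - test_count   # same as $((rows - rows / 10))
    return train_count, test_count


def split_lines(lines):
    """Split a list of lines into its leading training part and trailing test part."""
    train_count, _ = split_rows(len(lines))
    return lines[:train_count], lines[train_count:]


lines = ["line {}".format(i) for i in range(27)]
train, test = split_lines(lines)
print("{} train, {} test".format(len(train), len(test)))  # prints "25 train, 2 test"
```

Because the division rounds down, a file with fewer than 10 lines would contribute nothing to the test set; the 10k size filter in the find command makes that case unlikely.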

Run script

Run the training script with the default directories:

python2.7 train.py

This should give something like the following:

Training log collection => 250587 data entries
Testing log collection => 27843 data entries

SGDClassifier
Success rate: 97.38%


MultinomialNB
Success rate: 98.64%


BernoulliNB
Success rate: 96.36%


DecisionTreeClassifier
Success rate: 95.26%


ExtraTreeClassifier
Success rate: 94.52%


ExtraTreesClassifier
Success rate: 99.21%


LinearSVC
Success rate: 99.17%


NearestCentroid
Success rate: 92.29%


RandomForestClassifier
Success rate: 99.06%


RidgeClassifier
Success rate: 99.16%
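
The "data entries" counts in the output above are consistent with one entry per log line. A loader along those lines might look like the sketch below. It rests on two assumptions that the README does not confirm: each line is one entry, and the label of an entry is the name of the file it came from. The function name load_entries is hypothetical.

```python
import io
import os


def load_entries(data_dir):
    """Read every file in data_dir and return (lines, labels).

    Assumption: each line is one data entry and its label is the name of
    the file it came from. The real train.py may do this differently.
    """
    lines, labels = [], []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps reading if a log contains invalid UTF-8 bytes
        with io.open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                lines.append(line.rstrip("\n"))
                labels.append(name)
    return lines, labels
```

Calling load_entries on data/train/laptop and printing len(lines) would give a count comparable to the "Training log collection" line, provided the assumptions above hold.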

predict.py

  • loads the trained models from the joblib pickle files
  • reports the accuracy of each loaded model on the testing logs
  • takes the following parameters:
    --test_data_dir : location of the testing logs (default: data/test/laptop)
    --save-dir : location the joblib pickle files are loaded from (default: save)

Run the script:

$ python2.7 predict.py
Testing log collection => 27843 data entries

SGDClassifier
Success rate: 97.38%


MultinomialNB
Success rate: 98.64%


BernoulliNB
Success rate: 96.36%


DecisionTreeClassifier
Success rate: 95.26%


ExtraTreeClassifier
Success rate: 94.52%


ExtraTreesClassifier
Success rate: 99.21%


LinearSVC
Success rate: 99.17%


NearestCentroid
Success rate: 92.29%


RandomForestClassifier
Success rate: 99.06%


RidgeClassifier
Success rate: 99.16%
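
A "Success rate" figure like the ones above is presumably the share of test entries whose predicted label matches the expected one. The sketch below shows that calculation and the two-line report format. The function names and the sample labels are made up for illustration, and the real scripts may compute the score through scikit-learn instead.

```python
def success_rate(predicted, expected):
    """Percentage of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot score an empty test set")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * correct / len(expected)


def format_report(name, rate):
    """Format a result the way the sample output shows it."""
    return "{}\nSuccess rate: {:.2f}%".format(name, rate)


# Hypothetical labels: three of the four predictions are correct
predicted = ["syslog", "auth.log", "syslog", "kern.log"]
expected = ["syslog", "auth.log", "kern.log", "kern.log"]
print(format_report("ExampleClassifier", success_rate(predicted, expected)))
# prints:
# ExampleClassifier
# Success rate: 75.00%
```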

Algorithms

Adjust the algorithms list to include whichever scikit-learn classifiers you want to run:

from sklearn import (ensemble, linear_model, naive_bayes, neighbors,
                     neural_network, svm, tree)

# Classifiers to train and evaluate; the commented-out entries work but are slow.
algorithms = [
#    svm.SVC(kernel='linear', C = 1.0),   # QUITE SLOW
    linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, random_state=42, max_iter=5, tol=None),
    naive_bayes.MultinomialNB(),
    naive_bayes.BernoulliNB(),
    tree.DecisionTreeClassifier(max_depth=1000),
    tree.ExtraTreeClassifier(),
    ensemble.ExtraTreesClassifier(),
    svm.LinearSVC(),
#    linear_model.LogisticRegressionCV(multi_class='multinomial'),   # A BIT SLOW
#    neural_network.MLPClassifier(),   # VERY SLOW
    neighbors.NearestCentroid(),
    ensemble.RandomForestClassifier(),
    linear_model.RidgeClassifier(),
]
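
All of the classifiers in this list expose the same fit and predict methods, which is what lets a single loop train and score them. The sketch below illustrates such a loop. To stay runnable without scikit-learn it uses a hypothetical stand-in estimator, MajorityClassifier, and the evaluate function is likewise an assumption rather than the code in train.py.

```python
from collections import Counter


class MajorityClassifier(object):
    """Hypothetical stand-in estimator: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(algorithms, X_train, y_train, X_test, y_test):
    """Fit each estimator and return a list of (class name, accuracy in percent)."""
    results = []
    for clf in algorithms:
        clf.fit(X_train, y_train)
        predicted = clf.predict(X_test)
        correct = sum(1 for p, e in zip(predicted, y_test) if p == e)
        # type(clf).__name__ yields headings such as "SGDClassifier"
        results.append((type(clf).__name__, 100.0 * correct / len(y_test)))
    return results


# Toy data: features are placeholders, labels are hypothetical log names
results = evaluate([MajorityClassifier()],
                   [[0], [1], [2]], ["syslog", "syslog", "kern.log"],
                   [[3], [4]], ["syslog", "kern.log"])
for name, rate in results:
    print("{}\nSuccess rate: {:.2f}%".format(name, rate))
```

Since the loop only relies on fit and predict, adding or removing an entry in the list changes which models are reported without touching the loop itself.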
