Commit f0152bc

cmccarthy1, dmorgankx, Dianeod and Conor McCarthy authored
v4.0.0 (#12)
* Timeseries notebook update (#2)
* updates for gs/rs
* run with the removal of errors
* change to images path required for .md display, update to AutoML notebooks to remove errors
* addition of feature impact/confmat for automl
* updated Automl to reflect NLP addition. Fixed dockerfile
* removed image directory in docker
* new clustering updates
* hc fixes
* ap fixes
* added time series notebooks
* updated docker to use pip to install ml requirements
* added result show for ap
* rename notebook
* updated README
* updated README
* Delete 13 Time Series Forecasting.ipynb
* time series review
* added extra notes for TS notebook
* update to time series notebook, change to utilities to use util namespace
* Review of time series notebook and utils update (#3)
* clustering updates
* nlp updates
* clustering and automl review
* updated graphics
* pulled updated version
* general plotting functions
* review of time series and utils update
* cluster update

Co-authored-by: Deanna Morgan <[email protected]>
Co-authored-by: dmorgankx <[email protected]>
Co-authored-by: Deanna Morgan <[email protected]>
Co-authored-by: dmorgankx <[email protected]>
Co-authored-by: Dianeod <[email protected]>
Co-authored-by: Dianeod <[email protected]>

* Addition of time series notebooks. Updated docker pip installs (#1)
* updates for gs/rs
* run with the removal of errors
* change to images path required for .md display, update to AutoML notebooks to remove errors
* addition of feature impact/confmat for automl
* updated Automl to reflect NLP addition. Fixed dockerfile
* removed image directory in docker
* new clustering updates
* hc fixes
* ap fixes
* added time series notebooks
* updated docker to use pip to install ml requirements
* added result show for ap
* rename notebook
* updated README
* updated README
* Delete 13 Time Series Forecasting.ipynb
* time series review
* added extra notes for TS notebook
* clustering updates
* nlp updates
* update to time series notebook, change to utilities to use util namespace
* clustering and automl review
* updated graphics
* pulled updated version
* general plotting functions
* review of time series and utils update
* cluster update

Co-authored-by: Deanna Morgan <[email protected]>
Co-authored-by: Conor McCarthy <[email protected]>
Co-authored-by: dmorgankx <[email protected]>

* Update to clustering notebook to use kmeans dictionary inputs

Co-authored-by: Deanna Morgan <[email protected]>
Co-authored-by: dmorgankx <[email protected]>
Co-authored-by: Dianeod <[email protected]>
Co-authored-by: Dianeod <[email protected]>
Co-authored-by: Conor McCarthy <[email protected]>
1 parent ae088d1 commit f0152bc

28 files changed, +46313 -1343 lines changed

README.md

+4 -2
@@ -16,7 +16,7 @@ The Kx NLP library can be used to answer a variety of questions about unstructur
 
 ## ML-Toolkit
 
-The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, and clustering algorithms.
+The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, clustering algorithms, time series forecasting models and feature engineering functions.
 
 ## AutoML
 
@@ -47,6 +47,8 @@ The contents of the notebooks are as follows:
 
 11. **Clustering**: Examples of how to use the k-means, DBSCAN, affinity propagation, hierarchical and CURE algorithms available within the ML-Toolkit are provided. The notebook demonstrates how to effectively visualize results produced and make use of scoring functions contained within the toolkit. A real-world application is also included.
 
+12. **Time Series Forecasting**: The notebook looks at a variety of time series forecasting models contained within the ML-Toolkit such as AR, ARIMA and SARIMA models along with time series specific feature engineering tools for passing time series data to supervised machine learning models.
+
 ## Requirements
 
 - kdb+>=? v3.5 64-bit
@@ -88,4 +90,4 @@ For subsequent runs, you will not be prompted to redo the license setup when cal
 docker start -ai mymlnotebooks
 
 
-**N.B.** [build instructions for the image are available](docker/README.md)
+**N.B.** [build instructions for the image are available](docker/README.md)
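
Reviewer note: the README entry for notebook 12 mentions AR models and time series feature engineering for supervised learners. As a rough illustration of what those two ideas mean, here is a minimal pure-Python sketch. It is not the ML-Toolkit's q implementation or API; the function names `lag_features` and `fit_ar1` are hypothetical and exist only for this example.

```python
def lag_features(series, n_lags):
    """Turn a 1-D series into (X, y): X[i] holds the n_lags values preceding y[i].

    Hypothetical helper for illustration; not part of the ML-Toolkit.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X = [list(series[i - n_lags:i]) for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return X, y


def fit_ar1(series):
    """Ordinary least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi).

    Hypothetical helper for illustration; not part of the ML-Toolkit.
    """
    X, y = lag_features(series, 1)
    x = [row[0] for row in X]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("series is constant; phi is not identifiable")
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi
```

`lag_features` produces the windowed table that a supervised model would train on, and `fit_ar1` reuses it with a single lag to estimate the simplest autoregressive model. ARIMA and SARIMA add differencing, moving-average and seasonal terms, which this sketch does not attempt.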

data/IMBD.csv

+25,001
Large diffs are not rendered by default.

data/london_merged.csv

+17,415
Large diffs are not rendered by default.

docker/Dockerfile

+2 -3
@@ -15,7 +15,6 @@ FROM jupyterq AS mlnotebooks
 
 COPY requirements.txt README.md /opt/kx/mlnotebooks/
 COPY data/ /opt/kx/mlnotebooks/data/
-COPY images/ /opt/kx/mlnotebooks/images/
 COPY notebooks/ /opt/kx/mlnotebooks/notebooks/
 COPY utils/ /opt/kx/mlnotebooks/utils/
 #hack, better way, tensorflow-gpu should be used if possible
@@ -65,10 +64,10 @@ USER kx
 RUN . /opt/conda/etc/profile.d/conda.sh \
 && conda activate kx \
 && conda install --file /opt/kx/nlp/requirements.txt \
-&& conda update wrapt \
 && pip install -r /opt/kx/mlnotebooks/requirements.txt \
 && conda install -c anaconda graphviz \
-&& conda install -c conda-forge --file /opt/kx/ml/requirements.txt \
+&& pip install pip==9.0.1 \
+&& pip install -r /opt/kx/ml/requirements.txt \
 && conda install -c conda-forge --file /opt/kx/automl/requirements.txt \
 && conda clean -y --all \
 && python -m spacy download en \

notebooks/01 Decision Trees.ipynb

+114 -113
Large diffs are not rendered by default.

notebooks/02 Random Forests.ipynb

+44 -43
Large diffs are not rendered by default.

notebooks/03 Neural Networks.ipynb

+52 -51
Large diffs are not rendered by default.

notebooks/04 Dimensionality Reduction.ipynb

+92 -91
Large diffs are not rendered by default.

notebooks/05 Feature Engineering.ipynb

+27 -26
Large diffs are not rendered by default.

notebooks/06 Feature Extraction and Selection.ipynb

+164 -239
Large diffs are not rendered by default.

notebooks/07 Cross Validation.ipynb

+221 -98
Large diffs are not rendered by default.

notebooks/08 Natural Language Processing.ipynb

+9 -9
@@ -361,17 +361,17 @@
 ],
 "source": [
 "/ plot occurence of top terms per chapter\n",
-"plt[`:figure][`figsize pykw 20 10];\n",
+".util.plt[`:figure][`figsize pykw 20 10];\n",
 "{a:exec chapter from tab where term=x;\n",
 " b:exec occurences from tab where term=x;\n",
-" plt[`:plot][a;b];\n",
+" .util.plt[`:plot][a;b];\n",
 " }each key 10#keywords; \n",
 "\n",
-"plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
-"plt[`:ylabel]\"Occurences\";\n",
-"plt[`:xlabel]\"Chapter\";\n",
-"plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
-"plt[`:show][];"
+".util.plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
+".util.plt[`:ylabel]\"Occurences\";\n",
+".util.plt[`:xlabel]\"Chapter\";\n",
+".util.plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
+".util.plt[`:show][];"
 ]
 },
 {
@@ -1146,7 +1146,7 @@
 "source": [
 "#This table can then be used to plot a graph. The below example was rendered in Analyst for Kx, where node size represents email volume.\n",
 "\n",
-"<img src=\"../images/network.png\" />"
+"<img src=\"images/network.png\" />"
 ]
 },
 {
@@ -1466,7 +1466,7 @@
 "file_extension": ".q",
 "mimetype": "text/x-q",
 "name": "q",
-"version": "3.6.0"
+"version": "4.0"
 }
 },
 "nbformat": 4,

notebooks/09 K Nearest Neighbours.ipynb

+21 -20
Large diffs are not rendered by default.
