|
813 | 813 | "source": [
|
814 | 814 | "Although we tried to choose default model parameters that work well in a wide range of scenarios, hyperparameter search will often find an emulator model with a better fit. Internally, `AutoEmulate` compares the performance of different models and hyperparameters using cross-validation on the training data, which can be computationally expensive and time-consuming for larger datasets. To speed it up, we can parallelise the process with `n_jobs`.\n",
|
815 | 815 | "\n",
|
816 |
| - "For each model, we've pre-defined a search space for hyperparameters. When setting up `AutoEmulate` with `param_search=True`, we default to using random search with `param_search_iters = 20` iterations. We plan to add other hyperparameter search methods in the future. \n", |
| 816 | + "For each model, we've pre-defined a search space for hyperparameters. When setting up `AutoEmulate` with `param_search=True`, we default to using random search with `param_search_iters = 20` iterations. This means that 20 hyperparameter combinations from the search space are sampled and evaluated. We plan to add other hyperparameter search methods in the future. \n", |
817 | 817 | "\n",
|
818 |
| - "Let's do a hyperparameter search for the Gaussian Process and Random Forest models." |
| 818 | + "Let's do a hyperparameter search for the Support Vector Machines and Random Forest models." |
819 | 819 | ]
|
820 | 820 | },
|
821 | 821 | {
|
|
1352 | 1352 | ],
|
1353 | 1353 | "source": [
|
1354 | 1354 | "em = AutoEmulate()\n",
|
1355 |
| - "em.setup(X, y, param_search=True, param_search_type=\"random\", param_search_iters=20, models=[\"GaussianProcess\", \"RandomForest\"], n_jobs=-2) # n_jobs=-2 uses all cores but one\n", |
| 1355 | + "em.setup(X, y, param_search=True, param_search_type=\"random\", param_search_iters=10, models=[\"SupportVectorMachines\", \"RandomForest\"], n_jobs=-2) # n_jobs=-2 uses all cores but one\n", |
1356 | 1356 | "em.compare()"
|
1357 | 1357 | ]
|
1358 | 1358 | },
|
|
1427 | 1427 | "metadata": {},
|
1428 | 1428 | "source": [
|
1429 | 1429 | "**Notes**: \n",
|
1430 |
| - "* Some models, such as `GaussianProcess` can be slow to run hyperparameter search on larger datasets (say n > 1500). \n", |
| 1430 | + "* Some models, such as `GaussianProcess`, can be slow when conducting hyperparameter search on larger datasets (say n > 1000). \n", |
1431 | 1431 | "* Use the `models` argument to only run hyperparameter search on a subset of models to speed up the process.\n",
|
1432 | 1432 | "* When possible, use `n_jobs` to parallelise the hyperparameter search. With larger datasets, we recommend setting `param_search_iters` to a lower number, such as 5, to see how long it takes to run and then increase it if necessary.\n",
|
1433 | 1433 | "* All models can also be specified by their short names, such as `rf` for `RandomForest`, `gp` for `GaussianProcess`, `svm` for `SupportVectorMachines`, etc."
|
|
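For readers unfamiliar with random search, the idea behind `param_search_iters` and `n_jobs` can be sketched with scikit-learn's `RandomizedSearchCV`. This is only an illustration: the diff does not show how `AutoEmulate` implements its search, and the toy data, the search space, and the 5-fold cross-validation below are assumptions made for the example, not the library's pre-defined settings.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy data standing in for simulator inputs and outputs (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# Hypothetical search space; AutoEmulate ships its own per-model spaces.
param_distributions = {
    "n_estimators": randint(20, 100),
    "max_depth": randint(2, 10),
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,       # number of sampled combinations, like param_search_iters=10
    cv=5,            # each combination is scored with 5-fold cross-validation
    scoring="r2",
    n_jobs=-2,       # all cores but one, same convention as in the notebook cell
    random_state=0,
)
search.fit(X, y)

# One entry per sampled hyperparameter combination.
n_candidates = len(search.cv_results_["params"])
print(n_candidates)
print(search.best_params_)
```

Each of the 10 sampled combinations is fitted once per fold, so the cost grows with `n_iter` times the number of folds. This is why the notes suggest starting with a small number of iterations on larger datasets.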