|
685 | 685 | "3. Computation of _feature vectors_ from a set of inputs, which will then be used as input for the decision tree" |
686 | 686 | ] |
687 | 687 | }, |
| 688 | + { |
| 689 | + "cell_type": "markdown", |
| 690 | + "metadata": {}, |
| 691 | + "source": [ |
| 692 | + "### Internal and \"Friendly\" Feature Names\n", |
| 693 | + "\n", |
| 694 | + "We use two kinds of _names_ for features:\n", |
| 695 | + "\n", |
| 696 | + "* _internal_ names have the form `<SYMBOL>@N` and refer to the `N`-th expansion of the symbol (starting with 0).\n", |
| 697 | + " In `CALC_GRAMMAR`, for instance, `<function>@0` refers to the expansion of `<function>` to `\"sqrt\"`.\n", |
| 698 | + "* _friendly_ names are more user-friendly (hence the name).\n", |
| 699 | + " The above feature `<function>@0` has the \"friendly\" name `<function> == \"sqrt\"`.\n", |
| 700 | + "\n", |
| 701 | + "We use internal names in all interactions with the machine learner, as they are unambiguous and do not contain whitespace.\n", |
| 702 | + "When showing the final results, we switch to \"friendly\" names." |
| 703 | + ] |
| 704 | + }, |
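The two naming schemes can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's implementation. It assumes a grammar is a dict that maps each symbol to a list of expansion strings. The excerpt `GRAMMAR` and the helpers `internal_name()` and `friendly_name()` are hypothetical. Only the fact that `<function>@0` expands to `"sqrt"` comes from the text above.

```python
from typing import Dict, List

# Hypothetical grammar excerpt in the style of CALC_GRAMMAR.
# Only the first expansion of <function> ("sqrt") is taken from the text;
# the remaining expansions are illustrative assumptions.
GRAMMAR: Dict[str, List[str]] = {
    "<function>": ["sqrt", "sin", "cos", "tan"],
}


def internal_name(symbol: str, n: int) -> str:
    """Internal name for the n-th (0-based) expansion of `symbol`."""
    return f"{symbol}@{n}"


def friendly_name(grammar: Dict[str, List[str]], name: str) -> str:
    """Translate an internal name such as '<function>@0' into '<function> == "sqrt"'."""
    # rpartition splits at the last '@', separating symbol and index
    symbol, _, index = name.rpartition("@")
    expansion = grammar[symbol][int(index)]
    return f'{symbol} == "{expansion}"'


# Usage: internal names contain no whitespace; friendly names are for display
name = internal_name("<function>", 0)
print(name)                          # <function>@0
print(friendly_name(GRAMMAR, name))  # <function> == "sqrt"
```

Keeping the internal name as the key and deriving the friendly name only at display time means the learner never has to handle whitespace or quoting inside feature names.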
688 | 705 | { |
689 | 706 | "cell_type": "markdown", |
690 | 707 | "metadata": {}, |
|
1003 | 1020 | "cell_type": "markdown", |
1004 | 1021 | "metadata": {}, |
1005 | 1022 | "source": [ |
1006 | | - "The `friendly` format is a bit more concise and more readable:" |
| 1023 | + "The `friendly` representation is a bit more concise and more readable:" |
1007 | 1024 | ] |
1008 | 1025 | }, |
1009 | 1026 | { |
|
3381 | 3398 | "If the predicate evaluates to `True`, follow the left path; if it evaluates to `False`, follow the right path.\n", |
3382 | 3399 | "A leaf node (no children) will give you the final decision `class = BUG` or `class = NO_BUG`.\n", |
3383 | 3400 | "\n", |
3384 | | - "So if the predicate states `<function> == 'sqrt' <= 0.5`, this means that if the function is _not_ `sqrt`, follow the left (`True`) path. If it is `sqrt`, follow the right (`False`) path.\n", |
| 3401 | + "So if the predicate states `<function> == 'sqrt' <= 0.5`, this means that\n", |
| 3402 | + "\n", |
| 3403 | + "* If the function is _not_ `sqrt`, the predicate `<function> == 'sqrt'` does not hold (its value is 0, see above), which is less than 0.5; so follow the left (`True`) path.\n", |
| 3404 | + "* If the function _is_ `sqrt`, the predicate `<function> == 'sqrt'` holds (its value is 1), which is greater than 0.5; so follow the right (`False`) path.\n", |
3385 | 3405 | "\n", |
3386 | 3406 | "The `samples` field shows the number of sample inputs that contributed to this decision.\n", |
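3387 | 3407 | "The `gini` field (aka Gini impurity) indicates how mixed the samples at this node are: 0.0 means that all of them fall into the displayed class (`BUG` or `NO_BUG`), and higher values mean that more of them belong to the other class.\n", |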
|
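The branching rule and the `gini` value can be reproduced by hand. The following is a minimal sketch, not the library's code. It assumes each feature is encoded as 1 if the expansion occurs in an input and 0 otherwise. It also assumes a node tests `feature <= 0.5`. The helpers `follow()` and `gini()` are hypothetical. Gini impurity is computed with the standard formula: 1 minus the sum of squared class proportions.

```python
from collections import Counter
from typing import Dict, List


def follow(features: Dict[str, int], feature: str, threshold: float = 0.5) -> str:
    """Return which branch a node testing `feature <= threshold` takes.

    Assumes binary features: 1 if the expansion occurs, 0 otherwise.
    """
    return "left (True)" if features[feature] <= threshold else "right (False)"


def gini(labels: List[str]) -> float:
    """Gini impurity of a node: 1 - sum over classes of (share of class)^2."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Usage: an input without sqrt goes left, an input with sqrt goes right
print(follow({"<function>@0": 0}, "<function>@0"))  # left (True)
print(follow({"<function>@0": 1}, "<function>@0"))  # right (False)

# A pure node has impurity 0.0; an evenly mixed two-class node has 0.5
print(gini(["BUG", "BUG", "BUG"]))               # 0.0
print(gini(["BUG", "BUG", "NO_BUG", "NO_BUG"]))  # 0.5
```

With only two classes, the impurity ranges from 0.0 (pure) to 0.5 (evenly split). A leaf with `gini = 0.0` is therefore one whose decision is backed by all of its `samples`.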