### Architecture

This package uses R6 classes organized around a dictionary registry pattern.

#### Class hierarchy

- `Learner` > `LearnerClassif` / `LearnerRegr` > concrete learners (e.g., `LearnerClassifRpart`)
- `Task` > `TaskSupervised` > `TaskClassif` / `TaskRegr`
- `Measure` > `MeasureClassif` / `MeasureRegr` / `MeasureSimilarity`
- `Resampling` > `ResamplingCV`, `ResamplingHoldout`, etc.
- `DataBackend` > `DataBackendDataTable`, `DataBackendCbind`, etc.
- `Prediction` > `PredictionClassif` / `PredictionRegr`
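
These are ordinary R6 inheritance chains, so `class()` and `inherits()` reveal them at runtime. A minimal sketch, assuming mlr3 is installed (the `rpart` package behind `classif.rpart` ships with standard R installations):

```r
library(mlr3)

learner = lrn("classif.rpart")
task = tsk("iris")
resampling = rsmp("cv", folds = 3)

# class() lists the concrete class first, followed by its ancestors
print(class(learner))
print(class(task))

# inherits() checks for a class anywhere in the chain
stopifnot(
  inherits(learner, "LearnerClassif"),
  inherits(task, "TaskSupervised"),
  inherits(resampling, "Resampling")
)
```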

#### File naming

- One R6 class per file, named exactly after the class: `LearnerClassifRpart.R` contains `LearnerClassifRpart`.
- Files for predefined dataset tasks join the class name and the dataset name with an underscore: `TaskClassif_iris.R`.
- Dictionary files: `mlr_learners.R`, `mlr_tasks.R`, etc.

#### Dictionary system

Objects are registered in dictionaries and accessed via sugar functions:

| Dictionary            | Sugar                | Example                          |
|-----------------------|----------------------|----------------------------------|
| `mlr_learners`        | `lrn()` / `lrns()`   | `lrn("classif.rpart", cp = 0.1)` |
| `mlr_tasks`           | `tsk()` / `tsks()`   | `tsk("iris")`                    |
| `mlr_measures`        | `msr()` / `msrs()`   | `msr("classif.ce")`              |
| `mlr_resamplings`     | `rsmp()` / `rsmps()` | `rsmp("cv", folds = 5)`          |
| `mlr_task_generators` | `tgen()` / `tgens()` | `tgen("friedman1")`              |
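
The sugar functions are thin wrappers around the dictionaries. A sketch of equivalent lookups, using only keys from the table above:

```r
library(mlr3)

# Sugar function: look up the key and set hyperparameters in one call
learner = lrn("classif.rpart", cp = 0.1)

# Equivalent lookup through the dictionary object, then set the value by hand
learner2 = mlr_learners$get("classif.rpart")
learner2$param_set$values$cp = 0.1

# The plural forms take several keys and return a list of objects
measures = msrs(c("classif.ce", "classif.acc"))

# Dictionaries can also be queried for their registered keys
stopifnot(mlr_learners$has("classif.rpart"))
head(mlr_learners$keys())
```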

Every new object **must** be registered at the bottom of its file:

```r
#' @include mlr_learners.R
mlr_learners$add("classif.rpart", function() LearnerClassifRpart$new())
```

#### Collation order

Derived classes must declare `#' @include ParentClass.R` in their roxygen header. These directives determine the `Collate:` field that roxygen2 writes to `DESCRIPTION`, so base classes load before derived classes.
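
Both `@include` rules fit together in a single file. The sketch below is a hypothetical `R/LearnerClassifMajority.R`: the class, the key `classif.majority`, and its majority-class logic are invented for illustration, and the `super$initialize()` arguments reflect our reading of `LearnerClassif` rather than a checked template. The roxygen lines are plain comments at runtime, so the file also runs as a script once mlr3 is attached.

```r
library(mlr3)
library(paradox)
library(R6)

#' @title Majority Class Learner
#'
#' @name mlr_learners_classif.majority
#' @include LearnerClassif.R
#' @export
LearnerClassifMajority = R6Class("LearnerClassifMajority",
  inherit = LearnerClassif,
  public = list(
    initialize = function() {
      super$initialize(
        id = "classif.majority",
        param_set = ps(),
        predict_types = "response",
        feature_types = unname(mlr_reflections$task_feature_types),
        properties = c("twoclass", "multiclass")
      )
    }
  ),
  private = list(
    # The fitted "model" is simply the most frequent class label
    .train = function(task) {
      counts = table(task$truth())
      names(counts)[which.max(counts)]
    },
    # Predict the stored label for every row, as a factor with the task's levels
    .predict = function(task) {
      list(response = factor(rep(self$model, task$nrow), levels = task$class_names))
    }
  )
)

#' @include mlr_learners.R
mlr_learners$add("classif.majority", function() LearnerClassifMajority$new())

# Once registered, the sugar function finds the new learner
learner = lrn("classif.majority")
learner$train(tsk("iris"))
prediction = learner$predict(tsk("iris"))
```

The header `@include` orders this file after its parent class, while the one above `mlr_learners$add()` orders it after the dictionary it registers into.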

#### Hyperparameters (paradox)

Parameters are defined with `paradox::ps()`, and each parameter must be tagged `"train"` or `"predict"`:

```r
param_set = ps(
  cp = p_dbl(0, 1, default = 0.01, tags = "train"),
  keep_model = p_lgl(default = FALSE, tags = "train")
)
```

In `.train()`, retrieve values with `self$param_set$get_values(tags = "train")`; in `.predict()`, use `tags = "predict"` instead.
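
To see which values each method receives, query an existing learner's parameter set directly. A sketch using the built-in `classif.rpart` learner; inside a learner the same call is made on `self$param_set`, and the result is often forwarded to the upstream function (for example with mlr3misc's `invoke()`).

```r
library(mlr3)

# Hyperparameters passed to lrn() end up in learner$param_set$values
learner = lrn("classif.rpart", cp = 0.05)

# What .train() would see: all set values tagged "train"
train_values = learner$param_set$get_values(tags = "train")
str(train_values)

# What .predict() would see: all set values tagged "predict"
predict_values = learner$param_set$get_values(tags = "predict")
str(predict_values)
```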

There is a distinction between `default` and `init` values:

- `default` describes the behavior when a parameter is not set at all (i.e., the upstream function's default). It is informational only.
- `init` (via `p_xxx(init = ...)`) sets the parameter to a value upon construction. Use this when the mlr3 default should differ from the upstream default.
- A parameter tagged `"required"` causes an error if it is not set. A required parameter cannot have a `default`, since that would be contradictory.
- paradox performs type and range checks automatically, and `get_values()` checks that required parameters are present. Additional feasibility checks are rarely needed.
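
The rules above can be checked with paradox alone. This sketch assumes a paradox version whose domain constructors (such as `p_int()`) accept `init`, as described above; the `xval` and `threshold` parameters are illustrations only.

```r
library(paradox)

param_set = ps(
  # default: documents the upstream default; the parameter stays unset
  cp = p_dbl(0, 1, default = 0.01, tags = "train"),
  # init: replaces the upstream default (10) with 0 at construction
  xval = p_int(0L, default = 10L, init = 0L, tags = "train")
)

stopifnot(is.null(param_set$values$cp))  # default does not set a value
stopifnot(param_set$values$xval == 0L)   # init does
stopifnot(param_set$default$cp == 0.01)  # the default is still recorded

# Range checks happen automatically when values are assigned
status = tryCatch({
  param_set$values$cp = 2
  "accepted"
}, error = function(e) "rejected")

# A required parameter makes get_values() fail until it is set
required_set = ps(threshold = p_dbl(0, 1, tags = c("predict", "required")))
missing_status = tryCatch(required_set$get_values(), error = function(e) "missing")

required_set$values$threshold = 0.5
predict_values = required_set$get_values(tags = "predict")
```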

#### Core dependencies

`data.table`, `checkmate`, `mlr3misc`, `paradox`, `R6`, and `cli` are imported wholesale. Use their functions directly without `::`. Key mlr3misc utilities: `map()`, `map_chr()`, `invoke()`, `calculate_hash()`, `str_collapse()`, `%nin%`, `%??%`.
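
A short tour of the mlr3misc helpers named above; the input records are made up for illustration.

```r
library(mlr3misc)

records = list(list(id = "a"), list(id = "b"))

# map_chr(): like lapply(), but returns a character vector
ids = map_chr(records, function(x) x$id)

# %??%: use the right-hand side if the left-hand side is NULL
label = NULL %??% "unnamed"

# %nin%: negation of %in%
is_new = "c" %nin% ids

# str_collapse(): join a vector into a single string, e.g., for messages
msg = str_collapse(ids)

# invoke(): call a function with extra arguments supplied as a list
total = invoke(sum, 1, .args = list(2, 3))

# calculate_hash(): hash arbitrary R objects, e.g., as a cache key
hash = calculate_hash(ids, total)
```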

#### Error handling

Use structured error/warning functions from mlr3misc: `error_config()`, `error_input()`, `error_learner_train()`, `error_learner_predict()`, `warning_config()`, `warning_input()`. These support `sprintf`-style formatting.
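
A hypothetical input check using one of these helpers. `check_folds()` is invented for illustration, and the sketch assumes that `error_input()` takes a `sprintf`-style template followed by its arguments and signals an error, per the description above.

```r
library(mlr3misc)

# Hypothetical validation helper for a resampling-like setting
check_folds = function(folds, n) {
  if (folds > n) {
    # Assumed call shape: sprintf-style template, then its arguments
    error_input("Cannot create %i folds from %i observations.", folds, n)
  }
  invisible(folds)
}

result = tryCatch(check_folds(10L, 5L), error = function(e) e)
print(conditionMessage(result))
```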

#### Reflections

`mlr_reflections` is an environment that stores allowed types, properties, and roles. Extension packages modify it to register new task types. Check it when adding new properties or feature types.
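
Before adding a property or feature type, inspect what is already registered. A sketch of a few entries; the entry names used here (`task_types`, `task_feature_types`, `learner_properties`) are the ones we expect in current mlr3 versions, and `ls(mlr_reflections)` shows the full list.

```r
library(mlr3)

# mlr_reflections is an environment; ls() lists what it stores
head(ls(mlr_reflections))

# Registered task types (extension packages add rows here)
task_types = mlr_reflections$task_types$type

# Allowed feature types and known classification learner properties
feature_types = mlr_reflections$task_feature_types
classif_properties = mlr_reflections$learner_properties$classif
print(classif_properties)
```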

### Testing

- Tests for `R/{name}.R` go in `tests/testthat/test_{name}.R`.
- All new code should have an accompanying test.
- If there are existing tests, place new tests next to similar existing tests.
- Strive to keep your tests minimal with few comments.
- The full test suite takes a long time. Only run tests relevant to your changes with `devtools::test(filter = '^{name}')`.
- New learners must pass `run_autotest()` and `run_paramtest()`.
- Use shared assertion helpers: `expect_learner()`, `expect_task()`, `expect_resampling()`, `expect_measure()`, `expect_prediction()`.
- Shared test infrastructure lives in `inst/testthat/` and is sourced by extension packages too.
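
A minimal learner test might look like the sketch below. Outside of `devtools::test()` the shared helpers are not loaded, so the sketch first sources the helper files that mlr3 installs from `inst/testthat/`, the way extension packages do; the `helper*.R` file pattern and the list of attached packages are assumptions about what those helpers need.

```r
library(mlr3)
library(mlr3misc)
library(checkmate)
library(data.table)
library(testthat)

# Load mlr3's shared test helpers (run_autotest(), expect_learner(), ...)
helper_files = list.files(
  system.file("testthat", package = "mlr3"),
  pattern = "^helper.*\\.[rR]$",
  full.names = TRUE
)
invisible(lapply(helper_files, source))

test_that("classif.rpart passes the autotest", {
  learner = lrn("classif.rpart")
  expect_learner(learner)
  # Expected to return TRUE on success; on failure, result$error explains why
  result = run_autotest(learner)
  expect_true(result, info = result$error)
})
```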

### Documentation

- Every user-facing function should be exported and have roxygen2 documentation.
- Wrap roxygen comments at 120 characters.
- Write one sentence per line.
- If a sentence exceeds the limit, break at a comma, "and", "or", "but", or another appropriate point.
- Internal functions should not have roxygen documentation.
- Whenever you add a new (non-internal) documentation topic, also add the topic to `_pkgdown.yml`.
- Always re-document the package after changing a roxygen2 comment.
- Use `pkgdown::check_pkgdown()` to check that all topics are included in the reference index.
- Don't hand-edit generated artifacts such as `man/` or `NAMESPACE`.
- Roxygen templates live in `man-roxygen/` (e.g., `@template learner`, `@template param_id`). Use `@templateVar` to pass values.
- Bibliographic references go in `R/bibentries.R` and are cited with `` `r format_bib("key")` ``.
- Man page names for dictionary objects follow `mlr_learners_classif.rpart`, `mlr_tasks_iris`, etc.
- When you write examples, make sure they work.
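
A roxygen block that follows these rules might look like this. `count_missings()` is a hypothetical helper invented for illustration; it relies on `Task$missings()`, and the bracketed `[Task]` is roxygen2's markdown link syntax.

```r
library(mlr3)

#' @title Count Missing Values per Feature
#'
#' @description
#' Counts the missing values in each feature column of a [Task].
#' Target columns are not included.
#'
#' @param task ([Task])\cr
#'   Task to inspect.
#'
#' @return Named vector with one count per feature.
#' @export
#' @examples
#' count_missings(tsk("iris"))
count_missings = function(task) {
  task$missings(task$feature_names)
}

counts = count_missings(tsk("iris"))
print(counts)
```

Because `R CMD check` runs the `@examples` section, the example doubles as a smoke test.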

### `NEWS.md`

- Every user-facing change should be given a bullet in `NEWS.md`. Do not add bullets for small documentation changes or internal refactorings.
- Each bullet should briefly describe the change to the end user and mention the related issue in parentheses.
- A bullet can consist of multiple sentences but should not contain any line breaks (i.e., DO NOT line wrap).
- If the change is related to a function, put the name of the function early in the bullet.
- Order bullets alphabetically by function name. Put all bullets that don't mention function names at the beginning.

### GitHub

- If you use `gh` to retrieve information about an issue, always use `--comments` to read all the comments.

### Writing

- Use sentence case for headings.
- Use US English.

### Proofreading

If the user asks you to proofread a file, act as an expert proofreader and editor with a deep understanding of clear, engaging, and well-structured writing.

Work paragraph by paragraph, always starting by making a TODO list that includes individual items for each top-level heading.

Fix spelling, grammar, and other minor problems without asking the user. Label any unclear, confusing, or ambiguous sentences with a FIXME comment.

Only report what you have changed.

### References

- [mlr3book](https://mlr3book.mlr-org.com/): comprehensive guide to the mlr3 ecosystem.
- [mlr3misc](https://github.com/mlr-org/mlr3misc): helper functions used throughout the codebase.
- [paradox](https://github.com/mlr-org/paradox): hyperparameter/configuration space definitions.