mlr-org
diff --git a/.Rbuildignore b/.Rbuildignore
Lines changed: 32 additions & 17 deletions
diff --git a/.agents/mlr3.md b/.agents/mlr3.md
Lines changed: 132 additions & 0 deletions
diff --git a/.claude/settings.json b/.claude/settings.json
Lines changed: 7 additions & 0 deletions
diff --git a/.cspell/project-words.txt b/.cspell/project-words.txt
Lines changed: 2 additions & 0 deletions
diff --git a/.editorconfig b/.editorconfig
Lines changed: 7 additions & 17 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 11 additions & 8 deletions
diff --git a/.lintr b/.lintr
Lines changed: 7 additions & 3 deletions
diff --git a/.vscode/settings.json b/.vscode/settings.json
Lines changed: 34 additions & 0 deletions
@@ -1,21 +1,36 @@
-^renv$
-^renv\.lock$
-^README\.Rmd$
-^README\.html$
-^LICENSE$
-.ignore
-.editorconfig
-.gitignore
-^.*\.Rproj$
+^\.agents$
+^\.ccache$
+^\.clangd$
+^\.claude$
+^\.cspell$
+^\.cursor$
+^\.editorconfig$
+^\.git$
+^\.github$
+^\.gitignore$
+^\.ignore$
+^\.lintr$
 ^\.Rproj\.user$
-^man-roxygen$
-^pkgdown$
 ^\.vscode$
-^\.lintr$
-^\.github$
-^\.ccache$
-^docs$
-^revdep$
+^.*\.Rproj$
+^AGENTS.md$
+^air.toml$
+^attic$
+^attic_local$
+^CITATION.cff$
+^CLAUDE.md$
+^cspell.json$
+^CONTRIBUTING.md$
 ^cran-comments\.md$
 ^CRAN-SUBMISSION$
-^.claude$
+^docs$
+^inst/extdata/.+\.R$
+^LICENSE$
+^local_attic$
+^man-roxygen$
+^paper$
+^pkgdown$
+^README\.Rmd$
+^README.html$
+^revdep$
+^tests/testthat/_object_snapshots$
@@ -0,0 +1,132 @@
+### Architecture
+
+This package uses R6 classes organized around a dictionary registry pattern.
+
+#### Class hierarchy
+
+- `Learner` > `LearnerClassif` / `LearnerRegr` > concrete (e.g., `LearnerClassifRpart`)
+- `Task` > `TaskSupervised` > `TaskClassif` / `TaskRegr`
+- `Measure` > `MeasureClassif` / `MeasureRegr` / `MeasureSimilarity`
+- `Resampling` > `ResamplingCV`, `ResamplingHoldout`, etc.
+- `DataBackend` > `DataBackendDataTable`, `DataBackendCbind`, etc.
+- `Prediction` > `PredictionClassif` / `PredictionRegr`
+
+#### File naming
+
+- One R6 class per file, named exactly as the class: `LearnerClassifRpart.R` contains `LearnerClassifRpart`.
+- Named dataset tasks use an underscore: `TaskClassif_iris.R`.
+- Dictionary files: `mlr_learners.R`, `mlr_tasks.R`, etc.
+
+#### Dictionary system
+
+Objects are registered in dictionaries and accessed via sugar functions:
+
+| Dictionary            | Sugar                | Example                          |
+|-----------------------|----------------------|----------------------------------|
+| `mlr_learners`        | `lrn()` / `lrns()`   | `lrn("classif.rpart", cp = 0.1)` |
+| `mlr_tasks`           | `tsk()` / `tsks()`   | `tsk("iris")`                    |
+| `mlr_measures`        | `msr()` / `msrs()`   | `msr("classif.ce")`              |
+| `mlr_resamplings`     | `rsmp()` / `rsmps()` | `rsmp("cv", folds = 5)`          |
+| `mlr_task_generators` | `tgen()` / `tgens()` | `tgen("friedman1")`              |
+
+Every new object **must** be registered at the bottom of its file:
+
+```r
+#' @include mlr_learners.R
+mlr_learners$add("classif.rpart", function() LearnerClassifRpart$new())
+```
+
+#### Collation order
+
+Derived classes must declare `#' @include ParentClass.R` in their roxygen header. This controls the `Collate:` field in DESCRIPTION so base classes load before derived classes.
+
+#### Hyperparameters (paradox)
+
+Parameters are defined with `paradox::ps()` and must be tagged `"train"` or `"predict"`:
+
+```r
+ps = ps(
+  cp = p_dbl(0, 1, default = 0.01, tags = "train"),
+  keep_model = p_lgl(default = FALSE, tags = "train")
+)
+```
+
+In `.train()` / `.predict()`, retrieve values with `self$param_set$get_values(tags = "train")`.
+
+There is a distinction between `default` and `init` values:
+- `default` describes the behavior when a parameter is not set at all (i.e., the upstream function's default). It is informational only.
+- `init` (via `p_xxx(init = ...)`) sets the parameter to a value upon construction. Use this when the mlr3 default should differ from the upstream default.
+- A parameter tagged `"required"` causes an error if not set. A required parameter cannot have a `default` (that would be contradictory).
+- paradox does type-checking and range-checking automatically; `get_values()` checks that required params are present. Additional feasibility checks are rarely needed.
+
+#### Core dependencies
+
+`data.table`, `checkmate`, `mlr3misc`, `paradox`, `R6`, and `cli` are imported wholesale. Use their functions directly without `::`. Key mlr3misc utilities: `map()`, `map_chr()`, `invoke()`, `calculate_hash()`, `str_collapse()`, `%nin%`, `%??%`.
+
+#### Error handling
+
+Use structured error/warning functions from mlr3misc: `error_config()`, `error_input()`, `error_learner_train()`, `error_learner_predict()`, `warning_config()`, `warning_input()`. These support `sprintf`-style formatting.
+
+#### Reflections
+
+`mlr_reflections` is an environment that stores allowed types, properties, and roles. Extension packages modify it to register new task types. Check it when adding new properties or feature types.
+
+### Testing
+
+- Tests for `R/{name}.R` go in `tests/testthat/test_{name}.R`.
+- All new code should have an accompanying test.
+- If there are existing tests, place new tests next to similar existing tests.
+- Strive to keep your tests minimal with few comments.
+- The full test suite takes a long time. Only run tests relevant to your changes with `devtools::test(filter = '^{name}')`.
+- New learners must pass `run_autotest()` and `run_paramtest()`.
+- Use shared assertion helpers: `expect_learner()`, `expect_task()`, `expect_resampling()`, `expect_measure()`, `expect_prediction()`.
+- Shared test infrastructure lives in `inst/testthat/` and is sourced by extension packages too.
+
+### Documentation
+
+- Every user-facing function should be exported and have roxygen2 documentation.
+- Wrap roxygen comments at 120 characters.
+- Write one sentence per line.
+- If a sentence exceeds the limit, break at a comma, "and", "or", "but", or other appropriate point.
+- Internal functions should not have roxygen documentation.
+- Whenever you add a new (non-internal) documentation topic, also add the topic to `_pkgdown.yml`.
+- Always re-document the package after changing a roxygen2 comment.
+- Use `pkgdown::check_pkgdown()` to check that all topics are included in the reference index.
+- Don’t hand-edit generated artifacts such as `man/` or `NAMESPACE`.
+- Roxygen templates live in `man-roxygen/` (e.g., `@template learner`, `@template param_id`). Use `@templateVar` to pass values.
+- Bibliographic references go in `R/bibentries.R` and are cited with `` `r format_bib("key")` ``.
+- Man page names for dictionary objects follow `mlr_learners_classif.rpart`, `mlr_tasks_iris`, etc.
+- When you write examples, make sure they work.
+
+### `NEWS.md`
+
+- Every user-facing change should be given a bullet in `NEWS.md`. Do not add bullets for small documentation changes or internal refactorings.
+- Each bullet should briefly describe the change to the end user and mention the related issue in parentheses.
+- A bullet can consist of multiple sentences but should not contain any new lines (i.e. DO NOT line wrap).
+- If the change is related to a function, put the name of the function early in the bullet.
+- Order bullets alphabetically by function name. Put all bullets that don't mention function names at the beginning.
+
+### GitHub
+
+- If you use `gh` to retrieve information about an issue, always use `--comments` to read all the comments.
+
+### Writing
+
+- Use sentence case for headings.
+- Use US English.
+
+### Proofreading
+
+If the user asks you to proofread a file, act as an expert proofreader and editor with a deep understanding of clear, engaging, and well-structured writing.
+
+Work paragraph by paragraph, always starting by making a TODO list that includes individual items for each top-level heading.
+
+Fix spelling, grammar, and other minor problems without asking the user. Label any unclear, confusing, or ambiguous sentences with a FIXME comment.
+
+Only report what you have changed.
+
+### References
+
+- [mlr3book](https://mlr3book.mlr-org.com/) — comprehensive guide to the mlr3 ecosystem.
+- [mlr3misc](https://github.com/mlr-org/mlr3misc) — helper functions used throughout the codebase.
+- [paradox](https://github.com/mlr-org/paradox) — hyperparameter/configuration space definitions.
@@ -0,0 +1,7 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(gh run view:*)"
+    ]
+  }
+}
@@ -0,0 +1,2 @@
+# Project-specific words — commit and share with the team.
+# Add words here (or via "Add to project dictionary" in VS Code / Cursor).
@@ -1,21 +1,11 @@
 # See http://editorconfig.org
 root = true
 
+# settings for all files
 [*]
-charset = utf-8
-end_of_line = lf
-insert_final_newline = true
-indent_style = space
-trim_trailing_whitespace = true
-
-[*.{r,R,md,Rmd}]
-indent_size = 2
-
-[*.{c,h}]
-indent_size = 4
-
-[*.{cpp,hpp}]
-indent_size = 4
-
-[{NEWS.md,DESCRIPTION,LICENSE}]
-max_line_length = 80
+charset = utf-8                            # Ensure all files are saved in UTF-8 encoding
+end_of_line = lf                           # Use LF line endings (Unix style)
+indent_style = space                       # Use spaces for indentation
+indent_size = 2                            # always use 2 spaces for indentation, R, C, python, etc.
+max_line_length = 120                      # max line length
+trim_trailing_whitespace = true            # Remove trailing whitespace
@@ -1,6 +1,6 @@
 # File created using '.gitignore Generator' for Visual Studio Code: https://bit.ly/vscode-gig
-# Created by https://www.toptal.com/developers/gitignore/api/windows,visualstudiocode,r,macos,linux
-# Edit at https://www.toptal.com/developers/gitignore?templates=windows,visualstudiocode,r,macos,linux
+# Created by https://www.toptal.com/developers/gitignore/api/windows,visualstudiocode,macos,linux,r
+# Edit at https://www.toptal.com/developers/gitignore?templates=windows,visualstudiocode,macos,linux,r
 
 ### Linux ###
 *~
@@ -150,13 +150,17 @@ $RECYCLE.BIN/
 # Windows shortcuts
 *.lnk
 
-# End of https://www.toptal.com/developers/gitignore/api/windows,visualstudiocode,r,macos,linux
+# End of https://www.toptal.com/developers/gitignore/api/windows,visualstudiocode,macos,linux,r
 
 # Custom rules (everything added below won't be overriden by 'Generate .gitignore File' if you use 'Update' option)
 
 # R
 .Rprofile
 README.html
+src/*.o
+src/*.so
+src/*.dll
+.clangd
 
 # CRAN
 cran-comments.md
@@ -170,10 +174,9 @@ docs/
 renv/
 renv.lock
 
-# vscode
-.vscode
-
 # revdep
 revdep/
-check/*
-.claude/
+
+# AI
+.claude/settings.local.json
+CLAUDE.md
@@ -1,9 +1,13 @@
 linters: linters_with_defaults(
-    # lintr defaults: https://github.com/jimhester/lintr#available-linters
+    # lintr defaults: https://lintr.r-lib.org/reference/default_linters.html
     # the following setup changes/removes certain linters
     assignment_linter = NULL, # do not force using <- for assignments
-    object_name_linter = object_name_linter(c("snake_case", "CamelCase")), # only allow snake case and camel case object names
+    object_name_linter = object_name_linter(c("snake_case", "CamelCase", "SNAKE_CASE")), # only allow snake_case, CamelCase, and SNAKE_CASE object names
     cyclocomp_linter = NULL, # do not check function complexity
     commented_code_linter = NULL, # allow code in comments
-    line_length_linter = line_length_linter(2000)
+    line_length_linter = line_length_linter(120L), # same as .editorconfig
+    # use indent=2 as in .editorconfig; also use block-aligned continuation with 2 space,
+    # not “align under first argument” style.
+    indentation_linter = indentation_linter(indent = 2L, hanging_indent_style = "never")
     )
+
@@ -0,0 +1,34 @@
+{
+
+  // ********** settings git / gitlens **********
+
+  // disable "blame hover", to remove visual noise
+  "gitlens.currentLine.enabled": false,
+
+  // ********** settings for cspell *************
+  // show spelling errors as hints (not in problems panel)
+  "cSpell.diagnosticLevel": "Hint",
+  // file type whitelist, useGitignore, and languageSettings live in cspell.json
+
+  // ********** settings for R *************
+
+  // format on save so we don't have to format manually; use Air for formatting
+  "[r]": {
+    "editor.formatOnSave": true,
+    "editor.defaultFormatter": "Posit.air-vscode",
+    // disable hover for R, to remove visual noise
+    "editor.hover.enabled": false
+  },
+
+  // ********** settings for C / C++ **********
+
+  "[c]": {
+    "editor.formatOnSave": true,
+    "editor.defaultFormatter": "llvm-vs-code-extensions.vscode-clangd"
+  },
+  "[cpp]": {
+    "editor.formatOnSave": true,
+    "editor.defaultFormatter": "llvm-vs-code-extensions.vscode-clangd"
+  }
+}
+
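The dictionary and hyperparameter conventions documented in `.agents/mlr3.md` can be tried out in a short R session. The following is a minimal sketch, assuming mlr3 and paradox (1.0.0 or later, for `init`) are installed. The variable names and the `xval` parameter are illustrative and are not part of this diff.

```r
library(mlr3)
library(paradox)

# Sugar functions look objects up in their dictionaries.
task = tsk("iris")
learner = lrn("classif.rpart", cp = 0.1)
resampling = rsmp("cv", folds = 5)
measure = msr("classif.ce")

# The sugar call is shorthand for a lookup in the dictionary itself.
same_learner = mlr_learners$get("classif.rpart")

# Values tagged "train" are what a learner's .train() method would retrieve.
train_values = learner$param_set$get_values(tags = "train")

# A stand-alone parameter set in the style shown in the guide.
# `xval` is an illustrative addition: `default` only documents the upstream
# default, while `init` sets the value at construction.
param_set = ps(
  cp = p_dbl(0, 1, default = 0.01, tags = "train"),
  keep_model = p_lgl(default = FALSE, tags = "train"),
  xval = p_int(0L, default = 10L, init = 0L, tags = "train")
)

# Right after construction only `xval` carries a value;
# `cp` and `keep_model` stay unset despite having a `default`.
initial_values = param_set$values

# Run 5-fold cross-validation and aggregate the classification error.
rr = resample(task, learner, resampling)
score = rr$aggregate(measure)
```

Because `default` is informational only, `initial_values` contains no entry for `cp` or `keep_model`, which is the distinction the guide draws between `default` and `init`.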