NVIDIA · realAsma · May 14, 2026 · May 14, 2026 · cjluo-nv · May 13, 2026
diff --git a/.agents/README.md b/.agents/README.md
@@ -0,0 +1,38 @@
+# Agent Instructions for ModelOpt
+
+These instructions apply to AI-assisted work in this repository.
+
+## Repository orientation
+
+- Start with `README.md` for project overview and install.
+- Use `modelopt/` for source, `tests/` for focused test coverage, and
+  `examples/` or `docs/` for usage patterns.
+
+## Coding guidelines
+
+- **Coding guide:** Code development and review require reading and following
+  `.agents/developer-guidelines.md`; do not skip this step.
+
+## Iterative development
+
+- **Running tests:** Follow the
+  [writing and running tests](../CONTRIBUTING.md#-writing-and-running-tests)
+  instructions. For fast initial iteration, choose focused tests for the
+  changed area from `tests/`.
+- **Running pre-commit:** Follow the
+  [pre-commit hook instructions](../CONTRIBUTING.md#pre-commit-hooks). Hooks may
+  modify files; review and re-stage those changes before committing.
+- **Signed commit:** Use `git commit -s -S -m "<message>"` for commits so they
+  follow the [signing your work](../CONTRIBUTING.md#-signing-your-work)
+  requirements.
+- **Never `git push` without explicit approval in the current turn.** Committing
+  locally is fine; publishing to a remote is not.
+- After `git commit`, stop and wait for the user to say "push", "publish",
+  "ship", or equivalent before running `git push`, `gh pr create`, or any
+  push-option flags like `-o merge_request.create`.
+
+## Contributing and PR readiness
+
+- Before opening or marking a PR ready for review, read the
+  [submitting your code](../CONTRIBUTING.md#submitting-your-code) guidance.
+- Read `.github/PULL_REQUEST_TEMPLATE.md` and satisfy the checklist.
diff --git a/.agents/TOOLING.md b/.agents/TOOLING.md
@@ -0,0 +1,18 @@
+# Agent Tooling Notes
+
+These notes are for humans maintaining repository agent setup. They are not part
+of the always-loaded agent instructions.
+
+## Shared Instructions
+
+Update `.agents/README.md` for repository-wide agent instructions. The root
+`AGENTS.md` and `CLAUDE.md` files are symlinked to `.agents/README.md`, so
+changes there apply to both Codex and Claude Code.
+
+## Local Overrides
+
+For private local instructions, use the tool-specific override file:
+
+- Claude Code: `CLAUDE.local.md` is additive; it is read after `CLAUDE.md`.
+- Codex: `AGENTS.override.md` replaces `AGENTS.md` in the same directory, so it
+  is not additive. Restate any shared instructions that should still apply.
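The additive versus replacing behavior described in `.agents/TOOLING.md` can be modeled with a small Python sketch. This is only an illustration of the semantics stated above. The function `effective_instructions` and the `"claude"` / `"codex"` keys are hypothetical, neither tool exposes such a function, and the real lookup logic in each tool may involve more rules than are shown here.

```python
import tempfile
from pathlib import Path


def effective_instructions(tool: str, directory: Path) -> str:
    """Hypothetical model of which instruction text applies in one directory."""
    if tool == "claude":
        # Additive: the local file is read after the shared file.
        names = ["CLAUDE.md", "CLAUDE.local.md"]
        return "".join(
            (directory / name).read_text()
            for name in names
            if (directory / name).exists()
        )
    if tool == "codex":
        # Replacing: an override file hides AGENTS.md in the same directory.
        override = directory / "AGENTS.override.md"
        chosen = override if override.exists() else directory / "AGENTS.md"
        return chosen.read_text() if chosen.exists() else ""
    raise ValueError(f"unknown tool: {tool}")


# Build a throwaway directory containing all four files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("shared\n")
    (root / "CLAUDE.local.md").write_text("private\n")
    (root / "AGENTS.md").write_text("shared\n")
    (root / "AGENTS.override.md").write_text("private\n")
    claude_text = effective_instructions("claude", root)
    codex_text = effective_instructions("codex", root)

print(repr(claude_text))  # 'shared\nprivate\n'
print(repr(codex_text))   # 'private\n'
```

In this model the shared text survives for Claude Code but not for Codex, which is why the notes ask Codex users to restate any shared instructions in the override file.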
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
@@ -0,0 +1,79 @@
+# Coding Principles
+
+Guidelines for production code in ModelOpt. Key values: simplicity, minimalism,
+and elegance.
+
+## Principles
+
+- **Be surgical.** Touch the code required to solve the actual problem, whether
+  that is one line or a broader design change. Avoid speculative refactors,
+  drive-by cleanup, unrelated rewrites, and half-finished implementations.
+- **Fix root causes.** Prefer the right fix over the most local patch. Do not
+  paper over symptoms with temporary fixes unless the temporary nature and
+  follow-up are explicit.
+- **Design for simplicity.** Choose the design that keeps code easiest to read
+  and change. Put behavior at the right level, tie extensibility to known needs,
+  and treat heavy branching or conditional logic as a design smell.
+- **Respect ownership.** Keep behavior in the layer that owns it. Parent
+  abstractions should contain shared contracts and shared behavior, not
+  child-specific special cases.
+- **Keep one source of truth.** Put shared behavior, configuration, constants,
+  validation, and documentation in the single place that owns them. Reuse
+  existing helpers and shared APIs instead of copying logic or duplicating
+  state.
+- **Abstract to simplify.** Use helpers, base classes, registries, adapters,
+  plugins, or extension points when they remove real duplication, clarify
+  ownership, support current variation, or make call sites simpler. Do not add
+  abstractions for speculative future cases.
+- **Make code readable at the point of use.** Names, types, and structure should
+  make intent clear. Keep high-level orchestration clear, move low-level
+  mechanics into well-named helpers when helpful, and put critical code before
+  helper details when local conventions allow it.
+- **Comment cautiously.** Code should be clear and be the source of truth
+  for what happens, how it happens, and why; use comments only when the why is
+  not obvious from the code. First ask whether better names, clearer structure,
+  or simpler code can explain the intent without a comment. (Apply this guidance
+  to new comments only; do not rewrite or delete existing comments.)
+- **Scale documentation to the API.** Higher-level and user-visible APIs deserve
+  useful docstrings, including examples when helpful. Lower-level internals need
+  docstrings only when names, types, and structure are not enough.
+- **Validate at boundaries.** Check user input, files, network responses, and
+  external API results at the edge. Keep internal code simple by trusting types
+  and invariants instead of repeatedly checking for impossible states.
+- **Remove touched dead code.** Delete unused imports, unreachable branches,
+  obsolete placeholders, stale TODOs, and debug code when they are part of the
+  behavior you are already touching.
+- **Use workspace-relative paths.** Use relative paths in commands and file
+  references unless an absolute path is needed to disambiguate.
+
+## Testing
+
+- **Develop with focused tests.** During development, write as many focused
+  tests as needed, including lower-level unit tests or internal probes, to
+  understand and harden behavior.
+- **Curate production tests and keep them lean.** Before staging or committing,
+  decide which tests should be checked in. Checked-in tests should document
+  expected behavior, protect against regressions, or flag backward-incompatible
+  behavior changes. Remove redundant lower-level tests when a higher-level test
+  already covers the same behavior, keeping CI/CD fast and lean.
+
+## Performant AI Code
+
+- **Avoid stray CPU-GPU syncs.** Tensor metadata such as `tensor.shape` is safe
+  to read, but scalar extraction or CPU transfers such as `tensor.item()`,
+  `float(tensor)`, `bool(tensor)`, `tensor.cpu()`, `tensor.numpy()`, etc. can
+  force CPU-GPU synchronization. Keep computation on GPU unless the CPU actually
+  needs the value.
+- **Use rank-aware logging.** Default to `print_rank_0` instead of `print` and
+  `warn_rank_0` instead of generic warnings. Use per-rank output only when each
+  process needs to report distinct state. Generic prints and warnings clog
+  distributed logs.
+- **Respect distributed invariants.** Avoid hidden synchronization, global state,
+  per-rank file races, or assumptions that only hold on a single process.
+
+## Compatibility
+
+- **Preserve config and checkpoint compatibility.** Treat ModelOpt config schemas
+  and checkpoint formats as persisted contracts. When changing configs such as
+  `QuantizeConfig`, maintain backward compatibility with previous ModelOpt
+  checkpoints unless a breaking change is explicit and intentionally handled.
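One way to read the compatibility guideline above is sketched below in Python. This is an assumption-laden illustration, not ModelOpt code: `ToyQuantConfig`, its fields, and `load_config` are hypothetical stand-ins and do not reflect the real `QuantizeConfig` schema. The sketch shows a field added later with a default that reproduces the earlier behavior, so a config saved before the field existed still loads, while unknown keys are rejected at the loading boundary.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ToyQuantConfig:
    """Hypothetical persisted config; not ModelOpt's QuantizeConfig."""

    num_bits: int = 8
    axis: Optional[int] = None
    # Imagined later addition. The default reproduces the earlier behavior,
    # so configs saved before this field existed still load unchanged.
    block_size: Optional[int] = None


def load_config(saved: Dict[str, Any]) -> ToyQuantConfig:
    """Validate the saved dictionary at the boundary, then build the config."""
    known = {f.name for f in fields(ToyQuantConfig)}
    unknown = sorted(set(saved) - known)
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    return ToyQuantConfig(**saved)


# A dictionary as it might have been written before block_size existed.
old_saved = {"num_bits": 4, "axis": 0}
config = load_config(old_saved)
print(config)  # ToyQuantConfig(num_bits=4, axis=0, block_size=None)
```

Removing or renaming a field would break this load path, which is the kind of change the guideline asks to make explicit and handle intentionally.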
diff --git a/.gitignore b/.gitignore
@@ -61,6 +61,8 @@ venv/
 
 # Ignore claude local settings
 .claude/settings.local.json
+CLAUDE.local.md
+AGENTS.override.md
 
 # Ignore SonarQube analysis
 .sonar/
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1 @@
+.agents/README.md
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1 @@
+.agents/README.md
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -79,7 +79,7 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c
 
 See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format.
 
-## 📝 Writing tests
+## 📝 Writing and running tests
 
 We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
 
@@ -89,7 +89,17 @@ We use [pytest](https://docs.pytest.org/) for all tests. For any new features /
 - `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run.
 - `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
 
-Please refer to [noxfile.py](./noxfile.py) for more details on how to run the tests and their dependencies.
+For lightweight, focused local validation, run `pytest` directly on the relevant test path. For example:
+
+```bash
+pytest tests/unit/torch/quantization
+```
+
+For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py). Run `nox -l` to list available sessions, then run the matching session with `nox -s <session>`. The `partial_unit-3.12(torch)` session covers the broader torch unit test suite and installs heavier dependencies, including `megatron-core`:
+
+```bash
+nox -s "partial_unit-3.12(torch)"
+```
 
 ## ✍️ Signing your work
 

diff --git a/README.md b/README.md
@@ -151,6 +151,11 @@ Model Optimizer follows a structured approach to managing deprecated features:
 Model Optimizer is now open source! We welcome any feedback, feature requests and PRs.
 Please read our [Contributing](./CONTRIBUTING.md) guidelines for details on how to contribute to this project.
 
+## AI Agents
+
+For AI-assisted development setup, including local Claude Code and Codex
+override files, see the [agent tooling notes](./.agents/TOOLING.md).
+
 ### Top Contributors
 
 [![Contributors](https://contrib.rocks/image?repo=NVIDIA/Model-Optimizer)](https://github.com/NVIDIA/Model-Optimizer/graphs/contributors)