awslabs
diff --git a/.DS_Store b/.DS_Store
2 KB
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
Lines changed: 88 additions & 486 deletions
diff --git a/docs/EXAMPLES.md b/docs/EXAMPLES.md
Lines changed: 160 additions & 1284 deletions
diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md
Lines changed: 110 additions & 1491 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
Lines changed: 126 additions & 528 deletions
diff --git a/docs/containerization.md b/docs/containerization.md
Lines changed: 0 additions & 14 deletions
diff --git a/docs/deployments.md b/docs/deployments.md
Lines changed: 26 additions & 20 deletions
diff --git a/docs/dev/generator-architecture.md b/docs/dev/generator-architecture.md
Lines changed: 60 additions & 0 deletions
@@ -1,41 +1,44 @@
+# Deployment & Inference
+
 MCC supports two deployment targets and two build paths, all managed through standardized `do/` scripts inspired by the [do-framework](https://github.com/iankoulski/do-framework). Generated projects include a `do/` directory with scripts for every stage of the container lifecycle: `build`, `push`, `run`, `test`, `deploy`, `clean`, `logs`, `export`, and optionally `submit` (for CodeBuild).
 
 ## Build Paths
 
 ### Local Build
-For local builds, users run `./do/build` to create the Docker image and `./do/push` to upload it to Amazon ECR. This two-step approach lets you test locally with `./do/run` before pushing.
+
+Run `./do/build` to create the Docker image and `./do/push` to upload it to Amazon ECR. This two-step approach lets you test locally with `./do/run` before pushing.
+
+Locally built images may produce `exec` errors when deployed to instances with a different CPU architecture (e.g., built on ARM, deployed to x86). Use CodeBuild for production builds to avoid this.
+
+`./do/run` starts the container on `localhost:8080` for local testing. This works well for predictive ML containers, which are small and have no GPU dependency. LLM containers are large and typically require GPU resources, so running them locally may not be practical.
 
 ### AWS CodeBuild
-For CI/CD workflows, `./do/submit` creates an AWS CodeBuild project that builds the Docker image and pushes it to ECR in a single step. This is the preferred method for building containers destined for production endpoints, as it avoids architecture mismatches between local machines and deployment instances.
+
+`./do/submit` creates an AWS CodeBuild project that builds the Docker image and pushes it to ECR in a single step. This is the preferred method for production containers, as it avoids architecture mismatches and provides fast network access to base image registries.
 
 ## Deployment Targets
 
 MCC supports two deployment targets, selected during project generation via the `--deployment-target` option. The chosen target determines how `./do/deploy`, `./do/test`, `./do/clean`, and `./do/logs` behave.
 
-#### Local Deployment
-Local endpoints can be deployed once the image has been built. Local deployments are most easily accommodated by users who elect to build the container locally. Otherwise, users will have to download the container image from Amazon ECR to launch it locally.
-
-!!! warning "Local LLM Containers"
-    Local deployment should be used sparingly. Predictive containers built on ML frameworks like XGBoost can easily be launched locally given their relatively small size and lack of GPU dependencies. This capability may not work for LLM-based serving frameworks. Images built from SGLang for example are quite large, and require GPU resources to be made available to your container.
+### SageMaker Managed Inference (`managed-inference`)
 
-#### Amazon SageMaker AI Managed Inference (`managed-inference`)
-Amazon SageMaker AI Managed Inference is the default deployment target for MCC containers. When this target is selected, `./do/deploy` provisions resources using the SageMaker Inference Components API:
+The default deployment target. `./do/deploy` provisions resources using the SageMaker Inference Components API:
 
 1. **Create endpoint configuration** -- specifies the instance type and count
 2. **Create endpoint** -- provisions the compute infrastructure
 3. **Create inference component** -- associates the ECR container image with the endpoint
 
 The inference component model decouples compute provisioning from model deployment, allowing multiple models to share a single endpoint. Once the inference component reaches `InService` status, the endpoint is accessible via the SageMaker Runtime API for real-time inference requests.
 
-Users select their preferred instance type, family, and size when generating an MCC project. The generated `do/config` file stores the `INSTANCE_TYPE` and optionally `INFERENCE_AMI_VERSION` for controlling the CUDA driver version on the instance.
+The generated `do/config` file stores the `INSTANCE_TYPE` and optionally `INFERENCE_AMI_VERSION` for controlling the CUDA driver version on the instance.
 
 After deployment, `./do/test` validates the endpoint by invoking inference through the inference component, `./do/logs` tails CloudWatch logs, and `./do/clean endpoint` tears down the inference component, endpoint, and endpoint configuration.
 
-!!! info "Real-Time Only"
-    At this time, real-time endpoints are the only supported SageMaker AI managed inference endpoints supported by MCC.
+Only real-time endpoints are supported at this time.
 
-#### Amazon SageMaker HyperPod EKS (`hyperpod-eks`)
-For users with existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hyperpod/) clusters running on Amazon EKS, MCC can deploy containers directly to Kubernetes. When this target is selected:
+### SageMaker HyperPod EKS (`hyperpod-eks`)
+
+For existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hyperpod/) clusters running on Amazon EKS, MCC can deploy containers directly to Kubernetes:
 
 - `./do/deploy` retrieves the underlying EKS cluster from the HyperPod cluster, configures `kubectl`, and applies Kubernetes manifests (Deployment, Service, ConfigMap, and optionally PVC for FSx storage) to the specified namespace.
 - `./do/test hyperpod` port-forwards the Kubernetes service and runs the same `/ping` and `/invocations` health checks used for managed inference.
@@ -44,18 +47,20 @@ For users with existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hy
 
 The generated `do/config` file stores HyperPod-specific variables: `HYPERPOD_CLUSTER_NAME`, `HYPERPOD_NAMESPACE`, `HYPERPOD_REPLICAS`, and optionally `FSX_VOLUME_HANDLE`.
 
-!!! note "Prerequisites for HyperPod EKS"
-    - An existing SageMaker HyperPod cluster with EKS orchestrator
-    - `kubectl` installed locally
-    - IAM permissions for `sagemaker:DescribeCluster` and `eks:DescribeCluster`
-    - Sufficient node capacity (especially GPU nodes for LLM workloads)
+Prerequisites:
+
+- An existing SageMaker HyperPod cluster with EKS orchestrator
+- `kubectl` installed locally
+- IAM permissions for `sagemaker:DescribeCluster` and `eks:DescribeCluster`
+- Sufficient node capacity (especially GPU nodes for LLM workloads)
 
 ## Lifecycle Scripts Reference
 
 All generated projects include these `do/` scripts:
 
 | Command | Description |
 |---------|-------------|
+| `./do/config` | Centralized configuration for all scripts (sourced, not executed) |
 | `./do/build` | Build Docker image locally |
 | `./do/push` | Push image to Amazon ECR |
 | `./do/run` | Run container locally on port 8080 |
@@ -64,6 +69,7 @@ All generated projects include these `do/` scripts:
 | `./do/logs` | Tail logs (CloudWatch for managed-inference, kubectl for HyperPod) |
 | `./do/clean <target>` | Clean up resources (local, ecr, endpoint/hyperpod, codebuild, all) |
 | `./do/export` | Export current configuration as a reproducible `yo` CLI command |
+| `./do/register` | Capture deployment to the deployment registry |
 | `./do/submit` | Submit build to AWS CodeBuild (CodeBuild build target only) |
 
-Configuration for all scripts is centralized in `do/config`. See the generated `do/README.md` for detailed documentation on each command.
+See the generated `do/README.md` for detailed documentation on each command.
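
The three managed-inference provisioning steps described in `docs/deployments.md` (endpoint configuration, endpoint, inference component) can be sketched as an ordered list of request payloads. This is only an illustration under stated assumptions: `buildDeployPlan`, the `-config` and `-ic` name suffixes, the `AllTraffic` variant name, and the memory and copy-count values are hypothetical, and the diff does not show how `./do/deploy` actually issues these calls. The field names follow the SageMaker `CreateEndpointConfig`, `CreateEndpoint`, and `CreateInferenceComponent` request shapes as I understand them.

```javascript
// Hypothetical helper, not part of MCC: returns the three SageMaker requests
// in the order the documentation lists them.
function buildDeployPlan({
  name,
  imageUri,
  roleArn,
  instanceType,
  instanceCount = 1,
  inferenceAmiVersion, // optional, mirrors INFERENCE_AMI_VERSION in do/config
}) {
  const variantName = 'AllTraffic'; // assumed variant name
  const variant = {
    VariantName: variantName,
    InstanceType: instanceType,
    InitialInstanceCount: instanceCount,
  };
  if (inferenceAmiVersion) {
    variant.InferenceAmiVersion = inferenceAmiVersion;
  }

  return [
    {
      // Step 1: instance type and count live on the endpoint configuration.
      operation: 'CreateEndpointConfig',
      params: {
        EndpointConfigName: `${name}-config`,
        ExecutionRoleArn: roleArn,
        ProductionVariants: [variant],
      },
    },
    {
      // Step 2: the endpoint provisions the compute.
      operation: 'CreateEndpoint',
      params: { EndpointName: name, EndpointConfigName: `${name}-config` },
    },
    {
      // Step 3: the inference component ties the ECR image to the endpoint.
      operation: 'CreateInferenceComponent',
      params: {
        InferenceComponentName: `${name}-ic`,
        EndpointName: name,
        VariantName: variantName,
        Specification: {
          Container: { Image: imageUri },
          // Placeholder resource request; real values depend on the model.
          ComputeResourceRequirements: { MinMemoryRequiredInMb: 1024 },
        },
        RuntimeConfig: { CopyCount: 1 },
      },
    },
  ];
}
```

Because the image appears only in the third request, a second model could be added to the same endpoint by issuing another `CreateInferenceComponent` request, which is the decoupling the documentation describes.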
@@ -0,0 +1,60 @@
+# Generator Architecture
+
+The generator is a [Yeoman](https://yeoman.io/) generator that runs through four lifecycle phases. The entry point is `generators/app/index.js`, which delegates to specialized modules in `generators/app/lib/`.
+
+## Lifecycle Phases
+
+### Phase 1: `initializing()`
+
+Loads configuration from all sources and initializes the registry system.
+
+1. `CliHandler` checks for subcommands (`mcp`, `registry`, `help`, `configure`). If one matches, it executes and sets `_helpShown` to skip remaining phases.
+2. `ConfigManager.loadConfiguration()` merges values from 8 sources in precedence order: CLI options, CLI arguments, environment variables, CLI config file, custom config file (`config/mcp.json`), package.json section, MCP servers, and generator defaults. It also queries configured MCP servers via `McpClient`.
+3. `ConfigurationManager` loads the three registries through `RegistryLoader`, which reads catalog JSON files from `servers/*/catalogs/` and transforms them into internal data shapes.
+4. `ValidationEngine` is initialized with accelerator validators (CUDA, Neuron, ROCm, CPU) for later use.
+
+### Phase 2: `prompting()`
+
+If `--skip-prompts` is set, `ConfigManager.getFinalConfiguration()` returns the merged config directly. Otherwise:
+
+1. `PromptRunner.run()` executes prompts in phases: Infrastructure (region, deployment target, instance type, HyperPod/async/batch settings, build target), Core ML (deployment config, engine, model format, model name, base image, HF token), Modules (sample model, testing), and Project (name, directory).
+2. Prompt definitions live in `prompts.js`. Each prompt group is an exported array. The deployment config prompt presents a flat list of 15 `architecture-backend` values (e.g., `transformers-vllm`, `triton-fil`). `DeploymentConfigResolver.decompose()` splits these into `architecture` and `backend` fields.
+3. `PromptRunner` queries MCP servers for instance type and region choices before presenting those prompts, merging MCP-provided choices into the prompt options.
+4. Prompt answers are merged with the base config via `ConfigManager.getFinalConfiguration(promptAnswers)`.
+
+### Phase 3: `writing()`
+
+1. `TemplateManager.validate()` checks that the deployment config, build target, deployment target, instance type, and region are all within supported values. It also enforces GPU requirements for GPU-only backends.
+2. `CommentGenerator` produces Dockerfile comments (accelerator info, validation status, troubleshooting).
+3. All templates are copied with `fs.copyTpl()`, processing EJS variables. A small set of ignore patterns excludes architecture-specific subdirectories (`triton/`, `diffusors/`, `hyperpod/`) that are handled separately.
+4. A four-way `switch` on `architecture` (http, transformers, triton, diffusors) deletes files that don't belong to the selected architecture and, for triton and diffusors, copies architecture-specific templates (Dockerfile, model repository, serve scripts).
+5. Shell scripts in `do/` and `deploy/` get `chmod 755`.
+
+### Phase 4: `end()`
+
+Runs `train_abalone.py` if a sample model was requested (http and eligible triton backends only). Sets executable permissions on generated scripts.
+
+## Key Modules
+
+| Module | Purpose |
+|--------|---------|
+| `config-manager.js` | 8-level configuration precedence, MCP integration, parameter matrix |
+| `prompt-runner.js` | Phased prompt execution, MCP choice injection, catalog data loading |
+| `prompts.js` | All prompt definitions, instance type registry from catalog, project name generation |
+| `template-manager.js` | Validates deployment config, build target, deployment target, instance type, region, GPU requirements, HyperPod config, async/batch config |
+| `configuration-manager.js` | Orchestrates registry loading, framework/model matching, HuggingFace enrichment, env var validation |
+| `registry-loader.js` | Adapter layer: reads catalog JSON from `servers/*/catalogs/` and transforms into internal shapes |
+| `deployment-config-resolver.js` | Decomposes `transformers-vllm` into `{architecture: 'transformers', backend: 'vllm'}` |
+| `mcp-client.js` | Spawns MCP server processes, performs handshake, calls `get_ml_config` tool |
+| `validation-engine.js` | Validates accelerator compatibility (framework requirements vs. instance capabilities) |
+| `deployment-registry.js` | CRUD operations for the local deployment registry (`~/.mcc-registry/`) |
+
+## Configuration Flow
+
+The configuration precedence system is documented in the [Configuration](../configuration.md) user guide. From a code perspective, the flow is:
+
+1. `ConfigManager` constructor builds a parameter matrix defining which parameters are accepted from which sources.
+2. `loadConfiguration()` applies sources in reverse precedence order (lowest first), so higher-precedence sources overwrite lower ones.
+3. MCP servers are queried during loading. `McpClient` spawns each configured server as a child process, performs the MCP handshake, and calls the `get_ml_config` tool. Returned values and choices are stored separately -- values merge into the config, choices are injected into prompt options.
+4. `getFinalConfiguration(promptAnswers)` merges prompt answers (lowest precedence) with the accumulated config and applies `DeploymentConfigResolver` to decompose the `deploymentConfig` string into `architecture` and `backend`.
+5. `_ensureTemplateVariables()` in `index.js` fills in defaults for any missing fields, merges environment variables from catalog sources with a five-layer precedence (catalog defaults, framework profile, model entry, model profile, CLI overrides), and enriches transformer models with HuggingFace data.
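
The precedence merge and the `deploymentConfig` decomposition described in the Configuration Flow can be sketched as follows. This is a minimal illustration, not the generator's code: the source keys in `PRECEDENCE`, and the functions `mergeSources` and `finalConfiguration`, are hypothetical names, and `decompose` assumes the architecture name never contains a hyphen, so the split happens at the first one.

```javascript
// Hypothetical source keys, listed from highest to lowest precedence
// in the order the documentation gives for the eight sources.
const PRECEDENCE = [
  'cliOptions',
  'cliArguments',
  'environment',
  'cliConfigFile',
  'customConfigFile',
  'packageJson',
  'mcpServers',
  'defaults',
];

// Apply sources lowest-first so higher-precedence values overwrite lower ones.
function mergeSources(sources) {
  const merged = {};
  for (const name of [...PRECEDENCE].reverse()) {
    for (const [key, value] of Object.entries(sources[name] ?? {})) {
      if (value !== undefined) {
        merged[key] = value;
      }
    }
  }
  return merged;
}

// Split an "architecture-backend" string at the first hyphen (assumption).
function decompose(deploymentConfig) {
  const i = deploymentConfig.indexOf('-');
  if (i <= 0 || i === deploymentConfig.length - 1) {
    throw new Error(`Invalid deployment config: ${deploymentConfig}`);
  }
  return {
    architecture: deploymentConfig.slice(0, i),
    backend: deploymentConfig.slice(i + 1),
  };
}

// Prompt answers have the lowest precedence: they only fill keys that no
// other source has set. The deployment config is then decomposed.
function finalConfiguration(sources, promptAnswers = {}) {
  const config = { ...promptAnswers, ...mergeSources(sources) };
  if (config.deploymentConfig) {
    Object.assign(config, decompose(config.deploymentConfig));
  }
  return config;
}
```

For example, if `defaults` supplies `deploymentConfig: 'triton-fil'` and `cliOptions` supplies `deploymentConfig: 'transformers-vllm'`, the CLI value wins and the result carries `architecture: 'transformers'` and `backend: 'vllm'`.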