MCC supports two deployment targets and two build paths, all managed through standardized `do/` scripts inspired by the [do-framework](https://github.com/iankoulski/do-framework). Generated projects include a `do/` directory with scripts for every stage of the container lifecycle: `build`, `push`, `run`, `test`, `deploy`, `clean`, `logs`, `export`, and optionally `submit` (for CodeBuild).
## Build Paths
### Local Build
Run `./do/build` to create the Docker image and `./do/push` to upload it to Amazon ECR. This two-step approach lets you test locally with `./do/run` before pushing.

An image built locally can fail with an `exec format error` when it runs on a CPU architecture different from the one it was built on (for example, built on an ARM laptop and deployed to an x86 instance). Use CodeBuild for production builds to avoid this.
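Before building locally, you can check whether the resulting image will match the deployment instance by comparing the build host's architecture with the instance family. The following is a minimal Python sketch and is not part of MCC. The function names are hypothetical. It assumes that SageMaker instance families with a `g` after the generation number (such as `c7g`, `m6g`, or `c6gn`) are Graviton/arm64, and that all other families are x86_64.

```python
import platform
import re
from typing import Optional

# Map platform.machine() values to Docker architecture names.
_HOST_ARCH = {"x86_64": "amd64", "amd64": "amd64", "arm64": "arm64", "aarch64": "arm64"}

# Assumption: Graviton families look like c7g, m6g, r7g or c6gn
# (letters, generation digits, then "g"); GPU families such as g5 do not match.
_GRAVITON_FAMILY = re.compile(r"^[a-z]+\d+g[a-z]*$")


def host_arch(machine: Optional[str] = None) -> str:
    """Return the Docker architecture name of the build host."""
    name = (machine or platform.machine()).lower()
    return _HOST_ARCH.get(name, name)


def target_arch(instance_type: str) -> str:
    """Guess the architecture of a SageMaker instance type such as ml.g5.xlarge."""
    name = instance_type[3:] if instance_type.startswith("ml.") else instance_type
    family = name.split(".")[0]
    return "arm64" if _GRAVITON_FAMILY.match(family) else "amd64"


def needs_cross_build(instance_type: str, machine: Optional[str] = None) -> bool:
    """True when a plain local `docker build` would target the wrong architecture."""
    return host_arch(machine) != target_arch(instance_type)
```

If `needs_cross_build` returns `True`, you can either build with `docker buildx build --platform linux/amd64` or submit the build to CodeBuild. Both avoid producing an image for the wrong architecture.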
`./do/run` starts the container on port 8080 of `localhost` for local testing. This works well for predictive ML containers, such as those built on XGBoost, which have small images and no GPU dependency. LLM serving containers (SGLang images, for example) are large and typically need GPU resources, so running them locally may not be practical.
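SageMaker-compatible containers answer `GET /ping` and `POST /invocations` on port 8080, so a local smoke test only needs those two calls. The sketch below uses only the Python standard library. `smoke_test` is a hypothetical helper, and the payload and content type are placeholders that depend on the model being served.

```python
import urllib.request


def smoke_test(payload: bytes, content_type: str = "text/csv",
               base_url: str = "http://localhost:8080", timeout: float = 10.0) -> bytes:
    """Check /ping, then send one request to /invocations and return the body."""
    # urlopen raises HTTPError for 4xx/5xx, so reaching the status check means 2xx.
    with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"/ping returned HTTP {resp.status}")
    request = urllib.request.Request(
        f"{base_url}/invocations",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

Example: `smoke_test(b"0.455,0.365,0.095", content_type="text/csv")` against a running `./do/run` container. The feature values here are illustrative only.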
### AWS CodeBuild
`./do/submit` creates an AWS CodeBuild project that builds the Docker image and pushes it to ECR in a single step. This is the preferred method for production containers, as it avoids architecture mismatches and provides fast network access to base image registries.
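Once a CodeBuild project exists, you can start a build and watch it with the AWS SDK. The following is a minimal boto3-style sketch, not the `./do/submit` implementation. It assumes a client created with `boto3.client("codebuild")` and a hypothetical project name, and it uses the `start_build` and `batch_get_builds` operations.

```python
import time


def run_codebuild(client, project_name: str, poll_seconds: float = 15.0,
                  sleep=time.sleep) -> str:
    """Start one build of an existing CodeBuild project and wait for it to finish."""
    build_id = client.start_build(projectName=project_name)["build"]["id"]
    while True:
        build = client.batch_get_builds(ids=[build_id])["builds"][0]
        status = build["buildStatus"]
        if status != "IN_PROGRESS":
            # Terminal states: SUCCEEDED, FAILED, FAULT, TIMED_OUT, STOPPED.
            return status
        sleep(poll_seconds)
```

Usage: `run_codebuild(boto3.client("codebuild"), "my-mcc-project")`. The project name is hypothetical. The `sleep` parameter exists only so the loop can be exercised without waiting.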
## Deployment Targets
MCC supports two deployment targets, selected during project generation via the `--deployment-target` option. The chosen target determines how `./do/deploy`, `./do/test`, `./do/clean`, and `./do/logs` behave.
### Amazon SageMaker AI Managed Inference (`managed-inference`)
This is the default deployment target. `./do/deploy` provisions resources using the SageMaker Inference Components API:

1. **Create endpoint configuration** -- specifies the instance type and count
2. **Create endpoint** -- provisions the compute infrastructure
3. **Create inference component** -- associates the ECR container image with the endpoint

The inference component model decouples compute provisioning from model deployment, allowing multiple models to share a single endpoint. Once the inference component reaches `InService` status, the endpoint accepts real-time inference requests through the SageMaker Runtime API.
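The three steps map onto SageMaker API calls. The sketch below shows one way to issue them with a boto3 SageMaker client. It is illustrative and is not the code `./do/deploy` runs. Several details are assumptions:

- the `-config` and `-ic` naming
- the `AllTraffic` variant name
- the single instance
- the memory and accelerator figures
- the execution role ARN passed on the endpoint configuration

```python
import time


def deploy_with_inference_component(sm, *, name, image_uri, instance_type, role_arn,
                                    min_memory_mb=1024, accelerators=0,
                                    poll_seconds=30.0, sleep=time.sleep):
    """Create endpoint config, endpoint and inference component; return the component name."""
    config_name = f"{name}-config"  # hypothetical naming convention
    component = f"{name}-ic"        # hypothetical naming convention
    variant = "AllTraffic"          # assumed variant name

    # 1. Endpoint configuration: instance type and count, no model attached.
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ExecutionRoleArn=role_arn,
        ProductionVariants=[{
            "VariantName": variant,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )
    # 2. Endpoint: provisions the compute; block until it is InService.
    sm.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    sm.get_waiter("endpoint_in_service").wait(EndpointName=name)

    # 3. Inference component: places the ECR image on the endpoint.
    requirements = {"MinMemoryRequiredInMb": min_memory_mb}
    if accelerators:
        requirements["NumberOfAcceleratorDevicesRequired"] = accelerators
    sm.create_inference_component(
        InferenceComponentName=component,
        EndpointName=name,
        VariantName=variant,
        Specification={
            "Container": {"Image": image_uri},
            "ComputeResourceRequirements": requirements,
        },
        RuntimeConfig={"CopyCount": 1},
    )
    # Poll until the component can serve traffic.
    while True:
        status = sm.describe_inference_component(
            InferenceComponentName=component)["InferenceComponentStatus"]
        if status == "InService":
            return component
        if status == "Failed":
            raise RuntimeError(f"inference component {component} failed")
        sleep(poll_seconds)
```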
The generated `do/config` file stores `INSTANCE_TYPE` and, optionally, `INFERENCE_AMI_VERSION`, which controls the CUDA driver version on the instance.
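To illustrate how these settings could feed the endpoint configuration, the sketch below parses simple `KEY=value` lines and builds one `ProductionVariants` entry. It adds SageMaker's `InferenceAmiVersion` field only when `INFERENCE_AMI_VERSION` is set. It assumes `do/config` uses plain assignments, optionally prefixed with `export`. The real file is sourced by the shell, so this sketch does not handle variable expansion or other shell syntax.

```python
import shlex


def read_config(text: str) -> dict:
    """Parse `KEY=value` and `export KEY=value` lines; no shell expansion is done."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, raw_value = line.partition("=")
        if sep and key.strip():
            # shlex removes quotes and drops trailing "# comments".
            parts = shlex.split(raw_value, comments=True)
            values[key.strip()] = parts[0] if parts else ""
    return values


def production_variant(config: dict, variant_name: str = "AllTraffic") -> dict:
    """Build one ProductionVariants entry from do/config-style settings."""
    variant = {
        "VariantName": variant_name,
        "InstanceType": config["INSTANCE_TYPE"],
        "InitialInstanceCount": 1,  # assumption: single instance
    }
    ami = config.get("INFERENCE_AMI_VERSION")
    if ami:  # pin the AMI (and with it the CUDA driver) only when configured
        variant["InferenceAmiVersion"] = ami
    return variant
```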
After deployment, `./do/test` validates the endpoint by invoking inference through the inference component, `./do/logs` tails CloudWatch logs, and `./do/clean endpoint` tears down the inference component, endpoint, and endpoint configuration.
Only real-time endpoints are supported at this time.
### SageMaker HyperPod EKS (`hyperpod-eks`)
For existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hyperpod/) clusters running on Amazon EKS, MCC can deploy containers directly to Kubernetes:
- `./do/deploy` retrieves the underlying EKS cluster from the HyperPod cluster, configures `kubectl`, and applies Kubernetes manifests (Deployment, Service, ConfigMap, and optionally PVC for FSx storage) to the specified namespace.
- `./do/test hyperpod` port-forwards the Kubernetes service and runs the same `/ping` and `/invocations` health checks used for managed inference.

The generated `do/config` file stores HyperPod-specific variables: `HYPERPOD_CLUSTER_NAME`, `HYPERPOD_NAMESPACE`, `HYPERPOD_REPLICAS`, and optionally `FSX_VOLUME_HANDLE`.
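The sketch below covers the first two actions of the deploy step: finding the EKS cluster behind HyperPod and pointing `kubectl` at it. It also builds the core of a Deployment manifest from the config variables above. It assumes the input is a `describe_cluster` response from a boto3 SageMaker client, whose `Orchestrator.Eks.ClusterArn` names the EKS cluster. The manifest layout, `app` label, and GPU request are illustrative. The sketch omits the Service, ConfigMap, and PVC that MCC also applies.

```python
def eks_cluster_name(describe_cluster_response: dict) -> str:
    """Extract the EKS cluster name from a HyperPod DescribeCluster response."""
    arn = describe_cluster_response["Orchestrator"]["Eks"]["ClusterArn"]
    # EKS cluster ARNs look like arn:aws:eks:<region>:<account>:cluster/<name>
    return arn.split(":cluster/", 1)[1]


def kubeconfig_command(cluster: str, region: str) -> list:
    """AWS CLI call that adds the cluster to the local kubeconfig."""
    return ["aws", "eks", "update-kubeconfig", "--name", cluster, "--region", region]


def deployment_manifest(config: dict, name: str, image: str, gpus: int = 0) -> dict:
    """Minimal apps/v1 Deployment driven by HYPERPOD_NAMESPACE and HYPERPOD_REPLICAS."""
    container = {"name": name, "image": image, "ports": [{"containerPort": 8080}]}
    if gpus:
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    labels = {"app": name}  # illustrative label scheme
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": config.get("HYPERPOD_NAMESPACE", "default")},
        "spec": {
            "replicas": int(config.get("HYPERPOD_REPLICAS", 1)),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [container]},
            },
        },
    }
```

`kubectl` accepts JSON manifests, so you can serialize the dictionary with `json.dumps` and pipe it to `kubectl apply -f -`.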
Prerequisites:
- An existing SageMaker HyperPod cluster with EKS orchestrator
- `kubectl` installed locally
- IAM permissions for `sagemaker:DescribeCluster` and `eks:DescribeCluster`
- Sufficient node capacity (especially GPU nodes for LLM workloads)
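You can check two of these prerequisites locally before running `./do/deploy`. The helper below is a hypothetical sketch. It looks for `kubectl` on `PATH` and confirms that a `describe_cluster` response reports an EKS orchestrator. It cannot verify IAM permissions or node capacity.

```python
import shutil


def check_prerequisites(describe_cluster_response: dict, which=shutil.which) -> list:
    """Return human-readable problems; an empty list means the local checks passed."""
    problems = []
    if which("kubectl") is None:
        problems.append("kubectl not found on PATH")
    if "Eks" not in describe_cluster_response.get("Orchestrator", {}):
        problems.append("HyperPod cluster does not use the EKS orchestrator")
    return problems
```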
## Lifecycle Scripts Reference
All generated projects include these `do/` scripts:

| Command | Description |
|---------|-------------|
| `./do/config` | Centralized configuration for all scripts (sourced, not executed) |
| `./do/build` | Build Docker image locally |
| `./do/push` | Push image to Amazon ECR |
| `./do/run` | Run container locally on port 8080 |
| `./do/logs` | Tail logs (CloudWatch for managed-inference, kubectl for HyperPod) |

The generator is a [Yeoman](https://yeoman.io/) generator that runs through four lifecycle phases. The entry point is `generators/app/index.js`, which delegates to specialized modules in `generators/app/lib/`.
## Lifecycle Phases
### Phase 1: `initializing()`
Loads configuration from all sources and initializes the registry system.

1. `CliHandler` checks for subcommands (`mcp`, `registry`, `help`, `configure`). If one matches, it runs that subcommand and sets `_helpShown` so the remaining phases are skipped.
2. `ConfigManager.loadConfiguration()` merges values from eight sources, listed in precedence order from highest to lowest: CLI options, CLI arguments, environment variables, CLI config file, custom config file (`config/mcp.json`), package.json section, MCP servers, and generator defaults. It also queries configured MCP servers via `McpClient`.
3. `ConfigurationManager` loads the three registries through `RegistryLoader`, which reads catalog JSON files from `servers/*/catalogs/` and transforms them into internal data shapes.
4. `ValidationEngine` is initialized with accelerator validators (CUDA, Neuron, ROCm, CPU) for later use.
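A short Python sketch can illustrate the file-discovery part of step 3. `RegistryLoader` itself is JavaScript and also reshapes the data, which this sketch does not reproduce. `load_catalogs` is a hypothetical name. The sketch assumes catalogs are `*.json` files placed directly in each `servers/<name>/catalogs/` directory.

```python
import json
from pathlib import Path


def load_catalogs(root: Path) -> dict:
    """Read every servers/*/catalogs/*.json file, keyed by server, then catalog name."""
    catalogs = {}
    for path in sorted(root.glob("servers/*/catalogs/*.json")):
        server = path.parent.parent.name  # servers/<server>/catalogs/<file>.json
        catalogs.setdefault(server, {})[path.stem] = json.loads(path.read_text())
    return catalogs
```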
### Phase 2: `prompting()`
If `--skip-prompts` is set, `ConfigManager.getFinalConfiguration()` returns the merged config directly. Otherwise:

1. `PromptRunner.run()` executes prompts in phases: Infrastructure (region, deployment target, instance type, HyperPod/async/batch settings, build target), Core ML (deployment config, engine, model format, model name, base image, HF token), Modules (sample model, testing), and Project (name, directory).
2. Prompt definitions live in `prompts.js`. Each prompt group is an exported array. The deployment config prompt presents a flat list of 15 `architecture-backend` values (e.g., `transformers-vllm`, `triton-fil`). `DeploymentConfigResolver.decompose()` splits these into `architecture` and `backend` fields.
3. `PromptRunner` queries MCP servers for instance type and region choices before presenting those prompts, merging MCP-provided choices into the prompt options.
4. Prompt answers are merged with the base config via `ConfigManager.getFinalConfiguration(promptAnswers)`.
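The decomposition in step 2 splits a combined value such as `transformers-vllm` into its two parts. Below is a minimal sketch. It assumes architecture names never contain a hyphen, so it splits at the first `-` and leaves any further hyphens in the backend name. The architecture set comes from the four-way switch described under `writing()`. This function illustrates the idea and is not `DeploymentConfigResolver.decompose()` itself.

```python
# Architectures named in the generator's four-way switch.
ARCHITECTURES = {"http", "transformers", "triton", "diffusors"}


def decompose(deployment_config: str) -> dict:
    """Split an 'architecture-backend' value at the first hyphen."""
    architecture, sep, backend = deployment_config.partition("-")
    if not sep or not backend or architecture not in ARCHITECTURES:
        raise ValueError(f"unrecognized deployment config: {deployment_config!r}")
    return {"architecture": architecture, "backend": backend}
```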
### Phase 3: `writing()`
1. `TemplateManager.validate()` checks that the deployment config, build target, deployment target, instance type, and region are all within supported values. It also enforces GPU requirements for GPU-only backends.
3. All templates are copied with `fs.copyTpl()`, processing EJS variables. A small set of ignore patterns excludes architecture-specific subdirectories (`triton/`, `diffusors/`, `hyperpod/`) that are handled separately.
4. A four-way `switch` on `architecture` (http, transformers, triton, diffusors) deletes files that don't belong to the selected architecture and, for triton and diffusors, copies architecture-specific templates (Dockerfile, model repository, serve scripts).
5. Shell scripts in `do/` and `deploy/` get `chmod 755`.
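Step 5 can be sketched in Python as follows. This sketch is illustrative only, since the generator itself is JavaScript, and `make_scripts_executable` is a hypothetical name. It applies mode `0o755` to every regular file directly inside `do/` and `deploy/`. That may be broader than the generator's actual selection of shell scripts.

```python
from pathlib import Path

EXECUTABLE_MODE = 0o755  # rwxr-xr-x


def make_scripts_executable(project: Path, dirs=("do", "deploy")) -> list:
    """chmod 755 every regular file directly under the given project subdirectories."""
    changed = []
    for name in dirs:
        folder = project / name
        if not folder.is_dir():
            continue  # e.g. no deploy/ directory for this architecture
        for path in sorted(folder.iterdir()):
            if path.is_file():
                path.chmod(EXECUTABLE_MODE)
                changed.append(path)
    return changed
```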
### Phase 4: `end()`
Runs `train_abalone.py` if a sample model was requested (http and eligible triton backends only). Sets executable permissions on generated scripts.

| `validation-engine.js` | Validates accelerator compatibility (framework requirements vs. instance capabilities) |
| `deployment-registry.js` | CRUD operations for the local deployment registry (`~/.mcc-registry/`) |
## Configuration Flow
The configuration precedence system is documented in the [Configuration](../configuration.md) user guide. From a code perspective, the flow is:

1. The `ConfigManager` constructor builds a parameter matrix defining which parameters are accepted from which sources.
2. `loadConfiguration()` applies sources in reverse precedence order (lowest first), so higher-precedence sources overwrite lower ones.
3. MCP servers are queried during loading. `McpClient` spawns each configured server as a child process, performs the MCP handshake, and calls the `get_ml_config` tool. Returned values and choices are stored separately -- values merge into the config, choices are injected into prompt options.
4. `getFinalConfiguration(promptAnswers)` merges prompt answers (lowest precedence) with the accumulated config and applies `DeploymentConfigResolver` to decompose the `deploymentConfig` string into `architecture` and `backend`.
5. `_ensureTemplateVariables()` in `index.js` fills in defaults for any missing fields, merges environment variables from catalog sources with a five-layer precedence (catalog defaults, framework profile, model entry, model profile, CLI overrides), and enriches transformer models with HuggingFace data.
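The lowest-first overwrite in step 2 takes only a few lines to express. This sketch is an illustration, not `ConfigManager` code, and rests on these assumptions:

- Sources are passed highest precedence first and applied in reverse.
- Keys whose value is `None` are treated as unset. Other falsy values, such as `""` or `0`, still override.

The same pattern fits the five-layer environment-variable merge in step 5.

```python
def merge_by_precedence(sources_highest_first):
    """Merge dicts so that earlier (higher-precedence) sources win."""
    merged = {}
    # Apply the lowest-precedence source first; each later update overwrites it.
    for source in reversed(sources_highest_first):
        merged.update({key: value for key, value in source.items() if value is not None})
    return merged
```

For the environment variables, the list would run from CLI overrides, through model profile, model entry, and framework profile, down to catalog defaults.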