Commit 42f36a9

Merge pull request #139 from dferguson992/main
Alternate Model Providers & Docs Update
2 parents 1c44a58 + 539047a commit 42f36a9

48 files changed

Lines changed: 9793 additions & 5180 deletions

.DS_Store

2 KB
Binary file not shown.

docs/CONTRIBUTING.md

Lines changed: 88 additions & 486 deletions
Large diffs are not rendered by default.

docs/EXAMPLES.md

Lines changed: 160 additions & 1284 deletions
Large diffs are not rendered by default.

docs/TROUBLESHOOTING.md

Lines changed: 110 additions & 1491 deletions
Large diffs are not rendered by default.

docs/configuration.md

Lines changed: 126 additions & 528 deletions
Large diffs are not rendered by default.

docs/containerization.md

Lines changed: 0 additions & 14 deletions
This file was deleted.

docs/deployments.md

Lines changed: 26 additions & 20 deletions
@@ -1,41 +1,44 @@
1+
# Deployment & Inference
2+
13
MCC supports two deployment targets and two build paths, all managed through standardized `do/` scripts inspired by the [do-framework](https://github.com/iankoulski/do-framework). Generated projects include a `do/` directory with scripts for every stage of the container lifecycle: `build`, `push`, `run`, `test`, `deploy`, `clean`, `logs`, `export`, and optionally `submit` (for CodeBuild).
24

35
## Build Paths
46

57
### Local Build
6-
For local builds, users run `./do/build` to create the Docker image and `./do/push` to upload it to Amazon ECR. This two-step approach lets you test locally with `./do/run` before pushing.
8+
9+
Run `./do/build` to create the Docker image and `./do/push` to upload it to Amazon ECR. This two-step approach lets you test locally with `./do/run` before pushing.
10+
11+
Images built locally may fail with `exec` errors when they run on a different CPU architecture than the build machine (e.g., built on ARM, deployed to x86). Use CodeBuild for production builds to avoid this.
12+
13+
`./do/run` starts the container on localhost:8080 for local testing. This works well for predictive ML containers (small images, no GPU dependency). LLM containers are large and typically require GPU resources, so local deployment may not be practical for those.
714

815
### AWS CodeBuild
9-
For CI/CD workflows, `./do/submit` creates an AWS CodeBuild project that builds the Docker image and pushes it to ECR in a single step. This is the preferred method for building containers destined for production endpoints, as it avoids architecture mismatches between local machines and deployment instances.
16+
17+
`./do/submit` creates an AWS CodeBuild project that builds the Docker image and pushes it to ECR in a single step. This is the preferred method for production containers, as it avoids architecture mismatches and provides fast network access to base image registries.
1018

1119
## Deployment Targets
1220

1321
MCC supports two deployment targets, selected during project generation via the `--deployment-target` option. The chosen target determines how `./do/deploy`, `./do/test`, `./do/clean`, and `./do/logs` behave.
1422

15-
#### Local Deployment
16-
Local endpoints can be deployed once the image has been built. Local deployments are most easily accommodated by users who elect to build the container locally. Otherwise, users will have to download the container image from Amazon ECR to launch it locally.
17-
18-
!!! warning "Local LLM Containers"
19-
Local deployment should be used sparingly. Predictive containers built on ML frameworks like XGBoost can easily be launched locally given their relatively small size and lack of GPU dependencies. This capability may not work for LLM-based serving frameworks. Images built from SGLang for example are quite large, and require GPU resources to be made available to your container.
23+
### SageMaker Managed Inference (`managed-inference`)
2024

21-
#### Amazon SageMaker AI Managed Inference (`managed-inference`)
22-
Amazon SageMaker AI Managed Inference is the default deployment target for MCC containers. When this target is selected, `./do/deploy` provisions resources using the SageMaker Inference Components API:
25+
The default deployment target. `./do/deploy` provisions resources using the SageMaker Inference Components API:
2326

2427
1. **Create endpoint configuration** -- specifies the instance type and count
2528
2. **Create endpoint** -- provisions the compute infrastructure
2629
3. **Create inference component** -- associates the ECR container image with the endpoint
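
The three steps above can be sketched as an ordered list of request payloads. This is only an illustration, not the code that `./do/deploy` actually runs: the function name `buildDeployPlan`, the `cfg` fields, the `AllTraffic` variant name, and the sample role and image values are all assumptions, and the payload field names reflect a reading of the SageMaker API that should be checked against the API reference before use.

```javascript
// Sketch only: builds the three request payloads in the order described above.
// buildDeployPlan, the cfg fields, and the "AllTraffic" variant name are
// hypothetical; nothing here calls AWS.
function buildDeployPlan(cfg) {
  const variantName = 'AllTraffic';
  const configName = `${cfg.name}-config`;
  return [
    {
      // Step 1: instance type and count live on the endpoint configuration.
      step: 'CreateEndpointConfig',
      params: {
        EndpointConfigName: configName,
        ExecutionRoleArn: cfg.roleArn,
        ProductionVariants: [{
          VariantName: variantName,
          InstanceType: cfg.instanceType,
          InitialInstanceCount: cfg.instanceCount,
          // Optional: only included when an AMI version is configured.
          ...(cfg.inferenceAmiVersion && { InferenceAmiVersion: cfg.inferenceAmiVersion }),
        }],
      },
    },
    {
      // Step 2: the endpoint provisions compute from that configuration.
      step: 'CreateEndpoint',
      params: { EndpointName: cfg.name, EndpointConfigName: configName },
    },
    {
      // Step 3: the inference component attaches the ECR image to the endpoint.
      step: 'CreateInferenceComponent',
      params: {
        InferenceComponentName: `${cfg.name}-ic`,
        EndpointName: cfg.name,
        VariantName: variantName,
        Specification: {
          Container: { Image: cfg.imageUri },
          ComputeResourceRequirements: { MinMemoryRequiredInMb: cfg.minMemoryMb },
        },
        RuntimeConfig: { CopyCount: 1 },
      },
    },
  ];
}

// Example values are placeholders, not real resources.
const plan = buildDeployPlan({
  name: 'demo-endpoint',
  roleArn: 'arn:aws:iam::123456789012:role/ExampleRole',
  instanceType: 'ml.g5.xlarge',
  instanceCount: 1,
  imageUri: '123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest',
  minMemoryMb: 4096,
});
console.log(plan.map((s) => s.step).join(' -> '));
```

Keeping the image reference out of the first two payloads mirrors the decoupling described in the text: compute is provisioned first, and the container is attached afterwards.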
2730

2831
The inference component model decouples compute provisioning from model deployment, allowing multiple models to share a single endpoint. Once the inference component reaches `InService` status, the endpoint is accessible via the SageMaker Runtime API for real-time inference requests.
2932

30-
Users select their preferred instance type, family, and size when generating an MCC project. The generated `do/config` file stores the `INSTANCE_TYPE` and optionally `INFERENCE_AMI_VERSION` for controlling the CUDA driver version on the instance.
33+
The generated `do/config` file stores the `INSTANCE_TYPE` and optionally `INFERENCE_AMI_VERSION` for controlling the CUDA driver version on the instance.
3134

3235
After deployment, `./do/test` validates the endpoint by invoking inference through the inference component, `./do/logs` tails CloudWatch logs, and `./do/clean endpoint` tears down the inference component, endpoint, and endpoint configuration.
3336

34-
!!! info "Real-Time Only"
35-
At this time, real-time endpoints are the only supported SageMaker AI managed inference endpoints supported by MCC.
37+
Only real-time endpoints are supported at this time.
3638

37-
#### Amazon SageMaker HyperPod EKS (`hyperpod-eks`)
38-
For users with existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hyperpod/) clusters running on Amazon EKS, MCC can deploy containers directly to Kubernetes. When this target is selected:
39+
### SageMaker HyperPod EKS (`hyperpod-eks`)
40+
41+
For existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hyperpod/) clusters running on Amazon EKS, MCC can deploy containers directly to Kubernetes:
3942

4043
- `./do/deploy` retrieves the underlying EKS cluster from the HyperPod cluster, configures `kubectl`, and applies Kubernetes manifests (Deployment, Service, ConfigMap, and optionally PVC for FSx storage) to the specified namespace.
4144
- `./do/test hyperpod` port-forwards the Kubernetes service and runs the same `/ping` and `/invocations` health checks used for managed inference.
@@ -44,18 +47,20 @@ For users with existing [SageMaker HyperPod](https://aws.amazon.com/sagemaker/hy
4447

4548
The generated `do/config` file stores HyperPod-specific variables: `HYPERPOD_CLUSTER_NAME`, `HYPERPOD_NAMESPACE`, `HYPERPOD_REPLICAS`, and optionally `FSX_VOLUME_HANDLE`.
4649

47-
!!! note "Prerequisites for HyperPod EKS"
48-
- An existing SageMaker HyperPod cluster with EKS orchestrator
49-
- `kubectl` installed locally
50-
- IAM permissions for `sagemaker:DescribeCluster` and `eks:DescribeCluster`
51-
- Sufficient node capacity (especially GPU nodes for LLM workloads)
50+
Prerequisites:
51+
52+
- An existing SageMaker HyperPod cluster with EKS orchestrator
53+
- `kubectl` installed locally
54+
- IAM permissions for `sagemaker:DescribeCluster` and `eks:DescribeCluster`
55+
- Sufficient node capacity (especially GPU nodes for LLM workloads)
5256

5357
## Lifecycle Scripts Reference
5458

5559
All generated projects include these `do/` scripts:
5660

5761
| Command | Description |
5862
|---------|-------------|
63+
| `./do/config` | Centralized configuration for all scripts (sourced, not executed) |
5964
| `./do/build` | Build Docker image locally |
6065
| `./do/push` | Push image to Amazon ECR |
6166
| `./do/run` | Run container locally on port 8080 |
@@ -64,6 +69,7 @@ All generated projects include these `do/` scripts:
6469
| `./do/logs` | Tail logs (CloudWatch for managed-inference, kubectl for HyperPod) |
6570
| `./do/clean <target>` | Clean up resources (local, ecr, endpoint/hyperpod, codebuild, all) |
6671
| `./do/export` | Export current configuration as a reproducible `yo` CLI command |
72+
| `./do/register` | Capture deployment to the deployment registry |
6773
| `./do/submit` | Submit build to AWS CodeBuild (CodeBuild build target only) |
6874

69-
Configuration for all scripts is centralized in `do/config`. See the generated `do/README.md` for detailed documentation on each command.
75+
See the generated `do/README.md` for detailed documentation on each command.

docs/dev/generator-architecture.md

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
# Generator Architecture
2+
3+
The generator is a [Yeoman](https://yeoman.io/) generator that runs through four lifecycle phases. The entry point is `generators/app/index.js`, which delegates to specialized modules in `generators/app/lib/`.
4+
5+
## Lifecycle Phases
6+
7+
### Phase 1: `initializing()`
8+
9+
Loads configuration from all sources and initializes the registry system.
10+
11+
1. `CliHandler` checks for subcommands (`mcp`, `registry`, `help`, `configure`). If one matches, it executes and sets `_helpShown` to skip remaining phases.
12+
2. `ConfigManager.loadConfiguration()` merges values from 8 sources, listed here from highest to lowest precedence: CLI options, CLI arguments, environment variables, CLI config file, custom config file (`config/mcp.json`), package.json section, MCP servers, and generator defaults. It also queries configured MCP servers via `McpClient`.
13+
3. `ConfigurationManager` loads the three registries through `RegistryLoader`, which reads catalog JSON files from `servers/*/catalogs/` and transforms them into internal data shapes.
14+
4. `ValidationEngine` is initialized with accelerator validators (CUDA, Neuron, ROCm, CPU) for later use.
15+
16+
### Phase 2: `prompting()`
17+
18+
If `--skip-prompts` is set, `ConfigManager.getFinalConfiguration()` returns the merged config directly. Otherwise:
19+
20+
1. `PromptRunner.run()` executes prompts in phases: Infrastructure (region, deployment target, instance type, HyperPod/async/batch settings, build target), Core ML (deployment config, engine, model format, model name, base image, HF token), Modules (sample model, testing), and Project (name, directory).
21+
2. Prompt definitions live in `prompts.js`. Each prompt group is an exported array. The deployment config prompt presents a flat list of 15 `architecture-backend` values (e.g., `transformers-vllm`, `triton-fil`). `DeploymentConfigResolver.decompose()` splits these into `architecture` and `backend` fields.
22+
3. `PromptRunner` queries MCP servers for instance type and region choices before presenting those prompts, merging MCP-provided choices into the prompt options.
23+
4. Prompt answers are merged with the base config via `ConfigManager.getFinalConfiguration(promptAnswers)`.
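
As a rough illustration of the decomposition in step 2, a combined value can be split into its two fields as below. This is a minimal sketch rather than the actual `DeploymentConfigResolver`: it assumes the split happens at the first hyphen (so a backend name could itself contain hyphens), and the error handling is invented for the example.

```javascript
// Hypothetical sketch; the real DeploymentConfigResolver.decompose() may differ.
// Assumption: the architecture is everything before the first hyphen.
function decompose(deploymentConfig) {
  const idx = deploymentConfig.indexOf('-');
  // Reject values with no hyphen, or with an empty architecture or backend.
  if (idx <= 0 || idx === deploymentConfig.length - 1) {
    throw new Error(`Expected "<architecture>-<backend>", got "${deploymentConfig}"`);
  }
  return {
    architecture: deploymentConfig.slice(0, idx),
    backend: deploymentConfig.slice(idx + 1),
  };
}

console.log(decompose('transformers-vllm')); // { architecture: 'transformers', backend: 'vllm' }
console.log(decompose('triton-fil'));        // { architecture: 'triton', backend: 'fil' }
```

Presenting one flat list to the user and splitting afterwards keeps the prompt simple while still giving the templates separate `architecture` and `backend` variables.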
24+
25+
### Phase 3: `writing()`
26+
27+
1. `TemplateManager.validate()` checks that the deployment config, build target, deployment target, instance type, and region are all within supported values. It also enforces GPU requirements for GPU-only backends.
28+
2. `CommentGenerator` produces Dockerfile comments (accelerator info, validation status, troubleshooting).
29+
3. All templates are copied with `fs.copyTpl()`, processing EJS variables. A small set of ignore patterns excludes architecture-specific subdirectories (`triton/`, `diffusors/`, `hyperpod/`) that are handled separately.
30+
4. A four-way `switch` on `architecture` (http, transformers, triton, diffusors) deletes files that don't belong to the selected architecture and, for triton and diffusors, copies architecture-specific templates (Dockerfile, model repository, serve scripts).
31+
5. Shell scripts in `do/` and `deploy/` get `chmod 755`.
32+
33+
### Phase 4: `end()`
34+
35+
Runs `train_abalone.py` if a sample model was requested (http and eligible triton backends only). Sets executable permissions on generated scripts.
36+
37+
## Key Modules
38+
39+
| Module | Purpose |
40+
|--------|---------|
41+
| `config-manager.js` | 8-level configuration precedence, MCP integration, parameter matrix |
42+
| `prompt-runner.js` | Phased prompt execution, MCP choice injection, catalog data loading |
43+
| `prompts.js` | All prompt definitions, instance type registry from catalog, project name generation |
44+
| `template-manager.js` | Validates deployment config, build target, deployment target, instance type, region, GPU requirements, HyperPod config, async/batch config |
45+
| `configuration-manager.js` | Orchestrates registry loading, framework/model matching, HuggingFace enrichment, env var validation |
46+
| `registry-loader.js` | Adapter layer: reads catalog JSON from `servers/*/catalogs/` and transforms into internal shapes |
47+
| `deployment-config-resolver.js` | Decomposes `transformers-vllm` into `{architecture: 'transformers', backend: 'vllm'}` |
48+
| `mcp-client.js` | Spawns MCP server processes, performs handshake, calls `get_ml_config` tool |
49+
| `validation-engine.js` | Validates accelerator compatibility (framework requirements vs. instance capabilities) |
50+
| `deployment-registry.js` | CRUD operations for the local deployment registry (`~/.mcc-registry/`) |
51+
52+
## Configuration Flow
53+
54+
The configuration precedence system is documented in the [Configuration](../configuration.md) user guide. From a code perspective, the flow is:
55+
56+
1. `ConfigManager` constructor builds a parameter matrix defining which parameters are accepted from which sources.
57+
2. `loadConfiguration()` applies sources in reverse precedence order (lowest first), so higher-precedence sources overwrite lower ones.
58+
3. MCP servers are queried during loading. `McpClient` spawns each configured server as a child process, performs the MCP handshake, and calls the `get_ml_config` tool. Returned values and choices are stored separately -- values merge into the config, choices are injected into prompt options.
59+
4. `getFinalConfiguration(promptAnswers)` merges prompt answers (lowest precedence) with the accumulated config and applies `DeploymentConfigResolver` to decompose the `deploymentConfig` string into `architecture` and `backend`.
60+
5. `_ensureTemplateVariables()` in `index.js` fills in defaults for any missing fields, merges environment variables from catalog sources with a five-layer precedence (catalog defaults, framework profile, model entry, model profile, CLI overrides), and enriches transformer models with HuggingFace data.
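
The first two steps of this flow can be sketched as follows. It is a simplified illustration under stated assumptions, not the `ConfigManager` implementation: the source keys in `PRECEDENCE`, the shape of the `matrix` argument, and the function name `mergeByPrecedence` are all hypothetical.

```javascript
// Hypothetical source keys, highest precedence first, mirroring the eight
// sources named in the documentation.
const PRECEDENCE = [
  'cliOptions', 'cliArguments', 'environment', 'cliConfigFile',
  'customConfigFile', 'packageJson', 'mcpServers', 'defaults',
];

// Apply sources lowest-precedence first so later (higher) sources overwrite.
// `matrix` optionally maps a parameter name to the sources allowed to set it.
function mergeByPrecedence(sources, matrix = {}) {
  const merged = {};
  for (const name of [...PRECEDENCE].reverse()) {
    for (const [key, value] of Object.entries(sources[name] ?? {})) {
      const allowed = matrix[key];
      if (allowed && !allowed.includes(name)) continue; // source not accepted
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}

const merged = mergeByPrecedence({
  defaults: { region: 'us-east-1', instanceType: 'ml.m5.xlarge' },
  environment: { region: 'us-west-2' },
  cliOptions: { instanceType: 'ml.g5.xlarge' },
});
console.log(merged); // { region: 'us-west-2', instanceType: 'ml.g5.xlarge' }
```

Iterating from lowest to highest precedence means no per-key comparison is needed: the last writer wins, and the last writer is always the highest-precedence source that supplied the key.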
