TOOLS-2592 Tooling for shipping Triton service images from monitor-reef#21
Outlines the images/ directory approach for building multiple Triton zone images from a single Rust monorepo, including per-service Makefiles, SAPI integration, and a jenkins-joylib enhancement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document which services don't ship as images (bugview, jira-stub), list the reference repos needed to understand the design, and add a prerequisites checklist for the jenkins-joylib change and SmartOS testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce triton-api, a Dropshot API service that will eventually replace cloudapi. For now it has a single /ping endpoint. This also establishes the images/ directory structure for building zone images from the monorepo.
- apis/triton-api: API trait with /ping endpoint
- services/triton-api-server: service implementation
- images/triton-api: zone image Makefile, SMF manifests, SAPI manifests, and boot script
- images/image.defs.mk: shared image build definitions, sets ENGBLD_REPO_ROOT for eng Makefile compatibility
- deps/eng: updated to include ENGBLD_REPO_ROOT monorepo support
- .gitignore: add image build artifact patterns
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setup.sh was committed without the execute bit, which would cause SMF postboot to fail to start. Also move smf_include.sh source before the first-boot marker check so $SMF_EXIT_OK is available for the early exit path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
$(shell) swallows exit codes, so git rev-parse and git submodule update failures would leave ENGBLD_REPO_ROOT empty and eng includes broken with confusing errors. Add explicit guards with clear messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update all REPO_ROOT references in code blocks to ENGBLD_REPO_ROOT to match actual implementation. Renumber open questions (was 1,3,4,5 now 1,2,3,4). Reframe eng Makefile compatibility question to reflect that ENGBLD_REPO_ROOT already addresses the root issue. Remove local filesystem path from TODO. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Files were untracked artifacts, not committed to the branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add status and healthy fields to PingResponse matching VMAPI pattern. Move types to types/ module for consistency with other API crates. Add Clone derive and crate-level doc comment. Update server to return populated response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add triton-api dependency and ManagedApiConfig entry so make openapi-generate and openapi-check cover the new API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document that the bind address should come from the SAPI-generated config file once this service is ready for production deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
*.tar.gz, bits/, proto/, make_stamps/ were repo-wide but only needed for image builds. Scope to images/*/ to avoid accidentally hiding legitimate files elsewhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rust successor to sdcadm for Triton datacenter administration. All 16 top-level commands and 47 subcommands scaffolded as stubs returning "not yet implemented". Shell completion works. Design doc covers architecture, API client strategy, and first target (post-setup portal). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Internal Triton APIs get the full trait-based pipeline (API trait → OpenAPI spec → Progenitor client), not hand-written minimal clients. Builds toward correct specs from day one and means the trait is ready when we rewrite the Node.js services. jira-client is the sole exception as a large external API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 5 API clients needed for post-setup portal also unlock services, instances, avail, check-config, and check-health as low-hanging fruit. Reordered priority list to reflect this. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Grafana has a known-working sdcadm implementation to validate against. Same APIs needed, but we can compare results on a real DC before applying the pattern to a brand-new service (portal). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three patches applied to sapi-api.json:
- GET /mode: returns plain string, not ModeResponse JSON object
- POST /mode: returns 204 no content, not 200 with JSON body
- POST /loglevel: returns empty 200, not JSON body
Updated client-generator to use patched spec, regenerated client, and fixed CLI to handle the new response types.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trait changes (canonical type fixes):
- Create endpoints return 200 (matching Node.js Restify default), not 201
- LogLevelResponse.level is serde_json::Value (Bunyan returns integer)
- SetLogLevelBody.level is serde_json::Value (accepts string or integer)
- Add uuid and master fields to all create body types
Patch additions:
- GET /ping 500: documented as known limitation (Node.js returns PingResponse on 500, Progenitor can't handle multiple response types)
- Create status code safety net patch (no-op since trait already fixed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused ModeResponse type (GET /mode is patched to return string)
- Add StorageType enum for PingResponse.stor_type field
- Change PingResponse.mode from String to SapiMode enum
- Change get_mode trait to return SapiMode (patched to string in spec)
- Change set_mode trait to HttpResponseUpdatedNoContent (native 204)
- Remove dead UpdateAttributesBody re-export from sapi-client
- Simplify post_mode patch to no-op (trait now generates 204 natively)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1: Add sections for enum identification, Restify response pattern cataloging, patch requirements, and hidden request fields.
Phase 2: Add guidance on using Phase 1 enums, matching Restify response patterns to Dropshot types (200 not 201 for creates), and avoiding dead wrapper types.
Phase 5: Add enum wire-value verification, status code checking, dead schema detection, and remaining String→enum scan.
Reference: Add Restify response pattern table, Progenitor limitations section (multiple body types, text/plain, empty bodies).
Orchestrator: Add Step 2b for applying OpenAPI spec patches between API generation and client generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add sapi-client and vmapi-client dependencies to tritonadm. Convert main to async with tokio. Implement `services` (alias `svcs`) and `instances` (alias `insts`) as the first real commands, replacing their stubs. Services output matches sdcadm columns (type, uuid, name, image, insts). Instances enriches SAPI data with VM alias, state, and image from VMAPI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Checks SAPI service metadata for CLOUDAPI_READONLY and DOCKER_READONLY to determine maintenance mode. Also reads DC_MAINT_MESSAGE and DC_MAINT_ETA from the sdc application metadata. Matches sdcadm output format. Supports --json for machine-readable output. Also refactors main.rs to resolve API URLs eagerly before the match, avoiding borrow-checker issues when match arms destructure cli.command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
First real write command in tritonadm. Creates the grafana SAPI service, provisions a first instance on the headnode, and optionally adds a manta NIC. Handles re-runs (reprovision if image changed). Supports --yes, --dry-run, --server, and --image flags. Uses all five API clients (SAPI, IMGAPI, VMAPI, PAPI, NAPI). Image lookup is local IMGAPI only for now (no updates server download). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
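The re-run handling described above can be pictured as a small decision function. This is only a sketch under assumptions: `ProvisionAction` and `plan_instance` are hypothetical names that do not come from the commit, and the real command may take more state into account than the image UUID.

```rust
/// Hypothetical outcome of comparing the wanted image with an existing instance.
#[derive(Debug, PartialEq)]
enum ProvisionAction {
    /// No instance exists yet, so provision the first one.
    Create,
    /// An instance exists but runs a different image.
    Reprovision,
    /// The instance already runs the wanted image.
    Nothing,
}

/// Decide what a re-run should do. `current_image` is the image UUID of the
/// existing instance, if any (assumed input shape).
fn plan_instance(current_image: Option<&str>, wanted_image: &str) -> ProvisionAction {
    match current_image {
        None => ProvisionAction::Create,
        Some(img) if img != wanted_image => ProvisionAction::Reprovision,
        Some(_) => ProvisionAction::Nothing,
    }
}
```

A `--dry-run` style flag could print the planned action and stop before acting on it.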
The --image flag now defaults to "latest" which queries updates.tritondatacenter.com for the newest grafana image and imports it into local IMGAPI if not already present. Use --image current for local-only lookup (previous behavior). Adds --channel/-C for channel selection and --updates-url for overriding the updates server. Channel resolution: --channel flag > SAPI update_channel metadata > remote default. Import uses IMGAPI's import-remote action with polling until the image reaches active state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
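The channel precedence stated above (flag, then SAPI metadata, then the remote default) reduces to a short fallback chain. A minimal sketch, assuming a hypothetical `resolve_channel` helper where `None` means "send no channel and let the updates server apply its default":

```rust
/// Resolve the update channel. Precedence: explicit --channel flag, then the
/// SAPI `update_channel` metadata value, then `None` (remote default).
/// The function name and signature are illustrative, not taken from the commit.
fn resolve_channel(flag: Option<&str>, sapi_update_channel: Option<&str>) -> Option<String> {
    flag.or(sapi_update_channel).map(str::to_owned)
}
```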
The real Node.js IMGAPI expects source and skip_owner_check as query parameters on POST /images/:uuid?action=import-remote&source=... Our TypedClient was putting them in the request body, causing a 404. Use a direct reqwest POST with .query() to match the wire format the IMGAPI server actually expects. Also removes the unused TypedClient for local IMGAPI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
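To make the wire format concrete, the sketch below builds the request URL by hand using only the standard library. It is an illustration, not the commit's code (which uses reqwest's `.query()`); `percent_encode` and `import_remote_url` are hypothetical helpers, and the encoder simply escapes everything outside the RFC 3986 unreserved set.

```rust
/// Percent-encode every byte outside the RFC 3986 unreserved set.
fn percent_encode(value: &str) -> String {
    let mut out = String::new();
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Build the POST target for the import-remote action, with `source` and
/// `skip_owner_check` carried in the query string rather than the body.
fn import_remote_url(imgapi: &str, uuid: &str, source: &str, skip_owner_check: bool) -> String {
    format!(
        "{}/images/{}?action=import-remote&source={}&skip_owner_check={}",
        imgapi.trim_end_matches('/'),
        uuid,
        percent_encode(source),
        skip_owner_check
    )
}
```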
Adds external NICs to imgapi and adminui zones, matching sdcadm's command. Required before IMGAPI can reach the updates server to import images. Refactors NIC addition into reusable add_nic_if_missing and get_service_instances helpers, shared with the manta NIC logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The external NIC must be primary so the zone's default gateway routes
through the external network, allowing IMGAPI to reach the internet
(e.g., updates.tritondatacenter.com). Without primary=true the admin
network remains the default route and external DNS resolution fails.
Changed AddNicsRequest.networks from Vec<Uuid> to Vec<serde_json::Value>
to support VMAPI's object form: {"uuid": "...", "primary": true}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New `tritonadm dev` subcommand group with:
- `remove-external-nics`: undo common-external-nics (remove external NICs from imgapi/adminui)
- `remove-grafana`: undo post-setup grafana (destroy VM, delete SAPI instance and service)
These are development helpers not present in sdcadm, for iterating on post-setup commands without manual cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The real Node.js IMGAPI returns {"code": ..., "message": ...} errors
without a request_id field. Our Progenitor-generated Error type required
request_id, causing deserialization failures on 404s during import
polling.
Fix: add IMGAPI error schema patch (same approach as CloudAPI) making
request_id optional and using "code" instead of "error_code". Point
imgapi-client at the patched spec. Also fix wait_for_image_active to
tolerate 404s during the async import workflow and add a 4-minute
timeout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
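The polling behavior described in this commit (treat a 404 as "not ready yet" and give up after a deadline) might look roughly like the sketch below. The names `ImageStatus`, `WaitError`, and `wait_for_active` are hypothetical, and the status check is passed in as a closure so the loop can be exercised without a live IMGAPI. A caller matching the commit would pass a timeout of `Duration::from_secs(240)`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Simplified view of one poll result (assumed shape).
#[derive(Debug, PartialEq)]
enum ImageStatus {
    Active,
    /// HTTP 404: the image record is not visible yet during the async import.
    NotFound,
    /// Any other non-active state.
    Pending,
}

#[derive(Debug, PartialEq)]
enum WaitError {
    TimedOut,
}

/// Poll `check` until it reports `Active` or `timeout` elapses.
fn wait_for_active<F>(mut check: F, timeout: Duration, interval: Duration) -> Result<(), WaitError>
where
    F: FnMut() -> ImageStatus,
{
    let deadline = Instant::now() + timeout;
    loop {
        match check() {
            ImageStatus::Active => return Ok(()),
            // A 404 is tolerated exactly like any other not-yet-active state.
            ImageStatus::NotFound | ImageStatus::Pending => {}
        }
        if Instant::now() >= deadline {
            return Err(WaitError::TimedOut);
        }
        sleep(interval);
    }
}
```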
All Node.js Triton services return errors as {"code": ..., "message": ...}
without request_id. Dropshot generates an Error schema with required
request_id, causing deserialization failures when clients receive error
responses from real services.
Extracted a shared patch_node_triton_error_schema function and applied
it to all six Triton API clients: CloudAPI, SAPI, IMGAPI, NAPI, PAPI,
VMAPI. Only bugview-client (our Dropshot service) and jira-client
(external API) retain the Dropshot error format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all 33 IMGAPI CLI commands into tritonadm as `tritonadm image <cmd>`. This consolidates image management into the admin tool rather than shipping a separate imgapi-cli binary. Uses tritonadm's existing --imgapi-url global flag for URL resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All IMGAPI CLI functionality is now available via `tritonadm image`. Remove the standalone imgapi-cli crate from the workspace. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract ServiceConfig struct and cmd_add_service from the grafana implementation. Both grafana and portal now call the same function with different configs. Portal defaults to --image current since images are locally built, not on the updates server. Portal config: name=portal, image=user-portal, package=sdc_1024, no delegated dataset, firewall enabled, no manta NIC. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single command to import a manifest + file, matching sdc-imgadm's import -m -f workflow. Reads the manifest JSON, imports it to IMGAPI, uploads the image file, and activates the image. Usage: tritonadm image import -m <manifest> -f <file> [-c gzip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 'list' as alias for 'list-images'. Update default table format to match sdc-imgadm: UUID, NAME, VERSION, FLAGS, OS, PUBLISHED columns. Flags: I=unactivated, D=disabled, P=private (non-public), X=other. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
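One way to derive the FLAGS column from an image record is sketched below. The `ImageSummary` struct and `image_flags` function are hypothetical, and the reading of "X=other" as "any state besides active, unactivated, or disabled" is an assumption, since the commit message does not spell it out.

```rust
/// Minimal subset of image fields needed for the flags column (assumed shape).
struct ImageSummary<'a> {
    state: &'a str,
    disabled: bool,
    public: bool,
}

/// Build the flags string: I=unactivated, D=disabled, P=private, X=other.
fn image_flags(img: &ImageSummary) -> String {
    let mut flags = String::new();
    if img.state == "unactivated" {
        flags.push('I');
    }
    if img.disabled {
        flags.push('D');
    }
    if !img.public {
        flags.push('P');
    }
    // Assumption: "other" covers states not already represented above.
    if !matches!(img.state, "active" | "unactivated" | "disabled") {
        flags.push('X');
    }
    flags
}
```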
The real Node.js IMGAPI reads the action from the query string, not the request body. The TypedClient was wrapping the action in an ActionBody struct and sending it in the body, causing 404s. Fix image_action_json to pass the action via the Progenitor builder's .action() method (which sets the query parameter). Remove the ActionBody wrapper from all image action methods. This fixes import, activate, and all other image actions. Also reverts the direct-HTTP workaround in post_setup.rs import-remote since the TypedClient now works correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ImportImageRequest struct drops fields like origin, tags, and requirements that the real IMGAPI expects. Send the raw JSON value as the body instead of parsing into a typed struct first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The image import command now checks if the manifest's origin image exists locally, and if not, imports it from the updates server before importing the manifest. This matches sdc-imgadm's behavior of resolving origin chains automatically. Also moved DEFAULT_UPDATES_URL to a single constant in main.rs, used by both post_setup.rs and image.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
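The origin check described above can be sketched with the two IMGAPI interactions abstracted as closures. `ensure_origin` is a hypothetical name, and the closures stand in for the local lookup and the import from the updates server; the sketch handles a single origin and leaves deeper chains to the server-side import.

```rust
/// Make sure a manifest's origin image is present before importing the manifest.
/// Returns Ok(true) if the origin had to be imported, Ok(false) otherwise.
fn ensure_origin<E, I>(
    origin: Option<&str>,
    exists_locally: E,
    mut import_from_updates: I,
) -> Result<bool, String>
where
    E: Fn(&str) -> bool,
    I: FnMut(&str) -> Result<(), String>,
{
    match origin {
        // Origin declared but missing locally: fetch it first.
        Some(uuid) if !exists_locally(uuid) => {
            import_from_updates(uuid)?;
            Ok(true)
        }
        // No origin, or origin already present: nothing to do.
        _ => Ok(false),
    }
}
```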
The real IMGAPI expects source and skip_owner_check as query parameters for the import-remote action, not body fields. Added these to ImageActionQuery in the API trait so the Progenitor builder exposes .source() and .skip_owner_check() methods. Updated the TypedClient's import_remote_image to use them directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The IMGAPI requires a compression parameter when uploading image files. Read it from the manifest's files[0].compression as a fallback when --compression is not explicitly passed on the command line. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
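The fallback order described above is small enough to show directly. A minimal sketch, assuming a hypothetical `ManifestFile` struct that models only the `compression` field of a manifest's `files` entries:

```rust
/// One entry of a manifest's `files` array, reduced to the field used here.
struct ManifestFile {
    compression: Option<String>,
}

/// Prefer the explicit --compression value, otherwise fall back to
/// `files[0].compression` from the manifest. `None` means neither is set.
fn resolve_compression<'a>(cli: Option<&'a str>, files: &'a [ManifestFile]) -> Option<&'a str> {
    cli.or_else(|| files.first().and_then(|f| f.compression.as_deref()))
}
```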
Add remove-portal shortcut and a generic remove-service that takes a service name. All three (remove-grafana, remove-portal, remove-service) use the same shared cmd_remove_service function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Portal needs USER_PORTAL_JWT_SECRET, USER_PORTAL_KEY_ID, and USER_PORTAL_DATACENTERS in SAPI service metadata so config-agent can render the config template. Generate JWT secret from /dev/urandom, read SSH key fingerprint from headnode, build datacenter list from SDC config. Also refactors cmd_add_service args into SetupOpts struct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Read /root/.ssh/sdc.id_rsa during post-setup portal and store it as USER_PORTAL_SDC_KEY in the SAPI service metadata. Config-agent renders this into /opt/smartdc/portal/etc/sdc_key via the sdc-key manifest, which the portal uses to sign requests to CloudAPI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keys provisioned via config-agent templates or manual copy-paste can acquire leading/trailing artifacts (BOM, blank lines, shell prompts) that cause strict RFC 7468 parsers to reject otherwise valid PEM data. Add normalize_pem() which extracts the -----BEGIN to -----END block, and call it in both LegacyPrivateKey::from_pem() and KeyLoader::load_from_file() before the data reaches strict parsers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
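The extraction step could be approximated as follows. This is a sketch of the idea rather than the committed `normalize_pem()`: the signature and the extra clean-up (trimming each line and dropping blank ones) are assumptions.

```rust
/// Return only the first `-----BEGIN ...-----` to `-----END ...-----` block,
/// discarding anything before or after it (BOM, prompts, blank lines).
fn normalize_pem(input: &str) -> Option<String> {
    const BEGIN: &str = "-----BEGIN";
    const END: &str = "-----END";

    let start = input.find(BEGIN)?;
    let rest = &input[start..];
    let end_marker = rest.find(END)?;
    // The END line closes with a second run of five dashes after the label.
    let after_label = &rest[end_marker + END.len()..];
    let close = after_label.find("-----")?;
    let end = end_marker + END.len() + close + "-----".len();

    // Trim stray whitespace (including '\r') and drop empty lines.
    let mut block = rest[..end]
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n");
    block.push('\n');
    Some(block)
}
```

Returning `None` when no complete block is found lets the caller report a clear error instead of handing malformed data to the strict parser.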
Port sdcadm's post-setup cloudapi into tritonadm using the existing cmd_add_service infrastructure. The cloudapi service is created during headnode setup, so this command just creates a first instance (or reprovisions an existing one with a newer image). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes CloudAPI instances without deleting the SAPI service definition, since CloudAPI is a core service created during headnode setup. This allows re-running post-setup cloudapi for development iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tritonadm CLI, SAPI/IMGAPI/NAPI/PAPI API conversions, and zone image build infrastructure
Summary
- `tritonadm` CLI: New operator administration tool with subcommands for post-setup (grafana, portal, common-external-nics), image management (list, import, import-remote, delete), dc-maint status, and dev teardown helpers
- images/), and a `triton-api` service with SMF manifests and SAPI metadata
- `triton-tls` crate: Portable TLS cert loading that works on both illumos and other platforms
Test plan
- `make package-build PACKAGE=tritonadm` builds successfully
- `make package-test PACKAGE=sapi-cli/imgapi-cli/napi-cli/papi-cli` pass
- `make openapi-check` confirms generated specs are up-to-date
- `make clients-check` confirms generated client code is up-to-date
- `make audit` passes (with known pre-existing exceptions)