Skip to content

Latest commit

 

History

History
829 lines (679 loc) · 21.8 KB

File metadata and controls

829 lines (679 loc) · 21.8 KB

SageMaker Integration - Deploy Your Custom AI Models

FULLY IMPLEMENTED: NeuroLink now supports Amazon SageMaker, enabling you to deploy and use your own custom trained models through NeuroLink's unified interface. All features documented below are complete and production-ready.

What is SageMaker Integration?

SageMaker integration transforms NeuroLink into a platform for custom AI model deployment, offering:

  • Custom Model Hosting - Deploy your fine-tuned models on AWS infrastructure
  • Cost Control - Pay only for inference usage with auto-scaling capabilities
  • Enterprise Security - Full control over model infrastructure and data privacy
  • Performance - Dedicated compute resources with predictable latency
  • Global Deployment - Available in all major AWS regions
  • Monitoring - Built-in CloudWatch metrics and logging

Quick Start

1. Deploy Your Model to SageMaker

First, you need a model deployed to a SageMaker endpoint:

# Example: Deploy a Hugging Face model to SageMaker
from sagemaker.huggingface import HuggingFaceModel

# Create model
huggingface_model = HuggingFaceModel(
    model_data="s3://your-bucket/model.tar.gz",
    role=role,
    transformers_version="4.21",
    pytorch_version="1.12",
    py_version="py39",
)

# Deploy to endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="my-custom-model-endpoint"
)

2. Configure NeuroLink

# Set AWS credentials and SageMaker configuration
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"
export SAGEMAKER_DEFAULT_ENDPOINT="my-custom-model-endpoint"

3. Use with CLI

# Test SageMaker endpoint connectivity
npx @juspay/neurolink sagemaker status

# Generate content with your custom model
npx @juspay/neurolink generate "Analyze this business scenario" --provider sagemaker

# Use specific endpoint
npx @juspay/neurolink generate "Domain-specific task" --provider sagemaker --model my-domain-model

# Performance benchmark
npx @juspay/neurolink sagemaker benchmark my-custom-model-endpoint

4. Use with SDK

import { NeuroLink } from "@juspay/neurolink";

// Create NeuroLink instance
const neurolink = new NeuroLink();

// Generate with default endpoint
const result = await neurolink.generate({
  input: { text: "Analyze customer feedback for sentiment and themes" },
  provider: "sagemaker",
});

// Use specific endpoint
const domainResult = await neurolink.generate({
  input: { text: "Industry-specific analysis request" },
  provider: "sagemaker",
  model: "domain-expert-model-endpoint",
});

Key Benefits

Custom Model Deployment

Deploy any model you've trained or fine-tuned:

import { NeuroLink } from "@juspay/neurolink";

// Example: Using different specialized models
const models = {
  sentiment: "sentiment-analysis-model",
  translation: "multilingual-translation-model",
  summarization: "document-summarizer-model",
  domain: "healthcare-specialist-model",
};

async function analyzeWithSpecializedModel(text: string, task: string) {
  const endpoint = models[task] || models.sentiment;
  const neurolink = new NeuroLink();

  const result = await neurolink.generate({
    input: { text: `${task}: ${text}` },
    provider: "sagemaker",
    model: endpoint,
    temperature: 0.3, // Lower for specialized tasks
    timeout: "45s",
  });

  return {
    analysis: result.content,
    model: endpoint,
    task: task,
  };
}

// Usage
const sentimentResult = await analyzeWithSpecializedModel(
  "The product quality has really improved recently!",
  "sentiment",
);

const summaryResult = await analyzeWithSpecializedModel(
  "Long document content here...",
  "summarization",
);

Cost Optimization

SageMaker enables precise cost control through multiple deployment options:

import { NeuroLink } from "@juspay/neurolink";

class CostOptimizedSageMaker {
  private neurolink: NeuroLink;
  private endpoints: {
    cheap: string; // Small instance, basic model
    balanced: string; // Medium instance, good model
    premium: string; // Large instance, best model
  };

  constructor() {
    this.neurolink = new NeuroLink();
    this.endpoints = {
      cheap: "cost-effective-model",
      balanced: "production-model",
      premium: "high-performance-model",
    };
  }

  async generateOptimized(
    prompt: string,
    priority: "cost" | "balanced" | "quality" = "balanced",
  ) {
    const endpoint = this.endpoints[priority];

    const startTime = Date.now();
    const result = await this.neurolink.generate({
      input: { text: prompt },
      provider: "sagemaker",
      model: endpoint,
      timeout: priority === "cost" ? "15s" : "45s", // Faster timeout for cost model
    });
    const responseTime = Date.now() - startTime;

    return {
      content: result.content,
      endpoint: endpoint,
      priority: priority,
      responseTime: responseTime,
      estimatedCost: this.calculateCost(responseTime, priority),
    };
  }

  private calculateCost(responseTime: number, priority: string): number {
    const rates = {
      cost: 0.0001, // $0.0001 per second
      balanced: 0.0005, // $0.0005 per second
      quality: 0.002, // $0.002 per second
    };

    return (responseTime / 1000) * rates[priority];
  }
}

// Usage
const optimizer = new CostOptimizedSageMaker();

// Cost-effective for simple tasks
const cheapResult = await optimizer.generateOptimized(
  "Simple classification task",
  "cost",
);

// High-quality for complex analysis
const premiumResult = await optimizer.generateOptimized(
  "Complex business strategy analysis",
  "quality",
);

console.log(
  `Cost difference: $${premiumResult.estimatedCost - cheapResult.estimatedCost}`,
);

Enterprise Security & Compliance

Full control over your model infrastructure:

import { NeuroLink } from "@juspay/neurolink";

class SecureSageMakerProvider {
  private neurolink: NeuroLink;
  private region: string;
  private vpcConfig?: {
    securityGroups: string[];
    subnets: string[];
  };

  constructor(region: string, vpcConfig?: any) {
    this.neurolink = new NeuroLink();
    this.region = region;
    this.vpcConfig = vpcConfig;
  }

  async secureGenerate(
    prompt: string,
    endpoint: string,
    securityContext: {
      userId: string;
      department: string;
      clearanceLevel: "public" | "internal" | "confidential";
    },
  ) {
    // Audit logging
    console.log(
      `[AUDIT] User ${securityContext.userId} from ${securityContext.department} requesting ${securityContext.clearanceLevel} generation`,
    );

    const result = await this.neurolink.generate({
      input: { text: prompt },
      provider: "sagemaker",
      model: endpoint,
      timeout: "30s",
      // Custom metadata for tracking
      context: {
        user: securityContext.userId,
        department: securityContext.department,
        classification: securityContext.clearanceLevel,
        timestamp: new Date().toISOString(),
      },
    });

    // Log successful completion
    console.log(
      `[AUDIT] Generation completed for user ${securityContext.userId}`,
    );

    return {
      ...result,
      securityContext,
      complianceInfo: {
        dataResidency: this.region,
        encryptionAtRest: true,
        encryptionInTransit: true,
        auditLogged: true,
      },
    };
  }
}

// Usage
const secureProvider = new SecureSageMakerProvider("us-east-1", {
  securityGroups: ["sg-12345"],
  subnets: ["subnet-abc123"],
});

const secureResult = await secureProvider.secureGenerate(
  "Analyze sensitive customer data",
  "hipaa-compliant-model",
  {
    userId: "john.doe@company.com",
    department: "healthcare",
    clearanceLevel: "confidential",
  },
);

Advanced Model Management

Multi-Model Endpoints

Manage multiple models through a single endpoint:

import { NeuroLink } from "@juspay/neurolink";

class MultiModelSageMaker {
  private neurolink: NeuroLink;
  private multiModelEndpoint: string;
  private models: Map<string, string>;

  constructor(endpoint: string) {
    this.neurolink = new NeuroLink();
    this.multiModelEndpoint = endpoint;
    this.models = new Map([
      ["sentiment", "sentiment-v2.tar.gz"],
      ["translation", "translate-v1.tar.gz"],
      ["summarization", "summary-v3.tar.gz"],
    ]);
  }

  async generateWithModel(
    prompt: string,
    modelType: string,
    options: {
      temperature?: number;
      maxTokens?: number;
    } = {},
  ) {
    const modelPath = this.models.get(modelType);
    if (!modelPath) {
      throw new Error(`Model type '${modelType}' not available`);
    }

    const result = await this.neurolink.generate({
      input: { text: prompt },
      provider: "sagemaker",
      model: this.multiModelEndpoint,
      temperature: options.temperature || 0.7,
      maxTokens: options.maxTokens || 500,
      // SageMaker-specific: target model for multi-model endpoint
      targetModel: modelPath,
    });

    return {
      ...result,
      modelType: modelType,
      modelPath: modelPath,
    };
  }

  async compareModels(prompt: string, modelTypes: string[]) {
    const comparisons = await Promise.all(
      modelTypes.map(async (modelType) => {
        try {
          const result = await this.generateWithModel(prompt, modelType);
          return {
            modelType,
            success: true,
            response: result.content,
            responseTime: result.responseTime,
          };
        } catch (error) {
          return {
            modelType,
            success: false,
            error: error.message,
          };
        }
      }),
    );

    return comparisons;
  }
}

// Usage
const multiModel = new MultiModelSageMaker("multi-model-endpoint");

// Use specific model
const sentimentResult = await multiModel.generateWithModel(
  "I love this new feature!",
  "sentiment",
);

// Compare multiple models
const comparison = await multiModel.compareModels(
  "Analyze this text for insights",
  ["sentiment", "summarization"],
);

Health Monitoring & Auto-Recovery

import { NeuroLink } from "@juspay/neurolink";

class SageMakerHealthMonitor {
  private neurolink: NeuroLink;
  private endpoints: string[];
  private healthStatus: Map<string, boolean>;
  private failureCount: Map<string, number>;

  constructor(endpoints: string[]) {
    this.neurolink = new NeuroLink();
    this.endpoints = endpoints;
    this.healthStatus = new Map();
    this.failureCount = new Map();
  }

  async checkHealth(endpoint: string): Promise<boolean> {
    try {
      const result = await this.neurolink.generate({
        input: { text: "health check" },
        provider: "sagemaker",
        model: endpoint,
        timeout: "10s",
        maxTokens: 10,
      });

      this.healthStatus.set(endpoint, true);
      this.failureCount.set(endpoint, 0);
      return true;
    } catch (error) {
      this.healthStatus.set(endpoint, false);
      const failures = this.failureCount.get(endpoint) || 0;
      this.failureCount.set(endpoint, failures + 1);
      return false;
    }
  }

  async generateWithFailover(prompt: string) {
    for (const endpoint of this.endpoints) {
      const isHealthy = await this.checkHealth(endpoint);

      if (isHealthy) {
        try {
          const result = await this.neurolink.generate({
            input: { text: prompt },
            provider: "sagemaker",
            model: endpoint,
            timeout: "30s",
          });

          return {
            ...result,
            endpoint: endpoint,
            failoverUsed: this.endpoints.indexOf(endpoint) > 0,
          };
        } catch (error) {
          console.warn(`Endpoint ${endpoint} failed, trying next...`);
          continue;
        }
      }
    }

    throw new Error("All SageMaker endpoints are unavailable");
  }

  getHealthReport() {
    return {
      endpoints: this.endpoints,
      health: Object.fromEntries(this.healthStatus),
      failures: Object.fromEntries(this.failureCount),
      healthyCount: Array.from(this.healthStatus.values()).filter(Boolean)
        .length,
      totalEndpoints: this.endpoints.length,
    };
  }
}

// Usage
const monitor = new SageMakerHealthMonitor([
  "primary-model-endpoint",
  "backup-model-endpoint",
  "fallback-model-endpoint",
]);

// Generate with automatic failover
const result = await monitor.generateWithFailover(
  "Important business analysis request",
);

// Get health status
const healthReport = monitor.getHealthReport();
console.log("Endpoint Health:", healthReport);

Advanced Configuration

Serverless Inference

Configure SageMaker for serverless inference:

Educational Example - Custom Wrapper Pattern

The coldStartTimeout parameter below is a user-defined convenience variable, not a native NeuroLink SDK option. This example demonstrates how you can create wrapper functions with custom options that map to standard SDK parameters.

The coldStartTimeout value is passed to the standard timeout option internally.

import { NeuroLink } from "@juspay/neurolink";

class ServerlessSageMaker {
  private neurolink: NeuroLink;
  private serverlessEndpoint: string;

  constructor(endpoint: string) {
    this.neurolink = new NeuroLink();
    this.serverlessEndpoint = endpoint;
  }

  async generateServerless(
    prompt: string,
    options: {
      // NOTE: coldStartTimeout is a custom wrapper option (not SDK native)
      // It maps to the standard `timeout` parameter for SageMaker cold starts
      coldStartTimeout?: string;
      maxConcurrency?: number;
      memorySize?: number;
    } = {},
  ) {
    const {
      coldStartTimeout = "2m", // Longer timeout for cold starts
      maxConcurrency = 10,
      memorySize = 4096,
    } = options;

    const startTime = Date.now();
    const result = await this.neurolink.generate({
      input: { text: prompt },
      provider: "sagemaker",
      model: this.serverlessEndpoint,
      timeout: coldStartTimeout,
      // Serverless-specific metadata
      context: {
        deployment: "serverless",
        maxConcurrency,
        memorySize,
      },
    });
    const totalTime = Date.now() - startTime;

    return {
      ...result,
      serverlessMetrics: {
        totalTime,
        coldStart: totalTime > 10000, // Assume cold start if > 10s
        configuration: {
          maxConcurrency,
          memorySize,
        },
      },
    };
  }

  async batchServerless(prompts: string[], batchSize: number = 5) {
    const results = [];

    // Process in batches to respect concurrency limits
    for (let i = 0; i < prompts.length; i += batchSize) {
      const batch = prompts.slice(i, i + batchSize);

      const batchResults = await Promise.all(
        batch.map((prompt) => this.generateServerless(prompt)),
      );

      results.push(...batchResults);

      // Brief pause between batches
      if (i + batchSize < prompts.length) {
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
    }

    return results;
  }
}

// Usage
const serverless = new ServerlessSageMaker("serverless-model-endpoint");

// Single serverless generation
const result = await serverless.generateServerless(
  "Analyze market trends for Q4 2024",
  {
    coldStartTimeout: "3m",
    maxConcurrency: 20,
    memorySize: 8192,
  },
);

// Batch serverless processing
const prompts = [
  "Summarize customer feedback",
  "Analyze competitor pricing",
  "Generate product recommendations",
];

const batchResults = await serverless.batchServerless(prompts, 3);

Testing and Validation

Model Performance Testing

import { NeuroLink } from "@juspay/neurolink";

class SageMakerPerformanceTester {
  private neurolink: NeuroLink;
  private endpoint: string;
  private baseline: {
    latency: number;
    accuracy: number;
    throughput: number;
  };

  constructor(endpoint: string, baseline: any) {
    this.neurolink = new NeuroLink();
    this.endpoint = endpoint;
    this.baseline = baseline;
  }

  async loadTest(
    prompts: string[],
    concurrency: number = 5,
    duration: number = 60000, // 1 minute
  ) {
    const results = [];
    const startTime = Date.now();
    let requestCount = 0;
    let errorCount = 0;

    while (Date.now() - startTime < duration) {
      const batchPromises = Array(concurrency)
        .fill(null)
        .map(async (_, index) => {
          const prompt = prompts[requestCount % prompts.length];
          requestCount++;

          try {
            const requestStart = Date.now();
            const result = await this.neurolink.generate({
              input: { text: prompt },
              provider: "sagemaker",
              model: this.endpoint,
              timeout: "30s",
            });
            const latency = Date.now() - requestStart;

            return {
              success: true,
              latency,
              responseLength: result.content.length,
              requestId: requestCount,
            };
          } catch (error) {
            errorCount++;
            return {
              success: false,
              error: error.message,
              requestId: requestCount,
            };
          }
        });

      const batchResults = await Promise.all(batchPromises);
      results.push(...batchResults);

      // Brief pause between batches
      await new Promise((resolve) => setTimeout(resolve, 100));
    }

    return this.analyzeResults(results, requestCount, errorCount);
  }

  private analyzeResults(
    results: any[],
    totalRequests: number,
    errors: number,
  ) {
    const successfulResults = results.filter((r) => r.success);
    const latencies = successfulResults.map((r) => r.latency);

    const avgLatency = latencies.reduce((a, b) => a + b, 0) / latencies.length;
    const p95Latency = latencies.sort((a, b) => a - b)[
      Math.floor(latencies.length * 0.95)
    ];
    const throughput = totalRequests / 60; // requests per second
    const errorRate = (errors / totalRequests) * 100;

    return {
      performance: {
        averageLatency: avgLatency,
        p95Latency: p95Latency,
        throughput: throughput,
        errorRate: errorRate,
        totalRequests: totalRequests,
      },
      comparison: {
        latencyChange:
          ((avgLatency - this.baseline.latency) / this.baseline.latency) * 100,
        throughputChange:
          ((throughput - this.baseline.throughput) / this.baseline.throughput) *
          100,
      },
      status: this.getPerformanceStatus(avgLatency, throughput, errorRate),
    };
  }

  private getPerformanceStatus(
    latency: number,
    throughput: number,
    errorRate: number,
  ) {
    if (errorRate > 5) return "POOR";
    if (latency > this.baseline.latency * 1.5) return "DEGRADED";
    if (throughput < this.baseline.throughput * 0.8) return "DEGRADED";
    return "GOOD";
  }
}

// Usage
const tester = new SageMakerPerformanceTester("performance-test-endpoint", {
  latency: 2000, // 2 seconds baseline
  accuracy: 0.95, // 95% accuracy baseline
  throughput: 10, // 10 requests/second baseline
});

const testPrompts = [
  "Analyze customer sentiment",
  "Generate product description",
  "Summarize business report",
  "Classify support ticket",
];

const performanceReport = await tester.loadTest(testPrompts, 10, 120000); // 2 minutes
console.log("Performance Report:", performanceReport);

Troubleshooting

Common Issues

1. "Endpoint not found" Error

# Check if endpoint exists
aws sagemaker describe-endpoint --endpoint-name your-endpoint-name

# Check endpoint status
npx @juspay/neurolink sagemaker status

2. "Access denied" Error

# Verify IAM permissions
aws sts get-caller-identity

# Test IAM policy
aws sagemaker invoke-endpoint --endpoint-name your-endpoint --body '{"inputs": "test"}' --content-type application/json /tmp/output.json

3. "Model not loading" Error

# Check endpoint health
npx @juspay/neurolink sagemaker test your-endpoint

# Monitor CloudWatch logs
aws logs describe-log-groups --log-group-name-prefix /aws/sagemaker/Endpoints

Debug Mode

# Enable debug output
export NEUROLINK_DEBUG=true
npx @juspay/neurolink generate "test" --provider sagemaker --debug

Related Documentation

Other Provider Integrations

Why Choose SageMaker Integration?

For AI/ML Teams

  • Custom Models: Deploy your own fine-tuned models
  • Experimentation: A/B test different model versions
  • Performance Control: Dedicated compute resources
  • Cost Transparency: Clear pricing per inference request

For Enterprises

  • Data Privacy: Models run in your AWS account
  • Compliance: Meet industry-specific requirements
  • Scalability: Auto-scaling from zero to thousands of requests
  • Integration: Seamless fit with existing AWS infrastructure

For Production

  • Reliability: Multi-AZ deployment options
  • Monitoring: CloudWatch integration for metrics and logs
  • Security: VPC, encryption, and IAM controls
  • Performance: Predictable latency and throughput

Ready to deploy your custom models? Follow the Quick Start guide above to begin using your own AI models through NeuroLink's SageMaker integration today!