
Ray Production Applications

This repository contains production-grade applications built using Ray, a distributed computing framework for Python.

Project Structure

  • text_ml.py: Main implementation file containing the Ray Serve text-processing/ML functionality (see the sketch below)
  • text_ml_client.py: Client code that interacts with the deployed Ray application
  • serve_config.yaml: Configuration file for the Ray Serve deployment
  • __init__.py: Package initialization file. This is required; without it, deployment keeps failing with ModuleNotFoundError.
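The sketch below illustrates the general shape of a Ray Serve text app like text_ml.py; it is not the repo's exact code, and the class name, model, and payload key are placeholders.

```python
# Minimal sketch of the pattern a file like text_ml.py follows.
# The class/model/payload names here are hypothetical, not the repo's code.
from starlette.requests import Request
from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 0.5})
class TextProcessor:
    def __init__(self):
        # Load the model once per replica, not once per request.
        self.model = lambda text: text.upper()  # stand-in for a real ML model

    async def __call__(self, request: Request) -> str:
        payload = await request.json()
        return self.model(payload["text"])


# `serve build text_ml:app -o serve_config.yaml` looks for a
# module-level `app` binding like this one.
app = TextProcessor.bind()
```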

Prerequisites

  • Python 3.9+
  • Ray (latest version)

Setup Instructions

  1. Clone the repository
  2. Create and activate the conda environment:
conda create -n ray_examples_env python=3.9
conda activate ray_examples_env
  3. Install dependencies:
pip install -r requirements.txt
If you are not using requirements.txt, install Ray directly: pip install ray

Deployment

CPU Deployment

To deploy the application using Ray Serve:

  1. Build the deployment configuration:
serve build text_ml:app -o serve_config.yaml
  2. Start the Ray cluster:
ray start --head
  3. Deploy the application:
serve deploy serve_config.yaml
  4. Check deployment status:
serve status
  5. Run the client (a minimal example is sketched below):
python text_ml_client.py
Note: serve status should report the application as RUNNING, with each deployment HEALTHY, before the client will return output.
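Ray Serve listens on http://localhost:8000 by default. A minimal client along the lines of text_ml_client.py might look like the following; the route ("/") and payload key ("text") are assumptions, so check text_ml_client.py for the real request contract.

```python
# Hypothetical client sketch; route and payload shape are assumptions.
import requests

response = requests.post(
    "http://localhost:8000/",  # Ray Serve's default HTTP address
    json={"text": "Ray makes distributed Python simple."},
)
response.raise_for_status()
print(response.text)
```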

CPU Autoscaling Deployment

  1. Build the deployment configuration with CPU autoscaling (see the config sketch after these steps):
serve build text_ml:app -o serve_config_cpu_autoscalling.yaml
  2. Start the Ray cluster:
ray start --head
  3. Deploy with CPU autoscaling:
serve deploy serve_config_cpu_autoscalling.yaml
  4. Check deployment status:
serve status
  5. Monitor the deployment in the Ray Dashboard at http://localhost:8265 (started automatically by ray start --head)
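Autoscaling is configured per deployment. In the Python API it looks like the sketch below, and serve build writes the equivalent fields into the YAML config. The numbers are illustrative, not the values in serve_config_cpu_autoscalling.yaml; also note the key is target_ongoing_requests on recent Ray versions, while older releases call it target_num_ongoing_requests_per_replica.

```python
# Illustrative autoscaling settings -- numbers are not taken from the repo.
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,  # keep at least one replica warm
        "max_replicas": 4,  # cap total CPU consumption under load
        # Scale up once replicas average more than this many in-flight requests.
        "target_ongoing_requests": 2,
    },
    ray_actor_options={"num_cpus": 0.5},
)
class TextProcessor:
    async def __call__(self, request) -> str:
        return "ok"


app = TextProcessor.bind()
```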

Authentication Configuration

To deploy and use the application with authentication:

  1. Deploy the auth-enabled configuration:
serve deploy serve_config_auth_cpu_autoscalling.yaml
  2. Check deployment status:
serve status
  3. Run the auth-enabled client (sketched below):
python text_ml_auth_client.py
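The auth client must attach its key on every call. A minimal sketch follows; the x-api-key header name and the key value are placeholders, so check text_ml_auth_client.py for the real contract.

```python
# Hypothetical auth client sketch; header name and key are placeholders.
import requests

API_KEY = "your-api-key-here"  # must match a key listed in text_ml_auth.py

response = requests.post(
    "http://localhost:8000/",
    headers={"x-api-key": API_KEY},
    json={"text": "authenticated request"},
)
print(response.status_code, response.text)
```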

The auth-enabled deployment uses the following configuration:

  1. API Key Validation:

    • API keys are validated against a predefined list in text_ml_auth.py
    • Only valid API keys can access the service
  2. Authentication Flow:

    • Clients must include their API key in the request headers
    • The API key is validated before the request is processed (see the sketch after this list)
    • Rate limits are enforced per API key
    • TODO: Add an in-memory database, general rate limiting, and per-API-key rate limiting
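The sketch below shows one way to implement header-based key validation inside a Serve deployment. The key set and the x-api-key header name are illustrative; the real key list lives in text_ml_auth.py.

```python
# Sketch of API-key validation inside a Serve deployment (illustrative).
from ray import serve
from starlette.requests import Request
from starlette.responses import JSONResponse

VALID_API_KEYS = {"key-abc", "key-def"}  # placeholders for text_ml_auth.py's list


@serve.deployment
class AuthedTextProcessor:
    async def __call__(self, request: Request):
        api_key = request.headers.get("x-api-key")
        if api_key not in VALID_API_KEYS:
            # Reject before doing any model work.
            return JSONResponse({"error": "invalid API key"}, status_code=401)
        payload = await request.json()
        return JSONResponse({"result": payload["text"].upper()})


app = AuthedTextProcessor.bind()
```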

Load Testing

To test the autoscaling behavior:

  1. Install k6:
brew install k6
  2. Run the load test:
k6 run loadtest.js
  3. Monitor the autoscaling behavior:
    • Use serve status to check replica counts
    • View the Ray Dashboard at http://localhost:8265
    • Check the number of replicas scaling up and down based on load

The load test will help verify that the autoscaling configuration is working correctly, with replicas being added and removed based on the incoming request load.
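If k6 is not available, a crude pure-Python load generator can exercise the autoscaler too. This is a substitute for loadtest.js, not a port of it, and the endpoint and payload are the same assumptions as in the client sketch above.

```python
# Crude Python load generator -- a stand-in for loadtest.js, useful for
# watching replica counts change in `serve status`.
from concurrent.futures import ThreadPoolExecutor

import requests


def hit(_):
    # Each call is one request against the locally served app.
    r = requests.post("http://localhost:8000/", json={"text": "load"})
    return r.status_code


with ThreadPoolExecutor(max_workers=32) as pool:
    # 2,000 requests from 32 concurrent workers; tune both to your machine.
    codes = list(pool.map(hit, range(2000)))

print({code: codes.count(code) for code in set(codes)})
```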

Development Commands

Debugging

The ray debug CLI does not take a script argument. Instead, place a breakpoint() inside a Ray task or actor, run the script normally, then attach from a second terminal:

ray debug
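A minimal sketch of that workflow, independent of text_ml.py, is below. On recent Ray releases the legacy CLI debugger may need to be enabled explicitly (e.g. via RAY_DEBUG=legacy where the newer debugger is the default); check the Ray debugging docs for your version.

```python
# Drop a breakpoint() inside a remote function, run this script, then
# attach from another terminal with `ray debug`.
import ray

ray.init()


@ray.remote
def process(text: str) -> str:
    breakpoint()  # execution pauses here until a debugger attaches
    return text.upper()


print(ray.get(process.remote("hello ray")))
```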

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Acknowledgments

  • Ray Core Team
  • Ray Serve Team
  • Ray ML Team
