
# AI and ML Server Setup

This is the code that accompanies the AI Server from Scratch in AWS video.

This project walks through setting up an AWS EC2 instance optimized for generative AI and machine learning tasks, using NVIDIA and Docker on Ubuntu.

## Prerequisites

- An AWS account
- Basic knowledge of:
  - AWS (EC2)
  - Virtual machines
  - Linux / command line / SSH
- Familiarity with Docker and containerization concepts

## Goals

  1. Configure a compute-optimized VM from scratch (starting with a blank Ubuntu image)
  2. Ensure portability to avoid vendor lock-in and reduce dependence on platform-specific tools
  3. Create a flexible environment suitable for various AI and ML frameworks
  4. Streamline deployments by creating a custom image with Packer

## Why EC2 vs. Bedrock, HuggingFace Endpoints, etc.

An EC2 instance is not the right choice for every project. There may be cases where a hosted API that charges per token is cheaper and/or easier to get up and running (an API will almost always be easier to get up and running).

Here are the reasons this guide uses an EC2 instance:

  1. **Price ceiling**: the cost of the EC2 instance and its storage has a predictable ceiling, so there shouldn't be any surprise bills
  2. **Versatile**: since an EC2 instance is essentially a VM (with a GPU), it can run all kinds of projects, not just the use cases supported by an API / platform provider
  3. **Portable**: the steps to configure the EC2 instance should generally apply to other Linux machines

## Setup / Instructions / Notes

### NVIDIA Drivers / CUDA

If you run into errors like `RuntimeError: No CUDA GPUs are available` or `Failed to initialize NVML: Driver/library version mismatch`, try rebooting first.

https://stackoverflow.com/a/43023000
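A quick way to check whether the driver is responding is to wrap `nvidia-smi` in a small helper (`check_gpu` is just an illustrative name, not part of this repo):

```shell
# Sanity check for the NVIDIA driver. If nvidia-smi fails (e.g. with the
# NVML driver/library version mismatch above), a reboot usually fixes it
# by loading the kernel module that matches the installed driver.
check_gpu() {
  if nvidia-smi > /dev/null 2>&1; then
    echo "GPU driver OK"
  else
    echo "nvidia-smi failed; try: sudo reboot"
    return 1
  fi
}
```

Running `check_gpu` after boot (or in a setup script) makes the failure mode explicit instead of surfacing later as a confusing CUDA error.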

### AWS

Make sure to create an `aws.env` file with your settings:

```shell
cd aws
cp aws.env.example aws.env
```

See the `aws` directory. It contains an example setup script that configures the specified EC2 instance and a deploy script that copies the code to the instance and starts the stack.

**NOTE:** if running on a non-standard port (not 80 or 443), make sure to allow inbound traffic to the EC2 instance on that port.
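For example, opening port 8080 via the AWS CLI might look like this (the security group ID and port below are placeholders, not values from this repo):

```shell
# Placeholder group ID and port; use your instance's actual security group
# and whichever port your stack listens on.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8080 \
  --cidr 0.0.0.0/0
```

The `0.0.0.0/0` CIDR opens the port to the whole internet; narrow it to your own IP range if the service shouldn't be public.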

### Lambda Auto Stop EC2

An example Lambda function is provided to stop all EC2 instances. The demonstrated use case is stopping all instances at midnight so that long-running instances don't rack up a larger bill.
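The function's effect is roughly equivalent to this CLI loop (a sketch of the behavior, not the Lambda's actual implementation — requires credentials with `ec2:DescribeInstances` and `ec2:StopInstances` permissions):

```shell
# Find every running instance and stop it.
ids=$(aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text)

if [ -n "$ids" ]; then
  aws ec2 stop-instances --instance-ids $ids
fi
```

Scheduling the Lambda (e.g. with an EventBridge cron rule) is what turns this into an automatic nightly kill switch.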

### Packer

An example Packer file is provided to generate a custom AMI from the setup steps in this guide. This lets you quickly deploy a baseline instance with NVIDIA / Docker support without reconfiguring from scratch every time.
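Typical Packer usage looks something like this (the directory and template filename below are placeholders; use the actual file in this repo):

```shell
cd packer                       # assumed location of the template
packer init .                   # install any required plugins
packer validate ami.pkr.hcl     # placeholder filename; check syntax first
packer build ami.pkr.hcl        # launch a builder instance and bake the AMI
```

Once the build finishes, the new AMI ID can be used in place of the blank Ubuntu image in the setup script.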

### llama.cpp

llama.cpp is included just to test that the server is working as expected, both natively and in a container. It is added as a git submodule.

#### Compile with CUDA and Support for NVIDIA T4 (Compute Capability 7.5)

```shell
make clean
CUDA_DOCKER_ARCH=compute_75 GGML_CUDA=1 make
```

#### Download Model

```shell
cd models

# Use the Bartowski GGUF to avoid issues with llama.cpp
curl -O -L "https://huggingface.co/bartowski/Phi-3.1-mini-4k-instruct-GGUF/resolve/main/Phi-3.1-mini-4k-instruct-Q6_K_L.gguf"
```
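Once compiled, a quick smoke test against the downloaded model might look like this. The binary name depends on the llama.cpp version pinned by the submodule (older Make builds produce `./main`, newer ones `./llama-cli`), and the flags here are illustrative:

```shell
# -ngl 99 offloads all layers to the GPU; -n limits generation to 32 tokens.
./main -m models/Phi-3.1-mini-4k-instruct-Q6_K_L.gguf \
  -p "Hello, world" \
  -n 32 \
  -ngl 99
```

If the output of `nvidia-smi` shows the process using GPU memory while this runs, CUDA offloading is working.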

## Resources
