
# Detailed Thinking Mode with Llama 3.3 Nemotron Super 49B

In the notebook in this directory, we'll explore how simple it is to toggle thinking mode on and off using the Llama 3.3 Nemotron Super 49B NIM: a single model that can change how it generates responses through a simple toggle in the system prompt.

If you'd like to learn more about this model, please check out our blog, which goes into exactly how it was produced.

To begin, we'll first need to download our NIM, which we can do by following the detailed instructions on the Llama 3.3 Nemotron Super 49B model card on build.nvidia.com.

## Downloading Our NIM

First, we'll need to generate an API key. You can find this by navigating to the "Deploy" tab on the build.nvidia.com website.


Next, let's log in to the NVIDIA Container Registry using the following command:

```bash
docker login nvcr.io
```

When prompted, use `$oauthtoken` as the username and your API key as the password.

Next, all we need to do is run the following command and wait for our NIM to spin up!

```bash
export NGC_API_KEY=<PASTE_API_KEY_HERE>

# Cache model weights locally so later launches skip the download
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1:latest
```
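
Once the container logs show the server is up, it's worth confirming the endpoint is reachable before opening the notebook. A minimal sanity check, assuming the NIM exposes its standard health route and OpenAI-compatible API on port 8000:

```bash
# Returns a ready status once the model has finished loading (standard NIM health route)
curl -s http://localhost:8000/v1/health/ready

# Lists the served model name, which we'll need for chat completion requests
curl -s http://localhost:8000/v1/models
```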

## Using Our NIM!

We'll follow this notebook for examples of how to use the Llama 3.3 Nemotron Super 49B NIM in both Detailed Thinking On and Off modes!
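
As a quick preview of what the notebook covers, here's a minimal sketch of toggling the mode from the command line. The served model name and the `detailed thinking on` system-prompt toggle shown here are assumptions based on the model card; confirm the exact name against the `/v1/models` output above.

```bash
# Detailed thinking ON: the system prompt alone toggles the model's reasoning behavior
# (model name assumed; check /v1/models for the name your NIM actually serves)
curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
        "messages": [
            {"role": "system", "content": "detailed thinking on"},
            {"role": "user", "content": "Count the letter r in the word strawberry."}
        ],
        "max_tokens": 1024
    }'
```

Swapping the system prompt to `detailed thinking off` returns a direct answer without the extended reasoning trace, with no other changes to the request.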