In the previous module, we used OpenAI via the OpenAI API. It's a very convenient way to use an LLM, but you have to pay for usage, and you don't have control over the model you use.
In this module, we'll look at using open-source LLMs instead.
- Open-Source LLMs
- Replacing the LLM box in the RAG flow
- Registering in Saturn Cloud
- Configuring secrets and git
- Creating an instance with a GPU
```bash
pip install -U transformers accelerate bitsandbytes sentencepiece
```

Links:
Google Colab as an alternative:
- Model: google/flan-t5-xl
- Notebook: huggingface-flan-t5.ipynb
```python
import os
os.environ['HF_HOME'] = '/run/cache/'
```

Links:
- https://huggingface.co/google/flan-t5-xl
- https://huggingface.co/docs/transformers/en/model_doc/flan-t5
Explanation of parameters:

- `max_length`: Set this to a higher value if you want longer responses, for example `max_length=300`.
- `num_beams`: Increasing this can lead to a more thorough exploration of possible sequences. Typical values are between 5 and 10.
- `do_sample`: Set this to `True` to use sampling methods. This can produce more diverse responses.
- `temperature`: Lowering this value makes the model more confident and deterministic, while higher values increase diversity. Typical values range from 0.7 to 1.5.
- `top_k` and `top_p`: These parameters control the sampling pool. `top_k` limits sampling to the top k tokens, while `top_p` (nucleus sampling) cuts off the pool by cumulative probability. Adjust these based on the desired level of randomness.
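To build intuition for what `temperature` does, here is a toy sketch in pure Python (not tied to any model; the logits below are made up for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature: values below 1.0 sharpen the
    # distribution (more deterministic), values above 1.0 flatten it
    # (more diverse).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens

cold = softmax(logits, temperature=0.5)  # more confident
hot = softmax(logits, temperature=1.5)   # more diverse

# The top token gets a larger share of probability at low temperature
print(cold[0], hot[0])
```

The same idea underlies `top_k` and `top_p`: instead of reshaping the distribution, they shrink the pool of tokens sampling can pick from.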
- Model: microsoft/Phi-3-mini-128k-instruct
- Notebook: huggingface-phi3.ipynb
Links:
- Model: mistralai/Mistral-7B-v0.1
- Notebook: huggingface-mistral-7b.ipynb
ChatGPT instructions for serving
Links:
- https://huggingface.co/docs/transformers/en/llm_tutorial
- https://huggingface.co/settings/tokens
- https://huggingface.co/mistralai/Mistral-7B-v0.1
Where to find them:
- Leaderboards
- ChatGPT
Links:
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
- https://huggingface.co/spaces/optimum/llm-perf-leaderboard
- The easiest way to run an LLM without a GPU is using Ollama
- Notebook: ollama.ipynb
For Linux:
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama start
ollama pull phi3
ollama run phi3
```

Connecting to it with the OpenAI API:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)
```

Docker
```bash
docker run -it \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

Pulling the model:

```bash
docker exec -it ollama bash
ollama pull phi3
```

- Creating a Docker-Compose file
- Re-running the module 1 notebook
- Notebook: rag-intro.ipynb
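The Docker-Compose file mentioned above could look like this sketch, which mirrors the flags of the `docker run` command (the service and volume names are assumptions):

```yaml
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama:
```

With this in place, `docker compose up -d` starts the server and the same `docker exec -it ollama ollama pull phi3` step fetches the model.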
- Putting it in Streamlit
- Code
If you want to learn more about Streamlit, you can use this material from our repository of projects of the week.
See here
- Workaround by Pham Nguyen Hung to use an Elasticsearch container with Saturn Cloud & Google Colab instead of minsearch
- Notes by slavaheroes
- Notes by Pham Nguyen Hung
- Notes by Marat on Open-Sourced and Closed-Sourced Models and ways to run them
- Notes by dimzachar
- Notes by Waleed
- Did you take notes? Add them above this line (Send a PR with links to your notes)