In the previous module, we used OpenAI via the OpenAI API. It's a very convenient way to use an LLM, but you have to pay for usage, and you don't have control over the model you use.
In this module, we'll look at using open-source LLMs instead.
- Open-Source LLMs
- Replacing the LLM box in the RAG flow
- Registering in Saturn Cloud
- Configuring secrets and git
- Creating an instance with a GPU
```bash
pip install -U transformers accelerate bitsandbytes sentencepiece
```

Links:
Google Colab as an alternative:
- Model: google/flan-t5-xl
- Notebook: huggingface-flan-t5.ipynb
```python
import os
os.environ['HF_HOME'] = '/run/cache/'
```

Links:
- https://huggingface.co/google/flan-t5-xl
- https://huggingface.co/docs/transformers/en/model_doc/flan-t5
Explanation of parameters:

- `max_length`: Set this to a higher value if you want longer responses, for example `max_length=300`.
- `num_beams`: Increasing this can lead to a more thorough exploration of possible sequences. Typical values are between 5 and 10.
- `do_sample`: Set this to `True` to use sampling methods. This can produce more diverse responses.
- `temperature`: Lowering this value makes the model more confident and deterministic, while higher values increase diversity. Typical values range from 0.7 to 1.5.
- `top_k` and `top_p`: These parameters control the sampling pool. `top_k` limits sampling to the top k tokens, while `top_p` (nucleus sampling) cuts off the pool by cumulative probability. Adjust these based on the desired level of randomness.
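To build intuition for what `temperature` does, here is a toy sketch in pure Python (not tied to any model; the logits below are made up for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature: values below 1.0 sharpen the
    # distribution (more deterministic), values above 1.0 flatten it
    # (more diverse).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens

cold = softmax(logits, temperature=0.5)  # more confident
hot = softmax(logits, temperature=1.5)   # more diverse

# The top token gets a larger share of probability at low temperature
print(cold[0], hot[0])
```

The same idea underlies `top_k` and `top_p`: instead of reshaping the distribution, they shrink the pool of tokens sampling can pick from.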
- Model: microsoft/Phi-3-mini-128k-instruct
- Notebook: huggingface-phi3.ipynb
Links:
- Model: mistralai/Mistral-7B-v0.1
- Notebook: huggingface-mistral-7b.ipynb
ChatGPT instructions for serving
Links:
- https://huggingface.co/docs/transformers/en/llm_tutorial
- https://huggingface.co/settings/tokens
- https://huggingface.co/mistralai/Mistral-7B-v0.1
Where to find them:
- Leaderboards
- ChatGPT
Links:
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
- https://huggingface.co/spaces/optimum/llm-perf-leaderboard
- The easiest way to run an LLM without a GPU is using Ollama
- Notebook: ollama.ipynb
For Linux:
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama start
ollama pull phi3
ollama run phi3
```

Connecting to it with the OpenAI API:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)
```

Docker
```bash
docker run -it \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

Pulling the model:

```bash
docker exec -it ollama bash
ollama pull phi3
```

- Creating a Docker-Compose file
- Re-running the module 1 notebook
- Notebook: rag-intro.ipynb
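The Docker-Compose file mentioned above could look like this sketch, which mirrors the flags of the `docker run` command (the service and volume names are assumptions):

```yaml
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama:
```

With this in place, `docker compose up -d` starts the server and the same `docker exec -it ollama ollama pull phi3` step fetches the model.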
- Putting it in Streamlit
- Code
If you want to learn more about Streamlit, you can use this material from our repository of projects of the week.
See here
- Workaround by Pham Nguyen Hung to use an Elasticsearch container with Saturn Cloud & Google Colab instead of minsearch
- Notes by slavaheroes
- Notes by Pham Nguyen Hung
- Notes by Marat on Open-Sourced and Closed-Sourced Models and ways to run them
- Notes by dimzachar
- Notes by Waleed
- Did you take notes? Add them above this line (Send a PR with links to your notes)