Commit d607ecc

Add DeepSeek R1 Distill 8B (#1488)
* Add DeepSeek R1 Distill 8B
* Update aliases to match Ollama
* Update README

1 parent 162a38b

File tree

4 files changed: +21 −3 lines

README.md (+6 −1)
@@ -3,7 +3,11 @@
 torchchat is a small codebase showcasing the ability to run large language models (LLMs) seamlessly. With torchchat, you can run LLMs using Python, within your own (C/C++) application (desktop or server) and on iOS and Android.

 > [!IMPORTANT]
-> Update September 25, 2024: torchchat has multimodal support for **Llama3.2 11B**!!
+> Update
+>
+> **February 3, 2025**: torchchat has support for [**DeepSeek R1 Distill: 8B**](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)!
+>
+> **September 25, 2024**: torchchat has multimodal support for **Llama3.2 11B**!
 >
 > To try it out, finish the [Installation](#Installation) section below, then hop
 > over to our [multimodal guide](docs/multimodal.md) to learn more.

@@ -75,6 +79,7 @@ aliases.
 | [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) || Alias to `granite3-8b`.|
 | [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct) || Alias to `granite3.1-2b` and `granite3.1`.|
 | [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) || Alias to `granite3.1-8b`.|
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) || Alias to `deepseek-r1:8b`.|

 ## Installation
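With the alias registered, the model should be reachable through the same CLI pattern the README documents for other models, e.g. `python3 torchchat.py download deepseek-r1:8b` followed by `python3 torchchat.py chat deepseek-r1:8b` (usage inferred from torchchat's documented pattern; these exact invocations are not shown in this commit).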

tokenizer/hf_tokenizer.py (+8 −2)
@@ -46,8 +46,14 @@ def __init__(self, file_path: str):
         if tokenizer_config_path is not None:
             with open(tokenizer_config_path, "r") as handle:
                 tok_config = json.load(handle)
-            bos_token = tok_config.get("bos_token")
-            eos_token = tok_config.get("eos_token")
+
+            def _extract_token(identifier: str) -> Optional[str]:
+                entry: Optional[Union[str, dict]] = tok_config.get(identifier)
+                return entry.get("content") if isinstance(entry, dict) else entry
+
+            bos_token = _extract_token("bos_token")
+            eos_token = _extract_token("eos_token")
+
             if bos_token is not None:
                 self._bos_id = self._tokenizer.token_to_id(bos_token)
             if eos_token is not None:
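The new `_extract_token` helper covers the two shapes a Hugging Face `tokenizer_config.json` can take: `bos_token`/`eos_token` stored as a bare string, or as an added-token object whose text sits under `"content"`. A self-contained sketch of the same logic, with the helper hoisted out of the constructor and the sample configs invented for illustration:

```python
import json
from typing import Optional, Union

def _extract_token(tok_config: dict, identifier: str) -> Optional[str]:
    """Return the token string whether the config stores it as a plain
    string or as an added-token dict with a "content" field."""
    entry: Optional[Union[str, dict]] = tok_config.get(identifier)
    return entry.get("content") if isinstance(entry, dict) else entry

# String form, as in some configs.
cfg_str = json.loads('{"bos_token": "<s>", "eos_token": "</s>"}')
# Added-token (dict) form, as in others.
cfg_dict = json.loads('{"bos_token": {"content": "<s>", "lstrip": false}}')

assert _extract_token(cfg_str, "bos_token") == "<s>"
assert _extract_token(cfg_dict, "bos_token") == "<s>"
assert _extract_token(cfg_dict, "eos_token") is None  # missing key -> None
```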

torchchat/model_config/models.json (+6)
@@ -51,6 +51,12 @@
     "distribution_path": "meta-llama/Meta-Llama-3.1-8B-Instruct",
     "transformer_params_key": "Meta-Llama-3.1-8B"
   },
+  "deepseek-ai/DeepSeek-R1-Distill-Llama-8B": {
+    "aliases": ["deepseek-r1:8b"],
+    "distribution_channel": "HuggingFaceSnapshot",
+    "distribution_path": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
+    "tokenizer_file": "tokenizer.json"
+  },
   "meta-llama/Meta-Llama-3.1-70B-Instruct": {
     "aliases": ["llama3.1-70b"],
     "distribution_channel": "HuggingFaceSnapshot",
New model parameters file (+1)

@@ -0,0 +1 @@
+{"block_size": 131072, "dim": 4096, "ffn_dim_multiplier": 1.3, "multiple_of": 1024, "n_heads": 32, "n_local_heads": 8, "n_layers": 32, "rope_base": 500000.0, "vocab_size": 128256, "use_tiktoken": true, "use_hf_tokenizer": true, "norm_eps": 1e-05, "rope_scaling": {"factor": 8.0, "low_freq_factor": 1.0, "high_freq_factor": 4.0, "original_max_position_embeddings": 8192}}
