# model_metadata.yaml
# This file defines all the models officially supported by the Helm API.
# The model names here should match the model names in model_deployments.yaml.
# If you want to add a new model, you can technically do it here, but we recommend
# doing it in prod_env/model_metadata.yaml instead.
# Follow the template of this file to add a new model. You can copy paste this to get started:
# # This file contains the metadata for private models
# models: [] # Leave empty to disable private models
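# The entry below is a minimal, commented-out template sketch; the field values are
# placeholders for illustration, not a real model. Copy it under `models:` and fill it in.
# - name: myorg/my-model              # Must match a model name in model_deployments.yaml.
#   display_name: My Model (7B)
#   description: One-line description of the model (markdown links allowed).
#   creator_organization_name: My Organization
#   access: open                      # e.g. open or limited, as used elsewhere in this file.
#   num_parameters: 7000000000        # Optional; omit if unknown.
#   release_date: 2024-01-01          # YYYY-MM-DD
#   tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]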
models:
- name: simple/model1
display_name: Simple Model 1
description: This is a test model.
creator_organization_name: Helm
access: open
release_date: 2023-01-01
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
# Adobe
- name: adobe/giga-gan
display_name: GigaGAN (1B)
description: GigaGAN is a GAN model that produces high-quality images extremely quickly. The model was trained on text and image pairs from LAION2B-en and COYO-700M. ([paper](https://arxiv.org/abs/2303.05511)).
creator_organization_name: Adobe
access: limited
num_parameters: 1000000000
release_date: 2023-06-22
tags: [TEXT_TO_IMAGE_MODEL_TAG]
# AI21 Labs
- name: ai21/j1-jumbo
display_name: J1-Jumbo v1 (178B)
description: Jurassic-1 Jumbo (178B parameters) ([docs](https://studio.ai21.com/docs/jurassic1-language-models/), [tech report](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf)).
creator_organization_name: AI21 Labs
access: limited
num_parameters: 178000000000
release_date: 2021-08-11
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: ai21/j1-large
display_name: J1-Large v1 (7.5B)
description: Jurassic-1 Large (7.5B parameters) ([docs](https://studio.ai21.com/docs/jurassic1-language-models/), [tech report](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf)).
creator_organization_name: AI21 Labs
access: limited
num_parameters: 7500000000
release_date: 2021-08-11
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: ai21/j1-grande
display_name: J1-Grande v1 (17B)
description: Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process ([docs](https://studio.ai21.com/docs/jurassic1-language-models/), [tech report](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf)).
creator_organization_name: AI21 Labs
access: limited
num_parameters: 17000000000
release_date: 2022-05-03
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: ai21/j1-grande-v2-beta
display_name: J1-Grande v2 beta (17B)
description: Jurassic-1 Grande v2 beta (17B parameters)
creator_organization_name: AI21 Labs
access: limited
num_parameters: 17000000000
release_date: 2022-10-28
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: ai21/j2-large
display_name: Jurassic-2 Large (7.5B)
description: Jurassic-2 Large (7.5B parameters) ([docs](https://www.ai21.com/blog/introducing-j2))
creator_organization_name: AI21 Labs
access: limited
num_parameters: 7500000000
release_date: 2023-03-09
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: ai21/j2-grande
display_name: Jurassic-2 Grande (17B)
description: Jurassic-2 Grande (17B parameters) ([docs](https://www.ai21.com/blog/introducing-j2))
creator_organization_name: AI21 Labs
access: limited
num_parameters: 17000000000
release_date: 2023-03-09
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: ai21/j2-jumbo
display_name: Jurassic-2 Jumbo (178B)
description: Jurassic-2 Jumbo (178B parameters) ([docs](https://www.ai21.com/blog/introducing-j2))
creator_organization_name: AI21 Labs
access: limited
num_parameters: 178000000000
release_date: 2023-03-09
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# TODO(1524): Change AI21 model names
# - j2-jumbo -> j2-ultra
# - j2-grande -> j2-mid
# - j2-large -> j2-light
- name: ai21/jamba-instruct
display_name: Jamba Instruct
description: Jamba Instruct is an instruction-tuned version of Jamba, which uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that interleaves blocks of Transformer and Mamba layers. ([blog](https://www.ai21.com/blog/announcing-jamba-instruct))
creator_organization_name: AI21 Labs
access: limited
num_parameters: 52000000000
release_date: 2024-05-02
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: ai21/jamba-1.5-mini
display_name: Jamba 1.5 Mini
description: Jamba 1.5 Mini is a long-context, hybrid SSM-Transformer instruction following foundation model that is optimized for function calling, structured output, and grounded generation. ([blog](https://www.ai21.com/blog/announcing-jamba-model-family))
creator_organization_name: AI21 Labs
access: open
num_parameters: 51600000000
release_date: 2024-08-22
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: ai21/jamba-1.5-large
display_name: Jamba 1.5 Large
description: Jamba 1.5 Large is a long-context, hybrid SSM-Transformer instruction following foundation model that is optimized for function calling, structured output, and grounded generation. ([blog](https://www.ai21.com/blog/announcing-jamba-model-family))
creator_organization_name: AI21 Labs
access: open
num_parameters: 399000000000
release_date: 2024-08-22
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# AI Singapore
- name: aisingapore/sea-lion-7b
display_name: SEA-LION 7B
description: SEA-LION is a collection of language models which have been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.
creator_organization_name: AI Singapore
access: open
num_parameters: 7000000000
release_date: 2023-02-24
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: aisingapore/sea-lion-7b-instruct
display_name: SEA-LION 7B Instruct
description: SEA-LION is a collection of language models which have been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.
creator_organization_name: AI Singapore
access: open
num_parameters: 7000000000
release_date: 2023-02-24
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: aisingapore/llama3-8b-cpt-sea-lionv2-base
display_name: Llama3 8B CPT SEA-LIONv2
description: Llama3 8B CPT SEA-LIONv2 is a multilingual model which was continued pre-trained on 48B additional tokens, including tokens in Southeast Asian languages.
creator_organization_name: AI Singapore
access: open
num_parameters: 8030000000
release_date: 2024-07-31
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct
display_name: Llama3 8B CPT SEA-LIONv2.1 Instruct
description: Llama3 8B CPT SEA-LIONv2.1 Instruct is a multilingual model which has been fine-tuned with around 100,000 English instruction-completion pairs alongside a smaller pool of around 50,000 instruction-completion pairs from other Southeast Asian languages, such as Indonesian, Thai and Vietnamese.
creator_organization_name: AI Singapore
access: open
num_parameters: 8030000000
release_date: 2024-08-21
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: aisingapore/gemma2-9b-cpt-sea-lionv3-base
display_name: Gemma2 9B CPT SEA-LIONv3
description: Gemma2 9B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately 200B tokens across 11 Southeast Asian languages, namely English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, and Burmese.
creator_organization_name: AI Singapore
access: open
num_parameters: 9240000000
release_date: 2024-10-30
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: aisingapore/gemma2-9b-cpt-sea-lionv3-instruct
display_name: Gemma2 9B CPT SEA-LIONv3 Instruct
description: Gemma2 9B CPT SEA-LIONv3 Instruct is a multilingual model which has been fine-tuned with around 500,000 English instruction-completion pairs alongside a larger pool of around 1,000,000 instruction-completion pairs from other ASEAN languages, such as Indonesian, Thai and Vietnamese.
creator_organization_name: AI Singapore
access: open
num_parameters: 9240000000
release_date: 2024-10-30
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: aisingapore/llama3.1-8b-cpt-sea-lionv3-base
display_name: Llama3.1 8B CPT SEA-LIONv3
description: Llama3.1 8B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately 200B tokens across 11 SEA languages, namely Burmese, Chinese, English, Filipino, Indonesian, Khmer, Lao, Malay, Tamil, Thai, and Vietnamese.
creator_organization_name: AI Singapore
access: open
num_parameters: 9240000000
release_date: 2024-12-11
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct
display_name: Llama3.1 8B CPT SEA-LIONv3 Instruct
description: Llama3.1 8B CPT SEA-LIONv3 Instruct is a multilingual model that has been fine-tuned in two stages on approximately 12.3M English instruction-completion pairs alongside a pool of 4.5M Southeast Asian instruction-completion pairs from SEA languages such as Indonesian, Javanese, Sundanese, Tamil, Thai and Vietnamese.
creator_organization_name: AI Singapore
access: open
num_parameters: 9240000000
release_date: 2024-12-11
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: aisingapore/llama3.1-70b-cpt-sea-lionv3-base
display_name: Llama3.1 70B CPT SEA-LIONv3
description: Llama3.1 70B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately 200B tokens across 11 SEA languages, namely Burmese, Chinese, English, Filipino, Indonesian, Khmer, Lao, Malay, Tamil, Thai, and Vietnamese.
creator_organization_name: AI Singapore
access: open
num_parameters: 70600000000
release_date: 2024-12-11
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: aisingapore/llama3.1-70b-cpt-sea-lionv3-instruct
display_name: Llama3.1 70B CPT SEA-LIONv3 Instruct
description: Llama3.1 70B CPT SEA-LIONv3 Instruct is a multilingual model that has been fine-tuned in two stages on approximately 12.3M English instruction-completion pairs alongside a pool of 4.5M Southeast Asian instruction-completion pairs from SEA languages such as Indonesian, Javanese, Sundanese, Tamil, Thai, and Vietnamese.
creator_organization_name: AI Singapore
access: open
num_parameters: 70600000000
release_date: 2024-12-11
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# Aleph Alpha
# Aleph Alpha's Luminous models: https://docs.aleph-alpha.com/docs/introduction/luminous
# TODO: add Luminous World when it's released
- name: AlephAlpha/luminous-base
display_name: Luminous Base (13B)
description: Luminous Base (13B parameters) ([docs](https://docs.aleph-alpha.com/docs/introduction/luminous/))
creator_organization_name: Aleph Alpha
access: limited
num_parameters: 13000000000
# TODO: get exact release date
release_date: 2022-01-01
# Does not support echo
tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, FULL_FUNCTIONALITY_VLM_TAG]
- name: AlephAlpha/luminous-extended
display_name: Luminous Extended (30B)
description: Luminous Extended (30B parameters) ([docs](https://docs.aleph-alpha.com/docs/introduction/luminous/))
creator_organization_name: Aleph Alpha
access: limited
num_parameters: 30000000000
release_date: 2022-01-01
# Does not support echo
tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, FULL_FUNCTIONALITY_VLM_TAG]
- name: AlephAlpha/luminous-supreme
display_name: Luminous Supreme (70B)
description: Luminous Supreme (70B parameters) ([docs](https://docs.aleph-alpha.com/docs/introduction/luminous/))
creator_organization_name: Aleph Alpha
access: limited
num_parameters: 70000000000
release_date: 2022-01-01
# Does not support echo.
# Currently, only Luminous-extended and Luminous-base support multimodal inputs
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
# TODO: Uncomment when luminous-world is released.
# - name: AlephAlpha/luminous-world # Not released yet.
# display_name: Luminous World (178B)
# description: Luminous World (178B parameters) ([docs](https://docs.aleph-alpha.com/docs/introduction/luminous/))
# creator_organization_name: Aleph Alpha
# access: limited
# num_parameters: TBD
# release_date: TBD
# # Does not support echo.
# tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: AlephAlpha/m-vader
display_name: MultiFusion (13B)
description: MultiFusion is a multimodal, multilingual diffusion model that extends the capabilities of Stable Diffusion v1.4 by integrating different pre-trained modules, which transfer capabilities to the downstream model ([paper](https://arxiv.org/abs/2305.15296))
creator_organization_name: Aleph Alpha
access: limited
num_parameters: 13000000000
release_date: 2023-05-24
tags: [TEXT_TO_IMAGE_MODEL_TAG]
# Amazon Nova models
# References for Amazon Nova models:
# https://aws.amazon.com/ai/generative-ai/nova/
- name: amazon/nova-premier-v1:0
display_name: Amazon Nova Premier
description: Amazon Nova Premier is the most capable model in the Nova family of foundation models. ([blog](https://aws.amazon.com/blogs/aws/amazon-nova-premier-our-most-capable-model-for-complex-tasks-and-teacher-for-model-distillation/))
creator_organization_name: Amazon
access: limited
release_date: 2025-04-30
tags: [NOVA_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: amazon/nova-pro-v1:0
display_name: Amazon Nova Pro
description: Amazon Nova Pro Model
creator_organization_name: Amazon
access: limited
release_date: 2024-12-03
tags: [NOVA_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: amazon/nova-lite-v1:0
display_name: Amazon Nova Lite
description: Amazon Nova Lite Model
creator_organization_name: Amazon
access: limited
release_date: 2024-12-03
tags: [NOVA_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: amazon/nova-micro-v1:0
display_name: Amazon Nova Micro
description: Amazon Nova Micro Model
creator_organization_name: Amazon
access: limited
release_date: 2024-12-03
tags: [NOVA_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
# Titan Models
# References for Amazon Titan models:
# - https://aws.amazon.com/bedrock/titan/
# - https://community.aws/content/2ZUVD3fkNtqEOYIa2iUJAFArS7c/family-of-titan-text-models---cli-demo
# - https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-titan-models-express-lite-bedrock/
- name: amazon/titan-text-lite-v1
display_name: Amazon Titan Text Lite
description: Amazon Titan Text Lite is a lightweight, efficient model perfect for fine-tuning English-language tasks like summarization and copywriting. It caters to customers seeking a smaller, cost-effective, and highly customizable model. It supports various formats, including text generation, code generation, rich text formatting, and orchestration (agents). Key model attributes encompass fine-tuning, text generation, code generation, and rich text formatting.
creator_organization_name: Amazon
access: limited
release_date: 2023-11-29
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: amazon/titan-text-express-v1
display_name: Amazon Titan Text Express
description: Amazon Titan Text Express, with a context length of up to 8,000 tokens, excels in advanced language tasks like open-ended text generation and conversational chat. It's also optimized for Retrieval Augmented Generation (RAG). Initially designed for English, the model offers preview multilingual support for over 100 additional languages.
creator_organization_name: Amazon
access: limited
release_date: 2023-11-29
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
# Mistral Models on Bedrock
# References for Mistral on Amazon Bedrock
# https://aws.amazon.com/bedrock/mistral/
- name: mistralai/amazon-mistral-7b-instruct-v0:2
display_name: Mistral 7B Instruct on Amazon Bedrock
description: A 7B dense Transformer, fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and a 32k context window.
creator_organization_name: Mistral
access: limited
release_date: 2024-03-23
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: mistralai/amazon-mixtral-8x7b-instruct-v0:1
display_name: Mixtral 8x7B Instruct on Amazon Bedrock
description: A 7B sparse Mixture-of-Experts model with stronger capabilities than Mistral 7B. Uses 12B active parameters out of 45B total. Supports multiple languages, code and 32k context window.
creator_organization_name: Mistral
access: limited
release_date: 2023-12-11
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: mistralai/amazon-mistral-large-2402-v1:0
display_name: Mistral Large (2402) on Amazon Bedrock
description: The most advanced Mistral AI Large Language model capable of handling any language task including complex multilingual reasoning, text understanding, transformation, and code generation.
creator_organization_name: Mistral
access: limited
release_date: 2024-02-26
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: mistralai/amazon-mistral-small-2402-v1:0
display_name: Mistral Small on Amazon Bedrock
description: Mistral Small is perfectly suited for straightforward tasks that can be performed in bulk, such as classification, customer support, or text generation. It provides outstanding performance at a cost-effective price point.
creator_organization_name: Mistral
access: limited
release_date: 2024-02-26
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: mistralai/amazon-mistral-large-2407-v1:0
display_name: Mistral Large (2407) on Amazon Bedrock
description: Mistral Large 2407 is an advanced Large Language Model (LLM) that supports dozens of languages and is trained on 80+ coding languages. It has best-in-class agentic capabilities with native function calling, JSON outputting, and reasoning capabilities.
creator_organization_name: Mistral
access: limited
release_date: 2024-07-24
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# Llama3 on Amazon Bedrock
# References for Llama3 on Amazon Bedrock
# https://aws.amazon.com/bedrock/llama/
- name: meta/amazon-llama3-8b-instruct-v1:0
display_name: Llama 3 8B Instruct on Amazon Bedrock
description: Meta Llama 3 is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. Ideal for limited computational power and resources, edge devices, and faster training times.
creator_organization_name: Meta
access: limited
release_date: 2024-04-23
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: meta/amazon-llama3-70b-instruct-v1:0
display_name: Llama 3 70B Instruct on Amazon Bedrock
description: Meta Llama 3 is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. Ideal for content creation, conversational AI, language understanding, R&D, and Enterprise applications.
creator_organization_name: Meta
access: limited
release_date: 2024-04-23
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: meta/amazon-llama3-1-405b-instruct-v1:0
display_name: Llama 3.1 405B Instruct on Amazon Bedrock
description: Meta's Llama 3.1 offers multilingual models (8B, 70B, 405B) with 128K context, improved reasoning, and optimization for dialogue. It outperforms many open-source chat models and is designed for commercial and research use in multiple languages.
creator_organization_name: Meta
access: limited
release_date: 2024-07-26
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: meta/amazon-llama3-1-70b-instruct-v1:0
display_name: Llama 3.1 70B Instruct on Amazon Bedrock
description: Meta's Llama 3.1 offers multilingual models (8B, 70B, 405B) with 128K context, improved reasoning, and optimization for dialogue. It outperforms many open-source chat models and is designed for commercial and research use in multiple languages.
creator_organization_name: Meta
access: limited
release_date: 2024-07-26
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: meta/amazon-llama3-1-8b-instruct-v1:0
display_name: Llama 3.1 8B Instruct on Amazon Bedrock
description: Meta's Llama 3.1 offers multilingual models (8B, 70B, 405B) with 128K context, improved reasoning, and optimization for dialogue. It outperforms many open-source chat models and is designed for commercial and research use in multiple languages.
creator_organization_name: Meta
access: limited
release_date: 2024-07-26
tags: [BEDROCK_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# Anthropic
- name: anthropic/claude-v1.3
display_name: Claude v1.3
description: A 52B parameter language model, trained using reinforcement learning from human feedback ([paper](https://arxiv.org/pdf/2204.05862.pdf)).
creator_organization_name: Anthropic
access: limited
num_parameters: 52000000000
release_date: 2023-03-17
tags: [ANTHROPIC_CLAUDE_1_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-instant-v1
display_name: Claude Instant V1
description: A lightweight version of Claude, a model trained using reinforcement learning from human feedback ([docs](https://www.anthropic.com/index/introducing-claude)).
creator_organization_name: Anthropic
access: limited
release_date: 2023-03-17
tags: [ANTHROPIC_CLAUDE_1_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-instant-1.2
display_name: Claude Instant 1.2
description: A lightweight version of Claude, a model trained using reinforcement learning from human feedback ([docs](https://www.anthropic.com/index/introducing-claude)).
creator_organization_name: Anthropic
access: limited
release_date: 2023-08-09
tags: [ANTHROPIC_CLAUDE_1_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-2.0
display_name: Claude 2.0
description: Claude 2.0 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). ([model card](https://efficient-manatee.files.svdcdn.com/production/images/Model-Card-Claude-2.pdf))
creator_organization_name: Anthropic
access: limited
release_date: 2023-07-11
tags: [ANTHROPIC_CLAUDE_2_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-2.1
display_name: Claude 2.1
description: Claude 2.1 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). ([model card](https://efficient-manatee.files.svdcdn.com/production/images/Model-Card-Claude-2.pdf))
creator_organization_name: Anthropic
access: limited
release_date: 2023-11-21
tags: [ANTHROPIC_CLAUDE_2_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-haiku-20240307
display_name: Claude 3 Haiku (20240307)
description: Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI ([blog](https://www.anthropic.com/news/claude-3-family)).
creator_organization_name: Anthropic
access: limited
release_date: 2024-03-13 # https://www.anthropic.com/news/claude-3-haiku
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-sonnet-20240229
display_name: Claude 3 Sonnet (20240229)
description: Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI ([blog](https://www.anthropic.com/news/claude-3-family)).
creator_organization_name: Anthropic
access: limited
release_date: 2024-03-04 # https://www.anthropic.com/news/claude-3-family
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-opus-20240229
display_name: Claude 3 Opus (20240229)
description: Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI ([blog](https://www.anthropic.com/news/claude-3-family)).
access: limited
creator_organization_name: Anthropic
release_date: 2024-03-04 # https://www.anthropic.com/news/claude-3-family
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-5-haiku-20241022
display_name: Claude 3.5 Haiku (20241022)
description: Claude 3.5 Haiku is a Claude 3 family model which matches the performance of Claude 3 Opus at a similar speed to the previous generation of Haiku ([blog](https://www.anthropic.com/news/3-5-models-and-computer-use)).
creator_organization_name: Anthropic
access: limited
release_date: 2024-11-04 # Released after the blog post
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-5-sonnet-20240620
display_name: Claude 3.5 Sonnet (20240620)
description: Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. ([blog](https://www.anthropic.com/news/claude-3-5-sonnet))
creator_organization_name: Anthropic
access: limited
release_date: 2024-06-20
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-5-sonnet-20241022
display_name: Claude 3.5 Sonnet (20241022)
description: Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost ([blog](https://www.anthropic.com/news/claude-3-5-sonnet)). This is an upgraded snapshot released on 2024-10-22 ([blog](https://www.anthropic.com/news/3-5-models-and-computer-use)).
creator_organization_name: Anthropic
access: limited
release_date: 2024-10-22
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-7-sonnet-20250219
display_name: Claude 3.7 Sonnet (20250219)
description: Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user ([blog](https://www.anthropic.com/news/claude-3-7-sonnet)).
creator_organization_name: Anthropic
access: limited
release_date: 2025-02-24
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-3-7-sonnet-20250219-thinking-10k
display_name: Claude 3.7 Sonnet (20250219, extended thinking)
description: Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user ([blog](https://www.anthropic.com/news/claude-3-7-sonnet)). Extended thinking is enabled with 10k budget tokens.
creator_organization_name: Anthropic
access: limited
release_date: 2025-02-24
tags: [ANTHROPIC_CLAUDE_3_MODEL_TAG, TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-sonnet-4-20250514
display_name: Claude 4 Sonnet (20250514)
description: Claude 4 Sonnet is a hybrid model offering two modes, near-instant responses and extended thinking for deeper reasoning ([blog](https://www.anthropic.com/news/claude-4)).
creator_organization_name: Anthropic
access: limited
release_date: 2025-05-14
tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-sonnet-4-20250514-thinking-10k
display_name: Claude 4 Sonnet (20250514, extended thinking)
description: Claude 4 Sonnet is a hybrid model offering two modes, near-instant responses and extended thinking for deeper reasoning ([blog](https://www.anthropic.com/news/claude-4)). Extended thinking is enabled with 10k budget tokens.
creator_organization_name: Anthropic
access: limited
release_date: 2025-05-14
tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-opus-4-20250514
display_name: Claude 4 Opus (20250514)
description: Claude 4 Opus is a hybrid model offering two modes, near-instant responses and extended thinking for deeper reasoning ([blog](https://www.anthropic.com/news/claude-4)).
creator_organization_name: Anthropic
access: limited
release_date: 2025-05-14
tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/claude-opus-4-20250514-thinking-10k
display_name: Claude 4 Opus (20250514, extended thinking)
description: Claude 4 Opus is a hybrid model offering two modes, near-instant responses and extended thinking for deeper reasoning ([blog](https://www.anthropic.com/news/claude-4)). Extended thinking is enabled with 10k budget tokens.
creator_organization_name: Anthropic
access: limited
release_date: 2025-05-14
tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: anthropic/stanford-online-all-v4-s3
display_name: Anthropic-LM v4-s3 (52B)
description: A 52B parameter language model, trained using reinforcement learning from human feedback ([paper](https://arxiv.org/pdf/2204.05862.pdf)).
creator_organization_name: Anthropic
access: closed
num_parameters: 52000000000
release_date: 2021-12-01
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG]
# Berkeley
- name: berkeley/koala-13b # NOT SUPPORTED
display_name: Koala (13B)
description: Koala (13B) is a chatbot fine-tuned from Llama (13B) on dialogue data gathered from the web. ([blog post](https://bair.berkeley.edu/blog/2023/04/03/koala/))
creator_organization_name: UC Berkeley
access: open
num_parameters: 13000000000
release_date: 2023-04-03
tags: [DEPRECATED_MODEL_TAG] # TODO: add tags
# BigScience
- name: bigscience/bloom
display_name: BLOOM (176B)
description: BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages ([paper](https://arxiv.org/pdf/2211.05100.pdf)).
creator_organization_name: BigScience
access: open
num_parameters: 176000000000
release_date: 2022-06-28
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG]
- name: bigscience/bloomz # NOT SUPPORTED
display_name: BLOOMZ (176B)
description: BLOOMZ (176B parameters) is BLOOM that has been fine-tuned on natural language instructions ([details](https://huggingface.co/bigscience/bloomz)).
creator_organization_name: BigScience
access: open
num_parameters: 176000000000
release_date: 2022-11-03
tags: [DEPRECATED_MODEL_TAG] # TODO: add tags
- name: bigscience/t0pp
display_name: T0pp (11B)
description: T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts ([paper](https://arxiv.org/pdf/2110.08207.pdf)).
creator_organization_name: BigScience
access: open
num_parameters: 11000000000
release_date: 2021-10-15
# Does not support echo.
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, NO_NEWLINES_TAG]
# BigCode
- name: bigcode/santacoder
display_name: SantaCoder (1.1B)
description: SantaCoder (1.1B parameters) is a model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) ([model card](https://huggingface.co/bigcode/santacoder)).
creator_organization_name: BigCode
access: open
num_parameters: 1100000000
release_date: 2023-01-09 # ArXiv submission date
tags: [CODE_MODEL_TAG]
- name: bigcode/starcoder
display_name: StarCoder (15.5B)
description: The StarCoder (15.5B parameter) model trained on 80+ programming languages from The Stack (v1.2) ([model card](https://huggingface.co/bigcode/starcoder)).
creator_organization_name: BigCode
access: open
num_parameters: 15500000000
release_date: 2023-05-09 # ArXiv submission date
tags: [CODE_MODEL_TAG]
# BioMistral
- name: biomistral/biomistral-7b
display_name: BioMistral (7B)
description: BioMistral 7B is an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.
creator_organization_name: BioMistral
access: open
num_parameters: 7300000000
release_date: 2024-02-15
tags: [TEXT_MODEL_TAG, PARTIAL_FUNCTIONALITY_TEXT_MODEL_TAG]
# Cerebras Systems
- name: cerebras/cerebras-gpt-6.7b # NOT SUPPORTED
display_name: Cerebras GPT (6.7B)
description: Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. ([paper](https://arxiv.org/pdf/2304.03208.pdf))
creator_organization_name: Cerebras
access: limited
num_parameters: 6700000000
release_date: 2023-04-06
tags: [DEPRECATED_MODEL_TAG] # TODO: add tags
- name: cerebras/cerebras-gpt-13b # NOT SUPPORTED
display_name: Cerebras GPT (13B)
description: Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. ([paper](https://arxiv.org/pdf/2304.03208.pdf))
creator_organization_name: Cerebras
access: limited
num_parameters: 13000000000
release_date: 2023-04-06
tags: [DEPRECATED_MODEL_TAG] # TODO: add tags
# Cohere
# Model versioning and the possible versions are not documented here:
# https://docs.cohere.ai/generate-reference#model-optional.
# So, instead, we got the names of the models from the Cohere Playground.
#
# Note that their tokenizer and model were trained on English text and
# they do not have a dedicated decode API endpoint, so the adaptation
# step for language modeling fails for certain Scenarios:
# the_pile:subset=ArXiv
# the_pile:subset=Github
# the_pile:subset=PubMed Central
# TODO: Consider renaming to new model names.
- name: cohere/xlarge-20220609
display_name: Cohere xlarge v20220609 (52.4B)
description: Cohere xlarge v20220609 (52.4B parameters)
creator_organization_name: Cohere
access: limited
num_parameters: 52400000000
release_date: 2022-06-09
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: cohere/large-20220720
display_name: Cohere large v20220720 (13.1B)
description: Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022.
creator_organization_name: Cohere
access: limited
num_parameters: 13100000000
release_date: 2022-07-20
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: cohere/medium-20220720
display_name: Cohere medium v20220720 (6.1B)
description: Cohere medium v20220720 (6.1B parameters)
creator_organization_name: Cohere
access: limited
num_parameters: 6100000000
release_date: 2022-07-20
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: cohere/small-20220720
display_name: Cohere small v20220720 (410M)
description: Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022.
creator_organization_name: Cohere
access: limited
num_parameters: 410000000
release_date: 2022-07-20
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: cohere/xlarge-20221108
display_name: Cohere xlarge v20221108 (52.4B)
description: Cohere xlarge v20221108 (52.4B parameters)
creator_organization_name: Cohere
access: limited
num_parameters: 52400000000
release_date: 2022-11-08
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: cohere/medium-20221108
display_name: Cohere medium v20221108 (6.1B)
description: Cohere medium v20221108 (6.1B parameters)
creator_organization_name: Cohere
access: limited
num_parameters: 6100000000
release_date: 2022-11-08
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: cohere/command-medium-beta
display_name: Command beta (6.1B)
description: Command beta (6.1B parameters) is fine-tuned from the medium model to respond well with instruction-like prompts ([details](https://docs.cohere.ai/docs/command-beta)).
creator_organization_name: Cohere
access: limited
num_parameters: 6100000000
release_date: 2022-11-08
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: cohere/command-xlarge-beta
display_name: Command beta (52.4B)
description: Command beta (52.4B parameters) is fine-tuned from the XL model to respond well with instruction-like prompts ([details](https://docs.cohere.ai/docs/command-beta)).
creator_organization_name: Cohere
access: limited
num_parameters: 52400000000
release_date: 2022-11-08
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: cohere/command
display_name: Command
description: Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. [docs](https://docs.cohere.com/reference/generate) and [changelog](https://docs.cohere.com/changelog)
creator_organization_name: Cohere
access: limited
release_date: 2023-09-29
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: cohere/command-light
display_name: Command Light
description: Command Light is a smaller, faster version of Command, Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. [docs](https://docs.cohere.com/reference/generate) and [changelog](https://docs.cohere.com/changelog)
creator_organization_name: Cohere
access: limited
release_date: 2023-09-29
tags: [TEXT_MODEL_TAG, PARTIAL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: cohere/command-r
display_name: Command R
description: Command R is a multilingual 35B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.
creator_organization_name: Cohere
access: open
num_parameters: 35000000000
release_date: 2024-03-11
tags: [TEXT_MODEL_TAG, PARTIAL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: cohere/command-r-plus
display_name: Command R Plus
description: Command R+ is a multilingual 104B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.
creator_organization_name: Cohere
access: open
num_parameters: 104000000000
release_date: 2024-04-04
tags: [TEXT_MODEL_TAG, PARTIAL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# Craiyon
- name: craiyon/dalle-mini
display_name: DALL-E mini (0.4B)
description: DALL-E mini is an open-source text-to-image model that attempts to reproduce OpenAI's DALL-E 1 ([code](https://github.com/borisdayma/dalle-mini)).
creator_organization_name: Craiyon
access: open
num_parameters: 400000000
release_date: 2022-04-21
tags: [TEXT_TO_IMAGE_MODEL_TAG]
- name: craiyon/dalle-mega
display_name: DALL-E mega (2.6B)
description: DALL-E mega is an open-source text-to-image model that attempts to reproduce OpenAI's DALL-E 1 ([code](https://github.com/borisdayma/dalle-mini)).
creator_organization_name: Craiyon
access: open
num_parameters: 2600000000
release_date: 2022-04-21
tags: [TEXT_TO_IMAGE_MODEL_TAG]
# DeepFloyd
- name: DeepFloyd/IF-I-M-v1.0
display_name: DeepFloyd IF Medium (0.4B)
description: DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).
creator_organization_name: DeepFloyd
access: open
num_parameters: 400000000
release_date: 2023-04-28
tags: [TEXT_TO_IMAGE_MODEL_TAG]
- name: DeepFloyd/IF-I-L-v1.0
display_name: DeepFloyd IF Large (0.9B)
description: DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).
creator_organization_name: DeepFloyd
access: open
num_parameters: 900000000
release_date: 2023-04-28
tags: [TEXT_TO_IMAGE_MODEL_TAG]
- name: DeepFloyd/IF-I-XL-v1.0
display_name: DeepFloyd IF X-Large (4.3B)
description: DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).
creator_organization_name: DeepFloyd
access: open
num_parameters: 4300000000
release_date: 2023-04-28
tags: [TEXT_TO_IMAGE_MODEL_TAG]
# Databricks
- name: databricks/dolly-v2-3b
display_name: Dolly V2 (3B)
description: Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b.
creator_organization_name: Databricks
access: open
num_parameters: 2517652480
release_date: 2023-04-12
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: databricks/dolly-v2-7b
display_name: Dolly V2 (7B)
description: Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b.
creator_organization_name: Databricks
access: open
num_parameters: 6444163072
release_date: 2023-04-12
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: databricks/dolly-v2-12b
display_name: Dolly V2 (12B)
description: Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.
creator_organization_name: Databricks
access: open
num_parameters: 11327027200
release_date: 2023-04-12
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: databricks/dbrx-instruct
display_name: DBRX Instruct
description: DBRX is a large language model with a fine-grained mixture-of-experts (MoE) architecture that uses 16 experts and chooses 4. It has 132B total parameters, of which 36B parameters are active on any input. ([blog post](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm))
creator_organization_name: Databricks
access: open
num_parameters: 132000000000
release_date: 2024-03-27
tags: [TEXT_MODEL_TAG, PARTIAL_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# DeepMind
- name: deepmind/gopher # NOT SUPPORTED
display_name: Gopher (280B)
description: Gopher (280B parameters) ([paper](https://arxiv.org/pdf/2112.11446.pdf)).
creator_organization_name: DeepMind
access: closed
num_parameters: 280000000000
release_date: 2021-12-08
tags: [UNSUPPORTED_MODEL_TAG]
- name: deepmind/chinchilla # NOT SUPPORTED
display_name: Chinchilla (70B)
description: Chinchilla (70B parameters) ([paper](https://arxiv.org/pdf/2203.15556.pdf)).
creator_organization_name: DeepMind
access: closed
num_parameters: 70000000000
release_date: 2022-03-31
tags: [UNSUPPORTED_MODEL_TAG]
# Deepseek
- name: deepseek-ai/deepseek-llm-67b-chat
display_name: DeepSeek LLM Chat (67B)
description: DeepSeek LLM Chat is an open-source language model trained on 2 trillion tokens in both English and Chinese, then fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). ([paper](https://arxiv.org/abs/2401.02954))
creator_organization_name: DeepSeek
access: open
num_parameters: 67000000000
release_date: 2024-01-05
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: deepseek-ai/deepseek-v3
display_name: DeepSeek v3
description: DeepSeek v3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. It adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. ([paper](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf))
creator_organization_name: DeepSeek
access: open
# NOTE: The total size of DeepSeek-V3 models on HuggingFace is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.
num_parameters: 685000000000
release_date: 2024-12-24
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: deepseek-ai/deepseek-r1
display_name: DeepSeek R1
description: DeepSeek R1 is DeepSeek's first-generation reasoning model, which incorporates multi-stage training and cold-start data before RL. ([paper](https://arxiv.org/abs/2501.12948))
creator_organization_name: DeepSeek
access: open
# NOTE: The total size of the DeepSeek-R1 model on HuggingFace is 685B
num_parameters: 685000000000
release_date: 2025-01-20
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: deepseek-ai/deepseek-r1-hide-reasoning
display_name: DeepSeek R1 (hide reasoning)
description: DeepSeek R1 is DeepSeek's first-generation reasoning model, which incorporates multi-stage training and cold-start data before RL. ([paper](https://arxiv.org/abs/2501.12948)) The reasoning tokens are hidden from the output of the model.
creator_organization_name: DeepSeek
access: open
# NOTE: The total size of the DeepSeek-R1 model on HuggingFace is 685B
num_parameters: 685000000000
release_date: 2025-01-20
tags: [DEPRECATED_MODEL_TAG, TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: deepseek-ai/deepseek-r1-0528
display_name: DeepSeek-R1-0528
description: DeepSeek-R1-0528 is a minor version upgrade from DeepSeek R1 that has improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. ([paper](https://arxiv.org/abs/2501.12948))
creator_organization_name: DeepSeek
access: open
num_parameters: 685000000000
release_date: 2025-05-28
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
display_name: DeepSeek-R1-Distill-Llama-8B
description: DeepSeek-R1-Distill-Llama-8B is a dense model fine-tuned from the Llama 3.1 8B base model on reasoning samples generated by DeepSeek-R1. ([paper](https://arxiv.org/abs/2501.12948))
creator_organization_name: DeepSeek
access: open
num_parameters: 8000000000
release_date: 2025-01-20
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
- name: deepseek-ai/deepseek-coder-6.7b-instruct
display_name: DeepSeek-Coder-6.7b-Instruct
description: DeepSeek-Coder-6.7B-Instruct is a code language model fine-tuned from the DeepSeek-Coder 6.7B base model on instruction data.
creator_organization_name: DeepSeek
access: open
num_parameters: 6740000000
release_date: 2025-01-20
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
# EleutherAI
- name: eleutherai/gpt-j-6b # Served by GooseAi, HuggingFace and Together.
display_name: GPT-J (6B)
description: GPT-J (6B parameters) autoregressive language model trained on The Pile ([details](https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/)).
creator_organization_name: EleutherAI
access: open
num_parameters: 6000000000
release_date: 2021-06-04
# TODO: The BUGGY_TEMP_0_TAG is a deployment related tag (Together).
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG, BUGGY_TEMP_0_TAG]
- name: eleutherai/gpt-neox-20b # Served by GooseAi and Together.
display_name: GPT-NeoX (20B)
description: GPT-NeoX (20B parameters) autoregressive language model trained on The Pile ([paper](https://arxiv.org/pdf/2204.06745.pdf)).
creator_organization_name: EleutherAI
access: open
num_parameters: 20000000000
release_date: 2022-02-02
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG, ABLATION_MODEL_TAG]
- name: eleutherai/pythia-1b-v0
display_name: Pythia (1B)
description: Pythia (1B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
creator_organization_name: EleutherAI
access: open
num_parameters: 805736448
release_date: 2023-02-13
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: eleutherai/pythia-2.8b-v0
display_name: Pythia (2.8B)
description: Pythia (2.8B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
creator_organization_name: EleutherAI
access: open
num_parameters: 2517652480
release_date: 2023-02-13
tags: [TEXT_MODEL_TAG, FULL_FUNCTIONALITY_TEXT_MODEL_TAG]
- name: eleutherai/pythia-6.9b
display_name: Pythia (6.9B)
description: Pythia (6.9B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
creator_organization_name: EleutherAI
access: open