Reduced embedding space for Apertus-8B

This repository aims to improve the memory usage and throughput of Apertus-8B by reducing the embedding space used by the model. The idea is to load only the subset of tokens relevant to a given language and to measure the resulting performance gain.
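The core idea can be sketched in a few lines of plain Python. This is a toy illustration, not the code used in the repository: the function names (`prune_output_rows`, `predict_token`) and the tiny weight matrix are hypothetical, and a real model would hold the output embedding as a tensor rather than nested lists. The sketch keeps only the rows of the output matrix that belong to the retained tokens and maps the winning reduced index back to the original token id.

```python
def prune_output_rows(weights, keep_ids):
    """Keep only the rows for the retained token ids.

    Returns the reduced matrix and a list mapping each reduced
    row index back to its original token id.
    """
    kept = sorted(set(keep_ids))
    reduced = [weights[i] for i in kept]
    return reduced, kept


def logits(hidden, weights):
    """Dot product of the hidden state with every row of the matrix."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights]


def predict_token(hidden, reduced, kept):
    """Greedy prediction over the reduced vocabulary, as an original token id."""
    scores = logits(hidden, reduced)
    best = max(range(len(scores)), key=scores.__getitem__)
    return kept[best]


# Toy output embedding: 5 tokens, hidden size 2 (hypothetical values).
full = [[0.1, 0.0], [0.9, 0.2], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.3]]
reduced, kept = prune_output_rows(full, keep_ids=[3, 1, 4])

hidden = [1.0, 0.0]
token = predict_token(hidden, reduced, kept)
print(len(full), "->", len(reduced), "rows; predicted original token id:", token)
```

The output projection then costs one dot product per retained token instead of one per vocabulary entry, which is where the memory and throughput savings would come from.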

File Index

File name	Description
apertus_benchmark.py	Measures how model performance changes with the number of tokens loaded in the output embedding space
benchmark_head_latency.py	Benchmarks throughput more precisely by timing only the output layer
apply_script_filter.py	Filters the embedding space based on the character groups detected in the query, then benchmarks memory usage
validate_vocab_robustness.py	Evaluates the performance of the character-based pruned model in diverse contexts
apply_vocab_filter.py	Filters the embedding space down to the tokens relevant to the English language, then benchmarks memory usage
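As a rough illustration of character-group filtering, the sketch below uses only the standard library. It is an assumption about how such a filter could work, not the logic of `apply_script_filter.py`: the helper names, the toy vocabulary, and the choice to label a character by the first word of its Unicode name are all hypothetical. Tokens without any letters (digits, punctuation) are always kept in this sketch.

```python
import unicodedata


def script_of(ch):
    """Rough script label: the first word of the Unicode character name."""
    if not ch.isalpha():
        return None  # digits, punctuation and spaces carry no script
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None  # character has no Unicode name


def scripts_in(text):
    """Set of script labels that occur in the text."""
    return {s for s in map(script_of, text) if s is not None}


def filter_vocab(vocab, query):
    """Ids of tokens whose letters all belong to scripts seen in the query."""
    allowed = scripts_in(query)
    return [
        token_id
        for token_id, token in enumerate(vocab)
        if scripts_in(token) <= allowed
    ]


# Toy vocabulary (hypothetical); a real tokenizer vocabulary is far larger.
vocab = ["hello", "мир", "world", "123", "你好", "été"]
keep = filter_vocab(vocab, "Hello world")
print(keep)
```

The resulting id list is what a pruning step would use to select rows of the embedding matrix.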

Setup

The project was developed on the Clariden Alps cluster at CSCS. To run the scripts in this repository, we suggest creating a container and a Python environment by following the official tutorial in the CSCS documentation.

TODO: Add more complete setup tutorial
