audio-kb

Record, store, and search audio notes from the terminal, powered by LlamaParse and Gemini Embedding 2

What it does

audio-kb is a CLI tool that allows you to store and search your audio notes locally.

You can upload an existing MP3 file or record an audio note directly from the terminal (you may need to grant microphone access). Once the file has been uploaded:

  • It is parsed by LlamaParse, which extracts its full text content
  • The text is chunked, and each chunk is embedded into a 3072-dimensional vector by Gemini Embedding 2
  • The vectors are uploaded to a local SurrealDB instance along with a payload, and indexed with an HNSW index

Whenever you search for something, your query is embedded by Gemini Embedding 2, and the resulting embedding is used to perform semantic search (based on cosine similarity) against the vector store.
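To make the ranking step concrete, here is a minimal sketch of cosine-similarity search in plain Python. It is an illustration only: audio-kb delegates this work to SurrealDB and its HNSW index, which returns approximate nearest neighbours, whereas the sketch scans every stored vector exactly. The function names and the toy 3-dimensional vectors are hypothetical (real embeddings have 3072 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Return the k (score, text) pairs most similar to query_vec.

    `stored` is a list of (vector, text) pairs (hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
notes = [
    ([1.0, 0.0, 0.0], "buy pasta for dinner"),
    ([0.0, 1.0, 0.0], "watch a movie on Friday"),
    ([0.9, 0.1, 0.0], "get tomatoes and basil"),
]
results = top_k([1.0, 0.0, 0.0], notes, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The note whose vector points in the same direction as the query ranks first, and the unrelated note is dropped by the limit of 2.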

Installation

Install from GitHub:

uv tool install git+https://github.com/run-llama/audio-kb
audio-kb --help # test installation

Install from source:

git clone https://github.com/run-llama/audio-kb
cd audio-kb/
uv pip install -e .

This will install a binary named audio-kb.

Set Up

SurrealDB

Install the SurrealDB CLI:

curl -sSf https://install.surrealdb.com | sh

Run a SurrealDB instance locally (with on-disk storage):

surreal start --user root --pass some-password rocksdb://slides.db

Or run directly with Docker:

docker run --rm --pull always -p 8000:8000 -v $(pwd)/mydata:/mydata surrealdb/surrealdb:latest start --user root --pass some-password rocksdb:/mydata/mydatabase.db

Configuration

You can configure the services used by the app (LlamaParse, Gemini Embedding, SurrealDB...) in a config.json file located in the working directory from which you run audio-kb. The configuration must follow this schema:

{
  "database": {
    "url": string | null,
    "user": string | null,
    "password": string | null | "$SURREALDB_PASSWORD",
    "namespace": string | null,
    "database": string | null
  },
  "llama_cloud": {
    "llama_cloud_api_key": string | "$LLAMA_CLOUD_API_KEY",
    "llama_cloud_base_url": string | null
  },
  "llama_parse": {
    "tier": string,
    "version": string,
    "expand": string[]
  },
  "embedding": {
    "api_key": string | "$GOOGLE_API_KEY",
    "model_name": string
  },
  "splitter": {
    "chunk_size": int,
    "chunk_overlap": int
  }
}

If you use placeholders such as $LLAMA_CLOUD_API_KEY, $SURREALDB_PASSWORD and $GOOGLE_API_KEY for sensitive fields instead of plain-text values, the corresponding environment variables must be set in the environment where you run audio-kb.
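The placeholder convention can be pictured with a short sketch. This is not audio-kb's actual loader, only one plausible way to do it: the helper name resolve_env_placeholders is hypothetical, the model name is a dummy value, and the environment variable is set inside the script purely so the example runs on its own.

```python
import json
import os


def resolve_env_placeholders(value):
    """Recursively replace "$NAME" strings with the value of os.environ["NAME"]."""
    if isinstance(value, dict):
        return {key: resolve_env_placeholders(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_placeholders(item) for item in value]
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return value  # numbers, null and plain strings pass through unchanged


# A fragment that follows the schema above; the values are placeholders.
raw = json.loads(
    '{"embedding": {"api_key": "$GOOGLE_API_KEY", "model_name": "example-model"}}'
)

# Set here only for the demonstration; normally exported in your shell.
os.environ["GOOGLE_API_KEY"] = "dummy-key-for-illustration"

config = resolve_env_placeholders(raw)
print(config["embedding"]["api_key"])
```

Failing loudly on a missing variable, as the sketch does, surfaces a misconfigured environment before any request is sent.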

Find an example of the configuration here.

Run

Process a File

Use the process subcommand to process an audio file (or a recording made from the terminal). Processing includes extracting text with LlamaParse, chunking, embedding, and uploading to the database.
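The chunking step is governed by the chunk_size and chunk_overlap settings in the splitter section of the configuration. As a rough illustration of how those two numbers interact, here is a character-based sliding-window splitter. This is an assumption made for clarity: the splitter audio-kb actually uses may count tokens or respect sentence boundaries, and split_text is a hypothetical name.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.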

Example usage:

audio-kb process --file audio.mp3 # use with a file
audio-kb process # record directly from the terminal
audio-kb process --recording-file audio.mp3 # record from terminal and save the recording to a specific file (audio.mp3 in this case)

Search

Use the search command to perform vector search starting from a text query:

audio-kb search "What did I say I would buy tonight for dinner?"
audio-kb search "What is the name of the main character of 'Emily in Paris'?" --json # output the matches as JSON
audio-kb search "What are the movies I said I would watch?" --limit 3 # return only the 3 most relevant results
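If you want to call the search from a script, a thin wrapper along these lines may help. It relies only on the --json and --limit flags shown above and makes two assumptions the README does not confirm: that the two flags can be combined, and that with --json the command prints nothing but JSON to standard output. The shape of that JSON is not documented here, so the sketch returns it undecoded beyond json.loads. Both function names are hypothetical.

```python
import json
import subprocess


def build_search_command(query, limit=None, as_json=True):
    """Assemble the argv list for an `audio-kb search` call."""
    command = ["audio-kb", "search", query]
    if as_json:
        command.append("--json")
    if limit is not None:
        command.extend(["--limit", str(limit)])
    return command


def run_search(query, limit=None):
    """Run the search and return the decoded JSON output.

    Assumes stdout contains only JSON when --json is passed.
    Raises subprocess.CalledProcessError if the command fails.
    """
    completed = subprocess.run(
        build_search_command(query, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


command = build_search_command("What are the movies I said I would watch?", limit=3)
print(command)
```

Passing the arguments as a list rather than a shell string means quotes and question marks in the query need no escaping.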

License

MIT
