This repository contains a collection of benchmarks used to evaluate and compare different systems, approaches, and workflows. Its primary goals are clarity, fair comparison, and reproducibility.
```
benchmarks/
├── benchmark-testing/
│   └── README.md
│
├── longmemeval/
│   ├── README.md
│   └── src/
│
├── .gitignore
└── README.md
```
Location: benchmark-testing/
Documentation: benchmark-testing/README.md
This benchmark compares two approaches for searching a real-world codebase (OpenCode):
- Keyword-based search using `grep` (see the sketch after this list)
- Semantic context search using Alchemyst

The comparison focuses on:

- Relevance vs. noise in returned context
- Token usage and cost trade-offs
- Practical search effectiveness in real codebases
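As a rough illustration of the keyword-based baseline, the sketch below scans a checkout for a literal term, which is essentially what `grep` does here. The root path, file glob, and search term are placeholders rather than actual benchmark parameters, and the snippet is not part of the benchmark harness.

```python
# Minimal keyword-search sketch standing in for the grep baseline.
# The root path, glob, and search term are placeholders, not benchmark settings.
from pathlib import Path

def keyword_search(root: str, term: str, glob: str = "*.ts"):
    """Yield (file, line number, line) for every literal match under root."""
    for path in Path(root).rglob(glob):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if term in line:
                yield path, lineno, line.strip()

if __name__ == "__main__":
    for path, lineno, line in keyword_search("./opencode", "session"):
        print(f"{path}:{lineno}: {line}")
```

Every literal match ends up in the context handed to the model, which is where the relevance-vs-noise and token-cost trade-offs listed above come from.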
For setup instructions and execution steps, refer to `benchmark-testing/README.md`.
Location: longmemeval/
Documentation: longmemeval/README.md
LongMemEval is a comprehensive benchmark designed to evaluate the long-term memory capabilities of chat assistants. It covers:

- Multi-session reasoning
- Temporal reasoning
- Knowledge updates
- Abstention behavior

This folder:

- Includes the released datasets
- Provides the evaluation scripts
- Contains the baseline pipelines
- Is based on the LongMemEval paper (ICLR 2025)
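Purely as a conceptual sketch (the record fields and the substring-match scoring below are hypothetical and do not reflect the actual LongMemEval data format or its official evaluation scripts), a long-term memory evaluation boils down to asking questions whose evidence is spread across earlier chat sessions:

```python
# Hypothetical sketch of a long-term memory evaluation loop.
# Field names and substring-match scoring are illustrative only; they are
# not the LongMemEval schema or its official metrics.

def evaluate(records, answer_fn):
    """Score an assistant on questions grounded in earlier chat sessions."""
    correct = 0
    for record in records:
        history = record["sessions"]        # prior multi-session chat turns
        question = record["question"]       # asked after all sessions
        expected = record["expected_answer"]
        predicted = answer_fn(history, question)
        if expected.lower() in predicted.lower():  # real judges are more robust
            correct += 1
    return correct / len(records)
```

Abstention cases would additionally check that the assistant declines to answer when the sessions contain no supporting evidence.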
For full setup and execution instructions, see `longmemeval/README.md`.
To run a benchmark:

- Choose a benchmark folder
- Read the `README.md` inside that folder
- Follow the documented setup and execution steps
- Run the benchmark locally
- Inspect and compare the results
Contributions are welcome!
When adding a new benchmark:
- Create a new folder under `benchmarks/` (see the example layout below)
- Include a clear and complete `README.md`
- Document assumptions and limitations
- Keep result artifacts out of Git
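As a rough illustration (the folder name below is hypothetical), a new benchmark would typically mirror the existing layout:

```
benchmarks/
└── my-new-benchmark/
    ├── README.md    # setup, execution steps, assumptions, limitations
    └── src/         # benchmark code; keep result artifacts out of Git
```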
This repository serves as a shared benchmarking space for evaluating different systems and approaches under real-world conditions.
For benchmark-specific details, always refer to the `README.md` inside the corresponding benchmark folder.