AI Monitoring & Evaluation
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
A framework for few-shot evaluation of language models (see the usage sketch after this list).
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
Generate ideal question-answer pairs for testing RAG (a generic sketch follows this list).
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) on 100+ datasets.
LLMPerf is a library for validating and benchmarking LLMs
LLM Serving Performance Evaluation Harness
Supercharge Your LLM Application Evaluations 🚀
Evaluate the accuracy of LLM-generated outputs (a generic sketch follows this list).
Text2SQL-Eval is a Text-to-SQL evaluation component for LLMs, trained on an open-source training dataset.
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
RAGChecker: A Fine-grained Framework For Diagnosing RAG
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics. (A usage sketch follows this list.)
[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
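
The "framework for few-shot evaluation of language models" above appears to be EleutherAI's lm-evaluation-harness; assuming that, and the v0.4-style `simple_evaluate` entry point, here is a minimal usage sketch (the model and task choices are illustrative, not prescribed by the project):

```python
# Minimal sketch, assuming the lm-evaluation-harness v0.4-style Python API
# (lm_eval.simple_evaluate); model and task choices here are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],                           # any registered task name works
    num_fewshot=5,                                 # number of in-context examples
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, etc.) live under "results".
print(results["results"]["hellaswag"])
```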
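For the "generate ideal question-answer pairs for testing RAG" item, the listed tool's own API is not reproduced here; the sketch below is a generic, hypothetical approach that prompts an LLM through the OpenAI SDK to draft QA pairs from a source passage:

```python
# Hypothetical sketch (not the listed tool's API): generate question-answer
# pairs from source passages so a RAG pipeline can be tested against them.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_qa_pairs(passage: str, n: int = 3) -> list[dict]:
    """Ask the model for n question-answer pairs grounded in the passage."""
    prompt = (
        f"Write {n} question-answer pairs that can be answered solely from the "
        f"passage below. Return a JSON list of objects with 'question' and "
        f"'answer' keys.\n\nPassage:\n{passage}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    # May raise if the model returns non-JSON text; fine for a sketch.
    return json.loads(response.choices[0].message.content)

pairs = generate_qa_pairs("The Eiffel Tower was completed in 1889 in Paris.")
for pair in pairs:
    print(pair["question"], "->", pair["answer"])
```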
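The accuracy-evaluation item is likewise illustrated only with a generic exact-match sketch, not the listed project's API; the normalization rules are assumptions:

```python
# Generic sketch (not the listed project's API): exact-match accuracy of
# LLM outputs against reference answers, with light normalization.
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    matches = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references, strict=True)
    )
    return matches / len(references)

preds = ["Paris.", "42", "Blue whale"]
refs = ["paris", "42", "The blue whale"]
print(f"exact match: {exact_match_accuracy(preds, refs):.2f}")  # 0.67
```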
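Finally, for Evidently, a minimal sketch of its report workflow, assuming the 0.4-era `Report`/`DataDriftPreset` module paths (these have shifted in later releases) and two illustrative DataFrames:

```python
# Minimal sketch, assuming Evidently's 0.4-era Report API (module paths have
# changed in later releases); the toy DataFrames here are illustrative.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference_df = pd.DataFrame({"latency_ms": [120, 135, 110], "tokens": [80, 95, 70]})
current_df = pd.DataFrame({"latency_ms": [180, 210, 190], "tokens": [85, 100, 90]})

# Compare the current window of production data against the reference window.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)
report.save_html("data_drift.html")  # shareable HTML dashboard
```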