Stars

AI Monitoring & Evaluation (AI监控评估)

22 repositories

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript · 8,802 stars · 803 forks · Updated Feb 25, 2025
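This entry appears to be the Langfuse repository. As a rough illustration of how its tracing hooks into application code, here is a minimal sketch using the Python SDK's v2-style @observe decorator; the function and question are purely illustrative, and LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are assumed to be set in the environment.

```python
# Minimal sketch: trace one function call with the Langfuse Python SDK
# (v2-style API). Assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are set.
from langfuse.decorators import observe

@observe()  # records inputs, outputs, timing and nesting as a trace
def answer(question: str) -> str:
    # call your LLM of choice here; a stub keeps the example self-contained
    return f"stub answer to: {question}"

answer("What does Langfuse capture?")
```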

LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.

JavaScript · 378 stars · 30 forks · Updated Feb 11, 2025

A framework for few-shot evaluation of language models.

Python · 7,926 stars · 2,131 forks · Updated Feb 25, 2025
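This entry matches EleutherAI's lm-evaluation-harness. A minimal sketch of its Python entry point, assuming the v0.4+ simple_evaluate API; the model and task names are only examples.

```python
# Minimal sketch: run one benchmark task against a small Hugging Face model
# with lm-evaluation-harness's Python API (v0.4+); names are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                     # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    batch_size=8,
)
print(results["results"]["hellaswag"])              # per-task metric dict
```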

Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.

Python · 2,128 stars · 211 forks · Updated Feb 24, 2025

Generate ideal question-answer pairs for testing RAG

Python · 126 stars · 3 forks · Updated Feb 25, 2025

LLM Analytics

TypeScript · 642 stars · 25 forks · Updated Oct 19, 2024

Benchmarking library for RAG

Jupyter Notebook · 169 stars · 15 forks · Updated Feb 24, 2025

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python · 4,748 stars · 508 forks · Updated Feb 24, 2025

LLMPerf is a library for validating and benchmarking LLMs

Python · 769 stars · 130 forks · Updated Dec 9, 2024

LLM Serving Performance Evaluation Harness

Python · 68 stars · 10 forks · Updated Feb 25, 2025

Supercharge Your LLM Application Evaluations 🚀

Python · 8,287 stars · 847 forks · Updated Feb 24, 2025
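This entry appears to be Ragas. A minimal sketch of a metric run, assuming the classic evaluate() API over a Hugging Face Dataset and a configured judge LLM (e.g. OPENAI_API_KEY in the environment); the sample row is made up.

```python
# Minimal sketch: score one RAG interaction with Ragas' classic evaluate() API.
# Assumes a judge LLM is configured (e.g. OPENAI_API_KEY in the environment).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris has been the capital of France for centuries."]],
})
result = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores, e.g. faithfulness and answer relevancy
```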

Evaluate the accuracy of LLM-generated outputs

Jupyter Notebook · 617 stars · 66 forks · Updated Feb 2, 2025

Text2SQL-Eval is a Text-to-SQL evaluation component for LLMs, trained on an open-source training dataset.

Python · 9 stars · 6 forks · Updated Jan 12, 2024

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

TypeScript · 3,311 stars · 326 forks · Updated Feb 25, 2025
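This entry appears to be Helicone. The advertised one-line integration is pointing an OpenAI client at Helicone's proxy so every request gets logged; the base URL and Helicone-Auth header below follow the public docs, and the key values are placeholders.

```python
# Minimal sketch: route OpenAI traffic through Helicone's proxy for logging.
# Base URL and Helicone-Auth header follow the public docs; keys are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer <HELICONE_API_KEY>"},
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```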

RAGChecker: A Fine-grained Framework For Diagnosing RAG

Python · 768 stars · 66 forks · Updated Dec 13, 2024

Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

TypeScript · 1,642 stars · 96 forks · Updated Feb 25, 2025

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python · 196 stars · 20 forks · Updated Feb 17, 2025

⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡

Python · 2,935 stars · 201 forks · Updated Nov 26, 2023

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

Jupyter Notebook · 5,755 stars · 632 forks · Updated Feb 24, 2025
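This entry is Evidently. A minimal sketch of a drift check between two small DataFrames using the 0.4-era Report API; newer releases have reworked this interface, so treat it as illustrative.

```python
# Minimal sketch: data-drift report with Evidently's 0.4-era Report API.
# Newer Evidently versions restructure this interface; treat as illustrative.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.DataFrame({"score": [0.91, 0.88, 0.85, 0.90]})
current = pd.DataFrame({"score": [0.55, 0.48, 0.60, 0.52]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # writes an interactive HTML summary
```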

[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs

Python · 416 stars · 30 forks · Updated Jan 23, 2025

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python · 15,562 stars · 2,668 forks · Updated Dec 18, 2024

A Chinese-native industrial evaluation benchmark (中文原生工业测评基准)

13 stars · Updated Mar 21, 2024