PR Description
Hi team,
Following up on my initial idea submission (#1136), where a PoC was requested,
I created a decoupled FastAPI + React pipeline that runs lm-eval safely in background threads.
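The background-thread idea can be sketched roughly like this. This is a minimal illustration, not the PoC's actual code: `run_eval` is a hypothetical stand-in for the real lm-eval invocation, and `results` is a placeholder for whatever job store the PoC uses.

```python
# Sketch: run a long evaluation off the request thread so the FastAPI
# endpoint can return immediately. run_eval() is a hypothetical stand-in
# for the real lm-eval call; results is an illustrative job store.
import threading

results = {}

def start_eval(job_id: str, run_eval, task: str) -> threading.Thread:
    """Launch the evaluation in a daemon thread and record its result."""
    def worker():
        results[job_id] = run_eval(task)
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The endpoint would return the `job_id` right away and the frontend would poll (or, as below, stream) for progress.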
The coolest part: I built a proxy middleware layer that intercepts lm-eval payloads and sanitizes them. This completely solves the vendor-specific schema crashes we see with strict APIs like Gemini and Groq (e.g., Gemini instantly throws a 400 Bad Request if it sees a `seed` parameter). This proves we can make the tool truly vendor-neutral!
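The sanitizing step could look something like the sketch below. The per-vendor blocklists here are illustrative assumptions (only the Gemini/`seed` case is from the description above), not the PoC's actual lists:

```python
# Sketch of the payload-sanitizing proxy idea: strip request-body keys that
# a given provider's strict schema rejects. The blocklists are illustrative;
# only Gemini rejecting `seed` with a 400 is taken from the PR description.
UNSUPPORTED_PARAMS = {
    "gemini": {"seed"},          # Gemini returns 400 Bad Request on `seed`
    "groq": {"logit_bias"},      # hypothetical example entry
}

def sanitize_payload(payload: dict, vendor: str) -> dict:
    """Return a copy of the lm-eval request body with blocked keys removed."""
    blocked = UNSUPPORTED_PARAMS.get(vendor, set())
    return {k: v for k, v in payload.items() if k not in blocked}
```

Keeping the mapping in one table is what makes the layer vendor-neutral: adding a new strict API is just a new entry, with no changes to lm-eval itself.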
Piped real-time execution logs from Python directly to the frontend using Server-Sent Events (SSE).
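The SSE piping can be sketched as a generator that drains a log queue and emits `data:` frames; in the real app this generator would be wrapped in a FastAPI `StreamingResponse` with `media_type="text/event-stream"`. The queue/sentinel wiring here is an assumption for illustration:

```python
# Sketch: format background-thread log lines as Server-Sent Events.
# Assumes the lm-eval worker pushes lines into a queue.Queue and enqueues
# None when finished (an illustrative convention, not the PoC's actual one).
import queue

def sse_event(data: str) -> str:
    """Format one log line as an SSE 'data:' frame (blank line terminates it)."""
    return f"data: {data}\n\n"

def stream_logs(log_queue: queue.Queue):
    """Yield SSE frames until the worker signals completion with None."""
    while True:
        line = log_queue.get()
        if line is None:  # sentinel: evaluation finished
            break
        yield sse_event(line)
```

On the React side, a plain `EventSource` pointed at this endpoint receives each frame as a `message` event, which is what makes the live log view possible without WebSockets.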
PoC Repository: https://github.com/Spark960/ai-eval
Demo: A GIF demonstration of the live evaluation pipeline streaming via SSE is available in the PoC README. I've also attached it below for your reference.
This PoC targets cloud APIs like Gemini and Groq, but the same proxy middleware layer can also be used to test local models. So, at the end of all this, we would be able to test local models, cloud models, and agents too (through LightEval). This is turning out to be a very interesting project. I'm currently working on the proposal too and will submit it soon.
Related Issues
Checklist
- [ ] I have pulled the latest `main` branch before making this PR
- [ ] I am on the latest Flutter version (run `flutter upgrade` and verify)
- [ ] I have run the tests (`flutter test`) and all tests are passing

Added/updated tests?
We encourage you to add relevant test cases.
OS on which you have developed and tested the feature?