Skip to content

poc for ai eval#1408

Open
Spark960 wants to merge 1 commit intofoss42:mainfrom
Spark960:poc/ai-eval-lokesh
Open

poc for ai eval#1408
Spark960 wants to merge 1 commit intofoss42:mainfrom
Spark960:poc/ai-eval-lokesh

Conversation

@Spark960
Copy link
Contributor

PR Description

Hi team,

Following upon my initial idea submission (#1136 ) where a PoC was requested,

I created a decoupled FastAPI + React pipeline that runs lm-eval safely in background threads.
The coolest part: I built a proxy middleware layer that intercepts lm-eval payloads and cleanses/sanitizes them. This completely solves the vendor specific schema crashes we see with strict APIs like Gemini and Groq (e.g: Gemini instantly throwing a 400 Bad Request if it sees a seed parameter). This proves we can make this tool truly vendor neutral!
Piped real-time execution logs from Python directly to the frontend using Server-Sent Events (SSE).

PoC Repository: https://github.com/Spark960/ai-eval

Demo: A gif demonstration of the live evaluation pipeline streaming via SSE is available in the PoC readme. I've also attached it below for your reference

This is for the testing cloud api's like gemini, groq. The same proxy middleware layer can also be used to test the local models too. So, at the end of all this, we would be able to test local models, cloud models and agents too (through light eval). This is turning out to be a very interesting project. I'm currently working on the proposal too and will submit it soon.

demo

Related Issues

Checklist

  • I have gone through the contributing guide
  • I have updated my branch and synced it with project main branch before making this PR
  • I am using the latest Flutter stable branch (run flutter upgrade and verify)
  • I have run the tests (flutter test) and all tests are passing

Added/updated tests?

We encourage you to add relevant test cases.

  • Yes
  • No, and this is why: just a poc submission

OS on which you have developed and tested the feature?

  • Windows
  • macOS
  • Linux

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant