Using ANY compatible endpoint for transcription, not just whisper-likes #60
IgorWarzocha
started this conversation in
Ideas
Replies: 2 comments
-
|
What The Feature Does
Verified to be working. Outputs amazing pirate speech by using a proper system prompt. How It Works (Key Pieces)
|
Beta Was this translation helpful? Give feedback.
0 replies
-
|
@IgorWarzocha I'll be circling back to your Pull Request on this soon! |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I've been thinking... I am already wiring up gemini flash to rewrite/de-ramble my dictation.
I'm using a proxy that lets me use the CLI auth as a standard openai compatible endpoint. It theory, it's got a multimodal input that accepts audio. Would you be open to accept a draft PR that could enable this natively?
My use case is obviously a bit of a hack, but in theory, it would enable using it with any models that accept audio. Send the recording snippet, add a system prompt to rewrite, boom.
But this is text for now. I could be just sending it audio, theoretically! :)
Beta Was this translation helpful? Give feedback.
All reactions