Description
When experimenting with AI use cases and comparing Panel to dedicated frameworks like Gradio, I can see that for some use cases we would simply have to spend too much time building data apps. We would spend time on:
- understanding how to transform the input provided by Panel widgets into something that is ready for our model
- understanding how to transform the output of our model into something that is possible to display with Panel
- figuring out that there is actually no widget or pane supporting our use case
I believe this is the case for any domain, not just AI. But in AI there are dedicated frameworks that make it very clear where our weak points are. An improvement in the AI domain would benefit every domain.
Examples
For example:
- I've just improved the `Audio` pane so that it supports Torch tensors and more dtypes. This makes it much more likely that a user's data format is supported out of the box. It's a bit similar to hvplot, where we support many data formats out of the box. We could also just tell our users to figure out how to convert their data to NumPy arrays and then use hvplot. But we don't. We want to make things easy.
- The `FileInput` widget is there, but it does not look like a great, easy-to-use drag-and-drop area. We should also make it much, much easier for our users to get the uploaded file converted into a `text`, `dataframe`, `audio`, `video` or similar object ready for use. Right now our users would have to spend a lot of time on this task, while it should not be their focus. Their focus should be on putting the input widgets and output panes together in a nice layout and then experimenting with their model.
- Our users cannot easily experiment with speech recognition because we don't have an `AudioRecorder` widget they can use to easily record spoken text.
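As a rough illustration of the kind of conversion users currently have to work out themselves, here is a minimal sketch that turns mono float samples in the range [-1.0, 1.0] (a common model output format) into 16-bit PCM WAV bytes using only the Python standard library. The helper name `float_samples_to_wav_bytes` is hypothetical and not part of Panel; mono audio, clipping out-of-range values and 16-bit output are assumptions made for the sketch.

```python
import io
import struct
import wave
from typing import Sequence


def float_samples_to_wav_bytes(samples: Sequence[float], sample_rate: int) -> bytes:
    """Hypothetical helper: encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clip every sample to [-1.0, 1.0], then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        # Pack the integers as little-endian signed shorts, as WAV expects.
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()
```

The resulting bytes could, for example, be written to a `.wav` file whose path is then passed to `pn.pane.Audio`. The more formats the pane handles natively, the less of this boilerplate users need to write.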
Solution
Below I will develop an overview of strong and weak points.
File Input Widget
- We need a better-looking drag-and-drop `FileInput` widget. A widget that signals you have to `click` and `upload` is simply old school, and the workflow is too slow for AI experimentation.
- Provide easy-to-use functionality to output to the most used formats. If the user uploads a text file, it should be easy to output to text. If the user uploads a CSV file, it should be very simple to get the `DataFrame`. The same goes for any format like audio, video, etc. These could be methods on the `FileInput`. Besides dedicated methods like `.to_text`, `.to_dataframe`, `.to_audio` and `.to_video`, we should also have a magic method `.to_object` that just outputs the best guess of something that can be displayed correctly with `pn.panel`. This would make it easy to build more general AI apps where the user can provide many types of input to the AI model or AI agent.
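A minimal sketch of what `to_text`, `to_dataframe` and `to_object` could do, written here as free functions over the raw bytes and filename that `FileInput` exposes through its `value` and `filename` parameters. The function names mirror the proposal but are hypothetical, not existing Panel API; dispatching on the file extension, the particular extensions handled, and falling back to the raw bytes are assumptions. pandas is only imported when a tabular file is actually parsed.

```python
import io
import json
from pathlib import PurePath
from typing import Any

# Assumed extension groups; a real implementation would likely cover many more.
TEXT_SUFFIXES = {".txt", ".md"}
TABULAR_SUFFIXES = {".csv", ".xls", ".xlsx"}


def to_text(value: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical: decode the uploaded bytes into a string."""
    return value.decode(encoding)


def to_dataframe(value: bytes, filename: str) -> Any:
    """Hypothetical: parse CSV or Excel bytes into a pandas DataFrame."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in TABULAR_SUFFIXES:
        raise ValueError(f"Not a supported tabular file: {filename}")
    import pandas as pd  # imported lazily; only tabular files need it

    buffer = io.BytesIO(value)
    if suffix == ".csv":
        return pd.read_csv(buffer)
    return pd.read_excel(buffer)  # .xls/.xlsx need an Excel engine such as openpyxl


def to_object(value: bytes, filename: str) -> Any:
    """Hypothetical best guess at an object pn.panel is likely to render well."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return to_text(value)
    if suffix == ".json":
        return json.loads(to_text(value))
    if suffix in TABULAR_SUFFIXES:
        return to_dataframe(value, filename)
    # Unknown format: hand back the raw bytes unchanged.
    return value
```

In an app this might be called as `to_object(file_input.value, file_input.filename)`, with the result passed to `pn.panel`. Dispatching on the extension is the simplest choice; a fuller version could also consult `file_input.mime_type`.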
An alternative to a do-it-all `FileInput` widget could be dedicated input widgets like `TabularInput`, `AudioInput` and `VideoInput` that the user can drag and drop their file onto for quick experimentation. Gradio does this. For its Audio widget, the source can be either upload or microphone.
Inputs
| domain | done | existing widgets | missing widgets | comment |
|---|---|---|---|---|
| audio | [ ] | FileInput | AudioRecorder | |
| image | [ ] | FileInput | CanvasDraw / Paint | |
| tabular | [ ] | FileInput, Tabulator | | It should be easier to get the DataFrame from the FileInput. |
| video | [ ] | FileInput, VideoStream | VideoRecorder | It is not clear to me if it is possible to use the VideoStream as a video recorder. |
MORE IS COMING
Outputs
COMING UP
Examples
We should systematically provide examples of simple input-output style apps for the most used domains. For example, text-to-speech, image-to-text, ..., text-to-image, speech-to-text, etc.
Additional Context
I started working on the above in 2022 in the separate project https://github.com/marcskovmadsen/paithon. But I got distracted by other things.
I'm also working on the transformers agent UI. An app like that will also be a good benchmark for how well Panel supports AI workflows, as in principle users should be able to work with any input and output media.