A versatile AI-powered creative studio for generating, enhancing, and transforming images using modern AI models, plus conversational AI capabilities with text and voice interfaces.
- Research Tool: Generate comprehensive research documents on any topic
  - Conversational research interface with AI chat interaction
  - Query optimization based on user follow-up responses
  - Multi-service web scraping with advanced content extraction
  - Tiered research depth options (Quick, Extended, Deep)
  - Specialized content extraction for documentation sites and GitHub repositories
  - Document generation with proper markdown formatting and source citations
  - Sources list with categorization and prioritization
  - Copy to clipboard or download functionality for research documents
  - Fallback content generation for failed scraping attempts
- AI Image Generation: Create images from text descriptions
- Image Enhancement: Upscale and improve image quality
- Artistic Transformations: Apply artistic styles to existing images
- Text Chat: Interact with multiple advanced AI models
  - Support for vision-enabled models (upload and analyze images)
  - 20+ model options including GPT-4o, Llama 3.3, DeepSeek and more
  - Personality system to customize AI behavior
  - Voice input with Watson Speech-to-Text API
  - Improved conversation handling for extended interactions
- Voice Assistant: Natural voice conversations with AI
  - Text-to-Speech with multiple voice options
  - Speech recognition with Web Speech API or Watson
  - Hands-free mode for continuous conversation
  - Animated visual feedback during AI responses
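The "multiple voice options" for Text-to-Speech can be pictured with a small sketch built on the browser's Web Speech API. `pickVoice` and `speak` are hypothetical helper names used only for illustration; the application's actual selection logic may differ.

```javascript
// Sketch: choose a text-to-speech voice from the list the browser exposes.
// `pickVoice` and `speak` are hypothetical helpers, not names from this project.
function pickVoice(voices, preferredName, lang = "en") {
  // 1. An exact name match wins (the user's explicit choice).
  const byName = voices.find((v) => v.name === preferredName);
  if (byName) return byName;
  // 2. Otherwise take the first voice whose language tag starts with `lang`.
  const wanted = lang.toLowerCase();
  const byLang = voices.find((v) => v.lang.toLowerCase().startsWith(wanted));
  if (byLang) return byLang;
  // 3. Otherwise the browser default, then the first voice, then null.
  return voices.find((v) => v.default) ?? voices[0] ?? null;
}

// Browser-only usage; guarded so the file also loads outside a browser.
function speak(text, preferredName) {
  if (typeof speechSynthesis === "undefined") return false;
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(speechSynthesis.getVoices(), preferredName);
  if (voice) utterance.voice = voice;
  speechSynthesis.speak(utterance);
  return true;
}
```

Note that `speechSynthesis.getVoices()` can return an empty list until the browser fires its `voiceschanged` event, so a real page would typically populate its voice menu from that event.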
For the best experience with voice features, we recommend using:
- Google Chrome: Full support for Web Speech API
- Microsoft Edge: Good support for speech recognition
- Safari 14.1+: Partial support for speech recognition
- Firefox: Limited support (might require enabling flags)
The application will automatically fall back to IBM Watson for speech recognition if the Web Speech API is not available or if you manually select it.
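The fallback rule above can be sketched as a small decision function. The names `hasWebSpeech` and `chooseSpeechEngine`, and the `"auto"`/`"watson"` setting values, are assumptions for illustration rather than the application's actual identifiers.

```javascript
// Sketch: decide which speech recognition engine to use.
// Function names and the "auto"/"watson" values are hypothetical.
function hasWebSpeech(scope) {
  // Chrome, Edge, and Safari expose the prefixed constructor.
  return Boolean(
    scope && (scope.SpeechRecognition || scope.webkitSpeechRecognition)
  );
}

function chooseSpeechEngine(scope, userChoice = "auto") {
  // A manual selection of Watson always wins.
  if (userChoice === "watson") return "watson";
  // Otherwise prefer the Web Speech API and fall back to Watson.
  return hasWebSpeech(scope) ? "webspeech" : "watson";
}
```

In a page, this would be called as `chooseSpeechEngine(window, savedPreference)`.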
- Frontend: HTML, CSS, JavaScript
- Image Processing: Cloudinary API
- Voice Recognition: IBM Watson Speech-to-Text API
- Text-to-Speech: Web Speech API
- AI Models: Pollinations AI API
- Web Scraping: Multi-service approach with ScraperAPI, ScrapingAnt, PhantomJS Cloud, and Firecrawl
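The multi-service scraping approach, combined with fallback content for failed attempts, can be sketched as a loop that tries each provider in turn. This is a minimal illustration under stated assumptions: `scrapeWithFallback`, the `{ name, scrape }` shape, and the `fallback` callback are hypothetical, and the real provider order and API calls are not shown here.

```javascript
// Sketch: try each scraping service in order, then fall back to generated content.
// `services` is an array of { name, scrape } where scrape(url) resolves to text.
// All names here are hypothetical; real provider calls are intentionally omitted.
async function scrapeWithFallback(url, services, fallback) {
  const errors = [];
  for (const { name, scrape } of services) {
    try {
      const content = await scrape(url);
      // Treat empty or whitespace-only responses as failures.
      if (typeof content === "string" && content.trim() !== "") {
        return { source: name, content, errors };
      }
      errors.push(`${name}: empty response`);
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  // Every service failed: produce fallback content instead of giving up.
  return { source: "fallback", content: await fallback(url), errors };
}
```

Collecting the per-service errors alongside the result makes it possible to show the user which sources were unavailable without aborting the research run.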
- IBM Watson Speech-to-Text API Key (for voice functionality)
- Cloudinary account (for image processing)
- Scraping service API keys (for research functionality)
- Clone this repository
- Run the development server:
```bash
npx wrangler pages dev . --compatibility-date=2023-03-21 --port=8123 \
  --binding WATSON_API_KEY=your_key \
  --binding CLOUDINARY_CLOUD_NAME=your_cloud_name \
  --binding CLOUDINARY_API_KEY=your_api_key \
  --binding CLOUDINARY_API_SECRET=your_api_secret \
  --binding FIRECRAWL_API_KEY=your_key \
  --binding SCRAPERAI_API_KEY=your_key \
  --binding SCRAPINGANT_API_KEY=your_key \
  --binding PHANTOMJSCLOUD_API_KEY=your_key
```
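Values passed with `--binding` are exposed to Cloudflare Pages Functions on `context.env`. A quick way to confirm that the development server received every key is a small check like the sketch below. The binding names are copied from the command above; `missingBindings` and the health-check handler are hypothetical and not part of this repository.

```javascript
// Sketch: report which expected bindings are absent, without revealing values.
// Binding names come from the dev-server command; the helper names are hypothetical.
const EXPECTED_BINDINGS = [
  "WATSON_API_KEY",
  "CLOUDINARY_CLOUD_NAME",
  "CLOUDINARY_API_KEY",
  "CLOUDINARY_API_SECRET",
  "FIRECRAWL_API_KEY",
  "SCRAPERAI_API_KEY",
  "SCRAPINGANT_API_KEY",
  "PHANTOMJSCLOUD_API_KEY",
];

function missingBindings(env) {
  return EXPECTED_BINDINGS.filter((name) => !env[name]);
}

// Hypothetical Pages Function handler (prefix with `export` when placed
// in a file under functions/, for example functions/api/health.js).
async function onRequest(context) {
  const missing = missingBindings(context.env);
  return new Response(JSON.stringify({ ok: missing.length === 0, missing }), {
    status: missing.length === 0 ? 200 : 500,
    headers: { "Content-Type": "application/json" },
  });
}
```

Only the names of missing bindings are returned, so the check never echoes a secret back to the browser.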
- Image Generator: Visit the home page to create images from text descriptions
- Chat Interface: Navigate to /public/chat.html to use the chat interface
  - Send text messages with the send button
  - Upload images for visual analysis with vision-enabled models
  - Use voice input by clicking the microphone button
- Voice Assistant: Access the voice interface at /public/voice.html
  - Speak naturally with the AI assistant
  - Try different voice models for varied responses
- Research Tool: Access the research tool at /public/research.html
  - Enter a research topic to start a conversation with the AI assistant
  - Provide additional context when prompted to optimize your research
  - Select research tier (quick, extended, deep) for varying levels of detail
  - View sources used in the research with categorization
  - Download or copy generated documents with proper formatting and citations
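The three research tiers can be thought of as named presets. The sketch below shows one way to resolve a user-supplied tier name to a preset. Every number in it is a placeholder chosen for illustration, and `RESEARCH_TIERS`, `resolveTier`, and the field names are hypothetical; the application's real limits are not documented here.

```javascript
// Sketch: map a tier name to a preset. All values are illustrative placeholders,
// not the limits this application actually uses.
const RESEARCH_TIERS = {
  quick: { maxSources: 3, followUpQuestions: 0 },
  extended: { maxSources: 8, followUpQuestions: 1 },
  deep: { maxSources: 15, followUpQuestions: 2 },
};

function resolveTier(input) {
  // Accept "Deep", " deep ", and similar variants.
  const key = String(input ?? "").trim().toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(RESEARCH_TIERS, key)) {
    throw new RangeError(`Unknown research tier: ${input}`);
  }
  return { name: key, ...RESEARCH_TIERS[key] };
}
```

Rejecting unknown names early, rather than silently defaulting, keeps a mistyped tier from producing a shallower document than the user expected.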
Refer to DEPLOYMENT.md for detailed deployment instructions using Cloudflare Pages.
- IBM Watson Speech-to-Text API Key
  - Sign up at IBM Cloud
  - Create a Speech to Text service
  - Get your API key from the service credentials
- Cloudinary Account
  - Sign up at Cloudinary
  - Get your cloud name, API key, and API secret from your dashboard
- Web Scraping Services
  - ScraperAPI - Free plan available
  - ScrapingAnt - Free plan available
  - PhantomJS Cloud - Free plan available
  - Firecrawl - Free tier available
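Once you have a Watson API key and the service URL from the same credentials page, a recognition request can be assembled roughly as follows. This is a sketch, not this project's code: `buildRecognizeRequest` and `extractTranscript` are hypothetical helpers, and the `audio/webm` content type is an assumption about how the browser records audio. Watson's REST API accepts HTTP Basic authentication with the literal username `apikey`.

```javascript
// Sketch: build a Watson Speech to Text request and read the transcript.
// Helper names are hypothetical; run this server-side so the key stays private.
function buildRecognizeRequest(serviceUrl, apiKey, audio, contentType = "audio/webm") {
  const endpoint = `${serviceUrl.replace(/\/+$/, "")}/v1/recognize`;
  return {
    endpoint,
    init: {
      method: "POST",
      headers: {
        // Basic auth with the fixed username "apikey".
        Authorization: `Basic ${btoa(`apikey:${apiKey}`)}`,
        "Content-Type": contentType,
      },
      body: audio,
    },
  };
}

// Join the top alternative of each result into one string.
function extractTranscript(json) {
  return (json.results ?? [])
    .map((r) => (r.alternatives?.[0]?.transcript ?? "").trim())
    .filter(Boolean)
    .join(" ");
}
```

A caller would then run `const { endpoint, init } = buildRecognizeRequest(url, key, audioBlob)` followed by `fetch(endpoint, init)` and pass the parsed JSON to `extractTranscript`.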
- Enhanced research capabilities:
  - Image integration in research documents
  - Improved document formatting
  - Resizable document viewer
  - Improved conversational research flow
- Web search capabilities
- Video generation
- Music generation
- More AI models and personalities
We've made several improvements in the latest version:
- Research Tool Enhancements:
  - Fixed conversation flow with a more reliable state-based approach
  - Added natural chat interaction before starting research
  - Implemented query optimization based on user follow-up responses
  - Fixed source display issues in both sidebar and generated documents
  - Improved error handling and recovery throughout the research process
- Image Features:
  - Added fullscreen lightbox view for generated images
  - Enhanced image download capabilities with direct download buttons
  - Simplified the UI by removing redundant settings
  - Added automatic title generation for all images
  - Improved button alignment and styling for better user experience
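To illustrate what a state-based conversation flow for the research tool might look like, here is a minimal state machine sketch. The state and event names are invented for this example and are not taken from the codebase.

```javascript
// Sketch: a transition table for a research conversation.
// States and events are hypothetical names chosen for illustration.
const TRANSITIONS = {
  chatting: { topicConfirmed: "clarifying" },
  clarifying: { followUpAnswered: "researching", skip: "researching" },
  researching: { done: "complete", failed: "error" },
  error: { retry: "researching", reset: "chatting" },
  complete: { reset: "chatting" },
};

function nextState(state, event) {
  const next = TRANSITIONS[state]?.[event];
  // Only string targets from the table are valid transitions.
  if (typeof next !== "string") {
    throw new Error(`Invalid transition: ${state} -> ${event}`);
  }
  return next;
}
```

Because every allowed move is listed in one table, an unexpected event raises an error immediately instead of leaving the conversation in an ambiguous state, which is the usual reliability argument for this pattern.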
See CHANGELOG.md for a complete history of changes.
This project is licensed under the MIT License - see the LICENSE file for details.
- Pollinations AI for their powerful AI API
- IBM Watson for Speech-to-Text capabilities
- Cloudinary for image processing
- Various web scraping services for research capabilities
✨ Made with ❤️ by Pink Pixel