vero-code/gemini-tales-v2

✨ Gemini Tales v2: The Evolution

This is the development repository for Gemini Tales, evolved and updated for the Google for Startups AI Agents Challenge.

Turning screen time into active adventure: a magical AI storyteller that sees, hears, and moves with your child. The story that doesn't move until you do.

Gemini Tales is an interactive fairytale book and a Creative Storyteller ✍️ that breaks the traditional "text box" paradigm. It transforms screen time into an active adventure by creating a seamless experience where AI doesn't just process text: it sees, hears, and moves with your child.


📺 Watch in Action

Gemini Tales - The Complete Story of V2.0: Gemini Tales Complete Story

Gemini Tales - Early Submission: Gemini Tales Demo

Gemini Tales - Newer Demo: Gemini Tales Newer Demo


🎭 Dual Storytelling Modes

Gemini Tales offers two distinct ways to experience the magic:

| Feature | 🎙️ Live Mode (Spontaneous) | 🤖 Agent Mode (Structured) |
| --- | --- | --- |
| Puck's Role | The Improviser: Composes and narrates purely on the fly. | The Narrator: Brings a carefully crafted script to life. |
| Preparation | No wait time. Jump straight into the action. | 30-60s "Story Crafting" context formation. |
| Backend Agents | Idle. | Active Background: Researching and weaving the plot. |
| Technology | Direct Gemini 3.1 Flash Live session. | Orchestrator (3.1 Pro) + Live Narrator. |
| Visual Flow | Interleaved watercolor illustrations. | Themed story-driven scenes. |

Note on Agent Mode: While the background agents (Researcher, Judge, Storysmith) are hard at work forming the perfect context, their direct technical scripts are kept "behind the scenes" to keep the child's interface clean and magical. Look forward to seeing their raw creative process in a future Gemini Tales Premium release!


🤸 Exercise Focus Modes

To ensure children get the exact type of movement they need, Gemini Tales now features Exercise Focus Modes. The AI Narrator adapts its physical challenges dynamically based on your selection:

Exercise Focus Selection

| Focus Mode | Description |
| --- | --- |
| ✨ Sky Magic (Upper Body) | Exercises focused on arms and upper body (like flying, waving wands, reaching for the stars). |
| 🌿 Earth Magic (Lower Body) | Exercises focused on legs and lower body (like stomping, jumping, running, balancing). |
| ☀️ Solar Power (Full Body) | Full-body exercises. Energy and movement everywhere! |
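
One way the focus selection could feed the narrator is a simple lookup from mode to example movements. The sketch below is an illustration only: the `ExerciseFocus` enum, `FOCUS_MOVEMENTS` table, and `focus_instruction` helper are hypothetical names, not identifiers from this repository. The movement lists are copied from the mode descriptions above.

```python
from enum import Enum


class ExerciseFocus(Enum):
    """Hypothetical identifiers for the three focus modes listed above."""
    SKY_MAGIC = "upper_body"
    EARTH_MAGIC = "lower_body"
    SOLAR_POWER = "full_body"


# Example movements come from the mode descriptions; the structure is assumed.
FOCUS_MOVEMENTS = {
    ExerciseFocus.SKY_MAGIC: ["flying", "waving wands", "reaching for the stars"],
    ExerciseFocus.EARTH_MAGIC: ["stomping", "jumping", "running", "balancing"],
}
# Solar Power covers the whole body, so it draws on both lists.
FOCUS_MOVEMENTS[ExerciseFocus.SOLAR_POWER] = (
    FOCUS_MOVEMENTS[ExerciseFocus.SKY_MAGIC]
    + FOCUS_MOVEMENTS[ExerciseFocus.EARTH_MAGIC]
)


def focus_instruction(focus: ExerciseFocus) -> str:
    """Build a one-line hint that could be appended to a narrator prompt."""
    moves = ", ".join(FOCUS_MOVEMENTS[focus])
    return f"Prefer {focus.value.replace('_', ' ')} challenges such as: {moves}."


print(focus_instruction(ExerciseFocus.EARTH_MAGIC))
```

Keeping the mapping in one table means a new focus mode would only need one extra entry rather than changes scattered through the prompt logic.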

⚡ Movement Impact Dashboard

Gemini Tales v2 now features a real-time Heroic Energy Tracker (Phase 1 Movement Metrics). This system bridges the gap between digital play and physical health by quantifying the child's activity:

  • Visual Verification: Puck uses the camera to verify exercises.
  • Heroic Energy Meter: A 0-100% progress bar that "charges up" with every jumping jack, bunny hop, or arm wave.
  • Interactive "Glow": The screen flashes with golden energy and shows +15 Energy! whenever a movement is successfully recorded.
  • Educator Adoption Ready: These metrics provide the raw data needed to prove the "screen time → move time" transformation.
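
The meter behaviour described above can be sketched as a clamped counter. This is a minimal illustration, not the project's implementation: the `EnergyMeter` class and its `record_movement` method are hypothetical names, and the step of 15 is assumed from the "+15 Energy!" message.

```python
from dataclasses import dataclass

ENERGY_PER_MOVEMENT = 15  # assumed from the "+15 Energy!" flash
MAX_ENERGY = 100          # the meter is described as a 0-100% bar


@dataclass
class EnergyMeter:
    """Hypothetical model of the Heroic Energy Meter (not the project's class)."""
    energy: int = 0

    def record_movement(self, verified: bool) -> int:
        """Add energy only when the camera check confirmed the exercise."""
        if verified:
            self.energy = min(MAX_ENERGY, self.energy + ENERGY_PER_MOVEMENT)
        return self.energy

    @property
    def is_full(self) -> bool:
        """True once the bar has reached 100%."""
        return self.energy >= MAX_ENERGY


meter = EnergyMeter()
for verified in [True, True, False, True]:
    meter.record_movement(verified)
print(f"Energy: {meter.energy}%")  # three verified movements at +15 each
```

Clamping with `min` keeps the bar at 100% even if more movements are recorded after it fills.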

🏗️ Project Structure

  • frontend/: React 19 application using Tailwind CSS and Gemini Live SDK.
  • backend/app/: Main Agent (Puck), the Live Narrator that handles real-time voice, vision, and storytelling.
  • backend/agents/: Supporting Agents, the background microservices for Researcher, Judge, and Storysmith.
    • backend/agents/run_local.ps1: Local orchestrator to wake up all supporting agents.

🏛️ High-Level Architecture

graph TD
    User([User]) <--> Browser["Browser (Magic Mirror UI)"]
    
    subgraph "Main Interaction Agent (Puck)"
        Browser <-->|WebSocket| Puck["Puck (Live Narrator)"]
        subgraph "Puck's Toolbox"
            Illustrator["Illustration Engine"]
            Awards["Achievement Manager"]
        end
        Puck <--> Illustrator
        Puck <--> Awards
    end
    
    subgraph "Google AI Infrastructure"
        Puck <-->|Multimodal Flow| GeminiLive["Gemini Live (via MODEL_ID)"]
        Illustrator -->|Video Generation| Veo[Veo 3.1]
        Illustrator -->|Dynamic Rendering| FlashImage["Gemini 3.1 Flash-Image (Nano Banana 2)"]
        Illustrator -->|Audio Composition| Lyria3[Lyria 3]
    end
    
    subgraph "Supporting Brain (Agent Mode)"
        Puck -->|Request Pipeline| Orchestrator[Orchestrator]
        Orchestrator <-->|A2A| Researcher[Researcher]
        Orchestrator <-->|A2A| Judge[Judge]
        Orchestrator <-->|A2A| Storysmith[Storysmith]
    end
    
    style Browser fill:#f9f,stroke:#333,stroke-width:2px
    style Puck fill:#f9f,stroke:#333,stroke-width:2px
    style Orchestrator fill:#ccf,stroke:#333,stroke-width:2px
    style Illustrator fill:#fff4dd,stroke:#d4a017,stroke-width:2px

📖 Deep Dive: For a detailed look at system design, data flows, and design decisions, see the Full Architecture Documentation.

ADK Trace Example

When using Agent Mode, the Google Agent Development Kit provides detailed tracing of all agent invocations, timing, and dependencies:

ADK Web Trace - Agent Pipeline Visualization

This trace shows the complete flow: Puck → Orchestrator → Researcher → Judge → Storysmith → Content Builder, with detailed timing for each step.
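
To make the idea of a timed, ordered pipeline concrete, here is a plain-Python sketch that runs placeholder stages in sequence and records how long each took. It does not use the ADK or reproduce its trace format. The stage functions, the `run_pipeline` helper, and the context keys are all hypothetical, and the Content Builder step is omitted for brevity.

```python
import time
from typing import Callable

# A stage takes the shared context and returns an updated copy.
Stage = Callable[[dict], dict]


def researcher(ctx: dict) -> dict:
    """Placeholder: gather background notes for the topic."""
    return {**ctx, "facts": f"notes about {ctx['topic']}"}


def judge(ctx: dict) -> dict:
    """Placeholder: approve the research if any facts were produced."""
    return {**ctx, "approved": bool(ctx.get("facts"))}


def storysmith(ctx: dict) -> dict:
    """Placeholder: turn the approved context into a script."""
    return {**ctx, "script": f"Once upon a time in {ctx['topic']}..."}


def run_pipeline(
    topic: str, stages: list[tuple[str, Stage]]
) -> tuple[dict, list[tuple[str, float]]]:
    """Run stages in order and record each stage's duration in seconds."""
    ctx: dict = {"topic": topic}
    timings: list[tuple[str, float]] = []
    for name, stage in stages:
        start = time.perf_counter()
        ctx = stage(ctx)
        timings.append((name, time.perf_counter() - start))
    return ctx, timings


result, trace = run_pipeline(
    "the Amazon rainforest",
    [("Researcher", researcher), ("Judge", judge), ("Storysmith", storysmith)],
)
for name, seconds in trace:
    print(f"{name}: {seconds * 1000:.3f} ms")
```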


🧚 Magic Behind the Scenes

  • 🌿 Cinematic Animation: Puck comes to life with Veo 3.1, generating magical video previews that make the world feel alive.
  • ✌️ Seamless Interaction: Real-time visual recognition and context-aware response: Stories only begin when Puck sees the "Magic Sign" (two fingers up).
  • 📸 Portrait Transformation: Upload a photo to see the child transformed into a watercolor fairytale character.
  • 🎨 Visual Consistency: High-quality scene illustrations are automatically generated by Gemini 3.1 Flash-Image (Nano Banana 2) as the story unfolds, creating a world that is always aware of the narrative state.
  • 🎵 Adaptive Music (Lyria 3): Every scene in the story now has its own unique, high-fidelity 48kHz stereo background music generated by Lyria 3. The AI "looks" at the scene illustration and composes a 30-second soundtrack that matches the visual atmosphere.
  • ⭐ Achievements & Badges: Complete physical challenges (like "Hop like a bunny") to unlock magical badges and track your hero's journey in real-time.

🍌 Nano Banana 2 Enhancements

Upgraded to Gemini 3.1 Flash-Image (Nano Banana 2) with two powerful improvements:

1. 🎨 High-Fidelity Photo Transformation

Transform any photo into a magical fairytale character while maintaining perfect recognizability.

  • Smart Detail Preservation: Automatically preserves facial structure, eye color, distinctive features, hair color/style, and natural expression.
  • Quality Guarantee: Person remains immediately recognizable while being stunningly illustrated.
  • How It Works: Upload a photo → Gemini transforms it into a watercolor fairytale portrait with magical elements while keeping the person's key features intact.
  • Technical: Multi-turn chat preserves context for consistent character refinement.

2. 🔍 Google Search Grounding for Scenes

Scene illustrations now use real-world information for enhanced accuracy and realism.

  • Accurate Locations: Mention specific places (e.g., "the Amazon rainforest") and Gemini automatically grounds the visual details in real geography.
  • Rich Details: Accurate plants, animals, landmarks, and environmental features based on actual locations.
  • Magical Realism: Maintains whimsical fairytale style while depicting real-world accuracy.
  • Technical: Google Search tool integrated into scene generation chat session for live data fetching.

Configuration

Model names are now externalized to environment variables for easy customization:

VITE_MODEL_ID_IMAGE=gemini-3.1-flash-image
LYRIA_MODEL_ID=lyria-3-clip-preview
VIDEO_MODEL_ID=veo-3.1-generate-preview

Separate chat sessions optimize image quality:

  • Avatar Session (1:1, 1K resolution): Character portraits and poses.
  • Scene Session (16:9, 2K resolution, with Google Search): Story illustrations.
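
As a rough illustration of this configuration, the sketch below reads a model ID from the environment with a fallback and models the two image sessions as small records. Treating the listed values as fallbacks is an assumption, and `model_id`, `ImageSession`, `AVATAR_SESSION`, and `SCENE_SESSION` are hypothetical names. Variables with the `VITE_` prefix are normally read by the frontend build rather than by Python, so the image entry is included only to show the fallback pattern.

```python
import os
from dataclasses import dataclass

# Values copied from the configuration above; using them as fallbacks is an assumption.
MODEL_DEFAULTS = {
    "VITE_MODEL_ID_IMAGE": "gemini-3.1-flash-image",
    "LYRIA_MODEL_ID": "lyria-3-clip-preview",
    "VIDEO_MODEL_ID": "veo-3.1-generate-preview",
}


def model_id(var_name: str) -> str:
    """Return the configured model ID, falling back to the listed value."""
    return os.environ.get(var_name, MODEL_DEFAULTS[var_name])


@dataclass(frozen=True)
class ImageSession:
    """Hypothetical container for per-session image settings."""
    name: str
    aspect_ratio: str
    resolution: str
    use_google_search: bool = False


# Settings taken from the two session descriptions above.
AVATAR_SESSION = ImageSession("avatar", "1:1", "1K")
SCENE_SESSION = ImageSession("scene", "16:9", "2K", use_google_search=True)

print(model_id("LYRIA_MODEL_ID"), SCENE_SESSION)
```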

🕹️ Interactive Control Center

The "Magic Mirror" cockpit provides full transparency and control over the AI experience:

  • 🔌 API Lifecycle Management: One-click connection to the Gemini Pulse with real-time status tracking.
  • 🎙️ Multimodal Inputs: Live switching between different microphones and cameras to find the best angle for the Magic Sign.
  • 💬 Conversation Hub: A dual-stream chat system that combines real-time AI transcriptions with manual text input.
  • 🔍 Debug Console: A dedicated "Technician's View" showing the architectural heartbeat, device sync events, and ADK protocol logs.

🛠️ Tech Stack

  • Frontend: React 19, TypeScript, Tailwind CSS.
  • AI Models:
    • Gemini 2.5 / 3.1 Flash Live (Real-time Audio/Vision - via MODEL_ID in .env)
    • Gemini 3.1 Pro & Flash-Lite (Preview) (Agentic Reasoning - via MODEL_NAME / MODEL_NAME_FLASH)
    • Gemini 3.1 Flash-Image (Nano Banana 2) (High-fidelity avatars & scenes - via VITE_MODEL_ID_IMAGE)
    • Lyria 3 (Adaptive 48kHz stereo background music - via LYRIA_MODEL_ID)
    • Veo 3.1 (Cinematic Video Generation - via VIDEO_MODEL_ID)
  • Backend: FastAPI, Google ADK, WebSockets.
  • Hosting: Google Cloud Run, Cloud Artifact Registry.
  • Image Processing: PIL (Python Imaging Library) for avatar manipulation.
  • Google Cloud Services: Vertex AI, Google AI API, Generative AI Client Library.

🚀 Getting Started

1. Prerequisites

  • Python 3.10+ and uv installed.
  • Node.js 18+ and npm.
  • Google Cloud Project with Gemini API access.

2. Launch Instructions (Local Development)

To run the full experience locally, start these components in separate terminals:

A. Start ADK Agents (The Brain)

cd backend/agents
.\run_local.ps1

This starts the sub-agents on ports 8001-8004 required for Agent Mode.
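
To confirm the agents came up before switching to Agent Mode, a quick TCP check of those ports can help. This is a minimal standard-library sketch, not a script shipped with the repository. It assumes the agents listen on localhost, and `port_is_open` and `check_agents` are hypothetical helper names.

```python
import socket

AGENT_PORTS = range(8001, 8005)  # ports 8001-8004, as stated above


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def check_agents() -> dict[int, bool]:
    """Map each expected agent port to whether it is reachable."""
    return {port: port_is_open(port) for port in AGENT_PORTS}


if __name__ == "__main__":
    for port, is_up in check_agents().items():
        print(f"port {port}: {'listening' if is_up else 'not reachable'}")
```

An open port only shows that a process is listening; it does not prove the agent behind it is healthy.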

B. Start Main Agent (Puck)

cd backend
uv sync
uv run python app/main.py

Starts Puck, the Live Narrator, ready to see and hear you (Port 8000).

C. Start Frontend UI

cd frontend
npm install
npm run dev

3. Cloud Deployment (Google Cloud Run)

To ensure this project is fully reproducible, I've included automation scripts that handle the complex deployment of my multi-agent architecture to Google Cloud Run.

A. Prerequisites for Cloud

  • Google Cloud CLI installed and authenticated (gcloud auth login).
  • An active Google Cloud Project with Billing enabled.
  • Your .env file in backend/app/ should contain your GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION.
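
Before running the deployment scripts, it can be useful to verify that the .env file actually defines both variables. The sketch below is a simple pre-flight check under stated assumptions: it handles only plain KEY=VALUE lines, and `parse_env` and `missing_keys` are hypothetical helpers, not part of the deployment scripts.

```python
from pathlib import Path

REQUIRED_KEYS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_path: Path) -> list[str]:
    """Return required keys that are absent or empty in the given .env file."""
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    values = parse_env(env_path.read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    missing = missing_keys(Path("backend/app/.env"))
    print("OK" if not missing else f"Missing in .env: {', '.join(missing)}")
```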

B. Deploy Supporting Agents (The Brain)

These agents (Researcher, Judge, Storysmith, Orchestrator) provide the agentic reasoning for the story mode.

cd backend/agents
.\deploy.ps1

This script automatically bundles shared logic, configures security, and deploys 4 microservices to Cloud Run.

C. Deploy Main App (Puck + Frontend)

This deploys the central "Magic Mirror" interface and the Live Narrator.

# Run from the repository root
.\deploy_app.ps1

This script handles the dual-stage build: compiling the React 19 frontend and wrapping it with the FastAPI/Puck bridge into a single production-ready container.

💡 Pro-Tip: After deployment, you can manage all AI parameters (Model IDs, API Keys) directly through the Cloud Run environment variables without needing to re-deploy.


🧪 Testing & Interactive Guide

Follow these steps to ensure a magical and stable session:

1. The Character Workshop & Setup

Our redesigned onboarding experience takes the user through a structured Character Workshop to customize their magical companion:

  1. Select Puck's Style: Choose one of 4 unique fairytale styles:
    • 🧚 Elf: Magic Woodland Elf
    • 🪄 Wizard: Young Sorcerer
    • 👑 Royal: Prince / Princess
    • 🦊 Critter: Fox / Woodland Animal
  2. Upload Photo: Upload a photo (or type a custom description) to transform the child's likeness into a watercolor fairytale character using Gemini 3.1 Flash-Image (Nano Banana 2). The smart detail preservation ensures the child remains instantly recognizable.
  3. Cinematic Animation: Click the Animate button to see Puck come to life with a 4-second magical video preview generated by Veo 3.1.
  4. Compose Theme Music: Select one of 4 musical genres to compose Puck's background theme music using Lyria 3:
    • 🌲 Forest: Whimsical woodland melodies
    • 🔮 Sorcerer: Mystical celestial chords
    • 🏰 Harp: Elegant royal palace harp music
    • 🥁 March: Brave heroic adventure march
  5. Choose Story Mode: Select one of the two distinct ways to experience the magic:
    • 🎙️ Live Mode (Spontaneous): Immediate unscripted live session with Puck. Puck acts as the Improviser, dynamically creating the narrative on the fly in response to the child's comments, surroundings, and actions.
    • 🤖 Agent Mode (Structured): Pre-generates a detailed, safe narrative using our background ADK agents (Researcher, Judge, Storysmith). Once the script is ready, the button changes to "Wake Puck!" to start the session, where Puck follows the structured blueprint exactly.
  6. Choose Exercise Focus: Adapt the physical challenges and the narration to your goals:
    • ✨ Sky Magic (Upper Body): Exercises focused on arms and upper body (like flying, waving wands, reaching for the stars).
    • 🌿 Earth Magic (Lower Body): Exercises focused on legs and lower body (like stomping, jumping, running, balancing).
    • ☀️ Solar Power (Full Body): High-energy, full-body exercises.

2. Magical Interaction

  • Listen & Act: Follow the spoken instructions from Puck.
  • Watch the Magic: As you journey through the tale, Puck will automatically generate beautiful illustrations for every new scene using Google Search grounding. Real locations and elements are accurately depicted while maintaining a whimsical fairytale style.
  • Earn Badges: Successfully complete physical tasks to unlock Achievements and see your badge collection grow!
  • Voice Response: Select your Microphone and click Start Audio to talk to the mirror.
  • Vision Recognition: Select your Camera and click Start Video to show your face and the "Magic Sign".
  • Debug Console: Check the Debug Console to see the WebSocket status and any error messages.

3. Troubleshooting

  • No Sound or Image: Ensure the correct device is selected in the Microphone and Camera dropdowns before clicking "Start".
  • Connection: Check the Debug Console to verify that the WebSocket status is "Connected".

👩‍🎓 Hackathon & Build Story

The original prototype was created for the Gemini Live Agent Challenge.

This repository contains the evolved v2 version, developed further for the Google for Startups AI Agents Challenge.

Check out the full story of how I built an "AI Nanny" to fight the sedentary lifestyle using Google AI and Google Cloud:
👉 Read the Build Story on Dev.to


📜 License

MIT. See LICENSE.

Created with ❤️ for the next generation of explorers by Veronika Kashtanova
