Commit 58c6efb

Merge pull request #7 from google-gemini/add-gemini-2.5
Add Image creation, editing and composing with Gemini 2.5 Flash Image
2 parents: 972162e + 0b55042; commit 58c6efb

19 files changed

Lines changed: 1893 additions & 245 deletions

README.md

Lines changed: 70 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -1,20 +1,46 @@
1-
# Veo 3 Gemini API Quickstart
1+
# Gemini API Veo 3 & Nano Banana Quickstart
2+
3+
A Next.js quickstart for creating and editing images and videos with Google's latest Gemini API models, including [Veo 3](https://ai.google.dev/gemini-api/docs/video), [Imagen 4](https://ai.google.dev/gemini-api/docs/imagen), and [Gemini 2.5 Flash Image (aka nano banana)](https://ai.google.dev/gemini-api/docs/image-generations).
4+
5+
<table>
6+
<tr>
7+
<td align="center">
8+
<img src="./public/compose.png" alt="Compose" width="300"/>
9+
<br/>
10+
<strong>Compose</strong>
11+
</td>
12+
<td align="center">
13+
<img src="./public/edit.png" alt="Edit" width="300"/>
14+
<br/>
15+
<strong>Edit</strong>
16+
</td>
17+
<td align="center">
18+
<img src="./public/video.png" alt="Video" width="300"/>
19+
<br/>
20+
<strong>Video</strong>
21+
</td>
22+
</tr>
23+
</table>
24+
25+
> [!NOTE]
26+
> If you want a full studio, consider [Google's Flow](https://labs.google/fx/tools/flow) (a professional environment for Veo/Imagen). Use this repo as a lightweight studio to learn how to build your own UI that generates content with Google's AI models via the Gemini API.
227
3-
[Veo 3](https://ai.google.dev/gemini-api/docs/video) is Google's state-of-the-art video generation model available in the Gemini API. This repository is a quickstart that demonstrates how to build a simple UI to generate videos with Veo 3, play them, and download the results. It also includes an image + text to video generation using the [Imagen 4](https://ai.google.dev/gemini-api/docs/imagen) model.
28+
(This is not an official Google product.)
429

5-
![Example](./public/example.png)
30+
## Features
631

7-
> [!NOTE]
8-
> If you want a full studio, consider [Google's Flow](https://labs.google/fx/tools/flow) (a professional environment for Veo/Imagen). Use this repo as a lightweight quickstart to learn how to build your own UI that generates videos with Veo 3 via the Gemini API.
32+
The quickstart provides a unified composer UI with different modes for content creation:
933

10-
(This is not an official Google product.)
34+
- **Create Image**: Generate images from text prompts using **Imagen 4** or **Gemini 2.5 Flash Image**.
35+
- **Edit Image**: Edit an image based on a text prompt using **Gemini 2.5 Flash Image**.
36+
- **Compose Image**: Combine multiple images with a text prompt to create a new image using **Gemini 2.5 Flash Image**.
37+
- **Create Video**: Generate videos from text prompts or an initial image using **Veo 3**.
1138

12-
## Features
39+
### Quick Actions & UI Features
40+
- Seamless navigation between modes after generating content
41+
- Download generated images & videos
42+
- Cut videos directly in the browser to specific time ranges
1343

14-
- Generate videos from text prompts using the Veo-3 model.
15-
- Generate videos from images + text prompts using the Imagen 4.0 model or upload a starting image.
16-
- Play and download generated videos.
17-
- Cut videos directly in the browser to a specific time range.
1844

1945
## Getting Started: Development and Local Testing
2046

@@ -26,7 +52,7 @@ Follow these steps to get the application running locally for development and te
2652
- **`GEMINI_API_KEY`**: The application requires a [Gemini API key](https://aistudio.google.com/app/apikey). Either create a `.env` file in the project root containing `GEMINI_API_KEY="YOUR_API_KEY"`, or set the environment variable in your system.
2753

2854
> [!WARNING]
29-
> Google Veo 3 and Imagen 4 are both part of the Gemini API Paid tier. You will need to be on the paid tier to use these models.
55+
> Google Veo 3, Imagen 4, and Gemini 2.5 Flash Image are part of the Gemini API Paid tier. You will need to be on the paid tier to use these models.
3056
3157
**2. Install Dependencies:**
3258

@@ -46,11 +72,22 @@ Open your browser and navigate to `http://localhost:3000` to see the application
4672

4773
The project is a standard Next.js application with the following key directories:
4874

49-
- `app/`: Contains the main application logic, including the user interface and API routes.
50-
- `api/`: API routes for generating videos and images, and checking operation status.
51-
- `components/`: Reusable React components used throughout the application.
52-
- `lib/`: Utility functions and schema definitions.
53-
- `public/`: Static assets.
75+
- `app/`: Contains the main application logic and pages
76+
- `page.tsx`: Main page with the unified composer UI.
77+
- `api/`: API routes for different operations
78+
- `imagen/generate/`: Image generation with Imagen 4
79+
- `gemini/generate/`: Image generation with Gemini 2.5 Flash Image
80+
- `gemini/edit/`: Image editing/composition with Gemini 2.5 Flash Image
81+
- `veo/generate/`: Video generation operations
82+
- `veo/operation/`: Check video generation status
83+
- `veo/download/`: Download generated videos
84+
- `components/`: Reusable React components
85+
- `ui/Composer.tsx`: The main unified composer for all interactions.
86+
- `ui/VideoPlayer.tsx`: Video player with trimming
87+
- `ui/ModelSelector.tsx`: Model selection component
88+
- `ui/dropzone.tsx`: Drag-and-drop component for file uploads.
89+
- `lib/`: Utility functions and schema definitions
90+
- `public/`: Static assets
5491

5592
## Official Docs and Resources
5693

@@ -62,17 +99,25 @@ The project is a standard Next.js application with the following key directories
6299

63100
The application uses the following API routes to interact with the Google models:
64101

65-
- `app/api/veo/generate/route.ts`: Handles video generation requests. It takes a text prompt as input and initiates a video generation operation with the Veo-3 model.
66-
- `app/api/veo/operation/route.ts`: Checks the status of a video generation operation.
67-
- `app/api/veo/download/route.ts`: Downloads the generated video.
68-
- `app/api/imagen/generate/route.ts`: Handles image generation requests with the Imagen model.
102+
### Image APIs
103+
- `app/api/imagen/generate/route.ts`: Handles image generation requests with Imagen 4
104+
- `app/api/gemini/generate/route.ts`: Handles image generation requests with Gemini 2.5 Flash Image
105+
- `app/api/gemini/edit/route.ts`: Handles image editing and composition with Gemini 2.5 Flash Image (supports multiple images)
106+
107+
### Video APIs
108+
- `app/api/veo/generate/route.ts`: Handles video generation requests with Veo 3
109+
- `app/api/veo/operation/route.ts`: Checks the status of video generation operations
110+
- `app/api/veo/download/route.ts`: Downloads generated videos
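
As a rough illustration of how a client might call the Gemini image generation route listed above, here is a minimal TypeScript sketch. It assumes the request and response shapes used by `app/api/gemini/generate/route.ts` in this commit (a JSON body with `prompt`, and a JSON response with `image.imageBytes` and `image.mimeType`). The names `generateImage`, `toDataUrl`, and `FetchLike` are hypothetical and are not part of the repository.

```typescript
// Response shape returned by the Gemini image routes in this commit.
interface GeminiImageResponse {
  image?: { imageBytes: string; mimeType: string };
  error?: string;
}

// Minimal fetch-like signature (an assumption for this sketch) so the
// example does not depend on DOM typings; the browser's fetch fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ json(): Promise<unknown> }>;

// Turn the route's base64 payload into a data URL usable as an <img> src.
function toDataUrl(image: { imageBytes: string; mimeType: string }): string {
  return `data:${image.mimeType};base64,${image.imageBytes}`;
}

// POST a prompt to the generate route and return a data URL for the result.
async function generateImage(
  prompt: string,
  fetchFn: FetchLike
): Promise<string> {
  const res = await fetchFn("/api/gemini/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const data = (await res.json()) as GeminiImageResponse;
  if (!data.image) {
    // The route answers with { error } when no image could be produced.
    throw new Error(data.error ?? "No image returned");
  }
  return toDataUrl(data.image);
}
```

In a browser component, passing the global `fetch` as `fetchFn` should be enough; injecting it also makes the helper easy to exercise with a stub.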
69111

70112
## Technologies Used
71113

72-
- [Next.js](https://nextjs.org/) - React framework for building the user interface.
73-
- [React](https://reactjs.org/) - JavaScript library for building user interfaces.
74-
- [Tailwind CSS](https://tailwindcss.com/) - For styling.
75-
- [Gemini API](https://ai.google.dev/gemini-api/docs) with Veo 3 - For video generation; Imagen - For image generation.
114+
- [Next.js](https://nextjs.org/) - React framework for building the user interface
115+
- [React](https://reactjs.org/) - JavaScript library for building user interfaces
116+
- [Tailwind CSS](https://tailwindcss.com/) - For styling
117+
- [Gemini API](https://ai.google.dev/gemini-api/docs) with:
118+
- **Veo 3** - For video generation
119+
- **Imagen 4** - For high-quality image generation
120+
- **Gemini 2.5 Flash Image** - For fast image generation, editing, and composition
76121

77122
## Questions and feature requests
78123

app/api/gemini/edit/route.ts

Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
import { NextResponse } from "next/server";
2+
import { GoogleGenAI } from "@google/genai";
3+
4+
if (!process.env.GEMINI_API_KEY) {
5+
throw new Error("GEMINI_API_KEY environment variable is not set.");
6+
}
7+
8+
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
9+
10+
export async function POST(req: Request) {
11+
try {
12+
const contentType = req.headers.get("content-type") || "";
13+
14+
if (!contentType.includes("multipart/form-data")) {
15+
return NextResponse.json(
16+
{ error: "Expected multipart/form-data" },
17+
{ status: 400 }
18+
);
19+
}
20+
21+
const form = await req.formData();
22+
const prompt = (form.get("prompt") as string) || "";
23+
24+
if (!prompt) {
25+
return NextResponse.json({ error: "Missing prompt" }, { status: 400 });
26+
}
27+
28+
// Handle multiple image files
29+
const imageFiles = form.getAll("imageFiles");
30+
console.log("Received imageFiles from form:", imageFiles.length);
31+
console.log(
32+
"Image file details:",
33+
imageFiles.map((f, i) => ({
34+
index: i,
35+
name: f instanceof File ? f.name : "not-file",
36+
type: f instanceof File ? f.type : typeof f,
37+
}))
38+
);
39+
40+
const contents: (
41+
| { text: string }
42+
| { inlineData: { mimeType: string; data: string } }
43+
)[] = [];
44+
45+
// Add the prompt as text
46+
contents.push({ text: prompt });
47+
48+
// Process each image file
49+
console.log("Processing image files...");
50+
for (const imageFile of imageFiles) {
51+
if (imageFile && imageFile instanceof File) {
52+
console.log(
53+
`Processing file: ${imageFile.name}, size: ${imageFile.size}, type: ${imageFile.type}`
54+
);
55+
const buf = await imageFile.arrayBuffer();
56+
const b64 = Buffer.from(buf).toString("base64");
57+
contents.push({
58+
inlineData: {
59+
mimeType: imageFile.type || "image/png",
60+
data: b64,
61+
},
62+
});
63+
}
64+
}
65+
console.log("Total contents after processing:", contents.length);
66+
67+
// Handle single image (backward compatibility)
68+
const singleImageFile = form.get("imageFile");
69+
if (
70+
singleImageFile &&
71+
singleImageFile instanceof File &&
72+
contents.length === 1
73+
) {
74+
const buf = await singleImageFile.arrayBuffer();
75+
const b64 = Buffer.from(buf).toString("base64");
76+
contents.push({
77+
inlineData: {
78+
mimeType: singleImageFile.type || "image/png",
79+
data: b64,
80+
},
81+
});
82+
}
83+
84+
// Handle base64 image (for generated images)
85+
const imageBase64 = (form.get("imageBase64") as string) || undefined;
86+
const imageMimeType = (form.get("imageMimeType") as string) || undefined;
87+
88+
if (imageBase64 && contents.length === 1) {
89+
const cleaned = imageBase64.includes(",")
90+
? imageBase64.split(",")[1]
91+
: imageBase64;
92+
contents.push({
93+
inlineData: {
94+
mimeType: imageMimeType || "image/png",
95+
data: cleaned,
96+
},
97+
});
98+
}
99+
100+
if (contents.length < 2) {
101+
return NextResponse.json(
102+
{ error: "No images provided for editing" },
103+
{ status: 400 }
104+
);
105+
}
106+
107+
const response = await ai.models.generateContent({
108+
model: "gemini-2.5-flash-image-preview",
109+
contents: contents,
110+
});
111+
112+
// Process the response to extract the image
113+
let imageData = null;
114+
let responseMimeType = "image/png";
115+
116+
for (const part of response.candidates?.[0]?.content?.parts ?? []) {
117+
if (part.text) {
118+
console.log("Generated text:", part.text);
119+
} else if (part.inlineData) {
120+
imageData = part.inlineData.data;
121+
responseMimeType = part.inlineData.mimeType || "image/png";
122+
break;
123+
}
124+
}
125+
126+
if (!imageData) {
127+
return NextResponse.json(
128+
{ error: "No image generated" },
129+
{ status: 500 }
130+
);
131+
}
132+
133+
return NextResponse.json({
134+
image: {
135+
imageBytes: imageData,
136+
mimeType: responseMimeType,
137+
},
138+
});
139+
} catch (error) {
140+
console.error("Error editing image with Gemini:", error);
141+
return NextResponse.json(
142+
{ error: "Failed to edit image" },
143+
{ status: 500 }
144+
);
145+
}
146+
}
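
The way this route assembles its `contents` array (the prompt first, then one `inlineData` part per image, with any data URL prefix stripped) can be sketched as a pure helper. This is only an illustration of the logic in the diff above, assuming images have already been read into base64 strings. `InputImage`, `stripDataUrlPrefix`, and `buildEditContents` are hypothetical names that do not appear in the commit.

```typescript
// Same part shapes the route pushes into `contents`.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Hypothetical input type: an image already encoded as base64,
// optionally still carrying a "data:<mime>;base64," prefix.
interface InputImage {
  base64: string;
  mimeType?: string;
}

// Drop a leading data URL header if present; plain base64 passes through.
function stripDataUrlPrefix(value: string): string {
  const comma = value.indexOf(",");
  return comma === -1 ? value : value.slice(comma + 1);
}

// Build the parts list: prompt text first, then each image as inline data.
function buildEditContents(prompt: string, images: InputImage[]): Part[] {
  const parts: Part[] = [{ text: prompt }];
  for (const image of images) {
    parts.push({
      inlineData: {
        // Fall back to PNG when the MIME type is unknown, as the route does.
        mimeType: image.mimeType || "image/png",
        data: stripDataUrlPrefix(image.base64),
      },
    });
  }
  return parts;
}
```

A result with fewer than two parts means no image was supplied, which corresponds to the route's `400` response.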

app/api/gemini/generate/route.ts

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
import { NextResponse } from "next/server";
2+
import { GoogleGenAI } from "@google/genai";
3+
4+
if (!process.env.GEMINI_API_KEY) {
5+
throw new Error("GEMINI_API_KEY environment variable is not set.");
6+
}
7+
8+
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
9+
10+
export async function POST(req: Request) {
11+
try {
12+
const body = await req.json();
13+
const prompt = (body?.prompt as string) || "";
14+
15+
if (!prompt) {
16+
return NextResponse.json({ error: "Missing prompt" }, { status: 400 });
17+
}
18+
19+
const response = await ai.models.generateContent({
20+
model: "gemini-2.5-flash-image-preview",
21+
contents: prompt,
22+
});
23+
24+
// Process the response to extract the image
25+
let imageData = null;
26+
let imageMimeType = "image/png";
27+
28+
for (const part of response.candidates?.[0]?.content?.parts ?? []) {
29+
if (part.text) {
30+
console.log("Generated text:", part.text);
31+
} else if (part.inlineData) {
32+
imageData = part.inlineData.data;
33+
imageMimeType = part.inlineData.mimeType || "image/png";
34+
break;
35+
}
36+
}
37+
38+
if (!imageData) {
39+
return NextResponse.json({ error: "No image generated" }, { status: 500 });
40+
}
41+
42+
return NextResponse.json({
43+
image: {
44+
imageBytes: imageData,
45+
mimeType: imageMimeType,
46+
},
47+
});
48+
} catch (error) {
49+
console.error("Error generating image with Gemini:", error);
50+
return NextResponse.json(
51+
{ error: "Failed to generate image" },
52+
{ status: 500 }
53+
);
54+
}
55+
}
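
Both Gemini routes in this commit scan the response parts for the first `inlineData` entry. A defensive variant of that loop, sketched below, returns `null` instead of throwing when the response has no candidates or parts. The interfaces here are simplified stand-ins written for this example, not the SDK's own types, and `extractFirstImage` is a hypothetical helper name.

```typescript
// Simplified stand-ins for the response shape the routes read from.
interface ResponsePart {
  text?: string;
  inlineData?: { data?: string; mimeType?: string };
}

interface GenerateResponseLike {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

// Return the first inline image in the first candidate, or null if none.
function extractFirstImage(
  response: GenerateResponseLike
): { imageBytes: string; mimeType: string } | null {
  // Optional chaining covers empty or missing candidates and parts.
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    if (part.inlineData?.data) {
      return {
        imageBytes: part.inlineData.data,
        mimeType: part.inlineData.mimeType || "image/png",
      };
    }
  }
  return null;
}
```

With a helper like this, a response without an image would reach the route's "No image generated" branch rather than the generic `catch` handler.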

app/api/veo/generate/route.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ export async function POST(req: Request) {
2121
const form = await req.formData();
2222

2323
const prompt = (form.get("prompt") as string) || "";
24-
const model = (form.get("model") as string) || "veo-3.0-generate-preview";
24+
const model = (form.get("model") as string) || "veo-3.0-generate-001";
2525
const negativePrompt = (form.get("negativePrompt") as string) || undefined;
2626
const aspectRatio = (form.get("aspectRatio") as string) || undefined;
2727

app/globals.css

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -129,6 +129,12 @@
129129
}
130130
}
131131

132+
@keyframes shimmer {
133+
100% {
134+
transform: translateX(100%);
135+
}
136+
}
137+
132138

133139
body {
134140
color: var(--foreground);

app/layout.tsx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ const sourceCodePro = Source_Code_Pro({
1717
});
1818

1919
export const metadata: Metadata = {
20-
title: "Veo 3 Studio",
20+
title: "Gemini API Studio",
2121
description: "A quickstart for the Gemini API with Veo 3",
2222
icons: {
2323
icon: "/imgs/gemini_icon.svg",
