daytonaio
diff --git a/authors/assets/haohui-xie.svg b/authors/assets/haohui-xie.svg
Lines changed: 15 additions & 0 deletions
diff --git a/authors/haohui_xie.md b/authors/haohui_xie.md
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Author: Haohui Xie Title: AI Engineering Contributor
+Description: Haohui Xie works on AI engineering, speech technology, and agent workflows. He enjoys turning rough experiments into reproducible developer guides with clear setup steps, validation checks, and practical troubleshooting notes for teams building with modern AI tools.
+Author Image: ![haohui-xie](./assets/haohui-xie.svg) Author LinkedIn: Author Twitter: Company Name: Independent Contributor Company Description: Independent AI engineering and open-source contributor. Company Logo Dark: Company Logo White:
diff --git a/definitions/20260521_definition_gemini_audio_transcription.md b/definitions/20260521_definition_gemini_audio_transcription.md
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+---
+title: 'Gemini Audio Transcription'
+description: 'Using Google Gemini multimodal models to convert speech in audio or video recordings into text transcripts.'
+date: 2026-05-21
+author: 'Haohui Xie'
+---
+
+# Gemini Audio Transcription
+
+## Definition
+
+Gemini audio transcription is the use of Google's Gemini multimodal models to convert spoken audio into text. Instead of sending a file to a dedicated Whisper-compatible endpoint, a developer can send audio and text instructions together to Gemini's `generateContent` API.
+The model then returns transcript text that can be reviewed or passed to downstream tools.
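This is a minimal sketch of the request described above. It uses only the Python standard library against the public Gemini REST endpoint and sends the audio as base64 `inline_data` in the same `parts` list as the text instruction. The default model name `gemini-2.5-flash` and the helper names `build_transcription_request` and `transcribe` are illustrative assumptions. Check Google's API reference for current model names and limits.

```python
import base64
import json
import urllib.request

# Public Gemini REST endpoint; the model name filled in below is an assumption.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_transcription_request(audio: bytes, mime_type: str, instruction: str,
                                model: str = "gemini-2.5-flash") -> tuple[str, dict]:
    """Return the URL and JSON body for one transcription call.

    The text instruction and the base64-encoded audio travel together
    as two parts of a single generateContent request.
    """
    body = {
        "contents": [{
            "parts": [
                {"text": instruction},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio).decode("ascii"),
                }},
            ]
        }]
    }
    return API_URL.format(model=model), body


def transcribe(audio: bytes, mime_type: str, instruction: str, api_key: str) -> str:
    """Send the request (network call) and return the first candidate's text."""
    url, body = build_transcription_request(audio, mime_type, instruction)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Join every text part of the first candidate into one transcript string.
    parts = result["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

Keeping the request builder separate from the network call lets a developer inspect or test the payload offline before spending API quota.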
+
+## Context and Usage
+
+In an engineering workflow, Gemini audio transcription is useful when accurate output depends on context that plain speech recognition lacks. A developer can provide hints such as product names, repository names, acronyms, or the dominant language of the recording.
+The model receives those hints alongside the audio and returns a transcript that can be reviewed, searched, summarized, or turned into follow-up tasks.
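One way to pass those hints is to fold them into the text instruction that accompanies the audio. The sketch below is hypothetical. `build_transcription_prompt` and its wording are one plausible format and are not a prompt that Google prescribes.

```python
from typing import Optional, Sequence


def build_transcription_prompt(terms: Sequence[str], language: Optional[str] = None) -> str:
    """Turn vocabulary hints into a plain-text transcription instruction.

    The wording is an illustrative assumption, not an official prompt.
    """
    lines = ["Transcribe the speech in this recording verbatim."]
    if language:
        lines.append(f"The dominant language is {language}.")
    # Drop blanks and duplicates while preserving order so the prompt stays short.
    unique_terms = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    if unique_terms:
        lines.append("Spell these names and terms exactly as written: "
                     + ", ".join(unique_terms) + ".")
    lines.append("Return only the transcript text.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_transcription_prompt(["gRPC", "Kubernetes", "SDK"], language="English"))
```

The returned string would be sent as the text part next to the audio. Listing exact spellings gives the model a reference for acronyms and product names that it might otherwise transcribe phonetically.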
+
+When using Gemini for transcription, teams should store API keys as secrets, check each recording against the API's file-size limits, and transcribe a short sample before batching many recordings. For privacy-sensitive recordings, confirm that the provider and project settings match the organization's data policy before uploading audio.
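This is a hedged sketch of those checks. The environment variable name `GEMINI_API_KEY`, the 20 MB inline ceiling, and the helpers `load_api_key`, `inline_payload_size`, `check_size`, and `plan_batch` are assumptions for illustration. Confirm the current request-size limit in Google's documentation before relying on it. Because inline audio is base64-encoded, the check compares the encoded size, which is about 4/3 of the raw file size, instead of the raw size.

```python
import os
from pathlib import Path

# Assumption: an inline request is capped at about 20 MB, including the
# base64-encoded audio. Check Google's current limits; larger recordings
# would go through the Files API instead of inline data.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def load_api_key(var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment rather than from source code."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var} is not set")
    return key


def inline_payload_size(raw_bytes: int) -> int:
    """Size after base64 encoding: 4 output bytes per 3 input bytes, padded."""
    return 4 * ((raw_bytes + 2) // 3)


def check_size(path: Path, limit: int = MAX_INLINE_BYTES) -> None:
    """Reject a recording before upload if its encoded form exceeds the limit."""
    encoded = inline_payload_size(path.stat().st_size)
    if encoded > limit:
        raise ValueError(f"{path.name}: {encoded} encoded bytes exceeds {limit}")


def plan_batch(paths: list[Path], sample_size: int = 1) -> tuple[list[Path], list[Path]]:
    """Validate every file up front, then split off a short sample to run first."""
    for path in paths:
        check_size(path)
    return paths[:sample_size], paths[sample_size:]
```

A batch runner would transcribe the sample list first, review the output, and only then process the remaining files. Validating every file up front means an oversized recording fails before any quota is spent.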