# Espressif Multimedia Capture

- [Component Registry](https://components.espressif.com/components/espressif/esp_capture)

- [中文版](./README_CN.md)

Espressif Multimedia Capture (**esp_capture**) is a lightweight multimedia capture component developed by Espressif and built on the [ESP-GMF](https://github.com/espressif/esp-gmf/blob/main/README.md) architecture. It has a small memory footprint and a flexible, modular design. The component integrates audio/video encoding, image rotation and scaling, echo cancellation, and text overlay. Typical applications include audio/video recording, input for AI large models, WebRTC, RTMP/RTSP streaming, local storage, and remote monitoring.

## 🔑 Key Features

- 📦 **Low memory overhead** with a modular pipeline structure
- 🎚️ **Tight integration with ESP-GMF** for advanced audio/video processing
- 🎥 **Support for multiple input devices**: V4L2 and DVP cameras, audio codecs
- 🔁 **Parallel streaming and storage** options
- ⚙️ **Automatic source-sink negotiation** for simplified configuration
- ✨ **Customizable processing pipelines** for professional use cases

## ⚙️ Architecture Overview

A capture system connects sources (input devices) to sinks (output targets) through an intermediate processing path.

```mermaid
graph LR
    Capture_Source --> Capture_Path --> Capture_Sink
```

| Component          | Description                                                   |
|--------------------|---------------------------------------------------------------|
| **Capture Source** | Interfaces for physical input devices (camera, mic, etc.)     |
| **Capture Path**   | Processing pipeline (audio/video filters, encoders, overlays) |
| **Capture Sink**   | Output targets (e.g., streaming, storage, muxers)             |
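The source → path → sink flow can be modeled in a few lines of C. The sketch below is a self-contained illustration only: `frame_t`, `source_t`, `path_t`, `sink_t`, `run_once`, and the example stages are hypothetical names, not the real `esp_capture` interfaces, and real frames carry raw or encoded buffers rather than a fixed four-sample array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame type: a tiny fixed-size PCM buffer plus a timestamp. */
typedef struct {
    int16_t  data[4];
    size_t   len;
    uint32_t pts_ms;
} frame_t;

/* Hypothetical stage interfaces (NOT the esp_capture *_if_t structs). */
typedef struct { int  (*read)(void *ctx, frame_t *out); }      source_t;
typedef struct { void (*process)(void *ctx, frame_t *f); }     path_t;
typedef struct { void (*write)(void *ctx, const frame_t *f); } sink_t;

/* Pull one frame from the source, optionally run it through the path,
 * then hand it to the sink. Returns the source's error code on failure. */
static int run_once(const source_t *src, void *src_ctx,
                    const path_t *path, void *path_ctx,
                    const sink_t *sink, void *sink_ctx)
{
    assert(src != NULL && sink != NULL);
    frame_t f = {0};
    int ret = src->read(src_ctx, &f);
    if (ret != 0) {
        return ret;
    }
    if (path != NULL) {
        path->process(path_ctx, &f);
    }
    sink->write(sink_ctx, &f);
    return 0;
}

/* Example source: emits a ramp starting at the frame counter, 10 ms per frame. */
static int counter_read(void *ctx, frame_t *out)
{
    uint32_t *count = ctx;
    out->len = 4;
    for (size_t i = 0; i < out->len; i++) {
        out->data[i] = (int16_t)(*count + i);
    }
    out->pts_ms = *count * 10;
    (*count)++;
    return 0;
}

/* Example path stage: apply a 2x gain to every sample. */
static void gain2_process(void *ctx, frame_t *f)
{
    (void)ctx;
    for (size_t i = 0; i < f->len; i++) {
        f->data[i] = (int16_t)(f->data[i] * 2);
    }
}

/* Example sink: keep a copy of the most recent frame. */
static void keep_last_write(void *ctx, const frame_t *f)
{
    *(frame_t *)ctx = *f;
}
```

Because each stage only sees the others through function pointers, swapping `gain2_process` for another stage (or passing `NULL` to skip processing) changes the path without touching the source or sink.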

### 🧠 AV Synchronization and Muxing

To enable synchronized audio-video muxing, a dedicated sync module aligns timestamps across streams.

```mermaid
graph LR
    capture_audio_src --> capture_audio_path --> capture_audio_sink
    capture_audio_src --> capture_sync
    capture_video_src --> capture_sync
    capture_video_src --> capture_video_path --> capture_video_sink
    capture_audio_sink --> capture_muxer
    capture_video_sink --> capture_muxer
    capture_muxer --> capture_muxer_sink
```

## 🔊 Audio Sources

Audio sources acquire audio data from input devices connected over buses such as I2S or USB.

**Interface**: `esp_capture_audio_src_if_t`

Built-in sources:

- `esp_capture_new_audio_dev_src`: Codec-based audio capture
- `esp_capture_new_audio_aec_src`: Codec-based audio capture with Acoustic Echo Cancellation (AEC)

## 🎥 Video Sources

Video sources capture video data from input devices connected over buses such as SPI, MIPI, or USB.

**Interface**: `esp_capture_video_src_if_t`

Built-in sources:

- `esp_capture_new_video_v4l2_src`: V4L2 camera input (via `esp_video`)
- `esp_capture_new_video_dvp_src`: DVP camera input

## 🕓 Stream Synchronization

The `capture_sync` module aligns audio and video frame timestamps so that the streams stay synchronized for playback or muxing. It is configured automatically by `esp_capture_open`.
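The idea behind timestamp alignment can be sketched as follows. This is not the `capture_sync` implementation; it assumes a hypothetical shared clock (`av_sync_t`) in which the first frame seen on either stream defines time zero, and it also shows how an audio PTS can instead be derived from the number of samples delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared clock for both streams (not the capture_sync API). */
typedef struct {
    bool     started;
    uint64_t start_us; /* capture time of the first frame seen on any stream */
} av_sync_t;

/* Convert an absolute capture time in microseconds into a PTS in
 * milliseconds, relative to the first frame of either stream. */
static uint32_t av_sync_pts_ms(av_sync_t *sync, uint64_t capture_us)
{
    if (!sync->started) {
        sync->started = true;
        sync->start_us = capture_us;
    }
    assert(capture_us >= sync->start_us);
    return (uint32_t)((capture_us - sync->start_us) / 1000);
}

/* Alternative for audio: derive the PTS from the number of samples already
 * delivered, so it does not depend on when each buffer happens to arrive. */
static uint32_t audio_pts_from_samples(uint64_t samples, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return (uint32_t)(samples * 1000 / sample_rate);
}
```

With both streams stamped against the same origin, a muxer can interleave frames by comparing PTS values directly.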

## 🔧 Audio/Video Processing Paths

**Interface**: `esp_capture_path_mngr_if_t`

### 🎚️ Audio Path

Built-in:

- `esp_capture_new_gmf_audio_mngr`: Creates an audio processing path using `ESP-GMF` with elements such as:
  - `aud_rate_cvt` – Sample rate conversion
  - `aud_ch_cvt` – Channel conversion (mono ↔ stereo)
  - `aud_bit_cvt` – Bit depth conversion
  - `aud_enc` – Audio encoder
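For intuition about what a channel converter such as `aud_ch_cvt` does, the sketch below converts interleaved 16-bit PCM between mono and stereo. It is a self-contained illustration with hypothetical function names, not the ESP-GMF element's code; averaging L and R is just one common downmix choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers illustrating channel conversion; not ESP-GMF code. */

/* Mono -> stereo: copy each sample into both the left and right channel.
 * `stereo` must hold 2 * frames samples and must not alias `mono`. */
static void mono_to_stereo_s16(const int16_t *mono, int16_t *stereo, size_t frames)
{
    assert(mono != NULL && stereo != NULL);
    for (size_t i = 0; i < frames; i++) {
        stereo[2 * i]     = mono[i]; /* left */
        stereo[2 * i + 1] = mono[i]; /* right */
    }
}

/* Stereo -> mono: average L and R. Summing in int32_t prevents overflow
 * before the division. */
static void stereo_to_mono_s16(const int16_t *stereo, int16_t *mono, size_t frames)
{
    assert(stereo != NULL && mono != NULL);
    for (size_t i = 0; i < frames; i++) {
        int32_t sum = (int32_t)stereo[2 * i] + (int32_t)stereo[2 * i + 1];
        mono[i] = (int16_t)(sum / 2);
    }
}
```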

**Pipeline Builders** (`esp_capture_pipeline_builder_if_t`):

- `esp_capture_create_auto_audio_pipeline`: Auto-generated audio pipeline based on negotiation
- `esp_capture_create_audio_pipeline`: Prebuilt audio template pipeline

### 🎛️ Video Path

Built-in:

- `esp_capture_new_gmf_video_mngr`: Creates a video processing path using `ESP-GMF` with elements such as:
  - `vid_ppa` – Resize, crop, color conversion
  - `vid_overlay` – Text/graphic overlays
  - `vid_fps_cvt` – Frame rate conversion
  - `vid_enc` – Video encoder
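Frame rate reduction, as performed by an element like `vid_fps_cvt`, can be done by dropping frames evenly. The following is a minimal sketch under that assumption: `fps_cvt_t` and its functions are hypothetical, the sketch counts frames rather than reading timestamps, and it only lowers the rate. An accumulator adds the target rate per input frame and emits a frame each time it reaches the source rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame-dropping rate converter (only lowers the frame rate). */
typedef struct {
    uint32_t src_fps;
    uint32_t dst_fps;
    uint32_t acc; /* accumulated credit; one frame is emitted per src_fps of credit */
} fps_cvt_t;

static void fps_cvt_init(fps_cvt_t *c, uint32_t src_fps, uint32_t dst_fps)
{
    assert(src_fps > 0 && dst_fps > 0 && dst_fps <= src_fps);
    c->src_fps = src_fps;
    c->dst_fps = dst_fps;
    c->acc = 0;
}

/* Call once per input frame; returns true if the frame should be kept.
 * Over any window of src_fps consecutive input frames, exactly dst_fps
 * frames are kept, spread as evenly as integer steps allow. */
static bool fps_cvt_keep(fps_cvt_t *c)
{
    c->acc += c->dst_fps;
    if (c->acc >= c->src_fps) {
        c->acc -= c->src_fps;
        return true;
    }
    return false;
}
```

For 30 fps in and 10 fps out, every third frame is kept; for 30 in and 25 out, five of every six frames pass.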

**Pipeline Builders**:

- `esp_capture_create_auto_video_pipeline`: Auto-generated video pipeline based on negotiation
- `esp_capture_create_video_pipeline`: Prebuilt video template pipeline

## 🎞️ Muxing

Audio and video can be muxed into containers for storage or streaming:

- MP4: File-based only
- TS: Supports both streaming and file-based output

### Data Flow Control for Muxers

The module provides flexible data flow control options for muxers:

- **Muxer-only mode**: The muxer consumes all data, so the raw audio/video streams are not accessible
- **Streaming while storage**: Simultaneous storage and streaming, when the muxer supports it
- **Unified API**: `esp_capture_sink_acquire_frame` serves both muxer output and direct stream access

## 🖋️ Overlays

Overlays mix text or images into the original video frames, for example to add real-time timestamps or statistics to each frame.

**Interface**: `esp_capture_overlay_if_t`

- Built-in: `esp_capture_new_text_overlay`
- Handled automatically if an overlay is present in the video path
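Mixing an overlay into a frame usually comes down to alpha blending. The sketch below blends an 8-bit overlay bitmap into one 8-bit plane of a frame, clipping at the frame edges. It is an illustration only: `overlay_blend_u8` is a hypothetical helper, not part of the `esp_capture_overlay_if_t` API, and a real overlay must handle the frame's actual pixel format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the esp_capture overlay API).
 * Blend overlay bitmap `ovl` (ovl_w x ovl_h, 8 bits per pixel) into one
 * 8-bit plane of a frame at position (x, y), clipping at the frame edges.
 * alpha: 0 = fully transparent, 255 = fully opaque. */
static void overlay_blend_u8(uint8_t *frame, int frame_w, int frame_h,
                             const uint8_t *ovl, int ovl_w, int ovl_h,
                             int x, int y, uint8_t alpha)
{
    assert(frame != NULL && ovl != NULL);
    for (int row = 0; row < ovl_h; row++) {
        int fy = y + row;
        if (fy < 0 || fy >= frame_h) {
            continue; /* row falls outside the frame */
        }
        for (int col = 0; col < ovl_w; col++) {
            int fx = x + col;
            if (fx < 0 || fx >= frame_w) {
                continue; /* column falls outside the frame */
            }
            uint8_t *dst = &frame[(size_t)fy * (size_t)frame_w + (size_t)fx];
            int src = ovl[(size_t)row * (size_t)ovl_w + (size_t)col];
            /* dst = (src * a + dst * (255 - a)) / 255, rounded to nearest */
            *dst = (uint8_t)((src * alpha + *dst * (255 - alpha) + 127) / 255);
        }
    }
}
```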

## ⚡ Auto Capture Mode

Auto capture mode simplifies configuration by automatically connecting sources, paths, and sinks.
The typical call sequence for auto capture is shown below, using audio capture as an example:

```mermaid
sequenceDiagram
    participant App as Application
    participant AudioSrc as Audio Source
    participant Capture as ESP Capture
    participant Sink as Capture Sink

    App->>AudioSrc: esp_capture_new_audio_dev_src(...)
    AudioSrc-->>App: audio_src handle

    App->>Capture: esp_capture_open(&cfg, &capture)
    Note over App,Capture: cfg.audio_src = audio_src

    App->>Capture: esp_capture_sink_setup(capture, 0, &sink_cfg, &sink)

    App->>Sink: esp_capture_sink_enable(sink, ESP_CAPTURE_RUN_MODE_ALWAYS)

    App->>Capture: esp_capture_start(capture)

    loop Frame Processing
        App->>Sink: esp_capture_sink_acquire_frame(sink, &frame, false)
        App->>Sink: esp_capture_sink_release_frame(sink, &frame)
    end

    App->>Capture: esp_capture_stop(capture)
```

For detailed examples, see [audio_capture](examples/audio_capture/README.md) and [video_capture](examples/video_capture/README.md).

## 🧩 Customizing Auto Pipelines

1. Register custom elements:

   ```c
   // Make a custom processing element available to the audio pipeline
   esp_capture_register_element(capture, ESP_CAPTURE_STREAM_TYPE_AUDIO, proc_element);
   ```

2. Customize the pipeline before calling `esp_capture_start`:

   ```c
   // Build this sink's audio pipeline from an explicit element order
   const char *elems[] = { "aud_ch_cvt", "aud_rate_cvt", "aud_enc" };
   esp_capture_sink_build_pipeline(sink, ESP_CAPTURE_STREAM_TYPE_AUDIO, elems, sizeof(elems) / sizeof(elems[0]));
   ```

## 🤝 Auto-Negotiation

### Audio

- Automatically inserts elements such as `aud_rate_cvt` and `aud_ch_cvt` when needed
- Negotiates the format based on encoder requirements
- Configures elements according to the negotiation results

Built-in:

- `esp_capture_audio_pipeline_auto_negotiate` – Auto-negotiates from an audio source to multiple audio sinks
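The on-demand insertion rule for audio can be illustrated with a small planner that compares a source PCM format against the format an encoder requires and lists the converters needed. This is a hypothetical sketch, not `esp_capture_audio_pipeline_auto_negotiate`: `pcm_fmt_t`, `audio_elem_t`, `plan_audio_pipeline`, and `plan_to_names` are invented for illustration, the encoder is assumed to accept exactly one format, and the converter order (bit depth, then channels, then sample rate) is an assumption chosen to agree with the element order in the customization example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PCM format descriptor. */
typedef struct {
    uint32_t sample_rate;
    uint8_t  channels;
    uint8_t  bits;
} pcm_fmt_t;

/* Hypothetical element IDs; the names mirror the ESP-GMF element names. */
typedef enum { EL_BIT_CVT, EL_CH_CVT, EL_RATE_CVT, EL_ENC } audio_elem_t;

static const char *const audio_elem_name[] = {
    [EL_BIT_CVT]  = "aud_bit_cvt",
    [EL_CH_CVT]   = "aud_ch_cvt",
    [EL_RATE_CVT] = "aud_rate_cvt",
    [EL_ENC]      = "aud_enc",
};

/* Write the elements needed to turn `src` into `enc` (ending with the
 * encoder) into `out`, which must hold at least 4 entries. Returns the count. */
static size_t plan_audio_pipeline(pcm_fmt_t src, pcm_fmt_t enc, audio_elem_t out[4])
{
    assert(enc.sample_rate > 0 && enc.channels > 0 && enc.bits > 0);
    size_t n = 0;
    if (src.bits != enc.bits) {
        out[n++] = EL_BIT_CVT;
    }
    if (src.channels != enc.channels) {
        out[n++] = EL_CH_CVT;
    }
    if (src.sample_rate != enc.sample_rate) {
        out[n++] = EL_RATE_CVT;
    }
    out[n++] = EL_ENC; /* the encoder always comes last */
    return n;
}

/* Translate a plan into an array of element-name strings. */
static void plan_to_names(const audio_elem_t *plan, size_t n, const char *names[])
{
    for (size_t i = 0; i < n; i++) {
        names[i] = audio_elem_name[plan[i]];
    }
}
```

For a 48 kHz stereo source feeding a 16 kHz mono encoder, the plan is `aud_ch_cvt`, `aud_rate_cvt`, `aud_enc`, the same shape as the `elems` array in the customization example.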

### Video

- Automatically inserts `vid_ppa` and `vid_fps_cvt` when needed
- Prioritizes high-quality formats
- Negotiates the source format based on encoder capabilities

Built-in:

- `esp_capture_video_pipeline_auto_negotiate` – Auto-negotiates from a video source to multiple video sinks
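The video rules can be sketched as a format-selection routine: walk the source's formats in preference order, take the first one the encoder accepts directly, and otherwise fall back to the top source format plus a `vid_ppa` conversion; a source frame rate above the requested one adds `vid_fps_cvt`. All types and names here (`pix_fmt_t`, `video_plan_t`, `plan_video_format`) are hypothetical, ordering the source list from most to least preferred is an assumption standing in for the real quality ranking, and resolution mismatches are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel formats. */
typedef enum { FMT_RGB565, FMT_YUV422, FMT_YUV420 } pix_fmt_t;

/* Hypothetical negotiation result. */
typedef struct {
    pix_fmt_t src_fmt;      /* format to configure on the source */
    bool      need_ppa;     /* color conversion required (vid_ppa) */
    bool      need_fps_cvt; /* frame rate reduction required (vid_fps_cvt) */
} video_plan_t;

static bool fmt_in(pix_fmt_t f, const pix_fmt_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == f) {
            return true;
        }
    }
    return false;
}

/* `src_fmts` is assumed ordered from most to least preferred.
 * Resolution mismatches (which would also need vid_ppa) are ignored here. */
static video_plan_t plan_video_format(const pix_fmt_t *src_fmts, size_t src_n,
                                      const pix_fmt_t *enc_fmts, size_t enc_n,
                                      uint32_t src_fps, uint32_t dst_fps)
{
    assert(src_n > 0 && enc_n > 0);
    /* Fallback: best source format, converted by vid_ppa. */
    video_plan_t plan = { .src_fmt = src_fmts[0], .need_ppa = true,
                          .need_fps_cvt = src_fps > dst_fps };
    /* Prefer the best source format the encoder can take without conversion. */
    for (size_t i = 0; i < src_n; i++) {
        if (fmt_in(src_fmts[i], enc_fmts, enc_n)) {
            plan.src_fmt = src_fmts[i];
            plan.need_ppa = false;
            break;
        }
    }
    return plan;
}
```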

### Fixed Negotiation for Sources

In some cases, the source format and settings chosen by auto-negotiation may not meet your requirements.
Audio and video sources support `set_fixed_caps`, which fixes the source format settings and avoids negotiation failures.

## ❌ When Auto-Negotiation Fails

In complex pipelines, auto-negotiation may fail (for example, when one pipeline contains a redundant sample rate converter). In such cases, configure the pipeline manually.

## 📦 Binary Size Optimization

Unused elements are excluded from the binary unless they are registered.

### Menuconfig Options

Enable features only when needed:

- `CONFIG_ESP_CAPTURE_ENABLE_AUDIO`: Enable audio support
- `CONFIG_ESP_CAPTURE_ENABLE_VIDEO`: Enable video support

### Optional Registrations

- `mp4_muxer_register()` / `ts_muxer_register()` – Register muxers on demand
- `esp_audio_enc_register_default()` / `esp_video_enc_register_default()` – Register the default encoders; which encoders are included can be customized via menuconfig

## 🔧 Extending esp_capture

You can extend `esp_capture` by:

1. Adding a custom capture source
2. Implementing a new muxer using `esp_muxer`
3. Creating new encoders via `esp_audio_codec` / `esp_video_codec`