β‘ Lightning-fast β’ π οΈ Simple API β’ π Any Format β’ π Web-ready β’ π₯οΈ Cross-platform
MediaToolkit is a high-performance Python library for processing images, audio, and video with a unified, developer-friendly API. Built on FFmpeg (PyAV) and OpenCV for production-grade speed and reliability.
Perfect for: AI/ML pipelines, web services, batch processing, media automation, computer vision, and audio analysis.
pip install media-toolkit
Note: Audio/video processing requires FFmpeg. PyAV usually installs it automatically, but if needed, install manually from ffmpeg.org.
One API for all media types - load from files, URLs, bytes, base64, or numpy arrays:
from media_toolkit import ImageFile, AudioFile, VideoFile, media_from_any
# load any file and convert it to the correct format. This works with smart content detection
audio = media_from_any("media/my_favorite_song.mp3") # -> AudioFile
# Load from any source
image = ImageFile().from_any("https://example.com/image.jpg")
audio = AudioFile().from_file("audio.wav")
video = VideoFile().from_file("video.mp4")
imb = ImageFile().from_base64("data:image/png;base64,...")
# Convert to any format
image_array = image.to_np_array() # β numpy array (H, W, C)
audio_array = audio.to_np_array() # β numpy array (samples, channels)
image_base64 = image.to_base64() # β base64 string
video_bytes = video.to_bytes_io() # β BytesIO object
from media_toolkit import MediaList, AudioFile
# Process multiple files efficiently
audio_files = MediaList([
"song1.wav",
"https://example.com/song2.mp3",
b"raw_audio_bytes..."
])
for audio in audio_files:
audio.save(f"converted_{audio.file_name}.mp3") # Auto-convert on save
OpenCV-powered image operations:
from media_toolkit import ImageFile
import cv2
# Load and process
img = ImageFile().from_any("image.png")
image_array = img.to_np_array() # β (H, W, C) uint8 array
# Apply transformations
flipped = cv2.flip(image_array, 0)
# Save processed image
ImageFile().from_np_array(flipped).save("flipped.jpg")
FFmpeg/PyAV-powered audio operations:
from media_toolkit import AudioFile
# Load audio
audio = AudioFile().from_file("input.wav")
# Get numpy array for ML/analysis
audio_array = audio.to_np_array() # β (samples, channels) float32 in [-1, 1] range
# Inspect metadata
print(f"Sample rate: {audio.sample_rate} Hz; Channels: {audio.channels}; Duration: {audio.duration}")
# Format conversion (automatic re-encoding)
audio.save("output.mp3") # MP3
audio.save("output.flac") # FLAC (lossless)
audio.save("output.m4a") # AAC
# Create audio from numpy
new_audio = AudioFile().from_np_array(
audio_array,
sample_rate=audio.sample_rate,
audio_format="wav"
)
Supported formats: WAV, MP3, FLAC, AAC, M4A, OGG, Opus, WMA, AIFF
High-performance video operations:
from media_toolkit import VideoFile
import cv2
video = VideoFile().from_file("input.mp4")
# Extract audio track
audio = video.extract_audio("audio.mp3")
# Process frames
for i, frame in enumerate(video.to_stream()):
if i >= 300: # First 300 frames
break
# frame is numpy array (H, W, C)
processed = my_processing_function(frame)
cv2.imwrite(f"frame_{i:04d}.png", processed)
# Create video from images
images = [f"frame_{i:04d}.png" for i in range(300)]
modifiedVid = VideoFile().from_files(images, frame_rate=30, audio_file="audio.mp3")
Native FastTaskAPI Support
Built-in integration with FastTaskAPI for simplified file handling:
from fast_task_api import FastTaskAPI, ImageFile, VideoFile
app = FastTaskAPI()
@app.task_endpoint("/process")
def process_media(image: ImageFile, video: VideoFile) -> VideoFile:
# Automatic type conversion, validation
modified_video = my_ai_inference(image, video)
# any media can be returned automatically
return modified_video
from fastapi import FastAPI, UploadFile, File
from media_toolkit import ImageFile
app = FastAPI()
@app.post("/process-image")
async def process_image(file: UploadFile = File(...)):
image = ImageFile().from_any(file)
import httpx
from media_toolkit import ImageFile
image = ImageFile().from_file("photo.jpg")
# Send to API
files = {"file": image.to_httpx_send_able_tuple()}
response = httpx.post("https://api.example.com/upload", files=files)
MediaList - Type-safe batch processing:
from media_toolkit import MediaList, ImageFile
images = MediaList[ImageFile]()
images.extend(["img1.jpg", "img2.png", "https://example.com/img3.jpg"])
# Lazy loading - files loaded on access
for img in images:
img.save(f"processed_{img.file_name}")
MediaDict - Key-value media storage:
from media_toolkit import MediaDict, ImageFile
media_db = MediaDict()
media_db["profile"] = "profile.jpg"
media_db["banner"] = "https://example.com/banner.png"
# Export to JSON
json_data = media_db.to_json()
# Memory-efficient processing
audio = AudioFile().from_file("large_audio.wav")
for chunk in audio.to_stream():
process_chunk(chunk) # Process in chunks
video = VideoFile().from_file("large_video.mp4")
stream = video.to_stream()
for frame in stream:
process_frame(frame) # Frame-by-frame processing
# video-to-audio-stream
for av_frame in stream.audio_frames():
pass
MediaToolkit leverages industry-standard libraries for maximum performance:
- FFmpeg (PyAV): Professional-grade audio/video codec support
- OpenCV: Optimized computer vision operations
- Streaming: Memory-efficient processing of large files
- Hardware acceleration: GPU support where available
Benchmarks:
- Audio conversion: ~100x faster than librosa/pydub
- Image processing: Near-native OpenCV speed
- Video processing: Hardware-accelerated encoding/decoding. FPS > 300 for video decoding on consumer grade hardware.
β
Universal input: Files, URLs, bytes, base64, numpy arrays, bytesio, starlette upload files, soundfile
β
Automatic format detection: Smart content-type inference
β
Seamless conversion: Change formats on save
β
Type-safe: Full typing support with generics
β
Web-ready: Native FastTaskAPI integration, extra features for httpx and fastapi
β
Production-tested: Used in production AI/ML pipelines
We welcome contributions! Key areas:
- Performance optimizations
- New format support
- Documentation & examples
- Test coverage
- Platform-specific enhancements
MIT License - see LICENSE for details.
Join the intelligence revolution. Join socaity.ai