MemoMosaic Backend

A Node.js Express server that generates cinematic narratives and multimedia content from images and videos. It uses Google's Gemini API to create album or vlog scripts with automatic scene generation, narration, and text-to-speech (TTS) audio.

Features

📸 Asset Processing: Upload images and videos for analysis
🎬 Intelligent Narrative Generation: Uses Gemini AI to create contextual stories
🖼️ Collage Creation: Automatically groups and creates collages from media by location
🎙️ Voice Synthesis: Generates audio narration with PlayHT voice cloning (fallback to Google TTS)
🎨 Digital Annotations: Renders face annotations to HTML/PNG for genealogy context
🌍 Location Banners: Fetches relevant background images from Unsplash for each location
📤 Temporary File Storage: Uploads generated content to tmpfiles.org

Prerequisites

Node.js (v16+)
npm, to install the dependencies listed in package.json (including Multer for file uploads)

Installation

npm install

Environment Variables

Create a .env file in the root directory:

PORT=3000
GEMINI_API_KEY=<your-google-gemini-api-key>
GEMINI_MODEL=gemini-1.5-pro
UNSPLASH_API_ACCESS_KEY=<your-unsplash-api-key>
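
The variables above can be checked once at startup so that a missing key fails fast. The following is a minimal sketch, not the project's actual code: the loadConfig helper, the choice of which variables are required, and the default model are assumptions, and it expects the .env values to already be present in the environment (the source does not say how the file is loaded).

```javascript
// Sketch: read the documented variables from an environment-like object.
// loadConfig and its error wording are hypothetical, not taken from index.js.
function loadConfig(env = process.env) {
  // Assumption: both API keys are mandatory; the source does not say so.
  const required = ["GEMINI_API_KEY", "UNSPLASH_API_ACCESS_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`PORT must be a valid port number, got "${env.PORT}"`);
  }

  return {
    port,
    geminiApiKey: env.GEMINI_API_KEY,
    // Assumption: default to the model named in the sample .env file.
    geminiModel: env.GEMINI_MODEL || "gemini-1.5-pro",
    unsplashAccessKey: env.UNSPLASH_API_ACCESS_KEY,
  };
}
```

Passing the environment in as an argument keeps the function easy to exercise with a plain object instead of the real process.env.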

Running the Server

npm start
# or
node index.js

The server starts on http://localhost:3000 when PORT is set to 3000 as in the sample .env file.

API Endpoints

`GET /`

Health check endpoint.

Response:

{
  "message": "Welcome to the MemoMosaic Backend!"
}

`POST /create`

Generate a complete multimedia script from uploaded media files and annotation face images.

Request:

Method: POST
Content-Type: multipart/form-data
Files:
- assets (max 30 files): Media files (images/videos)
- annotationFaces (max 50 files): Face images for annotations
Fields:
- payload: JSON string containing metadata

Form Fields:

Media file metadata (optional, use if structured metadata needed):

- assets[0].type : "IMAGE" or "VIDEO"
- assets[0].location : Location string (e.g., "Paris")
- assets[0].creation_time : ISO timestamp or date string

Annotation face mapping (in payload):

{
  "annotations": [
    {
      "name": "John",
      "relation": "Father",
      "faceIndex": 0
    },
    {
      "name": "Jane",
      "relation": "Mother",
      "faceIndex": 1
    }
  ]
}

Payload Schema:

{
  "type": "album" | "vlog",
  "memorableMoments": "Optional string describing key moments",
  "playHTCred": {
    "userId": "PlayHT user ID",
    "secretKey": "PlayHT API secret key",
    "audio": "Base64-encoded sample audio for voice cloning",
    "gender": "male" | "female"
  },
  "annotations": [
    {
      "name": "Person name",
      "relation": "Relationship",
      "faceIndex": 0
    }
  ]
}
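
The schema above could be enforced with a small check before any processing starts. This is a hedged sketch only: parsePayload is a hypothetical helper, and which fields are mandatory is an assumption (here only type is required, while annotations and playHTCred are treated as optional).

```javascript
// Sketch: parse and check the `payload` form field against the schema above.
// parsePayload is hypothetical; the real server may validate differently.
function parsePayload(rawPayload) {
  let payload;
  try {
    payload = JSON.parse(rawPayload);
  } catch {
    throw new Error("payload must be a valid JSON string");
  }
  if (payload === null || typeof payload !== "object") {
    throw new Error("payload must be a JSON object");
  }
  if (!["album", "vlog"].includes(payload.type)) {
    throw new Error('payload.type must be "album" or "vlog"');
  }

  // Assumption: a missing annotations array means "no annotations".
  const annotations = payload.annotations ?? [];
  if (!Array.isArray(annotations)) {
    throw new Error("payload.annotations must be an array");
  }
  annotations.forEach((annotation, i) => {
    if (typeof annotation.name !== "string" || typeof annotation.relation !== "string") {
      throw new Error(`annotations[${i}] needs string name and relation`);
    }
    if (!Number.isInteger(annotation.faceIndex) || annotation.faceIndex < 0) {
      throw new Error(`annotations[${i}].faceIndex must be a non-negative integer`);
    }
  });

  // Assumption: playHTCred is optional because Google TTS is the fallback.
  const cred = payload.playHTCred;
  if (cred != null && !["male", "female"].includes(cred.gender)) {
    throw new Error('playHTCred.gender must be "male" or "female"');
  }

  return { ...payload, annotations };
}
```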

How it works:

1. Upload media files via the assets field.
2. Upload face images via the annotationFaces field (images are indexed 0, 1, 2, ... in upload order).
3. In the payload.annotations array, reference each face image by its faceIndex.
4. The server replaces each faceIndex with the base64 data of the matching face image before processing.
5. Face images are deleted after processing.
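
The faceIndex resolution described above can be sketched as follows. The resolveAnnotationFaces name and the face property are assumptions made for illustration, and in-memory buffers stand in for the face images that the server reads from disk.

```javascript
// Sketch: swap each annotation's faceIndex for the base64 data of the matching
// annotationFaces upload. The `face` property name is an assumption.
function resolveAnnotationFaces(annotations, faceBuffers) {
  return annotations.map(({ faceIndex, ...rest }) => {
    const buffer = faceBuffers[faceIndex];
    if (!Buffer.isBuffer(buffer)) {
      throw new RangeError(`No annotationFaces upload at index ${faceIndex}`);
    }
    return { ...rest, face: buffer.toString("base64") };
  });
}

// Usage: placeholder buffers stand in for the uploaded face images.
const faceBuffers = [Buffer.from("john-face"), Buffer.from("jane-face")];
const resolvedAnnotations = resolveAnnotationFaces(
  [
    { name: "John", relation: "Father", faceIndex: 0 },
    { name: "Jane", relation: "Mother", faceIndex: 1 },
  ],
  faceBuffers
);
```

Rejecting an out-of-range faceIndex early avoids sending an annotation without face data further down the pipeline.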

Response:

{
  "title": "Generated album/vlog title",
  "caption": "Short description",
  "hashtags": ["tag1", "tag2"],
  "scenes": [
    {
      "scene": "1",
      "narrative": "Scene narrative",
      "collage": "https://tmpfiles.org/...",
      "type": "IMAGE",
      "mimeType": "image/png",
      "location": "Paris",
      "background_image": "https://unsplash.com/...",
      "audio": "https://tmpfiles.org/..."
    }
  ]
}

File Handling

Upload: Files are saved to /tmp/uploads on disk
Processing: Files are read and converted to base64 for API processing
Asset Tracking: Each asset is assigned an index which is preserved through all transformations (collage creation, grouping, etc.)
Video URI Mapping: Video file URIs are mapped by asset index for reliable lookups regardless of media transformations
Generation: Collages and audio are generated and uploaded to tmpfiles.org
Cleanup: Temporary files are automatically deleted after processing
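
The read-then-cleanup part of this flow can be sketched with Node's built-in fs module. This is an illustration under assumptions, not the server's code: encodeAndCleanup is a hypothetical name, and synchronous calls are used only to keep the example short (a server would more likely use fs.promises).

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Sketch: encode each uploaded file as base64, then delete the temporary
// files even if a read fails. encodeAndCleanup is a hypothetical name.
function encodeAndCleanup(filePaths) {
  try {
    return filePaths.map((filePath) => fs.readFileSync(filePath).toString("base64"));
  } finally {
    for (const filePath of filePaths) {
      // force: true ignores files that are already gone.
      fs.rmSync(filePath, { force: true });
    }
  }
}

// Usage: a scratch file in the OS temp directory stands in for /tmp/uploads.
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), "memomosaic-"));
const scratchFile = path.join(scratchDir, "photo.jpg");
fs.writeFileSync(scratchFile, "fake image bytes");
const encodedFiles = encodeAndCleanup([scratchFile]);
```

Putting the deletion in a finally block means the uploads are removed whether or not encoding succeeds.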

Internal Processing

Asset Index Tracking

The system tracks assets by their original index throughout the entire processing pipeline:

Initial indexing: Assets receive an assetIndex property preserving their upload order
Grouping: Assets are grouped by location and type, but retain their original index
Video URI mapping: Video Gemini URIs are stored in a map keyed by asset index (videoUriMap[assetIndex])
Collage generation: When videos are included in collages, their URIs are retrieved using the preserved asset index
Scene generation: Each scene correctly references the appropriate video URI through the index

This index-based approach ensures:

✅ Efficient lookups without string-based key matching
✅ Robust URI resolution through grouping and sorting transformations
✅ No data loss during collage creation or media grouping
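
The index-tracking steps above can be sketched as below. Only assetIndex and videoUriMap come from the description; the helper names, the grouping key format, and the URI strings are placeholders chosen for illustration.

```javascript
// Sketch: tag assets with their upload order, group them by location and
// type, and resolve video URIs through the preserved index.
function indexAssets(assets) {
  return assets.map((asset, assetIndex) => ({ ...asset, assetIndex }));
}

function groupByLocationAndType(indexedAssets) {
  const groups = new Map();
  for (const asset of indexedAssets) {
    // The composite key only buckets assets; URI lookups use assetIndex.
    const key = `${asset.location}|${asset.type}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(asset);
  }
  return groups;
}

const indexedAssets = indexAssets([
  { type: "IMAGE", location: "Paris" },
  { type: "VIDEO", location: "Rome" },
  { type: "IMAGE", location: "Paris" },
  { type: "VIDEO", location: "Paris" },
]);

// Video URIs keyed by asset index; the URI strings are placeholders.
const videoUriMap = { 1: "gemini-file-uri-rome", 3: "gemini-file-uri-paris" };

const assetGroups = groupByLocationAndType(indexedAssets);
const parisVideoUris = assetGroups
  .get("Paris|VIDEO")
  .map((asset) => videoUriMap[asset.assetIndex]);
```

Because each grouped asset still carries its assetIndex, the lookup works no matter how the groups are later sorted or merged into collages.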

Key Dependencies

@google/generative-ai: Gemini API for AI-powered narratives
multer: File upload middleware
express: Web framework
puppeteer: HTML to image rendering for annotations
playht: Voice cloning and text-to-speech
@wylie39/image-collage: Collage generation from images
unsplash-js: Fetching location banner images
ejs: Template rendering for annotations

Error Handling

The server handles the following failure cases:

Failed file uploads are cleaned up automatically
Collage upload failures fall back to base64 responses
TTS generation falls back from PlayHT to Google TTS if needed
All errors are logged to console with descriptive messages
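
The PlayHT-to-Google-TTS fallback follows a general try-then-fall-back pattern, sketched below. The synthesizers are caller-supplied async functions; no PlayHT or Google TTS client calls are shown because the source does not document them, and synthesizeWithFallback is a hypothetical name.

```javascript
// Sketch: try a primary synthesizer and fall back to a secondary one on
// failure. Both arguments are async functions supplied by the caller.
async function synthesizeWithFallback(text, primary, fallback) {
  try {
    return { provider: "primary", audio: await primary(text) };
  } catch (error) {
    // Log with a descriptive message, then switch providers.
    console.error(`Primary TTS failed, using fallback: ${error.message}`);
    return { provider: "fallback", audio: await fallback(text) };
  }
}
```

If the fallback also fails, its error propagates to the caller, which can then decide whether to return the scene without audio.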

Usage Example

See the API endpoints section for detailed payload examples.
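
As a starting point, a client could assemble the multipart request for POST /create like this. The sketch assumes Node.js 18 or later (for the global FormData, Blob, and fetch), which is newer than the v16 minimum listed for the server; buildCreateForm and the sample file contents are hypothetical.

```javascript
// Sketch: assemble the multipart body for POST /create.
// Assumes Node.js 18+ globals; buildCreateForm is a hypothetical helper.
function buildCreateForm({ assets, faces, payload }) {
  const form = new FormData();
  for (const { data, filename, mimeType } of assets) {
    form.append("assets", new Blob([data], { type: mimeType }), filename);
  }
  for (const { data, filename, mimeType } of faces) {
    form.append("annotationFaces", new Blob([data], { type: mimeType }), filename);
  }
  // The payload travels as a JSON string field.
  form.append("payload", JSON.stringify(payload));
  return form;
}

// Placeholder bytes stand in for real image files.
const createForm = buildCreateForm({
  assets: [{ data: Buffer.from("image bytes"), filename: "paris.jpg", mimeType: "image/jpeg" }],
  faces: [{ data: Buffer.from("face bytes"), filename: "john.png", mimeType: "image/png" }],
  payload: {
    type: "album",
    annotations: [{ name: "John", relation: "Father", faceIndex: 0 }],
  },
});

// Sending it requires a running server:
// const response = await fetch("http://localhost:3000/create", { method: "POST", body: createForm });
// const script = await response.json();
```

The order in which faces are appended determines the faceIndex values that payload.annotations must use.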
