🔬 Technical Documentation - Vocal Mixer Pro

In-depth technical documentation for audio engineers and developers.

🎚️ Audio Processing Architecture

Overview

The application uses Tone.js (a Web Audio API wrapper) to implement a vocal processing chain. Its key design choice is rendering with Tone.Offline, which runs faster than real time, so users can download processed files without waiting for the full playback duration.

Processing Flow

Input File (WAV/MP3)
    ↓
Decode to AudioBuffer
    ↓
Tone.Offline Rendering Context
    ↓
High-Pass Filter (30Hz)
    ↓
Multiband Compressor (Dynamic Rumble Control)
    ↓
Limiter (-0.5dB)
    ↓
Rendered AudioBuffer
    ↓
Post-Process Normalization (-1.0dB)
    ↓
Convert to WAV Blob
    ↓
Download / Playback

🎛️ Processing Chain Details

1. High-Pass Filter

Purpose: Remove largely inaudible sub-bass content that wastes headroom and can muddy the mix.

Implementation:

const highPass = new Tone.Filter({
  type: 'highpass',
  frequency: 30,      // Hz
  rolloff: -24,       // dB/octave
});

Parameters:

  • Frequency: 30Hz (near the lower limit of human hearing and below what most playback systems reproduce)
  • Rolloff: -24dB/octave (steep slope for clean removal)
  • Q Factor: Tone.Filter default (Q = 1; not a strict Butterworth alignment, so a slight peak near the cutoff is possible)

Effect: Attenuates content below 30Hz at 24dB/octave, preventing sub-bass from taking up valuable headroom in the final mix.
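To get a feel for how steep this slope is, the attenuation can be approximated from the analog second-order high-pass prototype. This is only a sketch under stated assumptions: it treats the -24dB/octave rolloff as two cascaded second-order sections, uses a linear Q of 1, and ignores the frequency warping of the digital filter, so values near the cutoff will differ somewhat from what Tone.Filter actually produces. `highPassGainDb` is a hypothetical helper, not part of Tone.js.

```javascript
// Approximate gain (in dB) of cascaded second-order high-pass sections,
// based on the analog prototype H(s) = s^2 / (s^2 + s*w0/Q + w0^2).
// Assumptions: linear Q, no bilinear-transform warping.
function highPassGainDb(freqHz, cutoffHz = 30, stages = 2, q = 1) {
  const r = freqHz / cutoffHz; // frequency normalized to the cutoff
  const magnitude = (r * r) / Math.sqrt((1 - r * r) ** 2 + (r / q) ** 2);
  return stages * 20 * Math.log10(magnitude);
}

for (const f of [7.5, 15, 30, 60, 1000]) {
  console.log(`${f} Hz: ${highPassGainDb(f).toFixed(1)} dB`);
}
```

Each halving of frequency below the cutoff costs roughly another 24dB, while content well above 30Hz passes essentially unchanged.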

2. Multiband Compressor (Dynamic Rumble Control)

Purpose: Apply heavy compression only to low frequencies while leaving vocal frequencies transparent.

Implementation:

const multiband = new Tone.MultibandCompressor({
  low: {
    threshold: -24,    // dB
    ratio: 4,          // 4:1
    attack: 0.03,      // seconds
    release: 0.25,     // seconds
  },
  mid: {
    threshold: -6,
    ratio: 1.5,
    attack: 0.02,
    release: 0.15,
  },
  high: {
    threshold: -6,
    ratio: 1.5,
    attack: 0.02,
    release: 0.15,
  },
  lowFrequency: 60,       // Hz
  highFrequency: 10000,   // Hz
});

Band Split Points:

  • Low Band: DC to 60Hz (heavy compression)
  • Mid Band: 60Hz to 10kHz (light compression for presence)
  • High Band: 10kHz to Nyquist (light compression for air)

Low Band Parameters (Critical for rumble control):

  • Threshold: -24dB (catches even quiet rumble)
  • Ratio: 4:1 (significant reduction)
  • Attack: 30ms (fast enough to catch transients)
  • Release: 250ms (smooth release to avoid pumping)

Why This Works: This simulates a "dynamic EQ" by compressing only the problematic frequency range when it gets too loud, rather than applying a static EQ cut that would affect all content equally.
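The difference between the bands is easiest to see on the static compression curve. The sketch below uses an idealized hard-knee model with the thresholds and ratios listed above; the real compressor nodes also apply a soft knee and the attack/release times, so actual gain reduction will be smoother. `compressedLevelDb` is a hypothetical helper for illustration.

```javascript
// Idealized hard-knee compressor curve: levels above the threshold are
// scaled by 1/ratio; levels at or below it pass unchanged.
function compressedLevelDb(inputDb, thresholdDb, ratio) {
  if (inputDb <= thresholdDb) return inputDb;
  return thresholdDb + (inputDb - thresholdDb) / ratio;
}

const lowBand = { threshold: -24, ratio: 4 };
const midBand = { threshold: -6, ratio: 1.5 };

for (const inputDb of [-30, -12, 0]) {
  const low = compressedLevelDb(inputDb, lowBand.threshold, lowBand.ratio);
  const mid = compressedLevelDb(inputDb, midBand.threshold, midBand.ratio);
  console.log(`in ${inputDb} dB -> low band ${low} dB, mid band ${mid} dB`);
}
```

A -12dB signal is pulled down by 9dB in the low band but passes through the mid band untouched, which is the "dynamic EQ" behavior described above.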

3. Safety Limiter

Purpose: Catch peaks and prevent clipping after processing.

Implementation:

const limiter = new Tone.Limiter(-0.5); // dB

Parameters:

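  • Threshold: -0.5dB FS (0.5dB of headroom)
  • Look-Ahead: Not configurable (handled internally by the underlying compressor node)
  • Attack/Release: Fixed internally by Tone.Limiter (not exposed as options)

Effect: Holds peaks close to -0.5dB FS. Tone.Limiter is built on a high-ratio compressor rather than a true brick-wall limiter, so brief overshoots are possible; the normalization step that follows sets the final peak level.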

4. Post-Process Normalization

Purpose: Peak normalization, so every output file reaches the same peak level regardless of how quiet or loud the source was. This matches peak level, not perceived loudness.

Implementation:

const normalizeAudioBuffer = (buffer) => {
  const numberOfChannels = buffer.numberOfChannels;
  const length = buffer.length;
  let maxPeak = 0;

  // Find absolute highest peak
  for (let channel = 0; channel < numberOfChannels; channel++) {
    const channelData = buffer.getChannelData(channel);
    for (let i = 0; i < length; i++) {
      const absSample = Math.abs(channelData[i]);
      if (absSample > maxPeak) {
        maxPeak = absSample;
      }
    }
  }

  // Calculate multiplier to reach -1.0 dB
  const targetAmplitude = 0.89; // ≈ -1.0 dB FS, since 10^(-1/20) ≈ 0.891
  const multiplier = maxPeak > 0 ? targetAmplitude / maxPeak : 1;

  // Apply multiplier
  for (let channel = 0; channel < numberOfChannels; channel++) {
    const channelData = buffer.getChannelData(channel);
    for (let i = 0; i < length; i++) {
      channelData[i] *= multiplier;
    }
  }

  return buffer;
};

Target Level: -1.0 dB FS (0.89 linear amplitude)

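Why -1.0 dB?:

  • Leaves headroom for lossy codec encoding (MP3/AAC)
  • Stays under the peak ceilings commonly recommended for streaming and social platforms (their loudness targets are LUFS-based and are not addressed by peak normalization)
  • Reduces the risk of inter-sample peaks clipping after lossy encoding
  • Common peak ceiling in mastering practice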

Algorithm:

  1. Scan entire buffer to find highest peak
  2. Calculate scaling factor: target / currentPeak
  3. Multiply every sample by this factor
  4. Result: Peak of the loudest sample = -1.0 dB
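The scaling factor in step 2 can be worked through numerically. The sketch below derives the target amplitude from a dB value instead of hard-coding it and reports the resulting gain in dB; `dbToAmplitude`, `amplitudeToDb`, and `normalizationGain` are hypothetical helper names, not functions from the application.

```javascript
// Convert a dBFS value to linear amplitude: amplitude = 10^(dB / 20).
const dbToAmplitude = (db) => Math.pow(10, db / 20);
// Convert a linear amplitude ratio to dB.
const amplitudeToDb = (amp) => 20 * Math.log10(amp);

// Gain needed to move a measured peak to the target peak level.
function normalizationGain(currentPeak, targetDb = -1.0) {
  if (currentPeak <= 0) return { multiplier: 1, gainDb: 0 }; // silent input
  const multiplier = dbToAmplitude(targetDb) / currentPeak;
  return { multiplier, gainDb: amplitudeToDb(multiplier) };
}

const quiet = normalizationGain(0.25); // peak at about -12 dBFS
const hot = normalizationGain(0.944);  // peak at about -0.5 dBFS
console.log(quiet.gainDb.toFixed(2), hot.gainDb.toFixed(2));
```

A quiet take is boosted by about 11dB, while a file that already peaks above the target is turned down slightly. Because the boost applies to every sample, any noise floor in a quiet recording is raised by the same amount.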

🚀 Performance Optimization

Tone.Offline Rendering

Key Feature: Renders audio faster than real-time.

Implementation:

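const renderedBuffer = await Tone.Offline(async ({ transport }) => {
  // Create the player and effects inside this callback so they belong to the offline context
  // Set up processing chain (Tone.getDestination() resolves to the offline context's output)
  player.chain(highPass, multiband, limiter, Tone.getDestination());
  player.start(0);
}, duration);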

Performance:

  • Real-time: 3-minute song = 3 minutes to process
  • Tone.Offline: 3-minute song = ~60-90 seconds to process
  • Speedup: ~2-3x faster than real-time on average hardware

Why It's Fast:

  • Offline rendering is not tied to the audio hardware clock, so it runs as fast as the CPU allows
  • No visualization rendering
  • No UI updates during processing
  • Optimized buffer operations in the browser's native Web Audio implementation

Memory Management

File Size Limits:

  • Practical limit: ~100MB audio files
  • Theoretical limit: Browser memory (typically 2-4GB)

Memory Usage:

  • Input file: Loaded once into ArrayBuffer
  • Decoded audio: Float32Array per channel
  • Processing: Minimal additional memory (in-place where possible)
  • Output: New AudioBuffer (same size as input)
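Decoded size depends on duration, sample rate, and channel count rather than on the compressed file size, so a small MP3 can still expand considerably in memory. A rough estimate, assuming 32-bit float samples as listed above (`decodedSizeBytes` is a hypothetical helper):

```javascript
// Estimate the size of decoded audio held as 32-bit floats.
function decodedSizeBytes(durationSeconds, sampleRate, channels) {
  const BYTES_PER_FLOAT32 = 4;
  const frames = Math.ceil(durationSeconds * sampleRate);
  return frames * channels * BYTES_PER_FLOAT32;
}

const bytes = decodedSizeBytes(180, 44100, 2); // 3-minute stereo file
console.log(`${(bytes / (1024 * 1024)).toFixed(1)} MiB per buffer`);
```

Since the input and the rendered output exist at the same time, peak usage is at least twice this figure, plus the original file bytes and the encoded WAV.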

Cleanup:

audioContext.close();  // Free Web Audio resources
URL.revokeObjectURL(processedAudioUrl);  // Free blob URL

📊 Audio Quality Specifications

Input Support

Formats:

  • WAV (PCM, 16/24/32-bit)
  • MP3 (CBR/VBR, any bitrate)
  • Other formats supported by browser (FLAC, OGG, etc.)

Sample Rates:

  • Common: 44.1kHz, 48kHz
  • High-res: 88.2kHz, 96kHz, 192kHz
  • Preserved when the audio context runs at the file's sample rate; otherwise decodeAudioData resamples to the context rate

Output Specifications

Format: WAV (PCM)

  • Bit Depth: 16-bit signed integer
  • Sample Rate: Same as the rendered buffer (matches the input when no resampling occurred during decoding)
  • Channels: Same as input (mono/stereo)
  • Byte Order: Little-endian (standard WAV)

WAV Header Implementation:

// RIFF Header
writeString(0, 'RIFF');
view.setUint32(4, 36 + length, true);  // Chunk size: total file size minus 8 bytes (length = PCM data size in bytes)
writeString(8, 'WAVE');

// Format Chunk
writeString(12, 'fmt ');
view.setUint32(16, 16, true);          // Format chunk size
view.setUint16(20, 1, true);           // PCM format
view.setUint16(22, numberOfChannels, true);                  // Channel count
view.setUint32(24, sampleRate, true);                        // Sample rate (Hz)
view.setUint32(28, sampleRate * numberOfChannels * 2, true); // Byte rate
view.setUint16(32, numberOfChannels * 2, true);              // Block align (bytes per frame)
view.setUint16(34, 16, true);          // 16-bit

// Data Chunk
writeString(36, 'data');
view.setUint32(40, length, true);      // Data size in bytes
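For context, here is a minimal sketch of a complete encoder built around that header, including the float-to-16-bit conversion and channel interleaving. It takes plain Float32Array channels rather than an AudioBuffer so it can run outside the browser; `encodeWav16` is a hypothetical name, and the application's actual converter may differ in detail.

```javascript
// Encode planar Float32 channel data as a 16-bit PCM WAV file (ArrayBuffer).
function encodeWav16(channels, sampleRate) {
  const numberOfChannels = channels.length;
  const frames = channels[0].length;
  const bytesPerSample = 2;
  const dataLength = frames * numberOfChannels * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataLength);
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataLength, true);
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);
  view.setUint16(20, 1, true);
  view.setUint16(22, numberOfChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numberOfChannels * bytesPerSample, true);
  view.setUint16(32, numberOfChannels * bytesPerSample, true);
  view.setUint16(34, 16, true);
  writeString(36, 'data');
  view.setUint32(40, dataLength, true);

  // Interleave channels and convert [-1, 1] floats to signed 16-bit integers.
  let offset = 44;
  for (let i = 0; i < frames; i++) {
    for (let ch = 0; ch < numberOfChannels; ch++) {
      const s = Math.max(-1, Math.min(1, channels[ch][i]));
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
      offset += 2;
    }
  }
  return buffer;
}

const left = new Float32Array([0, 0.5, -1]);
const right = new Float32Array([1, -0.5, 0]);
const wav = encodeWav16([left, right], 44100);
console.log(wav.byteLength); // 44-byte header + 3 frames * 2 channels * 2 bytes
```

In the browser, the result can be wrapped as `new Blob([wav], { type: 'audio/wav' })` for download or playback. The conversion here truncates without dither, which is a simplification.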

🎨 UI/UX Architecture

State Management

States:

'idle'       // No file loaded, or file loaded and awaiting processing
'processing' // Audio is being processed
'ready'      // Processed audio ready for playback/download
'playing'    // Audio is currently playing

State Flow:

idle → [Upload File] → idle
idle → [Process] → processing → ready
ready → [Play] → playing → [End/Pause] → ready
ready → [Upload New] → idle
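One way to keep these transitions explicit is a small lookup table. This is a sketch, not the application's actual state code: the event names (`UPLOAD`, `PROCESS`, and so on) are made up for illustration, and no error path is modeled because the flow above does not describe one.

```javascript
// Allowed transitions, keyed by current state and then by event name.
// Event names are illustrative; the states come from the list above.
const TRANSITIONS = {
  idle: { UPLOAD: 'idle', PROCESS: 'processing' },
  processing: { DONE: 'ready' },
  ready: { PLAY: 'playing', UPLOAD: 'idle' },
  playing: { PAUSE: 'ready', ENDED: 'ready' },
};

// Return the next state, or the current one if the event is not allowed.
function nextState(state, event) {
  const allowed = TRANSITIONS[state] || {};
  return allowed[event] || state;
}

let state = 'idle';
for (const event of ['UPLOAD', 'PROCESS', 'DONE', 'PLAY', 'ENDED', 'UPLOAD']) {
  state = nextState(state, event);
}
console.log(state);
```

Ignoring disallowed events (for example, Play while idle) keeps the UI from entering a state the flow does not define.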

Progress Tracking

Milestones:

  • 10%: File loaded
  • 30%: Audio decoded
  • 40%: Offline rendering started
  • 70%: Offline rendering complete
  • 90%: Normalization complete
  • 100%: WAV conversion complete

Note: Tone.Offline doesn't provide granular progress, so we use estimated milestones.
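The milestones can be kept in one table so the percentages stay consistent across the pipeline. A minimal sketch, with hypothetical stage names standing in for the milestones listed above:

```javascript
// Estimated progress reported after each pipeline stage.
// Stage names are hypothetical labels for the milestones listed above.
const MILESTONES = {
  fileLoaded: 10,
  decoded: 30,
  renderStarted: 40,
  renderComplete: 70,
  normalized: 90,
  wavEncoded: 100,
};

// Build a reporter that forwards milestone percentages to a callback.
function createProgressReporter(onProgress) {
  let last = 0;
  return (stage) => {
    const percent = MILESTONES[stage];
    if (percent === undefined) throw new Error(`Unknown stage: ${stage}`);
    last = Math.max(last, percent); // never move the bar backwards
    onProgress(last);
    return last;
  };
}

const seen = [];
const report = createProgressReporter((percent) => seen.push(percent));
report('fileLoaded');
report('decoded');
report('renderStarted');
console.log(seen);
```

The jump from 40% to 70% covers the whole offline render, so the bar will appear to pause there on long files.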

🔧 Customization Guide

Adjusting Compression Characteristics

More Aggressive Rumble Control:

low: {
  threshold: -30,  // Lower threshold
  ratio: 6,        // Higher ratio
  attack: 0.02,    // Faster attack
}

Gentler Processing:

low: {
  threshold: -18,  // Higher threshold
  ratio: 2,        // Lower ratio
  release: 0.5,    // Slower release
}

Changing Normalization Target

// Current: -1.0 dB (0.89)
const targetAmplitude = 0.89;

// Alternatives (use one in place of the line above):

// -0.5 dB (Louder, less headroom)
// const targetAmplitude = 0.944;

// -2.0 dB (Quieter, more conservative)
// const targetAmplitude = 0.794;

// -3.0 dB (Most conservative)
// const targetAmplitude = 0.708;

Adding LUFS Metering (Advanced)

For true loudness metering (LUFS), you would need to:

  1. Implement the ITU-R BS.1770 K-weighting filter
  2. Calculate integrated loudness over the entire file (including the gating defined in BS.1770)
  3. Normalize to target LUFS (e.g., -14 LUFS for streaming)

Note: This is complex and beyond the scope of this implementation, but possible with Web Audio API.
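As a starting point, the first two steps can be sketched in a simplified form. The code below is not a compliant meter: it assumes 48kHz input (the biquad coefficients are the 48kHz values given in BS.1770), omits the gating stage, and treats all channels with unit weight, so it is only suitable for mono or stereo material without long silences. The function names are hypothetical.

```javascript
// BS.1770 K-weighting biquad coefficients, valid for 48 kHz audio only.
const SHELF = {
  b: [1.53512485958697, -2.69169618940638, 1.19839281085285],
  a: [-1.69065929318241, 0.73248077421585],
};
const HIGHPASS = {
  b: [1.0, -2.0, 1.0],
  a: [-1.99004745483398, 0.99007225036621],
};

// Direct-form I biquad:
// y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
function biquad(input, { b, a }) {
  const out = new Float64Array(input.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0;
  for (let n = 0; n < input.length; n++) {
    const x0 = input[n];
    const y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2;
    x2 = x1; x1 = x0; y2 = y1; y1 = y0;
    out[n] = y0;
  }
  return out;
}

// Ungated K-weighted loudness of mono or stereo channels sampled at 48 kHz.
function ungatedLoudness(channels) {
  let sumOfMeanSquares = 0;
  for (const channel of channels) {
    const weighted = biquad(biquad(channel, SHELF), HIGHPASS);
    let energy = 0;
    for (const v of weighted) energy += v * v;
    sumOfMeanSquares += energy / weighted.length;
  }
  return -0.691 + 10 * Math.log10(sumOfMeanSquares);
}

// Gain (dB) that would move a measured loudness to the target.
const gainToTargetDb = (measured, target = -14) => target - measured;

// One second of a full-scale 1 kHz sine, mono; this should read close to -3 LUFS.
const sine = Float64Array.from({ length: 48000 }, (_, n) =>
  Math.sin((2 * Math.PI * 1000 * n) / 48000));
const measured = ungatedLoudness([sine]);
console.log(measured.toFixed(2), gainToTargetDb(measured).toFixed(2));
```

Other sample rates need recomputed filter coefficients, and a real implementation would add the 400ms block gating before trusting the number for delivery specs.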

🧪 Testing Recommendations

Test Files

  1. Quiet vocal: Test normalization (should boost significantly)
  2. Loud vocal: Test limiter (should catch peaks)
  3. Bassy vocal: Test multiband compression (should tame lows)
  4. Clean vocal: Test transparency (should sound natural)

Quality Checks

  1. No clipping: Check waveform peaks in DAW (should be at -1.0 dB)
  2. No pumping: Listen for artifacts in sustained notes
  3. Low-end control: Compare low frequencies before/after
  4. Transparency: A/B test against unprocessed (vocal should sound natural)
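The clipping check can also be automated before opening a DAW. A minimal sketch that scans channel data for the peak level and for samples at or beyond full scale; in the browser the channels would come from `buffer.getChannelData(c)`, and `analyzePeaks` is a hypothetical helper name.

```javascript
// Report the peak level in dBFS and count samples at or beyond full scale.
function analyzePeaks(channels) {
  let peak = 0;
  let clipped = 0;
  for (const channel of channels) {
    for (const sample of channel) {
      const abs = Math.abs(sample);
      if (abs > peak) peak = abs;
      if (abs >= 1) clipped++;
    }
  }
  const peakDb = peak > 0 ? 20 * Math.log10(peak) : -Infinity;
  return { peakDb, clipped };
}

const result = analyzePeaks([new Float32Array([0.1, -0.891, 0.3])]);
console.log(result.peakDb.toFixed(2), result.clipped);
```

This measures sample peaks only; inter-sample (true) peaks can be slightly higher and need oversampling to detect.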

Performance Testing

  1. Small file (30 seconds): Should process in ~10-15 seconds
  2. Medium file (3 minutes): Should process in ~60-90 seconds
  3. Large file (10 minutes): Should process in ~3-5 minutes

📚 References

Audio Engineering Concepts

  • High-Pass Filtering: Remove subsonic frequencies
  • Dynamic Range Compression: Reduce level differences
  • Multiband Processing: Frequency-selective processing
  • Limiting: Prevent clipping with transparent gain reduction
  • Normalization: Standardize peak levels

Web Audio API

  • AudioContext: Main audio processing context
  • AudioBuffer: Container for decoded audio
  • AudioNode: Building block of processing chain
  • OfflineAudioContext: Non-real-time rendering

Tone.js Documentation

🎓 Learning Resources

Recommended Reading

  1. "Mixing Secrets" by Mike Senior: Comprehensive mixing guide
  2. "Mastering Audio" by Bob Katz: Industry standard reference
  3. Web Audio API Spec: W3C documentation

Related Technologies

  • Web Audio API: Native browser audio processing
  • Tone.js: High-level Web Audio wrapper
  • AudioWorklet: Custom DSP in Web Audio (advanced)
  • WebAssembly: High-performance audio processing

🚧 Future Enhancements

Potential Features

  1. Real-time preview: Process and preview while uploading
  2. Batch processing: Process multiple files at once
  3. Custom presets: Save/load processing settings
  4. A/B comparison: Toggle between original and processed
  5. Spectrum analyzer: Visual feedback during processing
  6. LUFS metering: True loudness normalization
  7. Export formats: MP3, OGG, FLAC export options
  8. Cloud processing: Offload to server for faster processing

Technical Improvements

  1. AudioWorklet: Move processing to separate thread
  2. Web Workers: Parallelize normalization/WAV conversion
  3. Progressive loading: Stream large files instead of loading entirely
  4. Waveform visualization: Canvas-based waveform display
  5. Undo/Redo: Processing history and rollback

💡 Tips for Audio Engineers

When to Use This Tool

Good for:

  • Podcast vocals
  • YouTube voiceovers
  • Social media content
  • Quick vocal cleanup
  • Series of similar recordings that should share the same settings (processed one file at a time)

Not ideal for:

  • Music mixing (too aggressive)
  • Mastering (needs more control)
  • Broadcast (requires LUFS compliance)
  • Live performance (needs real-time processing)

Comparison to Professional Tools

Similar to:

  • iZotope RX's Voice De-noise
  • Adobe Podcast Enhance
  • Descript Studio Sound
  • Auphonic (but simpler)

Advantages:

  • Free and open-source
  • Runs entirely in browser
  • No server round trip (audio is processed locally, so nothing is uploaded)
  • Customizable code

Limitations:

  • Less sophisticated algorithms
  • No AI/ML enhancement
  • Limited format support
  • No spectral editing

📄 License & Attribution

License: MIT

Dependencies:

  • React: MIT
  • Tone.js: MIT
  • Lucide React: ISC
  • Tailwind CSS: MIT

Credits:

  • Tone.js by Yotam Mann
  • Web Audio API by W3C
  • Audio engineering principles: Industry standard practices

Built with ❤️ for audio engineers and content creators