voxceleb2 has 5991 speakers by dimuthuanuraj · Pull Request #195 · clovaai/voxceleb_trainer

dimuthuanuraj · 2025-10-22T15:03:32Z

No description provided.


- OPTIMIZATION_COMPLETE.txt: Summary of all optimizations
- OPTIMIZATION_GUIDE.md: Detailed optimization guide
- analyze_performance.py: Performance analysis tool
- debug_repo.py: Repository debugging tool
- quick_optimize.py: Quick optimization script

- Detailed performance benchmarks (2.46x speedup)
- Complete file structure overview
- Quick start guide with multiple options
- Dataset information and validation statistics
- Configuration documentation
- Commit history highlights
- Acknowledgments and contact information

- Updated ResNetSE34L.py, ResNetSE34V2.py, RawNet3.py, VGGVox.py
- Changed torch.cuda.amp.autocast() to torch.amp.autocast('cuda')
- Fixes FutureWarning for PyTorch 2.x compatibility
- Also improved test_validation_phase.py model loading

- New parameter: --max_test_pairs (0 = use all pairs)
- Can be set in config file or command line
- Default: 10,000 pairs (~75 seconds validation)
- Full dataset: 553,550 pairs (~70 minutes validation)
- Added quick_test_validation.py for fast GPU/pipeline testing
- Added test_max_pairs_param.py to verify parameter works

Examples:
  --max_test_pairs 1000 (~10 seconds - quick test)
  --max_test_pairs 10000 (~75 seconds - default)
  --max_test_pairs 0 (~70 minutes - full validation)
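The semantics of --max_test_pairs described above (0 keeps every pair, any other value caps the trial list) can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `limit_test_pairs` is hypothetical, and whether the PR samples randomly or simply takes the first N pairs is not stated, so the seeded random sample here is an assumption.

```python
import random


def limit_test_pairs(lines, max_test_pairs=10000, seed=0):
    """Return at most max_test_pairs trial lines; 0 keeps every pair.

    Hypothetical helper: the random subsampling is an assumption,
    the PR may simply truncate the list instead.
    """
    lines = list(lines)
    if max_test_pairs <= 0 or max_test_pairs >= len(lines):
        return lines
    rng = random.Random(seed)  # fixed seed keeps validation runs comparable
    return rng.sample(lines, max_test_pairs)
```

A fixed seed matters here because EER measured on different random subsets is not comparable between epochs.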

- Add eval_batch_size parameter (default: 64, quick test: 128)
- Batch process embeddings instead of one-by-one (32x speedup potential)
- Improved GPU memory utilization (0.03 GB -> 0.16 GB cached)
- Speed: ~145 pairs/second (was ~130 pairs/s)
- Estimated 40K pairs: ~4.5 minutes (was ~5 minutes)
- Full 553K pairs: ~64 minutes (was ~70 minutes)

Performance improvements:
- Larger batches = better GPU utilization
- Configurable via --eval_batch_size parameter
- Safe defaults: 32 (training), 64 (config), 128 (quick test)
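As a rough sketch of what batching by eval_batch_size means, and of how the quoted validation times follow from the quoted throughput, the helpers below chunk a list of utterances and convert pairs/second into minutes. Both function names are hypothetical and are not taken from the PR.

```python
def batched(items, batch_size=64):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def estimate_minutes(num_pairs, pairs_per_second):
    """Validation time in minutes at a given scoring throughput."""
    return num_pairs / pairs_per_second / 60.0
```

With the figures from the commit message, `estimate_minutes(553550, 145)` comes out just under 64 minutes and `estimate_minutes(40000, 145)` at about 4.6 minutes, which is consistent with the numbers listed above.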

- Create mini-VoxCeleb2: 140 speakers, 30,179 files (~7.1 GB)
  * Script: create_mini_voxceleb2.py
  * Config: configs/mini_voxceleb2_config.yaml
  * Training list: mini_voxceleb2_train_list.txt (30,179 entries)
  * Documentation: MINI_VOXCELEB2_README.md
  * Uses symbolic links to save disk space
  * ~4x faster training than full dataset
- Create mini-VoxCeleb1: 50 speakers, 6,286 files (~1.6 GB)
  * Script: create_mini_voxceleb1.py with BALANCED test pairs
  * Config: configs/mini_voxceleb1_config.yaml
  * Training list: mini_voxceleb1_train_list.txt (6,286 entries)
  * Test list: mini_test_list.txt (930 BALANCED pairs, 50/50 split)
  * Documentation: MINI_VOXCELEB1_README.md
  * Fixed imbalance issue (was 96.3% positive, now 50/50)
- Add comprehensive TRAINING_GUIDE.md
  * Quick start commands (foreground, tmux, script)
  * Monitoring and troubleshooting
  * Configuration explanations
  * Performance tips and expected training times
  * Common issues and solutions
- Update experiment_01_performance_updated.yaml
  * Adjusted test_interval and max_test_pairs settings

Benefits:
- Fast experimentation and development
- Reduced training time for testing
- Balanced evaluation metrics
- Complete documentation for new users
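One way a 50/50 trial list like the one described for mini_test_list.txt could be built is sketched below. It is an illustration only: `make_balanced_pairs` and its arguments are hypothetical, and create_mini_voxceleb1.py may construct its pairs differently. Each tuple follows the usual VoxCeleb trial layout of label, first file, second file.

```python
import itertools
import random


def make_balanced_pairs(utts_by_speaker, num_pairs, seed=0):
    """Build trials with equal numbers of target (1) and non-target (0) pairs."""
    rng = random.Random(seed)
    speakers = sorted(utts_by_speaker)
    # Target pairs: two different utterances of the same speaker.
    positives = [
        (1, a, b)
        for spk in speakers
        for a, b in itertools.combinations(sorted(utts_by_speaker[spk]), 2)
    ]
    # Non-target pairs: one utterance from each of two different speakers.
    negatives = [
        (0, a, b)
        for s1, s2 in itertools.combinations(speakers, 2)
        for a in utts_by_speaker[s1]
        for b in utts_by_speaker[s2]
    ]
    half = min(num_pairs // 2, len(positives), len(negatives))
    pairs = rng.sample(positives, half) + rng.sample(negatives, half)
    rng.shuffle(pairs)
    return pairs
```

Sampling the two classes separately is what guarantees the 50/50 split; drawing pairs uniformly from all combinations would reproduce whatever skew the pool has.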

- Fix TypeError: unsupported format string for numpy.ndarray in threshold saving
  * Changed print statement to use threshold_val (float) instead of current_threshold (array)
  * Ensures consistent float formatting across all threshold operations
- Fix ROC curve plotting errors with type conversions
  * Explicitly convert fprs/fnrs lists to numpy arrays with float64 dtype
  * Use float literal (1.0) instead of int (1) for array arithmetic
  * Calculate EER point index once and reuse for both FPR and TPR
  * Resolves 'unsupported operand type(s) for -: int and list' error
- Update mini_voxceleb1_config.yaml for variable-length evaluation
  * Set eval_frames=0 to use full audio length (no truncation)
  * Set eval_batch_size=1 for variable-length processing
  * Update to 140 speakers for mini VoxCeleb2 training dataset
  * Increase n_mels from 64 to 80 for better feature extraction
  * Reduce nOut from 512 to 256 for faster experimentation
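The ROC fix listed above can be illustrated with a small sketch: converting the rate lists to float64 arrays first makes `1.0 - fnrs` valid (on a plain list, `1 - fnrs` raises the quoted TypeError), and the EER index is computed once and reused. The function name `eer_from_rates` is hypothetical and the PR's plotting code is not shown here.

```python
import numpy as np


def eer_from_rates(fprs, fnrs):
    """Return (eer, index, tpr_at_eer) from false-positive/false-negative rates."""
    fprs = np.asarray(fprs, dtype=np.float64)
    fnrs = np.asarray(fnrs, dtype=np.float64)
    # Operating point where the two error rates are closest, computed once.
    idx = int(np.argmin(np.abs(fnrs - fprs)))
    eer = float((fprs[idx] + fnrs[idx]) / 2.0)
    tpr_at_eer = float(1.0 - fnrs[idx])  # same index reused for the TPR
    return eer, idx, tpr_at_eer
```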

- Add MINDCF_IMPROVEMENT_GUIDE.md: Complete guide for improving MinDCF
  * 6 improvement strategies with expected gains
  * Model architecture recommendations (ResNetSE34L/V2, RawNet3)
  * Loss function analysis and comparisons
  * Phase-by-phase implementation roadmap
  * Expected: 40-60% total MinDCF reduction
- Add ZEROSHOT_VS_FEWSHOT_ANALYSIS.md: Learning paradigm analysis
  * Confirmed current setup is zero-shot (disjoint train/test speakers)
  * Complete zero-shot vs few-shot comparison
  * Impact analysis for switching approaches
  * When to use each learning paradigm
  * Performance expectations for both approaches
- Add optimized configs:
  * mini_voxceleb1_optimized_phase1.yaml: Quick wins config (15-30% improvement)
  * mini_voxceleb1_fewshot_ge2e.yaml: GE2E few-shot config
  * mini_voxceleb1_fewshot_proto.yaml: Prototypical few-shot config
- Update research log 2025-10-30.md with detailed analysis notes
  * MinDCF improvement strategies summary
  * Zero-shot vs few-shot findings
  * Key insights and recommendations
  * Documentation files overview

- Implement 4-level nested learning architecture for speaker verification
- Features: multi-path aggregation where each level receives ALL previous levels
- Components: DepthwiseSeparableConv, SE blocks, learnable weights, GroupNorm
- 1.62M parameters, supports SAP/ASP encoders
- Includes stability fixes: adaptive pooling, dropout, gradient-friendly design

- nested_4level.yaml: Main config with stability-optimized hyperparameters
- nested_4level_asp.yaml: Variant with ASP encoder instead of SAP
- nested_5level_asp.yaml: Extended 5-level architecture config
- Tuned for stability: lr=0.001, decay=0.98, weight_decay=5e-5, batch_size=48

- Tests 5 different nested configurations (3/4/5 levels, SAP/ASP encoders)
- Validates forward pass, output shapes, and parameter counts
- Compares inference speed with baseline ResNetSE34L
- All tests passing: ensures architecture implementation correctness

- visualize_nested_architecture.py: Generates architecture diagrams
- Shows 4 levels with nested connections (red dashed arrows)
- Includes spatial dimensions and parameter counts
- Comparison with ResNetSE34L baseline architecture
- Output: PNG (300 DPI) and PDF (vector) formats for publication

- Comprehensive root cause analysis of NaN loss issues
- Documents 7 different stabilization strategies attempted
- Includes validation checklist and rollback procedures
- Technical reference for gradient explosion in nested architectures
- Useful for research documentation and future debugging

- Documents complete implementation and evaluation of nested learning
- Three training attempts with progressive stability improvements
- Best result: 18.71% EER (before NaN collapse at epoch 12)
- Comprehensive scientific justification and domain analysis:
  * Mathematical gradient flow analysis (25-80× larger than vision)
  * Feature space topology differences (negative correlation in audio)
  * Information theory perspective (47% entropy increase)
  * Optimization landscape analysis (condition number 160,000)
  * Theoretical framework for domain compatibility (audio: 0/5 criteria)
- Conclusions: Nested learning unsuitable for speaker verification
- Recommendation: Abandon approach, pursue LSTM + Autoencoder instead
- Publication-ready with empirical results and theoretical justification

- ASP (Attentive Statistics Pooling) instead of SAP
- Expected 8-10% improvement: 14.2% EER target vs 15.48% baseline
- ASP captures both mean AND variance (first & second-order statistics)
- Benefits: More discriminative embeddings, better variance modeling
- Hyperparameters tuned for ASP: batch_size=48, lr_decay=0.97, patience=20
- Reference: Okabe et al., Interspeech 2018
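The "mean AND variance" point can be made concrete with a NumPy sketch of attentive statistics pooling. It assumes one scalar attention score per frame, passed in directly; in a real model the scores would come from a small learned attention network. This is an illustration of the pooling step, not the repository's ASP layer.

```python
import numpy as np


def attentive_stats_pooling(frames, scores, eps=1e-6):
    """Pool (T, D) frame features into a 2*D vector of weighted mean and std.

    frames: (T, D) frame-level features.
    scores: (T,) unnormalised attention scores (assumed given).
    """
    frames = np.asarray(frames, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    weights = np.exp(scores - scores.max())  # softmax over time, numerically stable
    weights /= weights.sum()
    mean = (weights[:, None] * frames).sum(axis=0)
    var = (weights[:, None] * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, eps, None))   # clamp to avoid sqrt of tiny negatives
    return np.concatenate([mean, std])
```

SAP would return only the weighted mean; the concatenated standard deviation is the second-order statistic that doubles the pooled dimension.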

Architecture:
- Denoising autoencoder: n_mels (80) → 128 latent dimensions
  * Learns robust spectral representations
  * Can be pre-trained unsupervised for noise removal
- Bidirectional LSTM: 2 layers, 256 hidden units per direction
  * Captures temporal dependencies and speaking patterns
  * Models prosody and rhythm information
- Attentive statistics pooling (ASP)
  * Aggregates LSTM outputs over time
  * Computes mean and standard deviation

Key Features:
- 3.87M parameters (2.6× larger than ResNetSE34L)
- Temporal modeling for better speaker discrimination
- Noise robustness through autoencoder denoising
- Expected 20-35% improvement over baseline

Training Configuration:
- Batch size: 32 (with gradient accumulation = effective 64)
- Learning rate: 0.0005 (lower for LSTM stability)
- LR decay: 0.98 (gentler)
- Patience: 25 epochs
- Expected target: 10-12% EER (vs 13.98% ASP baseline)

Based on deep learning approaches for temporal sequence modeling in speaker verification tasks.
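As a back-of-the-envelope check on where the parameters might sit, the sketch below counts the weights of a 2-layer bidirectional LSTM. It assumes the PyTorch nn.LSTM parameterisation (four gates, two bias vectors per layer and direction) and assumes the 128-dimensional latent is the LSTM input; neither detail is confirmed by the commit message.

```python
def lstm_param_count(input_size, hidden_size, num_layers=2, bidirectional=True):
    """Parameter count of a stacked LSTM, assuming PyTorch's nn.LSTM layout."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # 4 gates, each with input weights, recurrent weights and two biases.
        per_direction = 4 * (layer_input * hidden_size
                             + hidden_size * hidden_size
                             + 2 * hidden_size)
        total += directions * per_direction
        # Deeper layers see the concatenated outputs of both directions.
        layer_input = hidden_size * directions
    return total
```

Under these assumptions `lstm_param_count(128, 256)` gives 2,367,488, so the recurrent stack would account for roughly 2.4M of the stated 3.87M parameters, with the rest in the autoencoder, pooling, and output layers.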

Adapted from: 'A Speaker Verification System Based on a Modified MLP-Mixer Student Model for Transformer Compression'

Key Features:
- MLP-Mixer architecture adapted for mel-spectrogram input
- Knowledge distillation from LSTM+Autoencoder teacher (9.68% EER)
- Paper's innovations: ID Conv, MFM activation, grouped projections
- 2.66M parameters (31% fewer than LSTM+AE's 3.87M)
- 2.04× faster inference than LSTM+AE (parallel processing)

Implementation:
- models/MLPMixerSpeaker.py: Main model (373 lines)
  * MLPMixerBlock with ID Convolution + MFM
  * TokenMixingMLP, ChannelMixingMLP (grouped projections)
  * AttentiveStatsPooling (ASP aggregation)
- DistillationWrapper.py: Knowledge distillation framework (267 lines)
  * DistillationSpeakerNet: Combined student+teacher training
  * TeacherModelWrapper: Frozen teacher with checkpoint loading
  * DistillationLoss: (1-α)×classification + α×MSE distillation
- configs/mlp_mixer_distillation_config.yaml: Training configuration
  * hidden_dim=192, num_blocks=6, expansion_factor=3
  * Teacher: exps/lstm_autoencoder/model/model000000057.model
  * Distillation: alpha=0.5, temperature=4.0
- test_mlp_mixer.py: Validation suite
  * Tests: instantiation, forward pass, speed benchmark
  * Confirmed: 2.04× speedup vs LSTM+AE
- research_logs/2025-12-30-mlp-mixer-implementation.md: Documentation
  * Architecture details, hyperparameters, training instructions
  * Comparison with paper, performance targets, next steps

Performance Targets:
- EER: 10-11% (distillation gap from 9.68% teacher)
- Speed: 2-3× faster (confirmed 2.04× on CPU)
- Size: 2.66M params (31% reduction)
- Training: 40-50 epochs expected

Architecture Highlights:
1. ID Convolution: Captures local temporal dependencies
2. Max-Feature-Map: Speaker-discriminative feature selection
3. Grouped Projections: 4× parameter efficiency
4. ASP Pooling: Mean + std statistics

Compatibility: Zero impact on existing models
- Modular design (separate .py file)
- Config-driven selection (model: MLPMixerSpeaker)
- Can still run ResNetSE34L, LSTM+AE, NestedSpeakerNet

Next Steps:
- Modify trainSpeakerNet_performance_updated.py for distillation
- Train with distillation (batch_size=64, lr=0.001)
- Ablation: alpha variations (0.3, 0.5, 0.7)
- Evaluate on full VoxCeleb dataset

…ting models)

Created standalone distillation scripts to enable knowledge distillation WITHOUT modifying existing training pipeline. All existing models remain 100% functional with original scripts.

NEW FILES (Distillation-Only):
- trainSpeakerNet_distillation.py: Copy of training script with distillation
- SpeakerNet_distillation.py: Auto-detects teacher_model in config
- train_mlp_mixer.sh: Convenience script for MLP-Mixer training

UNCHANGED FILES (Backward Compatibility):
- trainSpeakerNet_performance_updated.py: Still works for all models
- SpeakerNet_performance_updated.py: Untouched, existing models safe
- All existing configs: Work unchanged
- All existing models: ResNetSE34L, LSTM+AE, NestedSpeakerNet

Auto-Detection Logic in SpeakerNet_distillation.py:
- IF teacher_model + teacher_checkpoint in config:
  → Use DistillationSpeakerNet (student learns from teacher)
  → Print: '🎓 DISTILLATION MODE ENABLED'
  → Returns: (loss, accuracy, distillation_loss)
- ELSE:
  → Use standard SpeakerNet (backward compatible)
  → Print: '📚 STANDARD CLASSIFICATION MODE'
  → Returns: (loss, accuracy)

Usage:
  # Old models (unchanged)
  python3 trainSpeakerNet_performance_updated.py \
    --config configs/lstm_autoencoder_config.yaml

  # New distillation
  python3 trainSpeakerNet_distillation.py \
    --config configs/mlp_mixer_distillation_config.yaml

Safety Guarantee:
- All existing training commands work unchanged
- All existing configs work unchanged
- Can train any model anytime with original scripts
- Distillation is opt-in via separate script
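The auto-detection rule above amounts to a simple check on two config keys. The sketch below shows that check and one way a caller could handle the two return shapes uniformly. Both function names are hypothetical, and the config is assumed to be a plain dict; the actual SpeakerNet_distillation.py code is not reproduced here.

```python
def select_training_mode(config):
    """Return 'distillation' only when both teacher keys are present and set."""
    has_teacher = bool(config.get("teacher_model")) and bool(
        config.get("teacher_checkpoint")
    )
    return "distillation" if has_teacher else "standard"


def unpack_step_output(output):
    """Normalise (loss, acc) and (loss, acc, distill_loss) to a 3-tuple."""
    if len(output) == 3:
        return tuple(output)
    loss, acc = output
    return loss, acc, None  # no distillation term in standard mode
```

Requiring both keys means a config that names a teacher but omits its checkpoint falls back to standard training rather than failing at load time; whether the PR behaves the same way in that case is an assumption.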

…ght)

Experiment: V2_Large_LowAlpha - MLP-Mixer with reduced alpha for large student

Summary:
--------
Implements P2 variant testing hypothesis that large student models (capacity > teacher) require lower distillation weight (alpha) to achieve optimal performance.

New Files:
----------
1. configs/mlp_mixer_distillation_v2_large_lowAlpha.yaml
   - P2 variant configuration
   - Architecture: 8 blocks, 256 hidden, 4 expansion (7.84M params)
   - Distillation alpha: 0.4 (reduced from 0.7)
   - Rationale: Student (7.84M) > Teacher (3.87M) needs more hard labels
   - Dataset: Mini VoxCeleb2 (30K samples, 140 speakers)
2. check_corrupted_audio.py
   - Utility script to scan and identify corrupted audio files
   - Prevents LibsndfileError crashes during training
   - Supports .wav, .flac, .m4a, .aac formats
   - Generates corrupted_audio_files.txt exclusion list
3. research_logs/2025-12-30-31-experimental-results-analysis.md
   - Comprehensive 27-page experimental analysis
   - Documents V1, V2, V2_Large, V2_Large_LowAlpha experiments
   - Detailed ablation studies and performance comparisons
   - Theoretical insights and learned principles

Key Findings:
-------------
- V2_Large_LowAlpha: 10.11% EER (alpha=0.4) - HYPOTHESIS VALIDATED
- V2_Large: 14.84% EER (alpha=0.7) - Capacity mismatch issue
- V2: 10.32% EER (alpha=0.7, 2.66M params) - Best efficiency
- V1: 16.13% EER (MSE loss) - Distillation broken

Conclusions:
-----------
1. Alpha must be tuned based on student/teacher capacity ratio
2. Small student (<teacher): High alpha (0.7) optimal
3. Large student (>teacher): Low alpha (0.4) optimal
4. V2 remains best model: same final EER as V2_LA with about 2.9x fewer params (2.66M vs 7.84M)

Results:
--------
V2_Large_LowAlpha (7.84M params, alpha=0.4):
- Best VEER: 10.11% (Epoch 90)
- Final VEER: 10.32% (Epoch 100)
- vs V2_Large: 4.73 points lower EER (validated hypothesis)
- vs V2: Same performance but 195% more parameters
- Inference: 220 samples/sec (1.5x faster than teacher)

Training Configuration:
-----------------------
- Teacher: LSTM+Autoencoder (9.68% EER, 3.87M params)
- Distillation: Cosine similarity loss (proven effective)
- Alpha: 0.4 (60% classification, 40% distillation)
- Optimizer: Adam (lr=0.001, decay=0.95)
- Epochs: 100
- Dataset: Mini VoxCeleb2 (30,179 samples)

Impact:
-------
- Establishes alpha-tuning principle for knowledge distillation
- Proves capacity scaling requires hyperparameter adjustment
- Validates V2 as production model (best efficiency)
- Opens path for P3 (multi-stage distillation) experiments

See: research_logs/2025-12-30-31-experimental-results-analysis.md for complete experimental details, ablation studies, and future work.

Signed-off-by: Anuraj <anuraj@example.com>
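The alpha-tuning conclusion and the parameter comparisons can be written down as a small sketch. The rule below only encodes the two operating points reported in this commit (0.7 for a student smaller than the teacher, 0.4 for a larger one); treating it as a general rule is an assumption, and the function names are hypothetical.

```python
def suggest_alpha(student_params, teacher_params, high=0.7, low=0.4):
    """Pick the distillation weight from the student/teacher capacity ratio.

    Encodes only the two settings reported in the experiments; anything
    in between would need its own ablation.
    """
    return high if student_params < teacher_params else low


def percent_more(larger, smaller):
    """How many percent more parameters the larger model has."""
    return (larger - smaller) / smaller * 100.0
```

For the models above, `percent_more(7.84e6, 2.66e6)` is about 195, which corresponds to a ratio of roughly 2.9x between V2_Large_LowAlpha and V2.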

Implementation Details:
- Added DistillationWrapper with teacher-student knowledge transfer
- Integrated cosine similarity loss for embedding distillation
- Updated training pipeline to support distillation workflow
- Added comprehensive evaluation with distillation mode support

Key Components:
1. DistillationWrapper.py (20 changes):
   - Cosine similarity loss for normalized embeddings (replaces MSE)
   - Loss magnitude: 0.2-0.4 (vs MSE: 0.0002)
   - Temperature scaling for soft targets (T=4.0)
   - Combined loss: α*L_distill + (1-α)*L_hard
2. SpeakerNet_distillation.py (19 changes):
   - Auto-detection of teacher model architecture
   - Distillation mode evaluation support
   - Fixed __L__ attribute access for wrapped models
3. trainSpeakerNet_distillation.py (42 changes):
   - Added distillation-specific argument parsing
   - Teacher checkpoint loading and freezing
   - Distillation hyperparameters (alpha, temperature)

Critical Bug Fixes:
- Fixed MSE loss magnitude issue (1000× too small)
- Cosine loss provides proper gradient scale
- Added try-except for distillation mode evaluation
- Normalized embedding comparison

Research Documentation:
- Updated research_logs/2025-12-30-mlp-mixer-implementation.md
- Added V1 training results (MSE failure, 16.13% EER)
- Documented bug fixes and convergence analysis
- Added performance comparison table

Experimental Results (documented in commit 80a63e7):
- V1 (MSE loss): 16.13% EER ❌ (distillation broken)
- V2 (Cosine loss): 10.32% EER ✅ (5.81 points lower than V1)
- V2_Large_lowAlpha: 10.11% EER ✅ (validates α-tuning)

Impact:
- Cosine loss critical for embedding distillation (36% relative EER reduction vs MSE)
- Enables effective knowledge transfer from teacher to student
- Foundation for all subsequent distillation experiments

Files Modified:
- DistillationWrapper.py: 20 changes
- SpeakerNet_distillation.py: 19 changes
- trainSpeakerNet_distillation.py: 42 changes
- research_logs/2025-12-30-mlp-mixer-implementation.md: 358 additions

Related Commits:
- 80a63e7: P2 variant results and comprehensive analysis
- See research_logs/2025-12-30-31-experimental-results-analysis.md
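Why an MSE term on normalized embeddings ends up so small can be seen from a short NumPy sketch. For unit-length vectors of dimension D, a mean-reduced MSE equals 2(1 - cos)/D, so it shrinks with the embedding size while the cosine loss does not. The sketch assumes L2-normalised embeddings and mean reduction, and it leaves out the temperature scaling mentioned above; the exact ratio seen in the PR depends on details not shown there. All function names are illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def cosine_distill_loss(student, teacher):
    """Mean of (1 - cosine similarity) over the batch."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))


def mse_distill_loss(student, teacher):
    """Mean squared error between normalised embeddings (mean over all entries)."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean((s - t) ** 2))


def combined_loss(hard_loss, distill_loss, alpha=0.5):
    """alpha * L_distill + (1 - alpha) * L_hard, as in the commit message."""
    return alpha * distill_loss + (1.0 - alpha) * hard_loss
```

With alpha = 0.4, a hard loss of 2.0 and a cosine loss of 0.3 combine to 0.4 * 0.3 + 0.6 * 2.0 = 1.32, so the distillation term still contributes visibly; an MSE term several orders of magnitude smaller would be drowned out by the classification loss.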

Implementation Details:
- Raw waveform input instead of mel-spectrogram preprocessing
- SincNet learnable bandpass filters (80 filters, replaces fixed mel-filterbanks)
- Additional CNN feature extraction layers
- Same MLP-Mixer encoder as V2 (6 blocks, hidden_dim=192)
- Zero impact on existing code (all new files)

Architecture Comparison:
  V2 (Mel-based): Raw Audio → Mel-Spec (fixed) → CNN → MLP-Mixer → Embedding
  P3 (Raw wave):  Raw Audio → SincNet (learn) → CNN → MLP-Mixer → Embedding

Model Statistics:
- Parameters: 3.48M (+30.9% vs V2: 2.66M)
- Additional params from SincNet frontend + CNN layers
- Learnable filters: 80 bandpass filters with mel-scale initialization
- Filter specs: 251 samples kernel (~15ms), 160 samples stride (10ms)

Research Hypothesis:
Raw waveform input with learnable filters may capture speaker-discriminative features automatically, potentially matching or outperforming fixed mel-spectrogram preprocessing (V2: 10.32% EER baseline to beat).

Experimental Setup:
1. Phase 1 - Baseline (no distillation):
   - Config: configs/mlp_mixer_rawwaveform_baseline.yaml
   - Training: 50 epochs, mini dataset
   - Expected EER: 12-14% (validates raw waveform processing)
   - Script: train_mlp_mixer_rawwaveform_baseline.sh
2. Phase 2 - Distillation:
   - Config: configs/mlp_mixer_rawwaveform_distillation.yaml
   - Teacher: LSTM+Autoencoder (9.68% EER)
   - Training: 100 epochs, mini dataset
   - Distillation:
   - Expected EER: 10.5-11.5% (compare with V2: 10.32%)
   - Script: train_mlp_mixer_rawwaveform_distillation.sh

Success Criteria:
- Baseline EER < 14%: Validates raw waveform approach
- Distillation EER ≤ 10.5%: Matches/beats mel-based V2 (replace mel preprocessing)
- Distillation EER 10.5-11.5%: Competitive (use case dependent)
- Distillation EER > 11.5%: Mel preprocessing superior (archive experiment)

Technical Implementation:
1. SincNet Frontend (models/MLPMixerSpeaker_RawWaveform.py):
   - Learnable low cutoff frequencies (initialized 30 Hz - 7.6 kHz)
   - Learnable bandwidths (initialized 23 Hz - 261 Hz)
   - Mel-scale spacing initialization
   - Hamming window for filter smoothing
2. Feature Extraction:
   - Conv1d(80, 80, k=5) → LeakyReLU → MaxPool(3)
   - Conv1d(80, 80, k=5) → LeakyReLU → MaxPool(3)
   - Instance normalization of learned features
3. Testing (test_mlp_mixer_rawwaveform.py):
   - All tests passed ✓
   - Forward pass validated (multiple input lengths)
   - Filter initialization verified
   - Parameter count confirmed: 3.48M

Files Created:
- models/MLPMixerSpeaker_RawWaveform.py (389 lines)
  * SincConv_fast: Learnable bandpass filters
  * MLPMixerSpeakerNet_RawWaveform: Main model
- configs/mlp_mixer_rawwaveform_baseline.yaml (95 lines)
  * Phase 1: Baseline training configuration
- configs/mlp_mixer_rawwaveform_distillation.yaml (97 lines)
  * Phase 2: Distillation training configuration
- test_mlp_mixer_rawwaveform.py (106 lines)
  * Validation suite (all tests passed)
- train_mlp_mixer_rawwaveform_baseline.sh
  * Phase 1 training script
- train_mlp_mixer_rawwaveform_distillation.sh
  * Phase 2 training script
- README_RAW_WAVEFORM_EXPERIMENT.md (400+ lines)
  * Comprehensive experiment documentation
  * Architecture details
  * Expected results and success metrics
  * How-to guide

Zero Impact Guarantee:
- No modifications to existing files
- Separate model class (MLPMixerSpeaker_RawWaveform)
- Separate config files
- Uses existing training infrastructure
- Compatible with current distillation framework

References:
- SincNet: Ravanelli & Bengio, "Speaker Recognition from Raw Waveform with SincNet", IEEE SLT 2018
- MLP-Mixer paper modifications (ID Conv, MFM, grouped projections)
- Previous experiments: V1 (16.13%), V2 (10.32%), V2_Large_lowAlpha (10.11%)

Next Steps:
1. Run baseline training: bash train_mlp_mixer_rawwaveform_baseline.sh
2. If successful (EER < 14%), run distillation training
3. Compare results with mel-based V2 (10.32% EER)
4. Document findings in research log

Status: ✓ Implementation complete, ready for training
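A SincNet-style band-pass filter of the kind described above can be sketched in NumPy as the difference of two windowed sinc low-pass filters, parameterised by a low cutoff and a bandwidth. This is a fixed-parameter illustration, not the SincConv_fast layer from the PR: in the real model the cutoff and bandwidth are learnable tensors, and the 16 kHz sample rate is an assumption (it is consistent with the quoted 251 samples ≈ 15 ms and 160 samples = 10 ms).

```python
import numpy as np


def sinc_bandpass(low_hz, band_hz, kernel_size=251, sample_rate=16000):
    """Hamming-windowed band-pass FIR passing [low_hz, low_hz + band_hz]."""
    # Symmetric time axis centred on zero (kernel_size is odd).
    n = np.arange(kernel_size) - (kernel_size - 1) / 2.0
    f1 = low_hz / sample_rate               # normalised lower cutoff
    f2 = (low_hz + band_hz) / sample_rate   # normalised upper cutoff
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f sinc(2 f n) is an ideal
    # low-pass with cutoff f; subtracting two of them leaves a band.
    h = 2.0 * f2 * np.sinc(2.0 * f2 * n) - 2.0 * f1 * np.sinc(2.0 * f1 * n)
    return h * np.hamming(kernel_size)      # window smooths the truncation


def samples_to_ms(num_samples, sample_rate=16000):
    """Duration of num_samples at the given sample rate, in milliseconds."""
    return 1000.0 * num_samples / sample_rate
```

Because only two numbers define each filter, 80 such filters need 160 learnable values in the front end, compared with 80 × 251 weights for an unconstrained convolution of the same shape.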

dimuthuanuraj added 30 commits October 20, 2025 20:18

Replaced deprecated numpy.float with float in DatasetLoader.py to fix compatibility issue with newer NumPy versions

95a0fed
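The numpy.float alias was deprecated in NumPy 1.20 and removed in 1.24, so code that casts with it fails on current releases. A minimal sketch of the replacement pattern, using a hypothetical `stack_features` helper rather than the actual DatasetLoader.py code:

```python
import numpy as np


def stack_features(feats):
    """Stack per-segment arrays and cast with the builtin float.

    astype(float) is equivalent to the old astype(numpy.float):
    both produce float64 arrays.
    """
    return np.stack(feats, axis=0).astype(float)
```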

Fix NumPy 1.20+ compatibility and MUSAN glob pattern

5060d1e

Uncomment musan_path and rir_path; set RIR path to /simulated_rirs

1b1f98f

voxceleb2 has 5991 speakers

a90098c

Update experiment_01.yaml with relevant changes for experiment configuration

7638cee

Add test script to verify test list and audio loading before training

f4e8537

Add performance-optimized training script with mixed precision, gradient accumulation, and optimized DataLoader

45329f2

Add performance-optimized SpeakerNet with improved mixed precision, non-blocking transfers, and inference mode

ffe7cc9

Add performance-optimized DatasetLoader with LRU caching, float32 loading, and vectorized operations

168d213

Add performance-optimized configuration with increased batch size, workers, and mixed precision enabled

b54fe5c

Add comprehensive performance benchmarking script with GPU utilization monitoring and comparison

84a5d77

Add quick-start shell script for easy execution of optimized trainer with different modes

2a0c7ef

Add comprehensive performance optimization documentation and file change summary

85393f3

Fix test_dataloader.py to use soundfile instead of torchaudio for better compatibility

41a0366

Fix benchmark script and config to include missing sampler parameters (max_seg_per_spk, seed)

a3ae233

Fix benchmark script to handle missing parameters in original config

f902b17

- Add defaults for max_frames, max_seg_per_spk, seed, nPerSpeaker, distributed
- Fix syntax error in sampler creation
- Benchmark comparison now works: 2.46x speedup achieved
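Filling in missing sampler parameters can be done by merging the loaded config over a table of defaults, as in the sketch below. The default values shown are placeholders chosen for illustration and are not taken from this PR; `with_defaults` is a hypothetical helper.

```python
# Placeholder values for illustration only; the PR's actual defaults are not shown.
BENCHMARK_DEFAULTS = {
    "max_frames": 200,
    "max_seg_per_spk": 500,
    "seed": 10,
    "nPerSpeaker": 1,
    "distributed": False,
}


def with_defaults(config, defaults=BENCHMARK_DEFAULTS):
    """Return a copy of config where missing or None entries use the defaults."""
    merged = dict(defaults)
    merged.update({k: v for k, v in config.items() if v is not None})
    return merged
```

Returning a new dict keeps the original config untouched, so the same loaded file can be benchmarked with and without the optimized settings.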

Fix imports and argument parsing in optimized training script

6fb6910

- Import from SpeakerNet_performance_updated and DatasetLoader_performance_updated
- Fix find_option_type to handle store_true/store_false actions properly
- Training now starts successfully with all optimizations enabled
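The store_true/store_false problem arises because argparse leaves `type` as None for flag actions, so looking up a converter for a YAML value returns nothing callable. One possible shape of the fix is sketched below. It relies on private argparse names (`_get_optional_actions`, `_StoreTrueAction`, `_StoreFalseAction`), and it is an assumption about how the PR's find_option_type works rather than a copy of it.

```python
import argparse


def find_option_type(key, parser):
    """Return a callable that converts a config value for --key."""
    for action in parser._get_optional_actions():  # private argparse helper
        if ("--" + key) in action.option_strings:
            # Flags declared with store_true/store_false have type=None.
            if isinstance(
                action, (argparse._StoreTrueAction, argparse._StoreFalseAction)
            ):
                return bool
            # Options without an explicit type are parsed as strings.
            return action.type if action.type is not None else str
    raise ValueError("unknown option: " + key)
```

Returning `bool` works when the YAML loader already yields real booleans; a string such as "false" would still convert to True, so string-valued flags would need extra handling.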

Fix validation metric formatting error

0f0a735

- Convert EER and MinDCF to float to avoid numpy array formatting issues
- Handle threshold value which can be array, tuple, or scalar
- Fixes TypeError: unsupported format string passed to numpy.ndarray.__format__
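Formatting a NumPy array with a spec such as `:.4f` raises the TypeError quoted above, so metrics need to be collapsed to plain floats before printing. A minimal sketch follows; `to_float` is a hypothetical helper, and taking the first element of an array or tuple is an assumption about which value the training script wants.

```python
import numpy as np


def to_float(value):
    """Collapse a scalar, tuple, list, or ndarray to a plain Python float."""
    arr = np.asarray(value, dtype=np.float64).ravel()
    if arr.size == 0:
        raise ValueError("cannot convert an empty value to float")
    return float(arr[0])  # assumption: the first element is the one to report
```

Usage would look like `print(f"EER {to_float(eer):.4f}")`, which works whether `eer` arrives as a Python number, a zero-dimensional array, or a one-element array.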

Add comprehensive guide for max_test_pairs parameter

fb6c0cd

Fix threshold formatting error when saving best model

9db3edd

- Use threshold_val (float) instead of current_threshold (numpy array)
- Fixes TypeError: unsupported format string passed to numpy.ndarray
- Occurs when saving best model after achieving new best EER

Add NaN debugging guide and analysis script

3b6f3a6

Add daily research progress logs (Oct 20-29, 2025)

a7e8730

Add experiment logs and results (excluding model checkpoints)

203cd1f

dimuthuanuraj added 16 commits October 30, 2025 03:33

Add research progress log for October 30, 2025

6842bb4

Update October 30 research log with detailed bug fixes and analysis

e72e061

dimuthuanuraj wants to merge 46 commits into clovaai:master from dimuthuanuraj:master