DNA Kit Import and Triangulation Features

This document describes the enhanced DNA kit import and triangulation features for the laravel-dna library integration.

Overview

The genealogy-laravel application now includes DNA kit import and triangulation capabilities that extend the base functionality of the liberu-genealogy/laravel-dna package.

New Features

1. Bulk DNA Kit Import

Import multiple DNA kits at once with automatic file format detection and validation.

Supported Formats:

  • 23andMe
  • AncestryDNA
  • MyHeritage
  • FamilyTreeDNA
  • Generic CSV/TSV formats

Usage:

# Import from a directory
php artisan dna:import {user_id} --directory=path/to/dna/files

# Import specific files
php artisan dna:import {user_id} --files=file1.txt --files=file2.txt

# Import without automatic matching
php artisan dna:import {user_id} --directory=path/to/files --no-match

Features:

  • Automatic file format detection
  • File validation (size, format, SNP count)
  • Progress tracking with progress bar
  • Detailed import statistics
  • Error handling with failed file reporting
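The validation step can be pictured as a few checks on the file contents. The sketch below assumes a hypothetical 50 MB size cap and a 1,000-SNP minimum, and the function name `validateDnaContents` is invented for illustration; the real limits and return shape of `DnaImportService::validateDnaFile` are not documented here.

```php
<?php

declare(strict_types=1);

// Hypothetical limits: the real service's thresholds are not documented here.
const MAX_FILE_BYTES = 50 * 1024 * 1024; // 50 MB
const MIN_SNP_COUNT = 1000;

/**
 * Validate raw DNA file contents: size, presence of rsid rows, and SNP count.
 * Returns a list of error strings; an empty list means the file passed.
 */
function validateDnaContents(string $contents): array
{
    if (strlen($contents) === 0) {
        return ['File is empty'];
    }

    $errors = [];
    if (strlen($contents) > MAX_FILE_BYTES) {
        $errors[] = 'File exceeds maximum size';
    }

    // Count data rows whose first column is an rsid such as "rs4477212".
    $snpCount = 0;
    foreach (preg_split('/\R/', $contents) as $line) {
        if (preg_match('/^"?rs\d+/i', $line) === 1) {
            $snpCount++;
        }
    }

    if ($snpCount === 0) {
        $errors[] = 'No SNP rows found (unrecognized format)';
    } elseif ($snpCount < MIN_SNP_COUNT) {
        $errors[] = "Only {$snpCount} SNPs found, expected at least " . MIN_SNP_COUNT;
    }

    return $errors;
}
```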

2. DNA Triangulation

Match one DNA kit against many or perform three-way triangulation to find shared segments.

One-to-Many Triangulation

Match a single DNA kit against all other kits or a specific subset:

# Match against all kits
php artisan dna:triangulate {base_kit_id}

# Match against specific kits
php artisan dna:triangulate {base_kit_id} --kits=2 --kits=3 --kits=4

# Set minimum cM threshold
php artisan dna:triangulate {base_kit_id} --min-cm=50

# Store results in database
php artisan dna:triangulate {base_kit_id} --store

Output:

  • List of significant matches sorted by shared cM
  • Relationship predictions with confidence levels
  • Match quality scores
  • Chromosome breakdowns
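The `--min-cm` filter and the sort by shared cM can be sketched as below. Each match is assumed to be an array with hypothetical `kit_id` and `total_shared_cm` keys; the 20.0 cM default mirrors the `$minSharedCm` value used in the service example later in this document.

```php
<?php

declare(strict_types=1);

/**
 * Keep matches at or above $minSharedCm and sort them by shared cM, highest first.
 *
 * @param list<array{kit_id: int, total_shared_cm: float}> $matches
 */
function significantMatches(array $matches, float $minSharedCm = 20.0): array
{
    $kept = array_filter(
        $matches,
        fn (array $m): bool => $m['total_shared_cm'] >= $minSharedCm
    );

    // Descending order: the strongest match comes first (usort also reindexes).
    usort($kept, fn (array $a, array $b): int => $b['total_shared_cm'] <=> $a['total_shared_cm']);

    return $kept;
}

// Example: with a 50 cM threshold only kits 3 and 4 remain, kit 4 first.
$matches = [
    ['kit_id' => 2, 'total_shared_cm' => 12.5],
    ['kit_id' => 3, 'total_shared_cm' => 75.0],
    ['kit_id' => 4, 'total_shared_cm' => 880.3],
];
print_r(array_column(significantMatches($matches, 50.0), 'kit_id'));
```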

Three-Way Triangulation

Find shared segments among three DNA kits:

php artisan dna:triangulate {base_kit_id} --three-way --three-way-kits=1 --three-way-kits=2 --three-way-kits=3

Output:

  • Pairwise match results for all three combinations
  • Triangulated chromosomes (where all three share DNA)
  • Triangulation score based on minimum shared cM
  • Detailed chromosome breakdown
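Under the description above, the triangulated chromosomes are the ones that appear in all three pairwise results. A minimal sketch, assuming each pairwise result exposes a hypothetical map of chromosome label to shared cM; note that co-occurrence on the same chromosome is a coarse signal, since stricter triangulation also requires the segments themselves to overlap.

```php
<?php

declare(strict_types=1);

/**
 * Return chromosomes on which all three pairwise matches (A-B, A-C, B-C) share DNA.
 * Each argument maps chromosome label => shared cM on that chromosome.
 *
 * @return list<string>
 */
function triangulatedChromosomes(array $ab, array $ac, array $bc): array
{
    // Chromosomes listed with zero shared cM do not count as shared.
    $shared = fn (array $pair): array => array_keys(array_filter($pair, fn ($cm) => $cm > 0));

    $common = array_intersect($shared($ab), $shared($ac), $shared($bc));

    // PHP turns numeric string keys into ints, so normalize back to strings.
    return array_map('strval', array_values($common));
}

$ab = ['1' => 35.2, '7' => 12.0, 'X' => 8.1];
$ac = ['1' => 20.4, '7' => 0.0, 'X' => 15.0];
$bc = ['1' => 18.9, 'X' => 9.3, '12' => 30.0];
echo implode(', ', triangulatedChromosomes($ab, $ac, $bc)), PHP_EOL; // 1, X
```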

3. Services

DnaImportService

Provides programmatic access to DNA import functionality:

use App\Services\DnaImportService;

$importService = app(DnaImportService::class);

// Import single kit
$result = $importService->importSingleKit('path/to/file.txt', $userId, true); // true = run automatic matching after import

// Import multiple kits
$results = $importService->importMultipleKits(['file1.txt', 'file2.txt'], $userId);

// Validate file format
$validation = $importService->validateDnaFile('path/to/file.txt');

// Get import statistics
$stats = $importService->getImportStatistics($userId);

DnaTriangulationService

Provides programmatic access to triangulation functionality:

use App\Services\DnaTriangulationService;

$triangulationService = app(DnaTriangulationService::class);

// One-to-many triangulation
$oneToManyResults = $triangulationService->triangulateOneAgainstMany(
    $baseKitId,
    null,  // kit IDs to compare against; null = all kits
    20.0   // minimum shared cM
);

// Three-way triangulation
$threeWayResults = $triangulationService->triangulateThreeWay($kit1Id, $kit2Id, $kit3Id);

// Find triangulated groups
$groups = $triangulationService->findTriangulatedGroups([$kit1Id, $kit2Id, $kit3Id, $kit4Id]);

// Store the one-to-many results (the type label must match the results passed in)
$triangulationService->storeTriangulationResults($oneToManyResults, 'one_to_many');
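How `findTriangulatedGroups` builds its groups is not described here. As a hypothetical illustration only, one simple approach enumerates every triple of kits and keeps those in which all three pairs clear a cM threshold; `triplesAboveThreshold` and the `$pairCm` lookup are invented names.

```php
<?php

declare(strict_types=1);

/**
 * Enumerate every triple of kit IDs in which all three pairs share at least $minCm.
 *
 * @param list<int> $kitIds
 * @param callable(int, int): float $pairCm returns shared cM for two kits
 * @return list<array{int, int, int}>
 */
function triplesAboveThreshold(array $kitIds, callable $pairCm, float $minCm = 20.0): array
{
    $groups = [];
    $n = count($kitIds);

    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            // Skip early if the first pair is already too weak.
            if ($pairCm($kitIds[$i], $kitIds[$j]) < $minCm) {
                continue;
            }
            for ($k = $j + 1; $k < $n; $k++) {
                if ($pairCm($kitIds[$i], $kitIds[$k]) >= $minCm
                    && $pairCm($kitIds[$j], $kitIds[$k]) >= $minCm) {
                    $groups[] = [$kitIds[$i], $kitIds[$j], $kitIds[$k]];
                }
            }
        }
    }

    return $groups;
}
```

The triple loop grows cubically with the number of kits, which is one reason to pass an explicit kit list rather than every kit.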

4. DNA Module Services

Access DNA functionality through the module system:

// Get DNA service
$dnaService = app('genealogy.dna');

// Access import service
$importService = $dnaService->import();

// Access triangulation service
$triangulationService = $dnaService->triangulate();

// Access matching service
$matchingService = $dnaService->match();

File Format Detection

The import service automatically detects DNA file formats based on header content:

  • 23andMe: Identified by "# This data file generated by 23andMe" header
  • AncestryDNA: Identified by "rsid" and "chromosome" column headers
  • MyHeritage: Identified by "RSID" and "Chr" column headers
  • FamilyTreeDNA: Identified by uppercase "RSID" and "CHROMOSOME" headers
  • Generic: Any file containing rsid patterns (rs\d+)
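The header rules above translate into an ordered series of string checks. This sketch assumes PHP 8 (`str_contains`) and returns invented format labels such as `'familytreedna'`; the service's actual identifiers may differ. Case-sensitive matching is what separates the uppercase FamilyTreeDNA and MyHeritage headers from AncestryDNA's lowercase ones, and the FamilyTreeDNA check runs before MyHeritage because both use `RSID`.

```php
<?php

declare(strict_types=1);

/**
 * Guess the vendor format from the first lines of a DNA file.
 * Checks run in order, so the most specific rule wins.
 */
function detectDnaFormat(string $header): string
{
    if (str_contains($header, '# This data file generated by 23andMe')) {
        return '23andme';
    }
    // Case-sensitive: all-caps RSID + CHROMOSOME must be tested before RSID + Chr.
    if (str_contains($header, 'RSID') && str_contains($header, 'CHROMOSOME')) {
        return 'familytreedna';
    }
    if (str_contains($header, 'RSID') && str_contains($header, 'Chr')) {
        return 'myheritage';
    }
    if (str_contains($header, 'rsid') && str_contains($header, 'chromosome')) {
        return 'ancestrydna';
    }
    // Fallback: any rsid-looking token such as "rs4477212".
    if (preg_match('/\brs\d+/', $header) === 1) {
        return 'generic';
    }

    return 'unknown';
}
```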

Triangulation Algorithm

The triangulation service works in five steps:

  1. Match pairs: Compare DNA kits pairwise using the AdvancedDnaMatchingService
  2. Find shared segments: Identify identical-by-descent (IBD) segments
  3. Calculate metrics: Compute shared centiMorgans, confidence levels, and quality scores
  4. Detect triangulation: Find chromosomes where all kits share DNA
  5. Score matches: Rank matches by triangulation score
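Step 2 is commonly approached by scanning for half-identical regions: runs of consecutive SNPs where the two kits share at least one allele at every position, broken only by opposite homozygotes (for example `AA` vs `GG`). The sketch below applies that technique to one chromosome. It assumes SNPs are already sorted by position and carry a genetic-map position in cM (the hypothetical `cm` key), and its 500-SNP and 7 cM defaults are illustrative; it is not necessarily what AdvancedDnaMatchingService does.

```php
<?php

declare(strict_types=1);

/** True when two genotypes such as "AG" and "GG" share at least one allele. */
function sharesAllele(string $g1, string $g2): bool
{
    // Treat no-calls ("--") as compatible so they do not break a segment.
    if ($g1 === '--' || $g2 === '--') {
        return true;
    }
    return count(array_intersect(str_split($g1), str_split($g2))) > 0;
}

/**
 * Find half-identical segments on one chromosome.
 *
 * @param list<array{cm: float, a: string, b: string}> $snps sorted by position;
 *        'a' and 'b' are the two kits' genotypes at that SNP
 * @return list<array{start_cm: float, end_cm: float, length_cm: float}>
 */
function halfIdenticalSegments(array $snps, int $minSnps = 500, float $minCm = 7.0): array
{
    $segments = [];
    $start = null; // index where the current compatible run began
    $n = count($snps);

    // Iterate one past the end so the final run is closed as well.
    for ($i = 0; $i <= $n; $i++) {
        $compatible = $i < $n && sharesAllele($snps[$i]['a'], $snps[$i]['b']);

        if ($compatible) {
            $start ??= $i;
            continue;
        }

        if ($start !== null) {
            $end = $i - 1;
            $length = $snps[$end]['cm'] - $snps[$start]['cm'];
            if ($end - $start + 1 >= $minSnps && $length >= $minCm) {
                $segments[] = [
                    'start_cm' => $snps[$start]['cm'],
                    'end_cm' => $snps[$end]['cm'],
                    'length_cm' => $length,
                ];
            }
            $start = null;
        }
    }

    return $segments;
}
```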

Triangulation Score

The triangulation score represents the sum of minimum shared cM across all triangulated chromosomes. Higher scores indicate stronger triangulation evidence.
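Reading "minimum shared cM" as the weakest of the three pairwise values on each triangulated chromosome, the score can be sketched as follows. The input shape (chromosome label to a list of pairwise cM values) is a hypothetical one chosen for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Sum, over triangulated chromosomes, the weakest pairwise shared cM.
 *
 * @param array<string, list<float>> $pairwiseCmByChromosome
 *        e.g. ['1' => [35.0, 20.5, 18.5]] for pairs A-B, A-C, B-C
 */
function triangulationScore(array $pairwiseCmByChromosome): float
{
    $score = 0.0;
    foreach ($pairwiseCmByChromosome as $values) {
        if ($values === []) {
            continue; // min() would throw on an empty list
        }
        // The weakest pair bounds how strongly all three kits share this chromosome.
        $score += min($values);
    }
    return $score;
}

echo triangulationScore(['1' => [35.0, 20.5, 18.5], 'X' => [8.25, 15.0, 9.5]]), PHP_EOL; // 26.75
```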

Testing

Unit tests are included:

# Run DNA import tests
php artisan test --filter=DnaImportServiceTest

# Run triangulation tests
php artisan test --filter=DnaTriangulationServiceTest

# Run all DNA tests
php artisan test tests/Unit/Services/Dna*

Database Schema

The enhanced functionality uses the existing dnas and dna_matchings tables with these key fields:

dnas table:

  • id: Primary key
  • name: Kit name
  • file_name: Path to DNA file
  • variable_name: Unique identifier (var_xxxxx)
  • user_id: Owner of the kit

dna_matchings table:

  • user_id, match_id: User IDs of matched individuals
  • total_shared_cm: Total shared centiMorgans
  • largest_cm_segment: Largest shared segment
  • confidence_level: Confidence percentage (0-100)
  • predicted_relationship: Predicted genetic relationship
  • shared_segments_count: Number of shared segments
  • match_quality_score: Quality score (0-100)
  • chromosome_breakdown: JSON data with per-chromosome details
  • detailed_report: JSON data with analysis notes
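Several of the dna_matchings columns are plain aggregates of a pair's shared segments. Assuming a hypothetical segment shape with `chromosome` and `cm` keys, the sketch derives `total_shared_cm`, `largest_cm_segment`, `shared_segments_count`, and a JSON `chromosome_breakdown`. Confidence, relationship, and quality score are left out because their formulas are not documented here, and the breakdown's inner keys (`segments`, `total_cm`) are invented.

```php
<?php

declare(strict_types=1);

/**
 * Aggregate shared segments into dna_matchings-style summary fields.
 *
 * @param list<array{chromosome: string, cm: float}> $segments
 */
function summarizeSegments(array $segments): array
{
    $breakdown = [];
    foreach ($segments as $seg) {
        $chr = $seg['chromosome'];
        $breakdown[$chr]['segments'] = ($breakdown[$chr]['segments'] ?? 0) + 1;
        $breakdown[$chr]['total_cm'] = ($breakdown[$chr]['total_cm'] ?? 0.0) + $seg['cm'];
    }

    $lengths = array_column($segments, 'cm');

    return [
        'total_shared_cm' => (float) array_sum($lengths),
        'largest_cm_segment' => $lengths === [] ? 0.0 : max($lengths),
        'shared_segments_count' => count($segments),
        // Encoded as a JSON object for the chromosome_breakdown column.
        'chromosome_breakdown' => json_encode($breakdown, JSON_FORCE_OBJECT | JSON_THROW_ON_ERROR),
    ];
}
```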

Error Handling

The services handle errors as follows:

  • File validation errors are caught and reported
  • Missing or corrupted DNA files are handled gracefully
  • Failed imports are tracked separately from successful ones
  • Database operations are wrapped in try-catch blocks
  • Logging is performed for all errors
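Tracking failed imports separately usually means catching exceptions per file so that one bad file does not abort the batch. In this sketch `$importOne` is a hypothetical stand-in for the single-file import; in the application the caught errors would also be logged.

```php
<?php

declare(strict_types=1);

/**
 * Import each file independently and collect successes and failures separately.
 *
 * @param list<string> $paths
 * @param callable(string): int $importOne imports one file, returns the new kit ID, throws on failure
 * @return array{imported: array<string, int>, failed: array<string, string>}
 */
function importAll(array $paths, callable $importOne): array
{
    $result = ['imported' => [], 'failed' => []];

    foreach ($paths as $path) {
        try {
            $result['imported'][$path] = $importOne($path);
        } catch (Throwable $e) {
            // Record the reason and continue with the remaining files.
            $result['failed'][$path] = $e->getMessage();
        }
    }

    return $result;
}
```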

Performance Considerations

For large-scale operations:

  • Use batch processing for multiple imports
  • Set appropriate minimum cM thresholds to reduce processing time
  • Consider queueing triangulation jobs for large datasets
  • Monitor memory usage when processing many kits
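Batch processing with memory monitoring can be sketched with `array_chunk` and `memory_get_peak_usage`. The batch size of 25 and the `processInBatches` name are hypothetical; for large datasets each batch could instead be dispatched as a queued job.

```php
<?php

declare(strict_types=1);

/**
 * Run $processBatch over fixed-size chunks of $paths and record peak memory
 * (in bytes) after each chunk, so growth across batches is easy to spot.
 *
 * @param list<string> $paths
 * @param callable(list<string>): void $processBatch
 * @return list<int> peak memory in bytes after each batch
 */
function processInBatches(array $paths, callable $processBatch, int $batchSize = 25): array
{
    $peaks = [];
    foreach (array_chunk($paths, $batchSize) as $batch) {
        $processBatch($batch);

        // Collect cyclic garbage between batches, then note the high-water mark.
        gc_collect_cycles();
        $peaks[] = memory_get_peak_usage(true);
    }
    return $peaks;
}
```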

Integration with laravel-dna Package

This implementation complements the liberu-genealogy/laravel-dna package by:

  • Using php-dna for SNP loading and parsing
  • Leveraging existing DNA kit structures
  • Maintaining compatibility with package job dispatching
  • Extending functionality without modifying core package code

Future Enhancements

Planned improvements include:

  • UI components for bulk import in Filament
  • Interactive triangulation visualization
  • Cluster detection for triangulated groups
  • Export functionality for triangulation results
  • Integration with family tree matching