| 1 | +# DNA Kit Import and Triangulation Features |
| 2 | + |
| 3 | +This document describes the enhanced DNA kit import and triangulation features for the laravel-dna library integration. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The genealogy-laravel application now includes comprehensive DNA kit import and triangulation capabilities that extend the base functionality of the liberu-genealogy/laravel-dna package. |
| 8 | + |
| 9 | +## New Features |
| 10 | + |
| 11 | +### 1. Bulk DNA Kit Import |
| 12 | + |
| 13 | +Import multiple DNA kits at once with automatic file format detection and validation. |
| 14 | + |
| 15 | +**Supported Formats:** |
| 16 | +- 23andMe |
| 17 | +- AncestryDNA |
| 18 | +- MyHeritage |
| 19 | +- FamilyTreeDNA |
| 20 | +- Generic CSV/TSV formats |
| 21 | + |
| 22 | +**Usage:** |
| 23 | + |
| 24 | +```bash |
| 25 | +# Import from a directory |
| 26 | +php artisan dna:import {user_id} --directory=path/to/dna/files |
| 27 | + |
| 28 | +# Import specific files |
| 29 | +php artisan dna:import {user_id} --files=file1.txt --files=file2.txt |
| 30 | + |
| 31 | +# Import without automatic matching |
| 32 | +php artisan dna:import {user_id} --directory=path/to/files --no-match |
| 33 | +``` |
| 34 | + |
| 35 | +**Features:** |
| 36 | +- Automatic file format detection |
| 37 | +- File validation (size, format, SNP count) |
| 38 | +- Progress tracking with progress bar |
| 39 | +- Detailed import statistics |
| 40 | +- Error handling with failed file reporting |
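The validation checks listed above (size, format, SNP count) could look roughly like the sketch below. It is not the `DnaImportService` implementation: the function name, the limits, and the returned array shape are all assumptions chosen for illustration, and it works on a string rather than a file path so it can be tried in isolation.

```php
<?php

/**
 * Hypothetical validator: checks raw file content against a size limit
 * and a minimum SNP count. Both limits are illustrative defaults only.
 *
 * @return array{valid: bool, snp_count: int, errors: string[]}
 */
function validateDnaContent(
    string $content,
    int $minSnps = 1000,
    int $maxBytes = 50 * 1024 * 1024
): array {
    $errors = [];

    if ($content === '') {
        $errors[] = 'File is empty';
    }
    if (strlen($content) > $maxBytes) {
        $errors[] = 'File is too large';
    }

    // Count data rows that start with an rsID (optionally quoted),
    // followed by a tab or comma. Comment lines starting with "#" are ignored.
    $snpCount = (int) preg_match_all('/^"?rs\d+"?[\t,]/m', $content);
    if ($snpCount < $minSnps) {
        $errors[] = "Too few SNPs: found {$snpCount}, expected at least {$minSnps}";
    }

    return ['valid' => $errors === [], 'snp_count' => $snpCount, 'errors' => $errors];
}
```

Returning every error at once, rather than stopping at the first, fits the "failed file reporting" described above.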
| 41 | + |
| 42 | +### 2. DNA Triangulation |
| 43 | + |
| 44 | +Match one DNA kit against many or perform three-way triangulation to find shared segments. |
| 45 | + |
| 46 | +#### One-to-Many Triangulation |
| 47 | + |
| 48 | +Match a single DNA kit against all other kits or a specific subset: |
| 49 | + |
| 50 | +```bash |
| 51 | +# Match against all kits |
| 52 | +php artisan dna:triangulate {base_kit_id} |
| 53 | + |
| 54 | +# Match against specific kits |
| 55 | +php artisan dna:triangulate {base_kit_id} --kits=2 --kits=3 --kits=4 |
| 56 | + |
| 57 | +# Set minimum cM threshold |
| 58 | +php artisan dna:triangulate {base_kit_id} --min-cm=50 |
| 59 | + |
| 60 | +# Store results in database |
| 61 | +php artisan dna:triangulate {base_kit_id} --store |
| 62 | +``` |
| 63 | + |
| 64 | +**Output:** |
| 65 | +- List of significant matches sorted by shared cM |
| 66 | +- Relationship predictions with confidence levels |
| 67 | +- Match quality scores |
| 68 | +- Chromosome breakdowns |
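As a rough illustration of how a minimum cM threshold and the sort order interact, the sketch below filters a list of match rows and orders them by shared cM, largest first. The function name, the `kit_id` key, and the inclusive threshold are assumptions; only `total_shared_cm` is taken from the schema described in this document.

```php
<?php

/**
 * Hypothetical helper: keep matches at or above the threshold and
 * sort them by total shared cM in descending order.
 *
 * @param array<int, array{kit_id: int, total_shared_cm: float}> $matches
 */
function significantMatches(array $matches, float $minSharedCm = 20.0): array
{
    $kept = array_filter(
        $matches,
        fn (array $m): bool => $m['total_shared_cm'] >= $minSharedCm
    );

    // usort() reindexes the array, so the result is a clean list.
    usort(
        $kept,
        fn (array $a, array $b): int => $b['total_shared_cm'] <=> $a['total_shared_cm']
    );

    return $kept;
}
```

Raising the threshold shrinks the list before sorting, which is one reason a higher `--min-cm` value can reduce the amount of output to review.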
| 69 | + |
| 70 | +#### Three-Way Triangulation |
| 71 | + |
| 72 | +Find shared segments among three DNA kits: |
| 73 | + |
| 74 | +```bash |
| 75 | +php artisan dna:triangulate {base_kit_id} --three-way --three-way-kits=1 --three-way-kits=2 --three-way-kits=3 |
| 76 | +``` |
| 77 | + |
| 78 | +**Output:** |
| 79 | +- Pairwise match results for all three combinations |
| 80 | +- Triangulated chromosomes (where all three share DNA) |
| 81 | +- Triangulation score based on minimum shared cM |
| 82 | +- Detailed chromosome breakdown |
| 83 | + |
| 84 | +### 3. Services |
| 85 | + |
| 86 | +#### DnaImportService |
| 87 | + |
| 88 | +Provides programmatic access to DNA import functionality: |
| 89 | + |
| 90 | +```php |
| 91 | +use App\Services\DnaImportService; |
| 92 | + |
| 93 | +$importService = app(DnaImportService::class); |
| 94 | + |
| 95 | +// Import single kit |
| 96 | +$result = $importService->importSingleKit('path/to/file.txt', $userId, true); // third argument ($autoMatch) enables automatic matching |
| 97 | + |
| 98 | +// Import multiple kits |
| 99 | +$results = $importService->importMultipleKits(['file1.txt', 'file2.txt'], $userId); |
| 100 | + |
| 101 | +// Validate file format |
| 102 | +$validation = $importService->validateDnaFile('path/to/file.txt'); |
| 103 | + |
| 104 | +// Get import statistics |
| 105 | +$stats = $importService->getImportStatistics($userId); |
| 106 | +``` |
| 107 | + |
| 108 | +#### DnaTriangulationService |
| 109 | + |
| 110 | +Provides programmatic access to triangulation functionality: |
| 111 | + |
| 112 | +```php |
| 113 | +use App\Services\DnaTriangulationService; |
| 114 | + |
| 115 | +$triangulationService = app(DnaTriangulationService::class); |
| 116 | + |
| 117 | +// One-to-many triangulation |
| 118 | +$results = $triangulationService->triangulateOneAgainstMany( |
| 119 | + $baseKitId, |
| 120 | + null, // $compareKitIds: null = all kits |
| 121 | + 20.0 // $minSharedCm: minimum shared cM |
| 122 | +); |
| 123 | + |
| 124 | +// Three-way triangulation |
| 125 | +$results = $triangulationService->triangulateThreeWay($kit1Id, $kit2Id, $kit3Id); |
| 126 | + |
| 127 | +// Find triangulated groups |
| 128 | +$groups = $triangulationService->findTriangulatedGroups([$kit1Id, $kit2Id, $kit3Id, $kit4Id]); |
| 129 | + |
| 130 | +// Store results |
| 131 | +$triangulationService->storeTriangulationResults($results, 'one_to_many'); |
| 132 | +``` |
| 133 | + |
| 134 | +### 4. DNA Module Services |
| 135 | + |
| 136 | +Access DNA functionality through the module system: |
| 137 | + |
| 138 | +```php |
| 139 | +// Get DNA service |
| 140 | +$dnaService = app('genealogy.dna'); |
| 141 | + |
| 142 | +// Access import service |
| 143 | +$importService = $dnaService->import(); |
| 144 | + |
| 145 | +// Access triangulation service |
| 146 | +$triangulationService = $dnaService->triangulate(); |
| 147 | + |
| 148 | +// Access matching service |
| 149 | +$matchingService = $dnaService->match(); |
| 150 | +``` |
| 151 | + |
| 152 | +## File Format Detection |
| 153 | + |
| 154 | +The import service automatically detects DNA file formats based on header content: |
| 155 | + |
| 156 | +- **23andMe**: Identified by "# This data file generated by 23andMe" header |
| 157 | +- **AncestryDNA**: Identified by "rsid" and "chromosome" column headers |
| 158 | +- **MyHeritage**: Identified by "RSID" and "Chr" column headers |
| 159 | +- **FamilyTreeDNA**: Identified by uppercase "RSID" and "CHROMOSOME" headers |
| 160 | +- **Generic**: Any file containing rsid patterns (rs\d+) |
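The rules above can be sketched as a single function over the first few lines of a file. This is an illustration under assumptions, not the import service's code: the function name and the returned format keys are hypothetical, and the checks are case-sensitive so that the AncestryDNA, MyHeritage, and FamilyTreeDNA headers stay distinguishable.

```php
<?php

/**
 * Hypothetical helper: guess the vendor format from header text.
 * The 23andMe marker is checked first so that a more general rule
 * further down cannot claim the file.
 */
function detectDnaFormat(string $header): string
{
    if (str_contains($header, '# This data file generated by 23andMe')) {
        return '23andme';
    }
    if (str_contains($header, 'RSID') && str_contains($header, 'CHROMOSOME')) {
        return 'familytreedna';
    }
    if (str_contains($header, 'RSID') && str_contains($header, 'Chr')) {
        return 'myheritage';
    }
    if (str_contains($header, 'rsid') && str_contains($header, 'chromosome')) {
        return 'ancestrydna';
    }
    // Fallback: any rsID-like token marks the file as generic.
    if (preg_match('/\brs\d+\b/', $header) === 1) {
        return 'generic';
    }

    return 'unknown';
}
```

`str_contains()` requires PHP 8.0 or later. The `unknown` result is an addition of this sketch, so that callers can reject files that match none of the rules.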
| 161 | + |
| 162 | +## Triangulation Algorithm |
| 163 | + |
| 164 | +The triangulation service works in five steps: |
| 165 | + |
| 166 | +1. **Match pairs**: Compare DNA kits pairwise using the AdvancedDnaMatchingService |
| 167 | +2. **Find shared segments**: Identify IBD (Identical By Descent) segments |
| 168 | +3. **Calculate metrics**: Compute shared centiMorgans, confidence levels, and quality scores |
| 169 | +4. **Detect triangulation**: Find chromosomes where all kits share DNA |
| 170 | +5. **Score matches**: Rank matches by triangulation score |
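Step 4 can be pictured as a set intersection over the pairwise results. The sketch below assumes a hypothetical input shape (one array per kit pair, mapping chromosome number to shared cM). The function name and the optional `$minCm` cut-off are illustrative and not part of `DnaTriangulationService`.

```php
<?php

/**
 * Hypothetical helper: return the chromosomes on which every kit pair
 * shares more than $minCm.
 *
 * @param array<string, array<int, float>> $pairwise pair label => [chromosome => shared cM]
 * @return int[]
 */
function triangulatedChromosomes(array $pairwise, float $minCm = 0.0): array
{
    if ($pairwise === []) {
        return [];
    }

    $sets = [];
    foreach ($pairwise as $byChromosome) {
        // Keep only chromosomes with sharing above the cut-off for this pair.
        $kept = array_filter($byChromosome, fn ($cm) => $cm > $minCm);
        $sets[] = array_keys($kept);
    }

    // A chromosome counts only if it appears in every pair's set.
    $common = array_values(array_intersect(...$sets));
    sort($common);

    return $common;
}
```

This works at chromosome level, matching the description above. Comparing segment start and end positions would be a stricter test, but the source does not describe one.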
| 171 | + |
| 172 | +### Triangulation Score |
| 173 | + |
| 174 | +The triangulation score represents the sum of minimum shared cM across all triangulated chromosomes. Higher scores indicate stronger triangulation evidence. |
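One possible reading of this definition is sketched below: for each chromosome shared by every pair, take the smallest pairwise cM value, then add those minima together. The input shape and function name are assumptions, and the actual service may compute the score differently.

```php
<?php

/**
 * Hypothetical scoring helper.
 *
 * @param array<string, array<int, float>> $pairwise pair label => [chromosome => shared cM]
 */
function triangulationScore(array $pairwise): float
{
    if ($pairwise === []) {
        return 0.0;
    }

    $score = 0.0;
    $first = reset($pairwise);

    foreach (array_keys($first) as $chromosome) {
        $values = [];
        foreach ($pairwise as $byChromosome) {
            if (!isset($byChromosome[$chromosome]) || $byChromosome[$chromosome] <= 0) {
                continue 2; // not shared by every pair, so not triangulated
            }
            $values[] = $byChromosome[$chromosome];
        }
        // The weakest pairwise link limits the evidence on this chromosome.
        $score += min($values);
    }

    return $score;
}
```

For example, if three pairs share 30, 22, and 18 cM on chromosome 1 and 12.5, 15, and 9 cM on chromosome 7, this reading gives 18 + 9 = 27.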
| 175 | + |
| 176 | +## Testing |
| 177 | + |
| 178 | +Comprehensive unit tests are included: |
| 179 | + |
| 180 | +```bash |
| 181 | +# Run DNA import tests |
| 182 | +php artisan test --filter=DnaImportServiceTest |
| 183 | + |
| 184 | +# Run triangulation tests |
| 185 | +php artisan test --filter=DnaTriangulationServiceTest |
| 186 | + |
| 187 | +# Run all DNA tests |
| 188 | +php artisan test tests/Unit/Services/Dna* |
| 189 | +``` |
| 190 | + |
| 191 | +## Database Schema |
| 192 | + |
| 193 | +The enhanced functionality uses the existing `dnas` and `dna_matchings` tables with these key fields: |
| 194 | + |
| 195 | +**dnas table:** |
| 196 | +- `id`: Primary key |
| 197 | +- `name`: Kit name |
| 198 | +- `file_name`: Path to DNA file |
| 199 | +- `variable_name`: Unique identifier (var_xxxxx) |
| 200 | +- `user_id`: Owner of the kit |
| 201 | + |
| 202 | +**dna_matchings table:** |
| 203 | +- `user_id`, `match_id`: User IDs of matched individuals |
| 204 | +- `total_shared_cm`: Total shared centiMorgans |
| 205 | +- `largest_cm_segment`: Largest shared segment |
| 206 | +- `confidence_level`: Confidence percentage (0-100) |
| 207 | +- `predicted_relationship`: Predicted genetic relationship |
| 208 | +- `shared_segments_count`: Number of shared segments |
| 209 | +- `match_quality_score`: Quality score (0-100) |
| 210 | +- `chromosome_breakdown`: JSON data with per-chromosome details |
| 211 | +- `detailed_report`: JSON data with analysis notes |
| 212 | + |
| 213 | +## Error Handling |
| 214 | + |
| 215 | +All services include comprehensive error handling: |
| 216 | + |
| 217 | +- File validation errors are caught and reported |
| 218 | +- Missing or corrupted DNA files are handled gracefully |
| 219 | +- Failed imports are tracked separately from successful ones |
| 220 | +- Database operations are wrapped in try-catch blocks |
| 221 | +- Logging is performed for all errors |
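The pattern of tracking failures separately from successes can be sketched as follows. The function name, the result shape, and the injectable `$log` callable are assumptions for illustration. In a Laravel application the logger would more likely be the `Log` facade.

```php
<?php

/**
 * Hypothetical batch wrapper: runs $importOne for each file and records
 * successes and failures in separate buckets instead of aborting.
 *
 * @param string[] $files
 * @return array{successful: array<string, mixed>, failed: array<string, string>}
 */
function importAll(array $files, callable $importOne, ?callable $log = null): array
{
    $log ??= 'error_log'; // fall back to PHP's built-in error log

    $summary = ['successful' => [], 'failed' => []];

    foreach ($files as $file) {
        try {
            $summary['successful'][$file] = $importOne($file);
        } catch (Throwable $e) {
            // One bad file must not stop the remaining imports.
            $log("DNA import failed for {$file}: {$e->getMessage()}");
            $summary['failed'][$file] = $e->getMessage();
        }
    }

    return $summary;
}
```

Catching `Throwable` rather than `Exception` also covers PHP `Error` subclasses such as `TypeError`, so a parser bug in one file is reported like any other failure.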
| 222 | + |
| 223 | +## Performance Considerations |
| 224 | + |
| 225 | +For large-scale operations: |
| 226 | + |
| 227 | +- Use batch processing for multiple imports |
| 228 | +- Set appropriate minimum cM thresholds to reduce processing time |
| 229 | +- Consider queueing triangulation jobs for large datasets |
| 230 | +- Monitor memory usage when processing many kits |
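A minimal sketch of batch processing, assuming a hypothetical `processInBatches` helper: the file list is split into fixed-size chunks and each chunk is handed to a callback. In a Laravel application the callback could dispatch one queued job per chunk, but that wiring is not shown here and is not taken from the source.

```php
<?php

/**
 * Hypothetical helper: process a list in fixed-size batches so that only
 * one batch needs to be held in working memory at a time.
 *
 * @param string[] $files
 * @return int number of batches handled
 */
function processInBatches(array $files, int $batchSize, callable $handleBatch): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }

    $batches = 0;
    foreach (array_chunk($files, $batchSize) as $batch) {
        $handleBatch($batch);
        $batches++;
    }

    return $batches;
}
```

A suitable batch size depends on file size and available memory, so it is left as a parameter rather than fixed.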
| 231 | + |
| 232 | +## Integration with laravel-dna Package |
| 233 | + |
| 234 | +This implementation complements the liberu-genealogy/laravel-dna package by: |
| 235 | + |
| 236 | +- Using php-dna for SNP loading and parsing |
| 237 | +- Leveraging existing DNA kit structures |
| 238 | +- Maintaining compatibility with package job dispatching |
| 239 | +- Extending functionality without modifying core package code |
| 240 | + |
| 241 | +## Future Enhancements |
| 242 | + |
| 243 | +Planned improvements include: |
| 244 | + |
| 245 | +- UI components for bulk import in Filament |
| 246 | +- Interactive triangulation visualization |
| 247 | +- Cluster detection for triangulated groups |
| 248 | +- Export functionality for triangulation results |
| 249 | +- Integration with family tree matching |