Implement Jaro-Winkler for unequal genome lengths #113
AlexanderGatesDev wants to merge 8 commits into davidrmiller:main
Add handling for unequal genome lengths using Jaro-Winkler distance.
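Jaro-Winkler similarity is well defined for sequences of different lengths: the match window is derived from the longer sequence, and each sequence's match ratio uses its own length. Below is a minimal Python sketch, assuming a genome can be treated as a sequence of genes compared by exact equality (for instance, tuples of source, sink, and weight). The function name `jaro_winkler` and the defaults (prefix scale 0.1, shared prefix capped at 4 elements) are the conventional textbook choices used for illustration; this is not the code from this pull request.

```python
from typing import Hashable, Sequence


def jaro_winkler(a: Sequence[Hashable], b: Sequence[Hashable],
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity in [0, 1] for two sequences of possibly different lengths."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    # Two elements only count as a match if their positions are within this distance.
    window = max(0, max(len(a), len(b)) // 2 - 1)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, item in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == item:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched elements that appear in a different order count as half-transpositions.
    a_seq = [x for x, m in zip(a, a_matched) if m]
    b_seq = [x for x, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    # Winkler boost: reward a shared prefix of up to max_prefix elements.
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)
```

With strings as input, `jaro_winkler("MARTHA", "MARHTA")` gives about 0.961 and `jaro_winkler("DIXON", "DICKSONX")` gives about 0.813; the second pair shows that sequences of lengths 5 and 8 compare without any special casing.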
Thanks for giving some attention to this bug. As I recall, I was never able to fully support variable-length genomes; colonies with variable-length genomes tended to converge to infinite-length or zero-length genomes. Fragments of those failed experiments remain in the code base. Your fix for the distance measure is a necessary step toward getting that feature working, but I suspect other issues must be addressed before it will work well. Before I merge any changes, let me know whether you are able to evolve stable colonies with variable-length genomes. This project is in minimal maintenance mode, but this would be an interesting improvement if you can get it to work.
Refactor genome comparison functions to use standard algorithms and improve readability. Added a length penalty to the similarity calculation when genomes have unequal lengths.
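The exact penalty used in this commit isn't described here. As one hypothetical option, a similarity score can be scaled by the ratio of the shorter genome length to the longer one, so equal-length genomes keep their score and larger length gaps lower it. The function `length_penalized_similarity` and its `strength` exponent are illustrative assumptions, not the commit's implementation.

```python
def length_penalized_similarity(base_similarity: float, len_a: int, len_b: int,
                                strength: float = 1.0) -> float:
    """Scale a similarity in [0, 1] by (shorter length / longer length) ** strength.

    Hypothetical penalty: equal lengths leave the score unchanged,
    and a larger length gap lowers it.
    """
    if len_a < 0 or len_b < 0:
        raise ValueError("lengths must be non-negative")
    longer = max(len_a, len_b)
    if longer == 0:
        return base_similarity  # two empty genomes: nothing to penalize
    ratio = min(len_a, len_b) / longer
    return base_similarity * ratio ** strength
```

For example, a base similarity of 0.9 between genomes of 30 and 40 genes becomes 0.9 × 0.75 = 0.675 with the default strength. Because the ratio is exactly 1 for equal lengths, this kind of penalty leaves the equal-length behavior untouched.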
Hello David! Thank you for pointing that out. I think I've made some progress with the variable-length genomes, but I'll need to do some more testing to confirm.
Hi, I'm considering implementing connection deduplication, which merges genes that share the same source-sink pair by summing their weights, keeping one connection per pair. This removes the artificial benefit of redundant genes.

Expected impact: this should significantly reduce genome-length growth.

What are your thoughts? I know this project is in maintenance mode, so if it's safer to keep your foundation intact, we can close this pull request and I can keep the changes on my fork.
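The proposed deduplication can be sketched in a few lines. This is a minimal Python sketch, assuming each gene carries a source type and number, a sink type and number, and an integer weight, and that merged weights are clamped to a signed 16-bit range; the `Gene` class, `deduplicate_connections`, and the clamp are hypothetical stand-ins rather than the simulator's real types. If a sink neuron sums weight × input over its incoming connections, two genes with weights w1 and w2 on the same pair contribute (w1 + w2) × input, so merging them leaves that sum unchanged unless the clamp applies.

```python
from dataclasses import dataclass

# Hypothetical weight range; assumes weights are stored as signed 16-bit integers.
INT16_MIN, INT16_MAX = -32768, 32767


@dataclass(frozen=True)
class Gene:
    source_type: int
    source_num: int
    sink_type: int
    sink_num: int
    weight: int


def deduplicate_connections(genome: list[Gene]) -> list[Gene]:
    """Merge genes that share a source-sink pair by summing their weights."""
    merged: dict[tuple[int, int, int, int], int] = {}
    for gene in genome:
        key = (gene.source_type, gene.source_num, gene.sink_type, gene.sink_num)
        merged[key] = merged.get(key, 0) + gene.weight
    # Dicts keep insertion order, so each pair stays at the position of its first occurrence.
    return [
        Gene(*key, weight=max(INT16_MIN, min(INT16_MAX, total)))
        for key, total in merged.items()
    ]
```

For example, a genome containing two genes on the same pair with weights 100 and 25 collapses to a single gene with weight 125, while genes on other pairs pass through unchanged.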
- Added .vscode/ to .gitignore to exclude Visual Studio Code settings from version control.
- Updated LICENSE file to reflect new copyright holder and year.
Added link to Rust refactor repository.
On deduplication: this is an interesting experiment for understanding variable-length genomes, but I'm not sure nature has an analogous guiding hand. Most natural genomes are large and messy, with lots of redundancy from crossovers, replication errors, and duplications of entire sections of DNA. Phenotypes often scale with the number of copies of a gene. Nature imposes a modest cost for genome length, but that cost is often outweighed by the benefit of adjusting copy number to tune expression. If we want to simulate nature as best we can, I think we need to allow duplicates to persist, drift, mutate, or do whatever reproduces best. Adding a small cost for genome length makes sense, but constraining the evolutionary process beyond that moves this imperfect simulator a little further from how nature works. I suspect the solution to unstable variable-length genomes lies in other deficiencies of the simulation. Nevertheless, your experiments are very interesting, and I look forward to hearing more about the results.
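A small genome-length cost, as suggested above, can be applied without removing duplicates. The sketch below is hypothetical: it assumes selection can be expressed as a continuous fitness or survival probability, and `length_adjusted_fitness` and `cost_per_gene` are illustrative names, not part of the simulator.

```python
def length_adjusted_fitness(base_fitness: float, genome_length: int,
                            cost_per_gene: float = 0.001) -> float:
    """Apply a small multiplicative cost per gene; duplicate genes are left in place.

    Hypothetical: assumes selection uses a continuous fitness or survival probability.
    """
    if not 0.0 <= cost_per_gene < 1.0:
        raise ValueError("cost_per_gene must be in [0, 1)")
    if genome_length < 0:
        raise ValueError("genome_length must be non-negative")
    # Each gene multiplies fitness by (1 - cost_per_gene), so long genomes pay more.
    return base_fitness * (1.0 - cost_per_gene) ** genome_length
```

With the default cost of 0.001 per gene, an extra copy of a gene pays for itself only if it raises base fitness by more than a factor of 1/0.999 (about 0.1%), so copy number can still drift or be selected for while runaway length growth is gently discouraged.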
Add handling for unequal genome lengths using Jaro-Winkler distance. #112