@stschiff
Member

stschiff commented Jan 6, 2026

Previously, the type of jPoseidonID in the Janno structure was String. I have now changed it to a newtype based on ByteString. I have also changed GroupName to be based on ByteString, which makes it more compatible with sequence-formats.

I've implemented a smart constructor via Makeable, which now checks for illegal characters.
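
For readers unfamiliar with the pattern, here is a minimal sketch of a ByteString-based newtype with a validating smart constructor. It does not reproduce the actual Makeable class or the real character rules: the function name mkGroupName and the allowed character set (ASCII letters, digits, `_`, `-`, `.`) are assumptions made for illustration only.

```haskell
import qualified Data.ByteString.Char8 as B
import           Data.Char             (isAsciiLower, isAsciiUpper, isDigit)

-- Wrapper type. In real code the data constructor would typically not be
-- exported, so values can only be built through the smart constructor.
newtype GroupName = GroupName B.ByteString deriving (Eq, Ord, Show)

-- ASSUMPTION: this character set is illustrative, not the project's rule.
isAllowedChar :: Char -> Bool
isAllowedChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "_-."

-- Hypothetical smart constructor: rejects empty names and names that
-- contain any character outside the assumed set.
mkGroupName :: B.ByteString -> Either String GroupName
mkGroupName bs
    | B.null bs              = Left "GroupName is empty"
    | B.all isAllowedChar bs = Right (GroupName bs)
    | otherwise              = Left ("GroupName contains invalid characters: " ++ B.unpack bs)
```

Under these assumptions, mkGroupName accepts a name such as Hun_Gepid_P_5th-6th and returns a Left with an error message for Hun/Gepid_P_5th-6th.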

I still have to implement some tests, so not ready for review just yet.

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 59.32203% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.89%. Comparing base (8865d3d) to head (834df04).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Poseidon/ColumnTypesJanno.hs | 64.00% | 2 Missing and 7 partials ⚠️ |
| src/Poseidon/Package.hs | 53.84% | 5 Missing and 1 partial ⚠️ |
| src/Poseidon/ServerHTML.hs | 0.00% | 4 Missing ⚠️ |
| src/Poseidon/CLI/Serve.hs | 0.00% | 2 Missing ⚠️ |
| src/Poseidon/GenotypeData.hs | 50.00% | 2 Missing ⚠️ |
| src/Poseidon/Janno.hs | 85.71% | 1 Missing ⚠️ |

Additional details and impacted files
@@                Coverage Diff                 @@
##           Schema_300_dev     #362      +/-   ##
==================================================
+ Coverage           56.83%   56.89%   +0.05%     
==================================================
  Files                  33       33              
  Lines                4993     5016      +23     
  Branches              546      550       +4     
==================================================
+ Hits                 2838     2854      +16     
- Misses               1609     1612       +3     
- Partials              546      550       +4     

@stschiff
Member Author

stschiff commented Jan 6, 2026

OK, I've added tests. This is ready for review.

stschiff marked this pull request as ready for review January 6, 2026 10:44
stschiff requested a review from nevrome January 6, 2026 10:44
@nevrome
Member

nevrome commented Jan 8, 2026

Nice! The conversions between String, ByteString and Text are sometimes a bit awkward, but I still think you made the right decision by choosing ByteString as the main format. The code looks fine 👍

I ran validate on the archives to see where our datasets stand:

community-archive

...
[Error]   Can't read sample in ./2016_LazaridisNature/2016_LazaridisNature.janno in line 70: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Italian_South(first_degree_relative))
[Error]   Can't read sample in ./2025_GnecchiRuscone_CarpathianBasinHunPeriod/2025_GnecchiRuscone_CarpathianBasinHunPeriod.janno in line 22: parse error (Failed reading: conversion error: GroupName contains invalid characters: Hun/Gepid_P_5th-6th)
[Error]   Can't read sample in ./2025_GnecchiRuscone_CarpathianBasinHunPeriod/2025_GnecchiRuscone_CarpathianBasinHunPeriod.janno in line 23: parse error (Failed reading: conversion error: GroupName contains invalid characters: Hun/Gepid_P_5th-6th)
[Error]   Can't read sample in ./2025_GnecchiRuscone_CarpathianBasinHunPeriod/2025_GnecchiRuscone_CarpathianBasinHunPeriod.janno in line 24: parse error (Failed reading: conversion error: GroupName contains invalid characters: Hun/Gepid_P_5th-6th)
[Error]   Can't read sample in ./2025_GnecchiRuscone_CarpathianBasinHunPeriod/2025_GnecchiRuscone_CarpathianBasinHunPeriod.janno in line 26: parse error (Failed reading: conversion error: GroupName contains invalid characters: Hun/Gepid_P_5th-6th)
[Error]   Can't read sample in ./2025_GnecchiRuscone_CarpathianBasinHunPeriod/2025_GnecchiRuscone_CarpathianBasinHunPeriod.janno in line 28: parse error (Failed reading: conversion error: GroupName contains invalid characters: Hun/Gepid_P_5th-6th)
[Error]   Can't read sample in ./2012_PickrellNatureCommunications/2012_PickrellNatureCommunications.janno in line 8: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Hadza_Henn(relative))
[Error]   Can't read sample in ./2012_PickrellNatureCommunications/2012_PickrellNatureCommunications.janno in line 9: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Hadza_Henn(relative))
[Error]   Can't read sample in ./2014_LazaridisNature/2014_LazaridisNature.janno in line 217: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Tlingit(relative))
[Error]   Can't read sample in ./2014_LazaridisNature/2014_LazaridisNature.janno in line 228: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Even(lt0.95_completeness))
[Error]   Can't read sample in ./2014_LazaridisNature/2014_LazaridisNature.janno in line 242: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Eskimo(PCA_outlier))
[Error]   Can't read sample in ./2014_LazaridisNature/2014_LazaridisNature.janno in line 253: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Eskimo(outlier_groups_with_Naukan))
[Error]   Can't read sample in ./2014_LazaridisNature/2014_LazaridisNature.janno in line 708: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_GujaratiA(PCA_outlier))
[Error]   Can't read sample in ./2012_PattersonGenetics/2012_PattersonGenetics.janno in line 107: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Sindhi_Pakistan(PCA_outlier))
[Error]   Can't read sample in ./2012_PattersonGenetics/2012_PattersonGenetics.janno in line 108: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Sindhi_Pakistan(PCA_outlier))
[Error]   Can't read sample in ./2012_PattersonGenetics/2012_PattersonGenetics.janno in line 110: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Sindhi_Pakistan(PCA_outlier))
[Error]   Can't read sample in ./2012_PattersonGenetics/2012_PattersonGenetics.janno in line 112: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Sindhi_Pakistan(PCA_outlier))
[Error]   Can't read sample in ./2012_PattersonGenetics/2012_PattersonGenetics.janno in line 129: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Pathan(PCA_outlier))
[Error]   Can't read sample in ./2016_Mallick_SGDP1240K_diploid_pulldown/SGDP.janno in line 2: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Han(discovery).DG)
[Error]   Can't read sample in ./2016_Mallick_SGDP1240K_diploid_pulldown/SGDP.janno in line 3: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Karitiana(discovery).DG)
[Error]   Can't read sample in ./2016_Mallick_SGDP1240K_diploid_pulldown/SGDP.janno in line 6: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Papuan(discovery).DG)
[Error]   Can't read sample in ./2019_Jeong_InnerEurasia/2019_Jeong_InnerEurasia.janno in line 28: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Tubalar(PC_outlier))
[Error]   Can't read sample in ./2019_Jeong_InnerEurasia/2019_Jeong_InnerEurasia.janno in line 36: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Altaian(PC_outlier))
[Error]   Can't read sample in ./2019_Jeong_InnerEurasia/2019_Jeong_InnerEurasia.janno in line 73: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Negidal(PC_outlier))
[Error]   Can't read sample in ./2019_Jeong_InnerEurasia/2019_Jeong_InnerEurasia.janno in line 174: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Buryat(PC_outlier))
[Error]   Can't read sample in ./2019_Jeong_InnerEurasia/2019_Jeong_InnerEurasia.janno in line 195: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Buryat(PC_outlier))
[Error]   Can't read sample in ./2012_MeyerScience/2012_MeyerScience.janno in line 2: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Mbuti(discovery).DG)
[Error]   Can't read sample in ./2012_MeyerScience/2012_MeyerScience.janno in line 3: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Yoruba(discovery).DG)
[Error]   Can't read sample in ./2012_MeyerScience/2012_MeyerScience.janno in line 4: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Sardinian(discovery).DG)
[Error]   Can't read sample in ./2012_MeyerScience/2012_MeyerScience.janno in line 5: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_French(discovery).DG)
[Error]   Can't read sample in ./2012_MeyerScience/2012_MeyerScience.janno in line 7: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Ju_hoan_North(discovery).DG)
[Error]   Can't read sample in ./2017_VyasAJPA/2017_VyasAJPA.janno in line 77: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Yemeni_Northwest_(PCA_outlier))
[Error]   Can't read sample in ./2025_Saag_NorthPontic/2025_Saag_NorthPontic.janno in line 3: parse error (Failed reading: conversion error: GroupName contains invalid characters: UkrFBA/EIA_Cimmerian)
[Error]   Can't read sample in ./2025_Saag_NorthPontic/2025_Saag_NorthPontic.janno in line 41: parse error (Failed reading: conversion error: GroupName contains invalid characters: UkrMA_Post-Cuman_Cuman?)
[Error]   Can't read sample in ./2025_Saag_NorthPontic/2025_Saag_NorthPontic.janno in line 55: parse error (Failed reading: conversion error: GroupName contains invalid characters: UkrMA_GoldenHorde_Slav/Nom?)
[Error]   Can't read sample in ./2025_Saag_NorthPontic/2025_Saag_NorthPontic.janno in line 65: parse error (Failed reading: conversion error: GroupName contains invalid characters: UkrEIA_Antiquity_Greeks?_1)
[Error]   Can't read sample in ./2025_Saag_NorthPontic/2025_Saag_NorthPontic.janno in line 66: parse error (Failed reading: conversion error: GroupName contains invalid characters: UkrEIA_Antiquity_Greeks?_2)
[Error]   Can't read sample in ./2020_Bergstrom_HGDP/Bergstrom_HGDP.janno in line 3: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Mandenka(relative).SDG)
[Error]   Can't read sample in ./2020_Bergstrom_HGDP/Bergstrom_HGDP.janno in line 6: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_MbutiPygmy(relative).SDG)
[Error]   Can't read sample in ./2020_Bergstrom_HGDP/Bergstrom_HGDP.janno in line 22: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Mandenka(relative).SDG)
[Error]   Can't read sample in ./2020_Bergstrom_HGDP/Bergstrom_HGDP.janno in line 26: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_BantuKenya(relative).SDG)
[Error]   Can't read sample in ./2020_Bergstrom_HGDP/Bergstrom_HGDP.janno in line 43: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_BantuKenya(relative).SDG)
...
[Error]   Validation failed

aadr-archive

...
[Error]   Can't read sample in ./AADR_v62_1240K_BeyondAncient/AADR_v62_1240K_BeyondAncient.janno in line 1885: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Chaganguole(Chagangole)_BA_Chemurcheck.AG.SG)
[Error]   Can't read sample in ./AADR_v62_1240K_BeyondAncient/AADR_v62_1240K_BeyondAncient.janno in line 2021: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Zhagunluke(Zaghunluq)_IA_Zaghunluq.AG)
[Error]   Can't read sample in ./AADR_v62_1240K_BeyondAncient/AADR_v62_1240K_BeyondAncient.janno in line 2039: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Jierzankale(Jirzankal)_IA.AG)
[Error]   Can't read sample in ./AADR_v62_1240K_BeyondAncient/AADR_v62_1240K_BeyondAncient.janno in line 2065: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Shanpula(Sampula)_Historical_Sampula.AG)
[Error]   Can't read sample in ./AADR_v54_1_p1_HO_Modern_not_in_1240K/AADR_v54_1_p1_HO_Modern_not_in_1240K.janno in line 360: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Ezid_<0.95_completeness.HO)
[Error]   Can't read sample in ./AADR_v54_1_p1_HO_Modern_not_in_1240K/AADR_v54_1_p1_HO_Modern_not_in_1240K.janno in line 647: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Russian_<0.95_completeness.HO)
[Error]   Can't read sample in ./AADR_v54_1_p1_HO_Modern_not_in_1240K/AADR_v54_1_p1_HO_Modern_not_in_1240K.janno in line 960: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Even_<0.95_completeness.HO)
[Error]   Can't read sample in ./AADR_v54_1_p1_HO_Modern_not_in_1240K/AADR_v54_1_p1_HO_Modern_not_in_1240K.janno in line 1962: parse error (Failed reading: conversion error: PoseidonID contains invalid characters: IHW9118(BUR_E).HO)
[Error]   Can't read sample in ./AADR_v54_1_p1_HO_Modern_not_in_1240K/AADR_v54_1_p1_HO_Modern_not_in_1240K.janno in line 1964: parse error (Failed reading: conversion error: PoseidonID contains invalid characters: IHW9124(IHL_AD036).HO)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_BeyondAncient/AADR_v54_1_p1_1240K_BeyondAncient.janno in line 3667: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Chaganguole(Chagangole)_BA_Chemurcheck)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_BeyondAncient/AADR_v54_1_p1_1240K_BeyondAncient.janno in line 3803: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Zhagunluke(Zaghunluq)_IA_Zaghunluq)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_BeyondAncient/AADR_v54_1_p1_1240K_BeyondAncient.janno in line 3821: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Jierzankale(Jirzankal)_IA)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_BeyondAncient/AADR_v54_1_p1_1240K_BeyondAncient.janno in line 3847: parse error (Failed reading: conversion error: GroupName contains invalid characters: China_Xinjiang_Shanpula(Sampula)_Historical_Sampula)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_EuropeAncient/AADR_v54_1_p1_1240K_EuropeAncient.janno in line 4900: parse error (Failed reading: conversion error: GroupName contains invalid characters: Switzerland_EBA_1_1d.rel.TU876(SX10))
[Error]   Can't read sample in ./AADR_v62_1240K_EuropeAncientNorth/AADR_v62_1240K_EuropeAncientNorth.janno in line 1661: parse error (Failed reading: conversion error: GroupName contains invalid characters: Switzerland_EBA_1_1d.rel.TU876(SX10).AG)
[Error]   Can't read sample in ./AADR_v62_1240K_Modern/AADR_v62_1240K_Modern.janno in line 512: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_BantuKenya(relative).DG)
[Error]   Can't read sample in ./AADR_v62_1240K_Modern/AADR_v62_1240K_Modern.janno in line 513: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_BantuKenya(relative).DG)
[Error]   Can't read sample in ./AADR_v62_1240K_Modern/AADR_v62_1240K_Modern.janno in line 532: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_French(discovery).DG)
[Error]   Can't read sample in ./AADR_v62_1240K_Modern/AADR_v62_1240K_Modern.janno in line 533: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_French(discovery).DG)
[Error]   Can't read sample in ./AADR_v62_1240K_Modern/AADR_v62_1240K_Modern.janno in line 534: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Han(discovery).DG)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_Modern/AADR_v54_1_p1_1240K_Modern.janno in line 1859: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_BantuKenya(relative).DG)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_Modern/AADR_v54_1_p1_1240K_Modern.janno in line 1860: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_BantuKenya(relative).DG)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_Modern/AADR_v54_1_p1_1240K_Modern.janno in line 1880: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_French(discovery).DG)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_Modern/AADR_v54_1_p1_1240K_Modern.janno in line 1881: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_French(discovery).DG)
[Error]   Can't read sample in ./AADR_v54_1_p1_1240K_Modern/AADR_v54_1_p1_1240K_Modern.janno in line 1882: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Han(discovery).DG)
[Error]   Can't read sample in ./AADR_v62_HO_Modern_not_in_1240K/AADR_v62_HO_Modern_not_in_1240K.janno in line 370: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Ezid_<0.95_completeness.HO)
[Error]   Can't read sample in ./AADR_v62_HO_Modern_not_in_1240K/AADR_v62_HO_Modern_not_in_1240K.janno in line 657: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Russian_<0.95_completeness.HO)
[Error]   Can't read sample in ./AADR_v62_HO_Modern_not_in_1240K/AADR_v62_HO_Modern_not_in_1240K.janno in line 970: parse error (Failed reading: conversion error: GroupName contains invalid characters: Ignore_Even_<0.95_completeness.HO)
[Error]   Can't read sample in ./AADR_v62_HO_Modern_not_in_1240K/AADR_v62_HO_Modern_not_in_1240K.janno in line 1972: parse error (Failed reading: conversion error: PoseidonID contains invalid characters: IHW9118(BUR_E).HO)
[Error]   Can't read sample in ./AADR_v62_HO_Modern_not_in_1240K/AADR_v62_HO_Modern_not_in_1240K.janno in line 1974: parse error (Failed reading: conversion error: PoseidonID contains invalid characters: IHW9124(IHL_AD036).HO)
...
[Error]   Validation failed

minotaur-archive

...
[Info]    Validation passed

How should we approach this reality? We still have to be able to read this old data. Maybe the strict validation should only happen for Poseidon v3.0.0+ packages?
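
One way the version-gated idea could look, as a rough sketch only: the helper name checkGroupName, the way the schema version is passed in, and the allowed character set are all assumptions and are not taken from the trident code base.

```haskell
import qualified Data.ByteString.Char8 as B
import           Data.Char             (isAsciiLower, isAsciiUpper, isDigit)
import           Data.Version          (Version, makeVersion)

-- ASSUMPTION: this character set is illustrative, not the project's rule.
isAllowedChar :: Char -> Bool
isAllowedChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "_-."

-- First schema version for which the strict check would apply.
strictFrom :: Version
strictFrom = makeVersion [3, 0, 0]

-- Hypothetical helper: names in packages older than strictFrom are accepted
-- unchanged; names in newer packages must pass the character check.
checkGroupName :: Version -> B.ByteString -> Either String B.ByteString
checkGroupName schemaVersion name
    | schemaVersion < strictFrom = Right name
    | B.all isAllowedChar name   = Right name
    | otherwise                  = Left ("GroupName contains invalid characters: " ++ B.unpack name)
```

With this gating, a name like Ignore_Han(discovery).DG would still be readable from an older package, while the same name in a v3.0.0 package would be rejected.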

@nevrome
Member

nevrome commented Jan 8, 2026

I had an alternative idea: trident could replace every unexpected character with _.
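
A minimal sketch of what such a replacement could look like. The function name sanitizeName and the allowed character set (ASCII letters, digits, `_`, `-`, `.`) are assumptions for illustration, not existing trident behaviour.

```haskell
import qualified Data.ByteString.Char8 as B
import           Data.Char             (isAsciiLower, isAsciiUpper, isDigit)

-- ASSUMPTION: this character set is illustrative, not the project's rule.
isAllowedChar :: Char -> Bool
isAllowedChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "_-."

-- Hypothetical helper: map every character outside the assumed set to '_'.
sanitizeName :: B.ByteString -> B.ByteString
sanitizeName = B.map (\c -> if isAllowedChar c then c else '_')
```

Under these assumptions, Ignore_Han(discovery).DG would become Ignore_Han_discovery_.DG. One caveat of this approach is that distinct original names can collapse to the same sanitized name (for example a/b and a?b both become a_b), so a uniqueness check might still be needed afterwards.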

@stschiff
Member Author

Update on this issue from today's meeting: We've become unsure whether it is still a good idea to introduce this restriction with Poseidon 3.0. To discuss this, I have opened a PR on the schema-repo.
