Type-validated Python models for delimited data files.
fgmetric lets you define Python classes ("Metrics") that map directly to rows in CSV/TSV files.
It handles parsing, type coercion (strings → int, float, bool), and validation automatically using Pydantic.
Requires Python 3.12 or later.
pip install fgmetric

Or with uv:

uv add fgmetric

If you're a bioinformatician or data engineer processing delimited files in Python, you've probably written code like this:
import csv

with open("metrics.tsv") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        quality = int(row["mapping_quality"])
        is_duplicate = row["is_duplicate"].lower() in ("true", "1", "yes")
        if row["score"]:  # handle empty strings
            score = float(row["score"])
        # ... repeat for every field

fgmetric replaces this with:
for metric in AlignmentMetric.read(path):
    # metric.mapping_quality is already an int
    # metric.is_duplicate is already a bool
    # metric.score is already Optional[float]
    ...

How it compares:
- vs. csv + dataclasses: Automatic type coercion and validation without boilerplate. Because fgmetric is built on Pydantic, custom validators and serializers can be added readily.
- vs. pandas: Unlike pandas, fgmetric processes records lazily, so you can handle files larger than memory. Metrics are also type-validated and can be made immutable, making them safe to pass between functions without defensive copying.
- vs. Pydantic alone: fgmetric handles CSV/TSV specifics (header parsing, delimiter configuration) and provides out-of-the-box features like empty value handling and Counter field pivoting.
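To make the "lazy" and "immutable" points above concrete, here is a minimal standard-library sketch of the same idea. It is not how fgmetric is implemented; `AlignmentRow` and `read_rows` are hypothetical names used only for illustration, and the coercion is written by hand, which is the boilerplate fgmetric is meant to remove.

```python
import csv
import io
from collections.abc import Iterator
from dataclasses import dataclass
from typing import TextIO


@dataclass(frozen=True)  # frozen=True makes instances immutable
class AlignmentRow:  # hypothetical stand-in for a Metric subclass
    read_name: str
    mapping_quality: int
    is_duplicate: bool


def read_rows(handle: TextIO) -> Iterator[AlignmentRow]:
    """Yield one typed row at a time instead of loading the whole file."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield AlignmentRow(
            read_name=row["read_name"],
            mapping_quality=int(row["mapping_quality"]),
            is_duplicate=row["is_duplicate"].lower() in ("true", "1", "yes"),
        )


# An in-memory file keeps the sketch self-contained.
tsv = io.StringIO(
    "read_name\tmapping_quality\tis_duplicate\n"
    "read1\t60\tfalse\n"
    "read2\t30\ttrue\n"
)
rows = read_rows(tsv)  # a generator: no row has been parsed yet
first = next(rows)     # parses only the first data row
```

Because `read_rows` is a generator, memory use stays proportional to a single row regardless of file size, and the frozen dataclass rejects attribute assignment after construction.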
Define a class to represent each row:

from pathlib import Path

from fgmetric import Metric, MetricWriter


class AlignmentMetric(Metric):
    read_name: str
    mapping_quality: int
    is_duplicate: bool = False

Then read or write:
# Reading
for metric in AlignmentMetric.read(Path("alignments.tsv")):
    print(f"{metric.read_name}: MQ={metric.mapping_quality}")

# Writing
metrics = [
    AlignmentMetric(read_name="read1", mapping_quality=60),
    AlignmentMetric(read_name="read2", mapping_quality=30, is_duplicate=True),
]
with MetricWriter(AlignmentMetric, Path("output.tsv")) as writer:
    writer.writeall(metrics)

Example input file (alignments.tsv):
read_name mapping_quality is_duplicate
read1 60 false
read2 30 true

Invalid data raises pydantic.ValidationError with details about which field failed.
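As a rough illustration of what that error looks like, the sketch below uses a plain Pydantic model rather than fgmetric itself. `AlignmentRow` is a hypothetical stand-in for the `AlignmentMetric` class above, and the example assumes Pydantic is installed; how fgmetric surfaces the error while iterating a file may differ in detail.

```python
from pydantic import BaseModel, ValidationError


class AlignmentRow(BaseModel):  # hypothetical stand-in for AlignmentMetric
    read_name: str
    mapping_quality: int
    is_duplicate: bool = False


# String values, as they would arrive from a delimited file, are coerced.
ok = AlignmentRow(read_name="read2", mapping_quality="30", is_duplicate="true")

# A value that cannot be coerced raises ValidationError.
try:
    AlignmentRow(read_name="read1", mapping_quality="sixty")
except ValidationError as err:
    # Each error entry records the location (field name) that failed.
    failed_fields = [error["loc"][0] for error in err.errors()]
```

Inspecting `err.errors()` gives one entry per failing field, which is useful for reporting which column of a row was malformed.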
Both reading and writing support custom delimiters for working with CSV or other formats:
# Reading CSV files
for metric in MyMetric.read(Path("data.csv"), delimiter=","):
    ...

# Writing CSV files
with MetricWriter(MyMetric, Path("output.csv"), delimiter=",") as writer:
    ...

Fields typed as list[T] are automatically parsed from and serialized to delimited strings:
class TaggedRead(Metric):
    read_id: str
    tags: list[str]  # "A,B,C" becomes ["A", "B", "C"]
    scores: list[int]  # "1,2,3" becomes [1, 2, 3]
    optional_tags: list[str] | None  # "" becomes None

The list delimiter defaults to "," but can be customized per-metric:
class SemicolonMetric(Metric):
    collection_delimiter = ";"
    values: list[int]  # "1;2;3" becomes [1, 2, 3]

When your file has categorical data with one column per category (e.g. base counts A, C, G, T), you can model them as a single Counter[StrEnum] field:
from collections import Counter
from enum import StrEnum

from fgmetric import Metric


class Base(StrEnum):
    A = "A"
    C = "C"
    G = "G"
    T = "T"


class BaseCountMetric(Metric):
    position: int
    counts: Counter[Base]

# Input TSV:
# position A C G T
# 1 10 5 3 2
# Parses to:
# BaseCountMetric(position=1, counts=Counter({Base.A: 10, Base.C: 5, ...}))

See the contributing guide for development setup and testing instructions.