Add polars-bio to the benchmarked tools #5

Description

@riasc

Add polars-bio as a benchmarkable interval tool.

What it is: a Python genomics library built on polars / Apache Arrow / DataFusion, offering genomic range operations (overlap/intersect) via a DataFrame API. Installed with pip install polars-bio; overlap is exposed as polars_bio.overlap(df1, df2, ...).
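To pin down what "overlap" should return before wiring in the library, a naive reference can serve as an oracle for small inputs. This is only a sketch: it is not how polars-bio computes overlaps, the function name overlapping_pairs is hypothetical, and it assumes 0-based, half-open BED coordinates, which this issue does not specify.

```python
def overlapping_pairs(targets, queries):
    """Return every (query, target) pair that shares at least one base.

    Intervals are (chrom, start, end) tuples. Coordinates are assumed to be
    0-based and half-open, as in BED (an assumption, not stated in the issue).
    """
    pairs = []
    for q_chrom, q_start, q_end in queries:
        for t_chrom, t_start, t_end in targets:
            # Half-open intervals overlap when each starts before the other ends.
            if q_chrom == t_chrom and q_start < t_end and t_start < q_end:
                pairs.append(((q_chrom, q_start, q_end), (t_chrom, t_start, t_end)))
    return pairs


targets = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
queries = [("chr1", 150, 350), ("chr1", 200, 300), ("chr2", 50, 100)]
pairs = overlapping_pairs(targets, queries)
```

With half-open coordinates, intervals that merely touch (for example chr1:200-300 against chr1:100-200) do not count as overlapping. If the benchmark uses closed coordinates instead, the two comparisons would need to become <=.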

Integration points (follow the existing intervaltree / awk pattern — a small CLI wrapper invoked through tool_call):

  • Add a wrapper segmeter/tools/intersect_polars_bio.py that takes -t target -q query -o output and writes BED-format overlaps, mirroring segmeter/tools/intersect_intervaltree.py.
  • Add a polars_bio branch in query_call that runs the wrapper via tool_call (segmeter/calls.py, near the intervaltree branch, ~line 270).
  • Add "polars_bio" to the --tool argparse choices (segmeter/main.py:35-37).
  • Decide whether a sorted/indexed variant is worth benchmarking; if so, add polars_bio_sorted to idx_based_tools and add an index_call branch (segmeter/benchmark.py:24-27, segmeter/calls.py:41).
  • Add it to the Docker image / environment so the benchmark can find it.
  • Verify output matches the -wa-style format the precision check expects (first 3 cols = interval), so TP/FP/FN scoring stays comparable.
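The wrapper and the output check described above could be sketched roughly as follows. Only the -t/-q/-o flags come from this issue; the function names (build_parser, read_bed, write_bed, run, score) are hypothetical, and the overlap step is passed in as a callable (overlap_fn) standing in for wherever the polars_bio.overlap call would go. The score helper is a set-based illustration of TP/FP/FN on the first three columns, not the repository's actual precision check.

```python
import argparse


def build_parser():
    """CLI exposing the -t/-q/-o flags named in the issue."""
    parser = argparse.ArgumentParser(description="polars-bio overlap wrapper (sketch)")
    parser.add_argument("-t", dest="target", required=True, help="target BED file")
    parser.add_argument("-q", dest="query", required=True, help="query BED file")
    parser.add_argument("-o", dest="output", required=True, help="output BED file")
    return parser


def read_bed(path):
    """Read the first three tab-separated columns as (chrom, start, end)."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip comments, header lines, and anything without three columns.
            if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                continue
            intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def write_bed(path, intervals):
    """Write intervals so that the first three columns are chrom, start, end."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write(f"{chrom}\t{start}\t{end}\n")


def run(argv, overlap_fn):
    """Parse arguments, compute overlaps with the supplied backend, write BED.

    overlap_fn(targets, queries) is a placeholder for the real backend and
    must return a list of (chrom, start, end) tuples to report.
    """
    args = build_parser().parse_args(argv)
    hits = overlap_fn(read_bed(args.target), read_bed(args.query))
    write_bed(args.output, hits)
    return hits


def score(reported, expected):
    """Count TP/FP/FN on interval triples; duplicates are collapsed by the sets."""
    reported, expected = set(reported), set(expected)
    return {
        "TP": len(reported & expected),
        "FP": len(reported - expected),
        "FN": len(expected - reported),
    }
```

Keeping the backend behind a callable lets the CLI shape and output format be tested without polars-bio installed. Which interval gets reported for each overlap (query or target) should follow whatever intersect_intervaltree.py already does.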
