Issue Description
I am trying to subset my POD5 files by channel, as described on the Nanopore website for duplex calling.
When I run `pod5 subset -r pod5_skip/ --summary summary.tsv --columns channel --output /scratch/raw/run_06_09_25_split_by_channel`, I get the following error:
POD5 has encountered an error: ''return_dtype' of function python_udf must be set
A later expression might fail because the output type is not known. Set return_dtype=pl.self_dtype() if the type is unchanged, or set the proper output data type.
Resolved plan until failure:
---> FAILED HERE RESOLVING 'sink' <---
SELECT [col("__dest_fname").unique()]
WITH_COLUMNS:
["/scratch/raw/run_06_09_25_split_by_channel".str.concat_horizontal([col("__dest_fname")]).strict_cast(Categorical(Categories { name: "", namespace: "", physical: U32 }, CategoricalMapping { max_categories: 4294967295, upper_bound: 1 })).alias("__dest_fname")]
WITH_COLUMNS:
["channel-".str.concat_horizontal([col("channel"), ".pod5"]).strict_cast(Categorical(Categories { name: "", namespace: "", physical: U32 }, CategoricalMapping { max_categories: 4294967295, upper_bound: 1 })).alias("__dest_fname"), col("read_id").alias("__read_id")]
DF ["read_id", "channel"]; PROJECT */2 COLUMNS'
For detailed information set POD5_DEBUG=1'
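As far as I can tell from the plan above, the step that fails is only trying to build one destination file name per channel (`channel-<channel>.pod5` under the output directory) and collect the read IDs that belong to each. A minimal standard-library sketch of that grouping, which I am using to sanity-check my summary file, is below. It is not pod5's implementation: the function name `group_reads_by_channel`, the example rows, and joining the output directory and file name with a `/` are my own assumptions. Only the `read_id` and `channel` column names and the `channel-….pod5` pattern come from the error output.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def group_reads_by_channel(summary_file, output_dir):
    """Map each per-channel destination path to the read IDs assigned to it.

    Hypothetical helper: mirrors the naming pattern seen in the error output,
    but does not read or write any POD5 data.
    """
    reader = csv.DictReader(summary_file, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        # Assumed layout: <output_dir>/channel-<channel>.pod5
        dest = PurePosixPath(output_dir) / f"channel-{row['channel']}.pod5"
        groups[str(dest)].append(row["read_id"])
    return dict(groups)


# Made-up rows standing in for summary.tsv (needs read_id and channel columns).
example = io.StringIO(
    "read_id\tchannel\n"
    "read-a\t1\n"
    "read-b\t2\n"
    "read-c\t1\n"
)
mapping = group_reads_by_channel(
    example, "/scratch/raw/run_06_09_25_split_by_channel"
)
for dest, read_ids in mapping.items():
    print(dest, read_ids)
```

With the real file, `open("summary.tsv", newline="")` can be passed in place of the `io.StringIO` example. The grouping itself looks fine for my data, which is why I suspect the failure is in how the expression is resolved rather than in the summary file.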
Logs
Setting POD5_DEBUG=1 does not change the error reporting behavior.
Specifications
- Pod5 Version: 0.3.28
- Python Version: 3.12.9
- Platform: Ubuntu 22.04 LTS