Image features and important Metadata fields missing from parquets #425

@kate-bowers-broad

Hi there! I have been trying out CytoTable for the first time, following the tutorial for turning CellProfiler analysis CSVs into Parquet files (https://cytomining.github.io/CytoTable/tutorials/cellprofiler_to_parquet.html).

My code:

from cytotable import convert
import pandas as pd
import pyarrow.parquet as pq

source_path = "s3://cellpainting-gallery/cpg0037-oasis/broad/workspace/analysis/2025_04_14_OASIS_U2OS_Industry_Batch1/BR00147139/analysis/BR00147139-A01-1/"
convert(
    source_path=source_path,
    source_datatype="csv",
    dest_path="cytotable_kb3",
    dest_datatype="parquet",
    concat=True,
    compartments=("cells", "nuclei", "cytoplasm", "image"),
    preset="cellprofiler_csv",
    no_sign_request=True,
    join=True,
    parsl_config=None
)

When I compare the columns in this Parquet file to the columns in the backend CSV made for this plate by pycytominer's collate.py, I see that 1300+ columns are missing from the Parquet file. These include Image measurements (e.g., Image_Granularity measurements, Image_Texture measurements), Metadata_Plate, Metadata_Well, Metadata_Site_Count, Metadata_Object_Count, and all the counts like Metadata_Count_Cells.

The pycytominer-made backend CSV I compared to is here: s3://cellpainting-gallery/cpg0037-oasis/broad/workspace/backend/2025_04_14_OASIS_U2OS_Industry_Batch1/BR00147139/BR00147139.csv
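
For anyone who wants to reproduce the column comparison, here is a minimal sketch of one way to do it. It assumes both files have been downloaded locally; the helper names (`csv_header`, `parquet_columns`, `summarize_missing`) are hypothetical and not part of CytoTable or pycytominer. It reads only the header/schema rather than the full tables, then groups the missing column names by their leading prefix (for example `Image` or `Metadata`).

```python
import csv
from collections import Counter


def csv_header(path):
    """Return the column names from the first row of a local CSV file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle))


def parquet_columns(path):
    """Return the column names of a local Parquet file without loading its data."""
    # Imported lazily so the comparison helper below works without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_schema(path).names


def summarize_missing(reference_cols, candidate_cols):
    """List columns present in the reference but absent from the candidate,
    plus a count of those missing columns per leading prefix."""
    missing = sorted(set(reference_cols) - set(candidate_cols))
    by_prefix = Counter(name.split("_", 1)[0] for name in missing)
    return missing, dict(by_prefix)
```

Usage would look something like `summarize_missing(csv_header("BR00147139.csv"), parquet_columns("cytotable_kb3"))`, assuming `cytotable_kb3` is the single Parquet file written by the `convert` call above and the CSV is a local copy of the backend file.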

I looked through the CytoTable documentation, but I couldn't figure out how to get these metadata and image measurement columns into my CytoTable Parquet files. Am I missing a setting or command here? Thanks very much!