fix type incompatible bug when nochr is used #237

RunpengLuo · 2025-02-24T23:06:46Z

combine_counts.py always uses the str dtype for #CHR, as shown below.

hatchet/src/hatchet/utils/combine_counts.py

Lines 110 to 112 in 3729a35

    
    bbs = [pd.read_table(bb, dtype={"#CHR": str}) for bb in outfiles]
    big_bb = pd.concat(bbs)
    big_bb = big_bb.sort_values(by=["#CHR", "START", "SAMPLE"])
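
For illustration, here is a minimal sketch of what the `dtype={"#CHR": str}` argument changes when chromosome names have no chr prefix. The in-memory table is a hypothetical stand-in for a real bb file, not data from HATCHet.

```python
import io

import pandas as pd

# Hypothetical tab-separated content with numeric chromosome names (no "chr" prefix).
text = "#CHR\tSTART\tEND\n1\t0\t100\n2\t0\t100\n"

# Without a dtype hint, pandas infers an integer dtype for the purely numeric column.
inferred = pd.read_table(io.StringIO(text))

# With the hint used in combine_counts.py, the values stay strings.
forced = pd.read_table(io.StringIO(text), dtype={"#CHR": str})

print(inferred["#CHR"].dtype)
print(forced["#CHR"].tolist())
```

The same inference happens in any reader that is not given a dtype hint, which is how the two sides of a later merge can end up disagreeing.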

When nochr is enabled (the reference/BAM file has no chr prefix), to_dataframe() in rd_gccorrect.py infers the int64 dtype for column #CHR. This is incompatible with the str-typed #CHR column of dataframe bb, so the merge shown below fails, as reported in issue #236.

hatchet/src/hatchet/utils/rd_gccorrect.py

Lines 20 to 32 in 3729a35

    
    bb = bb.merge(
        BedTool.from_dataframe(bb[["#CHR", "START", "END"]].drop_duplicates())
        .nucleotide_content(fi=ref_genome)
        .to_dataframe(disable_auto_names=True)
        .rename(
            columns={
                "#1_usercol": "#CHR",
                "2_usercol": "START",
                "3_usercol": "END",
                "5_pct_gc": "GC",
            }
        )[["#CHR", "START", "END", "GC"]]
    )

This fix converts the dtype of #CHR to str after the to_dataframe() call.
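
A minimal sketch of the mismatch and the cast, using hypothetical toy frames in place of the real bb table and the BedTool output. The actual change in rd_gccorrect.py may be written differently; this only shows the idea with plain pandas.

```python
import pandas as pd

# Hypothetical stand-in for bb, whose #CHR column was read as str.
bb = pd.DataFrame(
    {"#CHR": ["1", "1", "2"], "START": [0, 100, 0], "END": [100, 200, 100]}
)

# Hypothetical stand-in for the to_dataframe() result under nochr:
# the numeric chromosome names were inferred as integers.
gc = pd.DataFrame(
    {
        "#CHR": [1, 1, 2],
        "START": [0, 100, 0],
        "END": [100, 200, 100],
        "GC": [0.41, 0.39, 0.52],
    }
)

# Merging a string key with an integer key is rejected by pandas.
try:
    bb.merge(gc)
    merge_error = None
except ValueError as exc:
    merge_error = str(exc)
print(merge_error)

# Cast the key to str so both sides agree, then merge on the shared columns.
gc["#CHR"] = gc["#CHR"].astype(str)
merged = bb.merge(gc)
print(merged)
```

Casting on the GC side keeps #CHR consistent with the str convention already used in combine_counts.py, so sorting and later joins see a single key type.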

RunpengLuo added 2 commits February 24, 2025 17:52

force #CHR be str type in gccorrect.py

9ad566c

ruff code format

a8aed80

RunpengLuo mentioned this pull request Feb 25, 2025

ValueError: The specified BB file does not exist! #236

Closed

RunpengLuo requested a review from simozacca February 25, 2025 00:17

simozacca approved these changes Feb 26, 2025

RunpengLuo merged commit 39b9ffa into master Feb 26, 2025
6 checks passed
