BED (Browser Extensible Data) is the standard format for representing genomic intervals. This skill covers creating, reading, validating, and converting BED files using bedtools (CLI) and pybedtools (Python).
# bedtools (CLI)
conda install -c bioconda bedtools
# pybedtools (Python)
pip install pybedtools

Tell your AI agent what you want to do:
- "Create a BED file from my list of peak coordinates"
- "Validate my BED file format and fix any issues"
- "Convert my GFF file to BED format"
- "Create a BED file from this DataFrame with columns chr, start, end, name"
- "Generate a BED4 file from my peak coordinates stored in peaks.csv"
- "Check if my peaks.bed file has valid BED format"
- "Validate the coordinate system in my annotation.bed file"
- "Convert my GTF annotations to BED format"
- "Convert my VCF variants to BED intervals, adjusting for coordinate systems"
- "Sort my BED file by chromosome and position"
- "Filter my BED file to keep only intervals larger than 500bp"
- Load the BED file or create one from your data
- Validate format (check coordinates, column count, sorting)
- Apply requested operations (filter, sort, convert)
- Save the output with proper formatting
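
The steps above can be sketched in plain Python, without bedtools, for small files. This is a minimal illustration only: the function name `process_bed` and the `longer_than` parameter are hypothetical, and the sort is lexicographic by chromosome, then numeric by start and end.

```python
def process_bed(text, longer_than=0):
    """Load, validate, filter, and sort BED records held in a string."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and the header lines some BED files carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        # Half-open coordinates: interval length is simply end - start.
        if end - start > longer_than:
            records.append((chrom, start, end, fields[3:]))
    # Sort by chromosome name, then start, then end.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    # Write records back out as tab-separated lines.
    return "".join(
        "\t".join([chrom, str(start), str(end), *rest]) + "\n"
        for chrom, start, end, rest in records
    )
```

For example, `process_bed(open("peaks.bed").read(), longer_than=500)` would return the sorted intervals longer than 500 bp as a string ready to write to a new file.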
BED uses 0-based, half-open coordinates:
| System | First base | Interval 100-200 |
|---|---|---|
| BED (0-based) | 0 | Bases 100-199 |
| GFF/VCF (1-based) | 1 | Bases 100-200 |
When converting, only the start shifts; the end coordinate is the same in both systems:
- BED to GFF: add 1 to start
- GFF to BED: subtract 1 from start
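
The conversion rules can be written as small helpers. The function names here are illustrative, not part of bedtools or pybedtools. The VCF helper assumes a simple variant whose interval spans the REF allele starting at the 1-based POS.

```python
def gff_to_bed(start, end):
    """1-based, closed (GFF) -> 0-based, half-open (BED)."""
    return start - 1, end


def bed_to_gff(start, end):
    """0-based, half-open (BED) -> 1-based, closed (GFF)."""
    return start + 1, end


def vcf_to_bed(pos, ref):
    """1-based VCF POS plus REF allele -> BED interval covering REF."""
    start = pos - 1
    return start, start + len(ref)


def bed_length(start, end):
    """Number of bases in a BED interval (half-open, so no +1)."""
    return end - start
```

A GFF feature at 100-200 becomes BED `99  200`, and both describe the same 101 bases.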
| Format | Name | Fields (cumulative) |
|---|---|---|
| BED3 | Minimal | chr, start, end |
| BED4 | Named | + name |
| BED5 | Scored | + score |
| BED6 | Stranded | + strand |
| BED12 | Full | + thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts |
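
As a rough illustration of how the column count maps to these formats, the sketch below parses one line into a dict keyed by field name. The helper names are hypothetical, and the field names follow the UCSC convention (`chrom` rather than `chr`).

```python
BED_FIELDS = [
    "chrom", "start", "end",                  # BED3
    "name",                                   # BED4
    "score",                                  # BED5
    "strand",                                 # BED6
    "thickStart", "thickEnd", "itemRgb",      # BED7-9
    "blockCount", "blockSizes", "blockStarts" # BED10-12
]


def parse_bed_line(line):
    """Split one tab-separated BED line into a dict of named fields."""
    values = line.rstrip("\n").split("\t")
    if not 3 <= len(values) <= len(BED_FIELDS):
        raise ValueError(f"expected 3-12 columns, got {len(values)}")
    record = dict(zip(BED_FIELDS, values))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Strand, when present, must be one of the three allowed symbols.
    if "strand" in record and record["strand"] not in {"+", "-", "."}:
        raise ValueError(f"invalid strand: {record['strand']!r}")
    return record


def bed_type(line):
    """Label a line by its column count, e.g. 'BED6'."""
    return f"BED{len(parse_bed_line(line))}"
```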
- Always sort BED files before operations requiring sorted input
- Check chromosome naming consistency (chr1 vs 1)
- Use `pybedtools.cleanup()` after processing to remove temp files
- Validate coordinate systems when combining data from different sources
- bedtools expects tab-separated BED files; space-separated columns will cause parsing errors
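
Two of these checks, delimiter problems and mixed chromosome naming, are easy to run before handing a file to bedtools. The sketch below is one possible pre-flight check in plain Python; `check_bed_lines` and `naming_styles` are hypothetical helper names.

```python
def naming_styles(chroms):
    """Return the set of naming styles seen: 'prefixed' (chr1) or 'bare' (1)."""
    return {"prefixed" if c.startswith("chr") else "bare" for c in chroms}


def check_bed_lines(lines):
    """Return a list of human-readable problems found in BED lines."""
    problems = []
    chroms = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # A line delimited by spaces collapses into a single field here.
        if len(fields) < 3:
            problems.append(f"line {line_no}: fewer than 3 tab-separated columns")
            continue
        chroms.append(fields[0])
    if len(naming_styles(chroms)) > 1:
        problems.append("mixed chromosome naming (chr1 vs 1)")
    return problems
```

An empty list means neither problem was detected; it does not guarantee the file is sorted or otherwise valid.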