Data Types

Before you load your data, you may want to make sure you're using a supported format from the list below. The data type also determines the type of Track you will get.

Data preparation recipe

Match identifiers: GenomeView uses the identifiers to link different sources, so make sure that the identifiers match (case-sensitive).
Create indices for data files that need them (check table below)
Convert file formats to get desired visuals (check table below)
Load data (see above)
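The identifier check in the first step can be sketched as a small script. This is only an illustration, not part of GenomeView: the function names are made up, and it assumes that the identifier of a FASTA record is the first word after `>` and that the identifier of a GFF feature is its first column.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first word after '>' on every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect the first column (seqid) of every feature line in a GFF file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_ids(fasta_lines: Iterable[str], gff_lines: Iterable[str]) -> Set[str]:
    """Return GFF seqids that have no exact, case-sensitive match in the FASTA."""
    return gff_seqids(gff_lines) - fasta_ids(fasta_lines)


if __name__ == "__main__":
    fasta = [">chr1 example", "ACGT", ">chr2", "GGCC"]
    gff = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1",
        "Chr2\tsrc\tgene\t1\t4\t.\t+\t.\tID=g2",
    ]
    print(unmatched_ids(fasta, gff))  # 'Chr2' differs from 'chr2' only by case
```

Any identifier the script reports should be renamed in one of the two files before loading.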

Tip

Indexing will create a look-up table for GenomeView to load data on-the-fly. This will speed up browsing and loading, as well as significantly reduce the amount of memory you need. For some file formats we recommend you create indices, for others we do not. See the table below for more details and links to instructions.
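To illustrate what such a look-up table buys you, the sketch below builds a byte-offset index for a FASTA file and uses it to jump straight to one record instead of reading the whole file. It is not GenomeView's index format; `build_offset_index` and `read_record` are hypothetical names used only for this example.

```python
import io
from typing import BinaryIO, Dict


def build_offset_index(handle: BinaryIO) -> Dict[str, int]:
    """Map each FASTA record name to the byte offset of its '>' header line."""
    index = {}
    offset = handle.tell()
    line = handle.readline()
    while line:
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode("ascii")] = offset
        offset = handle.tell()
        line = handle.readline()
    return index


def read_record(handle: BinaryIO, index: Dict[str, int], name: str) -> str:
    """Seek to one record and return its sequence without scanning the rest."""
    handle.seek(index[name])
    handle.readline()  # skip the header line
    parts = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # the next record starts here
        parts.append(line.strip().decode("ascii"))
    return "".join(parts)


if __name__ == "__main__":
    data = io.BytesIO(b">a\nACGT\nAC\n>b second record\nGG\n")
    index = build_offset_index(data)
    print(read_record(data, index, "b"))
```

The index itself stays small because it stores one number per record, which is why indexed files can be browsed without holding all the data in memory.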

Recommended file formats

This is a list of file formats that are recommended for different data types. See the full list of data types in the section below.

Data type	Recommended file format
Reference sequence	fasta
Annotation	GFF3
Read alignments	BAM
Variation	VCF
Nucleotide coverage	TDF
Whole genome alignments	MAF
GenomeView Session	Session File
Syntenic Data	SYN

Supported data formats

Supported formats exist for reference sequences, annotation, whole genome alignments, read alignments, read coverage summaries (continuous value data), genome variation and diversity, allele diversity, and syntenic data.

Reference sequence

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Reference sequence	fasta ¤	Recommended (Index FASTA)	50 Mb	unlimited	GenomeView will prompt you and create an index for you if you don't have one and the file is very large.
Reference sequence	embl, genbank	Not possible	50 Mb	--	EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.

Annotation

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Annotation	gff ¤	Not recommended (Index GFF)	50 Mb	unlimited
	embl, genbank	Not possible	50 Mb	--	EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.
	bed	Not recommended (Index BED)	50 Mb or less	unlimited	By default, data from a bed file is added to the CDS track. To put it in a different track, add the line 'track name=Track_name' at the top of the file. No white-space is allowed in the track name.
	ptt, tbl	Not possible	50 Mb or less	--	Other standard annotation formats GenomeView understands
	various formats	Not possible	50 Mb or less	--	GenomeView can directly parse the output of the following programs: Blast, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan
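The track line described for bed files in the table above can be added with a few lines of Python. This is a minimal sketch; `with_track_line` is a hypothetical helper, and it only handles the plain 'track name=Track_name' form mentioned in the table.

```python
def with_track_line(bed_text: str, track_name: str) -> str:
    """Return the BED content with a 'track name=...' line at the top."""
    if not track_name or any(ch.isspace() for ch in track_name):
        raise ValueError("track name must be non-empty and contain no white-space")
    if bed_text.startswith("track "):
        return bed_text  # a track line is already present; leave the content untouched
    return f"track name={track_name}\n{bed_text}"


if __name__ == "__main__":
    bed = "chr1\t100\t200\tfeature1\n"
    print(with_track_line(bed, "My_track"), end="")
```

Rejecting names with white-space up front avoids producing a file whose track name is silently cut off or misread.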

Whole genome alignments

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Multiple genome alignment	maf ¤	Recommended	100 Mb	unlimited	If you try to load an unindexed maf file, GenomeView will prompt you to create a compressed maf file and will index it for you. MAF is the recommended file format for whole genome alignments of large/complex genomes.
	multi-fasta ¤	Not possible	100 Mb	--	Recommended for small/simple genomes with a near 1:1 relationship.
	aln, ClustalW	Not possible	100 Mb	--

Read alignments

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Sequence read alignment	bam ¤ (Preparing read data)	Required	--	unlimited	GenomeView will prompt you if there is no index and will create one for you. GenomeView cannot automatically sort BAM files.
Sequence read alignment	MAQ, MapView, BroadSolexa	Not possible	100 Mb	--
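Because BAM files are not sorted for you, it can help to check the sort order recorded in the file before loading it. The sketch below reads the `SO` tag from the `@HD` line of the BAM header using only the Python standard library. It assumes the header was written correctly by the tool that produced the file; `bam_sort_order` is a hypothetical helper, not a GenomeView function.

```python
import gzip
import struct
from typing import BinaryIO, Optional


def bam_sort_order(handle: BinaryIO) -> Optional[str]:
    """Return the SO value from the @HD header line of a BAM stream, if present."""
    # BAM is BGZF-compressed, which the gzip module can decompress.
    with gzip.GzipFile(fileobj=handle) as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", bam.read(4))  # length of the header text
        text = bam.read(l_text).decode("utf-8", errors="replace")
    for line in text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None  # no @HD line or no SO tag: sort order is not declared
```

A file opened with `open(path, "rb")` can be passed in directly. A result of `coordinate` means the header declares the file as sorted by position; anything else suggests sorting the file with an external tool first.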

Read coverage summary - continuous value data

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Read coverage summary	tdf ¤	Native	unlimited	unlimited	TDF files can be created with the bam2tdf tool that is available for download.
	bigwig	Native	unlimited	unlimited	This format can be used for any wig file, not just read coverage.
	pileup	Required	--	unlimited	The pileup format becomes slow when you have extreme read depth (>5000 x coverage).
	wig	Not possible	50 Mb	--	We strongly recommend converting your wig files to TDF. GenomeView can automatically convert wig files to TDF. Caveats: 'track' information should all be on a single line; 'browser' lines will be ignored as they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome and by genomic coordinate within the chromosome. BedGraph as well as Wiggle_0 format is supported. For the wiggle_0 type, both variableStep and fixedStep should work.
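The sorting requirement for wig files in the table above can be checked before loading. The sketch below handles BedGraph-style lines only (chromosome, start, end, value) and interprets "sorted by chromosome" as "all lines of a chromosome are contiguous", without enforcing any particular chromosome order. Both choices are assumptions made for this example, and `first_unsorted_line` is a hypothetical name.

```python
from typing import Iterable, Optional


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first out-of-order line, or None if sorted."""
    finished = set()  # chromosomes whose block of lines has already ended
    current = None
    last_start = -1
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        fields = line.split()
        chrom, start = fields[0], int(fields[1])
        if chrom != current:
            if chrom in finished:
                return number  # chromosome reappears after another one
            if current is not None:
                finished.add(current)
            current, last_start = chrom, -1
        if start < last_start:
            return number  # coordinate goes backwards within a chromosome
        last_start = start
    return None


if __name__ == "__main__":
    rows = ["track type=bedGraph", "chr1\t0\t10\t1.5", "chr1\t10\t20\t2.0", "chr2\t0\t10\t0.5"]
    print(first_unsorted_line(rows))
```

If a line number is returned, sort the file and run the check again before converting it to TDF.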

Genome variation and diversity

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Genome variation	vcf ¤	Not recommended	--	unlimited	It is recommended to run reducevcf on VCF files prior to loading them; this will speed up loading significantly.

Allele diversity

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Allele diversity summary	pileup ¤	Required	--	unlimited	The pileup format becomes slow when you have extreme read depth (>5000 x coverage).

Syntenic

Data type	File format	Index*	Max size** (unindexed***)	Max size** (indexed)	Comments
Syntenic Data	syn	No	--	unlimited	Avoid more than 50 names, as this would result in overflowing visualizations.

* Indicates whether this file format can/should be indexed.

** Recommended maximum file size. The first value is without index, the second with index. These values are only guidelines. When loading multiple data sets, you should add the sizes.

*** Unindexed data files can be gzip compressed.

¤ Recommended file format for this data type.
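The rule that sizes add up when you load several data sets can be written out as a small calculation. The sketch assumes that "Mb" in the tables means megabytes of file size and takes 1 Mb as 1,000,000 bytes; the function names are hypothetical.

```python
from typing import Iterable

MB = 1_000_000  # assumption: "Mb" in the tables means megabytes of file size


def combined_size_mb(sizes_in_bytes: Iterable[int]) -> float:
    """Add up the sizes of all unindexed files that will be loaded together."""
    return sum(sizes_in_bytes) / MB


def within_guideline(sizes_in_bytes: Iterable[int], limit_mb: float) -> bool:
    """Check the combined size against a recommended maximum from the tables."""
    return combined_size_mb(sizes_in_bytes) <= limit_mb


if __name__ == "__main__":
    # Two unindexed annotation files of 30 Mb and 25 Mb against the 50 Mb guideline.
    print(within_guideline([30 * MB, 25 * MB], limit_mb=50))
```

For real files, the sizes can be obtained with `os.path.getsize`. Since the values are guidelines, a result of `False` is a hint to index or convert the data rather than a hard limit.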

GenomeView Session

A session file allows you to organize a large number of data files and config options in a single file. Check the wiki page for details.

Output formats

(Modified) annotations can be saved as either GFF or EMBL.

All data that is loaded can be exported in its original format. This will not include modifications.

Converting formats

For conversion, we recommend using picard or genometools.

We offer a few tools to convert files between formats.
