E-vill/uniprot_parser

uniprot-parser

This Ruby gem parses UniProt flat-text (.txt) files. It is based on the UniProt documentation.

Extract information from a proteome

To parse a local proteome file:

require 'uniprot_parser'

list_of_all_protein_from_the_file = UniprotParser::parse('/path/to/UP000000.txt')
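
For readers unfamiliar with the input format: in a UniProt flat-text file, each entry ends with a line containing only `//`. The following standalone sketch shows how such a file can be cut into one chunk per entry. It does not use the gem and is not its implementation; the entry names and accession numbers are made-up toy data.

```ruby
# Toy flat-file content with two entries (hypothetical data, not real UniProt records).
flat_file = <<~TXT
  ID   TOY1_TEST               Reviewed;          10 AA.
  AC   X00001;
  //
  ID   TOY2_TEST               Unreviewed;        12 AA.
  AC   X00002; X00003;
  //
TXT

# Group lines into entries: a chunk is closed after each '//' terminator line.
entries = flat_file.each_line
                   .slice_after { |line| line.strip == '//' }
                   .map(&:join)

puts entries.size
puts entries.first.lines.first
```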

To parse the file with a progress bar (note that enabling the verbose option slightly increases processing time):

require 'uniprot_parser'

list_of_all_protein_from_the_file = UniprotParser::parse('/path/to/UP000000.txt', verbose: true)

To parse a large file (a large proteome file, for example), or only part of a file, use the offset and limit options:

require 'uniprot_parser'

# Extracts the 201st through the 500th protein (both included)
list_of_all_protein_from_the_file = UniprotParser::parse('/path/to/UP000000.txt', offset: 200, limit: 300)
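
The arithmetic behind offset and limit can be checked with a plain Ruby array. This is only an illustration of the "skip 200, then take 300" semantics described in the comment above, assuming the options behave like `Array#[start, length]`. It does not call the gem.

```ruby
# Stand-in for the entries of a file: each value is the 1-based position of a protein.
positions = (1..1000).to_a

offset = 200 # number of entries skipped
limit  = 300 # number of entries kept

window = positions[offset, limit]

puts window.first # 201
puts window.last  # 500
puts window.size  # 300
```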

The parser also provides methods to download proteome and protein files automatically. Note that nothing checks that the proteome or protein ID actually exists, nor that the downloaded file is intact:

require 'uniprot_parser'

list_of_all_protein_from_the_file = UniprotParser::download_and_parse('UP000005640')

Proteome object manipulation

A proteome is treated as a list of proteins. For example, to find all proteins of the human proteome that contain a Q/N stretch (at least 7 consecutive Q or N residues):

require 'uniprot_parser'

human_QNstretch = UniprotParser::download_and_parse('UP000005640').select { |e| e.sequence.match?(/[QN]{7,}/) }
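
To see what the Q/N filter keeps without downloading a proteome, here is a small offline sketch. The `ToyProtein` struct and both sequences are hypothetical stand-ins. The only thing assumed about the real objects is that they respond to `sequence`, as in the example above.

```ruby
# Hypothetical stand-in for a parsed protein: only the fields needed here.
ToyProtein = Struct.new(:id, :sequence)

proteins = [
  ToyProtein.new('TOY1', 'MKTQQNNQQNQAL'), # 8 consecutive Q/N residues
  ToyProtein.new('TOY2', 'MKTQQNAQQNAL')   # two runs of 3, broken by an A
]

# At least 7 consecutive residues that are each Q or N.
QN_STRETCH = /[QN]{7,}/

hits = proteins.select { |protein| protein.sequence.match?(QN_STRETCH) }

hits.each do |protein|
  # String#[] with a regexp returns the matched substring.
  puts "#{protein.id}: #{protein.sequence[QN_STRETCH]}"
end
```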

Protein object manipulation

To extract the protein ID:

# Example for: ID   ADH_DROSE               Reviewed;         256 AA.
myprot.id
#=> returns 'ADH_DROSE'
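
As an illustration of where this value comes from, the sketch below pulls the fields out of the ID line shown above with a regular expression. It is a standalone approximation, not the gem's code, and the local names (`entry_name`, `status`, `length`) are chosen for this example only.

```ruby
line = 'ID   ADH_DROSE               Reviewed;         256 AA.'

# Line code, entry name, review status, then the sequence length in amino acids.
match = line.match(/\AID\s+(\S+)\s+(\w+);\s+(\d+)\s+AA\.\z/)

entry_name = match[1]
status     = match[2]
length     = match[3].to_i

puts entry_name # ADH_DROSE
puts status     # Reviewed
puts length     # 256
```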

To extract the protein accession numbers (AC):

# Example for: AC   Q9GN94; B4HXI2; P07163;
myprot.first_ac
#=> returns 'Q9GN94'
myprot.secondary_ac
#=> returns ['B4HXI2','P07163']
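
The split between the first and the secondary accession numbers can be reproduced on the AC line above with plain string handling. This is a minimal sketch, assuming a single AC line, and is not taken from the gem.

```ruby
line = 'AC   Q9GN94; B4HXI2; P07163;'

# Drop the line code, then split on the ';' that follows each accession number.
accessions = line.sub(/\AAC\s+/, '').split(';').map(&:strip).reject(&:empty?)

# The first accession is the primary one; the rest are secondary.
first_ac, *secondary_ac = accessions

puts first_ac
p secondary_ac
```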

To extract GO terms:

# Example for: DR   GO; GO:0005576; C:extracellular region; IEA:UniProtKB-SubCell.
myprot.go_term
#=> returns a list of GO_term objects
myprot.go_term[0].id
#=> returns 'GO:0005576'
myprot.go_term[0].database
#=> returns 'UniProtKB-SubCell'
myprot.go_term[0].ontology_type
#=> returns 'C'
myprot.go_term[0].description
#=> returns 'extracellular region'
myprot.go_term[0].evidence
#=> returns 'IEA'
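
The mapping between the DR line and the five attributes above can be made explicit with a short sketch. The `ToyGoTerm` struct is a hypothetical stand-in whose field names mirror the accessors shown above. The parsing itself is an approximation written for this README, not the gem's implementation.

```ruby
# Hypothetical stand-in mirroring the accessors documented above.
ToyGoTerm = Struct.new(:id, :ontology_type, :description, :evidence, :database)

line = 'DR   GO; GO:0005576; C:extracellular region; IEA:UniProtKB-SubCell.'

# Drop the line code and the final '.', then split the '; '-separated fields.
_resource, id, aspect, source = line.sub(/\ADR\s+/, '').chomp('.').split('; ')

# 'C:extracellular region' -> ontology type and description.
ontology_type, description = aspect.split(':', 2)
# 'IEA:UniProtKB-SubCell' -> evidence code and source database.
evidence, database = source.split(':', 2)

go_term = ToyGoTerm.new(id, ontology_type, description, evidence, database)
p go_term.to_a
```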

To extract the gene:

# Example for: GN   Name=Acp70A; Synonyms=PAPB; ORFNames=GM25408;
myprot.gene
#=> returns a gene object
myprot.gene.name
#=> returns 'Acp70A'
myprot.gene.orf
#=> returns 'GM25408'
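
The GN line is a list of `key=value` pairs separated by `;`. The sketch below turns the example line into a Ruby hash to show which token feeds which accessor. It assumes a single GN line with one value per key and is not the gem's own parsing code.

```ruby
line = 'GN   Name=Acp70A; Synonyms=PAPB; ORFNames=GM25408;'

# Drop the line code, split the pairs on ';', then split each pair on the first '='.
gene = line.sub(/\AGN\s+/, '')
           .split(';')
           .map(&:strip)
           .reject(&:empty?)
           .map { |pair| pair.split('=', 2) }
           .to_h

puts gene['Name']     # the value exposed as gene.name above
puts gene['ORFNames'] # the value exposed as gene.orf above
puts gene['Synonyms']
```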

To extract orthology information:

# Returns an orthologous group object.
# The cross-referenced phylogeny databases are:
# - eggNOG, e.g.: DR   eggNOG; ENOG410IEUN; Eukaryota.
# - GeneTree, e.g.: DR   GeneTree; EMGT00050000006238; -.
# - HOGENOM, e.g.: DR   HOGENOM; HBG282443; -.
# - HOVERGEN, e.g.: DR   HOVERGEN; HBG057182; -.
# - InParanoid, e.g.: DR   InParanoid; O04196; -.
# - KO, e.g.: DR   KO; K09972; -.
# - OMA, e.g.: DR   OMA; GLCHYFS; -.
# - OrthoDB, e.g.: DR   OrthoDB; EOG94QWM6; -.
# - PhylomeDB, e.g.: DR   PhylomeDB; A4WFL4; -.
# - TreeFam, e.g.: DR   TreeFam; TF324882; -.
# Example for: DR   OrthoDB; EOG94QWM6; -.
myprot.orthology
# => returns an orthologous_group object
puts myprot.orthology.database
# => prints OrthoDB
puts myprot.orthology.id
# => prints EOG94QWM6
puts myprot.orthology.information
# => prints -
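
To illustrate how orthology cross-references differ from other DR lines, the sketch below keeps only the DR lines whose database is in the list above. The `ToyOrthologousGroup` struct is a hypothetical stand-in with the same three field names as the accessors shown. The filtering logic is an assumption made for this example, not the gem's implementation.

```ruby
# Databases listed above as phylogeny cross-references.
ORTHOLOGY_DATABASES = %w[
  eggNOG GeneTree HOGENOM HOVERGEN InParanoid KO OMA OrthoDB PhylomeDB TreeFam
].freeze

# Hypothetical stand-in mirroring the accessors documented above.
ToyOrthologousGroup = Struct.new(:database, :id, :information)

dr_lines = [
  'DR   GO; GO:0005576; C:extracellular region; IEA:UniProtKB-SubCell.',
  'DR   OrthoDB; EOG94QWM6; -.',
  'DR   eggNOG; ENOG410IEUN; Eukaryota.'
]

groups = dr_lines.map do |line|
  # Drop the line code and the final '.', then read the three fields.
  database, id, information = line.sub(/\ADR\s+/, '').chomp('.').split('; ', 3)
  next unless ORTHOLOGY_DATABASES.include?(database)

  ToyOrthologousGroup.new(database, id, information)
end.compact

groups.each { |group| puts "#{group.database}: #{group.id} (#{group.information})" }
```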

For more examples, please refer to the documentation or to the test cases in spec/.

License

The source code is available as open source under the terms of the CeCILL-B License (fully compatible with the BSD license).

About

Ruby parser for uniprot files in pseudo stockholm format
