uniprot-parser

This ruby parser aims to help you to parse your uniprot txt files, this gem is based on the uniprot documentation

Extract informations from proteome

To parse a local proteome file :

require 'uniprot_parser'

list_of_all_protein_from_the_file = UniprotParser::parse('/path/to/UP000000.txt')

To parse the file with a progress bar (please note that to setup the verbose option slightly increase the treatment time) :

require 'uniprot_parser'

list_of_all_protein_from_the_file = UniprotParser::parse('/path/to/UP000000.txt', verbose: true)

If you want to parse a big file (big proteom file for exemple) or if you want to parse only a certain part of the file

require 'uniprot_parser'

#will extract the 201th (INCLUDED) to the 500th (INCLUDED)
list_of_all_protein_from_the_file = UniprotParser::parse('/path/to/UP000000.txt', offset: 200, limit: 300)

Our parser propose some methods to automatically download proteom and protein files. Please note that there is no mechanisms to either check that the proteom or protein id really exists and to check the file integrity

require 'uniprot_parser'

list_of_all_protein_from_the_file = UniprotParser::download_and_parse('UP000005640')

Proteome object manipulation

A proteome is basically considered as a list of proteins. For exemple, let's find all the QN stretch from the human proteome

require 'uniprot_parser'

human_QNstretch = UniprotParser::download_and_parse('UP000005640').select{|e|e.sequence.match /(Q|N){7,}/ }

Proteins object manipulation

To extract protein ID :

#exemple for : ID   ADH_DROSE               Reviewed;         256 AA.
myprot.id 
#=> return 'ADH_DROSE'

To extract protein AC

#exemple for #AC   Q9GN94; B4HXI2; P07163;
myprot.first_ac 
#=> return 'Q9GN94'
myprot.secondary_ac
#=> return ['B4HXI2','P07163']

To extract GO term

#exemple for : DR   GO; GO:0005576; C:extracellular region; IEA:UniProtKB-SubCell.
myprot.go_term
#=> return a GO_term object
myprot.go_term[0].id
#=> return 'GO:0005576'
myprot.go_term[0].database
#=> 'UniProtKB-SubCell'
myprot.go_term[0].ontology_type
#=> return 'C'
myprot.go_term[0].description
#=> return 'extracellular region'
myprot.go_term[0].evidence
#=> return 'IEA'

To extract gene :

# exemple for GN   Name=Acp70A; Synonyms=PAPB; ORFNames=GM25408;
myprot.gene
#=> return a gene object
myprot.gene.name
#=> return 'Acp70A'
myprot.gene.orf
#=> return 'GM25408'

To extract orthology :

# return on orthologous group object,
# the cross referenced database for phylogenie are :
# - eggNOG ex : DR   eggNOG; ENOG410IEUN; Eukaryota.
# - GeneTree ex : DR   GeneTree; EMGT00050000006238; -.
# - HOGEMON ex : DR   HOGENOM; HBG282443; -.
# - HOVERGEN ex : DR   HOVERGEN; HBG057182; -.
# - InParanoid ex : DR   InParanoid; O04196; -.
# - KO ex : DR   KO; K09972; -.
# - OMA ex : DR   OMA; GLCHYFS; -.
# - OrthoDB ex : DR   OrthoDB; EOG94QWM6; -.
# - PhylomeDB ex : DR   PhylomeDB; A4WFL4; -.
# - TreeFam ex : DR   TreeFam; TF324882; -.
# exemple for DR   OrthoDB; EOG94QWM6; -.
myprot.orthology
# => return an orthologous_group object
puts myprot.orthology.database
# => return 'OrthoDB'
puts myprot.orthology.id
# => return 'EOG94QWM6'
puts myprot.orthology.information
# => return '-'

For more examples please refer to the documentation or test case in spec/

License

The code source is available as open source under the terms of the CeCILL-B License (fully compatible with BSD license ).

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
lib		lib
spec		spec
.gitignore		.gitignore
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
LICENSE.txt		LICENSE.txt
README.md		README.md
Rakefile		Rakefile
uniprot_parser.gemspec		uniprot_parser.gemspec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

uniprot-parser

Extract informations from proteome

Proteome object manipulation

Proteins object manipulation

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

uniprot-parser

Extract informations from proteome

Proteome object manipulation

Proteins object manipulation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages