
Download reference data from UCSC for RefSeq #8

@wenbostar

The CDS and protein FASTA files were downloaded from UCSC on the same day the following code was run. Running it produced the warning message shown below:

library(PGA)
annotation_path <- tempdir()
pepfasta <- "~/Downloads/hg19_refGenePro.fa"
CDSfasta <- "~/Downloads/hg19_refGeneCDS.fa"
PrepareAnnotationRefseq2(genome='hg19', CDSfasta, pepfasta, annotation_path,
                         dbsnp=NULL, splice_matrix=FALSE, COSMIC=FALSE)
Build TranscriptDB object (txdb.sqlite) ... 
Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
 done
Prepare gene/transcript/protein id mapping information (ids.RData) ...  done
Prepare exon annotation information (exon_anno.RData) ...  done
Prepare protein sequence (proseq.RData) ...  done
Prepare protein coding sequence (procodingseq.RData)...  done
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable) :
  UCSC data anomaly in 433 transcript(s): the cds cumulative length is not a multiple of 3
  for transcripts ‘NM_033425’ ‘NM_006510’ ‘NM_001146344’ ‘NM_001010890’ ‘NM_001300891’
  ‘NM_001300891’ ‘NM_017940’ ‘NM_002537’ ‘NM_003954’ ‘NM_006510’ ‘NM_001278563’
  ‘NM_001291815’ ‘NM_001359231’ ‘NM_001354658’ ‘NM_001350198’ ‘NM_001243042’
  ‘NM_001243042’ ‘NM_002570’ ‘NM_001128590’ ‘NM_001271870’ ‘NM_001271872’ ‘NM_001329984’
  ‘NM_001037501’ ‘NM_001037675’ ‘NM_001277444’ ‘NM_001351365’ ‘NM_001297654’
  ‘NM_001288952’ ‘NM_001134939’ ‘NM_001301371’ ‘NM_153334’ ‘NM_001348286’ ‘NM_001348208’
  ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001289152’ ‘NM_199349’
  ‘NM_138324’ ‘NM_138323’ ‘NM_138322’ ‘NM_138319’ ‘NM_005671’ ‘NM_001143962’ ‘NM_000500’
  ‘NM_145171’ ‘NM_001318833’ ‘NM_006904 [... truncated]
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] PGA_1.13.3           rTANDEM_1.22.1       Rcpp_1.0.1
 [4] XML_3.98-1.20        data.table_1.12.2    Biostrings_2.50.2
 [7] XVector_0.22.0       GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[10] IRanges_2.16.0       S4Vectors_0.20.1     BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Biobase_2.42.0              httr_1.4.0
 [3] bit64_0.9-7                 assertthat_0.2.1
 [5] BiocManager_1.30.4          blob_1.1.1
 [7] BSgenome_1.50.0             GenomeInfoDbData_1.2.0
 [9] Rsamtools_1.34.1            remotes_2.0.4
[11] progress_1.2.2              pillar_1.4.1
[13] RSQLite_2.1.1               lattice_0.20-38
[15] glue_1.3.1                  digest_0.6.19
[17] RColorBrewer_1.1-2          colorspace_1.4-1
[19] Matrix_1.2-17               plyr_1.8.4
[21] pkgconfig_2.0.2             pheatmap_1.0.12
[23] customProDB_1.22.1          biomaRt_2.38.0
[25] zlibbioc_1.28.0             purrr_0.3.2
[27] scales_1.0.0                processx_3.3.1
[29] BiocParallel_1.16.6         tibble_2.1.3
[31] ggplot2_3.2.0               AhoCorasickTrie_0.1.0
[33] SummarizedExperiment_1.12.0 GenomicFeatures_1.34.8
[35] lazyeval_0.2.2              magrittr_1.5
[37] crayon_1.3.4                memoise_1.1.0
[39] ps_1.3.0                    MASS_7.3-51.4
[41] RMariaDB_1.0.6.9000         tools_3.5.3
[43] prettyunits_1.0.2           hms_0.4.2
[45] matrixStats_0.54.0          stringr_1.4.0
[47] munsell_0.5.0               DelayedArray_0.8.0
[49] AnnotationDbi_1.44.0        ade4_1.7-13
[51] compiler_3.5.3              rlang_0.3.4
[53] grid_3.5.3                  RCurl_1.95-4.12
[55] VariantAnnotation_1.28.13   bitops_1.0-6
[57] gtable_0.3.0                curl_3.3
[59] DBI_1.0.0.9001              R6_2.4.0
[61] GenomicAlignments_1.18.1    Nozzle.R1_1.1-1
[63] dplyr_0.8.1                 rtracklayer_1.42.2
[65] seqinr_3.4-5                bit_1.1-14
[67] readr_1.3.1                 stringi_1.4.3
[69] tidyselect_0.2.5
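Judging by the function name in the warning (`.extractCdsLocsFromUCSCTxTable`), the anomaly is detected while the refGene table is converted into a TxDb object. It therefore concerns the table's CDS coordinates, not the downloaded FASTA files directly. As a separate cross-check, the sketch below reports every record in a CDS FASTA file whose length is not a multiple of 3. It uses base R only. `cds_length_check` is a hypothetical helper, not part of PGA. It assumes the record ID is the first whitespace-delimited token after `>`, which may differ from the bare accession format used in the warning.

```r
# Hypothetical helper (not part of PGA): report the length of every record in a
# CDS FASTA file and flag those whose length is not a multiple of 3.
# Assumption: the record ID is the first whitespace-delimited token after '>'.
cds_length_check <- function(fasta_path) {
  lines <- readLines(fasta_path)
  is_header <- startsWith(lines, ">")
  # Record index for every line; lines before the first header get 0
  record <- cumsum(is_header)
  ids <- sub("^>([^[:space:]]+).*$", "\\1", lines[is_header])

  # Sequence lines only, with any whitespace (including '\r') stripped
  seq_idx <- !is_header & record > 0
  seq_chars <- nchar(gsub("[[:space:]]", "", lines[seq_idx]))

  # Sum characters per record; explicit factor levels keep records with no sequence
  lens <- vapply(split(seq_chars, factor(record[seq_idx], levels = seq_along(ids))),
                 sum, integer(1))
  lens <- unname(lens)

  data.frame(id = ids, length = lens, remainder = lens %% 3L,
             stringsAsFactors = FALSE)
}

# Usage on a tiny temporary FASTA: tx1 is 9 nt, tx2 is 5 nt
tmp <- tempfile(fileext = ".fa")
writeLines(c(">tx1 some description", "ATGAAA", "TAA", ">tx2", "ATGAA"), tmp)
res <- cds_length_check(tmp)
res[res$remainder != 0, ]  # only tx2 is flagged
```

The same call could be pointed at the `CDSfasta` path used above. The flagged IDs could then be compared with the transcripts listed in the warning, for example with `intersect()`, after any header prefix has been stripped so that the IDs match.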
