Download reference data from UCSC for RefSeq

The CDS and protein data were downloaded from UCSC on the same day that the following code was run. The run produced the warning message shown in the output below:

```r
library(PGA)
annotation_path <- tempdir()
pepfasta <- "~/Downloads/hg19_refGenePro.fa"
CDSfasta <- "~/Downloads/hg19_refGeneCDS.fa"
PrepareAnnotationRefseq2(genome='hg19', CDSfasta, pepfasta, annotation_path,
                         dbsnp=NULL, splice_matrix=FALSE, COSMIC=FALSE)
```

```
Build TranscriptDB object (txdb.sqlite) ... 
Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
 done
Prepare gene/transcript/protein id mapping information (ids.RData) ...  done
Prepare exon annotation information (exon_anno.RData) ...  done
Prepare protein sequence (proseq.RData) ...  done
Prepare protein coding sequence (procodingseq.RData)...  done
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable) :
  UCSC data anomaly in 433 transcript(s): the cds cumulative length is not a multiple of 3
  for transcripts ‘NM_033425’ ‘NM_006510’ ‘NM_001146344’ ‘NM_001010890’ ‘NM_001300891’
  ‘NM_001300891’ ‘NM_017940’ ‘NM_002537’ ‘NM_003954’ ‘NM_006510’ ‘NM_001278563’
  ‘NM_001291815’ ‘NM_001359231’ ‘NM_001354658’ ‘NM_001350198’ ‘NM_001243042’
  ‘NM_001243042’ ‘NM_002570’ ‘NM_001128590’ ‘NM_001271870’ ‘NM_001271872’ ‘NM_001329984’
  ‘NM_001037501’ ‘NM_001037675’ ‘NM_001277444’ ‘NM_001351365’ ‘NM_001297654’
  ‘NM_001288952’ ‘NM_001134939’ ‘NM_001301371’ ‘NM_153334’ ‘NM_001348286’ ‘NM_001348208’
  ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001289152’ ‘NM_199349’
  ‘NM_138324’ ‘NM_138323’ ‘NM_138322’ ‘NM_138319’ ‘NM_005671’ ‘NM_001143962’ ‘NM_000500’
  ‘NM_145171’ ‘NM_001318833’ ‘NM_006904 [... truncated]
```
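
For anyone trying to see which records are affected, here is a minimal base R sketch of the frame check that the warning describes (CDS length modulo 3). The toy sequences, their `NM_TOY_*` names, and the `find_out_of_frame()` helper are hypothetical and are not part of PGA or GenomicFeatures. The actual warning is computed from the coordinates in the UCSC refGene table, so running a check like this on the downloaded CDS FASTA is only an approximation and may not flag the same transcripts.

```r
# Toy CDS sequences standing in for records from a CDS FASTA file
# (hypothetical names and sequences, for illustration only).
cds <- c(
  NM_TOY_1 = "ATGGCCTAA",        # 9 nt: a whole number of codons
  NM_TOY_2 = "ATGGCCTA",         # 8 nt: not a multiple of 3
  NM_TOY_3 = "ATGAAACCCGGGTAG"   # 15 nt: a whole number of codons
)

# Return the names of sequences whose length is not a multiple of 3.
find_out_of_frame <- function(seqs) {
  names(seqs)[nchar(seqs) %% 3L != 0L]
}

out_of_frame <- find_out_of_frame(cds)
print(out_of_frame)
```

On the real files, the same test could probably be applied to the widths of the sequences read with `Biostrings::readDNAStringSet(CDSfasta)`, since Biostrings is already attached in this session. The warning also lists some accessions more than once (for example ‘NM_001300891’), so wrapping the result in `unique()` may be useful when counting.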

```r
sessionInfo()
```
```
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] PGA_1.13.3           rTANDEM_1.22.1       Rcpp_1.0.1
 [4] XML_3.98-1.20        data.table_1.12.2    Biostrings_2.50.2
 [7] XVector_0.22.0       GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[10] IRanges_2.16.0       S4Vectors_0.20.1     BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Biobase_2.42.0              httr_1.4.0
 [3] bit64_0.9-7                 assertthat_0.2.1
 [5] BiocManager_1.30.4          blob_1.1.1
 [7] BSgenome_1.50.0             GenomeInfoDbData_1.2.0
 [9] Rsamtools_1.34.1            remotes_2.0.4
[11] progress_1.2.2              pillar_1.4.1
[13] RSQLite_2.1.1               lattice_0.20-38
[15] glue_1.3.1                  digest_0.6.19
[17] RColorBrewer_1.1-2          colorspace_1.4-1
[19] Matrix_1.2-17               plyr_1.8.4
[21] pkgconfig_2.0.2             pheatmap_1.0.12
[23] customProDB_1.22.1          biomaRt_2.38.0
[25] zlibbioc_1.28.0             purrr_0.3.2
[27] scales_1.0.0                processx_3.3.1
[29] BiocParallel_1.16.6         tibble_2.1.3
[31] ggplot2_3.2.0               AhoCorasickTrie_0.1.0
[33] SummarizedExperiment_1.12.0 GenomicFeatures_1.34.8
[35] lazyeval_0.2.2              magrittr_1.5
[37] crayon_1.3.4                memoise_1.1.0
[39] ps_1.3.0                    MASS_7.3-51.4
[41] RMariaDB_1.0.6.9000         tools_3.5.3
[43] prettyunits_1.0.2           hms_0.4.2
[45] matrixStats_0.54.0          stringr_1.4.0
[47] munsell_0.5.0               DelayedArray_0.8.0
[49] AnnotationDbi_1.44.0        ade4_1.7-13
[51] compiler_3.5.3              rlang_0.3.4
[53] grid_3.5.3                  RCurl_1.95-4.12
[55] VariantAnnotation_1.28.13   bitops_1.0-6
[57] gtable_0.3.0                curl_3.3
[59] DBI_1.0.0.9001              R6_2.4.0
[61] GenomicAlignments_1.18.1    Nozzle.R1_1.1-1
[63] dplyr_0.8.1                 rtracklayer_1.42.2
[65] seqinr_3.4-5                bit_1.1-14
[67] readr_1.3.1                 stringi_1.4.3
[69] tidyselect_0.2.5
```
