Help to identify differentially edited sites - RRD - with jacusaHelper #43

@CelineLabbeCurie

Hello,

I'm comparing two RNA samples (RRD, RNA-RNA differences).
As Michael Piechotta advised, I've removed all SNPs from the JACUSA2 output with bedtools.

Here is an example of my output after removing the SNPs but before any other filters:

#contig     start end   name  score strand      bases11     bases12     bases21     bases22     info  filter      ref
chr1  14470 14471 call-2      0.7647364319592498      -     3,0,53,0    2,0,36,0    0,0,31,0    1,0,33,0    *     *     G
chr1  14484 14485 call-2      0.3625563409707411      -     0,0,63,0    0,0,42,0    1,0,33,0    0,0,41,0    *     *     G
chr1  14487 14488 call-2      0.30562855413927537     -     63,1,0,0    46,0,0,0    37,0,0,0    44,0,0,0    *     D     A
chr1  14488 14489 call-2      0.33983885716202167     -     0,0,65,0    0,0,47,0    0,0,39,0    1,0,47,0    *     D     G
chr1  14490 14491 call-2      1.306677796788108       -     0,0,79,0    3,0,55,0    0,0,47,0    0,0,60,0    *     D     G
chr1  14491 14492 call-2      0.024052593029637137    -     0,0,82,0    3,0,56,0    1,0,48,0    2,0,65,0    *     *     G

I'm not sure I understand the score correctly. Does it assess the difference between the two conditions? Is it true that the higher the score, the bigger the difference between my conditions?

I've tried to apply some filters:

  • Coverage >= 2
  • Remove sites with more than 2 observed bases
  • Retain only robust sites (I'm not sure what that entails; I used dplyr::filter(robust(bases)))
  • Remove artefacts (D option of JACUSA)
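
To make the four filters concrete, here is a minimal sketch in plain Python applied to the six sample rows shown earlier in this post. It is not the JacusaHelper implementation. Two things are assumptions: the base counts are read in A,C,G,T order (which agrees with the ref column in the sample rows), and "robust" is taken to mean that a non-reference base is seen in every replicate of at least one condition. The function names `parse`, `is_robust` and `passes` are hypothetical.

```python
# Rows copied from the JACUSA2 output in this post (whitespace-separated).
# Columns: contig start end name score strand bases11 bases12 bases21 bases22 info filter ref
ROWS = """\
chr1 14470 14471 call-2 0.7647364319592498 - 3,0,53,0 2,0,36,0 0,0,31,0 1,0,33,0 * * G
chr1 14484 14485 call-2 0.3625563409707411 - 0,0,63,0 0,0,42,0 1,0,33,0 0,0,41,0 * * G
chr1 14487 14488 call-2 0.30562855413927537 - 63,1,0,0 46,0,0,0 37,0,0,0 44,0,0,0 * D A
chr1 14488 14489 call-2 0.33983885716202167 - 0,0,65,0 0,0,47,0 0,0,39,0 1,0,47,0 * D G
chr1 14490 14491 call-2 1.306677796788108 - 0,0,79,0 3,0,55,0 0,0,47,0 0,0,60,0 * D G
chr1 14491 14492 call-2 0.024052593029637137 - 0,0,82,0 3,0,56,0 1,0,48,0 2,0,65,0 * * G
"""

BASES = "ACGT"  # ASSUMPTION: order of the comma-separated base counts


def parse(line):
    """Turn one output row into a dict; two replicates per condition."""
    f = line.split()
    counts = [[int(x) for x in col.split(",")] for col in f[6:10]]
    return {
        "contig": f[0],
        "pos": int(f[2]),          # 'end' column, i.e. the 1-based position
        "score": float(f[4]),
        "strand": f[5],
        "cond1": counts[:2],       # bases11, bases12
        "cond2": counts[2:],       # bases21, bases22
        "filter": f[11],
        "ref": f[12],
    }


def is_robust(site):
    """ASSUMED definition: some non-reference base is present in every
    replicate of at least one condition."""
    ref_idx = BASES.index(site["ref"])
    for cond in (site["cond1"], site["cond2"]):
        for i in range(len(BASES)):
            if i != ref_idx and all(rep[i] > 0 for rep in cond):
                return True
    return False


def passes(site, min_cov=2):
    reps = site["cond1"] + site["cond2"]
    # 1. coverage >= min_cov in every replicate
    if any(sum(rep) < min_cov for rep in reps):
        return False
    # 2. at most 2 distinct bases observed across all replicates
    observed = {BASES[i] for rep in reps for i, n in enumerate(rep) if n > 0}
    if len(observed) > 2:
        return False
    # 3. drop sites carrying the D flag in the filter column
    if "D" in site["filter"]:
        return False
    # 4. keep only robust sites (assumed definition above)
    return is_robust(site)


sites = [parse(line) for line in ROWS.splitlines()]
kept = [s for s in sites if passes(s)]
for s in kept:
    print(s["contig"], s["pos"], s["strand"], s["score"], s["ref"])
```

On these six rows the sketch keeps positions 14471 and 14492. That is consistent with the first two ranges of the filtered R output below, but it does not confirm that the assumed definition of "robust" is the one JacusaHelper uses.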

In R, I obtained this kind of output:

>head(jacusa.filt)
GRanges object with 6 ranges and 9 metadata columns:
      seqnames    ranges strand |        name     score        info      filter         ref                                               bases         cov
         <Rle> <IRanges>  <Rle> | <character> <numeric> <character> <character> <character>                                            <tbl_df>    <tbl_df>
  [1]     chr1     14471      - |      call-2 0.7647364           *           *           G  3: 0:53:...: 2: 0:36:...: 0: 0:31:...: 1: 0:33:... 56:38:31:34
  [2]     chr1     14492      - |      call-2 0.0240526           *           *           G  0: 0:82:...: 3: 0:56:...: 1: 0:48:...: 2: 0:65:... 82:59:49:67
  [3]     chr1     14522      - |      call-2 0.0243439           *           *           C  0:47: 0:...: 0:44: 0:...: 0:33: 0:...: 0:50: 0:... 79:70:56:77
  [4]     chr1     14574      - |      call-2 0.7674580           *           *           T  0:30: 0:...: 0:29: 0:...: 0:27: 0:...: 0:40: 0:... 78:62:60:75
  [5]     chr1     14604      - |      call-2 0.0177258           *           *           T  0: 1: 0:...: 0: 3: 0:...: 0: 1: 0:...: 0: 3: 0:... 59:39:42:43
  [6]     chr1     14610      - |      call-2 0.1768375           *           *           A 57: 0: 1:...:39: 0: 2:...:41: 0: 1:...:38: 0: 3:... 58:41:42:41
           FALSE.        bc
      <character> <integer>
  [1]                     2
  [2]                     2
  [3]                     2
  [4]                     2
  [5]                     2
  [6]                     2

What I'd like is a list of sites (position, strand, score...) that are differentially edited in my test condition. Could you please guide me? After all that I've done, how do I get there? With a filter on the score? If so, how can I determine a good threshold? With a filter on the edited bases? I'm looking for C>T edits. Does that depend on the strand? Are there any more steps needed to identify the differentially edited sites?
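
A filter on the edited bases could look like the following sketch, again in plain Python and not taken from JacusaHelper. It labels a site with its substitution (reference base > most frequent non-reference base), assuming the counts are in A,C,G,T order. Whether a minus-strand C>T event is reported as C>T or as its complement G>A is the open question here, so the hypothetical `complement` argument only makes both readings explicit without deciding between them.

```python
BASES = "ACGT"  # ASSUMPTION: order of the comma-separated base counts
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution(ref, replicates, complement=False):
    """Return 'REF>ALT' for the most frequent non-reference base summed
    over the given replicates, or None if only the reference is seen.
    With complement=True both letters are complemented (hypothetical
    option, for output that might be reported relative to the plus strand)."""
    totals = [sum(rep[i] for rep in replicates) for i in range(len(BASES))]
    ref_idx = BASES.index(ref)
    alts = [(n, BASES[i]) for i, n in enumerate(totals) if i != ref_idx and n > 0]
    if not alts:
        return None
    alt = max(alts)[1]  # highest count; ties fall to the later letter
    if complement:
        return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return f"{ref}>{alt}"


# Condition 1 of the first sample row (chr1:14471, minus strand, ref G)
cond1 = [[3, 0, 53, 0], [2, 0, 36, 0]]
as_printed = substitution("G", cond1)
complemented = substitution("G", cond1, complement=True)
print(as_printed, complemented)

# Keeping only the substitution of interest is then a simple comparison
is_c_to_t = complemented == "C>T"
```

Read as printed, this site is G>A; read as a complemented minus-strand site, it would be C>T. Which of the two readings is right for this output is what needs confirming.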

I'm a little lost... I'd appreciate any help :)
Thanks!
Best,
