5. Output files

FusionVet generates five files:

  • prefix.fusion.sorted.bam
  • prefix.fusion.sorted.bam.bai
  • prefix.fusion.bed
  • prefix.fusion.interact.bed
  • prefix.fusion.summary.txt

5.1. prefix.fusion.sorted.bam

BAM file containing the chimeric reads that support gene fusions. Compared to the original BAM file, each alignment record carries two additional tags: FN (Fusion Name) and SR (Supporting Read). The SR tag takes one of three values:

  • SR:i:1 : Fusion was supported by split reads.
  • SR:i:2 : Fusion was supported by paired reads.
  • SR:i:3 : Fusion was supported by both split reads and paired reads.

Example records:
$ samtools view out.fusion.sorted.bam | head -10
UNC13-SN749:172:D101FACXX:8:1104:12580:173001/1        99      chr21   39775575        66      48M     =       42879910        -3104288        CTTTCACCGCCCACTCCAGCCACTGCCGCACATGGTCTGTACTCCATA        CCCFFFFFHHHHHJJJJIIJJIJJJJIJIIJJJJJFHGJGFHIHIJJJ        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:1104:4678:34964/2  163     chr21   39817326        66      48M     =       42879890        -3062517        CCTTGAGCCATTCACCTGGCTAGGGTTACATTCCATTTTGATGGTGAC        CCCFFFDFHHHHBGHIJJJJJJIIJ?GIIGIJGGGIJJJJJJJJIFDG        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:1208:10044:4367/1  99      chr21   39817340        66      48M     =       42880015        -3062628        CCTGGCTAGGGTTACATTCCATTTTGATGGTGACCCTGGCTGGGGGTT        CCCFFFFFHHHFFIJJIJJJIIJJJJIIJJHJJJJIJJJIJJJJIJI>        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:1301:12176:174226/2        163     chr21   39817361        66      48M     =       42879922        -3062514        TTTTGATGGTGACCCTGGCTGGGGGTTGAGACAGCCAATCCTGCTGAG        BCCFFFFFHFHHHJJJJJJJJJJJJFHIIIJJIIJJJJJJJJIJIJJJ        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:2201:10011:20671/1 99      chr21   39817379        66      48M     =       42879951        -3062525        CTGGGGGTTGAGACAGCCAATCCTGCTGAGGGACGCGTGGGCTCATCT        CCCFFFFDHHHGHJJJJJJJJJJJIIJJJJJJIJIJGHHHFFFDEEEE        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:1108:17583:42031/2 163     chr21   39817384        65      48M     =       42880007        -3062576        GGTTGAGACAGCCAATCCTGCTGAGGGACGCGTGGGCTCATCTTGGAA        ?@;BDFDABFFDHHAFHHGHIIIJGIIJGIAE?@6;FGH@DDCC@CA#        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:2302:8715:52295/1  99      chr21   39817385        66      48M     =       42879932        -3062500        GTTGAGACAGCCAATCCTGCTGAGGGACGCGTGGGCTCATCTTGGAAG        CCCFFFFFHHHHHJJJJJHJJJJJJJJJJJJFHIJIIJGIJIJIIJIJ        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:2305:11177:45091/1 99      chr21   39817385        66      48M     =       42880014        -3062582        GTTGAGACAGCCAATCCTGCTGAGGGACGCGTGGGCTCATCTTGGAAG        B@CFFFFFHHHHHJJJJJJJJJJJJJIJJJJHJJJJJJJJJJJJIJJG        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:2306:12796:14838/2 163     chr21   39817391        53      48M     =       42879889        -3062451        ACAGCCAATCCTGCTGAGGGACGCGTGGGCTCATCTTGGAAGTCTGTA        @CCFFFFFHHHGHJJJJJJJJJJJJHGIJIJJJJJJJJIIIJHHJ###        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:1  SR:i:2  FN:Z:ERG--TMPRSS2
UNC13-SN749:172:D101FACXX:8:1308:12672:71749/1 99      chr21   39817394        66      48M     =       42880007        -3062566        GCCAATCCTGCTGAGGGACGCGTGGGCTCATCTTGGAAGTCTGTCCAT        ?@@FDDDFADF?D@AAB?ACGAHHEHG@BFHIGHBB=8=88@C=@@CE        RG:Z:120508_UNC13-SN749_0172_AD101FACXX_8_CGATGT        IH:i:1  HI:i:1  NM:i:0  SR:i:2  FN:Z:ERG--TMPRSS2
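
The FN and SR tags can be tallied directly from the text output of samtools view. The following is a minimal sketch, not part of FusionVet: the function names parse_fusion_tags and count_support are hypothetical, and the code assumes standard tab-separated SAM records in which optional tags begin at column 12.

```python
from collections import Counter

# Meaning of the SR tag values, as listed in this section.
SR_LABELS = {1: "split read", 2: "paired reads", 3: "split read and paired reads"}


def parse_fusion_tags(sam_line):
    """Return (fusion_name, sr_value) for one SAM text line, or None if a tag is missing."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    if "FN" not in tags or "SR" not in tags:
        return None
    return tags["FN"], int(tags["SR"])


def count_support(sam_lines):
    """Count alignment records per (fusion name, support type)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        parsed = parse_fusion_tags(line)
        if parsed is not None:
            fusion, sr = parsed
            counts[(fusion, SR_LABELS.get(sr, "unknown"))] += 1
    return counts
```

One way to use it is to pipe the output of samtools view into a small script that calls count_support(sys.stdin). Note that the counts are per alignment record, so both mates of a read pair are counted separately.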

5.2. prefix.fusion.sorted.bam.bai

The index file for prefix.fusion.sorted.bam.

5.3. prefix.fusion.bed

This is a standard BED12 file. Paired reads are merged into a single BED entry. The file can be uploaded to the UCSC Genome Browser to visualize intra-chromosomal fusions, which helps identify the fusion point. If the file is too large to upload to the UCSC Genome Browser directly, convert it to a bigBed file with the bedToBigBed program, following the UCSC instructions for that program.
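
To inspect fusion points outside the browser, the block columns of a BED12 entry can be expanded into genomic intervals. This is a rough sketch under stated assumptions: bed12_blocks and largest_gap are hypothetical helper names, the input is assumed to follow the standard 12-column BED layout (blockSizes in column 11, blockStarts in column 12), and treating the widest gap between blocks as the region spanning the fusion is an illustrative heuristic, not FusionVet's own logic.

```python
def bed12_blocks(bed_line):
    """Return (chrom, name, [(start, end), ...]) for one BED12 line.

    Coordinates are 0-based and half-open, as in the BED specification.
    """
    f = bed_line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected 12 tab-separated fields")
    chrom, chrom_start, name = f[0], int(f[1]), f[3]
    # blockSizes and blockStarts are comma-separated and may carry a trailing comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    # blockStarts are offsets relative to chromStart.
    blocks = [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]
    return chrom, name, blocks


def largest_gap(blocks):
    """Return (gap_start, gap_end) of the widest gap between consecutive blocks, or None."""
    gaps = [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]
    return max(gaps, key=lambda g: g[1] - g[0]) if gaps else None
```

For a hypothetical entry with two 48-base blocks on chr21 starting at 39817326 and at offset 3062564, bed12_blocks yields the intervals (39817326, 39817374) and (42879890, 42879938), and largest_gap returns the span between them.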

5.4. prefix.fusion.interact.bed

This is an Interact format file. It can be uploaded to the UCSC Genome Browser to visualize both intra-chromosomal and inter-chromosomal fusions. If the file is too large to upload to the UCSC Genome Browser directly, convert it to a bigInteract file with the bedToBigBed program, following the UCSC instructions for that format.

Intra-chromosomal fusions are displayed as shown below (note the two breakpoints in the ERG gene). Toggling between full display mode and pack/squish display mode helps identify the exact breakpoint(s).

_images/ERG_TMPRSS2.png

Inter-chromosomal fusions are displayed as shown below. Toggling between full display mode and pack/squish display mode helps identify the exact breakpoint(s).

_images/Inter_chrom.png
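
The intra- versus inter-chromosomal distinction can also be read from the Interact file itself. The sketch below assumes the standard 18-column UCSC interact layout, in which sourceChrom is column 9 and targetChrom is column 14. The function name classify_interact is hypothetical, and whether FusionVet writes a track header line is not stated here, so such lines are simply skipped.

```python
def classify_interact(lines):
    """Yield (name, kind) for each record of an Interact file.

    kind is "intra-chromosomal" when source and target lie on the same
    chromosome, and "inter-chromosomal" otherwise.
    """
    for line in lines:
        # Skip blank lines, comments, and browser/track header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 18:
            raise ValueError("expected 18 tab-separated interact fields")
        name, source_chrom, target_chrom = f[3], f[8], f[13]
        kind = "intra-chromosomal" if source_chrom == target_chrom else "inter-chromosomal"
        yield name, kind
```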

5.5. prefix.fusion.summary.txt

This file reports the total number of supporting RNA fragments (split reads + read pairs) for each fusion.

Sample_ID       ERG--TMPRSS2
Tumor_RNA_TCGA-HC-7819-01A-11R-2118-07.bam      48
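
The summary table can be loaded into a nested dictionary for downstream use. This sketch assumes the layout shown above: a header row whose first column is Sample_ID followed by fusion names, then one row per sample, with whitespace-separated columns and no whitespace inside sample IDs. The one-column-per-fusion layout for multiple fusions is an assumption, and read_summary is a hypothetical name.

```python
def read_summary(lines):
    """Return {sample_id: {fusion_name: count}} from the lines of a summary file."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return {}
    header, records = rows[0], rows[1:]
    fusions = header[1:]  # first header column is the Sample_ID label
    return {rec[0]: dict(zip(fusions, map(int, rec[1:]))) for rec in records}
```

For example, passing an open file handle for prefix.fusion.summary.txt to read_summary returns the per-sample fragment counts keyed by fusion name.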