Transcriptome Analysis in Practice, Appendix: Quality Control of Trinity Assembly Results

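In section 2 we used Trinity to assemble the transcriptome data. I had 6 samples and about 6 G of data in total, and with my settings the assembly finished in roughly 30 hours.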

Today's task is to use RSeQC to run quality control on the assembly results and visualize them.

RSeQC is aimed mainly at clinical RNA-seq data and at data that has a reference genome. For RNA-seq data without a reference genome, many of its tools cannot be used.

First, use bowtie2-build to build an index from the FASTA file that Trinity assembled.

yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ll | sort -nk 7
total 86667684
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 left.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 right.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:08 both.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:19 .jellyfish_count.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:24 .jellyfish_dump.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:26 .jellyfish_histo.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm_renamed.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 inchworm.K25.L25.DS.fa.finished
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 partitioned_reads.files.list.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 recursive_trinity.cmds.ok
-rw-rw-r-- 1 yeyt yeyt           9 Sep 14 01:08 both.fa.read_count
-rw-rw-r-- 1 yeyt yeyt          10 Sep 14 01:59 inchworm.kmer_count
-rw-rw-r-- 1 yeyt yeyt        2757 Sep 14 10:31 pipeliner.3855.cmds
-rw-rw-r-- 1 yeyt yeyt       22843 Sep 14 01:26 jellyfish.kmers.fa.histo
-rw-rw-r-- 1 yeyt yeyt    13366802 Sep 14 10:39 partitioned_reads.files.list
-rw-rw-r-- 1 yeyt yeyt    47753878 Sep 14 10:39 recursive_trinity.cmds
-rw-rw-r-- 1 yeyt yeyt   493022300 Sep 14 03:27 inchworm.K25.L25.DS.fa
-rw-rw-r-- 1 yeyt yeyt 11602365066 Sep 14 01:08 both.fa
-rw-rw-r-- 1 yeyt yeyt 26501596675 Sep 14 01:24 jellyfish.kmers.fa
-rw-rw-r-- 1 yeyt yeyt 49568938526 Sep 14 06:38 scaffolding_entries.sam
drwxrwxr-x 2 yeyt yeyt        4096 Sep 14 10:33 chrysalis/
drwxrwxr-x 3 yeyt yeyt        4096 Sep 14 01:03 insilico_read_normalization/
drwxrwxr-x 4 yeyt yeyt        4096 Sep 14 10:39 read_partitions/
-rw-rw-r-- 1 yeyt yeyt           0 Sep 15 20:51 align_stats.txt
-rw-rw-r-- 1 yeyt yeyt          62 Sep 15 20:51 bowtie2.bam
-rw-rw-r-- 1 yeyt yeyt         651 Sep 15 07:03 Trinity.timing
-rw-rw-r-- 1 yeyt yeyt    10213332 Sep 15 07:03 Trinity.fasta.gene_trans_map
-rw-rw-r-- 1 yeyt yeyt    47753878 Sep 15 07:03 recursive_trinity.cmds.completed
-rw-rw-r-- 1 yeyt yeyt   244740565 Sep 15 07:03 Trinity.fasta
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2-build Trinity.fasta Trinity.fasta
Settings:
  Output files: "Trinity.fasta.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  Trinity.fasta
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:03
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
...
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 103828770 bytes to primary EBWT file: Trinity.fasta.rev.1.bt2
Wrote 55572488 bytes to secondary EBWT file: Trinity.fasta.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 222289920
    bwtLen: 222289921
    sz: 55572480
    bwtSz: 55572481
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 13893121
    offsSz: 55572484
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 1157761
    numLines: 1157761
    ebwtTotLen: 74096704
    ebwtTotSz: 74096704
    color: 0
    reverse: 1
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ll | sort -nk 7
total 86822504
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 left.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 right.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:08 both.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:19 .jellyfish_count.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:24 .jellyfish_dump.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:26 .jellyfish_histo.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm_renamed.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 inchworm.K25.L25.DS.fa.finished
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 partitioned_reads.files.list.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 recursive_trinity.cmds.ok
-rw-rw-r-- 1 yeyt yeyt           9 Sep 14 01:08 both.fa.read_count
-rw-rw-r-- 1 yeyt yeyt          10 Sep 14 01:59 inchworm.kmer_count
-rw-rw-r-- 1 yeyt yeyt        2757 Sep 14 10:31 pipeliner.3855.cmds
-rw-rw-r-- 1 yeyt yeyt       22843 Sep 14 01:26 jellyfish.kmers.fa.histo
-rw-rw-r-- 1 yeyt yeyt    13366802 Sep 14 10:39 partitioned_reads.files.list
-rw-rw-r-- 1 yeyt yeyt    47753878 Sep 14 10:39 recursive_trinity.cmds
-rw-rw-r-- 1 yeyt yeyt   493022300 Sep 14 03:27 inchworm.K25.L25.DS.fa
-rw-rw-r-- 1 yeyt yeyt 11602365066 Sep 14 01:08 both.fa
-rw-rw-r-- 1 yeyt yeyt 26501596675 Sep 14 01:24 jellyfish.kmers.fa
-rw-rw-r-- 1 yeyt yeyt 49568938526 Sep 14 06:38 scaffolding_entries.sam
drwxrwxr-x 2 yeyt yeyt        4096 Sep 14 10:33 chrysalis/
drwxrwxr-x 3 yeyt yeyt        4096 Sep 14 01:03 insilico_read_normalization/
drwxrwxr-x 4 yeyt yeyt        4096 Sep 14 10:39 read_partitions/
-rw-rw-r-- 1 yeyt yeyt           0 Sep 15 20:51 align_stats.txt
-rw-rw-r-- 1 yeyt yeyt          62 Sep 15 20:51 bowtie2.bam
-rw-rw-r-- 1 yeyt yeyt         651 Sep 15 07:03 Trinity.timing
-rw-rw-r-- 1 yeyt yeyt    10213332 Sep 15 07:03 Trinity.fasta.gene_trans_map
-rw-rw-r-- 1 yeyt yeyt    47753878 Sep 15 07:03 recursive_trinity.cmds.completed
-rw-rw-r-- 1 yeyt yeyt   244740565 Sep 15 07:03 Trinity.fasta
drwxrwxr-x 3 yeyt yeyt        4096 Sep 15 18:42 ../
drwxrwxr-x 5 yeyt yeyt        4096 Sep 15 20:51 ./
-rw-rw-r-- 1 yeyt yeyt     1984490 Sep 23 13:50 Trinity.fasta.3.bt2
-rw-rw-r-- 1 yeyt yeyt    55572480 Sep 23 13:50 Trinity.fasta.4.bt2
-rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:03 Trinity.fasta.2.bt2
-rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:16 Trinity.fasta.rev.2.bt2
-rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:03 Trinity.fasta.1.bt2
-rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:16 Trinity.fasta.rev.1.bt2
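The six files ending in .bt2 at the bottom of the listing are the index files.
Next, align the reads back to the assembly with Bowtie2 and write the results as SAM files.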
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B251_2.P.fq.gz -S B251.sam
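
The command above handles a single sample, and the other five samples differ only in the sample name. As a rough sketch (not what I actually ran), the commands could be generated in a loop. The `-p` thread option and the stderr redirect are real Bowtie2 behaviour, but the helper name `bowtie2_cmd` and the `*.bowtie2.log` file names are my own assumptions for illustration.

```shell
# Sketch: print one bowtie2 command per sample instead of typing six.
# READS_DIR and SAMPLES mirror the paths used in this post; adjust as needed.
READS_DIR=/home/yeyt/biodata/NH160034/NH160034/cleandata/assembly
SAMPLES="B251 B252 R251 R252 W251 W252"

# Emit the alignment command for one sample.
# $1 = sample name, $2 = thread count (defaults to 4, an arbitrary choice).
# bowtie2 writes its alignment summary to stderr, so 2> keeps it as a log
# (the log file name is hypothetical).
bowtie2_cmd() {
    sample=$1
    threads=${2:-4}
    printf 'bowtie2 -p %s -x Trinity.fasta -1 %s/%s_1.P.fq.gz -2 %s/%s_2.P.fq.gz -S %s.sam 2> %s.bowtie2.log\n' \
        "$threads" "$READS_DIR" "$sample" "$READS_DIR" "$sample" "$sample" "$sample"
}

# Print the commands for all samples; nothing is executed here.
for s in $SAMPLES; do
    bowtie2_cmd "$s"
done
```

Redirecting the printed lines into a file and running that file with `bash` would follow the same generate-then-run pattern used for the samtools step later in this post.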

# The log produced by each run is shown below:
# Aligning the B251 paired-end reads
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B251_2.P.fq.gz -S B251.sam
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_PAPER = "zh_CN.UTF-8",
        LC_ADDRESS = "zh_CN.UTF-8",
        LC_MONETARY = "zh_CN.UTF-8",
        LC_NUMERIC = "zh_CN.UTF-8",
        LC_TELEPHONE = "zh_CN.UTF-8",
        LC_IDENTIFICATION = "zh_CN.UTF-8",
        LC_MEASUREMENT = "zh_CN.UTF-8",
        LC_TIME = "zh_CN.UTF-8",
        LC_NAME = "zh_CN.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
28213701 reads; of these:
  28213701 (100.00%) were paired; of these:
    3865337 (13.70%) aligned concordantly 0 times
    2140365 (7.59%) aligned concordantly exactly 1 time
    22207999 (78.71%) aligned concordantly >1 times
    ----
    3865337 pairs aligned concordantly 0 times; of these:
      134400 (3.48%) aligned discordantly 1 time
    ----
    3730937 pairs aligned 0 times concordantly or discordantly; of these:
      7461874 mates make up the pairs; of these:
        2553395 (34.22%) aligned 0 times
        273693 (3.67%) aligned exactly 1 time
        4634786 (62.11%) aligned >1 times
95.47% overall alignment rate
# Aligning the B252 paired-end reads
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B252_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B252_2.P.fq.gz -S B252.sam
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_PAPER = "zh_CN.UTF-8",
        LC_ADDRESS = "zh_CN.UTF-8",
        LC_MONETARY = "zh_CN.UTF-8",
        LC_NUMERIC = "zh_CN.UTF-8",
        LC_TELEPHONE = "zh_CN.UTF-8",
        LC_IDENTIFICATION = "zh_CN.UTF-8",
        LC_MEASUREMENT = "zh_CN.UTF-8",
        LC_TIME = "zh_CN.UTF-8",
        LC_NAME = "zh_CN.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
24423445 reads; of these:
  24423445 (100.00%) were paired; of these:
    2755943 (11.28%) aligned concordantly 0 times
    2003579 (8.20%) aligned concordantly exactly 1 time
    19663923 (80.51%) aligned concordantly >1 times
    ----
    2755943 pairs aligned concordantly 0 times; of these:
      82738 (3.00%) aligned discordantly 1 time
    ----
    2673205 pairs aligned 0 times concordantly or discordantly; of these:
      5346410 mates make up the pairs; of these:
        1943923 (36.36%) aligned 0 times
        258490 (4.83%) aligned exactly 1 time
        3143997 (58.81%) aligned >1 times
96.02% overall alignment rate
# Aligning the R251 paired-end reads
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R251_2.P.fq.gz -S R251sam
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_PAPER = "zh_CN.UTF-8",
        LC_ADDRESS = "zh_CN.UTF-8",
        LC_MONETARY = "zh_CN.UTF-8",
        LC_NUMERIC = "zh_CN.UTF-8",
        LC_TELEPHONE = "zh_CN.UTF-8",
        LC_IDENTIFICATION = "zh_CN.UTF-8",
        LC_MEASUREMENT = "zh_CN.UTF-8",
        LC_TIME = "zh_CN.UTF-8",
        LC_NAME = "zh_CN.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
24498964 reads; of these:
  24498964 (100.00%) were paired; of these:
    2605874 (10.64%) aligned concordantly 0 times
    2058157 (8.40%) aligned concordantly exactly 1 time
    19834933 (80.96%) aligned concordantly >1 times
    ----
    2605874 pairs aligned concordantly 0 times; of these:
      68645 (2.63%) aligned discordantly 1 time
    ----
    2537229 pairs aligned 0 times concordantly or discordantly; of these:
      5074458 mates make up the pairs; of these:
        1920173 (37.84%) aligned 0 times
        259673 (5.12%) aligned exactly 1 time
        2894612 (57.04%) aligned >1 times
96.08% overall alignment rate
# Aligning the R252 paired-end reads
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R252_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R252_2.P.fq.gz -S R252.sam
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_PAPER = "zh_CN.UTF-8",
        LC_ADDRESS = "zh_CN.UTF-8",
        LC_MONETARY = "zh_CN.UTF-8",
        LC_NUMERIC = "zh_CN.UTF-8",
        LC_TELEPHONE = "zh_CN.UTF-8",
        LC_IDENTIFICATION = "zh_CN.UTF-8",
        LC_MEASUREMENT = "zh_CN.UTF-8",
        LC_TIME = "zh_CN.UTF-8",
        LC_NAME = "zh_CN.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
23929511 reads; of these:
  23929511 (100.00%) were paired; of these:
    3455581 (14.44%) aligned concordantly 0 times
    1770888 (7.40%) aligned concordantly exactly 1 time
    18703042 (78.16%) aligned concordantly >1 times
    ----
    3455581 pairs aligned concordantly 0 times; of these:
      132348 (3.83%) aligned discordantly 1 time
    ----
    3323233 pairs aligned 0 times concordantly or discordantly; of these:
      6646466 mates make up the pairs; of these:
        2061887 (31.02%) aligned 0 times
        216206 (3.25%) aligned exactly 1 time
        4368373 (65.72%) aligned >1 times
95.69% overall alignment rate
# Aligning the W251 paired-end reads
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W251_2.P.fq.gz -S W251.sam
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_PAPER = "zh_CN.UTF-8",
        LC_ADDRESS = "zh_CN.UTF-8",
        LC_MONETARY = "zh_CN.UTF-8",
        LC_NUMERIC = "zh_CN.UTF-8",
        LC_TELEPHONE = "zh_CN.UTF-8",
        LC_IDENTIFICATION = "zh_CN.UTF-8",
        LC_MEASUREMENT = "zh_CN.UTF-8",
        LC_TIME = "zh_CN.UTF-8",
        LC_NAME = "zh_CN.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
25553075 reads; of these:
  25553075 (100.00%) were paired; of these:
    3705332 (14.50%) aligned concordantly 0 times
    2003416 (7.84%) aligned concordantly exactly 1 time
    19844327 (77.66%) aligned concordantly >1 times
    ----
    3705332 pairs aligned concordantly 0 times; of these:
      163553 (4.41%) aligned discordantly 1 time
    ----
    3541779 pairs aligned 0 times concordantly or discordantly; of these:
      7083558 mates make up the pairs; of these:
        2021254 (28.53%) aligned 0 times
        226959 (3.20%) aligned exactly 1 time
        4835345 (68.26%) aligned >1 times
96.04% overall alignment rate
# Aligning the W252 paired-end reads
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W252_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W252_2.P.fq.gz -S W252.sam
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_PAPER = "zh_CN.UTF-8",
        LC_ADDRESS = "zh_CN.UTF-8",
        LC_MONETARY = "zh_CN.UTF-8",
        LC_NUMERIC = "zh_CN.UTF-8",
        LC_TELEPHONE = "zh_CN.UTF-8",
        LC_IDENTIFICATION = "zh_CN.UTF-8",
        LC_MEASUREMENT = "zh_CN.UTF-8",
        LC_TIME = "zh_CN.UTF-8",
        LC_NAME = "zh_CN.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
24577100 reads; of these:
  24577100 (100.00%) were paired; of these:
    3173490 (12.91%) aligned concordantly 0 times
    1898984 (7.73%) aligned concordantly exactly 1 time
    19504626 (79.36%) aligned concordantly >1 times
    ----
    3173490 pairs aligned concordantly 0 times; of these:
      112017 (3.53%) aligned discordantly 1 time
    ----
    3061473 pairs aligned 0 times concordantly or discordantly; of these:
      6122946 mates make up the pairs; of these:
        2060673 (33.65%) aligned 0 times
        226885 (3.71%) aligned exactly 1 time
        3835388 (62.64%) aligned >1 times
95.81% overall alignment rate
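
The "overall alignment rate" on the last line of each log can be reproduced from the counts printed above it: every pair contributes two mates, and only the mates reported as "aligned 0 times" in the final block count as unaligned. The sketch below is my own reading of the log arithmetic, and the function name `overall_rate` is made up for illustration.

```shell
# overall rate (%) = 100 * (1 - mates that aligned 0 times / (2 * read pairs))
# $1 = number of read pairs, $2 = mates that "aligned 0 times"
overall_rate() {
    awk -v pairs="$1" -v unaligned="$2" \
        'BEGIN { printf "%.2f\n", 100 * (1 - unaligned / (2 * pairs)) }'
}

overall_rate 28213701 2553395   # counts from the B251 log; prints 95.47
overall_rate 24423445 1943923   # counts from the B252 log; prints 96.02
```

Both values agree with the rates Bowtie2 printed for those two samples.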
The alignment takes a while, so in the meantime we can produce a simple quality-control report.
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ $TRINITY_HOME/util/TrinityStats.pl Trinity.fasta > Trinitystats.log 
# The output is written to Trinitystats.log
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ cat Trinitystats.log 


################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  110851
Total trinity transcripts:  220498
Percent GC: 42.98

########################################
Stats based on ALL transcript contigs:
########################################

   Contig N10: 4369
   Contig N20: 3291
   Contig N30: 2640
   Contig N40: 2183
   Contig N50: 1802

   Median contig length: 542
   Average contig: 1008.13
   Total assembled bases: 222289920


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

   Contig N10: 3997
   Contig N20: 2867
   Contig N30: 2195
   Contig N40: 1663
   Contig N50: 1157

   Median contig length: 364
   Average contig: 686.86
   Total assembled bases: 76139520

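Let me explain the results above.

The report starts with a summary: how many 'genes' and how many transcripts the assembly produced (110851 and 220498 here),
followed by the overall GC content (42.98%).
It then gives two sets of statistics:
one based on all transcript contigs,
and one based on only the longest isoform of each 'gene'.
N50 is the contig length at which contigs of that length or longer contain at least half of all assembled bases (1802 for all transcripts, 1157 for the longest isoform per gene).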
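
To make the N50 definition concrete: sort the contig lengths from longest to shortest, add them up, and report the length at which the running total first reaches half of all assembled bases. The sketch below is a minimal awk version of that idea, not the code TrinityStats.pl uses, and the function name `fasta_n50` is made up for illustration.

```shell
# Print the N50 of a FASTA file.
# Step 1: collapse each record into its sequence length.
# Step 2: sort the lengths in descending order.
# Step 3: walk down the list until the running sum reaches half the total.
fasta_n50() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -nr |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run * 2 >= total) { print lens[i]; exit }
             }
         }'
}
```

For contigs of length 10, 8, 6, 4 and 2 the total is 30, and the running sum reaches 15 at the second contig, so the N50 is 8. Running `fasta_n50 Trinity.fasta` should give a value comparable to the "Contig N50" line in the report, assuming TrinityStats.pl follows the same definition.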

Next we convert the SAM files to BAM and sort them, so that they can serve as input for the later analysis.

yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ls *sam | grep '25' |xargs -I [] echo 'samtools view -bS [] | samtools sort -o [].sorted.bam ' > samtoolssort.sh
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ cat samtoolssort.sh 
samtools view -bS B251.sam | samtools sort -o B251.sam.sorted.bam 
samtools view -bS B252.sam | samtools sort -o B252.sam.sorted.bam 
samtools view -bS R251sam | samtools sort -o R251sam.sorted.bam 
samtools view -bS R252.sam | samtools sort -o R252.sam.sorted.bam 
samtools view -bS W251.sam | samtools sort -o W251.sam.sorted.bam 
samtools view -bS W252.sam | samtools sort -o W252.sam.sorted.bam
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bash samtoolssort.sh
[bam_sort_core] merging from 41 files...
[bam_sort_core] merging from 36 files...
[bam_sort_core] merging from 36 files...
[bam_sort_core] merging from 35 files...
[bam_sort_core] merging from 38 files...
[bam_sort_core] merging from 36 files...
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ll | sort -nk 7
total 234179528
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 left.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 right.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:08 both.fa.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:19 .jellyfish_count.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:24 .jellyfish_dump.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:26 .jellyfish_histo.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm_renamed.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 inchworm.K25.L25.DS.fa.finished
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 partitioned_reads.files.list.ok
-rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 recursive_trinity.cmds.ok
-rw-rw-r-- 1 yeyt yeyt           9 Sep 14 01:08 both.fa.read_count
-rw-rw-r-- 1 yeyt yeyt          10 Sep 14 01:59 inchworm.kmer_count
-rw-rw-r-- 1 yeyt yeyt        2757 Sep 14 10:31 pipeliner.3855.cmds
-rw-rw-r-- 1 yeyt yeyt       22843 Sep 14 01:26 jellyfish.kmers.fa.histo
-rw-rw-r-- 1 yeyt yeyt    13366802 Sep 14 10:39 partitioned_reads.files.list
-rw-rw-r-- 1 yeyt yeyt    47753878 Sep 14 10:39 recursive_trinity.cmds
-rw-rw-r-- 1 yeyt yeyt   493022300 Sep 14 03:27 inchworm.K25.L25.DS.fa
-rw-rw-r-- 1 yeyt yeyt 11602365066 Sep 14 01:08 both.fa
-rw-rw-r-- 1 yeyt yeyt 26501596675 Sep 14 01:24 jellyfish.kmers.fa
-rw-rw-r-- 1 yeyt yeyt 49568938526 Sep 14 06:38 scaffolding_entries.sam
drwxrwxr-x 2 yeyt yeyt        4096 Sep 14 10:33 chrysalis/
drwxrwxr-x 3 yeyt yeyt        4096 Sep 14 01:03 insilico_read_normalization/
drwxrwxr-x 4 yeyt yeyt        4096 Sep 14 10:39 read_partitions/
-rw-rw-r-- 1 yeyt yeyt           0 Sep 15 20:51 align_stats.txt
-rw-rw-r-- 1 yeyt yeyt          62 Sep 15 20:51 bowtie2.bam
-rw-rw-r-- 1 yeyt yeyt         651 Sep 15 07:03 Trinity.timing
-rw-rw-r-- 1 yeyt yeyt    10213332 Sep 15 07:03 Trinity.fasta.gene_trans_map
-rw-rw-r-- 1 yeyt yeyt    47753878 Sep 15 07:03 recursive_trinity.cmds.completed
-rw-rw-r-- 1 yeyt yeyt   244740565 Sep 15 07:03 Trinity.fasta
drwxrwxr-x 3 yeyt yeyt        4096 Sep 15 18:42 ../
-rw-rw-r-- 1 yeyt yeyt         821 Sep 23 15:17 Trinitystats.log
-rw-rw-r-- 1 yeyt yeyt     1984490 Sep 23 13:50 Trinity.fasta.3.bt2
-rw-rw-r-- 1 yeyt yeyt    55572480 Sep 23 13:50 Trinity.fasta.4.bt2
-rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:03 Trinity.fasta.2.bt2
-rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:16 Trinity.fasta.rev.2.bt2
-rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:03 Trinity.fasta.1.bt2
-rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:16 Trinity.fasta.rev.1.bt2
-rw-rw-r-- 1 yeyt yeyt         400 Sep 24 13:22 samtoolssort.sh
-rw-rw-r-- 1 yeyt yeyt  3049959975 Sep 24 15:37 R252.sam.sorted.bam
-rw-rw-r-- 1 yeyt yeyt  3181086895 Sep 24 16:44 W252.sam.sorted.bam
-rw-rw-r-- 1 yeyt yeyt  3192193677 Sep 24 15:06 R251.sam.sorted.bam
-rw-rw-r-- 1 yeyt yeyt  3206939510 Sep 24 14:33 B252.sam.sorted.bam
-rw-rw-r-- 1 yeyt yeyt  3267705730 Sep 24 16:11 W251.sam.sorted.bam
-rw-rw-r-- 1 yeyt yeyt  3655386513 Sep 24 14:01 B251.sam.sorted.bam
-rw-rw-r-- 1 yeyt yeyt 20770276094 Sep 24 01:49 R252.sam
-rw-rw-r-- 1 yeyt yeyt 21235142607 Sep 24 02:03 B252.sam
-rw-rw-r-- 1 yeyt yeyt 21293400430 Sep 24 02:07 R251sam
-rw-rw-r-- 1 yeyt yeyt 21346715631 Sep 24 02:15 W252.sam
-rw-rw-r-- 1 yeyt yeyt 22197735984 Sep 24 02:29 W251.sam
-rw-rw-r-- 1 yeyt yeyt 24496840308 Sep 24 03:04 B251.sam
This gives us six sorted BAM files.
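
The conversion and sorting step could also be written as a small loop instead of a generated script. This is only a sketch under a few assumptions: a samtools 1.x release whose `sort` reads SAM input directly, a thread count I picked arbitrarily, and output names with the `.sam` part stripped (`B251.sorted.bam`), which differ from the names produced above. The function names are hypothetical.

```shell
# Derive the output name from a SAM file name: B251.sam -> B251.sorted.bam
sorted_bam_name() {
    printf '%s.sorted.bam\n' "${1%.sam}"
}

# Sort each SAM file given as an argument into a BAM file and index it.
# THREADS is an optional environment variable (default 4, an arbitrary choice).
sam_to_sorted_bam() {
    for sam in "$@"; do
        bam=$(sorted_bam_name "$sam")
        samtools sort -@ "${THREADS:-4}" -o "$bam" "$sam" || return 1
        samtools index "$bam" || return 1
    done
}

# Example call (commented out so nothing runs by accident):
# sam_to_sorted_bam B251.sam B252.sam
```

Indexing is included because several downstream tools expect a `.bai` file next to each sorted BAM file.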

We will use the following RSeQC tools:

bam_stat.py

clipping_profile.py

inner_distance.py

read_duplication.py

read_GC.py
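
As a starting point, the commands for one sample could be generated the same way samtoolssort.sh was. This is a sketch from my recollection of RSeQC's usage, so check each script's `--help` before relying on it: `-i` and `-o` are the usual input and output-prefix options, and `-s PE` for clipping_profile.py assumes a recent RSeQC release. inner_distance.py is left out because it also needs a gene model in BED format via `-r`, which a de novo assembly does not come with. The function name `rseqc_cmds` and the output prefixes are hypothetical.

```shell
# Print RSeQC commands for one sorted BAM file; nothing is executed here.
# $1 = sorted BAM file, e.g. B251.sam.sorted.bam
rseqc_cmds() {
    bam=$1
    prefix=${bam%.sam.sorted.bam}   # B251.sam.sorted.bam -> B251
    # bam_stat.py prints its summary to the terminal, so capture both streams.
    printf 'bam_stat.py -i %s > %s.bam_stat.txt 2>&1\n' "$bam" "$prefix"
    printf 'clipping_profile.py -i %s -s PE -o %s\n' "$bam" "$prefix"
    printf 'read_duplication.py -i %s -o %s\n' "$bam" "$prefix"
    printf 'read_GC.py -i %s -o %s\n' "$bam" "$prefix"
}

rseqc_cmds B251.sam.sorted.bam
```

The printed lines can be redirected into a script and run with `bash` once the options have been checked against the installed RSeQC version.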
