Transcriptome Analysis in Practice, Part 2: De Novo Transcriptome Assembly Without a Reference Genome

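Trinity is a widely used tool for assembling transcriptomes without a reference genome.

In this part we use Trinity on a server to assemble the clean data obtained in Part 1.

Along the way we cover software installation, environment variable configuration, and the assembly of transcriptome reads.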

1. Install Trinity

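Download the release to your local machine from the Trinity GitHub repository.
Then copy it to the server with scp.
Once it is there, unpack the archive, enter the directory, and run make: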
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ make
# make depends on cmake; if cmake is missing, install it first and add it to PATH
# the output below means the build succeeded
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Performing Unit Tests of Build
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inchworm:                has been Installed Properly
Chrysalis:               has been Installed Properly
QuantifyGraph:           has been Installed Properly
GraphFromFasta:          has been Installed Properly
ReadsToTranscripts:      has been Installed Properly
parafly:                 has been Installed Properly
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ 

2. Configure the environment variables

yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ cd ..
yeyt@ubuntu:~/biosoft$ which Trinity 
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3/Trinity
yeyt@ubuntu:~/biosoft$ Trinity -h
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "en_US:en",
    LC_ALL = (unset),
    LC_PAPER = "zh_CN.UTF-8",
    LC_ADDRESS = "zh_CN.UTF-8",
    LC_MONETARY = "zh_CN.UTF-8",
    LC_NUMERIC = "zh_CN.UTF-8",
    LC_TELEPHONE = "zh_CN.UTF-8",
    LC_IDENTIFICATION = "zh_CN.UTF-8",
    LC_MEASUREMENT = "zh_CN.UTF-8",
    LC_TIME = "zh_CN.UTF-8",
    LC_NAME = "zh_CN.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").



###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.8.3


#
#
# Required:
#
#  --seqType       :type of reads: ('fa' or 'fq')
#
#  --max_memory       :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left      :left reads, one or more file names (separated by commas, no spaces)
#      --right     :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single    :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file          tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ make plugins
# install the plugins
## Checking plugin installations:

slclust:                 has been Installed Properly
collectl:                has been Installed Properly
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ 
# that is all for the build; next, check the software Trinity depends on:
# bowtie2
# jellyfish
# salmon
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which bowtie2
/opt/biosoft/bowtie2-2.2.9//bowtie2
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which jellyfish 
/opt/biosoft/jellyfish-2.2.3/bin//jellyfish
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which salmon 
/opt/biosoft/salmon/bin//salmon
# everything is in place, so carry on
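The same dependency check can be written as a small loop instead of one `which` call per tool. This is only a sketch: the function name `check_tools` is made up for illustration, and the tool list is the one used in this tutorial (cmake for the build, then bowtie2, jellyfish and salmon).

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all of them are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            printf 'found    %s -> %s\n' "$tool" "$location"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# cmake is needed by make; the other three are the dependencies checked above.
check_tools cmake bowtie2 jellyfish salmon || echo "Install the missing tools and add them to PATH before continuing."
```

`command -v` is used rather than `which` because it is built into the shell and behaves the same way across systems.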
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ pwd
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ echo $TRINITY_HOME 
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ echo 'export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3' >> ~/.bashrc
# persist both variables in ~/.bashrc so that they survive new logins
yeyt@ubuntu:~/biosoft$ echo 'export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH' >> ~/.bashrc 
yeyt@ubuntu:~/biosoft$ source ~/.bashrc
yeyt@ubuntu:~/biosoft$ which Trinity 
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3/Trinity
Trinity can now be called directly from any directory.
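One caveat with `echo '...' >> ~/.bashrc` is that every repeat of the command appends the same line again. A possible way around this is a helper that appends a line only when it is not there yet. The sketch below is an illustration, not part of Trinity: `append_once` and `RC_FILE` are hypothetical names, and the demo writes to a scratch file rather than the real ~/.bashrc.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; point RC_FILE at ~/.bashrc to apply it for real.
RC_FILE=$(mktemp)
TRINITY_DIR=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
append_once "export TRINITY_HOME=$TRINITY_DIR" "$RC_FILE"
append_once "export TRINITY_HOME=$TRINITY_DIR" "$RC_FILE"   # second call is a no-op
append_once "export PATH=$TRINITY_DIR:\$PATH" "$RC_FILE"
cat "$RC_FILE"
```

After editing the real file, `source ~/.bashrc` is still needed for the current shell to pick up the change.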

3. Run Trinity

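Here we follow the protocol published in Nature Protocols:
1) Quality control and cleaning of the data (covered in Part 1)
2) Assembly of the transcriptome reads

First, build the sample information file.
My data set has three treatments with two biological replicates each, and each replicate has two read files (_1 and _2), so the file looks like this: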

yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly$ l
B251_1.P.fq.gz  B252_2.P.fq.gz  R252_1.P.fq.gz  W251_2.P.fq.gz  samples.txt
B251_2.P.fq.gz  R251_1.P.fq.gz  R252_2.P.fq.gz  W252_1.P.fq.gz
B252_1.P.fq.gz  R251_2.P.fq.gz  W251_1.P.fq.gz  W252_2.P.fq.gz
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly$ cat samples.txt 
B25 B251    B251_1.P.fq.gz  B251_2.P.fq.gz
B25 B252    B252_1.P.fq.gz  B252_2.P.fq.gz
R25 R251    R251_1.P.fq.gz  R251_2.P.fq.gz
R25 R252    R252_1.P.fq.gz  R252_2.P.fq.gz
W25 W251    W251_1.P.fq.gz  W251_2.P.fq.gz
W25 W252    W252_1.P.fq.gz  W252_2.P.fq.gz
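The Trinity help text shown earlier asks for a tab-delimited samples file, and hand-typed tabs are easy to get wrong. As a sketch, the file could instead be generated from the read file names. The function name `make_samples_file` is hypothetical, and the naming rules it relies on (the `_1.P.fq.gz` / `_2.P.fq.gz` suffixes, and the condition being the sample name minus its last character) are assumptions read off this example's files, not anything Trinity defines.

```shell
# Write a tab-delimited Trinity samples file for the paired files found in a directory.
# Assumptions (taken from the file names in this example, not from Trinity itself):
#   - left/right reads are named <sample>_1.P.fq.gz and <sample>_2.P.fq.gz
#   - the condition is the sample name minus its last character (B251 -> B25)
make_samples_file() {
    dir=$1
    for left in "$dir"/*_1.P.fq.gz; do
        [ -e "$left" ] || continue            # the glob matched nothing
        sample=$(basename "$left" _1.P.fq.gz)
        right="${sample}_2.P.fq.gz"
        if [ ! -e "$dir/$right" ]; then
            echo "warning: no right-read file for $sample" >&2
            continue
        fi
        condition=${sample%?}
        printf '%s\t%s\t%s\t%s\n' "$condition" "$sample" "$(basename "$left")" "$right"
    done
}

# Prints the table for the current directory; redirect it to samples.txt once it looks right.
make_samples_file .
```

Printing to the terminal first makes it easy to inspect the table before it overwrites an existing samples.txt.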
# then run Trinity
Trinity --seqType fq --max_memory 60G --samples_file samples.txt --CPU 6 
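Parameters Trinity needs

--seqType : the type of the input reads (fq or fa)
--max_memory : the maximum memory Trinity may use (set it according to what your machine has)
--samples_file samples.txt : the sample information file
--CPU : the number of CPUs to use (again, set it according to your machine)
As a rough rule of thumb, allow about 10 GB of memory per CPU.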
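The 10 GB-per-CPU rule of thumb can be turned into a small calculation. The sketch below only encodes that rule from this tutorial, capped at the number of cores the machine has; `suggest_cpu` is a hypothetical helper, not a Trinity option, and the script only prints the resulting command.

```shell
# Suggest a --CPU value from a memory budget in GB, using the 10 GB-per-CPU
# rule of thumb from the text, capped at the number of available cores.
suggest_cpu() {
    mem_gb=$1
    cores=$2
    cpu=$((mem_gb / 10))
    if [ "$cpu" -lt 1 ]; then cpu=1; fi
    if [ "$cpu" -gt "$cores" ]; then cpu=$cores; fi
    echo "$cpu"
}

MEM_GB=60
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
CPU=$(suggest_cpu "$MEM_GB" "$CORES")
# Print the command rather than running it, so the values can be reviewed first.
echo "Trinity --seqType fq --max_memory ${MEM_GB}G --samples_file samples.txt --CPU $CPU"
```

With 60 GB and at least six cores this reproduces the `--CPU 6` used above.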

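Also note that the assembly takes a long time, so it is a good idea to run it inside a tool such as screen, which keeps the job alive if the connection drops.
When the assembly finishes you get a FASTA file: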
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l
RSEMout/  Salmonout/  Trinity.fasta*  Trinity.fasta.gene_trans_map*
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ head Trinity.fasta
>TRINITY_DN104555_c0_g1_i1 len=281 path=[0:0-280]
ACTGAGCTAAAATAAGACTTATATACGTAACTTTTTTTATTCAGTCAACATATGAAACTCAAGTTCAACCATCCAAGACATGAGCTTGTACCTTATTATGAATATTTTCTTTGGACAGAAAAAATAACACTTCAAAACCTCAACCATTTCCAAGTTTTTAGACATGCAAAAAGAGCAACATCATCCCCCCACTCTATTTGTGGAACGGTGTTCCAGATGCCTAAACTCGACATCACCCCTCCCCCACAATAAGTTCAGACTAAAAAAAGGGCAAAAAATTA
>TRINITY_DN104630_c0_g1_i1 len=376 path=[0:0-375]
TTTATGGCTGATAAATCGGCACATATGTTCGGTGCTTGTTGATTCTCCATGAGCTCGTTGAAGGGTAGAAGTTTAATTTGATTTGTTGAAGACTTGAAAATGTGGTTTTATTAAGGGTCGCATAGGCTTGATAATGATCGGGTTTGCGCCACGAGCAATTCCACGTGATGAATGTTCTCTATCTGGAAGTTGGTGAAAATGTCAGCTATTTAGCAACTTGATGACTCTTCATGTTTTGACAACTTCTAAGCTTGAAGTTCATTAGAAACTGACTATTTGTGAGCTTAGTAGTTCTTCACAAGTGTTTTTGAGACATTTGATATTTCGGAAGTAATTTGTTCTCTCTACCTCAAAGCCCCAATTTTCACTTTCTCTG
>TRINITY_DN104553_c0_g1_i1 len=222 path=[0:0-221]
TCCTTCCCAGAGAAAAACGACCCTTCATATTTGGAAGCCATCCATTACAGCATGCCGCCCTCGCTGCTACAGTTTCTTCACTGAAAGTCGTCTCCTTTTCTTTTATTTGCTCGTCTGCCTTGACCAGCCAATCAATAACTTGGCCACCAATAATCATTGAATTTTCGCTCGCACCTTTCTCCTCCAGCTCTATTCCATCTTTTTTGTGACGCAGCTGCTGAA
>TRINITY_DN104629_c0_g1_i1 len=266 path=[0:0-265]
CATTCTGGGTTTGGGGTTGAGTTATTGTGTTTATCATTAGTTATTGTGTTGATCAAATGAGTGATATATCACAATCATTGTCAAAGCTGAAGCCTTCATCATTAATCCGTTTCGGGTATTGGATTTTGTGTTTAAGGATTAAGTGGGGGTTTAAAGTTAAGGGAAATCGGTGGGAAGCTGAAGGTGTGGAAGGAAGAACAACACAAAAATGAAGGTTTGAGTTGGAGTAAAAATGTTGGAAATATTGAAACTATGGCTCCTACTCT
>TRINITY_DN104606_c0_g1_i1 len=298 path=[0:0-297]
TCAATGAAGGAATCAGTTTAATTGCTCTATGCTAGTTACACTTCAATTTTTTTGATAGAGTTAACTTATTCTAATGAATGGGTCTTATAGAGGGGAAGATTCAATTTAGGGCCAAGTATGTACCTATGTGCACTTTATGTCGTATGCCTAGTATTGTATTGTGTATTCTTATGCTTTCACTTCCATACAGTCATATTTTTTTTCTCTAAGGAATCCATCATTTTTGGCAATGCAGATTTGTATTCTTGATTATTAATAGAAAAAAAAAAAATCCTTTCTGATTGTTTCTGTTCAGATT
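Each header in the listing above names one transcript, and an ID such as TRINITY_DN104555_c0_g1_i1 ends in a gene part (g1) and an isoform part (i1). A quick sanity check on the assembly is to count how many transcripts and how many distinct genes it contains. This is a minimal sketch with hypothetical function names. Trinity also ships its own statistics script, which is not shown here.

```shell
# Count FASTA records (one per assembled transcript isoform).
count_transcripts() {
    grep -c '^>' "$1"
}

# Count distinct genes: take the ID (first word of each header), drop the leading '>'
# and the trailing _i<N> isoform suffix, then count the unique values.
count_genes() {
    grep '^>' "$1" \
        | awk '{ id = substr($1, 2); sub(/_i[0-9]+$/, "", id); print id }' \
        | sort -u | wc -l | tr -d ' '
}

# Only report when the assembly is present in the current directory.
if [ -f Trinity.fasta ]; then
    echo "transcripts: $(count_transcripts Trinity.fasta)"
    echo "genes:       $(count_genes Trinity.fasta)"
fi
```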
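Trinity.fasta holds the assembled transcripts; later on we will annotate and classify them.
For now we do one preliminary step: finding the open reading frames (ORFs) of these transcripts.
The tool we use is TransDecoder.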

Download and install TransDecoder

Download it from GitHub:

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ wget https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz
--2019-02-02 16:51:50--  https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://codeload.github.com/TransDecoder/TransDecoder/tar.gz/TransDecoder-v5.5.0 [following]
--2019-02-02 16:51:51--  https://codeload.github.com/TransDecoder/TransDecoder/tar.gz/TransDecoder-v5.5.0
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘TransDecoder-v5.5.0.tar.gz’

TransDecoder-v5.5.0.tar.g     [                  <=>              ]  15.02M  3.84MB/s    in 4.5s    

2019-02-02 16:51:58 (3.33 MB/s) - ‘TransDecoder-v5.5.0.tar.gz’ saved [15748671]

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l
RSEMout/  Salmonout/  TransDecoder-v5.5.0.tar.gz  Trinity.fasta*  Trinity.fasta.gene_trans_map*
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ tar zxvf TransDecoder-v5.5.0.tar.gz 
TransDecoder-TransDecoder-v5.5.0/
TransDecoder-TransDecoder-v5.5.0/.gitmodules
TransDecoder-TransDecoder-v5.5.0/Changelog.txt
TransDecoder-TransDecoder-v5.5.0/LICENSE.txt
TransDecoder-TransDecoder-v5.5.0/Makefile
TransDecoder-TransDecoder-v5.5.0/PerlLib/
TransDecoder-TransDecoder-v5.5.0/PerlLib/DelimParser.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Fasta_reader.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Fasta_retriever.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GFF3_utils2.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GTF.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GTF_utils2.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Gene_obj.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Longest_orf.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Nuc_translator.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Overlap_piler.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/PWM.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Pipeliner.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Process_cmd.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/overlapping_nucs.ph
TransDecoder-TransDecoder-v5.5.0/README.md
TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs
TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict
TransDecoder-TransDecoder-v5.5.0/TransDecoder.lrgTests/
TransDecoder-TransDecoder-v5.5.0/TransDecoder.wiki/
TransDecoder-TransDecoder-v5.5.0/__testing/
TransDecoder-TransDecoder-v5.5.0/__testing/Makefile
TransDecoder-TransDecoder-v5.5.0/__testing/__test.best_w_homology.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.simplest.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.single_best.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.wBlastNPfam.expected
TransDecoder-TransDecoder-v5.5.0/__testing/blastp.outfmt6
TransDecoder-TransDecoder-v5.5.0/__testing/longest_orfs.cds.scores
TransDecoder-TransDecoder-v5.5.0/__testing/longest_orfs.gff3
TransDecoder-TransDecoder-v5.5.0/__testing/pfam.domtblout
TransDecoder-TransDecoder-v5.5.0/notes
TransDecoder-TransDecoder-v5.5.0/sample_data/
TransDecoder-TransDecoder-v5.5.0/sample_data/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/README
TransDecoder-TransDecoder-v5.5.0/sample_data/README.md
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/mini_Pfam-A.hmm.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/mini_sprot.db.pep.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/test.genome.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/test.tophat.sam.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/transcripts.gtf.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/genome.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies.gff3.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies_described.txt.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/Trinity.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/genome_alignments.gmap.gff3.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/stringtie_merged.gtf
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/stringtie_merged.transcripts.fasta
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/supertranscripts.fasta
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/supertranscripts.gtf
TransDecoder-TransDecoder-v5.5.0/util/
TransDecoder-TransDecoder-v5.5.0/util/PWM/
TransDecoder-TransDecoder-v5.5.0/util/PWM/README.md
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/build_atgPWM.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/build_pwm.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/score_atgPWM.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/build_atgPWM_+-.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/compute_AUC.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/deplete_feature_noise.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/feature_scores_to_ROC.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/feature_scoring.+-.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/make_seqLogo.Rscript
TransDecoder-TransDecoder-v5.5.0/util/PWM/plot_ROC.Rscript
TransDecoder-TransDecoder-v5.5.0/util/PWM/simulate_feature_seq_from_PWM.pl
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/cleanMe.sh
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/longest_orfs.cds.top_longest_5000.nr80.gz
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/pasa_assemblies.fasta.gz
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/runMe.sh
TransDecoder-TransDecoder-v5.5.0/util/bin/
TransDecoder-TransDecoder-v5.5.0/util/bin/.hidden
TransDecoder-TransDecoder-v5.5.0/util/cdna_alignment_orf_to_genome_orf.pl
TransDecoder-TransDecoder-v5.5.0/util/compute_base_probs.pl
TransDecoder-TransDecoder-v5.5.0/util/exclude_similar_proteins.pl
TransDecoder-TransDecoder-v5.5.0/util/fasta_prot_checker.pl
TransDecoder-TransDecoder-v5.5.0/util/ffindex_resume.pl
TransDecoder-TransDecoder-v5.5.0/util/gene_list_to_gff.pl
TransDecoder-TransDecoder-v5.5.0/util/get_FL_accs.pl
TransDecoder-TransDecoder-v5.5.0/util/get_longest_ORF_per_transcript.pl
TransDecoder-TransDecoder-v5.5.0/util/get_top_longest_fasta_entries.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_file_to_bed.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_file_to_proteins.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_gene_to_gtf_format.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_genome_to_cdna_fasta.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_to_alignment_gff3.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_to_bed.pl
TransDecoder-TransDecoder-v5.5.0/util/misc/
TransDecoder-TransDecoder-v5.5.0/util/misc/__init__.py
TransDecoder-TransDecoder-v5.5.0/util/misc/get_FP_FN_scores.py
TransDecoder-TransDecoder-v5.5.0/util/misc/plot_indiv_seq_likelihood_profile.py
TransDecoder-TransDecoder-v5.5.0/util/misc/rpart_scores.Rscript
TransDecoder-TransDecoder-v5.5.0/util/misc/select_TD_orfs.py
TransDecoder-TransDecoder-v5.5.0/util/nr_ORFs_gff3.pl
TransDecoder-TransDecoder-v5.5.0/util/pfam_mpi.pbs
TransDecoder-TransDecoder-v5.5.0/util/pfam_runner.pl
TransDecoder-TransDecoder-v5.5.0/util/refine_gff3_group_iso_strip_utrs.pl
TransDecoder-TransDecoder-v5.5.0/util/refine_hexamer_scores.pl
TransDecoder-TransDecoder-v5.5.0/util/remove_eclipsed_ORFs.pl
TransDecoder-TransDecoder-v5.5.0/util/score_CDS_likelihood_all_6_frames.pl
TransDecoder-TransDecoder-v5.5.0/util/select_best_ORFs_per_transcript.pl
TransDecoder-TransDecoder-v5.5.0/util/seq_n_baseprobs_to_loglikelihood_vals.pl
TransDecoder-TransDecoder-v5.5.0/util/start_codon_refinement.pl
TransDecoder-TransDecoder-v5.5.0/util/train_start_PWM.pl
TransDecoder-TransDecoder-v5.5.0/util/uri_unescape.pl

Run it

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ./TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs -t Trinity.fasta
* Running CMD: /home/yeyuntian/Biodata/trinitytest/downstr/TransDecoder-TransDecoder-v5.5.0/util/compute_base_probs.pl Trinity.fasta 0 > /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir/base_freqs.dat


-first extracting base frequencies, we'll need them later.


- extracting ORFs from transcripts.
-total transcripts to examine: 220498
[220400/220498] = 99.96% done    CMD: touch /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir.__checkpoints_longorfs/TD.longorfs.ok


#################################
### Done preparing long ORFs.  ###
##################################

    Use file: /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir/longest_orfs.pep  for Pfam and/or BlastP searches to enable homology-based coding region identification.

    Then, run TransDecoder.Predict for your final coding region predictions.

This creates an output directory (plus a checkpoints directory):

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l -alt
total 264404
drwxrwxr-x  2 yeyuntian yeyuntian      4096 2月   2 17:15 Trinity.fasta.transdecoder_dir.__checkpoints_longorfs/
drwxrwxr-x  2 yeyuntian yeyuntian      4096 2月   2 17:09 Trinity.fasta.transdecoder_dir/
drwxrwxr-x  7 yeyuntian yeyuntian      4096 2月   2 17:07 ./
-rw-rw-r--  1 yeyuntian yeyuntian       212 2月   2 17:07 pipeliner.4094.cmds
-rw-rw-r--  1 yeyuntian yeyuntian  15748671 2月   2 16:51 TransDecoder-v5.5.0.tar.gz
drwxrwxr-x  8 yeyuntian yeyuntian      4096 10月 22 20:45 TransDecoder-TransDecoder-v5.5.0/

Then run the next command:

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ./TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t Trinity.fasta 

This may fail with an error saying that seqLogo cannot be found, because the command calls an R package; it can be installed from Bioconductor:

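# install BiocManager first if it is not available yet
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# version pins the Bioconductor release (3.8 here); drop it to use the release that matches your R version
BiocManager::install("seqLogo", version = "3.8")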
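Once seqLogo is installed, the two TransDecoder steps can be chained so that the prediction only starts if the long-ORF extraction succeeded. The wrapper below is a sketch under the assumptions of this tutorial: `run_transdecoder` and `TD_DIR` are hypothetical names, and only the `-t` option already used above is passed.

```shell
# Run the two TransDecoder steps in order, stopping if the first one fails.
# td_dir is the unpacked TransDecoder directory; fasta is the transcript file.
run_transdecoder() {
    td_dir=$1
    fasta=$2
    if [ ! -s "$fasta" ]; then
        echo "error: $fasta is missing or empty" >&2
        return 1
    fi
    "$td_dir/TransDecoder.LongOrfs" -t "$fasta" || return 1
    "$td_dir/TransDecoder.Predict" -t "$fasta"
}

# Only attempt the run when both the program and the input are present.
TD_DIR=./TransDecoder-TransDecoder-v5.5.0
if [ -x "$TD_DIR/TransDecoder.LongOrfs" ] && [ -f Trinity.fasta ]; then
    run_transdecoder "$TD_DIR" Trinity.fasta
fi
```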

In the end, these are the files the tool produces:

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ll -alt 
total 558620
-rw-rw-r--  1 yeyuntian yeyuntian      3016 2月   2 18:17 pipeliner.7404.cmds
drwxrwxr-x  8 yeyuntian yeyuntian      4096 2月   2 18:17 ./
drwxrwxr-x  2 yeyuntian yeyuntian      4096 2月   2 17:43 Trinity.fasta.transdecoder_dir.__checkpoints/
-rw-rw-r--  1 yeyuntian yeyuntian 133585627 2月   2 17:43 Trinity.fasta.transdecoder.cds
-rw-rw-r--  1 yeyuntian yeyuntian      3016 2月   2 17:43 pipeliner.5796.cmds
-rw-rw-r--  1 yeyuntian yeyuntian  56068696 2月   2 17:43 Trinity.fasta.transdecoder.pep
-rw-rw-r--  1 yeyuntian yeyuntian  19376844 2月   2 17:42 Trinity.fasta.transdecoder.bed
-rw-rw-r--  1 yeyuntian yeyuntian  92216622 2月   2 17:42 Trinity.fasta.transdecoder.gff3

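Where:
.pep (the protein sequences encoded by the final candidate ORFs)
.cds (the nucleotide sequences of the coding regions)
.gff3 (the positions of the ORFs within the transcripts)
.bed (for later visualization in IGV)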
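Before moving on, it may be worth confirming that all four result files exist and are not empty. The sketch below does that and also counts the sequences in the two FASTA-formatted files. `check_td_outputs` is a hypothetical helper, and the prefix `Trinity.fasta.transdecoder` is taken from the listing above.

```shell
# Check that the four TransDecoder result files exist and are non-empty,
# and report how many sequences the FASTA-formatted ones contain.
check_td_outputs() {
    prefix=$1                      # e.g. Trinity.fasta.transdecoder
    rc=0
    for ext in pep cds gff3 bed; do
        file="$prefix.$ext"
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file"
            rc=1
        elif [ "$ext" = pep ] || [ "$ext" = cds ]; then
            echo "ok: $file ($(grep -c '^>' "$file") sequences)"
        else
            echo "ok: $file"
        fi
    done
    return "$rc"
}

check_td_outputs Trinity.fasta.transdecoder || echo "Some outputs are missing; re-check the TransDecoder.Predict run."
```

The .pep and .cds counts should match, since each predicted ORF has one protein and one coding sequence.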
