MCScanX-transposed: Installation and Usage

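1. Introduction

(1) MCScanX-transposed is a software package that applies MCScanX within and between related genomes to detect transposed gene duplications that occurred within different epochs. It also supports integrative analysis of gene duplication modes and annotation of gene families of interest with those modes.
MCScanX itself is a toolkit for detection and evolutionary analysis of gene synteny and collinearity; MCScanX-transposed builds on it. See the MCScanX-transposed manual for details.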

(2) Publication: Wang Y, Li J, Paterson AH. (2013) MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans. Bioinformatics, doi: 10.1093/bioinformatics/btt150.

2. Installation

wget http://chibba.pgml.uga.edu/mcscan2/transposed/MCScanX-transposed.zip
unzip MCScanX-transposed.zip
cd MCScanX-transposed
make
After unzipping, the package contains the following programs:
[Screenshot: Snipaste_2019-03-12_20-12-49.png]

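3. Learning the workflow with the test files

Notes:

 After unzipping and installing, there is a data folder that contains the At (Arabidopsis thaliana) test data.
 The -i option must be followed by an explicit folder name such as ./data; a bare ./ is not accepted.
 The prepared data must be located under the MCScanX-transposed folder, otherwise the program reports an error.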

perl ~/biosoft/MCScanX-transposed/MCScanX-transposed.pl -i ./data -t at -c al,br,cp,pt,vv -o result/at_result
The run generates 15 result files, 8 of which are the main ones:

[Screenshot: 1.png]

[Screenshot: 2.png]

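4. Core program: MCScanX-transposed.pl

Input files must be prepared before use.
Notes:

1. Because it is inconvenient to demonstrate with my own files, the official test data are used here as well. To prepare your own files, replace Arabidopsis with your study species and choose the outgroup species you care about.
2. Skipping the test files and starting directly with your own files is an easy way to get burned, because you do not know what the results should look like (I learned this the hard way).

(1) Preparing the files:
[Screenshot: 3.png]
Important: users must prepare the input files by carefully reading the instructions (1-4) below.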
  1. All input files should be stored under ONE folder(the "data_directory" parameter)
  2. For the target genome in which gene duplication modes will be classified, please prepare two input files:
    a) "[target_species].gff", a gene position file for the target species, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position"
    b) "[target_species].blast", a blastp output file (m8 format) for the target species (self-genome comparison).
  3. For each outgroup genome, please prepare two input files:
    a) "[target_species]_[outgroup_species].gff", a gene position file for the target_species and outgroup_species, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position"
    b) "[target_species]_[outgroup_species].blast", a blastp output file (m8 format) between the target and outgroup species (cross-genome comparison).
  4. For example, assuming that you are going to classify gene duplication modes in Arabidopsis thaliana (ID: at), using Brassica rapa (ID: br) and Carica papaya (ID: cp) as outgroups, you need to prepare 6 input files: "at.gff", "at.blast", "at_br.gff", "at_br.blast", "at_cp.gff" and "at_cp.blast".
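
The naming rules above can be checked before a run. The following is a minimal sketch, not part of the package: the function names list_inputs and check_inputs are hypothetical, and the only rule it encodes is the file naming quoted from the manual (target.gff/.blast plus target_outgroup.gff/.blast for each outgroup).

```shell
# list_inputs TARGET OUTGROUP_CSV
# Print the input file names the manual asks for, one per line.
# (Hypothetical helper; only the naming scheme comes from the manual.)
list_inputs() {
    target=$1
    printf '%s.gff\n%s.blast\n' "$target" "$target"
    # Turn the comma-separated outgroup list into words
    for og in $(echo "$2" | tr ',' ' '); do
        printf '%s_%s.gff\n%s_%s.blast\n' "$target" "$og" "$target" "$og"
    done
}

# check_inputs DIR TARGET OUTGROUP_CSV
# Report every expected file that is missing or empty in DIR;
# returns non-zero if at least one file is missing.
check_inputs() {
    dir=$1
    missing=0
    for f in $(list_inputs "$2" "$3"); do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return $missing
}
```

For the test data one could call `check_inputs data at al,br,cp,pt,vv` from inside the MCScanX-transposed folder before starting the main script.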
(2) Building the BLAST database

Taking at_vv.gff as an example (the other outgroups are prepared the same way):

cat at.gff vv.gff >at_vv.gff
makeblastdb -in at_vv.pep -dbtype prot -parse_seqids -out at_vv.db
blastp -query at_vv.pep -db at_vv.db -out at_vv.blast -evalue 1e-10 -num_threads 20 -outfmt 6 -num_alignments 5
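
For several outgroups, the three commands above can be generated in a loop. This is a sketch under assumptions: print_blast_cmds is a hypothetical helper, and it assumes one protein FASTA per species named like at.pep and vv.pep, concatenated into at_vv.pep (the original commands use at_vv.pep without showing how it was made). The makeblastdb and blastp options are copied unchanged from the commands above. The function only prints commands, so they can be inspected before anything is run.

```shell
# print_blast_cmds TARGET OUTGROUP_CSV [THREADS]
# Print, for each outgroup, the commands that build TARGET_OUTGROUP.gff,
# the BLAST database and TARGET_OUTGROUP.blast. Nothing is executed here.
print_blast_cmds() {
    target=$1
    threads=${3:-20}
    for og in $(echo "$2" | tr ',' ' '); do
        pair="${target}_${og}"
        echo "cat ${target}.gff ${og}.gff > ${pair}.gff"
        # Assumption: per-species protein files exist as <id>.pep
        echo "cat ${target}.pep ${og}.pep > ${pair}.pep"
        echo "makeblastdb -in ${pair}.pep -dbtype prot -parse_seqids -out ${pair}.db"
        echo "blastp -query ${pair}.pep -db ${pair}.db -out ${pair}.blast -evalue 1e-10 -num_threads ${threads} -outfmt 6 -num_alignments 5"
    done
}
```

A possible use is `print_blast_cmds at al,br,cp,pt,vv > run_blast.sh`, followed by a review of run_blast.sh and then `sh run_blast.sh` in the data folder.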

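The at_vv.blast file:
1. The official at_vv.blast contains two kinds of hits: at-vv and vv-at. In my own analysis I left the at-at and vv-vv hits in; this only seems to make the program run more slowly, and they appear to be discarded automatically while the file is read.
2. When a gene has multiple transcripts, keep the longest one. This can be done with a command line or script, or with the Fasta Longest Representive function in TBtools.
3. For multiple species, wrap the database-building and blastp commands in a script.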
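
As an illustration of point 2, the longest transcript per gene can be selected with awk. This is a sketch under a loud assumption: isoform IDs are the gene ID followed by ".N" (for example AT1G01010.1 and AT1G01010.2), which holds for Arabidopsis but not for every annotation. The function name longest_isoform is hypothetical. The output uses the bare gene ID as the header, so that it can match the gene column of the .gff file.

```shell
# longest_isoform [FASTA]
# Read a protein FASTA (file argument or stdin) and print, for each gene,
# the longest sequence among its isoforms, with the gene ID as header.
# Assumption: isoform ID = gene ID + ".N" (e.g. AT1G01010.2).
longest_isoform() {
    awk '
    function keep() {
        if (id == "") return
        gene = id
        sub(/\.[0-9]+$/, "", gene)            # strip the isoform suffix
        if (!(gene in best_len)) order[++n] = gene   # remember first-seen order
        if (!(gene in best_len) || length(seq) > best_len[gene]) {
            best_len[gene] = length(seq)
            best_seq[gene] = seq
        }
    }
    /^>/ { keep(); id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }                          # join wrapped sequence lines
    END {
        keep()
        for (i = 1; i <= n; i++) printf ">%s\n%s\n", order[i], best_seq[order[i]]
    }
    ' "$@"
}
```

For example, `longest_isoform at.all.pep > at.pep`, where at.all.pep is a placeholder name for the unfiltered protein file.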

(3) Classifying duplication modes and extracting results

Classify gene duplication modes in A. thaliana, using A. lyrata, Brassica rapa, Carica papaya, Populus trichocarpa and Vitis vinifera as outgroups and specifying three epochs to be identified, by the command:

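1) The same command and result as in section 3:
perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/at_result
[Screenshot: Snipaste_2019-03-12_20-08-22.png]
2) The result with -x 3 added (three epochs to be identified); compare it with the result above yourself:
perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/at_result -x 3

[Screenshot: Snipaste_2019-03-12_20-06-40.png]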
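
Both commands above write to result/at_result, so a side-by-side comparison is easier if the second run goes to its own folder (for instance -o result/at_result_x3, a name chosen here only for illustration). The sketch below then lists each result file with its line count in both folders; compare_results is a hypothetical helper, and line counts are only a rough proxy for the number of records in each file.

```shell
# compare_results DIR_A DIR_B
# For every regular file in DIR_A print: name, lines in DIR_A, lines in DIR_B
# (NA if the file does not exist in DIR_B). Columns are tab-separated.
compare_results() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        a=$(wc -l < "$f" | tr -d ' ')
        if [ -f "$2/$name" ]; then
            b=$(wc -l < "$2/$name" | tr -d ' ')
        else
            b=NA
        fi
        printf '%s\t%s\t%s\n' "$name" "$a" "$b"
    done
}
```

Usage would look like `compare_results result/at_result result/at_result_x3`.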

5. Downstream analysis programs (only the first three are covered)

Tool 1. add_ka_ks.pl

Tool 2. detect_dup_modes_for_a_gene.pl

Tool 3. detect_dup_modes_for_a_family.pl

Tool 4. annotate_tree_with_dup_mode

Tool 5. annotate_tree_with_tra_dup

(1) add_ka_ks.pl (requires BioPerl)
perl add_ka_ks.pl -d data/at.cds -i result/at_result/at.transposed_after_al.pairs -o result/at.transposed_after_al.pairs.kaks
(2) detect_dup_modes_for_a_family.pl

The mads.genes file: gene IDs separated by tabs

perl detect_dup_modes_for_a_family.pl -i data/mads.genes -d result/at_result/at -o result/mads.duplication.modes
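
Before running the family tool on your own gene list, it can help to confirm that every ID in the list also appears in the gene column (column 2) of the target .gff file, since an ID that is spelled differently there cannot be matched. A minimal sketch, with missing_from_gff as a hypothetical helper name; it accepts IDs separated by tabs, spaces or newlines.

```shell
# missing_from_gff FAMILY_FILE GFF_FILE
# Print the gene IDs from FAMILY_FILE that do not occur in column 2 of GFF_FILE.
missing_from_gff() {
    # Put one ID per line, then let awk load the gff first and filter stdin
    tr -s ' \t' '\n\n' < "$1" |
        awk 'NR == FNR { known[$2] = 1; next } NF && !($1 in known)' "$2" -
}
```

For the test data, `missing_from_gff data/mads.genes data/at.gff` should print nothing if all IDs are present.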

Note:
The results include transposed genes.

[Screenshot: dup.png]
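(3) detect_dup_modes_for_a_gene.pl (the command recorded below calls detect_dup_modes_for_a_family.pl; see the manual for the options of detect_dup_modes_for_a_gene.pl)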
perl detect_dup_modes_for_a_family.pl -i data/mads.genes -d result/test1/at -o result/mads.dup
