WGD (Whole Genome Duplication)

I recently needed to run a WGD analysis and came across a handy tool: wgd - simple command line tools for the analysis of ancient whole genome duplications

Runtime environment

Python 3.5 or 3.6 on Linux (Ubuntu)

Dependencies

For wgd blast:
    BLAST, from which it uses the blastp and makeblastdb commands
    MCL (https://micans.org/mcl/index.html)
For wgd ks:
    One of the following multiple sequence alignment programs: MUSCLE, MAFFT or PRANK
    PAML (http://abacus.gene.ucl.ac.uk/software/paml.html).
    PhyML and FastTree (Note that FastTree should be executable as FastTree and not fasttree, so please specify an alias or symlink from the latter to the former if necessary.)
For wgd syn: 
    i-ADHoRe 3.0 suite (http://bioinformatics.psb.ugent.be/beg/tools/i-adhore30)
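
Before installing wgd it can help to confirm that the external programs listed above are reachable on PATH. The following is a minimal sketch, not part of wgd itself. The executable names passed to the helper (mafft as the chosen aligner, codeml as the PAML program, phyml, i-adhore) are assumptions about a typical installation, and missing_tools is a hypothetical helper name; adjust both to your system.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Executable names below are assumptions; edit them to match your installation.
missing=$(missing_tools blastp makeblastdb mcl mafft codeml phyml FastTree i-adhore)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All listed tools were found."
fi
```

The check only reports what is missing and never aborts, so it can be run safely even if you only plan to use some of the wgd subcommands.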

Installation

conda create --name python36 python=3.6    ## create a conda virtual environment named python36
source activate python36    ## activate the python36 environment
$ git clone https://github.com/arzwa/wgd.git
$ cd wgd
$ pip install .
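
Since the post lists Python 3.5 and 3.6 as the runtime environment, a quick version check before running pip can catch the case where the conda environment was not activated. This is a sketch under that assumption; supported_python is a hypothetical helper name.

```shell
# Succeed only for version strings of the form 3.5.x or 3.6.x.
supported_python() {
    case "$1" in
        3.5|3.5.*|3.6|3.6.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the active interpreter for its version (empty if python is missing).
version=$(python -c 'import platform; print(platform.python_version())' 2>/dev/null)
if supported_python "$version"; then
    echo "Python $version matches the versions listed for wgd"
else
    echo "Python ${version:-not found}: activate the python36 environment first"
fi
```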

After a successful installation, running wgd prints the usage message and the software's logo

$ wgd
Usage: wgd [OPTIONS] COMMAND [ARGS]...

  Welcome to the wgd command line interface!

                         _______
                         \  ___ `'.
         _     _ .--./)   ' |--.\  \
   /\    \\   ///.''\\    | |    \  '
   `\\  //\\ //| |  | |   | |     |  '
     \`//  \'/  \`-' /    | |     |  |
      \|   |/   /("'`     | |     ' .'
       '        \ '---.   | |___.' /'
                 /'""'.\ /_______.'/
                ||     ||\_______|/
                \'. __//
                 `'---'

Download the test data and run the software

wget ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04/Fasta/cds.all_transcripts.ath.fasta.gz
wget ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04/GFF/ath/annotation.all_transcripts.all_features.ath.gff3.gz
wget ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04/Fasta/cds.all_transcripts.car.fasta.gz
wget ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04/GFF/car/annotation.all_transcripts.all_features.car.gff3.gz
gunzip cds.all_transcripts.ath.fasta.gz 
gunzip annotation.all_transcripts.all_features.ath.gff3.gz
gunzip cds.all_transcripts.car.fasta.gz
gunzip annotation.all_transcripts.all_features.car.gff3.gz
mv cds.all_transcripts.ath.fasta ath.fasta
mv cds.all_transcripts.car.fasta car.fasta
mv annotation.all_transcripts.all_features.ath.gff3 ath.gff
mv annotation.all_transcripts.all_features.car.gff3 car.gff
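
The download, decompress and rename steps above follow the same pattern for every file, so they can be folded into one helper. The sketch below is only an illustration: short_name and fetch_and_rename are hypothetical helper names, and the naming rule (species code plus .fasta or .gff) is inferred from the file names used in this post.

```shell
# Map a PLAZA file name to the short name used in this post, e.g.
#   cds.all_transcripts.ath.fasta.gz -> ath.fasta
short_name() {
    local base
    base=${1%.gz}                      # drop a trailing .gz if present
    case "$base" in
        *.fasta) base=${base%.fasta}; printf '%s.fasta\n' "${base##*.}" ;;
        *.gff3)  base=${base%.gff3};  printf '%s.gff\n' "${base##*.}" ;;
        *)       return 1 ;;
    esac
}

# Hypothetical helper: download one file, decompress it, rename the result.
fetch_and_rename() {
    local url file
    url=$1
    file=${url##*/}                    # file name is the last path component
    wget -q "$url" && gunzip -f "$file" && mv "${file%.gz}" "$(short_name "$file")"
}

# Dry run: show the rename that would be applied to each downloaded file.
for f in cds.all_transcripts.ath.fasta.gz \
         annotation.all_transcripts.all_features.ath.gff3.gz \
         cds.all_transcripts.car.fasta.gz \
         annotation.all_transcripts.all_features.car.gff3.gz; do
    printf '%s -> %s\n' "$f" "$(short_name "$f")"
done
```

To actually fetch a file, call the helper with one of the URLs above, for example fetch_and_rename ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04/Fasta/cds.all_transcripts.ath.fasta.gz. Note that gunzip removes the .gz suffix, so the rename has to operate on the decompressed name.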

source activate python36
### wgd mcl: generate the .mcl files
wgd mcl -s ath.fasta --cds --mcl -o ath_out
wgd mcl -s car.fasta --cds --mcl -o car_out
wgd mcl --cds --one_v_one -s ath.fasta,car.fasta -id ath,car -e 1e-8 -o ath_car_out

mkdir ks_out   ## move the .mcl files produced in the previous step into the new folder ks_out
mv ath_out/ath.fasta.blast.tsv.mcl ks_out/ath.mcl
mv car_out/car.fasta.blast.tsv.mcl ks_out/car.mcl
mv ath_car_out/ath_car.ovo.tsv ks_out/ath_car.mcl
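
Before computing Ks values it is worth a quick look at how many gene families the clustering produced. The sketch below assumes the .mcl files use the standard MCL output layout, with one family per line and tab-separated gene IDs; family_summary is a hypothetical helper name and not a wgd command.

```shell
# Summarise an MCL cluster file: total families and families with >= 2 genes.
family_summary() {
    awk -F '\t' '
        NF > 0  { total++ }
        NF >= 2 { multi++ }
        END     { printf "families=%d multi_gene=%d\n", total, multi }
    ' "$1"
}

# Report on the files moved into ks_out, skipping any that do not exist.
for f in ks_out/ath.mcl ks_out/car.mcl; do
    if [ -f "$f" ]; then
        printf '%s: ' "$f"
        family_summary "$f"
    fi
done
```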

### wgd ksd: compute Ks distributions from the .mcl files
wgd ksd ks_out/ath.mcl ath.fasta -n 8 -o ath_ks
wgd ksd ks_out/car.mcl car.fasta -n 8 -o car_ks
wgd ksd -o ath_car_ks ks_out/ath_car.mcl ath.fasta car.fasta -n 8

mkdir ksout ## move the .ks.tsv files produced in the previous step into the new folder ksout
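
The post does not show the commands that gather the .ks.tsv files. One way to do it without hard-coding file names is to search the ksd output directories. This is a sketch that assumes the result files end in .ks.tsv and sit somewhere under the -o directories used above; collect_ks is a hypothetical helper name.

```shell
# Copy every *.ks.tsv found under the given directories into a destination.
collect_ks() {
    local dest dir
    dest=$1
    shift
    mkdir -p "$dest"
    for dir in "$@"; do
        [ -d "$dir" ] || continue      # skip output directories that do not exist
        find "$dir" -type f -name '*.ks.tsv' -exec cp {} "$dest"/ \;
    done
}

collect_ks ksout ath_ks car_ks ath_car_ks
```

The helper copies rather than moves, so the original ksd output directories stay intact.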

### wgd viz: plotting
# plot a single distribution
wgd viz -ks ath.ks.tsv  

wgd viz -ks ksout/ -c red,blue,yellow

# combined plot
bokeh serve &       ## & runs the command in the background
wgd viz -i -ks ath.fasta.ks.tsv,ath.fasta_car.fasta.ks.tsv,car.fasta.ks.tsv
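
For a quick sanity check of a Ks file without opening a plot, a rough text histogram can be produced with awk. The sketch below assumes the .ks.tsv file is tab-separated with a header row containing a column named Ks; the 0.1-wide bins and the cutoff at Ks = 5 are arbitrary choices for illustration, and ks_histogram is a hypothetical helper, not part of wgd.

```shell
# Print "bin_start<TAB>count" for Ks values between 0 and 5, in 0.1-wide bins.
ks_histogram() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Ks") col = i; next }
        col && $col != "" && $col + 0 >= 0 && $col + 0 <= 5 { bins[int($col * 10)]++ }
        END {
            for (b = 0; b <= 50; b++)
                if (b in bins) printf "%.1f\t%d\n", b / 10, bins[b]
        }
    ' "$1"
}

# Apply it to one of the files named above if it is present.
if [ -f ath.fasta.ks.tsv ]; then
    ks_histogram ath.fasta.ks.tsv
fi
```

The column is located by its header name rather than by position, so the helper prints nothing if the file has no Ks column.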

Plot output

[Figure: plot produced by wgd viz (image.png)]

Reference: https://github.com/arzwa/wgd
