Drawing a synteny plot of wild soybean and Williams 82 with Python MCscan

References:
https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
https://www.jianshu.com/p/466274ad932b

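There are plenty of articles on this topic online, but whenever I followed their methods myself I kept running into all kinds of errors. With some stubborn persistence I finally produced the figure I wanted, so here are my notes.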

I. Software and data required for the analysis

1. Software

LASTAL http://last.cbrc.jp/

[Figure: LASTAL website]

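## Install LASTAL. You can download the archive on Windows and upload it to the server, or right-click the download link, copy it, and download it directly on the server.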
wget http://last.cbrc.jp/last-1111.zip && unzip ./last-1111.zip && cd ./last-1111 && make
## add lastal and lastdb to your PATH
cd src && echo "export PATH=`pwd`:\$PATH" >> ~/.bashrc && source ~/.bashrc

## Install jcvi
pip install jcvi
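
Before moving on, it can save time to confirm that the tools are actually visible from the current shell. The following is a minimal sketch, not part of jcvi or LAST: the function name missing_requirements and its parameters are made up for this example, and it only checks that lastal and lastdb are on PATH and that the jcvi module can be found.

```python
import importlib.util
import shutil


def missing_requirements(tools=("lastal", "lastdb"), modules=("jcvi",),
                         which=shutil.which,
                         find_spec=importlib.util.find_spec):
    """Return the executables and Python modules that cannot be located.

    `which` and `find_spec` are injectable only to make the function easy
    to test; by default the standard-library lookups are used.
    """
    missing = [tool for tool in tools if which(tool) is None]
    missing += [mod for mod in modules if find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    # An empty list means everything was found.
    print(missing_requirements())
```

If the list is not empty, re-check the PATH line written to ~/.bashrc or the pip installation before continuing.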
2. Data
[Figure: fetch]

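The GitHub wiki says the data can be fetched directly by logging in to Phytozome, but every time I tried to log in I got the error shown below. After many failed attempts I decided to download the files manually. When I tried to download the .cds.fa and .gff3 files for wild and cultivated soybean from Phytozome, the links it gave pointed to NCBI. In the end I had to download the files from NCBI and SoyBase: the wild soybean W05 files come from SoyBase, and the cultivated soybean Williams 82 files come from NCBI.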

[Figure: error message]

w05 download: https://soybase.org/data/public/Glycine_soja/W05.gnm1.ann1.T47J/
w82 download:
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/Glycine_max/latest_assembly_versions/GCA_000004515.4_Glycine_max_v2.1/

II. Data processing

1. Convert the GFF files to BED
## w05 and w82
python -m jcvi.formats.gff bed --type=mRNA --key=ID glyso.W05.gnm1.ann1.T47J.gene_models_main.gff3 -o w05.bed
python -m jcvi.formats.gff bed --type=mRNA --key=ID GCA_000004515.4_Glycine_max_v2.1_genomic.gff -o w82.bed
[Figure: w05.bed]

[Figure: w82.bed]
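
To make clear what ends up in each BED column, here is a minimal sketch of the conversion logic for mRNA features. It is not jcvi's implementation: the function name gff_to_bed is made up for this example, and the fixed score of "0" is an assumption about the output layout. The main point is the coordinate shift, since GFF3 is 1-based and inclusive while BED is 0-based and half-open.

```python
def gff_to_bed(lines, feature_type="mRNA", key="ID"):
    """Yield BED6 tuples for GFF3 features of the requested type."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds ';'-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if key not in attrs:
            continue
        start, end = int(cols[3]), int(cols[4])
        # Only the start shifts: 1-based inclusive -> 0-based half-open.
        yield (cols[0], start - 1, end, attrs[key], "0", cols[6])


if __name__ == "__main__":
    example = ["Chr01\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=g1.1;Parent=g1\n"]
    for row in gff_to_bed(example):
        print("\t".join(str(value) for value in row))
```

Column 4 of the BED file is whatever the --key attribute holds, which is why it must later match the sequence names in the CDS FASTA file.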
2. Extract the CDS sequences listed in the BED files
## w05 and w82
seqkit grep -f <(cut -f4 w05.bed) w05.cds.fa  | seqkit seq -i >w05.cds
seqkit grep -f <(cut -f4 w82.bed) w82.cds.fa  | seqkit seq -i >w82.cds

seqkit download (Linux 64-bit): https://github.com/shenwei356/seqkit/releases/download/v0.13.2/seqkit_linux_amd64.tar.gz

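After running the commands above, w05 was fine and its CDS sequences were extracted. The w82 CDS file, however, was empty (0 bytes). On inspection, the sequence names in the *.fa file and the names in column 4 of the BED file are completely different.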
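
A quick way to confirm this kind of mismatch before running seqkit is to count how many IDs the two files share. This is a minimal sketch with made-up function names (bed_ids, fasta_ids, id_overlap). It assumes the FASTA ID is the first whitespace-delimited token of each header, which is also what seqkit matches by default.

```python
def bed_ids(lines):
    """Collect the names in column 4 of a BED file."""
    ids = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 4:
            ids.add(cols[3])
    return ids


def fasta_ids(lines):
    """Collect the first token of every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def id_overlap(bed_lines, fasta_lines):
    """Count IDs shared by both files and IDs unique to each."""
    bed, fasta = bed_ids(bed_lines), fasta_ids(fasta_lines)
    return {
        "shared": len(bed & fasta),
        "bed_only": len(bed - fasta),
        "fasta_only": len(fasta - bed),
    }

# Usage (file names as used in this post):
# with open("w82.bed") as bed, open("w82.cds.fa") as fasta:
#     print(id_overlap(bed, fasta))
```

A "shared" count of 0 means seqkit grep will find nothing, which is the situation described for w82 above.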



The only option is to reformat the files by hand so that the downstream steps can run.

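#!/usr/bin/env python3
# Rename the w82 identifiers so that the BED file and the CDS FASTA file use the same gene IDs.
# Use bed() to fix the BED file and fa() to fix the cds.fa file (switch the call in the main block).
# Usage: python xxx.py xxx.cds.fa > new_xxx.cds.fa

import sys


def bed():
    """Replace BED column 4 with the third '|'-separated field of the original name."""
    with open(sys.argv[1]) as handle:
        for line in handle:
            cols = line.strip().split("\t")
            gene = cols[3].split("|")[2]
            print("\t".join([cols[0], cols[1], cols[2], gene, cols[4], cols[5]]))


def fa():
    """Rewrite each FASTA header to the first three dot-separated parts of its gene ID."""
    with open(sys.argv[1]) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The gene ID sits in the third space-separated field, after the colon.
                gene_id = line.split(" ")[2].split(":")[1].split(".")
                print(">" + ".".join(gene_id[:3]))
            else:
                print(line)


if __name__ == "__main__":
    # bed()
    fa()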

With the modified w82 BED file and cds.fa file, rerunning the earlier commands successfully pulls out w82.cds.


3. Synteny analysis
## Put the generated w82.bed, w82.cds, w05.bed and w05.cds in the same directory
ls ./cds
w05.bed  w05.cds  w82.bed  w82.cds

## Pairwise synteny search
python -m jcvi.compara.catalog ortholog w82 w05  --no_strip_names

Check the files generated in the directory:


[Figure: generated files]

[Figure: dot plot]
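
Besides looking at the dot plot, the .anchors file can be summarized directly. The sketch below assumes the usual layout of that file, in which each synteny block starts with a separator line beginning with "#" and every other line is one gene pair. The function name block_sizes is made up for this example.

```python
def block_sizes(lines):
    """Return the number of anchor pairs in each synteny block."""
    sizes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            sizes.append(0)   # a separator line opens a new block
        elif sizes:
            sizes[-1] += 1    # one gene pair in the current block
        else:
            sizes.append(1)   # file without a leading separator
    return [size for size in sizes if size > 0]

# Usage (file name as produced in this post):
# with open("w82.w05.anchors") as handle:
#     sizes = block_sizes(handle)
# print(len(sizes), "blocks,", sum(sizes), "anchor pairs")
```

Very few blocks, or many blocks with only a handful of pairs, usually suggests checking the gene names in the BED and CDS files again.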
4. Visualization

Drawing the karyotype figure requires two files.
First is the seqids file, which tells the plotter which set of chromosomes to include. Here, unplaced and small scaffolds have been removed. The first line contains the 20 w82 chromosomes and the second line contains the 20 w05 chromosomes.

CM000834.3,CM000835.3,CM000836.3,CM000837.4,CM000838.2,CM000839.3,CM000840.3,CM000841.3,CM000842.3,CM000843.3,CM000844.3,CM000845.2,CM000846.2,CM000847.2,CM000848.2,CM000849.2,CM000850.3,CM000851.3,CM000852.3,CM000853.3
glyso.W05.gnm1.Chr01,glyso.W05.gnm1.Chr02,glyso.W05.gnm1.Chr03,glyso.W05.gnm1.Chr04,glyso.W05.gnm1.Chr05,glyso.W05.gnm1.Chr06,glyso.W05.gnm1.Chr07,glyso.W05.gnm1.Chr08,glyso.W05.gnm1.Chr09,glyso.W05.gnm1.Chr10,glyso.W05.gnm1.Chr11,glyso.W05.gnm1.Chr12,glyso.W05.gnm1.Chr13,glyso.W05.gnm1.Chr14,glyso.W05.gnm1.Chr15,glyso.W05.gnm1.Chr16,glyso.W05.gnm1.Chr17,glyso.W05.gnm1.Chr18,glyso.W05.gnm1.Chr19,glyso.W05.gnm1.Chr20
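
Typing 20 names per line by hand is error-prone, so the seqids lines can also be derived from column 1 of the BED files. This is a minimal sketch: the function name seqids_line, the min_genes threshold and the keep filter are made up for this example and are not part of jcvi. It sorts names lexically, which only gives chromosome order when the names are zero-padded (Chr01 ... Chr20) or are consecutive accessions.

```python
from collections import Counter


def seqids_line(bed_lines, min_genes=1, keep=None):
    """Build one comma-separated seqids line from BED column 1.

    Sequences with fewer than `min_genes` features, or rejected by the
    optional `keep` predicate, are dropped (for example small scaffolds).
    """
    counts = Counter()
    for line in bed_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 4:
            counts[cols[0]] += 1
    names = [
        name for name, count in counts.items()
        if count >= min_genes and (keep is None or keep(name))
    ]
    return ",".join(sorted(names))

# Usage (file names as used in this post):
# with open("w82.bed") as w82, open("w05.bed") as w05, open("seqids", "w") as out:
#     out.write(seqids_line(w82, keep=lambda n: n.startswith("CM")) + "\n")
#     out.write(seqids_line(w05, keep=lambda n: ".Chr" in n) + "\n")
```

The order of the two lines matters: the first line belongs to track 0 and the second to track 1 of the layout file.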

Second is the layout file, which tells the plotter where to draw what. The whole canvas is 0-1 on the x-axis and 0-1 on the y-axis. The first three columns specify the position of the track, followed by rotation, color, label, vertical alignment (va), and the genome BED file. Track 0 is w82 and track 1 is w05. The next stanza specifies which edges to draw between the tracks: e, 0, 1 asks for edges between tracks 0 and 1, using information from the .simple file.

# First generate the .simple file
python -m jcvi.compara.synteny screen --minspan=30 --simple w82.w05.anchors w82.w05.anchors.new
# y, xstart, xend, rotation, color, label, va,  bed
 .6,     .1,    .8,       0,      , w82, top, w82.bed
 .4,     .1,    .8,       0,      , w05, top, w05.bed
# edges
e, 0, 1, w82.w05.anchors.simple
# Draw the figure
python -m jcvi.graphics.karyotype seqids layout --shadestyle=line

The chromosome numbers in the figure are still wrong. I have not found a good solution yet.
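
One thing that may be worth trying, untested here: if the plotter derives the short chromosome label from the digits in each sequence name (an assumption about jcvi that I have not verified), then accession-style names such as CM000834.3 could produce misleading numbers. Renaming the sequences in the BED file and in the seqids file might help. The sketch below uses made-up helper names and a made-up "Gm" prefix, and it assumes the names are given in chromosome order 1 to 20.

```python
def make_rename_map(seqids, prefix="Gm"):
    """Map each sequence name to prefix + two-digit index, in the given order.

    Assumption: `seqids` is already ordered chromosome 1, 2, 3, ...
    """
    return {old: f"{prefix}{index:02d}" for index, old in enumerate(seqids, start=1)}


def rename_bed(lines, mapping):
    """Yield BED lines with column 1 replaced according to `mapping`."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        cols[0] = mapping.get(cols[0], cols[0])  # unknown names stay unchanged
        yield "\t".join(cols)

# Usage (file names as used in this post; w82.renamed.bed is hypothetical):
# with open("seqids") as handle:
#     w82_names = handle.readline().strip().split(",")
# mapping = make_rename_map(w82_names)
# with open("w82.bed") as src, open("w82.renamed.bed", "w") as out:
#     for row in rename_bed(src, mapping):
#         out.write(row + "\n")
```

The first line of the seqids file and the BED file name in the layout file would then need the same new names. The anchors themselves are keyed on gene names, so they should not be affected.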
