HiC-Pro + EndHiC (Part 2): Chromosome Scaffolding

EndHiC

Compared with HiC-Pro, installing EndHiC is much simpler: download it and it is ready to use.

Installing EndHiC

git clone git@github.com:fanagislab/EndHiC.git

All the scripts you need sit in the cloned folder and can be called directly.
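Since "installation" is only a clone, a quick sanity check is to confirm that the scripts are really there. The sketch below is not part of EndHiC; `check_endhic_scripts` is a hypothetical helper, and the script names are simply the ones called by the example script later in this post.

```shell
# Hypothetical helper (not shipped with EndHiC): report any script that the
# example work.sh calls but that is missing from the given directory.
# Succeeds only when all six scripts are present.
check_endhic_scripts() (
    dir=$1
    missing=0
    for script in fastaDeal.pl matrix2heatmap.py endhic.pl \
                  cluster2agp.pl agp2fasta.pl cluster2bed.pl; do
        if [ ! -f "${dir}/${script}" ]; then
            echo "missing: ${script}" >&2
            missing=1
        fi
    done
    [ "${missing}" -eq 0 ]
)

# Example (the path is a placeholder for your own clone):
# check_endhic_scripts /path/to/EndHiC && echo "all scripts found"
```

The function body runs in a subshell, so its variables do not leak into the calling script.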

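So how do you use it? I have to say, the instructions on GitHub are honestly pretty sloppy.

It is quicker to just read the script in the example the authors provide.

Using EndHiC

The provided example script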

$ cat biosoft/EndHiC/z.testing_data/Arabidopsis_thalina/work.sh
##Atha.contigs.fa is generated by Hifiasm
##AthaHiC_100000_abs.bed, AthaHiC_100000.matrix, AthaHiC_100000_iced.matrix are generated by HiC-pro using Atha.contigs.fa as the reference genome

gzip -d Atha.contigs.fa.gz

##get contig length
perl ../../fastaDeal.pl -attr id:len Atha.contigs.fa > Atha.contigs.fa.len

##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
../../matrix2heatmap.py AthaHiC_100000_abs.bed AthaHiC_100000.matrix 10

##Run one round, when the contig assembly is quite good
perl ../../endhic.pl Atha.contigs.fa.len AthaHiC_100000_abs.bed AthaHiC_100000.matrix AthaHiC_100000_iced.matrix

ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./


##convert cluster file to agp file
perl ../../cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster Atha.contigs.fa.len > Atha.scaffolds.agp

##get final scaffold sequence file
perl ../../agp2fasta.pl Atha.scaffolds.agp Atha.contigs.fa > Atha.scaffolds.fa

##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
../../cluster2bed.pl AthaHiC_100000_abs.bed z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
../../matrix2heatmap.py clusterA_100000_abs.bed AthaHiC_100000.matrix 10

##Here, Arabidopsis thalina has 5 chromosomes, and all these chromosomes can be successfully scaffolded by EndHiC
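The first step of the script writes a contig length table with `fastaDeal.pl -attr id:len`. As a rough cross-check that does not depend on that script, the sketch below computes sequence lengths with plain awk. `fasta_lengths` is a hypothetical helper, and the tab-separated id/length layout is an assumption; compare it with the file `fastaDeal.pl` actually writes before relying on it.

```shell
# Hypothetical helper: print "id<TAB>length" for every record in a FASTA file.
# The id is the first word of the header line, without the leading ">".
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") printf "%s\t%d\n", id, len   # flush previous record
            id = substr($1, 2)
            len = 0
            next
        }
        { len += length($0) }                          # sequence line
        END { if (id != "") printf "%s\t%d\n", id, len }
    ' "$1"
}

# Example:
# fasta_lengths Atha.contigs.fa | head
```

If the ids or lengths disagree with `Atha.contigs.fa.len`, the FASTA file is probably not the one HiC-Pro was run against.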

The input data are the files that HiC-Pro produced in the previous step.

The adapted script

contig=/share/home/off/Work/Genome_assembly/Assembly/contig.fa  ## contig file; must be the same one used as the reference in HiC-Pro
endhic_dir=/share/home/off_wenhao/biosoft/EndHiC    ## EndHiC installation path
name=dlo    ## species name; must match the one set for HiC-Pro, i.e. the `**` part of the HiC-Pro output folder `**_outdir_new`

##get contig length
perl ${endhic_dir}/fastaDeal.pl -attr id:len ${contig} > contigs.fa.len

##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
hic_pro_dir=/share/home/off/Work/Genome_assembly/Assembly/08.EndHiC/01.hicprp/${name}_outdir_new/hic_results/matrix/${name}


${endhic_dir}/matrix2heatmap.py ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10

##Run one round, when the contig assembly is quite good

perl ${endhic_dir}/endhic.pl contigs.fa.len ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix ${hic_pro_dir}/iced/100000/${name}_100000_iced.matrix

ln  Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./

##convert cluster file to agp file
perl ${endhic_dir}/cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster contigs.fa.len > scaffolds.agp

##get final scaffold sequence file
perl ${endhic_dir}/agp2fasta.pl scaffolds.agp ${contig} > ${name}.scaffolds.fa

##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
${endhic_dir}/cluster2bed.pl ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
${endhic_dir}/matrix2heatmap.py clusterA_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10
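Most failures with a script like this come from a wrong `name` or `hic_pro_dir`, so it can help to check the three HiC-Pro files before calling `endhic.pl`. The sketch below only tests the paths used in the script above; `check_hicpro_inputs` is a hypothetical helper, not an EndHiC or HiC-Pro command.

```shell
# Hypothetical helper: check that the three HiC-Pro files handed to endhic.pl
# exist and are non-empty.
# Arguments: HiC-Pro matrix directory, species name, bin size (100000 here).
check_hicpro_inputs() (
    dir=$1
    name=$2
    res=$3
    bad=0
    for f in "${dir}/raw/${res}/${name}_${res}_abs.bed" \
             "${dir}/raw/${res}/${name}_${res}.matrix" \
             "${dir}/iced/${res}/${name}_${res}_iced.matrix"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            bad=1
        fi
    done
    [ "${bad}" -eq 0 ]
)

# Example, using the variables defined in the script above:
# check_hicpro_inputs "${hic_pro_dir}" "${name}" 100000 || exit 1
```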

Results

clusterA.id.len
clusterA_100000_abs.bed   
clusterA_100000_abs.bed.pdf 
endhic.100000.10.iced.sh  
endhic.100000.20.iced.sh  
endhic.100000.5.iced.sh                            
endhic.100000.10.raw.sh   
endhic.100000.20.raw.sh   
endhic.100000.5.raw.sh   
endhic.100000.15.raw.sh   
endhic.100000.25.raw.sh   
endhic.Round_A.sh    
endhic.100000.15.iced.sh  
endhic.100000.25.iced.sh  
endhic.log
EndHic.sh     
dlo.scaffolds.fa                                                  
Round_A.01.contig_end_contact_results/
Round_A.02.GFA_contig_graph_results/
Round_A.03.cluster_order_orient_results/
Round_A.04.summary_and_merging_results/
scaffolds.agp
contigs.fa.len                 
z.EndHiC.A.results.summary.cluster
z.EndHiC.A.results.summary.cluster.GFA.v1.2.GFA
z.EndHiC.A.results.summary.cluster.GFA

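There are a lot of files, but only two are really needed: scaffolds.agp and prefix.scaffolds.fa (dlo.scaffolds.fa in this run). The .fa file holds the scaffold sequences, and the .agp file is the map that records how the contigs are placed in each scaffold.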
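To see at a glance how many contigs ended up in each scaffold, the AGP file can be summarised with awk. This is a minimal sketch that assumes scaffolds.agp follows the standard tab-separated AGP layout (object name in column 1, object end in column 3, component type in column 5, with `N` or `U` marking gaps); `agp_summary` is a hypothetical helper, not an EndHiC script.

```shell
# Hypothetical helper: for each scaffold in an AGP file, print
# "scaffold<TAB>number_of_contigs<TAB>scaffold_length".
agp_summary() {
    awk -F'\t' '
        /^#/ || NF < 5 { next }                       # skip comments and blank lines
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            if ($5 != "N" && $5 != "U") contigs[$1]++ # count non-gap components
            if ($3 + 0 > len[$1] + 0) len[$1] = $3 + 0   # largest object_end
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\t%d\n", order[i], contigs[order[i]], len[order[i]]
        }
    ' "$1"
}

# Example:
# agp_summary scaffolds.agp
```

Scaffolds are printed in the order they first appear in the file, so the output lines up with the AGP itself.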
