Rfam Local Installation Tutorial

Introduction to Rfam

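Rfam is a database for identifying non-coding RNAs (ncRNAs). It is commonly used to annotate new nucleotide sequences or genome sequences. Searches against Rfam are run with the Infernal software: http://eddylab.org/infernal/
Infernal user guide: http://eddylab.org/infernal/Userguide.pdf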

1. Download the Infernal software

# Download infernal-1.1.1.tar.gz
# In the directory where you install software, create an Rfam directory (used for the database in step 2)
mkdir -p Rfam
wget http://eddylab.org/software/infernal/infernal-1.1.1.tar.gz
tar xf infernal-1.1.1.tar.gz
cd infernal-1.1.1
./configure --prefix=`pwd`/../infernal_bin
# Build and install
make
make install
cd easel; make install
cd ../../infernal_bin/bin
ls
# The installed programs are listed in this directory
export PATH=${PATH}:`pwd`  # Add this directory to PATH (current shell session only)
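Before moving on, it can help to confirm that the programs used in the later steps are actually reachable through PATH. The snippet below is a minimal sketch: `check_tools` is a hypothetical helper name, not part of Infernal, and it only checks that the commands can be found, not that they run correctly.

```shell
# check_tools: hypothetical helper, reports which programs are reachable via PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The three programs used in this tutorial.
check_tools cmscan cmpress esl-seqstat || echo "Add infernal_bin/bin to PATH first"
```

Because `export PATH=...` only affects the current shell, the same check is worth repeating after opening a new terminal.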

2. Download the database

wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/12.2/Rfam.cm.gz
gunzip Rfam.cm.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/12.2/Rfam12.2.claninfo
# Index Rfam.cm with cmpress from Infernal
../infernal_bin/bin/cmpress Rfam.cm  # In my case this had to be run from inside the directory that contains Rfam.cm
# Output
Working...    done.
Pressed and indexed 2588 CMs and p7 HMM filters (2588 names and 2588 accessions).
Covariance models and p7 filters pressed into binary file:  Rfam.cm.i1m
SSI index for binary covariance model file:                 Rfam.cm.i1i
Optimized p7 filter profiles (MSV part)  pressed into:      Rfam.cm.i1f
Optimized p7 filter profiles (remainder) pressed into:      Rfam.cm.i1p
# This output means indexing has finished
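The cmpress output above names four index files written next to Rfam.cm. A quick way to confirm they are all in place before running a search is sketched below; `check_pressed` is a hypothetical helper name, and the file extensions are taken from the output shown above.

```shell
# check_pressed: hypothetical helper, verifies that the four files written
# by cmpress exist and are non-empty for the given CM file.
check_pressed() {
    cm="$1"
    for ext in i1m i1i i1f i1p; do
        if [ ! -s "$cm.$ext" ]; then
            echo "missing index file: $cm.$ext"
            return 1
        fi
    done
    echo "all index files present for $cm"
}

check_pressed Rfam.cm || echo "Run cmpress Rfam.cm first"
```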

3. Determine the size of the genome to be searched [required]

../infernal_bin/bin/esl-seqstat ~/M.truncatula/Medtr_v4_0v1/JCVI.Medtr.v4.20130313.fasta
# Output
Format:              FASTA
Alphabet type:       DNA
Number of sequences: 230
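Total # residues:    532015 # This is the number we need. Because the genome is double-stranded and the -Z parameter in the next step is given in millions of residues, compute 532015 * 2 / 1000000 = 1.06403 and use that as the value of -Z.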
Smallest:            202
Largest:             21302
Average length:      2313.1
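The -Z arithmetic described above can be scripted so that the value does not have to be worked out by hand. This is a minimal sketch: `compute_z` is a hypothetical helper name, and it assumes the esl-seqstat output has the "Total # residues:" line in the layout shown above.

```shell
# compute_z: hypothetical helper, reads esl-seqstat output on stdin and prints
# (total residues * 2 strands) / 1,000,000, the value to pass to cmscan -Z.
compute_z() {
    awk '$1 == "Total" && $3 == "residues:" { printf "%.6f\n", $4 * 2 / 1000000 }'
}

# Demonstration with the line from the output above.
printf 'Total # residues:    532015\n' | compute_z   # prints 1.064030
```

With Infernal installed, the real input would come from `esl-seqstat my-genome.fa | compute_z`.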

Running the search

# Rfam12.2.claninfo: the downloaded claninfo file (give its path)
# Rfam.cm: the downloaded CM file
# my-genome.fa: the sequences to search
# my-genome.cmscan: the main output file
# my-genome.tblout: a second output file (tabular hit list)
cmscan -Z `esl-seqstat my-genome.fa | awk '/^Total/ {printf "%.6f", $4*2/1000000}'` --cut_ga --rfam --nohmmonly --tblout my-genome.tblout --fmt 2 --clanin Rfam12.2.claninfo Rfam.cm my-genome.fa > my-genome.cmscan
# This is the command from the reference blog, but whenever I ran it, it failed with an error and produced no results
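The error message is not recorded here, so the cause is unknown. One thing worth ruling out (an assumption, not a confirmed diagnosis) is that the installed cmscan does not recognize every option used in that command, for example because the options were added in a different Infernal release. The sketch below compares the options against the program's own help text; `has_option` is a hypothetical helper name.

```shell
# has_option: hypothetical helper, reads help text on stdin and succeeds
# if the given option name appears in it.
has_option() {
    grep -qF -e "$1"
}

# Check each long option from the command above against `cmscan -h`.
# Skipped entirely when cmscan is not on PATH.
if command -v cmscan >/dev/null 2>&1; then
    for opt in --cut_ga --rfam --nohmmonly --tblout --fmt --clanin; do
        if cmscan -h | has_option "$opt"; then
            echo "listed in help:     $opt"
        else
            echo "not listed in help: $opt"
        fi
    done
fi
```

If an option is not listed, dropping it or installing a release whose user guide documents it would be the next thing to try.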

From the user guide on the official site

[Image: screenshot from the user guide]

What I ran, following the user guide

~/software/infernal_bin/bin/cmscan ~/software/Rfam/Rfam.cm ../candidate_fasta/CPC_fasta/u_cpc.fasta
# cmscan :: search sequence(s) against a CM database
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:                   ../candidate_fasta/CPC_fasta/u_cpc.fasta
# target CM database:                    /root/software/Rfam/Rfam.cm
# number of worker threads:              1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       XLOC_000318::chr1:4780155-4784203()  [L=4048]

Hit scores:
 rank     E-value  score  bias  modelname   start    end   mdl trunc   gc  description
 ----   --------- ------ -----  ---------- ------ ------   --- ----- ----  -----------
 ------ inclusion threshold ------
  (1) ?      0.14   15.6   1.0  snR73        2280   2346 + hmm     - 0.30  -
  (2) ?      0.17   18.2   0.2  sroH         2044   1992 -  cm    no 0.25  -
  (3) ?       1.5   12.2   0.2  Afu_328      3042   3012 - hmm     - 0.29  -
  (4) ?       5.5   10.7   1.6  adapt33_1    3099   3052 - hmm     - 0.23  -
  (5) ?       5.8   18.7   0.0  SNORD19      3298   3375 +  cm    no 0.40  -
  (6) ?       6.5   16.5   0.1  snoR66       2441   2506 +  cm    no 0.26  -
  (7) ?       7.6    9.3   2.3  DLX6-AS1_2    136    241 + hmm     - 0.33  -
  (8) ?       9.4   23.9   0.2  KRAS_3UTR    1432   1501 +  cm    no 0.26  -


Hit alignments:
>> snR73  
 rank     E-value  score  bias mdl mdl from   mdl to       seq from      seq to       acc trunc   gc
 ----   --------- ------ ----- --- -------- --------    ----------- -----------      ---- ----- ----
  (1) ?      0.14   15.6   1.0 hmm        1       67 [.        2280        2346 + .. 0.66     - 0.30

                                           ::::::::::::::::::::.::::::::::::::::::::::::::::::::::::::::.::::::: CS
                                snR73    1 GUUUAUGAUGAuUucCacUU.aUCACGACGGUCAaCUGcGuUcuUCgAuUGUUUAuuuaaG.aACuUUG 67  
                                           GUU A GAUGAuUu  a+UU +UCA   C GUCAaCUG+G U+u C+  UG UUA   a+G +A uUU 
  XLOC_000318::chr1:4780155-4784203() 2280 GUUGAGGAUGAUUUUUAUUUaUUCAUAUCUGUCAACUGUGAUUUCCU--UGAUUAAACAGGuGAGUUUA 2346
                                           5778899******6666555499*****************9988774..55555544333323333333 PP
......................
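In the hit list above, every hit is marked "?" and sits below the "inclusion threshold" line, which means none of them passed the inclusion threshold. Hits that do pass are marked "!". If the search is rerun with `--tblout`, the significant hits can be pulled out of the tabular file. The sketch below assumes the default tabular layout (without `--fmt 2`), in which the 17th column holds the "!" or "?" flag. `significant_hits` is a hypothetical helper name and `u_cpc.tblout` is a hypothetical output file name.

```shell
# significant_hits: hypothetical helper, prints only tblout rows flagged "!"
# (hits that pass the inclusion threshold). Comment lines start with "#".
# Assumes the default tblout layout, where column 17 is the inc flag.
significant_hits() {
    awk '!/^#/ && $17 == "!"' "$@"
}

# u_cpc.tblout is a hypothetical file written by: cmscan --tblout u_cpc.tblout ...
if [ -f u_cpc.tblout ]; then
    significant_hits u_cpc.tblout
fi
```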

I am stuck at this step.
To be continued...
[2019.8.20]

Reference: 本地使用Rfam (Using Rfam Locally)
