cutadapt usage examples

The official documentation already provides a detailed tutorial.
Here I record the methods I use most often:

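Installation: install with Python 3 so that the multi-core option (-j) is available. With --user the package goes into your own home directory, so sudo is not needed.
python3 -m pip install --user --upgrade cutadapt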

What is a 3' adapter? The adapter follows the sequence of interest: XXXXXXXXXXXXXXadapter
What is a 5' adapter? The adapter sits at the start of the sequence: adapterXXXXXXXXXXXXXX

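If your data match the first case, use -a: the adapter and everything after it are trimmed.
If they match the second case, use -g: the adapter and everything before it are trimmed.
The default maximum error rate for adapter matching is 10%; change it with -e. The output files are uncompressed here because their names do not end in .gz (cutadapt chooses compression from the file extension).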
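To make the difference between -a and -g concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: it handles exact, full-length adapter matches, whereas cutadapt itself also allows errors and partial matches. The function names and the adapter sequence are made up for this example.

```python
def trim_3prime(read: str, adapter: str) -> str:
    """Mimic -a for an exact match: drop the adapter and everything after it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def trim_5prime(read: str, adapter: str) -> str:
    """Mimic -g for an exact match: drop the adapter and everything before it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[pos + len(adapter):]


if __name__ == "__main__":
    adapter = "ACGTACGT"  # made-up adapter, for illustration only
    # 3' case: XXXXXXXXadapter... keeps the part before the adapter
    print(trim_3prime("TTTTGGGG" + adapter + "CCCC", adapter))
    # 5' case: ...adapterXXXXXXXX keeps the part after the adapter
    print(trim_5prime("CC" + adapter + "TTTTGGGG", adapter))
```

Reads without the adapter are returned unchanged, which matches cutadapt's default of still writing untrimmed reads to the output.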

Example:

cutadapt -a adapter=ATATCCAGAACCCTGACCCTGCCGTGTACCAGCTGAC -O 10  -o  G18E2L2_R1.p1.fq  -r  R1.p2.fq --info-file=R1.cutadapt.log  /your/fastq/fastq_1.fq.gz > R1.cutadapt.stats
cutadapt -g adapter=CACAGCGACCTCGGGTGGGAACACCTTGTTCAGGTCT -O 10  -o  G18E2L2_R2.p1.fq  -r  R2.p2.fq --info-file=R2.cutadapt.log  /your/fastq/fastq_2.fq.gz > R2.cutadapt.stats
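When the same command has to be run for many samples, it can help to build it in Python. The sketch below only assembles the argument list shown above and redirects the report to a stats file; the helper names build_cutadapt_cmd and run_cutadapt and the file-naming scheme based on a prefix are my own assumptions, not part of cutadapt.

```python
import subprocess
from typing import List


def build_cutadapt_cmd(adapter: str, fastq_in: str, prefix: str,
                       five_prime: bool = False,
                       min_overlap: int = 10) -> List[str]:
    """Assemble the single-end command from the example (hypothetical helper)."""
    flag = "-g" if five_prime else "-a"  # -g: 5' adapter, -a: 3' adapter
    return [
        "cutadapt",
        flag, f"adapter={adapter}",
        "-O", str(min_overlap),                 # minimum overlap
        "-o", f"{prefix}.p1.fq",                # trimmed reads
        "-r", f"{prefix}.p2.fq",                # rest after the adapter
        f"--info-file={prefix}.cutadapt.log",   # per-read log
        fastq_in,
    ]


def run_cutadapt(cmd: List[str], stats_path: str) -> None:
    """Run the command and save the report that would go to the screen."""
    with open(stats_path, "w") as stats:
        subprocess.run(cmd, stdout=stats, check=True)


if __name__ == "__main__":
    cmd = build_cutadapt_cmd("ATATCCAGAACCCTGACCCTGCCGTGTACCAGCTGAC",
                             "/your/fastq/fastq_1.fq.gz", "R1")
    print(" ".join(cmd))
    # run_cutadapt(cmd, "R1.cutadapt.stats")  # needs cutadapt on PATH
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.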

-O --overlap=MINLENGTH  : Require MINLENGTH overlap between read and adapter for an adapter to be found. Default: 3
-o  output.fastq
-r  FILE, --rest-file=FILE  When the adapter matches in the middle of a read, write the rest (after the adapter) to FILE.
--info-file=FILE    Write information about each read and its adapter matches into FILE. See the documentation for the file format.
-j CORES, --cores=CORES Number of CPU cores to use. Use 0 to auto-detect. Default: 1. Multi-core mode is not available under Python 2.
-a ADAPTER, --adapter=ADAPTER  Sequence of an adapter ligated to the 3' end (paired data: of the first read). The adapter and subsequent bases are trimmed. If a '$' character is appended
                        ('anchoring'), the adapter is only found if it is a  suffix of the read.
-g ADAPTER, --front=ADAPTER  Sequence of an adapter ligated to the 5' end (paired data: of the first read). The adapter and any preceding bases are trimmed. Partial matches at the 5'
                        end are allowed. If a '^' character is prepended ('anchoring'), the adapter is only found if it is a prefix of the read.
-b ADAPTER, --anywhere=ADAPTER Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of the first read). Both types of matches as described under -a and -g are allowed.
                        If the first base of the read is part of the match, the behavior is as with -g, otherwise as with -a. This option is mostly for rescuing failed library preparations
                        - do not use if you know which end your adapter was ligated to!
Fuzzy matching (error tolerance):
-e RATE, --error-rate=RATE   Maximum allowed error rate as value between 0 and 1 (no. of errors divided by length of matching region). Default: 0.1 (=10%)
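As a rough guide to what the error rate means in practice, the sketch below computes how many errors are tolerated for a match of a given length. It assumes, as I understand the cutadapt documentation, that the allowed number is the error rate times the length of the matching region, rounded down; the function name max_errors is my own.

```python
def max_errors(match_length: int, error_rate: float = 0.1) -> int:
    """Errors tolerated for a match of this length (rate * length, rounded down)."""
    return int(match_length * error_rate)


if __name__ == "__main__":
    # 10 is the -O value used in the example; 37 is the full adapter length.
    for length in (9, 10, 20, 37):
        print(length, max_errors(length))
```

Under this assumption, short partial matches tolerate no errors at all, which is one reason a larger -O value reduces spurious adapter hits.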
For paired-end reads:
    cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq

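Notes on the options used in the example:
-O 10: require at least 10 bases of overlap between read and adapter (default: 3).
-r: writes the part of the read that follows the adapter to the given file (R2.p2.fq in the second command).
--info-file: writes the per-read log file.
The stats file is cutadapt's report on the adapter trimming. It is printed to the screen by default, so it is best to redirect it to a file, as I do, for later reference.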

[Screenshot: excerpt of the stats file]

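By default cutadapt trims the adapter together with the sequence after it (or, for a 5' adapter, the sequence before it). So if you only want to cut out the adapter itself and keep the sequences on both sides of it, you have to extract them from the log file.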

Processing the cutadapt log file:
The log file looks like this.

[Screenshot: log file]

It contains three types of record formats.
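The following is a minimal sketch of how the sequences on both sides of the adapter could be pulled out of the log. It is not my actual script. It assumes the tab-separated --info-file layout described in the cutadapt documentation for FASTQ input: a matched read has at least 11 columns (name, errors, start, end, left sequence, matched sequence, right sequence, adapter name, left qualities, matched qualities, right qualities), and an unmatched read has -1 in the second column. All function names are made up for illustration.

```python
from typing import Optional, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality)


def split_info_line(line: str) -> Optional[Tuple[FastqRecord, FastqRecord]]:
    """Return (left, right) flanks of the adapter, or None if no adapter hit."""
    fields = line.rstrip("\n").split("\t")
    # Unmatched reads carry -1 as the error count and have fewer columns.
    if len(fields) < 11 or fields[1] == "-1":
        return None
    name = fields[0]
    left = (name, fields[4], fields[8])     # sequence/qualities before adapter
    right = (name, fields[6], fields[10])   # sequence/qualities after adapter
    return left, right


def to_fastq(record: FastqRecord) -> str:
    """Format one record as a four-line FASTQ entry."""
    name, seq, qual = record
    return f"@{name}\n{seq}\n+\n{qual}\n"


def split_info_file(info_path: str, p1_path: str, p2_path: str) -> int:
    """Write left flanks to p1_path and right flanks to p2_path; return hit count."""
    hits = 0
    with open(info_path) as info, open(p1_path, "w") as p1, open(p2_path, "w") as p2:
        for line in info:
            flanks = split_info_line(line)
            if flanks is None:
                continue  # this sketch simply skips reads without an adapter
            p1.write(to_fastq(flanks[0]))
            p2.write(to_fastq(flanks[1]))
            hits += 1
    return hits
```

Either flank can be empty when the adapter sits at the very start or end of a read, so downstream tools may need a length filter.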

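Utility script 1:

Writes the read sequences before and after the adapter, as recorded in the cutadapt log, to separate files (p1 and p2) for later use.
Usage (I wrote the script myself; it is very handy):
python deal_cutadapt_log.py -l xxx.cutadapt.log -d /result/dir/
This produces two files,
xxx.p1.fq and xxx.p2.fq, holding the sequences before and after the adapter respectively.
The -f option controls whether reads without an adapter in the log file are kept or dropped.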

usage: deal_cutadapt_log.py [-h] -l LOG_FILE [-d RESULT_DIR] [-f] [-v]

This is description

optional arguments:
  -h, --help            show this help message and exit
  -l LOG_FILE, --log LOG_FILE
                        input read1 file
  -d RESULT_DIR, --dir RESULT_DIR
                        input read2 file
  -f, --flag            means to contains -l flag in output.
  -v, --version         show program's version number and exit

Utility script 2:

Batch summary of cutadapt.stats files: the input is a directory, and the script collects the relevant figures from every stats file in it.

python statistic_basic_info.py ./
sample  Total reads processed   Reads with adapters
G34E3L1 10,934,616      10,455,685 (95.6%)
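A minimal sketch of how such a summary could be produced is given below. It is not my actual script. It assumes single-end reports that contain lines labelled "Total reads processed:" and "Reads with adapters:", that the files end in .stats, and that the sample name is the part of the file name before the first dot; the function names are made up.

```python
import re
import sys
from pathlib import Path
from typing import Dict

SUMMARY_KEYS = ("Total reads processed", "Reads with adapters")


def parse_stats(text: str) -> Dict[str, str]:
    """Pick the summary values out of one cutadapt report."""
    found: Dict[str, str] = {}
    for line in text.splitlines():
        for key in SUMMARY_KEYS:
            match = re.match(rf"\s*{re.escape(key)}:\s+(.+?)\s*$", line)
            if match:
                found[key] = match.group(1)
    return found


def summarize_dir(directory: str) -> str:
    """Build a tab-separated table with one row per *.stats file."""
    rows = ["\t".join(("sample",) + SUMMARY_KEYS)]
    for path in sorted(Path(directory).glob("*.stats")):
        values = parse_stats(path.read_text())
        sample = path.name.split(".")[0]  # assumed naming convention
        rows.append("\t".join([sample] + [values.get(k, "NA") for k in SUMMARY_KEYS]))
    return "\n".join(rows)


if __name__ == "__main__":
    print(summarize_dir(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Paired-end reports use different labels, so SUMMARY_KEYS would need adjusting for them.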

Very handy.

Like this post and I will send you the scripts!
