Linux Learning in 100 Posts, #39: Transcriptome Analysis Software, Installing trim-galore

Installation

(rnaseq) root 11:58:44 ~
$ conda install -y trim-galore
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /root/miniconda3/envs/rnaseq

  added / updated specs:
    - trim-galore


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cutadapt-3.4               |   py39h38f01e4_1         198 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
    dnaio-0.5.1                |   py39h38f01e4_0         140 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
    isa-l-2.30.0               |       ha770c72_4         192 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    pigz-2.6                   |       h27826a3_0          87 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    python-isal-0.10.0         |   py39h3811e60_0         117 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    trim-galore-0.6.6          |       hdfd78af_1          42 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
    xopen-1.1.0                |   py39hf3d152e_2          20 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    ------------------------------------------------------------
                                           Total:         797 KB

The following NEW packages will be INSTALLED:

  cutadapt           anaconda/cloud/bioconda/linux-64::cutadapt-3.4-py39h38f01e4_1
  dnaio              anaconda/cloud/bioconda/linux-64::dnaio-0.5.1-py39h38f01e4_0
  isa-l              anaconda/cloud/conda-forge/linux-64::isa-l-2.30.0-ha770c72_4
  pigz               anaconda/cloud/conda-forge/linux-64::pigz-2.6-h27826a3_0
  python-isal        anaconda/cloud/conda-forge/linux-64::python-isal-0.10.0-py39h3811e60_0
  trim-galore        anaconda/cloud/bioconda/noarch::trim-galore-0.6.6-hdfd78af_1
  xopen              anaconda/cloud/conda-forge/linux-64::xopen-1.1.0-py39hf3d152e_2



Downloading and Extracting Packages
python-isal-0.10.0   | 117 KB    | ######################################## | 100% 
trim-galore-0.6.6    | 42 KB     | ######################################## | 100% 
xopen-1.1.0          | 20 KB     | ######################################## | 100% 
dnaio-0.5.1          | 140 KB    | ######################################## | 100% 
pigz-2.6             | 87 KB     | ######################################## | 100% 
isa-l-2.30.0         | 192 KB    | ######################################## | 100% 
cutadapt-3.4         | 198 KB    | ######################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(rnaseq) root 12:00:42 ~
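A quick way to confirm that the install worked is to check that the executables are on your PATH inside the rnaseq environment. Note that the conda package is called `trim-galore`, but the command it installs is `trim_galore`. The helper below is a small sketch of my own: the name `check_tools` is hypothetical and not part of any package. It only relies on the shell built-in `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named command is on PATH.
# Returns the number of missing commands (0 = everything found).
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

After `conda activate rnaseq`, running `check_tools trim_galore cutadapt fastqc` should report all three as found. FastQC is only needed if you ask Trim Galore to run it (`--fastqc`).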

Checking

(rnaseq) root 12:05:52 ~
$ cutadapt --help
cutadapt version 3.4

Copyright (C) 2010-2021 Marcel Martin 

cutadapt removes adapter sequences from high-throughput sequencing reads.

Usage:
    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

For paired-end reads:
    cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq

Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard
characters are supported. All reads from input.fastq will be written to
output.fastq with the adapter sequence removed. Adapter matching is
error-tolerant. Multiple adapter sequences can be given (use further -a
options), but only the best-matching adapter will be removed.

Input may also be in FASTA format. Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. Without the -o option, output is sent to standard output.

Citation:

Marcel Martin. Cutadapt removes adapter sequences from high-throughput
sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011.
http://dx.doi.org/10.14806/ej.17.1.200

Run "cutadapt --help" to see all command-line options.
See https://cutadapt.readthedocs.io/ for full documentation.

Options:
  -h, --help            Show this help message and exit
  --version             Show version number and exit
  --debug               Print debug log. Use twice to also print DP matrices
  -j CORES, --cores CORES
                        Number of CPU cores to use. Use 0 to auto-detect. Default:
                        1

Finding adapters:
  Parameters -a, -g, -b specify adapters to be removed from each read (or from
  the first read in a pair if data is paired). If specified multiple times, only
  the best matching adapter is trimmed (but see the --times option). When the
  special notation 'file:FILE' is used, adapter sequences are read from the given
  FASTA file.

  -a ADAPTER, --adapter ADAPTER
                        Sequence of an adapter ligated to the 3' end (paired data:
                        of the first read). The adapter and subsequent bases are
                        trimmed. If a '$' character is appended ('anchoring'), the
                        adapter is only found if it is a suffix of the read.
  -g ADAPTER, --front ADAPTER
                        Sequence of an adapter ligated to the 5' end (paired data:
                        of the first read). The adapter and any preceding bases are
                        trimmed. Partial matches at the 5' end are allowed. If a
                        '^' character is prepended ('anchoring'), the adapter is
                        only found if it is a prefix of the read.
  -b ADAPTER, --anywhere ADAPTER
                        Sequence of an adapter that may be ligated to the 5' or 3'
                        end (paired data: of the first read). Both types of matches
                        as described under -a and -g are allowed. If the first base
                        of the read is part of the match, the behavior is as with
                        -g, otherwise as with -a. This option is mostly for
                        rescuing failed library preparations - do not use if you
                        know which end your adapter was ligated to!
  -e E, --error-rate E, --errors E
                        Maximum allowed error rate (if 0 <= E < 1), or absolute
                        number of errors for full-length adapter match (if E is an
                        integer >= 1). Error rate = no. of errors divided by length
                        of matching region. Default: 0.1 (10%)
  --no-indels           Allow only mismatches in alignments. Default: allow both
                        mismatches and indels
  -n COUNT, --times COUNT
                        Remove up to COUNT adapters from each read. Default: 1
  -O MINLENGTH, --overlap MINLENGTH
                        Require MINLENGTH overlap between read and adapter for an
                        adapter to be found. Default: 3
  --match-read-wildcards
                        Interpret IUPAC wildcards in reads. Default: False
  -N, --no-match-adapter-wildcards
                        Do not interpret IUPAC wildcards in adapters.
  --action {trim,retain,mask,lowercase,none}
                        What to do if a match was found. trim: trim adapter and up-
                        or downstream sequence; retain: trim, but retain adapter;
                        mask: replace with 'N' characters; lowercase: convert to
                        lowercase; none: leave unchanged. Default: trim
  --rc, --revcomp       Check both the read and its reverse complement for adapter
                        matches. If match is on reverse-complemented version,
                        output that one. Default: check only read

Additional read modifications:
  -u LENGTH, --cut LENGTH
                        Remove bases from each read (first read only if paired). If
                        LENGTH is positive, remove bases from the beginning. If
                        LENGTH is negative, remove bases from the end. Can be used
                        twice if LENGTHs have different signs. This is applied
                        *before* adapter trimming.
  --nextseq-trim 3'CUTOFF
                        NextSeq-specific quality trimming (each read). Trims also
                        dark cycles appearing as high-quality G bases.
  -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff [5'CUTOFF,]3'CUTOFF
                        Trim low-quality bases from 5' and/or 3' ends of each read
                        before adapter removal. Applied to both reads if data is
                        paired. If one value is given, only the 3' end is trimmed.
                        If two comma-separated cutoffs are given, the 5' end is
                        trimmed with the first cutoff, the 3' end with the second.
  --quality-base N      Assume that quality values in FASTQ are encoded as
                        ascii(quality + N). This needs to be set to 64 for some old
                        Illumina FASTQ files. Default: 33
  --length LENGTH, -l LENGTH
                        Shorten reads to LENGTH. Positive values remove bases at
                        the end while negative ones remove bases at the beginning.
                        This and the following modifications are applied after
                        adapter trimming.
  --trim-n              Trim N's on ends of reads.
  --length-tag TAG      Search for TAG followed by a decimal number in the
                        description field of the read. Replace the decimal number
                        with the correct length of the trimmed read. For example,
                        use --length-tag 'length=' to correct fields like
                        'length=123'.
  --strip-suffix STRIP_SUFFIX
                        Remove this suffix from read names if present. Can be given
                        multiple times.
  -x PREFIX, --prefix PREFIX
                        Add this prefix to read names. Use {name} to insert the
                        name of the matching adapter.
  -y SUFFIX, --suffix SUFFIX
                        Add this suffix to read names; can also include {name}
  --rename TEMPLATE     Rename reads using TEMPLATE containing variables such as
                        {id}, {adapter_name} etc. (see documentation)
  --zero-cap, -z        Change negative quality values to zero.

Filtering of processed reads:
  Filters are applied after above read modifications. Paired-end reads are always
  discarded pairwise (see also --pair-filter).

  -m LEN[:LEN2], --minimum-length LEN[:LEN2]
                        Discard reads shorter than LEN. Default: 0
  -M LEN[:LEN2], --maximum-length LEN[:LEN2]
                        Discard reads longer than LEN. Default: no limit
  --max-n COUNT         Discard reads with more than COUNT 'N' bases. If COUNT is a
                        number between 0 and 1, it is interpreted as a fraction of
                        the read length.
  --max-expected-errors ERRORS, --max-ee ERRORS
                        Discard reads whose expected number of errors (computed
                        from quality values) exceeds ERRORS.
  --discard-trimmed, --discard
                        Discard reads that contain an adapter. Use also -O to avoid
                        discarding too many randomly matching reads.
  --discard-untrimmed, --trimmed-only
                        Discard reads that do not contain an adapter.
  --discard-casava      Discard reads that did not pass CASAVA filtering (header
                        has :Y:).

Output:
  --quiet               Print only error messages.
  --report {full,minimal}
                        Which type of report to print: 'full' or 'minimal'.
                        Default: full
  -o FILE, --output FILE
                        Write trimmed reads to FILE. FASTQ or FASTA format is
                        chosen depending on input. Summary report is sent to
                        standard output. Use '{name}' for demultiplexing (see
                        docs). Default: write to standard output
  --fasta               Output FASTA to standard output even on FASTQ input.
  -Z                    Use compression level 1 for gzipped output files (faster,
                        but uses more space)
  --info-file FILE      Write information about each read and its adapter matches
                        into FILE. See the documentation for the file format.
  -r FILE, --rest-file FILE
                        When the adapter matches in the middle of a read, write the
                        rest (after the adapter) to FILE.
  --wildcard-file FILE  When the adapter has N wildcard bases, write adapter bases
                        matching wildcard positions to FILE. (Inaccurate with
                        indels.)
  --too-short-output FILE
                        Write reads that are too short (according to length
                        specified by -m) to FILE. Default: discard reads
  --too-long-output FILE
                        Write reads that are too long (according to length
                        specified by -M) to FILE. Default: discard reads
  --untrimmed-output FILE
                        Write reads that do not contain any adapter to FILE.
                        Default: output to same file as trimmed reads

Paired-end options:
  The -A/-G/-B/-U options work like their -a/-b/-g/-u counterparts, but are
  applied to the second read in each pair.

  -A ADAPTER            3' adapter to be removed from second read in a pair.
  -G ADAPTER            5' adapter to be removed from second read in a pair.
  -B ADAPTER            5'/3 adapter to be removed from second read in a pair.
  -U LENGTH             Remove LENGTH bases from second read in a pair.
  -p FILE, --paired-output FILE
                        Write second read in a pair to FILE.
  --pair-adapters       Treat adapters given with -a/-A etc. as pairs. Either both
                        or none are removed from each read pair.
  --pair-filter (any|both|first)
                        Which of the reads in a paired-end read have to match the
                        filtering criterion in order for the pair to be filtered.
                        Default: any
  --interleaved         Read and/or write interleaved paired-end reads.
  --untrimmed-paired-output FILE
                        Write second read in a pair to this FILE when no adapter
                        was found. Use with --untrimmed-output. Default: output to
                        same file as trimmed reads
  --too-short-paired-output FILE
                        Write second read in a pair to this file if pair is too
                        short.
  --too-long-paired-output FILE
                        Write second read in a pair to this file if pair is too
                        long.
(rnaseq) root 12:06:01 ~

Trim Galore is just a simple Perl wrapper around FastQC and Cutadapt, yet it is extremely useful.

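Why? Because Cutadapt's options are genuinely complicated: there are five adapter types alone, plus all sorts of other parameters. Come on, all I want is to strip the adapters and trim off low-quality bases. Can't you just do that automatically? Don't give someone with choice paralysis this many choices!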
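For comparison, here is roughly what "strip the adapter and trim the low-quality ends" looks like if you drive Cutadapt by hand for one paired-end sample, using only options listed in the help output above. This is a sketch under assumptions: the library uses the standard Illumina adapter prefix `AGATCGGAAGAGC` (also Trim Galore's default Illumina adapter), qualities are Phred+33 (Cutadapt's default), and the file names `<sample>_1.fq.gz` / `<sample>_2.fq.gz` as well as the helper names `run` and `cutadapt_pe` are hypothetical.

```shell
#!/usr/bin/env bash
# Manual cutadapt run for one paired-end sample (hypothetical file names).
ADAPTER=AGATCGGAAGAGC   # assumed: standard Illumina adapter prefix

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# -a/-A : 3' adapter to remove from read 1 / read 2
# -q 20 : trim 3' bases below quality 20 (applied to both reads)
# -m 20 : discard the pair if either read ends up shorter than 20 bp
#         (default --pair-filter any)
# -j 4  : use 4 CPU cores
cutadapt_pe() {
  local s=$1
  run cutadapt -a "$ADAPTER" -A "$ADAPTER" -q 20 -m 20 -j 4 \
    -o "${s}_1.trimmed.fq.gz" -p "${s}_2.trimmed.fq.gz" \
    "${s}_1.fq.gz" "${s}_2.fq.gz"
}
```

`DRY_RUN=1 cutadapt_pe S1` only prints the command; drop `DRY_RUN=1` to actually run it. The 20/20 cutoffs mirror Trim Galore's defaults (quality cutoff 20, minimum length 20), and keeping track of choices like these is exactly the bookkeeping Trim Galore saves you.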

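Want automation? Trim Galore fits the bill perfectly. You don't have to look up the adapter yourself, because it auto-detects the standard Illumina, Nextera and small RNA adapters from the start of the file. Quality trimming is fully automatic too. Oh yeah.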
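How does the auto-detection work? According to the Trim Galore documentation, it scans the first 1 million reads of the first input file for the first 12 to 13 bp of three standard adapters (Illumina `AGATCGGAAGAGC`, Nextera `CTGTCTCTTATA`, small RNA `TGGAATTCTCGG`) and uses the one that shows up most often. The function below is a simplified re-implementation of that idea for illustration only. It is not Trim Galore's actual Perl code, `detect_adapter` is a hypothetical name, and the tie handling is my own simplification.

```shell
#!/usr/bin/env bash
# Simplified illustration of adapter auto-detection (NOT Trim Galore's code):
# count how many of the first N reads contain each adapter prefix and
# report the most frequent one.
detect_adapter() {
  local fq=$1 n=${2:-1000000}
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf "$fq" | head -n $((n * 4)) | awk '
    NR % 4 == 2 {                       # line 2 of each FASTQ record = sequence
      if (index($0, "AGATCGGAAGAGC")) illumina++
      if (index($0, "CTGTCTCTTATA"))  nextera++
      if (index($0, "TGGAATTCTCGG"))  smallrna++
    }
    END {
      # Ties (including no hits at all) fall back to illumina in this sketch
      best = "illumina"; max = illumina + 0
      if (nextera + 0 > max)  { best = "nextera";   max = nextera + 0 }
      if (smallrna + 0 > max) { best = "small_rna"; max = smallrna + 0 }
      printf "illumina=%d nextera=%d small_rna=%d -> %s\n", illumina, nextera, smallrna, best
    }'
}
```

For example, `detect_adapter sample_1.fq.gz` prints the three counts and the winner. In practice you would simply let Trim Galore do this, or force a choice with its `--illumina`, `--nextera` or `--small_rna` flags.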

It also plugs straight into MultiQC to generate an HTML report.
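Once Trim Galore has finished, MultiQC can scan its output directory and merge the trimming reports (plus the FastQC results, if you ran Trim Galore with `--fastqc`) into a single HTML page. A minimal sketch, assuming the trimmed output sits in a directory named `clean/` and MultiQC is installed in the same environment; both the directory names and the helper name `make_report` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: aggregate Trim Galore/Cutadapt (and FastQC) reports
# found under in_dir into one MultiQC HTML report written to out_dir.
make_report() {
  local in_dir=${1:-clean} out_dir=${2:-multiqc_out}
  local -a cmd=(multiqc "$in_dir" -o "$out_dir")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"      # only show what would run
  else
    "${cmd[@]}"
  fi
}
```

By default MultiQC names the page `multiqc_report.html` and writes it into the `-o` directory; open it in a browser.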

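Usage is simple and direct. Apart from the input files, you don't need to pick any parameters; the defaults are fine. Pretty automated, right?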
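A typical paired-end run, sketched as a small batch loop. Assumptions: raw reads are in `raw/` named `<sample>_1.fq.gz` / `<sample>_2.fq.gz` (a hypothetical layout), output goes to `clean/`, and `trim_all` is a hypothetical helper name. The Trim Galore options used (`--paired`, `--fastqc`, `-o`) are real; everything else stays at the defaults (auto-detected adapter, quality cutoff 20, minimum length 20).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Trim Galore on every paired-end sample in a directory.
# Assumed layout: <raw_dir>/<sample>_1.fq.gz and <raw_dir>/<sample>_2.fq.gz
trim_all() {
  local raw_dir=${1:-raw} out_dir=${2:-clean} r1 r2
  local -a cmd
  for r1 in "$raw_dir"/*_1.fq.gz; do
    [ -e "$r1" ] || continue            # glob matched nothing
    r2=${r1%_1.fq.gz}_2.fq.gz           # mate file of the same sample
    [ -e "$r2" ] || { echo "skip: no mate for $r1" >&2; continue; }
    # Only the essentials; adapter, quality cutoff and min length stay default
    cmd=(trim_galore --paired --fastqc -o "$out_dir" "$r1" "$r2")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"                  # only show what would run
    else
      "${cmd[@]}"
    fi
  done
}
```

`DRY_RUN=1 trim_all raw clean` just prints the commands. For each sample, Trim Galore writes `<sample>_1_val_1.fq.gz` and `<sample>_2_val_2.fq.gz`, plus a `*_trimming_report.txt` per input file, which is what MultiQC picks up.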

See the documentation for details.
