1. Installing and configuring MultiQC
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ curl -LOk https://github.com/ewels/MultiQC/archive/master.zip
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ l
master.zip rawdata/ rnaseq_workshop_slides.pdf trinityrnaseq-Trinity-v2.8.3/ tuxedo_nprot.2012.016.pdf RNASeq_Trinity_Tuxedo_Workshop/ TrinityNatureProtocol.nprot.2013.084.pdf Trinity-v2.8.3.tar
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ unzip master.zip
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ cd MultiQC-master/
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/MultiQC-master$ sudo python setup.py install
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/MultiQC-master$ multiqc --help
Usage: multiqc [OPTIONS]
MultiQC aggregates results from bioinformatics analyses across many
samples into a single report.
It searches a given directory for analysis logs and compiles a HTML
report. It's a general use tool, perfect for summarising the output from
numerous bioinformatics tools.
To run, supply with one or more directory to scan for analysis results. To
run here, use 'multiqc .'
See http://multiqc.info for more details.
Author: Phil Ewels (http://phil.ewels.co.uk)
-f, --force Overwrite any existing reports
-d, --dirs Prepend directory to sample names
-dd, --dirs-depth INTEGER Prepend [INT] directories to sample names.
Negative number to take from start of path.
-s, --fullnames Do not clean the sample names (leave as full
file name)
-i, --title TEXT Report title. Printed as page header, used
for filename if not otherwise specified.
-b, --comment TEXT Custom comment, will be printed at the top
of the report.
-n, --filename TEXT Report filename. Use 'stdout' to print to
standard out.
-o, --outdir TEXT Create report in the specified output
directory.
-t, --template [default|default_dev|geo|simple|sections]
Report template to use.
--tag TEXT Use only modules which tagged with this
keyword, eg. RNA
--view-tags, --view_tags View the available tags and which modules
they load
-x, --ignore TEXT Ignore analysis files (glob expression)
--ignore-samples TEXT Ignore sample names (glob expression)
--ignore-symlinks Ignore symlinked directories and files
--sample-names PATH File containing alternative sample names
-l, --file-list Supply a file containing a list of file
paths to be searched, one per row
-e, --exclude [module name] Do not use this module. Can specify multiple
times.
-m, --module [module name] Use only this module. Can specify multiple
times.
--data-dir Force the parsed data directory to be
created.
--no-data-dir Prevent the parsed data directory from being
created.
-k, --data-format [tsv|yaml|json]
Output parsed data in a different format.
Default: tsv
-z, --zip-data-dir Compress the data directory.
-p, --export Export plots as static images in addition to
the report
-fp, --flat Use only flat plots (static images)
-ip, --interactive Use only interactive plots (HighCharts
Javascript)
--lint Use strict linting (validation) to help code
development
--pdf Creates PDF report with 'simple' template.
Requires Pandoc to be installed.
--no-megaqc-upload Don't upload generated report to MegaQC,
even if MegaQC options are found
-c, --config PATH Specific config file to load, after those in
MultiQC dir / home dir / working dir.
--cl-config, --cl_config TEXT Specify MultiQC config YAML on the command
line
-v, --verbose Increase output verbosity.
-q, --quiet Only show log warnings
--version Show the version and exit.
-h, --help Show this message and exit.
2. Running FastQC on the server
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ pwd
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ l
B251_1.fq.gz B251_2.fq.gz B252_1.fq.gz B252_2.fq.gz R251_1.fq.gz R251_2.fq.gz R252_1.fq.gz R252_2.fq.gz W251_1.fq.gz W251_2.fq.gz W252_1.fq.gz W252_2.fq.gz rd_md5.txt
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ fastqc *.gz
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US:en",
LC_ALL = (unset),
LC_PAPER = "zh_CN.UTF-8",
LC_TIME = "zh_CN.UTF-8",
LC_NAME = "zh_CN.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Started analysis of B251_1.fq.gz
Approx 5% complete for B251_1.fq.gz
Approx 10% complete for B251_1.fq.gz
Approx 15% complete for B251_1.fq.gz
Approx 30% complete for W252_2.fq.gz
Approx 35% complete for W252_2.fq.gz
Approx 40% complete for W252_2.fq.gz
Approx 45% complete for W252_2.fq.gz
Approx 50% complete for W252_2.fq.gz
Approx 55% complete for W252_2.fq.gz
Approx 60% complete for W252_2.fq.gz
Approx 65% complete for W252_2.fq.gz
Approx 70% complete for W252_2.fq.gz
Approx 75% complete for W252_2.fq.gz
Approx 80% complete for W252_2.fq.gz
Approx 85% complete for W252_2.fq.gz
Approx 90% complete for W252_2.fq.gz
Approx 95% complete for W252_2.fq.gz
Analysis complete for W252_2.fq.gz
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ l
B251_1.fq.gz B251_2_fastqc.html B252_1_fastqc.zip R251_1.fq.gz R251_2_fastqc.html R252_1_fastqc.zip W251_1.fq.gz W251_2_fastqc.html W252_1_fastqc.zip rd_md5.txt
B251_1_fastqc.html B251_2_fastqc.zip B252_2.fq.gz R251_1_fastqc.html R251_2_fastqc.zip R252_2.fq.gz W251_1_fastqc.html W251_2_fastqc.zip W252_2.fq.gz
B251_1_fastqc.zip B252_1.fq.gz B252_2_fastqc.html R251_1_fastqc.zip R252_1.fq.gz R252_2_fastqc.html W251_1_fastqc.zip W252_1.fq.gz W252_2_fastqc.html
B251_2.fq.gz B252_1_fastqc.html B252_2_fastqc.zip R251_2.fq.gz R252_1_fastqc.html R252_2_fastqc.zip W251_2.fq.gz W252_1_fastqc.html W252_2_fastqc.zip
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ mv *.html Fastqcresult/
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ mv *.zip Fastqcresult/
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ l
B251_1.fq.gz B251_2.fq.gz B252_1.fq.gz B252_2.fq.gz Fastqcresult/ R251_1.fq.gz R251_2.fq.gz R252_1.fq.gz R252_2.fq.gz W251_1.fq.gz W251_2.fq.gz W252_1.fq.gz W252_2.fq.gz rd_md5.txt
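The run above writes the FastQC reports next to the input files and moves them afterwards. A hedged alternative is sketched below. It assumes the standard FastQC options -o (output directory) and -t (threads), reuses the directory name Fastqcresult from the session, and sets LC_ALL=C for the call, which should avoid the Perl locale warning shown above. The thread count of 4 is an arbitrary illustration.

```shell
#!/bin/sh
# Sketch: write FastQC reports straight into a results directory.
outdir="Fastqcresult"      # same directory name as in the session above
threads=4                  # illustrative value, adjust to the machine

mkdir -p "$outdir"

# Collect the gzipped FASTQ files in the current directory.
set -- ./*.fq.gz
if [ ! -e "$1" ]; then
    echo "no .fq.gz files found in $(pwd)" >&2
elif ! command -v fastqc >/dev/null 2>&1; then
    echo "fastqc not found on PATH" >&2
else
    # LC_ALL=C sidesteps the 'Setting locale failed' warning from perl.
    LC_ALL=C fastqc -t "$threads" -o "$outdir" "$@"
fi
```

With this layout the two mv commands are no longer needed, and the whole Fastqcresult directory can be copied with scp as before.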
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ scp yeyt@220:/home/yeyt/biodata/NH160034/NH160034/rawdata/Fastqcresult/* .
B251_1_fastqc.html 100% 299KB 299.1KB/s 00:01
B251_1_fastqc.zip 100% 375KB 187.7KB/s 00:02
B251_2_fastqc.html 100% 308KB 307.8KB/s 00:01
B251_2_fastqc.zip 100% 389KB 389.1KB/s 00:01
B252_1_fastqc.html 100% 308KB 308.0KB/s 00:01
B252_1_fastqc.zip 100% 389KB 389.4KB/s 00:01
B252_2_fastqc.html 100% 310KB 309.6KB/s 00:00
B252_2_fastqc.zip 100% 392KB 392.5KB/s 00:00
R251_1_fastqc.html 100% 300KB 300.0KB/s 00:01
R251_1_fastqc.zip 100% 377KB 377.3KB/s 00:00
R251_2_fastqc.html 100% 304KB 304.4KB/s 00:00
R251_2_fastqc.zip 100% 384KB 384.1KB/s 00:00
R252_1_fastqc.html 100% 302KB 302.5KB/s 00:00
R252_1_fastqc.zip 100% 381KB 381.2KB/s 00:00
R252_2_fastqc.html 100% 306KB 305.9KB/s 00:00
R252_2_fastqc.zip 100% 387KB 386.7KB/s 00:01
W251_1_fastqc.html 100% 300KB 300.3KB/s 00:00
W251_1_fastqc.zip 100% 378KB 377.6KB/s 00:01
W251_2_fastqc.html 100% 313KB 313.3KB/s 00:00
W251_2_fastqc.zip 100% 398KB 397.9KB/s 00:00
W252_1_fastqc.html 100% 306KB 306.4KB/s 00:00
W252_1_fastqc.zip 100% 387KB 387.4KB/s 00:00
W252_2_fastqc.html 100% 310KB 310.0KB/s 00:00
W252_2_fastqc.zip 100% 392KB 391.7KB/s 00:01
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ l
B251_1_fastqc.html B252_1_fastqc.zip R251_1_fastqc.html R252_1_fastqc.zip rnaseq_workshop_slides.pdf W251_1_fastqc.html W252_1_fastqc.zip
B251_1_fastqc.zip B252_2_fastqc.html R251_1_fastqc.zip R252_2_fastqc.html TrinityNatureProtocol.nprot.2013.084.pdf W251_1_fastqc.zip W252_2_fastqc.html
B251_2_fastqc.html B252_2_fastqc.zip R251_2_fastqc.html R252_2_fastqc.zip trinityrnaseq-Trinity-v2.8.3/ W251_2_fastqc.html W252_2_fastqc.zip
B251_2_fastqc.zip master.zip R251_2_fastqc.zip rawdata/ Trinity-v2.8.3.tar W251_2_fastqc.zip
B252_1_fastqc.html MultiQC-master/ R252_1_fastqc.html RNASeq_Trinity_Tuxedo_Workshop/ tuxedo_nprot.2012.016.pdf W252_1_fastqc.html
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ mkdir Fastqcresult
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ mv *zip Fastqcresult/
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ mv *html Fastqcresult/
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ l
Fastqcresult/ rawdata/ rnaseq_workshop_slides.pdf trinityrnaseq-Trinity-v2.8.3/ tuxedo_nprot.2012.016.pdf
MultiQC-master/ RNASeq_Trinity_Tuxedo_Workshop/ TrinityNatureProtocol.nprot.2013.084.pdf Trinity-v2.8.3.tar
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest$ cd Fastqcresult/
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult$ l
B251_1_fastqc.html B251_2_fastqc.zip B252_2_fastqc.html R251_1_fastqc.html R251_2_fastqc.zip R252_2_fastqc.html W251_1_fastqc.zip W252_1_fastqc.html W252_2_fastqc.zip
B251_1_fastqc.zip B252_1_fastqc.html B252_2_fastqc.zip R251_1_fastqc.zip R252_1_fastqc.html R252_2_fastqc.zip W251_2_fastqc.html W252_1_fastqc.zip
B251_2_fastqc.html B252_1_fastqc.zip master.zip R251_2_fastqc.html R252_1_fastqc.zip W251_1_fastqc.html W251_2_fastqc.zip W252_2_fastqc.html
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult$ multiqc ./
[INFO ] multiqc : This is MultiQC v1.7.dev0
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching './'
Searching 25 files.. [####################################] 100%
[INFO ] fastqc : Found 12 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult$ l
B251_1_fastqc.html B251_2_fastqc.zip B252_2_fastqc.html multiqc_data/ R251_1_fastqc.zip R252_1_fastqc.html R252_2_fastqc.zip W251_2_fastqc.html W252_1_fastqc.zip
B251_1_fastqc.zip B252_1_fastqc.html B252_2_fastqc.zip multiqc_report.html R251_2_fastqc.html R252_1_fastqc.zip W251_1_fastqc.html W251_2_fastqc.zip W252_2_fastqc.html
B251_2_fastqc.html B252_1_fastqc.zip master.zip R251_1_fastqc.html R251_2_fastqc.zip R252_2_fastqc.html W251_1_fastqc.zip W252_1_fastqc.html W252_2_fastqc.zip
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult$ cd multiqc_data/
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult/multiqc_data$ l
multiqc_data.json multiqc_fastqc.txt multiqc_general_stats.txt multiqc.log multiqc_sources.txt
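From the results above, we can see that the read length is 150 bp and that each run in rawdata contains about 25 M reads. For paired-end sequencing, the data volume per sample is therefore: 150 bp × 25 M × 2 ends = 7.5 Gb (7.5 × 10^9 bases)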
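The estimate is plain integer arithmetic. A minimal shell sketch, assuming the approximate figures quoted here (150 bp reads, about 25 M reads per run, two ends); the function name estimate_bases is only illustrative:

```shell
#!/bin/sh
# estimate_bases READ_LEN READS_PER_RUN ENDS: total number of sequenced bases.
estimate_bases() {
    echo $(( $1 * $2 * $3 ))
}

read_len=150           # read length in bp
reads_per_run=25000000 # roughly 25 M reads per run (approximate)
ends=2                 # paired-end: two runs per sample

total=$(estimate_bases "$read_len" "$reads_per_run" "$ends")
echo "total bases: $total"

# Express the total in units of 10^9 bases with one decimal place.
printf 'approx. %d.%d Gb\n' $(( total / 1000000000 )) $(( total % 1000000000 / 100000000 ))
```

The exact read counts differ slightly between samples, so the figure is an order-of-magnitude estimate rather than a measured value.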
3. Running Trimmomatic on the server to clean the reads
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult$ wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip
--2018-09-13 00:55:35-- http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip
Connecting to connected.
Proxy request sent, awaiting response... 200 OK
Length: 132647 (130K) [application/zip]
Saving to: ‘Trimmomatic-0.38.zip’
Trimmomatic-0.38.zip 100%[================================================================================================================>] 129.54K 114KB/s in 1.1s
2018-09-13 00:55:38 (114 KB/s) - ‘Trimmomatic-0.38.zip’ saved [132647/132647]
yeyuntian@yeyuntian-rescuer-r720-15ikbn:~/trinitytest/Fastqcresult$ scp Trimmomatic-0.38.zip yeyt@220:/home/yeyt/biosoft/
yeyt@ubuntu:~/biosoft$ l
MultiQC-master/ Trimmomatic-0.38.zip bin/ iqtree-1.6.7-Linux/ lib/ master.zip
yeyt@ubuntu:~/biosoft$ unzip Trimmomatic-0.38.zip
Archive: Trimmomatic-0.38.zip
creating: Trimmomatic-0.38/
inflating: Trimmomatic-0.38/LICENSE
inflating: Trimmomatic-0.38/trimmomatic-0.38.jar
creating: Trimmomatic-0.38/adapters/
inflating: Trimmomatic-0.38/adapters/NexteraPE-PE.fa
inflating: Trimmomatic-0.38/adapters/TruSeq2-PE.fa
inflating: Trimmomatic-0.38/adapters/TruSeq2-SE.fa
inflating: Trimmomatic-0.38/adapters/TruSeq3-PE-2.fa
inflating: Trimmomatic-0.38/adapters/TruSeq3-PE.fa
inflating: Trimmomatic-0.38/adapters/TruSeq3-SE.fa
yeyt@ubuntu:~/biosoft$ l
MultiQC-master/ Trimmomatic-0.38/ Trimmomatic-0.38.zip bin/ iqtree-1.6.7-Linux/ lib/ master.zip
yeyt@ubuntu:~/biosoft$ cd Trimmomatic-0.38/
yeyt@ubuntu:~/biosoft/Trimmomatic-0.38$ l
LICENSE adapters/ trimmomatic-0.38.jar
yeyt@ubuntu:~/biosoft/Trimmomatic-0.38$ pwd
yeyt@ubuntu:~/biosoft/Trimmomatic-0.38$ java -jar trimmomatic-0.38.jar
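PE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] [-validatePairs] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
SE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] <inputFile> <outputFile> <trimmer1>...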
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ cat trimmomaitc.sh
nohup java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 3 B251_1.fq.gz B251_2.fq.gz B251_1.P.fq.gz B251_1.UP.fq.gz B251_2.P.fq.gz B251_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33 &
nohup java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 3 B252_1.fq.gz B252_2.fq.gz B252_1.P.fq.gz B252_1.UP.fq.gz B252_2.P.fq.gz B252_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33 &
nohup java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 3 R251_1.fq.gz R251_2.fq.gz R251_1.P.fq.gz R251_1.UP.fq.gz R251_2.P.fq.gz R251_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33 &
nohup java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 3 R252_1.fq.gz R252_2.fq.gz R252_1.P.fq.gz R252_1.UP.fq.gz R252_2.P.fq.gz R252_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33 &
nohup java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 3 W251_1.fq.gz W251_2.fq.gz W251_1.P.fq.gz W251_1.UP.fq.gz W251_2.P.fq.gz W251_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33 &
nohup java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 3 W252_1.fq.gz W252_2.fq.gz W252_1.P.fq.gz W252_1.UP.fq.gz W252_2.P.fq.gz W252_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33 &
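# java -jar ~/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar launches the Trimmomatic jar
# PE -threads 3 selects paired-end mode and runs each job with 3 threads
# B251_1.fq.gz B251_2.fq.gz are the forward and reverse read files of one paired-end sample
# B251_1.P.fq.gz B251_1.UP.fq.gz B251_2.P.fq.gz B251_2.UP.fq.gz are the four output files (P: reads whose mate also survived; UP: reads whose mate was dropped)
# HEADCROP:18 MINLEN:50 TOPHRED33 are the trimming steps: remove the first 18 bases from the 5' end of each read, then discard reads shorter than 50 bp, and convert the quality scores to phred33 encoding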
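The six near-identical commands in the script could also be generated with a loop. The sketch below assumes the same sample names, jar path, and trimming steps as trimmomaitc.sh. The helper trim_cmd, the DRY_RUN switch, and the per-sample log file names are hypothetical additions, not part of Trimmomatic.

```shell
#!/bin/sh
# Sketch: build one Trimmomatic PE command per sample instead of six copies.
JAR="${JAR:-$HOME/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar}"
SAMPLES="B251 B252 R251 R252 W251 W252"

# trim_cmd SAMPLE: print the command line for one sample (hypothetical helper).
trim_cmd() {
    echo "java -jar $JAR PE -threads 3" \
         "${1}_1.fq.gz ${1}_2.fq.gz" \
         "${1}_1.P.fq.gz ${1}_1.UP.fq.gz ${1}_2.P.fq.gz ${1}_2.UP.fq.gz" \
         "HEADCROP:18 MINLEN:50 TOPHRED33"
}

for s in $SAMPLES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be checked first.
        trim_cmd "$s"
    else
        # Relies on word splitting, so it assumes paths without spaces.
        # Each sample gets its own log instead of a shared nohup.out.
        nohup $(trim_cmd "$s") > "${s}.trim.log" 2>&1 &
    fi
done
wait
```

Running it with DRY_RUN=0 starts the six jobs in the background. Separate logs make it possible to tell which summary line belongs to which sample, which a single nohup.out shared by parallel jobs does not.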
yeyt@ubuntu:~$ bash trimmomaitc.sh
yeyt@ubuntu:~/biodata/NH160034/NH160034/rawdata$ cat nohup.out
TrimmomaticPE: Started with arguments:
-threads 3 B252_1.fq.gz B252_2.fq.gz B252_1.P.fq.gz B252_1.UP.fq.gz B252_2.P.fq.gz B252_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33
TrimmomaticPE: Started with arguments:
-threads 3 W251_1.fq.gz W251_2.fq.gz W251_1.P.fq.gz W251_1.UP.fq.gz W251_2.P.fq.gz W251_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33
TrimmomaticPE: Started with arguments:
-threads 3 R251_1.fq.gz R251_2.fq.gz R251_1.P.fq.gz R251_1.UP.fq.gz R251_2.P.fq.gz R251_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33
TrimmomaticPE: Started with arguments:
-threads 3 W252_1.fq.gz W252_2.fq.gz W252_1.P.fq.gz W252_1.UP.fq.gz W252_2.P.fq.gz W252_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33
TrimmomaticPE: Started with arguments:
-threads 3 R252_1.fq.gz R252_2.fq.gz R252_1.P.fq.gz R252_1.UP.fq.gz R252_2.P.fq.gz R252_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33
TrimmomaticPE: Started with arguments:
-threads 3 B251_1.fq.gz B251_2.fq.gz B251_1.P.fq.gz B251_1.UP.fq.gz B251_2.P.fq.gz B251_2.UP.fq.gz HEADCROP:18 MINLEN:50 TOPHRED33
Quality encoding detected as phred33
Quality encoding detected as phred33
Quality encoding detected as phred33
Quality encoding detected as phred33
Quality encoding detected as phred33
Quality encoding detected as phred33
Input Read Pairs: 23929511 Both Surviving: 23929511 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
Input Read Pairs: 24577100 Both Surviving: 24577100 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
Input Read Pairs: 24423445 Both Surviving: 24423445 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
Input Read Pairs: 24498964 Both Surviving: 24498964 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
Input Read Pairs: 25553075 Both Surviving: 25553075 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
Input Read Pairs: 28213701 Both Surviving: 28213701 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
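The summary lines in nohup.out can be condensed into a small table. A minimal sketch with awk follows, assuming the "Input Read Pairs: ..." line format shown above; the function name summarize_trim is illustrative.

```shell
#!/bin/sh
# summarize_trim FILE: one line per Trimmomatic PE run found in FILE,
# giving input pairs, pairs with both reads surviving, and the survival rate.
summarize_trim() {
    awk '/^Input Read Pairs:/ {
        # $4 = number of input pairs, $7 = pairs where both reads survived
        printf "%d\t%d\t%.2f%%\n", $4, $7, 100 * $7 / $4
    }' "$1"
}

# Run on the log from the session if it is present.
if [ -f nohup.out ]; then
    summarize_trim nohup.out
fi
```

The 100% survival in every run is what these settings should produce if all raw reads are 150 bp: HEADCROP:18 leaves 132 bp per read, which is well above MINLEN:50, and no quality or adapter trimming step was requested.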