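Staphylococcus aureus RNA-seq data analysis

Note: the data are already available as fastq.gz files: one control (WT) and one mutant strain (D), with three replicates per treatment.

1 Quality control with FastQC and interpreting the results

# Run quality control on all the data; each input file yields a zip archive and an HTML report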
fastqc -o .  *.fastq.gz
Started analysis of D1_R1.fastq.gz
Approx 5% complete for D1_R1.fastq.gz
Approx 10% complete for D1_R1.fastq.gz
Approx 15% complete for D1_R1.fastq.gz
Approx 20% complete for D1_R1.fastq.gz
Approx 25% complete for D1_R1.fastq.gz
Approx 30% complete for D1_R1.fastq.gz
Approx 35% complete for D1_R1.fastq.gz
Approx 40% complete for D1_R1.fastq.gz
Approx 45% complete for D1_R1.fastq.gz
Approx 50% complete for D1_R1.fastq.gz
Approx 55% complete for D1_R1.fastq.gz
Approx 60% complete for D1_R1.fastq.gz
Approx 65% complete for D1_R1.fastq.gz
Approx 70% complete for D1_R1.fastq.gz
Approx 75% complete for D1_R1.fastq.gz
Approx 80% complete for D1_R1.fastq.gz
Approx 85% complete for D1_R1.fastq.gz
Approx 90% complete for D1_R1.fastq.gz
Approx 95% complete for D1_R1.fastq.gz
Analysis complete for D1_R1.fastq.gz

Result

[Figure: FastQC result for D1_R1]
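
Besides the HTML report, each FastQC zip archive contains a summary.txt with one PASS/WARN/FAIL line per module, which is quicker to scan than opening every report. A minimal sketch, assuming the default output naming (D1_R1_fastqc.zip for D1_R1.fastq.gz); the helper name fastqc_flagged is made up for illustration:

```shell
# List the FastQC modules that did not PASS.
# Input on stdin: the text of a FastQC summary.txt
# (tab-separated columns: status, module name, file name).
fastqc_flagged() {
    awk -F'\t' '$1 != "PASS" { print $1 "\t" $2 }'
}

# Hypothetical usage, assuming fastqc wrote D1_R1_fastqc.zip to the current directory:
# unzip -p D1_R1_fastqc.zip D1_R1_fastqc/summary.txt | fastqc_flagged
```

WARN and FAIL are only hints: some modules are commonly flagged for RNA-seq libraries, so it is still worth looking at the HTML plots.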

2 Download the S. aureus genome and genome annotation files

Download link for the S. aureus genome

3 Build the HISAT2 index and align the reads

3.1 Build the index

hisat2-build GCF_000013425.1_ASM1342v1_genomic.fna sa_index

3.2 Align the reads

 hisat2 -t -x sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R1.fastq.gz -2  /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R2.fastq.gz -S D1_R1.sam
 hisat2 -t -x sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/WT1_R1.fastq.gz -2  /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/WT1_R2.fastq.gz -S WT1_R1.sam
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:09:58
14200996 reads; of these:
  14200996 (100.00%) were paired; of these:
    14180176 (99.85%) aligned concordantly 0 times
    13590 (0.10%) aligned concordantly exactly 1 time
    7230 (0.05%) aligned concordantly >1 times
    ----
    14180176 pairs aligned concordantly 0 times; of these:
      135 (0.00%) aligned discordantly 1 time
    ----
    14180041 pairs aligned 0 times concordantly or discordantly; of these:
      28360082 mates make up the pairs; of these:
        28212046 (99.48%) aligned 0 times
        146051 (0.51%) aligned exactly 1 time
        1985 (0.01%) aligned >1 times
0.67% overall alignment rate
Time searching: 00:09:58
Overall time: 00:09:58
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:15:25
15979802 reads; of these:
  15979802 (100.00%) were paired; of these:
    3149851 (19.71%) aligned concordantly 0 times
    12801703 (80.11%) aligned concordantly exactly 1 time
    28248 (0.18%) aligned concordantly >1 times
    ----
    3149851 pairs aligned concordantly 0 times; of these:
      87906 (2.79%) aligned discordantly 1 time
    ----
    3061945 pairs aligned 0 times concordantly or discordantly; of these:
      6123890 mates make up the pairs; of these:
        4423631 (72.24%) aligned 0 times
        1695827 (27.69%) aligned exactly 1 time
        4432 (0.07%) aligned >1 times
86.16% overall alignment rate
Time searching: 00:15:34
Overall time: 00:15:34
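
The last summary line, the overall alignment rate, is the quickest number to compare between samples. A small sketch for pulling it out of a saved summary and flagging low values; check_alignment_rate is a hypothetical helper, and the 50% default cutoff is an arbitrary assumption, not a HISAT2 convention:

```shell
# Read a HISAT2 alignment summary on stdin and print the overall
# alignment rate followed by LOW or OK.
# $1: minimum acceptable rate in percent (assumed default: 50).
check_alignment_rate() {
    awk -v min="${1:-50}" '
        /overall alignment rate/ {
            rate = $1
            sub(/%/, "", rate)                       # "0.67%" -> "0.67"
            status = (rate + 0 < min + 0) ? "LOW" : "OK"
            print rate "%", status
        }'
}

# Hypothetical usage: hisat2 prints the summary to stderr, so save it first:
# hisat2 -t -x sa_index -1 D1_R1.fastq.gz -2 D1_R2.fastq.gz -S D1_R1.sam 2> D1_R1.log
# check_alignment_rate < D1_R1.log
```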

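A problem shows up here: the alignment rate of the mutant strain is far too low, under 1% (0.67% for D1 versus 86.16% for WT1), which should not happen for a genuine S. aureus sample. I suspected sample contamination, so I picked 5 reads at random and ran them through BLAST; the hits suggest the sample is contaminated with Staphylococcus haemolyticus.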
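
The exact commands used to pick the reads are not recorded here. One way to do it, as a sketch: convert FASTQ to FASTA and keep a few randomly chosen reads (reservoir sampling in awk), then submit the output to BLAST. The function name sample_reads_as_fasta is hypothetical:

```shell
# Read FASTQ on stdin and print N randomly chosen reads as FASTA.
# $1: number of reads to keep (default 5).
sample_reads_as_fasta() {
    awk -v n="${1:-5}" '
        BEGIN { srand() }                            # seed from the current time
        NR % 4 == 1 { h = substr($0, 2) }            # header line: drop the leading "@"
        NR % 4 == 2 {                                # sequence line
            seen++
            if (seen <= n) {                         # fill the reservoir first
                name[seen] = h; read[seen] = $0
            } else {                                 # then replace with probability n/seen
                j = int(rand() * seen) + 1
                if (j <= n) { name[j] = h; read[j] = $0 }
            }
        }
        END {
            m = (seen < n) ? seen : n
            for (i = 1; i <= m; i++) print ">" name[i] "\n" read[i]
        }'
}

# Hypothetical usage (the output file name is an assumption):
# gzip -dc D1_R1.fastq.gz | sample_reads_as_fasta 5 > D1_sample.fa
```

This assumes standard four-line FASTQ records. It reads the whole file once but only ever keeps N reads in memory.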

4 Download the S. haemolyticus genome sequence

Download link

4.1 Build the index files

hisat2-build GCF_000009865.1_ASM986v1_genomic.fna haemo_sa_index

4.2 Align the mutant-group reads to the S. haemolyticus genome

hisat2 -t -x haemo_sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R1.fastq.gz -2 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R2.fastq.gz -S D1_R1.sam
#D1_R1
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:15:01
14200996 reads; of these:
  14200996 (100.00%) were paired; of these:
    2971894 (20.93%) aligned concordantly 0 times
    11205145 (78.90%) aligned concordantly exactly 1 time
    23957 (0.17%) aligned concordantly >1 times
    ----
    2971894 pairs aligned concordantly 0 times; of these:
      67246 (2.26%) aligned discordantly 1 time
    ----
    2904648 pairs aligned 0 times concordantly or discordantly; of these:
      5809296 mates make up the pairs; of these:
        4179316 (71.94%) aligned 0 times
        1622752 (27.93%) aligned exactly 1 time
        7228 (0.12%) aligned >1 times
85.29% overall alignment rate
Time searching: 00:15:01
Overall time: 00:15:01
#D2_R2
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:19:54
18272739 reads; of these:
  18272739 (100.00%) were paired; of these:
    3984869 (21.81%) aligned concordantly 0 times
    14260046 (78.04%) aligned concordantly exactly 1 time
    27824 (0.15%) aligned concordantly >1 times
    ----
    3984869 pairs aligned concordantly 0 times; of these:
      83138 (2.09%) aligned discordantly 1 time
    ----
    3901731 pairs aligned 0 times concordantly or discordantly; of these:
      7803462 mates make up the pairs; of these:
        5671806 (72.68%) aligned 0 times
        2110960 (27.05%) aligned exactly 1 time
        20696 (0.27%) aligned >1 times
84.48% overall alignment rate
Time searching: 00:24:25
Overall time: 00:24:25
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:18:10
17122975 reads; of these:
  17122975 (100.00%) were paired; of these:
    3511683 (20.51%) aligned concordantly 0 times
    13593051 (79.38%) aligned concordantly exactly 1 time
    18241 (0.11%) aligned concordantly >1 times
    ----
    3511683 pairs aligned concordantly 0 times; of these:
      74659 (2.13%) aligned discordantly 1 time
    ----
    3437024 pairs aligned 0 times concordantly or discordantly; of these:
      6874048 mates make up the pairs; of these:
        5027355 (73.14%) aligned 0 times
        1838519 (26.75%) aligned exactly 1 time
        8174 (0.12%) aligned >1 times
85.32% overall alignment rate
Time searching: 00:18:10
Overall time: 00:18:10
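
Only the D1 command is shown above, yet three summaries are logged, and the -S D1_R1.sam output has the same name as the earlier S. aureus alignment, so it would overwrite that file. A sketch of generating one command per replicate with a separate output directory per index; print_hisat2_cmds is a hypothetical helper, and the sample names D2 and D3 and the directory name haemo are assumptions:

```shell
# Print (not run) one hisat2 command per sample so the list can be reviewed first.
# $1: index prefix, $2: directory with the raw reads, $3: output directory,
# remaining arguments: sample names.
print_hisat2_cmds() {
    index="$1"; rawdir="$2"; outdir="$3"
    shift 3
    for s in "$@"; do
        printf 'hisat2 -t -x %s -1 %s/%s_R1.fastq.gz -2 %s/%s_R2.fastq.gz -S %s/%s.sam\n' \
            "$index" "$rawdir" "$s" "$rawdir" "$s" "$outdir" "$s"
    done
}

# Hypothetical usage: review the printed commands, then pipe them to sh to run them.
# mkdir -p haemo
# print_hisat2_cmds haemo_sa_index /mnt/f/lbhu/TPL201807308/Results/01_Rawdata haemo D1 D2 D3
```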

5 Basic statistics of the S. aureus and S. haemolyticus genomes compared

Staphylococcus aureus (GCF_000013425.1)

Total_Len:  2821361
Total_Seq_Num : 1
Total_N_Counts: 1
Total_LowCase_Counts:   0
Total_GC_content:   0.33
Minimum Len:    2821361
Maximum Len:    2821361
Mean Len:   2,821,361
Median Len: 2,821,361
N50:    2821361

Staphylococcus haemolyticus (GCF_000009865.1)

Total_Len:  2697861
Total_Seq_Num : 4
Total_N_Counts: 0
Total_LowCase_Counts:   0
Total_GC_content:   0.33
Minimum Len:    2300
Maximum Len:    2685015
Mean Len:   674,465.25
Median Len: 5,273
N50:    2685015
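
The tool that produced these figures is not named. As a rough equivalent, assuming plain FASTA input, the main values can be recomputed with awk and sort; fasta_stats is a hypothetical helper, and it does not report the N or lowercase counts. The median and N50 are computed on the sorted lengths:

```shell
# Summarise a FASTA file read from stdin: total length, sequence count,
# GC fraction, min/max/median length and N50.
fasta_stats() {
    awk '
        /^>/ { if (n) print len "\t" gc; n++; len = 0; gc = 0; next }
        {
            s = toupper($0)
            gsub(/[^A-Z]/, "", s)                    # drop whitespace and stray characters
            len += length(s)
            gc  += gsub(/[GC]/, "", s)               # gsub returns the number of matches
        }
        END { if (n) print len "\t" gc }' |
    sort -k1,1nr |                                   # longest sequence first
    awk -F'\t' '
        { L[NR] = $1; total += $1; gc += $2 }
        END {
            if (NR == 0 || total == 0) exit
            for (i = 1; i <= NR; i++) {
                run += L[i]
                if (!n50 && run >= total / 2) n50 = L[i]
            }
            if (NR % 2) median = L[(NR + 1) / 2]
            else        median = (L[NR / 2] + L[NR / 2 + 1]) / 2
            printf "Total_Len: %d\n", total
            printf "Total_Seq_Num: %d\n", NR
            printf "Total_GC_content: %.2f\n", gc / total
            printf "Minimum Len: %d\n", L[NR]
            printf "Maximum Len: %d\n", L[1]
            printf "Median Len: %.1f\n", median
            printf "N50: %d\n", n50
        }'
}

# Usage with the file from section 3.1:
# fasta_stats < GCF_000013425.1_ASM1342v1_genomic.fna
```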

6 Analysis of the alignment between the S. aureus and S. haemolyticus genomes

# 710 hits found
query id        subject id      identity%       alignment length        mismatches      gap opens       q. start        q. end  s. start        s. end  evalue  bit score
NC_007795.1     NC_007168.1     89.55   14914   1410    111     2304062 2318871 833494  818625  0       18774
NC_007795.1     NC_007168.1     94.84   7822    241     100     1899018 1906760 1110575 1102838 0       12056
NC_007795.1     NC_007168.1     95.18   5392    171     58      448540  453908  2544291 2538966 0       8434
NC_007795.1     NC_007168.1     95.95   5209    140     47      492993  498180  2544133 2538975 0       8384
NC_007795.1     NC_007168.1     96.04   5156    139     40      492993  498130  879727  884835  0       8331
NC_007795.1     NC_007168.1     95.95   5161    143     42      448709  453850  879722  884835  0       8312
NC_007795.1     NC_007168.1     77.9    12405   2498    209     978268  990541  1940909 1928618 0       7492
NC_007795.1     NC_007168.1     91.12   5045    394     45      529559  534584  2461838 2456829 0       6785
NC_007795.1     NC_007168.1     86.38   5706    668     76      520058  525712  2471424 2465777 0       6131
NC_007795.1     NC_007168.1     96.47   3683    90      21      2238680 2242354 885361  881711  0       6045
NC_007795.1     NC_007168.1     78.25   9179    1738    197     1588304 1597364 1376275 1367237 0       5655
NC_007795.1     NC_007168.1     96.42   3407    103     10      2122741 2126140 976287  972893  0       5598
NC_007795.1     NC_007168.1     96.36   3411    105     11      2239223 2242626 976292  972894  0       5594
NC_007795.1     NC_007168.1     92.35   3895    205     44      450529  454385  2498609 2494770 0       5456
NC_007795.1     NC_007168.1     96.22   3336    86      22      494809  498137  1104945 1108247 0       5426
NC_007795.1     NC_007168.1     96.33   3324    83      21      1901348 1904666 2539029 2542318 0       5426
NC_007795.1     NC_007168.1     96.21   3324    88      19      1901348 1904666 884833  881543  0       5406
NC_007795.1     NC_007168.1     96.18   3327    89      20      494808  498130  2498609 2495317 0       5406
NC_007795.1     NC_007168.1     96.1    3336    89      23      450530  453857  1104945 1108247 0       5402
NC_007795.1     NC_007168.1     95.49   3411    106     28      1901348 1904745 2495319 2498694 0       5402
NC_007795.1     NC_007168.1     97.14   3184    76      9       2122692 2125867 2538975 2542151 0       5361
NC_007795.1     NC_007168.1     97.06   3168    79      9       2239193 2242354 1108272 1105113 0       5323
NC_007795.1     NC_007168.1     97.35   3133    72      6       2239226 2242354 2539026 2542151 0       5315
NC_007795.1     NC_007168.1     97.17   3149    78      6       494986  498130  973146  976287  0       5312
NC_007795.1     NC_007168.1     94.5    3472    144     20      450707  454157  973146  976591  0       5310
NC_007795.1     NC_007168.1     97.32   3131    74      4       2122741 2125867 884835  881711  0       5308
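
With 710 hits, the table is easier to judge from a summary than row by row. A rough sketch, assuming the 12-column tabular layout shown above; blast_summary and the 90% identity cutoff are illustrative choices. Overlapping hits are counted more than once (several rows above share nearly the same query coordinates), so aligned_bp is an upper bound rather than a true shared length:

```shell
# Summarise BLAST tabular output read from stdin.
# $1: minimum percent identity for a hit to be kept (assumed default: 90).
blast_summary() {
    awk -v min="${1:-90}" '
        /^#/ || $3 !~ /^[0-9.]+$/ { next }           # skip comments and the column header
        { hits++ }
        $3 + 0 >= min + 0 { kept++; aligned += $4 }  # column 3: identity%, column 4: alignment length
        END { printf "hits=%d kept=%d aligned_bp=%d\n", hits, kept, aligned }'
}

# Hypothetical usage (the file name is an assumption):
# blast_summary 90 < sa_vs_haemo.blast.txt
```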

7 Aside
