Genomic Data Processing 48: ART Usage Examples

For the relevant parameters, see the previous post.

1. Usage example 1:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -f 20 -o G38L100F20Nhs20

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17, 2016)       
     Contact: Weichun Huang @gmail.com> 
    -------------------------------------------

                  Single-end Simulation

Total CPU time used: 1162.71

The random seed for the run: 1464879720

Parameters used during run
    Read Length:    100
    Genome masking 'N' cutoff frequency:    1 in 100
    Fold Coverage:            20X
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2000 Length 100 R1 (built-in profile) 

Output files

  FASTQ Sequence File:
    G38L100F20Nhs20.fq

  ALN Alignment File:
    G38L100F20Nhs20.aln

2. Usage example 2:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS25 -sam -i GRCH38chr1L3556522.fna -p -l 150 -f 20 -m 200 -s 10 -o paired_dat

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17, 2016)       
     Contact: Weichun Huang @gmail.com> 
    -------------------------------------------

                  Paired-end sequencing simulation

Total CPU time used: 1070.33

The random seed for the run: 1464880583

Parameters used during run
    Read Length:    150
    Genome masking 'N' cutoff frequency:    1 in 150
    Fold Coverage:            20X
    Mean Fragment Length:     200
    Standard Deviation:       10
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2500 Length 150 R1 (built-in profile) 
    First Read:   HiSeq 2500 Length 150 R2 (built-in profile) 

Output files

  FASTQ Sequence Files:
     the 1st reads: paired_dat1.fq
     the 2nd reads: paired_dat2.fq

  ALN Alignment Files:
     the 1st reads: paired_dat1.aln
     the 2nd reads: paired_dat2.aln

  SAM Alignment File:
    paired_dat.sam

Inspect the files:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ ll -h
total 50G
drwxrwxr-x 2 hadoop hadoop 4.0K Jun  2 23:16 ./
drwxrwxr-x 6 hadoop hadoop 4.0K Jun  2 22:59 ../
-rw-rw-r-- 1 hadoop hadoop  11G Jun  2 23:29 G38L100F20Nhs20.aln
-rw-rw-r-- 1 hadoop hadoop 9.4G Jun  2 23:29 G38L100F20Nhs20.fq
-rw-r--r-- 1 hadoop hadoop 241M Jun  2 23:00 GRCH38chr1L3556522.fna
-rw-rw-r-- 1 hadoop hadoop 2.5K Jun  2 23:09 GRCH38chr1L3556522.fna.amb
-rw-rw-r-- 1 hadoop hadoop  144 Jun  2 23:09 GRCH38chr1L3556522.fna.ann
-rw-rw-r-- 1 hadoop hadoop 238M Jun  2 23:09 GRCH38chr1L3556522.fna.bwt
-rw-rw-r-- 1 hadoop hadoop  60M Jun  2 23:09 GRCH38chr1L3556522.fna.pac
-rw-rw-r-- 1 hadoop hadoop 119M Jun  2 23:10 GRCH38chr1L3556522.fna.sa
-rw-rw-r-- 1 hadoop hadoop 4.9G Jun  2 23:42 paired_dat1.aln
-rw-rw-r-- 1 hadoop hadoop 4.6G Jun  2 23:42 paired_dat1.fq
-rw-rw-r-- 1 hadoop hadoop 4.8G Jun  2 23:42 paired_dat2.aln
-rw-rw-r-- 1 hadoop hadoop 4.6G Jun  2 23:42 paired_dat2.fq
-rw-rw-r-- 1 hadoop hadoop  11G Jun  2 23:42 paired_dat.sam

The generated files are all very large.
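The sizes follow from the coverage setting. As a rough back-of-the-envelope sketch (not ART's own calculation), the number of reads is approximately coverage × reference length / read length. The reference length 248956422 is taken from the @SQ header that appears in the SAM output later in this post; the function names and the per-record byte count (including the assumed header length) are illustrative assumptions.

```python
def estimate_read_count(ref_len: int, coverage: float, read_len: int) -> int:
    """Approximate number of reads needed to reach the requested fold coverage."""
    return round(ref_len * coverage / read_len)


def estimate_fastq_bytes(n_reads: int, read_len: int, header_len: int = 15) -> int:
    """Rough FASTQ size: header line, sequence line, '+' line, quality line.

    header_len is an assumed average length of the '@chr1-N' header line.
    Each line is counted with its trailing newline.
    """
    per_record = (header_len + 1) + (read_len + 1) + 2 + (read_len + 1)
    return n_reads * per_record


if __name__ == "__main__":
    ref_len = 248956422  # LN of chr1 from the @SQ header (assumed to be the whole input)
    n = estimate_read_count(ref_len, coverage=20, read_len=100)
    size_gib = estimate_fastq_bytes(n, read_len=100) / 2**30
    print(f"about {n} reads, about {size_gib:.1f} GiB of FASTQ")
```

For the single-end run this lands on the order of 10 GiB, in the same range as the 9.4G file listed above. The real file is presumably somewhat smaller because read headers are shorter than assumed here and stretches of N in the reference produce no reads.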

3. Specify the number of reads generated per sequence (the output becomes much smaller):

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 50 -o G38L100c50Nhs20

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17, 2016)       
     Contact: Weichun Huang @gmail.com> 
    -------------------------------------------

                  Single-end Simulation

Total CPU time used: 15.96

The random seed for the run: 1464918709

Parameters used during run
    Read Length:    100
    Genome masking 'N' cutoff frequency:    1 in 100
    Fold Coverage:            0X
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2000 Length 100 R1 (built-in profile) 

Output files

  FASTQ Sequence File:
    G38L100c50Nhs20.fq

  ALN Alignment File:
    G38L100c50Nhs20.aln

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ ls
G38L100c50Nhs20.aln  G38L100F20Nhs20.aln  GRCH38chr1L3556522.fna      GRCH38chr1L3556522.fna.ann  GRCH38chr1L3556522.fna.pac  paired_dat1.aln  paired_dat2.aln  paired_dat.sam
G38L100c50Nhs20.fq   G38L100F20Nhs20.fq   GRCH38chr1L3556522.fna.amb  GRCH38chr1L3556522.fna.bwt  GRCH38chr1L3556522.fna.sa   paired_dat1.fq   paired_dat2.fq
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ ll
total 51506772
drwxrwxr-x 2 hadoop hadoop        4096 Jun  3 09:51 ./
drwxrwxr-x 6 hadoop hadoop        4096 Jun  2 22:59 ../
-rw-rw-r-- 1 hadoop hadoop       11400 Jun  3 09:52 G38L100c50Nhs20.aln
-rw-rw-r-- 1 hadoop hadoop       10428 Jun  3 09:52 G38L100c50Nhs20.fq
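
With -c 50 the FASTQ file should hold 50 records. One quick way to check this is sketched below, assuming every record spans exactly four lines (as in the ART output shown in this post). The function name count_fastq_records is illustrative, not part of any tool used here.

```python
import io
from typing import TextIO


def count_fastq_records(handle: TextIO) -> int:
    """Count records in a 4-line-per-record FASTQ stream."""
    n_lines = 0
    for i, line in enumerate(handle):
        # Lines 0, 4, 8, ... must be headers beginning with '@'.
        if i % 4 == 0 and not line.startswith("@"):
            raise ValueError(f"line {i + 1}: expected FASTQ header, got {line!r}")
        n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\n@III\n")
    print(count_fastq_records(demo))  # prints 2
```

On the real data this would be called as `count_fastq_records(open("G38L100c50Nhs20.fq"))`. Only the header position is checked, since a quality line may itself start with '@'.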

4. Generate a single read:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17, 2016)       
     Contact: Weichun Huang @gmail.com> 
    -------------------------------------------

                  Single-end Simulation

Total CPU time used: 15.82

The random seed for the run: 1464918910

Parameters used during run
    Read Length:    100
    Genome masking 'N' cutoff frequency:    1 in 100
    Fold Coverage:            0X
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2000 Length 100 R1 (built-in profile) 

Output files

  FASTQ Sequence File:
    G38L100c1Nhs20.fq

  ALN Alignment File:
    G38L100c1Nhs20.aln

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.
cat: G38L100c1Nhs20.: No such file or directory
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.fq
@chr1-1
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
+
@C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)
>chr1   chr1-1  225496693   +
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT

5. Verify with bwa:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.sam
@SQ SN:chr1 LN:248956422
@PG ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa samse GRCH38chr1L3556522.fna G38L100c1Nhs20.sai G38L100c1Nhs20.fq
chr1-1  0   chr1    225496694   37  100M    *   0   0   CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT    @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)
>chr1   chr1-1  225496693   +
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT

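Comparing the two, ART reports positions counted from 0, consistent with ADAM, whereas bwa counts from 1: the same read is at 225496693 in ART's .aln record and at 225496694 in the SAM output.
How can the accuracy of aligners such as bwa be evaluated automatically?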
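
One possible approach to the question above, sketched here rather than taken from ART or bwa: convert each ART .aln position to the 1-based leftmost coordinate that SAM uses, then count the reads whose SAM POS matches. For '+' reads the conversion is pos + 1. For '-' reads, the .aln records in the appendix suggest that the position is measured on the reverse-complemented reference, so the forward coordinate would be ref_len - pos - read_len + 1; this is inferred from the data in this post (for example chr1-46), not from ART's documentation. All function names are illustrative.

```python
from typing import Dict, Tuple

# read id -> (ART position counted from 0, strand), as found in '>' lines of the .aln file
AlnTruth = Dict[str, Tuple[int, str]]


def aln_to_sam_pos(pos: int, strand: str, read_len: int, ref_len: int) -> int:
    """Convert an ART .aln position to a 1-based forward-strand SAM POS."""
    if strand == "+":
        return pos + 1
    if strand == "-":
        # Assumption inferred from the sample data: '-' positions are
        # measured on the reverse-complemented reference.
        return ref_len - pos - read_len + 1
    raise ValueError(f"unknown strand: {strand!r}")


def mapping_accuracy(truth: AlnTruth, sam_pos: Dict[str, int],
                     read_len: int, ref_len: int) -> float:
    """Fraction of simulated reads whose SAM POS equals the converted ART position."""
    if not truth:
        return 0.0
    correct = 0
    for read_id, (pos, strand) in truth.items():
        expected = aln_to_sam_pos(pos, strand, read_len, ref_len)
        if sam_pos.get(read_id) == expected:
            correct += 1
    return correct / len(truth)


if __name__ == "__main__":
    REF_LEN = 248956422  # LN from the @SQ header
    # Positions and strands copied from the .aln listing in the appendix
    truth = {"chr1-50": (93465784, "+"), "chr1-46": (211481565, "-"),
             "chr1-34": (13267476, "+")}
    # POS values copied from the bwa SAM output in the appendix
    sam = {"chr1-50": 93465785, "chr1-46": 37474758, "chr1-34": 12934213}
    print(mapping_accuracy(truth, sam, read_len=100, ref_len=REF_LEN))
```

In this small sample two of the three reads match; chr1-34 was placed by bwa at a different location with mapping quality 0, which is the kind of case such a check is meant to surface. Exact position matching is a strict criterion: reads with indels or clipping near the start would need a tolerance window.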

6. Verify with SNAP:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.snap.sam 
@HD VN:1.4  SO:unsorted
@RG ID:FASTQ    PL:Illumina PU:pu   LB:lb   SM:sm
@PG ID:SNAP PN:SNAP CL:single index G38L100c1Nhs20.fq -o G38L100c1Nhs20.snap.sam    VN:1.0beta.23
@SQ SN:chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38   LN:248956422
chr1-1  0   chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38  225496694   70  100M    *   0   0   CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT    @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)
>chr1   chr1-1  225496693   +
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT

Appendix
(1) bwa comparison on the 50-read dataset:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c50Nhs20.sam
@SQ SN:chr1 LN:248956422
@PG ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa samse GRCH38chr1L3556522.fna G38L100c50Nhs20.sai G38L100c50Nhs20.fq
chr1-50 0   chr1    93465785    37  100M    *   0   0   TTCCACAATAGTTGAACTAATTTACAGTCCCACCAACAGTGTAAAAGTGTTCCTATTTCTCCACATCCTCTCCAGCACCTGTTGTTTCCTGACTTTTTAA    @@CDFDFDHFHGHIJH:IJJJ(JJE?JDIDEJIB@FGJIGBHJ()HG8(CIICGFFHEH=GI3@&@DD58FADDACHDDHFCD8D,DCCXT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-48 0   chr1    228133746   37  100M    *   0   0   ATCATTGTATGCCACAGAAATAATTAAATTTCCTTGTCAACTGACACATTATTATTAGGCACTCTCACCAGATCTTTACCCATGGCCATTTAAAGTGTGG    @>CFFFFFH<@D(:EFDDC@;DDAC95(D?BD    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:44G55
chr1-47 0   chr1    13772988    37  100M    *   0   0   TTCAGTAATTCAGAATAACACATGAGGGAATGAATGAATGAATAAATAAAAAAAAACTGAATGAATAAATTACAAAAAATTGTGTTTCAGGGAAGAAAAA    CC@F(FFFDFH.HDHIGI(JIIIGGIEEJIIIHJJHHH3IJJIIJ3=EI>JDIGH((IBJCIEHGD>;J@HF+DC)CCCADBDBD+BDDDD5B5DDDE(C    XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:56A15A27
chr1-46 16  chr1    37474758    37  100M    *   0   0   GGGTCGGGGTCCTGTTCCCCGGTCCGCCGGGCCTCAGGACCCCTCCAACTTTGCCCAAGTTGGGAGAGCCGGGGAAGAGCACCAGGTTCCTGATCGGGAT    (5CBACDDD>FBDDDDDEC:CE(CBDFDDHEFH;FGEFHGHDGJJJDIGI:JEHJ=JJJJJH8CI?JJJG9JIII>IJIIGJ=EIJGAHHHHFFDFDCC?    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:22C77
chr1-45 0   chr1    29056657    37  100M    *   0   0   CTGGGATTACAGGTGCCCGCCACCATGCCCAGCTAATTTTTGTATTTTTGGTAGAGACAAGGTTTCACCATGTTGGCCGGGATTGTCTCGAACTCCTGAT    B@@FFFFFHHG)HIJJJJBJIJCJHGJIBFJJI3IIHDF@JIAJ9JJJIJJBIJJ?BJID8F:HFHA(+D>J>CG>7D=DDFF@EDC3D@BD@B    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-44 0   chr1    49993893    37  100M    *   0   0   CAATTTAGCCAAAACTGGCTAATCGTTTTACCAGAATCATTCCCATTGTTCAAGACCTATTTTAAGCTCCACTATCACCATAAAACTTTCCCGATCAGTT    C@CFFFFDHHHHH@JJJHI)IDIBIJA:HJHFJJJIJGGJJIIIIHGGJJGH<(IIIJI?ICDG;CDHFHCDCCB?FDDED:CD:>DD5C&DDCD    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:24C75
chr1-43 16  chr1    194714506   37  100M    *   0   0   AATATGTTTTAATAATATCATATTTAAATTTGATGATACTTTAAAAATGGTTCCATGTGTGTTCTCTTGGGTTATTTCACAATCAATAAAAGGTCTGCAA    CCCCDDC@E>CDCDDC>D9CD=C)CGC>E@7.HF)DIBJBJJ.JEJEJ@JJIIIIGD?@B    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-42 0   chr1    35706203    37  100M    *   0   0   CAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTACAGGCACGTGCCACCATGCCTGGCAATTTTTGTATTTTTAGTACAGATGGGG    CC@FDFFAHHGFHJIHJFJJII=@JEHIJIIJIJEJIJJHHGIJBBFJG6JJHJJGXT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-41 16  chr1    156482338   37  100M    *   0   0   GTGTGTGCATAGGCAGGTCTGCGTGTACATGCAACGTGGGCACGTGTCCATGTGGATGCAGGCGGGGGTATATCCTGGTGCCTGTGTGTATGGGCCCACC    D;CCDDCDCDDDCD:EDA@@CJJI,=FHJJIGJ7GEC?IGJJIFBBICHJEIJJHHAIJIJI.IJGJJGJJHHGHHFFFFFB=B    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-40 16  chr1    221779284   37  100M    *   0   0   CATGGCACATAGCACTTTGGTGATGGGGACTGCTTTGCTAATGTCAGGGTCAAGGGGTGCATGGACCATGGGCAGAGTGCTGGGCTCAGCCAAATGGTTC    DDBCDCDDDDDDD25F?DD@4I5HED?CAHGA?JJIIJB)IHFJJFCJII?@XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:39G60
chr1-39 16  chr1    3895605 37  100M    *   0   0   GTCCTCTCCGGATTGACAGGAGTCAAAACATGAGATCGGCTTAGCTTCAGTTTCGTCATGGATTAACCACCTCCAAGGTGTCAACTCCAAAATGTCAAGA    DD5CCAD&8DAD>D&FDDDCDBDD?6DD.FHDDIFE?@IDEGIBCGD?JFJ>JGBI,IJIF.JJIHJJJEIEGFJ=JJHJHHJFHIJHHHHHFFAFFC@B    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-38 16  chr1    33174926    37  100M    *   0   0   CACACATACATATATGTGTGTATATATATATATATATATATATACACACATATACATATATATGCACACACACATGTATGTATATGTATATGTATATGTG    CDDC(FBDC(AACBDDCBDDECEC5@H;HFDJFH>=FCHAHJFJ'H3JG9JFEHIJFDJJ9IJHEJIGJIJJJJJC;J?AJFJGEHFHDC>D?DDBDFB)DDDDDC5(9>F;G)FB84/AJE3JJIJIGIGJBBIGCJCJGJGHJIDJ>IB7IGJGEGCIIGFJJJEFHIIJHHF=HFF8=F??=    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:14C85
chr1-34 16  chr1    12934213    0   100M    *   0   0   TTTTGATACTTTTGATGTGGCCAAAGGTTCTCCAATAAAGATACCATATATAAATATATGTATTTCTAATGTCTGAAACAGATTAAAACCTTCCCTGTAT    D@CB?DEDCEDDD(DC>F>DEHE>[email protected]'5I8IBFJHDI=CIIJ8JFHIBJJI0IJGFFJGIIJJABH<)IFJJJIEDHHHHAFFFFFCC@    XT:A:R  NM:i:0  X0:i:2  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100    XA:Z:chr1,+13267477,100M,0;
chr1-33 0   chr1    48968233    37  100M    *   0   0   TAATAGTAGGCAATAAACAAAGAGAGCAACTTAGGAGCCAGATCACATGTGGCCGCTCGAGCAATATGGTAAAAGTTCTGGACTTCATTCTAGGTGAATG    1CCB=FFFHHHHHEDHJIIAFG4JIFJIJB)JJI?(&JJIJJEE)HIJJBJ?HJ(=B(I@?I?8DC8C>JHJH>@EDFDD5DDDDDDDDCFD:=DCC(DD    XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:0G53T45
chr1-32 0   chr1    88980623    37  100M    *   0   0   TAGTTCAGTAAACTATTTATCAAACAGGTGTCAGGTCATTTTAACATACTCCTTGCTTTGAACAATATTCATTCATACTTGGTACAAACTCTATATCCTA    B?CFDFFFHHFH3JIJJJIGJJJJJFJDEJGJ(EHFI>E=JIJ(GGJDFCH>>GJ=IHDJEHHDI>GEBJE@DD@HH'AA@ECC@BDEDDD@CDDADBDD    XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:32T44T22
chr1-31 16  chr1    227005594   37  100M    *   0   0   TCACCAGGCATCTTTACTGACTCACACCAATAGTAGTACTGGGATTAGAAATAAGACGCTGCAATACTCACAACCTAGGTGAAGTTAGTTAATTTGGGAA    D@D5B=DACDDDDDBEFECBFDC5BCDDDCDFIDC8ICEIJ=DHIGHIJIJJB0HJJCDJHJGJIJI9GGHGGJ3@IJJAIJGGBGJ7HFHHDEBFF@CC    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-30 0   chr1    9852129 37  100M    *   0   0   TGTGAAATGGAGTCAGCAGAGTGAGCCGGCCTCCACTCAGTGAGCCGGGTCTCCCCCACAGCCGGCATGTGCTGACCTCCTTCCAACTGCTCTACCAAGA    CBCDDFFDHGHHHIEJ+JD+?ECDDDDB    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-29 16  chr1    156397431   37  100M    *   0   0   TCAGCCTCCCGAGTAGCTGGGATTACAGGAACCTGCCACCACGCCCGGCTAATTTTTGTATTTTCAGTTGAGACGGGGTTTCACCATGTTGCCCAGGCTG    D1D(@9DDDC@D0C3=CDDJ;FDHDD@H2BDHIDAGDDDCDIFJ9GIFGIG@?)JJHJGFGJIB7JG>'IJIJJGJ+JJGIIHFIJIDHHHFFFFFFC@B    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:68A31
chr1-28 16  chr1    56986638    37  100M    *   0   0   ACTCAGAACAGGTCTCCTTGTGGAACCATGGCCTTCCTTTTGGATCCTGGCCATGAGAGCCCATTCTTAGGAACCATGTTTCAATTCCAGTAGGTGATGT    DD)DC@CJ@A)GIFJJJHHFFHFDDFFCCC    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-27 16  chr1    172015198   37  100M    *   0   0   AGGTGTCAGTCCTCCAGCTTTGTTCTTCTTTTATATTGTGTTGGCTATCCTGGGCTCTTTGCTTCTCCATACAAAACTTAGAATCAGTTTGTTGATATCC    B8BD>/D@CCEBBEBCH,F?CCD.E;HGJBJ)IGD7HED5@6JJJCHIGHJIJFDJCIJJHGJIJJJIEF:FEJHBJ.JJJIGHBHCF2DDFFCC@    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-26 0   chr1    233336763   37  100M    *   0   0   AGATATACAGCAAAGTTTGAAAGCTACAGTTCTGAGGACCATATTTATGGATTCCTTCTTATATGTTATCTGGGTTGATATAGAAATTCTTCCATGGCTA    CBCFDFF;HHDDHHB?C9DE?DCE@D?B&5E>DDD7DD?D    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:41G58
chr1-25 0   chr1    105787069   0   100M    *   0   0   GCCATTCTAACTGGTGTGAGATGGTATCTCATTGTGGTTTTGATTTGCATTTCTCTGATGGCCAGTGATGGTGAGCATTTTTTCATGTGTTTTTTGGCTG    CCCDFFFFBGHHHHHCIGFJ:JAIGIJIJG)HCIIJGIHHJJJGEDHIHJHIII3J>JHJ?GDD?:;EFE(EDIJD?DDEAHCEDCDD?CDCF6D=>DDD    XT:A:R  NM:i:0  X0:i:52 XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-24 0   chr1    235841969   37  100M    *   0   0   GTTGGCTACTAGCTTAGCAGAGGTGCAAAACCATGAATTTCTGGTGGTATGGATTTTTTCAGCTATTTCAGATTCACCAGCAGGATCCAGCTGCTTGGGT    CCCFF?FFFHHHFGI,JEJIIG@IIBDJJIFDIEAIJB;JGADHJD,CBD@DEC;?DDHD@DCDEDDDAD?    XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:25G60T13
chr1-23 0   chr1    96545358    37  100M    *   0   0   AGTGAAAAAGGCTGGCTGCCCTTCAATATCATCTTCAAATGTTAACAACACTGAATATTAATAAATTTCCTTTAGCGAATAATGAATCCAGCCTTCCTTA    C@CF+FFFGGDHGJIBDJI2JGJIHHJJII?GJJJJGIJIJJGFJJG)IJ0HD0JIFJDJDFC;D7JGFFCEDFHADCDCCDEDDEAHDDD+9?<CA2:D    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-22 0   chr1    80270679    37  100M    *   0   0   TTGTACACCCTATTTCTGACCAGAAGAAGGAGCATTTTGCTTTTTGCCAAATGAGAAGTGCATTCTGGAAACACTTGATGCCTGCACCACACCTCGAGTT    ?@CFDDFFHFHHHJJJGC7J(GI8IJJJE?HHI>BJG*IJFJIDJHD0IEJIHDI>@H=EHGIAHJ33(EJCDEDA?FDG@DDBD    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-21 0   chr1    35923261    37  100M    *   0   0   CTAAGCAGCAGTGTTTTTGGATACTTTTTTTTTCTGTTTGTGAATAAGGCCAGCACTCAAGATGGGCAGCCAAGGGTGCACTGACTATTAGCTGGCCCAT    =@@DFDFEHGHHH8JIJGJH1JJHHJIHJGH?IIFEJIIG87JI=IAJJJBJIJD(IIFI8JIHF=JDHEJHEHDDCEDCDEACDDCCADXT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-20 16  chr1    112489190   37  100M    *   0   0   AGGGAATGAACTATGCACATCTATATAGTAACAGGGACAGATTTTTTTTTAACATGAGAGTGTAAAAAAAAGAAAAAGAAAAAAAAAGGCCAGGCACAGT    DACDABD@DDDDDA7DDDC8GHI@EI(DC?FG'+8.FBDJIHIEGG=IIG=I@*DFIJJIBIIJIJIIHJCHBGFJJJI@F>HJIIIHHAHFAFDFFC@1    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:33A66
chr1-19 0   chr1    160371244   37  100M    *   0   0   AGGCCCTGGGCACAGGCAGAGAGCCCACCGGCTGGTCATGAGGGCCTCTTCCTTTCTCTGACCCAGGCACCTCGAGGGCTCTTCTCCTGGGTTCCTTCCG    @@:FDDFFCHHAHI:GEJFJGF@JJJFIC9JIIJJJ?IIEFHGJ'G?BFFBIIDIG,J)AJIHEGFBHCI&ECCD@EDD?)DED(D>3C?ABEEDDD4BD    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:80A19
chr1-18 16  chr1    179855835   37  100M    *   0   0   AGCAATTAAAATAAATTAGGGTATCTTTAAAAGTTGTAAAATTATAGCAGTGAAGTACTGTTGACCAGGCACAGTGGCTCACACCTGTAATACCAGCACT    DCEDBBDD/DD9DDD@DDFB(DDDHCHDF;C?;FJGC/IJ8DHEJ:DFGGIGHBIGIJDI(JDHGJJGJIHJII@HJJJ3JIJDIJBBHHFHFDFFF@C@    XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:20T39C39
chr1-17 16  chr1    207455995   37  100M    *   0   0   GGTTCTTATGATTGGAAAGGTTAAAGAGTGACCTATAGGTCACTTTCCAATTATGAAAACAAAAAATTAAGAAATATATATATTTTCATTATTTCACTCC    CBDDDCDD:&DDCFCFDDHDEJEDCFDJ;;EHGCD;CG?DIHGGCIJJJJ-GIJ7GIFHHHCGI)JJJJIJEGJIGJJJIH@FDCC1    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-16 0   chr1    114154603   37  100M    *   0   0   TGTATCTTTCTGCTAAGCATAACAAGAAAGACAGAAAGCTCAACGGGAGGATTGAGGCTAGACTTAAAGTAGAGATCCCCTCAGAAACTGTGGAGTGAGG    CCCF8FFDHHHH4JIJIGIJIIFJHJJ?JEDI9BG?I>GHJ7FJJJIF67EIIHD2C>?>DDHDE8E7@JEJ(IFDDC;EDCC:FD>@DBC>D5D>=XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-15 16  chr1    169767580   37  100M    *   0   0   GGTGGGGGAGAGGAAAGGAAACGAGGGAGGAAAGGCCCTAATAGGGAGGATTTTGGAGTTTAGATTTTAAAATGATAAAGGTTGTTTGACACTCTAGGCA    DEDD9DDD@DD4DDDAEDDDC@D7=D;DA)7;IIJFD(J?JJDGI(IDGD7D'3JIE;H?AC@EHJJE?JJHDFJIIIECG)GGJJECHFHHFDFDFC@C    XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:45A35A18
chr1-14 0   chr1    117644126   37  100M    *   0   0   GCATTTCATTGTGGACTAATTTTCCCCCACTATTGAGGGAAGACCCTTTTGAGTACTCTATCTGATGCCCCATGAATGATAAAGTTTTATACTCTGGCTG    C?CFBDFFGHHHJI5CJ=CD9DC-HGIJDCJHHDBEDDCC&DDDBD39DBCDDDDDCD    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-13 0   chr1    104996994   37  100M    *   0   0   TTCTTGCTGGAACACATGTTTTCACCTTTACCTTCACCCACAGCCCAATGTGCATCAATATGGAGATAATGCAGTTCCATTTATACCTCTTTGTGGTTCA    =@?FFFDDHBHHHJHJIJ)JIJHBHIJ*G:CJCJJJI?G)>GJI;JD3FJ8FJFGD;DDDDFBED7C7&A(ABC9CD+DCC&DDCA    XT:A:U  NM:i:3  X0:i:1  X1:i:0  XM:i:3  XO:i:0  XG:i:0  MD:Z:18G63G12A4
chr1-12 16  chr1    108617705   37  100M    *   0   0   AGGTCGGGGAGATTGGGAAGAAGAATGAGCAAAGAAACCACCAGTGTGATCAGAGGAGGAAAGCAAAGCAGAGTCCTGTCCTGAAAACCAAATGAAGAAA    :=>+D(DCEC=@GHB(CDDDDHABDD+HBJJ9F?A35DDIE?JJHIHJJIEE?JFJ?7JBGJJI>JJGJBJIIBIJJJIIIIJGJGJHHDFHF3FFFCB@    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-11 16  chr1    72085324    37  100M    *   0   0   TACTAGCCTTGAAAATGTTTAAAATAATATTCCAGAGTTAATATTGTTGTCCCTGGTATGTTAAAGAGTATTTGTTATCATAGCCAATTCTTGAGTCTGC    8@DDCD4D>D?C3DF(DCCHDDDA;HDEIBFCHGHHHFFIFEG1JHIJIJCGEJIHJG)IH(IJ)BDJ??FHHJHCJJIFJHJJJGIGH)2HFFFA=HFG1JJFIIJJDJJIJDIGICHEFD@D3:0A/(BECDDDDCBE>BD8DDDDDC8C    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-8  0   chr1    79478960    37  100M    *   0   0   GGCAACACTTGAGAACACAAAGTGAGTTCTCACTTTGGGCGGTGGTTTCAGGCTTCAGGGTGGAGTTTTGTCAGGAACCCAACCTTTTCTGCCTAGAATT    @CCFDFDADHHAHFIH@ICJJHI5?JIJ)GCFEIJHG=II)HIGI9JJIJGHEJHFI8EIDG)GCI4FJF?I8HCDH;DD0&3CFDDDD@C4DCD6ADD>    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:81C18
chr1-7  0   chr1    178190761   37  100M    *   0   0   GTAGCCGGAATAAACAGTCACTGTGAGTTGTCCATTTTAGAGCATAGGTTTTCAGGTGGTGAAGACCTGTCCTTAGTTGAATTTGTATGTGAATTAAACT    B?<;FDDFHHHHHJEGJFCJJIFJJJA=JHHGIIGJJIIGIGJ(D:DAFG7)&DJID9J)FCD/HHJEDFIJ@D@DDADF?C@A@ADCDD@CDDD    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:52T47
chr1-6  0   chr1    42572411    37  100M    *   0   0   AACCCTTTATCAGGTATGTATTATAAACATCGACTCTGTGGCTTGCATTTTCATTCTCCTTATATATCTTTTGATGAATCAAAGTTTTTAATTTGAATAT    BCCFFFFDAH)HHJJHIGG,HFH2JIJJ4IDI93IJJ<=JJ>IH7IJIJBIBG)CFH7DHHFAHFHEDIFBEFH;EBICA?3DD5D(DDBACDC(BADD:    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:94T5
chr1-5  0   chr1    153186635   37  100M    *   0   0   GTCTTGACTCTTTATCCACTTTGCCAGTCTGTGTCTTGTAATTGGGGCATTTAGCCTATTTACATTTAAGGTTAATATTGTTATGTGTGAATTTGATCCT    C1@?FF=FHHHHHI?JJEJFIIIHG:.?>EEJEI(JG9J'IIHIJIHIJGJGJFJ9FJAG4EEC:DADE8DAEJFCCBBCDAEDDDD-DDDD@+DBC8D+    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
chr1-4  0   chr1    145038405   0   100M    *   0   0   AGTGGAAATAATACTCGTCAACATATGCCTTTCAAAAAAATTTTTTTTCATATTTTAAATTTACCTTTACTACCTATTTATTTGGTTCAAGGCTCCATTT    C:CFFFDDHFHDHJIIJJJJ29CJ+JJJIIJIIFIG?JI08?CJJIFIFDEFDGBD>JAIDJDJ>JCBG(CG=DE5?(EDB3HDD>ED2:CCHDBJA2CCHECDEFD,DDFBCHXT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:92C5A1
chr1-2  0   chr1    62477842    37  100M    *   0   0   TAGGAAAATGGAGAAACTTTAATATGAAATCTTCCTGTTTTTCACATTATGTTTAGATTGTTACAGCATAAAATTTCAGAAACATTGCAAAAAGTTTTAA    @C=FFDFDHHHH>GJ@IEJGJIIJJJJF@JHHIIGGJICIAJFJIH)H7E?GHEDI>HHFAHC@)(D>DDDEDXT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:77C0A21
chr1-1  0   chr1    11355150    37  100M    *   0   0   ATTTATTGGCTGTCTTTCAGGCACATTTTAGCTGTCATCCAACATTCTCAACCTTAGTCCCCTTCTCTGGGCTAAGGGGAGAATGATGGTCCTACCCCAG    BC?DFFFFHH7FJJJJJFDHID)JCH=3DIJ5JGDI8@@I@A=>3<:IDCA9DDFFI(FADEBDCDCCDDB(DC>    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c50Nhs20.aln
##ART_Illumina  read_length 100
@CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 50 -o G38L100c50Nhs20 -rs 1464918709
@SQ chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38  248956422
##Header End
>chr1   chr1-50 93465784    +
TTCCACAATAGTTGAACTAATTTACAGTCCCACCAACAGTGTAAAAGTGTTCCTATTTCTCCACATCCTCTCCAGCACCTGTTGTTTCCTGACTTTTTAA
TTCCACAATAGTTGAACTAATTTACAGTCCCACCAACAGTGTAAAAGTGTTCCTATTTCTCCACATCCTCTCCAGCACCTGTTGTTTCCTGACTTTTTAA
>chr1   chr1-48 228133745   +
ATCATTGTATGCCACAGAAATAATTAAATTTCCTTGTCAACTGAGACATTATTATTAGGCACTCTCACCAGATCTTTACCCATGGCCATTTAAAGTGTGG
ATCATTGTATGCCACAGAAATAATTAAATTTCCTTGTCAACTGACACATTATTATTAGGCACTCTCACCAGATCTTTACCCATGGCCATTTAAAGTGTGG
>chr1   chr1-47 13772987    +
TTCAGTAATTCAGAATAACACATGAGGGAATGAATGAATGAATAAATAAAAAAAAAATGAATGAATAAATTAAAAAAAATTGTGTTTCAGGGAAGAAAAA
TTCAGTAATTCAGAATAACACATGAGGGAATGAATGAATGAATAAATAAAAAAAAACTGAATGAATAAATTACAAAAAATTGTGTTTCAGGGAAGAAAAA
>chr1   chr1-46 211481565   -
ATCCCGATCAGGAACCTGGTGCTCTTCCCCGGCTCTCCCAACTTGGGCAAAGTTGGAGGGGTCCTGAGGCCCGGCGGGCCGGGGAACAGGACCCCGACCC
ATCCCGATCAGGAACCTGGTGCTCTTCCCCGGCTCTCCCAACTTGGGCAAAGTTGGAGGGGTCCTGAGGCCCGGCGGACCGGGGAACAGGACCCCGACCC
>chr1   chr1-45 29056656    +
CTGGGATTACAGGTGCCCGCCACCATGCCCAGCTAATTTTTGTATTTTTGGTAGAGACAAGGTTTCACCATGTTGGCCGGGATTGTCTCGAACTCCTGAT
CTGGGATTACAGGTGCCCGCCACCATGCCCAGCTAATTTTTGTATTTTTGGTAGAGACAAGGTTTCACCATGTTGGCCGGGATTGTCTCGAACTCCTGAT
>chr1   chr1-44 49993892    +
CAATTTAGCCAAAACTGGCTAATCCTTTTACCAGAATCATTCCCATTGTTCAAGACCTATTTTAAGCTCCACTATCACCATAAAACTTTCCCGATCAGTT
CAATTTAGCCAAAACTGGCTAATCGTTTTACCAGAATCATTCCCATTGTTCAAGACCTATTTTAAGCTCCACTATCACCATAAAACTTTCCCGATCAGTT
>chr1   chr1-43 54241817    -
TTGCAGACCTTTTATTGATTGTGAAATAACCCAAGAGAACACACATGGAACCATTTTTAAAGTATCATCAAATTTAAATATGATATTATTAAAACATATT
TTGCAGACCTTTTATTGATTGTGAAATAACCCAAGAGAACACACATGGAACCATTTTTAAAGTATCATCAAATTTAAATATGATATTATTAAAACATATT
>chr1   chr1-42 35706202    +
CAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTACAGGCACGTGCCACCATGCCTGGCAATTTTTGTATTTTTAGTACAGATGGGG
CAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTACAGGCACGTGCCACCATGCCTGGCAATTTTTGTATTTTTAGTACAGATGGGG
>chr1   chr1-41 92473985    -
GGTGGGCCCATACACACAGGCACCAGGATATACCCCCGCCTGCATCCACATGGACACGTGCCCACGTTGCATGTACACGCAGACCTGCCTATGCACACAC
GGTGGGCCCATACACACAGGCACCAGGATATACCCCCGCCTGCATCCACATGGACACGTGCCCACGTTGCATGTACACGCAGACCTGCCTATGCACACAC
>chr1   chr1-40 27177039    -
GAACCATTTGGCTGAGCCCAGCACTCTGCCCATGGTCCATGCACCCCTTGACCCTGACATCAGCAAAGCAGTCCCCATCACCAAAGTGCTATGTGCCATG
GAACCATTTGGCTGAGCCCAGCACTCTGCCCATGGTCCATGCACCCCTTGACCCTGACATTAGCAAAGCAGTCCCCATCACCAAAGTGCTATGTGCCATG
>chr1   chr1-39 245060718   -
TCTTGACATTTTGGAGTTGACACCTTGGAGGTGGTTAATCCATGACGAAACTGAAGCTAAGCCGATCTCATGTTTTGACTCCTGTCAATCCGGAGAGGAC
TCTTGACATTTTGGAGTTGACACCTTGGAGGTGGTTAATCCATGACGAAACTGAAGCTAAGCCGATCTCATGTTTTGACTCCTGTCAATCCGGAGAGGAC
>chr1   chr1-38 215781397   -
TACATATACATATACATATACATACATGTGTGTGTGCATATATATGTATATGTGTGTATATATATATATATATATATATACACACATATATGTATGTGTG
CACATATACATATACATATACATACATGTGTGTGTGCATATATATGTATATGTGTGTATATATATATATATATATATATACACACATATATGTATGTGTG
>chr1   chr1-37 42831546    -
AACAACGTATGTCCACACAAAAACTTGCACATGAATGTTCACTAGCAGCATTATTTGTAACCTGCCCAAGGTGGAAACAACCCAAATGTCTATTGACTGA
AACAACGTATGTCCACACAAAAACTTGCACATGAATGTTCACTAGCAGCATTATTTGTAACCTGCCCAAGGTGGAAACAACCCAAATGTCTATTGACTGA
>chr1   chr1-36 181673625   +
TCCACTGCCCAGAAAGAGGACATCCCTTATAGGGCCAGCGGATGGAAGCCATGGGCTGGGCAGGACATTCCTGTCCCAACCCACATGGCAGCTAGAGTCC
TCCACTGCCCAGAAAGAGGACATCCCTTATAGGACCAGCGGATGGAAGCCATGGGCTGGGCAGGACATTCCTGTCCCAACCCACATGGCAGCTAGAGTCC
>chr1   chr1-35 96851543    -
GTTGTGACCTCCCAACCCCCACAGAGGTTCACGTGTTGAAGTCTTAACCCTCAGTACCTCAGAATGTAATCATATTTGAAGATATGGTATTTATAGAGGT
GTTGTGACCTCCCAACCCCCACAGAGGTTCACGTGTTGAAGTCTTAACCCTCAGTACCTCAGAATGTAATCATATTTGAAGATATTGTATTTATAGAGGT
>chr1   chr1-34 13267476    +
ATACAGGGAAGGTTTTAATCTGTTTCAGACATTAGAAATACATATATTTATATATGGTATCTTTATTGGAGAACCTTTGGCCACATCAAAAGTATCAAAA
ATACAGGGAAGGTTTTAATCTGTTTCAGACATTAGAAATACATATATTTATATATGGTATCTTTATTGGAGAACCTTTGGCCACATCAAAAGTATCAAAA
>chr1   chr1-33 48968232    +
GAATAGTAGGCAATAAACAAAGAGAGCAACTTAGGAGCCAGATCACATGTGGCCTCTCGAGCAATATGGTAAAAGTTCTGGACTTCATTCTAGGTGAATG
TAATAGTAGGCAATAAACAAAGAGAGCAACTTAGGAGCCAGATCACATGTGGCCGCTCGAGCAATATGGTAAAAGTTCTGGACTTCATTCTAGGTGAATG
>chr1   chr1-32 88980622    +
TAGTTCAGTAAACTATTTATCAAACAGGTGTCTGGTCATTTTAACATACTCCTTGCTTTGAACAATATTCATTCATATTTGGTACAAACTCTATATCCTA
TAGTTCAGTAAACTATTTATCAAACAGGTGTCAGGTCATTTTAACATACTCCTTGCTTTGAACAATATTCATTCATACTTGGTACAAACTCTATATCCTA
>chr1   chr1-31 21950729    -
TTCCCAAATTAACTAACTTCACCTAGGTTGTGAGTATTGCAGCGTCTTATTTCTAATCCCAGTACTACTATTGGTGTGAGTCAGTAAAGATGCCTGGTGA
TTCCCAAATTAACTAACTTCACCTAGGTTGTGAGTATTGCAGCGTCTTATTTCTAATCCCAGTACTACTATTGGTGTGAGTCAGTAAAGATGCCTGGTGA
>chr1   chr1-30 9852128 +
TGTGAAATGGAGTCAGCAGAGTGAGCCGGCCTCCACTCAGTGAGCCGGGTCTCCCCCACAGCCGGCATGTGCTGACCTCCTTCCAACTGCTCTACCAAGA
TGTGAAATGGAGTCAGCAGAGTGAGCCGGCCTCCACTCAGTGAGCCGGGTCTCCCCCACAGCCGGCATGTGCTGACCTCCTTCCAACTGCTCTACCAAGA
>chr1   chr1-29 92558892    -
CAGCCTGGGCAACATGGTGAAACCCCGTCTCTACTGAAAATACAAAAATTAGCCGGGCGTGGTGGCAGGTTCCTGTAATCCCAGCTACTCGGGAGGCTGA
CAGCCTGGGCAACATGGTGAAACCCCGTCTCAACTGAAAATACAAAAATTAGCCGGGCGTGGTGGCAGGTTCCTGTAATCCCAGCTACTCGGGAGGCTGA
>chr1   chr1-28 191969685   -
ACATCACCTACTGGAATTGAAACATGGTTCCTAAGAATGGGCTCTCATGGCCAGGATCCAAAAGGAAGGCCATGGTTCCACAAGGAGACCTGTTCTGAGT
ACATCACCTACTGGAATTGAAACATGGTTCCTAAGAATGGGCTCTCATGGCCAGGATCCAAAAGGAAGGCCATGGTTCCACAAGGAGACCTGTTCTGAGT
>chr1   chr1-27 76941125    -
GGATATCAACAAACTGATTCTAAGTTTTGTATGGAGAAGCAAAGAGCCCAGGATAGCCAACACAATATAAAAGAAGAACAAAGCTGGAGGACTGACACCT
GGATATCAACAAACTGATTCTAAGTTTTGTATGGAGAAGCAAAGAGCCCAGGATAGCCAACACAATATAAAAGAAGAACAAAGCTGGAGGACTGACACCT
>chr1   chr1-26 233336762   +
AGATATACAGCAAAGTTTGAAAGCTACAGTTCTGAGGACCAGATTTATGGATTCCTTCTTATATGTTATCTGGGTTGATATAGAAATTCTTCCATGGCTA
AGATATACAGCAAAGTTTGAAAGCTACAGTTCTGAGGACCATATTTATGGATTCCTTCTTATATGTTATCTGGGTTGATATAGAAATTCTTCCATGGCTA
>chr1   chr1-25 96853884    +
GCCATTCTAACTGGTGTGAGATGGTATCTCATTGTGGTTTTGATTTGCATTTCTCTGATGGCCAGTGATGGTGAGCATTTTTTCATGTGTTTTTTGGCTG
GCCATTCTAACTGGTGTGAGATGGTATCTCATTGTGGTTTTGATTTGCATTTCTCTGATGGCCAGTGATGGTGAGCATTTTTTCATGTGTTTTTTGGCTG
>chr1   chr1-24 235841968   +
GTTGGCTACTAGCTTAGCAGAGGTGGAAAACCATGAATTTCTGGTGGTATGGATTTTTTCAGCTATTTCAGATTCACCAGCAGGATTCAGCTGCTTGGGT
GTTGGCTACTAGCTTAGCAGAGGTGCAAAACCATGAATTTCTGGTGGTATGGATTTTTTCAGCTATTTCAGATTCACCAGCAGGATCCAGCTGCTTGGGT
>chr1   chr1-23 96545357    +
AGTGAAAAAGGCTGGCTGCCCTTCAATATCATCTTCAAATGTTAACAACACTGAATATTAATAAATTTCCTTTAGCGAATAATGAATCCAGCCTTCCTTA
AGTGAAAAAGGCTGGCTGCCCTTCAATATCATCTTCAAATGTTAACAACACTGAATATTAATAAATTTCCTTTAGCGAATAATGAATCCAGCCTTCCTTA
>chr1   chr1-22 80270678    +
TTGTACACCCTATTTCTGACCAGAAGAAGGAGCATTTTGCTTTTTGCCAAATGAGAAGTGCATTCTGGAAACACTTGATGCCTGCACCACACCTCGAGTT
TTGTACACCCTATTTCTGACCAGAAGAAGGAGCATTTTGCTTTTTGCCAAATGAGAAGTGCATTCTGGAAACACTTGATGCCTGCACCACACCTCGAGTT
>chr1   chr1-21 35923260    +
CTAAGCAGCAGTGTTTTTGGATACTTTTTTTTTCTGTTTGTGAATAAGGCCAGCACTCAAGATGGGCAGCCAAGGGTGCACTGACTATTAGCTGGCCCAT
CTAAGCAGCAGTGTTTTTGGATACTTTTTTTTTCTGTTTGTGAATAAGGCCAGCACTCAAGATGGGCAGCCAAGGGTGCACTGACTATTAGCTGGCCCAT
>chr1   chr1-20 136467133   -
ACTGTGCCTGGCCTTTTTTTTTCTTTTTCTTTTTTTTACACTCTCATGTTAAAAAAAAATCTGTCCTTGTTACTATATAGATGTGCATAGTTCATTCCCT
ACTGTGCCTGGCCTTTTTTTTTCTTTTTCTTTTTTTTACACTCTCATGTTAAAAAAAAATCTGTCCCTGTTACTATATAGATGTGCATAGTTCATTCCCT
>chr1   chr1-19 160371243   +
AGGCCCTGGGCACAGGCAGAGAGCCCACCGGCTGGTCATGAGGGCCTCTTCCTTTCTCTGACCCAGGCACCTCGAGGGCTATTCTCCTGGGTTCCTTCCG
AGGCCCTGGGCACAGGCAGAGAGCCCACCGGCTGGTCATGAGGGCCTCTTCCTTTCTCTGACCCAGGCACCTCGAGGGCTCTTCTCCTGGGTTCCTTCCG
>chr1   chr1-18 69100488    -
AGTGCTGGTATTACAGGTGTGAGCCACTGTGCCTGGTCAGCAGTACTTCACTGCTATAATTTTACAACTTTTAAAGATAACCTAATTTATTTTAATTGCT
AGTGCTGGTATTACAGGTGTGAGCCACTGTGCCTGGTCAACAGTACTTCACTGCTATAATTTTACAACTTTTAAAGATACCCTAATTTATTTTAATTGCT
>chr1   chr1-17 41500328    -
GGAGTGAAATAATGAAAATATATATATTTCTTAATTTTTTGTTTTCATAATTGGAAAGTGACCTATAGGTCACTCTTTAACCTTTCCAATCATAAGAACC
GGAGTGAAATAATGAAAATATATATATTTCTTAATTTTTTGTTTTCATAATTGGAAAGTGACCTATAGGTCACTCTTTAACCTTTCCAATCATAAGAACC
>chr1   chr1-16 114154602   +
TGTATCTTTCTGCTAAGCATAACAAGAAAGACAGAAAGCTCAACGGGAGGATTGAGGCTAGACTTAAAGTAGAGATCCCCTCAGAAACTGTGGAGTGAGG
TGTATCTTTCTGCTAAGCATAACAAGAAAGACAGAAAGCTCAACGGGAGGATTGAGGCTAGACTTAAAGTAGAGATCCCCTCAGAAACTGTGGAGTGAGG
>chr1   chr1-15 79188743    -
TGCCTAGAGTGTCAAACATCCTTTATCATTTTAAAATCTAAACTCCAAAATCCTTCCTATTAGGGCCTTTCCTCCCTCGTTTCCTTTCCTCTCCCCCACC
TGCCTAGAGTGTCAAACAACCTTTATCATTTTAAAATCTAAACTCCAAAATCCTCCCTATTAGGGCCTTTCCTCCCTCGTTTCCTTTCCTCTCCCCCACC
>chr1   chr1-14 117644125   +
GCATTTCATTGTGGACTAATTTTCCCCCACTATTGAGGGAAGACCCTTTTGAGTACTCTATCTGATGCCCCATGAATGATAAAGTTTTATACTCTGGCTG
GCATTTCATTGTGGACTAATTTTCCCCCACTATTGAGGGAAGACCCTTTTGAGTACTCTATCTGATGCCCCATGAATGATAAAGTTTTATACTCTGGCTG
>chr1   chr1-13 104996993   +
TTCTTGCTGGAACACATGGTTTCACCTTTACCTTCACCCACAGCCCAATGTGCATCAATATGGAGATAATGCAGTTCCATTTGTACCTCTTTGTGATTCA
TTCTTGCTGGAACACATGTTTTCACCTTTACCTTCACCCACAGCCCAATGTGCATCAATATGGAGATAATGCAGTTCCATTTATACCTCTTTGTGGTTCA
>chr1   chr1-12 140338618   -
TTTCTTCATTTGGTTTTCAGGACAGGACTCTGCTTTGCTTTCCTCCTCTGATCACACTGGTGGTTTCTTTGCTCATTCTTCTTCCCAATCTCCCCGACCT
TTTCTTCATTTGGTTTTCAGGACAGGACTCTGCTTTGCTTTCCTCCTCTGATCACACTGGTGGTTTCTTTGCTCATTCTTCTTCCCAATCTCCCCGACCT
>chr1   chr1-11 176870999   -
GCAGACTCAATAATTGGCTATGATAACAAATACTCTTTCACATACCAGGGACAACAATATTAACTCTGGAATATTATTTTAAACATTTTCAAGGCTAGTA
GCAGACTCAAGAATTGGCTATGATAACAAATACTCTTTAACATACCAGGGACAACAATATTAACTCTGGAATATTATTTTAAACATTTTCAAGGCTAGTA
>chr1   chr1-10 34644993    -
AGTCCAGAATTCTGAGTCTGTGAGTTTACACCTTCCAACAGTGATAATCAGATATCAAGCTTGAAGACTACCAACAAAAGTGGACCAAATAGGGATCATC
AGTCCAGAATTCTGAGTCTGTGAGTTTACACCTTCCAACAGTGATAATCAGATATCAAGCTTGAAGACTACCAACAAAAGTGGACCAAATAGGGATCACC
>chr1   chr1-9  152012628   +
AAGCAATTCTCTTGCTTTAGCCTCCCGAGAAGCTCGGATTACAGGCATGTCCACCACACCCAGCTAATTCTTTTGTATTTTTAGTAGACATGGGGTTTTG
AAGCAATTCTCTTGCTTTAGCCTCCCGAGAAGCTCGGATTACAGGCATGTCCACCACACCCAGCTAATTCTTTTGTATTTTTAGTAGACATGGGGTTTTG
>chr1   chr1-8  79478959    +
GGCAACACTTGAGAACACAAAGTGAGTTCTCACTTTGGGCGGTGGTTTCAGGCTTCAGGGTGGAGTTTTGTCAGGAACCCACCCTTTTCTGCCTAGAATT
GGCAACACTTGAGAACACAAAGTGAGTTCTCACTTTGGGCGGTGGTTTCAGGCTTCAGGGTGGAGTTTTGTCAGGAACCCAACCTTTTCTGCCTAGAATT
>chr1   chr1-7  178190760   +
GTAGCCGGAATAAACAGTCACTGTGAGTTGTCCATTTTAGAGCATAGGTTTTTAGGTGGTGAAGACCTGTCCTTAGTTGAATTTGTATGTGAATTAAACT
GTAGCCGGAATAAACAGTCACTGTGAGTTGTCCATTTTAGAGCATAGGTTTTCAGGTGGTGAAGACCTGTCCTTAGTTGAATTTGTATGTGAATTAAACT
>chr1   chr1-6  42572410    +
AACCCTTTATCAGGTATGTATTATAAACATCGACTCTGTGGCTTGCATTTTCATTCTCCTTATATATCTTTTGATGAATCAAAGTTTTTAATTTTAATAT
AACCCTTTATCAGGTATGTATTATAAACATCGACTCTGTGGCTTGCATTTTCATTCTCCTTATATATCTTTTGATGAATCAAAGTTTTTAATTTGAATAT
>chr1   chr1-5  153186634   +
GTCTTGACTCTTTATCCACTTTGCCAGTCTGTGTCTTGTAATTGGGGCATTTAGCCTATTTACATTTAAGGTTAATATTGTTATGTGTGAATTTGATCCT
GTCTTGACTCTTTATCCACTTTGCCAGTCTGTGTCTTGTAATTGGGGCATTTAGCCTATTTACATTTAAGGTTAATATTGTTATGTGTGAATTTGATCCT
>chr1   chr1-4  127714516   -
AGTGGAAATAATACTCGTCAACATATGCCTTTCAAAAAAATTTTTTTTCATATTTTAAATTTACCTTTACTACCTATTTATTTGGTTCAAGGCTCCATTT
AGTGGAAATAATACTCGTCAACATATGCCTTTCAAAAAAATTTTTTTTCATATTTTAAATTTACCTTTACTACCTATTTATTTGGTTCAAGGCTCCATTT
>chr1   chr1-3  84685020    +
TATTATTAAAACTATAAATGGACCAATTAAACAAACGTGTCATGAGCCAAGGAATATAAACTAATTCTTTACACCTGAAGTCCTTTAAAATGCTTTAAAT
TATTATTAAAACTATAAATGGACCAATTAAACAAACGTGTCATGAGCCAAGGAATATAAACTAATTCTTTACACCTGAAGTCCTTTAAAATGATTTAATT
>chr1   chr1-2  62477841    +
TAGGAAAATGGAGAAACTTTAATATGAAATCTTCCTGTTTTTCACATTATGTTTAGATTGTTACAGCATAAAATTTCCAAAACATTGCAAAAAGTTTTAA
TAGGAAAATGGAGAAACTTTAATATGAAATCTTCCTGTTTTTCACATTATGTTTAGATTGTTACAGCATAAAATTTCAGAAACATTGCAAAAAGTTTTAA
>chr1   chr1-1  11355149    +
ATTTATTGGCTGTCTTTCAGGCACATTTTAGCTGTCATCCAACATTCTCAACCTTAGTCCCCTTCTCTGGGCTAAGGGGAGAATGATGGTCCTACCCCAG
ATTTATTGGCTGTCTTTCAGGCACATTTTAGCTGTCATCCAACATTCTCAACCTTAGTCCCCTTCTCTGGGCTAAGGGGAGAATGATGGTCCTACCCCAG
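
The .aln listing above has a simple layout: header lines up to `##Header End`, then three lines per read (a `>` line with reference name, read id, position and strand, followed by the reference sequence and the read sequence). Below is a minimal parser sketch, assuming this three-line layout and whitespace-separated fields as displayed here. The names AlnRecord and parse_aln are illustrative and not part of ART.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class AlnRecord:
    ref_name: str
    read_id: str
    pos: int        # position as written by ART (counted from 0)
    strand: str     # '+' or '-'
    ref_seq: str
    read_seq: str

    def mismatches(self) -> int:
        """Number of aligned columns where read and reference differ.

        Gap columns ('-'), if present, are counted as differences too.
        """
        return sum(a != b for a, b in zip(self.ref_seq, self.read_seq))


def parse_aln(handle: TextIO) -> Iterator[AlnRecord]:
    """Yield records from an ART .aln stream, skipping the header block."""
    lines = (line.rstrip("\n") for line in handle)
    for line in lines:
        if not line.startswith(">"):
            continue  # header lines ('##...', '@CM', '@SQ') and blank lines
        ref_name, read_id, pos, strand = line[1:].split()
        # The two lines after a '>' line are the reference and the read.
        ref_seq = next(lines, "")
        read_seq = next(lines, "")
        yield AlnRecord(ref_name, read_id, int(pos), strand, ref_seq, read_seq)


if __name__ == "__main__":
    demo = io.StringIO(
        "##ART_Illumina  read_length 4\n"
        "##Header End\n"
        ">chr1\tchr1-1\t10\t+\n"
        "ACGT\n"
        "ACCT\n"
    )
    for rec in parse_aln(demo):
        print(rec.read_id, rec.pos, rec.strand, rec.mismatches())  # chr1-1 10 + 1
```

For substitution-only reads, the mismatch count per read can be compared with the NM tag in the bwa output, for example chr1-48, where the two .aln lines differ in one base and bwa reports NM:i:1.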
