基因数据处理1之mapping_to_cram
参考资料:
We will use the first 100,000 read-pairs from a yeast data set.
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta
We need to ensure there exists a .fai fasta index and also indices for whichever aligner we are using (Bwa-mem in this example).
samtools faidx yeast.fasta
bwa index yeast.fasta
The aligner is likely to output SAM in the same order or similar order to the input fastq files. It won’t be outputting in chromosome position order, so the output is typically not well suited to CRAM.
bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' yeast.fasta y1.fastq y2.fastq > yeast.sam
The -R option adds a read-group line and applies that read-group to all aligned sequence records. It is not necessary, but a recommended practice.
Ideally at this point we would be outputting CRAM directly, but at present samtools 1.0 does not have a way to indicate the reference on the command line. We can output to BAM instead and convert (below), or modify the SAM @SQ header to include MD5 sums in the M5: field.
samtools sort -O bam -T /tmp -l 0 -o yeast.bam yeast.sam
The “-l 0” indicates to use no compression in the BAM file, as it is transitory and will be replaced by CRAM soon. We may wish to use -l 1 if disk space is short and we wish to reduce temporary file size.
samtools view -T yeast.fasta -C -o yeast.cram yeast.bam
Note that since the BAM file did not have M5 tags for the reference sequences, they are computed by Samtools and added to the CRAM. In a production environment, this step can be avoided by ensuring that the M5 tags are already in the SAM/BAM header.
The last 3 steps can be combined into a pipeline to reduce disk I/O:
bwa mem yeast.fasta y1.fastq y2.fastq | \
samtools sort -O bam -l 0 -T /tmp - | \
samtools view -T yeast.fasta -C -o yeast.cram -
See the variant calling workflow for more advanced examples.
samtools view yeast.cram
samtools mpileup -f yeast.fasta yeast.cram
有问题:
改正:原下载没有夹gzip -d
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d > y1.fastq
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d > y2.fastq
curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta
参考【1】中第三个没有加 |gzip -d运行记录:
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 537M 0 842k 0 0 5753 0 27:13:40 0:02:30 27:11:10 23013 curl: (23) Failed writing body (1496 != 4344) hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 559M 0 932k 0 0 6937 0 23:28:57 0:02:17 23:26:40 193k curl: (23) Failed writing body (11536 != 16384) hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 3782k 100 3782k 0 0 18197 0 0:03:32 0:03:32 --:--:-- 47280
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 537M 0 848k 0 0 6525 0 24:00:23 0:02:13 23:58:10 188k
curl: (23) Failed writing body (1000 != 1448)
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 559M 0 937k 0 0 6989 0 23:18:28 0:02:17 23:16:11 196k
curl: (23) Failed writing body (6872 != 16384)
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3782k 100 3782k 0 0 26203 0 0:02:27 0:02:27 --:--:-- 254k
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ls
y1.fastq y2.fastq yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ cd ../
hadoop@Mcnode6:~/cloud/adam/xubo$ ls
1000genomes autoconf-archive bwa-0.7.12 htslib resources samtools yeast yeast201603101121 yeast201603101125
hadoop@Mcnode6:~/cloud/adam/xubo$ cp yeast
yeast/ yeast201603101121/ yeast201603101125/
hadoop@Mcnode6:~/cloud/adam/xubo$ cp yeast201603101125/ yeast0
cp: omitting directory ‘yeast201603101125/’
hadoop@Mcnode6:~/cloud/adam/xubo$ cp -r yeast201603101125 yeast
yeast/ yeast201603101121/ yeast201603101125/
hadoop@Mcnode6:~/cloud/adam/xubo$ cp -r yeast201603101125 yeast0
hadoop@Mcnode6:~/cloud/adam/xubo$ ls
1000genomes autoconf-archive bwa-0.7.12 htslib resources samtools yeast yeast0 yeast201603101121 yeast201603101125
hadoop@Mcnode6:~/cloud/adam/xubo$ cd yeast201603101125
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ls
y1.fastq y2.fastq yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 18M
drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 11:53 ./
drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools faidx yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa index yeast.fasta
[bwa_index] Pack FASTA... 0.11 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 7.49 seconds elapse.
[bwa_index] Update BWT... 0.11 sec
[bwa_index] Pack forward-only FASTA... 0.07 sec
[bwa_index] Construct SA from BWT and Occ... 2.94 sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa index yeast.fasta
[main] Real time: 13.239 sec; CPU: 10.718 sec
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 38M
drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 11:58 ./
drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' yeast.fasta y1.fastq y2.fastq > yeast.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 50000 sequences (1800000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (103, 4007, 13219, 91)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (240, 2467, 3114)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8862)
[M::mem_pestat] mean and std.dev: (1893.02, 1414.42)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11736)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (243, 275, 314)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (101, 456)
[M::mem_pestat] mean and std.dev: (278.34, 50.30)
[M::mem_pestat] low and high boundaries for proper pairs: (30, 527)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (3111, 3320, 3565)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (2203, 4473)
[M::mem_pestat] mean and std.dev: (3352.46, 321.09)
[M::mem_pestat] low and high boundaries for proper pairs: (1749, 4927)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (291, 2611, 3188)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8982)
[M::mem_pestat] mean and std.dev: (2036.04, 1501.65)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11879)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 50000 reads in 7.764 CPU sec, 8.119 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem -R @RG\tID:foo\tSM:bar\tLB:library1 yeast.fasta y1.fastq y2.fastq
[main] Real time: 8.523 sec; CPU: 7.867 sec
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll
total 47020
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 11:59 ./
drwxrwxr-x 12 hadoop hadoop 4096 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1341 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12157188 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 3039278 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 6078608 3月 10 11:58 yeast.fasta.sa
-rw-rw-r-- 1 hadoop hadoop 8366688 3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 46M
drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 11:59 ./
drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa
-rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools sort -O bam -T /tmp -l 0 -o yeast.bam yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 53M
drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 12:00 ./
drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 6.6M 3月 10 12:00 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa
-rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view -T yeast.fasta -C -o yeast.cram yeast.bam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 54M
drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 12:01 ./
drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 6.6M 3月 10 12:00 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 945K 3月 10 12:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa
-rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa mem yeast.fasta y1.fastq y2.fastq | \
> samtools sort -O bam -l 0 -T /tmp - | \
> samtools view -T yeast.fasta -C -o yeast.cram -
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 50000 sequences (1800000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (103, 4007, 13219, 91)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (240, 2467, 3114)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8862)
[M::mem_pestat] mean and std.dev: (1893.02, 1414.42)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11736)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (243, 275, 314)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (101, 456)
[M::mem_pestat] mean and std.dev: (278.34, 50.30)
[M::mem_pestat] low and high boundaries for proper pairs: (30, 527)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (3111, 3320, 3565)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (2203, 4473)
[M::mem_pestat] mean and std.dev: (3352.46, 321.09)
[M::mem_pestat] low and high boundaries for proper pairs: (1749, 4927)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (291, 2611, 3188)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8982)
[M::mem_pestat] mean and std.dev: (2036.04, 1501.65)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11879)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 50000 reads in 7.820 CPU sec, 8.093 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem yeast.fasta y1.fastq y2.fastq
[main] Real time: 8.217 sec; CPU: 7.886 sec
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 54M
drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 12:01 ./
drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 6.6M 3月 10 12:00 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 945K 3月 10 12:03 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa
-rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |wc -l
50000
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |head -10
SRR507778.19213 147 I 62 60 36M = 3183 3087 ATCCTAACACTACCCTAACACAGCCCTAATCTAACC 15=@9:@C3GGGDGGBGGGGGCIIIIGIIIIIEIIGIIIII AS:i:36 XS:i:26 MD:Z:36 NM:i:0
SRR507778.11604 83 I 402 60 36M = 3869 3433 CTCACTTGTATACTGATTTTACGTACGCACACGGAT GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.10609 83 I 2661 40 36M = 6131 3436 TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII AS:i:36 XS:i:36 MD:Z:36 NM:i:0
SRR507778.6249 147 I 2925 60 36M = 6404 3445 TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT GGGGADIGIHIIHHHIEHIHIHI5833+?=8>B@FBF?9B7AGGGB AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20233 83 I 3132 60 36M = 6388 3222 TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.19213 99 I 3183 60 36M = 62 -3087 ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGGGGBD4DDGGEDGDDG@GAA1CBEEEE3D AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20882 133 I 3259 0 * = 3259 0 GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG EGBG8GCB8BBBB####################### AS:i:0 XS:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |wc -l
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
1381594
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |head -10
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
I 62 a 1 ^], 1
I 63 T 1 , 5
I 64 C 1 , =
I 65 C 1 , @
I 66 T 1 , 9
I 67 A 1 , :
I 68 A 1 , @
I 69 C 1 , C
I 70 A 1 , 3
I 71 C 1 , <
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll
total 54728
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 12:01 ./
drwxrwxr-x 12 hadoop hadoop 4096 3月 10 11:57 ../
-rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:27 y1.fastq
-rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:29 y2.fastq
-rw-rw-r-- 1 hadoop hadoop 6919225 3月 10 12:00 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 967382 3月 10 12:03 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 11:55 yeast.fasta
-rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb
-rw-rw-r-- 1 hadoop hadoop 1341 3月 10 11:58 yeast.fasta.ann
-rw-rw-r-- 1 hadoop hadoop 12157188 3月 10 11:58 yeast.fasta.bwt
-rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai
-rw-rw-r-- 1 hadoop hadoop 3039278 3月 10 11:58 yeast.fasta.pac
-rw-rw-r-- 1 hadoop hadoop 6078608 3月 10 11:58 yeast.fasta.sa
-rw-rw-r-- 1 hadoop hadoop 8366688 3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |wc -l
50000
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |head -10
SRR507778.19213 147 I 62 60 36M = 3183 3087 ATCCTAACACTACCCTAACACAGCCCTAATCTAACC 15=@9:@C3GGGDGGBGGGGGCIIIIGIIIIIEIIGIIIII AS:i:36 XS:i:26 MD:Z:36 NM:i:0
SRR507778.11604 83 I 402 60 36M = 3869 3433 CTCACTTGTATACTGATTTTACGTACGCACACGGAT GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.10609 83 I 2661 40 36M = 6131 3436 TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII AS:i:36 XS:i:36 MD:Z:36 NM:i:0
SRR507778.6249 147 I 2925 60 36M = 6404 3445 TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT GGGGADIGIHIIHHHIEHIHIHI5833+?=8>B@FBF?9B7AGGGB AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20233 83 I 3132 60 36M = 6388 3222 TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.19213 99 I 3183 60 36M = 62 -3087 ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGGGGBD4DDGGEDGDDG@GAA1CBEEEE3D AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20882 133 I 3259 0 * = 3259 0 GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG EGBG8GCB8BBBB####################### AS:i:0 XS:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |wc -l
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
1381594
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |head -10
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
I 62 a 1 ^], 1
I 63 T 1 , 5
I 64 C 1 , =
I 65 C 1 , @
I 66 T 1 , 9
I 67 A 1 , :
I 68 A 1 , @
I 69 C 1 , C
I 70 A 1 , 3
I 71 C 1 , <
【1】 http://www.htslib.org/workflow/#mapping_to_cram
【2】 http://www.htslib.org/doc/samtools.html
下载文件:
指令:
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta
运行:
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 537M 0 842k 0 0 5753 0 27:13:40 0:02:30 27:11:10 23013
curl: (23) Failed writing body (1496 != 4344)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 559M 0 932k 0 0 6937 0 23:28:57 0:02:17 23:26:40 193k
curl: (23) Failed writing body (11536 != 16384)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3782k 100 3782k 0 0 18197 0 0:03:32 0:03:32 --:--:-- 47280