Adam Learning, Part 7: Completing the kmer.scala Code (Counting and SaveAsFile)
Code:
package testAdam

import org.apache.spark._
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}
import java.text.SimpleDateFormat
import java.util.Date

object kmer {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("test Adam kmer").setMaster("local")
    // val conf = new SparkConf().setAppName("test Adam kmer") // cluster mode: let spark-submit set the master
    val sc = new SparkContext(conf)
    val ac = new ADAMContext(sc)

    // Load alignments from disk, projecting only the fields we need
    // val reads = ac.loadAlignments("/data/NA21144.chrom11.ILLUMINA.adam",
    // val reads = ac.loadAlignments("/xubo/adam/output/small.adam",
    val reads = ac.loadAlignments(
      "hdfs://Master:9000/xubo/adam/output/small.adam",
      projection = Some(Projection(
        AlignmentRecordField.sequence,
        AlignmentRecordField.readMapped,
        AlignmentRecordField.mapq)))

    // Generate, count and sort 21-mers
    val kmers = reads
      .flatMap(_.getSequence.sliding(21).map(k => (k, 1L)))
      .reduceByKey(_ + _)
      .map(_.swap)
      .sortByKey(ascending = false)

    // Print the top 10 most common 21-mers
    kmers.take(10).foreach(println)

    // SaveAsFile: timestamp the output directory so repeated runs do not collide
    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    val soutput = "hdfs://Master:9000/xubo/adam/output/kmer/" + iString + "/smallkmers21.adam"
    println("kmers.count(reduceByKey):" + kmers.count)
    kmers.saveAsTextFile(soutput)

    // Summing the per-k-mer counts recovers the total number of 21-mers before reduceByKey
    val sum0 = for ((a, b) <- kmers) yield a
    println("kmers.count(no reduce):" + sum0.sum)

    sc.stop()
  }
}
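The heart of the job is the line that builds kmers: sliding(21) enumerates every 21-character window of a read's sequence, reduceByKey(_ + _) sums the per-k-mer counts, and map(_.swap).sortByKey(ascending = false) orders the k-mers by frequency. Here is a minimal sketch of the same logic without Spark or ADAM, just to make the windowing visible; the read string and k = 5 are made up for illustration (the article uses k = 21):

object KmerSketch {
  def main(args: Array[String]): Unit = {
    // Made-up read sequence, standing in for record.getSequence
    val read = "TCTTTCTTTCT"
    val k = 5 // the article uses k = 21; 5 keeps the example readable

    // sliding(k) yields every k-length window of the string;
    // groupBy + size is a local stand-in for Spark's reduceByKey(_ + _)
    val counts = read.sliding(k).toSeq
      .groupBy(identity)
      .mapValues(_.size.toLong)

    // Swap to (count, kmer) and sort descending, mirroring
    // map(_.swap).sortByKey(ascending = false)
    counts.toSeq.map(_.swap).sortBy(-_._1).foreach(println)
  }
}

Run locally, this prints (2,TCTTT), (2,CTTTC) and (2,TTTCT) in some order, followed by (1,TTCTT), the same (count, kmer) shape that take(10) prints in the Spark version.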
Note: Master in the HDFS URIs above must be the real IP address (or a resolvable hostname) of the master node.
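If hard-coding the address is undesirable, the URI can be assembled from a setting instead. A minimal sketch, assuming a made-up HDFS_MASTER environment variable and a placeholder IP, neither of which comes from this post:

object HdfsPath {
  def main(args: Array[String]): Unit = {
    // HDFS_MASTER is a made-up environment variable and 192.168.1.100 a
    // placeholder default; substitute your own master node's address
    val master = sys.env.getOrElse("HDFS_MASTER", "192.168.1.100")
    val input = s"hdfs://$master:9000/xubo/adam/output/small.adam"
    println(input)
  }
}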
Run output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/G:/149/jar%e9%87%8d%e8%a6%81/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/1win7/java/otherJar/adam-cli_2.10-0.18.3-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-03-07 11:13:28 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-03-07 11:13:34 WARN MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
2016-03-07 11:13:38 WARN :139 - Your hostname, xubo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:16c%17, but we couldn't find any external IP address!
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
(4,TCTTTCTTTCTTTCTTTCTTT)
(4,TTTCTTTCTTTCTTTCTTTCT)
(3,CTTTCTTTCTTTCTTTCTTTC)
(3,TTCTTTCTTTCTTTCTTTCTT)
(2,TCTTTTTCTTTCTTTCTTTCT)
(2,TTCTTTTTCTTTCTTTCTTTC)
(2,TTTCTTTTTCTTTCTTTCTTT)
(1,ATTGGATATCCTCCCAAATTT)
(1,AGGCATGAGGCACCGCGCCTG)
(1,CTACTGCCCAACAAGTCCCTA)
kmers.count(reduceByKey):1087
kmers.count(no reduce):1100.0
2016-3-7 11:14:15 INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
2016-3-7 11:14:16 WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2016-3-7 11:14:17 INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 20 records.
2016-3-7 11:14:17 INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
2016-3-7 11:14:17 INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 69 ms. row count = 20
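The two counts at the end differ for a reason worth spelling out: kmers.count runs after reduceByKey, so 1087 is the number of distinct 21-mers, while summing the per-k-mer counts gives 1100, the total number of 21-mer occurrences (printed as 1100.0 because RDD.sum() returns a Double). The Parquet log reports 20 records read, and 1100 / 20 = 55 windows per read, which would be consistent with 75 bp reads, since 75 - 21 + 1 = 55.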