ADAM Study Notes 2: Using adam-shell
Environment:
Cluster: Ubuntu 14.04 + Spark 1.5.2 + Scala 2.10
// Local: Windows 7 64-bit + Eclipse 4.3.2 + Scala 2.10.4
Code:
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}

val ac = new ADAMContext(sc)

// Load alignments from disk
val reads = ac.loadAlignments(
  "/xubo/adam/output/small.adam",
  projection = Some(
    Projection(
      AlignmentRecordField.sequence,
      AlignmentRecordField.readMapped,
      AlignmentRecordField.mapq
    )
  )
)

// Generate, count and sort 21-mers
val kmers = reads.flatMap(_.getSequence.sliding(21).map(k => (k, 1L))).reduceByKey(_ + _).map(_.swap).sortByKey(ascending = false)

// Print the top 10 most common 21-mers
kmers.take(10).foreach(println)
Script path: hadoop@Mcnode1:~/cloud/adam/xubo/testAdam34/kmer.scala
Output:
hadoop@Mcnode1:~/cloud/adam/xubo/testAdam34$ adam-shell -i kmer.scala -i kmer.scala --jars /home/hadoop/cloud/adam/adam-cli/target/adam-cli_2.10-0.18.3-SNAPSHOT.jar
Using SPARK_SHELL=/home/hadoop/cloud/spark-1.5.2//bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
Loading kmer.scala...
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}
ac: org.bdgenomics.adam.rdd.ADAMContext = org.bdgenomics.adam.rdd.ADAMContext@31264ff
reads: org.apache.spark.rdd.RDD[org.bdgenomics.formats.avro.AlignmentRecord] = MapPartitionsRDD[1] at map at ADAMContext.scala:167
kmers: org.apache.spark.rdd.RDD[(Long, String)] = ShuffledRDD[5] at sortByKey at <console>:27
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
(4,TCTTTCTTTCTTTCTTTCTTT)
(4,TTTCTTTCTTTCTTTCTTTCT)
(3,CTTTCTTTCTTTCTTTCTTTC)
(3,TTCTTTCTTTCTTTCTTTCTT)
(2,TCTTTTTCTTTCTTTCTTTCT)
(2,TTCTTTTTCTTTCTTTCTTTC)
(2,TTTCTTTTTCTTTCTTTCTTT)
(1,ATTGGATATCCTCCCAAATTT)
(1,AGGCATGAGGCACCGCGCCTG)
(1,CTACTGCCCAACAAGTCCCTA)
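The k-mer counting step in kmer.scala (slide a window over each read, count each window, swap to (count, k-mer), sort descending) can be tried on a small scale with plain Scala collections, without Spark or ADAM. The following is only a minimal sketch under stated assumptions: the object KmerSketch, the method countKmers, and the toy reads are hypothetical names and data introduced here for illustration, and groupBy stands in locally for reduceByKey. The length filter is also an addition that the original script does not have: Scala's sliding returns a single shorter window when a string is shorter than the window size, so a read shorter than k would otherwise contribute a truncated "k-mer".

```scala
// Plain-Scala sketch of the k-mer counting logic (no Spark, no ADAM).
// KmerSketch, countKmers and the toy reads are hypothetical, for illustration only.
object KmerSketch {

  // Returns (count, k-mer) pairs, most frequent first; ties are broken alphabetically.
  def countKmers(reads: Seq[String], k: Int): Seq[(Long, String)] =
    reads
      .flatMap(_.sliding(k))      // every window of width k in every read
      .filter(_.length == k)      // assumption: drop truncated windows from short reads
      .groupBy(identity)          // local stand-in for reduceByKey(_ + _)
      .toSeq                      // leave the Map before mapping, so equal counts are not merged
      .map { case (kmer, occurrences) => (occurrences.size.toLong, kmer) } // like map(_.swap)
      .sortBy { case (count, kmer) => (-count, kmer) }                     // descending by count

  def main(args: Array[String]): Unit = {
    // Two toy reads; the first contains the same TCTT/CTTT repeat pattern twice.
    val reads = Seq("TCTTTCTTT", "TTTCA")
    countKmers(reads, 4).foreach(println)
  }
}
```

With the two toy reads and k = 4, the repeated 4-mers CTTT, TCTT and TTTC each get a count of 2, while TTCA and TTCT appear once. This mirrors the real output above, where the top 21-mers are overlapping windows of the same TCTT repeat.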
References
[1] https://github.com/bigdatagenomics/adam