More code is available at: https://github.com/xubo245/SparkLearning
1. Explanation
`connectedComponents` source: returns a graph that keeps the vertex ids, but replaces each vertex attribute with the lowest vertex id of that vertex's connected component (the original attributes are discarded):
```scala
/**
 * Compute the connected component membership of each vertex and return a graph with the vertex
 * value containing the lowest vertex id in the connected component containing that vertex.
 *
 * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
 */
def connectedComponents(): Graph[VertexId, ED] = {
  ConnectedComponents.run(graph)
}
```
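The semantics described in the doc comment can be sketched outside of Spark with a plain union-find over an edge list: every vertex ends up labeled with the smallest id in its component. This is only an illustration of the contract (GraphX actually implements it with Pregel-style message passing), using the same edge list as the example graph below:

```scala
// Minimal union-find sketch of connectedComponents' semantics.
// Plain Scala, no Spark -- an illustration, not GraphX's implementation.
object CCSketch {
  def components(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val parent = scala.collection.mutable.Map(vertices.map(v => v -> v): _*)
    // Find the root of v, with path compression.
    def find(v: Long): Long =
      if (parent(v) == v) v else { parent(v) = find(parent(v)); parent(v) }
    // Merge components so that the smaller root id always wins.
    for ((s, d) <- edges) {
      val (rs, rd) = (find(s), find(d))
      if (rs != rd) parent(math.max(rs, rd)) = math.min(rs, rd)
    }
    vertices.map(v => v -> find(v)).toMap
  }

  def main(args: Array[String]): Unit = {
    // The same edge list as the example graph in section 2.
    val edges = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L), (4L, 0L), (5L, 0L))
    // All six vertices reach vertex 0, so every label is 0 --
    // matching the ccGraph vertices printed in section 3.
    println(CCSketch.components(edges))
  }
}
```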
`mask` source: returns the subgraph common to the current graph and the `other` graph, keeping the current graph's attributes:
```scala
/**
 * Restricts the graph to only the vertices and edges that are also in `other`, but keeps the
 * attributes from this graph.
 *
 * @param other the graph to project this graph onto
 * @return a graph with vertices and edges that exist in both the current graph and `other`,
 *         with vertex and edge data from the current graph
 */
def mask[VD2: ClassTag, ED2: ClassTag](other: Graph[VD2, ED2]): Graph[VD, ED]
```
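Again outside of Spark, the projection that `mask` performs on the vertex set can be sketched with plain `Map`s: keep only the ids present in both graphs, but take the attribute values from the first ("current") graph. The values below mirror the example in sections 2 and 3; this is a hand-rolled illustration of the contract, not GraphX code:

```scala
// Sketch of mask's contract on vertex sets: intersect on ids,
// keep attributes from the current graph.
object MaskSketch {
  def maskVertices[A, B](current: Map[Long, A], other: Map[Long, B]): Map[Long, A] =
    current.filter { case (id, _) => other.contains(id) }

  def main(args: Array[String]): Unit = {
    // Component labels as connectedComponents would assign them (all 0 here),
    // and the "valid" vertex set after dropping the missing vertex 0.
    val ccLabels = Map(0L -> 0L, 2L -> 0L, 3L -> 0L, 4L -> 0L, 5L -> 0L, 7L -> 0L)
    val valid    = Map(2L -> "istoica", 3L -> "rxin", 4L -> "peter", 5L -> "franklin", 7L -> "jgonzal")
    // Vertex 0 disappears; the surviving labels come from ccLabels.
    println(MaskSketch.maskVertices(ccLabels, valid))
  }
}
```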
2. Code:
```scala
/**
 * @author xubo
 * ref http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
 * time 20160503
 */
package org.apache.spark.graphx.learning

import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD

object GraphOperatorsStructuralMask {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphOperatorsStructuralMask").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array(
        (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
        (4L, ("peter", "student"))))

    // Create an RDD for edges
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(
        Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
        Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))

    // Define a default user in case there are relationships with missing users
    val defaultUser = ("John Doe", "Missing")

    // Build the initial Graph. Notice that there is a user 0 (for which we have
    // no information) connected to users 4 (peter) and 5 (franklin).
    val graph = Graph(users, relationships, defaultUser)

    println("vertices:")
    graph.subgraph(each => each.srcId != 100L).vertices.collect.foreach(println)

    println("\ntriplets:")
    graph.triplets.map(triplet =>
      triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1)
      .collect.foreach(println(_))
    graph.edges.collect.foreach(println)

    // Run Connected Components
    val ccGraph = graph.connectedComponents() // No longer contains missing field

    // Remove missing vertices as well as the edges connected to them
    val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")

    // Restrict the answer to the valid subgraph
    val validCCGraph = ccGraph.mask(validGraph)

    println("\nccGraph:")
    println("vertices:")
    ccGraph.vertices.collect.foreach(println)
    println("edges:")
    ccGraph.edges.collect.foreach(println)

    println("\nvalidGraph:")
    validGraph.vertices.collect.foreach(println)

    println("\nvalidCCGraph:")
    validCCGraph.vertices.collect.foreach(println)

    sc.stop()
  }
}
```
Analysis:
First run connectedComponents on the graph to produce a new graph, ccGraph. Then apply subgraph to the original graph to drop the "Missing" vertices, and finally use mask to take the intersection of the two.
3. Results:
```
vertices:
(4,(peter,student))
(0,(John Doe,Missing))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

triplets:
rxin is the collab of jgonzal
istoica is the colleague of franklin
franklin is the advisor of rxin
franklin is the pi of jgonzal
peter is the student of John Doe
franklin is the colleague of John Doe
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

ccGraph:
vertices:
(4,0)
(0,0)
(5,0)
(2,0)
(3,0)
(7,0)
edges:
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

validGraph:
(4,(peter,student))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

validCCGraph:
(4,0)
(5,0)
(2,0)
(3,0)
(7,0)
```
References
【1】http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
【2】https://github.com/xubo245/SparkLearning