A First Look at GraphX: Learning from the Source Code (Part 1)

Important Concepts in GraphX

Graph

1. The member variables of Graph are: vertices, edges, and triplets.

2. A triplet records both the edge and its two endpoint vertices (see the sketch below).
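
A minimal sketch of the three views, assuming a hypothetical, already-constructed graph value g of type Graph[String, Int]:

// g: Graph[String, Int] -- hypothetical graph
g.vertices.take(3).foreach(println)   // (VertexId, String) pairs
g.edges.take(3).foreach(println)      // Edge(srcId, dstId, attr)
g.triplets.map(t => s"${t.srcAttr} -[${t.attr}]-> ${t.dstAttr}")
  .take(3).foreach(println)           // each triplet carries the edge plus both endpoint attributes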

Member functions

The functions fall into several broad categories (a short usage sketch follows the list):

  1. Operations applied to all vertices or edges that leave the graph structure itself unchanged, e.g. mapEdges, mapVertices
  2. Subgraphs, analogous to filter in set operations: subgraph
  3. Graph partitioning, i.e. the partition operation. This is crucial for Spark: it is precisely the existence of different partitions that makes parallel processing possible, and different PartitionStrategy choices yield different benefits. The most obvious strategy is to hash the graph into multiple regions.
  4. outerJoinVertices, the outer join on vertices
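
As a rough illustration of the four categories above, here is a minimal, hedged sketch; the tiny graph, its attribute types, and the weight threshold are all made up for the example:

import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

def categoriesDemo(sc: SparkContext): Unit = {
  // A tiny hypothetical graph: vertex attribute = name, edge attribute = weight
  val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
  val edges: RDD[Edge[Int]] =
    sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(2L, 3L, 1)))
  val graph = Graph(vertices, edges)

  // 1. Transform attributes without touching the structure
  val upper   = graph.mapVertices((id, name) => name.toUpperCase)
  val doubled = graph.mapEdges(e => e.attr * 2)

  // 2. Take a subgraph, analogous to a filter
  val heavy = graph.subgraph(epred = t => t.attr >= 5)

  // 3. Repartition the edges with a PartitionStrategy (hash-based here)
  val partitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D)

  // 4. Outer-join external per-vertex data onto the vertices
  val withDeg = graph.outerJoinVertices(graph.outDegrees) {
    (id, name, deg) => (name, deg.getOrElse(0))
  }
}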

Graph computations and operations: GraphOps

The commonly used graph algorithms are abstracted into the GraphOps class, and Graph defines an implicit conversion from Graph to GraphOps. The source is as follows:

/**
   * Implicitly extracts the [[GraphOps]] member from a graph.
   *
   * To improve modularity the Graph type only contains a small set of basic operations.
   * All the convenience operations are defined in the [[GraphOps]] class which may be
   * shared across multiple graph implementations.
   */
  implicit def graphToGraphOps[VD: ClassTag, ED: ClassTag]
      (g: Graph[VD, ED]): GraphOps[VD, ED] = g.ops
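
Thanks to this implicit conversion, any method defined on GraphOps can be called directly on a Graph value; for example (g is a hypothetical Graph[String, Int]):

// The compiler rewrites both calls through graphToGraphOps
val inDeg: VertexRDD[Int] = g.inDegrees                   // defined on GraphOps, not Graph
val nbrIds = g.collectNeighborIds(EdgeDirection.Either)   // also a GraphOps method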

The supported operations are as follows:

  1. collectNeighborIds
  2. collectNeighbors
  3. collectEdges
  4. joinVertices
  5. filter
  6. pickRandomVertex
  7. pregel
  8. pageRank
  9. staticPageRank
  10. connectedComponents
  11. triangleCount
  12. stronglyConnectedComponents

The source is as follows:

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.graphx

import scala.reflect.ClassTag
import scala.util.Random

import org.apache.spark.SparkException
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

import org.apache.spark.graphx.lib._

/**
 * Contains additional functionality for [[Graph]]. All operations are expressed in terms of the
 * efficient GraphX API. This class is implicitly constructed for each Graph object.
 *
 * @tparam VD the vertex attribute type
 * @tparam ED the edge attribute type
 */
class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Serializable {

  /** The number of edges in the graph. */
  @transient lazy val numEdges: Long = graph.edges.count()

  /** The number of vertices in the graph. */
  @transient lazy val numVertices: Long = graph.vertices.count()

  /**
   * The in-degree of each vertex in the graph.
   * @note Vertices with no in-edges are not returned in the resulting RDD.
   */
  @transient lazy val inDegrees: VertexRDD[Int] =
    degreesRDD(EdgeDirection.In).setName("GraphOps.inDegrees")

  /**
   * The out-degree of each vertex in the graph.
   * @note Vertices with no out-edges are not returned in the resulting RDD.
   */
  @transient lazy val outDegrees: VertexRDD[Int] =
    degreesRDD(EdgeDirection.Out).setName("GraphOps.outDegrees")

  /**
   * The degree of each vertex in the graph.
   * @note Vertices with no edges are not returned in the resulting RDD.
   */
  @transient lazy val degrees: VertexRDD[Int] =
    degreesRDD(EdgeDirection.Either).setName("GraphOps.degrees")

  /**
   * Computes the neighboring vertex degrees.
   *
   * @param edgeDirection the direction along which to collect neighboring vertex attributes
   */
  private def degreesRDD(edgeDirection: EdgeDirection): VertexRDD[Int] = {
    if (edgeDirection == EdgeDirection.In) {
      graph.aggregateMessages(_.sendToDst(1), _ + _, TripletFields.None)
    } else if (edgeDirection == EdgeDirection.Out) {
      graph.aggregateMessages(_.sendToSrc(1), _ + _, TripletFields.None)
    } else { // EdgeDirection.Either
      graph.aggregateMessages(ctx => { ctx.sendToSrc(1); ctx.sendToDst(1) }, _ + _,
        TripletFields.None)
    }
  }

  /**
   * Collect the neighbor vertex ids for each vertex.
   *
   * @param edgeDirection the direction along which to collect
   * neighboring vertices
   *
   * @return the set of neighboring ids for each vertex
   */
  def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]] = {
    val nbrs =
      if (edgeDirection == EdgeDirection.Either) {
        graph.aggregateMessages[Array[VertexId]](
          ctx => { ctx.sendToSrc(Array(ctx.dstId)); ctx.sendToDst(Array(ctx.srcId)) },
          _ ++ _, TripletFields.None)
      } else if (edgeDirection == EdgeDirection.Out) {
        graph.aggregateMessages[Array[VertexId]](
          ctx => ctx.sendToSrc(Array(ctx.dstId)),
          _ ++ _, TripletFields.None)
      } else if (edgeDirection == EdgeDirection.In) {
        graph.aggregateMessages[Array[VertexId]](
          ctx => ctx.sendToDst(Array(ctx.srcId)),
          _ ++ _, TripletFields.None)
      } else {
        throw new SparkException("It doesn't make sense to collect neighbor ids without a " +
          "direction. (EdgeDirection.Both is not supported; use EdgeDirection.Either instead.)")
      }
    graph.vertices.leftZipJoin(nbrs) { (vid, vdata, nbrsOpt) =>
      nbrsOpt.getOrElse(Array.empty[VertexId])
    }
  } // end of collectNeighborIds

  /**
   * Collect the neighbor vertex attributes for each vertex.
   *
   * @note This function could be highly inefficient on power-law
   * graphs where high degree vertices may force a large amount of
   * information to be collected to a single location.
   *
   * @param edgeDirection the direction along which to collect
   * neighboring vertices
   *
   * @return the vertex set of neighboring vertex attributes for each vertex
   */
  def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]] = {
    val nbrs = edgeDirection match {
      case EdgeDirection.Either =>
        graph.aggregateMessages[Array[(VertexId, VD)]](
          ctx => {
            ctx.sendToSrc(Array((ctx.dstId, ctx.dstAttr)))
            ctx.sendToDst(Array((ctx.srcId, ctx.srcAttr)))
          },
          (a, b) => a ++ b, TripletFields.All)
      case EdgeDirection.In =>
        graph.aggregateMessages[Array[(VertexId, VD)]](
          ctx => ctx.sendToDst(Array((ctx.srcId, ctx.srcAttr))),
          (a, b) => a ++ b, TripletFields.Src)
      case EdgeDirection.Out =>
        graph.aggregateMessages[Array[(VertexId, VD)]](
          ctx => ctx.sendToSrc(Array((ctx.dstId, ctx.dstAttr))),
          (a, b) => a ++ b, TripletFields.Dst)
      case EdgeDirection.Both =>
        throw new SparkException("collectEdges does not support EdgeDirection.Both. Use" +
          "EdgeDirection.Either instead.")
    }
    graph.vertices.leftJoin(nbrs) { (vid, vdata, nbrsOpt) =>
      nbrsOpt.getOrElse(Array.empty[(VertexId, VD)])
    }
  } // end of collectNeighbor

  /**
   * Returns an RDD that contains for each vertex v its local edges,
   * i.e., the edges that are incident on v, in the user-specified direction.
   * Warning: note that singleton vertices, those with no edges in the given
   * direction will not be part of the return value.
   *
   * @note This function could be highly inefficient on power-law
   * graphs where high degree vertices may force a large amount of
   * information to be collected to a single location.
   *
   * @param edgeDirection the direction along which to collect
   * the local edges of vertices
   *
   * @return the local edges for each vertex
   */
  def collectEdges(edgeDirection: EdgeDirection): VertexRDD[Array[Edge[ED]]] = {
    edgeDirection match {
      case EdgeDirection.Either =>
        graph.aggregateMessages[Array[Edge[ED]]](
          ctx => {
            ctx.sendToSrc(Array(new Edge(ctx.srcId, ctx.dstId, ctx.attr)))
            ctx.sendToDst(Array(new Edge(ctx.srcId, ctx.dstId, ctx.attr)))
          },
          (a, b) => a ++ b, TripletFields.EdgeOnly)
      case EdgeDirection.In =>
        graph.aggregateMessages[Array[Edge[ED]]](
          ctx => ctx.sendToDst(Array(new Edge(ctx.srcId, ctx.dstId, ctx.attr))),
          (a, b) => a ++ b, TripletFields.EdgeOnly)
      case EdgeDirection.Out =>
        graph.aggregateMessages[Array[Edge[ED]]](
          ctx => ctx.sendToSrc(Array(new Edge(ctx.srcId, ctx.dstId, ctx.attr))),
          (a, b) => a ++ b, TripletFields.EdgeOnly)
      case EdgeDirection.Both =>
        throw new SparkException("collectEdges does not support EdgeDirection.Both. Use" +
          "EdgeDirection.Either instead.")
    }
  }

  /**
   * Join the vertices with an RDD and then apply a function from the
   * vertex and RDD entry to a new vertex value.  The input table
   * should contain at most one entry for each vertex.  If no entry is
   * provided the map function is skipped and the old value is used.
   *
   * @tparam U the type of entry in the table of updates
   * @param table the table to join with the vertices in the graph.
   * The table should contain at most one entry for each vertex.
   * @param mapFunc the function used to compute the new vertex
   * values.  The map function is invoked only for vertices with a
   * corresponding entry in the table otherwise the old vertex value
   * is used.
   *
   * @example This function is used to update the vertices with new
   * values based on external data.  For example we could add the out
   * degree to each vertex record
   *
   * {{{
   * val rawGraph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "webgraph")
   *   .mapVertices((_, _) => 0)
   * val outDeg = rawGraph.outDegrees
   * val graph = rawGraph.joinVertices[Int](outDeg)
   *   ((_, _, outDeg) => outDeg)
   * }}}
   *
   */
  def joinVertices[U: ClassTag](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD)
    : Graph[VD, ED] = {
    val uf = (id: VertexId, data: VD, o: Option[U]) => {
      o match {
        case Some(u) => mapFunc(id, data, u)
        case None => data
      }
    }
    graph.outerJoinVertices(table)(uf)
  }

  /**
   * Filter the graph by computing some values to filter on, and applying the predicates.
   *
   * @param preprocess a function to compute new vertex and edge data before filtering
   * @param epred edge pred to filter on after preprocess, see more details under
   *  [[org.apache.spark.graphx.Graph#subgraph]]
   * @param vpred vertex pred to filter on after preprocess, see more details under
   *  [[org.apache.spark.graphx.Graph#subgraph]]
   * @tparam VD2 vertex type the vpred operates on
   * @tparam ED2 edge type the epred operates on
   * @return a subgraph of the original graph, with its data unchanged
   *
   * @example This function can be used to filter the graph based on some property, without
   * changing the vertex and edge values in your program. For example, we could remove the vertices
   * in a graph with 0 outdegree
   *
   * {{{
   * graph.filter(
   *   graph => {
   *     val degrees: VertexRDD[Int] = graph.outDegrees
   *     graph.outerJoinVertices(degrees) {(vid, data, deg) => deg.getOrElse(0)}
   *   },
   *   vpred = (vid: VertexId, deg:Int) => deg > 0
   * )
   * }}}
   *
   */
  def filter[VD2: ClassTag, ED2: ClassTag](
      preprocess: Graph[VD, ED] => Graph[VD2, ED2],
      epred: (EdgeTriplet[VD2, ED2]) => Boolean = (x: EdgeTriplet[VD2, ED2]) => true,
      vpred: (VertexId, VD2) => Boolean = (v: VertexId, d: VD2) => true): Graph[VD, ED] = {
    graph.mask(preprocess(graph).subgraph(epred, vpred))
  }

  /**
   * Picks a random vertex from the graph and returns its ID.
   */
  def pickRandomVertex(): VertexId = {
    val probability = 50.0 / graph.numVertices
    var found = false
    var retVal: VertexId = null.asInstanceOf[VertexId]
    while (!found) {
      val selectedVertices = graph.vertices.flatMap { vidVvals =>
        if (Random.nextDouble() < probability) { Some(vidVvals._1) }
        else { None }
      }
      if (selectedVertices.count > 1) {
        found = true
        val collectedVertices = selectedVertices.collect()
        retVal = collectedVertices(Random.nextInt(collectedVertices.size))
      }
    }
    retVal
  }

  /**
   * Convert bi-directional edges into uni-directional ones.
   * Some graph algorithms (e.g., TriangleCount) assume that an input graph
   * has its edges in canonical direction.
   * This function rewrites the vertex ids of edges so that srcIds are smaller
   * than dstIds, and merges the duplicated edges.
   *
   * @param mergeFunc the user defined reduce function which should
   * be commutative and associative and is used to combine the output
   * of the map phase
   *
   * @return the resulting graph with canonical edges
   */
  def convertToCanonicalEdges(
      mergeFunc: (ED, ED) => ED = (e1, e2) => e1): Graph[VD, ED] = {
    val newEdges =
      graph.edges
        .map {
          case e if e.srcId < e.dstId => ((e.srcId, e.dstId), e.attr)
          case e => ((e.dstId, e.srcId), e.attr)
        }
        .reduceByKey(mergeFunc)
        .map(e => new Edge(e._1._1, e._1._2, e._2))
    Graph(graph.vertices, newEdges)
  }

  /**
   * Execute a Pregel-like iterative vertex-parallel abstraction.  The
   * user-defined vertex-program `vprog` is executed in parallel on
   * each vertex receiving any inbound messages and computing a new
   * value for the vertex.  The `sendMsg` function is then invoked on
   * all out-edges and is used to compute an optional message to the
   * destination vertex. The `mergeMsg` function is a commutative
   * associative function used to combine messages destined to the
   * same vertex.
   *
   * On the first iteration all vertices receive the `initialMsg` and
   * on subsequent iterations if a vertex does not receive a message
   * then the vertex-program is not invoked.
   *
   * This function iterates until there are no remaining messages, or
   * for `maxIterations` iterations.
   *
   * @tparam A the Pregel message type
   *
   * @param initialMsg the message each vertex will receive at the on
   * the first iteration
   *
   * @param maxIterations the maximum number of iterations to run for
   *
   * @param activeDirection the direction of edges incident to a vertex that received a message in
   * the previous round on which to run `sendMsg`. For example, if this is `EdgeDirection.Out`, only
   * out-edges of vertices that received a message in the previous round will run.
   *
   * @param vprog the user-defined vertex program which runs on each
   * vertex and receives the inbound message and computes a new vertex
   * value.  On the first iteration the vertex program is invoked on
   * all vertices and is passed the default message.  On subsequent
   * iterations the vertex program is only invoked on those vertices
   * that receive messages.
   *
   * @param sendMsg a user supplied function that is applied to out
   * edges of vertices that received messages in the current
   * iteration
   *
   * @param mergeMsg a user supplied function that takes two incoming
   * messages of type A and merges them into a single message of type
   * A.  ''This function must be commutative and associative and
   * ideally the size of A should not increase.''
   *
   * @return the resulting graph at the end of the computation
   *
   */
  def pregel[A: ClassTag](
      initialMsg: A,
      maxIterations: Int = Int.MaxValue,
      activeDirection: EdgeDirection = EdgeDirection.Either)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED] = {
    Pregel(graph, initialMsg, maxIterations, activeDirection)(vprog, sendMsg, mergeMsg)
  }

  /**
   * Run a dynamic version of PageRank returning a graph with vertex attributes containing the
   * PageRank and edge attributes containing the normalized edge weight.
   *
   * @see [[org.apache.spark.graphx.lib.PageRank$#runUntilConvergence]]
   */
  def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double] = {
    PageRank.runUntilConvergence(graph, tol, resetProb)
  }


  /**
   * Run personalized PageRank for a given vertex, such that all random walks
   * are started relative to the source node.
   *
   * @see [[org.apache.spark.graphx.lib.PageRank$#runUntilConvergenceWithOptions]]
   */
  def personalizedPageRank(src: VertexId, tol: Double,
    resetProb: Double = 0.15) : Graph[Double, Double] = {
    PageRank.runUntilConvergenceWithOptions(graph, tol, resetProb, Some(src))
  }

  /**
   * Run Personalized PageRank for a fixed number of iterations with
   * with all iterations originating at the source node
   * returning a graph with vertex attributes
   * containing the PageRank and edge attributes the normalized edge weight.
   *
   * @see [[org.apache.spark.graphx.lib.PageRank$#runWithOptions]]
   */
  def staticPersonalizedPageRank(src: VertexId, numIter: Int,
    resetProb: Double = 0.15) : Graph[Double, Double] = {
    PageRank.runWithOptions(graph, numIter, resetProb, Some(src))
  }

  /**
   * Run PageRank for a fixed number of iterations returning a graph with vertex attributes
   * containing the PageRank and edge attributes the normalized edge weight.
   *
   * @see [[org.apache.spark.graphx.lib.PageRank$#run]]
   */
  def staticPageRank(numIter: Int, resetProb: Double = 0.15): Graph[Double, Double] = {
    PageRank.run(graph, numIter, resetProb)
  }

  /**
   * Compute the connected component membership of each vertex and return a graph with the vertex
   * value containing the lowest vertex id in the connected component containing that vertex.
   *
   * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
   */
  def connectedComponents(): Graph[VertexId, ED] = {
    ConnectedComponents.run(graph)
  }

  /**
   * Compute the number of triangles passing through each vertex.
   *
   * @see [[org.apache.spark.graphx.lib.TriangleCount$#run]]
   */
  def triangleCount(): Graph[Int, ED] = {
    TriangleCount.run(graph)
  }

  /**
   * Compute the strongly connected component (SCC) of each vertex and return a graph with the
   * vertex value containing the lowest vertex id in the SCC containing that vertex.
   *
   * @see [[org.apache.spark.graphx.lib.StronglyConnectedComponents$#run]]
   */
  def stronglyConnectedComponents(numIter: Int): Graph[VertexId, ED] = {
    StronglyConnectedComponents.run(graph, numIter)
  }
} // end of GraphOps

RDD

RDD is the core of the Spark ecosystem, so which new RDDs does GraphX introduce?

VertexRDD: VertexRDD[A] extends RDD[(VertexID, A)] and adds the constraint that each VertexID may occur only once. In addition, VertexRDD[A] represents a set of vertices whose attribute type is A. Internally this is achieved by storing the vertex attributes in a reusable hash-map data structure. As a result, if two VertexRDDs are derived from the same base VertexRDD (for example via filter or mapValues), they can be joined in constant time without hash evaluation. To take advantage of this indexed data structure, VertexRDD exposes a number of additional operations. The VertexRDD source is as follows:

package org.apache.spark.graphx

import scala.reflect.ClassTag

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.rdd._
import org.apache.spark.storage.StorageLevel

import org.apache.spark.graphx.impl.RoutingTablePartition
import org.apache.spark.graphx.impl.ShippableVertexPartition
import org.apache.spark.graphx.impl.VertexAttributeBlock
import org.apache.spark.graphx.impl.VertexRDDImpl

/**
 * Extends `RDD[(VertexId, VD)]` by ensuring that there is only one entry for each vertex and by
 * pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be
 * joined efficiently. All operations except [[reindex]] preserve the index. To construct a
 * `VertexRDD`, use the [[org.apache.spark.graphx.VertexRDD$ VertexRDD object]].
 *
 * Additionally, stores routing information to enable joining the vertex attributes with an
 * [[EdgeRDD]].
 *
 * @example Construct a `VertexRDD` from a plain RDD:
 * {{{
 * // Construct an initial vertex set
 * val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
 * val vset = VertexRDD(someData)
 * // If there were redundant values in someData we would use a reduceFunc
 * val vset2 = VertexRDD(someData, reduceFunc)
 * // Finally we can use the VertexRDD to index another dataset
 * val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
 * val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
 * // Now we can construct very fast joins between the two sets
 * val vset4: VertexRDD[(SomeType, OtherType)] = vset.leftJoin(vset3)
 * }}}
 *
 * @tparam VD the vertex attribute associated with each vertex in the set.
 */
abstract class VertexRDD[VD](
    sc: SparkContext,
    deps: Seq[Dependency[_]]) extends RDD[(VertexId, VD)](sc, deps) {

  implicit protected def vdTag: ClassTag[VD]

  private[graphx] def partitionsRDD: RDD[ShippableVertexPartition[VD]]

  override protected def getPartitions: Array[Partition] = partitionsRDD.partitions

  /**
   * Provides the `RDD[(VertexId, VD)]` equivalent output.
   */
  override def compute(part: Partition, context: TaskContext): Iterator[(VertexId, VD)] = {
    firstParent[ShippableVertexPartition[VD]].iterator(part, context).next().iterator
  }

  /**
   * Construct a new VertexRDD that is indexed by only the visible vertices. The resulting
   * VertexRDD will be based on a different index and can no longer be quickly joined with this
   * RDD.
   */
  def reindex(): VertexRDD[VD]

  /**
   * Applies a function to each `VertexPartition` of this RDD and returns a new VertexRDD.
   */
  private[graphx] def mapVertexPartitions[VD2: ClassTag](
      f: ShippableVertexPartition[VD] => ShippableVertexPartition[VD2])
    : VertexRDD[VD2]

  /**
   * Restricts the vertex set to the set of vertices satisfying the given predicate. This operation
   * preserves the index for efficient joins with the original RDD, and it sets bits in the bitmask
   * rather than allocating new memory.
   *
   * It is declared and defined here to allow refining the return type from `RDD[(VertexId, VD)]` to
   * `VertexRDD[VD]`.
   *
   * @param pred the user defined predicate, which takes a tuple to conform to the
   * `RDD[(VertexId, VD)]` interface
   */
  override def filter(pred: Tuple2[VertexId, VD] => Boolean): VertexRDD[VD] =
    this.mapVertexPartitions(_.filter(Function.untupled(pred)))

  /**
   * Maps each vertex attribute, preserving the index.
   *
   * @tparam VD2 the type returned by the map function
   *
   * @param f the function applied to each value in the RDD
   * @return a new VertexRDD with values obtained by applying `f` to each of the entries in the
   * original VertexRDD
   */
  def mapValues[VD2: ClassTag](f: VD => VD2): VertexRDD[VD2]

  /**
   * Maps each vertex attribute, additionally supplying the vertex ID.
   *
   * @tparam VD2 the type returned by the map function
   *
   * @param f the function applied to each ID-value pair in the RDD
   * @return a new VertexRDD with values obtained by applying `f` to each of the entries in the
   * original VertexRDD.  The resulting VertexRDD retains the same index.
   */
  def mapValues[VD2: ClassTag](f: (VertexId, VD) => VD2): VertexRDD[VD2]

  /**
   * For each VertexId present in both `this` and `other`, minus will act as a set difference
   * operation returning only those unique VertexId's present in `this`.
   *
   * @param other an RDD to run the set operation against
   */
  def minus(other: RDD[(VertexId, VD)]): VertexRDD[VD]

  /**
   * For each VertexId present in both `this` and `other`, minus will act as a set difference
   * operation returning only those unique VertexId's present in `this`.
   *
   * @param other a VertexRDD to run the set operation against
   */
  def minus(other: VertexRDD[VD]): VertexRDD[VD]

  /**
   * For each vertex present in both `this` and `other`, `diff` returns only those vertices with
   * differing values; for values that are different, keeps the values from `other`. This is
   * only guaranteed to work if the VertexRDDs share a common ancestor.
   *
   * @param other the other RDD[(VertexId, VD)] with which to diff against.
   */
  def diff(other: RDD[(VertexId, VD)]): VertexRDD[VD]

  /**
   * For each vertex present in both `this` and `other`, `diff` returns only those vertices with
   * differing values; for values that are different, keeps the values from `other`. This is
   * only guaranteed to work if the VertexRDDs share a common ancestor.
   *
   * @param other the other VertexRDD with which to diff against.
   */
  def diff(other: VertexRDD[VD]): VertexRDD[VD]

  /**
   * Left joins this RDD with another VertexRDD with the same index. This function will fail if
   * both VertexRDDs do not share the same index. The resulting vertex set contains an entry for
   * each vertex in `this`.
   * If `other` is missing any vertex in this VertexRDD, `f` is passed `None`.
   *
   * @tparam VD2 the attribute type of the other VertexRDD
   * @tparam VD3 the attribute type of the resulting VertexRDD
   *
   * @param other the other VertexRDD with which to join.
   * @param f the function mapping a vertex id and its attributes in this and the other vertex set
   * to a new vertex attribute.
   * @return a VertexRDD containing the results of `f`
   */
  def leftZipJoin[VD2: ClassTag, VD3: ClassTag]
      (other: VertexRDD[VD2])(f: (VertexId, VD, Option[VD2]) => VD3): VertexRDD[VD3]

  /**
   * Left joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is
   * backed by a VertexRDD with the same index then the efficient [[leftZipJoin]] implementation is
   * used. The resulting VertexRDD contains an entry for each vertex in `this`. If `other` is
   * missing any vertex in this VertexRDD, `f` is passed `None`. If there are duplicates,
   * the vertex is picked arbitrarily.
   *
   * @tparam VD2 the attribute type of the other VertexRDD
   * @tparam VD3 the attribute type of the resulting VertexRDD
   *
   * @param other the other VertexRDD with which to join
   * @param f the function mapping a vertex id and its attributes in this and the other vertex set
   * to a new vertex attribute.
   * @return a VertexRDD containing all the vertices in this VertexRDD with the attributes emitted
   * by `f`.
   */
  def leftJoin[VD2: ClassTag, VD3: ClassTag]
      (other: RDD[(VertexId, VD2)])
      (f: (VertexId, VD, Option[VD2]) => VD3)
    : VertexRDD[VD3]

  /**
   * Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index. See
   * [[innerJoin]] for the behavior of the join.
   */
  def innerZipJoin[U: ClassTag, VD2: ClassTag](other: VertexRDD[U])
      (f: (VertexId, VD, U) => VD2): VertexRDD[VD2]

  /**
   * Inner joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is
   * backed by a VertexRDD with the same index then the efficient [[innerZipJoin]] implementation
   * is used.
   *
   * @param other an RDD containing vertices to join. If there are multiple entries for the same
   * vertex, one is picked arbitrarily. Use [[aggregateUsingIndex]] to merge multiple entries.
   * @param f the join function applied to corresponding values of `this` and `other`
   * @return a VertexRDD co-indexed with `this`, containing only vertices that appear in both
   *         `this` and `other`, with values supplied by `f`
   */
  def innerJoin[U: ClassTag, VD2: ClassTag](other: RDD[(VertexId, U)])
      (f: (VertexId, VD, U) => VD2): VertexRDD[VD2]

  /**
   * Aggregates vertices in `messages` that have the same ids using `reduceFunc`, returning a
   * VertexRDD co-indexed with `this`.
   *
   * @param messages an RDD containing messages to aggregate, where each message is a pair of its
   * target vertex ID and the message data
   * @param reduceFunc the associative aggregation function for merging messages to the same vertex
   * @return a VertexRDD co-indexed with `this`, containing only vertices that received messages.
   * For those vertices, their values are the result of applying `reduceFunc` to all received
   * messages.
   */
  def aggregateUsingIndex[VD2: ClassTag](
      messages: RDD[(VertexId, VD2)], reduceFunc: (VD2, VD2) => VD2): VertexRDD[VD2]

  /**
   * Returns a new `VertexRDD` reflecting a reversal of all edge directions in the corresponding
   * [[EdgeRDD]].
   */
  def reverseRoutingTables(): VertexRDD[VD]

  /** Prepares this VertexRDD for efficient joins with the given EdgeRDD. */
  def withEdges(edges: EdgeRDD[_]): VertexRDD[VD]

  /** Replaces the vertex partitions while preserving all other properties of the VertexRDD. */
  private[graphx] def withPartitionsRDD[VD2: ClassTag](
      partitionsRDD: RDD[ShippableVertexPartition[VD2]]): VertexRDD[VD2]

  /**
   * Changes the target storage level while preserving all other properties of the
   * VertexRDD. Operations on the returned VertexRDD will preserve this storage level.
   *
   * This does not actually trigger a cache; to do this, call
   * [[org.apache.spark.graphx.VertexRDD#cache]] on the returned VertexRDD.
   */
  private[graphx] def withTargetStorageLevel(
      targetStorageLevel: StorageLevel): VertexRDD[VD]

  /** Generates an RDD of vertex attributes suitable for shipping to the edge partitions. */
  private[graphx] def shipVertexAttributes(
      shipSrc: Boolean, shipDst: Boolean): RDD[(PartitionID, VertexAttributeBlock[VD])]

  /** Generates an RDD of vertex IDs suitable for shipping to the edge partitions. */
  private[graphx] def shipVertexIds(): RDD[(PartitionID, Array[VertexId])]

} // end of VertexRDD

EdgeRDD: EdgeRDD[ED] extends RDD[Edge[ED]] and organizes the edges into block partitions using one of the partitioning strategies defined in PartitionStrategy. Within each partition, the edge attributes and the adjacency structure are stored separately, so that they can be reused as much as possible when attribute values change.

EdgeRDD exposes three additional functions; the source is as follows:

abstract class EdgeRDD[ED](
    sc: SparkContext,
    deps: Seq[Dependency[_]]) extends RDD[Edge[ED]](sc, deps) {

  // scalastyle:off structural.type
  private[graphx] def partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])] forSome { type VD }
  // scalastyle:on structural.type

  override protected def getPartitions: Array[Partition] = partitionsRDD.partitions

  override def compute(part: Partition, context: TaskContext): Iterator[Edge[ED]] = {
    val p = firstParent[(PartitionID, EdgePartition[ED, _])].iterator(part, context)
    if (p.hasNext) {
      p.next()._2.iterator.map(_.copy())
    } else {
      Iterator.empty
    }
  }

  /**
   * Map the values in an edge partitioning preserving the structure but changing the values.
   *
   * @tparam ED2 the new edge value type
   * @param f the function from an edge to a new edge value
   * @return a new EdgeRDD containing the new edge values
   */
  def mapValues[ED2: ClassTag](f: Edge[ED] => ED2): EdgeRDD[ED2]

  /**
   * Reverse all the edges in this RDD.
   *
   * @return a new EdgeRDD containing all the edges reversed
   */
  def reverse: EdgeRDD[ED]

  /**
   * Inner joins this EdgeRDD with another EdgeRDD, assuming both are partitioned using the same
   * [[PartitionStrategy]].
   *
   * @param other the EdgeRDD to join with
   * @param f the join function applied to corresponding values of `this` and `other`
   * @return a new EdgeRDD containing only edges that appear in both `this` and `other`,
   *         with values supplied by `f`
   */
  def innerJoin[ED2: ClassTag, ED3: ClassTag]
      (other: EdgeRDD[ED2])
      (f: (VertexId, VertexId, ED, ED2) => ED3): EdgeRDD[ED3]

  /**
   * Changes the target storage level while preserving all other properties of the
   * EdgeRDD. Operations on the returned EdgeRDD will preserve this storage level.
   *
   * This does not actually trigger a cache; to do this, call
   * [[org.apache.spark.graphx.EdgeRDD#cache]] on the returned EdgeRDD.
   */
  private[graphx] def withTargetStorageLevel(targetStorageLevel: StorageLevel): EdgeRDD[ED]
}

Compared with EdgeRDD, VertexRDD is the more important of the two and supports many more operations, most of which center on merging the attributes carried by vertices. Merging inevitably brings in relational algebra and set theory, which is why VertexRDD exposes many SQL-like terms, such as:

  • leftJoin
  • innerJoin

As for the differences among leftJoin, innerJoin, and outerJoin, a quick web search will cover them, so they are not repeated here; the source code above also documents them in detail.
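
A small, hedged illustration of the two joins on a VertexRDD; the vertex attributes and the joined RDD are made up:

// verts: VertexRDD[String], extra: RDD[(VertexId, Int)] -- hypothetical inputs
// leftJoin keeps every vertex in verts; a missing right-side entry arrives as None
val left  = verts.leftJoin(extra) { (vid, name, opt) => (name, opt.getOrElse(0)) }
// innerJoin keeps only the vertices present on both sides
val inner = verts.innerJoin(extra) { (vid, name, n) => (name, n) }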

Applying GraphX in Production Scenarios

Graph storage and loading

In a big-data environment, what do we do if the graph is so large that the vertex and edge data cannot fit into a single file? We use HDFS.

Typically, everything related to vertices is kept in one file, vertexFile, and everything related to edges in another, edgeFile.

When building a concrete graph, the edges alone are enough to express how the vertices are related, and with that the structure of the graph is expressed as well.
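
A hedged construction sketch, assuming org.apache.spark.graphx._ and org.apache.spark.rdd.RDD are imported and that vertexFile holds tab-separated "id, name" lines while edgeFile holds "srcId, dstId, weight" lines; the paths and formats are only illustrative:

val vertexLines = sc.textFile("hdfs:///data/graph/vertexFile.tsv")
val edgeLines   = sc.textFile("hdfs:///data/graph/edgeFile.tsv")

// Parse each file into the RDD shapes Graph(...) expects
val vertices: RDD[(VertexId, String)] = vertexLines.map { line =>
  val f = line.split("\t"); (f(0).toLong, f(1))
}
val edges: RDD[Edge[Double]] = edgeLines.map { line =>
  val f = line.split("\t"); Edge(f(0).toLong, f(1).toLong, f(2).toDouble)
}
val graph: Graph[String, Double] = Graph(vertices, edges)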

Building the graph

GraphLoader is the GraphX utility dedicated to loading and building graphs. Its most important function is edgeListFile, whose source is as follows:

package org.apache.spark.graphx

import org.apache.spark.storage.StorageLevel
import org.apache.spark.{Logging, SparkContext}
import org.apache.spark.graphx.impl.{EdgePartitionBuilder, GraphImpl}

/**
 * Provides utilities for loading [[Graph]]s from files.
 */
object GraphLoader extends Logging {

  /**
   * Loads a graph from an edge list formatted file where each line contains two integers: a source
   * id and a target id. Skips lines that begin with `#`.
   *
   * If desired the edges can be automatically oriented in the positive
   * direction (source Id < target Id) by setting `canonicalOrientation` to
   * true.
   *
   * @example Loads a file in the following format:
   * {{{
   * # Comment Line
   * # Source Id <\t> Target Id
   * 1   -5
   * 1    2
   * 2    7
   * 1    8
   * }}}
   *
   * @param sc SparkContext
   * @param path the path to the file (e.g., /home/data/file or hdfs://file)
   * @param canonicalOrientation whether to orient edges in the positive
   *        direction
   * @param numEdgePartitions the number of partitions for the edge RDD
   * Setting this value to -1 will use the default parallelism.
   * @param edgeStorageLevel the desired storage level for the edge partitions
   * @param vertexStorageLevel the desired storage level for the vertex partitions
   */
  def edgeListFile(
      sc: SparkContext,
      path: String,
      canonicalOrientation: Boolean = false,
      numEdgePartitions: Int = -1,
      edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
    : Graph[Int, Int] =
  {
    val startTime = System.currentTimeMillis

    // Parse the edge data table directly into edge partitions
    val lines =
      if (numEdgePartitions > 0) {
        sc.textFile(path, numEdgePartitions).coalesce(numEdgePartitions)
      } else {
        sc.textFile(path)
      }
    val edges = lines.mapPartitionsWithIndex { (pid, iter) =>
      val builder = new EdgePartitionBuilder[Int, Int]
      iter.foreach { line =>
        if (!line.isEmpty && line(0) != '#') {
          val lineArray = line.split("\\s+")
          if (lineArray.length < 2) {
            throw new IllegalArgumentException("Invalid line: " + line)
          }
          val srcId = lineArray(0).toLong
          val dstId = lineArray(1).toLong
          if (canonicalOrientation && srcId > dstId) {
            builder.add(dstId, srcId, 1)
          } else {
            builder.add(srcId, dstId, 1)
          }
        }
      }
      Iterator((pid, builder.toEdgePartition))
    }.persist(edgeStorageLevel).setName("GraphLoader.edgeListFile - edges (%s)".format(path))
    edges.count()

    logInfo("It took %d ms to load the edges".format(System.currentTimeMillis - startTime))

    GraphImpl.fromEdgePartitions(edges, defaultVertexAttr = 1, edgeStorageLevel = edgeStorageLevel,
      vertexStorageLevel = vertexStorageLevel)
  } // end of edgeListFile

}
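
A minimal, hedged usage sketch of edgeListFile; the HDFS path and partition count are hypothetical:

// Each data line holds "srcId <whitespace> dstId"; lines starting with '#' are skipped
val graph: Graph[Int, Int] =
  GraphLoader.edgeListFile(sc, "hdfs:///data/webgraph/edges.txt",
    canonicalOrientation = true, numEdgePartitions = 64)
println(s"loaded ${graph.numEdges} edges and ${graph.numVertices} vertices")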

PageRank

What is PageRank?

PageRank is Google's proprietary algorithm for measuring the importance of a particular web page relative to the other pages in the search engine's index. It was invented by Larry Page and Sergey Brin in the late 1990s. PageRank turned the notion of link value into a ranking factor: it treats links to a page as votes that indicate its importance.

The core idea of PageRank

Described in words, the analysis proceeds as follows:

  1. The relationships between web pages are represented as a graph.
  2. A link from page A to page B represents the probability that an arbitrary user moves from page A to page B.
  3. The ranks of all pages are represented by a one-dimensional vector B.

The links between all pages are represented by a matrix A, and the ranks of all pages by the vector B.
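
With that notation, one round of ranking is essentially a matrix-vector product, iterated until the ranks stabilize. Written per vertex, and using the reset probability resetProb that appears in the GraphX code below, the update is:

\mathrm{PR}_{t+1}(i) = \mathrm{resetProb} + (1 - \mathrm{resetProb}) \sum_{j \to i} \frac{\mathrm{PR}_t(j)}{\mathrm{outDegree}(j)}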

How PageRank is optimized

The page-rank computation ultimately boils down to matrix multiplication, and matrix multiplication can be parallelized. Spark achieves this with the Pregel model; the source of the functions defined in PageRank is as follows:

object PageRank extends Logging {


  /**
   * Run PageRank for a fixed number of iterations returning a graph
   * with vertex attributes containing the PageRank and edge
   * attributes the normalized edge weight.
   *
   * @tparam VD the original vertex attribute (not used)
   * @tparam ED the original edge attribute (not used)
   *
   * @param graph the graph on which to compute PageRank
   * @param numIter the number of iterations of PageRank to run
   * @param resetProb the random reset probability (alpha)
   *
   * @return the graph containing with each vertex containing the PageRank and each edge
   *         containing the normalized weight.
   */
  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED], numIter: Int,
    resetProb: Double = 0.15): Graph[Double, Double] =
  {
    runWithOptions(graph, numIter, resetProb)
  }

  /**
   * Run PageRank for a fixed number of iterations returning a graph
   * with vertex attributes containing the PageRank and edge
   * attributes the normalized edge weight.
   *
   * @tparam VD the original vertex attribute (not used)
   * @tparam ED the original edge attribute (not used)
   *
   * @param graph the graph on which to compute PageRank
   * @param numIter the number of iterations of PageRank to run
   * @param resetProb the random reset probability (alpha)
   * @param srcId the source vertex for a Personalized Page Rank (optional)
   *
   * @return the graph containing with each vertex containing the PageRank and each edge
   *         containing the normalized weight.
   *
   */
  def runWithOptions[VD: ClassTag, ED: ClassTag](
      graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
      srcId: Option[VertexId] = None): Graph[Double, Double] =
  {
    val personalized = srcId isDefined
    val src: VertexId = srcId.getOrElse(-1L)

    // Initialize the PageRank graph with each edge attribute having
    // weight 1/outDegree and each vertex with attribute resetProb.
    // When running personalized pagerank, only the source vertex
    // has an attribute resetProb. All others are set to 0.
    var rankGraph: Graph[Double, Double] = graph
      // Associate the degree with each vertex
      .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
      // Set the weight on the edges based on the degree
      .mapTriplets( e => 1.0 / e.srcAttr, TripletFields.Src )
      // Set the vertex attributes to the initial pagerank values
      .mapVertices { (id, attr) =>
        if (!(id != src && personalized)) resetProb else 0.0
      }

    def delta(u: VertexId, v: VertexId): Double = { if (u == v) 1.0 else 0.0 }

    var iteration = 0
    var prevRankGraph: Graph[Double, Double] = null
    while (iteration < numIter) {
      rankGraph.cache()

      // Compute the outgoing rank contributions of each vertex, perform local preaggregation, and
      // do the final aggregation at the receiving vertices. Requires a shuffle for aggregation.
      val rankUpdates = rankGraph.aggregateMessages[Double](
        ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

      // Apply the final rank updates to get the new ranks, using join to preserve ranks of vertices
      // that didn't receive a message. Requires a shuffle for broadcasting updated ranks to the
      // edge partitions.
      prevRankGraph = rankGraph
      val rPrb = if (personalized) {
        (src: VertexId , id: VertexId) => resetProb * delta(src, id)
      } else {
        (src: VertexId, id: VertexId) => resetProb
      }

      rankGraph = rankGraph.joinVertices(rankUpdates) {
        (id, oldRank, msgSum) => rPrb(src, id) + (1.0 - resetProb) * msgSum
      }.cache()

      rankGraph.edges.foreachPartition(x => {}) // also materializes rankGraph.vertices
      logInfo(s"PageRank finished iteration $iteration.")
      prevRankGraph.vertices.unpersist(false)
      prevRankGraph.edges.unpersist(false)

      iteration += 1
    }

    rankGraph
  }

  /**
   * Run a dynamic version of PageRank returning a graph with vertex attributes containing the
   * PageRank and edge attributes containing the normalized edge weight.
   *
   * @tparam VD the original vertex attribute (not used)
   * @tparam ED the original edge attribute (not used)
   *
   * @param graph the graph on which to compute PageRank
   * @param tol the tolerance allowed at convergence (smaller => more accurate).
   * @param resetProb the random reset probability (alpha)
   *
   * @return the graph containing with each vertex containing the PageRank and each edge
   *         containing the normalized weight.
   */
  def runUntilConvergence[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED], tol: Double, resetProb: Double = 0.15): Graph[Double, Double] =
  {
      runUntilConvergenceWithOptions(graph, tol, resetProb)
  }

  /**
   * Run a dynamic version of PageRank returning a graph with vertex attributes containing the
   * PageRank and edge attributes containing the normalized edge weight.
   *
   * @tparam VD the original vertex attribute (not used)
   * @tparam ED the original edge attribute (not used)
   *
   * @param graph the graph on which to compute PageRank
   * @param tol the tolerance allowed at convergence (smaller => more accurate).
   * @param resetProb the random reset probability (alpha)
   * @param srcId the source vertex for a Personalized Page Rank (optional)
   *
   * @return the graph containing with each vertex containing the PageRank and each edge
   *         containing the normalized weight.
   */
  def runUntilConvergenceWithOptions[VD: ClassTag, ED: ClassTag](
      graph: Graph[VD, ED], tol: Double, resetProb: Double = 0.15,
      srcId: Option[VertexId] = None): Graph[Double, Double] =
  {
    val personalized = srcId.isDefined
    val src: VertexId = srcId.getOrElse(-1L)

    // Initialize the pagerankGraph with each edge attribute
    // having weight 1/outDegree and each vertex with attribute 1.0.
    val pagerankGraph: Graph[(Double, Double), Double] = graph
      // Associate the degree with each vertex
      .outerJoinVertices(graph.outDegrees) {
        (vid, vdata, deg) => deg.getOrElse(0)
      }
      // Set the weight on the edges based on the degree
      .mapTriplets( e => 1.0 / e.srcAttr )
      // Set the vertex attributes to (initalPR, delta = 0)
      .mapVertices { (id, attr) =>
        if (id == src) (resetProb, Double.NegativeInfinity) else (0.0, 0.0)
      }
      .cache()

    // Define the three functions needed to implement PageRank in the GraphX
    // version of Pregel
    def vertexProgram(id: VertexId, attr: (Double, Double), msgSum: Double): (Double, Double) = {
      val (oldPR, lastDelta) = attr
      val newPR = oldPR + (1.0 - resetProb) * msgSum
      (newPR, newPR - oldPR)
    }

    def personalizedVertexProgram(id: VertexId, attr: (Double, Double),
      msgSum: Double): (Double, Double) = {
      val (oldPR, lastDelta) = attr
      var teleport = oldPR
      val delta = if (src==id) 1.0 else 0.0
      teleport = oldPR*delta

      val newPR = teleport + (1.0 - resetProb) * msgSum
      val newDelta = if (lastDelta == Double.NegativeInfinity) newPR else newPR - oldPR
      (newPR, newDelta)
    }

    def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = {
      if (edge.srcAttr._2 > tol) {
        Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
      } else {
        Iterator.empty
      }
    }

    def messageCombiner(a: Double, b: Double): Double = a + b

    // The initial message received by all vertices in PageRank
    val initialMessage = if (personalized) 0.0 else resetProb / (1.0 - resetProb)

    // Execute a dynamic version of Pregel.
    val vp = if (personalized) {
      (id: VertexId, attr: (Double, Double), msgSum: Double) =>
        personalizedVertexProgram(id, attr, msgSum)
    } else {
      (id: VertexId, attr: (Double, Double), msgSum: Double) =>
        vertexProgram(id, attr, msgSum)
    }

    Pregel(pagerankGraph, initialMessage, activeDirection = EdgeDirection.Out)(
      vp, sendMessage, messageCombiner)
      .mapVertices((vid, attr) => attr._1)
  } // end of deltaPageRank
} // end of PageRank
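
A hedged example of invoking both variants through GraphOps; the graph value g is hypothetical:

// Fixed number of iterations
val staticRanks: VertexRDD[Double] = g.staticPageRank(numIter = 20).vertices
// Iterate until the change at every vertex drops below the tolerance
val dynamicRanks: VertexRDD[Double] = g.pageRank(tol = 0.0001).vertices
staticRanks.innerJoin(dynamicRanks) { (vid, s, d) => (s, d) }.take(5).foreach(println)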


This post has grown rather long; I will continue the rest in the next post, so stay tuned!

