Local vector
Labeled point
Local matrix
Distributed matrix
  RowMatrix
  IndexedRowMatrix
  CoordinateMatrix
  BlockMatrix
MLlib supports local vectors and matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas. A training example used in supervised learning is called a “labeled point” in MLlib.
A local vector has integer-typed, 0-based indices and double-typed values, and is stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector is backed by a double array holding all of its entry values, while a sparse vector is backed by two parallel arrays: indices and values. For example, the vector (1.0, 0.0, 3.0) can be represented in dense format as [1.0, 0.0, 3.0] or in sparse format as (3, [0, 2], [1.0, 3.0]), where 3 is the size of the vector, [0, 2] are the indices of the nonzero entries, and [1.0, 3.0] are their values.
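The two formats hold the same information, and converting between them makes the parallel-array layout concrete. The sketch below is plain Scala with no Spark dependency. `SparseLayout`, `toSparse` and `toDense` are hypothetical names used only for illustration and are not part of the MLlib API.

```scala
// Hypothetical helpers illustrating the dense and sparse layouts described above.
// Plain Scala with no Spark dependency; not part of the MLlib API.
object SparseLayout {

  // Dense -> sparse: keep only the nonzero entries, recording their 0-based
  // positions in `indices` and their values in the parallel `values` array.
  def toSparse(dense: Array[Double]): (Int, Array[Int], Array[Double]) = {
    val nonzero: Array[Int] = dense.indices.filter(i => dense(i) != 0.0).toArray
    (dense.length, nonzero, nonzero.map(i => dense(i)))
  }

  // Sparse -> dense: start from an all-zero array of the given size and
  // write each stored value back at its index.
  def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must be parallel arrays")
    val dense = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense
  }
}
```

Under these assumptions, `SparseLayout.toSparse(Array(1.0, 0.0, 3.0))` returns the size 3, the indices `Array(0, 2)` and the values `Array(1.0, 3.0)`, which is the `(3, [0, 2], [1.0, 3.0])` form above, and `toDense` reverses the conversion. The sparse form pays off when most entries are zero, since only the nonzero ones are stored.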
Scala
The base class of local vectors is Vector, and we provide two implementations: DenseVector and SparseVector. We recommend using the factory methods implemented in Vectors to create local vectors.
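The split between a `Vector` base class, two concrete implementations and a `Vectors` factory object can be modelled in a few lines of plain Scala. The sketch below is a simplified stand-in, not MLlib's code: `Vec`, `DenseVec`, `SparseVec` and `Vecs` are hypothetical names chosen to avoid confusion with the real classes, and the sparse lookup assumes the indices are sorted in increasing order.

```scala
// Hypothetical, simplified model of the Vector / DenseVector / SparseVector /
// Vectors design; not the MLlib implementation.
sealed trait Vec {
  def size: Int
  def apply(i: Int): Double // value at 0-based index i
}

// Backed by a single array holding every entry.
final case class DenseVec(values: Array[Double]) extends Vec {
  def size: Int = values.length
  def apply(i: Int): Double = values(i)
}

// Backed by two parallel arrays; indices are assumed to be strictly increasing.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  require(indices.length == values.length, "indices and values must have the same length")

  def apply(i: Int): Double = {
    require(i >= 0 && i < size, s"index $i out of range for size $size")
    // Binary search over the sorted indices; entries that are not stored are zero.
    val pos = java.util.Arrays.binarySearch(indices, i)
    if (pos >= 0) values(pos) else 0.0
  }
}

// Factory object: callers build vectors here instead of calling constructors directly.
object Vecs {
  def dense(first: Double, rest: Double*): Vec = DenseVec((first +: rest).toArray)
  def dense(values: Array[Double]): Vec = DenseVec(values)

  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vec =
    SparseVec(size, indices, values)

  // Accepts (index, value) pairs in any order and sorts them by index.
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vec = {
    val sorted = elements.sortBy(_._1)
    SparseVec(size, sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }
}
```

Because callers only go through `Vecs` and the `Vec` type, the storage choice stays an implementation detail: `Vecs.dense(1.0, 0.0, 3.0)(2)` and `Vecs.sparse(3, Seq((0, 1.0), (2, 3.0)))(2)` both return `3.0`.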
Refer to the Vector Scala docs and Vectors Scala docs for details on the API.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 3.0).
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// The same dense vector, built from an Array[Double].
val dvFromArray: Vector = Vectors.dense(Array(1.0, 0.0, 3.0))

// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.
// Format: (size, indices: Array[Int], values: Array[Double]), with indices and values as parallel arrays.
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
// Create a sparse vector (1.0, 0.0, 3.0) by specifying its nonzero entries.
// Format: (size, Seq((index, value), ...)).
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

println(dv)
println(dvFromArray)
println(sv1)
println(sv2)

Result:
[1.0,0.0,3.0]
[1.0,0.0,3.0]
(3,[0,2],[1.0,3.0])
(3,[0,2],[1.0,3.0])
Note: Scala imports scala.collection.immutable.Vector by default, so you have to import org.apache.spark.mllib.linalg.Vector explicitly to use MLlib’s Vector.
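The clash is ordinary Scala name resolution: an explicit import in a scope takes precedence over the `Vector` alias that every compilation unit receives from the default `scala._` import. The sketch below reproduces this without Spark, using a hypothetical object `linalg` as a stand-in for `org.apache.spark.mllib.linalg`. With Spark on the classpath, a renaming import such as `import org.apache.spark.mllib.linalg.{Vector => MLVector}` is one way to keep both types available in the same file.

```scala
// Hypothetical stand-in for org.apache.spark.mllib.linalg (no Spark needed).
object linalg {
  trait Vector { def size: Int }
  final class DenseVector(val values: Array[Double]) extends Vector {
    def size: Int = values.length
  }
}

object DefaultVector {
  // No import: `Vector` resolves to scala.collection.immutable.Vector.
  val v: Vector[Double] = Vector(1.0, 0.0, 3.0)
}

object ExplicitImport {
  // An explicit import shadows the default `scala.Vector` alias in this scope.
  import linalg.{DenseVector, Vector}
  val v: Vector = new DenseVector(Array(1.0, 0.0, 3.0))
}

object RenamedImport {
  // Renaming the import keeps both types usable side by side.
  import linalg.{DenseVector, Vector => LinalgVector}
  val ml: LinalgVector = new DenseVector(Array(1.0, 0.0, 3.0))
  val coll: Vector[Double] = Vector(1.0, 0.0, 3.0) // still the Scala collection
}
```

Inside `ExplicitImport`, `Vector` means `linalg.Vector`; inside `RenamedImport`, the unqualified `Vector` still refers to the Scala collection while `LinalgVector` names the stand-in type.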