In this post, we will examine the architecture of the Scala collections, following up on the earlier introduction to the Scala collections API.
The principle of the Scala collections API is to implement most operations only once, in flexible "templates" that individual base classes and implementations can reuse.
The Scala collection architecture is built around builders and traversals. Here is the skeleton of the Builder type.
package scala.collection.mutable

class Builder[-Elem, +To] {
  def +=(elem: Elem): this.type
  def result(): To
  def clear(): Unit
  def mapResult[NewTo](f: To => NewTo): Builder[Elem, NewTo] = ...
}

You use a builder to construct a collection incrementally; commonly you apply mapResult on a builder to obtain a result of the type you actually want.
import scala.collection.mutable._

// you commonly apply mapResult on the Builder to build the result type you want
val buf = new ArrayBuffer[Int]
val bldr = buf mapResult (_.toArray)   // Builder[Int, Array[Int]]
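Continuing the sketch, additions go to the underlying buffer and the conversion happens only when result() is called:

bldr += 1
bldr += 2
val arr = bldr.result()   // Array(1, 2): the buffer's contents run through _.toArray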
The Scala collections support a rich set of operations such as filter and map, and they follow the "same result type" principle: wherever possible, a transformation method on a collection yields a collection of the same type.
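For example, in the standard library:

import scala.collection.immutable.BitSet

BitSet(1, 2, 3) filter (_ > 1)        // BitSet(2, 3)   -- still a BitSet
"hello" filter (_ != 'l')             // "heo"          -- still a String
List(1, 2, 3) filter (_ % 2 == 1)     // List(1, 3)     -- still a List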
First, let's look at the TraversableLike trait. The "same result type" principle is achieved by means of generic builders and traversals over collections, implemented once in so-called implementation traits. These traits are named with a Like suffix; for example, Traversable has an implementation trait called TraversableLike.
An implementation trait is typically parameterized not only over the collection's element type, but also over its representation type, i.e. the concrete collection type being built.
So the trait is declared as:

trait TraversableLike[+Elem, +Repr] { ... }

A naive map could only create new instances of exactly the same collection type, whereas what we really want is an instance of the same collection type constructor, but possibly with a different element type. What is more, the result type of a method like map may depend in non-trivial ways on the other argument types. Here are some examples.
import collection.immutable.BitSet

// The result of map depends on the collection's type constructor
// and on the new element type:
val bits = BitSet(1, 2, 3)   // BitSet[Int]
bits map (_ * 2)             // result type is BitSet[Int]
bits map (_.toFloat)         // result type is Set[Float]

BitSet is not an isolated case; the same thing happens when mapping over a Map with a pattern-matching function, as the next example shows.
Map("a" -> 1, "b" -> 2) map { case (x, y) => (y, x) }   // results in a Map[Int, String]
Map("a" -> 1, "b" -> 2) map { case (x, y) => y }        // results in an immutable.Iterable[Int] (a List under the hood)

This is how Scala solves the problem: map is not overloaded in the simple sense, but through the more systematic form of overloading provided by implicit parameters.
// there seems to be a typo in the original script; this is the most plausible form
// B:    the element type of the collection to build
// That: the type of the collection to build
// This: the type of the source collection the builder factory applies to
// Elem: the element type of the source collection
def map[B, That](f: Elem => B)(implicit bf: CanBuildFrom[This, B, That]): That = {
  val b = bf(this)
  for (x <- this) b += f(x)
  b.result
}

Here is the definition of the CanBuildFrom trait.
package scala.collection.generic

// From: the collection type for which the builder factory applies
// Elem: the type of elements to be built
// To:   the type of collection to build
trait CanBuildFrom[-From, -Elem, +To] {
  // creates a new builder
  def apply(from: From): Builder[Elem, To]
}

A more general builder factory type is:
// a more general builder factory type
CanBuildFrom[Set[_], A, Set[A]]   // Set[_] means an arbitrary Set type

An example of its use is:
// example: static result type vs. dynamic result type
val xs: Iterable[Int] = List(1, 2, 3)
val ys = xs map (x => x * x)   // static type Iterable[Int], dynamic type List[Int]
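As an aside (not in the original script), the result type That really is chosen by the implicit parameter: in the pre-2.13 collections you can even override the choice explicitly with collection.breakOut, which supplies a CanBuildFrom for whatever target type you request:

import scala.collection.breakOut

// ask map to build a Set instead of a List by supplying the builder factory explicitly
val squares: Set[Int] = List(1, 2, 3).map(x => x * x)(breakOut)
// squares: Set[Int] = Set(1, 4, 9)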
By default, then, mapping over something that is statically an Iterable gives you back an Iterable, and yet the dynamic type of the result still matches the receiver. Scala accomplishes this with one more indirection: the apply method of CanBuildFrom is passed the source collection as an argument.
How is the dynamic type preserved? Builder factories for generic traversables forward the call to the genericBuilder method of that source collection, and genericBuilder in turn returns the builder that belongs to the collection in which it is defined. In effect, virtual dispatch picks the builder that matches the receiver's dynamic type.
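As a rough sketch of this indirection (simplified and illustrative, not the actual library source), a builder factory for Iterable could look like this:

import scala.collection.Iterable
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder

// the factory asks the *source* collection for its builder, so the builder
// matches the receiver's dynamic type (List, Vector, ...)
val iterableCBF = new CanBuildFrom[Iterable[_], Int, Iterable[Int]] {
  def apply(from: Iterable[_]): Builder[Int, Iterable[Int]] = from.genericBuilder[Int]
  def apply(): Builder[Int, Iterable[Int]] = Iterable.newBuilder[Int]
}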
Now let's integrate a new collection type. First we define the class Base together with the case objects A, T, G, U that represent the four bases; they will be the elements of our new collection.
abstract class Base
case object A extends Base
case object T extends Base
case object G extends Base
case object U extends Base

object Base {
  val fromInt: Int => Base = Array(A, T, G, U)
  val toInt: Base => Int = Map(A -> 0, T -> 1, G -> 2, U -> 3)
}
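A quick sanity check of the encoding:

Base.fromInt(2)   // G
Base.toInt(U)     // 3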
Next we define a class RNA1 for strands of RNA. We choose a compact representation: because there are only four bases, a single base needs just two bits, so sixteen bases fit into one Int.
import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom

final class RNA1 private (val groups: Array[Int], val length: Int) extends IndexedSeq[Base] {
  import RNA1._

  // You must define this apply(idx: Int): Base method, otherwise the compiler complains
  // that RNA1 needs to be abstract, since "method apply in trait SeqLike of type
  // (idx: Int)Base is not defined".
  def apply(idx: Int): Base = {
    if (idx < 0 || length <= idx) throw new IndexOutOfBoundsException()
    Base.fromInt(groups(idx / N) >> (idx % N * S) & M)
  }
}

object RNA1 {
  // Number of bits necessary to represent a group
  private val S = 2
  // Number of groups that fit in an Int
  private val N = 32 / S
  // Bitmask to isolate a group
  private val M = (1 << S) - 1

  def fromSeq(buf: Seq[Base]): RNA1 = {
    val groups = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length)
      groups(i / N) |= Base.toInt(buf(i)) << (i % N * S)
    new RNA1(groups, buf.length)
  }

  def apply(bases: Base*) = fromSeq(bases)
}

The apply method defined in class RNA1 is an indexer that gives you the Base at index idx, while the apply method on object RNA1 defines how to build an RNA strand from a sequence of bases (via fromSeq).
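To make the packing concrete, here is what fromSeq computes for the five-base strand A, U, G, G, T (worked out by hand from the encoding above):

// A -> 0, U -> 3, G -> 2, G -> 2, T -> 1; two bits per base, least significant first
val packed = 0 | (3 << 2) | (2 << 4) | (2 << 6) | (1 << 8)
// packed == 428, so RNA1(A, U, G, G, T) stores groups = Array(428) and length = 5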
With the RNA1 definition in place, we can do the following.
// examples that use RNA1
val xs = List(A, G, T, U)
RNA1.fromSeq(xs)               // RNA1(A, G, T, U)
val rna1 = RNA1(A, U, G, G, T)
rna1.length                    // 5
rna1.last                      // T
rna1.take(3)                   // Vector(A, U, G)

As you can see from the last line, take returns a Vector of bases. Why a Vector, you may wonder? Because RNA1 inherits take from IndexedSeq, and that inherited method builds its result with the default implementation of IndexedSeq, which is Vector.
You could override take to return an RNA1 instead, as shown below, but there are tens if not hundreds of such methods (take, drop, filter, ...), so overriding them all by hand is not an option.
// take inherited from IndexedSeq returns an IndexedSeq, whose default
// implementation is Vector; we could override such methods one at a time:
def take(count: Int): RNA1 = RNA1.fromSeq(super.take(count))
What we can do instead is this:
import scala.collection.IndexedSeqLike
import scala.collection.mutable.{ArrayBuffer, Builder}

final class RNA2 private (
  val groups: Array[Int],
  val length: Int
) extends IndexedSeq[Base] with IndexedSeqLike[Base, RNA2] {

  import RNA2._

  override def newBuilder: Builder[Base, RNA2] =
    new ArrayBuffer[Base] mapResult fromSeq

  def apply(idx: Int): Base = {
    if (idx < 0 || length <= idx) throw new IndexOutOfBoundsException()
    Base.fromInt(groups(idx / N) >> (idx % N * S) & M)
  }
}

object RNA2 {
  // Number of bits necessary to represent a group
  private val S = 2
  // Number of groups that fit in an Int
  private val N = 32 / S
  // Bitmask to isolate a group
  private val M = (1 << S) - 1

  def fromSeq(buf: Seq[Base]): RNA2 = {
    val groups = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length)
      groups(i / N) |= Base.toInt(buf(i)) << (i % N * S)
    new RNA2(groups, buf.length)
  }

  def apply(bases: Base*) = fromSeq(bases)
}

The key here is that RNA2 mixes in the IndexedSeqLike[Base, RNA2] trait and overrides newBuilder: the builder is simply an ArrayBuffer[Base] whose result is mapped through fromSeq, so bulk operations rebuild an RNA2.
Now rna2.take(3) yields another RNA strand, not a Vector any more.
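For example, assuming the RNA2 definition above is in scope, we should get:

val rna2 = RNA2(A, U, G, G, T)
rna2 take 3            // RNA2(A, U, G)
rna2 filter (U != _)   // RNA2(A, G, G, T)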
What we ultimately want, though, is code like the following.
// what we want: map over an RNA strand and get another RNA strand,
// and ++ of two RNA strands should yield another RNA strand
val rna = RNA(A, U, G, G, T)
rna map { case A => T; case b => b }   // change every A to T, leave the rest as is
rna ++ rna

With RNA2, however, this is not yet supported; map and ++ still fall back to Vector, as the results below show.
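Concretely, with RNA2 these calls compile but build their results with the default IndexedSeq builder:

val rna2 = RNA2(A, U, G, G, T)
rna2 map { case A => T; case b => b }
// IndexedSeq[Base] = Vector(T, U, G, G, T)
rna2 ++ rna2
// IndexedSeq[Base] = Vector(A, U, G, G, T, A, U, G, G, T)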
The key to supporting this is CanBuildFrom: we need an implicit CanBuildFrom instance for RNA. Recall the map signature we saw before:
def map[B, That](f: Elem => B)(implicit bf: CanBuildFrom[This, B, That]): That = {
  val b = bf(this)
  for (x <- this) b += f(x)
  b.result
}

So this is the final RNA code.
import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom

// this version fixes the map/++ problem
final class RNA private (val groups: Array[Int], val length: Int)
  extends IndexedSeq[Base] with IndexedSeqLike[Base, RNA] {

  import RNA._

  // Mandatory re-implementation of 'newBuilder' in 'IndexedSeq'
  override protected[this] def newBuilder: Builder[Base, RNA] = RNA.newBuilder

  // Mandatory implementation of 'apply' in 'IndexedSeq'; without it the compiler
  // complains that RNA needs to be abstract, since "method apply in trait SeqLike
  // of type (idx: Int)Base is not defined"
  def apply(idx: Int): Base = {
    if (idx < 0 || length <= idx) throw new IndexOutOfBoundsException()
    Base.fromInt(groups(idx / N) >> (idx % N * S) & M)
  }

  // Optional re-implementation of foreach to make traversal more efficient
  override def foreach[U](f: Base => U): Unit = {
    var i = 0
    var b = 0
    while (i < length) {
      b = if (i % N == 0) groups(i / N) else b >>> S
      f(Base.fromInt(b & M))
      i += 1
    }
  }
}

object RNA {
  // Number of bits necessary to represent a group
  private val S = 2
  // Number of groups that fit in an Int
  private val N = 32 / S
  // Bitmask to isolate a group
  private val M = (1 << S) - 1

  def fromSeq(buf: Seq[Base]): RNA = {
    val groups = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length)
      groups(i / N) |= Base.toInt(buf(i)) << (i % N * S)
    new RNA(groups, buf.length)
  }

  def apply(bases: Base*) = fromSeq(bases)

  def newBuilder: Builder[Base, RNA] = new ArrayBuffer mapResult fromSeq

  implicit def canBuildFrom: CanBuildFrom[RNA, Base, RNA] =
    new CanBuildFrom[RNA, Base, RNA] {
      // both apply methods must be defined, otherwise: "object creation impossible,
      // since method apply in trait CanBuildFrom of type ()Builder[Base, RNA] is not defined"
      def apply(): Builder[Base, RNA] = newBuilder
      def apply(from: RNA): Builder[Base, RNA] = newBuilder
    }
}

Now, with that in place, we can finally do what we set out to do.
// testing the code again
val rna = RNA(A, U, G, G, T)
rna map { case A => T; case b => b }   // RNA(T, U, G, G, T) -- no longer a Vector[Base]
rna ++ rna                             // RNA(A, U, G, G, T, A, U, G, G, T)
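Note that the implicit CanBuildFrom[RNA, Base, RNA] only kicks in when the element type stays Base; mapping to a different element type falls back to the generic IndexedSeq builder:

rna map Base.toInt
// IndexedSeq[Int] = Vector(0, 3, 2, 2, 1)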
We have seen how to define and integrate a new sequence type; what if we want to integrate a new Set or Map?
The map we will build is based on a Patricia trie, so named as an abbreviation of "Practical Algorithm To Retrieve Information Coded In Alphanumeric".
What we want to have is something like this:
val m = PrefixMap("abc" -> 0, "abd" -> 1, "al" -> 2, "all" -> 3, "xy" -> 4)
// m: PrefixMap[Int] = Map((abc,0), (abd,1), (al,2), (all,3), (xy,4))

m withPrefix "a"
// res14: PrefixMap[Int] = Map((bc,0), (bd,1), (l,2), (ll,3))

Here is the class definition.
import collection._

class PrefixMap[T]
  extends mutable.Map[String, T]
  with mutable.MapLike[String, T, PrefixMap[T]] {

  var suffixes: immutable.Map[Char, PrefixMap[T]] = Map.empty
  var value: Option[T] = None

  def get(s: String): Option[T] =
    if (s.isEmpty) value
    else suffixes get (s(0)) flatMap (_.get(s substring 1))

  def withPrefix(s: String): PrefixMap[T] =
    if (s.isEmpty) this
    else {
      val leading = s(0)
      suffixes get leading match {
        case None => suffixes = suffixes + (leading -> empty)
        case _ =>
      }
      suffixes(leading) withPrefix (s substring 1)
    }

  override def update(s: String, elem: T) =
    withPrefix(s).value = Some(elem)

  override def remove(s: String): Option[T] =
    if (s.isEmpty) { val prev = value; value = None; prev }
    else suffixes get (s(0)) flatMap (_.remove(s substring 1))

  def iterator: Iterator[(String, T)] =
    (for (v <- value.iterator) yield ("", v)) ++
    (for ((chr, m) <- suffixes.iterator; (s, v) <- m.iterator) yield (chr +: s, v))

  def += (kv: (String, T)): this.type = { update(kv._1, kv._2); this }

  def -= (s: String): this.type = { remove(s); this }

  override def empty = new PrefixMap[T]
}

Note how update delegates to withPrefix: storing a key first navigates to (creating, if necessary) the node for that key and then sets its value, while get, remove, and iterator walk the suffix maps character by character.
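Even before the companion object exists, the mutable Map interface already works; a small sketch:

val m = new PrefixMap[Int]
m += ("abc" -> 0)
m += ("abd" -> 1)
m get "abc"               // Some(0)
m withPrefix "ab" get "c" // Some(0): the sub-map is keyed by the remaining suffix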
And here is the definition of the PrefixMap companion object.

import scala.collection.mutable.{Builder, MapBuilder}
import scala.collection.generic.CanBuildFrom

object PrefixMap {
  def empty[T] = new PrefixMap[T]

  def apply[T](kvs: (String, T)*): PrefixMap[T] = {
    val m: PrefixMap[T] = empty
    for (kv <- kvs) m += kv
    m
  }

  def newBuilder[T]: Builder[(String, T), PrefixMap[T]] =
    new MapBuilder[String, T, PrefixMap[T]](empty)

  // CanBuildFrom applies to any PrefixMap[_]; the element type is (String, T),
  // a key-value pair, and the result type is again a PrefixMap[T]
  implicit def canBuildFrom[T]: CanBuildFrom[PrefixMap[_], (String, T), PrefixMap[T]] =
    new CanBuildFrom[PrefixMap[_], (String, T), PrefixMap[T]] {
      def apply(from: PrefixMap[_]) = newBuilder[T]
      def apply() = newBuilder[T]
    }
}

Now, the test code is as follows.
// usage examples
val res_1 = PrefixMap("hello" -> 5, "hi" -> 2)
PrefixMap.empty[String]
res_1 map { case (k, v) => (k + "!", "x" * v) }
// PrefixMap[String] = Map((hello!,xxxxx), (hi!,xx))
We have now seen two examples: integrating a completely new sequence type and integrating a new Map. Neither requires much code, thanks to the well-designed architecture of the Scala collections API.
Summary

So, in summary, here is what you need to do to integrate a new collection cleanly: extend the appropriate base trait (IndexedSeq[Base] in the RNA example, mutable.Map[String, T] for PrefixMap), mix in the corresponding Like implementation trait parameterized with your own representation type (IndexedSeqLike[Base, RNA], mutable.MapLike[String, T, PrefixMap[T]]), implement the few mandatory methods (apply and length for a sequence; get, iterator, +=, -= for a mutable map), override newBuilder (and empty for maps) so that bulk operations rebuild your type, and finally provide an implicit CanBuildFrom in the companion object so that transformers such as map and ++ also return your type.