Scalaz (24) - Functional Data Structures: Tree, Navigation and Maintenance

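  In the previous installment we discussed Zipper, a cursor over immutable sequential collections that supports moving left and right and updating elements. This time we look at Tree. E-commerce applications very commonly need to process files in formats such as XML and JSON. Scalaz provides a Tree data type, together with navigation and manipulation functions, that makes it more convenient and efficient to program against tree-shaped data such as XML and JSON documents and file-system directories. The definition of the scalaz Tree is very simple (scalaz/Tree.scala):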

/**
 * A multi-way tree, also known as a rose tree. Also known as Cofree[Stream, A].
 */
sealed abstract class Tree[A] {

  import Tree._

  /** The label at the root of this tree. */
  def rootLabel: A

  /** The child nodes of this tree. */
  def subForest: Stream[Tree[A]]
...


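A Tree consists of a value rootLabel of type A and a stream of subtrees, Stream[Tree[A]]. A Tree may consist of nothing but a rootLabel, in which case subForest is Stream.empty. A Tree with only a rootLabel is commonly called a leaf; one with a non-empty subForest is called a node. Scalaz injects leaf and node construction methods into every type (syntax/TreeOps.scala):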

final class TreeOps[A](self: A) {
  def node(subForest: Tree[A]*): Tree[A] = Tree.node(self, subForest.toStream)

  def leaf: Tree[A] = Tree.leaf(self)
}

trait ToTreeOps {
  implicit def ToTreeOps[A](a: A) = new TreeOps(a)
}


The injected methods simply call the constructor functions in Tree:

trait TreeFunctions {
  /** Construct a new Tree node. */
  def node[A](root: => A, forest: => Stream[Tree[A]]): Tree[A] = new Tree[A] {
    lazy val rootLabel = root
    lazy val subForest = forest

    override def toString = "<tree>"
  }

  /** Construct a tree node with no children. */
  def leaf[A](root: => A): Tree[A] = node(root, Stream.empty)
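
As a quick check of the two listings above, here is a minimal sketch. It assumes the scalaz version quoted in this article (7.1.x, where Tree.node and Tree.leaf exist and subForest is a Stream) and a worksheet or REPL session. The names viaSyntax, viaFunctions, lazyTree, and evaluated are made up for illustration. It shows that the injected syntax and the constructor functions build equal trees, and that node's by-name arguments are not evaluated until they are needed.

```scala
import scalaz._, Scalaz._

// The injected syntax and the explicit constructors build equal trees.
val viaSyntax: Tree[Int]    = 1.node(2.leaf, 3.leaf)
val viaFunctions: Tree[Int] = Tree.node(1, Stream(Tree.leaf(2), Tree.leaf(3)))
val same = viaSyntax === viaFunctions           // true

// node takes its arguments by name and stores them in lazy vals,
// so the forest expression only runs when subForest is first accessed.
var evaluated = false
val lazyTree: Tree[Int] = Tree.node(0, { evaluated = true; Stream.empty[Tree[Int]] })
val before = evaluated                          // false: subForest not touched yet
val forcedSize = lazyTree.subForest.size        // forces the forest
val after = evaluated                           // true
```

This laziness is what allows very large, or even lazily generated, trees to be described without building every subtree up front.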

Tree also provides a constructor and a pattern-matching extractor:

object Tree extends TreeInstances with TreeFunctions {
  /** Construct a tree node with no children. */
  def apply[A](root: => A): Tree[A] = leaf(root)

  object Node {
    def unapply[A](t: Tree[A]): Option[(A, Stream[Tree[A]])] = Some((t.rootLabel, t.subForest))
  }
}
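
To show what the Node extractor is good for, here is a small sketch (assuming scalaz 7.1 or 7.2 with Stream-based subForest; the function name depth and the value sample are hypothetical). It computes the depth of a tree by destructuring each node into its label and its children.

```scala
import scalaz._, Scalaz._

// Depth of a tree, computed by destructuring each node with Tree.Node.
// A leaf has depth 1; a node is one deeper than its deepest child.
def depth[A](t: Tree[A]): Int = t match {
  case Tree.Node(_, children) =>
    if (children.isEmpty) 1 else 1 + children.map(c => depth(c)).max
}

val sample: Tree[Int] = 1.node(11.leaf, 12.node(121.leaf))
val d = depth(sample)   // 3
```

Because Node.unapply always returns Some, the single case covers leaves and inner nodes alike.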


We can build a Tree directly:

 Tree("ALeaf") === "ALeaf".leaf                  //> res5: Boolean = true
  val tree: Tree[Int] =
    1.node(
      11.leaf,
      12.node(
        121.leaf),
     2.node(
      21.leaf,
      22.leaf)
     )                                            //> tree  : scalaz.Tree[Int] = <tree>
  tree.drawTree                                   //> res6: String = "1
                                                  //| |
                                                  //| +- 11
                                                  //| |
                                                  //| +- 12
                                                  //| |  |
                                                  //| |  `- 121
                                                  //| |
                                                  //| `- 2
                                                  //|    |
                                                  //|    +- 21
                                                  //|    |
                                                  //|    `- 22
                                                  //| "
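
Besides drawTree, two other ways of looking at the same tree are often handy. The sketch below assumes scalaz 7.1 or 7.2, where Tree offers flatten (pre-order labels) and levels (labels grouped by depth), both returning Streams. The value names are only illustrative.

```scala
import scalaz._, Scalaz._

val tree: Tree[Int] =
  1.node(
    11.leaf,
    12.node(121.leaf),
    2.node(21.leaf, 22.leaf))

// Pre-order traversal: a node first, then its children from left to right.
val preOrder = tree.flatten.toList
// List(1, 11, 12, 121, 2, 21, 22)

// Breadth-first view: one inner list per depth level.
val byLevel = tree.levels.map(_.toList).toList
// List(List(1), List(11, 12, 2), List(121, 21, 22))
```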


Tree implements a large number of type class instances:

sealed abstract class TreeInstances {
  implicit val treeInstance: Traverse1[Tree] with Monad[Tree] with Comonad[Tree] with Align[Tree] with Zip[Tree] = new Traverse1[Tree] with Monad[Tree] with Comonad[Tree] with Align[Tree] with Zip[Tree] {
    def point[A](a: => A): Tree[A] = Tree.leaf(a)
    def cobind[A, B](fa: Tree[A])(f: Tree[A] => B): Tree[B] = fa cobind f
    def copoint[A](p: Tree[A]): A = p.rootLabel
    override def map[A, B](fa: Tree[A])(f: A => B) = fa map f
    def bind[A, B](fa: Tree[A])(f: A => Tree[B]): Tree[B] = fa flatMap f
    def traverse1Impl[G[_]: Apply, A, B](fa: Tree[A])(f: A => G[B]): G[Tree[B]] = fa traverse1 f
    override def foldRight[A, B](fa: Tree[A], z: => B)(f: (A, => B) => B): B = fa.foldRight(z)(f)
    override def foldMapRight1[A, B](fa: Tree[A])(z: A => B)(f: (A, => B) => B) = (fa.flatten.reverse: @unchecked) match {
      case h #:: t => t.foldLeft(z(h))((b, a) => f(a, b))
    }
    override def foldLeft[A, B](fa: Tree[A], z: B)(f: (B, A) => B): B =
      fa.flatten.foldLeft(z)(f)
    override def foldMapLeft1[A, B](fa: Tree[A])(z: A => B)(f: (B, A) => B): B = fa.flatten match {
      case h #:: t => t.foldLeft(z(h))(f)
    }
    override def foldMap[A, B](fa: Tree[A])(f: A => B)(implicit F: Monoid[B]): B = fa foldMap f
    def alignWith[A, B, C](f: (\&/[A, B]) ⇒ C) = { 
      def align(ta: Tree[A], tb: Tree[B]): Tree[C] =
        Tree.node(f(\&/(ta.rootLabel, tb.rootLabel)), Align[Stream].alignWith[Tree[A], Tree[B], Tree[C]]({
          case \&/.This(sta) ⇒ sta map {a ⇒ f(\&/.This(a))}
          case \&/.That(stb) ⇒ stb map {b ⇒ f(\&/.That(b))}
          case \&/.Both(sta, stb) ⇒ align(sta, stb)
        })(ta.subForest, tb.subForest))
      align _
    }
    def zip[A, B](aa: => Tree[A], bb: => Tree[B]) = {
      val a = aa
      val b = bb
      Tree.node(
        (a.rootLabel, b.rootLabel),
        Zip[Stream].zipWith(a.subForest, b.subForest)(zip(_, _))
      )
    }
  }

  implicit def treeEqual[A](implicit A0: Equal[A]): Equal[Tree[A]] =
    new TreeEqual[A] { def A = A0 }

  implicit def treeOrder[A](implicit A0: Order[A]): Order[Tree[A]] =
    new Order[Tree[A]] with TreeEqual[A] {
      def A = A0
      import std.stream._
      override def order(x: Tree[A], y: Tree[A]) =
        A.order(x.rootLabel, y.rootLabel) match {
          case Ordering.EQ =>
            Order[Stream[Tree[A]]].order(x.subForest, y.subForest)
          case x => x
        }
    }


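So Tree is a Monad, and therefore also a Functor and an Applicative; it is traversable and foldable as well. Tree also has Order and Equal instances, so trees can be compared for equality and ordering. Let's illustrate with some examples: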

// It is a Functor...
    (tree map { v: Int => v + 1 }) ===
    2.node(
      12.leaf,
      13.node(
        122.leaf),
     3.node(
      22.leaf,
      23.leaf)
     )                                            //> res7: Boolean = true

 // ...a Monad
    1.point[Tree] === 1.leaf                      //> res8: Boolean = true
    val t2 = tree >>= (x => (x == 2) ? x.leaf | x.node((-x).leaf))
                                                  //> t2  : scalaz.Tree[Int] = <tree>
    t2 === 1.node((-1).leaf, 2.leaf, 3.node((-3).leaf, 4.node((-4).leaf)))
                                                  //> res9: Boolean = false
    t2.drawTree                                   //> res10: String = "1
                                                  //| |
                                                  //| +- -1
                                                  //| |
                                                  //| +- 11
                                                  //| |  |
                                                  //| |  `- -11
                                                  //| |
                                                  //| +- 12
                                                  //| |  |
                                                  //| |  +- -12
                                                  //| |  |
                                                  //| |  `- 121
                                                  //| |     |
                                                  //| |     `- -121
                                                  //| |
                                                  //| `- 2
                                                  //|    |
                                                  //|    +- 21
                                                  //|    |  |
                                                  //|    |  `- -21
                                                  //|    |
                                                  //|    `- 22
                                                  //|       |
                                                  //|       `- -22
                                                  //| "
 // ...and Foldable
    tree.foldMap(_.toString) === "1111212122122"  //> res11: Boolean = true
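
The instance listing also mentions Comonad, which the examples above do not exercise. As a hedged sketch (assuming the cobind and foldMap methods on Tree that the instance code delegates to; total and subtreeSums are made-up names), cobind applies a function to every subtree and puts the result at that subtree's root:

```scala
import scalaz._, Scalaz._

val tree: Tree[Int] =
  1.node(
    11.leaf,
    12.node(121.leaf),
    2.node(21.leaf, 22.leaf))

// Foldable: combine all labels with the Int monoid (addition).
val total = tree.foldMap(x => x)                                // 190

// Comonad: label each node with the sum of the subtree rooted there.
val subtreeSums = tree.cobind((t: Tree[Int]) => t.flatten.sum)
val sums = subtreeSums.flatten.toList
// List(190, 11, 133, 121, 45, 21, 22)
```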


Speaking of building Trees, I happened to come across this Tree-building function online:

  def pathTree[E](root: E, paths: Seq[Seq[E]]): Tree[E] = {
    root.node(paths groupBy (_.head) map {
      case (parent, subpaths) =>
        pathTree(parent, subpaths collect {
          case pp +: rest if rest.nonEmpty => rest
        })
    } toSeq: _*)
  }


Supposedly this pathTree function turns a List of directory paths into a Tree. Let's check whether it really does:

  val paths = List(List("A","a1","a2"),List("B","b1"))
                                                  //> paths  : List[List[String]] = List(List(A, a1, a2), List(B, b1))
  pathTree("root",paths) drawTree                 //> res0: String = ""root"
                                                  //| |
                                                  //| +- "A"
                                                  //| |  |
                                                  //| |  `- "a1"
                                                  //| |     |
                                                  //| |     `- "a2"
                                                  //| |
                                                  //| `- "B"
                                                  //|    |
                                                  //|    `- "b1"
                                                  //| "
 val paths = List(List("A","a1","a2"),List("B","b1"),List("B","b2","b3"))
             //> paths  : List[List[String]] = List(List(A, a1, a2), List(B, b1), List(B, b2,
                                                  //|  b3))
  pathTree("root",paths) drawTree                 //> res0: String = ""root"
                                                  //| |
                                                  //| +- "A"
                                                  //| |  |
                                                  //| |  `- "a1"
                                                  //| |     |
                                                  //| |     `- "a2"
                                                  //| |
                                                  //| `- "B"
                                                  //|    |
                                                  //|    +- "b2"
                                                  //|    |  |
                                                  //|    |  `- "b3"
                                                  //|    |
                                                  //|    `- "b1"
                                                  //| "


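It really works, and it even merges the paths that share the "B" node. (The children of "B" come out as b2 then b1 because groupBy returns a Map, which does not guarantee key order.) The author of this function is brilliant, or at the very least a master of algorithms and FP idiom. I cannot yet write functional code like this myself, but curiosity is hard to hold back, and I want to understand how the function works. We can trace it step by step with some test data: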

  val paths = List(List("A"))           //> paths  : List[List[String]] = List(List(A))
  val gpPaths =paths.groupBy(_.head)    //> gpPaths  : scala.collection.immutable.Map[String,List[List[String]]] = Map(A-> List(List(A)))
  List(List("A")) collect { case pp +: rest if rest.nonEmpty => rest }
                                                  //> res0: List[List[String]] = List()


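This trace shows how List(List(A)) is processed inside pathTree, and it clarifies what the groupBy and collect calls do and what they return. The whole process is in effect equivalent to: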

 "root".node(
       "A".node(List().toSeq: _*)
       ) drawTree                                 //> res3: String = ""root"
                                                  //| |
                                                  //| `- "A"
                                                  //| "


Adding one more top-level entry is equivalent to:

 "root".node(
     "A".node(List().toSeq: _*),
     "B".node(List().toSeq: _*)
     ) drawTree                                   //> res4: String = ""root"
                                                  //| |
                                                  //| +- "A"
                                                  //| |
                                                  //| `- "B"
                                                  //| "


Adding one more level:

  val paths = List(List("A","a1"))                //> paths  : List[List[String]] = List(List(A, a1))
  val gpPaths =paths.groupBy(_.head)              //> gpPaths  : scala.collection.immutable.Map[String,List[List[String]]] = Map(A
                                                  //|  -> List(List(A, a1)))
  List(List("A","a1")) collect { case pp +: rest if rest.nonEmpty => rest }
                                                  //> res0: List[List[String]] = List(List(a1))

// which reduces to
 "root".node(
       "A".node(
          "a1".node(
           List().toSeq: _*)
           )
       ) drawTree                                 //> res3: String = ""root"
                                                  //| |
                                                  //| `- "A"
                                                  //|    |
                                                  //|    `- "a1"
                                                  //| "


Merging directories:

  val paths = List(List("A","a1"),List("A","a2")) //> paths  : List[List[String]] = List(List(A, a1), List(A, a2))
  val gpPaths =paths.groupBy(_.head)              //> gpPaths  : scala.collection.immutable.Map[String,List[List[String]]] = Map(A
                                                  //|  -> List(List(A, a1), List(A, a2)))
  List(List("A","a1"),List("A","a2")) collect { case pp +: rest if rest.nonEmpty => rest }
                                                  //> res0: List[List[String]] = List(List(a1), List(a2))

// which produces the equivalent of
"root".node(
       "A".node(
          "a1".node(
           List().toSeq: _*)
           ,
          "a2".node(
           List().toSeq: _*)
           )
       ) drawTree                                 //> res3: String = ""root"
                                                  //| |
                                                  //| `- "A"
                                                  //|    |
                                                  //|    +- "a1"
                                                  //|    |
                                                  //|    `- "a2"
                                                  //| "


These traces should be enough to understand how the whole function works.
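
To make the traced steps explicit, here is a variant with every intermediate value named. This is my own sketch, not the original author's code: the name pathTreeOrdered is hypothetical, and it replaces groupBy with distinct plus filter so that children keep the order in which they first appear in the input. It assumes scalaz 7.1 or 7.2 and the usual import.

```scala
import scalaz._, Scalaz._

// Same idea as pathTree, one step per line.
def pathTreeOrdered[E](root: E, paths: Seq[Seq[E]]): Tree[E] = {
  val nonEmpty = paths.filter(_.nonEmpty)            // exhausted paths contribute no child
  val heads    = nonEmpty.map(_.head).distinct       // child labels, first-seen order
  val children = heads.map { h =>
    // all paths under this child, with the shared first segment dropped
    val tails = nonEmpty.filter(_.head == h).map(_.tail)
    pathTreeOrdered(h, tails)
  }
  root.node(children: _*)
}

val paths = List(List("A", "a1", "a2"), List("B", "b1"), List("B", "b2", "b3"))
val ordered = pathTreeOrdered("root", paths)
val labels = ordered.flatten.toList
// List(root, A, a1, a2, B, b1, b2, b3)
```

The trade-off is one extra pass over the paths per distinct head, in exchange for deterministic child order.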
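Now that we can build Trees, we need functions for moving around in them and operating on them. Unlike the linear movement in a sequential collection, movement in a tree branches, so Zipper is not well suited to tree structures. Scalaz provides a dedicated cursor for trees, TreeLoc. Here is its definition (scalaz/TreeLoc.scala):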

final case class TreeLoc[A](tree: Tree[A], lefts: TreeForest[A],
                            rights: TreeForest[A], parents: Parents[A]) {
...
trait TreeLocFunctions {
  type TreeForest[A] =
  Stream[Tree[A]]

  type Parent[A] =
  (TreeForest[A], A, TreeForest[A])

  type Parents[A] =
  Stream[Parent[A]]


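The tree cursor TreeLoc consists of the current node tree, its left siblings lefts, its right siblings rights, and the ancestor context parents. lefts and rights are streams of trees, Stream[Tree[A]]. parents is a Stream[Parent[A]], where each Parent is a triple holding the left siblings, the label, and the right siblings of one ancestor.
Tree.loc creates a TreeLoc for a given tree directly: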

 /** A TreeLoc zipper of this tree, focused on the root node. */
  def loc: TreeLoc[A] = TreeLoc.loc(this, Stream.Empty, Stream.Empty, Stream.Empty)
 
 val tree: Tree[Int] =
    1.node(
      11.leaf,
      12.node(
        121.leaf),
     2.node(
      21.leaf,
      22.leaf)
     )                           //> tree  : scalaz.Tree[Int] = <tree>

  tree.loc                      //> res7: scalaz.TreeLoc[Int] = TreeLoc(<tree>,Stream(),Stream(),Stream())


TreeLoc's movement functions:

  def root: TreeLoc[A] =
    parent match {
      case Some(z) => z.root
      case None    => this
    }

  /** Select the left sibling of the current node. */
  def left: Option[TreeLoc[A]] = lefts match {
    case t #:: ts     => Some(loc(t, ts, tree #:: rights, parents))
    case Stream.Empty => None
  }

  /** Select the right sibling of the current node. */
  def right: Option[TreeLoc[A]] = rights match {
    case t #:: ts     => Some(loc(t, tree #:: lefts, ts, parents))
    case Stream.Empty => None
  }

  /** Select the leftmost child of the current node. */
  def firstChild: Option[TreeLoc[A]] = tree.subForest match {
    case t #:: ts     => Some(loc(t, Stream.Empty, ts, downParents))
    case Stream.Empty => None
  }

  /** Select the rightmost child of the current node. */
  def lastChild: Option[TreeLoc[A]] = tree.subForest.reverse match {
    case t #:: ts     => Some(loc(t, ts, Stream.Empty, downParents))
    case Stream.Empty => None
  }

  /** Select the nth child of the current node. */
  def getChild(n: Int): Option[TreeLoc[A]] =
    for {lr <- splitChildren(Stream.Empty, tree.subForest, n)
         ls = lr._1
    } yield loc(ls.head, ls.tail, lr._2, downParents)

Let's try moving around with these functions:

 val tree: Tree[Int] =
    1.node(
      11.leaf,
      12.node(
        121.leaf),
     2.node(
      21.leaf,
      22.leaf)
     )                                            //> tree  : scalaz.Tree[Int] = <tree>
  tree.loc                                        //> res7: scalaz.TreeLoc[Int] = TreeLoc(<tree>,Stream(),Stream(),Stream())
  val l = for {
   l1 <- tree.loc.some
   l2 <- l1.firstChild
   l3 <- l1.lastChild
   l4 <- l3.firstChild
   } yield (l1,l2,l3,l4)                          //> l  : Option[(scalaz.TreeLoc[Int], scalaz.TreeLoc[Int], scalaz.TreeLoc[Int],
                                                  //|  scalaz.TreeLoc[Int])] = Some((TreeLoc(<tree>,Stream(),Stream(),Stream()),T
                                                  //| reeLoc(<tree>,Stream(),Stream(<tree>, <tree>),Stream((Stream(),1,Stream()),
                                                  //|  ?)),TreeLoc(<tree>,Stream(<tree>, <tree>),Stream(),Stream((Stream(),1,Stre
                                                  //| am()), ?)),TreeLoc(<tree>,Stream(),Stream(<tree>, ?),Stream((Stream(<tree>,
                                                  //|  <tree>),2,Stream()), ?))))
  
  l.get._1.getLabel                               //> res8: Int = 1
  l.get._2.getLabel                               //> res9: Int = 11
  l.get._3.getLabel                               //> res10: Int = 2
  l.get._4.getLabel                               //> res11: Int = 21
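
To see what a cursor actually stores after a move, the following sketch inspects the lefts, rights, and parents fields at node 12. It assumes scalaz 7.1 or 7.2, where these fields are Streams as in the definition quoted earlier. The value names are made up for illustration.

```scala
import scalaz._, Scalaz._

val tree: Tree[Int] =
  1.node(
    11.leaf,
    12.node(121.leaf),
    2.node(21.leaf, 22.leaf))

// Move to the first child (11), then to its right sibling (12).
val at12 = tree.loc.firstChild.flatMap(_.right)

val focus         = at12.map(_.getLabel)                        // Some(12)
val leftSiblings  = at12.map(_.lefts.map(_.rootLabel).toList)   // Some(List(11))
val rightSiblings = at12.map(_.rights.map(_.rootLabel).toList)  // Some(List(2))
// Each Parent is (left siblings, label, right siblings); _2 is the ancestor's label.
val ancestors     = at12.map(_.parents.map(_._2).toList)        // Some(List(1))
```

Note that lefts is stored nearest-sibling first, which is why moving left or right is a constant-time operation on the head of a Stream.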


Jump functions:

  /** Select the nth child of the current node. */
  def getChild(n: Int): Option[TreeLoc[A]] =
    for {lr <- splitChildren(Stream.Empty, tree.subForest, n)
         ls = lr._1
    } yield loc(ls.head, ls.tail, lr._2, downParents)

  /** Select the first immediate child of the current node that satisfies the given predicate. */
  def findChild(p: Tree[A] => Boolean): Option[TreeLoc[A]] = {
    @tailrec
    def split(acc: TreeForest[A], xs: TreeForest[A]): Option[(TreeForest[A], Tree[A], TreeForest[A])] =
      (acc, xs) match {
        case (acc, Stream.cons(x, xs)) => if (p(x)) Some((acc, x, xs)) else split(Stream.cons(x, acc), xs)
        case _                         => None
      }
    for (ltr <- split(Stream.Empty, tree.subForest)) yield loc(ltr._2, ltr._1, ltr._3, downParents)
  }

  /**Select the first descendant node of the current node that satisfies the given predicate. */
  def find(p: TreeLoc[A] => Boolean): Option[TreeLoc[A]] =
    Cobind[TreeLoc].cojoin(this).tree.flatten.find(p)


Demonstrating find:

  val tree: Tree[Int] =
    1.node(
      11.leaf,
      12.node(
        121.leaf),
     2.node(
      21.leaf,
      22.leaf)
     )                                            //> tree  : scalaz.Tree[Int] = <tree>
  tree.loc                                        //> res7: scalaz.TreeLoc[Int] = TreeLoc(<tree>,Stream(),Stream(),Stream())
  val l = for {
   l1 <- tree.loc.some
   l2 <- l1.find{_.getLabel == 2}
   l3 <- l1.find{_.getLabel == 121}
   l4 <- l2.find{_.getLabel == 22}
   l5 <- l1.findChild{_.rootLabel == 12}
   l6 <- l1.findChild{_.rootLabel == 2}
  } yield l6                                      //> l  : Option[scalaz.TreeLoc[Int]] = Some(TreeLoc(<tree>,Stream(<tree>, ?),St
                                                  //| ream(),Stream((Stream(),1,Stream()), ?)))


Note: all six steps above succeeded. If a jump cannot be made, the result is None.
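
For contrast, here is a short sketch of the failing cases (same assumptions as before: scalaz 7.1 or 7.2, illustrative value names). find searches the focus and all of its descendants, whereas findChild only looks at immediate children, and a single None short-circuits the whole for-comprehension.

```scala
import scalaz._, Scalaz._

val tree: Tree[Int] =
  1.node(
    11.leaf,
    12.node(121.leaf),
    2.node(21.leaf, 22.leaf))

// No node carries the label 99.
val missing = tree.loc.find(_.getLabel == 99)                   // None

// 121 is a grandchild of the root, so findChild does not see it...
val notDirectChild = tree.loc.findChild(_.rootLabel == 121)     // None
// ...but find does.
val viaFind = tree.loc.find(_.getLabel == 121).map(_.getLabel)  // Some(121)

// The first failed step makes the whole chain None.
val chained = for {
  a <- tree.loc.find(_.getLabel == 99)
  b <- a.firstChild
} yield b.getLabel                                              // None
```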
Operations such as insert, modify, and delete:

  /** Replace the current node with the given one. */
  def setTree(t: Tree[A]): TreeLoc[A] = loc(t, lefts, rights, parents)

  /** Modify the current node with the given function. */
  def modifyTree(f: Tree[A] => Tree[A]): TreeLoc[A] = setTree(f(tree))

  /** Modify the label at the current node with the given function. */
  def modifyLabel(f: A => A): TreeLoc[A] = setLabel(f(getLabel))

  /** Get the label of the current node. */
  def getLabel: A = tree.rootLabel

  /** Set the label of the current node. */
  def setLabel(a: A): TreeLoc[A] = modifyTree((t: Tree[A]) => node(a, t.subForest))

  /** Insert the given node to the left of the current node and give it focus. */
  def insertLeft(t: Tree[A]): TreeLoc[A] = loc(t, lefts, Stream.cons(tree, rights), parents)

  /** Insert the given node to the right of the current node and give it focus. */
  def insertRight(t: Tree[A]): TreeLoc[A] = loc(t, Stream.cons(tree, lefts), rights, parents)

  /** Insert the given node as the first child of the current node and give it focus. */
  def insertDownFirst(t: Tree[A]): TreeLoc[A] = loc(t, Stream.Empty, tree.subForest, downParents)

  /** Insert the given node as the last child of the current node and give it focus. */
  def insertDownLast(t: Tree[A]): TreeLoc[A] = loc(t, tree.subForest.reverse, Stream.Empty, downParents)

  /** Insert the given node as the nth child of the current node and give it focus. */
  def insertDownAt(n: Int, t: Tree[A]): Option[TreeLoc[A]] =
    for (lr <- splitChildren(Stream.Empty, tree.subForest, n)) yield loc(t, lr._1, lr._2, downParents)

  /** Delete the current node and all its children. */
  def delete: Option[TreeLoc[A]] = rights match {
    case Stream.cons(t, ts) => Some(loc(t, lefts, ts, parents))
    case _                  => lefts match {
      case Stream.cons(t, ts) => Some(loc(t, ts, rights, parents))
      case _                  => for (loc1 <- parent) yield loc1.modifyTree((t: Tree[A]) => node(t.rootLabel, Stream.Empty))
    }
  }

Usage example:

  val tr = 1.leaf                                 //> tr  : scalaz.Tree[Int] = <tree>
  val tl = for {
    l1 <- tr.loc.some
    l3 <- l1.insertDownLast(12.leaf).some
    l4 <- l3.insertDownLast(121.leaf).some
    l5 <- l4.root.some
    l2 <- l5.insertDownFirst(11.leaf).some
    l6 <- l2.root.some
    l7 <- l6.find{_.getLabel == 12}
    l8 <- l7.setLabel(102).some
  } yield l8                                      //> tl  : Option[scalaz.TreeLoc[Int]] = Some(TreeLoc(<tree>,Stream(<tree>, ?),S
                                                  //| tream(),Stream((Stream(),1,Stream()), ?)))
  
  tl.get.toTree.drawTree                          //> res8: String = "1
                                                  //| |
                                                  //| +- 11
                                                  //| |
                                                  //| `- 102
                                                  //|    |
                                                  //|    `- 121
                                                  //| "
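
Every one of these operations returns a new TreeLoc and leaves the original tree untouched. A minimal sketch of that, assuming scalaz 7.1 or 7.2 and using made-up value names:

```scala
import scalaz._, Scalaz._

val original: Tree[Int] = 1.node(11.leaf, 12.node(121.leaf))

// Focus on 12, multiply its label by 10, then rebuild the whole tree from the root.
val updated: Option[Tree[Int]] =
  original.loc.find(_.getLabel == 12).map(loc => loc.modifyLabel(_ * 10).toTree)

val updatedLabels  = updated.map(_.flatten.toList)   // Some(List(1, 11, 120, 121))
val originalLabels = original.flatten.toList         // List(1, 11, 12, 121): unchanged
```

toTree walks back up to the root, so it can be called from any position without first calling root.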
  

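setTree replaces the current node together with its entire subtree, and delete removes the current node together with all of its children: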

  val tree: Tree[Int] =
    1.node(
      11.leaf,
      12.node(
        121.leaf),
     2.node(
      21.leaf,
      22.leaf)
     )                                            //> tree  : scalaz.Tree[Int] = <tree>
   def modTree(t: Tree[Int]): Tree[Int] = {
      val l = for {
        l1 <- t.loc.some
        l2 <- l1.find{_.getLabel == 22}
        l3 <- l2.setTree { 3.node (31.leaf) }.some
      } yield l3
      l.get.toTree
   }                                              //> modTree: (t: scalaz.Tree[Int])scalaz.Tree[Int]
   val l = for {
   l1 <- tree.loc.some
   l2 <- l1.find{_.getLabel == 2}
   l3 <- l2.modifyTree{modTree(_)}.some
   l4 <- l3.root.some
   l5 <- l4.find{_.getLabel == 12}
   l6 <- l5.delete
  } yield l6                                      //> l  : Option[scalaz.TreeLoc[Int]] = Some(TreeLoc(<tree>,Stream(<tree>, ?),St
                                                  //| ream(),Stream((Stream(),1,Stream()), ?)))
  l.get.toTree.drawTree                           //> res7: String = "1
                                                  //| |
                                                  //| +- 11
                                                  //| |
                                                  //| `- 2
                                                  //|    |
                                                  //|    +- 21
                                                  //|    |
                                                  //|    `- 3
                                                  //|       |
                                                  //|       `- 31
                                                  //| "
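
Where the focus lands after delete follows from the delete source quoted earlier: the right sibling if there is one, otherwise the left sibling, otherwise the parent. The sketch below (scalaz 7.1 or 7.2 assumed, illustrative names) checks the first case and the root case.

```scala
import scalaz._, Scalaz._

val tree: Tree[Int] = 1.node(11.leaf, 12.leaf, 13.leaf)

// Delete the middle child; the focus moves to its right sibling.
val afterDelete = tree.loc.find(_.getLabel == 12).flatMap(_.delete)
val newFocus    = afterDelete.map(_.getLabel)                // Some(13)
val remaining   = afterDelete.map(_.toTree.flatten.toList)   // Some(List(1, 11, 13))

// The root has no siblings and no parent, so there is nowhere to move: None.
val deleteRoot = tree.loc.delete
```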


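With scalaz's Tree and TreeLoc data structures and their full set of navigation and manipulation functions, we can conveniently and effectively program with immutable tree-shaped collections in FP style.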










