这次的作业叫做Object-Oriented Sets。要完成一个完整的类,实现取最大值、排序等方法。由于是函数式编程,这些的实现方法和以往我知道的完全不一样。
总结
TweetSet有两个子类,Empty和NonEmpty,使用BFT实现。有一点比较惊奇的是父类的方法可以创建一个子类的对象,以往并没有认识到这一点。
在方法的实现上,filter和mostRetweeted都需要使用辅助方法(acc)来做递归。
类似于contains的实现方法,很巧妙的对类进行了遍历。
有的方法需要在两个子类里面以不同的方法实现。对于把一个类分成父类和两个子类这一点,我的感觉是为了最大程度的把子类的公共部分抽象出来,杀鸡用了牛刀。
我的实现里面,mostRetweeted调用了mostAcc,同时传递了一个新建的Tweet。为了更优美应该分别在两个子类里面实现。
程序优化很少,感觉运行效率很低,很奇怪有些公司用scala来运营网站等。
我的程序
package objsets
import common._
import TweetReader._
/**
* A class to represent tweets.
*/
class Tweet(val user: String, val text: String, val retweets: Int) {
override def toString: String =
"User: " + user + "\n" +
"Text: " + text + " [" + retweets + "]"
}
/**
* This represents a set of objects of type `Tweet` in the form of a binary search
* tree. Every branch in the tree has two children (two `TweetSet`s). There is an
* invariant which always holds: for every branch `b`, all elements in the left
* subtree are smaller than the tweet at `b`. The eleemnts in the right subtree are
* larger.
*
* Note that the above structure requires us to be able to compare two tweets (we
* need to be able to say which of two tweets is larger, or if they are equal). In
* this implementation, the equality / order of tweets is based on the tweet's text
* (see `def incl`). Hence, a `TweetSet` could not contain two tweets with the same
* text from different users.
*
*
* The advantage of representing sets as binary search trees is that the elements
* of the set can be found quickly. If you want to learn more you can take a look
* at the Wikipedia page [1], but this is not necessary in order to solve this
* assignment.
*
* [1] http://en.wikipedia.org/wiki/Binary_search_tree
*/
abstract class TweetSet {
/**
* This method takes a predicate and returns a subset of all the elements
* in the original set for which the predicate is true.
*
* Question: Can we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def filter(p: Tweet => Boolean): TweetSet = filterAcc(p,new Empty())
/**
* This is a helper method for `filter` that propagetes the accumulated tweets.
*/
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet
/**
* Returns a new `TweetSet` that is the union of `TweetSet`s `this` and `that`.
*
* Question: Should we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def union(that: TweetSet): TweetSet = filterAcc(x => this.contains(x),that.filterAcc(x=>that.contains(x),new Empty()))
/**
* Returns the tweet from this set which has the greatest retweet count.
*
* Calling `mostRetweeted` on an empty set should throw an exception of
* type `java.util.NoSuchElementException`.
*
* Question: Should we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def mostRetweeted: Tweet=mostAcc(new Tweet("tmp","tmp",0))
def mostAcc(tmp:Tweet):Tweet
/**
* Returns a list containing all tweets of this set, sorted by retweet count
* in descending order. In other words, the head of the resulting list should
* have the highest retweet count.
*
* Hint: the method `remove` on TweetSet will be very useful.
* Question: Should we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def descendingByRetweet: TweetList
/**
* The following methods are already implemented
*/
/**
* Returns a new `TweetSet` which contains all elements of this set, and the
* the new element `tweet` in case it does not already exist in this set.
*
* If `this.contains(tweet)`, the current set is returned.
*/
def incl(tweet: Tweet): TweetSet
/**
* Returns a new `TweetSet` which excludes `tweet`.
*/
def remove(tweet: Tweet): TweetSet
/**
* Tests if `tweet` exists in this `TweetSet`.
*/
def contains(tweet: Tweet): Boolean
/**
* This method takes a function and applies it to every element in the set.
*/
def foreach(f: Tweet => Unit): Unit
}
class Empty extends TweetSet {
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = acc
def mostAcc(tmp:Tweet):Tweet=tmp
def descendingByRetweet: TweetList = Nil
/**
* The following methods are already implemented
*/
def contains(tweet: Tweet): Boolean = false
def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
def remove(tweet: Tweet): TweetSet = this
def foreach(f: Tweet => Unit): Unit = ()
}
class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
if (p(elem)) right.filterAcc(p,left.filterAcc(p,acc.incl(elem)))
else right.filterAcc(p,left.filterAcc(p,acc))
}
def mostAcc(tmp:Tweet):Tweet={
if (tmp.retweets>elem.retweets) right.mostAcc(left.mostAcc(tmp))
else right.mostAcc(left.mostAcc(elem))
}
def descendingByRetweet: TweetList = {
new Cons(this.mostRetweeted,this.remove(this.mostRetweeted).descendingByRetweet)
}
/**
* The following methods are already implemented
*/
def contains(x: Tweet): Boolean =
if (x.text < elem.text) left.contains(x)
else if (elem.text < x.text) right.contains(x)
else true
def incl(x: Tweet): TweetSet = {
if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
else this
}
def remove(tw: Tweet): TweetSet =
if (tw.text < elem.text) new NonEmpty(elem, left.remove(tw), right)
else if (elem.text < tw.text) new NonEmpty(elem, left, right.remove(tw))
else left.union(right)
def foreach(f: Tweet => Unit): Unit = {
f(elem)
left.foreach(f)
right.foreach(f)
}
}
trait TweetList {
def head: Tweet
def tail: TweetList
def isEmpty: Boolean
def foreach(f: Tweet => Unit): Unit =
if (!isEmpty) {
f(head)
tail.foreach(f)
}
}
object Nil extends TweetList {
def head = throw new java.util.NoSuchElementException("head of EmptyList")
def tail = throw new java.util.NoSuchElementException("tail of EmptyList")
def isEmpty = true
}
class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
def isEmpty = false
}
object GoogleVsApple {
val google = List("android", "Android", "galaxy", "Galaxy", "nexus", "Nexus")
val apple = List("ios", "iOS", "iphone", "iPhone", "ipad", "iPad")
lazy val googleTweets: TweetSet = TweetReader.allTweets.filter(y=>google.exists( x => y.text.contains(x)))
lazy val appleTweets: TweetSet = TweetReader.allTweets.filter(y=>apple.exists( x => y.text.contains(x)))
/**
* A list of all tweets mentioning a keyword from either apple or google,
* sorted by the number of retweets.
*/
lazy val trending: TweetList = googleTweets.union(appleTweets).descendingByRetweet
}
object Main extends App {
// Print the trending tweets
GoogleVsApple.trending foreach println
}
题目要求
Download the objsets.zip handout archive file.
In this assignment you will work with an object-oriented representations based on binary trees.
Object-Oriented Sets
For this part, you will earn credit by completing the TweetSet.scala file. This file defines an abstract class TweetSet with two concrete subclasses, Empty which represents an empty set, and NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet), which represents a non-empty set as a binary tree rooted at elem. The tweets are indexed by their text bodies: the bodies of all tweets on the left are lexicographically smaller than elem and all bodies of elements on the right are lexicographically greater.
Note also that these classes are immutable: the set-theoretic operations do not modify this but should return a new set.
Before tackling this assignment, we suggest you first study the already implemented methods contains and incl for inspiration.
1 Filtering
Implement filtering on tweet sets. Complete the stubs for the methods filter and filterAcc. filter takes as argument a function, the predicate, which takes a tweet and returns a boolean. filter then returns the subset of all the tweets in the original set for which the predicate is true. For example, the following call:
tweets.filter(tweet => tweet.retweets > 10)
applied to a set tweets of two tweets, say, where the first tweet was not retweeted and the second tweet was retweeted 20 times should return a set containing only the second tweet.
Hint: start by defining the helper method filterAcc which takes an accumulator set as a second argument. This accumulator contains the ongoing result of the filtering.
/** This method takes a predicate and returns a subset of all the elements
* in the original set for which the predicate is true.
*/
def filter(p: Tweet => Boolean): TweetSet
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet
The definition of filter in terms of filterAcc should then be straightforward.
2 Taking Unions
Implement union on tweet sets. Complete the stub for the method union. The method union takes another set that, and computes a new set which is the union of this and that, i.e. a set that contains exactly the elements that are either in this or in that, or in both.
def union(that: TweetSet): TweetSet
Note that in this exercise it is your task to find out in which class(es) to define the union method (should it be abstract in class TweetSet?).
3 Sorting Tweets by Their Influence
The more often a tweet is “re-tweeted” (that is, repeated by a different user with or without additions), the more influential it is.
The goal of this part of the exercise is to add a method descendingByRetweet to TweetSet which should produce a linear sequence of tweets (as an instance of class TweetList), ordered by their number of retweets:
def descendingByRetweet: TweetList
This method reflects a common pattern when transforming data structures. While traversing one data structure (in this case, a TweetSet), we’re building a second data structure (here, an instance of class TweetList). The idea is to start with the empty list Nil (containing no tweets), and to find the tweet with the most retweets in the input TweetSet. This tweet is removed from the TweetSet (that is, we obtain a new TweetSet that has all the tweets of the original set except for the tweet that was “removed”; this immutable set operation, remove, is already implemented for you), and added to the result list by creating a new Cons. After that, the process repeats itself, but now we are searching through a TweetSet with one less tweet.
Hint: start by implementing the method mostRetweeted which returns the most popular tweet of a TweetSet.
4 Tying everything together
In the last step of this assignment your task is to detect influential tweets in a set of recent tweets. We are providing you with a TweetSet containing several hundred tweets from popular tech news sites in the past few days, located in the TweetReader object (file “TweetReader.scala”). TweetReader.allTweets returns an instance of TweetSet containing a set of all available tweets.
Furthermore, you are given two lists of keywords. The first list corresponds to keywords associated with Google and Android smartphones, while the second list corresponds to keywords associated with Apple and iOS devices. Your objective is to detect which platform has generated more interest or activity in the past few days.
As a first step, use the functionality you implemented in the first parts of this assignment to create two different TweetSets, googleTweets and appleTweets. The first TweetSet, googleTweets, should contain all tweets that mention (in their “text”) one of the keywords in the google list. The second TweetSet, appleTweets, should contain all tweets that mention one of the keyword in the apple list. Their signature is as follows:
lazy val googleTweets: TweetSet
lazy val appleTweets: TweetSet
Hint: use the exists method of List and contains method of class java.lang.String.
From the union of those two TweetSets, produce trending, an instance of class TweetList representing a sequence of tweets ordered by their number of retweets:
lazy val trending: TweetList