Classification(1)Find Phrases from String


1. Find Important Phrases in All the Content
Start my Local Zeppelin
> bin/ start

My local Zeppelin connects to my VirtualBox YARN cluster, so I need to start VirtualBox and the ubuntu-master, ubuntu-dev1 and ubuntu-dev2 VMs first.

How to Load Jar

How to Connect to S3
val rdd = sc.textFile("s3n://sillycat/jobs.csv")

How to Add Custom Jars to Zeppelin
Add this line in the file:
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/home/spark-seed-assembly-0.0.1.jar,/home/classifier-assembly-1.0.jar"

Format Will Help a Lot
# Classification System #

### What is this repository for? ###

* NLP and classification

### How do I get set up? (TODO) ###

* Summary of set up

Special Characters in HTML

Really Nice Code to Filter the Characters
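The filtering code itself is not shown here, so the following is only a minimal sketch of one way to do it; TextFilter and clean are hypothetical names. It assumes we only want letters and digits from an HTML title: tags and entities such as &nbsp; become spaces, every other non-letter, non-digit run collapses to a single space, and the result is trimmed.

```scala
// Hypothetical helper, not the original post's code.
object TextFilter {
  def clean(html: String): String =
    html
      .replaceAll("<[^>]*>", " ")           // drop tags such as <b> or <br/>
      .replaceAll("&#?[a-zA-Z0-9]+;", " ")  // drop entities such as &amp; or &#39;
      .replaceAll("[^\\p{L}\\p{N}]+", " ")  // any run of non-letters/non-digits becomes one space
      .trim
}
```

For example, TextFilter.clean("<b>Senior</b>&nbsp;Java/Scala Engineer!") gives "Senior Java Scala Engineer". Note that this also splits words like "don't" into "don t"; keep apostrophes in the character class if that matters for your phrases.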

Get Phrases from One String
/**
 * Counts phrases using a sliding window.
 * Example:
 * In:  getPhrasesInTitle(Job("foo foo foo foo foo foo", ""), 2)
 * Out: Map(foo foo -> 5)
 * In:  getPhrasesInTitle(Job("foo foo foo foo foo foo bar foo", ""), 2)
 * Out: Map(foo foo -> 5, foo bar -> 1, bar foo -> 1)
 */
// Job is defined elsewhere, e.g. case class Job(title: String, desc: String)
def getPhrasesInTitle(job: Job, numWordsInPhrase: Int): Map[String, Int] = {
    val phrases = job.title.split(" ").sliding(numWordsInPhrase).foldLeft(Map("" -> 0)) {
        (phraseCounts: Map[String, Int], phrase: Array[String]) =>
            phrase.size == numWordsInPhrase match {
                case true =>
                    val str = phrase.mkString(" ")
                    val count = phraseCounts.getOrElse(str, 0) + 1
                    phraseCounts + (str -> count)
                case false =>
                    // a title shorter than the window yields one short window: skip it
                    phraseCounts
            }
    }
    // remove the "" -> 0 seed entry
    phrases - ""
}
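The same counting can also be written with groupBy instead of foldLeft. This is a self-contained sketch under stated assumptions: Job is a hypothetical two-field case class standing in for the real one, and PhraseCount.phrasesInTitle is a hypothetical name.

```scala
// Hypothetical minimal Job type; the real definition is not shown in the post.
case class Job(title: String, desc: String)

object PhraseCount {
  // Count every run of n consecutive words in the title.
  def phrasesInTitle(job: Job, n: Int): Map[String, Int] =
    job.title.split(" ").toList
      .sliding(n)
      .filter(_.size == n)        // drop the short window produced when the title has < n words
      .map(_.mkString(" "))
      .toList
      .groupBy(identity)          // phrase -> all its occurrences
      .map { case (phrase, hits) => phrase -> hits.size }
}
```

Because it starts from an empty result instead of a "" -> 0 seed, there is no placeholder key to remove at the end.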

One Map Operation
scala> val m1 = Map( ""->0, "s1" ->1)
scala> val m2 = m1 - ""
m2: scala.collection.immutable.Map[String,Int] = Map(s1 -> 1)
scala> val m3 = m2 - "s1"
m3: scala.collection.immutable.Map[String,Int] = Map()

Merge Map

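Then merge the maps with scalaz's map1 |+| map2. For a Map[String, Int], keys from both maps are kept and the values of keys present in both are added together.
How to Add scalaz-core to Your Classpath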

Directly on the Command Line
> wget
> scala -cp scalaz-core_2.10-7.1.3.jar
scala> import scalaz.Scalaz._
scala> val k1 = Map( "key"->1, "key22"->3)
k1: scala.collection.immutable.Map[String,Int] = Map(key -> 1, key22 -> 3)
scala> val k2 = Map( "key1"->11, "key122"->13)
k2: scala.collection.immutable.Map[String,Int] = Map(key1 -> 11, key122 -> 13)
scala> val k3 = k1 |+| k2
k3: scala.collection.immutable.Map[String,Int] = Map(key1 -> 11, key122 -> 13, key -> 1, key22 -> 3)

Or put the jars in one directory (lib here) and add them to the classpath:
> scala -cp lib/*
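If adding scalaz is not an option, the same merge for Map[String, Int] can be sketched with the standard library alone. MapMerge.mergeCounts is a hypothetical name; it mirrors the |+| behavior described above (union of keys, values of shared keys summed).

```scala
object MapMerge {
  // Stdlib-only stand-in for scalaz's |+| on Map[String, Int].
  def mergeCounts(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (key, n)) =>
      acc + (key -> (acc.getOrElse(key, 0) + n)) // add to an existing count, or start at n
    }
}
```

With the k1 and k2 maps from the REPL session above there are no shared keys, so the result is just their union; Map("foo foo" -> 3) merged with Map("foo foo" -> 1, "ok hello" -> 3) gives Map("foo foo" -> 4, "ok hello" -> 3).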

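The Whole Flow of Phrase Finding
Each item is turned into its own phrase-count map (for example, with getPhrasesInTitle and a two-word window, "foo foo foo foo" becomes Map("foo foo" -> 3)), and then all the per-item maps are merged into one with .reduce(_ |+| _), giving totals across all items such as Map("foo foo" -> 4, "ok hello" -> 3).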
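A stdlib-only sketch of that whole flow, assuming plain title strings instead of Job objects; countPhrases, merge and countAll are hypothetical names. It folds the per-item maps starting from an empty map instead of calling reduce, because reduce throws on an empty collection.

```scala
object PhraseFlow {
  // Count n-word phrases in one title (the short partial window is skipped).
  def countPhrases(title: String, n: Int): Map[String, Int] =
    title.split(" ").toList.sliding(n).filter(_.size == n)
      .foldLeft(Map.empty[String, Int]) { (acc, words) =>
        val phrase = words.mkString(" ")
        acc + (phrase -> (acc.getOrElse(phrase, 0) + 1))
      }

  // Sum two count maps; a stand-in for scalaz's |+|.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc + (k -> (acc.getOrElse(k, 0) + v)) }

  // One map per item, then merge them all into a single total.
  def countAll(titles: Seq[String], n: Int): Map[String, Int] =
    titles.map(countPhrases(_, n)).foldLeft(Map.empty[String, Int]) { (acc, m) => merge(acc, m) }
}
```

For example, countAll(Seq("foo foo foo foo", "foo foo ok hello", "ok hello", "ok hello"), 2) gives Map("foo foo" -> 4, "foo ok" -> 1, "ok hello" -> 3).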

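Scala Skill Tips
1. How to use _
var className: ClassName = _
For a reference type this is the same as
var className: ClassName = null
In general, = _ initializes a var to the default value of its type (null for reference types, 0 for numeric types, false for Boolean), and it is only allowed for fields of a class or object, not for local variables.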
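A small sketch of the default values that = _ produces, assuming Scala 2 (Scala 3 prefers scala.compiletime.uninitialized); Holder is a hypothetical class.

```scala
// Member vars initialized with `= _` get their type's default value.
class Holder {
  var name: String = _   // reference type: null
  var count: Int = _     // numeric type: 0
  var flag: Boolean = _  // Boolean: false
}
```

Writing var x: Int = _ inside a method body does not compile; the default initializer only works for fields.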

2. foldLeft (/:), foldRight (:\) and fold
val numbers = List(5, 1, 3, 3)
numbers.fold(0) { (z, i) =>
  z + i
}
// res: Int = 12
fold starts from the initial value 0 and adds the first element (0 + 5 = 5), then adds each remaining element to the running result, ending with 12.

Another Use Case
class Foo(val name: String, val age: Int, val sex: Symbol)
object Foo {
  def apply(name: String, age: Int, sex: Symbol) = new Foo(name, age, sex)
}

val fooList = Foo("Carl", 33, 'male) :: Foo("Kiko", 23, 'female) :: Nil
val stringList = fooList.foldLeft(List[String]()) { (z, f) =>
  val title = f.sex match {
    case 'male => "Mr."
    case 'female => "Ms."
  }
  z :+ s"$title ${f.name}, ${f.age}"
}  // stringList(0) is "Mr. Carl, 33"

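foldLeft processes the elements from left to right, foldRight from right to left, and fold makes no ordering guarantee (the order is unspecified, e.g. on parallel collections), so the function passed to fold should be associative and its start value neutral.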
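A quick way to see the difference is a non-associative operation such as subtraction. FoldOrder is a hypothetical name; the values in the comments follow from the bracketing shown.

```scala
object FoldOrder {
  val xs = List(1, 2, 3)

  // foldLeft: ((0 - 1) - 2) - 3
  val left = xs.foldLeft(0)(_ - _)    // -6

  // foldRight: 1 - (2 - (3 - 0))
  val right = xs.foldRight(0)(_ - _)  // 2

  // fold: safe here because + is associative and 0 is neutral
  val sum = List(5, 1, 3, 3).fold(0)(_ + _)  // 12
}
```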

3. Iterator.sliding
sliding[B >: A](size: Int, step: Int = 1): size is the length of each window, step is how far the window moves each time.
scala> (1 to 5).iterator.sliding(3).toList
res0: List[Seq[Int]] = List(List(1, 2, 3), List(2, 3, 4), List(3, 4, 5))

scala> (1 to 5).iterator.sliding(4, 3).toList
res1: List[Seq[Int]] = List(List(1, 2, 3, 4), List(4, 5))

scala> (1 to 5).iterator.sliding(4, 3).withPartial(false).toList
res2: List[Seq[Int]] = List(List(1, 2, 3, 4))
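Applied to the phrase finding above, withPartial(false) removes the need for the phrase.size == numWordsInPhrase check. A sketch with hypothetical names NGrams.ngrams and NGrams.chunks:

```scala
object NGrams {
  // Overlapping word n-grams; withPartial(false) drops a trailing window shorter than n.
  def ngrams(text: String, n: Int): List[String] =
    text.split(" ").iterator.sliding(n).withPartial(false).map(_.mkString(" ")).toList

  // With step = n the windows no longer overlap.
  def chunks(text: String, n: Int): List[String] =
    text.split(" ").iterator.sliding(n, n).withPartial(false).map(_.mkString(" ")).toList
}
```

For example, ngrams("a b c d", 2) gives List("a b", "b c", "c d"), while chunks("a b c d e", 2) gives List("a b", "c d") because the leftover "e" is a partial window.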

scala underscore
