Scala - Extractors

By now you have probably grown accustomed to the concise way data can be decomposed and analyzed using pattern matching.

You might wish to be able to create your own kinds of patterns. Extractors let you do so. This chapter explains what extractors are and how you can use them to define patterns that are decoupled from an object's representation.

As always, we start with an example: extracting email addresses.

Given a string, we want to decide whether it is an email address and, if it is, to access its user and domain parts. The traditional way to do this uses helper functions such as the following.

def isEmail(s: String): Boolean
def domain(s: String): String
def user(s: String): String

These functions could be used as follows.

// With these functions, a given string s could be parsed as follows.
if (isEmail(s)) println(user(s) + " AT " + domain(s))
else println("not an email address")
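The helper functions above are only declared, not implemented. A minimal sketch of how they might look is shown below; it assumes, purely for illustration, that a string counts as an email address when it contains exactly one `@`. The wrapper object `EmailHelpers` and the function `describe` are hypothetical names.

```scala
object EmailHelpers {
  // Assumption: "is an email address" means "contains exactly one '@'".
  def isEmail(s: String): Boolean = s.count(_ == '@') == 1

  // Everything before the '@'.
  def user(s: String): String = s.substring(0, s.indexOf('@'))

  // Everything after the '@'.
  def domain(s: String): String = s.substring(s.indexOf('@') + 1)

  // The same test as in the text, returning the message instead of printing it.
  def describe(s: String): String =
    if (isEmail(s)) user(s) + " AT " + domain(s)
    else "not an email address"
}
```

Note that every caller has to remember to call `isEmail` before `user` or `domain`; nothing ties the test to the access functions.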
Pattern matching, as you have already seen, can do the same job. Suppose that a string could be matched against the following pattern.

EMail(user, domain)


// The test above can then be written as follows.
s match {
  case EMail(user, domain) => println(user + " AT " + domain)
  case _ => println("not an email address")
}
More complicated problems can be expressed the same way.
// For example, finding two successive email addresses with the same user part
// at the head of a list ss translates to the following pattern.
ss match {
  case EMail(u1, d1) :: EMail(u2, d2) :: _ if (u1 == u2) => ...
}
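To make the list pattern concrete, here is a small self-contained sketch. It includes a minimal `EMail` extractor so that it compiles on its own; the wrapper object `SuccessiveDemo` and the function `sameUserTwice` are hypothetical names introduced for illustration.

```scala
object SuccessiveDemo {
  // Minimal extractor so that the sketch is self-contained.
  object EMail {
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  // Returns the shared user name when the first two list elements are
  // email addresses with the same user part, and None otherwise.
  def sameUserTwice(ss: List[String]): Option[String] = ss match {
    case EMail(u1, _) :: EMail(u2, _) :: _ if u1 == u2 => Some(u1)
    case _ => None
  }
}
```

The pattern only inspects the first two elements of the list; searching at every position would need a recursive or sliding traversal on top of it.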

Extractors 

object EMail { 
  // the Injection method (optional)
  def apply(user : String, domain : String) = user + "@" + domain
  
  // the extraction method (mandatory)
  def unapply(str : String) :Option[(String, String)] = { 
    val parts = str split "@"
    if (parts.length == 2) Some(parts(0), parts(1)) else None
  }
}
Alternatively, the EMail object can be declared to extend Scala's function type, which has the same effect for pattern matching.

object EMail extends ((String, String) => String) { 
  // the Injection method (optional)
  def apply(user : String, domain : String) = user + "@" + domain
  
  // the extraction method (mandatory)
  def unapply(str : String) :Option[(String, String)] = { 
    val parts = str split "@"
    if (parts.length == 2) Some(parts(0), parts(1)) else None
  }  
  
}
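One practical difference of the function-type variant is that the object itself can be passed wherever a two-argument function is expected. The sketch below illustrates this; `FunctionDemo`, `combine` and the sample addresses are hypothetical.

```scala
object FunctionDemo {
  object EMail extends ((String, String) => String) {
    def apply(user: String, domain: String): String = user + "@" + domain

    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  // Accepts any (String, String) => String function.
  def combine(pairs: List[(String, String)],
              f: (String, String) => String): List[String] =
    pairs.map { case (u, d) => f(u, d) }

  // EMail is itself a Function2, so it can be passed directly.
  val addresses: List[String] =
    combine(List(("tom", "example.org"), ("ann", "example.com")), EMail)
}
```

Without the `extends` clause, the `apply` method would have to be passed explicitly instead of the object.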
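With either definition, applying EMail to a pair of strings gives back an email string, and the object can be used in patterns. When a pattern refers to an extractor object, the match invokes the object's unapply method on the selector. Because unapply takes a String, the match first checks that the selector is a String.

// for some selector x of type Any
x match { case EMail(user, domain) => ... }

The general rule for apply and unapply is that they should be dual to each other. The apply method is called an injection and the unapply method an extraction, and extracting from an injected value should give back the original arguments.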
EMail.unapply(EMail.apply(user, domain))


// should return
Some((user, domain))

// and, in the other direction, the following should return obj
EMail.unapply(obj) match {
  case Some((u, d)) => EMail.apply(u, d)
}
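The two directions of this rule can be checked with concrete values. The following is a sketch with hypothetical sample addresses; `RoundTripDemo`, `extracted` and `rebuilt` are names introduced here.

```scala
object RoundTripDemo {
  object EMail {
    def apply(user: String, domain: String): String = user + "@" + domain

    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  // Injection followed by extraction gives back the original parts.
  val extracted: Option[(String, String)] =
    EMail.unapply(EMail("tom", "example.org"))

  // Extraction followed by injection gives back the original string.
  val rebuilt: Option[String] =
    EMail.unapply("ann@example.com").map { case (u, d) => EMail(u, d) }
}
```

With this particular `unapply`, the first property only holds for arguments that do not themselves contain `@`: `EMail("a@b", "c")` produces a string with two `@` signs, which `unapply` rejects.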

Patterns with Zero or one Variables

An extractor pattern can also bind just one variable, or no variable at all.

As always, we use an example. The EMail object definition is included again for reference.

object EMail { 
  // the Injection method (optional)
  def apply(user : String, domain : String) = user + "@" + domain
  
  // the extraction method (mandatory)
  def unapply(str : String) :Option[(String, String)] = { 
    val parts = str split "@"
    if (parts.length == 2) Some(parts(0), parts(1)) else None
  }
}


// The case where a pattern binds just one variable is treated differently.
// There is no one-tuple in Scala, so to return just one pattern element the
// unapply method simply wraps the element itself in a Some.

// The following extractor binds one variable: it matches strings that
// consist of the same substring appearing twice in a row.
object Twice {
  def apply(s : String) : String = s + s
  def unapply(s : String) : Option[String] = {
    val length = s.length / 2
    val half = s.substring(0, length)
    if (half == s.substring(length)) Some(half) else None
  }
}
An extractor pattern may also bind no variables at all. In that case the corresponding unapply method returns a Boolean: true for success and false for failure. An example follows.
// UpperCase binds no variable; its unapply only returns true or false.
// Only unapply is defined. An apply method would make no sense here,
// because there is nothing to construct.
object UpperCase {
  def unapply(s : String) : Boolean = s.toUpperCase == s
}
The following function applies all of the previously defined extractors in a single pattern.
def userTwiceUpper(s : String) = s match {
  case EMail(Twice(x @ UpperCase()), domain) => 
    "Match: " + x + " in domain " + domain
  case _ =>
    "No match "
}
Some test calls show how this works.

userTwiceUpper("[email protected]") // succeeds

userTwiceUpper("[email protected]") // fails

userTwiceUpper("[email protected]") // fails, because the user part is not upper case
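For a version that can be run as is, the sketch below repeats the three extractors and calls the function with hypothetical addresses chosen for illustration (they are assumptions, not the inputs used above). `TwiceUpperDemo` is a hypothetical wrapper object.

```scala
object TwiceUpperDemo {
  object EMail {
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  // Matches strings made of the same half repeated twice; binds that half.
  object Twice {
    def unapply(s: String): Option[String] = {
      val length = s.length / 2
      val half = s.substring(0, length)
      if (half == s.substring(length)) Some(half) else None
    }
  }

  // Binds nothing; only reports whether the string is all upper case.
  object UpperCase {
    def unapply(s: String): Boolean = s.toUpperCase == s
  }

  def userTwiceUpper(s: String): String = s match {
    case EMail(Twice(x @ UpperCase()), domain) =>
      "Match: " + x + " in domain " + domain
    case _ => "No match"
  }

  val ok = userTwiceUpper("DIDI@example.org")       // user part is "DI" twice, upper case
  val notTwice = userTwiceUpper("DIDO@example.org") // "DI" and "DO" differ
  val notUpper = userTwiceUpper("didi@example.org") // "di" is not upper case
}
```

The empty parentheses in `UpperCase()` are required; without them the pattern would compare against the object `UpperCase` itself. The binding `x @ UpperCase()` names the value that the inner pattern tested, here the half returned by `Twice`.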

Variable Argument extractors

An extractor can also bind a variable number of values. The following example shows what we want to achieve.


object EMail { 
  // the Injection method (optional)
  def apply(user : String, domain : String) = user + "@" + domain
  
  // the extraction method (mandatory)
  def unapply(str : String) :Option[(String, String)] = { 
    val parts = str split "@"
    if (parts.length == 2) Some(parts(0), parts(1)) else None
  }
}


// Suppose we want to match on the parts of a domain name, listed in reverse order.
dom match { 
  case Domain("org", "acm") => println("acm.org")
  case Domain("com", "sun", "java") => println("java.sun.com")
  case Domain("net", _*) => println("a .net domain") // _* matches any remaining elements
}
The Domain object is defined next. The key is the unapplySeq method, which returns the matched parts as a sequence.



object Domain { 
  // the injection method (optional) 
  def apply(parts: String*) : String = parts.reverse.mkString(".")
  
  // the extraction method (mandatory)
  def unapplySeq(whole : String) : Option[Seq[String]] = Some(whole.split("\\.").reverse)
}
The following helper function and test code exercise these extractors.



def isTomInDotCom(s : String) : Boolean = s match {
  case EMail ("tom" , Domain("com", _*)) => true
  case _ => false
}

// This gives the expected results.

isTomInDotCom("[email protected]") // true

isTomInDotCom("[email protected]") // false

isTomInDotCom("[email protected]") // false
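A runnable sketch of the same test is given below. The addresses are hypothetical, and `DomainDemo` is a wrapper object introduced here; the explicit `.toSeq` is added so that the array is converted to a `Seq` without relying on an implicit conversion.

```scala
object DomainDemo {
  object EMail {
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  object Domain {
    // Injection: parts are given top-level domain first.
    def apply(parts: String*): String = parts.reverse.mkString(".")

    // Extraction: split on dots and reverse, so the top-level domain comes first.
    def unapplySeq(whole: String): Option[Seq[String]] =
      Some(whole.split("\\.").reverse.toSeq)
  }

  def isTomInDotCom(s: String): Boolean = s match {
    case EMail("tom", Domain("com", _*)) => true
    case _ => false
  }
}
```

For example, `DomainDemo.isTomInDotCom("tom@sun.com")` is true, while `"peter@sun.com"` and `"tom@acm.org"` both give false: the first fails on the user part and the second on the top-level domain.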
It is also possible to return some fixed elements from unapplySeq together with the variable part. Here is the extractor code.

// The fixed elements and the variable part are returned together in a tuple,
// with the variable part coming last.
object ExpandedEMail {
  // Note that the method must be named unapplySeq rather than unapply,
  // even though the result is a tuple whose last element is the sequence.
  def unapplySeq(email : String): Option[(String, Seq[String])] = { 
    val parts = email split "@"
    if (parts.length == 2) 
      Some(parts(0), parts(1).split("\\.").reverse)
    else 
      None
  }
}
The following code tests it.

val s = "[email protected]"

// Binds name: String, topdom: String and subdoms: Seq[String]
val ExpandedEMail(name, topdom, subdoms @ _*) = s
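The sketch below shows which values end up in the three variables for a hypothetical address. It uses a `match` expression instead of a pattern `val` so that a non-matching string is handled explicitly; `ExpandedDemo` and `describe` are names introduced for illustration.

```scala
object ExpandedDemo {
  object ExpandedEMail {
    // Fixed part (the user name) first, variable part (the domain parts) last.
    def unapplySeq(email: String): Option[(String, Seq[String])] = {
      val parts = email.split("@")
      if (parts.length == 2)
        Some((parts(0), parts(1).split("\\.").reverse.toSeq))
      else
        None
    }
  }

  def describe(s: String): String = s match {
    case ExpandedEMail(name, topdom, subdoms @ _*) =>
      name + " / " + topdom + " / " + subdoms.mkString(",")
    case _ => "no match"
  }
}
```

For the hypothetical input `"tom@support.epfl.ch"`, `name` is bound to `tom`, `topdom` to `ch`, and `subdoms` to the remaining parts `epfl` and `support`, in that order.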

Extractors and sequence patterns

We have seen that sequence patterns can be used to match lists, arrays and other sequences. This works because the corresponding companion objects define an unapplySeq method.


// Examples of sequence patterns:

List()
List(x, y, _*)

Array(x, 0, 0, _)

// How does scala.List support sequence patterns? Its companion object
// defines unapplySeq, roughly as follows.

package scala 

object List { 
  def apply[T] (elems : T*) = elems.toList
  def unapplySeq[T](x : List[T]) : Option[Seq[T]] = Some(x)
}
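The sequence patterns listed above can be tried out with a small sketch like the following; `SequenceDemo`, `describeList` and `describeArray` are hypothetical names.

```scala
object SequenceDemo {
  def describeList(xs: List[Int]): String = xs match {
    case List()         => "empty"
    case List(x, y, _*) => "starts with " + x + " and " + y
    case _              => "exactly one element"
  }

  def describeArray(arr: Array[Int]): String = arr match {
    // Matches arrays of exactly four elements whose second and third are 0.
    case Array(x, 0, 0, _) => "four elements, first is " + x
    case _                 => "something else"
  }
}
```

Here `_*` stands for any number of remaining elements, including none, whereas a plain `_` stands for exactly one element.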

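Extractors versus case classes


The key issue in this comparison is representation independence.

Case classes usually lead to more efficient pattern matches, because the Scala compiler can optimize patterns over case classes much better than patterns over extractors. The mechanisms of a case class are fixed, whereas an unapply or unapplySeq method in an extractor can do almost anything.

Extractors are more flexible because they provide representation independence. A case class exposes its concrete representation through its constructor, so patterns written against it are tied to that representation. An extractor pattern, by contrast, need not correspond to the type of the value it matches.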
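The difference can be sketched by putting the two styles side by side. In the following, `EMailRecord` is a hypothetical case class and `RepresentationDemo` a hypothetical wrapper; the extractor works on plain strings, so the pattern `EMail(u, d)` says nothing about how the data is stored.

```scala
object RepresentationDemo {
  // Case class: the pattern mirrors the constructor, so it only matches
  // values that really are EMailRecord instances.
  final case class EMailRecord(user: String, domain: String)

  // Extractor: the same pattern shape, but over plain strings.
  object EMail {
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  def viaCaseClass(x: Any): String = x match {
    case EMailRecord(u, d) => u + " AT " + d
    case _                 => "no match"
  }

  def viaExtractor(x: Any): String = x match {
    case EMail(u, d) => u + " AT " + d
    case _           => "no match"
  }
}
```

If the representation later changes, client code that uses the extractor pattern can stay as it is as long as `unapply` is updated, whereas patterns over the case class name its type directly.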

Regular expressions

To use regular expressions, you need the following import.

// Regex is defined in the scala.util.matching package
import scala.util.matching.Regex

To construct a Regex object, pass the pattern string to its constructor.

val Decimal = new Regex("(-)?(\\d+)(\\.\\d*)?")

// A raw string, written between triple quotes, avoids the double backslashes,
// because escape sequences are not interpreted inside it.
val Decimal = new Regex("""(-)?(\d+)(\.\d*)?""")

// An even shorter way is to append .r to the string.
val Decimal = """(-)?(\d+)(\.\d*)?""".r

How does the .r method convert a string into a regular expression?

// The .r method is defined in class StringOps, in simplified form as follows.
package scala.runtime
import scala.util.matching.Regex

class StringOps(self : String) ... {
  ...
  def r = new Regex(self)
}
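That the different spellings produce the same regular expression can be checked with a short sketch; `RegexConstructionDemo` and its member names are hypothetical.

```scala
import scala.util.matching.Regex

object RegexConstructionDemo {
  // Ordinary string literal: every backslash has to be doubled.
  val viaConstructor: Regex = new Regex("(-)?(\\d+)(\\.\\d*)?")

  // Raw string plus .r: backslashes are written once.
  val viaRawString: Regex = """(-)?(\d+)(\.\d*)?""".r

  // Both wrap a java.util.regex.Pattern with the same source text.
  val samePattern: Boolean =
    viaConstructor.pattern.pattern() == viaRawString.pattern.pattern()
}
```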

Searching for regular expressions 

// The Regex class provides several methods for searching a string.

regex findFirstIn str   // the first occurrence, as an Option[String]


regex findAllIn str     // all occurrences, as an iterator


regex findPrefixOf str  // an occurrence at the start of str, as an Option[String]

For example, you could define the input string below and search for decimal numbers in it.



val Decimal = """(-)?(\d+)(\.\d*)?""".r

val  input = "for -1.0 to 99 by 3 "

  
for (s <- Decimal findAllIn input) 
  println(s)
  
Decimal findPrefixOf input   
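The results of the three search methods on this input can be captured in a small sketch; `SearchDemo` and the value names are hypothetical.

```scala
object SearchDemo {
  val Decimal = """(-)?(\d+)(\.\d*)?""".r
  val input = "for -1.0 to 99 by 3"

  // First occurrence anywhere in the input.
  val first: Option[String] = Decimal.findFirstIn(input)

  // All occurrences, collected from the iterator into a list.
  val all: List[String] = Decimal.findAllIn(input).toList

  // Occurrence at the very start of the input, if any.
  val prefix: Option[String] = Decimal.findPrefixOf(input)
}
```

Here `first` is `Some("-1.0")`, `all` is `List("-1.0", "99", "3")`, and `prefix` is `None`, because the input starts with the word `for` rather than with a number.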

Extracting with regular expressions

// Every regular expression in Scala defines an extractor (a built-in unapplySeq
// method) that identifies the substrings matched by the groups of the regular
// expression. Examples follow.

val Decimal(sign, integerpart, decimalpart) = "-1.23"

  
val Decimal(sign, integerpart, decimalpart) = "1.0"
  
  
for (Decimal(s, i, d) <- Decimal findAllIn  input)
  println("(sign: " + s + ", integer: " + i + ", decimal: " + d + ")")
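One detail worth knowing when extracting with a regular expression is that a group which did not take part in the match is bound to `null`. The sketch below wraps such groups in `Option`; `RegexExtractDemo` and `parts` are hypothetical names.

```scala
object RegexExtractDemo {
  val Decimal = """(-)?(\d+)(\.\d*)?""".r

  // Returns (sign, integer part, fractional part) when the whole string is
  // a decimal number. Optional groups that did not match are null, so they
  // are wrapped with Option(...), which turns null into None.
  def parts(s: String): Option[(Option[String], String, Option[String])] =
    s match {
      case Decimal(sign, integer, fraction) =>
        Some((Option(sign), integer, Option(fraction)))
      case _ => None
    }
}
```

Unlike `findFirstIn`, the extractor pattern only succeeds when the entire string matches the regular expression, so `parts("abc")` and `parts("x 1.0")` both return `None`.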

Conclusion

In this chapter, you saw how to generalize pattern matching with extractors. Extractors let you define your own kinds of patterns, which need not correspond to the type of the expressions you select on. This gives you more flexibility in the kinds of patterns you can use for matching.
