By now you have probably grown accustomed to the concise way data can be decomposed and analyzed using pattern matching.
You might wish to be able to create your own kinds of patterns. Extractors let you do so. This chapter explains what extractors are and how you can use them to define patterns that are decoupled from an object's representation.
As always, we start with an example: extracting email addresses.
Without pattern matching, we might define helper functions like the following.
// To pull the user and domain parts out of a string that holds an email
// address, we might end up with the following helper methods.
def isEMail(s: String): Boolean
def domain(s: String): String
def user(s: String): String

They would be used as follows.
// With these functions, you would parse a given string s as follows.
if (isEMail(s)) println(user(s) + " AT " + domain(s))
else println("not an email address")

Now suppose instead that EMail is something we can match on.
// You have already seen that pattern matching can do this kind of job.
// Assume that you can match a string against the pattern
//   EMail(user, domain)
// The test above then translates to the following.
s match {
  case EMail(user, domain) => println(user + " AT " + domain)
  case _ => println("not an email address")
}

Patterns also handle more complicated examples.
// A more complicated problem: finding two successive email addresses with
// the same user part translates to the following pattern.
ss match {
  case EMail(u1, d1) :: EMail(u2, d2) :: _ if (u1 == u2) => ...
}
For these patterns to work, EMail has to be defined as an extractor object.

object EMail {
  // The injection method (optional)
  def apply(user: String, domain: String) = user + "@" + domain

  // The extraction method (mandatory)
  def unapply(str: String): Option[(String, String)] = {
    val parts = str split "@"
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

Alternatively, you can define EMail as follows, which has the same effect.
// The same object, this time also declared as a function
// from (String, String) to String.
object EMail extends ((String, String) => String) {
  // The injection method (optional)
  def apply(user: String, domain: String) = user + "@" + domain

  // The extraction method (mandatory)
  def unapply(str: String): Option[(String, String)] = {
    val parts = str split "@"
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

With either definition you can write the following pattern match.
// Either way, you give apply a pair of strings and get an email string back.
// Given some value x of type Any:
x match {
  case EMail(user, domain) => ...
}

The general rule for apply and unapply in Scala is that they should be inverses of each other: if you unapply what apply produced, you get the same things back. (apply is called the injection, and unapply is called the extraction.)

EMail.unapply(EMail.apply(user, domain))
// should return Some((user, domain))

EMail.unapply(obj) match {
  case Some((u, d)) => EMail.apply(u, d)
}
// going the other way, this should give back obj
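The round-trip rule can be checked on a concrete value. The following is only a minimal sketch: the wrapper object `RoundTripDemo` and the sample user and domain are assumptions made for illustration, and the `EMail` object repeats the definition from the text.

```scala
object RoundTripDemo {
  object EMail {
    // Injection: build an address from its two parts.
    def apply(user: String, domain: String): String = user + "@" + domain

    // Extraction: split an address back into its parts if it has exactly one "@".
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  // Hypothetical sample value, chosen only for illustration.
  val address: String = EMail("alice", "example.org")

  // unapply applied to the result of apply gives the original parts back.
  val extracted: Option[(String, String)] = EMail.unapply(address)

  // Extracting and then injecting again reproduces the original string.
  val rebuilt: Option[String] =
    EMail.unapply(address).map { case (u, d) => EMail(u, d) }
}
```

Nothing enforces this rule; it is a design convention that keeps patterns and constructors consistent with each other.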
An extractor pattern can also bind just one variable, or no variable at all.
As always, we use an example, and include the EMail object definition again for reference.
object EMail {
  // The injection method (optional)
  def apply(user: String, domain: String) = user + "@" + domain

  // The extraction method (mandatory)
  def unapply(str: String): Option[(String, String)] = {
    val parts = str split "@"
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

// The case where a pattern binds just one variable is treated differently.
// There is no one-tuple in Scala, so to return just one pattern element
// the unapply method simply wraps the element itself in a Some.
// The following extractor binds exactly one variable.
object Twice {
  def apply(s: String): String = s + s

  def unapply(s: String): Option[String] = {
    val length = s.length / 2
    val half = s.substring(0, length)
    if (half == s.substring(length)) Some(half) else None
  }
}

As mentioned, an extractor pattern may also bind no variable at all. In that case the corresponding unapply method returns a Boolean: true for success and false for failure. An example follows.
// An extractor that binds no variable: unapply only returns true or false.
// Only unapply is defined; there is no point in defining apply,
// because there is nothing to construct.
object UpperCase {
  def unapply(s: String): Boolean = s.toUpperCase == s
}

The next example applies all of the extractors defined so far in one pattern match.
def userTwiceUpper(s: String) = s match {
  case EMail(Twice(x @ UpperCase()), domain) =>
    "Match: " + x + " in domain " + domain
  case _ =>
    "No match"
}

The following calls show how it behaves.
// The test code
userTwiceUpper("[email protected]") // succeeds
userTwiceUpper("[email protected]") // fails
userTwiceUpper("[email protected]") // fails, because the user part is not upper case
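To see the three extractors working together on concrete input, here is a self-contained sketch. The wrapper object `TwiceUpperDemo` and the sample addresses in the usage note are assumptions made for illustration; the extractors themselves repeat the definitions above.

```scala
object TwiceUpperDemo {
  object EMail {
    def apply(user: String, domain: String): String = user + "@" + domain
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  // Binds one variable: matches strings made of the same half repeated twice.
  object Twice {
    def apply(s: String): String = s + s
    def unapply(s: String): Option[String] = {
      val length = s.length / 2
      val half = s.substring(0, length)
      if (half == s.substring(length)) Some(half) else None
    }
  }

  // Binds no variable: only reports whether the string is all upper case.
  object UpperCase {
    def unapply(s: String): Boolean = s.toUpperCase == s
  }

  // x @ UpperCase() binds x to the value that UpperCase() accepted.
  def userTwiceUpper(s: String): String = s match {
    case EMail(Twice(x @ UpperCase()), domain) =>
      "match: " + x + " in domain " + domain
    case _ => "no match"
  }
}
```

With these definitions, `TwiceUpperDemo.userTwiceUpper("DIDI@example.com")` matches and binds `x` to `"DI"`, while `"DIDO@example.com"` (user part not a repeated half) and `"didi@example.com"` (not upper case) fall through to the default case.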
An extractor can also bind a variable number of variables. Let's look at an example of what we want to do.
object EMail {
  // The injection method (optional)
  def apply(user: String, domain: String) = user + "@" + domain

  // The extraction method (mandatory)
  def unapply(str: String): Option[(String, String)] = {
    val parts = str split "@"
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

// Suppose that we want to match on a domain name.
dom match {
  case Domain("org", "acm") => println("acm.org")
  case Domain("com", "sun", "java") => println("java.sun.com")
  case Domain("net", _*) => println("a .net domain") // _* matches any remaining elements
}

Now we define the Domain object. The key is the unapplySeq method, as follows.
object Domain {
  // The injection method (optional)
  def apply(parts: String*): String = parts.reverse.mkString(".")

  // The extraction method (mandatory)
  def unapplySeq(whole: String): Option[Seq[String]] =
    Some(whole.split("\\.").reverse.toSeq)
}

We can use the following helper function and calls to test it.
def isTomInDotCom(s: String): Boolean = s match {
  case EMail("tom", Domain("com", _*)) => true
  case _ => false
}

// This gives the expected results.
isTomInDotCom("[email protected]") // true
isTomInDotCom("[email protected]") // false
isTomInDotCom("[email protected]") // false

It is also possible to return some fixed elements from an unapplySeq together with the variable part. Here is the extractor code.
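The pieces above can be put together into one runnable sketch. The wrapper object `DomainDemo`, the helper `describe` (which returns strings instead of printing and adds a default case), and the sample inputs in the usage note are assumptions made for illustration.

```scala
object DomainDemo {
  object EMail {
    def apply(user: String, domain: String): String = user + "@" + domain
    def unapply(str: String): Option[(String, String)] = {
      val parts = str.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  object Domain {
    // Injection: parts are given top-level domain first.
    def apply(parts: String*): String = parts.reverse.mkString(".")

    // Extraction: returns the dot-separated parts in reverse order,
    // so the top-level domain comes first in the pattern.
    def unapplySeq(whole: String): Option[Seq[String]] =
      Some(whole.split("\\.").reverse.toSeq)
  }

  // Hypothetical helper mirroring the match shown in the text.
  def describe(dom: String): String = dom match {
    case Domain("org", "acm") => "acm.org"
    case Domain("com", "sun", "java") => "java.sun.com"
    case Domain("net", _*) => "a .net domain"
    case _ => "something else"
  }

  def isTomInDotCom(s: String): Boolean = s match {
    case EMail("tom", Domain("com", _*)) => true
    case _ => false
  }
}
```

For example, `DomainDemo.describe("scala-lang.net")` takes the `_*` branch, and `DomainDemo.isTomInDotCom("tom@sun.com")` is true while `"peter@sun.com"` and `"tom@acm.org"` are false.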
// You can also return some fixed elements from an unapplySeq together with
// the variable part. This is expressed by returning all elements in a tuple,
// where the variable part comes last.
// Note that the method has to be named unapplySeq, not unapply, so that the
// trailing Seq is matched as a sequence of pattern elements.
object ExpandedEMail {
  def unapplySeq(email: String): Option[(String, Seq[String])] = {
    val parts = email split "@"
    if (parts.length == 2)
      Some((parts(0), parts(1).split("\\.").reverse.toSeq))
    else
      None
  }
}

The code to test it follows.
// To test ExpandedEMail:
val s = "[email protected]"
val ExpandedEMail(name, topdom, subdoms @ _*) = s
// This binds
//   name: String          the user part
//   topdom: String        the top-level domain
//   subdoms: Seq[String]  the remaining domain parts, in reverse order
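Here is a self-contained sketch of the same idea that uses a match expression rather than a pattern in a val definition, so that a non-matching string is handled explicitly. The wrapper object `ExpandedDemo`, the helper `splitAddress`, and the sample address in the usage note are assumptions made for illustration.

```scala
object ExpandedDemo {
  object ExpandedEMail {
    // Fixed part (the user name) first, variable part (the domain pieces,
    // reversed so the top-level domain comes first) last.
    def unapplySeq(email: String): Option[(String, Seq[String])] = {
      val parts = email.split("@")
      if (parts.length == 2)
        Some((parts(0), parts(1).split("\\.").reverse.toSeq))
      else
        None
    }
  }

  // Hypothetical helper: returns the bound variables, or None on failure.
  def splitAddress(s: String): Option[(String, String, Seq[String])] = s match {
    case ExpandedEMail(name, topdom, subdoms @ _*) => Some((name, topdom, subdoms))
    case _ => None
  }
}
```

For the sample input `"tom@support.example.org"`, `splitAddress` binds `name` to `"tom"`, `topdom` to `"org"`, and `subdoms` to the sequence `"example", "support"`.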
We have seen that we can use sequence patterns on lists, arrays, and other collections. How is that possible? Here is how.
// Examples of sequence patterns:
List()
List(x, y, _*)
Array(x, 0, 0, _)

// These work because the companion objects define unapplySeq.
// This is roughly how scala.List supports sequence patterns.
package scala
object List {
  def apply[T](elems: T*) = elems.toList
  def unapplySeq[T](x: List[T]): Option[Seq[T]] = Some(x)
}
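The three sequence patterns listed above can be tried out directly with the standard library. This is a small sketch; the wrapper object `SeqPatternDemo` and the helper names `describe` and `arrayShape` are assumptions made for illustration.

```scala
object SeqPatternDemo {
  def describe(xs: List[Int]): String = xs match {
    case List() => "empty"
    case List(x, y, _*) => "starts with " + x + " and " + y
    // Only lists with exactly one element reach this case.
    case _ => "exactly one element"
  }

  def arrayShape(arr: Array[Int]): String = arr match {
    // Exactly four elements: anything, two zeros, anything.
    case Array(x, 0, 0, _) => "four elements, first is " + x
    case _ => "other"
  }
}
```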
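The key point when comparing extractors with case classes is representation independence.
Case classes are usually more efficient, because the Scala compiler can optimize patterns over case classes much better than patterns over extractors: the mechanics of a case class are fixed, whereas an unapply or unapplySeq method can run arbitrary code.
Extractors, on the other hand, are more flexible and give you representation independence. A case class has to expose its internal representation, because its constructor pattern mirrors its constructor parameters, whereas an extractor pattern need not correspond to the type of the object being matched.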
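The difference can be sketched with two ways of representing an address. Everything in this example (`RepresentationDemo`, `MailAddress`, `PackedAddress`, `Address`, and the two helper functions) is a hypothetical name introduced for illustration, not part of the chapter's code.

```scala
object RepresentationDemo {
  // Case class: the pattern MailAddress(user, domain) is tied to this
  // constructor, so changing the fields breaks every client pattern.
  final case class MailAddress(user: String, domain: String)

  // A different internal representation: the whole address in one string.
  final class PackedAddress(val raw: String)

  // Extractor: offers the same (user, domain) pattern shape without
  // revealing how PackedAddress stores its data.
  object Address {
    def unapply(a: PackedAddress): Option[(String, String)] = {
      val parts = a.raw.split("@")
      if (parts.length == 2) Some((parts(0), parts(1))) else None
    }
  }

  def domainOfCase(a: MailAddress): String = a match {
    case MailAddress(_, domain) => domain
  }

  def domainOfPacked(a: PackedAddress): String = a match {
    case Address(_, domain) => domain
    case _ => "unknown"
  }
}
```

If `PackedAddress` later changed how it stores its data, only `Address.unapply` would need to change; the client pattern `Address(_, domain)` would stay the same.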
To use regular expressions, you need the following import.

// To use regular expressions:
import scala.util.matching.Regex

To construct a Regex object, do the following.
// Three equivalent ways to construct the same Regex (pick one).
val Decimal = new Regex("(-)?(\\d+)(\\.\\d*)?")

// A raw string, written in triple quotes ("""), gets rid of the annoying
// \\ escapes.
val Decimal = new Regex("""(-)?(\d+)(\.\d*)?""")

// An even shorter way to write the regular expression:
val Decimal = """(-)?(\d+)(\.\d*)?""".r

How does the .r method convert a string into a regular expression?
// This is roughly how the .r method on strings is defined.
package scala.runtime
import scala.util.matching.Regex

class StringOps(self: String) ... {
  ...
  def r = new Regex(self)
}
// Searching for regular expressions:
regex findFirstIn str
regex findAllIn str
regex findPrefixOf str

With a regular expression in hand, you can search a string for occurrences of it.
// For example, you could define the input string below and then search
// for decimal numbers in it.
val Decimal = """(-)?(\d+)(\.\d*)?""".r
val input = "for -1.0 to 99 by 3 "
for (s <- Decimal findAllIn input) println(s)
Decimal findPrefixOf input
// Extracting with regular expressions:
// every regular expression defines an extractor, which binds one variable
// per group. See the following examples.
val Decimal(sign, integerpart, decimalpart) = "-1.23"

// If a group is missing from the input, its variable is set to null;
// here sign is null.
val Decimal(sign, integerpart, decimalpart) = "1.0"

for (Decimal(s, i, d) <- Decimal findAllIn input)
  println("sign: " + s + ", integer: " + i + ", decimal: " + d)
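The searching and extracting steps can be combined into one runnable sketch. The wrapper object `RegexDemo` and the helper `parts` are assumptions made for illustration; `parts` wraps the possibly null groups in `Option` so that callers do not have to deal with null.

```scala
import scala.util.matching.Regex

object RegexDemo {
  val Decimal: Regex = """(-)?(\d+)(\.\d*)?""".r
  val input = "for -1.0 to 99 by 3"

  // Searching: all matches, the first match, and a match at the very start.
  val all: List[String] = Decimal.findAllIn(input).toList
  val first: Option[String] = Decimal.findFirstIn(input)
  val prefix: Option[String] = Decimal.findPrefixOf(input)

  // Extracting: the pattern binds one variable per group and only
  // succeeds when the whole string matches the regular expression.
  def parts(s: String): Option[(Option[String], String, Option[String])] = s match {
    case Decimal(sign, integerPart, decimalPart) =>
      // Groups that did not take part in the match are null.
      Some((Option(sign), integerPart, Option(decimalPart)))
    case _ => None
  }
}
```

Here `findPrefixOf` yields `None` because the input does not start with a number, and `parts("1.0")` reports the missing sign as `None` rather than null.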
In this chapter, you saw how to generalize pattern matching with extractors. Extractors let you define your own kinds of patterns, which need not correspond to the type of the expression you select on. This gives you more flexibility in the kinds of patterns you can use for matching.