Friday, September 12, 2008
Scala Syntax Primer
Scala runs on the JVM and can directly call and be called from Java, but source compatibility was not a goal. Scala has a lot of capabilities not in Java, and to help those new features work more nicely, there are a number of differences between Java and Scala syntax that can make reading Scala code a bit of a challenge for Java programmers when first encountering Scala. This primer attempts to explain those differences. It is aimed at Java programmers, so some details about syntax which are the same as Java are omitted.
This primer is not intended to be a complete tutorial on Scala; rather, it is more of a syntax reference. For a much better introduction to the language, you should buy the book Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners.
Most of these syntax differences can be explained by two of Scala's major goals:
- Minimize verbosity.
- Support a functional style.
The logic behind some of these syntax rules may at first seem arbitrary, but the rules support each other surprisingly well. Hopefully by the time you finish this primer, you will have no trouble understanding code fragments like this one:
Scala is an integrated object/functional language. In the discussion below, the terms "method" and "function" are used interchangeably.
You can run the Scala interpreter and type in these examples to get a better feel for how these rules work.
Contents
- Basics
- Keywords
- Symbols and Literals
- Expressions
- Case and Patterns
- For Expressions
- Arrays
- Tuples
- Classes
- Types
- Function Definitions
- Function Calls
- Function Sugar
Basics
- Scala classes do not need to go into files that match the class name, as they do in Java. You can put any Scala class in any Scala file. The only time it makes a difference is when you have a `class` and an `object` of the same name and want them to be companions, in which case they must be in the same file.
- Semicolons are optional. If you put one statement on each line, you don't need semicolons. If you want to put multiple statements on a line, separate them with semicolons. There are specific rules about when statements can span lines, so you sometimes have to be a bit careful when doing so.
- Every value is an object on which methods can be called. For example, `123.toString()` is valid.
- Scala includes implicit conversions that allow objects to be used in unexpected ways. If you see source code where a method is called on an instance of a class which does not define that method, then the instance is probably being implicitly converted to a class on which that method is defined.
- Scala's "Uniform Access Principle" means that variables and functions without parameters are accessed in the same way. In particular, a variable definition using the `val` or `var` keyword can be converted to a function definition simply by replacing that keyword with the `def` keyword. The syntax at the calling sites does not change.
- Scala includes direct support for XML. This code fragment assigns an instance of type `scala.xml.Elem` to val x:
      val x = <title>The Title</title>
  You can mix Scala code in with the XML by putting it in braces. This code fragment produces the same resulting value as the above code:
      val title = "The Title"
      val x = <title>{ title }</title>
- Just about everything can be nested. Packages can be nested inside packages, classes can be nested inside classes, defs can be nested inside defs.
- As in Java, annotations are indicated by the `@` character.
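As a minimal sketch of the Uniform Access Principle described above (the class and member names here are hypothetical, chosen only for illustration), the two classes below differ only in `val` versus `def`, yet the calling code is identical:

```scala
// Hypothetical classes: one stores the area in a field, the other computes it on demand.
class CircleVal(r: Double) {
  val area = math.Pi * r * r   // evaluated once, when the instance is created
}

class CircleDef(r: Double) {
  def area = math.Pi * r * r   // evaluated on every access
}

object UniformAccessDemo {
  // The calling syntax does not reveal whether area is a val or a def.
  val a1 = new CircleVal(2.0).area
  val a2 = new CircleDef(2.0).area

  // Every value is an object, so a method can be called on a literal.
  val s = 123.toString
}
```

Switching `area` from `val` to `def` (or back) requires no change in `UniformAccessDemo`.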
Keywords
- There is no `static` keyword. Methods and variables that you would declare static in Java go into an `object` rather than a `class` in Scala. Objects are singletons.
- Scala has no `break` or `continue` statements. Fortunately, Scala's support of a functional programming style reduces the need for these.
- Access modifiers such as `protected` and `private` can include a scope in square brackets, such as `private[this]` or `protected[MyPackage2]`, where the scope is the simple name of an enclosing package or class. The default access is public.
- The `val` keyword declares an immutable value (a val), similar to the `final` keyword in Java. The `var` keyword declares a mutable variable.
- Multiple items can be imported with one `import` statement:
      import java.text.{DateFormat,SimpleDateFormat}
- Imported symbols can be renamed, which provides a means to work around the problem of importing two symbols of the same name. For example, if you want to import both `java.util.Date` and `java.sql.Date` and be able to use them both without having to type the whole qualified name each time, you could do this:
      import java.util.{Date=>UDate}
      import java.sql.{Date=>SDate}
- If an import is renamed to `_`, that symbol will not be imported. This allows importing everything except a specified symbol:
      import java.util.{Date=>_,_}
- The `abstract` keyword is only used for abstract classes and traits. To declare an abstract function (def), val, variable (var), or type, you omit the `=` character and body of the item.
- When overriding a concrete method in a superclass, the `override` modifier must be specified. Overriding a method without using the `override` modifier, or using the modifier when not overriding a superclass method, results in a compilation error.
- Some other keywords: `lazy`, `implicit`
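The following sketch pulls several of the keyword rules above together. `Counter`, `Shape` and `Square` are hypothetical names invented for this example; only the two `Date` classes come from the text.

```scala
import java.util.{Date => UDate}   // renamed imports avoid the name clash
import java.sql.{Date => SDate}

// What would be static members in Java live in a singleton object.
object Counter {
  private var count = 0
  def next(): Int = { count += 1; count }
}

abstract class Shape {
  def name: String                          // abstract: no "=" and no body
  def describe: String = "shape " + name
}

class Square extends Shape {
  val name = "square"                          // implements the abstract member
  override def describe: String = "a " + name  // override is required for a concrete member
}

object KeywordsDemo {
  val utilDate: UDate = new UDate(0L)
  val sqlDate: SDate = new SDate(0L)
}
```

Removing the `override` modifier from `Square.describe` would make the compiler reject the class.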
Symbols and Literals
- Scala allows multi-line strings quoted with triple-quotes:
      val longString = """Line 1
      Line 2
      Line 3""";
- Symbol names can include almost any character. In particular, they can include all of the characters normally used as operators, such as *, +, ~ and :. The backslash character (\) is a valid symbol character, and in fact is used as a method name in the `scala.xml.Elem` class. Note that `"abc\"def"` is a seven character String with a double quote in the middle, but if `abc` is an instance of `scala.xml.Elem`, then `abc\"def"` is a call passing a three character String to the backslash method, a method that accepts a String argument and returns an instance of `scala.xml.NodeSeq`.
- The underscore character (_) is used as a wildcard character rather than asterisk (*), such as in an `import` statement or in a `case` statement to represent a "don't care" value. This is because asterisk is a valid symbol character in Scala.
- As in Java, by convention class names and object names start with an upper case letter, variable names start with a lower case letter.
- In one case, whether a symbol starts with an upper or lower case letter actually matters to the compiler: in a `case` statement, that difference is used to disambiguate between a constant value, such as `PI` if `Math.PI` has been imported (starts with upper case), and a placeholder name being introduced (whose scope is then limited to the body of the case statement).
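To illustrate operator characters in method names and the upper/lower case rule in `case` statements, here is a small sketch. `Vec`, `Limit` and `classify` are hypothetical names made up for this example.

```scala
object SymbolsDemo {
  // Method names may consist of operator characters.
  case class Vec(x: Int, y: Int) {
    def +(o: Vec): Vec = Vec(x + o.x, y + o.y)
    def *(k: Int): Vec = Vec(x * k, y * k)
  }

  // Precedence follows the first character, so * binds tighter than +.
  val v = Vec(1, 2) + Vec(3, 4) * 2

  val Limit = 10   // upper-case name: treated as a constant inside a pattern

  def classify(n: Int): String = n match {
    case Limit => "at the limit"                // compares n with the existing value Limit
    case other => "some other value: " + other  // lower-case name: introduces a new variable
  }
}
```

Had the constant been named `limit`, the first case would bind a fresh variable and match every input.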
Expressions
- Any place an expression is expected, a block of expressions surrounded by braces can be used instead. The braces act as parentheses. For example, the expression `5 * { val a = 1; a + 2 }` is valid and yields the value 15.
- As mentioned above, symbol names can include almost any character, such as * and +. This is useful for defining methods that will be used as operators (see the rules under Function Calls for functions with one argument).
- The precedence of operators is determined by their first character, and is hardwired to match the usual precedence. Thus if you create the operator methods `*^` and `+^`, the `*^` will have higher precedence. The one exception to this rule is that if the operator ends with an equal sign (`=`) and is not one of the standard relational operators, then it will have the same precedence as simple assignment.
- When used as binary operators, any symbol which ends with a colon (:) is right-associative; all other symbols are left-associative. The controlling object for right-associative operators goes on the right side of the operator, with its one argument on the left side. However, the left hand argument is still evaluated before the right hand argument.
- The characters +, -, ! and ~ can be used as prefix operators in any class by defining the method `unary_+`, `unary_-` etc.
- Every statement is an expression whose value is the last expression within that statement that was evaluated. For example, there is no `?:` ternary operator in Scala. Instead, you use a standard if/then/else statement:
      val x = if (n>0) "positive" else "negative"
- When used on the right hand side of a variable declaration, the underscore character means assign the default value. This is the same as not specifying a value in Java; in Scala, not specifying an initial value declares an abstract variable.
      val n = 123       //specific initial value
      var x:Int         //abstract variable
      var y:Int = _     //default initial value
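The operator rules above can be sketched as follows. `Stack` and its methods are hypothetical, written only to show a right-associative operator (a name ending in a colon) and a prefix operator defined through `unary_-`.

```scala
object ExpressionsDemo {
  case class Stack(items: List[Int]) {
    // The name ends in ':', so the Stack goes on the right of the operator.
    def +:(n: Int): Stack = Stack(n :: items)
    // Defining unary_- enables the prefix form -stack.
    def unary_- : Stack = Stack(items.map(-_))
  }

  val s = 1 +: 2 +: Stack(Nil)   // same as Stack(Nil).+:(2).+:(1)
  val negated = -s

  // A braced block can stand wherever an expression is expected.
  val block = 5 * { val a = 1; a + 2 }

  // if/else is an expression, so it replaces Java's ?: operator.
  def sign(n: Int): String = if (n > 0) "positive" else "negative"
}
```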
Case and Patterns
- Like Java, Scala has a `case` statement to allow selecting one possible code path from many based on a value. A simple `switch` statement in Java might look like this:
      //Java code
      switch (n) {
          case 0: action0(); break;
          case 1: action1(); break;
          default: actionDflt(); break;
      }
  The equivalent code in Scala would be:
      n match {
          case 0 => action0()
          case 1 => action1()
          case _ => actionDflt()
      }
  - Instead of the `switch` keyword as in Java, Scala uses a `match` expression. The `match` keyword comes after the value being matched, unlike the relative positions of the `switch` keyword and the value in Java.
  - `match` works on all types, not just ints. For example, you can `match` on a String variable and have `case` statements each with a constant String value.
  - No `break` statement is required, and execution does not fall through to the next case.
  - `match` statements return values. The value of a `match` statement is the value of whichever branch was executed.
  - The underscore is used to indicate the default case.
- In addition to constants, a `case` expression can include patterns, which allow for more complex matching. Case matching is handled by extractors, which can be implemented by writing an `unapply` method in an object.
- A `case` pattern can include a variable declaration with a type, in which case the variable is defined with that type and set to the value of the matched data within the body of that case.
      n match {  //assume n is of type Any
          case i:Int => //i is an Int here, like (int)n would be in Java
          case d:Double => //d is a Double here, like (double)n in Java
          case _ => //no values were defined in this case
      }
- When matching a more complex expression, you can assign a variable name to an internal part of the pattern by writing the variable name and the `@` character before the pattern:
      case Foo(a,b @ Bar(_)) //b gets set to the part that matches Bar(_)
- Case expressions can be followed by a pattern guard before the =>. The pattern guard is the keyword `if` followed by a boolean expression:
      x match {
          case Foo(a,b) if a==b => //here only when Foo with a==b
          case Foo(a,b) => //here for all other Foo
          case _ => //here for all non-Foo
      }
- Case expressions work for XML. In this example, the variable `b` gets set to whatever is inside the `body` element if there is only one element there:
      case <body>{ b }</body> => //b is the contents
  To match multiple elements, use `_*` to match any sequence:
      case <body>{ b @ _* }</body> => //b is the contents
- The `catch` block of a try/catch statement uses the same syntax as the body of a `match` statement:
      try {
          // code that might throw an exception goes here
      } catch {
          case e1:IllegalArgumentException => //e1 is IllegalArgumentException here
          case e2:ArrayIndexOutOfBoundsException => //e2 is valid here
          case e3:RuntimeException => //any other RuntimeException comes here
          case _ => //all other exceptions come here
      }
- A pattern can be used on the left hand side of an assignment. For example, the following code results in assigning the values 3 and 5 to the new variables x1 and y1:
      case class Point(x:Int,y:Int) //defines a simple value class
      val Point(x1,y1) = Point(3,5)
  This example is contrived, but the same kind of assignment works when calling a method that returns a value which is an object.
- A pattern can be used on the left side of the `<-` operator in a generator in a `for` expression.
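A minimal sketch of the pattern features listed above, combining a hand-written extractor, typed patterns, a guard and an `@` binding in one `match`. The `Even` extractor and the `describe` function are hypothetical names introduced for this example.

```scala
object PatternsDemo {
  // An extractor: defining unapply lets Even be used as a pattern.
  object Even {
    def unapply(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None
  }

  case class Point(x: Int, y: Int)

  def describe(v: Any): String = v match {
    case Even(half)            => "even, half is " + half
    case i: Int                => "odd int " + i                 // typed pattern
    case Point(a, b) if a == b => "point on the diagonal"        // pattern guard
    case p @ Point(_, 0)       => "point on the x axis: " + p    // @ binds the whole match
    case s: String             => "string of length " + s.length
    case _                     => "something else"
  }

  // A pattern on the left hand side of an assignment.
  val Point(x1, y1) = Point(3, 5)
}
```

Cases are tried from top to bottom, so `Point(0, 0)` is reported as being on the diagonal rather than on the x axis.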
For Expressions
For Expressions are also called For Comprehensions.
- A `for` expression consists of the `for` keyword, a sequence of specific kinds of elements separated by semicolons or newlines and surrounded by parentheses, and the `yield` keyword followed by an expression:
      for ( n <- 0 to 6 ; e = n%2; if e==0 ) yield n*n
- The elements inside the parentheses can be any of the following:
  - A generator, such as `n <- 0 to 6`, which produces multiple values and assigns them to a new val (here `n`). The new val name appears to the left of the `<-` operator; to the right of that operator is a value which implements the `foreach` method to generate a series of values.
  - A definition, such as `e = n%2`, which introduces a new value by performing the specified calculation.
  - A filter, such as `if e==0`, which filters out the values which do not satisfy that expression.
- The val name in a generator can instead be a pattern, similarly to how a pattern can be used in an assignment statement or a case expression. For example:
      val list = List((1,2),(3,4),(5,6))
      for ( (a,b) <- list) yield a+b
  yields List(3, 7, 11).
- The elements can be placed inside of braces rather than parentheses and separated by newlines rather than semicolons:
      for {
          n <- 0 to 6
          e = n%2
          if e==0
      } yield n*n
- When multiple generators are specified, each generator is repeated for each value produced by the preceding generator. For example, the expression
      for ( x <- 0 to 4 ; y <- 0 until 3) yield (x,y)
  produces a value starting with (0,0), (0,1), (0,2) and ending with (4,2).
- The type of the value produced by a `for` expression is determined by the collection type of the first generator (for example, a `List` generator yields a `List`).
- As an alternative to using `yield` followed by an expression, you can omit the `yield` keyword and use a block of code in place of a single expression.
- A `for` statement can always be translated into a series of `foreach`, `filter` (`withFilter` in Scala 2.8 and later), `map` and `flatMap` method calls. In that sense, the `for` statement is syntactic sugar.
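The translation into method calls can be sketched by writing each `for` expression next to a hand-written equivalent. This assumes Scala 2.8 or later, where filters become `withFilter`; the hand-written forms approximate what the compiler generates and are not its exact output.

```scala
object ForDemo {
  // Generator, definition and filter, as in the example above.
  val squares = for (n <- 0 to 6; e = n % 2; if e == 0) yield n * n

  // An equivalent chain: the definition becomes a map to a pair,
  // the filter becomes withFilter, and yield becomes the final map.
  val squaresByHand =
    (0 to 6)
      .map(n => (n, n % 2))
      .withFilter { case (_, e) => e == 0 }
      .map { case (n, _) => n * n }

  // Two generators: the outer one becomes flatMap, the inner one map.
  val pairs = for (x <- 0 to 4; y <- 0 until 3) yield (x, y)
  val pairsByHand = (0 to 4).flatMap(x => (0 until 3).map(y => (x, y)))

  // A pattern on the left of the generator.
  val sums = for ((a, b) <- List((1, 2), (3, 4), (5, 6))) yield a + b
}
```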
Arrays
- Array indexes are specified with parentheses rather than square brackets.
- Array access is implemented the same way as function access, using the `apply` method.
- The code `arr(index)` is converted to `arr.apply(index)`.
- The code `arr(index) = newval` is converted to `arr.update(index,newval)`.
- Arrays are declared using the `Array` class with the element type in square brackets, rather than using empty square brackets after a type as is done in Java. For example, an array with space for three Strings would be declared like this:
      val x = new Array[String](3)
- A two dimensional 3 by 3 array of Strings would be declared like this:
      val x = new Array[Array[String]](3, 3)
  In Scala 2.8 and later this form is no longer accepted; use `Array.ofDim[String](3, 3)` instead.
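A short sketch of the `apply`/`update` rewriting for arrays, assuming Scala 2.8 or later for `Array.ofDim`. The object name `ArraysDemo` is hypothetical.

```scala
object ArraysDemo {
  val x = new Array[String](3)   // three slots, each initially null

  x(0) = "a"                     // sugar for x.update(0, "a")
  x.update(1, "b")               // the desugared form, written by hand

  val first = x(0)               // sugar for x.apply(0)
  val second = x.apply(1)

  // A 3 by 3 array of Strings.
  val grid = Array.ofDim[String](3, 3)
  grid(1)(2) = "center-right"    // grid.apply(1).update(2, "center-right")
}
```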
Tuples
- Scala has built in support for Tuples, from one element to 22 elements. A Tuple is a small ordered collection of objects, where each object can have a different type.
- The types for Tuples of various sizes are Tuple1 through Tuple22. These types have N type parameters, where N is the Tuple size. For example, a two element Tuple with an Int and a String has type `Tuple2[Int,String]`.
- The `Pair` object allows that word to be used instead of `Tuple2` for building and matching two element Tuples.
- The `Triple` object allows that word to be used instead of `Tuple3` for building and matching three element Tuples.
- You can create a tuple by enclosing the objects in parentheses and separating them by commas: `(1, 2, "foo")` is a `Tuple3[Int,Int,String]`.
- You can create a Tuple2 (a Pair) by using the `->` operator, which works on any value: `"a" -> 25` is the same as `("a", 25)`. The following expression is true:
      ("a",25)=="a"->25
  This is done by an implicit conversion from `Any` to `Predef.ArrowAssoc`, which contains the `->` method.
- The elements of a Tuple can be accessed as member fields _1, _2, _3 etc.
- If an expression returns a Tuple, that can be assigned to a set of variables or vals. The following code assigns 5 to the new val `tens` and 8 to the new val `ones`.
      def div10(n:Int):Tuple2[Int,Int] = (n/10, n%10)
      val (tens, ones) = div10(58)
  This is a case of using a pattern on the left hand side of an assignment, as mentioned in the section on Case and Patterns.
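The tuple rules above, collected into one runnable sketch. The object name `TuplesDemo` is hypothetical; `div10` is the function from the text.

```scala
object TuplesDemo {
  val t = (1, 2, "foo")                 // a Tuple3[Int, Int, String]
  val third: String = t._3              // elements are the fields _1, _2, _3

  val pair: (String, Int) = "a" -> 25   // -> builds a Tuple2
  val same = ("a", 25) == pair

  def div10(n: Int): (Int, Int) = (n / 10, n % 10)

  // The tuple pattern on the left defines two new vals at once.
  val (tens, ones) = div10(58)
}
```

The type `(Int, Int)` used here is shorthand for `Tuple2[Int,Int]`.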
Classes
- The primary constructor for a class is coded in-line in the class definition, i.e. the constructor statements are not contained within a definition inside the class. The constructor parameters are declared immediately after the class name, and superclass arguments are placed after the name of the class being extended.
- Class parameters can be preceded by `val` to make them immutable instance values (vals), or by `var` to make them instance variables.
- Class parameters can be preceded by an access modifier such as `private` or `protected`. By default, class parameters using `val` or `var` are public.
- The primary constructor can be made private by adding the access modifier `private` before the parameter list.
- `trait` is like `interface` in Java, but can include implementation code. Classes in Scala don't `implement` traits; they `extend` them, the same as classes. If a class extends multiple traits, or extends a class plus traits, the keyword `with` is used rather than commas as in Java.
- Case classes are defined by adding the `case` keyword before the `class` keyword. This automatically does the equivalent of the following:
  - Prepends `val` to all parameters, making them immutable instance values.
  - Creates `equals` and `hashCode` methods so that instances of that class can safely be used in collections.
  - Creates a companion `object` of the same name with an `apply` method with the same args as declared for the class, to allow creation of instances without using the `new` keyword, and with an `unapply` method to allow the class name to be used as an extractor in `case` statements.
- Anonymous classes can be defined without reference to an extending class, in which case they extend Object:
      val x = new { def cat(a:String, b:String) = a+b }
- The type-parameterized `isInstanceOf` method is used to determine if an object is an instance of a specific class:
      if (x.isInstanceOf[Double]) ...
- Similar to `isInstanceOf`, a value can be cast to a specific type by using the type-parameterized `asInstanceOf` method:
      if (x.isInstanceOf[Double]) {
          val d = x.asInstanceOf[Double]
          //operate on d
      } else {
          //not a double
      }
  However, the above construct is not typically used; instead, that functionality is implemented with a `case` statement, which simultaneously tests for a type and sets a new value of that type:
      x match {
          case d:Double => //operate on d
          case _ => //not a double
      }
- The `isInstanceOf` method can be used to test if an object matches a trait as well as a class. A type test can also be written against a structural definition, which describes any instance that implements a specific method (the compiler reports such a test as unchecked, because the structural part cannot be verified at run time):
      type HasAddActionListenerMethod = {
          def addActionListener(a:ActionListener)
      }
      uiElement match {
          case c:HasAddActionListenerMethod =>
              c.addActionListener(new ActionListener() {
                  override def actionPerformed(ev:ActionEvent) {
                      //insert your actionPerformed code here
                  }
              })
      }
- Class literals are written `classOf[MyClass]` as opposed to `MyClass.class` as in Java.
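The class rules above can be sketched as follows. All class and trait names (`Greeter`, `Loud`, `Person`, `LoudPerson`, `Point`) are hypothetical and exist only for this illustration.

```scala
object ClassesDemo {
  trait Greeter {
    def name: String                        // abstract member
    def greet: String = "Hello, " + name    // a trait may carry implementation
  }

  trait Loud extends Greeter {
    override def greet: String = super.greet.toUpperCase()
  }

  // The primary constructor is the parameter list after the class name.
  // val makes name a public field; age stays private and mutable.
  class Person(val name: String, private var age: Int) extends Greeter {
    def birthday(): Int = { age += 1; age }
  }

  // Superclass arguments follow the superclass name; traits are added with "with".
  class LoudPerson(n: String) extends Person(n, 0) with Loud

  // A case class gets equals, hashCode, apply and unapply for free.
  case class Point(x: Int, y: Int)
  val p = Point(1, 2)                          // no "new" needed
  val samePoint = p == Point(1, 2)             // structural equality
  val px = p match { case Point(x, _) => x }   // the class name used as an extractor

  val cls: Class[String] = classOf[String]     // class literal
}
```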
Types
- All values in Scala are objects, so (except for compatibility with Java) there is no int/Integer or double/Double distinction. All integers are of type `Int` and all doubles are of type `Double`. (In previous versions of Scala, either upper case `Int` or lower case `int` was accepted, but convention now is to use only the upper case version, and this may be enforced by the compiler in the future.)
- Type specifications are written as `name:type` rather than `type name` as in Java. This is to allow the type to be omitted in many cases, since Scala does type inference. For example, write `n:Int` rather than `int n`.
- Types for generics are specified in square brackets `[T]` rather than in angle brackets `<T>` as in Java. Thus a generic type might be specified as `F[A,B,C]`.
- Scala supports covariant and contravariant type specifications at the definition site. These are declared with a leading + for covariant types and a leading - for contravariant types. Thus a type declaration `F[+A,-B]` means F is parameterized by a covariant type A and a contravariant type B.
- Types can be specified with upper and lower bounds. The expression `T<:U` means type U is an upper bound for T, whereas `T>:U` means type U is a lower bound for type T.
- Types can be specified with view bounds, which are similar to upper bounds: The expression `T<%U` means type U is a view bound for T. T need not be a subtype of U as long as an implicit conversion from T to U is available, so a view bound accepts more actual types than an upper bound.
- A generic type with two type parameters, such as `Pair[String,Int]`, can be written in infix notation by placing the type name between its two type arguments, such as `String Pair Int`. This makes more sense if the type name happens to use operator characters such as +. Thus when you see a type such as `Quantity[M + M2]`, as used in the Quantity class in this file, that is the same as `Quantity[+[M,M2]]`, so look for a type called `+` that takes two type parameters.
- Existential types are supported with an expression like this:
      T forSome { type T }
  where the contents of the braces is some type declaration. This is mainly used when interfacing to Java code that either has raw types or uses Java's `?` wildcard type.
- The `type T` in an existential type specification can be replaced by a more complex expression:
      List[T] forSome { type T <: Component }
  In the above example, we are saying T is some type which is a subtype of Component.
- The shorthand
      List[_]
  is the same thing as
      List[T] forSome { type T }
- The shorthand
      List[_ <: Component]
  is the same thing as
      List[T] forSome { type T <: Component }
- Type variables can be defined by using the `type` keyword. Similar to a typedef in C, the type variable simplifies code when a complicated type is used many times:
      type ALS = Array[List[String]]
      val a:ALS
      val b:ALS
  Type variables can also be abstract, in which case they must eventually be defined by a subclass.
- A trait may include code that accesses another trait, in which case the class that includes the first trait must also include the second trait. In order to make this work, the first trait must include a "self type" referencing the second trait. The self type declaration is the first line of the body, usually declaring a type for `this`, but optionally using a different name in place of `this`:
      trait foo {
          //method for foo trait
      }
      trait bar {
          this : foo =>
          //methods for bar trait, which can access foo methods
      }
- Inner class types can be referenced using the outer and inner class names separated by a dot (.) as in Java, or using a pound sign (#). The dot syntax specifies a path-dependent type; the pound syntax specifies the generic inner class. For example, if you had this code:
      class Outer {
          class Inner {}
      }
  then you would use `Outer#Inner` to refer generically to that inner class. If you had an instance `x` of class Outer, you would refer to the specific class Inner in that instance by using `x.Inner`, which is a distinct type from the `Inner` class within any other instance of `Outer`, and a subtype of the generic `Outer#Inner` class.
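Several of the type rules above can be sketched in a few lines. This assumes Scala 2.13; `Animal`, `Dog`, `Box`, `Printer` and `maxOf` are hypothetical names, and only `ALS`, `Outer` and `Inner` come from the text.

```scala
object TypesDemo {
  class Animal
  class Dog extends Animal

  // Covariant: a Box[Dog] can be used where a Box[Animal] is expected.
  class Box[+A](val item: A)
  val animalBox: Box[Animal] = new Box[Dog](new Dog)

  // Contravariant: a Printer[Animal] can be used where a Printer[Dog] is expected.
  trait Printer[-A] { def print(a: A): String }
  val animalPrinter: Printer[Animal] = new Printer[Animal] {
    def print(a: Animal): String = "animal"
  }
  val dogPrinter: Printer[Dog] = animalPrinter

  // Upper bound: T must be a subtype of Comparable[T].
  def maxOf[T <: Comparable[T]](a: T, b: T): T = if (a.compareTo(b) >= 0) a else b

  // A type alias, in the spirit of a C typedef.
  type ALS = Array[List[String]]
  val als: ALS = Array(List("a"), List("b", "c"))

  // Wildcard shorthand for an existential type.
  def size(xs: List[_]): Int = xs.length

  class Outer { class Inner }
  val o1 = new Outer
  val inner: o1.Inner = new o1.Inner   // path-dependent type, tied to o1
  val generic: Outer#Inner = inner     // every o1.Inner is also an Outer#Inner
}
```

Assigning `inner` to a val of type `o2.Inner`, for some other instance `o2` of `Outer`, would not compile.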
Function Definitions
- The return type of a function is written after the function's parameter list and preceded by a colon, similar to the type specification for a variable. For example, a function which would be declared in Java as
      //Java code
      public String toString(StringBuffer buf)
  would be declared in Scala as
      def toString(buf:StringBuffer):String
- Functions which do not return a value are declared as having the type `Unit` rather than `void` as in Java. If a function never returns (such as if it always throws a Throwable) the return type is `Nothing`.
- A function with no parameters can be declared without parentheses, in which case it must be called with no parentheses. This provides support for the Uniform Access Principle, such that the caller does not know if the symbol is a variable or a function with no parameters.
- The function body is preceded by "=" if it returns a value (i.e. the return type is something other than Unit), but the return type and the "=" can be omitted when the type is Unit (i.e. it looks like a procedure as opposed to a function).
- Braces around the body are not required if the body is a single expression. More precisely, the body of a function is just an expression, and any expression with multiple parts must be enclosed in braces (an expression with one part may optionally be enclosed in braces).
- Vararg parameters are declared by appending an asterisk to the argument type, like this:
      def printf(format:String, args:Any*):String
  The parameter is seen as a sequence within the method, so in the above example the `args` parameter would have the type `Seq[Any]` (`Array[Any]` in Scala 2.7 and earlier) within the body of the `printf` function.
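A compact sketch of the definition forms listed above. The function names (`describe`, `log`, `fail`, `answer`, `join`) are hypothetical, and the vararg behaviour shown assumes Scala 2.8 or later.

```scala
object FunctionDefsDemo {
  // Return type after the parameter list, preceded by a colon.
  def describe(buf: StringBuffer): String = "buffer of length " + buf.length()

  // Unit plays the role of Java's void.
  def log(msg: String): Unit = println(msg)

  // Nothing marks a function that never returns normally.
  def fail(msg: String): Nothing = throw new IllegalStateException(msg)

  // Declared without parentheses, so it is called as answer, never answer().
  def answer = 42

  // Vararg parameter: inside the body, parts is a Seq[Any].
  def join(sep: String, parts: Any*): String = parts.mkString(sep)

  // An existing sequence is passed to a vararg parameter with ": _*".
  val letters = Seq("a", "b", "c")
  val joined = join("-", letters: _*)
}
```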
Function Calls
- When a class has an `apply` method, `foo(bar)` (where `foo` is an instance of that class) translates to `foo.apply(bar)`.
  - Likewise for an object. If you see `Foo(bar)` that is most likely a call to the `apply` method of `object Foo`.
  - As with any method, the `apply` method can be overloaded, with different versions having different signatures.
  - Functions are instances of a class (Function1, Function2, etc), so the same rule applies to any function object.
  - A method named `unapply` in an object definition is also treated specially: it is invoked as an extractor when the object name is used in a case statement pattern.
- Functions with zero or one argument can be called without the dot and parentheses.
  - But any expression can have parentheses around it, so you can omit the dot and still use parentheses.
  - And since you can use braces anywhere you can use parentheses, you can omit the dot and put in braces, which can contain multiple statements.
- Functions with no arguments can be called without the parentheses. For example, the `length()` function on `String` can be invoked as `"abc".length` rather than `"abc".length()`. If the function is a Scala function defined without parentheses, then the function must be called without parentheses.
- By convention, functions with no arguments that have side effects, such as println, are called with parentheses; those without side effects are called without parentheses.
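The call forms above, sketched with a hypothetical `Multiplier` class that has an overloaded `apply` and an ordinary one-argument method `times`:

```scala
object FunctionCallsDemo {
  class Multiplier(k: Int) {
    def apply(n: Int): Int = n * k                 // enables multiplier(n)
    def apply(a: Int, b: Int): Int = (a + b) * k   // apply can be overloaded
    def times(n: Int): Int = n * k
  }

  val triple = new Multiplier(3)

  val a = triple(5)                           // triple.apply(5)
  val b = triple(1, 2)                        // the two-argument apply
  val c = triple times 4                      // one argument: no dot, no parentheses
  val d = triple times { val n = 1; n + 1 }   // braces in place of parentheses

  // A function value is an object with an apply method.
  val inc: Int => Int = _ + 1
  val e = inc(41)                             // inc.apply(41)

  val len = "abc".length                      // no-argument call without parentheses
}
```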
Function Sugar
"Syntactic sugar" is added syntax to make certain constructs easier or more natural to specify. The step in which the compiler replaces these constructs by their more verbose equivalents is called "desugaring".
- Functions with one parameter (including anonymous functions) are instances of type `Function1[A,B]`, functions with two parameters are of type `Function2[A,B,C]`, etc. The last type in the list of type parameters is the return type, so a function with N parameters always has N+1 type parameters. A function with no parameters is an instance of `Function0[A]`. The name `Function` with no number is equivalent to `Function1`.
- `(A,B)=>C` is shorthand ("syntactic sugar") for `Function2[A,B,C]`.
- A `Function1[A,B]` can be written as `(A)=>B`, or as just `A=>B`.
- A `Function0[A]` (i.e. a function with no parameters) can be written as `()=>A`. This function can be called with or without parentheses (as mentioned in the Function Definitions section).
- A function with no parameter list can also be specified with no parentheses as `=>A`. This function must be called without parentheses. If you are declaring a parameter `x` of this type, the declaration looks like `x: =>A`. This signature is used for call-by-name parameters.
- When passing an anonymous function (also called a function literal), you can use a shorthand in which you directly write the body of the function, using underscores where each of the function parameters is to go (as long as Scala has enough information to infer the type). For example, if you are folding a list to sum all the elements, you can write it the long way:
      val list = List(1,2,3,4,5)
      list.foldLeft(0) { (a:Int, b:Int) => a+b }
  or, by taking advantage of Scala's type inference (and using the same value for `list`):
      list.foldLeft(0) { (a, b) => a+b }
  or, using underscores as in-line parameter placeholders:
      list.foldLeft(0) { _ + _ }
  or using the equivalent method `/:` (which also does a `foldLeft`):
      (0/:list)(_+_)
  In this last form, we are taking advantage of the following shorthands:
  - The `/:` operator is equivalent to the `foldLeft` method (the List class defines both methods).
  - The `foldLeft` method (and the equivalent `/:` operator method) uses a curried parameter list, with the first parameter list having only one parameter. This allows us to take advantage of the next step.
  - Since the `foldLeft` method takes only one parameter (in the first parameter list), we can invoke it without the dot and parentheses.
  - Since the operator name ends with a colon, it is right-associative, so the `list` object goes on the right and the `0` argument goes on the left.
  - The second parameter list contains only one item (the function to apply to the fold), and the function we are passing in has only one expression, so we can use parentheses rather than braces.
  - Scala has enough information to infer the types of the two parameters in the function literal, so we do not need to specify the types of the parameters.
  - We are only using each parameter in the literal once, so we can use the underscore shorthand and not have to declare the names of the parameters in the function literal.
  - We can remove all the space without creating ambiguity.
- If a function literal, as used in the above example, is a single method call that takes only one argument, then the method name alone may be specified. Under this rule, this:
      args.foreach( (x:Any) => println(x) )
  becomes this (the other intermediate forms given above are also valid):
      args.foreach(println)
- Instead of using an underscore as a placeholder for an argument, if a function name is followed by a space and an underscore, the underscore is a placeholder for an entire argument list. This is a partially applied function.
- Another example, this one from Tony Morris's Introduction to High-level Programming with Scala:
      def compose[A, B, C](f: B => C, g: A => B): A => C = (a: A) => f(g(a))
  - This defines a `compose` function with three type parameters A, B and C. The regular function parameters are in parentheses.
  - The parameter `f` is a function that takes one argument of type `B` and returns a value of type `C`. The parameter `g` is also a function.
  - The return type of the `compose` function, which appears after the colon that follows the parentheses, is a function that takes one argument of type `A` and returns a value of type `C`.
  - The body of the function appears after the `=` character, and is a function which has a parameter `a` of type `A` and returns the value `f(g(a))`.
  - Note that executing the `compose` function does NOT execute functions `f` and `g`, but rather returns a function object which, when invoked with an argument `a`, will evaluate `f(g(a))`.
  - This would thus be used as follows:
        def plus1(n:Int):Int = n+1
        def intToParenString(n:Int) = "("+n.toString+")"
        val plus1string = compose(intToParenString,plus1)
        val x = plus1string(10) //this executes plus1 and intToParenString,
                                //sets x to the string (11)
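The function-type shorthands and placeholder forms in this section can be sketched together as follows. The names `add`, `addFn` and `unless` are hypothetical. The `/:` form is left out because that operator is deprecated in Scala 2.13.

```scala
object FunctionSugarDemo {
  val list = List(1, 2, 3, 4, 5)

  // Three equivalent ways of writing the same function literal.
  val sum1 = list.foldLeft(0) { (a: Int, b: Int) => a + b }
  val sum2 = list.foldLeft(0) { (a, b) => a + b }
  val sum3 = list.foldLeft(0) { _ + _ }

  // (Int, Int) => Int is sugar for Function2[Int, Int, Int].
  val f2: Function2[Int, Int, Int] = (a: Int, b: Int) => a + b
  val g2: (Int, Int) => Int = f2

  // A method name followed by a space and an underscore yields a function value.
  def add(a: Int, b: Int): Int = a + b
  val addFn = add _
  val sum4 = list.foldLeft(0)(add)   // the method name alone also works here

  // A call-by-name parameter: body is evaluated only if and when it is used.
  def unless(cond: Boolean)(body: => Unit): Unit = if (!cond) body

  var ran = 0
  unless(false) { ran += 1 }   // the body runs
  unless(true) { ran += 1 }    // the body is never evaluated
}
```

Because `body` is declared as `=> Unit`, the caller's block is passed unevaluated, which is what lets `unless` read like a built-in control structure.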
Updated 2008-10-19: added classOf, added infix type notation.