Scala in a Nutshell


Intro

Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It seamlessly integrates features of object-oriented and functional languages.

  • object-oriented
  • functional
  • statically typed
  • extensible
  • interoperates with Java

Basics

@main def hello(): Unit = println("hello Scala!")

Variables

val a: Int = 2
val b: String = "hello"

var c: Int = 3
  • val: an immutable binding, similar to a final variable in Java; it cannot be reassigned after initialization
  • var: a mutable variable; it can be reassigned after initialization
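
A minimal sketch of the difference. The object Counter and its members are hypothetical names used only for illustration. Reassigning the var compiles, while the commented-out reassignment of the val would be rejected by the compiler.

```scala
// Counter, step and total are hypothetical names for illustration
object Counter:
  val step: Int = 1  // val: fixed after initialization
  var total: Int = 0 // var: may be reassigned

  def increment(): Int =
    // step = 2      // does not compile: a val cannot be reassigned
    total = total + step
    total
```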

Expressions

Block

Control Structures

Unified Types

Intro


Scala is a unique language in that it’s statically typed, but often feels flexible and dynamic. For instance, thanks to type inference you can write code like this without explicitly specifying the variable types:

val a = 1
val b = 2.0
val c = "Hi!"

Union types in Scala 3

def isTruthy(a: Boolean | Int | String): Boolean = ???
def dogCatOrWhatever(): Dog | Plant | Car | Sun = ???
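
One possible body for the isTruthy stub above, sketched under the assumption that false, 0, and the empty string count as falsy. Matching on the type of each alternative lets the compiler check that all three members of the union are handled.

```scala
// Assumed rule for illustration: false, 0 and "" are falsy, everything else is truthy
def isTruthy(a: Boolean | Int | String): Boolean = a match
  case b: Boolean => b
  case i: Int     => i != 0
  case s: String  => s.nonEmpty
```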

Statically-typed programming languages offer a number of benefits, including:

  • Helping to provide strong IDE support
  • Eliminating many classes of potential errors at compile time
  • Assisting in refactoring
  • Providing strong documentation that cannot be outdated since it is type checked

Inferred Types

val x: Int = 1
val y: Double = 1

val a = 1
val b = List(1, 2, 3)
val m = Map(1 -> "one", 2 -> "two")

// don't need to declare the type when defining value binders
scala> val a = 1
val a: Int = 1

scala> val b = List(1, 2, 3)
val b: List[Int] = List(1, 2, 3)

scala> val m = Map(1 -> "one", 2 -> "two")
val m: Map[Int, String] = Map(1 -> one, 2 -> two)

Generics

// here we declare the type parameter A
//          v
class Stack[A]:
  private var elements: List[A] = Nil
  //                         ^
  //  Here we refer to the type parameter
  //          v
  def push(x: A): Unit =
    elements = elements.prepended(x)
  def peek: A = elements.head
  def pop(): A =
    val currentTop = peek
    elements = elements.tail
    currentTop

Intersection Types

Used on types, the & operator creates a so-called intersection type. The type A & B represents values that are both of the type A and of the type B at the same time.

trait Resettable:
  def reset(): Unit

trait Growable[A]:
  def add(a: A): Unit

def f(x: Resettable & Growable[String]): Unit =
  x.reset()
  x.add("first")

The type of x is a subtype of both Resettable and Growable[String], so both reset and add can be called on it.

& is commutative: A & B is the same type as B & A.
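
To call f, a value needs a class that mixes in both traits. The class StringLog and the function g below are hypothetical. The sketch also illustrates commutativity: g declares its parameter as Growable[String] & Resettable and can still pass it to f.

```scala
import scala.collection.mutable.ListBuffer

trait Resettable:
  def reset(): Unit

trait Growable[A]:
  def add(a: A): Unit

def f(x: Resettable & Growable[String]): Unit =
  x.reset()
  x.add("first")

// & is commutative: this parameter type is the same type as f's, so x can be passed on
def g(x: Growable[String] & Resettable): Unit = f(x)

// Hypothetical class that implements both traits
class StringLog extends Resettable, Growable[String]:
  val items: ListBuffer[String] = ListBuffer.empty
  def reset(): Unit = items.clear()
  def add(a: String): Unit = items += a
```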

Union Types

Used on types, the | operator creates a so-called union type. The type A | B represents values that are either of the type A or of the type B.

case class Username(name: String)
case class Password(hash: Hash)

def help(id: Username | Password) =
  val user = id match
    case Username(name) => lookupName(name)
    case Password(hash) => lookupPassword(hash)
  // more code here ...
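
The snippet above leaves Hash, lookupName, and lookupPassword undefined. A self-contained sketch, assuming Hash is an alias for Int (consistent with Password(123) below) and using hypothetical lookups that just return strings:

```scala
// Assumption: Hash is not defined in the snippet above; Int is used here
type Hash = Int

case class Username(name: String)
case class Password(hash: Hash)

// Hypothetical lookups standing in for a real user store
def lookupName(name: String): String = s"user:$name"
def lookupPassword(hash: Hash): String = s"user#$hash"

// The match is exhaustive: id can only be a Username or a Password
def help(id: Username | Password): String = id match
  case Username(name) => lookupName(name)
  case Password(hash) => lookupPassword(hash)
```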

Inference of Union Types

The compiler assigns a union type to an expression only if such a type is explicitly given.

val name = Username("Eve")     // name: Username = Username(Eve)
val password = Password(123)   // password: Password = Password(123)

scala> val a = if true then name else password
val a: Object = Username(Eve)

scala> val b: Password | Username = if true then name else password
val b: Password | Username = Username(Eve)

Union types are duals of intersection types. And like & with intersection types, | is also commutative: A | B is the same type as B | A.

Variance

trait Item { def productNumber: String }
trait Buyable extends Item { def price: Int }
trait Book extends Buyable { def isbn: String }

// an example of an invariant type
trait Pipeline[T]:
  def process(t: T): T

// an example of a covariant type
trait Producer[+T]:
  def make: T

// an example of a contravariant type
trait Consumer[-T]:
  def take(t: T): Unit

In general there are three modes of variance:

  • invariant: the default, written like Pipeline[T]
  • covariant: annotated with a +, such as Producer[+T]; comparable to Java's use-site wildcard ? extends T
  • contravariant: annotated with a -, like in Consumer[-T]; comparable to Java's use-site wildcard ? super T

When?

  • Producers are typically covariant, and mark their type parameter with +. This also holds for immutable collections (List, Vector).
  • Consumers are typically contravariant, and mark their type parameter with -.
  • Types that are both producers and consumers have to be invariant, and do not require any marking on their type parameter. Mutable collections like Array fall into this category.
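
A minimal sketch of the producer and consumer rules using the Item, Buyable, and Book traits from above. SimpleBook, BookProducer, ItemCounter, priceOf, and feed are hypothetical names introduced for illustration.

```scala
trait Item { def productNumber: String }
trait Buyable extends Item { def price: Int }
trait Book extends Buyable { def isbn: String }

trait Producer[+T]:
  def make: T

trait Consumer[-T]:
  def take(t: T): Unit

// Hypothetical concrete types for illustration
case class SimpleBook(productNumber: String, price: Int, isbn: String) extends Book

object BookProducer extends Producer[Book]:
  def make: Book = SimpleBook("p-1", 10, "978-0")

class ItemCounter extends Consumer[Item]:
  var count = 0
  def take(t: Item): Unit = count += 1

// Covariance: Producer[Book] <: Producer[Buyable], because every Book is a Buyable
def priceOf(p: Producer[Buyable]): Int = p.make.price

// Contravariance: Consumer[Item] <: Consumer[Book], because whatever takes any Item can take a Book
def feed(c: Consumer[Book], b: Book): Unit = c.take(b)
```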

Algebraic Data Types

Opaque Types

Structural Types

class Record(elems: (String, Any)*) extends Selectable:
  private val fields = elems.toMap
  def selectDynamic(name: String): Any = fields(name)

type Person = Record {
  val name: String
  val age: Int
}

val person = Record(
  "name" -> "Emma",
  "age" -> 42
).asInstanceOf[Person]

println(s"${person.name} is ${person.age} years old.")

The parent type Record in this example is a generic class that can represent arbitrary records in its elems argument. This argument is a sequence of pairs of labels of type String and values of type Any. When you create a Person as a Record you have to assert with a typecast that the record defines the right fields of the right types. Record itself is too weakly typed, so the compiler cannot know this without help from the user. In practice, the connection between a structural type and its underlying generic representation would most likely be done by a database layer, and therefore would not be a concern of the end user.

Record extends the marker trait scala.Selectable and defines a method selectDynamic, which maps a field name to its value. Selecting a structural type member is done by calling this method. The person.name and person.age selections are translated by the Scala compiler to:

person.selectDynamic("name").asInstanceOf[String]
person.selectDynamic("age").asInstanceOf[Int]

Besides selectDynamic, a Selectable class sometimes also defines a method applyDynamic. This can then be used to translate function calls of structural members. So, if a is an instance of Selectable, a structural call like a.f(b, c) translates to:

a.applyDynamic("f")(b, c)
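
A minimal sketch of applyDynamic, assuming the simplest signature that takes the method name and then the arguments as Any*. The class Calc and the Adder refinement are hypothetical; the call calc.add(1, 2) is routed through applyDynamic and the result is cast back to the declared Int.

```scala
// Hypothetical Selectable class that dispatches structural method calls by name
class Calc extends Selectable:
  def applyDynamic(name: String)(args: Any*): Any = name match
    case "add" => args.map(_.asInstanceOf[Int]).sum
    case other => throw new NoSuchMethodException(other)

// Structural refinement: declares which methods callers may use
type Adder = Calc { def add(x: Int, y: Int): Int }

// As with Record above, a cast asserts that the object supports the refinement
val calc: Adder = Calc().asInstanceOf[Adder]

// calc.add(1, 2) is translated (roughly) to calc.applyDynamic("add")(1, 2).asInstanceOf[Int]
```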

Dependent Function Types

Other Types

  • Type lambdas
  • Match types
  • Existential types
  • Higher-kinded types
  • Singleton types
  • Refinement types
  • Kind polymorphism

Functions

Higher-Order Functions (HOFs)

A higher-order function is a function that (a) takes other functions as input parameters or (b) returns a function as a result.

HOFs are possible because functions are first-class values in Scala.
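
A minimal sketch of both kinds; applyTwice and makeAdder are hypothetical names. Since List.map is itself a HOF, the function returned by makeAdder can be passed straight to it.

```scala
// (a) Takes a function as a parameter and applies it twice
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

// (b) Returns a function as its result
def makeAdder(n: Int): Int => Int = (x: Int) => x + n
```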

Methods

implicit classes (Scala 2)

An implicit class adds methods to an existing class without inheriting from it. In Scala 3 this pattern is superseded by extension methods.

LogicalPlan.scala

class LogicalPlan (val name: String) {
}

ParserUtils.scala

object ParserUtils {
  implicit class EnhancedLogicalPlan(val plan: LogicalPlan) extends AnyVal {
    def optional(ctx: AnyRef)(f: => LogicalPlan): LogicalPlan = {
      if (ctx != null) {
        println(s"$ctx: ${plan.name}")
        f
      } else {
        plan
      }
    }
  }
}

test.scala

import ParserUtils._

object Hello {
  def main(args: Array[String]): Unit = {
    val plan = new LogicalPlan("logical plan")
    plan.optional("hello") {
      print("scala")
      plan
    }
  }
}

extension methods in Scala 3

case class Circle(x: Double, y: Double, radius: Double)

extension (c: Circle)
  def circumference: Double = c.radius * math.Pi * 2

object ExtensionMethodsTest {
  def main(args: Array[String]): Unit = {
    val c = Circle(1.0, 2.0, 5.5)
    println(c.circumference)
  }
}
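
For comparison, the implicit-class helper from ParserUtils.scala above could be written as a Scala 3 extension method. This is a hand translation sketch that keeps the same LogicalPlan class and the same optional behavior, not code from the original project.

```scala
class LogicalPlan(val name: String)

object ParserUtils:
  // Scala 3 counterpart of the implicit class EnhancedLogicalPlan (hand translation)
  extension (plan: LogicalPlan)
    def optional(ctx: AnyRef)(f: => LogicalPlan): LogicalPlan =
      if ctx != null then
        println(s"$ctx: ${plan.name}")
        f // by-name parameter: evaluated only in this branch
      else plan
```

After import ParserUtils.*, the call plan.optional("hello") { ... } from test.scala works unchanged.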

currying

def left[A, B](as: Seq[A], init: B)(op: (B, A) => B) = {
  var ans = init
  as.foreach(item => {
    ans = op(ans, item)
  })
  ans
}

@main def run() = {
  println("Hello, World!")
  println(left(Seq(1, 2, 3), 0)(_+_))
}
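
A smaller sketch of what a second parameter list buys; add, addTen, and sumAll are hypothetical names. Supplying only the first list yields a function, and because the first list fixes the type of the accumulator, the lambda in the second list (like _+_ in left above) needs no type annotations.

```scala
// A curried method: two parameter lists
def add(a: Int)(b: Int): Int = a + b

// Supplying only the first list yields a function Int => Int (eta-expansion)
val addTen: Int => Int = add(10)

// foldLeft is curried the same way as left: 0 fixes the accumulator type to Int
def sumAll(xs: Seq[Int]): Int = xs.foldLeft(0)(_ + _)
```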
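
The same curried style appears in real code. The following excerpt from Apache Spark's SQL parser passes the SQL text in the first parameter list of parse and a function SqlBaseParser => T in the second, so parsePlan can supply that function as a block:

  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>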
    astBuilder.visitSingleStatement(parser.singleStatement()) match {
      case plan: LogicalPlan => plan
      case _ =>
        val position = Origin(None, None)
        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    }
  }
  protected def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
    logDebug(s"Parsing command: $command")

    val lexer = new SqlBaseLexer(new UpperCaseCharStream(CharStreams.fromString(command)))
    lexer.removeErrorListeners()
    lexer.addErrorListener(ParseErrorListener)

    val tokenStream = new CommonTokenStream(lexer)
    val parser = new SqlBaseParser(tokenStream)
    parser.addParseListener(PostProcessor)
    parser.addParseListener(UnclosedCommentProcessor(command, tokenStream))
    parser.removeErrorListeners()
    parser.addErrorListener(ParseErrorListener)
    parser.legacy_setops_precedence_enbled = conf.setOpsPrecedenceEnforced
    parser.legacy_exponent_literal_as_decimal_enabled = conf.exponentLiteralAsDecimalEnabled
    parser.SQL_standard_keyword_behavior = conf.ansiEnabled

    try {
      try {
        // first, try parsing with potentially faster SLL mode
        parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
        toResult(parser)
      }
      catch {
        case e: ParseCancellationException =>
          // if we fail, parse with LL mode
          tokenStream.seek(0) // rewind input stream
          parser.reset()

          // Try Again.
          parser.getInterpreter.setPredictionMode(PredictionMode.LL)
          toResult(parser)
      }
    }
    catch {
      case e: ParseException if e.command.isDefined =>
        throw e
      case e: ParseException =>
        throw e.withCommand(command)
      case e: AnalysisException =>
        val position = Origin(e.line, e.startPosition)
        throw new ParseException(Option(command), e.message, position, position)
    }
  }

Traits

Classes

Case Classes

Singleton Objects

Collections

  • Sequences
  • Maps
  • Sets

Immutable

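
A short sketch of the immutable defaults for sequences, maps, and sets; the value names are hypothetical. Every operation that "adds" an element returns a new collection and leaves the original untouched.

```scala
// Immutable collections are the default (scala.collection.immutable)
val nums: Seq[Int] = Vector(1, 2, 3)
val capitals: Map[String, String] = Map("France" -> "Paris")
val primes: Set[Int] = Set(2, 3, 5)

// Each "update" returns a new collection; the originals stay unchanged
val moreNums = nums :+ 4
val moreCapitals = capitals.updated("Japan", "Tokyo")
val morePrimes = primes + 7
```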

Mutable

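
The mutable counterparts live in scala.collection.mutable and are updated in place. A sketch with hypothetical names (buffer, scores, seen, record):

```scala
import scala.collection.mutable

// Mutable collections are changed in place instead of copied
val buffer = mutable.ArrayBuffer(1, 2, 3)
val scores = mutable.Map("Alice" -> 1)
val seen = mutable.Set.empty[String]

def record(name: String): Unit =
  buffer += buffer.length + 1                  // appends to the same buffer
  scores(name) = scores.getOrElse(name, 0) + 1 // inserts or updates a key
  seen += name                                 // adds to the same set
```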

Functional Programming

Definition

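The essence of Scala is a fusion of functional and object-oriented programming in a typed setting:

  • Functions for logic
  • Objects for modularity

A practical example of this style: a spark-shell snippet (where spark is the predefined SparkSession) that chains higher-order functions to convert each pipe-delimited TPC-DS table directory into Snappy-compressed Parquet.

import spark.implicits._
import java.io.File
// assumes one data file per table directory: the default save mode fails if the target path already exists
new File("/data/projects/tpcds/data").listFiles.filter(_.isDirectory).flatMap(_.listFiles.filter(_.isFile))
  .foreach(f => spark.read.options(Map("delimiter" -> "|")).csv(s"file://${f.getCanonicalPath}")
    .write.options(Map("compression" -> "SNAPPY")).parquet(s"file:///data/projects/tpcds/parquet/${f.getParentFile.getName}"))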

Definition from Wikipedia

Functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that each return a value, rather than a sequence of imperative statements which change the state of the program.

In functional programming, functions are treated as first-class citizens, meaning that they can be bound to names (including local identifiers), passed as arguments, and returned from other functions, just as any other data type can. This allows programs to be written in a declarative and composable style, where small functions are combined in a modular manner.

It can also be helpful to know that experienced functional programmers have a strong desire to see their code as math, that combining pure functions together is like combining a series of algebraic equations.

The feeling that you’re writing math-like equations (expressions) is the driving desire that leads you to use only pure functions and immutable values, because that’s what you use in algebra and other forms of math.

immutable values

Use immutable collections such as List, Vector, Map, and Set, and use case classes, whose constructor parameters are vals by default.
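
A minimal sketch of both habits with a hypothetical Person case class: copy builds an updated value instead of mutating the original, and prepending to a List returns a new List.

```scala
// Hypothetical case class; its constructor parameters are vals
case class Person(name: String, age: Int)

// "Updating" an immutable value means building a new one with copy
def birthday(p: Person): Person = p.copy(age = p.age + 1)

// Prepending to an immutable List returns a new List; xs itself is unchanged
def withZero(xs: List[Int]): List[Int] = 0 :: xs
```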

pure functions

A pure function can be defined as:

  • A function f is pure if, given the same input x, it always returns the same output f(x)
  • The function’s output depends only on its input variables and its implementation
  • It only computes the output and does not modify the world around it (no side effects)

This implies:

  • It doesn’t modify its input parameters
  • It doesn’t mutate any hidden state
  • It doesn’t have any “back doors”: it doesn’t read data from the outside world (including the console, web services, databases, files, etc.) or write data to the outside world
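
A minimal contrast between the two kinds of function; double, counter, and nextId are hypothetical names. double satisfies every rule above, while nextId reads and mutates hidden state, so two calls with the same (empty) input return different results.

```scala
// Pure: same input, same output, no side effects
def double(i: Int): Int = i * 2

// Impure: reads and mutates hidden state, so repeated calls differ
var counter = 0
def nextId(): Int =
  counter += 1
  counter
```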

Of course an application isn’t very useful if it can’t read or write to the outside world, so people make this recommendation:

Write the core of your application using pure functions, and then write an impure “wrapper” around that core to interact with the outside world.
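
A small sketch of that recommendation with hypothetical names: parseScores and total form the pure core and can be tested without any I/O, while report is the thin impure wrapper that writes to the console.

```scala
// Pure core: no I/O, easy to test
def parseScores(lines: List[String]): List[Int] =
  lines.flatMap(_.trim.toIntOption) // skips lines that are not integers

def total(scores: List[Int]): Int = scores.sum

// Impure wrapper: the only function that touches the outside world (the console)
def report(lines: List[String]): Unit =
  println(s"Total: ${total(parseScores(lines))}")
```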

Error handling

Use Option, Some and None

def makeInt(s: String): Option[Int] =
  try
    Some(Integer.parseInt(s.trim))
  catch
    case e: Exception => None

Consuming an Option

  • match

    makeInt(x) match
      case Some(i) => println(i)
      case None => println("That didn’t work.")
    
  • for

    val y = for
      a <- makeInt(stringA)
      b <- makeInt(stringB)
      c <- makeInt(stringC)
    yield
      a + b + c
    

    If any of the three strings can’t be converted to an Int, y will be None.
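
A self-contained sketch combining makeInt with both consumption styles; describe and sum3 are hypothetical helpers. Catching NumberFormatException instead of Exception is a narrowing chosen here, because that is what parseInt throws for malformed input.

```scala
def makeInt(s: String): Option[Int] =
  try
    Some(Integer.parseInt(s.trim))
  catch
    case _: NumberFormatException => None

// match: handle both cases explicitly
def describe(s: String): String = makeInt(s) match
  case Some(i) => s"got $i"
  case None    => "That didn't work."

// for: short-circuits to None as soon as one conversion fails
def sum3(a: String, b: String, c: String): Option[Int] =
  for
    x <- makeInt(a)
    y <- makeInt(b)
    z <- makeInt(c)
  yield x + y + z
```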

Using Option to replace null

class Address(
  var street1: String,
  var street2: Option[String],   // an optional value
  var city: String, 
  var state: String, 
  var zip: String
)
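
Callers then handle the missing case explicitly instead of checking for null. A sketch with a hypothetical formatAddress helper, repeating the Address class so the block stands alone:

```scala
class Address(
  var street1: String,
  var street2: Option[String], // an optional value
  var city: String,
  var state: String,
  var zip: String
)

// Hypothetical helper: street2 appears only when it is Some(...)
def formatAddress(a: Address): String =
  List(Some(a.street1), a.street2, Some(s"${a.city}, ${a.state} ${a.zip}"))
    .flatten
    .mkString("\n")
```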

Alternatives

Option is not the only choice. A trio of classes known as Try/Success/Failure works in the same manner, but you primarily use them when (a) your code can throw exceptions and (b) you want the Failure class because it gives you access to the exception and its message. These Try classes are commonly used in methods that interact with files, databases, and internet services, as those operations can easily throw exceptions.
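
A sketch of the makeInt idea using Try; makeIntTry and explain are hypothetical names. Unlike None, the Failure case carries the exception that was thrown.

```scala
import scala.util.{Failure, Success, Try}

// Try captures a thrown exception as a Failure instead of letting it propagate
def makeIntTry(s: String): Try[Int] = Try(Integer.parseInt(s.trim))

def explain(s: String): String = makeIntTry(s) match
  case Success(i) => s"ok: $i"
  case Failure(e) => s"failed: ${e.getClass.getSimpleName}"
```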

Concurrency

When you want to write parallel and concurrent applications in Scala, you can use the native Java Thread, but the Scala Future offers a higher-level and more idiomatic approach, so it’s preferred.

A Future represents a value which may or may not currently be available, but will be available at some point, or an exception if that value could not be made available.
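
A minimal sketch using the standard library's global ExecutionContext; slowSquare, squarePlusOne, failing, and resultOf are hypothetical names. Blocking with Await is confined to a helper meant for the edge of a program (for example in tests); composed code should prefer map and flatMap.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.*
import scala.concurrent.ExecutionContext.Implicits.global

// Runs on the global thread pool; the value becomes available later
def slowSquare(n: Int): Future[Int] = Future {
  Thread.sleep(100) // simulate slow work
  n * n
}

// Transform the eventual value without blocking
def squarePlusOne(n: Int): Future[Int] = slowSquare(n).map(_ + 1)

// An exception thrown inside the Future becomes a failed Future
def failing(): Future[Int] = Future(throw new IllegalStateException("boom"))

// Blocking helper for the edge of a program (e.g. tests); avoid it in composed code
def resultOf[A](f: Future[A]): A = Await.result(f, 2.seconds)
```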
