Pattern Matching in Scala



An Introduction to Pattern Matching in Scala

I'm Brian Clapper, an independent consultant and instructor for ProTech. Today we’re going to talk about pattern matching. It’s a killer feature in Scala and one of the most interesting capabilities in the language. Those of you coming from a Java background might find this particularly interesting, because even with Java 8, there’s nothing like this in Java.

In Part 1, we begin by talking about the apply and update methods, and about extractors (unapply and unapplySeq). This provides the necessary context to talk about an even cooler feature, pattern matching, in Part 2.

The content in this tutorial is taken from ProTech's Scala Courseware, which is a collaboration with Pearson Education, Inc. The material is adapted from Cay S. Horstmann's introductory Scala book, Scala for the Impatient.

Part 1: The Apply and Update Methods

Scala lets you extend the function call syntax

f(arg1, arg2, ...)

to values other than functions. If f is not a function or method, then this expression is equivalent to the call

f.apply(arg1, arg2, ...)

unless it occurs to the left of an assignment. The expression

f(arg1, arg2, ...) = value

corresponds to the call

f.update(arg1, arg2, ..., value)

This mechanism is used in arrays and maps. For example,

val scores = new scala.collection.mutable.HashMap[String, Int]
scores("Bob") = 100 // Calls scores.update("Bob", 100)
val bobsScore = scores("Bob") // Calls scores.apply("Bob")
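Because apply and update are ordinary methods, any class you write can opt into this syntax. The sketch below uses a hypothetical Grid class (not part of any library) to show both rewrites side by side; the class name, its fields, and the flat-array layout are all illustrative assumptions.

```scala
// Hypothetical 2-D grid of Ints backed by a flat array.
class Grid(val rows: Int, val cols: Int) {
  private val cells = new Array[Int](rows * cols)

  // grid(r, c) is rewritten to grid.apply(r, c)
  def apply(r: Int, c: Int): Int = cells(r * cols + c)

  // grid(r, c) = v is rewritten to grid.update(r, c, v)
  def update(r: Int, c: Int, value: Int): Unit = cells(r * cols + c) = value
}

val grid = new Grid(2, 3)
grid(1, 2) = 7        // calls grid.update(1, 2, 7)
val cell = grid(1, 2) // calls grid.apply(1, 2)
```

Note that the value being assigned always arrives as the last argument of update.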

The apply method is also commonly used in companion objects to construct objects without calling new. For example, consider a Fraction class.

class Fraction(n: Int, d: Int) {
  ...
}

object Fraction {
  def apply(n: Int, d: Int) = new Fraction(n, d)
}

Because of the apply method, we can construct a fraction as Fraction(3, 4) instead of new Fraction(3, 4).

That sounds like a small thing, but if you have many Fraction values, it is a welcome improvement:

val result = Fraction(3, 4) * Fraction(2, 5)
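The Fraction class body above is elided. To make the example runnable, here is a minimal hypothetical version: it assumes public num and den fields and a simple * method, and it does not reduce fractions or normalize signs.

```scala
// Minimal, hypothetical Fraction: no reduction to lowest terms.
class Fraction(val num: Int, val den: Int) {
  // Multiply numerators together and denominators together.
  def *(other: Fraction): Fraction = new Fraction(num * other.num, den * other.den)
  override def toString: String = s"$num/$den"
}

object Fraction {
  // Lets callers write Fraction(3, 4) instead of new Fraction(3, 4).
  def apply(n: Int, d: Int): Fraction = new Fraction(n, d)
}

val result = Fraction(3, 4) * Fraction(2, 5)
println(result) // prints 6/20
```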

Extractors

An extractor is an object with an unapply method. Think of the unapply method as being the opposite of the apply method of a companion object.

  • An apply method takes construction parameters and turns them into an object.
  • An unapply method takes an object and extracts values from it, usually the values from which the object was constructed.

Consider the Fraction class from the preceding section. The apply method makes a fraction from a numerator and denominator. An unapply method retrieves the numerator and denominator. You can use it in a variable definition:

var Fraction(a, b) = Fraction(3, 4) * Fraction(2, 5)
// a, b are initialized with the numerator and denominator of the result

or a pattern match:

case Fraction(a, b) => ... // a, b are bound to the numerator and denominator

(We’ll talk more about pattern matching later.)

In general, a pattern match can fail. Therefore, the unapply method returns an Option containing a tuple with one value for each matched variable. In our case, we return an Option[(Int, Int)].

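object Fraction {
  // Lives in the same companion object as apply; assumes Fraction exposes num and den
  def unapply(input: Fraction): Option[(Int, Int)] =
    if (input.den == 0) None else Some((input.num, input.den))
}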

Just to show the possibility, this method returns None when the denominator is zero, indicating no match.
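Putting apply and unapply into one companion object gives a complete round trip. This sketch again assumes a hypothetical Fraction with num and den fields; the describe helper is illustrative and shows the match falling through to the catch-all when unapply returns None.

```scala
class Fraction(val num: Int, val den: Int) {
  def *(other: Fraction): Fraction = new Fraction(num * other.num, den * other.den)
}

object Fraction {
  // Construction: Fraction(n, d)
  def apply(n: Int, d: Int): Fraction = new Fraction(n, d)
  // Extraction: the inverse direction, refusing zero denominators
  def unapply(input: Fraction): Option[(Int, Int)] =
    if (input.den == 0) None else Some((input.num, input.den))
}

// Pattern in a variable definition
val Fraction(a, b) = Fraction(3, 4) * Fraction(2, 5)

// Pattern in a match; the catch-all runs when unapply returns None
def describe(f: Fraction): String = f match {
  case Fraction(n, d) => s"$n over $d"
  case _              => "not a valid fraction"
}
```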

Other uses for unapply

In the preceding example, the apply and unapply methods are inverses of one another. However, that is not a requirement. You can use extractors to extract information from an object of any type.

For example, suppose you want to extract first and last names from a string:

val author = "Cay Horstmann"
val Name(first, last) = author // Calls Name.unapply(author)

So, let’s make a Name object with an unapply method that returns an Option[(String, String)]. If the match succeeds, it’ll return a pair with the first and last name. The components of the pair will be bound to the variables in the pattern. Otherwise, it’ll return None.

object Name {
  def unapply(input: String) = {
    val pos = input.indexOf(" ")
    if (pos == -1) None
    else Some((input.substring(0, pos), input.substring(pos + 1)))
  }
}
NOTE: In the above example, there’s no Name class. The Name object is an extractor for String objects.
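Here is the Name extractor used in a match, with a fallback for strings that contain no space. The greet helper is hypothetical; note that everything after the first space ends up in the last name.

```scala
object Name {
  def unapply(input: String): Option[(String, String)] = {
    val pos = input.indexOf(" ")
    if (pos == -1) None
    else Some((input.substring(0, pos), input.substring(pos + 1)))
  }
}

def greet(s: String): String = s match {
  case Name(first, last) => s"first=$first, last=$last"
  case _                 => s"no space in '$s'" // unapply returned None
}
```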

Extractors with One or No Arguments

In Scala, there is no literal syntax for a tuple with just one component: (x) is simply x in parentheses. If the unapply method extracts a single value, it should just return an Option of the target type. For example,

object Number {
  def unapply(input: String): Option[Int] =
    try {
      Some(Integer.parseInt(input.trim))
    }
    catch {
      case ex: NumberFormatException => None
    }
}

With this extractor, you can extract a number from a string:

val Number(n) = "1729"
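Because unapply returns None for input that does not parse, the same extractor also works as a test inside a match. The parse helper below is a hypothetical illustration.

```scala
object Number {
  def unapply(input: String): Option[Int] =
    try Some(Integer.parseInt(input.trim))
    catch { case _: NumberFormatException => None }
}

val Number(n) = "1729"

def parse(s: String): String = s match {
  case Number(k) => s"number $k"
  case _         => "not a number" // parseInt threw, so unapply returned None
}
```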

An extractor can just test its input without extracting any value. In that case, the unapply method should return a Boolean:

object IsCompound {
  def unapply(input: String) = input.contains(" ")
}

You can use this extractor to add a test to a pattern:

author match {
  case Name(first, last @ IsCompound()) => ...
    // Matches if the last name is compound, as in Peter van der Linden
  case Name(first, last) => ...
}
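A runnable version of the match above, assuming the two-part Name extractor from earlier. The classify helper and its result strings are hypothetical; the interesting part is that last @ IsCompound() both binds last and runs the Boolean test on it.

```scala
object Name {
  def unapply(input: String): Option[(String, String)] = {
    val pos = input.indexOf(" ")
    if (pos == -1) None
    else Some((input.substring(0, pos), input.substring(pos + 1)))
  }
}

// Boolean extractor: tests its input without extracting anything
object IsCompound {
  def unapply(input: String): Boolean = input.contains(" ")
}

def classify(author: String): String = author match {
  case Name(_, last @ IsCompound()) => s"compound last name: $last"
  case Name(_, last)                => s"simple last name: $last"
  case _                            => "single word"
}
```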

The unapplySeq Method

To extract an arbitrary sequence of values, the method needs to be called unapplySeq. It returns an Option[Seq[A]], where A is the type of the extracted values. For example, a Name extractor can produce a sequence of the name’s components:

object Name {
  def unapplySeq(input: String): Option[Seq[String]] =
    if (input.trim == "") None else Some(input.trim.split("\\s+").toSeq) // split returns an Array
}

Now you can match for any number of variables:

author match {
  case Name(first, last) => ...
  case Name(first, middle, last) => ...
  case Name(first, "van", "der", last) => ...
  ...
}
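A self-contained sketch of the variable-length Name extractor. Each case is tried against the extracted sequence, so a case only matches when the sequence has exactly that many components; the describe helper is hypothetical.

```scala
object Name {
  def unapplySeq(input: String): Option[Seq[String]] =
    if (input.trim == "") None else Some(input.trim.split("\\s+").toSeq)
}

def describe(author: String): String = author match {
  case Name(first, last)               => s"two parts: $first / $last"
  case Name(first, "van", "der", last) => s"van der name: $first / $last"
  case Name(first, middle, last)       => s"three parts: $first / $middle / $last"
  case _                               => "something else"
}
```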

Part 2: Scala Pattern Matching

Okay, that's the background. In Part 1 we talked about apply, update, and extractors. Now, with that background in place, we can get to the cool stuff: pattern matching.

Pattern matching is like a better switch statement.

A Better Switch

Here is the equivalent of the C-style switch statement in Scala:

var sign = ...
val ch: Char = ...

ch match {
  case '+' => sign = 1
  case '-' => sign = -1
  case _   => sign = 0
}

The equivalent of default is the catch-all case _ pattern. It is a good idea to have such a catch-all pattern. If no pattern matches, a MatchError is thrown.

Unlike the switch statement, Scala pattern matching does not suffer from the "fall-through" problem. (In C and its derivatives, you must use explicit break statements to exit a switch at the end of each branch, or you will fall through to the next branch. This is annoying and error-prone.)

Similar to if, match is an expression, not a statement. The preceding code can be simplified to

val sign = ch match {
  case '+' => 1
  case '-' => -1
  case _   => 0
}
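To see both points at once, the sketch below wraps the match expression in a function and compares it with a hypothetical variant that has no catch-all case. Calling that variant with a character it does not handle throws scala.MatchError at runtime.

```scala
// match as an expression: its value is the value of the chosen branch
def sign(ch: Char): Int = ch match {
  case '+' => 1
  case '-' => -1
  case _   => 0
}

// Same match without the catch-all case
def strictSign(ch: Char): Int = ch match {
  case '+' => 1
  case '-' => -1
}

val outcome =
  try strictSign('x').toString
  catch { case _: MatchError => "MatchError" }
```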

You can use match with any type, not just characters or numbers. For example:

color match {
  case Color.RED => ...
  case Color.BLACK => ...
  ...
}

Guards

Suppose we want to extend our example to match all digits. In a C-style switch statement, you would simply add multiple case labels:

switch(c) {
  case '0':
  case '1':
  ...
  case '9':
  ...
}

(Except that, of course, you can’t use … but must write out all ten cases explicitly.) In Scala, you add a guard clause to a pattern, like this:

ch match {
  case '+' => sign = 1
  case '-' => sign = -1
  case _ if Character.isDigit(ch) => digit = Character.digit(ch, 10)
  case _ => sign = 0
}

The guard clause can be any Boolean condition.

Note that patterns are always matched top-to-bottom. If the pattern with the guard clause doesn’t match, the catch-all pattern is attempted.
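A small, hypothetical function that returns a description instead of assigning to sign and digit, so the effect of the guard and of top-to-bottom ordering is easy to check:

```scala
def describeChar(ch: Char): String = ch match {
  case '+'                        => "sign +1"
  case '-'                        => "sign -1"
  case _ if Character.isDigit(ch) => s"digit ${Character.digit(ch, 10)}"
  case _                          => "other" // reached when the guard is false
}
```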

Variables in Patterns

If the case keyword is followed by a variable name, then the value being matched is bound to that variable. For example:

str(i) match {
  case '+' => sign = 1
  case '-' => sign = -1
  case ch  => digit = Character.digit(ch, 10)
}

You can think of case _ as a special case of this feature, where the variable name is _.

You can use the variable name in a guard:

str(i) match {
  case ch if Character.isDigit(ch) => digit = Character.digit(ch, 10)
  ...
}
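Put together, the sign, digit, and guarded-variable cases are enough for a tiny number parser. parseSigned is a hypothetical helper written for illustration, not a library function; it silently ignores any other characters.

```scala
def parseSigned(str: String): Int = {
  var sign = 1
  var value = 0
  for (i <- 0 until str.length) {
    str(i) match {
      case '+' => sign = 1
      case '-' => sign = -1
      // ch is bound to the character; the guard limits this case to digits
      case ch if Character.isDigit(ch) => value = value * 10 + Character.digit(ch, 10)
      case _ => // ignore anything else
    }
  }
  sign * value
}
```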

Unfortunately, variable patterns can conflict with constant expressions:

import scala.math._
x match {
  case Pi => ...
  ...
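}

How does Scala know that Pi is a constant, not a variable? The rule is that a simple name starting with a lowercase letter is a variable pattern; anything else, such as Pi or a qualified name like Color.RED, is treated as a constant.

If you have a lowercase constant, enclose it in backquotes:
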
import java.io.File._
str match {
  case `pathSeparator` => ...
  ...
}
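The sketch below contrasts the three cases: an uppercase constant, a backquoted lowercase value, and a lowercase name without backquotes. The names target, isTarget, and broken are hypothetical. In broken, target in the pattern is a brand-new variable that shadows the outer val, so it matches every input.

```scala
import scala.math.Pi

def isPi(x: Double): Boolean = x match {
  case Pi => true // uppercase: compared with scala.math.Pi
  case _  => false
}

val target = 42

def isTarget(n: Int): Boolean = n match {
  case `target` => true // backquotes: compared with the existing val
  case _        => false
}

def broken(n: Int): Boolean = n match {
  case target => true // no backquotes: a new variable that matches anything
}
```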

Type Patterns

You can match on the type of an expression, for example:

obj match {
  case x: Int => x
  case s: String => Integer.parseInt(s)
  case _: BigInt => Int.MaxValue
  case _ => 0
}

In Scala, this form is preferred over using the isInstanceOf method.

Note the variable names in the patterns. In the first pattern, the match is bound to x as an Int, and in the second pattern, it is bound to s as a String. No asInstanceOf casts are needed!
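Wrapped in a function (the name toInt is hypothetical), the type match above can be exercised directly. Inside each branch, x and s already have the narrowed static types Int and String.

```scala
def toInt(obj: Any): Int = obj match {
  case x: Int    => x                   // x is an Int here
  case s: String => Integer.parseInt(s) // s is a String here
  case _: BigInt => Int.MaxValue
  case _         => 0
}
```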

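When you match against a type, you must use the form name: Type (or _: Type if you don’t need the value). If you write only the type’s name, the pattern instead compares against the object with that name:
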
obj match {
  case _: BigInt => Int.MaxValue // Matches any object of type BigInt
  case BigInt    => -1           // Matches the BigInt companion object itself
}

Matches occur at runtime, and generic types are erased in the Java virtual machine. For that reason, you cannot make a type match for a specific Map type:

case m: Map[String, Int] => ... // Don't do this.

You can match a generic map:

case m: Map[_, _] => ... // OK

However, arrays are not erased. You can match an Array[Int].
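This hypothetical describe function shows which type tests are reliable at runtime: array element types can be distinguished, a Map can only be recognized as some Map, and a bare BigInt pattern matches the companion object rather than a BigInt value.

```scala
def describe(x: Any): String = x match {
  case _: Array[Int]    => "Array[Int]"    // arrays keep their element type
  case _: Array[String] => "Array[String]"
  case _: Map[_, _]     => "some Map"      // key/value types are erased
  case _: BigInt        => "a BigInt value"
  case BigInt           => "the BigInt companion object"
  case _                => "something else"
}
```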

Matching Arrays, Lists, and Tuples

To match an array against its contents, use Array expressions in the patterns, like this:

arr match {
  case Array(0) => "0"
  case Array(x, y) => s"$x $y"
  case Array(0, _*) => "0 ..."
  case _ => "something else"
}

The first pattern matches an array containing exactly one element, 0. The second pattern matches any array with two elements, and it binds the variables x and y to the elements. The third pattern matches any array starting with zero.
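Here is the array match as a function over Array[Int] (the name describe is hypothetical). Because cases are tried top to bottom, Array(0, 5) is caught by the two-element case before the Array(0, _*) case is considered.

```scala
def describe(arr: Array[Int]): String = arr match {
  case Array(0)     => "0"
  case Array(x, y)  => s"$x $y"
  case Array(0, _*) => "0 ..."
  case _            => "something else"
}
```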

You can match lists in the same way, with List expressions. Alternatively, you can use the :: operator:

lst match {
  case 0 :: Nil => "0"
  case x :: y :: Nil => s"$x $y"
  case 0 :: tail => "0 ..."
  case _ => "something else"
}
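The same cases written with :: behave like the array version; x :: y :: Nil means "exactly two elements", while 0 :: tail accepts any list whose head is 0. The function name is hypothetical.

```scala
def describe(lst: List[Int]): String = lst match {
  case 0 :: Nil      => "0"
  case x :: y :: Nil => s"$x $y"
  case 0 :: tail     => "0 ..." // tail is bound to the rest of the list
  case _             => "something else"
}
```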

With tuples, use the tuple notation in the pattern:

pair match {
  case (0, _) => "0 ..."
  case (y, 0) => s"$y 0"
  case _ => "neither is 0"
}

Again, note how the variables are bound to parts of the list or tuple. Since these bindings give you easy access to parts of a complex structure, this operation is called destructuring.
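A runnable version of the tuple match, assuming a pair of Ints (the function name is hypothetical). Since (0, _) comes first, a pair like (0, 0) is reported as "0 ...".

```scala
def describe(pair: (Int, Int)): String = pair match {
  case (0, _) => "0 ..."
  case (y, 0) => s"$y 0" // y is bound to the first component
  case _      => "neither is 0"
}
```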

Extractors

In the preceding section, you have seen how patterns can match arrays, lists, and tuples. These capabilities are provided by extractors: objects with an unapply or unapplySeq method that extract values from an object.

(We covered extractors in Part 1.) The unapply method extracts a fixed number of values, while unapplySeq extracts a sequence whose length can vary.

For example, consider the expression:

arr match {
  case Array(0, x) => ...
  ...
}

The Array companion object is an extractor: It defines an unapplySeq method. That method is called with the expression that is being matched, not with what appears to be the parameters in the pattern.

Array.unapplySeq(arr) yields a sequence of values, namely the values in the array. The first value is compared with zero, and the second one is assigned to x.
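The same mechanism works for any object you write. Digits below is a hypothetical extractor, assumed to accept only non-negative Ints: its unapplySeq receives the matched Int, and the literal 4 in Digits(4, x) is compared against the first element of the sequence it returns.

```scala
// Hypothetical extractor: splits a non-negative Int into its decimal digits.
object Digits {
  def unapplySeq(n: Int): Option[Seq[Int]] =
    if (n < 0) None else Some(n.toString.map(_.asDigit))
}

def describe(n: Int): String = n match {
  case Digits(d)        => s"one digit: $d"
  case Digits(4, x)     => s"two digits starting with 4, then $x"
  case Digits(_, _, _*) => "two or more digits"
  case _                => "negative"
}
```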

Regular Expressions as Extractors

Regular expressions provide another good use of extractors. When a regular expression has groups, you can match each group with an extractor pattern. For example:

val pattern = "([0-9]+) ([a-z]+)".r
"99 bottles" match {
  case pattern(num, item) => ...
    // Sets num to "99", item to "bottles"
}

The call pattern.unapplySeq("99 bottles") yields a sequence of strings that match the groups. These are assigned to the variables num and item.

Note that here the extractor isn’t a companion object but an instantiated regular expression object.
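One detail worth knowing: a Regex used as an extractor must match the entire string. The sketch below shows that, and uses Regex.unanchored, which instead looks for the first matching substring. The helper names parse and parseLoose are hypothetical.

```scala
val pattern = "([0-9]+) ([a-z]+)".r

def parse(s: String): String = s match {
  case pattern(num, item) => s"num=$num, item=$item"
  case _                  => "no match" // the whole string must match
}

// An unanchored copy finds the first matching substring
val loose = pattern.unanchored

def parseLoose(s: String): String = s match {
  case loose(num, item) => s"num=$num, item=$item"
  case _                => "no match"
}
```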

Patterns in Variable Declarations

In the preceding sections, you have seen how patterns can contain variables. You can use these patterns inside variable declarations. For example,

val (x, y) = (1, 2)

simultaneously defines x as 1 and y as 2. That is useful for functions that return a pair:

val (q, r) = BigInt(10) /% 3

The /% method returns a pair containing the quotient and the remainder, which are captured in the variables q and r.

The same syntax works for any patterns with variable names:

val Array(first, second, _*) = arr

assigns the first and second element of the array arr to the variables first and second.
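All three declarations can be checked in a few lines; the array contents below are illustrative. BigInt(10) /% 3 gives quotient 3 and remainder 1.

```scala
val (x, y) = (1, 2)

// /% returns (quotient, remainder) as a pair of BigInts
val (q, r) = BigInt(10) /% 3

val arr = Array(10, 20, 30, 40)
val Array(first, second, _*) = arr // remaining elements are ignored
```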

Patterns in for Expressions

You can use patterns with variables in for comprehensions. For each traversed value, the variables are bound. This makes it possible to traverse a map:

import scala.collection.JavaConversions.propertiesAsScalaMap
// Converts Java Properties to a Scala map—just to get an interesting example
for ((k, v) <- System.getProperties())
  println(k + " -> " + v)

For each (key, value) pair in the map, k is bound to the key and v to the value.

In a for comprehension, match failures are silently ignored. For example, the following loop prints all keys with an empty value, skipping over all others:

for ((k, "") <- System.getProperties())
  println(k)

You can also use a guard. Note that the if goes after the <- symbol:

for ((k, v) <- System.getProperties() if v == "")
  println(k)
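To try these loops without depending on your machine’s system properties, the sketch below uses a small hypothetical settings map in their place and collects results with yield instead of printing.

```scala
// Hypothetical stand-in for System.getProperties()
val settings = Map("user" -> "alice", "proxy" -> "", "home" -> "/home/alice", "debug" -> "")

// Each (key, value) pair is destructured into k and v
val lines = for ((k, v) <- settings) yield s"$k -> $v"

// The guard keeps only the keys whose value is empty
val emptyKeys = for ((k, v) <- settings if v == "") yield k
```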

Final Thoughts

So, what you should take away from this is that pattern matching is an extremely powerful feature in this language. It gives you, the programmer, an amazing amount of power to express all kinds of matches, with a concise and readable syntax.

Since extractor patterns are just syntactic sugar for calls to unapply and unapplySeq, you can add those methods to your own classes’ companion objects as you build them out, and they will automatically work in these matches.

For my part, when I have to go back and program in Java, I really miss this feature.


Published February 5, 2015