The groupBy method from Scala’s collection library

Scala’s collection library is a wonderfully crafted piece of software. When learning a language, I think it pays to look at the available collections and their functionality. Scala has many useful collections, and their methods give you a lot of powerful tools.
In this post, I want to look at the groupBy method defined in Traversable.

Let’s look at an example before explaining how it works:

val birds = List("Golden Eagle", "Gyrfalcon", "American Robin",
                 "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupedByFirstLetter = birds.groupBy(_.charAt(0))

If you print groupedByFirstLetter, you will get:

Map(M -> List(Mountain BlueBird, Mountain-Hawk Eagle), G -> List(Golden Eagle, Gyrfalcon),
       A -> List(American Robin))

(The line breaks are not part of the output; I added them for readability.)

What does this code do? It takes the list and groups the bird species by the first character of their names. The result is a Map in which each key is a first character and each value is a List of all bird species whose names start with that character.
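To make the grouping step more concrete, here is a rough sketch of how you could write it by hand with foldLeft. The name groupByHand is made up for this post, and this is not how the standard library implements groupBy; it only illustrates the idea of collecting elements under the key the function returns.

```scala
// Hand-rolled stand-in for groupBy, for illustration only.
// groupByHand is a hypothetical name; the library's own implementation differs.
def groupByHand[A, K](items: List[A])(f: A => K): Map[K, List[A]] =
  items.foldLeft(Map.empty[K, List[A]]) { (acc, item) =>
    val key = f(item)
    // Append the item to the list already stored under its key, if there is one.
    acc.updated(key, acc.getOrElse(key, Nil) :+ item)
  }

val birds = List("Golden Eagle", "Gyrfalcon", "American Robin",
                 "Mountain BlueBird", "Mountain-Hawk Eagle")

// Groups the birds by first letter, like birds.groupBy(_.charAt(0)).
val byHand = groupByHand(birds)(_.charAt(0))
```

For a List, this should give the same Map as the built-in call, which already shows how much code groupBy saves you.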

Here is the official method definition from the Scala docs:

def groupBy [K] (f: (A) ⇒ K): Map[K, Traversable[A]]

f is a so-called discriminator function. You use it to specify the criterion by which you want to group the values in the Traversable. K is the type of the keys in the returned Map, and A is the element type of the Traversable. The return value is therefore a Map from keys of type K to a Traversable of all elements of type A that f maps to that key.
Method definitions in Scala can be intimidating for beginners but the good thing is that using those methods is normally very easy.
Let’s look at some more examples to make it clear what groupBy does.

val cats = List("Tiger", "Lion", "Puma", "Leopard",
                  "Jaguar", "Cheetah", "Bobcat")
val groupedByLength = cats.groupBy(_.length)

This one is very easy. It builds a Map that groups the cat species by the length of their names. If you print groupedByLength, you will get:

Map(5 -> List(Tiger), 7 -> List(Leopard, Cheetah), 4 -> List(Lion, Puma),
        6 -> List(Jaguar, Bobcat))
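Note that the keys above do not come out in any particular order. As far as I know, the default immutable Map makes no promise about iteration order, so the output may look different on your machine or Scala version. If you want a predictable order, one option is to turn the Map into a sequence of pairs and sort it by key. The name sortedByLength is just something I picked for this sketch.

```scala
val cats = List("Tiger", "Lion", "Puma", "Leopard",
                "Jaguar", "Cheetah", "Bobcat")
val groupedByLength = cats.groupBy(_.length)

// Turn the Map into (key, value) pairs and sort them by key for a stable order.
val sortedByLength = groupedByLength.toSeq.sortBy(_._1)

// Print one line per name length, shortest names first.
sortedByLength.foreach { case (length, names) =>
  println(s"$length: ${names.mkString(", ")}")
}
```

Within each group, the names keep the order they had in the original List.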

Here is another example:

val raptors = List("Golden Eagle", "Bald Eagle", "Prairie Falcon",
                      "Peregrine Falcon", "Harpy Eagle", "Red Kite")
val kinds = raptors.groupBy {
   case bird if bird.contains("Eagle") => "eagle"
   case bird if bird.contains("Falcon") => "falcon"
   case _ => "unknown"
}

In this example we have a List of raptor species (birds of prey). We want to group similar birds together. In our example we build a Map that groups all the eagle and all the falcon species together and all other birds are treated as “unknown”.
If you print the kinds variable, you will get:

Map(unknown -> List(Red Kite), eagle -> List(Golden Eagle, Bald Eagle, Harpy Eagle),
       falcon -> List(Prairie Falcon, Peregrine Falcon))
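The block of case clauses is just an anonymous function, so you can also pull the logic out into a named method and pass that to groupBy. This can be handy when the rules grow or when you want to test them separately. The name classify is a hypothetical helper I made up for this sketch.

```scala
// classify is a hypothetical helper, not part of the original example.
// The checks run top to bottom, so the first matching rule wins.
def classify(bird: String): String =
  if (bird.contains("Eagle")) "eagle"
  else if (bird.contains("Falcon")) "falcon"
  else "unknown"

val raptors = List("Golden Eagle", "Bald Eagle", "Prairie Falcon",
                   "Peregrine Falcon", "Harpy Eagle", "Red Kite")

// Passing the method name works because Scala converts it to a function here.
val kinds = raptors.groupBy(classify)
```

The resulting Map should have the same three keys as the pattern-matching version.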

The last example shows how groupBy and mapValues can be used together to count the words in a List of strings. You could use this, for example, to read in a file and count all the words in it.

val words = List("one", "two", "one", "three", "four", "two", "one")
val counts = words.groupBy(w => w).mapValues(_.size)

If you print counts, you will get:

Map(one -> 3, two -> 2, four -> 1, three -> 1)

On its own, the groupBy call returns a Map like this:

Map(one -> List(one, one, one), two -> List(two, two), four -> List(four), three -> List(three))

Because we don’t want to use a list with all occurrences of each word but the number of occurrences, we use the mapValues method that applies a function to all the values, in our case just the size method that returns the length of the list.
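One caveat: as far as I know, mapValues on a Map is deprecated from Scala 2.13 on and returns a lazy view there instead of a strict Map. Here is a sketch of the same word count that uses a plain map over the key/value pairs, which should behave the same across versions. The text value is a made-up stand-in for the contents of a file.

```scala
// Stand-in for text read from a file (hypothetical input).
val text = "one two one three four two one"

// Split on whitespace to get the individual words.
val words = text.split("\\s+").toList

// Group equal words together, then replace each group by its size.
val counts: Map[String, Int] =
  words.groupBy(identity).map { case (word, occurrences) =>
    word -> occurrences.size
  }
```

Here identity is the standard function that returns its argument, so it does the same job as w => w.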

As you can see, using groupBy is not difficult, and it gives you a very powerful tool. Without this method, you would have to write longer and more complicated code, which reduces readability. The Scala collections provide a huge number of useful methods like this one. It is definitely worth learning what’s available in the standard library of a programming language you use regularly.

For more information about Scala’s collections, see here:
http://www.scala-lang.org/api/current/scala/collection/package.html

For a comparison with Ruby’s group_by see:
The group_by method from Ruby’s Enumerable mixin (and compared with Scala)
