Understanding java.util.concurrent.CompletionService

The java.util.concurrent.CompletionService is a useful interface in the JDK standard libraries but few developers know it.
One could live without it as you can of course program this functionality with the other interfaces and classes within java.util.concurrent but it is convenient to have a solution that is already available and less error prone then doing it yourself. I always prefer stuff that is already available within the JDK over implementing my own solution with the same features – unless as an exercise at home!

Image you have a list of separate tasks that take a while, e.g. 10 tasks that each download an URL and return the content as a String.
Depending on the network, the size of the downloaded content and other factors, the time to download each URL will take various amounts of time.
When you execute them in parallel you may want to start doing something with the downloaded content as soon as the first task is done. No need to wait for the other 9 tasks to complete because that would mean you would always have to wait until all URLs are downloaded before doing something useful with each individual result.

You can of course execute all of the and get a List of Future objects and then poll on each one but it is easier to just use a CompletionService.

See the following example:

First, some dummy Callable:

import java.util.Random;
import java.util.concurrent.Callable;

public class LongRunningTask implements Callable<String> {

	public String call() {
		// do stuff and return some String
		try {
			Thread.sleep(Math.abs(new Random().nextLong() % 5000));
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
		}
		return Thread.currentThread().getName();
	}
}

The LongRunningTask is just a place holder for a real Task you might want to implement.
In this dummy example, it just sleeps for a random amount of time and returns a String that contains the name of the current thread.

Second, an example using a CompletionService that uses the Callable above.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionServiceExample {

	// dummy helper to create a List of Callables return a String
	public static List<Callable<String>> createCallableList() {
		List<Callable<String>> callables = new ArrayList<>();
		for (int i = 0; i < 10; i++) {
			callables.add(new LongRunningTask());
		}
		return callables;
	}

	public static void main(String[] args) {

		ExecutorService executorService = Executors.newFixedThreadPool(10);

		CompletionService<String> taskCompletionService = new ExecutorCompletionService<String>(
				executorService);

		try {
			List<Callable<String>> callables = createCallableList();
			for (Callable<String> callable : callables) {
				taskCompletionService.submit(callable);
			}
			for (int i = 0; i < callables.size(); i++) {
				Future<String> result = taskCompletionService.take();
				System.out.println(result.get());
			}
		} catch (InterruptedException e) {
			// no real error handling. Don't do this in production!
			e.printStackTrace();
		} catch (ExecutionException e) {
			// no real error handling. Don't do this in production!
			e.printStackTrace();
		}
		executorService.shutdown();
	}
}

Note: The examples don’t have proper exception handling to keep it simple. Don’t copy this into your production code!

The CompletionServiceExample shows how to use a CompletionService. You create an instance of ExecutorCompletionService (the only implementation of the CompletionService interface available with Java 7 or older versions) and then you submit all Callables to the CompletionService.

As soon as a task is completed, it is put in an internal java.util.concurrent.BlockingQueue (a highly efficient queue for Producer/Consumer problems and communication between threads).

From that queue, you can get the results of the finished tasks with take. If no task is yet available, take will wait until something is available.
In this case we just print the result (the name of the current threat executing the Callable).

This is all you need to know to use a CompletionService. It is really simple. There is a lot of cool stuff in the JDK and in the java.util.concurrent package. Make sure to browse through the docs from time to time before inventing your own solution.

Short book review: The Linux Programming Interface

There are many programming and computer books out there and I’ve read many. Some were good, some were bad, some just plain fucking terrible.

But some books out there are so good that they become classics and a standard for other books and “must reads” for everyone interested in the topic. The book “The Linux Programming Interface” by Michael Kerrisk is such a book. It is the best book on Linux system programming ever published and will be for many years to come.

I don’t do much system programming at work and my main interest is in Java and Scala but most of the software I’ve written professionally runs on a Linux system (Linux is fantastic for your Java stuff!). So it always helps to know how the underlying operating system works. Some programmers focus too much on fancy frameworks but ignore the basics like how an OS works, how memory is allocated and freed, how HTTP works, etc. You don’t have to be an expert on this stuff but should know at least the basics. And every Java, Scala, Ruby or Python programmer should know some C and know how memory management works – you will love Java or Scala even more after learning about that. :-)

Let’s talk about the book: It is a huge tome with more than 1,500 pages. It covers everything you will want to know about Linux and Unix system programming. The focus is on Linux but standard APIs like POSIX are covered in detail and so this book is also great if you work with Solaris, FreeBSD and other Unix like operating systems. Functionality specific to Linux like epoll is also covered in detail.

The examples are all written in C. This makes senses as Linux itself is written mostly in C and the kernel and standard APIs are C. But most of this functionality is also available in libraries for languages like Python, Ruby or – if you really want to go down that road – for Perl.

I won’t go into details of each chapter – they are all great. The text is easy to read and the explanations are always clear and very detailed. The source code is clear and easy to read – even if you don’t know much C.
The examples are easy to get running on your own Linux system and if you want to use a Ruby or Python library that wraps the C calls, you won’t have much trouble understanding them.

Whenever I want to know something about low level Linux system programming I find it in this awesome book.

As I wrote above, this book will be – or already is – a classic and a must read for everyone who wants to learn Linux system programming or just runs programs on Linux or another Unix system.

The book’s website is here:
The Linux Programming Interface

Scala’s parallel collections and the aggregate method

In this post we look at the aggregate method available in the Scala collection library. It may look a little intimidating at first but can really be your friend when you can do something in parallel using Scala’s parallel collections.

As an example we will count the words in a file. Our sample file has 5,000 lines with 10 words in each line separated by a space.

In this example we read in all the lines in the file into a List like this:

val lines = Source.fromFile("/home/markus/tmp/bla.txt").getLines.toList

This may not be the most efficient way to read files, especially very large one, but it will do here to show how the aggregate method works:

We use the following helper method and regular expression to count the words in a single line:

val singleWordRegex = java.util.regex.Pattern.compile("\\w+")

def countWords(line: String): Int = {
  var wordCount = 0
  val matcher = singleWordRegex.matcher(line)
  while (matcher.find) wordCount += 1
  wordCount
}

A typical way to use Scala to count the words in the whole file would be:

val sum = lines.foldLeft(0)((x, line) => x + countWords(line))
// sum = 50000

Using foldLeft is a standard way to do this in Scala.

Let’s now look at the aggregate method and do the same:


val sum = lines.aggregate(0)((x, line) => x + countWords(line), _ + _)
// sum = 50000

This looks very similar to foldLeft, except for the "_ + _" at the end, so you may wonder what the exact difference is between foldLeft and aggregate. Let’s look at the method declaration and implementation of the aggregate method in Scala’s scala.collection.immutable.List class. This class inherits the aggregate method from trait TraversableOnce where the method is defined like this (if you use Eclipse, just put your cursor over the method and press F3 to go to the method definition):

def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
  foldLeft(z)(seqop)
// line break added for readability

If you look closely you may think that I am kidding because here aggregate is just a call to foldLeft. The 3rd parameter (the "_ + _" in the example above) is just ignored here.
So for collections like scala.collection.immutable.List that inherit from TraversableOnce, calling aggregate is just like foldLeft and you won’t get any advantage and a small disadvantage as your code is slightly more complicated to read.

So what’s the point of aggregate?

Let’s look at this code:

val sum = lines.par.aggregate(0)((x, line) => x + countWords(line), _ + _)
// sum = 50000

It is almost the same as the example above with one small but important difference: par. This method turnes the List lines into a parallel collection (to be more specific in this case it will be a scala.collection.parallel.immutable.ParSeq[String]).

If you look at the implementation of aggregate again (press F3 again in Eclipse), you will see this (for Scala 2.9.0.1):

def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
  executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}

I am not going into the details here (but do this if you are interested, it’s an interesting read), but as you can see now the 3rd parameter (combop) is indeed used and it is no longer a simple call to foldLeft.

So now let’s talk about what aggregate does when called on a parallel collection.
Like foldLeft it has a start value (the 1st parameter) and it accumulates values using the content of the collection and the provided function called seqop. So far this is just like foldLeft. But as a parallel collection it can use use several threads to improve performance. For this the collection is split into parts and the seqop is called on the elements of each separate part of the collection. Once this is done, combop (short for combination operator) is used to combine the separate results we gained with seqop.
If you have several processors as almost all computers now do, you can use true parallelism and gain a lot of performance from using parallel collections. You don’t need to care about the details of how the collections are split and how they are distributed over the available processes as the library does this for you. Under the hood, Scala uses some of the powerful mechanisms from the java.util.concurrent framework (btw: learning about java.util.concurrent is a must for all Java and Scala developers).

You should always make performance tests when using parallel collections. For simple things and small collections, the overhead of distributing them over the different cores may not give you a performance gain and may even slow your application down. Parallel collections are perfect if you have large collections, do calculations on the elements that take some time and the algorithm can be split into sub tasks (not all algorithms can’t).

I made a small test that uses foldLeft, aggregate on a non parallel collection and aggregate on a parallel collection. I ran all examples 10,000 times (to help the JVM to warm up and optimize) and calculated the average time it took in nano seconds. Here are the results (I have a 4 core machine):

Here are the results in nano seconds:

method average minimum
foldLeft 4259846.0 4046714.0
aggregate non parallel 4189567.0 4057543.0
aggregate parallel 2047955.0 1073732.0

As excepted the aggregate method on the non parallel collection is about as fast as the foldLeft. But on a parallel collection it is much faster!
In this case the average time was about 2 times as fast. I also checked the minimum time and there the parallel version was about 4 times (as expected with 4 cores) as fast as the non parallel version of aggregate and foldLeft.

Of course aggregate is just one useful method for Scala’s parallel collection. There is MUCH MORE available and as a Scala developer you should definitely learn more about them.

More information about the aggregate can be found the the API docs:
http://www.scala-lang.org/api/current/scala/collection/parallel/ParIterableLike.html

For more information about Scala’s parallel collection, see this video:
Scala Parallel Collections by Aleksandar Prokopec

and this post in infoq:
Scala 2.9.0 Introduces Parallel Collections

The aggregate method is just a small part of Scala’s powerful collection library. Scala has one of the best collection libraries of all programming languages out there and in addition to Scala’s own collection library you can also use all of Java’s collections in Scala giving you almost everything you will need in your daily work as a software developer.
Scala’s collection alone are worth learning the language. When looking for a new language always look at the available collections. If they suck, find another language! Scala’s collection DO NOT suck!

Short book review: Functional Programming for Java Developers

This is a short review of the book Functional Programming for Java Developers.

With FP getting a lot of attention lately more and more Java developers want to learn about it. While Java is not really a functional language many ideas from FP can also be used in Java, although not always a elegantly as in languages like Scala or Clojure.

In July 2011 O’Reilly published Functional Programming for Java Developers. It is written by Dean Wampler who is also a coauthor of the wonderful Programming Scala book (also available online).

This short new book provides the basics of FP for Java developers who do not yet know much about FP.

The author explains why FP is important today and why it makes sense to learn it. He explains the basics of immutable types and shows some of the common functions found in functional languages like map, foldLeft, filter, etc. The implementations are done in Java and are easy to follow. It is clear from the code and the explanations that the author really knows his stuff.

An interesting chapter explains some new concurrency ideas like actors and STM (software transactional memory). Those things are not restricted to functional programming but languages like Erlang, Haskell or Scala support some or all of those new ideas in the language itself or as libraries.

The most important thing to take from this book are the ideas. Of course it is more convenient to use Scala for FP than Java (although this will get better with Java 8), but that doesn’t mean you can’t do it with Java.
Ideas are always more important than a language. If you understand a concept in one language, you will normally be ably to apply it in another even if it may be more work in one than the other.
After reading this book, I think you will have a good basic understanding of the advantages and ideas of FP.

At the end of the book the author gives a lot of links to interesting websites, other languages and frameworks (including functional programming libraries for Java!).

What’s not to like?
Given to the shortness of the book sometimes the reader wishes for a little more details but the idea behind the book is to be short, so it is inevitable that not everything can be explained in great details.
I would like to see an online supplement where the code examples from the book are also implemented in Scala and Clojure so the reader can see how much easier it is in those languages.

For the relatively low price (I paid a little less than 12 Euros) I think this is a great short book for every Java developer who is new to FP and want’s to see what it is all about and how to get starting using Java.
Highly recommended!

I recommend the ebook. It is just a short book which can easily be read on a computer.

After reading this book, make sure to follow the authors advice and look at the resources for further learning.
I personally recommend learning Scala as it combines the power of OOP and FP. I also recommend learning Haskell as it is a pure functional language (Scala, of course is not). I just started playing with Haskell and and will take some time to get your mind around it but it will definitely be worth it.

The group_by method from Ruby’s Enumerable mixin (and compared with Scala)

A few weeks ago I wrote about the groupBy method from Scala’s collection library. Today I will port those example to Ruby for which there is also such a method, just with s slightly different name (group_by instead of groupBy).

For the idea behind the method, see the post about Scala’s version. Here I will put the Scala code from the other post and show the equivalent code for Ruby next to it.
Note: I’ve used Ruby 1.9.2p290 for all examples.

1st Example:

Here we group an Array (in Scala List) of strings (with names of different bird species) by their first name.

Scala:

val birds = List("Golden Eagle", "Gyrfalcon", "American Robin",
                 "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupedByFirstLetter = birds.groupBy(_.charAt(0))

Ruby:

birds = ["Golden Eagle", "Gyrfalcon", "American Robin",
            "Mountain BlueBird", "Mountain-Hawk Eagle"]
grouped_by_first_letter = birds.group_by { |s| s[0] }

# {"G"=>["Golden Eagle", "Gyrfalcon"], "A"=>["American Robin"],
# "M"=>["Mountain BlueBird", "Mountain-Hawk Eagle"]}

The value of the resulting Hash is shown in the comment. The line breaks were added by me to make it more readable. They are not part of the result.

2nd Example:

Here we group and Array (in Scala List) of strings by their length.

Scala:

val cats = List("Tiger", "Lion", "Puma", "Leopard",
                  "Jaguar", "Cheetah", "Bobcat")
val groupedByLength = cats.groupBy(_.length)

Ruby:


cats = ["Tiger", "Lion", "Puma", "Leopard", "Jaguar",
        "Cheetah", "Bobcat"]
grouped_by_length = cats.group_by { |cat| cat.length }

#{5=>["Tiger"], 4=>["Lion", "Puma"], 7=>["Leopard", "Cheetah"], 6=>["Jaguar", "Bobcat"]}

3rd Example:

Here we group some raptor (birds of prey) species by their kind. Everything that is not an Eagle or Falcon will be “Unknown”.

Scala:

val raptors = List("Golden Eagle", "Bald Eagle", "Prairie Falcon",
                      "Peregrine Falcon", "Harpy Eagle", "Red Kite")
val kinds = raptors.groupBy {
   case bird if bird.contains("Eagle") => "eagle"
   case bird if bird.contains("Falcon") => "falcon"
   case _ => "unknown"
}

Ruby:

raptors = ["Golden Eagle", "Bald Eagle", "Prairie Falcon", "Peregrine Falcon",
           "Harpy Eagle", "Red Kite"]

kinds = raptors.group_by do |species|
 case species
    when /.*Eagle.*/
    "Eagle"
  when /.*Falcon.*/
    "Falcon"
  else
    "unknown"
  end
end

# {"Eagle"=>["Golden Eagle", "Bald Eagle", "Harpy Eagle"],
# "Falcon"=>["Prairie Falcon", "Peregrine Falcon"], "unknown"=>["Red Kite"]}

4th Example

Here we take an Array (List in Scala) of String and group them by the number of occurrences in the Array.

Scala:

val words = List("one", "two", "one", "three", "four", "two", "one")
val counts = words.groupBy(w => w).mapValues(_.size)

Ruby:

words = ["one", "two", "one", "three", "four", "two", "one"]
count = words.group_by { |w| w }.inject({}) do |tmphash, (k,v)|
      tmphash[k] = v.size
      tmphash
end

# {"one"=>3, "two"=>2, "three"=>1, "four"=>1}

Ruby’s Hash class does not have a mapValues method like Scala’s Maps do (at least I couldn’t find one in the standard library) , so I used inject instead which is very similar to foldLeft in Scala. This is not as elegant as mapValues but get’s the job done as well.

Conclusion

As you can see the Ruby version of the group_by method is as easy to use as Scala’s version. Both Ruby and Scala provide a lot of great tools to work with collections. Scala’s collection library is much bigger than Ruby’s standard collections, though. Scala supports many more data structures and also a lot of immutable data structures because Scala also encourages functional programming (whereas Ruby is mostly only OOP).

No matter what language you use, learn as much about the available collections in the standard library (or available open source packages) as possible because this will save you a lot of time when using those languages.

A little functional Java – the map function

Functional programming is getting a lot of interested nowadays. Languages like Clojure or Scala are on the rise and even languages like Haskell which where formerly considered to be mostly academic are getting more and more popular.
And more and more Java programmers are thinking about functional programming. Java 8 will get lambda expressions which will be a first step towards a more functional programming style. And there are already libraries available that help Java developers to program in a more functional style even if it is not as elegant as using Scala or Clojure.

Today I want to write a little about the map function (not to be confused with the map data structure, e.g. a HashMap in Java) which is popular in all functional programming languages I know.
I is very simple to understand. It is used on a collection (often a list) of values and is given a function as a parameter that is applied to every element in that collection.

A first example in Scala

Let’s look at a few examples in Scala before we have a look at how to do it in Java.

val numbers = List(1,2,3,4,5)
val squared = numbers.map(x => x * 2)

If you print squared you will see:

List(2, 4, 6, 8, 10)

It shouldn’t be difficult to figure out what’s going on here. The map method is invoked an the numbers List. The argument passed to map is a function (in Scala you can pass functions to other functions or methods) and this function is then applied to all values of the collection the map method is called on and a new collection (here a scala.collection.immutable.List) is returned with the results of the supplied function applied on every element.
In Scala this is so common that you can write it a little shorter:

val squared = numbers.map(_ * 2)

The _ is a placeholder for the current element on which passed is applied function.

This is just a very simple function but you can passed every function to map in Scala, giving you a very powerful tool to work with collections. In fact the Scala collection library offers A LOT more than just map – be sure to check it out. One example can be found on my blog here.

Just for the curious, here is how to do it in Clojure:

(println (map #(* % 2) [1, 2, 3, 4 ,5]))

Using Java

Java (before Java 8 at least) doesn’t allow you to pass functions to other methods/functions like Scala. But you can still use something similar using functions as objects.
There are several libraries out there that help Java programmers with functional programming. One I recently came across is totallylazy. It doesn’t have much documentation yet but is under active development and the unit tests and the source code are helpful to figure out how to use it.

Here is a simple example that shows how to use map:

import com.googlecode.totallylazy.Sequence;
import static com.googlecode.totallylazy.Sequences.sequence;
import static com.googlecode.totallylazy.numbers.Numbers.increment;

public class MapTest {
    public static void main(String[] args) {
        Sequence<Integer> numbers = sequence(1, 2, 3, 4, 5, 6);
        Sequence<Number> incremented = numbers.map(increment());
        System.out.println(incremented);
    }
}

It should again be pretty obvious what’s going on here. We call map on a Sequence object and pass it increment() (which is a com.googlecode.totallylazy.Callable1) which comes with totallylazy and just increments every value in the Sequence.

totallylazy comes with a bunch of useful Callables but often you will want to supply your own version. Here is an example that multiplies every element by a given factor:

import com.googlecode.totallylazy.Callable1;
import com.googlecode.totallylazy.Sequence;
import static com.googlecode.totallylazy.Sequences.sequence;

public class MapTest {

    public static Callable1<Integer, Integer> multiplyByFactor(final int factor) {
        return new Callable1<Integer, Integer>() {
            public Integer call(Integer number) throws Exception {
                return number * factor;
            }
        };
    }
    public static void main(String[] args) {
        Sequence<Integer> numbers = sequence(1, 2, 3, 4, 5, 6);
        Sequence<Integer> multiplied = numbers.map(multiplyByFactor(5));
        System.out.println(multiplied);

    }
}

Here we created our own Callable1 and pass it to map. It will print:

5,10,15,20,25

Again, this is not very difficult and you can check the totallylazy source code to see for many more examples. Of course you can also pass a anonymous class for a Callable1 if you only need it locally and don’t want to reuse it.

There are also methods called mapConcurrently that call map (surprise!) concurrently.

An alternative to totallylazy is Functional Java which has been under development for longer and might be a better choice at the moment for production ready code.

Both libraries offer a lot more than just map.

If you only need map, it is not difficult to write something similar like Callable1 yourself.

As you can see, although not difficult in Java, it requires a lot more code than the examples in Scala or Clojure.
If you want to do a lot of functional programming I highly recommend using a library like totallylazy or Functional Java – or even better a functional language like Scala or Clojure.
Scala gives you the power of both OOP and FP and let’s you choose which is best for a given task.
And even if you don’t want to use Scala or Clojure I highly recommend learning them. It will make you a better Java programmer!

If you want to look at a pure functional language (a very good idea to understand the ideas behind FP), have a look at Haskell. A good starting point is the freely available book: Learn You a Haskell

The groupBy method from Scala’s collection library

Scala’s collection library is a wonderfully crafted piece of software. When learning a language I think it pays to look at the available collections and their functionality. In Scala there a many useful collections and methods which give you a lot of powerful tools.
In this post, I want to look at the groupBy method defined in Traversable.

Let’s look at an example before explaining how it works:

val birds = List("Golden Eagle", "Gyrfalcon", "American Robin",
                 "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupedByFirstLetter = birds.groupBy(_.charAt(0))

This will print:

Map(M -> List(Mountain BlueBird, Mountain-Hawk Eagle), G -> List(Golden Eagle, Gyrfalcon),
       A -> List(American Robin))

(the line breaks are not part of the output, I just added them for readability).

What does this code do? It takes the list and groups the elements by the first character of each bird species in the list. It builds a Map in which the keys are the first characters of the bird species in the list and the value is a List of all bird species that have the same first character.

Here is the official method definition from the Scala docs:

def groupBy [K] (f: (A) ⇒ K): Map[K, Traversable[A]]

f is a so called discriminator function. You use that function to specify the criteria by which you want to group the values in the Traversable. K is the type of the keys used in the returned Map. The return value is a Map with a key K and a Traversable of A which is the type contained in the Traversable.
Method definitions in Scala can be intimidating for beginners but the good thing is that using those methods is normally very easy.
Let’s look at some more examples to make it clear what groupBy does.

val cats = List("Tiger", "Lion", "Puma", "Leopard",
                  "Jaguar", "Cheetah", "Bobcat")
val groupedByLength = cats.groupBy(_.length)

This one is very easy. It builds a Map that groups the cat species by the length of their name. If you print groupedByLength, you will get:

Map(5 -> List(Tiger), 7 -> List(Leopard, Cheetah), 4 -> List(Lion, Puma),
        6 -> List(Jaguar, Bobcat))

Here is another example:

val raptors = List("Golden Eagle", "Bald Eagle", "Prairie Falcon",
                      "Peregrine Falcon", "Harpy Eagle", "Red Kite")
val kinds = raptors.groupBy {
   case bird if bird.contains("Eagle") => "eagle"
   case bird if bird.contains("Falcon") => "falcon"
   case _ => "unknown"
}

In this example we have a List of raptor species (birds of prey). We want to group similar birds together. In our example we build a Map that groups all the eagle and all the falcon species together and all other birds are treated as “unknown”.
If you print the kinds variable, you will get:

Map(unknown -> List(Red Kite), eagle -> List(Golden Eagle, Bald Eagle, Harpy Eagle),
       falcon -> List(Prairie Falcon, Peregrine Falcon))

The last example shows how to use the groupBy and the mapValues method can be used to count all the words in a List of strings. You can use this for example to read in a file and count all the words in the file.

val words = List("one", "two", "one", "three", "four", "two", "one")
val counts = words.groupBy(w => w).mapValues(_.size)

If you print counts, you will get:

Map(one -> 3, two -> 2, four -> 1, three -> 1)

The groupBy will return a Map like this:

Map(one -> List(one, one, one), two -> List(two, two), four -> List(four), three -> List(three))

Because we don’t want to use a list with all occurrences of each word but the number of occurrences, we use the mapValues method that applies a function to all the values, in our case just the size method that returns the length of the list.

As you can see, using groupBy is not difficult and it gives you a very powerful tool. Without this methods, you would have to write more complicated and longer code which reduces readability. The Scala collections provide a huge amount of useful stuff. It is definitely worth learning what’s available in the standard library of a programming language you use regularly.

For more information about Scala’s collections, see here:
http://www.scala-lang.org/api/current/scala/collection/package.html

For a comparison with Ruby’s group_by see:
The group_by method from Ruby’s Enumerable mixin (and compared with Scala)

spray: A lightweight Scala framework for building RESTful web services on top of Akka

I just came across spray, a lightweight Scala framework for building RESTful web services on top of Akka.

Scala is cool, Akka is cool, Rest is a great way to build service. So spray should be cool too. I haven’t yet played with it but will do so very soon. The examples on the website look great. Definitely worth trying if you are into Scala and or REST (and if not, it is time to get started!)

More information:
spray website:
https://github.com/spray/spray/wiki

spray 0.7 announcement:
spray 0.7.0 released

How to use java.util.concurrent.CountDownLatch

With some applications that use several threads there are situations where one thread can only start after some others have completed.
For example, image a program that downloads a bunch of web pages, zips them and send then the zip file via email. If you program this in a multithreaded way, the thread that zips the downloaded web pages cannot start before the downloads are complete.
How do you do this? One very simple way is to use a CountDownLatch from the java.util.concurrent package ( a package every Java developer should have a closer look at).

With a CountDownLatch you can specify a number and then count down by 1 once an operation has completed. If all operations have completed and the count is 0, another thread that uses the same CountDownLatch as a synchronisation tool using the await method can do it’s work.

Let’s look at a simple example. The first class is a simple Runnabe that does some work. In our example it does nothing really useful, it just sleeps for a random number of milliseconds to simulate some work.


import java.util.Random;
import java.util.concurrent.CountDownLatch;

public class Worker implements Runnable {

    private CountDownLatch countDownLatch;

    public Worker(CountDownLatch countDownLatch) {
        this.countDownLatch = countDownLatch;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(getRandomSeconds()); // sleep random time to simulate long running task
            System.out.println("Counting down: " + Thread.currentThread().getName());
            this.countDownLatch.countDown();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    // returns a long between 0 and 9999
    private long getRandomSeconds() {
        Random generator = new Random();
        return Math.abs(generator.nextLong() % 10000);
    }
}

The only really interesting line here is the call to:
this.countDownLatch.countDown();
Once the task is done, the counter in the CountDownLatch is decremented by one.

Here is the 2nd class that uses this Runnable

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkManager {

    private CountDownLatch countDownLatch;
    private static final int NUMBER_OF_TASKS = 5;

    public WorkManager() {
        countDownLatch = new CountDownLatch(NUMBER_OF_TASKS);
    }

    public void finishWork() {
        try {
            System.out.println("START WAITING");
            countDownLatch.await();
            System.out.println("DONE WAITING");
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public void startWork() {
        ExecutorService executorService = Executors.newFixedThreadPool(NUMBER_OF_TASKS);

        for (int i = 0; i < NUMBER_OF_TASKS; i++) {
            Worker worker = new Worker(countDownLatch);
            executorService.execute(worker);
        }
        executorService.shutdown();
    }

    public static void main(String[] args) {
        WorkManager workManager = new WorkManager();
        System.out.println("START WORK");
        workManager.startWork();
        System.out.println("WORK STARTED");
        workManager.finishWork();
        System.out.println("FINISHED WORK");
    }
}

The startWork method uses and ExecutorService (another useful and important class from java.util.concurrent) to start the Runnables.
In the method finishWork we call the await method that waits until the counter inside the CountDownLatch is 0.

If you run this example you get the following output:

START WORK
WORK STARTED
START WAITING
Counting down: pool-1-thread-3
Counting down: pool-1-thread-4
Counting down: pool-1-thread-1
Counting down: pool-1-thread-5
Counting down: pool-1-thread-2
DONE WAITING
FINISHED WORK

As you can see, 5 different threads are started and the finishWork method does not complete it’s work until the
CountDownLatch is at 0.

As you can see, using a CountDownLatch is very easy. There are other similar classes in the java.util.concurrent like a CyclicBarrier which is worth looking at. In upcoming posts, I will write more about the java.util.concurrent package ant it’s useful classes, interfaces and methods.
Instead of the ExecutorService you can just use java.lang.Thread but I recommend always using the higher level ExecutorService whenever possible. With all the stuff in java.util.concurrent, there is rarely a need to use the low level classes like java.lang.Thread (which does not mean you shouldn’t know how they work!).

BitSets in Scala are much more fun than in Java

Recently I played with BitSets in Java because I needed an efficient way to store huge amounts of long values.
For Java there is the java.util.BitSet class. It is a very efficient implementation when you only need to store bit values.

Here is a trivial example on how to use it:


import java.util.BitSet;

public class BitSetTest {

    public static void main(String args[]) {

        BitSet primeBits = new BitSet();
        primeBits.set(2);
        primeBits.set(3);
        primeBits.set(5);
        primeBits.set(7);
        primeBits.set(11);

        BitSet evenBits = new BitSet();
        evenBits.set(0);
        evenBits.set(2);
        evenBits.set(4);
        evenBits.set(6);
        evenBits.set(8);
        evenBits.set(10);

        //primeBits.and(evenBits);  // will result in primeBits only containing 2
        primeBits.andNot(evenBits);

        System.out.println(primeBits); // {3, 5, 7, 11}

    }
}

It is very easy to understand and use. But it seemed to require quite a lot of code and using methods like “and” doesn’t seem as natural as using “&” to get the intersection of two BitSets or “&~” to get the difference. In a language like C++ you can use operator overloading to get a more natural way of dealing which those operations.

After playing with the Java code, I looked up BitSets in Scala and with Scala using BitSets is much more fun.
See here for a simple example:


import scala.collection.immutable.BitSet
object BitsetTest {

  def main(args : Array[String]) : Unit = {

    val primebits = BitSet(2, 3, 5, 7, 11)
    val evenBits =  BitSet(0, 2, 4, 6, 8, 10)

    val evenSet = Set(0, 2, 4, 6, 8, 10);

    println(primebits & evenBits)  // BitSet(2)
    println(primebits & evenSet)  // BitSet(2)

    println(primebits &~ evenBits)  // BitSet(3, 5, 7, 11)
    println(primebits &~ evenSet)   // BitSet(3, 5, 7, 11)

 }

}

Scala BitSets have many advantages over Java BitSets:

  • In Scala you can add values to a BitSet in the Constructor. In Java you can only create a empty BitSet (optionally specifying it’s size) and then you have to use the set method to add the individual elements.
  • Scala allows you to define methods like “&” “&~” or “+” which allows for a more concise and natural code. Some more examples:
    
        // add single integers to the list
        val morePrimes = primebits + 13 + 17  
    
        val primeList = List(19, 23, 29)
    
         // using ++ you can add elements from other collections
        println(morePrimes ++ primeList)  // BitSet(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
    
        // remove 11
        println(morePrimes - 11)  // BitSet(2, 3, 5, 7, 13, 17)
    
        // remove all elements in evenBits from morePrimes
        println(morePrimes -- evenBits) // BitSet(3, 5, 7, 11, 13, 17)
    
  • In Scala BitSet comes in two versions: scala.collection.immutable.BitSet and scala.collection.mutable.BitSet. They are almost identical but the mutable version changes the bits in place. This is the same behaviour as in the Java java.util.BitSet class and is slightly faster than the immutable one (no copying required). I prefer the immutable one when performance is not an issue (make sure to profile to see if you really need the mutable one) because immutable data structures are much better for concurrency.
  • In Scala the BitSet classes are part of the Scala collection framework and give all the great many methods available for other collections.
    Here are a few examples:

    
      println(primebits.filter(_ % 2 == 0))  // BitSet(2)
    
      println(primebits.filterNot(_ % 2 == 0))  // BitSet(3, 5, 7, 11)
    
      println(primebits.map(_ * 3))  // BitSet(6, 9, 15, 21, 33)
    

    In Java you can’t use the features of the collection framework like the new style for loop introduced with Java 5 because java.util.BitSet is not part of the collection framework. There are methods to iterate over the java.util.BitSet but it is not as convenient as using a java.util.List.

  • The Scala code also allows you to use other collections like Sets to work the BitSets and methods like “&” because those methods are overloaded to work collections of type scala.collection.Set.

As you can see from just a few simple examples, the Scala BitSets are much more powerful, require less code and result in easier to read and more natural code then the Java BitSet.
This is just one example on how you can use Scala to express ideas much more concise and easier to understand than in Java. If you haven’t yet looked at Scala, I highly recommend giving it a try – even if you never use BitSets :-)

To see all the different BitSet implementations and available methods see the Scala Docs:
http://www.scala-lang.org/api/current/scala/collection/BitSet.html