Scala: Put a long one-line operation on collections in its own method

Scala collections are extremely powerful, and once you get used to all the methods and the different collections, you won’t want to go back to programming languages that don’t offer that power.
Often you can write rather complex operations in just one line instead of many loops and temporary variables, thanks to methods like filter, map, foldLeft, reduceLeft, etc.

But while it is easy to chain such method calls into powerful transformations, it can take a while for someone new to Scala to understand your code.
Let’s take the following code as an example. It is a contrived example of how to build a URL query string from a Map[String, String].

import java.net.URLEncoder

val params = Map("fantasy_book_1" -> "The Hobbit",
    "fantasy_book_2" -> "The Lord of the Rings",
    "science_book_1" -> "Tropical Ecology")

val queryString = params.filterKeys(_.startsWith("fantasy"))
      .map(t => URLEncoder.encode(t._1, "UTF-8") -> URLEncoder.encode(t._2, "UTF-8"))
      .foldLeft("?")((a, t) => a + (t._1 + "=" + t._2 + "&"))
      .dropRight(1)

The code is a “one liner”, even though it is better to split it across several lines, as I did here, to make it more readable.
What the code does is build a query string for a URL from keys and values in a Map.
The code is not really difficult.
The first line uses the filterKeys method to keep only the books that belong to the fantasy category. The second line URL-encodes the keys and values using UTF-8, the third builds the query string using foldLeft, and the last calls dropRight to get rid of the trailing “&”.

This code is quite easy to understand when you have some experience with Scala. You could come up with similar code in Ruby or even in Java 8 using the new lambdas.

But I recommend putting code like this in its own method with a good name. This makes it more readable for people new to your code – and for you too when you get back to the code after a few months.

The following code shows what this might look like:

  def createFantasyBookQueryString(parameters: Map[String, String]): String = {
    parameters.filterKeys(_.startsWith("fantasy"))
      .map(t => URLEncoder.encode(t._1, "UTF-8") -> URLEncoder.encode(t._2, "UTF-8"))
      .foldLeft("?")((a, t) => a + (t._1 + "=" + t._2 + "&"))
      .dropRight(1)
  }

Calling the method makes the code easier to read and everyone who reads your code does not need to know the details of how you build the query string.

Always try to make the code as easy to read as possible.

Note that I prefer to add the return type to the method declaration. This is not necessary because the Scala compiler can infer it, but it makes the code easier to read. And if you later change the method and accidentally return a different type, the compiler will tell you immediately.
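
As mentioned above, similar code can be written in Java 8 with lambdas. The following is only a sketch of how that might look, again wrapped in its own well-named method. The class name FantasyBookQuery and the encode helper are made up for this illustration. One difference to be aware of: when no key matches, this version returns "?" while the Scala version returns an empty string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FantasyBookQuery {

    // Hypothetical helper: URLEncoder.encode declares a checked exception,
    // which a lambda cannot throw directly, so it is wrapped here.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Keeps the fantasy entries, encodes keys and values and joins them
    // with "&" behind a leading "?".
    static String createFantasyBookQueryString(Map<String, String> parameters) {
        return parameters.entrySet().stream()
                .filter(e -> e.getKey().startsWith("fantasy"))
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&", "?", ""));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fantasy_book_1", "The Hobbit");
        params.put("fantasy_book_2", "The Lord of the Rings");
        params.put("science_book_1", "Tropical Ecology");
        System.out.println(createFantasyBookQueryString(params));
        // prints ?fantasy_book_1=The+Hobbit&fantasy_book_2=The+Lord+of+the+Rings
    }
}
```

The caller only sees createFantasyBookQueryString(params), which is the point of the whole exercise.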

Book review: Confessions of a Public Speaker by Scott Berkun

I like to go to developer conferences. It is great to meet new people and learn something about new technologies like Clojure, Scala or Akka.
Unfortunately, some of the talks I listen to at conferences are terrible. A boring speaker, too many bullet points, a monotone voice and other things make it hard to stay awake for more than 5 minutes.

Giving good talks is not easy and I have been guilty of giving bad and boring talks myself.

The good news is that public speaking can be learned. Not everybody will be able to speak like Steve Jobs or Barack Obama, but everybody can improve. (If you want to learn how Jobs and Obama do or did it, I recommend The Presentation Secrets of Steve Jobs: How to Be Insanely Great in Front of Any Audience and Say it Like Obama and Win!: The Power of Speaking with Purpose and Vision, Revised and Expanded Third Edition.)

The most important thing to do is to actually practice public speaking by giving talks. For next year I plan to give at least one talk a month. All this helps to practice. But practice is only good when you practice how to do it correctly. If you practice bad things you will just get better at doing bad things.

There are many ways to learn how to be a better speaker, including:

  • Listening to talks on YouTube
  • Going to great talks and asking the speaker for advice
  • Going to a Toastmasters club near you (I will do that next year)
  • Reading a good book about public speaking

Reading a good book is always a good idea and Confessions of a Public Speaker by Scott Berkun is one of the best books I’ve read so far on public speaking.

It is a much more personal book than most other English and German books I’ve read (or plan to read) on public speaking.

Scott writes about many things a public speaker (from a professional speaker to anyone who wants to give a talk) should know, including:

  • How to deal with your fear of public speaking
  • How to work a tough room
  • How to prepare
  • How to talk on television
  • How not to be boring
  • How realistic it is to make $30,000 an hour
  • How to teach people something
  • Funny and scary stories by other speakers and what went wrong during their speeches

Scott not only explains what went well but also what went wrong during his own speeches.

Scott’s book is not only a great source of information about public speaking, it is also a joy to read and often very funny, because Scott is a great writer (check out his blog).

Confessions of a Public Speaker is great for you if you want to give a talk at work or during your free time. No matter what you want to talk about, this book will be helpful. Highly recommended!
Note: Just reading through the book once is a good start, but to actually get a lot out of it, it is important to check your talks against the advice in the book and see how you can improve your speeches and presentations!

Scala Cookbook announced

The cookbooks from O’Reilly are among the most popular books among developers. I was very happy this morning when I discovered that there is now an upcoming Scala Cookbook. I am sure it will be a great book for all Scala developers.
It is written by Alvin Alexander.

Once available, I will publish a review of the book. It is currently scheduled for April 2013, according to the publisher’s website.

New book “Akka in Action” announced

The Akka framework is one of the stars in the Scala and Java world. It was only a matter of time until books would appear.
Now the “Akka in Action” book has been announced. It is scheduled for final publication in Spring 2013 but 2 chapters are already available.
The “in Action” series from Manning has been very good so far, and I have several very good books published by Manning about different topics like C++, Java, Scala or Ruby.

More information about the upcoming Akka book can be found here:
Akka in Action

Java concurrency: Understanding CopyOnWriteArrayList and CopyOnWriteArraySet

Java has a large number of useful collections, and several of them, like ConcurrentHashMap, are made specifically for use in concurrent code.

Two sometimes very useful classes are the CopyOnWriteArrayList and CopyOnWriteArraySet. They implement the java.util.List and the java.util.Set interface respectively.

Let’s focus on the CopyOnWriteArrayList to understand what it is all about. Contrary to the ArrayList, this class is thread safe. This means that when you use it from several threads, no undefined state can occur in the list.
As with all data structures, it is important to understand when to use them. As the name CopyOnWrite says, a copy of the whole underlying array is made each time you write to the list, for example when adding or removing an element. As you can figure out yourself, this can be pretty expensive when your list is large.
This means that a CopyOnWriteArrayList (and CopyOnWriteArraySet) is mostly useful when you have few modifications but many reads, because reads are very cheap and don’t require synchronization.

When you iterate over a CopyOnWriteArrayList or CopyOnWriteArraySet, the iterator uses a snapshot of the underlying list (or set) and does not reflect any changes made to the list or set after the snapshot was created. The iterator will never throw a ConcurrentModificationException.

Here is a code example:


import java.util.Arrays;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteTest {

	public static void main(String[] args) throws InterruptedException {

		final CopyOnWriteArrayList<Integer> numbers = new CopyOnWriteArrayList<>(
				Arrays.asList(1, 2, 3, 4, 5));

		// new thread to concurrently modify the list
		new Thread(new Runnable() {
			@Override
			public void run() {
				try {
					// sleep a little so that for loop below can print part of
					// the list
					Thread.sleep(250);
				} catch (InterruptedException e) {
					Thread.currentThread().interrupt();
				}
				numbers.add(10);
				System.out.println("numbers:" + numbers);
			}
		}).start();

		for (int i : numbers) {
			System.out.println(i);
			// sleep a little to let other thread finish adding an element
			// before iteration is complete
			Thread.sleep(100);
		}
	}
}

Note: This is not production-ready code; there is no proper exception handling, etc.

Here is the output of this code:

1
2
3
numbers:[1, 2, 3, 4, 5, 10]
4
5

As you can see the for loop only prints the numbers 1-5 and the number 10 is not printed in the for loop as it was not present when the snapshot of the iterator was taken.
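
To contrast the snapshot iterator with a plain ArrayList without relying on thread timing, here is a small single-threaded sketch. The class and method names are made up for this illustration. It adds elements to the list while iterating over it: the ArrayList iterator detects the change and throws a ConcurrentModificationException, while the CopyOnWriteArrayList iterator keeps walking its snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterationDemo {

    // Iterates over the list and appends one new element per visited element.
    // Returns how many elements the loop visited, or -1 if the iterator
    // noticed the modification and threw ConcurrentModificationException.
    static int visitWhileAdding(List<Integer> list) {
        int visited = 0;
        try {
            for (int value : list) {
                visited++;
                list.add(value * 10);
            }
        } catch (ConcurrentModificationException e) {
            return -1;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> copyOnWrite = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));

        System.out.println("ArrayList: " + visitWhileAdding(plain));
        // prints ArrayList: -1
        System.out.println("CopyOnWriteArrayList: " + visitWhileAdding(copyOnWrite)
                + ", list is now " + copyOnWrite);
        // prints CopyOnWriteArrayList: 3, list is now [1, 2, 3, 10, 20, 30]
    }
}
```

The loop over the CopyOnWriteArrayList visits only the three original elements, even though the list has six elements afterwards.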

Conclusion:
CopyOnWriteArrayList and CopyOnWriteArraySet (which is implemented with a CopyOnWriteArrayList) are special data structures for use cases where you want to share the data structure among several threads and have few writes and many reads.
Always make sure to do a performance test for your code on real hardware to see how it performs in your application. And make sure to read the javadoc for all the methods to really understand how the data structures work.
Of course you can also use CopyOnWriteArrayList and CopyOnWriteArraySet from other JVM languages like Scala, Clojure, JRuby or Groovy.

Immutable collections.
Sometimes you just need to create the list or set once and then later only read from it. In this case I recommend having a look at the immutable collections from Guava. They are always thread safe (as is every truly immutable object) and are a better alternative to the wrapped unmodifiable collections that come with the JDK. See the Guava website for why that is the case.
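
One reason the JDK wrappers are weaker can be shown with the JDK alone: Collections.unmodifiableList returns a read-only view, not a copy, so changes to the backing list still show through. The sketch below uses made-up class and method names and does not use Guava at all. It only illustrates the view behaviour and the usual workaround of copying first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnmodifiableViewDemo {

    // Wraps the given list without copying it: a read-only view.
    static List<String> readOnlyView(List<String> backing) {
        return Collections.unmodifiableList(backing);
    }

    // Copies first, then wraps: later changes to the source stay invisible.
    static List<String> readOnlyCopy(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> view = readOnlyView(backing);
        List<String> copy = readOnlyCopy(backing);

        backing.add("c"); // whoever still holds the backing list can change it

        System.out.println(view); // prints [a, b, c]
        System.out.println(copy); // prints [a, b]
    }
}
```

Calling add on either result throws an UnsupportedOperationException, but only the copy is really safe from changes made elsewhere.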

Book review: The C++ Standard Library – A Tutorial and Reference, 2nd Edition

When Nicolai Josuttis published the first edition of The C++ Standard Library – A Tutorial and Reference, it quickly became one of the most popular C++ books. The fantastic book was a must-have for every serious C++ developer who wanted to use the standard library effectively.

Now in 2012, Nicolai Josuttis has published the 2nd edition of this amazing book, and although I rarely use C++ anymore, I bought a copy to see what has changed since the last edition and how C++11 differs from the older C++ standard.

The book is now over 1,000 pages and additional PDFs are available on the book’s website.

The higher page count was necessary because with C++11 the standard library has grown considerably, with many additions like new containers in the STL, libraries for concurrency and much more.

The 2nd edition of the book covers all that is new in the C++11 standard library and also gives an overview of the new language features like the new for loop, move semantics, lambdas or the new meaning of the auto keyword. I really liked the language overview as all those new things are important now when using the new standard library.

As in the 1st edition, everything is explained in great detail with many examples and a reference section for all APIs.
The text is easy to read and the examples are very clear and easy to follow.

The book is as complete as it gets. No other C++ book ever covered the standard library in such detail and I doubt that any other book ever will.

I really like the 2nd edition of the book and it is as great as the 1st edition and fully up to date with the latest C++11 information.

For C++ programmers, buying this book is a no-brainer.
Even for Java, C#, Scala or Python programmers I think this is a great read – learning another language is always a great way to improve your programming skills. Even people who don’t like C++ and prefer other languages like Java or C# (there are often very good reasons to do so, and I also use JVM-based languages most of the time) will often admit that the C++ standard library is one of the best-written software libraries out there. This book will help you understand why.

The C++ Standard Library – A Tutorial and Reference, 2nd Edition is a MUST READ for everyone using or learning C++ and it will be your constant companion when writing C++ code.

Clojure gets Reducers – A Library and Model for Collection Processing

I just read this announcement about Reducers – a library and model for collection processing.

What I really like are the capabilities to process collections in parallel using Java’s Fork/Join Framework with the new fold function.

This is a great addition to Clojure – and a great time to add Clojure to your programming toolkit.

I just got my copy of the new Clojure book by Chas Emerick, Christophe Grand and Brian Carper. The first 100 pages are awesome and I am sure so is the rest of the book. This book is a great way to get started with Clojure.

By the way: If you like parallelism and collections, make sure to also have a look at Scala’s Parallel Collections.

Getting started with Eclipse CDT, Threading Building Blocks, parallel_for and C++11 lambdas

When doing concurrency with C++ you have many choices, including:

  • Posix threads: Very low level, rather ugly C API, avoid if possible.
  • C++11 concurrency features: Very interesting and good stuff, no full compiler support yet but some things already usable with some compilers.
  • Boost: Has many good things, close to the features in C++11. Ready for production.
  • Intel’s Threading Building Blocks (TBB): Many great features, definitely worth checking out for C++ concurrency work.

There are more options (do a web search). For me at the moment Intel’s Threading Building Blocks (TBB) look the most interesting (but I am following the compiler support for C++11’s concurrency features closely).

In this short tutorial I will explain how to get started with Eclipse CDT (an Eclipse plugin for C/C++), Threading Building Blocks, a parallel for loop and how to add C++11’s lambdas to the code.

I use the latest GCC (4.7.0 at the time of this writing) for all the examples. Although GCC does not yet support all of C++11 (particularly not all concurrency stuff) it already supports quite a lot of the latest features incl. lambdas which is why we can add them to our TBB example.

I assume you have GCC 4.7.0 or newer installed on your Linux machine (get it from the website and follow the installation instructions).
I’ve installed it to
/opt/cpp
You can install it wherever you like; adjust your settings accordingly. I’ve also set my PATH variable so that the latest GCC is used and not the one that was already present on my Ubuntu 11.04.

1) Get and install TBB
I downloaded the latest TBB from the project website. I use version 4.0 here.
I downloaded both the binary distribution and the source code. Unpack both in the same directory and call
$ make
Either leave everything where it is or copy it somewhere where you want to have the libraries and header files. I’ve copied everything to:
/opt/cpp
There is now a directory:
/opt/cpp/tbb40_20120201oss
(if you have a newer version of TBB, this may look slightly different).

2) Get and install Eclipse + CDT
I just downloaded the Eclipse IDE for C/C++ Linux Developers from the Eclipse Website and unpacked it into
/opt/eclipse/cpp
Note: I have several different Eclipse installations running. One for Java, one for Scala and more for other languages like Clojure or Groovy. I had trouble with all plugins for all the languages in one installation. Having separate installations – and separate workspaces – makes things easier.

3) Create a new C++ project with C++11 support and add TBB
I just created a new Eclipse project via File -> New -> Project, then chose C++ Project and then Empty Project with Linux GCC.
Once done, I configured the project. I right-clicked on the project and selected “Properties” (down at the bottom of the menu).
Then I chose C/C++ Build -> Settings. This dialog can be a bit confusing to beginners, but it is not too difficult to figure out what to do.

I opened the GCC C++ Compiler and set the following things:

Includes:
Here I added the path to the TBB include files:
/opt/cpp/tbb40_20120201oss/include/
(adjust the path according to your installation).

Miscellaneous:
Here I added this option: -std=c++11
This tells the GCC 4.7 compiler to support the latest C++11 standard. The support is not yet complete, but lots of stuff, including the lambdas used in the example code below, is already supported.

For the GCC C++ Linker I set the following things:

Libraries:
At the top window, you can specify the libraries. I just added “tbb” here. Note that no “-l” was required, this is added automatically.
I also set the Library Search Path to:
/opt/cpp/tbb40_20120201oss/build/linux_intel64_gcc_cc4.7.0_libc2.13_kernel2.6.38_release/
Again, you have to adjust this for your installation. This directory is where the TBB libraries are after compiling the sources.

4) Add code and run it
I added a new C++ source file by right-clicking on the project and then chose New -> Source File (or New -> Other -> C/C++ -> Source File when Eclipse does not already show the option for source file).

I added some source code like the sample below, compiled it with “Ctrl-B” and then ran it with “Ctrl-F11”.

#include <iostream>
#include "tbb/parallel_for.h"

using namespace tbb;
using namespace std;

void mult_by_two(int &i) {
	i = i * 2;
}

void update_parallel(int* a, int n) {
	parallel_for(0, n, 1, [=](int i) {
		mult_by_two(a[i]);
	});
}

void print_array(int* a, int n) {
	for(int i = 0; i < n; i++) {
		cout << a[i] << ":";
	}
	cout << endl;
}

int main() {
	/* NOTE: This is just for showing how to use the parallel_for loop
	 In practice it makes absolutely no sense to parallelize this with
	 an array that has only 5 elements! */
	const int l = 5; // const, so the array size below is valid standard C++
	int numbers[l];
	for(int i = 0; i < l; i++) {
		numbers[i] = i;
	}
	print_array(numbers, l);
	update_parallel(numbers, l);
	print_array(numbers, l);
}

The interesting part is the update_parallel function, which takes an array and calls parallel_for with a lambda as the last argument; the lambda tells the loop what to do with each element.
This is just a very simple example; both parallel_for and C++11 lambdas are much more powerful than what is shown here.

One important note: This example is just for demonstrating how to use Eclipse CDT, C++11, lambdas and TBB. It makes absolutely no sense updating an array with 5 elements in parallel as the overhead for making this run in parallel would be much worse than just iterating over the short array. With much larger arrays and maybe much more complicated operations on each element, things change. Make sure to test the performance on your production hardware!

As you can see, once you have everything working, it is easy to use. You can also use a different compiler, e.g. Intel’s C++ compiler (which is free for non-commercial use on Linux). See Intel’s website for more information.

TBB is a very interesting library and definitely worth a closer look when you want to do concurrency with C++ today. The new C++11 standard will not support all the things that are already in TBB (e.g. the standard currently has no thread-safe collections).
TBB’s documentation is good and easy to read.

If you need concurrency and parallelism in your programs – and if you want to or must use C++ – TBB is probably your best choice right now.
Make sure to also look at other options like C++11, Boost, Cilk Plus (also from Intel), Java (which has great concurrency support and is really fast) or Scala incl. the Akka toolkit.

Should you learn C++ in 2012?

Last year, the new C++ standard C++11 was finally published. It contains lots of interesting new stuff like lambdas, a threading library, a memory model, hash tables and much more.
See Bjarne Stroustrup’s C++11 FAQ for more on the new standard.

Nicolai Josuttis, author of the wonderful The C++ Standard Library: A Tutorial and Reference (2nd Edition)
recently said in an interview:
“Due to the complexity of C++, the support for the ordinary programmer is incredibly bad compared to other languages, and that’s a major drawback.”

I agree (but make sure to read the whole interview). Despite many improvements in C++11, C++ is still not an easy language. Java, while not as easy as some may think, is still an easier language with very good performance (often almost as fast as C++) and offers better IDEs (maybe Visual C++ is as good as current Java IDEs but no luck here on Linux), a much more comprehensive standard library (the JDK), more libraries (e.g. Hibernate, Spring, JEE, Lucene, etc) and no memory leaks (well, almost none – you can still build a memory leak in Java but it is more difficult to do so than in C++). To be fair, with modern C++11, it is much easier to avoid memory leaks.

Still, there are at least 4 very good reasons to learn C++ in 2012:

1) Raw performance
In some cases, highly optimized C++ can be faster than the best Java code. This won’t be easy, and really tuning C++ can be time consuming (as in all other languages) and error prone. Still, sometimes this is just what you need. This might be interesting, for example, with large data centers, scientific computing, big data analysis, etc.

2) You are closer to the OS and Hardware
Although you can wrap OS system calls in other languages like Java or Python, sometimes you want to be very close to the OS, e.g. when writing device drivers or low-level network servers. In those cases, C++ could be your friend. I would prefer C++ over C here as C++ is much more powerful and has a much better standard library.

3) You learn something!
This is the most important reason for me. I learned C++ more than 10 years ago and I learned a lot about programming, design and how to write clean code. And I think EVERY programmer should have used a language that requires you to allocate and free memory (it is much easier now with the latest C++ standard and stuff like smart pointers but you still need to understand how new and delete work in order to properly use C++). Many Java or Python programmers forget all about memory. Learning C++ will help you learn about memory management.
The C++ standard template library (STL) is one of the best crafted libraries out there and you will definitely learn something when you improve your understanding of this powerful library.

4) C++11 is much cooler than the older C++
C++11 has many cool features like lambdas, auto keyword, move semantics and much more. It is definitely a much better language than C++98 was.
See this overview by Herb Sutter about the new features in C++11:
Elements of Modern C++ Style.

To sum up, I think everyone should learn at least some C++. If you have been programming in Java, Scala, Ruby or Python for the last decade and never touched C or C++, now with C++11 I think it is time to learn C++11 and improve your programming skills while doing so.

I probably won’t be using C++ much at work in the coming years and when I need a more powerful language than Java, I will probably go for Scala, but I will have a closer look at C++11 and try to learn and understand the new features, particularly those about multithreading and concurrency.
Remember: It always helps to learn a new programming language even when you don’t plan to use it in production.

For learning C++11, here are 3 good books (more on C++11 will surely follow soon):


Great book on the new concurrency features of C++11.


An earlier edition of this book taught me a lot about C++ when I started to learn the language.
The latest edition also covers C++11.


The best book on the C++ standard library and one of the best programming books ever written. Now in its 2nd edition to cover C++11. I just got my copy and it is a fantastic book.

Understanding java.util.concurrent.CompletionService

The java.util.concurrent.CompletionService is a useful interface in the JDK standard libraries but few developers know it.
One could live without it, as you can of course program this functionality with the other interfaces and classes within java.util.concurrent, but it is convenient to have a solution that is already available and less error prone than doing it yourself. I always prefer stuff that is already available within the JDK over implementing my own solution with the same features – unless as an exercise at home!

Imagine you have a list of separate tasks that take a while, e.g. 10 tasks that each download a URL and return the content as a String.
Depending on the network, the size of the downloaded content and other factors, each download will take a different amount of time.
When you execute them in parallel, you may want to start doing something with the downloaded content as soon as the first task is done. There is no need to wait for the other 9 tasks to complete, because that would mean you always have to wait until all URLs are downloaded before doing something useful with each individual result.

You can of course execute all of them, get a List of Future objects and then poll each one, but it is easier to just use a CompletionService.

See the following example:

First, some dummy Callable:

import java.util.Random;
import java.util.concurrent.Callable;

public class LongRunningTask implements Callable<String> {

	public String call() {
		// do stuff and return some String
		try {
			Thread.sleep(Math.abs(new Random().nextLong() % 5000));
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
		}
		return Thread.currentThread().getName();
	}
}

The LongRunningTask is just a placeholder for a real task you might want to implement.
In this dummy example, it just sleeps for a random amount of time and returns a String that contains the name of the current thread.

Second, an example using a CompletionService that uses the Callable above.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionServiceExample {

	// dummy helper to create a List of Callables that each return a String
	public static List<Callable<String>> createCallableList() {
		List<Callable<String>> callables = new ArrayList<>();
		for (int i = 0; i < 10; i++) {
			callables.add(new LongRunningTask());
		}
		return callables;
	}

	public static void main(String[] args) {

		ExecutorService executorService = Executors.newFixedThreadPool(10);

		CompletionService<String> taskCompletionService = new ExecutorCompletionService<String>(
				executorService);

		try {
			List<Callable<String>> callables = createCallableList();
			for (Callable<String> callable : callables) {
				taskCompletionService.submit(callable);
			}
			for (int i = 0; i < callables.size(); i++) {
				Future<String> result = taskCompletionService.take();	
				System.out.println(result.get()); 
			}
		} catch (InterruptedException e) {
			// no real error handling. Don't do this in production!
			e.printStackTrace();
		} catch (ExecutionException e) {
			// no real error handling. Don't do this in production!
			e.printStackTrace();
		}
		executorService.shutdown();
	}
}

Note: The examples don’t have proper exception handling to keep it simple. Don’t copy this into your production code!

The CompletionServiceExample shows how to use a CompletionService. You create an instance of ExecutorCompletionService (the only implementation of the CompletionService interface available with Java 7 or older versions) and then you submit all Callables to the CompletionService.

As soon as a task is completed, it is put in an internal java.util.concurrent.BlockingQueue (a highly efficient queue for Producer/Consumer problems and communication between threads).

From that queue, you can get the results of the finished tasks with take. If no task is yet available, take will wait until something is available.
In this case we just print the result (the name of the thread that executed the Callable).
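
If you do not want to block indefinitely, the CompletionService interface also offers poll, which returns null when no finished task is available, either immediately or after a timeout. The following is a minimal sketch with made-up names (PollWithTimeoutDemo, resultOrTimeout) and arbitrary sleep times, not a recommendation for real timeout values.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PollWithTimeoutDemo {

    // Submits one task that sleeps for taskMillis and waits at most
    // timeoutMillis for it. Returns "done" or "timed out".
    static String resultOrTimeout(final long taskMillis, long timeoutMillis) {
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        try {
            CompletionService<String> completionService =
                    new ExecutorCompletionService<String>(executorService);
            completionService.submit(new Callable<String>() {
                @Override
                public String call() throws InterruptedException {
                    Thread.sleep(taskMillis);
                    return "done";
                }
            });
            // poll returns null if nothing has completed within the timeout
            Future<String> finished =
                    completionService.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            return finished == null ? "timed out" : finished.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // interrupts the task if it is still sleeping
            executorService.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(resultOrTimeout(50, 2000)); // prints done
        System.out.println(resultOrTimeout(2000, 50)); // prints timed out
    }
}
```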

This is all you need to know to use a CompletionService. It is really simple. There is a lot of cool stuff in the JDK and in the java.util.concurrent package. Make sure to browse through the docs from time to time before inventing your own solution.
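
To make the difference to the plain list-of-Futures approach mentioned earlier more concrete, here is a small sketch that runs the same three tasks both ways. The class name, the labels and the sleep times are made up for this illustration. With a list of Futures you get the results in submission order and block on the slowest task first; with a CompletionService you get them in the order in which they finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionOrderDemo {

    // Hypothetical stand-in for a download: sleeps, then returns its label.
    static Callable<String> task(final String label, final long millis) {
        return new Callable<String>() {
            @Override
            public String call() throws InterruptedException {
                Thread.sleep(millis);
                return label;
            }
        };
    }

    static List<Callable<String>> tasks() {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(task("slow", 600));
        tasks.add(task("fast", 100));
        tasks.add(task("medium", 350));
        return tasks;
    }

    // List of Futures: results come back in submission order.
    static List<String> inSubmissionOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Callable<String> task : tasks()) {
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // the first get() waits for "slow"
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // CompletionService: results come back in completion order.
    static List<String> inCompletionOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            CompletionService<String> service = new ExecutorCompletionService<String>(pool);
            List<Callable<String>> tasks = tasks();
            for (Callable<String> task : tasks) {
                service.submit(task);
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                results.add(service.take().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(inSubmissionOrder()); // prints [slow, fast, medium]
        System.out.println(inCompletionOrder()); // prints [fast, medium, slow]
    }
}
```

The completion order shown for the second call depends on the sleep times being far enough apart, which they are here by a wide margin.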