Category Archives: Java

Java concurrency: Understanding CopyOnWriteArrayList and CopyOnWriteArraySet

Java has a large number of useful collections, and several of them, like ConcurrentHashMap, are made specifically for use in concurrent code.

Two sometimes very useful classes are CopyOnWriteArrayList and CopyOnWriteArraySet. They implement the java.util.List and java.util.Set interfaces respectively.

Let’s focus on CopyOnWriteArrayList to understand what it is all about. Unlike ArrayList, this class is thread-safe. This means that when you use it from several threads, the list can never end up in an undefined state.
As with all data structures, it is important to understand when to use it. As the name CopyOnWrite says, a copy of the whole underlying array is made each time you modify the list, for example when you add or remove an element. As you can imagine, this can be pretty expensive when your list is large.
This means that a CopyOnWriteArrayList (and CopyOnWriteArraySet) is mostly useful when you have few modifications but many reads, because reads are very cheap and don’t require locking.

When you iterate over a CopyOnWriteArrayList or CopyOnWriteArraySet, the iterator uses a snapshot of the underlying list (or set) and does not reflect any changes made after the snapshot was taken. The iterator will never throw a ConcurrentModificationException, but it also does not support modifying the collection through it: its remove method throws an UnsupportedOperationException.

Here is a code example:


import java.util.Arrays;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteTest {

	public static void main(String[] args) throws InterruptedException {

		final CopyOnWriteArrayList<Integer> numbers = new CopyOnWriteArrayList<>(
				Arrays.asList(1, 2, 3, 4, 5));

		// new thread to concurrently modify the list
		new Thread(new Runnable() {
			@Override
			public void run() {
				try {
					// sleep a little so that for loop below can print part of
					// the list
					Thread.sleep(250);
				} catch (InterruptedException e) {
					Thread.currentThread().interrupt();
				}
				numbers.add(10);
				System.out.println("numbers:" + numbers);
			}
		}).start();

		for (int i : numbers) {
			System.out.println(i);
			// sleep a little to let other thread finish adding an element
			// before iteration is complete
			Thread.sleep(100);
		}
	}
}

Note: This is not production-ready code (no proper exception handling, etc.).

Here is the output of this code:

1
2
3
numbers:[1, 2, 3, 4, 5, 10]
4
5

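As you can see, the for loop only prints the numbers 1 to 5. The number 10 is not printed by the for loop because it was not present when the iterator’s snapshot was taken. (The exact point at which the other thread prints the list depends on thread scheduling, so your output may be interleaved slightly differently.)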
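Because the output above depends on thread scheduling, here is a minimal single-threaded sketch that shows the same snapshot behaviour deterministically. It also shows that the snapshot iterator rejects remove(). The class SnapshotDemo and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotDemo {

    // Iterates over the list while adding to it; returns the elements the iterator saw.
    static List<Integer> iterateWhileAdding(CopyOnWriteArrayList<Integer> numbers) {
        List<Integer> seen = new ArrayList<>();
        for (int i : numbers) {
            seen.add(i);
            if (i == 3) {
                numbers.add(10); // changes the list, but not the iterator's snapshot
            }
        }
        return seen;
    }

    // Returns true if the snapshot iterator refuses Iterator.remove().
    static boolean iteratorRemoveUnsupported(CopyOnWriteArrayList<Integer> numbers) {
        Iterator<Integer> it = numbers.iterator();
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<Integer> numbers =
                new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(iterateWhileAdding(numbers));        // [1, 2, 3, 4, 5]
        System.out.println(numbers);                            // [1, 2, 3, 4, 5, 10]
        System.out.println(iteratorRemoveUnsupported(numbers)); // true
    }
}
```

With an ArrayList, the same add inside the for loop would make the next call to the iterator throw a ConcurrentModificationException.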

Conclusion:
CopyOnWriteArrayList and CopyOnWriteArraySet (which is implemented with a CopyOnWriteArrayList) are special data structures for use cases where you want to share the data structure among several threads and have few writes and many reads.
Always make sure to test the performance of your code on real hardware to see how it performs in your application, and read the Javadoc for all the methods to really understand how these data structures work.
Of course you can also use CopyOnWriteArrayList and CopyOnWriteArraySet from other JVM languages like Scala, Clojure, JRuby or Groovy.

Immutable collections.
Sometimes you just need to create the list or set once and then only read from it later. In this case I recommend having a look at the immutable collections from Guava. They are always thread-safe (as is every truly immutable object) and are a better alternative to the unmodifiable wrappers that come with the JDK (for example Collections.unmodifiableList), which are only read-only views of a collection that can still be changed through the original reference. See the Guava website for why that is the case.
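To illustrate the difference between a read-only view and a truly immutable collection, here is a small sketch that uses only the JDK (the class ViewVsCopy and its helpers are hypothetical). Collections.unmodifiableList wraps the list you pass in, so later changes to that list show through the wrapper; wrapping a private copy avoids this. Guava’s ImmutableList.copyOf does the copying for you in one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ViewVsCopy {

    // Read-only view: callers cannot modify it, but changes to 'backing' remain visible.
    static <T> List<T> readOnlyView(List<T> backing) {
        return Collections.unmodifiableList(backing);
    }

    // Read-only wrapper around a private copy: later changes to 'source' are not visible.
    static <T> List<T> readOnlyCopy(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> view = readOnlyView(backing);
        List<String> copy = readOnlyCopy(backing);

        backing.add("c"); // modify the original list after creating both wrappers

        System.out.println(view); // [a, b, c]
        System.out.println(copy); // [a, b]

        try {
            view.add("d"); // both wrappers reject modification through themselves
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```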

Understanding java.util.concurrent.CompletionService

The java.util.concurrent.CompletionService is a useful interface in the JDK standard library, but few developers seem to know it.
You could live without it, since you can of course program this functionality yourself with the other interfaces and classes in java.util.concurrent, but it is convenient to have a solution that is already available and less error-prone than doing it yourself. I always prefer what is already available in the JDK over implementing my own solution with the same features, unless it is an exercise at home!

Imagine you have a list of separate tasks that take a while, e.g. 10 tasks that each download a URL and return the content as a String.
Depending on the network, the size of the downloaded content and other factors, each download will take a different amount of time.
When you execute them in parallel, you may want to start doing something with the downloaded content as soon as the first task is done. There is no need to wait for the other 9 tasks, because that would mean always waiting until all URLs are downloaded before doing anything useful with each individual result.

You can of course execute all of them, get a List of Future objects and then poll each one, but it is easier to just use a CompletionService.

See the following example:

First, a dummy Callable:

import java.util.Random;
import java.util.concurrent.Callable;

public class LongRunningTask implements Callable<String> {

	public String call() {
		// do stuff and return some String
		try {
			Thread.sleep(Math.abs(new Random().nextLong() % 5000));
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
		}
		return Thread.currentThread().getName();
	}
}

The LongRunningTask is just a placeholder for a real task you might want to implement.
In this dummy example, it just sleeps for a random amount of time and returns a String that contains the name of the current thread.

Second, an example using a CompletionService that uses the Callable above.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionServiceExample {

	// dummy helper to create a List of Callables return a String
	public static List<Callable<String>> createCallableList() {
		List<Callable<String>> callables = new ArrayList<>();
		for (int i = 0; i < 10; i++) {
			callables.add(new LongRunningTask());
		}
		return callables;
	}

	public static void main(String[] args) {

		ExecutorService executorService = Executors.newFixedThreadPool(10);

		CompletionService<String> taskCompletionService = new ExecutorCompletionService<String>(
				executorService);

		try {
			List<Callable<String>> callables = createCallableList();
			for (Callable<String> callable : callables) {
				taskCompletionService.submit(callable);
			}
			for (int i = 0; i < callables.size(); i++) {
				Future<String> result = taskCompletionService.take();	
				System.out.println(result.get()); 
			}
		} catch (InterruptedException e) {
			// no real error handling. Don't do this in production!
			e.printStackTrace();
		} catch (ExecutionException e) {
			// no real error handling. Don't do this in production!
			e.printStackTrace();
		}
		executorService.shutdown();
	}
}

Note: The examples don’t have proper exception handling to keep it simple. Don’t copy this into your production code!

The CompletionServiceExample shows how to use a CompletionService. You create an instance of ExecutorCompletionService (the only implementation of the CompletionService interface that comes with the JDK) and then submit all Callables to the CompletionService.

As soon as a task is completed, its Future is put in an internal java.util.concurrent.BlockingQueue (a highly efficient queue for producer/consumer problems and communication between threads).

From that queue, you can get the results of the finished tasks with take(). If no completed task is available yet, take() blocks until one is. (CompletionService also offers poll(), which returns null instead of waiting, and a poll variant with a timeout.)
In this case we just print the result (the name of the thread that executed the Callable).
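The key property of a CompletionService is that take() hands out results in completion order, not in submission order. The following sketch makes that visible: each task simply sleeps for a given number of milliseconds and returns that number. The class CompletionOrderDemo and its method are hypothetical, and it uses Java 8 lambdas for brevity. Because it relies on timing, the order shown is expected rather than guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionOrderDemo {

    // Submits one task per delay; each task sleeps and returns its delay.
    // Returns the results in the order in which take() delivered them.
    static List<Long> resultsInCompletionOrder(long... delaysMillis)
            throws InterruptedException, ExecutionException {
        // one thread per task, so all tasks run at the same time
        ExecutorService executor = Executors.newFixedThreadPool(delaysMillis.length);
        try {
            CompletionService<Long> completionService =
                    new ExecutorCompletionService<>(executor);
            for (final long delay : delaysMillis) {
                completionService.submit(() -> {
                    Thread.sleep(delay);
                    return delay;
                });
            }
            List<Long> results = new ArrayList<>();
            for (int i = 0; i < delaysMillis.length; i++) {
                // take() blocks until the next task (whichever it is) has finished
                results.add(completionService.take().get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // submitted as 600, 200, 400; expected to arrive as [200, 400, 600]
        System.out.println(resultsInCompletionOrder(600, 200, 400));
    }
}
```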

This is all you need to know to use a CompletionService. It is really simple. There is a lot of cool stuff in the JDK and in the java.util.concurrent package. Make sure to browse through the docs from time to time before inventing your own solution.

Short book review: Functional Programming for Java Developers

This is a short review of the book Functional Programming for Java Developers.

With FP getting a lot of attention lately, more and more Java developers want to learn about it. While Java is not really a functional language, many ideas from FP can also be used in Java, although not always as elegantly as in languages like Scala or Clojure.

In July 2011 O’Reilly published Functional Programming for Java Developers. It is written by Dean Wampler who is also a coauthor of the wonderful Programming Scala book (also available online).

This short new book provides the basics of FP for Java developers who do not yet know much about FP.

The author explains why FP is important today and why it makes sense to learn it. He explains the basics of immutable types and shows some of the common functions found in functional languages like map, foldLeft, filter, etc. The implementations are done in Java and are easy to follow. It is clear from the code and the explanations that the author really knows his stuff.

An interesting chapter explains some new concurrency ideas like actors and STM (software transactional memory). Those things are not restricted to functional programming but languages like Erlang, Haskell or Scala support some or all of those new ideas in the language itself or as libraries.

The most important things to take from this book are the ideas. Of course it is more convenient to use Scala for FP than Java (although this will get better with Java 8), but that doesn’t mean you can’t do it with Java.
Ideas are always more important than a language. If you understand a concept in one language, you will normally be able to apply it in another, even if it may be more work in one than in the other.
After reading this book, I think you will have a good basic understanding of the advantages and ideas of FP.

At the end of the book the author gives a lot of links to interesting websites, other languages and frameworks (including functional programming libraries for Java!).

What’s not to like?
Given the shortness of the book, the reader sometimes wishes for a little more detail, but the idea behind the book is to be short, so it is inevitable that not everything can be explained in great detail.
I would like to see an online supplement where the code examples from the book are also implemented in Scala and Clojure so the reader can see how much easier it is in those languages.

For the relatively low price (I paid a little less than 12 Euros), I think this is a great short book for every Java developer who is new to FP and wants to see what it is all about and how to get started using Java.
Highly recommended!

I recommend the ebook. It is just a short book which can easily be read on a computer.

After reading this book, make sure to follow the author’s advice and look at the resources for further learning.
I personally recommend learning Scala, as it combines the power of OOP and FP. I also recommend learning Haskell, as it is a pure functional language (Scala, of course, is not). I just started playing with Haskell; it will take some time to get your mind around it, but it will definitely be worth it.


A little functional Java – the map function

Functional programming is getting a lot of interest nowadays. Languages like Clojure or Scala are on the rise, and even languages like Haskell, which were formerly considered to be mostly academic, are getting more and more popular.
And more and more Java programmers are thinking about functional programming. Java 8 will get lambda expressions which will be a first step towards a more functional programming style. And there are already libraries available that help Java developers to program in a more functional style even if it is not as elegant as using Scala or Clojure.

Today I want to write a little about the map function (not to be confused with the map data structure, e.g. a HashMap in Java) which is popular in all functional programming languages I know.
It is very simple to understand. It is called on a collection (often a list) of values and is given a function as a parameter, which is applied to every element in that collection. The result is a new collection containing the transformed elements.

A first example in Scala

Let’s look at a few examples in Scala before we have a look at how to do it in Java.

val numbers = List(1,2,3,4,5)
val doubled = numbers.map(x => x * 2)

If you print doubled you will see:

List(2, 4, 6, 8, 10)

It shouldn’t be difficult to figure out what’s going on here. The map method is invoked on the numbers List. The argument passed to map is a function (in Scala you can pass functions to other functions or methods). This function is applied to every value of the collection that map is called on, and a new collection (here a scala.collection.immutable.List) containing the results is returned.
In Scala this is so common that you can write it a little shorter:

val doubled = numbers.map(_ * 2)

The _ is a placeholder for the current element to which the function is applied.

This is just a very simple function, but you can pass any function of a suitable type to map in Scala, which gives you a very powerful tool for working with collections. In fact, the Scala collection library offers A LOT more than just map, so be sure to check it out. One example can be found elsewhere on my blog.

Just for the curious, here is how to do it in Clojure:

(println (map #(* % 2) [1, 2, 3, 4, 5]))

Using Java

Java (before Java 8 at least) doesn’t allow you to pass functions to other methods/functions like Scala. But you can still use something similar using functions as objects.
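For comparison, since Java 8 the doubling example from above can be written with a lambda expression and the Stream API. A minimal sketch (StreamMapDemo and doubleAll are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamMapDemo {

    // Applies x -> x * 2 to every element and collects the results into a new list.
    static List<Integer> doubleAll(List<Integer> numbers) {
        return numbers.stream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubleAll(Arrays.asList(1, 2, 3, 4, 5))); // [2, 4, 6, 8, 10]
    }
}
```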
There are several libraries out there that help Java programmers with functional programming. One I recently came across is totallylazy. It doesn’t have much documentation yet but is under active development and the unit tests and the source code are helpful to figure out how to use it.

Here is a simple example that shows how to use map:

import com.googlecode.totallylazy.Sequence;
import static com.googlecode.totallylazy.Sequences.sequence;
import static com.googlecode.totallylazy.numbers.Numbers.increment;

public class MapTest {
    public static void main(String[] args) {
        Sequence<Integer> numbers = sequence(1, 2, 3, 4, 5, 6);
        Sequence<Number> incremented = numbers.map(increment());
        System.out.println(incremented);
    }
}

It should again be pretty obvious what’s going on here. We call map on a Sequence object and pass it increment() (which is a com.googlecode.totallylazy.Callable1) which comes with totallylazy and just increments every value in the Sequence.

totallylazy comes with a bunch of useful Callables but often you will want to supply your own version. Here is an example that multiplies every element by a given factor:

import com.googlecode.totallylazy.Callable1;
import com.googlecode.totallylazy.Sequence;
import static com.googlecode.totallylazy.Sequences.sequence;

public class MapTest {

    public static Callable1<Integer, Integer> multiplyByFactor(final int factor) {
        return new Callable1<Integer, Integer>() {
            public Integer call(Integer number) throws Exception {
                return number * factor;
            }
        };
    }
    public static void main(String[] args) {
        Sequence<Integer> numbers = sequence(1, 2, 3, 4, 5, 6);
        Sequence<Integer> multiplied = numbers.map(multiplyByFactor(5));
        System.out.println(multiplied);
        
    }
}

Here we create our own Callable1 and pass it to map. It will print:

5,10,15,20,25,30

Again, this is not very difficult, and you can check the totallylazy source code for many more examples. Of course you can also pass an anonymous class for a Callable1 if you only need it locally and don’t want to reuse it.

There are also methods called mapConcurrently that call map (surprise!) concurrently.

An alternative to totallylazy is Functional Java which has been under development for longer and might be a better choice at the moment for production ready code.

Both libraries offer a lot more than just map.

If you only need map, it is not difficult to write something similar to Callable1 yourself.
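Here is a minimal sketch of what a hand-written map could look like without any library, in pre-Java 8 style with an anonymous class. The names Function1, map and multiplyByFactor are hypothetical and chosen only for this sketch; they are not part of totallylazy or the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleMap {

    // Hypothetical one-method "function object" interface, similar in spirit to Callable1.
    interface Function1<A, B> {
        B apply(A value);
    }

    // Returns a new list with f applied to every element; the input list is not changed.
    static <A, B> List<B> map(List<A> values, Function1<? super A, ? extends B> f) {
        List<B> result = new ArrayList<>(values.size());
        for (A value : values) {
            result.add(f.apply(value));
        }
        return result;
    }

    // Anonymous-class style that also works before Java 8.
    static Function1<Integer, Integer> multiplyByFactor(final int factor) {
        return new Function1<Integer, Integer>() {
            @Override
            public Integer apply(Integer number) {
                return number * factor;
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(map(numbers, multiplyByFactor(5))); // [5, 10, 15, 20, 25, 30]
    }
}
```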

As you can see, although not difficult in Java, it requires a lot more code than the examples in Scala or Clojure.
If you want to do a lot of functional programming I highly recommend using a library like totallylazy or Functional Java – or even better a functional language like Scala or Clojure.
Scala gives you the power of both OOP and FP and lets you choose which is best for a given task.
And even if you don’t want to use Scala or Clojure I highly recommend learning them. It will make you a better Java programmer!

If you want to look at a pure functional language (a very good idea to understand the ideas behind FP), have a look at Haskell. A good starting point is the freely available book: Learn You a Haskell

How to use java.util.concurrent.CountDownLatch

In some applications that use several threads, there are situations where one thread can only start after some others have completed.
For example, imagine a program that downloads a bunch of web pages, zips them and then sends the zip file via email. If you program this in a multithreaded way, the thread that zips the downloaded web pages cannot start before the downloads are complete.
How do you do this? One very simple way is to use a CountDownLatch from the java.util.concurrent package (a package every Java developer should have a closer look at).

With a CountDownLatch you specify an initial count when you create it, and each thread calls countDown() to decrement the count by 1 once its operation has completed. A thread that calls await() on the same CountDownLatch blocks until the count reaches 0, that is, until all operations have completed, and can then do its work.

Let’s look at a simple example. The first class is a simple Runnable that does some work. In our example it does nothing really useful; it just sleeps for a random number of milliseconds to simulate some work.


import java.util.Random;
import java.util.concurrent.CountDownLatch;

public class Worker implements Runnable {

    private final CountDownLatch countDownLatch;

    public Worker(CountDownLatch countDownLatch) {
        this.countDownLatch = countDownLatch;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(getRandomMillis()); // sleep random time to simulate long running task
            System.out.println("Counting down: " + Thread.currentThread().getName());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            // count down even if interrupted; otherwise await() could block forever
            this.countDownLatch.countDown();
        }
    }

    // returns a random number of milliseconds between 0 and 9999
    private long getRandomMillis() {
        Random generator = new Random();
        return Math.abs(generator.nextLong() % 10000);
    }
}

The only really interesting line here is the call to:
this.countDownLatch.countDown();
Once the task is done, the counter in the CountDownLatch is decremented by one.

Here is the second class, which uses this Runnable:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkManager {

    private CountDownLatch countDownLatch;
    private static final int NUMBER_OF_TASKS = 5;

    public WorkManager() {
        countDownLatch = new CountDownLatch(NUMBER_OF_TASKS);
    }

    public void finishWork() {
        try {
            System.out.println("START WAITING");
            countDownLatch.await();
            System.out.println("DONE WAITING");
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public void startWork() {
        ExecutorService executorService = Executors.newFixedThreadPool(NUMBER_OF_TASKS);

        for (int i = 0; i < NUMBER_OF_TASKS; i++) {
            Worker worker = new Worker(countDownLatch);
            executorService.execute(worker);
        }
        executorService.shutdown();
    }

    public static void main(String[] args) {
        WorkManager workManager = new WorkManager();
        System.out.println("START WORK");
        workManager.startWork();
        System.out.println("WORK STARTED");
        workManager.finishWork();
        System.out.println("FINISHED WORK");
    }
}

The startWork method uses an ExecutorService (another useful and important class from java.util.concurrent) to start the Runnables.
In the method finishWork we call the await method that waits until the counter inside the CountDownLatch is 0.
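Besides the plain await() used in finishWork, CountDownLatch also has a timed await(long, TimeUnit) that returns false if the count has not reached 0 before the timeout expires. A minimal sketch (the class TimedAwaitDemo and its helper awaitWithTimeout are hypothetical):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimedAwaitDemo {

    // Waits at most timeoutMillis for the latch; true if the count reached 0 in time.
    static boolean awaitWithTimeout(CountDownLatch latch, long timeoutMillis)
            throws InterruptedException {
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);

        latch.countDown(); // count is now 1
        System.out.println(awaitWithTimeout(latch, 100)); // false: one task still outstanding

        latch.countDown(); // count is now 0
        System.out.println(awaitWithTimeout(latch, 100)); // true: returns immediately

        latch.countDown(); // no effect: the count stays at 0
        System.out.println(latch.getCount()); // 0
    }
}
```

A CountDownLatch cannot be reset: once the count is 0, it stays there. If you need a barrier that can be reused, CyclicBarrier is the class to look at.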

If you run this example you get the following output:

START WORK
WORK STARTED
START WAITING
Counting down: pool-1-thread-3
Counting down: pool-1-thread-4
Counting down: pool-1-thread-1
Counting down: pool-1-thread-5
Counting down: pool-1-thread-2
DONE WAITING
FINISHED WORK

As you can see, 5 different threads are started, and the finishWork method does not complete its work until the CountDownLatch is at 0.

As you can see, using a CountDownLatch is very easy. There are other similar classes in java.util.concurrent, like CyclicBarrier, which are worth looking at. In upcoming posts, I will write more about the java.util.concurrent package and its useful classes, interfaces and methods.
Instead of the ExecutorService you can just use java.lang.Thread but I recommend always using the higher level ExecutorService whenever possible. With all the stuff in java.util.concurrent, there is rarely a need to use the low level classes like java.lang.Thread (which does not mean you shouldn’t know how they work!).

BitSets in Scala are much more fun than in Java

Recently I played with BitSets in Java because I needed an efficient way to store a huge number of non-negative integer values.
For Java there is the java.util.BitSet class. It is a very efficient implementation when you only need to store bit values.

Here is a trivial example of how to use it:


import java.util.BitSet;

public class BitSetTest {
    
    public static void main(String args[]) {
        
        BitSet primeBits = new BitSet();
        primeBits.set(2);
        primeBits.set(3);
        primeBits.set(5);
        primeBits.set(7);
        primeBits.set(11);
        
        BitSet evenBits = new BitSet();
        evenBits.set(0);
        evenBits.set(2);
        evenBits.set(4);
        evenBits.set(6);
        evenBits.set(8);
        evenBits.set(10);
        
        //primeBits.and(evenBits);  // will result in primeBits only containing 2
        primeBits.andNot(evenBits);
        
        System.out.println(primeBits); // {3, 5, 7, 11}
        
    }
}

It is very easy to understand and use. But it seems to require quite a lot of code, and using methods like “and” doesn’t feel as natural as using “&” to get the intersection of two BitSets or “&~” to get the difference. In a language like C++ you can use operator overloading to get a more natural way of dealing with those operations.
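One reason the Java version needs more code is that and() and andNot() modify the BitSet they are called on, so the example above changes primeBits in place. If you want to keep the original sets, as the Scala operators do, you can work on a copy. A minimal sketch; the class BitSetOps and the helpers bits, intersection and difference are hypothetical:

```java
import java.util.BitSet;

class BitSetOps {

    // Builds a BitSet with the given bit indices set.
    static BitSet bits(int... indices) {
        BitSet result = new BitSet();
        for (int index : indices) {
            result.set(index);
        }
        return result;
    }

    // Non-destructive intersection (like Scala's "&"): and() mutates, so work on a clone.
    static BitSet intersection(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Non-destructive difference (like Scala's "&~").
    static BitSet difference(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet primeBits = bits(2, 3, 5, 7, 11);
        BitSet evenBits = bits(0, 2, 4, 6, 8, 10);
        System.out.println(intersection(primeBits, evenBits)); // {2}
        System.out.println(difference(primeBits, evenBits));   // {3, 5, 7, 11}
        System.out.println(primeBits);                         // {2, 3, 5, 7, 11}, unchanged
    }
}
```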

After playing with the Java code, I looked up BitSets in Scala and with Scala using BitSets is much more fun.
See here for a simple example:


import scala.collection.immutable.BitSet
object BitsetTest {

  def main(args : Array[String]) : Unit = {

    val primebits = BitSet(2, 3, 5, 7, 11)
    val evenBits =  BitSet(0, 2, 4, 6, 8, 10)
    
    val evenSet = Set(0, 2, 4, 6, 8, 10);
  
    println(primebits & evenBits)  // BitSet(2)
    println(primebits & evenSet)  // BitSet(2)
    
    println(primebits &~ evenBits)  // BitSet(3, 5, 7, 11)
    println(primebits &~ evenSet)   // BitSet(3, 5, 7, 11)

 }

}

Scala BitSets have many advantages over Java BitSets:

  • In Scala you can add values to a BitSet when you create it. In Java you can only create an empty BitSet (optionally specifying its initial size), and then you have to use the set method to add the individual elements.
  • Scala allows you to define methods like “&” “&~” or “+” which allows for a more concise and natural code. Some more examples:
    
        // add single integers to the set
        val morePrimes = primebits + 13 + 17  
    
    
        val primeList = List(19, 23, 29)
        
         // using ++ you can add elements from other collections
        println(morePrimes ++ primeList)  // BitSet(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
       
        // remove 11
        println(morePrimes - 11)  // BitSet(2, 3, 5, 7, 13, 17)
    
        // remove all elements in evenBits from morePrimes
        println(morePrimes -- evenBits) // BitSet(3, 5, 7, 11, 13, 17)
    
    
  • In Scala BitSet comes in two versions: scala.collection.immutable.BitSet and scala.collection.mutable.BitSet. They are almost identical but the mutable version changes the bits in place. This is the same behaviour as in the Java java.util.BitSet class and is slightly faster than the immutable one (no copying required). I prefer the immutable one when performance is not an issue (make sure to profile to see if you really need the mutable one) because immutable data structures are much better for concurrency.
  • In Scala the BitSet classes are part of the Scala collection framework and offer all the many methods available for other collections.
    Here are a few examples:

    
      println(primebits.filter(_ % 2 == 0))  // BitSet(2)
        
      println(primebits.filterNot(_ % 2 == 0))  // BitSet(3, 5, 7, 11)
        
      println(primebits.map(_ * 3))  // BitSet(6, 9, 15, 21, 33)
    
    

    In Java you can’t use the features of the collection framework like the new style for loop introduced with Java 5 because java.util.BitSet is not part of the collection framework. There are methods to iterate over the java.util.BitSet but it is not as convenient as using a java.util.List.

  • The Scala code also allows you to combine BitSets with other collections like Sets in methods like “&”, because those methods accept other kinds of Scala sets as well, not just BitSets.
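For comparison, this is roughly what iterating over a java.util.BitSet looks like, using the nextSetBit loop idiom from the BitSet Javadoc. The class BitSetIteration and its setBits method are hypothetical names, and stream() requires Java 8 or later:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitSetIteration {

    // Collects all set bits in ascending order with the nextSetBit loop.
    static List<Integer> setBits(BitSet bits) {
        List<Integer> result = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            result.add(i);
            if (i == Integer.MAX_VALUE) {
                break; // avoid overflow of i + 1
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet primeBits = new BitSet();
        for (int p : new int[] {2, 3, 5, 7, 11}) {
            primeBits.set(p);
        }
        System.out.println(setBits(primeBits)); // [2, 3, 5, 7, 11]
        // Since Java 8, stream() returns the indices of the set bits as an IntStream.
        System.out.println(primeBits.stream().sum()); // 28
    }
}
```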

As you can see from just a few simple examples, the Scala BitSets are much more powerful, require less code and result in more natural code that is easier to read than the Java BitSet.
This is just one example of how you can use Scala to express ideas much more concisely and clearly than in Java. If you haven’t yet looked at Scala, I highly recommend giving it a try, even if you never use BitSets :-)

To see all the different BitSet implementations and available methods see the Scala Docs:
http://www.scala-lang.org/api/current/scala/collection/BitSet.html

Using DirectoryStreams in Java 7

Java 7 comes with lots of new stuff for IO. The new interfaces and classes added to the java.nio packages contain lots of useful functionality for working with files and other things like asynchronous IO.

Here I want to show you a little bit about the interface DirectoryStream which is very useful when you want to work with the content of a directory (e.g. the files it contains).

In order to create a new DirectoryStream, you use one of the methods in the new utility class java.nio.file.Files. This class contains tons of useful methods for working with files and directories, and several of them (the newDirectoryStream overloads) create a new DirectoryStream. DirectoryStream extends Iterable, so it can be used like a collection in a for-each loop:
for (Path p : directoryStream)
which is very convenient. Unlike a normal collection, however, a DirectoryStream can only be iterated once.

Below are a few methods that show you how to use a directory stream:

First we define a simple utility method:

private static void printStreamInfo(DirectoryStream<Path> dirStream) {
    for (Path path : dirStream) {
        System.out.println("Filename: " + path.getFileName());
    }
}

This method just iterates over a DirectoryStream and prints out the filename. The interface Path is also new in Java 7 and represents a file system path like a file or a directory. It also contains a huge amount of useful methods. Make sure to check it out.

Here is a method that prints the file name for all files in a given directory (it does not walk into subdirectories. For this you will need the FileVisitor interface, which I will describe in an upcoming post).

private static void printInfoForAllFiles(Path dirPath) {
    try (DirectoryStream<Path> dirStream = Files.newDirectoryStream(dirPath)) {
        printStreamInfo(dirStream);
    } catch (IOException ex) {
        //no valid exception handling for production code!!!
        System.out.println("Cannot open dir " + dirPath + ": " + ex.getMessage());
    }
}

This method is very simple. It just creates a new DirectoryStream and then calls the method printStreamInfo shown above. The code uses the try-with-resources feature that comes with Java 7, which closes the DirectoryStream automatically. You can also catch a DirectoryIteratorException, which is a RuntimeException thrown when there is an I/O error while iterating over the DirectoryStream.
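The methods in this post are fragments that assume the usual java.nio.file and java.io imports. Here is a self-contained sketch along the same lines that creates a temporary directory, lists its direct entries and sorts the names, because a DirectoryStream returns entries in no particular order. The class DirectoryListing and its fileNames method are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DirectoryListing {

    // Returns the sorted file names of the direct entries of dir (no recursion).
    static List<String> fileNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the stream and releases the open directory handle
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path path : stream) {
                names.add(path.getFileName().toString());
            }
        }
        Collections.sort(names); // iteration order of a DirectoryStream is unspecified
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dirstream-demo");
        Files.createFile(dir.resolve("Main.java"));
        Files.createFile(dir.resolve("script.rb"));
        Files.createDirectory(dir.resolve("sub"));
        System.out.println(fileNames(dir)); // [Main.java, script.rb, sub]
    }
}
```

The subdirectory sub shows up as an entry, but its contents are not listed. The demo leaves the temporary directory in place.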

Here is another, very similar method:

private static void printInfoForFilesWithPattern(Path dirPath, String globPattern) {
    try (DirectoryStream<Path> dirStream = Files.newDirectoryStream(dirPath, globPattern)) {
        printStreamInfo(dirStream);
    } catch (IOException ex) {
        //no valid exception handling for production code!!!
        System.out.println("Cannot open dir " + dirPath + ": " + ex.getMessage());
    }
}

The difference here is the glob pattern. You can use it to show only files whose names match a given pattern. For example, to list only the Ruby files in your directory, use "*.rb". To show both Python and Ruby files, you can use "*.{rb,py}". There are many more patterns available. See here for detailed documentation:
http://download.oracle.com/javase/tutorial/essential/io/fileOps.html
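To experiment with glob patterns without creating any files, you can use a PathMatcher from the default file system; Files.newDirectoryStream(dir, glob) uses the same glob syntax as FileSystem.getPathMatcher. A small sketch (GlobDemo and matchesGlob are hypothetical names):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobDemo {

    // Tests a single file name against a glob pattern.
    static boolean matchesGlob(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        System.out.println(matchesGlob("*.{rb,py}", "script.rb")); // true
        System.out.println(matchesGlob("*.{rb,py}", "tool.py"));   // true
        System.out.println(matchesGlob("*.{rb,py}", "Main.java")); // false
        System.out.println(matchesGlob("*rb", "herb"));            // true: no dot required
    }
}
```

The last line shows why "*.rb" is usually safer than "*rb": without the dot, any name that merely ends in "rb" matches.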

You can call the above method like this:

 // only java files
 printInfoForFilesWithPattern(Paths.get("/home/markus/temp/"), "*java");
        
 // all files ending with "a", for example all java and scala files
 printInfoForFilesWithPattern(Paths.get("/home/markus/temp/"), "*a");
        
 // all files starting with "L"
 printInfoForFilesWithPattern(Paths.get("/home/markus/temp/"), "L*");
        
 // all Ruby and Python files (or other files ending with "rb" or "py")
 printInfoForFilesWithPattern(Paths.get("/home/markus/temp/"), "*.{rb,py}");

The last example uses the DirectoryStream.Filter interface to list only files that are larger than a given number of kilobytes.
This interface defines only the method accept(T entry), which returns true if an entry conforms to the rules specified by the filter.

private static void printInfoForLargeFiles(Path dirPath, final int sizeInKB) {
    // before Java 9, the diamond operator cannot be used with anonymous classes
    DirectoryStream.Filter<Path> largeFileFilter = new DirectoryStream.Filter<Path>() {

        @Override
        public boolean accept(Path path) throws IOException {
            // 1024L avoids int overflow for large sizeInKB values
            return Files.size(path) > sizeInKB * 1024L;
        }
    };
        
    try (DirectoryStream<Path> dirStream = Files.newDirectoryStream(dirPath, largeFileFilter)) {
        printStreamInfo(dirStream);
    } catch (IOException ex) {
        //no valid exception handling for production code!!!
        System.out.println("Cannot open dir " + dirPath + ": " + ex.getMessage());
    }
}

As you can see, using the new DirectoryStream and DirectoryStream.Filter is very simple. All the new methods, classes and interfaces added to Java 7 for file IO make working with files very easy and convenient. If you already use Java 7 and have to work with files, make sure to have a close look.

Much more detailed documentation can be found in the Java Tutorial for Java 7 on the Oracle website:
File I/O (Featuring NIO.2)

How to make Java classes immutable

Immutable classes have been a hot topic lately. The rise of functional languages who operate mostly on immutable data and the advantage of immutable data when using multiple threads (correct immutable classes are thread safe) have also created a new interest in immutable data in Java.
While some languages like Scala encourage (but do not enforce) a programming style using immutable classes, in the Java world this is less common, but it can be done nonetheless. In fact, many classes in the JDK are immutable: the wrapper classes like Long, Integer, Double, etc. are all immutable, and so are BigInteger, BigDecimal and, of course, String.
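A quick way to see this in action: BigInteger's add method does not change the object it is called on but returns a new BigInteger. The class name ImmutableJdkDemo is just a hypothetical name for this sketch.

```java
import java.math.BigInteger;

// Hypothetical demo class: BigInteger operations return new objects
class ImmutableJdkDemo {

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(40);
        BigInteger b = a.add(BigInteger.valueOf(2)); // creates a new BigInteger

        System.out.println(a); // 40, a itself is unchanged
        System.out.println(b); // 42
    }
}
```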

Immutable classes are also more secure. With mutable classes, an attacker could change the members of your objects and do bad stuff with them. For example, he could subclass your classes and send an email containing private data from one of the overridden methods.
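Here is a small sketch of the subclassing problem. The classes Price and SneakyPrice are hypothetical: Price has a final field and no setters, but because the class itself is not final, a subclass can override the getter and report a value that changes on every call. Declaring Price as final would turn SneakyPrice into a compile-time error.

```java
// Hypothetical class: final field, no setters, but the class itself is NOT final
class Price {
    private final int amount;

    Price(int amount) {
        this.amount = amount;
    }

    public int getAmount() {
        return amount;
    }
}

// A subclass that makes a supposedly immutable Price behave like a mutable one
class SneakyPrice extends Price {
    private int calls = 0;

    SneakyPrice(int amount) {
        super(amount);
    }

    @Override
    public int getAmount() {
        calls++; // hidden mutable state
        return super.getAmount() * calls;
    }
}

class SubclassDemo {
    public static void main(String[] args) {
        Price price = new SneakyPrice(10);
        System.out.println(price.getAmount()); // 10
        System.out.println(price.getAmount()); // 20
    }
}
```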

In this article, I show you how to turn an ordinary mutable class into an immutable one.

A mutable class

Let’s imagine your boss wants a new class that represents a bill for an online shop. Here is a first example of a mutable class called Bill. This is of course not a realistic example of a real online shop. :-)


import java.util.Date;

public class Bill {
    
    private int amount;
    private Date date;

    public Bill(int amount, Date date) {
        this.amount = amount;
        this.date = date;
    }

    public int getAmount() {
        return amount;
    }

    public void setAmount(int amount) {
        this.amount = amount;
    }

    public Date getDate() {
        return date;
    }

    public void setDate(Date date) {
        this.date = date;
    }
}

In this version of the class, all the members can be changed after instances of the class have been created.

An immutable class

Let’s make this class immutable.


import org.joda.time.DateTime;

public final class Bill {
	 
    private final int amount;
    private final DateTime dateTime;
 
    public Bill(int amount, DateTime dateTime) {
        this.amount = amount;
        this.dateTime = dateTime;
    }
 
    public int getAmount() {
        return amount;
    }
 
    public DateTime getDateTime() {
        return dateTime;
    }
}

In this example, several changes have been made to make the class immutable:

  • The class is final. That means no subclasses can be created. A subclass of an immutable class can be made mutable again. As noted above, an attacker could use that to get to confidential data.
  • The variables are all final and cannot be changed after construction.
  • In the constructor we use the org.joda.time.DateTime class from the Joda-Time library instead of java.util.Date. DateTime is a better choice because it is immutable. Using a java.util.Date would be dangerous because it is mutable: the caller keeps a reference to the Date it passed in and could modify it later (possibly from another thread), and we can't control that.
  • There are no setter methods for the members.
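If you have to stick with java.util.Date (for example, because you can't add Joda-Time to your project), a common workaround is defensive copying: copy the Date in the constructor and again in the getter, so no caller ever holds a reference to the internal object. This is a minimal sketch of that technique, not the post's Bill class; the class name DateBill is hypothetical.

```java
import java.util.Date;

// Hypothetical variant of Bill that keeps a java.util.Date but copies it defensively
final class DateBill {

    private final int amount;
    private final Date date;

    DateBill(int amount, Date date) {
        this.amount = amount;
        // copy, so later changes to the caller's Date don't affect this bill
        this.date = new Date(date.getTime());
    }

    int getAmount() {
        return amount;
    }

    Date getDate() {
        // return a copy, so callers can't modify the internal Date
        return new Date(date.getTime());
    }

    public static void main(String[] args) {
        Date original = new Date(0L);
        DateBill bill = new DateBill(100, original);

        original.setTime(1000L);       // change the caller's Date
        bill.getDate().setTime(2000L); // change the returned copy

        System.out.println(bill.getDate().getTime()); // 0
    }
}
```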

This version of the Bill class is immutable. Now imagine your boss calls you and tells you that you need to implement another method which increases the amount of the bill after the Bill object has already been created. At first you try to explain that changing the state of the object is not possible, but your boss insists on this change.
You think a little bit and come up with this design for the new method:


public Bill addAmount(int amount) {
    // return a new Bill instead of changing this one
    return new Bill(this.amount + amount, dateTime);
}

This does the trick. Instead of changing the internal state of the Bill object in a void method, you create a completely new Bill object and return it. The caller of the addAmount method will have to use this new object if he wants to use the correct bill. This is similar to methods like replace in the String class: they don't change the string on which you call the method but return a new String object.
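The following sketch shows that the original Bill really stays unchanged. To keep it self-contained, it assumes Java 8 or later and uses java.time.LocalDate (which is also immutable) as a stand-in for Joda-Time's DateTime; the date values are made up for the example.

```java
import java.time.LocalDate;

// Sketch of the immutable Bill; java.time.LocalDate stands in for Joda-Time's DateTime
final class Bill {

    private final int amount;
    private final LocalDate date;

    Bill(int amount, LocalDate date) {
        this.amount = amount;
        this.date = date;
    }

    int getAmount() {
        return amount;
    }

    LocalDate getDate() {
        return date;
    }

    // Returns a new Bill instead of changing this one
    Bill addAmount(int amount) {
        return new Bill(this.amount + amount, date);
    }

    public static void main(String[] args) {
        Bill bill = new Bill(100, LocalDate.of(2011, 6, 1));
        Bill updated = bill.addAmount(50);
        System.out.println(bill.getAmount());    // 100
        System.out.println(updated.getAmount()); // 150

        // String works the same way: replace returns a new String
        String s = "Scala";
        String t = s.replace('S', 's');
        System.out.println(s + " " + t); // Scala scala
    }
}
```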

Using immutable collections in immutable classes

Now your boss comes again and tells you that the Bill object must also keep a list of orders.
In order to do that, first you have to make an immutable Order object.


public final class Order {
    
    private final int id;

    public Order(int id) {
        this.id = id;
    }

    public int getId() {
        return id;
    }
}

The new version of the Bill object now looks like this:


import org.joda.time.DateTime;

import com.google.common.collect.ImmutableList;

public final class Bill {

	private final int amount;
	private final DateTime dateTime;
	private final ImmutableList<Order> orders;

	public Bill(int amount, DateTime dateTime, ImmutableList<Order> orders) {
		this.amount = amount;
		this.dateTime = dateTime;
		this.orders = orders;
	}

	public ImmutableList<Order> getOrders() {
		return orders;
	}

	public int getAmount() {
		return amount;
	}

	public DateTime getDateTime() {
		return dateTime;
	}

	public Bill addAmount(int amount) {
		return new Bill(this.amount + amount, dateTime, orders);
	}

	public Bill addOrder(Order newOrder) {
		ImmutableList<Order> newOrderList = new ImmutableList.Builder<Order>()
				.addAll(orders).add(newOrder).build();
		return new Bill(this.amount, dateTime, newOrderList);
	}
}

This version uses a final field of type com.google.common.collect.ImmutableList. This class is part of the Google Guava library, which provides many different immutable collections (see here for more details: ImmutableCollectionsExplained).

The addOrder method creates a new com.google.common.collect.ImmutableList and then creates a new Bill object, similar to the addAmount method.
The caller of the addOrder method will have to use the newly returned object to use the correct Bill instance.

Note: com.google.common.collect.ImmutableList implements the java.util.List interface but I normally use the com.google.common.collect.ImmutableList in the type declaration to make it clear that I want this object to be immutable.
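If you can't use Guava, you can get a similar effect with the JDK alone. This sketch swaps in a different technique than the post: a defensive copy wrapped in Collections.unmodifiableList instead of Guava's ImmutableList. The date field is left out to keep it short. Note that the declared type is only java.util.List here, so the immutability is not visible in the type the way it is with ImmutableList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Immutable Order, as in the post
final class Order {
    private final int id;

    Order(int id) { this.id = id; }

    int getId() { return id; }
}

// Sketch of Bill using only the JDK (no Guava, no date field)
final class Bill {
    private final int amount;
    private final List<Order> orders;

    Bill(int amount, List<Order> orders) {
        this.amount = amount;
        // copy first, so the caller's list can't change our orders later
        this.orders = Collections.unmodifiableList(new ArrayList<>(orders));
    }

    int getAmount() { return amount; }

    // The returned view throws UnsupportedOperationException on modification
    List<Order> getOrders() { return orders; }

    // Returns a new Bill with one more order; this Bill stays unchanged
    Bill addOrder(Order newOrder) {
        List<Order> newOrders = new ArrayList<>(orders);
        newOrders.add(newOrder);
        return new Bill(amount, newOrders);
    }

    public static void main(String[] args) {
        Bill bill = new Bill(100, Arrays.asList(new Order(1)));
        Bill bigger = bill.addOrder(new Order(2));
        System.out.println(bill.getOrders().size());   // 1
        System.out.println(bigger.getOrders().size()); // 2
    }
}
```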

What about performance?

You may wonder about performance. Creating new objects in methods like addAmount or addOrder is more expensive than changing a field of a mutable object, and addOrder even copies the whole list of orders. In some situations this can be a disadvantage of immutable classes, but in most projects it probably won't matter. To be sure, you should of course profile and test your application.

Where possible, prefer immutable classes for thread safety and security. If you come from functional languages like Haskell or Lisp, this will feel very natural to you anyway. If you've been using mutable classes mostly, this may require some new thinking but could greatly improve your code. Of course, you always have to decide for each class you develop whether it makes sense.

Why Scala seems difficult but really isn’t

When I learned C++ and Java a long time ago, I loved Bruce Eckel's books Thinking in C++ (two volumes) and Thinking in Java.
They had very clear and detailed explanations and I learned a lot from them. So I value his opinion. Bruce also often wrote very positively about Python, a language I like a lot.

A few days ago, he published a great article about Scala:

“Scala: The Static Language that Feels Dynamic”

It is a very interesting article which shows that Scala is not complex – at least not more complex than Java. He writes “… Scala should be a lot easier than learning Java!”.

I agree. When you use the subset of Scala that lets you do with Scala what you can do with Java, it is not more difficult at all, probably even easier than Java.

Here are a few reasons why I think Scala seems more difficult:

1) Programmers think they can master Scala within a few days or weeks

Programmers are used to Java, and learning something new is always an effort. Not everyone is willing to really make that effort. Truly mastering Java takes years of practice. Yet some programmers who have been programming in Java for 10 years or more expect to become as proficient in Scala as they are in Java within a few weeks. That's not how it works, no matter how easy a language is. You can write simple or even somewhat complex programs in Scala after studying it for a few days, but truly mastering it takes a lot of time and effort. That effort can be very rewarding, though. I am still relatively new to Scala and don't claim to be an expert or even to understand everything about the language. But whenever I play with it or learn or read something about it, I have fun and learn something. And I know that becoming an expert takes time. If you have just started with Scala, give it time. When you are stuck, ask on the mailing lists or try a different website or book. Don't give up. Even if you end up not liking Scala very much, you will definitely become a better programmer by learning it.

2) Programmers don’t know anything or not much about functional programming

Java is not a functional programming language. If a programmer has been using mostly Java and C++ in recent years, he may have created great software with Java in an object-oriented way. This works great with Java and there is absolutely nothing wrong with that. I love Java, and OOP is a great way of building software. But when people start playing with Scala, they are also confronted with functional programming. You can use Scala in a purely OOP way, but you will get more out of the language if you also master functional programming (FP). Because most programmers don't have much experience with it, it seems very difficult and strange at first. But as I wrote above, you have to give it some time to really understand it. It will be worth it, and you will even think differently about your Java code once you've played more with higher-order functions, closures and recursion. FP is not always better or worse than OOP. It is another tool that is good to have in your toolbox. Sometimes FP is a better fit, sometimes OOP and sometimes a combination. This is why Scala supports both.

3) Many Scala websites and blogs can seem intimidating to beginners

When someone is really good at a language, he or she often wants to show it. This is why Ruby sometimes looks very complicated when a Ruby guru writes a blog post with 50 lines of code and 10 different metaprogramming techniques in them.
The same is true for Scala. In many blog posts and websites, you find articles about monads, advanced FP, very concise but not necessarily very readable code, and other things you won't need in your daily work and that are far too confusing for a beginner.
This can be and sometimes is very intimidating for beginners who just want to read a file with Scala.
That doesn’t mean I don’t like those blogs, but I think the Scala community should publish more simple stuff.

4) Lack of a good cookbook for Scala

Many programmers learn from examples of how to do everyday tasks like opening files, sending an email or building a socket server. For many languages there are great cookbooks with hundreds of recipes about how to do that. Such a book does not yet exist for Scala. I think it could really help to bring even more people to this wonderful language.

Conclusion: Scala is not difficult – just keep on learning

These are just a few reasons why I think Scala sometimes seems more difficult than it really is. If you don't know Scala yet, I highly recommend reading Bruce Eckel's article mentioned above.
If you already know some Scala but struggle a bit to grasp some of its concepts, keep on learning. As I wrote, you cannot become a Scala expert within 3 weeks. Keep pushing until you are comfortable with Scala and you will very likely really love it. And if you don't, you will nonetheless have learned a lot. (Btw, this holds true for all languages.)

File system events with Java 7

In the last post, I showed how to listen to Linux file system events using C, Ruby and Python.
In this post, we look at Java 7. Java 7 has several new classes in the java.nio.file package that let you listen to file system events. The set of available events is not as extensive as in the C, Ruby and Python examples.
If you want to use the inotify mechanism directly in Java, look at the following libraries:
JNotify
inotify-java

In this example, we only look at the features of the new classes that come with Java 7. For this test, I used jdk-7-ea-bin-b144 64 bit on a Linux machine. It should work exactly like that when Java 7 is final.

Here is the source code:


package com.markusjais;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchEvent.Kind;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Simple class to watch directory events.
class DirectoryWatcher implements Runnable {

    private Path path;

    public DirectoryWatcher(Path path) {
        this.path = path;
    }

    // print the events and the affected file
    private void printEvent(WatchEvent<?> event) {
        Kind<?> kind = event.kind();
        if (kind.equals(StandardWatchEventKinds.ENTRY_CREATE)) {
            Path pathCreated = (Path) event.context();
            System.out.println("Entry created:" + pathCreated);
        } else if (kind.equals(StandardWatchEventKinds.ENTRY_DELETE)) {
            Path pathDeleted = (Path) event.context();
            System.out.println("Entry deleted:" + pathDeleted);
        } else if (kind.equals(StandardWatchEventKinds.ENTRY_MODIFY)) {
            Path pathModified = (Path) event.context();
            System.out.println("Entry modified:" + pathModified);
        }
    }

    @Override
    public void run() {
        // try-with-resources closes the WatchService even when the thread is interrupted
        try (WatchService watchService = path.getFileSystem().newWatchService()) {
            path.register(watchService, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE);

            // loop forever to watch directory
            while (true) {
                WatchKey watchKey;
                watchKey = watchService.take(); // this call is blocking until events are present

                // poll for file system events on the WatchKey
                for (final WatchEvent<?> event : watchKey.pollEvents()) {
                    printEvent(event);
                }

                // if the watched directory gets deleted, get out of run method
                if (!watchKey.reset()) {
                    System.out.println("No longer valid");
                    watchKey.cancel();
                    watchService.close();
                    break;
                }
            }

        } catch (InterruptedException ex) {
            System.out.println("interrupted. Goodbye");
            return;
        } catch (IOException ex) {
            ex.printStackTrace();  // don't do this in production code. Use a logging framework
            return;
        }
    }
}

public class FileEventTest {

    public static void main(String[] args) throws InterruptedException {
        Path pathToWatch = FileSystems.getDefault().getPath("/tmp/java7");
        DirectoryWatcher dirWatcher = new DirectoryWatcher(pathToWatch);
        Thread dirWatcherThread = new Thread(dirWatcher);
        dirWatcherThread.start();
        
        // interrupt the program after 10 seconds to stop it.
        Thread.sleep(10000);
        dirWatcherThread.interrupt();

    }
}


This is a simple example of how to use the new classes. I created a new Thread that listens in an infinite loop for changes in the directory “/tmp/java7”. For each event (when a file is created, modified or deleted), the kind of event and the file name are printed to stdout. Note that this also works when creating or deleting directories.

Basically, you create a WatchService and register the directory to watch (together with the kinds of events to watch for). Then you loop forever: take a WatchKey from the WatchService (take blocks until events are available), retrieve the pending events with pollEvents, and do something with them, like printing them as in this example. When you are done processing the events, reset the WatchKey so that it can receive new events.
The method reset returns true if the WatchKey is still valid. When you delete the watched directory, it returns false and in this example, the code breaks out of the while loop and terminates.
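The same steps can also be combined with a timeout. This is a minimal sketch, assuming a local file system that supports WatchService. Instead of take(), which blocks indefinitely, it uses poll(timeout, unit), so the loop can end on its own without interrupting the thread. The class CreateEventCheck and the method createAndObserve are hypothetical names. How quickly events arrive depends on the platform, which is why the caller passes in how many seconds to wait.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: registers dir, creates fileName in it and waits
// up to timeoutSeconds for the matching ENTRY_CREATE event
class CreateEventCheck {

    static boolean createAndObserve(Path dir, String fileName, long timeoutSeconds)
            throws IOException, InterruptedException {
        try (WatchService watchService = dir.getFileSystem().newWatchService()) {
            dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
            Files.createFile(dir.resolve(fileName));

            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
            while (System.nanoTime() < deadline) {
                // poll with a timeout instead of take(), so the loop can end on its own
                WatchKey key = watchService.poll(1, TimeUnit.SECONDS);
                if (key == null) {
                    continue; // no events yet, try again until the deadline
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    // check the kind first: OVERFLOW events may have a null context
                    if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE
                            && fileName.equals(event.context().toString())) {
                        return true;
                    }
                }
                if (!key.reset()) {
                    return false; // the watched directory is no longer accessible
                }
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watch-demo");
        System.out.println(createAndObserve(dir, "hello.txt", 15));
    }
}
```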

Note that in a real production system, you would probably not use System.out.println but do something else, like updating the directory view in a file manager, sending an email (for example, when watching a directory for activities that are not allowed) or other actions.

In this example, I interrupt the program after 10 seconds. This is just to show you how to end watching a directory.

To test it, create the directory “/tmp/java7” and then create, modify and delete a few files in it. To see the reset method in action, remove the directory. If you want to play longer than 10 seconds, just remove the call to interrupt at the end of the main method.

For more information, see the javadoc of Java 7:
http://download.oracle.com/javase/7/docs/api/java/nio/file/WatchService.html