jga & the java 1.5 forloop

Intro

One of the new features in Java 1.5 is the new forloop syntax. The new forloop is analagous to a foreach construct that is common in other languages. The syntax hides the existance of an iterator, and is even extended to cases that didn't have an iterator before (arrays). It also extends to enumerated types.

Using the old syntax, looping through a collection would be expressed as:

List<Fruit> fruits = Arrays.asList(
     apple, banana, blackberry, blueberry, cherry, grape, grapefruit,
     lemon, lime, mango, orange, pear, strawberry, tomato, watermelon);
     
...

for (Iterator<Fruit> iter = fruits.iterator(); iter.hasNext(); ) {
    Fruit f = iter.next();
    // perform some fruit-based operation on all fruits
} 

The more concise syntax allows us to do the same work without having to describe the iterator specifically: since the bulk of those cases is boilerplate, abstracting it away is generally to be considered a good thing.

for (Fruit fruit : fruits) {
    // perform the same operation
}

So, if hiding the iterator is a step forward, can we take things another step and capture more of the common patterns as part of the loop mechanism? The answer is 'sort of'. We can't completely hide the fact that decisions are being made, and that some elements are being processed. What we can do is hide a lot of the mechanics of the process, using the algorithmic support wrapped up in static methods defined in the net.sf.jga.util.Iterables class.

note: all of the code can be found here.

Filter

One of the more common idioms using the new forloop syntax will certainly be to conditionally execute the body of the forloop.

for (Fruit f : fruits) {
    if (f /* meets some condition */ ) {
  	    System.out.println(f);
    }
}

Using filter allows the test to become part of the loop itself.

import static net.sf.jga.util.Iterables.*;
...
for(Fruit f : filter(fruits, growsOnTree)) {
    System.out.println(f);
}

To accomplish this, we need to build the filter:

import net.sf.jga.fn.UnaryFunctor;
import net.sf.jga.fn.property.CompareProperty;

UnaryFunctor<Fruit,Boolean> growsOnTree =
      new CompareProperty<Fruit,PlantType>(Fruit.class,"Type",PlantType.tree);
...

Note that we're assuming that in the class Fruit exists the method

public PlantType getType() { ... }

OK -- so what. Does anyone really expect that this is easier to read then the alternative?

for(Fruit f : fruits) {
    if (f.getType() == PlantType.tree)
        System.out.println(f);      
}

At the point of call, it is, somewhat. Admittedly, the conceptual overhead of declaring the functor is intimidating at first. While it appears that doing filtering with a functor instead of an embedded conditional statement doesn't pay off in the simplest cases, there is considerably more power available when using the functor form.

The declaration of the growsOnTree filter is certainly less readable than the traditional if test, but its major advantage is that it is a declaration. It can appear anywhere in the source file, giving us a way to separate the selection logic from the procedural logic.

Also, because functors are full-fledged objects, they can be stored outside of the source file entirely: in a resource file or database. They can also be passed around the program, opening up different alternatives for refactoring.

For example: suppose our application needs to print subsets of our fruit data. We can pass in a predicate (a Functor that returns a boolean value) that will return true if we should print the item.

public void printSomeFruits(UnaryFunctor<Fruit,Boolean> whichFruits) {
    for (Fruit f : filter(fruits, whichFruits) 
        System.out.println(f);
    }
}

Without functors, about the best we could do is a method such as

public void printFruitsOfType(PlantType type) {
    for (Fruit f : fruits) 
        if (f.getType() == type)
            System.out.println(f);
    }
}

The former can be used with any test: not just testing for a specific value of type. For example, suppose we wanted to print everything except the tree fruits.

printSomeFruits(new UnaryNegate<Fruit>(growsOnTree));

This of course is not limited to just testing the type of plant: any test that can examine one item and return a boolean can be used.

Unique

In many cases, we will work with ordered collections and need to process each value in the collection only once regardless of how many times it occurs. Using unique with an orderered collection will ensure that each value is procesed only once.

List<Fruit> citrus = Arrays.asList(
    grapefruit, lemon, lemon, lemon, lime, lime, orange, orange );
...
for(Fruit f : unique(citrus)) {
    System.out.println(f);
}

Unique Citrus
=============
grapefruit
lemon
lime
orange

Obviously this works best when the list is ordered: when there are duplicates in an unordered list, there will be duplicates in the result.

By default, the unique() method uses a test based on the equals method of the objects in the list. There is another form that can be used to supply the test. For example, suppose the Fruit class includes this Comparator instance

import net.sf.jga.util.GenericComparator;
...
static public Comparator comp =
    new GenericComparator(new GetProperty(Fruit.class, "Name")); 

We can work with a list that contains duplicates with a test that requires a comparator, like this:

BinaryFunctor<Fruit,Fruit,Boolean> sameKind = new EqualTo<Fruit>(Fruit.comp);
        
for(Fruit f : unique(citrus, sameKind)) {
    System.out.println(f);
}

In this simplified example, the results would appear the same

Unique Citrus
=============
grapefruit
lemon
lime
orange

[ Note: in the above example, note the use of the GenericComparator. It takes a UnaryFunctor<T,Comparable> at construction: when comparing two objects, it applies the functor to each and compares he results. A large portion of the Comparators in the world are of a relatively simple form like this one: determine the ordering of two objects by comparing the values of a single property. When you need more than one property, consider ChainedComparator ]

Transform

In many cases, we need to process each object in a collection and use the results. In this example, we'll work with a list that is sorted on PlantType.

List<Fruit> fruitsByType = Arrays.asList(
     apple, banana, cherry, grapefruit, lemon, lime, mango, orange,
     pear, blackberry, blueberry, strawberry, tomato, grape,
     watermelon);

Using a functor that will return the type of each plant it is given, we can build a list of the types of plants in our data.

UnaryFunctor<Fruit,PlantType> getType = 
        new GetProperty<Fruit,PlantType>(Fruit.class,"Type");

for(PlantType f : transform(fruitsByType, getType)) {
    System.out.println(f);
}

This, however, results in a list that contains one entry for each fruit. What we probably want is one entry for each unique type of fruit. Each of the methods in the Iterables facade returns an Iterable, which makes it possible to nest the methods arbitrarily.

for(PlantType f : unique(transform(fruitsByType, getType))) {
    System.out.println(f);
}

which returns:

Unique Types
============
tree
bush
vine

In user interface work, this turns out to be a great way to build a list of type data: such a list is used frequently to populate a ListModel or a ComboBoxModel.

Merge

Sometimes, it is necessary to combine elements from multiple collections. merge, in its various flavors, allows a single iteration through two collections (and by extension, any number of collections).

By default, merge works with collections of objects that extend Comparable. The other two forms allow us work with arbitrary objects by passing a comparator or a functor. If a functor is passed, it should return boolean true when the current element of the first list is less than or equal to the current element of the second list.

If the input collections are already sorted, then merge will have the traditional semantics, ie, the resulting list will be the sorted union of the contents of the two input collections.

List<Fruit> yellow = Arrays.asList(banana, lemon, pear);

List<Fruit> citrus = Arrays.asList(
    grapefruit, lemon, lemon, lemon, lime, lime, orange, orange);

for(Fruit f : merge(yellow, citrus, Fruit.comp)) {
    System.out.println(f);
}

Merged Fruits
=============
banana
grapefruit
lemon
lemon
lemon
lemon
lime
lime
orange
orange
pear

As we saw previously with transform, merge can be nested within unique to remove duplicate elements.

for(Fruit f : unique(merge(yellow, citrus, Fruit.comp))) {
    System.out.println(f);
}

Merged Fruits
=============
banana
grapefruit
lemon
lime
orange
pear

A couple of other uses are easily implemented. First, its easy to implement an append operation, where the contents of list are appended to another. To perform an append, pass a constant true as the functor.

BinaryFunctor<Fruit,Fruit,Boolean> constantTrue = 
        new ConstantBinary<Fruit,Fruit,Boolean>(Boolean.TRUE);

for(Fruit f : merge(yellow, unique(citrus), constantTrue)) {
    System.out.println(f);
}

Appended Fruits
===============
banana
lemon
pear
grapefruit
lemon
lime
orange

Another use of the merge is a simple shuffle. The basic idea is that we're going to choose between the two lists at random. First, we need some way to generate a coin flip. In normal java, the code we'd want is the boolean expression:

Math.random() <= 0.5d

There are three elements here: Math.random (a generator), the constant value, and the less than comparison. To get this into functor form, we start by building a LessEqual binary functor. We need to bind the second argument of the functor to the constant value 0.5d. Then, we want to hook the first argument of the functor to a generator that produces random numbers. The result looks like this:

Generator<Boolean> coinflip = 
    new LessEqual<Double>().bind2nd(0.5d).generate(new Random());

[ Note: this next section has been updated to reflect functors that were added in 0.6. It had shown how to derive and use an adaptor functor in order to use a Generator where a BinaryFunctor was expected. In 0.6, GenerateBinary is included in the net.sf.jga.fn.adaptor package. If you're curious, the old version is extracted here. ]

The next problem is that the merge operation is going to try and pass two arguments to our Generator. Generator by definition doesn't support taking arguments, so we need to allow our coin flip to receive two arguments, even though they will be ignored. To do this, we'll need to use one of the Adaptors.

BinaryFunctor<Fruit,Fruit,Boolean> shuffle = 
        new GenerateBinary<Fruit,Fruit,Boolean>(coinflip);

for(Fruit f : merge(yellow, unique(citrus), shuffle)) {
    System.out.println(f);
}

Shuffled Fruits
===============
grapefruit
lemon
lime
banana
lemon
pear
orange

[ Note: if you pull down the HEAD revision, or use the forthcoming 0.7 release, all of the above can be replaced with

GenericParser parser = GenericParser.getInstance();
...
BinaryFunctor<Fruit,Fruit,Boolean> shuffle = 
    parser.parseBinary("Math.random() <= 0.5d", Fruit.class, Fruit.class, Boolean.class);

]

LookAhead

jga provides a number of Iterators that add functionality to the basic Iterator interface. These are the real strength of the library, and most of the algorithms are implemented by using, and in many cases returning, one of these (see the package docs). In fact, this is how all of the preceding methods are implemented: there is an underlying iterator for every one of the previous examples. The Iterables static method in every case simply builds the proper type of iterator and returns it.

All of the iterator implied by the first set of methods above work within the basic Iterator interface. They do their work while executing the hasNext() and next() methods. There are other iterators in jga that provide additional power, but do so by adding methods to the Iterator interface. These iterators could have been supported by static methods in the Iterables class, but it would have been of no benefit: you need a reference to the iterator in order to take advantage of the power that has been added to Iterator.

To support the new forloop syntax without sacrificing the power of these iterators, they all implement the Iterable interface by returning references to themselves. You can declare and instantiate the iterator outside of the forloop, and pass the iterator itself to the forloop. This way, you still have access to the iterator itself from within the forloop body.

The first of these is the LookAheadIterator. As its name suggests, the LookAheadIterator allows you to peek at elements that the iterator has not yet returned. You can declare how far ahead you'd like to look, and the iterator will (as long as there are enough items remaining to be returned) allow you to look at elements before the iterator 'consumes' them by returning them to you in the loop.

LookAheadIterator is a wrapper around some other iterator. You must pass another iterator to the LookAheadIterator's constructor, and you may optionally declare how far ahead of the current element you will be looking (the default is 1 position). LookAheadIterator provides two methods that let you see beyond the current element. The first, hasNextPlus(int n) will return true if there are 'n' elements yet to be returned by the iterator. In other words, it will return true if you could call next() n times without running out of elements. The other method, peek(int n) will return the element that is n positions beyond the current element. This would be the element we'd get on the nth call to next().

System.out.println("Looking Ahead");
System.out.println("=============");

LookAheadIterator<Fruit> lai = 
        new LookAheadIterator<Fruit>(yellow.iterator());

for(Fruit f : lai) {
    Fruit f1 = lai.hasNextPlus(1) ? lai.peek(1) : null;
    System.out.println(f +"\t"+f1);
}

Looking Ahead
=============
banana  lemon
lemon   pear
pear    null

Note that it isn't a good idea to use the iterator that you pass to the LookAheadIterator's constructor after you start using the LookAhead: it will cause the LookAhead's internal buffers to get out of sync, causing strange results.

Caching

The CachingIterator is somewhat like the LookAheadIterator except that it retains items already returned to the caller, rather than allowing access to items yet to be returned to the caller. By default, the cache size is 1, which will allow anyone with a reference to the iterator to access an item just returned without affecting the state of the iterator. CachingIterator provides two methods with similar purposes to LookAheadIterator: hasCached(int n) and cached(int n).

There's a little bit of a gotcha when working with the CachingIterator: because the forloop is consuming the iterator in the plumbing, the first element cached will always be the same item that is currently referred to by the loop variable. Therefore, when using a CachingIterator with the new forloop syntax, we don't gain any extra power until you set our cache size to be 2 or greater.

System.out.println("Looking Behind");
System.out.println("==============");

CachingIterator<Fruit> ci = new CachingIterator<Fruit>(yellow.iterator(), 2);

for(Fruit f : ci) {
    Fruit f1 = ci.hasCached(2) ? ci.cached(2) : null;
    System.out.println(f +"\t"+f1);
}

Looking Behind
==============
banana  null
lemon   banana
pear    lemon
                

As with the LookAheadIterator, it isn't a good idea to use the iterator that you pass to the CachingIterator's constructor. The CachingIterator won't have had a chance to store elements in its internal buffer.

Finding

The last iterator we'll consider (ironically, though, it was the first to be written) is the FindIterator. The LookAhead and Caching iterators allow us to access information that we normally wouldn't have access to. They don't, however, affect the underlying iterator in any visible way -- we're still going to get all of the elements in the underlying iterator in order.

The FindIterator allows us to skip items in at iteration programatically, by using a functor that describes the next item we wish to see. To do this, FindIterator provides a method

findNext(UnaryFunctor<T,Boolean> filter)

that advances the underlying iterator until an element is found for which the filter is true. The FindIterator's next() method will then return the element that was found in the underlying iterator. This can be used, for example, to skip data at the beginning of an iteration, as shown in the first example, which skips fruits at the beginning of the list that grow on trees.

System.out.println("Skipping Trees");
System.out.println("==============");

FindIterator<Fruit> fi = new FindIterator<Fruit>(fruits.iterator());

// skips the initial part of the list until a fruit is found
// that does not grow on trees
fi.findNext(noTreeFruits);

// Start processing the list from that point
for(Fruit f : fi) {
    System.out.println(f);
}

Skipping Trees
==============
blackberry
blueberry
cherry
grape
...

We skipped apple and banana: if blackberries really do grow on trees, then there's a problem in our database.

This can also be used inside the loop. This contrived example skips all adjacent tree fruits when a tree fruit is found (the skipped fruits are banana (after apple) and lemon, lime, mango, orange, and pear (after grapefruit)). Real applications of this ability arise in a number of parsing algorithms, especially in error handling (skipping tokens after an error until a known token is found).

System.out.println("Finding");
System.out.println("==============");

FindIterator<Fruit> fi = new FindIterator<Fruit>(fruits.iterator());

for(Fruit f : fi) {
    if (growsOnTree.fn(f))
        fi.findNext(noTreeFruits);
        
    System.out.println(f);
}

Finding
==============
apple
blackberry
blueberry
cherry
grape
grapefruit
strawberry
tomato
watermelon