Building LINQ in Java

Now that Java 8 has lambdas, I decided to check out what kind of lazy collection support their streams functionality had. It had some cool stuff, like

  • map
  • filter
  • flatMap
  • distinct
  • sorted
  • limit (i.e. take)
  • skip
  • reduce (i.e fold)
  • min
  • max
  • any
  • all
  • generate (for building infinite lists)

Not too bad. But, there are some problems, as far as I see it. First, you can’t extend it. With .NET they solved the lambda problem with extension methods, but Java oddly chose “defender” methods, which lets you safely update an interface declaration, but ONLY the author can do it, not any consumer. That sucks. I want other fun functions like

  • first
  • last
  • nth
  • windowed
  • intersperse
  • intercalate
  • tails
  • intersect
  • zip
  • groupRuns

And pretty much any other F#/Haskell list function. Why not? If we’re going to go with lazy evaluated lists we might as well have all the fun that comes with them. There is another problem with the Java 8 streams API, unless I’m mistaken, there is no way to end an infinite stream generated using the generate function. That really sucks since not every lazy generated stream is infinite.

So, I decided to try my hand and rebuilding LINQ in Java 8. For the impatient, the full source, tests, and benchmarks are available at my github.

Iterator chains

The basic idea here is to create an iterator for each type of processing you want to do. If you want to do a map function, you should create an iterator that wraps a source. When the iterator consumes from the underlying source it will emit a projected element. How do you do this? Well, you can use a fluent API that returns a new enumerable class that wraps a specific iterator. For example

private Function<Iterable<TSource>, Iterator<TSource>> iteratorGenerator;

public static <TSource> Enumerable<TSource> init(Iterable<TSource> source){
    return new Enumerable<>(_ig -> new EnumerableIterator<>(source));      
}                                                                          

protected Enumerable(Function<Iterable<TSource>, Iterator<TSource>> iteratorGenerator) {
    this.iteratorGenerator = iteratorGenerator;                                         
}                                                                                       

@Override                                
public Iterator<TSource> iterator() {    
    return iteratorGenerator.apply(this);
}                                        
       
// The underlying iterator

public class EnumerableIterator<TSource> implements Iterator<TSource> {
    protected Iterator<TSource> source;
    private Iterable<TSource> input;

    public EnumerableIterator(Iterable<TSource> input){
        this.input = input;

        reset();
    }

    protected void reset(){
        source = input.iterator();
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public TSource next() {
        return (TSource)source.next();
    }
}                                  

Lets first look at the underlying iterator. It does nothing other than iterate over the source. That’s pretty simple.

The thing that wraps it is a Enumerable class that takes a function that, when given an Iterable, returns a new iterator. Then it just exposes that iterator. In general, thats the whole thing. The iterator is wrapped in a function so that we can request a new iterator each time someone tries to iterate over this enumerable. That matches what .NET does; if someone tries to re-iterate an enumerable you get a new iterator and start over.

Take

Let’s look at a simple iterator that takes only N elements.

public class TakeIterator<TSource> extends EnumerableIterator<TSource> {
    private int takeNum;

    public TakeIterator(Iterable<TSource> results, int n) {
        super(results);
        takeNum = n;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext() && takeNum > 0;
    }

    @Override
    public TSource next(){
        takeNum--;
        return source.next();
    }
}

And to create an instance of enumerable that uses this

public Enumerable<TSource> take(int n){                                    
    return enumerableWithIterator(source -> new TakeIterator<>(source, n));
}                                                                          

private <TResult> Enumerable<TResult> enumerableWithIterator(Function<Iterable<TSource>, Iterator<TResult>> generator){
    return new Enumerable<>(_ig -> generator.apply(this));                                                             
}                                                                                                                      

Basically I return a new enumerable with a lazy evaluated function that gives the iterator its underlying Iterator source. By returning a new enumerable each time we can effectively chain the iterators together. Nothing is evaluated until someone tries to get the next value.

One comment

  1. Pingback: Building LINQ in Java pt 2 | Onoffswitch.net

Post a comment

You may use the following HTML:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>