We have all heard about how HotSpot is really good at dealing with short-lived objects (both allocation and GC), but the truth is that object allocation is still pretty costly when compared to operations like addition or multiplication. Allocating an object for each step of an iteration over a large collection just to perform a simple computation might sound like the kind of thing no one would ever do, but it’s actually quite common in languages like Scala (as described in a previous post). In Java-land, the same issue can occur if you use the Function class from Google Collections with primitive wrappers. There are many JVM improvements that could help depending on the specific case (generic specialisation, value types and fixnums, to name a few), but it’s unclear if/when we’ll get them.
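
To make the problem concrete, here is a small sketch of the pattern in question (the Function interface below is a simplified stand-in for the one in Google Collections, and the names are mine): summing an int array through such a Function boxes every element, so most iterations pay for an allocation (any value outside the small Integer cache) unless the JIT can eliminate it.

// Simplified stand-in for the Google Collections Function interface
interface Function<F, T> {
  T apply(F input);
}

class BoxedSum {
  // Sums the array by passing every element through a Function;
  // each call autoboxes the int argument and unboxes the Integer result
  static int sum(int[] values, Function<Integer, Integer> f) {
    int total = 0;
    for (int v : values)
      total += f.apply(v);
    return total;
  }

  public static void main(String[] args) {
    int[] values = { 1, 2, 3, 4 };
    System.out.println(sum(values, new Function<Integer, Integer>() {
      public Integer apply(Integer input) {
        return input * 2; // unbox, double, box again
      }
    }));
  }
}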

So, what about that title? Escape analysis was introduced during Java 6, but the information gathered was only used for lock elision. However, this information can also be used for other interesting optimisations like scalar replacement and stack allocation. There have been doubts about the benefits of stack allocation (discussed in the comments), so the focus has been on scalar replacement, which means the object is never allocated in memory at all. At least that’s the theory.
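
As a rough sketch of the idea (my example, not taken from HotSpot): when an object provably never escapes the method that allocates it, the JIT can break it up into its fields and keep them in registers or on the stack, removing the allocation altogether.

class Point {
  final int x;
  final int y;
  Point(int x, int y) { this.x = x; this.y = y; }
}

class ScalarReplacementSketch {
  static int lengthSquared(int x, int y) {
    Point p = new Point(x, y);     // p never escapes this method...
    return p.x * p.x + p.y * p.y;  // ...so scalar replacement can turn this into
                                   // "return x * x + y * y" with no allocation at all
  }
}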

Edward Lee started a couple of threads in the hotspot-dev mailing list about scalar replacement (here and here), which reminded me to do some experiments. Note that this feature is still in development, so the results posted here are preliminary and not indicative of how it will perform once it’s final. Still, it’s interesting to see how well it works at this time. I picked the latest JDK7 build (build 41) and ran a few tests with the following arguments passed to java: “-XX:MaxPermSize=256m -Xms128m -Xmx3328m -server -XX:+UseConcMarkSweepGC” plus either -XX:-DoEscapeAnalysis or -XX:+DoEscapeAnalysis.
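
For reference, a full invocation looked roughly like this (the classpath is a placeholder for wherever the compiled classes and the Scala library live):

java -XX:MaxPermSize=256m -Xms128m -Xmx3328m -server -XX:+UseConcMarkSweepGC \
     -XX:+DoEscapeAnalysis -cp <classpath> Test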

I started by choosing the simplest test possible. Note that either testSimpleAllocation or testNoAllocation would be commented out.

class C(val a: Int, val b: Int)

object Test {
  def main(args: Array[String]) {
    for (i <- 1 to 10) testSimpleAllocation()
    //for (i <- 1 to 10) testNoAllocation()
  }
  
  def testSimpleAllocation() = {
    System.gc()
    val time = System.currentTimeMillis
    var i = 0
    var sum = 0
    while (i < 1000000000) {
      sum += baz(new C(i + 1, i + 2))
      i += 1
    }
    println(sum)
    println(System.currentTimeMillis - time)
  }
  
  def testNoAllocation() = {
    System.gc()
    val time = System.currentTimeMillis
    var i = 0
    var sum = 0
    while (i < 1000000000) {
      sum += baz(i + 1, i + 2)
      i += 1
    }
    println(sum)
    println(System.currentTimeMillis - time)
  }
  
  def baz(a: Int, b: Int): Int = a + b
  
  def baz(c: C): Int = c.a + c.b
}

The results were (in milliseconds):


testNoAllocation: 403
testSimpleAllocation with EA: 1006
testSimpleAllocation without EA: 9190

As we can see, escape analysis has a tremendous effect: the method finishes in about 11% of the time it takes with it disabled. However, the version with no allocation is still roughly 2.5 times faster.

Next, I decided to test a foreach method that takes a Function object (this time in Java, because it does less magic behind the scenes):

package test;

public class EscapeAnalysis {
  
  interface Function<T, R> {
    R apply(T value);
  }
  
  interface IntFunction {
    void apply(int value);
  }
  
  static class BoxedArray {
    private final int[] array;
    
    public BoxedArray(int length) {
      array = new int[length];
    }
    
    public int length() {
      return array.length;
    }
    
    public void foreach(Function<Integer, Void> function) {
      for (int i : array)
        function.apply(new Integer(i));
    }
    
    public void foreach(IntFunction function) {
      for (int i : array)
        function.apply(i);
    }

    public void set(int index, int value) {
      array[index] = value;
    }

    public void foreachWithAutoboxing(Function<Integer, Void> function) {
      for (int i : array)
        function.apply(i);
    }
  }
  
  public static void main(String[] args) {
    BoxedArray array = new BoxedArray(100000000);
    /* We are careful not to restrict our ints to the ones in the Integer.valueOf cache */
    for (int i = 0; i < array.length(); i++)
      array.set(i, i);
    
    for (int i = 0; i < 10; i++)
      test(array);
  }

  private static void test(BoxedArray array) {
    System.gc();
    long time = System.currentTimeMillis();
    final int[] sum = new int[] { 0 };
    
    /* Uncomment the one that should be executed */
    testIntegerForeach(array, sum);
//    testIntegerForeachWithAutoboxing(array, sum);
//    testIntForeach(array, sum);

    System.out.println(System.currentTimeMillis() - time);
    System.out.println(sum[0]);
  }
  
  private static void testIntegerForeachWithAutoboxing(BoxedArray array, final int[] sum) {
    array.foreachWithAutoboxing(new Function<Integer, Void>() {
      public Void apply(Integer value) {
        sum[0] += value;
        return null;
      }
    });
  }
  
  private static void testIntegerForeach(BoxedArray array, final int[] sum) {
    array.foreach(new Function<Integer, Void>() {
      public Void apply(Integer value) {
        sum[0] += value;
        return null;
      }
    });
  }

  private static void testIntForeach(BoxedArray array, final int[] sum) {
    array.foreach(new IntFunction() {
      public void apply(int value) {
        sum[0] += value;
      }
    });
  }
}

The results were (in milliseconds):


testIntForeach: 130
testIntegerForeachWithAutoboxing with EA: 1064
testIntegerForeach with EA: 224
testIntegerForeachWithAutoboxing without EA: 1039
testIntegerForeach without EA: 1024

This test shows something interesting: EA gives no improvement if Integer.valueOf (called by auto-boxing) is used instead of new Integer. Apart from that, the results are somewhat similar to the first ones (EA provides a substantial boost, but not enough to match the specialised implementation). After quickly verifying that the boxing methods in ScalaRunTime had the same effect as Integer.valueOf, I decided that it was not worth testing more complex scenarios.
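
For context, javac translates autoboxing into a call to Integer.valueOf, so foreachWithAutoboxing effectively compiles to the following (my reconstruction, not the actual generated code):

public void foreachWithAutoboxing(Function<Integer, Void> function) {
  for (int i : array)
    // autoboxing compiles to Integer.valueOf, which checks a small cache and
    // otherwise allocates inside valueOf rather than at this call site
    function.apply(Integer.valueOf(i));
}

The new Integer(i) version, by contrast, allocates right at the call site, which appears to be the pattern the current analysis can eliminate.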

It seems like there’s a lot of potential for scalar replacement, but HotSpot needs to do a better job of detecting cases where it can be applied safely. If nothing else, knowledge of the valueOf methods should at least be hardcoded into the system. I hope a more general solution is found, though, because other languages on the JVM may use different methods (as mentioned earlier, Scala uses methods in ScalaRunTime instead). It will also be interesting to see whether the performance can get even closer to the no-allocation case. Since the feature is still in development, we can hope. :)

Update: The early access release of JDK 6 Update 14 also contains this feature.
Update 2: This feature has been enabled by default since JDK 6 Update 23.