Comparing JVMs on ARM/Linux
By Jim Connors 15 February 2012
For quite some time, Java Standard Edition releases have included
both client and server bytecode compilers (referred to as c1
and c2 respectively), whereas Java SE-Embedded binaries
only contained the client c1 compiler. The rationale for
excluding c2 stems from the fact that (1) eliminating optional
components saves space, which is at a premium in the embedded
world, and (2) embedded platforms were not given serious
consideration for handling server-like workloads. But all
that is about to change. In anticipation of the ARM
processor's legitimate entrance into the server market (see Calxeda),
Oracle has, with the latest update of Java SE-Embedded (7u2), made
the c2 compiler available for ARMv7/Linux platforms, further
enhancing performance for a large class of traditional server
applications.
These two compilers go about their business in different ways. Of the two, c1 is the lighter optimizer and consequently starts up faster. It delivers excellent performance and, as the default bytecode compiler, works extremely well in almost all situations. Compared to c1, c2 is the more aggressive optimizer and is better suited to long-lived Java processes. Although slower at startup, it can be shown to achieve better performance over time. As a case in point, take a look at the graph that follows.
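If you want to confirm which VM (and therefore which bytecode compiler) a given Java runtime has selected, the VM identifies itself through standard system properties. The short sketch below is purely illustrative: on HotSpot-based runtimes the java.vm.name property typically reports "Client VM" or "Server VM", and where both are shipped the choice can usually be made explicitly with the -client or -server launcher flags.

    // WhichVM.java -- prints the VM identification properties so you can see
    // whether the client (c1) or server (c2) VM was selected at launch time.
    // Try "java -client WhichVM" and "java -server WhichVM" where both exist.
    public class WhichVM {
        public static void main(String[] args) {
            System.out.println("VM name:    " + System.getProperty("java.vm.name"));
            System.out.println("VM version: " + System.getProperty("java.vm.version"));
            System.out.println("VM vendor:  " + System.getProperty("java.vm.vendor"));
        }
    }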
One of the most popular Java-based applications, Apache Tomcat,
was installed on an ARMv7/Linux device. The chart
shows the relative performance, as defined by mean HTTP request
time, of the Tomcat server run with the c1 client compiler (red
line) and the c2 server compiler (blue line). The HTTP
request load was generated by an external system on a dedicated
network utilizing the ab (Apache Bench) program. The
closer the response time is to zero, the better. You can see that
for the initial run of 25,000 HTTP requests, the c1 compiler
produces faster average response times than c2. It takes
time for the c2 compiler to "warm up", but once the threshold of
50,000 or so requests is met, the c2 compiler performance is
superior to that of c1. At 250,000 HTTP requests, the mean response
time of the c2-based Tomcat server instance is 14% better than that
of its c1 counterpart.
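To be clear, the numbers above were gathered with ab running on a dedicated client machine; the fragment below is not that harness. It is just a rough Java sketch of the same idea, timing a batch of GET requests against a (hypothetical) local Tomcat endpoint and reporting the mean.

    // MeanRequestTime.java -- illustrative only; the article's numbers were
    // gathered with Apache Bench (ab) over a dedicated network, not with this code.
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class MeanRequestTime {
        public static void main(String[] args) throws Exception {
            final int requests = 25000;                  // size of one "run" in the chart
            URL url = new URL("http://localhost:8080/"); // hypothetical Tomcat endpoint
            long totalNanos = 0;

            for (int i = 0; i < requests; i++) {
                long start = System.nanoTime();
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                InputStream in = conn.getInputStream();
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // drain the response so the timing covers the full body
                }
                in.close();
                totalNanos += System.nanoTime() - start;
            }
            System.out.printf("Mean request time: %.3f ms%n",
                    totalNanos / (double) requests / 1000000.0);
        }
    }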
It is important to realize that c2 assumes, and indeed requires, more resources (i.e. memory). Our sample device, with 1GB of RAM, was more than adequate for these rounds of tests. Of course your mileage may vary, but if you have the right hardware and the right workload, give c2 a further look.
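A quick way to see how much heap the running VM actually has to work with is to ask the Runtime directly; the ceiling it reports is normally governed by the -Xmx launcher option. Again, just a sketch:

    // HeapCheck.java -- prints the heap limits the running VM sees, handy when
    // judging whether a small device has enough headroom for the c2 compiler.
    public class HeapCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;
            System.out.println("Max heap:   " + rt.maxMemory() / mb + " MB");
            System.out.println("Total heap: " + rt.totalMemory() / mb + " MB");
            System.out.println("Free heap:  " + rt.freeMemory() / mb + " MB");
        }
    }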
While discussing these results with a few of my compadres, it was suggested that OpenJDK and some of its variants be included in this comparison. The following chart shows mean HTTP request times for 6 different configurations:
Results remain pretty much unchanged, so only the first 4 runs
(25K-100K requests) are shown. As can be seen, the Java SE-Embedded
VMs are on the order of 3-5x faster than their OpenJDK
counterparts, irrespective of the bytecode compiler chosen.
One additional promising VM, Shark, was not included
in these tests because, although it built from source
successfully, it failed to run Apache Tomcat. In defense of
Shark, the ARM version may still be in development (i.e. not
yet stable).
Creating a really fast virtual machine is hard work and takes a
lot of time to perfect. Considering the resources expended
by Oracle (and formerly Sun), it is no surprise that the
commercial Java SE VMs are excellent performers. But the
extent to which they outperform their OpenJDK counterparts is
surprising. It would be no shock if someone in the know
could demonstrate better OpenJDK results. But herein lies
one considerable problem: it is an exercise in patience and
perseverance just to locate and build a proper OpenJDK platform
suitable for a particular CPU/Linux configuration. No
offense would be taken if corrections were presented, and a
straightforward mechanism to support these OpenJDK builds were
provided.