The Benchmark
For the sake of simplicity and consistency, we'll use a subset of
the DaCapo benchmark suite.
It's an open-source collection of real-world applications that
place a good strain on a system from both a processor and a memory
workload perspective. We are aware of customers who use DaCapo to
gauge performance, and because it is freely available and easy to
use, anyone interested can run their own set of tests in fairly
short order.
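To give a flavor of how simple running it is, here's a minimal sketch that launches a single DaCapo component benchmark from Java in a child JVM. The jar file name (dacapo-9.12-bach.jar), the benchmark choice (avrora), and the -s size flag are illustrative assumptions based on the 9.12 "bach" release; substitute whatever version you download.

    import java.io.IOException;

    // Minimal sketch: run one DaCapo component benchmark in a child JVM.
    // The jar name and benchmark are illustrative; substitute your own.
    public class RunDaCapo {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-jar", "dacapo-9.12-bach.jar",
                    "-s", "default",   // workload size: small, default, or large
                    "avrora");
            pb.inheritIO();            // stream the benchmark's output to our console
            int status = pb.start().waitFor();
            System.out.println("DaCapo exited with status " + status);
        }
    }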
The Hardware
It would have been grand to run all these benchmarks on one platform, most notably the beloved Raspberry Pi, but unfortunately it has its limitations:
Java SE-E armel vs. armhf
The chart that follows compares the relative performance of the armel
Java SE-E 7u40 JRE with the armhf Java SE-E 7u40 JRE for 8
of the DaCapo component applications. These tests were
conducted on the Boundary Devices BD-SL-i.MX6. Both armel
and armhf environments were based on the Debian Wheezy
distribution running a 3.0.35 kernel. For all charts, the
smaller the result, the faster the run.
In all 8 tests, the armhf binary is faster: some only slightly, and in one case (eclipse) by a few percentage points. The big performance gain associated with the armhf standard involves floating point operations, and in particular the passing of arguments directly in floating point registers. The gains realized by the newer armhf standard will be seen more in the native-application realm than in Java SE-Embedded, primarily because the Java SE-E armel VM already uses FP registers for Java floating point methods. There are still, however, certain floating point workloads that may show a modest performance increase (in the single-digit percent range) with Java SE-E armhf over Java SE-E armel.
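As a rough illustration of the kind of code in question, the toy workload below (my own example, not a DaCapo test) repeatedly calls a method taking several double arguments; it is this style of call that the armhf native calling convention services through FP registers rather than integer registers.

    // Toy floating point workload (illustrative only, not part of DaCapo).
    // Calls passing several double arguments exercise the argument-passing
    // path that armel and armhf handle differently at the native ABI level.
    public class FpArgs {
        static double blend(double a, double b, double c, double d) {
            return a * b + c * d;
        }

        public static void main(String[] args) {
            double acc = 0.0;
            long start = System.nanoTime();
            for (int i = 0; i < 50_000_000; i++) {
                acc += blend(i * 0.5, 1.25, i * 0.25, 0.75);
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("acc=" + acc + " elapsed=" + ms + " ms");
        }
    }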
Java SE-E Client Compiler (c1) vs. Server Compiler (c2)
In this section, we'll show test results for two different
platforms: first a single-core system, followed by the same
tests on a quad-core system. To further demonstrate how
workload changes performance, we'll take advantage of the ability
to run the DaCapo component applications in three different modes:
small, default (medium), and large. The first chart displays
the aggregate time required to run the tests in the three modes,
utilizing both the 7u40 client (c1) compiler and the server (c2)
compiler. As expected, c1 outperforms c2 by a wide margin
for the tests that run only briefly. As the total time to
run the tests increases from small to large, the c2 compiler gets
a chance to "warm up" and close the gap in performance. But on
this single-core platform it never does catch up.
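The warm-up effect is easy to observe for yourself. The sketch below (a rough probe of my own, not a DaCapo test) times the same work repeatedly; launched once with -client and once with -server, the early iterations show the gap while the JIT identifies and compiles the hot method, and later iterations settle to a steady state.

    // Rough warm-up probe (illustrative, not a DaCapo test). Early runs are
    // slower while the JIT compiles the hot method; later runs settle to a
    // steady state. Try it with -client and then -server to compare.
    public class WarmUp {
        static long work() {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += (i % 7) * (i % 13);
            }
            return sum;
        }

        public static void main(String[] args) {
            for (int run = 1; run <= 10; run++) {
                long start = System.nanoTime();
                long result = work();
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("run %2d: %4d ms (result=%d)%n", run, ms, result);
            }
        }
    }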
Contrast the first chart with the one that follows, where small,
medium and large versions of the tests were run on a quad-core
system. The c2 compiler is better able to utilize the
additional compute resources supplied by this platform, the result
being that the initial gap in performance between c1 and c2 for the
small version of the test is only 19%. By the time we reach
the large version, c2 outperforms c1 by 7%. The moral of the
story here is that, given enough resources, the server compiler
may be the better VM for your workload if it is a long-lived
process.
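If you're ever unsure which VM a given JRE actually selected, the standard system properties will tell you; a quick check:

    // Quick check of which VM and runtime are in use. java.vm.name reports
    // a "Client VM" or "Server VM"; java.runtime.name distinguishes an
    // Oracle Java SE build from an OpenJDK build.
    public class WhichVm {
        public static void main(String[] args) {
            System.out.println(System.getProperty("java.vm.name"));
            System.out.println(System.getProperty("java.runtime.name"));
            System.out.println(System.getProperty("java.version"));
        }
    }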
Java SE-E 7u40 armhf vs. OpenJDK armhf
For this final section, we'll break out performance on an application-by-application basis for the following JRE/VMs:
The OpenJDK packages were pulled from the Debian Wheezy distribution.
It appears the bulk of the OpenJDK/ARM performance work still
revolves around the OpenJDK 6 platform, even though Java 7 was
released over two years ago (and Java 8 is coming soon).
Regardless, Java SE still outperforms OpenJDK by a wide margin
in most tests, and, perhaps more importantly, appears to be the
much more reliable platform considering the number of tests that
failed with the OpenJDK variants. As demonstrated in previous
benchmark results, the older armel OpenJDK VMs appear to be
more stable than the armhf versions tested here. Considering
that the stated direction of the major Linux distributions is to
migrate towards the armhf binary standard, this is a bit
eye-opening.
As always, comments are welcome.