Before x86-64 came along, the decision on whether to use 32-bit or 64-bit mode for architectures that supported both was relatively simple: use 64-bit mode if the application requires the larger address space, 32-bit mode otherwise. After all, no point in reducing the amount of data that fits into the processor cache while increasing memory usage and bandwidth if the application doesn’t need the extra addressing space.
When it comes to x86-64, however, there’s also the fact that the number of named general-purpose registers has doubled from 8 to 16 in 64-bit mode. For CPU-intensive apps, this may mean better performance at the cost of extra memory usage. On the other hand, for memory-intensive apps, 32-bit mode might be better if you manage to fit your application within the address space provided. Wouldn’t it be nice if there was a single JVM that would cover the common cases?
It turns out that the HotSpot engineers have been working on doing just that through a feature called Compressed Oops. The benefits:
- Heaps up to 32GB (instead of the theoretical 4GB in 32-bit that in practice is closer to 3GB)
- 64-bit mode so we get to use the extra registers
- Managed pointers (including Java references) are 32-bit so we don’t waste memory or cache space
The main disadvantage is that encoding and decoding is required to translate from/to native addresses. HotSpot tries to avoid these operations as much as possible and they are relatively cheap. The hope is that the extra registers give enough of a boost to offset the extra cost introduced by the encoding/decoding.
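The translation in question can be sketched as base-plus-shifted-offset arithmetic. This is just an illustration of the idea, not HotSpot’s actual code, and the zero heap base and constant names are assumptions for simplicity:

```java
// Illustrative sketch of compressed oop encode/decode, assuming a heap base
// of 0 and 8-byte object alignment (not HotSpot's actual implementation).
public class CompressedOopSketch {
    static final long HEAP_BASE = 0L; // assumed: heap mapped at address 0
    static final int SHIFT = 3;       // log2(8-byte object alignment)

    // Narrow a 64-bit address (within a 32GB heap) into 32 bits.
    public static int encode(long nativeAddress) {
        return (int) ((nativeAddress - HEAP_BASE) >>> SHIFT);
    }

    // Widen a 32-bit compressed oop back into a native address.
    public static long decode(int compressedOop) {
        return HEAP_BASE + ((compressedOop & 0xFFFFFFFFL) << SHIFT);
    }

    public static void main(String[] args) {
        long addr = 0x700000008L; // an 8-byte-aligned address, ~28GB into the heap
        System.out.println(decode(encode(addr)) == addr); // prints "true"
    }
}
```

Because every object is aligned to 8 bytes, the low 3 bits of an address carry no information, so shifting them away lets a 32-bit value cover a 32GB range.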
Compressed Oops have been included (but disabled by default) in the performance release JDK6u6p (requires you to fill a survey), so I decided to try it in an internal application and compare it with 64-bit mode and 32-bit mode.
The tested application has two phases, a single threaded one followed by a multi-threaded one. Both phases do a large amount of allocation so memory bandwidth is very important. All tests were done on a dual quad-core Xeon 5400 series with 10GB of RAM. I should note that a different JDK version had to be used for 32-bit mode (JDK6u10rc2) because there is no Linux x86 build of JDK6u6p. I chose the largest heap size that would allow the 32-bit JVM to run the benchmark to completion without crashing.
I started by running the application with a smaller dataset:
JDK6u10rc2 32-bit
Single-threaded phase: 6298ms
Multi-threaded phase (8 threads on 8 cores): 17043ms
Used Heap after full GC: 430MB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC
JDK6u6p 64-bit with Compressed Oops
Single-threaded phase: 6345ms
Multi-threaded phase (8 threads on 8 cores): 16348ms
Used Heap after full GC: 500MB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
The performance numbers are similar and the memory usage of the 64-bit JVM with Compressed Oops is 16% larger.
JDK6u6p 64-bit
Single-threaded phase: 6463ms
Multi-threaded phase (8 threads on 8 cores): 18778ms
Used Heap after full GC: 700MB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC
The performance is again similar, but the memory usage of the 64-bit JVM is much higher: over 60% higher than that of the 32-bit JVM.
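The overhead is easy to see with some back-of-the-envelope arithmetic. The figures below are typical HotSpot layouts (8-byte header and 4-byte references in 32-bit mode, 12-byte header with compressed oops, 16-byte header and 8-byte references in plain 64-bit mode); they are assumptions for illustration, not measurements from this application:

```java
// Rough per-object sizes for an object with a header plus three reference
// fields, under typical HotSpot layouts (assumed figures, not measured).
public class ObjectSizeSketch {
    // HotSpot aligns objects to 8 bytes, so round the raw size up.
    public static long objectSize(int headerBytes, int refBytes, int refFields) {
        long raw = headerBytes + (long) refBytes * refFields;
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        long size32         = objectSize(8, 4, 3);  // 32-bit mode
        long sizeCompressed = objectSize(12, 4, 3); // 64-bit + compressed oops
        long size64         = objectSize(16, 8, 3); // plain 64-bit mode
        // 24, 24 and 40 bytes: plain 64-bit is ~67% larger than 32-bit,
        // in the same ballpark as the >60% heap growth observed above.
        System.out.println(size32 + " " + sizeCompressed + " " + size64);
    }
}
```

The exact ratio depends on the mix of references versus primitives in the application’s objects, which is why the observed overhead varies.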
Let’s try the larger dataset now:
JDK6u10rc2 32-bit
Single-threaded phase: 14188ms
Multi-threaded phase (8 threads on 8 cores): 73451ms
Used Heap after full GC: 1.25GB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC
JDK6u6p 64-bit with Compressed Oops
Single-threaded phase: 13742ms
Multi-threaded phase (8 threads on 8 cores): 76664ms
Used Heap after full GC: 1.45GB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
The performance difference and memory overhead are the same as with the smaller dataset. The benefit of Compressed Oops here is that we still have plenty of headroom while the 32-bit JVM is getting closer to its limits. This may not be apparent from the heap size after a full GC, but during the multi-threaded phase the peak memory usage is quite a bit larger and the fact that the allocation rate is high does not help. This becomes more obvious when we look at the results for the 64-bit JVM.
JDK6u6p 64-bit
Single-threaded phase: 14610ms
Multi-threaded phase (8 threads on 8 cores): 104992ms
Used Heap after full GC: 2GB
JVM Args: -XX:MaxPermSize=256m -Xms4224m -Xmx4224m -server -XX:+UseConcMarkSweepGC
I had to increase Xms/Xmx to 4224m for the application to run to completion. Even so, the multi-threaded phase took a substantial performance hit when compared to the other two JVM configurations. All in all, the 64-bit JVM without compressed oops does not do well here.
In conclusion, it seems that compressed oops is a feature with a lot of promise and it allows the 64-bit JVM to be competitive even in cases that favour the 32-bit JVM. It might be interesting to test applications with different characteristics to compare the results. It’s also worth mentioning that since this is a new feature, it’s possible that performance will improve further before it’s integrated into the normal JDK releases. As it is though, it already hits a sweet spot and if it weren’t for the potential for instability, I would be ready to ditch my 32-bit JVM.
Update: The early access release of JDK 6 Update 14 also contains this feature.
Update 2: This feature is enabled by default since JDK 6 Update 23.
Could this help in the JNI space? e.g. libs compiled for a 32bit JVM running in an otherwise 64bit JVM?
Hi Pete,
I don’t think so. A 64-bit JVM with compressed oops would still expect the native code to be compiled in 64-bit mode. As I understand, only objects in the managed heap can be compressed.
Ismael
I hope the folks of Apple read this article and finally bring a Java 6 to everyone (e.g. me with a first generation MacBook Pro)
Sciss,
Although that would be nice, I would not hold my breath if I were you. ;)
Ismael
This is hardly new. JRockit has had this feature (we call it compressed references) for several years. IBM recently added it to their JVM, and the Apache Harmony implementation started with compressed references before they did a “normal” 64-bit JVM.
Cheers,
Henrik, JRockit team
Hi Henrik,
First of all, no-one claimed that HotSpot was the first to have some form of reference compression. In fact, the first trackback mentions that IBM and BEA JVMs support something similar.
The blog entry was mostly about testing what kind of effect the feature has on one particular real-life application and HotSpot was the natural choice since it’s open-source and the most widely used JVM.
Even though I was aware that BEA had some form of compression, I had not looked into the details. I decided to briefly check the information for all 3 JVMs you mentioned and I found that when compressed references are used (if the documentation is accurate):
– BEA only supports a 4GB heap[1].
– Harmony only supports a 4GB heap[2].
– IBM supports a 25GB heap[3], that’s closer to the Sun JVM one, but the feature is also relatively recent.
Maybe the Sun implementation is indeed something new, since it supports 32GB. That was actually one of the major advantages in my example.
Ismael
[Edited to change references into direct links]
The 4 vs 32 GB discussion is interesting. The goal with JRockit’s implementation is to make 64-bit JVMs as performant as 32-bit for small heap sizes, with the rationale that large heap sizes bring other issues that are more important to deal with (long GC pauses). We have known for a long time that 32 GB (and larger) heaps are possible with compressed references. The limit is actually 2^32 objects, not an artificial restriction on heap size. A 4 GB restriction gives the best performance since that makes the translation operation very cheap; 32 GB and above require shifting the pointer, which adds overhead, especially during GC when you basically have to dereference all objects (e.g. longer GC pauses). I believe this is discussed in old JavaOne presentations on JRockit, but I don’t have any links.
32 GB is possible through the observation that all objects are aligned such that the lower so many bits are always zero (carry no information). But you can equally well align objects on another boundary, expanding to 64/128 GB or larger. The problem is that you will waste memory.
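The arithmetic behind those figures is simple enough to write down (a sketch of the limit calculation, not JVM code): 2^32 distinct compressed values times the object alignment gives the maximum addressable heap.

```java
// Maximum heap addressable with a 32-bit compressed reference, as a function
// of object alignment: 2^32 distinct values, each addressing one alignment unit.
public class CompressedHeapLimit {
    public static long maxHeapBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    public static void main(String[] args) {
        final long GB = 1L << 30;
        System.out.println(maxHeapBytes(8) / GB);  // 8-byte alignment  -> 32 GB
        System.out.println(maxHeapBytes(16) / GB); // 16-byte alignment -> 64 GB
        System.out.println(maxHeapBytes(32) / GB); // 32-byte alignment -> 128 GB
    }
}
```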
At the end of the day, it all comes down to tradeoffs. Code complexity vs throughput vs GC pause times, and as always there is no one single “perfect” choice. And the actual implementation is much more important than the algorithm…
Cheers,
Henrik
Hi Henrik,
Yes, there is no single “perfect” choice and indeed it’s all about trade-offs. I just happen to think that the 32GB limit hits a sweet spot for many server apps at this point in time. It’s certainly annoying to have to endure a 60% memory size overhead just because you’re past the 4GB ceiling.
When it comes to the performance impact of shifting the pointer, you have actually just described one of the main points of this blog entry. ;) In other words, to check if it would have any measurable impact on an application with a high allocation rate (which obviously has GC implications). It turns out that it doesn’t so it seems to me that the HotSpot engineers did a good job. Of course, your mileage may vary, this is just a single data point.
And yes, 32GB is the limit to avoid potential holes between objects and the resulting wasted memory. Nikolay Igotti covered the possibility of larger heap sizes on his blog 1.5 years ago, so HotSpot would probably support it if there was enough demand.
I disagree that actual implementation is much more important than the algorithm though. They are both very important and the best implementation can’t save a bad algorithm and vice-versa (this is a generalisation and as such there might be exceptions, but the general point remains).
Ismael
[…] not the only ones doing this trick, Oracle/BEA have the -XXcompressedRefs option and Sun has the -XX:+UseCompressedOops option. Of course, each of the vendors implementations are slightly different with different […]
I’ve been trying to process some large data sets using the weka data mining package (www.cs.waikato.ac.nz/ml/weka/). They were too big for Java on Windows XP 32-bit, so I moved to a Windows XP 64-bit machine, but the memory consumption has been appalling: it’s using approximately 6 times the memory for the same amount of raw data! (On the 64-bit machine I can only process data sets one third the size the 32-bit machine could handle, despite having twice the addressable heap!)
I’ve tried using JDK 6 update 14 and compressed OOPS, but it didn’t seem to make any difference at all. I’d be grateful for any thoughts/advice. Is it a windows problem?
current JVM settings:
-XX:MaxPermSize=256m -Xms3400m -Xmx3400m -server -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
Hi Joe,
That sounds very strange. I haven’t used Windows in a while, so I can’t say if it’s a Windows problem, but it seems unlikely.
Have you verified that your JVM settings are being picked up? You can use jconsole to verify this while the process is running.
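One way to do that check from inside the application itself is via the standard java.lang.management API (a small sketch; the class name is just for illustration):

```java
// Print the arguments the JVM was actually started with, from inside the app.
import java.lang.management.ManagementFactory;

public class PrintVmArgs {
    public static void main(String[] args) {
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println(arg);
        }
    }
}
```

If -XX:+UseCompressedOops doesn’t appear in the output, the flag isn’t being picked up.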
Ismael
Thanks for the swift response.
I’ve included the output from jconsole below — and yes it is picking up the arguments.
As you can see, it’s currently using >3GB of heap to do this job, whereas a 32-bit Windows XP machine was handling data sets three times this size in 1.6GB!
I’m tempted to try switching to Linux, but would prefer to avoid the hassle if it’s something stupid I’m doing.
I’ve tried JDK 6.12, and both JDK 6 Update 14 and JDK6u6p with compressed oops, and they all seemed to perform similarly.
(BTW I’m using the concurrent mark sweep GC, but the ParNew GC is still running. Is there any way of disabling the latter?)
thanks again for your advice
j
VM Summary
Monday, 16 March 2009 12:27:42 o’clock GMT
Connection name:
pid: 848 tpp.weka.AttributeSelectionExperimentFiltered
Virtual Machine:
Java HotSpot(TM) 64-Bit Server VM version 14.0-b12
Vendor:
Sun Microsystems Inc.
Name:
848@c09000117
Uptime:
8 minutes
Process CPU time:
4 minutes
JIT compiler:
HotSpot 64-Bit Server Compiler
Total compile time:
13.044 seconds
Live threads:
16
Peak:
17
Daemon threads:
15
Total threads started:
40
Current classes loaded:
2,698
Total classes loaded:
2,728
Total classes unloaded:
30
Current heap size:
3,265,537 kbytes
Maximum heap size:
3,477,376 kbytes
Committed memory:
3,477,376 kbytes
Pending finalization:
0 objects
Garbage collector:
Name = ‘ParNew’, Collections = 1,393, Total time spent = Unavailable
Garbage collector:
Name = ‘ConcurrentMarkSweep’, Collections = 3, Total time spent = 3.261 seconds
Operating System:
Windows XP 5.2
Architecture:
amd64
Number of processors:
2
Committed virtual memory:
3,621,828 kbytes
Total physical memory:
3,914,768 kbytes
Free physical memory:
112,560 kbytes
Total swap space:
106,064,464 kbytes
Free swap space:
101,846,808 kbytes
VM arguments:
-XX:+UseConcMarkSweepGC -Xms3400m -XX:MaxPermSize=256m -XX:+UseCompressedOops -Xmx3400m
Class path:
.
Library path:
C:\WINDOWS\system32;.;C:\WINDOWS\Sun\Java\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem
Boot class path:
C:\Program Files\Java\jre6\lib\resources.jar;C:\Program Files\Java\jre6\lib\rt.jar;C:\Program Files\Java\jre6\lib\sunrsasign.jar;C:\Program Files\Java\jre6\lib\jsse.jar;C:\Program Files\Java\jre6\lib\jce.jar;C:\Program Files\Java\jre6\lib\charsets.jar;C:\Program Files\Java\jre6\classes
Is it possible that you were using the -client JIT while using 32-bit Windows? The -server JIT has a tendency to use all the memory you give it for tasks that allocate a lot. What happens if you increase the size of the data set? Does it OOM or does it process it fine?
Ismael
[…] end result is that 64-bit now surpasses 32-bit performance in more situations. See my entry about Compressed Oops if you don’t know what I’m talking about. […]
fyi, the Azul JVM virtualizes… so a “32-bit” JVM will run as a 64-bit JVM on the backend. We do all the usual pointer munging when you have to cross the barrier (eg. for JNI calls).
Cliff
I just downloaded the jdk update 14 early access and it doesn’t accept the CompressedOops option.
C:\Program Files (x86)\Java\jdk1.6.0_14\bin>java -XX:MaxPermSize=256m -Xms3400m
-Xmx3400m -server -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
Unrecognized VM option ‘+UseCompressedOops’
what is happening here, help please.
Alain
Hi Alain, that is strange. What does java -version say?
C:\Program Files (x86)\Java\jdk1.6.0_14\bin>java -version
java version “1.6.0_14-ea”
Java(TM) SE Runtime Environment (build 1.6.0_14-ea-b04)
Java HotSpot(TM) Client VM (build 14.0-b13, mixed mode, sharing)
Alain, are you using a 64-bit JVM? I think yours is a 32-bit JVM and that is why it doesn’t work.
Best,
Ismael
I really thought I was. Thanks for steering me in the right direction
Alain
No problem Alain, glad to help.
Juma,
We are using the Solaris 10 32-bit JVM on our production servers with the Sun ONE app server. As we are constantly getting out of memory (heap) errors, I am planning to increase the heap size to 4G in the JVM options. The current heap size is 2G and we have 8G of memory available.
[…] on Q4, 2009 and it includes HotSpot 16. Even though JDK 6 Update 14 (HotSpot 14) introduced compressed references and scalar replacement, HotSpot 16 includes improved compressed references and many crucial fixes […]
[…] 32-bit or 64-bit JVM? How about a Hybrid? Load unsigned and better Compressed […]
Hi,
I tried adding “-XX:+UseCompressedOops” to our java options for running tomcat and I’m getting this error:
Unrecognized VM option ‘+UseCompressedOops’
Could not create the Java virtual machine.
We’re using java 1.6.0_12 and it is a 64 bit server (and 64 bit jvm). What might be causing this?
java -version
————-
java version “1.6.0_12”
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
————-
Thanks in advance,
fpc
Hi fpc,
You need Java 6 Update 14 or newer (I recommend Java 6 Update 18 if you want to use this feature as it contains an improved version of it).
Best,
Ismael
Thanks very much Ismael. I’ll give that a shot. :)
No problem and good luck. :)
I tried each mode with our application: both 64-bit (usual and compressed) and 32-bit.
-Xmx12500m -server -XX:+UseCompressedOops
In compressed mode it takes 10 times longer to start. Export to SPSS is also about 20 times slower, but memory consumption is much better in compressed mode (up to 40% lower). As for me, it could be helpful for test applications and platforms, if it did not take so much time to compute.
Hi Guys
I have a pure Java + Apple WebObjects 5.5 web application. Right now the application is on a 32-bit processor with a 32-bit JVM and 16GB of RAM, but the client still gets “java.lang.OutOfMemoryError: Java heap space”, so we are planning to move it to a 64-bit JVM and a 64-bit processor with 32GB of RAM. My question is: will I need to recompile my application before it goes live on the production server, or is there no need for recompilation? One more thing I want to mention: it’s a pure Java application, no native code. Thanks in advance, waiting for your response.
If it’s a pure Java application, you don’t need to recompile it. Note that you also have to check whether your container is pure Java (in which case changing the JVM is enough) or requires some configuration to run in 64-bit mode.
Best,
Ismael
That is strange, what JVM version are you using?
Ismael
“This feature is enabled by default since JDK 6 Update 23”.
Does this mean that we don’t need to add -XX:+UseConcMarkSweepGC after JDK 6 Update 23? And how can we check whether the JDK enables this by default? Thanks!
The feature I am talking about in the post and in the quoted sentence is -XX:+UseCompressedOops (not -XX:+UseConcMarkSweepGC).
Here is my problem: I am working on a project which uses javaws (32-bit JRE) and my application is built on some C++ wrappers. When I try to run it with 1024M it works fine on a 64-bit Windows 7 machine, but when I try to run it with 1536M I get out of memory errors and very strange behaviour: sometimes a program that was working fine will give me a null pointer exception on calling a simple JFileChooser dialog, and then the application never works unless I restart it, or it just hangs.
[…] the old 4GB limit, through some optimizations. There are plenty of good post with the details, e.g. https://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/. Java 1.6.0_22 is the last version which has this option disabled by default. Since 1.6.0_23 it is […]
To clarify your last comment:
“This feature is enabled by default since JDK 6 Update 23″.
This means that as of jdk6u23, the option -XX:+UseCompressedOops is enabled automatically when relevant? Or just that this option is supported if specified on the command line?
Hi Jumar. Yes, it is enabled automatically when relevant if you’re using update 23 or later.
Sorry Usman, the information you provided is not enough to help you. I suggest you ask on stackoverflow or a site like it.