A common problem we all have to deal with is fixing memory leaks. In garbage-collected languages like Java we typically expect never to need to worry about memory management. However, the limitations of how a garbage collector works mean that we can still create leaks, which defeats much of the advantage of having a garbage collector in the first place.
How do leaks form?
The Java GC determines which objects to collect by looking for references to each object. If there are no references, the object is no longer in use and can be safely destroyed without making the application unstable. If an object is no longer used but is still referenced, a leak occurs. Let's create a leak:
package org.neverfear.leaks;

import java.util.HashMap;
import java.util.Map;

/**
 * Run with -Xloggc:gc.log -verbose:gc -XX:+PrintGCDetails
 *
 * @author kay
 */
public class HolyBucket {

    private static Map<Thread, StringBuilder> builders = new HashMap<Thread, StringBuilder>();

    public static int leak() {
        if (!builders.containsKey(Thread.currentThread())) {
            builders.put(Thread.currentThread(), new StringBuilder());
        }
        return builders.size();
    }

    public static void main(String[] args) {
        while (true) {
            (new Thread(new Runnable() {
                public void run() {
                    System.out.println("Known objects = " + leak());
                }
            })).start();

            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Here we see that we continually create threads which die very quickly, and which at some point call a leaky block of code. The problem is that long after a thread is dead there still exists a reference to it in builders, which from the collector's perspective means we might still want to use the thread instance. What's more, any objects the thread itself refers to never get cleaned up either. This can add up to a lot of memory very quickly.
Detecting leaks
Watching the memory usage of the Java process via the Windows task manager or Unix top will give you an idea of whether you're leaking. A more accurate method of finding memory leaks is to enable the garbage collector logs in the JVM. If you run your Java application with the JVM arguments -Xloggc:gc.log -verbose:gc -XX:+PrintGCDetails you will get a detailed log file called gc.log that you can use to analyse heap usage at run time. If you leave your application doing one constant thing, to make the run fairer, you can analyse gc.log and it is usually obvious that memory is being allocated but not released. Note that if you do different things with your application during the test run, observing a leak becomes more difficult because the heap usage will vary with whatever task the process is currently doing. The example program above is happy just to sit and spin, and it will reveal the leak in the logs.
To analyse the logs I like to use GC Viewer. This is a Java application that will parse your gc.log and give you a graph you can explore, distinguishing the different actions the GC is performing: when it's invoked, when it's cleaning, your heap usage, your heap allocation and so on. Using this you will be able to see the leak. A sample graph for the above HolyBucket program:
In addition to using logging, you can eyeball the output from jstat. jstat is a JDK tool that attaches to a particular JVM process ID and provides some statistics about the running process. A nice way to get the appropriate process ID to attach to is to run jps:
$ jps
9212 Jps
5940 Main
18656 HolyBucket
You can get information about gc totals every second by running:
$ jstat -gc 18656 1s
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT
640.0  640.0  544.0    0.0   66944.0  14733.0  136576.0   33109.8  21248.0  2456.9     80    0.337     0    0.000    0.337
640.0  640.0  544.0    0.0   66944.0  28126.9  136576.0   33109.8  21248.0  2456.9     80    0.337     0    0.000    0.337
640.0  640.0  544.0    0.0   66944.0  41520.9  136576.0   33109.8  21248.0  2456.9     80    0.337     0    0.000    0.337
640.0  640.0  544.0    0.0   66944.0  54914.8  136576.0   33109.8  21248.0  2456.9     80    0.337     0    0.000    0.337
640.0  640.0    0.0  544.0   66944.0   1339.0  136576.0   33526.6  21248.0  2456.9     81    0.342     0    0.000    0.342
640.0  640.0    0.0  544.0   66944.0  14733.0  136576.0   33526.6  21248.0  2456.9     81    0.342     0    0.000    0.342
640.0  640.0    0.0  544.0   66944.0  28126.9  136576.0   33526.6  21248.0  2456.9     81    0.342     0    0.000    0.342
640.0  640.0    0.0  544.0   66944.0  41520.9  136576.0   33526.6  21248.0  2456.9     81    0.342     0    0.000    0.342
640.0  640.0    0.0  544.0   66944.0  54914.8  136576.0   33526.6  21248.0  2456.9     81    0.342     0    0.000    0.342
640.0  640.0  576.0    0.0   66944.0   1341.0  136576.0   33943.4  21248.0  2457.0     82    0.343     0    0.000    0.343
640.0  640.0  576.0    0.0   66944.0  14732.9  136576.0   33943.4  21248.0  2457.0     82    0.343     0    0.000    0.343
640.0  640.0  576.0    0.0   66944.0  28126.8  136576.0   33943.4  21248.0  2457.0     82    0.343     0    0.000    0.343
640.0  640.0  576.0    0.0   66944.0  41520.8  136576.0   33943.4  21248.0  2457.0     82    0.343     0    0.000    0.343
640.0  640.0  576.0    0.0   66944.0  54914.7  136576.0   33943.4  21248.0  2457.0     82    0.343     0    0.000    0.343
If you look under the OU (old generation usage) column you can see that it's increasing, and if you let our leaky program run to exhaustion it never decreases. This approach is useful if you don't want to restart an already running Java application just to apply the GC logging parameters.
Locating the leak
Okay, so we can see that memory is being allocated but never freed. How can we find the offending code?
You can use another JDK tool called jmap that attaches to a JVM and creates a dump of all objects on the heap, which we can then explore. As a first step we just want a histogram of all the classes that are loaded and how many instances of each have been created.
$ jmap -histo 18656

 num     #instances         #bytes  class name
----------------------------------------------
   1:         20528       80424224  [C
   2:           842       57302152  [I
   3:          9746        1091552  java.lang.Thread
   4:          4764         576816  <methodKlass>
   5:          4764         543296  <constMethodKlass>
   6:           312         344632  <constantPoolKlass>
   7:          7097         320280  <symbolKlass>
   8:          9753         312096  java.util.HashMap$Entry
   9:           284         247936  <constantPoolCacheKlass>
  10:          9822         235728  java.lang.StringBuilder
  11:           312         227512  <instanceKlassKlass>
  12:          9753         156048  java.lang.Object
  13:            25         128720  [Ljava.util.HashMap$Entry;
  14:           388          73536  [B
  15:           382          39728  java.lang.Class
  16:           513          33096  [I
  17:           976          31232  java.lang.String
  18:           458          30024  [S
  19:            73          25456  <methodDataKlass>
  20:            41          23944  <objArrayKlassKlass>
We wait until we have observed another increase in heap usage, then take a second histogram dump.
$ jmap -histo 18656

 num     #instances         #bytes  class name
----------------------------------------------
   1:         21988       86827824  [C
   2:           850       25963480  [I
   3:         10523        1178576  java.lang.Thread
   4:          4764         576816  <methodKlass>
   5:          4764         543296  <constMethodKlass>
   6:           312         344632  <constantPoolKlass>
   7:         10530         336960  java.util.HashMap$Entry
   8:          7097         320280  <symbolKlass>
   9:         10552         253248  java.lang.StringBuilder
  10:           284         247936  <constantPoolCacheKlass>
  11:           312         227512  <instanceKlassKlass>
  12:         10530         168480  java.lang.Object
  13:            25         128720  [Ljava.util.HashMap$Entry;
  14:           388          73536  [B
  15:           382          39728  java.lang.Class
  16:           513          33096  [I
  17:           458          30024  [S
  18:           929          29728  java.lang.String
  19:            85          29704  <methodDataKlass>
  20:            41          23944  <objArrayKlassKlass>
The next step is to determine the delta between these two intervals. I have a little Python script called parse-gc-histogram.py that I wrote to do the leg work, since in large systems the histogram will be huge.
$ python ParseHisto.py histo1.txt histo2.txt
Num   Class
1460  [C
777   java.util.HashMap$Entry
777   java.lang.Thread
777   java.lang.Object
730   java.lang.StringBuilder
12    <methodDataKlass>
8     [I
0     java.lang.Class
0     [I
0     [S
0     [Ljava.util.HashMap$Entry;
0     [B
0     <symbolKlass>
0     <objArrayKlassKlass>
0     <methodKlass>
0     <instanceKlassKlass>
0     <constantPoolKlass>
0     <constantPoolCacheKlass>
0     <constMethodKlass>
-47   java.lang.String
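If you want to reproduce that leg work without my script, a minimal sketch of the same idea in Java might look something like this. HistoDiff is just an illustrative name, and it assumes the jmap -histo column layout shown above (rank, instance count, bytes, class name); treat it as a rough equivalent rather than the actual tool used here.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Rough sketch only: diffs two saved "jmap -histo" dumps by instance count.
 * Assumes data rows look like "   1:   20528   80424224  [C".
 */
public class HistoDiff {

    private static Map<String, Long> parse(String file) throws IOException {
        Map<String, Long> counts = new HashMap<String, Long>();
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                // Data rows start with "<rank>:"; the header and separator lines are skipped
                if (fields.length >= 4 && fields[0].endsWith(":")) {
                    Long previous = counts.get(fields[3]);
                    long instances = Long.parseLong(fields[1]);
                    // Some class names (e.g. [I) can appear more than once; sum them
                    counts.put(fields[3], (previous == null ? 0L : previous) + instances);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> before = parse(args[0]);
        Map<String, Long> after = parse(args[1]);

        Set<String> classes = new HashSet<String>(before.keySet());
        classes.addAll(after.keySet());

        // Instance-count delta per class between the two dumps
        final Map<String, Long> delta = new HashMap<String, Long>();
        for (String name : classes) {
            long b = before.containsKey(name) ? before.get(name) : 0L;
            long a = after.containsKey(name) ? after.get(name) : 0L;
            delta.put(name, a - b);
        }

        // Sort so the classes that grew the most come first
        List<String> order = new ArrayList<String>(classes);
        Collections.sort(order, new Comparator<String>() {
            public int compare(String x, String y) {
                return delta.get(y).compareTo(delta.get(x));
            }
        });

        System.out.println("Num\tClass");
        for (String name : order) {
            System.out.println(delta.get(name) + "\t" + name);
        }
    }
}

Feed it the two saved histograms, for example java HistoDiff histo1.txt histo2.txt, and it prints the per-class instance growth, biggest first.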
Looking at the delta output above, isn't that interesting? HashMap.Entry, Thread and Object have gone up in equal measure. The trick now is to figure out which of these is leaking. We happen to already know that HashMap.Entry, Thread, Object and StringBuilder are all leaking, and we can do a visual inspection of the code to take an educated guess: since we have only one Map in our application, it must be there. A more realistic situation I faced not long ago was a program running with a 1.5G heap, comprised of thousands of classes (many dynamically constructed) and with many threads legitimately live. There wasn't a chance I could pin down where the leak might be coming from by inspection. The next tool to advance my quest was jhat.
If we want to explore the values and references actively, we can do a full heap dump. You might want to do this if the field values of instances would give you a clue as to which part of your code is using those objects.
$ jmap -dump:file=interval1.bin 18656
Dumping heap to /home/kay/interval1.bin ...
Heap dump file created
This will create a file called interval1.bin. At a later date we can use another JDK tool called jhat to start a lightweight web server that lets us explore the heap dump:
$ jhat interval1.bin
Reading from interval1.bin...
Dump file created Sun Jan 09 19:05:57 GMT 2011
Snapshot read, resolving...
Resolving 49826 objects...
Chasing references, expect 9 dots.........
Eliminating duplicate references.........
Snapshot resolved.
Started HTTP server on port 7000
Server is ready.
Having identified HashMap.Entry as the likely source of lingering references, navigate to that class in the jhat web pages. The simple steps are:
Go to the histogram page.
Click on the class you're interested in.
Click on an instance you believe to be leaking. This can end up being a guessing game, but we have to start somewhere.
The instance has a Thread key and a StringBuilder value. Keep clicking on whatever refers to that object until we find the HashMap.
At that point we can see that HolyBucket references the map, so we know where in our code the leak could be coming from. Now we open up the code and look for immortal references. Some of this will be obvious if you treat it as a black and white process, references versus non-references, and that is exactly what our code suffers from: we clearly hold a reference to the Thread object in our map keys.
I'm going to skip over the different types of reference Java supports. They play a part in how you handle the collection of objects, and you should definitely read Strong, Soft, Weak and Phantom References (Java). Assuming you've read that, let us continue.
Our problem now is simply this: we are referring to an object that another part of the code discards and no longer needs. What we would like is to cache the creation of a StringBuilder instance on a per-thread basis, but without keeping the thread from dying. One approach we can take is to use weak references. Many standard Java constructs have a weak version of themselves. The most primitive of these is a wrapper class called WeakReference, which holds a contained instance. When the only reference to the contained object is the WeakReference itself, the WeakReference lets go, which allows the GC to collect the object. There is also WeakHashMap, which wraps the keys of Map entries in weak references. When a key object gets collected, the WeakHashMap removes the entry. Note that this does not apply to the values.
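To see that behaviour in isolation, here is a tiny illustrative sketch. The class is made up for this post, and whether a single System.gc() call actually clears the reference is not guaranteed by the specification, although it usually does for a trivial case like this.

import java.lang.ref.WeakReference;

/**
 * Tiny demo of WeakReference semantics: once the only strong reference
 * to the referent is gone, the GC is free to clear the weak reference.
 */
public class WeakReferenceDemo {

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("cached data");
        WeakReference<StringBuilder> weak = new WeakReference<StringBuilder>(builder);

        System.out.println("Before: " + weak.get()); // prints "cached data"

        builder = null;   // drop the only strong reference
        System.gc();      // politely ask the collector to run

        System.out.println("After: " + weak.get());  // very likely prints "null"
    }
}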
A fixed program might look like this:
package org.neverfear.leaks;

import java.util.Map;
import java.util.WeakHashMap;

/**
 * Run with -Xloggc:gc.log -verbose:gc -XX:+PrintGCDetails
 *
 * @author kay
 */
public class FixedHolyBucket {

    private static Map<Thread, StringBuilder> builders = new WeakHashMap<Thread, StringBuilder>();

    public static int leak() {
        if (!builders.containsKey(Thread.currentThread())) {
            builders.put(Thread.currentThread(), new StringBuilder(4096));
        }
        return builders.size();
    }

    public static void main(String[] args) {
        while (true) {
            (new Thread(new Runnable() {
                public void run() {
                    System.out.println("Known objects = " + leak());
                }
            })).start();

            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
All that changed was the Map implementation, from HashMap to WeakHashMap!
Confirming the fix
Note that there might be more than one leak, so use the object histograms to confirm that the numbers are actually dropping. In the application above you'll notice that the number of entries in the map now falls over time.
Conclusions
There are a great many forms of memory leak; I've only looked at reference leaks here. Another to think about is the CPU leak, where intense CPU consumption suppresses the GC from operating properly and eventually starves the program of memory.
Another is the resurrection leak, where objects that override their finalize() method create additional references to themselves and so prevent themselves from ever being collected. They may not do this intentionally, but as a result of a blurring of encapsulation in some parts of the code.
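As a rough illustration (this class is invented for this post, not taken from real code), an object can resurrect itself from finalize() like so:

import java.util.HashSet;
import java.util.Set;

/**
 * Illustration of "resurrection": finalize() stores a reference to the
 * dying object in a static collection, so the object becomes reachable
 * again and its memory is never reclaimed while that reference remains.
 * Note that finalize() runs at most once per object.
 */
public class Zombie {

    private static final Set<Zombie> graveyard = new HashSet<Zombie>();

    @Override
    protected void finalize() throws Throwable {
        // The GC found no references to us, but now there is one again.
        graveyard.add(this);
        super.finalize();
    }
}

As long as something still references the graveyard set, every Zombie that has been finalized stays reachable and is never reclaimed.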