Fixing memory leaks is a problem we all have to deal with at some point. In garbage-collected languages like Java we typically expect never to have to worry about memory management, but the way a garbage collector works means we can still create leaks, which defeats much of its advantage.

How do leaks form?

The Java GC determines which objects to collect by looking for references to each object. If there are no references, the object is no longer in use and can be safely destroyed without making the application unstable. If an object is no longer used but is still referenced, a leak occurs. Let's create one:

package org.neverfear.leaks;
 
import java.util.HashMap;
import java.util.Map;
 
/**
 * Run with -Xloggc:gc.log -verbose:gc -XX:-PrintGCDetails
 * 
 * @author kay
 * 
 */
public class HolyBucket {
 
    // One StringBuilder is cached per Thread key. Entries are never removed,
    // so dead threads (and their builders) stay strongly reachable.
    private static Map<Thread, StringBuilder> builders = new HashMap<Thread, StringBuilder>();
 
    public static int leak() {
        if (!builders.containsKey(Thread.currentThread())) {
            builders.put(Thread.currentThread(), new StringBuilder());
        }
        return builders.size();
    }
 
    public static void main(String[] args) {
        while (true) {
            (new Thread(new Runnable() {
                public void run() {
                    System.out.println("Known objects = " + leak());
                }
            })).start();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
 
}

Here we infinitely create threads that die very quickly, each of which calls a leaky block of code. The problem is that long after a thread is dead there is still a reference to it in builders, which from the collector's perspective means we might still want to use that Thread instance. What's more, any objects the thread itself refers to never get cleaned up either. This can add up to a lot of memory quickly.

Detecting leaks

Watching the memory usage of the Java process via the Windows Task Manager or Unix top will give you an idea of whether you're leaking. A more accurate method of finding memory leaks is to enable the garbage collector logs in the JVM. If you run your Java application with the JVM arguments -Xloggc:gc.log -verbose:gc -XX:-PrintGCDetails you will get a log file called gc.log that you can use to analyse heap usage at run time. If you leave your application doing one constant thing, to keep the run fair, it is usually obvious from gc.log that memory is being allocated but not released. Note that if you do different things with your application throughout the test run, observing a leak becomes more difficult because heap usage will vary with whatever task the process is currently doing. The example program above is happy to sit and spin, and it reveals the leak in the logs.

To analyse the logs I like to use GC Viewer. This is a Java application that parses your gc.log and gives you an explorable graph distinguishing the different actions the GC performs: when it is invoked, when it is cleaning, your heap usage, your heap allocation and so on. Using this you will be able to see the leak. A sample graph for the HolyBucket program above:

JVM GC Graph

In addition to logging, you can eyeball the output from jstat. jstat is a JDK tool that attaches to a particular JVM process ID and provides statistics about the running process. A nice way to get the appropriate process ID to attach to is to run jps.

$ jps
9212 Jps
5940 Main
18656 HolyBucket

You can get GC totals every second by running:

$ jstat -gc 18656 1s
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT
640.0  640.0  544.0   0.0   66944.0  14733.0   136576.0   33109.8   21248.0 2456.9     80    0.337   0      0.000    0.337
640.0  640.0  544.0   0.0   66944.0  28126.9   136576.0   33109.8   21248.0 2456.9     80    0.337   0      0.000    0.337
640.0  640.0  544.0   0.0   66944.0  41520.9   136576.0   33109.8   21248.0 2456.9     80    0.337   0      0.000    0.337
640.0  640.0  544.0   0.0   66944.0  54914.8   136576.0   33109.8   21248.0 2456.9     80    0.337   0      0.000    0.337
640.0  640.0   0.0   544.0  66944.0   1339.0   136576.0   33526.6   21248.0 2456.9     81    0.342   0      0.000    0.342
640.0  640.0   0.0   544.0  66944.0  14733.0   136576.0   33526.6   21248.0 2456.9     81    0.342   0      0.000    0.342
640.0  640.0   0.0   544.0  66944.0  28126.9   136576.0   33526.6   21248.0 2456.9     81    0.342   0      0.000    0.342
640.0  640.0   0.0   544.0  66944.0  41520.9   136576.0   33526.6   21248.0 2456.9     81    0.342   0      0.000    0.342
640.0  640.0   0.0   544.0  66944.0  54914.8   136576.0   33526.6   21248.0 2456.9     81    0.342   0      0.000    0.342
640.0  640.0  576.0   0.0   66944.0   1341.0   136576.0   33943.4   21248.0 2457.0     82    0.343   0      0.000    0.343
640.0  640.0  576.0   0.0   66944.0  14732.9   136576.0   33943.4   21248.0 2457.0     82    0.343   0      0.000    0.343
640.0  640.0  576.0   0.0   66944.0  28126.8   136576.0   33943.4   21248.0 2457.0     82    0.343   0      0.000    0.343
640.0  640.0  576.0   0.0   66944.0  41520.8   136576.0   33943.4   21248.0 2457.0     82    0.343   0      0.000    0.343
640.0  640.0  576.0   0.0   66944.0  54914.7   136576.0   33943.4   21248.0 2457.0     82    0.343   0      0.000    0.343

If you look under the OU (old generation utilisation, in KB) column you can see that it is increasing, and for our leaky program it never decreases, even if you let it run to exhaustion. jstat is useful when you don't want to restart an already running Java application just to apply the GC logging parameters.

Locating the leak

Okay, so we can see that memory is being allocated but never freed. How can we find the offending code?

You can use another JDK tool called jmap, which attaches to a JVM and lets us examine the objects on the heap. As a first step we just want a histogram of the loaded classes and how many instances of each have been created.

$ jmap -histo 18656
     num     #instances         #bytes  class name
----------------------------------------------
   1:         20528       80424224  [C
   2:           842       57302152  [I
   3:          9746        1091552  java.lang.Thread
   4:          4764         576816  <methodKlass>
   5:          4764         543296  <constMethodKlass>
   6:           312         344632  <constantPoolKlass>
   7:          7097         320280  <symbolKlass>
   8:          9753         312096  java.util.HashMap$Entry
   9:           284         247936  <constantPoolCacheKlass>
  10:          9822         235728  java.lang.StringBuilder
  11:           312         227512  <instanceKlassKlass>
  12:          9753         156048  java.lang.Object
  13:            25         128720  [Ljava.util.HashMap$Entry;
  14:           388          73536  [B
  15:           382          39728  java.lang.Class
  16:           513          33096  [I
  17:           976          31232  java.lang.String
  18:           458          30024  [S
  19:            73          25456  <methodDataKlass>
  20:            41          23944  <objArrayKlassKlass>

We wait until we observe another increase in heap usage, then take a second histogram.

$ jmap -histo 18656
     num     #instances         #bytes  class name
----------------------------------------------
   1:         21988       86827824  [C
   2:           850       25963480  [I
   3:         10523        1178576  java.lang.Thread
   4:          4764         576816  <methodKlass>
   5:          4764         543296  <constMethodKlass>
   6:           312         344632  <constantPoolKlass>
   7:         10530         336960  java.util.HashMap$Entry
   8:          7097         320280  <symbolKlass>
   9:         10552         253248  java.lang.StringBuilder
  10:           284         247936  <constantPoolCacheKlass>
  11:           312         227512  <instanceKlassKlass>
  12:         10530         168480  java.lang.Object
  13:            25         128720  [Ljava.util.HashMap$Entry;
  14:           388          73536  [B
  15:           382          39728  java.lang.Class
  16:           513          33096  [I
  17:           458          30024  [S
  18:           929          29728  java.lang.String
  19:            85          29704  <methodDataKlass>
  20:            41          23944  <objArrayKlassKlass>

The next step is to determine the delta between these two intervals. I have a little Python script, ParseHisto.py, that I wrote to do the leg work, since in large systems the histogram will be huge.

$ python ParseHisto.py histo1.txt histo2.txt
                 Num Class
                1460 [C
                 777 java.util.HashMap$Entry
                 777 java.lang.Thread
                 777 java.lang.Object
                 730 java.lang.StringBuilder
                  12 <methodDataKlass>
                   8 [I
                   0 java.lang.Class
                   0 [I
                   0 [S
                   0 [Ljava.util.HashMap$Entry;
                   0 [B
                   0 <symbolKlass>
                   0 <objArrayKlassKlass>
                   0 <methodKlass>
                   0 <instanceKlassKlass>
                   0 <constantPoolKlass>
                   0 <constantPoolCacheKlass>
                   0 <constMethodKlass>
                 -47 java.lang.String

Isn't that interesting? HashMap.Entry, Thread and Object have gone up in equal measure. The trick now is to figure out which of these is leaking. We happen to already know that HashMap.Entry, Thread, Object and StringBuilder are all leaking, and a visual inspection of the code lets us take an educated guess: since we have only one Map in our application, it must be there. A more realistic situation I faced not long ago was a program running with a 1.5G heap, made up of thousands of classes (many dynamically constructed) and with many legitimately live threads. There wasn't a chance I could pin down where the leak was coming from by inspection. The next tool to advance my quest was jhat.

If we want to explore the values and references interactively we can take a full heap dump. You might want to do this if the field values of instances will give you a clue as to which part of your code is using those objects.

$ jmap -dump:file=interval1.bin 18656
Dumping heap to /home/kay/interval1.bin ...
Heap dump file created

This creates a file called interval1.bin. At a later date we can use another JDK tool called jhat to start a lightweight web server that lets us explore the heap dump:

$ jhat interval1.bin
Reading from interval1.bin...
Dump file created Sun Jan 09 19:05:57 GMT 2011
Snapshot read, resolving...
Resolving 49826 objects...
Chasing references, expect 9 dots.........
Eliminating duplicate references.........
Snapshot resolved.
Started HTTP server on port 7000
Server is ready.

Having identified HashMap.Entry as the likely source of lingering references, navigate to that class in the jhat web interface.

The simple steps: start at the histogram page of the web interface.

jhat home page

Click on the class you're interested in.

jhat histogram page

Click on an instance you believe to be leaking. This can end up being a guessing game, but we have to start somewhere.

jhat class page

The instance has a Thread key and a StringBuilder value. Keep clicking through whatever refers to that object until we find the HashMap.

jhat instance page

jhat instance page that reveals the containing object of this map

At this point we can see that HolyBucket references the map, so we know where in our code the leak could be coming from. Now we open up the code and look for immortal references. Some of this is obvious if you treat it as a black-and-white process, that is, references versus non-references, and that is exactly what our code suffers from: we clearly hold a reference to the Thread object in our map keys.

I'm going to skip over all the types of reference Java supports. They play a part in how you want to handle collection of objects, and you should definitely read Strong, Soft, Weak and Phantom References (Java). Assuming you've read that, let us continue.

Our problem now is simply this: we are holding a reference to an object that another part of the code discards and no longer needs. What we would like is to cache a StringBuilder instance on a per-thread basis without keeping the thread from dying. One approach is to use weak references. Many standard Java constructs have a weak version of themselves. The most primitive of these is the wrapper class WeakReference, which holds a contained instance. When the only remaining references to the contained object are weak ones, the GC is free to collect it. There is also WeakHashMap, which wraps the keys of map entries in weak references; once a key object has been collected, the WeakHashMap discards the whole entry. Note that this does not apply to the values: they are held strongly, so a value must not itself strongly reference its key or the entry will never be cleared.
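To see the wrapper in action, here is a minimal, self-contained sketch (not part of the program above; WeakReferenceExample is just an illustrative name). Bear in mind that System.gc() is only a hint, so the second get() may not return null on every run:

package org.neverfear.leaks;
 
import java.lang.ref.WeakReference;
 
public class WeakReferenceExample {
 
    public static void main(String[] args) throws InterruptedException {
        StringBuilder builder = new StringBuilder("per-thread cache entry");
        WeakReference<StringBuilder> ref = new WeakReference<StringBuilder>(builder);
 
        // A strong reference (builder) still exists, so get() returns the object.
        System.out.println("Before: " + ref.get());
 
        // Drop the last strong reference; the object is now only weakly reachable.
        builder = null;
 
        // Ask for a collection. This is only a hint to the JVM.
        System.gc();
        Thread.sleep(100);
 
        // Once the collector has run, get() returns null and the memory can be reclaimed.
        System.out.println("After: " + ref.get());
    }
 
}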

A fixed program might look like this

package org.neverfear.leaks;
 
import java.util.Map;
import java.util.WeakHashMap;
 
/**
 * Run with -Xloggc:gc.log -verbose:gc -XX:-PrintGCDetails
 * 
 * @author kay
 * 
 */
public class FixedHolyBucket {
 
    // WeakHashMap holds its keys weakly, so once a thread dies its entry
    // (and the cached StringBuilder) can be collected.
    private static Map<Thread, StringBuilder> builders = new WeakHashMap<Thread, StringBuilder>();
 
    public static int leak() {
        if (!builders.containsKey(Thread.currentThread())) {
            builders.put(Thread.currentThread(), new StringBuilder(4096));
        }
        return builders.size();
    }
 
    public static void main(String[] args) {
        while (true) {
            (new Thread(new Runnable() {
                public void run() {
                    System.out.println("Known objects = " + leak());
                }
            })).start();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
 
}

All that changed was the Map implementation, from HashMap to WeakHashMap!

Confirm the fix

Note that there might be more than one leak; use object histograms to confirm the numbers really are dropping. In the app above you'll notice the number of entries in the map falls over time.

Conclusions

There are a great many forms of memory leak; I've only looked at reference leaks here. Others to think about are CPU leaks, where intense CPU consumption prevents the GC from operating properly and eventually drains the program of memory.

Another is resurrection leaks, where objects that override their finalize() method create additional references to themselves and so prevent themselves from ever being collected. They may not do this intentionally; it can happen through a blurring of encapsulation in some parts of the code.
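Here is a contrived sketch of a resurrection leak (the Undead class and its payload size are made up for illustration). The finalizer stores a reference to the dying object in a static list, so just as the collector decides the object is garbage it becomes strongly reachable again and its memory is never reclaimed:

package org.neverfear.leaks;
 
import java.util.ArrayList;
import java.util.List;
 
public class Undead {
 
    // Anything added here stays strongly reachable for the lifetime of the JVM.
    private static final List<Undead> GRAVEYARD = new ArrayList<Undead>();
 
    // Some payload to make the leak visible in a histogram.
    private final byte[] payload = new byte[1024 * 1024];
 
    @Override
    protected void finalize() throws Throwable {
        // The collector found this object unreachable, but this line makes it
        // reachable again, so neither it nor its payload is ever reclaimed.
        GRAVEYARD.add(this);
        super.finalize();
    }
 
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            new Undead();
            Thread.sleep(10);
        }
    }
 
}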