Friday, January 02, 2009

Avoiding memory fragmentation for fun and profit

One problem that can occur in long-running programs is memory fragmentation in the VM heap. The problem is simple: as a program allocates and deallocates memory, it can end up with small "holes" scattered all over the heap, each too small for the new data it needs to store. When the VM heap is allocated in pages, this has a serious cost in memory usage, because a page cannot be handed back to the operating system while any live data remains on it. There are several ways to avoid the problem.

The first is simply to be aware of the problem and be smart about how you allocate memory. With enough thought you can often get around fragmentation, or at least minimize it.

The second trick is to restart the application periodically. It is not as bad as it sounds on a UNIX system. You set up the application and fork() off a child to do the hard work. When the child has been running for some time, you kill it and fork() off another one from the (non-fragmented) parent. Apache 1.x used this approach.

The third trick is to allocate a big region of memory, keep a pointer to the first unallocated word, and reset that pointer when you are done with a request and no longer need the memory. Interestingly, there has been considerable research into automating this idea. In the automated setting it is known as "region inference", but here it is applied manually, e.g. in C, to achieve the same effect. The Subversion project used a region handler when I looked at their code some years ago; I don't know whether they still do. Poul-Henning Kamp uses the reset trick in Varnish as well.

The fourth trick is to use a good garbage collector. Garbage collectors have the advantage that they are allowed to move data around, so the good ones periodically compact live data to one end of the heap and clear up fragmentation in the process. Note that most garbage collectors used in "scripting languages" like Python or Ruby are pretty weak. They usually avoid doing "real" garbage collection and opt for simpler, poor man's solutions. This is unfortunate, because it puts GC in a bad light. That, and Java enterprise behemoths abusing the VM heap :)
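To make the restart trick concrete, here is a minimal C sketch for a POSIX system. The names run_generations and demo_worker, the use of SIGTERM and the fixed time slice per child are all assumptions for illustration; this is not Apache's actual code, and a real server would more likely let a child finish its current request before replacing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: a real child would serve requests here.
   This one just sleeps until it is killed. */
void demo_worker(void) {
    for (;;)
        pause();
}

/* Run `generations` children one after another. Each child starts from a
   copy of the parent's heap as it was before any work was done, runs `work`
   for roughly `seconds` seconds, and is then killed, taking its fragmented
   heap with it. Returns how many children were reaped, or -1 if fork fails. */
int run_generations(int generations, unsigned seconds, void (*work)(void)) {
    assert(work != NULL);
    int reaped = 0;
    for (int i = 0; i < generations; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {        /* child: do the hard work */
            work();
            _exit(0);          /* never return into the parent's loop */
        }
        sleep(seconds);        /* parent: let the child run for a while */
        kill(pid, SIGTERM);    /* discard the child and its heap */
        int status;
        if (waitpid(pid, &status, 0) == pid)
            reaped++;
    }
    return reaped;
}
```

Because fork() gives every child its own copy of the parent's address space, each new child begins with the parent's heap as it was at setup time, without the holes its predecessor accumulated.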
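The region trick, one big block plus a resettable pointer to the first unallocated byte, fits in a few lines of C. This is a minimal sketch assuming a C11 compiler (for _Alignof and max_align_t); the Region type and the region_* functions are hypothetical names, not the API used by Subversion or Varnish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One big block plus the offset of the first unallocated byte. */
typedef struct {
    unsigned char *base;  /* start of the big block */
    size_t size;          /* total size in bytes */
    size_t used;          /* offset of the first unallocated byte */
} Region;

/* Allocate the big block up front. Returns 0 on success, -1 on failure. */
int region_init(Region *r, size_t size) {
    r->base = malloc(size);
    r->size = r->base != NULL ? size : 0;
    r->used = 0;
    return r->base != NULL ? 0 : -1;
}

/* Bump-allocate n bytes, aligned for any fundamental type. There is no
   per-object free; returns NULL when the region is exhausted. */
void *region_alloc(Region *r, size_t n) {
    assert(r->base != NULL);
    const size_t align = _Alignof(max_align_t);          /* a power of two */
    size_t start = (r->used + align - 1) & ~(align - 1); /* round up */
    if (start > r->size || n > r->size - start)
        return NULL;
    r->used = start + n;
    return r->base + start;
}

/* Release everything at once by moving the pointer back to the start. */
void region_reset(Region *r) {
    r->used = 0;
}

/* Give the big block back when the region itself is no longer needed. */
void region_destroy(Region *r) {
    free(r->base);
    r->base = NULL;
    r->size = r->used = 0;
}
```

A request handler would call region_alloc for all of its temporary data and region_reset once the request is done, so the memory is reused as one contiguous piece instead of being freed object by object and leaving holes.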
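The compaction a moving garbage collector performs can be illustrated with a toy mark-compact collector that slides live objects to the start of the heap. This is a deliberately simplified sketch, not how any particular language runtime works: every object is assumed to be a fixed-size cell with one pointer field, pointers are heap indices, and the Heap, heap_alloc and compact names are hypothetical. After marking, it makes three passes: assign each live cell its new index, rewrite pointers, then move the cells.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CELLS 8   /* hypothetical, tiny heap size for illustration */
#define NIL (-1)       /* null pointer value */

/* Toy object: one value and one pointer field (a heap index or NIL). */
typedef struct {
    int value;
    int next;      /* pointer field */
    bool marked;   /* reachable from a root during this collection */
    int forward;   /* index the cell will move to */
} Cell;

/* Cells [0, top) are allocated; everything above top is free. */
typedef struct {
    Cell cells[HEAP_CELLS];
    int top;
} Heap;

/* Bump allocation; returns NIL when the heap is full. */
int heap_alloc(Heap *h, int value, int next) {
    if (h->top == HEAP_CELLS)
        return NIL;
    h->cells[h->top] = (Cell){ .value = value, .next = next,
                               .marked = false, .forward = NIL };
    return h->top++;
}

/* Mark the chain of cells reachable from index i. */
static void mark(Heap *h, int i) {
    while (i != NIL && !h->cells[i].marked) {
        h->cells[i].marked = true;
        i = h->cells[i].next;
    }
}

/* Mark from the roots, then slide live cells down to index 0, keeping their
   order and rewriting every pointer (including the roots). Returns the
   number of live cells, which becomes the new top. */
int compact(Heap *h, int *roots, size_t nroots) {
    for (size_t r = 0; r < nroots; r++)
        mark(h, roots[r]);

    /* Pass 1: assign each live cell its new index. */
    int dest = 0;
    for (int i = 0; i < h->top; i++)
        if (h->cells[i].marked)
            h->cells[i].forward = dest++;

    /* Pass 2: rewrite pointers while the old indices are still valid. */
    for (size_t r = 0; r < nroots; r++)
        if (roots[r] != NIL)
            roots[r] = h->cells[roots[r]].forward;
    for (int i = 0; i < h->top; i++) {
        Cell *c = &h->cells[i];
        if (c->marked && c->next != NIL)
            c->next = h->cells[c->next].forward;
    }

    /* Pass 3: move cells; forward <= i, so no unmoved live cell is overwritten. */
    for (int i = 0; i < h->top; i++) {
        if (h->cells[i].marked) {
            Cell c = h->cells[i];
            c.marked = false;
            h->cells[c.forward] = c;
        }
    }
    h->top = dest;
    return dest;
}
```

After compact returns, all live data sits in cells [0, top) and new allocations simply bump top, so the free space is a single contiguous block with no holes.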

About Me

Lambda-loving CS Geek. Likes metal music. Likes dogs, cats. Does not like pictures of dogs and cats (unless they are lambdacats!)

Has an unhealthy coffee addiction. Calls himself the coffee zombie in the morning (BEEEEANS!)

Has a neverending curiosity gene, loves intelligence and passion.