Perl programmers shouldn't care about memory and CPU usage

There is an old chestnut of wisdom that Perl programmers shouldn't care about the memory and CPU efficiency of their programs, or, to put it differently: "if you care about memory and CPU efficiency, use C and write your own garbage collector".

I often wondered about the wisdom of that advice, so one day I did the opposite and started using Perl for memory-, speed- and CPU-intensive tasks. I don't want to write more C than I can help; I want to use Perl, since I like programming in Perl. So why should I care about speed and memory usage until I have to?

Certain use cases excluded, like embedded programming where memory and CPU power are extremely limited, on mainstream desktop and server hardware it is a good first approximation to start coding in Perl and improve efficiency as needed. If Perl is your kind of language, then avoid the temptation of deciding that Perl can't possibly cut it in high-performance environments.
Some general guidelines apply, of course: avoiding premature optimization and having a good design helps you write better programs in any language.

Talking about memory usage: for my use case the initial memory footprint matters much less than maintaining steady, near-constant usage, as I'm dealing with long-running processes that do a lot of work.

Fortunately, sticking to a few basics makes achieving exactly that much less of a challenge. Perl uses reference-count-based garbage collection, which means that the elephant-in-the-room mistake to watch out for is creating circular references. Reference counting reclaims memory when the reference count of a variable reaches zero. If variable A points to B and B points to A, both reference counts stay above zero and the memory never gets reclaimed, resulting in a memory leak.
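A minimal sketch of the problem, using a hypothetical `Node` class whose `DESTROY` records when an object actually gets reclaimed:

```perl
use strict;
use warnings;

# Hypothetical Node class: DESTROY records each reclamation.
package Node;
my @reclaimed;
sub new             { bless {}, shift }
sub DESTROY         { push @reclaimed, 1 }
sub reclaimed_count { scalar @reclaimed }

package main;
{
    my $first  = Node->new;
    my $second = Node->new;
    $first->{peer}  = $second;
    $second->{peer} = $first;   # circular reference
}
# The scope has exited, yet neither refcount reached zero,
# so neither DESTROY ran: the pair has leaked.
printf "reclaimed: %d\n", Node::reclaimed_count();   # prints "reclaimed: 0"
```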

The solution is either to avoid creating circular references in the first place, to remember to break them explicitly, or, better still, to use Scalar::Util's weaken to designate one of the references as weak, that is, to mark the referenced variable free for destruction once only weak references point to it.
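A sketch of the weaken approach, again with a hypothetical `Node` class that logs destruction so we can see both objects being reclaimed:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

package Node;
our @reclaimed;                                 # records destroyed objects
sub new     { bless { name => $_[1] }, $_[0] }
sub DESTROY { push @reclaimed, $_[0]{name} }

package main;
{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent});   # the back-reference no longer keeps $parent alive
}
# With one edge of the cycle weakened, both objects die at scope exit.
printf "reclaimed: %d\n", scalar @Node::reclaimed;   # prints "reclaimed: 2"
```

A useful side effect: when the strongly-referenced object is destroyed, the weak reference is set to undef rather than left dangling.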

It is possible that despite our best efforts some code leaks. There are a variety of ways to deal with that:

  • A wide array of helper modules is available: Devel::Leak, Devel::TrackObjects, Devel::Size and Devel::Cycle, just to mention a few.
  • If you suspect that perl itself or an XS library leaks memory, then a tool like Valgrind might be useful.
  • Memory leaks can be tricky to find even with all the tools available, and personally I found that the best approach to dealing with them is not a tool but a method: triangulate the leak by building a list of observations about your program from as many independent variables as possible, then use a bit of logic to deduce how, and under exactly what circumstances, your program leaks. (This is something like doing git bisect, but instead of commits you deal with observations, with the added difficulty of having to sort those observations to see a pattern.)
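As a small example of the helper modules above, Devel::Cycle (a CPAN module, assumed installed) can walk a data structure and report the cycles it contains:

```perl
use strict;
use warnings;
use Devel::Cycle;   # CPAN module; assumed installed

my $node = {};
$node->{self} = $node;   # trivial one-element cycle

# find_cycle() walks the structure; given a callback, it invokes the
# callback once per cycle found instead of printing a report.
my $cycles = 0;
find_cycle($node, sub { $cycles++ });
print "cycles found: $cycles\n";   # prints "cycles found: 1"
```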

So far I've had more problems with memory leaks* arising from C-based libraries or modules used through XS interfaces than from Perl code, although I did run into the problem that for my use case perl itself, version 5.10.0, proved to be too leaky (5.8.9 is fine though, and in bleadperl all the leaks I've had problems with are already fixed, so 5.10.1 is shaping up to be a good, stable release from a memory perspective). Perl 5 is written in C, so maybe Perl 6 is on the right track by planning to be self-hosting. :)

On the CPU cycle front, avoiding premature optimization is JIT for programmers: optimize when you have to, when you have the resources to do so, and when you have enough context to know where to start.

Context means information, and information comes from profiling. Among Perl profilers there is only one reasonable choice, not because of a lack of alternatives but because Devel::NYTProf is that good. It is fast, versatile, robust and very informative.
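Typical usage looks like this (assuming Devel::NYTProf is installed from CPAN; `myscript.pl` stands in for your own program):

```shell
# Run the program under the profiler; writes ./nytprof.out
perl -d:NYTProf myscript.pl

# Turn the raw profile into a browsable HTML report in ./nytprof/
nytprofhtml
```

The HTML report breaks time down per subroutine and per line, which is exactly the context you need to decide where optimizing is worth the effort.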

The information Devel::NYTProf gave me allowed me to write some high-performance code that looks pretty interesting when it comes to CPU cycle usage. Most of the cycles (80%+) are spent in well-written C libraries, even though I wrote most of the code in Perl, apart from contributing a bit to an XS module. I get to have my cake and eat it, too.

So, you like writing code in Perl? Then enjoy life, don't worry and JFDI. It might turn out much better than expected.

* You can always count on Microsoft to provide an implementation of something that happily treads off the beaten path, thereby exercising corner cases.


About this Entry

This page contains a single entry by szbalint published on May 9, 2009 9:02 PM.
