Siaris

Speeding up a function with Memoization
03 May 01 - http://www.siaris.net/index.cgi/Programming/LanguageBits/Perl/20010503.rdoc

Often we may have one or more functions in a program that take a non-trivial amount of time to compute a result. If such a ‘slow’ function may be called repeatedly, and often with the same arguments as previous calls, caching the return values may result in significant speed increases in the overall program.

Consider, for example, a subroutine ‘factorial’ that returns the factorial of its integer argument. A recursive version might be:

    sub factorial {
        my $n = shift;
        print "Computing factorial($n)\n";  # just for testing!!
        return undef if $n < 0;
        return 1 if $n == 0;
        return $n * factorial($n - 1) ;
    }
    print factorial(4), "\n";
    print factorial(5), "\n";

The print statement in the above is for illustrative purposes only, and running the above snippet produces:

    Computing factorial(4)
    Computing factorial(3)
    Computing factorial(2)
    Computing factorial(1)
    Computing factorial(0)
    24
    Computing factorial(5)
    Computing factorial(4)
    Computing factorial(3)
    Computing factorial(2)
    Computing factorial(1)
    Computing factorial(0)
    120

The problem here is that even though we had already computed the factorial of 4 (and 3, and 2, and 1, and 0) in the first call, we then recomputed all of those again to get the factorial of 5. By caching the return value we can save unnecessary recomputing of results:

    {
        my @cache = ();
        sub factorial {
            my $n = shift;
            print "Called factorial($n)\n";          # testing
            return $cache[$n] if defined $cache[$n];
            print "\tComputing factorial($n)\n";     # testing
            return undef if $n < 0;
            return 1 if $n == 0;
            return $cache[$n] = $n * factorial($n - 1) ;
        }
    }
    print factorial(4), "\n";
    print factorial(5), "\n";

Now we have used an array (@cache) to store the results of calls to the factorial() function and we can look up those values easily. Running the above produces:

    Called factorial(4)
            Computing factorial(4)
    Called factorial(3)
            Computing factorial(3)
    Called factorial(2)
            Computing factorial(2)
    Called factorial(1)
            Computing factorial(1)
    Called factorial(0)
            Computing factorial(0)
    24
    Called factorial(5)
            Computing factorial(5)
    Called factorial(4)
    120

So we have a significant savings when we called the subroutine the second time (it only needed to retrieve the result of the previous call to factorial(4) instead of recomputing it and its recursive dependents). Imagine the savings if we had called factorial(50) followed by factorial(51).

The Memoize module automates this caching for us by providing a function called memoize() which we can use to install cached versions of our ‘slow’ functions. Using this module and our original factorial() function we get:

    #!/usr/bin/perl -w
    use strict;
    use Memoize;
    memoize('factorial');

    print factorial(4), "\n";
    print factorial(5), "\n";

    sub factorial {
        my $n = shift;
        print "Computing factorial($n)\n";  # just for testing!!
        return undef if $n < 0;
        return 1 if $n == 0;
        return $n * factorial($n - 1) ;
    }

Which gives output of:

    Computing factorial(4)
    Computing factorial(3)
    Computing factorial(2)
    Computing factorial(1)
    Computing factorial(0)
    24
    Computing factorial(5)
    Computing factorial(4)
    120

And we see that although the factorial(5) resulted in a call to factorial(4), no further recursion occurs because the memoize() function has installed a cached version of the same function for us.

Memoization (or caching) is a very good technique when it is applicable, and it is not limited to functions that take just a single integer argument. The documentation for the Memoize module will instruct you on using multiple arguments and normalizing the arguments for functions with variable-order arguments. There are also ways to "expire" values in the cache to free up memory (perhaps if the function has not been called for some specified amount of time its value in the cache will be deleted).

The documentation also points out three particular cases where Memoization is not appropriate: 1) a function that uses (depends on) variables or state not defined within the function or its arguments; 2) a function that has side effects (does other things besides simply computing and returning a result); and 3) a function that returns a reference that is modified by the caller.

*****