Siaris
Simple Things

Passing Subroutine References

Andrew L. Johnson (First published by ItWorld.com 2001-08-23)

Last week we looked at using subroutine references to create a dispatch table; this week we will consider passing subroutine references as arguments to other subroutines.

The ability to pass subroutine references is a powerful feature. Any subroutine you write that takes arguments can be thought of as a general routine, with the arguments providing the specifics. For example, a subroutine that calculates the square root of a number is general, and the argument provides the specific number whose square root is calculated and returned. Passing a subroutine reference is no different in that respect, but rather than passing something static like a number or a string, you can pass a definition of an action to take.

An excellent example is the File::Find module which defines the find() routine (we saw this module back in February). The find() routine is a general directory tree traversal subroutine whose first argument is a reference to a subroutine, followed by a list of directories to traverse. The find() routine traverses the listed directories and calls the subroutine reference for each directory entry found (it also provides a package global variable holding the current directory entry, for your subroutine to use). This gives the user of the module a very flexible means of accomplishing virtually any task involving directory traversal — because the user supplies the intended action, the find() routine itself need only worry about the vagaries of traversing directories and none of the logic of detecting certain files, removing files, renaming files, or whatever else the user wishes to do.
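As a minimal sketch of that calling convention, the following builds a small throwaway directory tree (the file and directory names are made up for illustration) and asks find() to call our subroutine for every entry. Inside the callback, $_ holds the name of the current entry and, by default, find() has already changed into the directory that contains it.

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway tree so the example is self-contained
# (the names are purely illustrative).
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
for my $name ("a.txt", "b.log", "sub/c.txt") {
    open my $fh, '>', "$top/$name" or die "open $name: $!";
    close $fh;
}

# Our action: collect the names of plain files ending in .txt
my @found;
sub wanted {
    push @found, $_ if -f $_ && /\.txt$/;
}

# find() does the traversal; wanted() supplies the logic.
find(\&wanted, $top);
print "$_\n" for sort @found;    # a.txt, then c.txt
```

Swapping in a different wanted() routine (one that renames or deletes files, say) changes the task without touching the traversal.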

You already know how to create a reference to a named function, and how to call a function through its reference, so there really isn’t anything else you need to know to write a subroutine that takes a subroutine reference as an argument. Let’s consider writing a reduce() routine in Perl (such a routine exists in the List::Util module on CPAN, where it is implemented in C, but we will implement it in Perl here).

A reduce() function takes a function reference as its first argument and then a list of values. It "reduces" the list by applying the passed function to the first two values, then to the result of that and the next value in turn. Thus, it could be used to easily produce sums of lists by supplying a function reference that simply adds two arguments together:

    #!/usr/bin/perl -w
    use strict;
    my @array = (1,2,3,4);

    my $sum = reduce(\&add,@array);
    print $sum;

    sub add {
        $_[0] + $_[1]
    }

    sub reduce {
        my $sref = shift;
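        return $_[0] if @_ < 2;
        my $init = shift;
        for (@_) {
            $init = $sref->($init, $_);
        }
        return $init;
    }

We have taken the extra caution of returning the lone argument directly, without calling the sub reference, if a list of fewer than two elements is passed to reduce(). (Note that we return $_[0] and not @_: when reduce() is called in scalar context, as it is here, returning @_ would yield the number of elements rather than the element itself.)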

The above produces a result of 10. Using the same reduce function we could produce the results of multiplying through the list, or dividing through the list, or any other action we created a function to do:

    sub product { $_[0] * $_[1] }
    sub divide  { $_[0] / $_[1] }

    my $product  = reduce(\&product, @array);
    my $division = reduce(\&divide,  @array);
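To make the left-to-right order of evaluation concrete, here is a small self-contained sketch that runs both calls on the list (1,2,3,4). The reduce() is repeated in compact form only so that the snippet runs on its own.

```perl
#!/usr/bin/perl -w
use strict;

# A compact reduce(), repeated here so the snippet is self-contained.
sub reduce {
    my $sref = shift;
    return $_[0] if @_ < 2;
    my $init = shift;
    $init = $sref->($init, $_) for @_;
    return $init;
}

sub product { $_[0] * $_[1] }
sub divide  { $_[0] / $_[1] }

my @array = (1,2,3,4);
my $product  = reduce(\&product, @array);   # ((1*2)*3)*4
my $division = reduce(\&divide,  @array);   # ((1/2)/3)/4

print "product:  $product\n";               # 24
printf "division: %.4f\n", $division;       # 0.0417
```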

We can avoid creating named subroutines by using anonymous subroutines (just like we can have anonymous arrays and hashes, we can have anonymous subroutines which are references to subroutines with no names). We create such a subroutine reference by using the ‘sub’ keyword followed immediately by the block of code (no name is specified):

    my $sref = sub{print "Hello World\n"};
    $sref->();

So now we can use reduce() as follows:

    my $sum      = reduce( sub{ $_[0] + $_[1] }, @array);
    my $product  = reduce( sub{ $_[0] * $_[1] }, @array);
    my $division = reduce( sub{ $_[0] / $_[1] }, @array);

We could shorten that further using prototypes, but that is beyond the scope of the present article.
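For the curious, here is a rough sketch of what the prototype version might look like. The name reduce_block is hypothetical (chosen so it does not clash with reduce() above), and this is only one way to do it: the (&@) prototype lets the caller write a bare block as the first argument, without the ‘sub’ keyword or a trailing comma.

```perl
#!/usr/bin/perl -w
use strict;

# The (&@) prototype says: the first argument is a code block (or
# sub reference), followed by a list. The sub must be declared before
# the calls so that the parser knows about the prototype.
sub reduce_block (&@) {
    my $sref = shift;
    return $_[0] if @_ < 2;
    my $init = shift;
    $init = $sref->($init, $_) for @_;
    return $init;
}

my @array = (1,2,3,4);
my $sum     = reduce_block { $_[0] + $_[1] } @array;
my $product = reduce_block { $_[0] * $_[1] } @array;
print "$sum $product\n";    # 10 24
```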

*****

Subroutine References

Andrew L. Johnson (First published by ItWorld.com 2001-08-16)

We have used subroutine references a couple of times in previous articles, but we haven’t taken a look at them directly.

You can take a reference to an existing (named) subroutine using the backslash operator along with the ampersand in the following manner:

    sub foo {
        print "This is foo\n";
    }

    my $sref = \&foo;

You can call the referenced routine by preceding the scalar holding it with an ampersand, or by using the dereference arrow and parentheses:

    &$sref();     # or:  &$sref('argument')
    $sref->();    # or:  $sref->('argument')

It is important to realize that you cannot take a reference to a subroutine and pass it arguments at the same time:

    my $sref = \&foo('argument');

What this actually does is return a reference to the return value of that function (using the given argument), which probably isn’t what you wanted.
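The difference is easy to see by checking what kind of reference you get back. The sketch below uses a made-up greet() routine; it also shows one common way to fix an argument ahead of time, which is to wrap the call in an anonymous subroutine (sub { ... }) and take that as the reference instead.

```perl
#!/usr/bin/perl -w
use strict;

# A made-up routine for illustration.
sub greet { return "Hello, $_[0]" }

# This calls greet() right away and references its return value.
my $oops = \&greet('world');
print ref($oops), ": $$oops\n";             # SCALAR: Hello, world

# Wrapping the call in an anonymous sub gives a real code reference
# with the argument already filled in.
my $sref = sub { greet('world') };
print ref($sref), ": ", $sref->(), "\n";    # CODE: Hello, world
```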

But what are subroutine references good for? A couple of common uses are dispatch tables and passing routines as arguments to other routines — we will consider the first here. A dispatch table is simply a table (usually a hash) that allows you to select which routine to run. Consider a user interface that queries the user to enter a command to run:

    #!/usr/bin/perl -w
    use strict;

    sub this { print "You picked 'this'\n" }
    sub that { print "You picked 'that'\n" }
    sub quit { exit }

    print "Enter a command:\n";
    while (<STDIN>) {
        chomp(my $cmd = $_);
        if ($cmd eq 'this') {
            this();
        } elsif ($cmd eq 'that') {
            that();
        } elsif ($cmd eq 'quit' or $cmd eq 'exit') {
            quit();
        } else {
            print "Unrecognized command: $cmd\n";
        }
    }
    __END__

Now imagine that there are many more possible commands the user could enter and you can see that the program would grow quite large. Using a dispatch table moves all of the logic of the if/elsif statements into a data structure:

    #!/usr/bin/perl -w
    use strict;

    sub this { print "You picked 'this'\n" }
    sub that { print "You picked 'that'\n" }
    sub quit { exit }

    my %dispatch = (
        this => \&this,
        that => \&that,
        quit => \&quit,
        exit => \&quit,
        );

    print "Enter a command:\n";
    while (<STDIN>) {
        chomp(my $cmd = $_);
        if ($dispatch{$cmd}) {
            $dispatch{$cmd}->();
        } else {
            print "Unrecognized command: $cmd\n";
        }
    }
    __END__

This program is actually a little longer, but it is simpler in design and simpler to maintain and update: adding new commands involves only defining a subroutine and adding another entry to the hash (we don’t have to add further elsif clauses as we would with the first example).
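The same idea extends naturally to commands that take arguments. The following non-interactive sketch is one possible arrangement, not the only one: the command names (add, upper) and the run_command() helper are invented for illustration, and the handlers are anonymous subroutines stored directly in the hash.

```perl
#!/usr/bin/perl -w
use strict;

# Handlers receive whatever words followed the command name.
my %dispatch = (
    add   => sub { my $t = 0; $t += $_ for @_; return $t },
    upper => sub { return uc join ' ', @_ },
);

# Look up and run one command line; returns undef for unknown commands.
sub run_command {
    my ($cmd, @args) = split ' ', shift;
    return undef unless defined($cmd) && exists $dispatch{$cmd};
    return $dispatch{$cmd}->(@args);
}

for my $line ('add 1 2 3', 'upper hello world', 'bogus') {
    my $result = run_command($line);
    $result = 'Unrecognized command' unless defined $result;
    print "$line => $result\n";
}
```

Adding a command is still a one-line change to the hash; the lookup code never needs to know how many commands exist.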

*****

Tying Variables

Andrew L. Johnson (First published by ItWorld.com 2001-08-09)

We have seen tie() used in previous articles dealing with DBM files, but we haven’t really talked about what exactly tie() does, or how you can create your own tied classes.

Essentially, tie() allows you to associate a variable with a particular class. Special methods are defined in a tied class that are implicitly called whenever you access this variable. In the Digest::MD5 article (Feb 2001), we tied an ordinary hash to a Berkeley DBM file, so our hash acted as a persistent database on disk. All the magic of storing and retrieving hash elements from the disk was encapsulated in the DB_File module and was automatically performed whenever we assigned to the hash or retrieved an element from the hash.

You can tie scalars, arrays, hashes, and filehandles by defining an appropriate tied class implementing the behavior you desire. A main difference between a tied class and an ordinary class is that you must use special names (in all caps) for the methods that will be magically called when accessing a variable.

For a concrete example, let’s say we want to have a variable that takes on a random value from a list each time we use it — we will call the class RandSelect (and thus define a module named RandSelect.pm). We want to use this variable as in the following example:

    #!/usr/bin/perl -w
    use strict;
    use RandSelect;

    tie my $rand, 'RandSelect', qw(andrew greg brad john);
    print "The first six random selections are:\n";
    print $rand, "\n" for 1..6;

Which might print out (in one particular run):

    The first six random selections are:
    andrew
    greg
    john
    john
    andrew
    brad

But first we need to create our tied class. Our class begins in the normal fashion:

    package RandSelect;
    use strict;

but things change after this. The constructor for a tied scalar variable class is called TIESCALAR, and its first argument (like class methods in general) will be the class name:

    sub TIESCALAR {
        my $class = shift;
        return bless [@_], $class;
    }

That’s the entire constructor: we grab the class name and then we return a blessed anonymous array containing the remaining arguments passed to the call to the tie() function. We also need to create the method that will be called when we retrieve the value of a tied variable (named FETCH), and the one that will be called when we assign to a tied variable (named STORE). Each of these is an ordinary method that will receive the object itself as the first argument, followed by any remaining arguments.

Even though we’ve tied a scalar variable, our object is really an array reference, so that’s what we deal with when trying to fetch and store values:

    sub FETCH {
        my $self = shift;
        return $self->[rand @$self];
    }

    sub STORE {
        my $self = shift;
        @$self = @{shift()};
    }
    1;
    __END__

Our FETCH routine merely returns a random element from the array reference, and the STORE routine expects an array reference and stores a copy of it in the object. We finish off our module with a single statement returning a true value (just ‘1;’) and that completes the entire tied scalar class. (We really should provide some error checking to ensure an array reference is passed to STORE, but that is an exercise for the reader.)
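One possible approach to that error checking is sketched below. It repeats the RandSelect class inline (rather than in RandSelect.pm) so the snippet runs on its own, and it uses croak() from the standard Carp module; the wording of the error message is our own invention.

```perl
#!/usr/bin/perl -w
use strict;

package RandSelect;
use Carp;

sub TIESCALAR {
    my $class = shift;
    return bless [@_], $class;
}

sub FETCH {
    my $self = shift;
    return $self->[rand @$self];
}

sub STORE {
    my ($self, $list) = @_;
    # Refuse anything that is not an array reference.
    croak "RandSelect: expected an array reference"
        unless ref($list) eq 'ARRAY';
    @$self = @$list;
}

package main;

tie my $rand, 'RandSelect', qw(andrew greg);
$rand = [qw(brad john)];          # replaces the list of choices
print $rand, "\n";                # brad or john

eval { $rand = 'oops' };          # not an array ref, so STORE croaks
print "Rejected: $@" if $@;
```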

There is already a module on CPAN (Tie::Pick) that is similar to the above but removes each element as it is chosen from the list.

Tying a scalar is the simplest kind of tied variable to create; there are many more methods that you need to define for tying arrays and hashes. Further information on tie and creating tied classes can be found in the following Perl documentation pages:

    perldoc -f tie
    perldoc perltie
    perldoc Tie::Scalar
    perldoc Tie::Array
    perldoc Tie::Hash
    perldoc Tie::Handle
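To give a rough feel for a tied hash without writing every method by hand, the sketch below inherits from Tie::StdHash (a base class supplied by the Tie::Hash module) and overrides only the methods that take a key. The class name CaseInsensitiveHash is hypothetical, and this is an illustration rather than a complete or recommended implementation.

```perl
#!/usr/bin/perl -w
use strict;

package CaseInsensitiveHash;     # hypothetical class name
require Tie::Hash;               # provides the Tie::StdHash base class
our @ISA = ('Tie::StdHash');

# Tie::StdHash supplies TIEHASH, FIRSTKEY, NEXTKEY, CLEAR and so on;
# we override only the methods that take a key, lowercasing it first.
sub STORE  { my ($self, $key, $value) = @_; $self->{lc($key)} = $value }
sub FETCH  { my ($self, $key) = @_; return $self->{lc($key)} }
sub EXISTS { my ($self, $key) = @_; return exists $self->{lc($key)} }
sub DELETE { my ($self, $key) = @_; return delete $self->{lc($key)} }

package main;

tie my %h, 'CaseInsensitiveHash';
$h{Perl} = 5;
print $h{PERL}, "\n";                   # 5
print "exists\n" if exists $h{perl};    # exists
print join(',', keys %h), "\n";         # perl
```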

One final note: There is a bug in Perl 5.6.1 that sometimes causes a tied variable to be accessed twice when it is interpolated — that is why in the example above I used ‘print $rand, "\n"’ instead of the simpler ‘print "$rand\n"’. This bug should be fixed in 5.6.2.

*****

Requiring Configuration Data

Andrew L. Johnson (First published by ItWorld.com 2001-08-02)

Often we write programs that need to use configuration data supplied by the user, but we do not wish the user to have to supply that information on the command line every time they run it. In such circumstances it can be useful to allow the user to store the information somewhere and have the program utilize it on each invocation.

One simple way to do this is to have the user put the data after the __DATA__ token in the script and have the script read it in:

    #!/usr/bin/perl -w
    use strict;
    my %config = configure();

    foreach my $opt (keys %config) {
        print "$opt : $config{$opt}\n";
    }

    sub configure {
        my %cfg;
        while(<DATA>){
            next if /^\s*#/ or /^\s*$/;
            chomp;
            my($option, $value) = split /\s*=\s*/;
            $cfg{$option} = $value;
        }
        return %cfg;
    }

    __DATA__
    # This is the configuration section. Syntax is:
    # OPTION_NAME = VALUE

    # set user name:
    USER_NAME = nobody

    # set email address
    EMAIL = nobody@nowhere.com

    # debugging level (1,2,3)
    DEBUG = 2

The pitfalls of this approach are twofold: first, only one user of the script may configure it (each user must run their own copy of the program with their own configuration details), and second, we are responsible for parsing the configuration info.

We can get around the second pitfall if we just ask the user to use Perl’s hash syntax and set their configuration info in the hash at the top of the script:

    #!/usr/bin/perl -w
    use strict;
    my %config = (
        # set user name:
        USER_NAME => 'nobody',

        # set email address:
        EMAIL => 'nobody@nowhere.com',

        # set debugging level (1,2,3):
        DEBUG => 2,
    );

However, this is still set within the script and we want to allow for different users of the same program. To do this we use a separate config file for each user (stored in the user’s home directory with a special name). We can choose to use the same form as the __DATA__ example above and write our own parser, or we can specify a format that Perl can understand. One trick for having Perl do the work is to realize that Perl’s require() function (used to load in modules or other chunks of Perl code) has a return value of the last expression evaluated in the file. This means we can use an anonymous hash as our configuration format and have it be the only thing in the file. For example, a config file could look like:

    {
        # set user name:
        USER_NAME => 'nobody',

        # set email address:
        EMAIL => 'nobody@nowhere.com',

        # set debugging level (1,2,3):
        DEBUG => 2,
    }

As far as Perl is concerned, that is an anonymous hash, and if this file is stored as ‘.config’ in the user’s home directory (where the environment variable $HOME points), we can simply require() it and grab the return value:

    #!/usr/bin/perl -w
    use strict;
    my $config = require "$ENV{HOME}/.config";
    foreach my $opt (keys %$config) {
        print "$opt : $config->{$opt}\n";
    }

Now each user can have their own config file and we avoid having to parse it by using a format (an anonymous hash) that Perl understands. Utilizing the return value of the require() function can be useful in other situations as well.
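One caveat worth knowing is that require() loads a given file only once during a program’s run. If you need to re-read a config file, or simply want to check for errors yourself, Perl’s built-in ‘do FILE’ form also returns the last expression evaluated in the file. The sketch below is one way to wrap it; the load_config() helper and the temporary file are our own scaffolding so that the example is self-contained.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Write a sample config file so the example runs on its own
# (a real program would look for something like "$ENV{HOME}/.config").
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "{\n    USER_NAME => 'nobody',\n    DEBUG => 2,\n}\n";
close $fh or die "close: $!";

# Load a config file and insist that it evaluates to a hash reference.
sub load_config {
    my $file = shift;
    my $config = do $file;
    die "Cannot parse $file: $@" if $@;
    die "Cannot read $file: $!"  unless defined $config;
    die "$file did not return a hash reference\n"
        unless ref($config) eq 'HASH';
    return $config;
}

my $config = load_config($path);
print "$_ : $config->{$_}\n" for sort keys %$config;
```

Either way, keep in mind that the config file is executed as Perl code, so this technique suits files the user controls rather than untrusted input.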

*****

Named Parameters for Subroutine Calls

Andrew L. Johnson (First published by ItWorld.com 2001-07-26)

Perl doesn’t have typed or named parameters for subroutines; all parameters are passed in the @_ array as a flat list. However, since a flat list of key/value pairs can be assigned to a hash, we can fake it and create subroutines that can be called like:

    my $status = login( -username => $user,
                        -password => $pass,
                        -host     => $host,
                       );

This assumes we have already obtained values for $user, $pass, and $host from user input or the command line (or wherever) and we have a function named login() that is apparently going to log us in to whatever $host is. Without worrying about the specifics of the function itself, let’s look at how it can process the arguments:

    sub login {
        my %args = @_;
        # do stuff
    }

Easy, right? So what benefit do we get? Well, we get two benefits, the first of which is that our calling code is rather self-documenting (even if we chose horrible names for the variables themselves, the hash keys give it all away). Another important benefit is that we no longer have to worry about the order in which we give the arguments, and neither does our subroutine! We can call it as above, or like either of these:

    my $status = login( -password => $pass,
                        -host     => $host,
                        -username => $user,
                       );

    my $status = login( -password => $pass,
                        -username => $user,
                        -host     => $host,
                       );

And, as a bonus, if we want a subroutine where some or all of the arguments are optional (in any order), we can do that too:

    sub draw_rect3D {
        my %args = @_;
        $args{-length} ||= 1;
        $args{-width}  ||= 1;
        $args{-height} ||= 1;
        # anything other than 'in' (including no -units at all) means 'cm'
        $args{-units} = ($args{-units} || '') eq 'in' ? 'in' : 'cm';
        # get to work
    }

Now we can call this and supply any of the parameters, all of the parameters, or none of the parameters (the default will be a 1 centimeter cube):

    draw_rect3D(-length => 2, -units => 'in'); # 2 x 1 x 1 inches
    draw_rect3D(-length => 5, -width => 2);    # 5 x 2 x 1 centimeters
    draw_rect3D(-length => 5,
                -height => 10,
                -units  => 'in',
                -width  => 3,
               );                  # 5 x 3 x 10 inches

This is quite unlike our regular way of just using the argument array as an array — in that case the order you use must always be the same, and optional arguments must be at the end.
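One hazard of this style is that a misspelled parameter name is silently ignored, since it simply becomes an unused hash key. A possible safeguard, sketched below under our own naming (the %defaults table and the rect3D_args() helper are hypothetical), is to keep the defaults in a hash and reject any key that does not appear in it.

```perl
#!/usr/bin/perl -w
use strict;
use Carp;

# Hypothetical defaults table: its keys double as the list of
# parameter names the routine accepts.
my %defaults = (-length => 1, -width => 1, -height => 1, -units => 'cm');

sub rect3D_args {
    my %args = @_;
    for my $key (keys %args) {
        croak "Unknown parameter '$key'" unless exists $defaults{$key};
    }
    # The caller's values override the defaults.
    return (%defaults, %args);
}

my %ok = rect3D_args(-length => 5, -units => 'in');
printf "%s x %s x %s %s\n",
    @ok{qw(-length -width -height -units)};    # 5 x 1 x 1 in

eval { rect3D_args(-lenght => 5) };            # misspelled on purpose
print "Caught: $@" if $@;
```

Merging the two hashes in a list, with the defaults first, is what lets the caller’s values win: later pairs replace earlier ones with the same key.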

You will see this style of function used often for Object constructor methods (and other OO methods) and it is widely used in the various Tk routines and methods (for GUI programming). But you can use this technique with any regular subroutine — all we are doing is using the argument list as a hash instead of a list or array.

*****