Siaris
Simple Things

Understanding References, Part 3: Nested Structures

Andrew L. Johnson (First published by ItWorld.com 2001-04-19)

We finished last week with an example that solved the problem of passing multiple arrays to a subroutine by passing references to the arrays rather than the arrays themselves. You may not have noticed, but in doing so we made use of a multidimensional array (though only briefly). When we pass arguments to a subroutine, the arguments are stored in the @_ array, so when we pass array references we’ve actually made an array of arrays (well, an array of array references, but that’s what a two dimensional array is in Perl). Look at this example:

    my @one = (9,8,7);
    my @two = (6,5,4);
    foo(\@one, \@two);
    sub foo {
        print "$_[1][1]\n";
    }

The above prints 5 (the second element of the second array). Ignoring how the dereference works for a minute, let’s look at a similar example not involving subroutines:

    my @one = (9,8,7);
    my @two = (6,5,4);
    my @rows = (\@one, \@two);
    print "$rows[1][1]\n";      # prints: 5

The same thing is going on here except that we are explicitly assigning to an array @rows rather than implicitly assigning to the special @_ array (which holds subroutine arguments). Now, let’s return to dereferencing to figure out why this works as it does.
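
Because each element of @rows is just an array reference, walking the whole structure means dereferencing each row in turn. A short sketch using the same @one, @two and @rows (the @lines and $sum variables are only for illustration):

```perl
use strict;
use warnings;

my @one  = (9, 8, 7);
my @two  = (6, 5, 4);
my @rows = (\@one, \@two);

# Each element of @rows is an array reference, so each $row
# has to be dereferenced with @$row to get at its elements.
my @lines;
foreach my $row (@rows) {
    push @lines, join ' ', @$row;
}
print "$_\n" for @lines;    # prints "9 8 7" then "6 5 4"

# Index-based access: $#rows is the last index of @rows, and
# $#{ $rows[$i] } is the last index of the array in row $i.
my $sum = 0;
for my $i (0 .. $#rows) {
    for my $j (0 .. $#{ $rows[$i] }) {
        $sum += $rows[$i][$j];
    }
}
print "sum: $sum\n";        # prints: sum: 39
```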

We have already seen two ways of dereferencing an array reference:

    my @array = (42, 13, 12);
    my $aref  =  \@array;
    print "${$aref}[1]\n";    # explicit
    print "$$aref[1]\n";      # shortcut

An alternate method is to use the "arrow" dereference syntax:

    my @array = (42, 13, 12);
    my $aref  =  \@array;
    print "$aref->[1]\n";

In this method, perl assumes that whatever is to the left of the "arrow" resolves to a reference, so perl dereferences it and then returns the value given by the subscript on the right. This works for hashes as well:

    my %hash = (name => 'andrew', beer => 'dark-ale');
    my $href = \%hash;
    print "$href->{beer}\n";
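
The arrow is just as happy on the left-hand side of an assignment. A short sketch using the same %hash and $href (the 'stout' value and the @pairs array are only for illustration):

```perl
use strict;
use warnings;

my %hash = (name => 'andrew', beer => 'dark-ale');
my $href = \%hash;

# Assigning through the arrow writes into %hash itself,
# not into a copy of it.
$href->{beer} = 'stout';
print "$hash{beer}\n";                # prints: stout

# Inside a loop, $href->{$key} looks up each key in turn.
my @pairs;
foreach my $key (sort keys %$href) {
    push @pairs, "$key=$href->{$key}";
}
print "@pairs\n";                     # prints: beer=stout name=andrew
```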

Now, we can return to the two dimensional array example and use the arrow syntax:

    my @one = (9,8,7);
    my @two = (6,5,4);
    my @rows = (\@one, \@two);
    print "$rows[1]->[1]\n";      # prints: 5

Here, everything to the left of the arrow ($rows[1]) resolves to a reference, and then we look up the [1] subscript in that array. Because Perl has no true multidimensional arrays, an expression like $rows[1][1] would not otherwise make sense. So Perl assumes that there is always an implicit arrow between any two subscripts. This is why $rows[1][1] actually works: internally Perl treats it as $rows[1]->[1].
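
The implicit-arrow rule is easy to check. The sketch below fetches the same element three ways. It also adds a scalar $rref holding a reference to @rows (an extra, for illustration only) to show that when you start from a scalar holding a reference, the first dereference still has to be written out; only the arrows *between* subscripts may be dropped:

```perl
use strict;
use warnings;

my @one  = (9, 8, 7);
my @two  = (6, 5, 4);
my @rows = (\@one, \@two);

# Three spellings of the same element:
my $a1 = ${ $rows[1] }[1];   # explicit block dereference
my $a2 = $rows[1]->[1];      # arrow
my $a3 = $rows[1][1];        # implicit arrow between subscripts
print "$a1 $a2 $a3\n";       # prints: 5 5 5

# Starting from a scalar that holds a reference, the first
# arrow (or the $$ form) is required. Writing $rref[1][1]
# would instead refer to an array named @rref.
my $rref    = \@rows;
my $via_ref = $rref->[1][1];
print "$via_ref\n";          # prints: 5
```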

The same is true for hash references and any mixture of hash and array references:

    my @beer = ('dark-ale', 'pale-ale', 'stout');
    my %hash = ( name => 'andrew', beer => \@beer);
    print "$hash{name}'s favorite beer is $hash{beer}[0]\n";
    print "$hash{name} will accept: @{$hash{beer}}\n";

In the above, $hash{beer}[0] is the same as $hash{beer}->[0]: the thing to the left of the arrow resolves to an array reference, and the subscript on the right gets the value at that index. In the second print statement, however, we had to use the extra curly braces to dereference the entire array reference. We could not just write:

    @$hash{beer}

That won’t work, because Perl assumes that $hash (rather than $hash{beer}) is the reference, and we don’t even have a $hash variable in our example. In other words, without the curly braces, it is parsed as:

    @{$hash}{beer}

which isn’t what we wanted. So we have to group the expression that actually resolves to the reference inside curly braces:

    @{ $hash{beer} }

and then perl knows exactly what we are trying to dereference. Creating multidimensional structures is not difficult, but Perl provides another very useful convenience, anonymous arrays and hashes, which we will look at next week.
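
In the meantime, note that the braced form is also what you use to hand a nested array to any array operation, not just to interpolate it. A short sketch using the same @beer and %hash (the extra 'porter' entry and $count are made up for illustration):

```perl
use strict;
use warnings;

my @beer = ('dark-ale', 'pale-ale', 'stout');
my %hash = (name => 'andrew', beer => \@beer);

# Any array operation works on the dereferenced array, as long
# as the expression yielding the reference is wrapped in braces.
push @{ $hash{beer} }, 'porter';
my $count = @{ $hash{beer} };     # array in scalar context: its size

print "$hash{name} will accept: @{ $hash{beer} }\n";
print "that is $count beers\n";   # prints: that is 4 beers

# $hash{beer} refers to @beer itself, so @beer grew as well.
print "last: $beer[-1]\n";        # prints: last: porter
```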

Next Week: Understanding References (part 4: anonymous structures)

*****

Understanding References, Part 2: References to Arrays and Hashes

Andrew L. Johnson (First published by ItWorld.com 2001-04-12)

Last week we looked at just the basics of taking a reference and then dereferencing it — this week we will look at taking references to arrays and hashes and expand on our dereferencing syntax.

Remember that a scalar variable can hold one thing — basically: a string, a number, or a reference. We’ve already stored a reference to a scalar variable in another scalar variable, but we can also store a reference to an array (or a hash) in a scalar variable as well:

    my @array = (42, 13, 99);
    my $aref  = \@array;

In this case, the variable $aref holds a reference to @array (or, in terms of last week’s discussion, it holds the memory slot number (address) of the array — actually, an array is many slots but we don’t have to worry about that at the moment). But how do we get at the array through the reference?

Recall the syntax we used to dereference a scalar variable:

    my $foo = 42;
    my $bar = \$foo;
    print $$bar;

That last line is actually a slight shortcut — the more explicit version would be:

    print ${$bar};

That is, we put the reference inside curly braces and precede it with the symbol for the type of value we are dereferencing. In this case we use a $ because the value we are getting at is a scalar value. So, for the array case we do the following:

    my @array = (42, 13, 99);
    my $aref  = \@array;
    print "@{$aref}\n";        # print whole array (explicit method)
    print "@$aref\n";          # print whole array (shortcut method)

To get at just one element, let’s first look at getting one element from the real array:

    print $array[1];     # prints: 13

Breaking this down we have: a type symbol for the type of value (arrays hold scalar values as elements), the array name, and the index. To do the same thing with a reference we replace the array name with the reference to it:

    print ${$aref}[1];  # explicit method
    print $$aref[1];    # shortcut method
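
Element access through a reference works for assignment too, just as $$bar = 42 did last week: $aref holds no copy, so writing through it changes @array itself. A minimal sketch:

```perl
use strict;
use warnings;

my @array = (42, 13, 99);
my $aref  = \@array;

# Assigning through the reference modifies @array itself.
${$aref}[0] = 7;         # explicit method
$$aref[2]   = 100;       # shortcut method

print "@array\n";                # prints: 7 13 100
print scalar(@$aref), "\n";      # number of elements: 3
```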

We can do the same things with hashes as well:

    my %hash = ( name => 'Andrew', beer => 'Dark Ale' );
    my $href = \%hash;

    print "${$href}{name} ";       # explicit
    print "drinks $$href{beer}\n"; # shortcut

    # or to iterate over the hash:

    foreach my $key ( keys %$href ) {
        print "$key : $$href{$key}\n";
    }

How do references help us? One useful thing they allow us to do is to pass multiple arrays and/or hashes into a subroutine. Remember that a subroutine receives its parameters as one long flat list of arguments, so to create a subroutine that compares two arrays to see if they contain the same numerical elements we need to use references:

    my @one = (42, 13, 99, 72, 1);
    my @two = (42, 13, 99, 72, 1);
    if ( cmp_arrays(\@one, \@two) ) {
        print "Arrays are the same\n";
    } else {
        print "Arrays are not the same\n";
    }

    sub cmp_arrays {
        my ($one, $two) = @_;
        return 0 unless @$one == @$two; # check sizes
        foreach my $i (0 .. @$one - 1) {
            return 0 unless $$one[$i] == $$two[$i];
        }
        return 1;
    }
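
To see why cmp_arrays() needs references, it helps to count what actually arrives in @_. In this sketch count_args() is a hypothetical helper, not part of the original example:

```perl
use strict;
use warnings;

# count_args() is a hypothetical helper that just reports
# how many arguments arrived in @_.
sub count_args { return scalar @_ }

my @one = (42, 13, 99, 72, 1);
my @two = (42, 13, 99, 72, 1);

my $flat = count_args(@one, @two);     # arrays flattened: 10 arguments
my $refs = count_args(\@one, \@two);   # two references: 2 arguments
print "flat: $flat, refs: $refs\n";    # prints: flat: 10, refs: 2
```

Once flattened, there is no way for the subroutine to tell where @one ended and @two began; the two references keep the arrays separate.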

Next week we will look at alternate dereferencing methods and building nested data structures such as multidimensional arrays and hashes.

*****

Understanding References, Part 1

Andrew L. Johnson (First published by ItWorld.com 2001-04-05)

As you probably already know, Perl has three basic variable types: scalar variables, array variables, and hash variables. A scalar variable holds a scalar value (one thing, either a number, a string, or a reference). An array holds a list of scalar values, and a hash holds a set of key/value pairs where each value is a scalar value.

But, what if we want to have a two dimensional array? We cannot simply do this:

    my @foo  = (1,2,3);
    my @bar  = (4,5,6);
    my @twod = (@foo, @bar);

Well, we can do that, but it does not give us a two dimensional array; it gives us a single array holding (1,2,3,4,5,6). The trick is to use references. We saw above that a reference is a scalar value, but we have not yet said what a reference is. Consider the following simple assignment:

    my $foo = 30;

A simple way to think about this is that the number 30 is stored in a slot in memory. All such memory slots have an address (a slot number), and Perl internally associates the variable $foo with the slot address where it stored the number 30. The only way we can access that number again is through the variable $foo (which knows which slot it is in). So, $foo has a slot number (address) and when we access $foo we access that slot. What happens in the following?

    my $foo = 30;
    my $bar = $foo;

The first assignment is the same as the above, but what happens with the assignment to $bar? Perl first gives $bar its own memory slot, and then copies the number stored in $foo’s slot into $bar’s slot. Now the number 30 is stored in two different slots. What happens if we then assign $foo = 42? All that happens is that a new number is stored in $foo’s slot; we did not touch $bar at all, and it remains unchanged (still holding 30).
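
The same sequence as a runnable snippet, for comparison with the reference version that follows:

```perl
use strict;
use warnings;

my $foo = 30;
my $bar = $foo;    # copies the value 30 into $bar's own slot
$foo = 42;         # changes only $foo's slot

print "$foo : $bar\n";   # prints: 42 : 30
```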

A reference, then, is simply the slot number (address) of another variable (instead of its contents). We can take a reference using the backslash operator (future articles will discuss other ways of creating a reference):

    my $foo = 30;
    my $bar = \$foo;

Now $bar still has its own slot, but instead of assigning the contents of $foo’s slot into it, we assign $foo’s slot number itself. If we then print out $bar we see that it does not hold 30:

    my $foo = 30;
    my $bar = \$foo;
    print "$bar\n";   #prints: SCALAR(0x80ee87c)

When we print a reference it tells us the type of reference (SCALAR in this case) and the memory address (0x80ee87c in this case, but if you try it you will probably get a different number). The address is just a hexadecimal number, but that doesn’t matter for the purposes of our discussion. Now, how can we use this reference? We have to dereference it:

    my $foo = 30;
    my $bar = \$foo;
    print "$$bar\n";         # prints: 30
    $$bar = 42;
    print "$foo : $$bar\n";  # prints: 42 : 42

Notice that we used $$bar (two $ signs) to dereference the variable. When we print out the dereferenced value we get the contents of the memory slot it was pointing to — and similarly, when we assigned into the dereferenced variable we assigned into the slot it was pointing to, so the contents of $foo’s slot were actually changed. It may not look like we have much at this point, merely adding an extra step to get to the contents of a variable — but I assure you, this is a foundation for a great deal of usefulness as we will see in the next few articles.
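
As a small taste of why the extra step is worth it, a subroutine that receives a reference can change the caller’s variable through it. A minimal sketch; double_it() is a made-up name, not something from this series. It also uses Perl’s built-in ref() function, which returns just the type part of a reference (the SCALAR we saw when printing $bar):

```perl
use strict;
use warnings;

# double_it() is a made-up example: it takes a reference to a
# scalar and writes the doubled value back through it.
sub double_it {
    my ($ref) = @_;
    # ref() returns the reference type ('SCALAR' here), or an
    # empty string if its argument is not a reference at all.
    die "double_it() needs a scalar reference\n"
        unless ref($ref) eq 'SCALAR';
    $$ref *= 2;
    return;
}

my $foo = 30;
double_it(\$foo);
print "$foo\n";        # prints: 60
```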

*****

Building a Distribution with h2xs

Andrew L. Johnson (First published by ItWorld.com 2001-03-29)

I have discussed using h2xs in a previous article (dated April 27, 2000 in the article archives: see link below), but I think it would be good to look at it again in a little more depth in light of the last two articles, and because it produces slightly different output in version 5.6.0 than in 5.005.

Let’s say we decide to build our Cool module and we want to be able to package it up and share it (or perhaps upload it to CPAN if you have a CPAN id). We can automate much of the task by using the h2xs utility that comes with perl:

    [jandrew]$ h2xs -XAn Cool
    Writing Cool/Cool.pm
    Writing Cool/Makefile.PL
    Writing Cool/test.pl
    Writing Cool/Changes
    Writing Cool/MANIFEST

We used the -X option because we are not creating an XS module, the -A option because we are not using the AutoLoader facility (both are beyond the scope of this article), and the -n option to specify the name of the module (Cool in this case). You can see from the output that it has created a ‘Cool’ directory and written several files in it for us. The one we are really interested in is Cool.pm, which is a skeleton of our module. If you are using version 5.005, the contents of the module file will begin like this:

    package Cool;

    use strict;
    use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);

    require Exporter;

    @ISA = qw(Exporter AutoLoader);

    # Items to export into callers namespace by default. Note: do not
    # export names by default without a very good reason. Use
    # EXPORT_OK instead.  Do not simply export all your public
    # functions/methods/constants.

    @EXPORT = qw(

    );
    $VERSION = '0.01';

    # Preloaded methods go here.

    # Autoload methods go after =cut, and are processed by the
    # autosplit program.

    1;
    __END__

There is also stub POD after the __END__ token (see: perldoc perlpod). Unfortunately, even with the -A option, h2xs still includes AutoLoader in the @ISA array; you should remove it, and delete the last comment, since we aren’t using autoloaded methods.

As you can see, this is the basic structure of the module we previously built by hand, although there I neglected to ‘use strict’ and declare our globals with ‘use vars’ (an oversight on my part). You’ll also note that although the @EXPORT array is ready for us to fill, the @EXPORT_OK array isn’t set up. I would simply change that to @EXPORT_OK if I wanted to export on demand (which I usually do).

All that’s left for us to do here is put the name of our cool() function into the @EXPORT_OK array and then define that subroutine after the ‘Preloaded methods go here’ comment.

In 5.6.0 the skeleton file begins like:

    package Cool;

    require 5.005_62;
    use strict;
    use warnings;

    require Exporter;

    our @ISA = qw(Exporter);

    # Items to export into callers namespace by default. Note: do not
    # export names by default without a very good reason. Use EXPORT_OK
    # instead.  Do not simply export all your public
    # functions/methods/constants.

    # This allows declaration use Cool ':all'; If you do not need this,
    # moving things directly into @EXPORT or @EXPORT_OK will save memory.

    our %EXPORT_TAGS = ( 'all' => [ qw(

    ) ] );

    our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

    our @EXPORT = qw(

    );
    our $VERSION = '0.01';

    # Preloaded methods go here.

    1;
    __END__

The only differences are that the arrays are now declared with our(), which is new in 5.6.0, and that we have an %EXPORT_TAGS hash. Using this hash is beyond the scope of this article and isn’t necessary for our simple module, so I suggest simply ignoring or deleting it and removing the @{ $EXPORT_TAGS{'all'} } expression from the @EXPORT_OK array. As before, all you need to do now is add your function definition and put its name into whichever of the EXPORT arrays you desire.
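
Following those suggestions, a trimmed 5.6.0-style Cool.pm might look like the sketch below. The body of cool() is a placeholder (the real function is whatever your module does), and the stub POD that h2xs places after __END__ is omitted:

```perl
package Cool;

use strict;
use warnings;

require Exporter;

our @ISA       = qw(Exporter);
our @EXPORT_OK = qw(cool);    # exported only when asked for
our @EXPORT    = qw();        # nothing exported by default
our $VERSION   = '0.01';

# Preloaded methods go here.

# Placeholder body: substitute your own function here.
sub cool {
    return "Cool!";
}

1;
```

A script would then import the function on demand with use Cool qw(cool);.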

But there is more to a distribution than simply having a skeleton .pm file built for us. What about those other files? They are there to automate building, testing, and installing the module. Once you’ve finished filling in your module, you can install it like so:

    perl Makefile.PL
    make
    make test
    make install

And, if you wanted to install it under a private directory:

    perl Makefile.PL PREFIX=/home/jandrew/perl5lib

When installing, perl will create any needed subdirectories under that directory and install the module appropriately. If you are on Windows, you’ll probably need to get ‘nmake’ and use that rather than make.

To create a distributable package (containing your module and all the ancillary files) you need only do:

    make dist

Doing so in my example module directory creates a ‘Cool-0.01.tar.gz’ which I can now send to a friend or upload to my web-site or otherwise distribute. For further module information please see the following perl docs:

    perldoc perlmod
    perldoc perlmodlib
    perldoc perlmodinstall
    perldoc -f use
    perldoc -f require

*****

Modules, Part 2: Installing a Module

Andrew L. Johnson (First published by ItWorld.com 2001-03-22)

Last week we built a simple module that defined and exported (on demand) one function. We also tested the module with a test script in the same directory. This week we will look at how to install the module so we can use it from anywhere (well, within reason).

The first thing to know is where perl looks for modules when it wants to load them, and we can find out by looking at the special @INC (include) array. If you type ‘perl -V’ at the command prompt you will see a bunch of configuration information, at the end of which you will see the contents of @INC. You can also view the @INC array directly with this command-line invocation (you might have to fiddle with the quoting depending on your shell):

    perl -le 'print join "\n", @INC'

On my machine it says:

    [jandrew]$ perl -le 'print join "\n",@INC'
    /usr/local/lib/perl5/5.6.0/i586-linux
    /usr/local/lib/perl5/5.6.0
    /usr/local/lib/perl5/site_perl/5.6.0/i586-linux
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl/5.005/i586-linux
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .

These are the search paths perl uses to find modules. Notice that the last line is just a dot, meaning the current directory, which is why our test script was able to work.

So, if we want to be able to use our Cool.pm module, we can place it in one of those directories; we usually use the site_perl directory for our current version of Perl. My version here is 5.6.0, so I would place the Cool module at:

    /usr/local/lib/perl5/site_perl/5.6.0/Cool.pm
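
To see which @INC directory perl would pick, you can repeat its search by hand: turn the module name into a relative file name and try each directory in order. A simplified sketch; module_to_path() and find_in_inc() are hypothetical helpers, and they skip the code references that @INC may also contain. (One caveat for modern perls: since 5.26 the current directory is no longer in the default @INC.)

```perl
use strict;
use warnings;

# Hypothetical helper: 'Foo::Bar' becomes 'Foo/Bar.pm', the
# relative file name that use/require looks for in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Hypothetical helper: return the first @INC directory that
# contains the module's file, or undef if none does.
sub find_in_inc {
    my ($module) = @_;
    my $rel = module_to_path($module);
    foreach my $dir (@INC) {
        next if ref $dir;                 # skip code-ref hooks
        return $dir if -f "$dir/$rel";
    }
    return;
}

my $where = find_in_inc('Cool');
if (defined $where) {
    print "Cool.pm would be loaded from $where\n";
} else {
    print "Cool.pm is not in any \@INC directory\n";
}
```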

Now, you may not have permission to install into these directories. In that case you have two options for using a private installation directory. First, create a directory where you will install your private modules (I might use /home/jandrew/perl5lib or something similar) and put your module there. Now we have to tell Perl where to find it (or rather, we have to get this directory into the @INC array). The ‘use lib’ pragma can be used on a script-by-script basis to add extra directories to @INC:

    #!/usr/bin/perl -w
    use strict;
    use lib '/home/jandrew/perl5lib';
    use Cool;

That ‘use lib’ line tells perl to add that directory at the beginning of the @INC array (so it will be searched first). The problem with this is that we have to use this line in every script that needs one of our private modules, and if we share our module and scripts with someone else, they’ll need to install the module in their own private directory and change that line in every script we give them. The alternative is to use the PERL5LIB environment variable. Setting this variable (by whatever means your platform or shell uses) tells Perl where to look without our having to specify it in each script:

    [jandrew]$ export PERL5LIB=/home/jandrew/myperlib
    [jandrew]$ perl -le 'print join "\n", @INC'
    /home/jandrew/myperlib
    /usr/local/lib/perl5/5.6.0/i586-linux
    /usr/local/lib/perl5/5.6.0
    /usr/local/lib/perl5/site_perl/5.6.0/i586-linux
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl/5.005/i586-linux
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .

Here we see that once the environment variable is set, that directory is automatically prepended to the @INC array. Now, if we decide to change our private directory and move all of our modules, we only have to change this environment variable to point to the new location rather than editing each of our scripts. Next week we will look at packaging up our module so we can conveniently distribute it to others.
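
Before then, you can confirm the PERL5LIB behaviour from a script. This sketch sets PERL5LIB only for a child perl it launches ($^X is the path of the running interpreter) and reads back the child’s first @INC entry. The path is just the example directory used above, and the list form of the pipe open assumes a Unix-like system:

```perl
use strict;
use warnings;

# Example path only; perl puts it on @INC whether or not it exists.
my $private = '/home/jandrew/myperlib';

my $first = do {
    # local() restores the previous PERL5LIB when the block exits.
    local $ENV{PERL5LIB} = $private;
    # Run a child perl and read the first entry of its @INC.
    # The list form of open runs it without going through a shell.
    open my $kid, '-|', $^X, '-e', 'print $INC[0]'
        or die "cannot run $^X: $!";
    my $line = <$kid>;
    close $kid;
    $line;
};

print "child's first \@INC entry: $first\n";
```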

*****