Siaris
Simple Things
Syndicate: full/short
Siaris
Categories
General0
News2
Programming2
LanguageBits0
Perl50
Ruby10
VersionControl1
Misc0
Article Calendar
<= April, 2012
S M T W T F S
1234567
891011121314
15161718192021
22232425262728
2930
Search this blog

Key links
External Blogs
Brought to you by ...
Ruby
1and1.com

Deleting Elements from An Array

Andrew L. Johnson (First published by ItWorld.com 2001-09-13)

There are two things one might mean by ‘deleting’ elements from an array: deleting the value for a particular index (or indices) in the array (while still leaving the slot in the array open), or, actually removing a slot (and its contents) from the array. The first case can be accomplished with the delete() function, and the second with the splice function.

    my @array = (0,1,2,3,4,5,6);
    delete $array[3];
    print join(':', @array),"\n";

    splice(@array, 3, 1);
    print join(':', @array),"\n";

This snippet produces the following output:

    Use of uninitialized value in join or string at - line 3.
    0:1:2::4:5:6
    0:1:2:4:5:6

You can see that the delete() function only deletes the value at index 3 in the array, while the splice() function removes the slot entirely and shifts the remainder of the array down to fill in the gap.

The delete() function can also be used on an array slice as well as a single element — and that slice need not be a contiguous range of elements:

    # delete a range
    delete @array[0..3];
    # or a discontiquous slice
    delete @array[0,3,5];

The splice() function may also be used to remove a range of elements from an array, but not a discontiguous slice:

    splice(@array,0,3);  # remove 3 elements starting at index 0.

One may think that the delete() function (formerly only allowed on hash elements) is nothing more than simply undef()’ing elements in an array, or assigning either multiple undef values to multiple elements or perhaps assigning an empty list to multiple values:

    my @array = (0,1,2,3,4,5,6);
    $array[0] = undef;
    @array[1,2] = ();
    @array[3,4] = (undef,undef);
    print join(':', @array),"\n";

which prints (ignoring warnings):

    :::::5:6

This is what we’d expect if we’d used delete() as well. However, the methods are not entirely equivalent. The delete() function has a companion exists() function (also formerly only used with hashes) that detects the difference between and array element that has been deleted and one that has been undefined:

    my @array = (0,1,2,3,4,5,6);
    @array[0,3,5] = (undef,undef,undef);
    print "1: Still there\n" if exists $array[3];
    @array[0,3,5] = ();
    print "2: Still there\n" if exists $array[3];
    delete @array[0,3,5];
    print "3: Still there\n" if exists $array[3];

Which produces:

    1: Still there
    2: Still there

So even though a given element is undefined (ie, the defined() function would return false), Perl can still tell if it has been delete()’ed or not. In many situations the delete() function applied to an array element or slice is no better than the other methods shown, and somewhat slower. But, some algorithms may find it useful to be able to determine if the value of an array element is undefined because it was assigned an undefined value or because it was intentionally deleted.

This discussion brings up one additional warning: assigning an empty list to an array slice does not remove any array elements (it merely assigns an undefined value to the slice elements). Thus the following two statements do not produce the same result (even if it might seem that they are logically equivalent):

    my @array = (0,1,2,3,4,5,6);
    @array[0..$#array] = ();
    print join(':', @array),"\n";

    my @array = (0,1,2,3,4,5,6);
    @array = ();
    print join(':', @array),"\n";

Even though the first snippet assigns an empty list to a slice covering the entire array, the array slots themselves are still considered to be in the array. The second snippet assigns the empty list to the array itself and therefore results in an empty array.

*****

Iterating Over the Fibonacci Numbers

Andrew L. Johnson (First published by ItWorld.com 2001-09-06)

Consider the Fibonacci numbers (0,1,1,2,3,5,8,13,21,34,…), where the next number is the sum of the previous two in the series. We could create a recursive routine to calculate these:

    #!/usr/bin/perl -w
    use strict;

    sub fib {
      my $n = shift;
      return $n if $n < 2;
      fib($n-1) + fib($n-2);
    }

    for(0..9) {
        print fib($_),"\n";
    }
    __END__

However, a problem with this approach is that sue to its recursive nature, the fib() routine must recalculate earlier numbers in the series every time it is called. One simple solution is to create a cache to store numbers in the series and have the function use those if they exist This is called Memoization and there is a Memoize module on CPAN (we discussed this module back in May 2001). This technique is a trade off between using more space (the cache) and gaining more speed.

If one only wishes to iterate through the series in order (perhaps with breaks to do other processing) then another solution is to create an iterator using a closure:

    #!/usr/bin/perl -w
    use strict;
    sub gen_fib {
        my($curr, $next) = (0,1);
        return sub {
            my $fib = $curr;
            ($curr, $next) = ($next, $curr + $next);
            return $fib;
        };
    }

    my $fib_stream1 = gen_fib();
    my $fib_stream2 = gen_fib();

    # print first 5 Fibonacci numbers from stream 1
    print "Stream One:";
    for(1..5){
        print $fib_stream1->(), ' ';
    }
    print "\n";

    # print first 10 from stream 2
    print "Stream Two:";
    for (1..10) {
        print $fib_stream2->(), ' ';
    }
    print "\n";

    # print next 5 from stream 1
    print "Stream One:";
    for(1..5){
        print $fib_stream1->(), ' ';
    }
    print "\n";
    __END__

Some advantages of this technique are that it only stores enough state to calculate the next number (without recursion) rather than all the previous numbers in the series as memoization does, and you can have more than one independent iterator (or stream) in the same program. Disadvantages are that it is only useful in cases where you actually want to iterate through the series — it is not useful for calculating the Nth Fibonacci number (which the recursive solution could be used for).

You do not have to iterate over mathematical functions, you can iterate over anything — it is possible to create an iterator that iterates over N arrays in n-tuples (ie, get the first item from each array, then the second, …). With an array of array refs as a table, this is analogous to iterating over the columns rather than the rows.

Subroutine References as Closures

Andrew L. Johnson (First published by ItWorld.com 2001-08-30)

A closure is a tricky kind of thing to explain, but not terribly hard to understand. The following may not meet with computer science approval as an explanation of closures, but the examples should provide enough hands-on exposure to grasp underlying concept.

The first thing we need to understand is that named subroutines are global by nature — that is, the subroutine name (identifier) lives in the package’s symbol table. However, they run within the lexical scope they were defined within:

    #!/usr/bin/perl -w
    use strict;

    {
        my $inc = 10;
        sub incr {
            print "$inc\n";
            $inc++;
        }
    }

    incr();
    incr();
    __END__

    prints:
        10
        11

In the above, $inc is lexically scoped within the block, and the subroutine (incr()) is defined within the same block. We can not refer to $inc after the block (scope) is finished, but we can call the subroutine and it runs in the scope it was defined in so it can still access $inc and change it.

A closure is simply an anonymous subroutine that is bound to the lexical scope it was defined within (rather like the above situation.

    #!/usr/bin/perl -w
    use strict;

    sub make_incr {
        my $inc = shift;
        return sub {print "$inc\n"; $inc++};
    }

    my $c1 = make_incr(10);
    my $c2 = make_incr(20);

    $c1->();
    $c2->();
    $c1->();
    $c2->();
    __END__
    prints:
        10
        20
        11
        21

Our make_incr() routine is a closure generator — it takes an argument and creates a new lexical variable. It then returns an anonymous subroutine that uses this same variable. Each time we call this routine a new lexical $inc is created and the anonymous subroutine refers to that particular copy of $inc. Each closure generated ($c1 and $c2) have their own private copies of $inc to work with. Essentially, they each can save information about their current state (in this case, what number they are currently at).

Here is another simple example (adapted from my book) where the closure itself also takes an argument:

    sub exclaim {
        my $prefix = shift;
        return sub {print "$prefix $_[0]!\n"};
    }

    my $batman = exclaim('Indeed');
    my $robin  = exclaim('Holy');

    $robin->('Mackerel');   # prints: Holy Mackerel!
    $batman->('Robin');     # prints: Indeed Robin!

Closures can be useful for creating callback routines and iterators. The above example of make_incr() creates very simple iterators that simply count up by ones from a given base number. However, you can create an iterator to iterate through more complex functions as well. Next week we will discuss using a closure as an iterator over the Fibonacci numbers.

*****

Passing Subroutine References

Andrew L. Johnson (First published by ItWorld.com 2001-08-23)

Last week we look at using subroutine references for creating a dispatch table, this week we will consider passing subroutine references as arguments to other subroutines.

The ability to pass subroutine references is a powerful feature. Any subroutine you write that takes arguments can be thought of a general routine and the arguments provide the specifics. For example, a subroutine that calculates the square root of a number is general, and the argument provides the specific number to calculate the square root from and return the result. Passing a subroutine reference is no different in that respect, but rather than passing something static like a number or a string, you can pass a definition of an action to take.

An excellent example is the File::Find module which defines the find() routine (we saw this module back in February). The find() routine is a general directory tree traversal subroutine whose first argument is a reference to a subroutine, followed by a list of directories to traverse. The find() routine traverses the listed directories and calls the subroutine reference for each directory entry found (it also provides a package global variable holding the current directory entry, for your subroutine to use). This gives the user of the module a very flexible means of accomplishing virtually any task involving directory traversal — because the user supplies the intended action, the find() routine itself need only worry about the vagaries of traversing directories and none of the logic of detecting certain files, removing files, renaming files, or whatever else the user wishes to do.

You already know how to create a reference to a named function, and how to call a function through its reference — there really isn’t anything else you need to know to write a subroutine that takes a subroutine reference as an argument. Let’s consider writing a reduce() routine in Perl (such a routine exists in the List::Util module on CPAN, which is implemented in C, but we will implement in Perl here).

A reduce() function takes a function reference as its first argument and then a list of values. It "reduces" the list by applying the passed function to the first two values, then to the result of that and the next value in turn. Thus, it could be used to easily produce sums of lists by supplying a function reference that simply adds two arguments together:

    #!/usr/bin/perl -w
    use strict;
    my @array = (1,2,3,4);

    my $sum = reduce(\&add,@array);
    print $sum;

    sub add {
        $_[0] + $_[1]
    }

    sub reduce {
        my $sref = shift;
        return @_ if @_ < 2;
        my $init = shift;
        for (@_) {
            $init = $sref->($init, $_);
        }
        return $init;
    }

We have taken the extra caution of simply returning the argument list without calling the sub reference if a list of only one element is passed to reduce().

The above produces a result of 10. Using the same reduce function we could produce the results of multiplying through the list, or dividing through the list, or any other action we created a function to do:

    sub product { $_[0] * $_[1] }
    sub divide  { $_[0] / $_[1] }

    my $product  = reduce(\&product, @array);
    my $division = reduce(\&divide,  @array);

We can avoid creating named subroutines by using anonymous subroutines (just like we can have anonymous arrays and hashes, we can have anonymous subroutines which are references to subroutines with no names). We create such a subroutine reference by using the ‘sub’ keyword followed immediately by the block of code (no name is specified):

    my $sref = sub{print "Hello World\n"};
    $sref->();

So now we can use reduce() as follows:

    my $sum      = reduce( sub{ $_[0] + $_[1] }, @array);
    my $product  = reduce( sub{ $_[0] * $_[1] }, @array);
    my $division = reduce( sub{ $_[0] / $_[1] }, @array);

We could shorten that further using prototypes, but that is beyond the scope of the present article.

*****

Subroutine References

Andrew L. Johnson (First published by ItWorld.com 2001-08-16)

We have used subroutine references a couple of times in previous articles, but we haven’t taken a look at them directly.

You can take a reference to an existing (named) subroutine using the backslash operator along the ampersand in the following manner:

    sub foo {
        print "This is foo\n";
    }

    my $sref = \&foo;

You can call the referenced routine by preceding the scalar holding it with an ampersand, or by using the dereference arrow and parentheses:

    &$sref();     # or:  &$sref('argument')
    $sref->();    # or:  $sref->('argument')

It is important to realize that you cannot take a reference to a subroutine and pass it arguments at the same time:

    my $sref = \&foo('argument');

What this actually does is return a reference to the return value of that function (using the given argument), which probably isn’t what you wanted.

But what are subroutine references good for? A couple of common uses are dispatch tables and passing routines as arguments to other routines — we will consider the first here. A dispatch table is simply a table (usually a hash) that allows you to select which routine to run. Consider a user interface that queries the user to enter a command to run:

    #!/usr/bin/perl -w
    use strict;

    sub this { print "You picked 'this'\n" }
    sub that { print "You picked 'that'\n" }
    sub quit { exit }

    print "Enter a command:\n";
    while (<STDIN>) {
        chomp(my $cmd = $_);
        if ($cmd eq 'this') {
            this();
        } elsif ($cmd eq 'that') {
            that();
        } elsif ($cmd eq 'quit' or $cmd eq 'exit') {
            quit();
        } else {
            print "Unrecognized command: $cmd\n";
        }
    }
    __END__

Now imagine that there are many more possible commands the user could enter and you can see that the program would grow quite large. Using a dispatch table simplifies all of the logic of the if/elsif statements into the datastructure:

    #!/usr/bin/perl -w
    use strict;

    sub this { print "You picked 'this'\n" }
    sub that { print "You picked 'that'\n" }
    sub quit { exit }

    my %dispatch = (
        this => \&this,
        that => \&that,
        quit => \&quit,
        exit => \&quit,
        );

    print "Enter a command:\n";
    while (<STDIN>) {
        chomp(my $cmd = $_);
        if ($dispatch{$cmd}) {
            $dispatch{$cmd}->();
        } else {
            print "Unrecognized command: $cmd\n";
        }
    }
    __END__

This program is actually a little longer, but it is simpler in design and simpler to maintain and update — adding new commands involves only defining a subroutine and adding another entry to the hash (we don’t have to add further elsif clauses as we would with the first example.

*****