Siaris
Simple Things

Using a do-block

Andrew L. Johnson (First published by ItWorld.com 2001-10-18)

Perl has a special kind of block called a do-block (the ‘do’ keyword followed by a block). This kind of block can be used as a term in an expression, or it can take a statement modifier.

When used as a term, the result returned is the value of the last statement evaluated in the block (rather like a subroutine’s default return value):

    my $in = do{print "Enter a number:";<STDIN>};
    print $in;

The context of the return value is the context of the expression (in the above case, scalar context, since we assigned to a scalar). The above isn’t a terribly useful example, but what about when you want to localize a global for a limited scope? Consider reading an entire file into a scalar (inside a larger script where you don’t want to change $/ for the duration of the script):

    my $file = 'data';
    open(FILE, $file) || die "Can't open $file: $!";
    my $contents;
    {
        local $/;   # $/ is locally undefined
        $contents = <FILE>;
    }
    print $contents;

A simpler method using a do-block might be:

    my $file = 'data';
    open(FILE, $file) || die "Can't open $file: $!";
    my $contents = do{local $/;<FILE>};
    print $contents;
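
The same idiom can be tried without a file on disk. The following is a minimal sketch, assuming an in-memory filehandle (opened on a reference to the hypothetical $data string) and a lexical filehandle rather than the bareword FILE used above. It also shows that the do-block takes its context from the surrounding expression:

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for a real file.
my $data = "line one\nline two\nline three\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

# Scalar context: with $/ locally undefined, one read returns everything.
my $contents = do { local $/; <$fh> };
close $fh;

# List context: the same kind of block now returns a list of lines.
open($fh, '<', \$data) or die "Can't reopen in-memory file: $!";
my @lines = do { <$fh> };
close $fh;

print length($contents), " characters, ", scalar(@lines), " lines\n";
# prints: 29 characters, 3 lines
```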

With a statement modifier, this kind of block allows for a ‘run at least once’ form of the while statement:

    my $rand = int(rand(10)) + 1;
    my $guess;
    do{
        print "Enter your guess: ";
        $guess = <STDIN>;
    } while $guess != $rand;
    print "Yes: the number is $guess";

This allows us to use what looks like an uninitialized value in the conditional. It works only because the condition is tested after each execution of the block (by which time the $guess variable has a value).
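
The ‘until’ modifier works the same way. Here is a small sketch that replaces STDIN with a canned list of guesses (the @inputs list and the $tries counter are made up for illustration) so that the result is repeatable:

```perl
use strict;
use warnings;

my @inputs = (7, 2, 5);   # hypothetical canned guesses instead of <STDIN>
my $target = 5;
my ($guess, $tries) = (undef, 0);

# The block runs once before the condition is ever tested,
# so $guess is defined by the time it is compared.
do {
    $guess = shift @inputs;
    $tries++;
} until $guess == $target;

print "Found $guess after $tries tries\n";   # prints: Found 5 after 3 tries
```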

The do-block is also convenient for switch- or case-like statement blocks:

    my $rand = int(rand(10)) + 1;
    {
        print "Enter your guess: ";
        chomp(my $guess = <STDIN>);
        $guess < $rand && do{print "Too low\n"; redo};
        $guess > $rand && do{print "Too high\n";redo};
    }
    print "You guessed it!\n";

These case-like examples could easily be solved using other means (like standard if/else statements); this is just another example of TMTOWTDI (there’s more than one way to do it).
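
For comparison, one possible if/elsif version of the same test might look like the following sketch. The fixed $target and the canned list of guesses are assumptions made so the output is predictable:

```perl
use strict;
use warnings;

my $target = 6;   # fixed value instead of rand(), for a repeatable run
my @log;

for my $guess (3, 9, 6) {   # hypothetical guesses instead of <STDIN>
    if    ($guess < $target) { push @log, "$guess: Too low" }
    elsif ($guess > $target) { push @log, "$guess: Too high" }
    else                     { push @log, "$guess: You guessed it!"; last }
}

print "$_\n" for @log;
# prints:
# 3: Too low
# 9: Too high
# 6: You guessed it!
```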

*****

The /c Regex Modifier

Andrew L. Johnson (First published by ItWorld.com 2001-10-11)

To understand the /c regex modifier you first need to know how the /g modifier and the \G anchor behave. The /g modifier, as you probably already know, means ‘keep applying the regex until it fails or we hit the end of the string’:

    $_ = '123456abc789';
    my $pattern = '\d\d\d';
    while ( m/($pattern)/g ) {
        print "$1\n";
    }

The above will match each sequence of 3 digits and execute the loop. Each string has a positional marker associated with it that records where the last regex match ended (you can access or set this marker directly with the pos() function), so the regex engine knows where to continue searching from in the string. When the pattern can no longer be found, the match operator returns false (ending the while loop in this case) and the positional marker is reset, so the next search starts again at position 0 (the beginning of the string).
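
To watch the marker move, we can record pos() after each match. This is only an illustrative sketch (the @ends array is a made-up name). Note that once the marker has been reset, pos() reports undef, which is how Perl represents ‘start again from the beginning’:

```perl
use strict;
use warnings;

my $str = '123456abc789';
my @ends;

while ($str =~ /\d\d\d/g) {
    push @ends, pos($str);   # offset just past the end of this match
}

print "matches ended at: @ends\n";   # prints: matches ended at: 3 6 12
print "after the failed match, pos is ",
      (defined pos($str) ? pos($str) : 'undef'), "\n";
# prints: after the failed match, pos is undef
```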

One thing to notice is that the above snippet will skip over the ‘abc’ part of the string — that is, on the third attempt to match, we start trying to match at position 6 (right before the ‘a’) but we aren’t forced to actually match at that point. To force the match to succeed where we left off we would do:

    $_ = '123456abc789';
    my $pattern = '\d\d\d';
    while ( m/\G($pattern)/g ) {
        print "$1\n";
    }

In this case, each occurrence of $pattern must be found immediately following the positional marker (either the beginning of the string, or wherever the last successful match left off). Thus, this snippet only finds and prints ‘123’, and ‘456’, and then the match fails.
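
The difference is easy to see side by side if we use the match in list context, where /g returns all of the captures at once. A short sketch (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $str = '123456abc789';

my @loose    = $str =~ /(\d\d\d)/g;     # free to skip ahead over 'abc'
my @anchored = $str =~ /\G(\d\d\d)/g;   # must match where the last one ended

print "without \\G: @loose\n";      # prints: without \G: 123 456 789
print "with \\G:    @anchored\n";   # prints: with \G:    123 456
```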

What if we wanted to be able to match different patterns while stepping through the string (say, sequences of three digits or three lowercase letters)? We could set up an alternation pattern and then test the captured results:

    $_ = '123456abc789';
    my $pattern = '\d\d\d|[a-z]{3}';
    while ( m/\G($pattern)/g ) {
        my $result = $1;
        if ($result =~ /\d/) {
            print "We got 3 digits\n";
        } else {
            print "We got 3 letters\n";
        }
    }

That’s not horrible, though we needed to test for numbers twice (once in the original pattern, and once in the if test). This could get more cumbersome if we had more choices to distinguish (and slower because alternations in regexen are somewhat slow).

The /c modifier allows a /g match to fail without resetting the positional marker — so we can try another match:

    $_ = '123XYZ456abc789';
    while (1) {
        print "Got digits ($1)\n" and next if m/\G(\d\d\d)/gc;
        print "Got UCase  ($1)\n" and next if m/\G([A-Z]{3})/gc;
        print "Got LCase  ($1)\n" and next if m/\G([a-z]{3})/gc;
        print "End of Parsing\n"  and last if m/\G$/gc;
        print "Parse Error at position: ", pos(), "\n" and last;
    }

Now we never skip over any data that we haven’t accounted for, yet when any regex fails we simply try the next regex from the same position. Our parse of the string only fails if all of the regexen fail and we hit the last line of the loop. The above succeeds through the string, but if you try $_ = '123ABC456ab789'; you’ll get a parse error message at position 9. If you tried this without the /c modifier you would have a problem, because if the first regex fails it would reset the positional marker (meaning you wouldn’t be starting where you wanted with the next regex).
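
The same technique can be wrapped in a subroutine that returns the tokens rather than printing them. The following is just one way to arrange it: the tokenize() name and the ‘type:text’ token format are invented for this sketch, and \z is used in place of $ to insist on the very end of the string:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of "type:text" strings,
# or dies with the position at which no pattern matched.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    pos($str) = 0;
    while (1) {
        if    ($str =~ /\G(\d{3})/gc)    { push @tokens, "digits:$1" }
        elsif ($str =~ /\G([A-Z]{3})/gc) { push @tokens, "upper:$1" }
        elsif ($str =~ /\G([a-z]{3})/gc) { push @tokens, "lower:$1" }
        elsif ($str =~ /\G\z/gc)         { last }
        else  { die "Parse error at position " . pos($str) . "\n" }
    }
    return @tokens;
}

my @good = tokenize('123XYZ456abc789');
print "@good\n";
# prints: digits:123 upper:XYZ digits:456 lower:abc digits:789

my $ok    = eval { tokenize('123ABC456ab789'); 1 };
my $error = $ok ? '' : $@;
print $error;   # prints: Parse error at position 9
```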

*****

Logical Operators

Andrew L. Johnson (First published by ItWorld.com 2001-09-27)

Perl has operators to perform logical OR/AND/NOT operations, and they come in two forms: a high precedence symbolic form: ||, &&, !, and a low precedence form: or, and, not (respectively).

Logical AND and OR are binary operators and are often used to combine expressions in a conditional statement:

    if ($value > 5 && $value < 10) {
        print "$value is between 5 and 10 exclusive\n";
    }

    if (lc($input) eq 'q' || lc($input) eq 'quit'){
        warn "Quitting application now\n";
        clean_up();
        exit;
    }

The NOT operator is a unary operator that returns the negated (opposite) truth value of its argument:

    if ( not $done ) {                  # also: if(!$done){
        print "We are not finished\n";
    }

Often we can use either the high or low precedence forms. Occasionally, however, precedence matters. Consider the following mistaken expression and how the logical test is actually parsed:

    $a=0;
    $b=1;
    if (not $a && not $b) {
        print "\$a and \$b are false\n";
    }

The programmer wanted to test that both $a AND $b were false. If $a is false then not($a) would be true, and similarly for $b, but this test does not do that: with $a = 0 and $b = 1 it succeeds and prints the message even though $b is true. That is because the precedence of && is much higher than the precedence of ‘not’, so the expression is actually parsed as:

    if ( not ($a && (not $b)) ) {

which is not what was intended. We could fix this in a few different ways: either use the lower precedence form of AND (‘and’), or the higher precedence form of ‘not’:

    if ( !$a && !$b) {

    if (not $a and not $b) {
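
A quick way to convince yourself is to evaluate all three spellings with the same values. In this sketch the variables are named $x and $y (rather than $a and $b) only to keep clear of the special variables used by sort():

```perl
use strict;
use warnings;

my ($x, $y) = (0, 1);   # $x is false, $y is true

my $mixed  = (not $x && not $y)  ? 'true' : 'false';   # parsed as not($x && (not $y))
my $symbol = (!$x && !$y)        ? 'true' : 'false';
my $words  = (not $x and not $y) ? 'true' : 'false';

print "not with &&: $mixed\n";    # prints: not with &&: true
print "! with &&:   $symbol\n";   # prints: ! with &&:   false
print "not with and: $words\n";   # prints: not with and: false
```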

A more interesting aspect of the logical AND and OR is that they short-circuit their second operand if they do not need to check it to determine the truth or falsity of the entire expression:

    $a = 0;
    $b = 1;
    if ($a and $b) { print "Both a and b are true\n" }

In the above situation, Perl knows that for an AND operation to be true, both sides must be true. If the left side is false then Perl doesn’t bother checking the right side (it already knows the whole expression must be false). This short-circuit, or lazy, evaluation comes in handy in various situations outside of conditional tests, one of which you should be familiar with:

    open(FILE, $file) or die "Can't open $file: $!";

The open() function returns a true value when it succeeds. Perl knows that only one expression must be true for an OR operation to succeed, so if the left expression is true Perl ignores the right hand side. In this case, that means the right hand side is only evaluated if the file could not be opened (in which case, the die() function is called).
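
To see which operands actually get evaluated, we can use a small helper that records each call. The check() subroutine and the @calls array below are made up purely for this demonstration:

```perl
use strict;
use warnings;

my @calls;

# Hypothetical helper: remembers that it was called, then returns $result.
sub check {
    my ($name, $result) = @_;
    push @calls, $name;
    return $result;
}

check('A', 0) and check('B', 1);   # A is false, so B is never called
check('C', 1) or  check('D', 1);   # C is true, so D is never called
check('E', 0) or  check('F', 1);   # E is false, so F must be called

print "evaluated: @calls\n";   # prints: evaluated: A C E F
```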

*****

Repeating Yourself: The x Operator

Andrew L. Johnson (First published by ItWorld.com 2001-09-20)

Loops are the general mechanism for expressing repetition in code. However, there are a couple of instances where loops seem like overkill — repeating strings and repeating lists. For these cases, Perl provides the repetition operator (the ‘x’ operator).

In the case of a repeated string we may simply want to print out lines of 20 '-' characters (perhaps as boundary lines in a report). Doing this with a loop is not difficult, though a little cumbersome given the task:

    print "This is the Header\n";
    print '-' for 1 .. 20;
    print "\n";

The x operator allows us to build this string in a single operation:

    print "This is the Header\n";
    print '-' x 20, "\n";

The x operator is a binary operator (it takes two operands) and is context sensitive, both in terms of the context of the entire operation and in terms of the context of the left operand. The basic syntax is:

    Left-EXPR x Right-EXPR

The right expression (right operand) is always considered to be in scalar context and treated as an integer. The left expression (left operand) may be either a scalar or a list value.

When the left operand is a scalar, as in the example above, the x operator treats it as a string and returns a new string repeated by the number given as the right operand. So, '-' x 20 returns a string of 20 '-' characters, and 'foo' x 2 returns the string 'foofoo'. This evaluation remains the same whether the entire expression is in scalar context or in list context: in the example above the expression is an argument to the print() function and therefore in list context.

The x operator can return repeated lists if used in list context and if the left operand is a literal list (i.e., wrapped in parentheses):

    my @array = (1,2,3) x 2;
    print "@array";          # prints: 1 2 3 1 2 3

You need to be careful to put the left operand in parentheses for list repetition — using a plain array will not behave as desired:

    my @array = (1,12,42);
    @array = @array x 2;
    print "@array\n";       # prints: 33

In this case, because the left operand is not in parentheses it is evaluated as a scalar, and an array in scalar context returns the number of elements in the array — in this case 3 — thus the x operator has returned the string ‘3’ repeated twice.
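
If you do want to repeat the contents of an existing array, wrapping the array itself in parentheses does the trick. A short sketch contrasting the two forms (the variable names are only illustrative):

```perl
use strict;
use warnings;

my @list = (1, 12, 42);

my @repeated  = (@list) x 2;   # parenthesized: list repetition
my $count_str = @list x 2;     # not parenthesized: "3" repeated twice

print "@repeated\n";    # prints: 1 12 42 1 12 42
print "$count_str\n";   # prints: 33
```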

Is this operator practical? Consider a case where you want to define a ten element array and initialize each element to 1:

    my @array = (1) x 10; #  my @array = (1,1,1,1,1,1,1,1,1,1);

Another useful case is initializing a hash when we’ve read in (or otherwise obtained) a list of keys we wish to initialize to 1:

    my @keys = qw(a b c d);
    my %hash;
    @hash{@keys} = (1) x @keys;
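
The repetition matters here because a slice assignment only fills as many elements as there are values on the right hand side. The following sketch (with made-up hash names %short and %full) compares assigning a single 1 with assigning a repeated list:

```perl
use strict;
use warnings;

my @keys = qw(a b c d);
my (%short, %full);

@short{@keys} = (1);            # only 'a' gets 1; the other keys get undef
@full{@keys}  = (1) x @keys;    # every key gets 1

my $short_defined = grep { defined } values %short;
my $full_defined  = grep { defined } values %full;

print "defined values: short=$short_defined full=$full_defined\n";
# prints: defined values: short=1 full=4
```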

Lastly, a minor cautionary note — remember that ‘x’ is not the multiplication operator:

    my $value = 15 x 2 / 3;
    print "$value\n";       # prints: 505

Here the number 15 is treated as a string and repeated twice to get 1515, which is then treated as a number and divided by 3 to get 505 (rather than the result of 10 you might have wanted). This is one case where Perl’s natural conversion between numbers and strings without warning can mean that a simple typo (‘x’ instead of ‘*’) can lead to strange results and be difficult to track down. So, if you have calculations in your code and you are getting bizarre results, you might want to check for this particular typo.

*****

Deleting Elements from An Array

Andrew L. Johnson (First published by ItWorld.com 2001-09-13)

There are two things one might mean by ‘deleting’ elements from an array: deleting the value for a particular index (or indices) in the array (while still leaving the slot in the array open), or actually removing a slot (and its contents) from the array. The first case can be accomplished with the delete() function, and the second with the splice() function.

    my @array = (0,1,2,3,4,5,6);
    delete $array[3];
    print join(':', @array),"\n";

    splice(@array, 3, 1);
    print join(':', @array),"\n";

This snippet produces the following output:

    Use of uninitialized value in join or string at - line 3.
    0:1:2::4:5:6
    0:1:2:4:5:6

You can see that the delete() function only deletes the value at index 3 in the array, while the splice() function removes the slot entirely and shifts the remainder of the array down to fill in the gap.
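
Checking the size of the array after each step makes the difference concrete. This sketch also shows one wrinkle: when the deleted element is the last one, the array does shrink. The @sizes array is just a made-up place to collect the results:

```perl
use strict;
use warnings;

my @array = (0 .. 6);
my @sizes = (scalar @array);     # 7 elements to begin with

delete $array[3];                # value gone, slot stays
push @sizes, scalar @array;      # still 7

splice(@array, 3, 1);            # slot removed, the rest shifts down
push @sizes, scalar @array;      # now 6

delete $array[$#array];          # deleting the final element shrinks the array
push @sizes, scalar @array;      # now 5

print "@sizes\n";   # prints: 7 7 6 5
```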

The delete() function can be used on an array slice as well as a single element, and that slice need not be a contiguous range of elements:

    # delete a range
    delete @array[0..3];
    # or a discontiguous slice
    delete @array[0,3,5];

The splice() function may also be used to remove a range of elements from an array, but not a discontiguous slice:

    splice(@array,0,3);  # remove 3 elements starting at index 0.
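
If you really do need to remove a discontiguous set of slots, one alternative (not the only one) is to rebuild the array from the indices you want to keep. A sketch using grep and a lookup hash, where %drop is a made-up name:

```perl
use strict;
use warnings;

my @array = (0 .. 6);
my %drop  = map { $_ => 1 } (0, 3, 5);   # indices to remove

# Keep only the elements whose index is not marked for removal.
@array = @array[ grep { !$drop{$_} } 0 .. $#array ];

print "@array\n";   # prints: 1 2 4 6
```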

One may think that the delete() function (formerly only allowed on hash elements) is nothing more than simply undef()’ing elements in an array, or assigning either multiple undef values to multiple elements or perhaps assigning an empty list to multiple values:

    my @array = (0,1,2,3,4,5,6);
    $array[0] = undef;
    @array[1,2] = ();
    @array[3,4] = (undef,undef);
    print join(':', @array),"\n";

which prints (ignoring warnings):

    :::::5:6

This is what we’d expect if we’d used delete() as well. However, the methods are not entirely equivalent. The delete() function has a companion exists() function (also formerly only used with hashes) that detects the difference between an array element that has been deleted and one that has been undefined:

    my @array = (0,1,2,3,4,5,6);
    @array[0,3,5] = (undef,undef,undef);
    print "1: Still there\n" if exists $array[3];
    @array[0,3,5] = ();
    print "2: Still there\n" if exists $array[3];
    delete @array[0,3,5];
    print "3: Still there\n" if exists $array[3];

Which produces:

    1: Still there
    2: Still there

So even though a given element is undefined (i.e., the defined() function would return false), Perl can still tell if it has been delete()’ed or not. In many situations the delete() function applied to an array element or slice is no better than the other methods shown, and somewhat slower. But some algorithms may find it useful to be able to determine if the value of an array element is undefined because it was assigned an undefined value or because it was intentionally deleted. (Be aware that the documentation for more recent versions of Perl strongly discourages calling delete() or exists() on array elements.)
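
Putting defined() and exists() next to each other shows the distinction directly. A small sketch (the @report array and its message format are invented for the example):

```perl
use strict;
use warnings;

my @array = (0 .. 6);
$array[2] = undef;    # slot kept, value undefined
delete $array[4];     # slot itself deleted

my @report;
for my $i (2, 4) {
    push @report, sprintf("index %d: defined=%s exists=%s", $i,
        (defined($array[$i]) ? 'yes' : 'no'),
        (exists($array[$i])  ? 'yes' : 'no'));
}

print "$_\n" for @report;
# prints:
# index 2: defined=no exists=yes
# index 4: defined=no exists=no
```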

This discussion brings up one additional warning: assigning an empty list to an array slice does not remove any array elements (it merely assigns an undefined value to the slice elements). Thus the following two statements do not produce the same result (even if it might seem that they are logically equivalent):

    my @array = (0,1,2,3,4,5,6);
    @array[0..$#array] = ();
    print join(':', @array),"\n";

    @array = (0,1,2,3,4,5,6);   # start again with a full array
    @array = ();
    print join(':', @array),"\n";

Even though the first snippet assigns an empty list to a slice covering the entire array, the array slots themselves are still considered to be in the array. The second snippet assigns the empty list to the array itself and therefore results in an empty array.
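
Counting the elements afterwards confirms this. In the sketch below the two arrays are given different (made-up) names so both results can be compared in one run:

```perl
use strict;
use warnings;

my @sliced  = (0 .. 6);
my @emptied = (0 .. 6);

@sliced[0 .. $#sliced] = ();   # every slot set to undef, slots remain
@emptied = ();                 # array itself emptied

print scalar(@sliced), " ", scalar(@emptied), "\n";   # prints: 7 0
```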

*****