Siaris
Simple Things

Welcome to Siaris.net

Andrew L. Johnson

Siaris.net is the public face of Siaris: Andrew Johnson’s software development, training, writing, and consulting activities. Utilizing a common weblog format, I will publish short and medium length articles on a variety of topics including: general programming and problem solving, object oriented programming, programming languages, teaching, and communication. Longer writings will be available under the Articles link in the navigation bar.

Siaris.net is not my personal blog (I may add one of those eventually), but a way to gather and make available my older writings and to add new articles. With that in mind, I’ve already converted 50 (of 89) short Perl articles (originally published by ItWorld). Some writings are less suitable for blog publication for a variety of reasons; the Articles link in the navigation bar connects you to other writings (a smallish regex tutorial for some of Perl’s additional RE features, and a link to the regex chapter of my book, for starters).

Within the blog, the News category will relate information about various goings on at Siaris.net. The LanguageBits category will hold interesting bits on various languages (including how-to’s and small code examples). I expect to be adding other categories as this site evolves.

There is no feedback mechanism for articles at this time, but I am considering setting up either a comment forum or a wiki for such a purpose — possibly requiring login to discourage comment/content spamming. In the meantime, comments about this site, or any particular article can be sent directly to me via email (andrew@siaris.net). I hope you find this site useful and enjoy visiting from time to time.

  Best regards,
  Andrew L Johnson

The Map

Once I would go
to the edge of the map.
To the empty,
white
space.

To where there be dragons
and perils unknown.
One could fall off
edges of
worlds.

Now would I go
to the edge of the map.
To the swirling
black
hole.

Forever uncharted
to those left behind.
One could fall off
edges of
Time.

  — Andrew L. Johnson (1985)

About Siaris

Andrew L. Johnson

Simplicity, Clarity, and Vision: a little more of what the world needs today.

Complexity is the prodigy of the world. Simplicity is the sensation of the universe. Behind complexity, there is always simplicity to be revealed. Inside simplicity, there is always complexity to be discovered.   — Gang Yu

Siaris.net

Siaris.net is the public face of Siaris: Andrew Johnson’s programming, training, writing, and consulting activities. In keeping with the motto of simplicity, primary content is served in typical weblog fashion — simple to publish new articles and information, and simple for readers to browse, search, and subscribe. The focus is on content.

Siaris

Siaris provides software development, training, and consulting to developers, businesses, and individuals.

Software Development: 15+ years experience
Utilizing languages such as: C, Perl, Ruby, and Python (and others) in cross-platform web, database, and networking application domains.
Teaching: 8+ years experience
Including: introductory programming, introductory and advanced Perl, and undergraduate human osteology and anthropology labs.
Technical Writing:
Elements of Programming with Perl (Manning Publications: 2000)
Over 80 Perl Newsletter articles for ItWorld.
A handful of articles for the Linux Journal and Linux Gazette.

Contact Siaris

    contact@siaris.net      Siaris
                            263 Knightsbridge Drive
                            Winnipeg, MB   R2M 4K5
                            Canada
Other Perl Resources

Andrew L. Johnson (First published by ItWorld.com 2001-10-25)

ItWorld is discontinuing the Perl newsletter, so this is my farewell article. That being the case, I decided to try to leave you with a few tidbits of wisdom and suggestions of where else to turn for help.

The first place to turn for help is perl itself — running perl with the -w switch and ‘use strict’ enabled will help you catch many little bugs, typos, and questionable practices.

    #!/usr/bin/perl -w
    use strict;

If you know your program will be used only with Perl 5.6 and later versions, you can use the ‘warnings’ pragma to turn on warnings instead of the -w switch (this pragma allows more control over warnings; see the ‘perllexwarn’ manpage for further information):

    #!/usr/bin/perl
    use strict;
    use warnings;
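
As a small illustration of that extra control (a sketch that is not part of the original article; the variable names are invented for the example), the pragma can be switched off for a single warning category inside one block while staying on everywhere else:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR,
# so we can see which statements actually triggered one.
my @seen;
$SIG{__WARN__} = sub { push @seen, $_[0] };

my $name;   # deliberately left undefined

{
    # Only the 'uninitialized' category is disabled, and only in this block.
    no warnings 'uninitialized';
    my $quiet = "Hello, " . $name;
}

my $noisy = "Hello, " . $name;   # warnings are in force again here

print "warnings caught: ", scalar(@seen), "\n";
print $seen[0];
```

Only the second concatenation produces a warning. The -w switch, by contrast, is global and cannot be scoped to a block in this way.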

The perl distribution also comes with copious amounts of documentation that you can read via a browser (if the docs are installed in .html format), the unix ‘man’ utility, or the supplied ‘perldoc’ utility. The 3 major pages (documents) you should be familiar with are:

    perldoc perl     --> the intro perl documentation
    perldoc perlfaq  --> many frequently asked questions (and answers)
    perldoc perlfunc --> documentation on builtin functions

There are quite a few mailing lists you can participate in at various levels (including lists for beginners, module authors, specific modules or distributions, and even a ‘fun with perl’ list). Information can be found at:

    http://lists.perl.org/

The ‘use Perl’ site publishes news and informative tidbits (along with other features), perl.com also publishes articles on various themes and at various levels, and The Perl Journal (now part of SysAdmin magazine) is a very good print publication:

    http://use.perl.org/
    http://www.perl.com/
    http://www.tpj.com/

Join or start your own local chapter of a ‘Perl Mongers’ group by checking out this site:

    http://www.pm.org/

The Perl Monks site is a web forum for questions, answers, tutorials, and discussions on Perl-related topics (there are quite a few very knowledgeable individuals there):

    http://www.perlmonks.org/

And, last but certainly not least, do not forget about CPAN (the Comprehensive Perl Archive Network): there are loads and loads of free modules there that you should not ignore. Everything from dealing with CSV files to networking to graphics manipulation (and a whole lot more) can be found therein. A few CPAN entry points:

    http://www.cpan.org/
    http://search.cpan.org/
    http://theoryx5.uwinnipeg.ca/CPAN/cpan-search.html

I have very much enjoyed writing these (80+) weekly articles, and I appreciate the numerous comments and suggestions I have received from many of you (even if I didn’t get around to writing on all of the suggested topics). Thank you, and I wish you all the best of luck with your Perl programming; perhaps I’ll run into some of you in the future.

            Best regards,
            andrew

END

Using a do-block

Andrew L. Johnson (First published by ItWorld.com 2001-10-18)

Perl has a special kind of block called a do-block (the ‘do’ keyword followed by a block). This kind of block can be used as a term in an expression, or it can take a statement modifier.

When used as a term, the result returned is the value of the last statement evaluated in the block (rather like a subroutine’s default return value):

    my $in = do{print "Enter a number:";<STDIN>};
    print $in;

The context of the return value is the context of the expression (in the above case, since we assigned to a scalar, scalar context). The above isn’t a terribly useful example, but what about when you want to localize a global for a limited scope? Consider reading an entire file into a scalar (inside a larger script where you don’t want to change $/ for the duration of the script):

    my $file = 'data';
    open(FILE, $file) || die "Can't open $file: $!";
    my $contents;
    {
        local $/;   # $/ is locally undefined
        $contents = <FILE>;
    }
    print $contents;

A simpler method using a do-block might be:

    my $file = 'data';
    open(FILE, $file) || die "Can't open $file: $!";
    my $contents = do{local $/;<FILE>};
    print $contents;
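
The same idiom can be wrapped in a small subroutine. The sketch below is a variation rather than the article's own code: it assumes a lexical filehandle with three-argument open, and the slurp() name and the temporary file are invented so the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp() is a hypothetical helper name, not a Perl builtin.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    # $/ is undefined only inside the do-block; the value of the
    # block's last statement (the read) becomes the return value.
    return do { local $/; <$fh> };
}

# Create a throwaway file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "line one\nline two\n";
close($tmp);

my $contents = slurp($path);
print $contents;
print "record separator is still a newline\n" if $/ eq "\n";
```

Because the local is confined to the do-block, code that runs after slurp() still reads line by line as usual.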

With a statement modifier, this kind of block allows for a ‘run at least once’ form of the while statement:

    my $rand = int(rand(10)) + 1;
    my $guess;
    do{
        print "Enter your guess: ";
        $guess = <STDIN>;
    } while $guess != $rand;
    print "Yes: the number is $guess";

This allows us to use what looks like an uninitialized value in the conditional. It works only because the condition is tested after each pass through the block (by which time the $guess variable has a value).
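
To see the ‘run at least once’ behaviour without typing at a prompt, here is a sketch that replaces <STDIN> with a scripted list of guesses (the @guesses list and the fixed target are assumptions made for the example):

```perl
use strict;
use warnings;

my $target  = 7;            # fixed instead of random, for a repeatable run
my @guesses = (3, 9, 7);    # stands in for lines typed at the prompt

my ($guess, $tries) = (undef, 0);
do {
    $guess = shift @guesses;
    $tries++;
    print "Guess $tries: $guess\n";
} while $guess != $target;

print "Found $guess after $tries tries\n";

# The body of a do-block always runs once, even when the condition
# is false from the start; a plain while loop would skip its body.
my $count = 0;
do { $count++ } while 0;
print "do-while ran $count time(s)\n";
```

One caveat worth knowing: a do-block with a modifier is not a true loop block, so loop controls such as next and last do not apply to it directly (see the perlsyn manpage).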

The do-block is also convenient for switch- or case-like statement blocks:

    my $rand = int(rand(10)) + 1;
    {
        print "Enter your guess: ";
        chomp(my $guess = <STDIN>);
        $guess < $rand && do{print "Too low\n"; redo};
        $guess > $rand && do{print "Too high\n";redo};
    }
    print "You guessed it!\n";

These case-like examples could easily be solved using other means (like standard if/else statements); this is just another example of TMTOWTDI (there’s more than one way to do it).
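
For comparison, here is a sketch of the if/elsif form. To keep it runnable without a keyboard, the guesses come from a scripted list and the target is fixed; both are assumptions of this example rather than part of the original.

```perl
use strict;
use warnings;

my $target  = 5;            # fixed rather than random
my @guesses = (2, 8, 5);    # scripted input instead of <STDIN>
my @replies;

while (defined(my $guess = shift @guesses)) {
    if ($guess < $target) {
        push @replies, "Too low";
    } elsif ($guess > $target) {
        push @replies, "Too high";
    } else {
        push @replies, "You guessed it!";
        last;
    }
}
print "$_\n" for @replies;
```

The && do{...} form packs each case onto one line; the if/elsif form is longer but makes the fall-through to success explicit.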

*****

The /c Regex Modifier

Andrew L. Johnson (First published by ItWorld.com 2001-10-11)

To understand the /c regex modifier you first need to know how the /g modifier and the \G anchor behave. The /g modifier, as you probably already know, means ‘keep applying the regex until it fails or we hit the end of the string’:

    $_ = '123456abc789';
    my $pattern = '\d\d\d';
    while ( m/($pattern)/g ) {
        print "$1\n";
    }

The above will match each sequence of 3 digits and execute the loop. Each string has a positional marker associated with it that records where the last regex match ended (you can access or set this marker directly with the pos() function), so the regex engine knows where to continue searching from in the string. When the pattern can no longer be found, the match operator returns false (ending the while loop in this case) and the positional marker is reset (pos() returns undef), so the next match attempt starts again at the beginning of the string.
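
A short sketch makes the marker visible. It uses the same string and pattern, held in a lexical variable (an assumption of the example; the variable names are invented), and prints pos() after each match and again after the match finally fails:

```perl
use strict;
use warnings;

my $str = '123456abc789';
my @ends;
while ($str =~ m/(\d\d\d)/g) {
    push @ends, pos($str);
    print "matched $1, pos is now ", pos($str), "\n";
}

# After the failed match the marker is cleared.
my $after_fail = pos($str);
print "after failure pos is ", (defined($after_fail) ? $after_fail : 'undef'), "\n";

# The marker can also be set by hand: start the next /g match at offset 6.
pos($str) = 6;
my $picked = $str =~ m/(\w{3})/g ? $1 : '';
print "starting at 6 we get '$picked'\n";
```

The loop reports positions 3, 6, and 12, and the hand-set marker picks up ‘abc’.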

One thing to notice is that the above snippet will skip over the ‘abc’ part of the string. That is, on the third attempt to match, we start trying to match at position 6 (right before the ‘a’) but we aren’t forced to actually match at that point. To force each match to begin exactly where the previous one left off, we anchor the pattern with \G:

    $_ = '123456abc789';
    my $pattern = '\d\d\d';
    while ( m/\G($pattern)/g ) {
        print "$1\n";
    }

In this case, each occurrence of $pattern must be found immediately following the positional marker (either the beginning of the string, or wherever the last successful match left off). Thus, this snippet only finds and prints ‘123’ and ‘456’, and then the match fails.

What if we wanted to be able to match different patterns while stepping through the string (say, sequences of three digits or three lowercase letters)? We could set up an alternation pattern and then test the captured results:

    $_ = '123456abc789';
    my $pattern = '\d\d\d|[a-z]{3}';
    while ( m/\G($pattern)/g ) {
        my $result = $1;
        if ($result =~ /\d/) {
            print "We got 3 digits\n";
        } else {
            print "We got 3 letters\n";
        }
    }

That’s not horrible, though we needed to test for digits twice (once in the original pattern, and once in the if test). This could get more cumbersome if we had more choices to distinguish (and slower, because alternations in regexen can be somewhat slow).

The /c modifier allows a /g match to fail without resetting the positional marker — so we can try another match:

    $_ = '123XYZ456abc789';
    while (1) {
        print "Got digits ($1)\n" and next if m/\G(\d\d\d)/gc;
        print "Got UCase  ($1)\n" and next if m/\G([A-Z]{3})/gc;
        print "Got LCase  ($1)\n" and next if m/\G([a-z]{3})/gc;
        print "End of Parsing\n"  and last if m/\G$/gc;
        print "Parse Error at position: ", pos(), "\n" and last;
    }

Now we never skip over any data that we haven’t accounted for, yet when any regex fails we simply try the next regex from the same position. Our parse of the string only fails if all of the regexen fail and we hit the last line of the loop. The above succeeds through the string, but if you try $_ = ‘123ABC456ab789’; you’ll get a parse error message at position 9. If you tried this without the /c modifier you would have a problem, because if the first regex fails it would reset the positional marker (meaning the next regex would start over at the beginning of the string rather than where you wanted).
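
The same technique can be packaged as a tiny tokenizer. This is only a sketch under stated assumptions: tokenize() is a hypothetical helper name, it works on a lexical copy of the string rather than $_, and it uses \z instead of $ so that end-of-string means exactly that.

```perl
use strict;
use warnings;

# tokenize() is a hypothetical helper: it returns a list of
# [type, text] pairs, or dies at the first position nothing matches.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    while (1) {
        if    ($str =~ m/\G(\d{3})/gc)    { push @tokens, ['digits', $1] }
        elsif ($str =~ m/\G([A-Z]{3})/gc) { push @tokens, ['ucase',  $1] }
        elsif ($str =~ m/\G([a-z]{3})/gc) { push @tokens, ['lcase',  $1] }
        elsif ($str =~ m/\G\z/gc)         { last }
        else  { die "Parse error at position " . (pos($str) || 0) . "\n" }
    }
    return @tokens;
}

for my $tok (tokenize('123XYZ456abc789')) {
    print "$tok->[0]: $tok->[1]\n";
}

# The malformed string from the text stops at position 9.
my @bad = eval { tokenize('123ABC456ab789') };
print "Error: $@" if $@;
```

Each failed alternative leaves pos() untouched thanks to /c, so the next elsif tries from the same spot.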

*****