Perl’s sort function is a little different from most built-in functions: like map() and grep(), it can accept a block as its first argument, but it can also accept the name of a subroutine or a scalar variable holding a reference to a subroutine. This lets you specify the comparison used in sorting. The default is an ascending string comparison (alphabetical, or more precisely “ascii-betical”) using the ‘cmp’ operator. Thus, the following two statements are equivalent:
my @sorted = sort @unsorted;
my @sorted = sort {$a cmp $b} @unsorted;
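To make “ascii-betical” concrete, here is a small sketch. The word list is made up for illustration; the point is that uppercase letters sort ahead of lowercase ones, and that both forms of the call give the same result.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen to mix upper- and lowercase.
my @unsorted = qw(banana Cherry apple Date);

my @default  = sort @unsorted;
my @explicit = sort { $a cmp $b } @unsorted;

# Uppercase letters have lower character codes than lowercase ones,
# so capitalized words come first.
print "@default\n";     # Cherry Date apple banana
print "@explicit\n";    # Cherry Date apple banana
```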
Inside the block or subroutine you use the package-global variables $a and $b to refer to the elements being compared (these are specially exempt from the ‘use strict’ pragma for just this reason). To specify a reverse alphabetical sort you switch the variables:
my @reverse_sorted = sort {$b cmp $a} @unsorted;
You can define any comparison you like, but your block or subroutine must return a negative number, zero, or a positive number, depending on whether $a should sort before, the same as, or after $b. The ‘cmp’ and ‘<=>’ operators, which return -1, 0, or 1, are usually used for this purpose.
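As a rough illustration of a home-made comparison, the sketch below sorts words by length. The word list is invented for the example, and the lengths are all different so that no tie-breaking is needed.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana apple);    # hypothetical sample data

# The difference of two lengths can be any integer (-3, 2, ...);
# sort only looks at whether it is negative, zero, or positive.
my @by_length = sort { length($a) - length($b) } @words;

print "@by_length\n";    # fig pear apple banana
```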
To sort a list numerically, we’ll use the <=> operator:
my @ascending = sort {$a <=> $b} @unsorted;
my @descending = sort {$b <=> $a} @unsorted;
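The difference between the default string sort and a numeric sort is easiest to see side by side. The numbers below are arbitrary sample values:

```perl
use strict;
use warnings;

my @unsorted = (10, 9, 100, 1);    # arbitrary sample values

my @as_strings = sort @unsorted;                  # compared with cmp
my @ascending  = sort { $a <=> $b } @unsorted;    # compared as numbers
my @descending = sort { $b <=> $a } @unsorted;

print "@as_strings\n";    # 1 10 100 9
print "@ascending\n";     # 1 9 10 100
print "@descending\n";    # 100 10 9 1
```

The string sort puts 100 before 9 because it compares character by character, and “1” sorts before “9”.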
A hash does not store its contents in any particular order, but we often need to print a hash’s contents in sorted order. To print a hash sorted alphabetically by key, we can simply do this:
my %ages = ( Andrew => 37, Sue => 41, Thomas => 9, Joey => 16);
foreach my $name (sort keys %ages) {
    print "$name : $ages{$name}\n";
}
But what if we want the output sorted by age, from oldest to youngest, rather than by name (that is, by hash values rather than keys)? That’s also very easy:
my %ages = ( Andrew => 37, Sue => 41, Thomas => 9, Joey => 16);
foreach my $name (sort { $ages{$b} <=> $ages{$a} } keys %ages) {
    print "$name : $ages{$name}\n";
}
In this case, we still get a list of keys, but in the comparison block we compare the hash values for each key rather than the keys themselves. Thus, we are iterating over the list of hash keys sorted in descending numerical order by their values. Now, let’s consider the case where we want to sort on two criteria: if two ages (values) are equal, we want to order those entries alphabetically by name (key). We’ll have to add some more children to the hash to demonstrate:
my %ages = ( Andrew => 37, Sue => 41, Thomas => 9, Joey => 16,
             Karen => 11, John => 9, Kevin => 16, Lisa => 9 );
foreach my $name (sort by_age keys %ages) {
    print "$name : $ages{$name}\n";
}
sub by_age {
    $ages{$b} <=> $ages{$a}
        ||
    $a cmp $b
}
In this case, I’ve used a separate routine, ‘by_age’, instead of a block. In the routine, the <=> returns 0 if the ages are equal. This is a false value, so Perl looks to the other side of the logical OR operator (||) and evaluates the ‘cmp’ operation to get the return value of the function.
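To see the tie-breaking at work, here is the same comparison written as an inline block, with the names collected into an array (@order is just a name chosen for this sketch) so the final order can be printed on one line:

```perl
use strict;
use warnings;

my %ages = ( Andrew => 37, Sue => 41, Thomas => 9, Joey => 16,
             Karen => 11, John => 9, Kevin => 16, Lisa => 9 );

# Same comparison as by_age: age descending, then name ascending.
my @order = sort { $ages{$b} <=> $ages{$a} || $a cmp $b } keys %ages;

print join(', ', map { "$_ ($ages{$_})" } @order), "\n";
# Sue (41), Andrew (37), Joey (16), Kevin (16), Karen (11), John (9), Lisa (9), Thomas (9)
```

Joey and Kevin share an age, as do John, Lisa, and Thomas, and within each group the names come out alphabetically.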
Starting with version 5.6.0 of Perl you no longer have to use $a and $b as your comparison variables when you use a separate subroutine. If you prototype the subroutine with ($$) then the arguments are passed to the routine in the @_ array:
sub by_age ($$); # declare with proto before using
my %ages = ( Andrew => 37, Sue => 41, Thomas => 9, Joey => 16,
             Karen => 11, John => 9, Kevin => 16, Lisa => 9 );
foreach my $name (sort by_age keys %ages) {
    print "$name : $ages{$name}\n";
}
sub by_age ($$) {
    $ages{$_[1]} <=> $ages{$_[0]}
        ||
    $_[0] cmp $_[1]
}
This is slower than using $a and $b, but it means you can use sort routines defined in other packages without namespace problems, because the routine no longer depends on the $a and $b of the package where sort is called (see ‘perldoc -f sort’ for further documentation).
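A minimal sketch of the cross-package case follows. The package name Sorters, its by_length routine, and the word list are all hypothetical, invented only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper package holding a reusable sort routine.
package Sorters;

# With the ($$) prototype the two elements arrive in @_, so the
# routine does not need to know which package's $a and $b are in use.
sub by_length ($$) {
    my ($left, $right) = @_;
    return length($left) <=> length($right)
        || $left cmp $right;
}

package main;

my @words  = qw(pear fig banana kiwi);    # hypothetical sample data
my @sorted = sort Sorters::by_length @words;

print "@sorted\n";    # fig kiwi pear banana
```

Without the prototype, by_length would have to read $a and $b, and sort sets those in the calling package (main here) rather than in Sorters.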
In next week’s article we will examine ways to sort more complicated data such as sorting multi-field records on any field.