Siaris
Simple Things

Use the DATA handle

Andrew L. Johnson

One thing I recommend to newcomers to both Perl and Ruby is to make use of the DATA file(handle/object) for explorative purposes. It is quite useful both in the contexts of language exploration and solution-space exploration.

The basic idea is that within your Perl or Ruby script, any text after the special __END__ token is available to be read into the program via a preopened filehandle/object named DATA (in both languages). In Perl, the token may also be called __DATA__.

Here’s a simple example (Perl and Ruby side by side):

  #!/usr/bin/perl -w                        #!/usr/bin/ruby -w
  use strict;
  while (my $line = <DATA>) {               while line = DATA.gets
    my @arr = split /:/, $line;                arr = line.split(/:/)
    print join '|', @arr;                      print arr.join('|')
  }                                         end
  __END__                                   __END__
  abc:def:ghi                               abc:def:ghi
  123:456:789                               123:456:789

Note: the __END__ token must be flush with the left margin in Ruby code.

There are many ways to put this to use, but for my exploratory purposes it is only half of the equation. The other half is your text editor (or perhaps IDE). Many text editors can be configured to run the text (or some selected portion thereof) of the current buffer through an external program (usually as a filter). If you are writing Perl or Ruby code, that external ‘filter’ can be the Perl or Ruby interpreter, and you can arrange for the output to be displayed in another window (pane, buffer, whatever).

For example, I have the following in my .vimrc file:

  noremap <F9> :w !perl -w > ~/.vim/p_buff 2>&1 <NL> :sv ~/.vim/p_buff<CR>
  noremap <F10> :w !ruby -w > ~/.vim/r_buff 2>&1 <NL> :sv ~/.vim/r_buff<CR>

Now hitting the F9 or F10 key sends the selected text to the interpreter, captures the output into a special file, and opens that file in a new buffer window. Both Ruby and Perl can receive scripts via STDIN, and both leave everything following the __END__ token to be read via the DATA handle.

This means I can have a buffer window open to play around with a new language element or feature, or to explore possible solutions to a problem. And if that problem requires some data reading/munging/parsing, I can paste in some representative lines of data and have one-key convenience for trying out various snippets of code.
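As a rough sketch of that workflow, the parsing logic can live in a small method that accepts any IO-like object. The name parse_records and the colon-separated sample lines are assumptions made for illustration. StringIO stands in for DATA here so the snippet runs without an __END__ section; in a scratch buffer you would pass DATA instead.

```ruby
require 'stringio'

# Hypothetical helper: split each colon-separated line into its fields.
def parse_records(io)
  io.each_line.map { |line| line.chomp.split(':') }
end

# StringIO stands in for DATA; in a script with an __END__ section,
# call parse_records(DATA) instead.
sample = StringIO.new("abc:def:ghi\n123:456:789\n")
records = parse_records(sample)
records.each { |fields| puts fields.join('|') }
```

Keeping the parsing in a method that takes an IO makes it easy to move the snippet out of the scratch buffer later and point it at a real file.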

__END__

Ruby: Enumerators and Generators

Andrew L. Johnson

Included with the Ruby distribution are the generator library and the enumerator extension — both useful tools when ordinary iteration doesn’t quite measure up.

The enumerator extension is simple in concept: create a new Enumerable object given an object and a method of that object to be used as an iterator. For example, if we add an each_even iterator to the Array class to iterate over every element with an even numbered index, we can use enumerator to create enumerable versions of an array object that use each_even as the iterator:

  require 'enumerator'

  class Array
    def each_even
      self.each_with_index do|el,i|
        yield el if i % 2 == 0
      end
    end
  end

  arr = ['a','b','c','d','e','f','g','h']
  enum = Enumerable::Enumerator.new(arr, :each_even)
  ev = enum.map {|x| x + x}
  p ev                      #=> ["aa", "cc", "ee", "gg"]

In addition to the constructor above, the following convenience functions are added to the Object class:

  to_enum(:iter, *args)
  enum_for(:iter, *args)
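On Ruby 1.9 and later the enumerator support is built in: no require is needed and the class is named Enumerator rather than Enumerable::Enumerator. A minimal sketch of the same each_even idea using to_enum follows. The block_given? guard is an addition of mine, not part of the example above.

```ruby
# Reopening Array as in the example above; each_even is the article's
# illustrative iterator, not a built-in method.
class Array
  def each_even
    # Without a block, hand back an enumerator over this same iterator.
    return to_enum(:each_even) unless block_given?
    each_with_index { |el, i| yield el if i.even? }
  end
end

arr = %w[a b c d e f g h]
doubled = arr.to_enum(:each_even).map { |x| x + x }
p doubled                     #=> ["aa", "cc", "ee", "gg"]
p arr.each_even.first(2)      #=> ["a", "c"]
```

With the guard in place, calling each_even without a block gives an enumerator directly, so the explicit to_enum call becomes optional.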

The Enumerable module is also extended with five additional methods:

  each_slice(n)    # iterates over non-overlapping chunks of size n
  enum_slice(n)    # new enumerator object using :each_slice(n)

    ('a'..'m').each_slice(4) {|sl| p sl}
    #  produces:
      ["a", "b", "c", "d"]
      ["e", "f", "g", "h"]
      ["i", "j", "k", "l"]
      ["m"]

  each_cons(n)     # iterates over successive chunks of size n
  enum_cons(n)     # new enumerator using :each_cons(n)

    ('a'..'m').each_cons(4) {|sl| p sl}
    # produces:
      ["a", "b", "c", "d"]
      ["b", "c", "d", "e"]
      ["c", "d", "e", "f"]
      ["d", "e", "f", "g"]
      ["e", "f", "g", "h"]
      ["f", "g", "h", "i"]
      ["g", "h", "i", "j"]
      ["h", "i", "j", "k"]
      ["i", "j", "k", "l"]
      ["j", "k", "l", "m"]

  enum_with_index  # new enumerator using :each_with_index
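On more recent Rubies (1.8.7 and later, as far as I know) the enum_* variants are not needed: calling each_slice, each_cons, or each_with_index without a block returns an enumerator. A small sketch, using the same range as the examples above:

```ruby
letters = ('a'..'m')

slices = letters.each_slice(4)          # Enumerator, since no block is given
p slices.to_a.last                      #=> ["m"]

windows = letters.each_cons(4)
p windows.first                         #=> ["a", "b", "c", "d"]
p windows.count                         #=> 10

indexed = letters.each_with_index
p indexed.first(2)                      #=> [["a", 0], ["b", 1]]
```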

The generator library generates external iterators from either blocks or Enumerable objects (in the latter case, the :each iterator is externalized).

  require 'generator'

  arr = ('a' .. 'm')
  gen = Generator.new(arr)
  while gen.next?
    p gen.next
  end
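The generator library was dropped from the standard distribution in Ruby 1.9, as far as I know, where Enumerator itself gained external iteration through its next method. A hedged sketch of the equivalent loop on a modern Ruby:

```ruby
letters = ('a'..'e').each     # block-less each returns an Enumerator
seen = []
loop do
  seen << letters.next        # next raises StopIteration when exhausted;
end                           # Kernel#loop rescues it and ends the loop
p seen                        #=> ["a", "b", "c", "d", "e"]
```

There is no next? method in this style; the end of the sequence is signalled by the StopIteration exception instead.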

This makes iterating over multiple objects relatively easy. However, the generator library also provides the SyncEnumerator class which makes multiple iteration a breeze:

  require 'generator'

  a = (4..5)
  b = ['a',nil,'c']
  c = ['x','y','x']

  enum = SyncEnumerator.new(a, b, c)
  enum.each {|row| p row}

  puts '---'
  table = [ [1,2,3], [4,5,6], [7,8,9] ]
  cols = SyncEnumerator.new(*table)
  cols.each {|col| p col}

  # produces:

    [4, "a", "x"]
    [5, nil, "y"]
    [nil, "c", "x"]
    ---
    [1, 4, 7]
    [2, 5, 8]
    [3, 6, 9]
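Where the generator library is not available, the same rows can be produced with plain arrays. This is a sketch under that assumption; sync_rows is a hypothetical helper, not a library method. It pads shorter collections with nil as in the output above, and Array#transpose covers the table case.

```ruby
# Hypothetical helper: walk several enumerables in step, padding the
# shorter ones with nil.
def sync_rows(*enums)
  arrays = enums.map(&:to_a)
  longest = arrays.map(&:size).max || 0
  (0...longest).map { |i| arrays.map { |arr| arr[i] } }
end

sync_rows(4..5, ['a', nil, 'c'], ['x', 'y', 'x']).each { |row| p row }
# prints:
#   [4, "a", "x"]
#   [5, nil, "y"]
#   [nil, "c", "x"]

table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p table.transpose             #=> [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Unlike SyncEnumerator, this version materializes every collection with to_a first, so it only suits finite inputs.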

All of which just goes to show: There’s more than one way to iterate an enumerable.


__END__

New Site Wiki

Andrew L. Johnson

I’ve finally gotten around to adding a wiki to this site (for article discussions and other purposes). I’ve tried out a handful of wikis (and bbs-like systems) over the last couple of months (written in Perl, Python, and Ruby) looking for something simple. I noticed Jim Weirich just added a UseMod-based wiki to his site.

UseMod was the first wiki I considered, but I wanted to look around first — and thus have two months gone by. With Jim’s new wiki as a prod, I grabbed a copy of UseMod and had a simple wiki up and running on my development box in minutes. A few tweaks later and I’m convinced it fits my needs: simple yet tweakable.



Hash with block or Block with hash?

Andrew L. Johnson

Passing blocks to Hash.new is not exactly new; it has been around for a while (since the 1.8.0 release, I think). A block can be used to supply a default value or action for accessing keys not in the hash.

  hash = Hash.new {|h,k| h[k] = "Default Value"}

  hash = Hash.new {|h,k| raise "No Such Key #{k}"}

Note, the objects yielded to the block are the hash and the key (not a key and a value, which is an easy but nonsensical mistake to make).

In the first example above we didn’t just return a default value, we also assigned to the appropriate key in the hash — next time we try to access the hash using this key we will get the default value directly instead of having to evaluate the block again. This point of view — a hash with a block attached to generate default values — is useful, but not terribly interesting.
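The difference between assigning inside the block and merely returning a value can be seen by counting how often each block runs. The counter variables below are purely illustrative:

```ruby
# Count how often each default block runs.
cached_calls = 0
cached = Hash.new { |h, k| cached_calls += 1; h[k] = "Default Value" }

uncached_calls = 0
uncached = Hash.new { |_h, _k| uncached_calls += 1; "Default Value" }

2.times { cached[:missing]; uncached[:missing] }

p cached_calls              #=> 1
p uncached_calls            #=> 2
p cached.key?(:missing)     #=> true
p uncached.key?(:missing)   #=> false
```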

Turning the view around we have a block with an attached cache — giving us simple memoization. Here’s a little memoized factorial calculator:

  fact = Hash.new {|h, n| n < 2 ? h[n] = 1 : h[n] = h[n-1] * n}
  p fact[4]   #--> 24
  p fact      #--> {1=>1, 2=>2, 3=>6, 4=>24}

Now, there are obvious limitations. The [] hash method takes a single object as a key, so you can’t pass multiple parameters directly (you’d need to wrap them in an array or something), and parameter caching will be based on the #hash method of the objects used as keys. Another limitation is cache size and expiry; if that matters, you might want to try the memoize module available on the RAA.
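As a sketch of the array-wrapping workaround, here is a memoized binomial coefficient keyed on the pair [n, k]. The choice of function is mine, for illustration only; arrays work as keys because Array#hash is based on contents.

```ruby
# Binomial coefficients C(n, k), memoized on the pair [n, k].
# The block's second parameter destructures the array key.
binom = Hash.new do |h, (n, k)|
  h[[n, k]] = (k.zero? || k == n) ? 1 : h[[n - 1, k - 1]] + h[[n - 1, k]]
end

p binom[[5, 2]]     #=> 10
p binom.size        # number of [n, k] pairs cached so far
```

The double brackets are the price of the single-key interface: the outer pair is the [] call, the inner pair builds the array key.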

Creating MD5 digests with Ruby

Andrew L. Johnson

MD5 is a one-way hashing algorithm for creating digest "signatures" or checksums of strings (usually entire files). Ruby’s standard library includes MD5 as part of the Digest:: set of extension classes.

Creating MD5 checksums is a simple matter of requiring the digest/md5 library and using either the Digest::MD5.digest or Digest::MD5.hexdigest class methods to return the digest of a given string — we will use hexdigests in this article as they are printable:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  digest = Digest::MD5.hexdigest("Hello World\n")
  puts digest
  __END__

  e59ff97941044f85df5297e1c302d260

MD5 digests are 128-bit (16-byte) signatures and are the most common method of providing checksums for files available on the net. To create a checksum of an entire file you need only pass in the file’s contents as a string. The following will print out the filename and md5 digest of all the files passed to it on the command line:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  ARGV.each do |f|
    digest = Digest::MD5.hexdigest(File.read(f))
    puts "#{f}: #{digest}"
  end
  __END__

Sometimes you want to do more than just calculate the checksum of a single string: maybe you have a large file and want to calculate the digest in small, memory-friendly chunks, or maybe you are calculating a digest from a stream of input. In such cases you can create a Digest::MD5 object and use the #update (alias: #<<), #digest, and #hexdigest methods.

For example purposes, we will create a digest by reading and adding one line at a time from a test file as well as calculating the digest all at once. I will use the source for this article as the test-file:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  filename = 'MD5.rdoc'

  all_digest = Digest::MD5.hexdigest(File.read(filename))

  incr_digest = Digest::MD5.new
  # The block form of File.open closes the file when the block exits.
  File.open(filename, 'r') do |file|
    file.each_line do |line|
      incr_digest << line
    end
  end

  puts incr_digest.hexdigest
  puts all_digest
  __END__

Saving this as md.rb and running it produced the following output:

  ~$ ruby md.rb
  a075a1debea63e5d7073d9eed19ce031
  a075a1debea63e5d7073d9eed19ce031
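Reading line by line suits text files, but for binary files fixed-size chunks are a more natural unit. A minimal sketch along those lines follows; md5_in_chunks and its default chunk size are assumptions for illustration, and the temporary file exists only to make the snippet self-contained.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: digest a file in fixed-size binary chunks.
def md5_in_chunks(path, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |file|
    # read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      digest << chunk
    end
  end
  digest.hexdigest
end

# Demonstration on a throwaway file; a tiny chunk size forces several reads.
Tempfile.create('md5-demo') do |tmp|
  tmp.write("Hello World\n" * 1000)
  tmp.flush
  chunked = md5_in_chunks(tmp.path, 256)
  whole = Digest::MD5.hexdigest(File.binread(tmp.path))
  puts chunked == whole      # prints "true"
end
```

Only one chunk is held in memory at a time, so the file size no longer matters.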

In addition to providing checksums for files you make available, or checking files and packages you’ve downloaded, another use is fingerprinting sensitive directories on your system. Creating a database of MD5 digests for sensitive files or directories means you can periodically cross-check your sensitive data against the database to see if anything has been changed without your knowledge. This gives you a simple-to-implement addition to your intrusion detection tools.
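A rough sketch of the fingerprinting idea, with an in-memory Hash standing in for the database. The helper names fingerprint and changed_files are hypothetical, persistence is left out, and Dir.glob skips dotfiles by default, so a real tool would need to account for hidden files.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper: map every regular file under dir to its MD5 hexdigest.
def fingerprint(dir)
  paths = Dir.glob(File.join(dir, '**', '*')).select { |path| File.file?(path) }
  paths.sort.each_with_object({}) do |path, db|
    db[path] = Digest::MD5.hexdigest(File.binread(path))
  end
end

# Hypothetical helper: paths that were added, removed, or modified.
def changed_files(old_db, new_db)
  (old_db.keys | new_db.keys).reject { |path| old_db[path] == new_db[path] }
end

# Demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), "one\n")
  File.write(File.join(dir, 'b.txt'), "two\n")
  baseline = fingerprint(dir)

  File.write(File.join(dir, 'b.txt'), "tampered\n")
  changed = changed_files(baseline, fingerprint(dir))
  p changed.map { |path| File.basename(path) }   #=> ["b.txt"]
end
```

Comparing on the union of keys means added and deleted files show up as changes too, not just modified ones.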