Simple Things

New Site Wiki

Andrew L. Johnson

I’ve finally gotten around to adding a wiki to this site (for article discussions and other purposes). Over the last couple of months I’ve tried out a handful of wikis (and BBS-like systems) written in Perl, Python, and Ruby, looking for something simple. Then I noticed Jim Weirich had just added a UseMod-based wiki to his site.

UseMod was the first wiki I considered, but I wanted to look around first — and thus have two months gone by. With Jim’s new wiki as a prod, I grabbed a copy of UseMod and had a simple wiki up and running on my development box in minutes. A few tweaks later and I’m convinced it fits my needs: simple yet tweakable.


Hash with block or Block with hash?

Andrew L. Johnson

Not exactly new: passing blocks to Hash.new has been around for a while (since the 1.8.0 release, I think). The block can supply a default value or action for accessing keys not in the hash.

  hash = Hash.new {|h,k| h[k] = "Default Value"}

  hash = Hash.new {|h,k| raise "No Such Key #{k}"}

Note, the objects yielded to the block are the hash and the key (not a key and a value, which is an easy but nonsensical mistake to make).

In the first example above we didn’t just return a default value, we also assigned to the appropriate key in the hash — next time we try to access the hash using this key we will get the default value directly instead of having to evaluate the block again. This point of view — a hash with a block attached to generate default values — is useful, but not terribly interesting.
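
Both points can be seen in a small sketch. The names calls and squares are made up for illustration; the counter only exists to show how often the block actually runs:

```ruby
calls = 0
squares = Hash.new do |hash, key|
  calls += 1               # count how often the default block is evaluated
  hash[key] = key * key    # store the result so the next lookup is a plain hit
end

p squares[3]   #--> 9  (block runs, value is stored)
p squares[3]   #--> 9  (served straight from the hash)
p calls        #--> 1
```

Dropping the assignment inside the block would still return 9 each time, but the block would run on every access and the hash would stay empty.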

Turning the view around we have a block with an attached cache — giving us simple memoization. Here’s a little memoized factorial calculator:

  fact = Hash.new {|h, n| n < 2 ? h[n] = 1 : h[n] = h[n-1] * n}
  p fact[4]   #--> 24
  p fact      #--> {1=>1, 2=>2, 3=>6, 4=>24}

Now, there are obvious limitations: the [] hash method takes a single object as a key, so you can’t pass multiple parameters directly (you’d need to wrap them in an array or something), and parameter caching will be based on the #hash method of the objects used as keys. Another limitation is cache size and expiry; if that matters, you might want to try the memoize module available on the RAA.
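
As a rough sketch of the array-wrapping workaround, here is a memoized binomial coefficient. The name binom is hypothetical, and the block destructures the array key into n and k:

```ruby
# Two parameters travel as a single array key: binom[[n, k]]
binom = Hash.new do |h, (n, k)|
  h[[n, k]] = (k == 0 || k == n) ? 1 : h[[n - 1, k - 1]] + h[[n - 1, k]]
end

p binom[[5, 2]]          #--> 10
p binom.key?([4, 1])     #--> true (intermediate results were cached too)
```

This works because arrays with equal contents produce equal #hash values, so [5, 2] built at one call site finds the entry stored from another.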

Creating MD5 digests with Ruby

Andrew L. Johnson

MD5 is a one-way hashing algorithm for creating digest "signatures" or checksums of strings (usually entire files). Ruby’s standard library includes MD5 as part of the Digest:: set of extension classes.

Creating MD5 checksums is a simple matter of requiring the digest/md5 library and using either the Digest::MD5.digest or Digest::MD5.hexdigest class methods to return the digest of a given string — we will use hexdigests in this article as they are printable:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  digest = Digest::MD5.hexdigest("Hello World\n")
  puts digest
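
For comparison, a quick sketch of how the two return values relate: .digest gives the raw bytes and .hexdigest gives the same bytes spelled out in hexadecimal (the variable names raw and hex are only for illustration):

```ruby
require 'digest/md5'

raw = Digest::MD5.digest("Hello World\n")      # raw binary string
hex = Digest::MD5.hexdigest("Hello World\n")   # printable hex string

p raw.bytesize                    #--> 16
p hex.length                      #--> 32
p raw.unpack('H*').first == hex   #--> true
```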


MD5 digests are 128 bit (16 byte) signatures and are the most common method of providing checksums for files available on the net. To create a checksum of an entire file you need only pass in the file as a string. The following will print out the filename and md5 digest of all the files passed to it on the command line:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  ARGV.each do |f|
    digest = Digest::MD5.hexdigest(File.read(f))
    puts "#{f}: #{digest}"
  end

Sometimes you want to do more than just calculate the checksum of a single string: maybe you have a large file and want to calculate the digest in small, memory-friendly chunks, or maybe you are calculating a digest from a stream of input. In such cases you can create a Digest::MD5 object and use the #update (alias: #<<), #digest, and #hexdigest methods.

For example purposes, we will create a digest by reading and adding one line at a time from a test file as well as calculating the digest all at once. I will use the source for this article as the test-file:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  filename = 'MD5.rdoc'

  all_digest = Digest::MD5.hexdigest(File.read(filename))

  incr_digest = Digest::MD5.new
  file = File.open(filename, 'r')
  file.each_line do |line|
    incr_digest << line
  end
  file.close

  puts incr_digest.hexdigest
  puts all_digest

Saving this as md.rb and running it produced the following output:

  ~$ ruby md.rb
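
Reading line by line is only one way to feed the digest. For binary files, fixed-size chunks are probably a better fit. The following is a minimal sketch, assuming a hypothetical helper named chunked_md5 and an arbitrary 8192-byte chunk size; it checks itself against the all-at-once digest using a temporary file:

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: digest a file without loading it all into memory.
def chunked_md5(path, chunk_size = 8192)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # IO#read returns nil at end of file, which ends the loop
    while (chunk = io.read(chunk_size))
      md5 << chunk
    end
  end
  md5.hexdigest
end

Tempfile.create('md5-demo') do |tmp|
  tmp.write("Hello World\n" * 10_000)
  tmp.flush
  puts chunked_md5(tmp.path) == Digest::MD5.hexdigest(File.binread(tmp.path))  #--> true
end
```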

In addition to providing checksums for files you make available, or checking files and packages you’ve downloaded, another use is fingerprinting sensitive directories on your system. Creating a database of MD5 digests for sensitive files or directories means you can periodically cross-check your sensitive data against the database to see if anything has been changed without your knowledge. This can provide you with a very simple-to-implement addition to your intrusion detection tools.
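
A bare-bones sketch of that fingerprinting idea might look like the following. The helper names fingerprint and changed_files are made up for this example, the "database" is just an in-memory hash, and the demonstration runs against a throwaway temporary directory rather than anything sensitive:

```ruby
require 'digest/md5'
require 'find'
require 'tmpdir'

# Hypothetical helper: map every regular file under dir to its MD5 hexdigest.
def fingerprint(dir)
  sums = {}
  Find.find(dir) do |path|
    sums[path] = Digest::MD5.file(path).hexdigest if File.file?(path)
  end
  sums
end

# Hypothetical helper: paths that were added, removed, or modified.
def changed_files(before, after)
  (before.keys | after.keys).reject { |path| before[path] == after[path] }.sort
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), "alpha\n")
  File.write(File.join(dir, 'b.txt'), "beta\n")
  baseline = fingerprint(dir)

  File.write(File.join(dir, 'b.txt'), "tampered\n")
  changed = changed_files(baseline, fingerprint(dir))
  p changed.map { |f| File.basename(f) }   #--> ["b.txt"]
end
```

A real tool would need to store the baseline somewhere an intruder cannot rewrite it; that part is left out here.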

Redefining warn with tracebacks

Andrew L. Johnson

A recent question on the Ruby newsgroup asked about getting line and file information when using the #warn method. The standard #warn method is really just equivalent to:

  $stderr.puts "Some warning message"

The #raise method, on the other hand, includes a full traceback: the filename, line number, and the stack of calls that got to this point. The #caller method provides all the needed traceback information; here’s the relevant portion of the ri-generated docs:


    caller(start=1)    => array

    Returns the current execution stack---an array containing
    strings in the form ``_file:line_'' or ``_file:line: in
    `method'_''. The optional _start_ parameter determines the
    number of initial stack entries to omit from the result.
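
To get a feel for what #caller hands back, here is a tiny sketch with two hypothetical methods, inner and outer. The exact text of each entry varies a little between Ruby versions, so no output is shown:

```ruby
def inner
  caller          # same as caller(1): the frame for inner itself is omitted
end

def outer
  inner
end

trace = outer
puts trace.first  # a "file:line:in ..." entry naming outer, the method that called inner
```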

So we just need to redefine the Kernel#warn method to use this information.

  module Kernel
    alias :oldwarn :warn
    def warn (msg = "", fulltrace = false)
      trace = caller(1)
      where = trace[0].sub(/:in.*/,'')
      $stderr.puts "#{where}: Warning: #{msg}"
      $stderr.puts trace.map {|t| "\tfrom #{t}"} if fulltrace
    end
  end

Now we have a method that will provide the file and linenumber, and optionally (if you supply a true second argument), a stacktrace like #raise does. Here we have a simple example script:

  ~$ nl -w3 -ba -s" " -nrn  warn.rb
    1 module Kernel
    2   alias :oldwarn :warn
    3   def warn (msg = "", fulltrace = false)
    4     trace = caller(1)
    5     where = trace[0].sub(/:in.*/,'')
    6     $stderr.puts "#{where}: Warning: #{msg}"
    7     $stderr.puts trace.map {|t| "\tfrom #{t}"} if fulltrace
    8   end
    9 end
   10
   11 class Foo
   12   def bar
   13     warn "just a warning"
   14   end
   15   def qux
   16     warn "warning with trace", true
   17   end
   18 end
   19
   20 obj = Foo.new
   21 obj.bar
   22 obj.qux

Which produces:

  ~$ ruby warn.rb
  warn.rb:13: Warning: just a warning
  warn.rb:16: Warning: warning with trace
          from warn.rb:16:in `qux'
          from warn.rb:22

Simple, but effective.
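
For what it's worth, newer Rubies can do the file-and-line part without redefining anything. This sketch assumes Ruby 2.5 or later, where the built-in warn accepts an uplevel keyword; uplevel: 0 refers to the line of the warn call itself:

```ruby
class Foo
  def bar
    # Prefixes the message with this line's file and line number
    warn "just a warning", uplevel: 0
  end
end

Foo.new.bar   # writes something like "file.rb:4: warning: just a warning" to $stderr
```

It does not print the rest of the stack, so the redefinition above still earns its keep when you want the full trace.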

Undenting Strings

Andrew L. Johnson

One extremely nice feature of Ruby is that here-doc terminators may be indented (if the terminator specification begins with a hyphen). This means it is not necessary either to put here-docs at the left margin or to quote some hardcoded amount of whitespace in the terminator specification (as in Perl). Here-docs make nice, easy templates for simple code generation, but what about the whitespace sensitivity of the generated code (such as RDoc markup)?

The following is a simple regex to strip common leading spaces from a multi-line string (added as a method to the String class in this example):

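  class String
    def undent
      # $1 is the leading whitespace shared by every line (nil if no match)
      a = $1 if match(/\A(\s+)(.*\n)(?:\1.*\n)*\z/)
      # strip that shared prefix from the start of each line
      gsub(/^#{a}/,'')
    end
    alias :dedent :undent
  end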
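
How the regex finds the common prefix is easier to see in isolation. In this sketch (the variable names are only for illustration) the first capture starts out greedy and backtracks until every following line begins with the same whitespace. Note that the pattern only matches when the string ends in a newline:

```ruby
common_indent = /\A(\s+)(.*\n)(?:\1.*\n)*\z/

# First line is indented deeper than the second: the capture shrinks to fit.
deeper_first = "      a\n    b\n"
p deeper_first[common_indent, 1]   #--> "    "

# No trailing newline on the last line: no match at all.
no_newline = "  a\n  b"
p no_newline[common_indent, 1]     #--> nil
```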

And now, if you have some method that returns a here-doc, you can simply dedent it:

  class SomeTemplate
    def some_meth(foo,bar)
      <<-EOS.dedent
        * #{foo} list item
          * sublist with #{bar} item
      EOS
    end
  end

  x = SomeTemplate.new
  puts x.some_meth('first', 'second')

Which produces:


  * first list item
    * sublist with second item

Not rocket science, but I find it handy to have a dedent method lying around for just such uses.
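
If you are on Ruby 2.3 or later, the "squiggly" here-doc (<<~) strips the common leading whitespace for you, so the same template can be written without any helper. A small sketch, with list_template as a made-up method name:

```ruby
def list_template(foo, bar)
  # <<~ removes the indentation of the least-indented line from every line
  <<~EOS
    * #{foo} list item
      * sublist with #{bar} item
  EOS
end

puts list_template('first', 'second')
```

The relative indentation of the sublist survives, which is exactly what whitespace-sensitive markup needs.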