Siaris
Simple Things

New Site Wiki

Andrew L. Johnson

I’ve finally gotten around to adding a wiki to this site (for article discussions and other purposes). Over the last couple of months I’ve tried out a handful of wikis and BBS-like systems (written in Perl, Python, and Ruby), looking for something simple. Then I noticed that Jim Weirich had just added a UseMod-based wiki to his site.

UseMod was the first wiki I considered, but I wanted to look around first — and thus have two months gone by. With Jim’s new wiki as a prod, I grabbed a copy of UseMod and had a simple wiki up and running on my development box in minutes. A few tweaks later and I’m convinced it fits my needs: simple yet tweakable.



Hash with block or Block with hash?

Andrew L. Johnson

Passing a block to Hash.new is not exactly new; it has been around for a while (since the 1.8.0 release, I think). The block can supply a default value, or perform some other action, whenever a key that is not in the hash is accessed.

  hash = Hash.new {|h,k| h[k] = "Default Value"}

  hash = Hash.new {|h,k| raise "No Such Key #{k}"}

Note, the objects yielded to the block are the hash and the key (not a key and a value, which is an easy but nonsensical mistake to make).

In the first example above we didn’t just return a default value, we also assigned to the appropriate key in the hash — next time we try to access the hash using this key we will get the default value directly instead of having to evaluate the block again. This point of view — a hash with a block attached to generate default values — is useful, but not terribly interesting.
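The difference between assigning inside the block and merely returning a value can be checked directly. The following is a minimal sketch; the helper name default_block_stats is made up for illustration and is not part of Ruby. It looks up the same missing key twice and reports how often the block ran and how many keys ended up stored:

```ruby
# Hypothetical helper: returns [block evaluations, stored keys] after
# looking up the same missing key twice.
def default_block_stats(store)
  evaluations = 0
  hash = Hash.new do |h, k|
    evaluations += 1
    # Either assign the default to the key, or just return it.
    store ? (h[k] = "Default Value") : "Default Value"
  end
  2.times { hash[:missing] }
  [evaluations, hash.size]
end

p default_block_stats(true)    #--> [1, 1]
p default_block_stats(false)   #--> [2, 0]
```

With the assignment, the block runs once and the key is stored; without it, the block runs on every lookup and the hash stays empty.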

Turning the view around we have a block with an attached cache — giving us simple memoization. Here’s a little memoized factorial calculator:

  fact = Hash.new {|h, n| h[n] = n < 2 ? 1 : h[n-1] * n}
  p fact[4]   #--> 24
  p fact      #--> {1=>1, 2=>2, 3=>6, 4=>24}

Now, there are obvious limitations. The [] hash method takes a single object as a key, so you can’t pass multiple parameters directly (you’d need to wrap them in an array or something similar), and cached results are looked up using the #hash and #eql? methods of the objects used as keys. Another limitation is cache size and expiry; if that matters, you might want to try the memoize module available on the RAA.
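As a sketch of the array-wrapping workaround, here is a memoized binomial coefficient that takes two parameters packed into one array key. The constant name BINOMIAL is only for illustration; arrays work as keys because they hash by content:

```ruby
# Memoized binomial coefficient C(n, k). The single hash key is the
# two-element array [n, k], which the block unpacks itself.
BINOMIAL = Hash.new do |h, key|
  n, k = key
  h[key] = (k == 0 || k == n) ? 1 : h[[n - 1, k - 1]] + h[[n - 1, k]]
end

p BINOMIAL[[5, 2]]         #--> 10
p BINOMIAL.key?([4, 2])    #--> true (cached as an intermediate result)
```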

Creating MD5 digests with Ruby

Andrew L. Johnson

MD5 is a one-way hashing algorithm for creating digest "signatures" or checksums of strings (usually entire files). Ruby’s standard library includes MD5 as part of the Digest:: set of extension classes.

Creating MD5 checksums is a simple matter of requiring the digest/md5 library and using either the Digest::MD5.digest or Digest::MD5.hexdigest class methods to return the digest of a given string — we will use hexdigests in this article as they are printable:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  digest = Digest::MD5.hexdigest("Hello World\n")
  puts digest
  __END__

  e59ff97941044f85df5297e1c302d260

MD5 digests are 128-bit (16-byte) signatures and are the most common method of providing checksums for files available on the net. To create a checksum of an entire file you need only pass in the file’s contents as a string. The following will print out the filename and MD5 digest of each file passed to it on the command line:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  ARGV.each do |f|
    digest = Digest::MD5.hexdigest(File.read(f))
    puts "#{f}: #{digest}"
  end
  __END__

Sometimes you want to do more than just calculate the checksum of a single string: maybe you have a large file and want to calculate the digest in small, memory-friendly chunks, or maybe you are calculating a digest from a stream of input. In such cases you can create a Digest::MD5 object and use the #update (alias: #<<), #digest, and #hexdigest methods.

For example purposes, we will create one digest by reading a test file and adding one line at a time, and another by calculating the digest all at once. I will use the source for this article as the test file:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  filename = 'MD5.rdoc'

  all_digest = Digest::MD5.hexdigest(File.read(filename))

  incr_digest = Digest::MD5.new
  # The block form of File.open closes the file when the block exits.
  File.open(filename, 'r') do |file|
    file.each_line do |line|
      incr_digest << line
    end
  end

  puts incr_digest.hexdigest
  puts all_digest
  __END__

Saving this as md.rb and running it produced the following output:

  ~$ ruby md.rb
  a075a1debea63e5d7073d9eed19ce031
  a075a1debea63e5d7073d9eed19ce031
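The example above feeds the digest one line at a time, which suits text files. For large or binary files, fixed-size chunks may be a better fit. The following is a minimal sketch, assuming a hypothetical helper named chunked_md5 and an arbitrary default chunk size of 4096 bytes:

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: digest a file in fixed-size binary chunks so the
# whole file never has to sit in memory at once.
def chunked_md5(path, chunk_size = 4096)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |file|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      digest << chunk
    end
  end
  digest.hexdigest
end

# Usage: write some throwaway data and compare against the one-shot digest.
data = "Hello World\n" * 1000
Tempfile.create('md5-demo') do |tmp|
  tmp.binmode
  tmp.write(data)
  tmp.flush
  puts chunked_md5(tmp.path) == Digest::MD5.hexdigest(data)   #--> true
end
```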

In addition to providing checksums for files you make available, or checking files and packages you’ve downloaded, another use is fingerprinting sensitive directories on your system. If you create a database of MD5 digests for sensitive files or directories, you can periodically cross-check that data against the database to see whether anything has been changed without your knowledge. This makes a very simple addition to your intrusion detection tools.
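A rough sketch of the fingerprinting idea follows. The helper names fingerprint and changed_paths are made up for illustration, and the "database" here is just an in-memory hash; how you store it between runs is left open:

```ruby
require 'digest/md5'
require 'find'
require 'tmpdir'

# Hypothetical helper: build a {path => hexdigest} table for every
# regular file under dir.
def fingerprint(dir)
  table = {}
  Find.find(dir) do |path|
    table[path] = Digest::MD5.hexdigest(File.binread(path)) if File.file?(path)
  end
  table
end

# Hypothetical helper: paths that were modified, added, or removed
# between two fingerprint tables.
def changed_paths(old, new)
  (old.keys | new.keys).reject { |path| old[path] == new[path] }.sort
end

# Usage: fingerprint a scratch directory, change one file, and compare.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'passwd'), "root\n")
  File.write(File.join(dir, 'hosts'), "localhost\n")
  before = fingerprint(dir)
  File.write(File.join(dir, 'hosts'), "localhost evil\n")
  puts changed_paths(before, fingerprint(dir))   # prints the path of 'hosts'
end
```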

Redefining warn with tracebacks

Andrew L. Johnson

A recent question on the Ruby newsgroup asked about getting line and file information when using the #warn method. The standard #warn method is really just equivalent to:

  $stderr.puts "Some warning message"

The #raise method, by contrast, includes a full traceback: the filename, the line number, and the stack of calls that led to this point. The #caller method provides all the needed traceback information. Here’s the relevant portion of the ri-generated docs:

  Kernel#caller

    caller(start=1)    => array

    Returns the current execution stack---an array containing
    strings in the form ``_file:line_'' or ``_file:line: in
    `method'_''. The optional _start_ parameter determines the
    number of initial stack entries to omit from the result.
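To see what #caller returns in practice, here is a small sketch that builds a two-level call stack. The method names innermost and outermost are made up for illustration, and the exact text of each entry (filename, quoting of the method name) depends on where the script is saved and on the Ruby version:

```ruby
def innermost
  caller   # same as caller(1): starts at the line that called innermost
end

def outermost
  innermost
end

trace = outermost
puts trace                              # one "file:line:in ..." string per frame
puts trace[0].sub(/:in.*/, '')          # just "file:line" for the nearest caller
```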

So we just need to redefine the Kernel#warn method to use this information.

  module Kernel
    alias :oldwarn :warn
    def warn (msg = "", fulltrace = false)
      trace = caller(1)
      where = trace[0].sub(/:in.*/,'')
      $stderr.puts "#{where}: Warning: #{msg}"
      $stderr.puts trace.map {|t| "\tfrom #{t}"} if fulltrace
    end
  end

Now we have a method that will provide the file and line number and, optionally (if you supply a true second argument), a stack trace like #raise does. Here we have a simple example script:

  ~$ nl -w3 -ba -s" " -nrn  warn.rb
    1 module Kernel
    2   alias :oldwarn :warn
    3   def warn (msg = "", fulltrace = false)
    4     trace = caller(1)
    5     where = trace[0].sub(/:in.*/,'')
    6     $stderr.puts "#{where}: Warning: #{msg}"
    7     $stderr.puts trace.map {|t| "\tfrom #{t}"} if fulltrace
    8   end
    9 end
   10
   11 class Foo
   12   def bar
   13     warn "just a warning"
   14   end
   15   def qux
   16     warn "warning with trace", true
   17   end
   18 end
   19
   20 obj = Foo.new
   21 obj.bar
   22 obj.qux

Which produces:

  ~$ ruby warn.rb
  warn.rb:13: Warning: just a warning
  warn.rb:16: Warning: warning with trace
          from warn.rb:16:in `qux'
          from warn.rb:22

Simple, but effective.

Undenting Strings

Andrew L. Johnson

One extremely nice feature of Ruby is that here-doc terminators may be indented (if the terminator specification begins with a hyphen). This means it is not necessary either to put the terminator at the left margin or to quote some hardcoded amount of whitespace in the terminator specification (as in Perl). Here-docs make nice, easy templates for simple code generation, but what about the whitespace sensitivity of the generated text (such as RDoc markup)?

The following is a simple regex to strip common leading spaces from a multi-line string (added as a method to the String class in this example):

  class String
    def undent
      a = $1 if match(/\A(\s+)(.*\n)(?:\1.*\n)*\z/)
      gsub(/^#{a}/,'')
    end
    alias :dedent :undent
  end
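To check what "common" leading whitespace means here, the sketch below repeats the method (so that it runs on its own) and applies it to a string whose lines are indented by different amounts. The regex backtracks until it finds a prefix shared by every line, so only the two-space prefix is stripped:

```ruby
class String
  # Same definition as above, repeated so this sketch is self-contained.
  def undent
    a = $1 if match(/\A(\s+)(.*\n)(?:\1.*\n)*\z/)
    gsub(/^#{a}/, '')
  end
end

text = "    first\n  second\n      third\n"
p text.undent            #--> "  first\nsecond\n    third\n"
p "flush left\n".undent  #--> "flush left\n" (no leading whitespace, unchanged)
```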

And now, if you have some method that returns a here-doc, you can simply dedent it:

  class SomeTemplate
    def some_meth(foo,bar)
      <<-STOP.dedent
        * #{foo} list item
          * sublist with #{bar} item
      STOP
    end
  end

  x = SomeTemplate.new
  puts x.some_meth('first', 'second')
  __END__

  output:

  * first list item
    * sublist with second item

Not rocket science, but I find it handy to have a dedent method lying around for just such uses.