Siaris
Simple Things

Creating MD5 digests with Ruby

Andrew L. Johnson

MD5 is a one-way hashing algorithm for creating digest "signatures" or checksums of strings (usually entire files). Ruby’s standard library includes MD5 as part of the Digest:: set of extension classes.

Creating MD5 checksums is a simple matter of requiring the digest/md5 library and using either the Digest::MD5.digest or Digest::MD5.hexdigest class methods to return the digest of a given string — we will use hexdigests in this article as they are printable:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  digest = Digest::MD5.hexdigest("Hello World\n")
  puts digest
  __END__

  e59ff97941044f85df5297e1c302d260
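
To make the difference between the two class methods concrete, here is a small sketch (not from the original script) that compares the raw and hex forms of the same digest. It assumes a Ruby recent enough to provide Digest::MD5.base64digest, which is shown only as a third printable encoding:

```ruby
require 'digest/md5'

text = "Hello World\n"

raw = Digest::MD5.digest(text)        # 16 raw bytes, generally not printable
hex = Digest::MD5.hexdigest(text)     # the same bytes as 32 hex characters
b64 = Digest::MD5.base64digest(text)  # the same bytes, Base64 encoded

puts raw.bytesize                     # 16
puts hex.length                       # 32
puts raw.unpack('H*').first == hex    # true: hexdigest is just digest in hex
puts b64
```

The raw form is the one to store when space matters; the hex form is the one to print or compare by eye.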

MD5 digests are 128-bit (16-byte) signatures and are a widely used method of providing checksums for files available on the net. To create a checksum of an entire file you need only pass in the file’s contents as a string. The following will print out the filename and MD5 digest of each file passed to it on the command line:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  ARGV.each do |f|
    digest = Digest::MD5.hexdigest(File.read(f))
    puts "#{f}: #{digest}"
  end
  __END__

Sometimes you want to do more than just calculate the checksum of a single string. Maybe you have a large file and want to calculate the digest in small, memory-friendly chunks, or maybe you are calculating a digest from a stream of input. In such cases you can create a Digest::MD5 object and use the #update (alias: #<<), #digest, and #hexdigest methods.

For example purposes, we will create a digest by reading and adding one line at a time from a test file as well as calculating the digest all at once. I will use the source for this article as the test-file:

  #!/usr/bin/ruby -w
  require 'digest/md5'
  filename = 'MD5.rdoc'

  all_digest = Digest::MD5.hexdigest(File.read(filename))

  incr_digest = Digest::MD5.new
  # The block form of File.open closes the file when the block exits
  File.open(filename, 'r') do |file|
    file.each_line do |line|
      incr_digest << line
    end
  end

  puts incr_digest.hexdigest
  puts all_digest
  __END__

Saving this as md.rb and running it produced the following output:

  ~$ ruby md.rb
  a075a1debea63e5d7073d9eed19ce031
  a075a1debea63e5d7073d9eed19ce031
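
Reading line by line works for text, but a binary file may have no newlines at all. The following is a minimal sketch of the fixed-size chunk approach mentioned above; md5_in_chunks and its default chunk size are hypothetical names and values chosen for illustration:

```ruby
require 'digest/md5'

# Hypothetical helper: digest a file in fixed-size chunks so that at most
# chunk_size bytes of the file are held in memory at any one time.
def md5_in_chunks(path, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|            # binary mode: no newline translation
    while (chunk = io.read(chunk_size))    # read returns nil at end of file
      digest << chunk
    end
  end
  digest.hexdigest
end

# Usage: ruby chunks.rb somefile anotherfile
ARGV.each { |f| puts "#{f}: #{md5_in_chunks(f)}" }
```

The digest library also provides Digest::MD5.file(path), which reads the file incrementally for you; Digest::MD5.file(path).hexdigest should give the same result as the helper above.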

In addition to providing checksums for files you make available, or checking files and packages you’ve downloaded, another use is fingerprinting sensitive directories on your system. Creating a database of MD5 digests for sensitive files or directories means you can periodically cross-check your data against the database to see if anything has been changed without your knowledge. This can be a simple-to-implement addition to your intrusion detection tools.
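
As a rough sketch of the fingerprinting idea, the two helpers below build a path-to-digest table for a directory tree and report what differs between two such tables. The names fingerprint and changes are hypothetical, and how the table is stored between runs (YAML, Marshal, a real database) is left open:

```ruby
require 'digest/md5'

# Hypothetical helper: map every regular file under dir (dotfiles included)
# to the MD5 hexdigest of its contents.
def fingerprint(dir)
  pattern = File.join(dir, '**', '*')
  Dir.glob(pattern, File::FNM_DOTMATCH).sort.each_with_object({}) do |path, db|
    db[path] = Digest::MD5.file(path).hexdigest if File.file?(path)
  end
end

# Hypothetical helper: compare an old table with a new one.
def changes(old_db, new_db)
  common = old_db.keys & new_db.keys
  {
    added:    new_db.keys - old_db.keys,
    removed:  old_db.keys - new_db.keys,
    modified: common.reject { |path| old_db[path] == new_db[path] }
  }
end

# Usage sketch:
#   baseline = fingerprint('/some/sensitive/dir')   # save this somewhere safe
#   ...later...
#   p changes(baseline, fingerprint('/some/sensitive/dir'))
```

Digest::SHA256 offers the same interface and may be the safer choice where deliberate tampering is the concern, since practical MD5 collisions are known.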

Redefining warn with tracebacks

Andrew L. Johnson

A recent question on the Ruby newsgroup asked about getting line and file information when using the #warn method. The standard #warn method is really just equivalent to:

  $stderr.puts "Some warning message"

The #raise method, by contrast, includes a full traceback: the filename, the line number, and the stack of calls that got to this point. The #caller method provides all the needed traceback information; here’s the relevant portion of the ri-generated docs:

  Kernel#caller

    caller(start=1)    => array

    Returns the current execution stack---an array containing
    strings in the form ``_file:line_'' or ``_file:line: in
    `method'_''. The optional _start_ parameter determines the
    number of initial stack entries to omit from the result.
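
A quick way to see what #caller returns is to call it from a nested method. This is only an illustrative sketch (inner and outer are made-up names), and the exact quoting of the method name in each frame varies between Ruby versions, so no output is shown:

```ruby
def inner
  caller        # same as caller(1): omits the frame for inner itself
end

def outer
  inner
end

frames = outer
puts frames.first                    # the line inside outer that called inner
puts frames.first.sub(/:in.*/, '')   # trimmed down to just "file:line"
```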

So we just need to redefine the Kernel#warn method to use this information.

  module Kernel
    alias :oldwarn :warn
    def warn (msg = "", fulltrace = false)
      trace = caller(1)
      where = trace[0].sub(/:in.*/,'')
      $stderr.puts "#{where}: Warning: #{msg}"
      $stderr.puts trace.map {|t| "\tfrom #{t}"} if fulltrace
    end
  end

Now we have a method that will provide the file and line number and, optionally (if you supply a true second argument), a stack trace like #raise does. Here we have a simple example script:

  ~$ nl -w3 -ba -s" " -nrn  warn.rb
    1 module Kernel
    2   alias :oldwarn :warn
    3   def warn (msg = "", fulltrace = false)
    4     trace = caller(1)
    5     where = trace[0].sub(/:in.*/,'')
    6     $stderr.puts "#{where}: Warning: #{msg}"
    7     $stderr.puts trace.map {|t| "\tfrom #{t}"} if fulltrace
    8   end
    9 end
   10
   11 class Foo
   12   def bar
   13     warn "just a warning"
   14   end
   15   def qux
   16     warn "warning with trace", true
   17   end
   18 end
   19
   20 obj = Foo.new
   21 obj.bar
   22 obj.qux

Which produces:

  ~$ ruby warn.rb
  warn.rb:13: Warning: just a warning
  warn.rb:16: Warning: warning with trace
          from warn.rb:16:in `qux'
          from warn.rb:22

Simple, but effective.
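
On Ruby 2.5 and later, the built-in Kernel#warn accepts an uplevel: keyword that prefixes the message with a file and line number, so the location part may not need a redefinition at all. A minimal sketch, with checked_divide as a made-up example method:

```ruby
def checked_divide(a, b)
  # uplevel: 0 reports this line; uplevel: 1 would report the caller's line.
  warn("dividing by zero, returning nil", uplevel: 0) if b == 0
  b == 0 ? nil : a / b
end

checked_divide(1, 0)   # writes "<file>:<line>: warning: ..." to $stderr
```

The built-in does not print the rest of the stack, so the redefinition above is still the way to get the optional full trace.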

Undenting Strings

Andrew L. Johnson

One extremely nice feature of Ruby is that here-doc terminators may be indented (if the terminator specification begins with a hyphen). This means it is not necessary either to put here-docs at the left margin or to quote some hardcoded amount of whitespace in the terminator specification (as in Perl). Here-docs can make nice, easy templates for simple code generation, but what about whitespace sensitivity of the generated code (such as RDoc markup)?

The following is a simple regex to strip common leading spaces from a multi-line string (added as a method to the String class in this example):

  class String
    def undent
      a = $1 if match(/\A(\s+)(.*\n)(?:\1.*\n)*\z/)
      gsub(/^#{a}/,'')
    end
    alias :dedent :undent
  end

And now, if you have some method that returns a here-doc, you can simply dedent it:

  class SomeTemplate
    def some_meth(foo,bar)
      <<-STOP.dedent
        * #{foo} list item
          * sublist with #{bar} item
      STOP
    end
  end

  x = SomeTemplate.new
  puts x.some_meth('first', 'second')
  __END__

  output:

  * first list item
    * sublist with second item

Not rocket science, but I find it handy to have a dedent method lying around for just such uses.
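
For here-docs specifically, Ruby 2.3 and later have a built-in alternative: the "squiggly" <<~ form strips the indentation of the least-indented line from every line. A short sketch, with list_template as a made-up method name:

```ruby
def list_template(foo, bar)
  <<~STOP
    * #{foo} list item
      * sublist with #{bar} item
  STOP
end

puts list_template('first', 'second')
# * first list item
#   * sublist with second item
```

A String method like the one above is still useful for indented strings that do not come from a here-doc.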

Deep Cloning

Andrew L. Johnson

One problem with both Ruby’s #dup and #clone methods is that they only provide shallow copying. That suffices for many purposes, but sometimes you want deeper copying. A pretty standard method for deep copying Ruby objects is to use the Marshal module’s dump and load methods:

    class Object
      def deep_clone
        Marshal.load(Marshal.dump(self))
      end
    end
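
To see the difference between a shallow and a deep copy, here is a small illustrative sketch with made-up data that mutates a nested Array through each kind of copy:

```ruby
original = { 'langs' => ['Perl', 'Ruby'] }

shallow = original.dup
deep    = Marshal.load(Marshal.dump(original))

shallow['langs'] << 'Python'   # the inner Array is shared with original
deep['langs']    << 'Haskell'  # the inner Array is an independent copy

p original['langs']            # ["Perl", "Ruby", "Python"]
p deep['langs']                # ["Perl", "Ruby", "Haskell"]
```

Only the Marshal round trip leaves the original untouched by later changes to the copy.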

As long as the object in question is serializable and doesn’t have singleton methods installed, that works. The following is a very bare-bones first cut at another deep clone method:

    class Object
      def dclone
        case self
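          # Immediate and singleton-like values are returned as-is. Integer
          # covers the old Fixnum and Bignum classes, which newer Rubies removed.
          when Integer,Float,Symbol,NilClass,FalseClass,
               TrueClass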
            klone = self
          when Hash
            klone = self.clone
            self.each{|k,v| klone[k] = v.dclone}
          when Array
            klone = self.clone
            klone.clear
            self.each{|v| klone << v.dclone}
          else
            klone = self.clone
        end
        klone.instance_variables.each {|v|
          klone.instance_variable_set(v,
            klone.instance_variable_get(v).dclone)
        }
        klone
      end
    end

Singleton methods are handled by #clone, and attributes are recursively #dclone'd (as are elements of Arrays and Hashes).
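
The singleton-method point can be checked with a few lines. This is an illustrative sketch using a throwaway object:

```ruby
obj = Object.new
def obj.greet
  'hello'
end

puts obj.clone.greet                # hello: clone copies singleton methods
puts obj.dup.respond_to?(:greet)    # false: dup does not

begin
  Marshal.dump(obj)
rescue TypeError => e
  puts "Marshal refused: #{e.message}"
end
```

This is why a clone-based approach can copy objects that the Marshal round trip rejects.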

Welcome to Siaris.net

Andrew L. Johnson

Siaris.net is the public face of Siaris: Andrew Johnson’s software development, training, writing, and consulting activities. Utilizing a common weblog format, I will publish short and medium length articles on a variety of topics including: general programming and problem solving, object oriented programming, programming languages, teaching, and communication. Longer writings will be available under the Articles link in the navigation bar.

Siaris.net is not my personal blog (I may add one of those eventually), but a way both to gather and make available my older writings and to add new articles. With that in mind, I’ve already converted 50 (of 89) short Perl articles (originally published by ItWorld). Some writings are less suitable for blog publication for a variety of reasons; the Articles link in the navigation bar connects you to other writings (a smallish regex tutorial for some of Perl’s additional RE features, and a link to the regex chapter of my book for starters).

Within the blog, the News category will relate information about various goings on at Siaris.net. The LanguageBits category will hold interesting bits on various languages (including how-to’s and small code examples). I expect to be adding other categories as this site evolves.

There is no feedback mechanism for articles at this time, but I am considering setting up either a comment forum or a wiki for such a purpose, possibly requiring login to discourage comment/content spamming. In the meantime, comments about this site or any particular article can be sent directly to me via email (andrew@siaris.net). I hope you find this site useful and enjoy visiting from time to time.

  Best regards,
  Andrew L Johnson

The Map

Once I would go
to the edge of the map.
To the empty,
white
space.

To where there be dragons
and perils unknown.
One could fall off
edges of
worlds.

Now would I go
to the edge of the map.
To the swirling
black
hole.

Forever uncharted
to those left behind.
One could fall off
edges of
Time.

  — Andrew L. Johnson (1985)