Ruby Spider

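The REXML library for Ruby is complex and only parses well-formed XML, which much of the HTML on the web is not. For crawling the web, a quick hack like the following may be more suitable:

    require 'open-uri'

    # Fetch the page and print the href of every <a ...> tag.
    # On Ruby 3.0+ open-uri no longer hooks Kernel#open, so call URI.open.
    URI.open("http://www.stuff.things").read.scan(/<a.*?href="(.*?)"/).each do |match|
      puts match[0] # scan yields one array of capture groups per match
    end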
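 * 1) Open the URL, read the page, and scan it for links.
 * 2) Note the use of the non-greedy '.*?' regex wildcard: it matches as few characters as possible, so each capture stops at the first closing quote instead of running on to the last quote on the line.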
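To see what the non-greedy wildcard buys, here is a small sketch that runs the same scan twice over a made-up HTML snippet (the snippet is a hypothetical example, not from the original page): once with '.*?' and once with the greedy '.*'.

```ruby
# Hypothetical sample: two links on one line, the second with an extra attribute.
html = '<a href="one.html">One</a> <a href="two.html" title="Two">Two</a>'

# Non-greedy: each .*? stops at the earliest point where the rest can match.
lazy = html.scan(/<a.*?href="(.*?)"/).map(&:first)

# Greedy: each .* grabs as much as it can, then backs off only far enough
# for the rest of the pattern to match.
greedy = html.scan(/<a.*href="(.*)"/).map(&:first)

p lazy   # prints ["one.html", "two.html"]
p greedy # prints ["two.html\" title=\"Two"]
```

With '.*?' both hrefs come back separately. With '.*' the first wildcard jumps to the last href=" on the line and the capture runs on to the last quote, so the first link is lost and the second comes back glued to the title attribute.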
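The one-liner only lists the links on a single page; a spider repeats that step on each link it finds. Below is a minimal breadth-first sketch. The crawl method, the max_pages cap, the injectable fetch callable, and the use of URI.join to resolve relative links are assumptions added for illustration, not part of the original hack. It reuses the same regex, so it inherits its limits (for example, it misses single-quoted or unquoted href attributes).

```ruby
require 'open-uri'
require 'set'
require 'uri'

# Same link pattern as the one-liner: group 1 captures the href value.
LINK_RE = /<a.*?href="(.*?)"/

# Hypothetical helper: breadth-first crawl from start_url, visiting at most
# max_pages URLs. fetch is any callable taking a URL string and returning HTML;
# by default it downloads the page with open-uri.
def crawl(start_url, max_pages: 10, fetch: ->(url) { URI.open(url).read })
  queue = [start_url]
  visited = Set.new

  until queue.empty? || visited.size >= max_pages
    url = queue.shift
    next if visited.include?(url)
    visited << url

    begin
      html = fetch.call(url)
    rescue StandardError => e
      warn "skipping #{url}: #{e.message}"
      next
    end

    html.scan(LINK_RE).each do |(href)|
      begin
        uri = URI.join(url, href) # resolve relative hrefs against the page URL
      rescue URI::Error
        next # skip hrefs that are not valid URIs
      end
      uri.fragment = nil # "page.html#top" is the same page as "page.html"
      link = uri.to_s
      queue << link if link.start_with?("http") && !visited.include?(link)
    end
  end

  visited.to_a # URLs in the order they were visited
end
```

Calling crawl("http://www.stuff.things", max_pages: 5) would fetch up to five pages over the network. Passing a different fetch lambda, for example one that looks pages up in a hash of canned HTML, lets you try the loop offline.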