Generating Sitemaps in Rails

Code updated 9/18/2009

This week I added sitemap files to my Rails site, PriceChirp. Sitemaps help search engines find all of your content. They are especially useful for enumerating pages that web crawlers have difficulty discovering, such as content reachable only through database searches.

The web is full of instructions for generating sitemaps on the fly using rxml templates. That approach does not scale well if your site has thousands of links. A better method is to generate the sitemaps periodically and serve the cached files when requested.

www.fortytwo.gr has a good example of generating sitemaps with Rails. I've taken that code and fixed and extended it to fit my needs.

Understanding Sitemaps

Google's sitemap howto.

Basically, there are two types of sitemap files:

  • Sitemap files that contain the URLs of your site
  • Sitemap index files that contain a list of your sitemap files

Sitemap Files

The XML format of a sitemap file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://www.example.com/</loc>
     <lastmod>2005-01-01</lastmod>
     <changefreq>monthly</changefreq>
     <priority>0.8</priority>
   </url>  
   ...
   ...
   </urlset>

Where

  • loc is the actual URL
  • lastmod is the date the URL was last modified
  • changefreq indicates how often the URL is updated
  • priority is the priority of this URL relative to the other URLs on your site
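
As a minimal sketch of how these four elements fit together, the snippet below builds the same one-entry sitemap with REXML, the library used later in this post. The method name example_sitemap_xml is made up for illustration, and the values are the ones from the sample above.

```ruby
require 'rexml/document'

# Hypothetical helper for illustration: returns a one-entry sitemap as a string.
def example_sitemap_xml
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")

  # The root element carries the sitemap namespace.
  urlset = doc.add_element("urlset",
    "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9")

  # One <url> entry; loc is the only element the post's helper classes
  # always emit, the other three are optional there.
  url = urlset.add_element("url")
  url.add_element("loc").text        = "http://www.example.com/"
  url.add_element("lastmod").text    = "2005-01-01"
  url.add_element("changefreq").text = "monthly"
  url.add_element("priority").text   = "0.8"

  doc.to_s
end

puts example_sitemap_xml
```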

Sitemap Index File

The index file contains a list of the sitemap files you want to include. The XML format of that file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
   </sitemapindex>

Where

  • loc is the URL of the sitemap file
  • lastmod is the date the sitemap file was last modified
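
The sample index above uses two lastmod styles: a full timestamp with a UTC offset and a plain date. The sketch below shows one way to produce each from a Ruby Time. Both method names are hypothetical and are not part of the code later in this post.

```ruby
# Hypothetical helpers for illustration only.

# Date-only form, e.g. the style of "2005-01-01".
def lastmod_date(time)
  # getutc returns a new Time in UTC and leaves the argument untouched.
  time.getutc.strftime("%Y-%m-%d")
end

# Full timestamp form with an explicit +00:00 offset.
def lastmod_timestamp(time)
  time.getutc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
end

t = Time.utc(2004, 10, 1, 18, 23, 17)
puts lastmod_date(t)       # prints 2004-10-01
puts lastmod_timestamp(t)  # prints 2004-10-01T18:23:17+00:00
```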

Building Sitemaps in Rails

To build the sitemap files in Rails, we need three things. In app/helpers, a set of REXML-based classes builds the sitemap files and the sitemap index. In lib/tasks, a rake script does the work. Finally, a crontab entry runs the rake script periodically.

Helper Classes

These helper classes generate sitemaps and sitemap indexes using classes derived from REXML::Document and REXML::Element. In the file app/helpers/sitemap.rb:

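require 'rexml/document'

# A single <url> entry; only loc is required, the other elements are optional.
class SitemapUrl < REXML::Element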
 
  def initialize(loc, lastmod = nil, changefreq=nil, priority=nil)
    @loc = loc
    @lastmod = lastmod
    @changefreq = changefreq
    @priority = priority
 
    super("url")
    create_elements
  end
 
  def create_elements
    #add location
    el = self.add_element("loc")
    el.text = @loc
 
    if @lastmod
      el = self.add_element("lastmod")
      el.text = @lastmod
    end
 
    if @changefreq
      el = self.add_element("changefreq")
      el.text = @changefreq
    end
 
    if @priority
      el = self.add_element("priority")
      el.text = @priority
    end
  end
end
 
class Sitemap < REXML::Document
  attr_accessor :loc,:lastmod, :urls
 
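  def initialize(loc=nil, lastmod=nil)
    # Use super() with empty parentheses: a bare `super` would forward loc and
    # lastmod to REXML::Document#initialize, which treats its first argument as XML source.
    super()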
    @loc = loc
    @lastmod = lastmod
    self << REXML::XMLDecl.new("1.0", "UTF-8")
 
    urlset = add_element("urlset")
    urlset.add_attributes('xmlns' => "http://www.sitemaps.org/schemas/sitemap/0.9")
 
    @urls = self.root
  end
 
  def to_xml
    to_s
  end
 
  def add_url(loc, lastmod = nil, changefreq=nil, priority=nil)
    @urls << SitemapUrl.new(loc, lastmod, changefreq,priority)
  end
end
 
class SitemapIndex < REXML::Document
  attr_accessor :sitemaps
 
  def initialize
    super
 
    self << REXML::XMLDecl.new("1.0", "UTF-8")
 
    sitemapindex = add_element("sitemapindex")
    sitemapindex.add_attributes('xmlns' => "http://www.sitemaps.org/schemas/sitemap/0.9")
  end
 
  def add_sitemap(sitemap)
    el = self.root.add_element("sitemap")
    loc = el.add_element("loc")
    loc.text = sitemap.loc
  end
 
  def to_xml
    to_s
  end
end
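
One way to sanity-check the XML these classes produce is to parse it back and list the loc values. The sketch below assumes only REXML; the method name sitemap_locs is made up for illustration. It works on either a sitemap or a sitemap index, since both keep loc one level below the root's children, and it strips any whitespace that indented output may have placed around the URL text.

```ruby
require 'rexml/document'

# Hypothetical helper: returns every <loc> value found in a sitemap
# or sitemap index, given the XML as a string.
def sitemap_locs(xml)
  doc = REXML::Document.new(xml)
  locs = []
  # Walk <url> (or <sitemap>) entries, then their child elements.
  doc.root.elements.each do |entry|
    entry.elements.each do |child|
      locs << child.text.to_s.strip if child.name == "loc"
    end
  end
  locs
end

sample = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>
      http://www.example.com/
    </loc>
  </url>
  <url><loc>http://www.example.com/faq</loc></url>
</urlset>
XML

puts sitemap_locs(sample).inspect
```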

Rake Script to Generate Sitemap

By creating a rake task, we can generate our sitemaps at will. Rake tasks have full access to our models. The file /lib/tasks/sitemaps.rake:

namespace :sitemap do
  desc "Create Index"
  task(:create_index => :environment) do
    puts "Creating Index"
    items = Sitemap.new("http://pricechirp.com/sitemap_items.xml.gz")
    statics = Sitemap.new("http://pricechirp.com/sitemap_static.xml.gz")
    index = SitemapIndex.new
 
    index.add_sitemap(items)
    index.add_sitemap(statics)
 
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_index.xml.gz"), :force => true)
 
    f =File.new(File.join(RAILS_ROOT, "public/sitemap_index.xml"), 'w')
 
    index.write(f,2)
    f.close
    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_index.xml')}")
 
  end
 
  desc "Create all sitemaps"
  task(:create_sitemaps => :environment) do
    # build the item and static sitemaps first, then the index that references them
    Rake::Task["sitemap:items"].invoke
    Rake::Task["sitemap:static"].invoke
    Rake::Task["sitemap:create_index"].invoke
  end
 
  desc "Create Items Sitemap"
  task(:items => :environment) do
    sitemap = Sitemap.new
    #add every item
    user = User.find_by_login('default')
    for i in Item.find(:all, :select => "id,status_change_at", :conditions => ['user_id = ?', user.id])
      sitemap.add_url("http://pricechirp.com/items/#{i.id}",w3c_date(i.status_change_at),nil,'.5')
    end
 
    puts "#{sitemap.urls.length} total urls"
    #delete the file
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_items.xml.gz"), :force => true)
 
    f =File.new(File.join(RAILS_ROOT, "public/sitemap_items.xml"), 'w')
 
    sitemap.write(f,2)
    f.close
 
    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_items.xml')}")
  end
 
  desc "Create Static Sitemap"
  task(:static => :environment) do
    sitemap = Sitemap.new
    sitemap.add_url("http://pricechirp.com/",w3c_date(Time.now),'daily','1.0')
    sitemap.add_url("http://pricechirp.com/faq",nil,'monthly','.5')
    sitemap.add_url("http://pricechirp.com/contact/new",nil,nil,'.5')
    sitemap.add_url("http://pricechirp.com/items/search",nil,nil,'.5')
    sitemap.add_url("http://pricechirp.com/signup",nil,nil,'.5')
    puts "#{sitemap.urls.length} total urls"
    #delete the file
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_static.xml.gz"), :force => true)
 
    f =File.new(File.join(RAILS_ROOT, "public/sitemap_static.xml"), 'w')
 
    sitemap.write(f,2)
    f.close
 
    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_static.xml')}")
 
  end
 
  def w3c_date(date)
    date.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
end
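
Each task above repeats the same three steps: delete the old .gz file, write the XML, then shell out to gzip. As an alternative sketch (not what the rake file above does), Ruby's standard zlib library can write the compressed file directly. The method name write_gzipped is hypothetical.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: writes xml to "<path>.gz" in one step.
# Opening the file for writing replaces any existing copy.
def write_gzipped(path, xml)
  gz_path = "#{path}.gz"
  Zlib::GzipWriter.open(gz_path) { |gz| gz.write(xml) }
  gz_path
end

# Usage: write into a temporary directory and read the file back.
Dir.mktmpdir do |dir|
  gz_path = write_gzipped(File.join(dir, "sitemap_static.xml"), "<urlset/>")
  puts Zlib::GzipReader.open(gz_path) { |gz| gz.read }
end
```

Inside a task, this could be called as write_gzipped(File.join(RAILS_ROOT, "public/sitemap_static.xml"), sitemap.to_s), which avoids depending on a gzip binary being on the PATH.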

The rake script gives us:

rake sitemap:create_index                # Create Index
rake sitemap:create_sitemaps             # Create all sitemaps
rake sitemap:items                       # Create Items Sitemap
rake sitemap:static                      # Create Static Sitemap

Using a Crontab to Automate the Rake Task

Now we can create a crontab entry to generate our sitemaps automatically. The entry below runs at five minutes past every second hour and must be entered as a single line:

5 */2 * * * cd /path/to/your/application/ && /usr/bin/rake sitemap:create_sitemaps RAILS_ENV=production >> /path/to/your/logs/sitemaps.log

Publishing Your Sitemaps

Now that you have sitemaps, you need to add a line to your robots.txt file to let the search engines know about them. You should also submit the index to Google via Webmaster Tools to confirm that the files are well formed.

robots.txt:

Sitemap: http://pricechirp.com/sitemap_index.xml.gz
