Handling Sitecore Multi-site Instance robots.txt

When dealing with Sitecore multi-site instances, SEO can be a challenge. One of the biggest problems is serving a separate robots.txt file for each site. Fortunately, Sitecore makes it possible to write a custom request processor that deals with the matter.

Step 1 – Creating a Base Template

The first step is creating a template with a single field that will store the robots.txt file contents.

Generic Robots Settings

The field type could be Multi-Line Text or Rich Text – depending on personal preference (I prefer Multi-Line Text, because I don't want HTML to accidentally appear in the robots file).

Generic Robots Settings Fields

Add Standard Values and set a default value for the robots.txt field. When using Sitecore it is important to prevent robots from crawling the "/sitecore" path, so a good default value is:


User-agent: *
Disallow: /sitecore

For development environments the value can be set to:


User-agent: *
Disallow: /

This way the crawlers won't crawl the site at all, reducing the chance of leaking something important.

After the template is ready, it should be inherited by all the different Home Page templates (the templates used for the startItems in the sites configuration). Please note that the best practice when dealing with multi-site instances is to have some type of generic Home Page template which the concrete Home Page templates inherit from. In that situation, just add the inheritance to the generic Home Page template.
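For context, the startItem mentioned above comes from the sites definition in the configuration. A multi-site setup typically looks something like the sketch below – the site names, hostnames, and content paths are hypothetical placeholders, not values from this article:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <sites>
      <!-- Hypothetical sites: each startItem resolves to a different home page item,
           and therefore to a different "Site Robots TXT" field value. -->
      <site name="site-one" patch:before="site[@name='website']"
            hostName="site-one.example.com" virtualFolder="/"
            rootPath="/sitecore/content/SiteOne" startItem="/Home"
            database="web" domain="extranet" />
      <site name="site-two" patch:before="site[@name='website']"
            hostName="site-two.example.com" virtualFolder="/"
            rootPath="/sitecore/content/SiteTwo" startItem="/Home"
            database="web" domain="extranet" />
    </sites>
  </sitecore>
</configuration>
```

With a setup like this, Sitecore.Context.Site.StartPath in the processor below resolves to a different home item per hostname, which is what makes the per-site robots.txt possible.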

After everything is set up in Sitecore, it's time for coding!

Step 2 – Creating a custom HttpRequestProcessor

Create a custom HttpRequestProcessor with the following code.


namespace Sandbox.Processors
{
    using Sitecore.Data.Items;
    using Sitecore.Pipelines.HttpRequest;
    using System;
    using System.Web;

    public class RobotsTxtProcessor : HttpRequestProcessor
    {
        public override void Process(HttpRequestArgs args)
        {
            HttpContext context = HttpContext.Current;

            if (context == null)
            {
                return;
            }

            string requestUrl = context.Request.Url.ToString();

            // Only handle requests for robots.txt; let everything else pass through.
            if (string.IsNullOrEmpty(requestUrl) || !requestUrl.ToLower().EndsWith("robots.txt"))
            {
                return;
            }

            // Fallback content, used when the field is missing or left blank.
            string robotsTxtContent = "User-agent: *" + Environment.NewLine + "Disallow: /sitecore";

            if (Sitecore.Context.Site != null && Sitecore.Context.Database != null)
            {
                // Resolve the start item (home page) of the current site.
                Item homeNode = Sitecore.Context.Database.GetItem(Sitecore.Context.Site.StartPath);

                if (homeNode != null &&
                    homeNode.Fields["Site Robots TXT"] != null &&
                    !string.IsNullOrEmpty(homeNode.Fields["Site Robots TXT"].Value))
                {
                    robotsTxtContent = homeNode.Fields["Site Robots TXT"].Value;
                }
            }

            context.Response.ContentType = "text/plain";
            context.Response.Write(robotsTxtContent);
            context.Response.End();
        }
    }
}

The code is pretty straightforward. If the requested URL ends with robots.txt, the request is intercepted and the value of the custom field is written to the response. The processor also ensures there is always a default value for robots.txt in case something goes wrong or the field was left blank.

Step 3 – Creating the custom configuration

The configuration file needs to do two things: register the new HttpRequestProcessor, and allow txt files to be handled by adding the extension to the Sitecore.Pipelines.PreprocessRequest.FilterUrlExtensions processor (otherwise the request would be filtered out before it ever reaches our processor).

The sample configuration can be seen below.


<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <preprocessRequest>
        <processor type="Sitecore.Pipelines.PreprocessRequest.FilterUrlExtensions, Sitecore.Kernel">
          <param desc="Allowed extensions (comma separated)">aspx, ashx, asmx, txt</param>
        </processor>
      </preprocessRequest>
      <httpRequestBegin>
        <processor type="Sandbox.Processors.RobotsTxtProcessor, Sandbox"
                   patch:before="processor[@type='Sitecore.Pipelines.HttpRequest.UserResolver, Sitecore.Kernel']"/>
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>

And that's it! Now you can serve a separate robots file for each of your sites, and you are one step closer to being SEO-friendly!
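To verify the setup, request robots.txt from each site's hostname and check that each one returns the value of its own home item's field. The hostnames below are placeholders – substitute the actual bindings from your sites configuration:

```shell
# Placeholder hostnames -- replace with your own site bindings.
# Each request should return that site's "Site Robots TXT" field value,
# or the default "Disallow: /sitecore" content if the field is blank.
curl -s http://site-one.example.com/robots.txt
curl -s http://site-two.example.com/robots.txt
```

If both sites return identical content, double-check that each home item actually inherits the Generic Robots Settings template and that the field has been filled in per site.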

For another implementation with ASP.NET HTTP Handler you can refer to this blog post: http://sitecoreclimber.wordpress.com/2014/07/27/sitecore-multisite-robots-txt/
