Building an Automatic XML Sitemap Generator for your SharePoint Site

Although SharePoint 2010 provides a top-notch environment for building corporate web sites, one thing it does not do is generate an XML sitemap file automatically. This is unfortunate, as the major search engines use this type of file to help discover content on your site. Luckily, the development tools for SharePoint make this relatively straightforward. Below, I’ll walk through the process of creating a branded event receiver that rebuilds the sitemap whenever a page is approved.

In order to follow along, you’ll need a copy of Visual Studio 2010 installed on a machine that also has SharePoint Server installed (SharePoint Foundation won’t cut it for this one – we’re using the publishing features). You’ll also need the Visual Studio Tools for SharePoint installed.

If you don’t want to walk through the whole creation process and just want a sitemap builder, you can download the solution file from CodePlex here. Just note that your web application will need an Internet zone for this to work properly.

1. Create an Event Receiver Project

Open Visual Studio and create a new project. Select the SharePoint node, and the Event Receiver project template. Give the Solution and the Project a name, then click OK.


The project name will be the name of the SharePoint solution. It can be changed later, but it’s much easier to get it right ahead of time. The next prompt will ask for the debugging site, and whether this is a farm, or a sandbox solution. The debugging site will need to have the publishing features enabled (this can be done later, and the debugging site can be changed through the Project Properties). Select a farm solution and click Next.

The next screen will ask what type of event receiver you want to build. The available options depend on what is available in your debugging site (chosen previously). For example, the Pages library will not be an option for the event source if the Publishing infrastructure has not been enabled. For our purposes, we want this to run whenever a page in the Pages library has been updated. Therefore we select the type to be List Item Events, the source to be a Pages library, and the event to be “An item was updated”.

Click Finish when done. The system will create a feature and an event receiver for you.

2. Make Any Branding and Name Changes

This is not absolutely necessary, but the first thing that I like to do is to change my assembly name and my root namespace to distinguish the work done by my organization from any other things installed. To do this, you open the project properties page, click the Application tab, and change them there.

The Assembly Name controls the file name of the DLL that is generated, and the root namespace controls where your classes are found in the .NET Framework. Unfortunately, changing the root namespace does not update the references to it elsewhere in the project, and if you attempt to debug the project at this point, you will receive this oh-so-helpful error:

“Error occurred in deployment step ‘Activate Features’: Operation is not valid due to the current state of the object.”

What you need to do is update all references to the old namespace in the project. Specifically, the Elements.xml file in the event receiver folder needs the correct starting namespace. Open the file for editing and replace the old root namespace with your new one.
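
For orientation, the receiver registration that Visual Studio generates usually looks something like the fragment below. This is a sketch rather than the file from this project: the root namespace Contoso.SiteMapBuilder is a made-up placeholder, and 850 is the list template ID of the Pages library. The part that matters here is the Class element, which has to begin with your new root namespace.

```xml
<?xml version="1.0" encoding="utf-8"?>
<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
  <!-- 850 = Pages library list template -->
  <Receivers ListTemplateId="850">
    <Receiver>
      <Name>EventReceiver1ItemUpdated</Name>
      <Type>ItemUpdated</Type>
      <Assembly>$SharePoint.Project.AssemblyFullName$</Assembly>
      <!-- Placeholder namespace: replace Contoso.SiteMapBuilder with your own root namespace -->
      <Class>Contoso.SiteMapBuilder.EventReceiver1</Class>
      <SequenceNumber>10000</SequenceNumber>
    </Receiver>
  </Receivers>
</Elements>
```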

Save the file, and close it if you wish, but we will be coming back to it.

Next, we want to name our feature. The feature will have an internal name that is used when it is referred to programmatically (through PowerShell, stsadm, etc.) and a display name (title) that will be used in the UI. First we’ll modify the internal name. The easiest way to do this is to open the Features folder and rename the Feature1 node. We’ll call our feature xmlSiteMapBuilder.

The tools are smart about renaming everything in the features folder. Next, double click on the feature node (in our case, xmlSiteMapBuilder). This opens the feature designer. All that we need to do here is to change the title, the description and the scope. The first two are cosmetic (but important!). However, we want our event receiver to run on all pages in the site collection, so we need to change its scope from Web to Site.
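
The designer stores these settings for you, but it can help to know roughly what ends up in the packaged feature manifest. The fragment below is only an illustration under assumptions: the Id is a dummy GUID, and the Title and Description are invented examples rather than the values used in this project. The setting to notice is Scope="Site", which makes the feature a site collection feature.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative values only: Id, Title and Description are placeholders -->
<Feature xmlns="http://schemas.microsoft.com/sharepoint/"
         Id="00000000-0000-0000-0000-000000000000"
         Title="XML Sitemap Builder"
         Description="Rebuilds sitemap.xml whenever a page is approved."
         Scope="Site">
  <ElementManifests>
    <ElementManifest Location="EventReceiver1\Elements.xml" />
  </ElementManifests>
</Feature>
```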

At this point, it’s a good idea to run the project to make sure that everything is OK. Once you’ve done so and the browser window opens, go to Site Actions > Site Settings and select Site Collection Features. You should see your feature there, in an activated state, with your title and description.

Next, we want to change the name of our event receiver to something other than “EventReceiver1”. Click on the EventReceiver1 folder and rename it, in our case to PageChangedEventReceiver. Then rename your EventReceiver1 class in a similar fashion. You will be prompted to update all references to the class when you do this, so select Yes. Unfortunately, the updates don’t completely reach our pesky Elements.xml file, so we need to make those changes manually. Open this file and change all references to the old name to use the new one.

Now we’re ready to write some code!

3. Add the Logic

You can add all of your code directly into your event receiver class. However, in our case we need to perform the same functions not only when the event fires, but also when the feature is activated. Therefore, we add a new class to the project, simply called Builder (it is declared as BuilderMain in the listing below). In addition, we will need to access the Microsoft.SharePoint.Publishing namespace, so we need to add a reference to the Microsoft.SharePoint.Publishing assembly to our project.

Without going through it line by line, our code will walk through our site collection, find all of the pages, check to see if they have been published and then build a site map entry for each one, using the URL prefix for the Internet zone. The complete code is available on the Codeplex site mentioned above, but the content of the Builder class is below.

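Imports System.IO
Imports System.Text
Imports Microsoft.SharePoint
Imports Microsoft.SharePoint.Administration
Imports Microsoft.SharePoint.Publishing

Public Class BuilderMain
    Private _siteURL As String
    ' ID of the site collection whose sitemap is being built
    Private _SiteID As Guid

    ' Accumulates the sitemap XML while the webs are walked
    Dim textWriter As StringBuilder = Nothing
    ' Date format (YYYY-MM-DD) used for the <lastmod> element
    Dim dateFormatString As String = "yyyy'-'MM'-'dd"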

   

    Public Sub New(ByVal siteID As Guid)
        _SiteID = siteID
    End Sub

    Public Sub New(ByVal Url As String)
        ' Open the site collection only long enough to read its ID, then dispose it
        Using st As New SPSite(Url)
            _SiteID = st.ID
        End Using
    End Sub

 

    Public Sub Run()
        Try
            textWriter = New StringBuilder(String.Empty)

            ' Open the site collection through the Internet zone so that URLs use its public prefix
            Using site As SPSite = New SPSite(_SiteID, SPUrlZone.Internet)
                ' RootWeb is owned by the SPSite and is released when the site is disposed
                Dim web As SPWeb = site.RootWeb
                textWriter.AppendLine("<?xml version=""1.0"" encoding=""UTF-8""?>")
                textWriter.AppendLine("<urlset xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xsi:schemaLocation=""http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"" xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">")
                LoadTreeViewForSubWebs(web) ' kick it off with the root web
                textWriter.AppendLine("</urlset>")

                ' Write (or overwrite) sitemap.xml in the root of the site collection
                Using stream As MemoryStream = New MemoryStream(Encoding.UTF8.GetBytes(textWriter.ToString()))
                    web.Files.Add("sitemap.xml", stream, True)
                End Using
            End Using
        Catch ex As Exception
            errorHandler(ex, "BuildSitemap")
        End Try
    End Sub

 

    Private Sub LoadTreeViewForSubWebs(ByVal currentWeb As SPWeb)
        Dim pPageCol As PublishingPageCollection = Nothing
        If PublishingWeb.IsPublishingWeb(currentWeb) Then
            Dim pWeb As PublishingWeb = PublishingWeb.GetPublishingWeb(currentWeb)
            pPageCol = pWeb.GetPublishingPages()
        End If

        'create xml link to site
        'to remove the link to the site (without a page), remove the next line
        writeSitemapNode(currentWeb.Url + "/", currentWeb.LastItemModifiedDate.ToString(dateFormatString))

        'create xml links to site pages
        If Not pPageCol Is Nothing Then
            LoadTreeViewForSubWebPages(pPageCol)
            pPageCol = Nothing
        End If

        'recurse into each subsite, skipping any that contain a nopost.xml marker file
        For Each web As SPWeb In currentWeb.Webs
            Try
                'SPWeb.Url has no trailing slash, so add one before the file name
                Dim _file As SPFile = web.GetFile(web.Url + "/nopost.xml")
                If Not _file.Exists Then
                    LoadTreeViewForSubWebs(web)
                End If
            Finally
                'webs obtained from the Webs collection must be disposed by the caller
                web.Dispose()
            End Try
        Next
    End Sub

 

    Private Sub LoadTreeViewForSubWebPages(ByVal currentPages As PublishingPageCollection)
        For Each page As PublishingPage In currentPages
            'only list .aspx pages that have at least one published version
            If page.Url.EndsWith("aspx") Then
                If page.ListItem.HasPublishedVersion Then
                    writeSitemapNode(page.PublishingWeb.Url + "/" + page.Url, page.LastModifiedDate.ToString(dateFormatString))
                End If
            End If
        Next
    End Sub

 

    Private Sub writeSitemapNode(ByVal pageLocation As String, ByVal lastModified As String)
        textWriter.AppendLine(vbTab + "<url>")
        'replace secured links with non-secured links, then escape characters
        'such as & that are not allowed unescaped inside an XML element
        Dim location As String = System.Security.SecurityElement.Escape(pageLocation.Replace("https:", "http:"))
        textWriter.AppendLine(vbTab + vbTab + "<loc>" + location + "</loc>")
        textWriter.AppendLine(vbTab + vbTab + "<lastmod>" + lastModified + "</lastmod>")
        textWriter.AppendLine(vbTab + "</url>")
    End Sub

 

 

    Private Sub errorHandler(ByVal errorMessage As Exception, ByVal errorLocation As String)
        Try
            'write the failure to the "Timer Jobs" event log under the "Sitemap Generator" source
            Dim _eventLog As System.Diagnostics.EventLog = New System.Diagnostics.EventLog("Timer Jobs")
            _eventLog.Source = "Sitemap Generator"
            _eventLog.WriteEntry("Error (" + errorLocation + "): " + errorMessage.Message)
            _eventLog.Close()
        Catch
            'logging is best effort; a logging failure is deliberately ignored
        End Try
    End Sub
End Class
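
To make the result concrete, the file this class writes should look roughly like the sample below. The host name www.example.com and the dates are invented for illustration; the real entries depend on your Internet zone URL and on the pages in your site collection.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<!-- one entry per web, followed by one entry per published page (sample values) -->
	<url>
		<loc>http://www.example.com/</loc>
		<lastmod>2011-03-15</lastmod>
	</url>
	<url>
		<loc>http://www.example.com/Pages/default.aspx</loc>
		<lastmod>2011-03-15</lastmod>
	</url>
</urlset>
```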

 

Next, we need to call our builder from our event receiver. Our code will go into the ItemUpdated sub. The builder constructor takes either a URL or a site ID as an argument, and since the site ID can be obtained through the properties object, our job is pretty straightforward. All we need to do is check whether the item has been approved.

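Public Overrides Sub ItemUpdated(ByVal properties As SPItemEventProperties)
    MyBase.ItemUpdated(properties)
    ' Guard against a missing ModerationInformation (e.g. content approval not enabled on the library)
    Dim moderation As SPModerationInformation = properties.ListItem.ModerationInformation
    If moderation IsNot Nothing AndAlso moderation.Status = SPModerationStatusType.Approved Then
        ' Rebuild the sitemap for the whole site collection
        Dim smb As New BuilderMain(properties.SiteId)
        smb.Run()
    End If
End Sub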

 

4. Add a Feature Receiver

Of course, we don’t want to wait until a page is edited; we want to build a sitemap as soon as the feature is activated. For that, we need a feature event receiver. Simply right click on the feature node (in this case, xmlSiteMapBuilder) and select Add Event Receiver. The designer will open the new class, in which the four event handlers are commented out. Uncomment the FeatureActivated sub and add the required code.

Public Overrides Sub FeatureActivated(ByVal properties As SPFeatureReceiverProperties)
    ' The feature is scoped to Site, so its parent is the site collection
    Dim Parent As SPSite = CType(properties.Feature.Parent, SPSite)
    Dim smb As New BuilderMain(Parent.ID)
    smb.Run()
End Sub

 

We don’t need to clean anything up when the feature is deactivated, so this is the only code that we need to add. Go ahead and run the project, and you should find a brand new sitemap.xml file in the root of your site collection. You can use SharePoint Designer to see it, or just use the browser with a URL of http://yoursitecollectionurl/sitemap.xml
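
If you would rather check the result from code than by eye, a small console program can load the file and list what it contains. The sketch below is not part of the CodePlex solution; the SitemapCheck module is a hypothetical helper. It assumes the sitemap address is passed as the first command-line argument, that the file can be read anonymously, and that the project references System.Xml.Linq.

```vbnet
Imports System
Imports System.Xml.Linq

' Hypothetical helper, not part of the original solution
Module SitemapCheck
    Sub Main(ByVal args As String())
        ' args(0): address of the sitemap, e.g. http://yoursitecollectionurl/sitemap.xml
        Dim doc As XDocument = XDocument.Load(args(0))

        ' All sitemap elements live in the sitemaps.org namespace
        Dim ns As XNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9"

        Dim count As Integer = 0
        For Each entry As XElement In doc.Root.Elements(ns + "url")
            Dim locElement As XElement = entry.Element(ns + "loc")
            If locElement IsNot Nothing Then
                Console.WriteLine(locElement.Value)
                count += 1
            End If
        Next
        Console.WriteLine("{0} URL(s) listed", count)
    End Sub
End Module
```

Because XDocument.Load throws if the file is not well-formed XML, a clean run also confirms that the generated sitemap parses.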

That’s all there is to it. A little bit of code, and you’re well on your way to Search Engine Optimization.

4 comments

  1. Hi John,

    I tried this i was able to successfully deploy the collection feature but after i approve the page it is hitting error at
    smb.Run()
    in the method
    Public Overrides Sub ItemUpdated(ByVal properties As SPItemEventProperties)
    in the file
    PageChangeEventReceiver.vb

    the below exception:

    System.MissingMethodException was caught
    Message=Method not found: ‘Void Microsoft.SharePoint.SPSite..ctor(System.Guid, Microsoft.SharePoint.Administration.SPUrlZone)’.
    Source=WIRB.SiteMapBuilderDev
    StackTrace:
    at SiteMapBuilderDev.BuilderMain.Run()
    at SiteMapBuilderDev.PageChangedEventReceiver.ItemUpdated(SPItemEventProperties properties)
    InnerException:

    I updated my sharepoint 2010 with latest patches etc. still the same thing. Please advise.

    thanks,
    Seshu

    If Not smb Is Nothing Then
    smb.Run()
    End If

  2. Seshu – It would appear that you’re not using an “Internet” zone. The solution requires one to function. Try adding an “Internet” zone and see if that helps.

  3. Hi Olivia, I would recommend trying DYNO Mapper (http://www.dynomapper.com) visual sitemap generator. With DYNO Mapper you simply input any URL into the system and it will pull thousands of pages for any website and display all the pages into one sitemap.
