Creating Sitemaps for Search Engines

dleinich

Search engine web crawlers usually discover pages by following links from one page of your site to another. With this approach, search engines may miss some pages of your site, especially pages that are not linked directly. Furthermore, it is hard for a search engine to determine how important a page is, how often it is updated, whether more than one URL points to the same page, and so on.

The easiest way to help search engines such as Google, Yahoo! and Microsoft's Bing find all the content of your site is to provide an XML-based Sitemap. The Sitemap lists all URLs of a site along with additional metadata about each URL, so that search engines can crawl your site more intelligently.

Fortunately, the big three search engines mentioned above, as well as many others, have agreed on a standard format for Sitemaps, which is explained in detail at www.sitemaps.org.

This article will explain how to easily create such a Sitemap using the navigation function available in FirstSpirit.


The Sitemap Protocol

In its most basic form, the Sitemap is a simple list of all URLs of your website. You can use several additional XML tags to give search engines more information about these URLs and thus about your content. The example below shows a simple XML Sitemap that contains only one URL and uses all optional tags.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2014-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>

A Sitemap always starts with a urlset element that contains one url entry per URL. Each url element requires a loc element as a child; all other child elements are optional.
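For comparison, a minimal Sitemap can consist of loc elements only. The sketch below uses two hypothetical URLs. As the protocol requires, each loc is a full URL including the protocol, and special characters such as the ampersand in the second URL's query string are entity-escaped.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- Only loc is required; lastmod, changefreq and priority are omitted. -->
    <url>
        <loc>http://www.example.com/</loc>
    </url>
    <!-- Hypothetical URL with a query string: "&" must be written as "&amp;". -->
    <url>
        <loc>http://www.example.com/catalog?item=12&amp;desc=vacation</loc>
    </url>
</urlset>
```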

The Sitemap protocol supports a wide variety of options and features. You can, for example, use multiple Sitemap files and group them in a single Sitemap index file. Not all of these features can be covered within the scope of this article, which focuses on how to create Sitemaps in FirstSpirit. Please refer to the explanation at www.sitemaps.org for further details.
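As a sketch of the Sitemap index file mentioned above: the index lists the individual Sitemap files, each with a required loc element and an optional lastmod element. The file names are hypothetical. Splitting becomes necessary, for example, because a single Sitemap file may contain at most 50,000 URLs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- One sitemap entry per Sitemap file (hypothetical file names). -->
    <sitemap>
        <loc>http://www.example.com/sitemap-de.xml</loc>
        <lastmod>2014-01-01T12:00:00+01:00</lastmod>
    </sitemap>
    <!-- lastmod is optional here as well. -->
    <sitemap>
        <loc>http://www.example.com/sitemap-en.xml</loc>
    </sitemap>
</sitemapindex>
```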

Preparing your Project

FirstSpirit can determine the values for the loc and lastmod tags automatically, so nothing needs to be prepared for them. The values for the changefreq and priority tags, on the other hand, have to be defined by an editor who knows the content.

If you want to use the latter tags, you have to give editors a way to add and modify these values. The example below assumes that the following two input components, md_sitemap_changefreq and md_sitemap_priority, have been defined in the metadata template as comboboxes that allow only the values permitted by the Sitemap protocol.

<CMS_INPUT_COMBOBOX name="md_sitemap_changefreq" noBreak="yes" singleLine="no" useLanguages="yes">
    <ENTRIES>
        <ENTRY value="always"><LANGINFOS><LANGINFO lang="*" label="always"/></LANGINFOS></ENTRY>
        <ENTRY value="hourly"><LANGINFOS><LANGINFO lang="*" label="hourly"/></LANGINFOS></ENTRY>
        <ENTRY value="daily"><LANGINFOS><LANGINFO lang="*" label="daily"/></LANGINFOS></ENTRY>
        <ENTRY value="weekly"><LANGINFOS><LANGINFO lang="*" label="weekly"/></LANGINFOS></ENTRY>
        <ENTRY value="monthly"><LANGINFOS><LANGINFO lang="*" label="monthly"/></LANGINFOS></ENTRY>
        <ENTRY value="yearly"><LANGINFOS><LANGINFO lang="*" label="yearly"/></LANGINFOS></ENTRY>
        <ENTRY value="never"><LANGINFOS><LANGINFO lang="*" label="never"/></LANGINFOS></ENTRY>
    </ENTRIES>
    <LANGINFOS>
        <LANGINFO lang="*" label="Sitemap - Change Frequency"/>
    </LANGINFOS>
</CMS_INPUT_COMBOBOX>

<CMS_INPUT_COMBOBOX name="md_sitemap_priority" singleLine="no" useLanguages="yes">
    <ENTRIES>
        <ENTRY value="0.0"><LANGINFOS><LANGINFO lang="*" label="0.0"/></LANGINFOS></ENTRY>
        <ENTRY value="0.1"><LANGINFOS><LANGINFO lang="*" label="0.1"/></LANGINFOS></ENTRY>
        <ENTRY value="0.2"><LANGINFOS><LANGINFO lang="*" label="0.2"/></LANGINFOS></ENTRY>
        <ENTRY value="0.3"><LANGINFOS><LANGINFO lang="*" label="0.3"/></LANGINFOS></ENTRY>
        <ENTRY value="0.4"><LANGINFOS><LANGINFO lang="*" label="0.4"/></LANGINFOS></ENTRY>
        <ENTRY value="0.5"><LANGINFOS><LANGINFO lang="*" label="0.5"/></LANGINFOS></ENTRY>
        <ENTRY value="0.6"><LANGINFOS><LANGINFO lang="*" label="0.6"/></LANGINFOS></ENTRY>
        <ENTRY value="0.7"><LANGINFOS><LANGINFO lang="*" label="0.7"/></LANGINFOS></ENTRY>
        <ENTRY value="0.8"><LANGINFOS><LANGINFO lang="*" label="0.8"/></LANGINFOS></ENTRY>
        <ENTRY value="0.9"><LANGINFOS><LANGINFO lang="*" label="0.9"/></LANGINFOS></ENTRY>
        <ENTRY value="1.0"><LANGINFOS><LANGINFO lang="*" label="1.0"/></LANGINFOS></ENTRY>
    </ENTRIES>
    <LANGINFOS>
        <LANGINFO lang="*" label="Sitemap - Priority"/>
    </LANGINFOS>
</CMS_INPUT_COMBOBOX>


You should either make these fields mandatory using FirstSpirit rules, as described in a previous article on Inside FirstSpirit, or define a preset, for example 0.5 for the priority, which is also the default value defined by the Sitemap protocol. Otherwise, pages without a selected value produce empty changefreq and priority elements, which are not valid values according to the protocol. It is also reasonable to display these input components only when the metadata of a page is being edited, which can likewise be achieved with FirstSpirit rules.

Depending on your project, it might also make sense to define this metadata at page reference level instead of on the page, as done in this example. Keeping the input components on the page makes it easier for editors to fill in the required information, since they can edit it along with the content. Defining the information on the page reference, on the other hand, makes it possible to set different metadata for a page that is referenced more than once in the site structure. In that case, the rule for displaying the input components mentioned above, as well as the navigation function explained below, would need to be modified accordingly.

Generating the Sitemap

Now it is time to merge the information available in FirstSpirit with the editorial information described in the previous section into the final Sitemap. To do this, we create a new page template and use the FirstSpirit navigation function as shown below.

<CMS_HEADER>
    <CMS_FUNCTION name="Navigation" resultname="sitemap">
        <CMS_PARAM name="expansionVisibility" value="all"/>
        <CMS_PARAM name="wholePathSelected" value="0"/>
        <CMS_PARAM name="siteMap" value="1"/>
        <CMS_ARRAY_PARAM name="pageRefRendering">
            <CMS_ARRAY_ELEMENT index="0..10"><![CDATA[
                $-- Skip page references that point to an external URL --$
                $CMS_IF(!#nav.ref.getPageLangSpec(#global.language).useExternalUrl())$
                    $-- Content projection: render one url entry per entity --$
                    $CMS_IF(#nav.ref.getMultiPageParams(#global.language, #global.templateSet).getPageCount() > 1 && !#nav.ref.getMultiPageParams(#global.language, #global.templateSet).getData().isEmpty)$
                        $CMS_FOR(multiPage, #nav.ref.getMultiPageParams(#global.language, #global.templateSet).getData())$
                            <url>
                                <loc>$CMS_VALUE(ref(#nav.ref, abs:1, contentId:multiPage.getId()).url.toString().split("/").map(x->x.urlEncode).toString("/"))$</loc>
                                <lastmod>$CMS_VALUE(multiPage.getLastChange().format("yyyy-MM-dd'T'HH:mm:ssZ").substring(0, 22)+":00")$</lastmod>
                                <changefreq>$CMS_VALUE(#nav.ref.page.meta("md_sitemap_changefreq"))$</changefreq>
                                <priority>$CMS_VALUE(#nav.ref.page.meta("md_sitemap_priority"))$</priority>
                            </url>
                        $CMS_END_FOR$
                    $CMS_ELSE$
                        $-- Regular page reference: render a single url entry --$
                        <url>
                            <loc>$CMS_VALUE(ref(#nav.ref, abs:1).url.toString().split("/").map(x->x.urlEncode).toString("/"))$</loc>
                            <lastmod>$CMS_VALUE(#nav.ref.page.changeDate().format("yyyy-MM-dd'T'HH:mm:ssZ").substring(0, 22)+":00")$</lastmod>
                            <changefreq>$CMS_VALUE(#nav.ref.page.meta("md_sitemap_changefreq"))$</changefreq>
                            <priority>$CMS_VALUE(#nav.ref.page.meta("md_sitemap_priority"))$</priority>
                        </url>
                    $CMS_END_IF$
                $CMS_END_IF$
            ]]></CMS_ARRAY_ELEMENT>
        </CMS_ARRAY_PARAM>
    </CMS_FUNCTION>
</CMS_HEADER><?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
$CMS_VALUE(sitemap)$
</urlset>


Let us take a closer look at the parts of the navigation function: Setting the CMS_PARAM siteMap to 1 ensures that only those page references and folders for which the option “Display navigation menu in sitemap?” is enabled are considered for rendering. We then use <CMS_ARRAY_PARAM name="pageRefRendering"> to render the template fragment for each page reference of the menu; the index range 0..10 of the CMS_ARRAY_ELEMENT applies the same fragment to menu levels 0 through 10.

Using conditionals, we first check whether the current element points to an external URL; if it does, it is not added to the Sitemap. Otherwise, we check whether the current element is a content projection, that is, whether it is split into more than one page and actually contains data. If it is a content projection, the template fragment is rendered once for every entity of the content projection; otherwise, it is rendered only once for the page reference itself.

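For every item identified, we now render the following information:

  • The URL of the current page or content projection element, with each path segment URL-encoded so that special characters are escaped. For content projection elements, the contentId of the entity is passed to ref() in addition.

<loc>$CMS_VALUE(ref(#nav.ref, abs:1).url.toString().split("/").map(x->x.urlEncode).toString("/"))$</loc>


  • The date on which this page or content projection element was last changed, in W3C Datetime format. The date pattern yields a time zone offset such as +0100, whereas W3C Datetime requires +01:00, so the expression cuts the string after the hours of the offset and appends ":00". This assumes a time zone whose offset is a whole number of hours.

<lastmod>$CMS_VALUE(#nav.ref.page.changeDate().format("yyyy-MM-dd'T'HH:mm:ssZ").substring(0, 22)+":00")$</lastmod>


  • The change frequency as defined in the metadata of the page.

<changefreq>$CMS_VALUE(#nav.ref.page.meta("md_sitemap_changefreq"))$</changefreq>


  • The priority as defined in the metadata of the page.

<priority>$CMS_VALUE(#nav.ref.page.meta("md_sitemap_priority"))$</priority>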

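After defining the navigation function in the header, we simply output its result wrapped in the urlset element defined by the Sitemap protocol, as explained above. Note that the XML declaration is placed directly after </CMS_HEADER> on the same line: an XML declaration must be the very first content of the file, and any whitespace output before it would make the Sitemap invalid XML.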
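For illustration, here is a sketch of what the generated Sitemap might look like for a hypothetical project with one regular page and a content projection with two entities, formatted for readability. All URLs, dates and metadata values are invented, and the example assumes that the project's URL generation produces full URLs including protocol and host, as the protocol requires for loc. The actual URL pattern of content projection entries depends on your project's configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- Regular page reference (hypothetical values) -->
    <url>
        <loc>http://www.example.com/en/about/company.html</loc>
        <lastmod>2014-03-28T09:15:00+01:00</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.5</priority>
    </url>
    <!-- Content projection: one url entry per entity. changefreq and priority
         are read from the metadata of the page, so they are the same for all entities. -->
    <url>
        <loc>http://www.example.com/en/news/news-1.html</loc>
        <lastmod>2014-03-30T17:42:10+02:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.8</priority>
    </url>
    <url>
        <loc>http://www.example.com/en/news/news-2.html</loc>
        <lastmod>2014-03-31T08:05:00+02:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>
```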

The only thing left to do now is to create a page based on your new Sitemap page template, reference it in the site structure, and then generate and deploy everything to your live system.

Keep in mind that you need to tell search engines about the new Sitemap so that they can use it to crawl your site. The easiest way to do this is to add the following line to your robots.txt, replacing the URL with the actual URL of your Sitemap:

Sitemap: http://www.example.com/sitemap.xml

Alternatively, you can check the documentation of the individual search engines to learn how to submit the Sitemap to them directly.

If you need a more sophisticated solution, take a look at the XML Sitemap Generator in our Marketplace. The XML Sitemap Generator by our partner TWT provides additional functionality, such as automatically splitting large Sitemaps into smaller chunks.

5 Comments
TimoMeister

Nice article, very helpful. Thank you very much.

In our HTML channel, there are a few templates that, under certain conditions, are flagged with $CMS_SET(#global.stopGenerate,true)$ and should therefore not appear in the Sitemap. This mainly concerns content projections.

Is there a way to check for this?

dleinich

Unfortunately, pages whose generation is aborted with $CMS_SET(#global.stopGenerate,true)$ cannot easily be removed from the Sitemap, because this information is not known at the time the Sitemap is generated.

You can, of course, additionally check the criteria that cause generation to be aborted in the navigation function when creating the Sitemap. However, this can become very complex and time-consuming, depending on which criteria, and how many, have to be taken into account.

Alternatively, you can modify the Sitemaps afterwards, once it is actually clear which files were generated, for example with an additional action in the schedule that compares the URLs of the nodes in sitemap.xml with the files actually generated in the file system.

jstreit

Hi Daniel,

thanks for the article!

How do you handle additional alternate content, i.e. language versions of the pages?
Reference from Google: https://googlewebmastercentral.blogspot.de/2012/05/multilingual-and-multinational-site.html

Regards

choff

Hi Daniel,

very helpful, thanks a lot!

I am wondering why #nav.ref.page.changeDate() actually works. As far as I understand, #nav.ref is a PageRef, so #nav.ref.page is a Page. But according to the Access-API, a Page does not have a method changeDate(). If I take a PageRef and look at it in the beanshell console, I get an error if I try to invoke changeDate():

bsh % pageref = e;
<<PAGEREF editor="2786" id="94055" pageref="91555" perm="90404:2047,90405:0,90408:515,90407:515" releaseRevision="214977" releasedby="2786" revision="214977" uniquedescription="home_1" workflowPerm="w136-g90404-g90407">
<LANG displayname="Startseite" language="DE"/>
<PAGE_LANG_SPEC language="DE" showinpagegrp="1" showinsitemap="1"/>
</PAGEREF>
>
bsh % page = pageref.page;
<<PAGE editor="2786" id="91555" name="navhome" pagetemplate="3" releaseRevision="214943" releasedby="2786" revision="214943" translated="DE">
<LANG displayname="navHome" language="DE"/>
</PAGE>
>
bsh % page.changeDate();
// Error: EvalError: Error in method invocation: Method changeDate() not found in class'de.espirit.firstspirit.store.access.pagestore.PageImpl' : at Line: 1 : in file: <unknown file> : page .changeDate ( )
bsh % pageref.changeDate();
// Error: EvalError: Error in method invocation: Method changeDate() not found in class'de.espirit.firstspirit.store.access.sitestore.PageRefImpl' : at Line: 1 : in file: <unknown file> : pageref .changeDate ( )
bsh %

What is happening there, why does it work in the Navigation? Is this documented somewhere?

Thanks and best regards,

Christian

mbergmann
Crownpeak employee

Hi Christian,

the method is documented here:

FirstSpirit online documentation - page-related #global calls

#global.page.changeDate

Michael

Last update: 03-31-2014 04:06 AM