How does Crownpeak DQM scan my website?
Scanning your Website
In order for Crownpeak DQM to analyze your website, we send a web crawler to scan it and save copies of your web content. This is very similar to the way a search engine such as Google operates.
The Crownpeak DQM crawler starts at a given starting URL. It then follows every hyperlink it finds on the page, stopping when it reaches the boundaries of what it has been told to scan. These boundaries are defined by a set of rules, which can be unique to each website. The rules can be simple (for example, do not follow links outside the website domain) or more complex, telling the crawler to ignore certain areas of the website or not to follow links with certain parameters.
The crawler typically starts at a predefined starting URL and follows the steps described above. We can also configure it to explicitly accept or reject certain URL patterns, depending on your requirements.
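The crawl behaviour described above can be illustrated with a minimal sketch. This is not Crownpeak DQM's actual implementation; the function names (`crawl`, `in_scope`), the `reject_patterns` parameter, and the in-memory `SITE` dictionary standing in for real HTTP requests are all assumptions made for illustration.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def in_scope(url, start_host, reject_patterns):
    """Hypothetical boundary rules: stay on the starting host and
    skip any URL matching a rejected pattern."""
    if urlparse(url).netloc != start_host:
        return False
    return not any(re.search(pattern, url) for pattern in reject_patterns)


def crawl(start_url, fetch, reject_patterns=()):
    """Breadth-first crawl from start_url.

    fetch(url) must return the page HTML, or None if the page is unavailable.
    Returns a dict mapping each scanned URL to its saved content.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # save a copy of the page content
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and in_scope(link, start_host, reject_patterns):
                seen.add(link)
                queue.append(link)
    return pages


# A tiny in-memory "website" used instead of real network requests.
SITE = {
    "https://example.com/": (
        '<a href="/about">About</a> '
        '<a href="/search?q=x">Search</a> '
        '<a href="https://other.example.org/">External</a>'
    ),
    "https://example.com/about": (
        '<a href="/">Home</a> <a href="/private/admin">Admin</a>'
    ),
    "https://example.com/private/admin": "admin area",
    "https://example.com/search?q=x": "search results",
}

scanned = crawl(
    "https://example.com/",
    SITE.get,
    reject_patterns=[r"/private/", r"\?q="],
)
print(sorted(scanned))
```

In this sketch the external link, the `/private/` area, and the URL with the `q` parameter are never requested, so only the home page and the About page are saved.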
Using your XML Sitemap
In some cases, it may be better for Crownpeak DQM to scan the XML Sitemap of the live site. This can be a good approach if website administrators would like to control which pages Crownpeak DQM scans. It can also help when a website has unusual URL structures that cause duplication, such as dynamically generated URLs, or navigation that only works with JavaScript turned on.
When Crownpeak DQM scans an XML Sitemap, it retrieves only the links listed in it and goes no further. The Sitemap therefore needs to be up to date and correct to ensure that Crownpeak DQM scans the right content.
- Crownpeak DQM requires an online XML sitemap that is available and accessible from the website. Avoid providing a static XML file, since it would not be updated when new pages are added.
- If you have changed your URLs (for example, moving from http to https or redirecting landing pages), please contact the support team by creating a support ticket, since the dashboard scans may need to be updated to reflect the changes.
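To show why the Sitemap must be complete, here is a minimal sketch of reading only the URLs listed in an XML Sitemap. It is an illustration rather than Crownpeak DQM's actual code: the function name `urls_from_sitemap` and the sample document are assumptions, while the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemaps.org <urlset> format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the URLs in the <loc> elements of a sitemap, in document order.

    Hypothetical helper: only listed URLs are returned, and links found
    on those pages are not followed.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Sample sitemap used for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products</loc></url>
</urlset>"""

to_scan = urls_from_sitemap(SAMPLE_SITEMAP)
print(to_scan)
```

Any page missing from the Sitemap, for example a newly published page that was not added to it, would not appear in `to_scan` and so would not be scanned.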