IanHall
Charter Member

Scanning a site to include orphan pages

What is the best way to include orphan pages when we want the entire website scanned?

RandySoulis
Crownpeak Employee

We can do this a couple of ways, depending on the number of orphan pages you have.

If there are just a few orphan pages, you can submit a case to Crownpeak support with the following information:

  1. Links to all the orphan pages to include.
  2. The dashboard name to update or create.

If there are more than a few orphan pages, the best way is to set up a sitemap file that includes all the pages you want scanned, including any orphan pages. Then submit a case to Crownpeak support that includes the following information:

  1. The URL to the sitemap file.
  2. The name of the dashboard to update or create.
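For reference, a sitemap file follows the sitemaps.org protocol: a `<urlset>` element containing one `<url>`/`<loc>` entry per page. The Python sketch below writes such a file, assuming you already have the full list of page URLs (orphans included) from your own CMS or records. The `build_sitemap` helper and the example.com URLs are hypothetical and not part of any Crownpeak tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return a sitemaps.org XML sitemap listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:  # skip duplicates, keeping first-seen order
            continue
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes "&" as "&amp;"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs; the second stands in for an orphan landing page.
pages = [
    "https://www.example.com/",
    "https://www.example.com/landing/spring-offer",
]
print(build_sitemap(pages))
```

The output would then be hosted at a public URL, and that URL is what goes in the support case.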

--


Randy Soulis
Applications Support Specialist

## If I’ve helped, accept this response as a solution so that others can find it more quickly in the future.
## Have thoughts on Crownpeak products? We'd love to hear them. Speak with the Crownpeak Product Team.

Thanks Randy

I suppose the issue is that landing pages are created all the time, and I'm not involved, so I don't know anything about them. Hopefully the HTTrack file you're now using resolves this issue.


Would the sitemap file be used instead of the previous crawl, or could it be used in conjunction with the crawl?

My example would be a folder of orphaned pages that are added or deleted on a fairly regular basis. Could we have Crownpeak crawl the regular linked pages and then refer to a sitemap file (not "the" site sitemap) to complete the scan?

RandySoulis
Crownpeak Employee

Hello Jase,

We can scan a site one of two ways:

1. Natural crawl. With this method, we start from a URL (e.g. the home page) and scan that page for links to other pages. We then crawl those pages looking for additional links. This continues until we have found every page that can be reached by following links.

2. Sitemap file. With this method, we scan only the pages listed in a sitemap file. Any page not listed in the sitemap file will not be pulled into the inventory or checked against the checkpoints.

So we cannot scan a site using the natural crawl method and then scan the sitemap file for any additional pages. We scan the site(s) using one of the methods above, not a combination of the two.
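To make the difference between the two modes concrete, here is a toy Python model. It is not Crownpeak's actual crawler, and it uses a hypothetical in-memory link map instead of live HTTP requests. A breadth-first natural crawl from `/` never reaches `/landing/promo` because nothing links to it, while a sitemap scan returns exactly the listed pages and nothing else.

```python
from collections import deque


def natural_crawl(start_url, links):
    """Toy breadth-first crawl: follow links outward until no new pages turn up."""
    found = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found


def sitemap_scan(sitemap_urls):
    """Toy sitemap mode: the inventory is exactly the listed pages."""
    return set(sitemap_urls)


# Hypothetical site: /landing/promo exists, but no page links to it (an orphan).
links = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/widget"],
    "/landing/promo": ["/products"],
}

print(sorted(natural_crawl("/", links)))  # ['/', '/about', '/products', '/products/widget']
print(sorted(sitemap_scan(["/", "/landing/promo"])))  # ['/', '/landing/promo']
```

Neither mode alone produces the union of both sets, which is why a sitemap used for orphan pages needs to list every page you want scanned, not only the orphans.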

--


Randy Soulis
Applications Support Specialist


Hi there,

Thanks for your answer - one further question about how Crownpeak crawls our site - does it recognise the standard meta tags of Noindex, Disallow and Nofollow?

In other words, can we Noindex a page so that search engines leave it out of their results, while still allowing Crownpeak to crawl and check it?

Thanks

ArisRamos
Crownpeak (Retired)

A DQM dashboard would normally scan a website and bring in pages regardless of whether they are set to Noindex or Nofollow. That said, the support team can configure a dashboard not to bring in Nofollow pages.

So yes, you can add Noindex to a page so that search engines will not index it, and DQM will still assess it for quality checks.
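As a rough illustration of how a scanner can read these directives, the Python sketch below collects the values of `<meta name="robots" content="...">` tags using the standard library's `html.parser`. `scanner_policy` and its `respect_nofollow` flag are hypothetical stand-ins for the behaviour described above (assess the page regardless, and skip its links only when configured to); they are not real DQM settings. Disallow is a robots.txt rule rather than a robots meta value, so this sketch does not cover it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def scanner_policy(html, respect_nofollow=False):
    """Hypothetical QA-scanner policy: always assess the page; skip its links
    only when the (hypothetical) respect_nofollow option is switched on."""
    directives = robots_directives(html)
    follow_links = not (respect_nofollow and "nofollow" in directives)
    return {"assess_page": True, "follow_links": follow_links}


page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
print(scanner_policy(page))  # {'assess_page': True, 'follow_links': True}
```

The key point the sketch mirrors is that Noindex only affects search-engine indexing; nothing in it stops the page itself from being assessed.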

--


Aris Ramos
Head of DQM Product Management, CSPO, CSM

