IanHall
Charter Member

Scanning a site to include orphan pages

What is the best way to include orphan pages when we want the entire website scanned?

RandySoulis
Crownpeak Employee

Re: Scanning a site to include orphan pages

We can do this in one of two ways, depending on how many orphan pages you have.

If there are just a few orphan pages then you can submit a case to Crownpeak support with the following information:

  1. Links to all the orphan pages to include.
  2. The dashboard name to update or create.

If there are more than a few orphan pages, the best way is to set up a sitemap file that lists all the pages you want scanned, including any orphan pages. Then submit a case to Crownpeak support with the following information:

  1. The URL to the sitemap file.
  2. The name of the dashboard to update or create.
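
For anyone who needs to build such a file by hand, here is a minimal sketch in Python, assuming the standard sitemaps.org XML format is what gets submitted. The function name `build_sitemap` and the example.com URLs are hypothetical and only for illustration; check with support if a different format is expected.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each URL once, in the order given."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:  # skip duplicates so each page is listed once
            continue
        seen.add(url)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs: regular linked pages plus orphan landing pages.
    pages = [
        "https://www.example.com/",
        "https://www.example.com/products",
        "https://www.example.com/landing/spring-sale",  # orphan page
    ]
    with open("sitemap.xml", "w", encoding="utf-8") as fh:
        fh.write(build_sitemap(pages))
```

The resulting `sitemap.xml` would then be published at a URL that can be included in the support case.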
IanHall
Charter Member

Re: Scanning a site to include orphan pages

Thanks, Randy.

I suppose the issue is that landing pages are created all the time, and since I'm not involved I don't know anything about them. Hopefully the HTTrack file you're now using resolves this issue.

jase
New Creator

Re: Scanning a site to include orphan pages

Would the sitemap file be used instead of the previous crawl, or could it be used in conjunction with the crawl?

My example would be a folder of orphaned pages that are added or deleted on a fairly regular basis - could we have Crownpeak crawl the regular linked pages and then refer to a site map file (not "the" site sitemap) to complete the search?

RandySoulis
Crownpeak Employee

Re: Scanning a site to include orphan pages

Hello Jase,

We can scan a site in one of two ways:

1. Natural crawl. With this method, we use a starting URL (e.g. the home page) and scan that page for links to other pages. We then crawl those pages looking for additional links. That process continues until we have found every page that has a link pointing to it.

2. Sitemap file. With this method, we use a sitemap file and scan only the pages listed in it. Any page not listed in the sitemap file will not be pulled into the inventory or checked against the checkpoints.

So, we cannot scan a site using the natural crawl method and then scan the sitemap file for any additional pages. We scan the site(s) using one of the methods above, not a combination of the two.
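
To illustrate why a natural crawl never reaches orphan pages, here is a small sketch of the link-following idea described above. It is only an illustration over a made-up, in-memory link map, not the actual crawler; `natural_crawl` and the page paths are hypothetical names.

```python
from collections import deque


def natural_crawl(start_url, links):
    """Breadth-first traversal: return every page reachable from start_url."""
    found = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in found:  # only queue pages we have not seen yet
                found.add(target)
                queue.append(target)
    return found


# Hypothetical site: each key is a page, each value the pages it links to.
links = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget"],
    "/about": ["/"],
    "/landing/spring-sale": ["/products"],  # orphan: no page links to it
}

all_pages = set(links) | {t for targets in links.values() for t in targets}
orphans = all_pages - natural_crawl("/", links)
print(sorted(orphans))  # prints ['/landing/spring-sale']
```

Because nothing links to the landing page, the crawl starting at the home page never discovers it, which is why such pages have to be listed in a sitemap file to be scanned.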