
Hosting & Search Knowledge Base

404 pages not being cleared from SearchG2 collection

How do I troubleshoot 404 pages not being cleared from a SearchG2 collection?
Symptoms: 

After multiple crawls, pages that return a 404 are still in the SearchG2 collection.

Troubleshooting Steps:
  1. Check whether the page is still returned by search queries and/or is still present in the search collection.
  2. Using Fiddler or a similar network tool, check whether the page returns a 404 or a 302.
  3. If the result is 404, check the number of times the site has been crawled and open a ticket if applicable.
  4. If the result is 302, apply the changes described below.
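
If Fiddler is not at hand, the check in step 2 can also be scripted. The following is a minimal sketch, not part of the original procedure: the `StatusCheck` class, its `Interpret` helper, and the example URL are hypothetical names chosen for illustration. It assumes a console project that supports `async Main` (C# 7.1 or later) and uses `HttpClient` with automatic redirects turned off, so a 302 is reported as-is rather than silently followed.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class StatusCheck
{
    // Maps the status code to the matching troubleshooting step (hypothetical helper).
    public static string Interpret(HttpStatusCode status)
    {
        switch ((int)status)
        {
            case 404:
                return "404: check how many times the site has been crawled and open a ticket if applicable.";
            case 302:
                return "302: the site redirects to its error page; apply the customErrors changes.";
            default:
                return (int)status + ": not one of the two cases covered by this article.";
        }
    }

    public static async Task Main(string[] args)
    {
        // Hypothetical URL; pass the address of the missing page as the first argument.
        string url = args.Length > 0 ? args[0] : "http://www.example.com/missing-page.aspx";

        // Do not follow redirects, otherwise a 302 would be hidden behind the final response.
        var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using (var client = new HttpClient(handler))
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            Console.WriteLine(Interpret(response.StatusCode));

            // For a redirect, show where the server is sending the client.
            if (response.Headers.Location != null)
            {
                Console.WriteLine("Location: " + response.Headers.Location);
            }
        }
    }
}
```

Run it once against a URL that is known to be missing. The same check can be repeated after any configuration change to confirm that the URL now returns a 404.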

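By default, ASP.NET custom errors redirect to the error page with a 302 response instead of returning a 404. Because the missing URL does not return a 404, G2 keeps the page in the index. The URL must return a 404 response code before the page is removed. The required changes are below.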

First, make sure the 404 error handling uses ResponseRewrite:

<customErrors mode="On" redirectMode="ResponseRewrite">
   <error statusCode="404" redirect="404.aspx" />
</customErrors>

Then make sure the error page itself sets the 404 status code:

<script runat="server" language="c#">
protected void Page_Load(object sender, EventArgs e)
{
   Response.StatusCode = 404;
}
</script>