How do I troubleshoot 404 pages not being cleared from a SearchG2 collection?
Symptoms:
After multiple crawls, 404 pages are still in the SearchG2 collection
Troubleshooting Steps:
- Check whether the page is still returned by search queries and/or still appears in the search collection.
- Using Fiddler or another HTTP debugging tool, check whether the page returns a 404 or a 302.
- If the result is 404, check how many times the site has been crawled, and open a ticket if applicable.
- If the result is 302, read on.
By default, ASP.NET custom errors redirect the request to the error page with a 302 instead of returning a 404. As a result, G2 keeps the page in the index. The page must return a 404 response code in order to be removed. The required changes are below.
First, make sure the 404 error handling in web.config uses ResponseRewrite:
<customErrors mode="On" redirectMode="ResponseRewrite">
  <error statusCode="404" redirect="404.aspx" />
</customErrors>
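Then make sure the error page itself (404.aspx in this example) sets the 404 status code:
<script runat="server" language="c#">
  protected void Page_Load(object sender, EventArgs e)
  {
    // Return a real 404 so the crawler can remove the page from the index.
    Response.StatusCode = 404;
  }
</script>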
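The status code check from the troubleshooting steps can also be scripted, which is handy for confirming that a page returns 404 after the changes above. The following is only a minimal sketch, not part of the product: the class name StatusCheck, the Classify helper, and the example URL are hypothetical, and it assumes a .NET runtime where HttpClient is available. Automatic redirects are turned off so that a 302 is reported as a 302 rather than as the status of the error page it points to.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class StatusCheck
{
    // Hypothetical helper: maps a status code to the outcome described in this article.
    public static string Classify(int statusCode)
    {
        switch (statusCode)
        {
            case 404: return "404: the crawler can remove the page";
            case 302: return "302: redirect, the page stays in the index";
            default:  return statusCode + ": not covered by this article";
        }
    }

    public static async Task Main(string[] args)
    {
        // Hypothetical URL; pass the address of the removed page as the first argument.
        string url = args.Length > 0 ? args[0] : "http://www.example.com/removed-page.aspx";

        // Do not follow redirects, so the original response status is what gets reported.
        using (var handler = new HttpClientHandler { AllowAutoRedirect = false })
        using (var client = new HttpClient(handler))
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            Console.WriteLine(url + " -> " + Classify((int)response.StatusCode));

            // A redirect response carries the target in the Location header.
            if (response.Headers.Location != null)
            {
                Console.WriteLine("Redirects to: " + response.Headers.Location);
            }
        }
    }
}
```

Run it once against a page that no longer exists. If it reports 302, the custom error configuration is still redirecting; if it reports 404, the next crawls should be able to drop the page.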