Periodic check of external URLs

RVoss
Crownpeak employee

Over time, external URLs tend to become invalid. With the following short scripts you can create a schedule entry that periodically checks external URLs and sends you an e-mail if a URL could not be requested.

1st task: script to check external URLs and store the invalid ones in the context for later use.

// collects every reference whose URL could not be opened
invalidRefs = new ArrayList();

for (ref : context.project.getExternalReferences("url", true)) {
   url = ref.referenceString;
   try {
      new URL(url).openStream().close(); // very simplified url check
   } catch (IOException e) {
      // malformed or unreachable url: remember it and log where it is used
      invalidRefs.add(ref);
      context.logWarning("found invalid url '" + url + "': " + e);
      for (usage : ref.usages) {
         context.logInfo("     used in: " + usage);
      }
   }
}

// hand the list over to the mail task and signal an error for this task
if (! invalidRefs.empty) {
   context.setProperty("invalidRefs", invalidRefs);
   context.logError("found " + invalidRefs.size() + " invalid references.");
}

2nd task: script to deactivate the following mail task if no error occurred.

context.tasks.get(context.taskIndex + 1).setActive(false);

3rd task: send an e-mail with the list of invalid URLs.

Found $CMS_VALUE(#context.getProperty("invalidRefs").size())$ invalid urls:
$CMS_FOR(ref, #context.getProperty("invalidRefs"))$
* $CMS_VALUE(ref.referenceString)$$CMS_END_FOR$

Put these three tasks into a schedule entry and mark the mail task with "Execute even in case of error". In case of an invalid URL, the 1st task causes an error, the 2nd task is not executed, and the 3rd task sends you the error mail. If no invalid URL is detected, the 2nd task is executed and deactivates the mail task.

schedule-entry.png

Of course for real projects you have to optimize the first script to ignore some internal or mailto-links.
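
One way to approach this filtering is sketched below in plain Java. The class LinkFilter, the method shouldCheck and the set of internal host names are hypothetical names made up for this illustration; they are not part of the FirstSpirit API. The idea is simply to skip everything that is not an absolute http/https URL (mailto links, relative links) and everything that points to a host you consider internal.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the FirstSpirit API.
final class LinkFilter {

    // Returns true if the reference string should be passed on to the URL check.
    // internalHosts is an assumption: lower-case host names you do not want to check.
    static boolean shouldCheck(String reference, Set<String> internalHosts) {
        if (reference == null) {
            return false;
        }
        URI uri;
        try {
            uri = new URI(reference.trim());
        } catch (URISyntaxException e) {
            // not parsable: let the real check report it as invalid
            return true;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            // relative link without a scheme, treated as internal here
            return false;
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            // mailto:, tel:, ftp: and similar are skipped
            return false;
        }
        String host = uri.getHost();
        return host == null || !internalHosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

In the first script, such a test could be placed at the top of the loop body so that skipped references never reach the network check.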

Tested with FS4.2R2.

2 Comments
thomas_walter
I'm new here

We added settings to use a proxy server, and as a first step we check two popular URLs (e.g. www.heise.de and www.google.de) to make sure our internet connection works.

Otherwise our users might get mails with lots of broken links when the problem is not the links but the connection from our server to the internet.

We also check only the HTTP response header and apply different rules for different response codes (e.g. links that require authentication are not reported as broken in our company).
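
A rough sketch of how such a check might look in plain Java follows. It is not the code described in this comment; the class LinkProbe, the Verdict values and the rule set in classify are assumptions chosen for illustration. It sends a HEAD request with a timeout, maps the status code to a verdict (401 and 403 count as "restricted" rather than broken), and offers a simple connectivity pre-check that succeeds if any of the given well-known URLs answers.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the FirstSpirit API.
final class LinkProbe {

    enum Verdict { OK, RESTRICTED, BROKEN }

    // Illustrative rules (an assumption): 2xx/3xx are fine,
    // 401/403 need a login and are not reported, everything else is broken.
    static Verdict classify(int status) {
        if (status >= 200 && status < 400) {
            return Verdict.OK;
        }
        if (status == 401 || status == 403) {
            return Verdict.RESTRICTED;
        }
        return Verdict.BROKEN;
    }

    // Requests only the response header (HEAD) and classifies the status code.
    static Verdict probe(String url, int timeoutMillis) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            con.setRequestMethod("HEAD");
            con.setConnectTimeout(timeoutMillis);
            con.setReadTimeout(timeoutMillis);
            try {
                return classify(con.getResponseCode());
            } finally {
                con.disconnect();
            }
        } catch (IOException | ClassCastException e) {
            // malformed url, unreachable host, timeout, or a non-HTTP protocol
            return Verdict.BROKEN;
        }
    }

    // Pre-check: true if at least one well-known url answers with an OK status.
    static boolean connectionLooksUp(int timeoutMillis, String... wellKnownUrls) {
        for (String url : wellKnownUrls) {
            if (probe(url, timeoutMillis) == Verdict.OK) {
                return true;
            }
        }
        return false;
    }
}
```

If connectionLooksUp returns false, the script could skip the whole run instead of reporting every link as broken. A proxy could be supplied through URL.openConnection(Proxy) instead of the no-argument variant. Note that some servers answer HEAD requests differently from GET, so a fallback to GET may be needed in practice.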

Peter_Jodeleit
Crownpeak employee
Of course for real projects you have to optimize the first script to ignore some internal or mailto-links.

You could use categories in your link templates. The editor can then choose between the link templates "email link" and "external link", which produce references with the category "email" or "url", respectively. See this blog posting.

Last update: 07-02-2010 04:24 PM