Resource Database Link Checker Specification

Functionality

The main program will run daily as a cron job. It will read a configuration file listing the tables and columns to check, so that it is not tied to a particular version of the resource database. For each configured table/column in the resource database, the checker should request each URL and obtain the HTTP status (probably using the LWP library). The checking itself will be done by a separate module, which might later also provide some functionality for the CAS service we have been discussing.

Broken links (those whose status is neither "OK" nor "Authentication required") should be reported by email, in the first instance to the appropriate administrator, with as much information as possible. (When a Web interface for the RDB is developed, this information should also be available there.)

If the configuration file specifies where to store link-last-checked information in the database, the checker will update it. The checker will store the HTTP status code returned in a link-last-checked-status field, whether or not it is 200 (OK). Availability can be affected by factors other than the Web server delivering the resource (e.g. a local network outage), and a request that times out produces no HTTP status at all, so a separate code for a timeout will also be needed.
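The checking step described above might look roughly like the following sketch. It is written in Python using only the standard library (urllib in place of the Perl LWP library the specification suggests). Everything in it is an assumption made for illustration: the INI-style configuration format, the function names, the negative sentinel codes for timeouts and other failures, and the mapping of "Authentication required" to HTTP 401.

```python
import configparser
import http.client
import socket
import urllib.error
import urllib.request
from email.message import EmailMessage

# Hypothetical sentinel codes for failures that produce no HTTP response.
# Negative values cannot collide with real HTTP status codes.
TIMEOUT_STATUS = -1
NO_RESPONSE_STATUS = -2
INVALID_URL_STATUS = -3

# Statuses treated as "not broken": 200 (OK) and, by assumption,
# 401 for "Authentication required".
OK_STATUSES = {200, 401}


def parse_config(text):
    """Parse a hypothetical INI-style config with one [table.column]
    section per column to check and optional last-checked columns."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    targets = []
    for section in parser.sections():
        table, column = section.split(".", 1)
        options = parser[section]
        targets.append({
            "table": table,
            "column": column,
            # None means "nowhere to record this in the database".
            "last_checked_column": options.get("last_checked_column"),
            "status_column": options.get("status_column"),
        })
    return targets


def check_url(url, timeout=30):
    """Return the final HTTP status for url (urlopen follows redirects),
    or a sentinel code if no valid HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # a real 4xx/5xx status
        return err.code
    except urllib.error.URLError as err:  # failed before any response
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return TIMEOUT_STATUS
        return NO_RESPONSE_STATUS
    except (socket.timeout, TimeoutError):  # timed out awaiting a response
        return TIMEOUT_STATUS
    except (OSError, http.client.HTTPException):  # reset, bad status line
        return NO_RESPONSE_STATUS
    except ValueError:  # malformed URL, e.g. unknown or missing scheme
        return INVALID_URL_STATUS


def is_broken(status):
    """A link is broken unless its status is one of OK_STATUSES."""
    return status not in OK_STATUSES


def describe_status(status):
    """Human-readable description of a status or sentinel code."""
    if status == TIMEOUT_STATUS:
        return "timed out"
    if status == NO_RESPONSE_STATUS:
        return "no valid HTTP response"
    if status == INVALID_URL_STATUS:
        return "malformed URL"
    return f"HTTP {status}"


def build_report(broken, sender, recipient):
    """Build (but do not send) an email to the administrator.
    broken is a list of (table, column, row_id, url, status) tuples."""
    msg = EmailMessage()
    msg["Subject"] = f"Link checker: {len(broken)} broken link(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(
        f"{table}.{column} row {row_id}: {url} -> {describe_status(status)}"
        for table, column, row_id, url, status in broken
    ))
    return msg
```

Negative sentinels are used because a client-side timeout never produces an HTTP response; HTTP 408 (Request Timeout) is sent by a server, so reusing it would blur the distinction between the remote server and the local network. The report is only built here; it could be delivered with smtplib.SMTP(...).send_message(msg).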

Since the location availability field is used to determine whether a link is displayed to users, changing it when the checker finds a link unavailable would probably be confusing, so the checker will leave it alone. Users who follow a broken link will therefore see exactly what they would normally see when accessing a link with that problem, and may well be unaffected if the problem was only temporary at the time the checker ran. Users will also want to reach the resource as soon as the administrator has fixed it, rather than having to wait until the next time the checker runs.
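A minimal sketch of the database update shows how the checker can record its results while leaving the availability field untouched. It is written in Python against an in-memory sqlite3 database purely for illustration. The function name record_check, the id_column primary-key parameter, and all column names (including a hypothetical available column standing in for the location availability field) are assumptions. Table and column names are interpolated into the SQL because they come from the checker's own configuration file; values are passed as parameters.

```python
import sqlite3
from datetime import date


def record_check(conn, table, id_column, row_id,
                 last_checked_column, status_column, status, checked_on):
    """Record one check result (hypothetical helper). Only the configured
    link-last-checked columns are written; the availability column is
    never mentioned, so a failed check cannot hide a link from users."""
    # Table/column names come from the trusted config file, not user
    # input, so they are interpolated; values use placeholders.
    assignments, params = [], []
    if last_checked_column:
        assignments.append(f"{last_checked_column} = ?")
        params.append(checked_on.isoformat())
    if status_column:
        assignments.append(f"{status_column} = ?")
        params.append(status)
    if not assignments:
        return  # the config lists nowhere to record this check
    params.append(row_id)
    with conn:  # commits the UPDATE on success
        conn.execute(
            f"UPDATE {table} SET {', '.join(assignments)} "
            f"WHERE {id_column} = ?",
            params,
        )
```

Because the UPDATE statement only ever names the last-checked columns, an administrator's fix takes effect for users immediately, independent of when the checker next runs.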