The main program will run as a daily cron script. It will read a configuration file to find the list of tables/columns to check, so that it is not tied to a particular version of the resource database. For each table/column listed, the checker should access every URL and obtain its HTTP status (probably using the LWP library). The checking itself will be done by a separate module (this module might later provide some functionality for the CAS service we've been discussing).
Broken links (those whose status is neither "OK" nor "Authentication required") should, in the first instance, be reported by email to the appropriate administrator, with as much information as possible. (When some sort of Web interface for the RDB is developed, the information should also be available there.)
If the configuration file lists a place to put link-last-checked information in the database, this will be updated. The checker will put the HTTP status code returned in a link-last-checked-status field, whether it is 200 (OK) or not. Since availability may be affected by factors other than the Web server from which the resource is delivered (e.g. a local network outage), a separate code will be needed to record a timeout.
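The checking step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard library in place of LWP, and the names fetch_status, is_broken, check_link and TIMEOUT_STATUS, as well as the value 0 for the timeout code, are hypothetical and not part of the design.

```python
import urllib.error
import urllib.request

# Hypothetical pseudo-status stored when no HTTP response is received.
# The design only says a timeout code is needed; the value 0 is an assumption.
TIMEOUT_STATUS = 0

# Statuses the design does not treat as broken:
# 200 (OK) and 401 (authentication required).
OK_STATUSES = {200, 401}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects, so 301/302 are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(_NoRedirect)


def fetch_status(url, timeout=30):
    """Return the HTTP status code for url, or TIMEOUT_STATUS if none arrives."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses arrive here and still carry a status code.
        return err.code
    except OSError:
        # Timeouts and other network failures (DNS errors, refused connections).
        # This sketch does not distinguish between them.
        return TIMEOUT_STATUS


def is_broken(status):
    """A link is broken unless its status is OK or authentication required."""
    return status not in OK_STATUSES


def check_link(url, fetch=fetch_status):
    """Return (status, broken) for one URL; fetch can be replaced for testing."""
    status = fetch(url)
    return status, is_broken(status)
```

The status returned by check_link is what would be written to the link-last-checked-status field, and the broken flag decides whether the link appears in the email report.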
Since the location availability field is used to determine whether a link is displayed to users, it would probably be confusing to change it when a link becomes unavailable, so the checker will leave it alone. Users who follow a broken link will then see exactly what they would see if they accessed it directly, and may well not be affected at all if the problem was only temporary when the link checker ran. They will also be able to get at the resource as soon as the administrator fixes it, rather than having to wait until the next time the checker runs.
Resource/about_url link http://www.brokenlink.com/help_for_resource is "not found" for resource 138439 (Brokenlink Service)
Location/url link http://www.morebrokenlinks.com is "permanently moved" for location 342349 of resource 138439 (Brokenlink Service)
More generally, this takes the form:
Table/column URL (as link) is "HTTP error message" for table unique_id [link to resource table] (resource name)
This will need to be made more user-friendly when there is a Web interface, because a librarian cannot be expected to know anything about the tables and columns in the resource data model.
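The report line format given above can be sketched as a small formatting function. The function and parameter names are hypothetical, the status wording is taken from Python's standard reason phrases (so 301 comes out as "moved permanently" rather than "permanently moved"), and the extra "of resource ..." part shown in the location example is left out to keep the sketch short.

```python
from http import HTTPStatus


def status_phrase(status):
    """Return a lower-case description of an HTTP status code."""
    try:
        return HTTPStatus(status).phrase.lower()
    except ValueError:
        # Not a standard HTTP status, e.g. a locally defined timeout code.
        return f"status {status}"


def format_report_line(table, column, url, status, id_table, unique_id, name):
    """Build one report line for a broken link.

    table/column say where the URL is stored; id_table and unique_id
    identify the record; name is the resource name shown in brackets.
    """
    return (
        f'{table}/{column} link {url} is "{status_phrase(status)}" '
        f"for {id_table} {unique_id} ({name})"
    )


line = format_report_line(
    "Resource", "about_url",
    "http://www.brokenlink.com/help_for_resource",
    404, "resource", 138439, "Brokenlink Service",
)
print(line)
```

Keeping the wording in one function should make it easier to replace with something friendlier once a Web interface exists.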
The configuration file will contain the following information:
The initial list of tables/columns will be:
The following changes will need to be made to the resource database.