PIE Current Awareness Services - Technical Possibilities

Technical Requirements

The purpose of this aspect of the PIE's current awareness service (CAS) is to let users monitor web sites that they themselves have entered into the PIE. This is separate from CAS functions that might be built into the PIE to deal with items in the Resource Database or with saved searches of other resources. (These functions have to some extent been implemented already.)

Requirement List

Problems With Monitoring

External Candidates

Google's directory provides a useful list of candidates. Such a list is much harder to find through most search engines, which confuse current awareness software with content-filtering software (used to prevent access to pornography etc.).

External services

Some of the available services seem to monitor search engine results rather than specific URLs; TracerLock is one example. Karnak is slightly more advanced: as well as checking whether the results list itself has changed, it monitors the pages returned in the search engine's results.

NetMind's Mind-it service is aimed at Web masters rather than end users: they put a box on their pages which lets visitors sign up to monitor updates. Its features include special tags placed round HTML that the monitoring software should ignore, and the ability to send customised messages to users as part of their alert emails. Because it depends on the co-operation of the owner of each monitored page, it is not a possibility for use through the PIE.

The Informant is closer to the kind of service we want, with email notification of changes, but it allows only five URLs per user, which rules it out.

EoMonitor, used by Daily Diffs, allows up to 200 URLs in a free account, with as many as desired in a paid account. The free account doesn't send email alerts. Daily Diffs is a database of the 40,000 most requested URLs from EoMonitor, with listings of changes. Basically, EoMonitor monitors the HTML of a page for changes, and the user sees a graphic with a schematic representation of the changes in the page's HTML. Because it compares raw HTML, it would not correctly monitor a page that is connected to a database or that requires user authentication, and it does not really filter out unimportant changes.

Spyonit will notify the user about three kinds of change to a page: when the page changes in any way, when a given phrase is added, or when a given phrase is removed. It also allows the entry of a username and password for pages requiring authentication.
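The three kinds of notification can be illustrated with a short Python sketch. This is not Spyonit's implementation, which is not described here; the function name `classify_change` and the event labels are assumptions made for illustration, and the comparison is a plain case-sensitive string test on two snapshots of a page.

```python
def classify_change(old_text, new_text, phrase=None):
    """Return the set of notification kinds triggered between two snapshots.

    The labels "changed", "phrase_added" and "phrase_removed" are
    hypothetical names for the three kinds of change described above.
    """
    events = set()
    # Kind 1: the page differs in any way from the previous snapshot.
    if old_text != new_text:
        events.add("changed")
    if phrase is not None:
        was_present = phrase in old_text
        is_present = phrase in new_text
        # Kind 2: the phrase appears now but did not before.
        if is_present and not was_present:
            events.add("phrase_added")
        # Kind 3: the phrase was there before but has gone.
        elif was_present and not is_present:
            events.add("phrase_removed")
    return events
```

A real service would probably normalise case and whitespace before testing for the phrase; that is left out here to keep the sketch short.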

The feature that really makes Spyonit stand out is Spybuilder, which allows a site like the PIE to customise exactly what is monitored about external sites. This includes the use of Perl-style regular expressions to filter out information that is not to be monitored (such as background colour changes). The format of the notification message can also be specified (short, normal or detailed; ASCII text or HTML), which makes the message easier for PIE software to parse.
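The idea of filtering out unimportant changes with regular expressions can be sketched as follows. This is only an illustration in Python, not Spybuilder's own configuration format; the names `IGNORE_PATTERNS`, `normalise` and `fingerprint` are hypothetical, and the two patterns (a `bgcolor` attribute and HTML comments) are assumed examples of content that should not trigger an alert.

```python
import hashlib
import re

# Hypothetical patterns for markup whose changes should be ignored.
IGNORE_PATTERNS = [
    # A bgcolor attribute with a double-quoted, single-quoted or bare value.
    re.compile(r'''\s+bgcolor\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)''', re.IGNORECASE),
    # HTML comments, which may span several lines.
    re.compile(r"<!--.*?-->", re.DOTALL),
]


def normalise(html):
    """Strip everything matched by the ignore patterns."""
    for pattern in IGNORE_PATTERNS:
        html = pattern.sub("", html)
    return html


def fingerprint(html):
    """Hash the filtered page so that two snapshots can be compared cheaply."""
    return hashlib.sha256(normalise(html).encode("utf-8")).hexdigest()
```

Two snapshots that differ only in background colour then give the same fingerprint, so no alert would be raised, while a change to the visible text still gives a different one.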

Locally Installable Software

CheckURL is a simple script which checks the header information from the HTTP server, which should contain the date of last modification for a page. This can be incorrect for a wide variety of reasons (detailed above).
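A header check of this kind might look roughly like the Python sketch below. It is not CheckURL's code; the function names are assumptions, and it relies on the server sending a standard `Last-Modified` header, which many dynamically generated pages do not. The value passed as `last_seen` is assumed to be a timezone-aware `datetime`.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request and return the raw Last-Modified header, or None."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def modified_since(header_value, last_seen):
    """Compare a Last-Modified header value with the time of the last check.

    Returns True or False when the header can be parsed, and None when it
    is missing or malformed, in which case the header tells us nothing.
    """
    if not header_value:
        return None
    try:
        modified = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    return modified > last_seen
```

Returning None rather than False for a missing header keeps "unchanged" separate from "unknown", which matters because the header is unreliable.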

Web Secretary is more sophisticated: it checks a saved version of each page against the current one (the alert messages it sends contain the full HTML of the new page with changes highlighted) and has some scope for HTTP authentication. It is also written in Perl, which will make it easier to integrate with the PIE.
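Comparing a saved copy with the current page can be sketched in Python as below. Web Secretary itself is a Perl program and highlights changes inside the full HTML; this sketch swaps in a plain unified diff from the standard `difflib` module, and the function name `describe_changes` is an assumption.

```python
import difflib


def describe_changes(saved_html, current_html):
    """Return unified-diff lines between the saved and current copies.

    An empty list means the two copies are identical.
    """
    return list(
        difflib.unified_diff(
            saved_html.splitlines(),
            current_html.splitlines(),
            fromfile="saved",
            tofile="current",
            lineterm="",
        )
    )
```

Lines beginning with `-` were in the saved copy only and lines beginning with `+` are new, so the output could be pasted straight into a plain-text alert email.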

Conclusion

In general, locally installed software is a more useful option for PIE components than reliance on external products. This is because it is easier to integrate it into existing and future PIE functionality (e.g. seamless access to get round authentication problems) and because with external services we are dependent on the format of the service interface not changing.

However, in this case the candidates for locally installed software do not seem to be as mature as the external services that are available, in particular the one offered by Spyonit/Spybuilder. Currently, this seems to be the best option. In the long term, though, customising Web Secretary is probably the most sensible way to add web monitoring functionality to the PIE.