End-user Electronic Document Delivery from Ariel
System & service design proposal for a HeadLine discovery-to-access component
John Paschoud, 15 September 1999
<j.paschoud@lse.ac.uk>
The Problem
Ariel [http://www.rlg.org/ariel/index.html] is a comprehensive and efficient system for electronic requesting, scanning and delivery of paper documents between 'equipped, consenting parties': that is, libraries which have existing inter-supply arrangements between them, staff trained to use the system, and IT resources that are usually better supported than those of most individual end-users. Ariel is not (and was never designed as) an end-user application; it requires custom software installation and configuration on a relatively high-specification workstation. Ariel can receive requests and deliver documents via a standard email connection (to an end-user workstation without Ariel software installed), but this causes problems for individual email accounts and for institutional email services, because the messages (containing raster image files of scanned documents) are often very large. There are also problems in ensuring that both the end-user and the supplying library comply with the copyright licence conditions imposed on such transactions.
The Proposed Solution
The technical product of this proposal is generally referred to herein as an "EEDD server" (until we invent a more pronounceable, or humorous, acronym for it).
Email requesting, and email notification of delivery, without emailing actual documents to end-users, is the critical part of the solution. 'Delivered' scanned document-images will be placed on secure Web-server filespace, accessible only to the end-user who made the request, after s/he has explicitly assented to a statement of the applicable copyright conditions.
Conversion of scanned documents to Adobe PDF format (on request, server-side and on the fly as part of the delivery process) could be offered, but may not be necessary: the 'Imaging' accessory supplied as standard with Windows-95 (and Windows-98) handles the multi-page TIFF files produced by Ariel scanning (which are somewhat smaller than the equivalent PDFs) perfectly well, and is better than Acrobat Reader for end-user browsing or printing.
From the point of view of the supplying library, using Ariel, email requests for documents appear as if they come from another Ariel system, and scanned documents are despatched by Ariel to a known email address (of the EEDD server, not that of the end-user), as if it were that of another Ariel system. There is therefore no 'intimate' interface between the EEDD server and the Ariel software, and no extra training for library staff, or adaptation of the Ariel installation at supplying libraries, other than the registration (in their Ariel installation) of a single new email address as a valid requester and destination.
Institutional email systems administrators will not see the delivered documents cluttering up their valuable filespace, because the EEDD server can run on a (probably Unix-based) Web-server host machine which handles email directly, bypassing the normal institutional email 'post offices'. This is achieved by using an address like
eedd-server@eeddhost.lse.ac.uk instead of the more usual end-user@lse.ac.uk.
The end-user interface comprises a combination of Web-page forms and email messages, all of which should work with any Web browser and any email system or client. Having identified herself with a name, password and personal email address, the end-user enters the details of the request (bibliographic identification, supplying library) on a Web form, or selects them from it. An email acknowledging the request will be received immediately. Time passes…
When the requested document has been scanned and despatched by the supplying library (using Ariel), a notification email will be received, containing the URL for access to the delivered document. When the end-user follows this link, she is again required to identify herself with name and password. She is then presented with a Copyright declaration Web-form, which must be explicitly acknowledged, before an automatic link is made to the actual document. This is then downloaded to her workstation, and can be viewed or printed using the 'Imaging for Windows' desktop accessory (or any other program capable of reading a multi-page TIFF file) which can be configured as the automatic Web-browser 'plugin' for this file-type.
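The notification step above can be sketched with Python's standard email library (the proposal itself envisages Perl and sendmail). The point it illustrates is that the 'delivery note' carries only a link, never the document. The gateway URL and the request-id format "EEDD-000123" are hypothetical; the sender address follows the example address given earlier.

```python
from email.message import EmailMessage

# Hypothetical URL of the access-controlled download gateway (CGI script).
GATEWAY_BASE = "http://eeddhost.lse.ac.uk/cgi-bin/fetch"


def delivery_note(request_id: str, user_email: str, title: str) -> EmailMessage:
    """Build the 'delivery note' email: it carries a link, never the document."""
    msg = EmailMessage()
    msg["From"] = "eedd-server@eeddhost.lse.ac.uk"
    msg["To"] = user_email
    msg["Subject"] = f"Document delivered (request {request_id})"
    url = f"{GATEWAY_BASE}?request={request_id}"
    msg.set_content(
        f"Your requested item '{title}' has been scanned and delivered.\n"
        f"Retrieve it at: {url}\n"
        "You will be asked to log in and accept the copyright declaration first.\n"
    )
    return msg


note = delivery_note("EEDD-000123", "reader@lse.ac.uk", "Example article")
```

Because the message body is plain text, it stays small and readable in any email client, which is the property the design depends on.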
When an attempt has been made to access a delivered document, a further email message will be received by the end-user, to confirm that this has happened; this also offers her a time-limited opportunity (by visiting another included URL) to 'complain' that the document was faulty, incomplete, not as-requested, or had been downloaded by an unknown third-party. If this 'complaint' action is not taken by the end-user within the (library-configurable) time-limit, the delivered document is automatically deleted from Web-space. This email could also be used to request an email reply from the end-user, confirming that she has received the requested document, to satisfy the copyright requirement for an individual end-user undertaking in respect of every document delivered.
The EEDD server will maintain logs of all document requests and deliveries, which can be used for service demand analysis or for retrospective billing of end-users or their departments or institutions.
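As an illustration of how such logs might feed retrospective billing, the sketch below counts successful deliveries per department. The tab-separated log format and the sample records are invented for the example; the proposal does not specify a log layout.

```python
from collections import Counter

# Hypothetical log format, one tab-separated record per event:
# timestamp <TAB> event <TAB> request-id <TAB> user-id <TAB> department
SAMPLE_LOG = """\
1999-10-01T09:12:00\tREQUEST\tEEDD-000101\tjsmith\tGovernment
1999-10-03T14:40:00\tDELIVERED\tEEDD-000101\tjsmith\tGovernment
1999-10-04T10:05:00\tDELIVERED\tEEDD-000102\tabrown\tEconomics
1999-10-04T11:30:00\tFAILED\tEEDD-000103\tabrown\tEconomics
"""


def deliveries_by_department(log_text: str) -> Counter:
    """Count successful deliveries per department, e.g. for retrospective billing."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _timestamp, event, _request_id, _user, department = line.split("\t")
        if event == "DELIVERED":
            counts[department] += 1
    return counts


usage = deliveries_by_department(SAMPLE_LOG)
```

The same pass could group by user-id or institution instead, depending on who is to be billed.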
A demonstrator of the end-user interface has been implemented on our test server at the URL:
http://bungo.lse.ac.uk/simon/cgi-bin/ariel.pl
Access requires a valid LSE (internal) email alias and network login password. A username/password combination of "guest/guest" (with which some valid email address must also be entered) can also be used for demonstration purposes.
Origin and history of this proposal
This proposal and design was developed following discussions with a number of people, mainly outside the HeadLine Project, who have been involved in investigating possible solutions. Their contributions are acknowledged, particularly (but not exclusively) those of:
Ingrid Evans, JEDDS Support <i.evans@mmu.ac.uk>
Jean Sykes, BLPES & Lamda Board <j.sykes@lse.ac.uk>
Kerry Blinco, Infostructure Consulting Services Pty Ltd <K.Blinco@ibm.net>
Stephanie Taylor, Lamda Support <S.R.Taylor@mmu.ac.uk>
Kerrie Henderson, LSE IT Support <k.henderson@lse.ac.uk>
Simon McLeish, HeadLine <s.mcleish@lse.ac.uk>
Jane Neilson, BLPES <J.Neilson@lse.ac.uk>
Apart from the need of the HeadLine and Decomate2 projects for a method of end-user delivery of 'scan-on-demand' full-text documents, requirements also arose from the potential for 'internal' request and delivery of journal articles between the LSE Library and LSE staff in academic departments (Ariel-to-Ariel delivery to one academic departmental office was piloted in 1998/99), and from inter-university EDD, probably under the umbrella of the Lamda Consortium.
Application Specification
We have specified a task in the HeadLine Project to build a system which deals with the 'supply' end (i.e. it does not attempt to solve all of the 'who-to-request-from' problems) of direct delivery from an Ariel supplier to Web-space acting as the "personal document directory" of an end-user, plus email notification to the end-user, plus appropriate security to control and log who accesses delivered documents. We envisage this working best through a Web-based user environment (i.e. the HeadLine PIE, in our case) that integrates access to many library resources and serves a known and 'controlled' population (i.e. all the registered users at a university, with each participating university running its own such gateway or having one run on its behalf). However, it would also be possible to run this as a standalone service.
There would be three substantive application components. We estimate 15 days of programming time (excluding the possible enhancements prefixed "Option:") to develop and test them for alpha release or demonstration, using mainly Perl-CGI, Perl-DBI and sendmail skills. [Numbers in [..] refer to end-user interface components or interactions described in the Demonstrator specification.]
The first application component would manage a database of 'live' requests, and use email-protocol interfaces to one or more Ariel-supply-side systems. It would sit 'listening' (as a daemon process) on some (probably Unix) server, and respond to a number of email-driven events:
[A] Email request received from Web-form [1]:
Enter request in database, storing unique request-id, timestamp, user-id and email address, request details, Ariel requestee;
Reformat and forward as email request to Ariel requestee system, using request-id to identify;
Send email confirmation [2] of request to end-user, with request-id for reference in enquiries;
[B] Email received from Ariel, including scanned document:
Identify request-id from email header;
Separate document from email, store (in same TIFF format) on Webserver file-space (or on filespace not mapped to the Webserver, but accessible to the CGI process, to eliminate theft by 'URL guessing');
Update database record with timestamp and URL or filepath of stored document;
Send email 'delivery note' [3] to end-user (if dealing with un-registered users, this could include an access-code, or a one-time PGP-key or similar);
Send email receipt-confirmation to Ariel requestee, if required;
[C] "Cannot supply" email received from Ariel:
Identify request-id from email header;
Update database record with timestamp and failure reason/code;
Send email "regrets" note to end-user;
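Events [A] to [C] amount to classifying each incoming message, finding its request-id and, for [B], detaching the TIFF. A minimal Python sketch follows (the proposal envisages Perl). Ariel's actual message format is not described in this proposal, so the rules below are placeholders to be replaced once real messages have been captured. Specifically, the request-id pattern "EEDD-nnnnnn", its echo in the Subject line, the "X-EEDD-Source" header and the "cannot supply" phrase are all assumptions.

```python
from __future__ import annotations

import re
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path

# ASSUMPTION: request-ids look like "EEDD-000123" and Ariel echoes them in the Subject.
REQUEST_ID = re.compile(r"\bEEDD-\d{6}\b")


def find_request_id(msg: EmailMessage) -> str | None:
    """Return the request-id found in the Subject header, or None."""
    match = REQUEST_ID.search(msg.get("Subject", ""))
    return match.group(0) if match else None


def classify(msg: EmailMessage) -> str:
    """Map an incoming message to event "A", "B" or "C" (placeholder rules)."""
    if msg.get("X-EEDD-Source") == "webform":  # hypothetical header added by our own Web form
        return "A"
    if any(p.get_content_type() == "image/tiff" for p in msg.iter_attachments()):
        return "B"
    if "cannot supply" in msg.get("Subject", "").lower():
        return "C"
    return "unknown"


def store_tiffs(msg: EmailMessage, request_id: str, store: Path) -> list[Path]:
    """Event [B]: detach TIFF parts into filespace the Web server does not map."""
    store.mkdir(parents=True, exist_ok=True)
    saved = []
    for n, part in enumerate(msg.iter_attachments()):
        if part.get_content_type() == "image/tiff":
            path = store / f"{request_id}-{n}.tif"
            path.write_bytes(part.get_content())  # image/* content decodes to bytes
            saved.append(path)
    return saved


# Usage: simulate an Ariel delivery message and process it.
demo = EmailMessage()
demo["Subject"] = "Ariel document EEDD-000123"
demo.set_content("Scanned document attached.")
demo.add_attachment(b"II*\x00demo", maintype="image", subtype="tiff", filename="p1.tif")
incoming = message_from_bytes(demo.as_bytes(), policy=policy.default)

event = classify(incoming)              # "B"
request_id = find_request_id(incoming)  # "EEDD-000123"
store_dir = Path(tempfile.mkdtemp())    # stands in for the private document store
saved = store_tiffs(incoming, request_id, store_dir)
```

Keeping the rules in small, separate functions means they can be adjusted after the test exchanges with live Ariel services without touching the rest of the daemon.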
The second component would be a CGI process, independent of the first, which would drive the Web-forms for requesting [1] and for controlling access to document downloading [4]. It would check the request form for completeness, use an existing access-control interface to identify the end-user (including whether or not they have the right to request document supply), and present the 'terms and conditions' form for acceptance, before giving access to the actual URL of the stored TIFF document.
It would timestamp the database record to note the download attempt. Possible copyright-control protocols (configurable by an administrator) would be to delete the document after one download attempt, or to warn the user that the document will be deleted after a configurable period, in case she needs to download it again.
It could also append to a serial log file, as a more secure way of tracking who had accessed which document, and when.
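The checks this CGI process makes before releasing a document can be sketched as a single gate function, again in Python for illustration. All names here (Delivery, release_document, the store path) are hypothetical; a download counter stands in for the database timestamp, and max_downloads=1 models the "delete after one download attempt" protocol without performing the deletion itself.

```python
from __future__ import annotations

import io
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import TextIO


@dataclass
class Delivery:
    request_id: str
    owner: str           # user-id recorded when the request was made
    filename: str        # name of the TIFF inside the private store
    downloads: int = 0


class AccessDenied(Exception):
    pass


def release_document(d: Delivery, user_id: str, accepted_terms: bool,
                     store: Path, log: TextIO, max_downloads: int = 1) -> Path:
    """Gate [4]-[6]: owner check, copyright acceptance, download limit, audit log."""
    if user_id != d.owner:
        raise AccessDenied("not the requester")
    if not accepted_terms:
        raise AccessDenied("copyright declaration not accepted")
    if d.downloads >= max_downloads:
        raise AccessDenied("download limit reached")
    d.downloads += 1
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    log.write(f"{stamp}\t{user_id}\t{d.request_id}\n")  # append-only audit trail
    # The store lies outside the Web server's document root, so the CGI process
    # streams the file itself; nothing is reachable by guessing a URL.
    return store / d.filename


audit = io.StringIO()  # a real deployment would open a log file in append mode
doc = Delivery("EEDD-000123", "jsmith", "EEDD-000123-0.tif")
path = release_document(doc, "jsmith", True, Path("/srv/eedd-private"), audit)
```

A second call for the same document raises AccessDenied, which is where the 'warn, then delete after a configurable period' alternative would hook in instead.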
[Option: It could offer a "download as PDF" choice (instead of the default multi-page TIFF), invoking the tiff2pdf utility to do on-the-fly conversion. To implement this enhancement, tiff2pdf might need some adaptation, in C++, to read the multi-page TIFF format.]
[Option: We could use either SSL or Docserver-type PGP-encryption (sending the one-time key in the 'delivery-note' email) for added security.]
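For the simpler 'access-code' variant mentioned under event [B] (for un-registered users), a one-time code can be issued in the delivery note and accepted only once at the gateway. This is not PGP encryption; it is a minimal sketch, in Python, of the one-time property alone. The in-memory table and function names are hypothetical, and only a hash of each code is stored.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Hypothetical in-memory table; the real server would keep this in the request database.
_pending: dict[str, str] = {}  # request-id -> SHA-256 hex digest of the outstanding code


def issue_code(request_id: str) -> str:
    """Create a random code to quote in the 'delivery note' email."""
    code = secrets.token_urlsafe(12)
    _pending[request_id] = hashlib.sha256(code.encode()).hexdigest()
    return code


def redeem_code(request_id: str, code: str) -> bool:
    """Accept the code once only; a wrong code leaves the entry in place."""
    expected = _pending.get(request_id)
    if expected is None:
        return False
    supplied = hashlib.sha256(code.encode()).hexdigest()
    if hmac.compare_digest(expected, supplied):  # constant-time comparison
        del _pending[request_id]
        return True
    return False


code = issue_code("EEDD-000123")
first = redeem_code("EEDD-000123", code)   # True
second = redeem_code("EEDD-000123", code)  # False: already used
```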
The third component would be a timed (i.e. crontab-controlled) robot, activated periodically (say, each weekday night) to tidy the database and storage Webspace, deleting documents that had been successfully retrieved and the corresponding 'dead' database records (alternatively, those records could be kept as backup audit information, alongside the logs of the CGI process).
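The nightly sweep can be sketched with Python's sqlite3 module standing in for the Perl-DBI database. It combines this robot's rule (delete retrieved documents) with the end-user description's complaint period (only once that period has lapsed without a complaint). The table layout and the seven-day period are assumptions; keep_audit chooses between deleting 'dead' records and keeping them for audit.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timedelta
from pathlib import Path

GRACE = timedelta(days=7)  # ASSUMPTION: the library-configurable complaint period


def nightly_sweep(db: sqlite3.Connection, now: datetime, keep_audit: bool = True) -> list[str]:
    """Delete retrieved, uncontested documents whose complaint period has expired."""
    cutoff = (now - GRACE).isoformat(timespec="seconds")
    rows = db.execute(
        "SELECT request_id, path FROM deliveries "
        "WHERE downloaded_at IS NOT NULL AND downloaded_at <= ? "
        "AND complained = 0 AND path IS NOT NULL",
        (cutoff,),
    ).fetchall()
    for request_id, path in rows:
        Path(path).unlink(missing_ok=True)  # remove the stored TIFF
        if keep_audit:  # keep the 'dead' record as audit information
            db.execute("UPDATE deliveries SET path = NULL WHERE request_id = ?", (request_id,))
        else:
            db.execute("DELETE FROM deliveries WHERE request_id = ?", (request_id,))
    db.commit()
    return [request_id for request_id, _ in rows]


# Usage with an in-memory database and illustrative records.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE deliveries (request_id TEXT PRIMARY KEY, path TEXT, "
           "downloaded_at TEXT, complained INTEGER DEFAULT 0)")
db.executemany("INSERT INTO deliveries VALUES (?, ?, ?, ?)", [
    ("EEDD-000101", "/nonexistent/EEDD-000101-0.tif", "1999-10-01T09:00:00", 0),
    ("EEDD-000102", "/nonexistent/EEDD-000102-0.tif", "1999-10-01T09:00:00", 1),  # complaint lodged
    ("EEDD-000103", "/nonexistent/EEDD-000103-0.tif", None, 0),                  # not yet retrieved
])
removed = nightly_sweep(db, datetime(1999, 10, 11, 2, 0))
```

Timestamps are stored as ISO-8601 strings of one fixed format, so the string comparison in the query orders them correctly.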
The access-control (login) component will be derived from the one already developed for Decomate2 (the Workpackage 6 "Access-broker"), which can (or will be able to) use existing name/password sources including Samba/NT, LDAP, and ATHENS (still pending availability of the ATHENS Perl-API), so that users should be able to identify themselves with a name/password that they already use for another purpose. If the EEDD server were integrated within a wider-purpose library portal (e.g. the HeadLine PIE), access control would be performed by the portal anyway.
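The idea of accepting a name/password that already exists elsewhere can be illustrated as a chain of sources tried in turn. This Python sketch does not reflect the Decomate2 Access-broker's actual interface, which is not described here. The two sources are stand-ins: fake_ldap is hypothetical, and guest_account mirrors the demonstrator's guest/guest login. Real sources would query Samba/NT, LDAP or ATHENS.

```python
from __future__ import annotations

from typing import Callable, Optional

# An authenticator returns a canonical user-id on success, or None.
Authenticator = Callable[[str, str], Optional[str]]


def first_successful(sources: list[tuple[str, Authenticator]], name: str, password: str):
    """Try each configured name/password source in turn; report which one succeeded."""
    for label, check in sources:
        user_id = check(name, password)
        if user_id is not None:
            return label, user_id
    return None


# Stand-in sources for illustration only.
def fake_ldap(name: str, password: str) -> Optional[str]:
    return f"ldap:{name}" if (name, password) == ("jsmith", "s3cret") else None


def guest_account(name: str, password: str) -> Optional[str]:
    return "guest" if (name, password) == ("guest", "guest") else None


SOURCES = [("LDAP", fake_ldap), ("guest", guest_account)]
```

Returning the label of the successful source lets the EEDD server log how each user was identified, and apply different request rights (for example, to guests).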
Development Plan
The HeadLine project plan assumes that this component will be developed by Christmas (1999), but it could be fitted in earlier if there were a specific demand (like the Lamda bid for the national doc-delivery pilot...) or if somebody offered us the necessary extra days of a(nother) good Perl programmer...
User interface demonstrator
We also specified an interim demonstrator which would show the entire process from the end-users' point of view, as follows:
[1]* Web-form --> email (we can do this for real, generating...)
[2]* Email confirmation auto-reply ("your request has been received and will be processed within n hours..")
[3]* Email 'delivery note' ("your requested item has been fetched and scanned by our well-oiled team of cybrarians, and is now accessible via your personal document directory URL at http://xxx ...")
[4]* Web-form 'gateway' (at http://xxx ) with (genuine) access-management, using LSE network name/password, leading to...
[5]* Copyright declaration Web-form, for user to 'sign' / acknowledge, leading to...
[6]* Online Ariel-generated multi-page TIFF of requested article, which will start-up...
[7]* WangImg as Web-browser helper-app, to view and print.
This will be more useful than any amount of explanation to people (managers, etc.). It will also allow us to experiment (tweaking the wording, or the number of "Are you sure?" confirmations required) to find out what will be legally acceptable as an online alternative to signing the paper declaration form for each request.
The demonstrator was implemented by Simon McLeish in August 1999 (taking approximately 2-3 work-days) and is currently available on our test server at:
http://bungo.lse.ac.uk/simon/cgi-bin/ariel.pl
Internal LSE Pilot
A proposal has been agreed by BLPES SMG (28 July 1999) to extend the Library-to-Department document delivery service (piloted with the LSE Government Department in Spring 1999) to all LSE academic departments in January 2000. A meeting on 25 August 1999 between John Paschoud and the operational staff who would be involved in this agreed:
that one way to approach the pilot for this would be to inform academic staff that the Ariel scheme will be extended to all departments in the School, & to advise them that they would thus have access to partial electronic delivery of articles. At the same time, we could encourage interested academics to participate in the pilot for desktop delivery, by inviting them actively to opt into the pilot desktop scheme. The opting-in stage for the desktop pilot would involve each participating academic signing a blanket copyright undertaking which would be retained by the library.
(While this would not fully comply with copyright law, which requires a signed copyright declaration for each item requested, it would provide a blanket declaration from each potential requester.)
The only other prerequisite for this approach that is outside our control is the inclusion of the 'Imaging' (wangimg.exe) application in the LSE standard build for the Windows-95 desktop. Paul Jackson of IT Services has confirmed that it is included in the desktop version scheduled for rollout in September 1999.
Jane Neilson will lead on the co-ordination of the LSE pilot, and in particular the drafting of and legal advice on a 'watertight' form of words for the 'opt-in' application statement to be signed by end-users who wish to participate.
It is not envisaged that any likely level of initial uptake would prove technically overwhelming, although if the service is successful we may observe greater volumes of requests, and therefore greater workloads of fetching & scanning for the BLPES staff involved in the supply-side. The invitation to participate would be worded in such a way that BLPES could reserve the right to form a waiting list of applicants if demand exceeded available resources.
Three-site Lamda Pilot
A proposal has been made (possibly subject to agreement by the Lamda Board) to establish an inter-university pilot involving three existing members of the Lamda consortium: LSE, Manchester Business School, and Leeds University. John Paschoud will initially liaise with contacts at MBS and Leeds, who are:
MBS:
Karen Bradshaw, Head ILL,
Leeds:
Michael Emly, Head of Library Systems,
Subject to agreement between the co-ordinators, end-users at MBS and Leeds could follow a similar 'opt-in' path and declaration as proposed for LSE users, to provide some (arguable) legal cover for all parties involved.
For the purposes of this pilot, it is envisaged that EEDD server(s) would be hosted on hardware physically sited at LSE (in the domain lse.ac.uk, or headline.ac.uk) on behalf of the other participating institutions.
Next Steps for Development and Pilots
The challenging element of the EEDD server to be developed is the emulation of an Ariel system, sufficiently complete to 'fool' a real Ariel system when sending it requests, and to deconstruct and process responses returned (containing documents, or signalling reasons why no document has been returned).
The fact that this is a 'non-intimate' interface to Ariel, using only email protocols, suggests a straightforward approach of intercepting and analysing messages between two real Ariel systems, by creating an email address which will become that of the EEDD server, and registering that address (as "another Ariel system") with one or more Ariel systems.
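Analysing intercepted messages mostly means recording their headers and MIME structure, since that is what the EEDD server must later reproduce and parse. A minimal sketch using Python's standard email package follows. The sample message is synthetic and its sender address is hypothetical; captured Ariel traffic would be read from the EEDD server's mailbox instead.

```python
from __future__ import annotations

from email import message_from_bytes, policy
from email.message import EmailMessage


def describe(raw: bytes) -> dict:
    """Summarise a captured message: key headers plus its leaf MIME parts."""
    msg = message_from_bytes(raw, policy=policy.default)
    headers = {k: str(msg[k]) for k in ("From", "To", "Subject") if msg[k] is not None}
    parts = [
        (part.get_content_type(), part.get_filename())
        for part in msg.walk()
        if not part.is_multipart()  # skip container parts, keep leaves
    ]
    return {"headers": headers, "parts": parts}


# Usage with a synthetic message standing in for captured Ariel traffic.
sample = EmailMessage()
sample["From"] = "ariel@library.example.ac.uk"
sample["To"] = "eedd-server@eeddhost.lse.ac.uk"
sample["Subject"] = "Test request"
sample.set_content("Request details would appear here.")
sample.add_attachment(b"II*\x00", maintype="image", subtype="tiff", filename="page1.tif")
summary = describe(sample.as_bytes())
```

Comparing such summaries for the same request sent by different supplying libraries would show which fields vary with local staff practice.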
As well as the Project Team's Ariel system (used for testing only), we should exchange registrations and messages with Ariel systems in routine 'live' use (i.e. those at the three institutions in the Lamda pilot), so that we can account for library staff practice in completing the Ariel administrative form-fields when responding to a request. As part of our development, it may be necessary to review, document and standardise procedures amongst participating libraries; for example, to ensure that a request unique reference number will be returned intact and unchanged with the matching response(s). Ideally, we want to be as unobtrusive as possible to existing operations - so that not even the human 'components' of the system are aware (or have to act differently) if a request comes via an EEDD server instead of another librarian.
By the end of October 1999 we expect to have exchanged initial test messages with 'live' Ariel services at LSE, MBS and Leeds. By the end of November 1999 we should be able to demonstrate a working system (possibly without handling all possible errors and exception conditions) with a draft version of the user interface. During early December 1999 we will need to consult relevant policy makers, to fine-tune the user interface (to satisfy usability, aesthetic and legality concerns). By early January 2000 we should be able to launch a service to end-users, within the scope and scale of the two pilot projects.
Electronic Signature issues
This proposal involves an 'extended interpretation' of the applicable copyright regulations, in the acceptance of a personal email (or an authenticated access to an "I accept" button on-screen) from end-users as a valid 'electronic signature' to the per-request acknowledgement.
Of course, no form of 'electronic signature' is yet formally recognised under British Law, but a variety of possibilities are currently under evaluation by the Home Office, CITU, DTI and other UK Government agencies. We need to remain informed on the progress of this issue (in the UK and in other countries which may be looked-to as models for the UK) and ensure that the procedures and technologies we use can be adapted to any form of e-signature which is likely to acquire legal validity for copyright compliance purposes. There may also be scope for using this 'test' of copyright law to establish precedents which will benefit the development of other e-services in the wider library and HE community.
It would be useful to know of any current initiatives within UK HE, for the pragmatic acceptance of electronic signatures for similar purposes; this may be an area in which we can call upon the wide collective experience of Lamda Board members.