Keyword and Citation Hypertexts:
new directions for the DLS

The Distributed Link Software (DLS) was a WWW implementation of the hypertext ideas demonstrated in the Microcosm environment. It used a WWW proxy to add links to HTML or PDF documents as they were delivered from a digital library (in pristine, unlinked form) to a user’s browser (with the links integrated into them).

Figure 1: DLS Architecture

Figure 1 shows the conceptual architecture of the DLS as used in a UK Digital Libraries project. As the document data streams from the digital library to the user’s browser, it passes through the DLS and is modified by various software modules. These are here termed ‘agents’ because each has an ‘expertise’ in recognising a particular kind of information in the document.

The hyperlink agent is very simple: it draws on various databases of standalone links (mostly created on key words and phrases in a subject domain) and attaches those links to a document wherever the keywords appear.
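The keyword linking described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the input is plain text rather than existing HTML, and the LINKBASE dictionary, the example.org URLs and the function name add_keyword_links are hypothetical and not taken from the DLS itself.

```python
import re
from html import escape

# Hypothetical linkbase: keyword or phrase -> destination URL (assumed data).
LINKBASE = {
    "hypertext": "http://example.org/glossary/hypertext",
    "open hypermedia": "http://example.org/glossary/open-hypermedia",
}


def add_keyword_links(text, linkbase):
    """Wrap each linkbase keyword found in plain text with an HTML anchor."""
    if not linkbase:
        return escape(text)
    # Try the longest phrases first so a phrase beats any shorter overlap.
    phrases = sorted(linkbase, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {p.lower(): url for p, url in linkbase.items()}

    out = []
    pos = 0
    for m in pattern.finditer(text):
        # Copy the unlinked text before the match, escaped for HTML.
        out.append(escape(text[pos:m.start()]))
        url = lookup[m.group(1).lower()]
        out.append('<a href="%s">%s</a>' % (escape(url), escape(m.group(1))))
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

Matching is case-insensitive and keeps the original capitalisation of the anchor text. A linker working on real HTML would also need to skip text that already sits inside tags or anchors, which this sketch does not attempt.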

The person agent looks for the different forms in which a person’s name may appear (e.g. the strings "Leslie Carr", "LAC" or "Dr. Carr" in the context of Southampton University). Figure 2 shows a fragment of the first page of an ACM Digital Library article with both keyword and person links added to the mentions of interesting people and systems.

The citation agent recognises occurrences of citations in academic papers and analyses their contents to determine author, year, publisher, page range and the like. It uses this information from each citation to perform a lookup in a bibliographic database and to add a link either to the online full text of the cited article (if the database indicates that it is available) or to the full bibliographic record, which includes the abstract and citations. Figure 3 shows another fragment from an ACM Digital Library article, in which links into the ACM library have been added to every citation of CACM or of an ACM Hypertext Conference.
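The parse-then-lookup behaviour of the citation agent can be illustrated with a small Python sketch. It assumes a simplified citation layout (optional bracketed number, first author surname, a four-digit year and a hyphenated page range). The regular expressions, the BIBLIO_DB structure, its key, the example.org URLs and the sample citations are all invented for illustration and do not describe the agent’s actual heuristics or database.

```python
import re

# Hypothetical bibliographic database keyed on
# (first author surname, year, first page). All entries are made up.
BIBLIO_DB = {
    ("smith", "1995", "1"): {
        "full_text": "http://example.org/fulltext/smith1995.pdf",
        "record": "http://example.org/records/smith1995",
    },
    ("jones", "1996", "33"): {
        "full_text": None,  # full text not available online
        "record": "http://example.org/records/jones1996",
    },
}

YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
PAGES_RE = re.compile(r"\b(\d+)\s*[-\u2013]\s*(\d+)\b")
SURNAME_RE = re.compile(r"^\s*(?:\[\d+\]\s*)?([A-Z][A-Za-z'\-]+)")


def parse_citation(citation):
    """Pull a few bibliographic fields out of one citation string."""
    year = YEAR_RE.search(citation)
    pages = PAGES_RE.search(citation)
    surname = SURNAME_RE.match(citation)
    return {
        "surname": surname.group(1).lower() if surname else None,
        "year": year.group(0) if year else None,
        "first_page": pages.group(1) if pages else None,
        "last_page": pages.group(2) if pages else None,
    }


def choose_link(citation, db):
    """Prefer the online full text; fall back to the bibliographic record."""
    fields = parse_citation(citation)
    entry = db.get((fields["surname"], fields["year"], fields["first_page"]))
    if entry is None:
        return None  # citation not found: leave it unlinked
    return entry["full_text"] or entry["record"]
```

A citation that cannot be matched is simply left unlinked. Real reference lists vary far more in layout than this sketch allows for, so a production parser would need considerably more robust field extraction.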

The DLS maintains individual profiles for separate users, allowing them to have an adjustable overlay of links and annotations dynamically added to the WWW pages that they browse.
 

Experience and Changes

Although the hypertext facilities provided by the project were well received, some disadvantages were noticed. Firstly, users had to change their browser’s proxy settings explicitly in order to use the DLS. This was difficult for some because of administrative restrictions, and for users at other sites because of network security. Secondly, and more fundamentally, the DLS clashed with the webmasters’ paradigm of serving static HTML files, processed by scripts under their own control. Partly this was a clash of ideologies: the DLS was conceived as a service to which a user could choose to subscribe, providing a cohesive hypertext overlay that integrated the content of many sites, rather than as a tool that a single administrator could use to construct the local contents of a specific site.

As a result, the functionality of the DLS is being re-implemented as a suite of individual tools which can be deployed under scripted control by webmasters or content architects, as part of a strategy of internal and external information linking under the constraints of a local design process.

Link Harvester

A standalone program that extracts pre-existing links from a document and adds them into a database. A new link-free version of the document is also generated.
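As a rough illustration of this harvesting step, the following Python sketch uses the standard library’s html.parser to collect anchors and rebuild the document without them. The class name LinkHarvester, the harvest function and the in-memory list standing in for the link database are assumptions made for the example. For brevity the sketch drops comments and doctype declarations.

```python
from html.parser import HTMLParser


class LinkHarvester(HTMLParser):
    """Collect <a href> links and rebuild the document without them."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.links = []          # harvested (href, anchor text) pairs
        self.output = []         # pieces of the link-free document
        self._href = None        # href of the anchor currently open, if any
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            # Remember the link but do not copy the tag to the output.
            self._href = href
            self._anchor_text = []
        else:
            self.output.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor_text)))
            self._href = None
        else:
            self.output.append("</%s>" % tag)

    def handle_data(self, data):
        if self._href is not None:
            self._anchor_text.append(data)
        self.output.append(data)

    def handle_entityref(self, name):
        self.handle_data("&%s;" % name)

    def handle_charref(self, name):
        self.handle_data("&#%s;" % name)


def harvest(html_text):
    """Return (links, link-free HTML) for one document."""
    parser = LinkHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.links, "".join(parser.output)
```

Anchors without an href attribute (named anchors) are left in place, since they are link destinations rather than links.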

Link Interpolater

A standalone program that inserts links from a database into a document. By choosing different sets of databases the same document can be linked into different subject-oriented or customer-oriented navigation strategies.

Citation Harvester

A standalone program that extracts citations from a document’s bibliography for storage in a database.

Citation Interpolater

A standalone program that inserts links into a document based on the contents of a citation database. The links can point into a publisher’s own Digital Library or to a generic bibliographic database and citation service (such as ISI’s Web of Science). A publisher that owns a significant subject niche may wish to create a closure of articles and citation links within its own site, although a mix of both techniques may be suitable.

The databases manipulated by these programs are not proprietary: the data format used for input and output is XML or XLink based. The databases can be stored, maintained and manipulated with any tool that the Web Site Manager chooses, independently of the DLS suite. The DLS suite is used only to extract links from documents or to insert links into them: control over which links are applied under which circumstances now rests with the Web Site Manager.
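To give a feel for what an XML and XLink based link database could look like, and for how a tool outside the suite might read and write one, here is a minimal Python sketch using xml.etree.ElementTree. The element names linkbase and link and the two helper functions are hypothetical: the actual schema used by the DLS suite is not specified here. Only the XLink namespace URI and the xlink:type and xlink:href attributes come from the XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def links_to_xml(links):
    """Serialise (keyword, url) pairs as a small XLink-flavoured linkbase."""
    root = ET.Element("linkbase")          # hypothetical element name
    for keyword, url in links:
        link = ET.SubElement(root, "link")  # hypothetical element name
        link.set("{%s}type" % XLINK_NS, "simple")
        link.set("{%s}href" % XLINK_NS, url)
        link.text = keyword
    return ET.tostring(root, encoding="unicode")


def xml_to_links(xml_text):
    """Read the (keyword, url) pairs back from the linkbase XML."""
    root = ET.fromstring(xml_text)
    return [
        (link.text, link.get("{%s}href" % XLINK_NS))
        for link in root.findall("link")
    ]
```

Because the file is ordinary XML, it could equally be edited by hand, filtered with a stylesheet or generated from another database before being passed to the interpolation tools.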