University of Southampton
A Mobile Agent Architecture for
Distributed Information Management
A thesis submitted for the degree of
Doctor of Philosophy
Faculty of Engineering and Applied Science
Department of Electronics and Computer Science
Large-scale networked environments, such as the Internet, possess the characteristics of distributed data, distributed access and distributed control; this gives the user a powerful mechanism for building and integrating large repositories of distributed information from diverse resource sets. However, few support tools have been developed to allow the user to take advantage of the distributed nature of their information.
Distributed information management is the process by which users can create, disseminate, discover and manage information that is spread across distributed resources. Distributed open hypermedia systems have shown how distributed information, such as documents and hypermedia links, can be managed and handled within an environment that integrates smoothly between the user's desktop and the network. However, such systems are now looking at addressing the problem of interoperability across hypermedia systems, so that documents and links can be shared between users on heterogeneous integrating technologies.
This thesis proposes that the distributed information management provided by open hypermedia systems needs to be extended so that it is more interoperable, extensible and pervasive and that this can be achieved by integrating the principles of open hypermedia with the technology of mobile agents. Mobile agents present a new development mechanism for designing and building distributed applications which are well suited to the dynamic environment of large-scale networks.
This thesis describes the development of a mobile agent architecture within which distributed information management tasks can be built and executed. Mobile agents present an important abstraction mechanism when designing distributed environments and also allow the user to manage distributed information indirectly through their mobile agents. A number of prototype agents are described that have been developed to illustrate distributed information management tasks within the architecture and to show how abstraction and indirect management can be achieved.
Be courteous to all, but intimate with few, and let those few be well tried before you give them your confidence. True friendship is a plant of slow growth, and must undergo and withstand the shocks of adversity before it is entitled to the appellation
-- George Washington
First thanks go to my supervisor, David DeRoure, for his help with writing papers, attending conferences, resolving technical matters and developing the thesis during the course of my PhD. I would also like to thank Wendy Hall for allowing me to work within the Multimedia Research Group and Paul Lewis for being my internal supervisor and a fast proof-reader!
Thanks also have to go to my long-standing (and possibly long-suffering!) friend Stuart Goose--cheers for the encouragement, friendship, travel itinerary and beers! I would like to voice my appreciation to my other friends and colleagues in the lab, both past and present, who have helped me along the way: Emilia Mendes, Mark Weal, Danius Michaelides, Jessie Hey, Luc Moreau, Nick Beitner, Joseph Kuan, Gareth Hughes, Gary Hill and Ian Heath. And who could forget our wonderful secretaries, Karen Smith and Julie Smith?
I would like to take this opportunity to express my inexpressible gratitude to Frank McCabe for his enthusiastic support, tireless effort and constant patience when dealing with my bugs, queries and general ramblings. I hope that technical support for APRIL and DialoX does not run out now that I am not a student any more, Frank!
Concluding thanks have to go to my close friends and family for providing support and weekends of light relief. It is Ian McWilton, Paul Tomlinson and Matthew Nutall chiefly whom I have to thank for helping to spend my grant--where did it all go? And to my meticulous proof reader, Lesley Dale, many thanks for the hours of eye-strain and green-pen ink it must have cost you! And finally, I would like to dedicate this thesis to my family for their love and encouragement over the years and to their belief (however misguided) that the eternal student might, at last, take up the unmentionable four letter word.
The White Rabbit put on his spectacles. `Where shall I begin, please your Majesty?' he asked. `Begin at the beginning,' the King said gravely, `and go on till you come to the end: then stop.'
-- Lewis Carroll
Alice's Adventures in Wonderland
There is a contemporary style of science fiction writing which has evolved over the past ten years that is known as cyberpunk. This form of writing differs from normal science fiction in two respects. Firstly, it is directly related to the cultural concepts that have developed during the computer revolution of the late 1980s and early 1990s. And secondly, it makes a fundamental distinction between information and technology. In cyberpunk novels, electronic information is the motivating and dominating factor in the world; technology is assumed and is merely an enabling tool.
Cyberpunk writers such as William Gibson and Neal Stephenson have used differing terms to describe what is essentially the same concept. What Gibson calls `Cyberspace' in Neuromancer (Gibson, 1984) and what Stephenson calls the `Metaverse' in Snow Crash (Stephenson, 1992) superficially characterise an electronic environment that is accessed through immersive virtual reality. However, both of these terms have much deeper meanings and more wide-ranging implications: they are both used to represent the entire sum of human information that is available electronically. This information is accessible to users through powerful artificial intelligence software that can search and cross-reference multimedia information very quickly. Indeed, Stephenson devotes a number of chapters to dialogue between a user and an AI program known as the `Librarian'. This user has an interactive, real-time and natural language discussion with the Librarian, who collects, correlates and presents information to the user at any level of detail. The information that is available to the Librarian, and hence the user, is historical (through archives), contemporary (through current news media input) and real-time (through a global network of satellites and sensors). Whilst the realisation of such an advanced system is not dealt with at a technological level in any detail, it is easy to see how the atomic elements of computer science today (for example, artificial intelligence, information retrieval and high-bandwidth networking) have been interwoven to construct a fictitious, but potentially powerful and useful, electronic information environment.
The current electronic information age is starting to experience a radical change with the development of its own global multimedia and hypermedia information space. The increase during the past five years of inter-networked environments, such as the Internet and corporate and institutional intranets, has led to more electronic information becoming available to people than ever before--this is set to progress rapidly with the increasing penetration of networks into homes. Information that was once primarily text-based has quickly branched into the areas of images, sound and video clips, and audio and video streaming is also being deployed.
Yet networks of information present the electronic information community with a fundamental problem: how is information to be created, published, organised, managed and navigated on a global scale? Information management has only been considered previously on relatively small scales and has dealt primarily with text- and image-based media, for example, libraries, encyclopedias, thesauri and dictionaries--for the most part, these have had to be painstakingly compiled and cross-referenced by hand.
The increase in awareness of electronic information has led to the introduction of a wide range of services to help support the volume of information that is becoming accessible. Typically, electronic information is segregated according to its communication protocol: the mechanism by which it can be accessed. This approach has been criticised (van Dam, 1988) because it can lead to islands of information being created--effectively clusters of information that are kept apart by their different communication protocols and internal data representations. Hypermedia systems have begun to evolve over the last ten years to help bridge the gaps between these islands; they are ideally placed as an integrating technology that can mediate amongst the different information systems to provide hypermedia link and information management functionality (Halasz, 1988).
... they demand the user disown his or her present computing environment to use the functions of the hypermedia system.
In effect, if hypermedia systems intrude into the working practices of the user and are not extensible in their inherent functionality, then they cannot provide a true integration environment within which a global information space can exist. As Meyrowitz goes on to say:
Only when the paradigm is positioned as an integrating factor for all third-party applications, and not as a special attribute of a limited few, will knowledge workers accept and integrate hypermedia into their daily work process.
This problem of integration is further exacerbated when considering the vast quantities of information that are available through these islands. The problems of information overload and management were identified by Vannevar Bush (Bush, 1945), a scientific advisor to President Roosevelt, with regard to his Memex system: a mechanical machine that would allow users to build and share personal repositories of information. However, the advent of digital computers to hold this type of information has since offered possibilities far beyond those initially predicted by Bush.
As distributed information systems grow, there will be a greater need for powerful tools to help users manage and navigate through the information that is available. Due to the heterogeneity associated with such systems, these tools will have to possess certain characteristics to assist them in dealing with and managing distributed information. These include integration to communicate with disparate information systems, extensibility to deal with new systems and dynamism to react to unpredictable changes in the information and in the environment.
Agents are now being regarded as an important technology to help solve the problems of information overload and management (Maes, 1997). They differ from other more conventional technologies in the way that they allow systems to be envisaged and ultimately constructed. Their key tenets are abstraction, interoperability and indirect management and they represent the next evolution in integration since they can interoperate at many different levels: with the user at the desktop, with information and legacy systems in both local and distributed manners, and with other agent-based systems.
The characteristic of mobility in agents allows the metaphor to address aspects of distribution, especially those which involve large networks of machines that are spread over wide areas. This technology, coupled with the management facilities traditionally attributed to hypermedia systems, can allow agents to undertake tasks on behalf of their user; tasks that can be carried out in the background without explicit user intervention. It is the amalgam of these composite technologies that can serve to help build the next generation of management tools that integrate with existing and future information systems to assist the user in the management of their distributed information.
In a widely distributed environment there are many different information resources available to users. Information that is stored within these resources is represented and accessed through heterogeneous systems comprising different data formats and different communication protocols. Traditionally, open hypermedia systems have been used to provide hypermedia functionality between information contained in such resources; they have become a de facto integrating technology because of their central position relative to the requisite information. Such systems have provided tools to manage information objects within the hypermedia environment, such as documents and hypermedia links, and have subsequently become distributed information management systems.
However, a limitation of current open hypermedia systems is their inability to integrate easily with other hypermedia systems--for example, hypermedia link information is not readily interchangeable between most heterogeneous hypermedia systems. If users are to be able to possess a truly integrated information environment that is composed of manageable distributed information resources, then open hypermedia systems need to become an information resource to be exploited in their own right.
This work describes the design and development of an architecture that is based around the metaphor of mobile agents and illustrates how this can be used to provide agents that can help manage the distributed information of a user. The aim is to produce a prototype system in which users can indirectly manage the typical objects found within a hypermedia environment. By separating the issues of communication between hypermedia components from actual hypermedia functionality, it is believed that hypermedia links can become a first-class entity within a distributed information environment.
Furthermore, this research presents a study of mobility within this context and details a working architectural design which uses mobility. It is believed that mobility presents an important abstraction and benefit when dealing with the management of information which is distributed over wide areas. The use of a simple inter-agent communication language allows agents within the architecture to communicate in a consistent and flexible manner.
Consequently, the originality of the work presented within this thesis lies in two main areas. Firstly, the use of mobile agents as a metaphor for the management of distributed information is novel, since most contemporary solutions prefer static, client-server based models of interaction. The second contribution relates to the synthesis of agency, mobility and open hypermedia into a cohesive architecture, since it is proposed that these three technologies are ideally suited to delivering distributed information management to the user.
The need for hypermedia linking between documents, the development of hypermedia systems and their role in providing an integrating information service is presented in chapter 2. It also highlights the need for open hypermedia systems to move towards distribution, and details the main tasks of distributed information management.
Chapter 3 provides a literature review on agents and tackles the question of what actually comprises and constitutes an `agent'. To this end, three different views on agent theories, technologies and applications are considered: agents as a new programming paradigm, the notions and characteristics of agency and finally, a taxonomy of agent applications.
A particular characteristic of agents is focused upon in chapter 4, namely that of mobility. The technology of mobile agents is presented as a progression of existing processing and communication models, such as client-server and subprogramming, and the advantages of this new approach are summarised. Next, the general characteristics of mobile agent systems are discussed and a survey of current mobile agent systems is detailed. The chapter ends with a comparison of these mobile agent systems, contrasted in terms of the characteristics identified earlier.
Chapter 5 presents the design and implementation of an architecture to support mobile agents; this architecture bridges the gap between mobile agent systems and the intended distributed information management application. A functional description for each core agent that populates the architecture is given, along with their relationships and interactions. The second half of the chapter describes each of the individual services that are available to agents within the architecture, such as standard messaging and migration policies. These services are described within the context of the layered model of the architecture.
The actual implementation of the agents within the architecture is detailed in chapter 6. An environment is described which is based upon integrating various distributed information resources and upon allowing mobile agents to undertake distributed information management tasks to achieve their objectives. A layer of abstraction between users and agents and resources is described as a set of primitives which provide a basic level of hypermedia functionality. Then, the composition of these agents is described, concentrating on their behaviour and execution environment. The chapter ends with a description of the prototype system that has been developed and the sample agents which can perform distributed information management tasks on behalf of their user.
Finally, chapter 7 describes some potential areas for future work that have arisen from the research and development described in this thesis. It also presents a summary of the work carried out and the conclusions that have been drawn.
The cross-referencing and annotation of material can be traced at least as far back as the writings of Aristotle (384-322 BC), who constantly referenced the works of related authors. Another important example is the Talmud, the primary source of Jewish religious law, which was finished around the year 499 AD. It was marked with innumerable commentaries and annotations, the most important of which was by the 11th century scholar Rashi, who was considered to be one of the greatest authorities on Jewish law. These types of physical documents illustrate that associations, or links, could be used to provide information which is of a supplementary nature, that is, something related to the flow of the text which may assist in understanding or in providing additional information. Hypertext, a term first used by Theodore Nelson in the 1960s, is used to indicate links between documents of a textual nature and hypermedia refers to linking between documents of different or multiple media types. The term hypermedia will be adopted throughout this thesis to indicate linking between both textual and non-textual media and the term document refers to a document of any given media type, single or composite, including continuous media.
Since the introduction of digital computers and electronic information, hypermedia linking has given users a new ability above and beyond just annotation and cross-referencing: the capacity to browse through multimedia information in a non-sequential manner by following links at arbitrary points. As Nelson (Nelson, 1987) summarised:
Text, graphics, audio and video can now come alive in unified, responding, explorable new works that present facts and ideas; hypermedia. Unlimited new forms of connection and branching now offer the chance to explore ideas--to follow different lines of thought, different forms of exposition, different connections in a subject, in ways never before possible. The sequential writings and media of the past have given us only the dimmest precedents.
This chapter provides a brief overview of the approaches to providing hypermedia functionality, such as link creation and link following, in computer systems and makes a firm distinction between an open and closed perspective. It becomes apparent that whilst early hypermedia systems concentrated on providing hypermedia functionality within the desktop on a single or locally networked set of machines, contemporary solutions have been preoccupied with developing hypermedia functionality across large or globally distributed networks. More recently still, the term distributed information management has been used to describe the management of any form of information (media, links, documents and so on) in federated systems. It is proposed that the principles of open hypermedia can be applied to distributed information management as a metaphor for developing systems which can be interoperable in terms of their hypermedia linking and document management strategies.
Early hypermedia systems were monolithic applications that provided an entire environment in which hypermedia material could be created and viewed. For a review of these systems and a general introduction to hypermedia, the reader is referred to (Conklin, 1987). These hypermedia systems originally possessed a number of limitations in their approach to providing hypermedia functionality (Fountain et al., 1990):
Proprietary document formats into which a document had to be converted before it could be used by the system. This makes the document unreadable and unusable by the original application that created it.
Embedded mark-up where anchors (the start and end points of a link) are directly inserted into the document. This makes the basic link model difficult to alter in the future since each anchor in every document must be changed, and links can only be followed from documents in this format.
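The second limitation can be made concrete with a minimal sketch of the alternative, in which anchors are held outside the document as offsets into its unchanged content. All names here (Anchor, Link, anchor_text, the sample documents) are hypothetical and chosen for illustration only; they are not the data model of any system cited in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """One end point of a link, stored outside the document it points into."""
    document: str  # hypothetical document identifier
    offset: int    # character offset at which the anchor starts
    length: int    # number of characters the anchor covers


@dataclass(frozen=True)
class Link:
    source: Anchor
    destination: Anchor


def anchor_text(documents: dict[str, str], anchor: Anchor) -> str:
    """Resolve an anchor against the unmodified document content."""
    text = documents[anchor.document]
    return text[anchor.offset:anchor.offset + anchor.length]


# The documents stay in their native form; no mark-up is inserted into them.
documents = {
    "intro.txt": "Hypermedia links connect documents.",
    "glossary.txt": "Link: an association between two anchors.",
}

# The link data lives in a separate structure and can be changed on its own.
linkbase = [
    Link(Anchor("intro.txt", 11, 5), Anchor("glossary.txt", 0, 4)),
]
```

Because the documents are never rewritten, the application that created them can still read them, and a change to the link model only affects the separate link data. The cost, not handled in this sketch, is that offsets can become stale when a document is edited.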
Van Dam summarised the position of these first hypermedia systems and criticised their deficiency in integrating with other systems and applications (van Dam, 1988):
... right now we are building docu-islands; none of our systems talk to each other, they are wholly incompatible. So we are all working towards the same agenda, more or less, but we can't exchange stuff; there is no exchange format, there is no universality, and furthermore, our systems are closed systems ...
To help address these issues and others that were arising as a result of research into hypermedia system design (such as standards, separate link services and provision for collaborative working) the concept of open hypermedia was developed. Davis defines an open hypermedia system to be (Davis, 1995):
... a hypermedia system where the term open implies the possibility of importing new objects into the system.
Halasz reinforces the need for open hypermedia systems which exhibit the characteristics of modularity and extensibility:
It's very clear that the hypermedia systems of the 1980s, the monolithic hypermedia system, things like NoteCards and even Intermedia, is no longer viable. They're going to be replaced by open hypermedia systems in which you have independent but communicating components which, taken together, produce hypermedia functionality.
Presentation services (application layer) which provide presentation of hypermedia linked documents. This facet should render hypermedia linked documents in the most suitable desktop application and should also provide a consistent interface across desktop applications to access the underlying hypermedia and document functionality.
Hypermedia services (link service layer) which provide a hypermedia link service between documents. This facet should deliver a given level of hypermedia functionality that can be expressed in the presentation service and can offer linking facilities between documents that are contained within the document management service.
Document management services (storage layer) which provide for the management of documents. This facet should be concerned with abstracting both the underlying storage mechanism and the individual access mechanisms.
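The separation between the three facets can be sketched as three small components that only know each other through narrow interfaces. This is an illustrative sketch under stated assumptions: the class and method names (DocumentStore, LinkService, Viewer, fetch, links_from, render) are invented here and do not correspond to the interface of any particular open hypermedia system.

```python
from typing import Protocol


class DocumentStore(Protocol):
    """Storage layer: hides where and how documents are kept."""
    def fetch(self, doc_id: str) -> str: ...


class LinkService(Protocol):
    """Link service layer: answers link queries, knows nothing of rendering."""
    def links_from(self, doc_id: str) -> list[str]: ...


class InMemoryStore:
    """Hypothetical storage implementation backed by a dictionary."""
    def __init__(self, docs: dict[str, str]) -> None:
        self._docs = dict(docs)

    def fetch(self, doc_id: str) -> str:
        return self._docs[doc_id]


class InMemoryLinkService:
    """Hypothetical link service holding links apart from the documents."""
    def __init__(self, links: dict[str, list[str]]) -> None:
        self._links = {src: list(dsts) for src, dsts in links.items()}

    def links_from(self, doc_id: str) -> list[str]:
        return self._links.get(doc_id, [])


class Viewer:
    """Presentation layer: combines the two lower services for the user."""
    def __init__(self, store: DocumentStore, link_service: LinkService) -> None:
        self._store = store
        self._link_service = link_service

    def render(self, doc_id: str) -> str:
        body = self._store.fetch(doc_id)
        targets = self._link_service.links_from(doc_id)
        footer = ", ".join(targets) if targets else "none"
        return f"{body}\n[links: {footer}]"
```

Since the viewer depends only on the two interfaces, a different store or link service could in principle be substituted without changing the presentation code, which is the interchangeability that the layered view aims for.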
Although each of the facets is conceptually distinct, closed hypermedia systems (and some early open hypermedia systems, too) have traditionally combined them into a single application. The purpose of contemporary open hypermedia is to keep these services separate and modular but still allow them to communicate in such a way that their full functionality is realised. It is hoped that eventually the components will be so well understood and specified that different solutions can be interchanged and employed at each respective service level. For these reasons, researchers have proposed open hypermedia as an integrating technology at the desktop, as Malcolm et al. note (Malcolm et al., 1991):
Hypermedia technology can be used to provide access to and to manage the applications used to create the data. Although hypermedia is often thought of as a technology to deliver information, its use can be greatly expanded if it is perceived as an integrating technology.
The following subsections trace the development of open hypermedia by looking at a small but notable cross-section of systems, from desktop through to networked open hypermedia. Here we focus on aspects relevant to the work in this thesis, primarily the models and architectures of these information systems. A discourse on the openness of hypermedia systems, standards and implementations can be found in (Davis et al., 1992; Goose, 1997) and a review and comparison of distributed open hypermedia systems is given in (Hill, 1994; Goose, 1997).
Within the research and development of open hypermedia systems, there are a number of different approaches which have been taken in trying to deliver hypermedia functionality (Goose, 1997):
Scripting languages were provided by the early monolithic hypermedia systems to allow the functionality of the system to be extended and generic behaviour to be customised. Scripting language support in an open hypermedia system is useful, since if scripts can be attached to links then the default action of following a link can be altered. Example systems which exhibit scripting languages and make use of their advantages are Multicard (Rizk and Sauter, 1992) and HyperDisco (Wiil and Leggett, 1993).
Toolkits supply a library and an application programming interface (API) which allows the functionality of the hypermedia system to be accessed. This means that a given application can be tightly integrated into the hypermedia environment, but assumes that either the application code can be accessed or it can communicate through a wrapper layer. Toolkit systems include the Hypertext Abstract Machine (Campbell and Goodman, 1988) and the Eggs/HOT toolkit (Puttress and Guimaraes, 1990).
Link services do not need the tight integration required by toolkits, but allow any application to access the hypermedia functionality by communicating through a given protocol. Since the link service sits between the document management service and the presentation service, it provides a flexible and abstract mechanism for delivering hypermedia functionality. Example link services are Sun's Link Service (Pearl, 1991), Microcosm (Fountain et al., 1990; Hall et al., 1993), Intermedia (Yankelovich et al., 1988) and Multicard (Rizk and Sauter, 1992).
Hyperbases provide extended hypermedia functionality within a database environment by storing not only multimedia data but also hypermedia data. Additionally, they offer solutions which have been traditionally associated with database systems, such as controlled shared access, backup and recovery, and so on. Example hyperbase systems are the DeVise Hypermedia system (Grønbæk and Trigg, 1994) and HyperDisco (Wiil and Leggett, 1993).
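Of the four approaches listed above, the scripting one is perhaps the easiest to illustrate in a few lines: a link optionally carries a script, and following the link runs that script in place of the default action. The sketch below is an assumption-laden illustration; ScriptedLink, follow and open_read_only are hypothetical names and are not taken from Multicard or HyperDisco.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScriptedLink:
    source: str
    destination: str
    # Optional script that replaces the default link-following behaviour.
    script: Optional[Callable[["ScriptedLink"], str]] = None


def follow(link: ScriptedLink) -> str:
    """Follow a link, letting an attached script override the default action."""
    if link.script is not None:
        return link.script(link)
    return f"open {link.destination}"


def open_read_only(link: ScriptedLink) -> str:
    """Example script: open the destination without allowing edits."""
    return f"open {link.destination} read-only"
```

For example, follow(ScriptedLink("a", "b")) takes the default action, whereas follow(ScriptedLink("a", "b", open_read_only)) takes the customised one without any change to the follow function itself.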
Each of the hypermedia systems surveyed in the next subsections exhibits some of the aspects that are central to the principles of open hypermedia identified by Davis earlier. Due to the advent of large-scale networks such as the Internet, one of the new characteristics being embodied by open hypermedia systems is distribution. The nature of this distribution can take three forms (Sloman and Kramer, 1987): data (documents and links), control (processing) and hardware (networks of machines). For this reason, the survey is further categorised by open hypermedia systems which feature no inherent distribution and open hypermedia systems which attempt to offer global distribution.
Intermedia (Yankelovich et al., 1988) is an open hypermedia system that has been developed at the Institute for Research in Information and Scholarship (IRIS) within Brown University in the USA. The aim of the project was originally to provide support for building and presenting teaching material electronically, but it was recognised that the system had potential for application in other areas.
The Intermedia architecture is based around a link service (figure 2.2), where each viewer at the presentation layer is a purpose-built application that has been augmented to access the hypermedia functionality of the link server; this provides for seamless integration, but makes future extensibility difficult. Such viewers include a word processor, a graphic drawing package and a video viewer. One reason for building fully integrated applications is that every aspect of their functionality and execution can be monitored and controlled by the hypermedia system. For example, to make linking as easy as the `cut-and-paste' metaphor, each Intermedia application supports hypermedia operations which can be performed uniformly on links and anchors as well as selections.
In line with a requirement of open hypermedia, link anchor information is held separately in a link database. However, to help maintain the consistency between document data and link data, anchor information is also stored within the documents. Intermedia webs allow sets of related links to be grouped, although only one web can be applied to a collection of documents at a time. Nevertheless, Intermedia webs allow users to view documents in different contexts and through different perspectives.
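The idea of a web as a named set of links, with a single web in force at any one time, can be sketched as follows. This is only an illustration of the concept as described above: LinkDatabase and its methods are hypothetical names, the sketch is not based on Intermedia's actual implementation, and it omits the duplicate anchor information that Intermedia also keeps inside documents.

```python
class LinkDatabase:
    """Hypothetical link store that groups links into named webs."""

    def __init__(self):
        self._webs = {}      # web name -> list of (source, destination) pairs
        self._active = None  # name of the single web currently applied

    def add_link(self, web, source, destination):
        """Record a link as belonging to the named web."""
        self._webs.setdefault(web, []).append((source, destination))

    def open_web(self, web):
        """Apply one web; any previously applied web is replaced."""
        if web not in self._webs:
            raise KeyError(web)
        self._active = web

    def links_from(self, source):
        """Return destinations reachable from source in the active web only."""
        if self._active is None:
            return []
        return [dst for src, dst in self._webs[self._active] if src == source]
```

Opening a different web changes which links are visible from the same document, which is how the same collection of documents can be viewed in different contexts.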
The communication between Intermedia viewers and the link server is through the sockets inter-process communication mechanism which allows for distribution of the viewers in a local configuration. However, each client can only access one link server and there is no provision for link servers to communicate with each other.
The strength of Intermedia lies in the view of hypermedia that it offers rather than in the way it has been implemented. It has shown the level of integration with the desktop that is required if hypermedia as a technology is ever to become part of the standard working practices of a user.
The DeVise Hypermedia (DHM) framework (Grønbæk and Trigg, 1994) is a project being developed at Aarhus University in Denmark around an object-oriented approach to hypermedia based upon the Dexter Hypertext Reference Model (Halasz and Schwartz, 1994). The original aim of the project was to produce general tools to support system development and cooperative design, especially in large-scale engineering projects.
The Dexter Hypertext Reference Model, commonly abbreviated to just Dexter, was an attempt to design a standardised approach to hypermedia terminology and a formal model of common abstractions found within contemporary hypermedia systems. The development of such a standard was intended to provide the basis for comparison between hypermedia systems and for interoperability and hypermedia information interchange. The model contains a number of layers (figure 2.3):
The within component layer represents the internal content of a component or link. Dexter does not attempt to model the structure of a component and thus no distinction is made between different types of components (such as text and graphics). This leaves the contents of a component open-ended but can be considered a weakness, since no provision is made to deal with structured (composite) documents other than treating them as a whole entity.
The presentation specifications layer manages presentation specifications which describe how a component should be presented to the user. The form of this presentation can be a function of the hypermedia system, a property of the component itself or the path taken to reach the component.
The anchoring layer provides access to components at the within component layer through anchors at the storage layer. Anchoring is a mechanism that allows links to apply to a selection within a component whilst maintaining a separation between the respective layers.
DHM is based around a client-server model of interaction in which hyperbase hypermedia functionality is provided by multiple database servers. A multi-user object-oriented database is used to provide cooperative support when working with shared hypermedia structures, such as locking and event-notification. The development team at Aarhus are currently investigating the potential of scaling DHM to work with large distributed systems, such as the Internet.
During the development of prototype DHM applications, the team at Aarhus noted a number of areas in which Dexter proved insufficient and required modifications to extend its functionality. An example (Grønbæk and Trigg, 1996) is the introduction of location specifiers (locspecs) and reference specifiers (refspecs) to support World Wide Web embedded links and Microcosm generic links, both of which are discussed later in this chapter.
The extensions proposed by DHM to Dexter make for a rich and flexible linking model. However, because the DHM architecture is represented by an object-oriented class hierarchy which must be compiled into a single module, DHM suffers considerably in its dynamic extensibility. Research into Dexter (and other standardised reference models) is ongoing and is an important area for increasing interoperability between open hypermedia systems.
The Microcosm open hypermedia system (Fountain et al., 1990; Hall et al., 1993) is a collaborative research and development effort by the Multimedia Research Group within the University of Southampton in the UK. Microcosm is both a model for open hypermedia and an implemented system that began in 1988 and has gone through a number of revisions to incorporate new research.
The architecture of Microcosm (figure 2.4) is based around a concept of loosely coupled and independent processes (called viewers and filters) which cooperate to provide a presentation service and a hypermedia link service. These processes communicate through an underlying inter-process communication mechanism of the operating system, for example, Dynamic Data Exchange (DDE) or sockets.
External link databases. The separation of links from documents into an external database which is managed by a filter, called a linkbase. This approach is desirable for a number of reasons. Firstly, users can store different sets of links on the same sets of documents in separate link databases, thus permitting alternate views on and through documents. Secondly, links are dynamically embedded into documents as they are rendered which allows links to be made to and from media on CDROM. Thirdly, link maintenance is made easier because all links are held in central locations, that is, the link databases.
Transparent desktop integration. The provision for allowing hypermedia functionality to permeate all aspects of the desktop environment, but not intrude on existing applications, for example, by embedding link mark-up or by imposing proprietary document formats. Microcosm provides three levels of integration with the desktop: fully-aware viewers (which are purpose-written and converse fully in the Microcosm protocol), partially-aware viewers (which can be augmented to be conversant in the Microcosm protocol) and unaware viewers (which exist outside of the Microcosm framework but can communicate through mechanisms provided by the graphical user interface, such as the clipboard). The Universal Viewer (Davis et al., 1994) acts as a mediator between Microcosm and an unaware viewer, monitoring the clipboard for activity that indicates a hypermedia action.
Reduced link creation effort. It has been recognised that for very large hypermedia structures the effort of creating individual links can be very time-consuming. A link in a Microcosm link database consists of a bidirectional association between a source anchor and a destination anchor. If the source anchor does not specify a particular document then it is considered to be generic; a generic link is created once for a particular source anchor selection and subsequently applies to all further instances of that selection. For example, a generic link created on the text string 'open hypermedia' would apply to all occurrences of that term in every document, which helps to reduce the effort of creating large numbers of repetitive links.
Modular component architecture. The filter model used by Microcosm (called the filter chain which is managed by the Filter Management System) allows processes to be inserted or removed dynamically, thus augmenting the available hypermedia functionality. For example, if a new linkbase filter is inserted then potentially more links could apply to a document. This allows users to configure Microcosm to their changing requirements. Other filters include a computed linker filter (which retrieves documents or links according to a full-text search) and an available links filter (which displays the results of following a link where more than one link result can apply).
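The external link database and generic link features listed above can be illustrated with a minimal sketch. The names Link, Linkbase and links_for are hypothetical and are not taken from the Microcosm implementation; anchors are modelled simply as text selections, and a link whose source document is left unspecified is treated as generic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    selection: str                     # source anchor text
    destination: str                   # destination document identifier
    source_doc: Optional[str] = None   # None marks the link as generic


class Linkbase:
    """Hypothetical external link database, held apart from the documents."""

    def __init__(self) -> None:
        self._links: List[Link] = []

    def add(self, link: Link) -> None:
        self._links.append(link)

    def links_for(self, doc_id: str, text: str) -> List[Link]:
        """Return the links whose source anchor applies to this document."""
        found = []
        for link in self._links:
            # A generic link (no source document) applies to every document.
            applies_here = link.source_doc in (None, doc_id)
            if applies_here and link.selection in text:
                found.append(link)
        return found


linkbase = Linkbase()
linkbase.add(Link("open hypermedia", "glossary.txt"))              # generic
linkbase.add(Link("Intermedia", "intermedia.txt", "survey.txt"))   # specific

survey = "Intermedia predates most open hypermedia systems."
notes = "Notes on open hypermedia and Intermedia."

survey_links = linkbase.links_for("survey.txt", survey)
notes_links = linkbase.links_for("notes.txt", notes)
```

In this sketch the generic link applies to both documents, whereas the specific link applies only to the document named as its source. Adding a second Linkbase instance would correspond loosely to inserting a further linkbase filter.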
Microcosm provides a contextual grouping mechanism called an application, which allows related sets of links and documents to be marshalled according to a given context. However, like webs in Intermedia, only one application can be active at a time, which means that links cannot span applications.
The Microcosm model has been used in a number of diverse situations to provide dynamic and open hypermedia functionality, such as modular teaching courses for MSc Physics students (Bacon and Swithenby, 1996) and the maintenance of large and complex engineering equipment (Crowder and Hall, 1992). The strengths of Microcosm lie in its configurable nature and the way that the presentation and structuring of information can be dynamically altered by changing this configuration.
Although Microcosm in its current implementation is not inherently distributed, research has been undertaken in extending the existing model to support distribution (Hill, 1994) and in redesigning the underlying communication model to be inherently distributed (see the discussion of Microcosm TNG later in this chapter).
One of the requirements identified by Davis for true openness in a hypermedia system is that the design of the hypermedia system should be able to exist on multiple and distributed platforms. Another way of interpreting this is to say that the open hypermedia system should not prevent distribution inherently in its design.
It is clear that systems which are monolithic in nature, such as the early hypermedia systems and some mentioned in subsection 2.2.1, do not lend themselves easily to large-scale distribution. The growth of the Internet and the large amount of electronic information now available have shown that hypermedia systems must integrate not only with the desktop, but also with the distributed and heterogeneous environment of a large network.
It is only relatively recently that distribution of this type has been considered for hypermedia, mainly because of the explosion of the World Wide Web as a basic information publication and hypermedia linking system. Distributed information dissemination and resource discovery mechanisms are not discussed here, since they do not explicitly deal with hypermedia linked information, but for a review, comparison and taxonomy of such systems, the reader is referred to (Schwartz et al., 1992).
On the one hand, it can be argued that the World Wide Web (WWW) (Berners-Lee et al., 1992) is not an open hypermedia system in the sense alluded to earlier in this chapter. On the other hand, it is an influential system, it is currently in widespread use and parts of its design can be considered open.
The development of the WWW began at the high-energy physics research institute CERN in Geneva, Switzerland, where it was used as an internal information dissemination mechanism by the physics and engineering community. However, it has since been adopted as a de facto standard for the distribution of electronic information and basic hypermedia linking between documents within the Internet.
The WWW architecture (figure 2.5) is based around clients (the subset that are used to view information are called browsers) connected to multiple servers within a heterogeneous networked environment. Information is transferred between a client and a server through the Hypertext Transfer Protocol (HTTP) and documents are encoded using a proprietary format called the Hypertext Mark-up Language (HTML); both of these have become Internet standards. HTML is a presentation-independent specification language that is used to describe the layout of media components within a document and also the position of links through embedded anchors.
A code extension which can be dynamically integrated with the browser to handle the media type, such as plugins for the Netscape Navigator. The browser can then handle the media according to the functionality provided by the plugin. However, plugin technology is both browser and platform specific.
An external helper application that is launched by the browser when it cannot handle the media type internally or through a plugin. Usually, integration with the helper application is minimal and it can only be used as a viewer for the media type.
A platform- and browser-neutral code extension such as a Java applet, which handles the media type within the environment of the browser; the browser supports the execution of the byte-code of the applet and a programming interface to its internal functionality.
To accommodate all of these extra features, browsers such as the Netscape Navigator and the Microsoft Internet Explorer have become large and monolithic. The Sun HotJava browser is an example of modularity and dynamic extensibility with regard to the way in which new media types are handled. The HotJava browser understands no media type inherently, but requests byte-code from the local Java subsystem when a new one is encountered. In this way, it is both smaller and easier to update dynamically.
Despite the fact that the WWW has become hugely successful in terms of its user base, it contains a number of inherent problems which both limit its potential and also violate open hypermedia principles. The use of HTML as both a document layout and a hypermedia link format means that not only do all documents have to be converted into HTML, but also that links must be embedded within the documents. Moreover, due to such a simplistic unidirectional linking mechanism there is no management of links within the WWW; if a document is deleted or moved, it is impossible to determine the documents which referred to it and to update them.
The WWW is also restricted by its client-server form of communication, since there is no mechanism for servers to communicate with each other. An object within the WWW (such as a document, a server, a link and so on) is referenced by a Uniform Resource Locator (URL) which comprises a scheme (typically the Internet protocol to be used), the Internet address of the host and a path specification leading to the object on the particular host. The advantage of URLs is that they are easy to use and pass between users, but they are extremely susceptible to change due to the fact that they are heavily location-dependent.
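The three parts of a URL described above can be seen by splitting one with the Python standard library. This is only an illustration: the URL shown is a made-up example and does not refer to a real document.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used purely for illustration.
url = "http://www.example.org/reports/thesis.html"

parts = urlsplit(url)
scheme = parts.scheme    # the protocol to be used
host = parts.hostname    # the Internet address of the host
path = parts.path        # the path to the object on that host

# Moving the document to another host or directory produces a different
# URL, which is why references held elsewhere silently become stale.
moved = "http://archive.example.org/old/thesis.html"
still_same_reference = (urlsplit(moved).hostname, urlsplit(moved).path) == (host, path)
```

Because both the host and the path are part of the reference itself, nothing in the URL survives a relocation of the object, which is the location dependence noted above.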
For all of its faults, the WWW has quickly risen as a dominant force within the increasing Internet community and research is ongoing to help alleviate some of its shortcomings. The Distributed Link Service (DLS) (Carr et al., 1995) is a research project aimed at adding Microcosm hypermedia functionality within a WWW environment. In one mode of operation, users specify a DLS-compliant server as their proxy and a request from their browser passes through the proxy and on to the intended WWW server. On the return journey the document data is temporarily arrested at the proxy, compared against a local link database and marked up dynamically with hypermedia links before being passed onward to the waiting browser. In this way, users can regain some control over their documents and links, since they can direct which link databases are active or inactive.
The DLS also supports a mode of operation where links are compiled statically into documents which allows entire sets of WWW pages to be generated from a given set of link databases and documents. Additionally, the use of the Universal Viewer (described earlier in the discussion of Microcosm) at the client side allows selections to be made and hence links to be followed from arbitrary desktop applications.
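The proxy mode of operation described above can be sketched as a single function that marks up a returning document against whichever link databases the user has made active. This is a minimal sketch and not the DLS implementation: the function name mark_up is hypothetical, each link database is assumed to be a simple mapping from a term to a destination URL, and the document is assumed to be plain text (a real proxy would have to avoid rewriting text inside existing HTML tags).

```python
import re
from typing import Dict, Iterable


def mark_up(document: str, active_linkbases: Iterable[Dict[str, str]]) -> str:
    """Wrap every term found in the active link databases in an HTML anchor."""
    merged: Dict[str, str] = {}
    for linkbase in active_linkbases:
        merged.update(linkbase)
    if not merged:
        return document  # no active link databases: pass the data through

    # Try longer terms first so that a long term wins over a shorter
    # term contained within it.
    terms = sorted(merged, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))

    def to_anchor(match: "re.Match[str]") -> str:
        term = match.group(0)
        return '<a href="%s">%s</a>' % (merged[term], term)

    return pattern.sub(to_anchor, document)


# Hypothetical link database and document.
glossary = {"Microcosm": "http://links.example.org/microcosm.html"}
page = "Microcosm is an open hypermedia system."

linked = mark_up(page, [glossary])
unlinked = mark_up(page, [])
```

Passing a different list of link databases to the same function corresponds to the user switching link databases on or off at the proxy.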
Hyper-G (Maurer and Tomek, 1990; Kappe et al., 1993) is a large-scale, distributed open hypermedia system that is being developed at the Technical University of Graz in Austria. The motivation for the project was to provide a general purpose university information system which could support a wide range and large number of users.
As can be seen in figure 2.6, the architecture of Hyper-G is based around a client-server model, where a number of heterogeneous clients can connect to any number of heterogeneous servers. The level of integration with other external information systems is high, like the WWW. However, unlike the WWW, links are stored separately through the link server aspect of a Hyper-G server, but documents are still required to be converted into HTML before they can be imported into Hyper-G's object-oriented persistent document server.
A Hyper-G client comprises a selection of purpose-built media viewers and editors, as well as a session manager. Before a viewer can access a given document, it must issue a request to the link server with the unique identifier of the document; the link server responds with access and anchor details. The access details can be used with the document server to retrieve the data of the document and the anchor information can be used to mark up links dynamically within the document.
A novel feature of the Hyper-G system is that links are made available in a variety of different languages, irrespective of the original language in which they were created. This is achieved by using translators which perform the conversion automatically.
The Hyper-G server design is limited to the three components shown in figure 2.6, and is therefore not inherently extensible and is difficult to customise to the requirements of a user. However, because links are stored separately, access control is managed by the server and server to server interaction is permitted. For these reasons, Hyper-G servers have been proposed as a better future prospect of serving information across the Internet than current WWW servers (Andrews et al., 1995).
As with the DLS project, Microcosm: The Next Generation (Microcosm TNG) (Goose et al., 1996; Goose, 1997) is a distributed open hypermedia system that is based around the principles of Microcosm and is being developed by the Multimedia Research Group within the University of Southampton. However, where the DLS is essentially a document-driven approach based within the WWW framework and environment, Microcosm TNG is a process-driven model in which hypermedia functionality is provided by a number of loosely coupled processes, but these processes can be spread across machines within a distributed environment.
To accommodate this new level of distribution, the underlying communication model of Microcosm had to be redesigned and extended (Hill, 1994; Goose, 1997). The original filter model was based around a sequential chain where messages entered at the head and passed through all of the filters along its journey; each filter had the opportunity to inspect the contents of the message and perhaps even alter or terminate it. In a distributed environment, a chain of this type is inefficient since a message may pass back and forth across the network to filters which may not even inspect its contents.
To reflect this, the central concept of Microcosm TNG is a message router (figure 2.7) where processes providing hypermedia functionality to the system encircle the router and register their interests with it. A process is only sent messages of the type for which it has registered, thus helping to reduce network communication. Viewers provide the presentation of hypermedia functionality to the user in a platform-independent manner (Dale, 1996). Since the Microcosm TNG philosophy of open hypermedia is based upon Microcosm, links are stored separately in link databases and documents can remain in their native format.
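The difference between the sequential filter chain and the message router can be sketched as follows. The class and method names (MessageRouter, register, dispatch) and the message types are hypothetical and are not taken from Microcosm TNG; the sketch only shows the idea that a process receives a message solely when it has registered an interest in that message type.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class MessageRouter:
    """Hypothetical router: processes register interests by message type."""

    def __init__(self) -> None:
        self._interests: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, message_type: str, handler: Handler) -> None:
        self._interests[message_type].append(handler)

    def dispatch(self, message: Message) -> int:
        """Deliver a message to interested processes; return how many got it."""
        handlers = self._interests.get(message["type"], [])
        for handler in handlers:
            handler(message)
        return len(handlers)


linkbase_seen: List[Message] = []
search_seen: List[Message] = []

router = MessageRouter()
router.register("follow-link", linkbase_seen.append)
router.register("compute-links", search_seen.append)

delivered = router.dispatch({"type": "follow-link", "selection": "open hypermedia"})
ignored = router.dispatch({"type": "show-document", "document": "notes.txt"})
```

In a sequential chain both messages would have visited every filter; here the follow-link message reaches only the process that registered for it, and the unregistered message type is delivered to no process at all.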
Unlike both the WWW and Hyper-G, with Microcosm TNG each user has a separate execution environment (called a user session) which can be tailored to their needs and requirements; processes can be invoked or terminated to augment the hypermedia functionality of the system. Users can share information and links between user sessions through mechanisms provided by the Microcosm TNG subsystem and a new abstraction called a hypermedia application (Goose et al., 1996). A hypermedia application is a hierarchical structure that is split into discrete partitions of documents and link sets. These partitions (themselves hypermedia applications) can be combined with other hypermedia applications to form larger structures. The individual components of any given hypermedia application can be composed from other hypermedia applications which may be distributed across a large-scale network. Microcosm TNG provides support for ensuring that messages are passed transparently between message routers of the composite hypermedia applications and not across the entire network.
The strengths of Microcosm TNG initially derive from the original Microcosm system. However, the introduction of distribution, true multiple user support and high-level information sharing has added a new perspective on open hypermedia system design. The architecture of Microcosm TNG is not as compact as the client-server solution used by the WWW and Hyper-G, since more infrastructure is required to provide the hypermedia functionality. The price of an increase in processes is offset somewhat by an increase in extensibility, modularity and configurability which client-server based solutions have difficulty in matching.
The future development of open hypermedia systems lies very much within the area of distribution, especially through integration with the WWW since its hypermedia linking model and management facilities are so poor. However, the direct integration of an open hypermedia system with the WWW to achieve distribution will probably yield a relatively quick solution in the short term which can offer more hypermedia functionality than the WWW currently provides. In the long term, users may become frustrated with the lack of functionality and extensibility.
Early hypermedia systems were preoccupied with integrating with the desktop to the exclusion of all other considerations. This led to closed systems which enforced harsh policies and were confined to the limitations imposed by system designers and software engineers of the time. More contemporary distributed hypermedia systems have been preoccupied with integrating with the network somewhat to the detriment of integration with the desktop. Yet, the Microcosm philosophy for hypermedia has shown that true openness and interoperability with both the desktop (Microcosm) and networked resources (Microcosm TNG) can be achieved through an extensible process-based model. The advantages of such an approach to distributed open hypermedia can be summarised as follows:
Personalised links since link information is stored separately from documents. This means that users can create link databases for sets of links according to their own preferences and that they can decide which sets of links should apply--they are not restricted to one view on a set of documents. Moreover, these link databases can be shared by users to create larger hypermedia structures. This is difficult to achieve with embedded link systems.
Customisable hypermedia functionality due to the fact that processes can be dynamically added and removed from the system. Also, since an open message passing protocol is advocated, users can write their own processes (filters) to augment the hypermedia functionality according to their own desires. This is difficult to achieve with monolithic hypermedia systems.
Modularity and extensibility because each aspect of the hypermedia functionality is handled by a different process. Different solutions can be employed to take over those tasks, for example, an off-the-shelf document management system could fulfil the role of the document management service. Again, this is difficult to achieve with monolithic hypermedia systems.
Scalability since each user is working within their own execution environment and the set of processes that comprise their hypermedia functionality can be spread across machines within that environment. In a client-server model, scalability in the large is difficult to achieve smoothly.
High-levels of integration with desktop applications, networked resources, protocols and data formats. User documents can express hypermedia functionality without being converted into a proprietary format demanded by the hypermedia system. This means that a user can create links in a document and still edit the document afterwards in the application which generated it. In embedded link systems and proprietary document format systems, this is very difficult to achieve.
Although open hypermedia systems have been called an integrating technology for the desktop and the network, one area that they have failed to address is integration and information exchange with other open hypermedia systems. The problem here is not just a question of different protocols and data formats, but also one of differing sets of hypermedia functionalities provided by each individual open hypermedia system. An obvious example is the hypermedia linking model supported; open hypermedia systems may be open with regard to their extensibility and integration capabilities, but can be closed with regard to mapping hypermedia link structures between disparate systems.
A higher-level of integration between open hypermedia systems can be gained by adopting communication protocol and data interchange format standards. The Open Hypermedia Protocol (OHP) (Davis et al., 1996) addresses the former interoperability issue and attempts to abstract common themes from research and development in open hypermedia systems. Its first revision has been concerned with establishing an open hypermedia linking protocol that will allow various desktop applications to communicate with separate hypermedia link services.
Reference models, such as Dexter, provide a standard linking model environment into which hypermedia information can be converted. Leggett et al. (Leggett et al., 1991) performed a number of tests with Dexter and two radically different hypermedia models--they concluded that problems existed in conversion due to attributes being lost in the bidirectional mapping process.
Standard interchange formats, such as the Hypermedia/Time-based structuring language (HyTime) (International Standards Organisation, 1992) and the Multimedia and Hypermedia Information Coding Experts Group (MHEG) standard (Bertrand et al., 1992), provide representations for describing hypermedia documents and the relationships between them. Unlike reference models, these standards include specifications of a wide range of link types and document components. However, the HyTime specification is complex and few implementations of even subsets of the standard exist, and MHEG hypermedia objects cannot be altered or revised, which makes MHEG better suited to information delivery than to flexible and open hypermedia.
To be effective, both of these potential solutions require that each open hypermedia system implements a mapping to either a supported standard or a set of standards. Obviously this is not an optimal solution, especially where existing standards develop and future standards emerge. The following section describes a different, but complementary, approach which advocates treating the interchange of both hypermedia link and document information as separate issues from the provision of hypermedia functionality.
The vast quantity of information contained within the Internet (and other large-scale networks) is both an advantage and a disadvantage. As more information becomes available electronically, the user gains potentially greater diversity and choice, but must expend more time sorting the relevant information from the irrelevant. When the burden of sorting through electronic information becomes overwhelming, it is referred to as information overload. This problem was detected as far back as 1945 by Vannevar Bush (Bush, 1945), an early pioneer of hypermedia systems:
There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialisation extends. The investigator is staggered by the findings and conclusions of thousands of other workers--conclusions which he cannot find time to grasp, much less to remember, as they appear.
Bowman et al. (Bowman et al., 1994) also recognise the problems associated with large-scale information dissemination and classify them as scalability problems: scalability of data volume, user base and data diversity. These dimensions are illustrated in table 2.1, where an increase in data volume leads to information overload, an increase in the user base leads to insufficient availability of information, and an increase in data diversity leads to data extraction problems and a generally lower quality of information. They reaffirm the well-known belief that the Internet is:
Distributed Information Management (DIM) (Office of Science and Technology, 1996) is an initiative that is being promoted by the UK Technology Foresight panel on Information Technology to develop methods for managing change and evolution in distributed systems, particularly targeting multimedia applications across such systems as the Internet. The programme of work proposed by the initiative includes:
In this way, DIM is an effort designed to manage, not just discover or retrieve, multimedia information that is distributed across a large networked environment. However, it is not seen as an application in its own right, but is a set of connected technologies which provide support for document management systems, open hypermedia systems and the like. In this way, as with open hypermedia, information is left to be handled by the application most suited to deal with it.
In pursuance of characterising certain aspects of DIM, Dale et al. (DeRoure et al., 1996; Dale and DeRoure, 1997) have identified a number of key tasks which can be considered necessary for managing both document and hypermedia link information. The next subsections describe these four broad categories and illustrate the relevance of DIM within an Internet environment.
Resource discovery (and hence information retrieval) is the process of searching through known information repositories and of finding new sources of information that might be of relevance to the user. As the user becomes inundated with more and different forms of electronic information, so the process of searching manually becomes time-consuming and tedious. Therefore, any resource discovery and information retrieval algorithm must be accurate to ensure that relevant data is not overlooked and that irrelevant data is discarded before it reaches the user.
Another function of resource discovery is monitoring information repositories and notifying the user if particular aspects of that information change. This is useful if the user requires temporally active information to be observed, such as stock prices or live video. Resource discovery and information retrieval are, arguably, the most prevalent tools available for the Internet and the most undertaken activity. A comparison and review of such systems and the issues involved is given in (Schwartz et al., 1992) but at present, most of these systems only index information that is text-based in nature.
Systems have been developed which try to deal with multimedia information in a uniform manner and most approaches to indexing and retrieving are based around associating textual keywords as meta-data, content-based methods developed from computer vision and image processing or a mixture of the two. Example systems include the Multimedia Architecture for Video, Image and Sound (MAVIS) (Lewis et al., 1996) and the Miyabi system (Hirata et al., 1993); an overview of methods and technologies involved in handling multimedia information is given in (Grosky, 1994).
As information becomes distributed across heterogeneous networks, the integrity of hypermedia links and documents can become difficult to maintain. Due to the problems of network latency, intermittent connectivity and variable service availability, it is hard to ensure that consistency updates are made in a timely fashion. Early hypermedia systems addressed these problems only obliquely by developing purpose-built applications that could work only within fixed bounds and non-networked environments.
In open (especially distributed) hypermedia systems where the control of information cannot be guaranteed by the system, Davis (Davis, 1995) has identified three major integrity issues that can arise:
The editing problem. When a document is altered, the anchors which constitute a link may no longer correspond to their original locations due to the fact that links are stored separately in a link database. This is because, invariably, anchors are specified in the link database as a byte offset into the document. In systems which have embedded mark-up this is not so much of a problem since when the content moves so do the anchors.
The versioning problem. As a document progresses through a series of revisions during its lifetime, if a new version is recorded at each stage then a modification history of the document becomes apparent. In most cases, this history can be used by an open hypermedia system to maintain the integrity of anchors because it can determine how the document has been altered. However, when a user wishes to view the document, which version is appropriate to display?
The dangling link problem. Open hypermedia systems, especially distributed open hypermedia systems, cannot be aware of the status or availability of all of their links or documents. For this reason, when a document is moved without the hypermedia system being informed, the destination of a hypermedia link can become invalid.
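The editing problem can be demonstrated with a minimal sketch of an anchor held outside the document as an offset and a length. The Anchor class is hypothetical, character offsets are used in place of byte offsets for simplicity, and storing the originally selected text alongside the offset is an assumption made here so that a stale anchor can be detected; it is not claimed to be how any particular system works.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    offset: int      # position of the selection within the document
    length: int      # size of the selection
    expected: str    # text selected when the link was created (assumption)

    def resolve(self, document: str) -> str:
        """Return whatever text currently sits at the stored offset."""
        return document[self.offset:self.offset + self.length]

    def is_valid(self, document: str) -> bool:
        return self.resolve(document) == self.expected


original = "Microcosm separates links from documents."
anchor = Anchor(original.index("links"), len("links"), "links")

# An edit made outside the hypermedia system shifts the content, but the
# link database still holds the old offset.
edited = "The " + original

valid_before = anchor.is_valid(original)
valid_after = anchor.is_valid(edited)
```

After the edit the stored offset no longer falls on the original selection, whereas embedded mark-up would have moved together with the text.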
It is the task of the information integrity aspect of DIM to ensure that the validity of documents and hypermedia links is preserved across systems, either by attempting to repair or by notifying the user of the problem. Two general approaches to integrity failure detection exist. The first is a lazy method which identifies problems as and when they occur (generally when a user tries to follow a link or access a document), and the second is to try and prove the integrity of a hypermedia and document structure (by following all links and accessing all documents). In reality, the former approach might not be desirable in all situations and the latter is generally not possible in dynamic environments, so a mixture of the two normally has to be employed.
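The two approaches to detecting integrity failures can be contrasted in a short sketch. The names follow, audit and DanglingLink are hypothetical, a link is reduced to a pair of document identifiers, and the set of available documents stands in for whatever the system can currently reach.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]  # (source document, destination document)


class DanglingLink(Exception):
    """Raised when the destination of a link cannot be found."""


def follow(link: Link, available: Set[str]) -> str:
    """Lazy detection: the failure surfaces only when the link is followed."""
    _, destination = link
    if destination not in available:
        raise DanglingLink(destination)
    return destination


def audit(links: List[Link], available: Set[str]) -> List[Link]:
    """Exhaustive detection: check every link up front and report failures."""
    return [link for link in links if link[1] not in available]


available_documents = {"a.txt", "b.txt"}
links = [("a.txt", "b.txt"), ("a.txt", "c.txt")]  # c.txt has been moved

broken = audit(links, available_documents)

try:
    follow(links[1], available_documents)
    lazily_detected = False
except DanglingLink:
    lazily_detected = True
```

The audit only reflects the state of the documents at the moment it runs, which is why exhaustive checking alone is insufficient in a dynamic environment and is normally combined with the lazy check.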
Navigation assistance is the process of aiding the user in manoeuvring through a large information system or body of information. As resource discovery finds information that is considered to be of relevance to the user, so navigation assistance helps review this information to ensure that the user does not become overloaded with possibilities. Wilkins (Wilkins, 1994) describes the process of navigation assistance as a balance between the following requirements:
Oren (Oren, 1987) likens the role of navigation assistance to a human librarian:
... who does not comprehend the material in articles being sought, but does understand the conventions of card catalogues, abstract collections, citation indexes and bibliographical references. Because these relations can be made explicitly in hypertext they can be utilised without, for instance, having any deep comprehension of the meaning of any article title.
The Advisor Agent (Wilkins, 1994) is a filter that exists within Microcosm and acts as a mediator between users and their information resources. The agent presents the user with an overview of their current situation by querying all available sources of navigation information; each source responds with a piece of advice and the agent then combines these replies into an overall advice strategy. The Advisor Agent can operate in either autonomous or semi-autonomous mode. When the agent acts autonomously, the user relies on it to query all sources, combine the advice and present it; when it acts semi-autonomously, the user performs the querying and combines the advice themselves.
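The query-and-combine cycle of the autonomous mode might be sketched as follows. This is not the Advisor Agent's actual algorithm: the two advice sources, the numeric weights and the additive combination rule are all assumptions made for the purpose of illustration.

```python
from collections import defaultdict


def history_source(context):
    # Hypothetical source: suggests linked documents the user has not yet visited.
    return [(doc, 1.0) for doc in context["linked"] if doc not in context["visited"]]


def index_source(context):
    # Hypothetical source: suggests linked documents whose name contains the keyword.
    return [(doc, 2.0) for doc in context["linked"] if context["keyword"] in doc]


def advise(context, sources):
    """Query every source and merge the replies into one ranked list."""
    scores = defaultdict(float)
    for source in sources:
        for suggestion, weight in source(context):
            scores[suggestion] += weight
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda s: (-scores[s], s))


context = {
    "linked": ["agents.html", "links.html", "agents-kqml.html"],
    "visited": {"links.html"},
    "keyword": "kqml",
}
ranking = advise(context, [history_source, index_source])
```

In the semi-autonomous mode the user would call the individual sources and weigh their replies personally, rather than relying on `advise`.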
Other application areas of navigation assistance include monitoring the user's interaction with their information in an attempt to build a user profile which can be used to tailor future discovery and retrieval queries. Also, navigation assistance algorithms can be employed to analyse a complex body of information, such as a WWW site, to help the user navigate it and subsequently locate the information that they require.
A key aspect of DIM is the ability to manage information resources that are available across heterogeneous systems and networks; this is at the heart of the belief that information can be managed by the most appropriate systems and that these systems can be used interchangeably. It has already been shown how open hypermedia systems can integrate in a flexible manner with the desktop (Davis et al., 1994) and with networked resources (Goose, 1997).
One area that open hypermedia research is currently beginning to focus upon is the integration and interoperability between hypermedia systems, and the provision of separate hypermedia link services and document management services. Standards in both data models and communication protocols are being developed, but DIM can also offer a level of integration where the service in question does not have to worry about the conversion process between systems; this is the responsibility of the DIM subsystem. The provision of integration with both legacy systems and distributed information systems forms a key component of DIM and various solutions will be detailed in the next chapter.
One of the reasons for the success and proliferation of the WWW is its simple design. This is particularly evident in the small amount of infrastructure that is required to realise its hypermedia functionality, namely a browser and a server. Hypermedia links can be created between documents in a very simple manner, and the process of creating and subsequently following a link is cognitively easy to understand and appreciate. By contrast, open hypermedia systems have associated overheads such as increased complexity, delays in retrieving hypermedia links from within a link database and more infrastructure required to deliver hypermedia functionality. Thus, systems such as the DLS, where open hypermedia principles are being applied to a WWW environment, can be criticised for these very reasons.
However, it is the feature of simplicity within the WWW which also makes distributed information management difficult. Open hypermedia systems provide more control of objects within the hypermedia environment, such as documents and links, and are thus better suited to providing inherent management capabilities. Additionally, their extensibility and modularity allow them to be customised to the user's requirements and also to be functionally augmented. Yet, unlike the WWW, distributed open hypermedia systems like Microcosm TNG are very hard to engineer because of this extra management burden; the mechanism of providing a distributed communication infrastructure and tools to assist in document and link management is difficult and time-consuming.
This chapter has illustrated the essential difference between open and closed with regard to hypermedia and their systems. The criticisms levelled at early hypermedia systems were that they were monolithic, static and imposed proprietary hypermedia link and document formats; in effect, they were closed. Typically, they afforded high levels of integration but only in carefully controlled or restricted situations, such as where each application was purpose-built so that it could take part in the hypermedia functionality. As a result, other desktop applications were either impossible or difficult to integrate.
Open hypermedia has recognised that hypermedia link functionality at the desktop needs to be pervasive, but not at the expense of compromising the user's data. To support this, the community are developing systems that are modular, dynamic and do not impose proprietary document formats or embed hypermedia link information within a document. Yet it is distributed open hypermedia and the advent of the WWW that have shown that users require integration with both the desktop and the network. Systems such as Microcosm and Microcosm TNG have demonstrated how both levels of integration can be achieved.
It is proposed that the next generation of open hypermedia systems will not only possess the ability to integrate with the desktop and the network, but will also be able to integrate with other open hypermedia systems. Link services have shown that hypermedia link functionality can be delivered independently of applications and the document management service. It is this higher level of integration and interoperability between systems that DIM seeks to achieve.
However, DIM also recognises that the chief efforts of research for both open hypermedia and distributed open hypermedia have been focused on the creation and dissemination of information, and not on its subsequent management. This is particularly evident in such a massively distributed and heterogeneous system as the WWW; basic hypermedia link functionality (following links) is delivered through a simplistic mechanism which is readily available and supported. Nevertheless, there is little provision in the WWW for managing these links or for extending the linking model.
DIM is an initiative which is being promoted to highlight that large distributed information systems require information to be actively managed, not just discovered and retrieved. For this reason, DIM advocates an integration policy that is distinct and abstract from the individual applications and promotes a preservation policy similar to open hypermedia; hypermedia links should be stored and managed separately and documents should be kept in their original format.
Furthermore, DIM is not an isolated or single solution in its own right, but recognises that information of any type is managed best by the application which created it; hypermedia links are best managed by an open hypermedia system and documents are best managed by a document management system. What it does aim to provide, through the incorporation of various composite technologies, are the underlying mechanisms by which interaction between disparate open hypermedia systems and document management systems can occur. In this way, a higher level of abstraction can be afforded to the various management systems, such as communication protocol, location and integration transparency, which can be passed upwards to the user.
The next chapter investigates agents and agent technology which offer abstraction and interoperability for a DIM substrate which can integrate separate and multiple solutions and provide a set of tools to help users in managing their distributed information.
agent (a'jent) n. a person who acts on behalf of another person, business or government, etc. [C15th: From Latin agent-, noun use of the present participle of agere to do]
-- The Collins English Dictionary
The development and deployment of agents is one of the most rapidly developing fields of interest, research and development within mainstream computer science. The perception that agents have a future role to play in software systems is gaining momentum, as Guilfoyle (Guilfoyle, 1995) states:
... in 10 years time most new IT development will be affected, and many consumer products will contain embedded agent-based systems.
This movement has also been noted by the popular press; the Guardian predicted that (Sargent, 1992):
Agent-based computing (ABC) is likely to be the next significant breakthrough in software development.
These are very bold claims for a technology which some would insist is still in its infancy and has no clear direction, definitions or standards. Indeed, there still remains much confusion over the question `what is an agent?' Nwana et al. (Nwana et al., 1996b) liken this to the object technology debate of over a decade ago, where it was unclear what an object represented and what features such a system should possess. Agency as both a technical concept and a metaphor is also in danger of being hyped as much as artificial intelligence was during the 1980s.
The aim of this chapter is to try to show that, although there is no conclusive definition of agency and although claims about agents are being exaggerated by the popular press and commercial software developers, useful research is being undertaken. Indeed, it is precisely because agency is such an umbrella technology that it has defied attempts to be neatly categorised!
Three different views on agents, their technologies and applications will be offered. The first view looks at the motivations behind the agent paradigm, highlighting some of the fundamental technologies that are emerging. The second view presents sets of characteristics that, it has been suggested, agents could or should possess; these attributes are helping to provide general and working classifications of an agent. The final view details a taxonomy of agents according to the types of actions that they are undertaking and the different behaviours that are expected of them.
For a more in-depth treatment of agents and their associated technologies, the reader is referred to (Wooldridge and Jennings, 1995) for a review of agent theories, architectures and languages, (Bond and Gasser, 1988; Chaib-draa et al., 1992) for a review of agents in distributed artificial intelligence and (Nwana et al., 1996) for a general taxonomic review of agent applications.
The problem of giving a general description of a software agent (also known as an autonomous agent) is difficult due to the fact that there is no universal definition that is generally accepted. Indeed, Carl Hewitt stated10 that the issue of what constitutes a software agent was as much of an embarrassment to the Agent community as the issue of what constitutes artificial intelligence was to the Artificial Intelligence community.
The lack of a clear definition of an agent by the community can raise a number of serious problems. Firstly, if it is not clear what comprises an agent, then it is equally unclear what does not comprise an agent. This can lead to the situation where everything is labelled `agent' and the term subsequently loses coherence and potentially introduces confusion wherever it is used. Secondly, without accepted definitions the task of producing comparisons and drawing analogies between agent systems is made more difficult due to the fact that there is no common frame of reference from which to start.
In its purest sense, an agent is one who takes action (Laurel, 1990). Yet the real questions here lie not in the fact of taking action, but in what action is being taken, why it is being taken and who is taking it for whom. Kay (Kay, 1984) attributes the idea of an agent to John McCarthy and Oliver Selfridge in the mid-1950s:
They [McCarthy and Selfridge] had in view a system that, when given a goal, could carry out the details of the appropriate computer operations and could ask for and receive advice, offered in human terms, when it was stuck. An agent would be a "soft robot" living and doing its business within the computer's world.
Wooldridge and Jennings go on to introduce agency as central to artificial intelligence (Wooldridge and Jennings, 1995):
It is interesting to note that one way of defining AI [artificial intelligence] is by saying that it is the subfield of computer science which aims to construct agents that exhibit aspects of intelligent behaviour.
Is agency solely the remit of mainstream artificial intelligence, which has given rise to the term `intelligent agent'? Oren et al. believe not, especially with reference to the simplistic agents used in the Guides hypertext system (Oren et al., 1990):
Guides are a simple form of agent who assume the roles of storytellers, and their success is to be measured not by the stringent requirements of full-blown intelligent agents, but against the softer criteria of plausibility and suspension of disbelief.
The real problem in producing a definition of agency arises from the sheer breadth of subject areas in which agents are being applied. Agents and their technologies are claimed to be used in such diverse fields as telecommunications to assist in creating electronic market places (Magedanz et al., 1996) and smart networks (Appleby and Steward, 1994), in Human Computer Interaction as intelligent user interfaces (Chin, 1991) and believable computer personas (Bates, 1994), and in control of large real-world situations, such as power systems management (Wittig, 1992) and air traffic control (Steeb et al., 1988).
More recently, there has been much interest in developing agents and agent technology suitable for publicly accessible networks, such as corporate intranets and the Internet. Such technologies range from Internet-aware agent programming languages (Lange and Mitsuru, 1997; White, 1994), personalised shop assistants (Geddis et al., 1995; Chavez and Maes, 1996) and match-makers (Fisk, 1996; Foner, 1996) and tailored information retrievers (Lieberman, 1995; Maes, 1994; Falk et al., 1996) to meeting schedulers (Kozierok and Maes, 1993; Dent et al., 1992) and digital library assistants (Atkins et al., 1996).
Therefore, since agents are such a practical field of interest, the question surrounding agency is not `what are agents', but `what typifies agents and what features do they have in common'? It is Nwana (Nwana et al., 1996) who describes agents as existing in a truly multi-dimensional space, where the characteristics of agents form each of the separate dimensions and agent applications form the interstices between. Section 3.3 will address this more closely by looking at the desirable characteristics that agents can possess.
Although agency is an emerging field, practicable definitions (Mitchell, 1990; Wooldridge and Jennings, 1995), standards11 and formal definitions for agent characteristics (Goodwin, 1993) are being developed by researchers in the area. The following sections attempt to give a grounding in the pragmatic rationale behind the deployment of agents by detailing some of the motivations and advantages behind their use.
Abstraction, with reference to agents, has the same connotations as abstraction in software engineering, particularly object orientation; namely, the process of formulating generalised concepts through the extraction of common qualities from specific examples. In object orientation, each super-class offers a higher level of generalisation from its sub-classes which, in turn, offer a higher level of specialisation. The similarity and difference between objects and agents has been drawn by Malkoun and Kendall (Malkoun and Kendall, 1997), where an object is defined to possess properties and behaviour (Booch, 1994) and an agent is defined to be an object which is real time, autonomous and can carry out pro-active behaviour. Genesereth and Ketchpel (Genesereth and Ketchpel, 1994) mention that the difference between object-oriented programming and agent-based software engineering lies in the language that each uses to describe the messages that pass between their external interfaces:
In general object-oriented programming, the meaning of a message can vary from one object to another. In agent-based software engineering, agents use a common language with agent-independent semantics.
As with object-oriented programming, agents are abstracted from the system by protecting their internal state and data structures through an external interface. Yet the adoption of a universal agent communication language hints towards a higher level of interoperability than with either object-oriented programming or distributed object managers, such as the Common Object Request Broker Architecture (CORBA) (Object Management Group, 1996), since each agent is agreeing to work within a given standard for communication. However, it is not just the adoption of a standard (even of a universal standard) which makes for increased interoperability, but more in the nature and form of the agent communication language itself.
An Agent Communication Language (ACL) (Neches et al., 1991; Genesereth and Ketchpel, 1994) is a lingua franca amongst agents and is primarily concerned with information and knowledge interchange between heterogeneous agents. It consists chiefly of three parts or layers (see figure 3.1--adapted from (Labrou, 1996)):
A vocabulary (or ontology) that contains sets of terms which are appropriate to a specific application domain, called a domain of discourse; the full set of potential ontologies is named the universe of discourse.
A linguistic layer (or communication language) which describes the protocol of message passing (where messages are sentences expressed in the notation of the interlingua) and the provision of context for the content of those messages.
The ARPA Knowledge Sharing Effort (KSE) (Neches et al., 1991) has been working towards describing the components of an ACL that satisfy the needs of passing sufficiently expressive declarative statements (definitions, assumptions, and so on) in the form of messages between agents. Accordingly, there are three working groups that operate under this banner which directly relate to the components of an ACL identified earlier:
The Shared, Reusable Knowledge Bases (SRKB) Group is working towards the development of common ontologies that can be shared between agents to help them interoperate. It has established a repository for shareable ontologies in various application domains using the definitional vocabulary of Ontolingua (Gruber, 1991) and the basic logic of the Knowledge Interchange Format (see below).
The Interlingua Group is developing a common language for describing the content of knowledge bases. The Knowledge Interchange Format (KIF) (Genesereth and Fikes, 1992) is an interlingua content language that can be used to translate between different content languages or between different native knowledge base representations used by two agents.
The External Interfaces Group is involved with examining the dynamics of runtime interaction between knowledge-based systems (agents). The Knowledge Query and Manipulation Language (KQML) (Finin et al., 1995) is a communication language for passing content messages between heterogeneous agents. It is based around the notion of performatives which correspond to illocutionary verbs from speech act theory (Austin, 1962; Searle, 1969). For example, a message is described by a performative (request, inform, warn and so on) which tells a receiving agent how to interpret the contents of the message. In this way, KQML is illocutionary and not perlocutionary, that is, the sending agent expects the message to be understood but has no control over the actual effect of the message. This is because there are no inherent semantics attached to KQML performatives; a criticism that has been addressed by Labrou (Labrou, 1996).
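The role of the performative can be illustrated with a minimal sketch that renders a KQML-style message and then dispatches on its performative. The helper names `kqml_message` and `interpret`, the agent names, the ontology name and the content expression are hypothetical; only the general message shape (a performative followed by keyword parameters) is taken from KQML.

```python
def kqml_message(performative, **parameters):
    """Render a KQML-style message as an s-expression string."""
    fields = " ".join(
        f":{name.replace('_', '-')} {value}" for name, value in parameters.items()
    )
    return f"({performative} {fields})"


def interpret(performative, content):
    """The performative, not the content, selects how the receiver reacts."""
    if performative == "tell":
        return f"store as a belief: {content}"
    if performative == "ask-one":
        return f"look up one answer to: {content}"
    return "sorry"  # the receiver is free not to act on the message


query = kqml_message(
    "ask-one",
    sender="joe",
    receiver="stock-server",
    language="KIF",
    ontology="NYSE-TICKS",
    reply_with="q1",
    content="(PRICE IBM ?price)",
)
```

The final branch of `interpret` reflects the illocutionary nature of the language: the sender can state how the message is meant, but the effect on the receiver is decided by the receiver alone.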
Although the KSE designed Ontolingua, KIF and KQML as a packaged solution to the interoperability question, KQML appears to have developed as a de facto standard for pragmatic interoperability between heterogeneous agents, especially in industrial applications. An evaluation of KQML as an ACL is given by Mayfield et al. in (Mayfield et al., 1996). Further information on knowledge sharing can be found in (Nwana et al., 1996b) which provides a more detailed description and critical appraisal of Ontolingua, KQML and KIF, as well as examining other ACLs currently in use.
A transducer mediates between an existing system and other agents. The transducer accepts messages from other agents and translates them into a form suitable for the legacy system and vice-versa. In general, the transducer will have to be fully conversant with the native protocol of the legacy system and the ACL used by the other agents. This approach is favourable when the code for a legacy system is unavailable or is too delicate to modify. It has the advantage that it requires no knowledge about the legacy system other than its communication protocol, but all communication to and from the legacy system must be conducted through this protocol.
A wrapper is an extension of code that allows the legacy system to communicate directly in the appropriate ACL. The wrapper has access to the internal data structures of the system and can make modifications to them. This approach is more efficient than the transducer due to the fact that the wrapper has internal access to the system and can also work where there is no inter-process communication available within the original legacy system. However, this assumes that the source code is available and modifiable, or that the legacy system is extensible in some fashion, such as through an application programming interface.
Rewriting the legacy system is the third and most drastic approach. The advantage here is that a complete rewrite may be able to enhance both the efficiency and capabilities of the original system beyond those offered by either transduction or wrapping. However, for large systems or systems for which there is already a large investment, rewriting may be neither cost effective nor possible.
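The contrast between the first two approaches can be shown in a short sketch. The `LegacyCatalogue` class, its `GET` text protocol and the dictionary-based message format are all hypothetical stand-ins; a real transducer or wrapper would speak a full ACL rather than the simplified messages assumed here.

```python
class LegacyCatalogue:
    """Stand-in legacy system whose only external interface is a text protocol."""

    def __init__(self):
        self._records = {"42": "Open Hypermedia Systems"}

    def handle(self, command):
        verb, _, key = command.partition(" ")
        if verb == "GET" and key in self._records:
            return f"OK {self._records[key]}"
        return "ERR not-found"


class Transducer:
    """Translates ACL-style messages to the native protocol and back again."""

    def __init__(self, legacy):
        self.legacy = legacy

    def receive(self, message):
        if message["performative"] != "ask-one":
            return {"performative": "sorry", "content": None}
        # Only the native protocol is used; no internal state is touched.
        reply = self.legacy.handle(f"GET {message['content']}")
        status, _, body = reply.partition(" ")
        if status == "OK":
            return {"performative": "tell", "content": body}
        return {"performative": "sorry", "content": None}


class WrappedCatalogue(LegacyCatalogue):
    """Wrapper: extends the system itself, so it reads internal state directly."""

    def receive(self, message):
        title = self._records.get(message["content"])
        if message["performative"] == "ask-one" and title is not None:
            return {"performative": "tell", "content": title}
        return {"performative": "sorry", "content": None}


query = {"performative": "ask-one", "content": "42"}
via_transducer = Transducer(LegacyCatalogue()).receive(query)
via_wrapper = WrappedCatalogue().receive(query)
```

Both routes give the same answer here, but the transducer pays for an extra translation step in each direction, whereas the wrapper depends on being able to extend the legacy code.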
By adopting ACLs, agents can begin to embody the characteristics of abstraction, interoperability, modularity and dynamism. These are particularly useful qualities that can help to promote open systems which are typically dynamic, unpredictable and highly heterogeneous (Jennings, 1995), such as the Internet. In these types of application domains, the interoperability afforded by an agent-based approach is required because the individual components with which an agent must interact are not known a priori. Additionally, the interfaces both between the agents that exist within the Internet and between agents and software systems must be very robust and flexible, since they cannot be anticipated at design time.
Abstraction not only exists at the component or the system level, but can also exist at the user or problem level. For example, in application domains which are large and sophisticated, the only way to address a given problem is to develop a number of modular components which are specialised in solving a particular aspect of it (Jennings, 1995). Problem decomposition by agents is the particular (though not exclusive) remit of distributed artificial intelligence agent-based systems (or multi-agent systems12) where a problem is rendered down into a smaller subset of components, which in turn are partitioned into smaller subsets of the problem, and so on. Eventually, the problem is decomposed to a manageable level that can be solved by individual agents which are free to choose the most suitable paradigm, rather than being forced to adopt a uniform approach. As Jennings adds:
Computer programs have traditionally been thought of as tools; they are mainly user-centric and require direct manipulation by the user (Shneiderman, 1983; Park et al., 1991; Hoyle and Lueg, 1997). Macros and user-task schedulers have had little effect on changing this because they are relatively simplistic in nature and have no notion of context (the ability to be concerned with the user's notion of a task), contingency (the ability to delay, retry or alter an action if it fails) or semantics (the ability to understand the meaning of an action).
With the advances in computer technology, the fundamental characteristics of computers are changing to support increased functionality, more cognitive tasks, more complex applications, and more interactive interfaces (Card, 1989). The concept of indirect management (Kay, 1990; Negroponte, 1990) is required to support these extended capabilities; the idea that a user no longer has to execute explicitly a given set of applications to complete a task, but that they can delegate the task to an agent. In figure 3.3--taken from (Card, 1989)--the user confers the details of what task the agent is expected to perform instead of actually how to perform it; it is the responsibility of the agent to interpret the task (user discourse machine) and achieve it accordingly (task machine). Thus, the user is abstracted from the details of the individual tools that are being used and can concentrate on the task at hand.
Maes furthers the argument of indirect management when talking of an agent as a personal assistant (Maes, 1994):
The metaphor is that of a personal assistant who is collaborating with the user in the same work environment. The assistant becomes gradually more effective as it learns the user's interests, habits and preferences (as well as those of his or her community).
Essentially, a personal assistant (or interface agent--see section 3.4.2) provides pro-active assistance by observing patterns of actions taken by the user involved in everyday tasks, by asking for feedback on offered assistance and by receiving explicit instructions from the user. However, the personal assistant is not necessarily the only interface between the user and the computer, since it should not prevent the user from taking actions and fulfilling tasks personally.
It is in the move to indirect management and the concept of a personal assistant where agents can describe a more natural metaphor for conceptualising or presenting a given software functionality (Jennings, 1995; Laurel, 1990). Such situations are where the agent is taking the role of dialogue partner (Park et al., 1991), for example, in conversing with other users and other agents (Laurel et al., 1990), in acting and negotiating on behalf of a user (Rosenschein and Genesereth, 1985) or even in instructing the user in some task (Maes, 1994). In this way, the role of the personal assistant becomes very similar to the role of a personal secretary; it is an effort to move work and information overload away from the user.
Hoyle and Lueg (Hoyle and Lueg, 1997) illustrate two methods by which agents can collect information that can be useful in determining how to pro-actively assist the user:
Automated acquisition of a user profile is where the agent attempts to infer what a user is trying or wishes to achieve from the actions that they are currently performing and have performed in the past. In this way, it can make decisions that are based upon past experiences correlated against a database of recognised patterns with which it can provide help. For example, the agent might place the icon of a frequently used application somewhere convenient on the desktop: it recognises the frequency of usage and correlates this with the database to produce a course of action.
Cooperative acquisition attempts to build a user profile by actively involving the user in the selection and refinement of preferences that the agent can subsequently use. The personal assistant is either restricted to a particular domain application so that it can make qualitative judgements about a user's preferences, or is domain unspecific in which case the user assigns weightings or rankings to their actions.
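The two acquisition methods can be combined in a small sketch built around the desktop icon example. The `UsageProfile` class, the threshold of three launches and the `decline` method are assumptions made for illustration, not a description of any cited system.

```python
from collections import Counter


class UsageProfile:
    """Builds a profile from observed launches, refined by explicit user feedback."""

    def __init__(self, shortcut_threshold=3):
        self.launches = Counter()
        self.declined = set()
        self.shortcut_threshold = shortcut_threshold  # assumed cut-off for "frequent"

    def observe(self, application):
        """Automated acquisition: record an action without involving the user."""
        self.launches[application] += 1

    def decline(self, application):
        """Cooperative acquisition: the user explicitly rejects a suggestion."""
        self.declined.add(application)

    def suggestions(self):
        """Applications used often enough to deserve a desktop shortcut."""
        return sorted(
            app
            for app, count in self.launches.items()
            if count >= self.shortcut_threshold and app not in self.declined
        )


profile = UsageProfile()
for app in ["editor", "browser", "editor", "mail", "editor", "browser"]:
    profile.observe(app)
suggested = profile.suggestions()
```

A purely automated profile would stop at `observe`; the `decline` step is what lets the user correct a pattern that the agent has generalised too eagerly.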
The fundamental difference between the two methods is that a purely automated approach, such as the Open Sesame! personal assistant (Caglayan et al., 1996), detects behaviour patterns in a user's actions but then generalises these patterns across different situations. For example, if the detected behaviour pattern of a user is always to empty the trash can after dragging a document into it, then is it necessarily correct that the generalisation of this pattern is always to offer to empty the trash can when something is put in it? This problem is referred to as lack of situatedness (Suchman, 1987): the concept that a user can be in a number of different situations when performing a given action.
The cooperative model is being used in a number of fields, mainly information filtering based upon ratings (Shardanand and Maes, 1995) and upon actions (Hoyle and Lueg, 1997), and in recommendations made manually by humans (Fisk, 1996; Goldberg et al., 1992).
Shneiderman (Shneiderman, 1995) is a vocal critic of personal assistant agents because current agents are not subject to the standard laws of user interface composition in that they have inadequately considered designs, vague goals, unnecessary anthropomorphism and a poor understanding of adaptability. As he says:
Adaptivity under the hood of your car is positive (the computer adjusts the carburettor based on the air and engine temperature, etc.), but is more tricky if it interferes with your choices. If your car decided that since your windshield wipers were on, you should be driving 20% slower you would be quite unhappy.
His research team at the Human Computer Interaction Laboratory within the University of Maryland believe in direct manipulation interfaces (Shneiderman, 1983) and the reengineering of existing user interfaces (Rose et al., 1996) as efforts to achieve the greatest benefit and potential from software packages.
However, even though current personal assistant designers may not be quite on track with user interface design, the move to an indirect management metaphor marks one of the more interesting promises that agents hold for the future. As Negroponte summarises (Negroponte, 1990):
... I feel no imperative to manage my computer files, route my telecommunications, or filter the onslaught of mail, messages, news, and the like. I am fully prepared to delegate these tasks to agents I trust as I tend to other matters while those other tasks are brought to a satisfactory conclusion. For the most part, these agents are humans. Tomorrow they will be machines.
The connection between object-orientation and agency was first drawn by Hewitt (Hewitt, 1990) with reference to his actor theory (Hewitt, 1977); a model of concurrent object programming in which a concurrent object is a self-contained entity that is concurrently executing with other objects, possesses some internal state, and responds to messages received from other such objects.
Hewitt argues that in truly open systems, since it is very difficult to determine what objects exist at any given point in time, the only thing that components of the system hold in common is their ability to communicate. An actor system is one that is composed of abstract objects, called actors, which are defined by their behaviour when they accept communications through message passing. Actor theory extends message passing by introducing aspects of parallelism to message passing objects. When an actor receives a message from another actor, it can perform the following kinds of actions concurrently: make simple decisions, create new actors, transmit new messages and change its behaviour for the next message accepted.
This agrees with the view of agency presented by Genesereth and Ketchpel (Genesereth and Ketchpel, 1994), except that even in actor theory the message passing interface between actors can alter in meaning and interpretation from one actor implementation to another. For this reason, Genesereth and Ketchpel advocate the use of an agent communication language across all agents that is based upon an agent-independent semantics--see section 3.2.1 for a fuller description of agent-based software engineering.
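The four kinds of action available to an actor can be seen in a minimal sketch. For simplicity the messages are processed one at a time by a single-threaded scheduler, so the sketch shows the message-passing structure of an actor system but not true concurrency; the `Actor` and `ActorSystem` classes and the two behaviours are hypothetical.

```python
from collections import deque


class Actor:
    def __init__(self, system, behaviour):
        self.system = system
        self.behaviour = behaviour  # function applied to the next message accepted
        self.mailbox = deque()

    def send(self, message):
        self.mailbox.append(message)
        self.system.ready.append(self)  # one scheduling entry per queued message


class ActorSystem:
    def __init__(self):
        self.ready = deque()
        self.log = []

    def spawn(self, behaviour):
        return Actor(self, behaviour)

    def run(self):
        while self.ready:
            actor = self.ready.popleft()
            message = actor.mailbox.popleft()
            next_behaviour = actor.behaviour(actor, message)
            if next_behaviour is not None:
                actor.behaviour = next_behaviour  # behaviour for the next message


def counter(count):
    def behaviour(actor, message):
        if message == "increment":             # make a simple decision
            return counter(count + 1)          # change behaviour
        if message == "report":
            actor.system.log.append(count)
        return None
    return behaviour


def supervisor(actor, message):
    if message == "start":
        worker = actor.system.spawn(counter(0))    # create a new actor
        for request in ("increment", "increment", "report"):
            worker.send(request)                   # transmit new messages
    return None


system = ActorSystem()
system.spawn(supervisor).send("start")
system.run()
```

Note that the counter holds no mutable variable: its state is carried entirely by the behaviour it installs for the next message.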
Shoham (Shoham, 1993) proposes a new computational framework called agent-oriented programming (AOP) which is based upon a specialisation of traditional object-oriented programming (OOP):
... whereas OOP proposes viewing a computational system as made up of modules that are able to communicate with one another and have individual ways of handling incoming messages, AOP specialises the framework by fixing the state (now called a mental state) of the modules (now called agents) to consist of components such as beliefs, capabilities, and decisions, each of which enjoys a precisely defined syntax.13
Agents communicate with each other using message passing based around illocutionary verbs from speech act theory (see section 3.2.1), such as do, inform, request and so on. The AOP framework consists of three layers:
Shoham attempted to complete all three of these elements within a framework, called Agent0, but failed to completely address the third component which he describes as being `still somewhat mysterious to me'. The Agent0 formal language allows formulas of time, commitments and beliefs to be expressed, such as given in figure 3.4--taken from (Wooldridge and Jennings, 1995).
This is read as: `if at time 5 agent a can ensure that the door is open at time 8, then at time 5 agent b believes that at time 5 agent a can ensure that the door is open at time 8'. The interpreted agent programming language allows agents to make commitments, transfer facts and inform agents of actions and such like; the task of carrying out commitments is the job of the agent interpreter.
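The division of labour between mental state, speech-act messages and an interpreter can be sketched as follows. This is an illustration only and does not reproduce Agent0 syntax or semantics; the class `Agent`, its fields and the `step` method are hypothetical names, and time indices are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Mental state in the AOP sense: beliefs, capabilities and commitments."""
    name: str
    beliefs: set = field(default_factory=set)
    capabilities: set = field(default_factory=set)
    commitments: list = field(default_factory=list)

    def receive(self, verb, sender, content):
        # Messages are typed by an illocutionary verb.
        if verb == "inform":
            self.beliefs.add(content)
        elif verb == "request" and content in self.capabilities:
            # Only commit to actions the agent is capable of performing.
            self.commitments.append((sender, content))

    def step(self):
        # The interpreter's job: carry out outstanding commitments.
        done, self.commitments = self.commitments, []
        return [f"{self.name} does {action} for {who}" for who, action in done]


porter = Agent("porter", capabilities={"open door"})
porter.receive("inform", "user", "door is closed")
porter.receive("request", "user", "open door")
porter.receive("request", "user", "paint door")  # not a capability, so ignored
actions = porter.step()
```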
However, Agent0 has been criticised for two main drawbacks. Firstly, agents can neither plan to achieve their goals nor communicate high-level goals; this has been addressed by Thomas' Planning Communicating Agents (PLACA) in (Thomas, 1993), where the logic component of Agent0 has been changed to include operators for planning and achieving goals. Secondly, both Agent0 and PLACA specify the relationship between the formal logic and the interpreted programming language only loosely, that is, they do not truly execute the specified logic. The Concurrent METATEM language (Fisher, 1994) addresses this by allowing an agent specification, written in a formal temporal logic that describes the behaviour the agent should exhibit, to be executed directly. In this way, it is possible to prove the specification of the agent and also the resulting actions of the agent.
Expressing agents through the ascription of beliefs, desires and intentions (BDI--pronounced `beady-eye') is an important step forward in describing systems where agents can plan, negotiate, reason and commit to temporal-based goals and where agents can be aware of their preferences, capabilities and obligations to their user and other agents. For such agents, since the BDI model is closer to the user it may be easier to express and understand than, for example, the complete listing of code. As McCarthy reasons (McCarthy, 1979):
To ascribe certain beliefs, knowledge, free will, intentions, consciousness, abilities or wants to a machine or computer program is legitimate when such an ascription expresses the same amount of information about the machine that it expresses about a person. It is useful when the ascription helps us to understand the structure of the machine, its past or future behaviour, or how to repair or improve it ... Ascription of mental qualities is most straightforward for machines of known structure such as thermostats and computer operating systems, but is most useful when applied to entities whose structure is very incompletely known.
Agency is concerned with the characteristics and attributes that can be assigned to agents to determine their nature and to predict their behaviour. An agent whose nature is well defined and whose behaviour is predictable is more likely to be of use and to be trusted by the user.
Various notions of agency exist and there are too many to detail here14. However, for the remainder of this section, we focus on the two notions of agency developed by Wooldridge and Jennings (Wooldridge and Jennings, 1995), which offer a clear perspective on different levels of agency and the characteristics that typify them.
Autonomy. Once launched with the information describing the bounds and limitations of its tasks, an agent should be able to operate independently from its user, that is, autonomously in the background (Castelfranchi, 1995; Maes, 1990). To this end, an agent needs to have control over its actions so that it can determine what to do when an action succeeds or fails. Moreover, an agent must be able to augment its internal state so that it can make decisions based upon the information that it has gathered.
Social ability. To effect changes or interrogate its environment, an agent must possess the ability to communicate with the outside world (Genesereth and Ketchpel, 1994; Mayfield et al., 1994). This interaction can exist at a number of levels depending upon the remit of the agent, but typically an agent would need to communicate with other agents and the local environment (to maintain and discover information), and users (to apprise them of its progress and its results).
Reactivity. Agents need to be able to perceive their environment and respond to changes to it in a timely fashion. For example, the task of an agent could be to monitor a local file system, informing the user when changes occur to a particular file set. This implies that the agent has an awareness of the appropriate filing system and how to interrogate it. Agents need not only to be aware of their environment, but they need to be aware of what the state and changes in that environment mean and how to react to them.
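The file-monitoring example can be sketched as two small functions: one perceives the environment and the other interprets what a change in it means. The function names and the event labels are assumptions made for illustration; a real agent would call `snapshot` periodically and report the events to its user.

```python
import os


def snapshot(paths):
    """Perceive: record the last-modified time of each file that exists."""
    return {p: os.path.getmtime(p) for p in paths if os.path.exists(p)}


def detect_changes(before, after):
    """Interpret: compare two snapshots and describe what changed."""
    events = []
    for path in sorted(set(before) | set(after)):
        if path not in before:
            events.append(("created", path))
        elif path not in after:
            events.append(("deleted", path))
        elif before[path] != after[path]:
            events.append(("modified", path))
    return events
```

Separating perception from interpretation reflects the point made above: awareness of the filing system is not enough unless the agent also knows what a difference between two observations signifies.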
Pro-activity. Agents need to be able to exhibit pro-activeness, that is, the ability to effect actions in order to achieve their goals by taking the initiative. This means that an agent needs to appreciate the state of its environment and to decide how best to fulfil its mission target.
Further notions of attributes for agents include descriptions that possess more specific meaning than weak agency, that is, they are attributed characteristics and tasks that would normally be ascribed to humans. Shoham (Shoham, 1993) describes the mentalistic notions of knowledge, belief, intention and obligation that might be attributed to strong agents above and beyond those defined for a weak agent. The philosopher Dennett (Dennett, 1987) describes such an agent as an intentional system:
... [an entity] whose behaviour can be predicted by the method of attributing belief, desires and rational acumen.
An intentional system, then, can best be described by the intentional stance; the ascription of abstract notions to systems for the purpose of describing how they work. For example, although the technical description of a computer system may be available, it is too complex to use when describing, say, why a menu appears when a mouse button is clicked over a certain area of the display. The intentional notions as described by Shoham are useful for providing convenient and familiar ways of describing, explaining and predicting the behaviour of complex systems: Dennett suggests that strong agents are best described by the intentional stance.
Bates (Bates, 1994) uses the concept of strong agents and develops it into more anthropomorphic areas by considering the implications of believable agents (see section 3.4.4), that is, agents that try to model a human approach to their interaction with the user by displaying emotions. Additionally, Maes (Maes, 1994) talks about representing agents visually by attaching an icon or a face to associate them with cartoon or computer characters, as does Laurel (Laurel, 1990). These types of agents are being used in both Human Computer Interaction (HCI) scenarios to help the social interaction between a user and their agents, and also in the computer gaming community to produce virtual characters that react in believable and human ways to given situations.
Intelligence (and thus, reasoning and understanding) is an attribute that can be given to both weak and strong agents and determines how agents will behave in certain situations and react to certain events. In some agent communities, intelligence is seen as the key factor that separates agents from ordinary pieces of software, but most tread lightly around the subject since it is not clear what comprises `intelligence'.
Turing (Turing, 1950) posed the perennial question `Can machines think?'. To try and answer this, he developed what has become known as the Turing Test, where a human questioner converses with either a computer or another person (it is deliberately indeterminate to the questioner) and undertakes interactive dialogue. The purpose of the test is to see if the questioner can identify whether the entity being questioned is a computer or a human. Over time, this test has become an informal goal of artificial intelligence15.
However, Turing recognised that although a digital computer might one day be built which could pass the test, how could it be programmed? He notes that programming it by hand would be impractical, but instead proposes that it would have to learn. Here a strong association is made with artificial intelligence and the development of an artificial personality, and he suggests that there are two approaches to realising such a thinking machine. The first is to embody it in the real world and the second is to concentrate on programming intellectual activities, such as chess.
In his paper, `Intelligence without Reason' (Brooks, 1991), Brooks argues that traditional artificial intelligence16 is based upon Turing's disembodied approach and has been unduly influenced by the technological constraints of computers available at the time. He puts forward a case for moving to an embodied approach with real world situations and problems17, and makes a number of judgements about intelligence to support this approach (the argument between symbolic artificial intelligence and the subsumption architecture is revisited in section 3.4.1):
Intelligence. Robots are observed to be intelligent, but the source of their intelligence is not limited to their computational engine; intelligence is determined by the dynamics of interaction with the world.
Etzioni, in his paper `Intelligence without Robots (A Reply to Brooks)' (Etzioni, 1993), challenges the assertion made by Brooks that only real world environments are sufficient habitats in which to develop and test agents. He argues that real-world software environments, such as operating systems and databases, are a complementary substrate for intelligent agent research.
The argument here seems to be centred around the fact that Brooks believes that by ignoring or simplifying the real world, agents will be based upon aspects of this ignorance or simplification and will, eventually, come to rely upon them. Etzioni, on the other hand, states that simulated environments allow researchers to circumvent many difficult, but peripheral research issues that are inescapable in physical environments (such as reliably mapping perceptual experiences in the real world to internal representations). Etzioni et al. (Etzioni et al., 1994) cite their experiences with softbots as an example of agents existing within a real world software environment, namely UNIX. A softbot is:
... an agent that interacts with a software environment by issuing commands and interpreting the environment's feedback.
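This definition can be sketched for a UNIX-like environment as follows. The choice of `ls` as a sensor and `mv` as an actuator is an assumption made for illustration, as are the names `Softbot`, `sense_files` and `archive_backups`; the command runner is passed in so that the agent's logic can be exercised without touching a real file system.

```python
import subprocess


def shell(command):
    """Issue a command to the environment and return its textual feedback."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


class Softbot:
    def __init__(self, run=shell):
        self.run = run  # how commands reach the software environment

    def sense_files(self, directory):
        # Sensor: interpret the feedback of `ls` as a list of file names.
        return self.run(["ls", directory]).splitlines()

    def archive_backups(self, directory, archive):
        # Actuator: issue `mv` for every sensed file whose name ends in "~".
        moved = []
        for name in self.sense_files(directory):
            if name.endswith("~"):
                self.run(["mv", f"{directory}/{name}", archive])
                moved.append(name)
        return moved
```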
In essence, softbots developed for the UNIX operating system environment have shell commands for their actuators and sensors. Etzioni summarises that software environments have three main advantages over physical ones:
And Etzioni concludes (Etzioni, 1993):
Edwards (Edwards, 1995) asserts that it is not enough for agents to react intelligently to their environment, but they must also be able to adapt and alter to changes by learning. An agent that can learn through exposure to given situations and examples could be more useful to a user than an agent whose intelligence is fixed. However, it is far more difficult to predict the behaviour of an agent that can learn, since it is not possible to determine exactly what it will learn and how it will apply that information.
To this extent, it would appear that intelligence and learning are meta-characteristics of agency. At one end of the spectrum, we have very weak agents which possess no specific intelligence, but still can carry out useful tasks for the user. And, at the other end of the spectrum, we have very strong agents which represent the end-goals of traditional artificial intelligence, namely the realisation of a thinking machine. In this way, agents can employ as much or as little artificial intelligence as is deemed necessary for their target domain of application.
A number of other attributes can be given to both weak and strong agents to augment or temper their functionality. These include, but are not limited to (Wooldridge and Jennings, 1995):
Mobility. The ability for an agent to move across networks and between different hosts to fulfil its goals (Gray, 1995). The concept of mobility as an attribute of agency will be explored in greater depth in the next chapter.
Rationality. The assumption that an agent will not act in a manner that prevents it from achieving its goals and will always attempt to fulfil those goals (Galliers, 1988).
Benevolence. An agent cannot have conflicting goals that either force it to transmit false information or to effect actions that cause its goals to be unfulfilled or impeded (Rosenschein and Genesereth, 1985).
General agent properties are success (it has accomplished a task), capability (it can accomplish a task), perception (it can perceive its environment to allow it to complete a task), reaction (it can respond with sufficient speed to complete a task) and reflexivity (it behaves in a stimulus-response fashion).
Deliberative agent properties are prediction (its model of the world is sufficiently detailed to allow it to correctly predict how to achieve a task), interpretation (it can correctly interpret data from its environment), rationality (it chooses commands it predicts will achieve a task) and soundness (it is predictive, interpretive and rational).
The following is a brief taxonomy of the various perceptions that differing computer science disciplines hold about agents. These definitions were taken, adapted and extended from the taxonomy given in (Wooldridge and Jennings, 1995) and represent typical existing or developing systems from within the software agent research field. A somewhat complementary taxonomic review of agent applications is also given in (Nwana et al., 1996), but this fails to address some areas of agent research, especially with regard to believable agents.
The traditional concept of agents began, not surprisingly, with the artificial intelligence community. It is a view based around agents being systems that can take input data about their environment, reason about it and (possibly) generate appropriate output responses (Kurzweil, 1990); an allegorical von Neumann architecture.
The ultimate goal of artificial intelligence agents is to provide intelligence and reasoning capabilities that are comparable to or in excess of those possessed by human beings. As McCarthy (McCarthy, 1979) puts it:
[Artificial intelligence is] the science of making computers do things which if done by humans would require intelligence.
The problems, however, of artificial intelligence seem to lie in two areas (Shardlow, 1990); the translation of the real world into an adequate description in time for it to be useful (transduction), and the representation of complex systems and entities and how to reason about this information in time for it to be useful (reasoning). Traditional artificial intelligence architectures are generally based around three core philosophies:
Deliberative. The reduction of the world to a representation of realisable symbols that can be combined to form structures upon which processes can be executed to operate upon the symbols according to symbolically coded sets of instructions (Newell and Simon, 1976). Decisions regarding actions to perform are made through logical reasoning, based upon pattern matching and symbolic manipulation. Example systems include Homer (Vere and Bickmore, 1990), a robot submarine which explores a two-dimensional Seaworld and can respond to an 800-word spoken vocabulary, and Grate* (Jennings, 1993), a simulation of electricity transportation management where agents act in different manners (individual, selfish and cooperative) to achieve their intentions.
Reactive. A major problem with deliberative (symbolic) artificial intelligence is the processing power required to analyse the information about the real world, plan a suitable solution and then implement a chosen action. Critics of symbolic artificial intelligence have advocated the use of reactive architectures; architectures where there is no complex representation of the real world in symbolic terms and where no symbolic reasoning is performed. The most vociferous critic, Rodney Brooks, has developed an approach based around a reactive model, called the subsumption architecture (Brooks, 1986). A subsumption architecture is a hierarchy of behaviours designed to accomplish tasks. Each behaviour competes with other behaviours in order to influence the actions of the agent; lower layers in the hierarchy represent primitive behavioural styles (for example, avoiding obstacles in the Homer Seaworld) and higher layers represent more abstract behaviours (for example, collecting an object and returning it to a given location). The amount of computation required for these systems is very small when compared with symbolic artificial intelligence systems. Another reactive architecture, based along the same lines as the subsumption architecture, is Pengi (Agre and Chapman, 1987), a computer simulated game where routine tasks are encoded in low-level structures and are only updated to handle new problems that develop. The idea is that most decisions are routine and can therefore be performed quickly and efficiently.
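The idea of behaviours competing to influence the agent's actions can be sketched as below. The arbitration rule used here, a fixed ordering in which the first behaviour that fires wins, is a simplifying assumption and does not model the suppression and inhibition wiring of Brooks's architecture; the behaviour names and percept keys are hypothetical.

```python
def avoid_obstacle(percept):
    # Primitive behaviour: fires only when an obstacle is sensed.
    return "turn" if percept.get("obstacle") else None


def collect_object(percept):
    # More abstract behaviour: fires when a target object is in view.
    return "pick up" if percept.get("object_seen") else None


def wander(percept):
    # Default behaviour: always has something to propose.
    return "move forward"


# Assumed arbitration order; no symbolic world model is consulted anywhere.
LAYERS = [avoid_obstacle, collect_object, wander]


def select_action(percept, layers=LAYERS):
    """Return the action of the first behaviour that fires for this percept."""
    for behaviour in layers:
        action = behaviour(percept)
        if action is not None:
            return action
    return "idle"
```

Each decision is a handful of dictionary lookups, which illustrates why the computational cost is small when compared with planning over a symbolic representation.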
Hybrid. These types of architectures attempt to marry the best qualities of both deliberative and reactive approaches to artificial intelligence. The method consists of building an agent system out of two (or more) subsystems; a symbolic world model which develops plans and makes decisions, and a reactive subsystem which is capable of responding quickly to events that happen without having to resort to complex symbolic manipulation. Examples of such hybrid systems are TouringMachines (Ferguson, 1992) and InteRRaP (Müller et al., 1995). In these architectures a layered model is employed; a symbolic artificial intelligence engine sits at the top of the model handling long-term goals and a reactive artificial intelligence engine handling low-level reactions resides at the bottom. The problem with such systems is how to manage the interactions between the upper and lower layers.
To support this line of reasoning, the Artificial Intelligence laboratory within MIT have developed a prototype interface agent called News Tailor, or NewT (Maes, 1994; Sheth, 1994). A NewT agent is a USENET news filter that can be `trained' by giving it a series of examples that show in which kind of articles the user is interested. From this, the NewT agent can search all news articles to try and find other articles which are similar to those initially indicated by the user. When the agent presents the other articles that it has found, the user gives feedback according to their applicability; thus, the NewT agent can widen or restrict its searching next time.
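The train-and-feedback cycle can be sketched with a simple keyword-weight profile. This is not NewT's actual learning method, which is not described here; the class `NewsFilter`, the unit weight updates and the threshold are assumptions chosen to keep the illustration small.

```python
from collections import Counter


class NewsFilter:
    def __init__(self):
        self.weights = Counter()  # learned interest per word

    def feedback(self, article, liked):
        # Positive feedback raises word weights; negative feedback lowers them.
        delta = 1 if liked else -1
        for word in set(article.lower().split()):
            self.weights[word] += delta

    def score(self, article):
        return sum(self.weights[w] for w in set(article.lower().split()))

    def recommend(self, articles, threshold=1):
        # Present only the articles that resemble earlier positive examples.
        return [a for a in articles if self.score(a) >= threshold]


news = NewsFilter()
news.feedback("mobile agents on the internet", liked=True)
news.feedback("football results on saturday", liked=False)
suggested = news.recommend(["agents roam the internet", "football agents"])
```

Further calls to `feedback` on the suggested articles widen or restrict what is recommended next time.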
Other interface agent systems include NewsWeeder (Lang, 1995), UNA and LAW (Green et al., 1995), WebWatcher (Armstrong et al., 1995) and LIRA (Balabanovic et al., 1995). The general advantages and criticisms of interface agents are discussed in section 3.2.2.
An information agent is one that has access to a number of information resources and is able to collect and manipulate that information. Typically, it can communicate across the network to locate information resources to query or manipulate. An example might be where an information agent is asked to find a particular paper; the information agent searches a number of information resources and presents the user with FTP sites and WWW addresses.
The key qualities of information agents lie in their ability to communicate with a large range of information resources to ensure that the widest amount of information is processed to provide the user with the best results. Generally, an information agent will possess some of the characteristics of an interface agent, in that they will have to develop and maintain a user profile to determine how to best deliver the information that a user needs to reduce information overload. Such an example system is Jasper (Davies and Weeks, 1995), where agents work on behalf of the user and are able to store, retrieve, summarise and inform other agents of information that is useful to them on the WWW. A Jasper agent is able to suggest pages to a user by matching user profiles with other users within the community; a successful match results in a user being informed of WWW pages that their peers find interesting.
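Profile matching of this kind can be sketched as a set-overlap comparison. The Jaccard measure, the threshold and the function names are assumptions for illustration and are not taken from the Jasper system itself.

```python
def similarity(profile_a, profile_b):
    """Jaccard overlap between two sets of interest keywords."""
    if not profile_a and not profile_b:
        return 0.0
    return len(profile_a & profile_b) / len(profile_a | profile_b)


def peers_to_inform(finder, profiles, threshold=0.5):
    """Users whose profile matches that of the user who found a page."""
    return sorted(
        user
        for user, profile in profiles.items()
        if user != finder and similarity(profiles[finder], profile) >= threshold
    )


profiles = {
    "ann": {"agents", "hypermedia", "java"},
    "bob": {"agents", "hypermedia"},
    "cat": {"cooking"},
}
informed = peers_to_inform("ann", profiles)
```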
Theoretical studies on how agents can utilise the information that they receive from different resources are presented by Levy et al. (Levy et al., 1994) and Gruber (Gruber, 1991b). A more practical application has been presented by Voorhees (Voorhees, 1994), which describes a prototype system called the Information Retrieval Agent that can search for loosely specified articles from differing document repositories.
Believable agents are those types of agents which (Bates, 1994):
Hence, there is a very close relationship between believable agents and computer-generated characters for the cinema, computer games and virtual reality. As Disney (Thomas and Johnston, 1981) put it:
Disney animation makes audiences really believe in ... characters, whose adventures and misfortunes make people laugh--and even cry. There is a special ingredient in our type of animation that produces drawings that appear to think and make decisions and act of their own volition; it is what creates the illusion of life.
The Oz project (Bates, 1992) identifies this `special ingredient' as emotion and is attempting to build broad agents; agents that live in simulated world environments and deal with imprecise and erroneous perceptions, with the ability to respond rapidly and the general inability to fully model the agent-rich world that they inhabit. An Oz World has four primary components; a simulated physical environment, a set of automated (broad) agents which help populate the world, a user interface to allow users to participate in the world and a planner that is concerned with the long-term structure of the user's experience.
The Oz project has developed a number of different types of agents that exhibit varying aspects of emotion and social behaviour (Bates et al., 1992):
Perception is achieved by low-level sensory routines which build a topology of the world in time-indexed fragments. The integrated sensory model maintains the agent's best guess about the physical structure of the whole world and is fed by the raw data of the sensory routines.
Emotion is modelled by Em, which uses the success or failure of Hap goals (see below) to generate different emotional states. For example, the fulfilment of a goal (eating) produces a corresponding happiness emotion; however, emotions fade with time as other emotions take priority.
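The relationship between goal outcomes and fading emotions can be sketched as follows. This is not the Em model: the emotion assigned to a failed goal, the unit intensity and the decay factor are all assumptions made for illustration.

```python
class Emotions:
    def __init__(self, decay=0.5):
        self.decay = decay   # assumed fraction of intensity kept per time step
        self.levels = {}     # emotion name -> current intensity

    def goal_outcome(self, goal, succeeded):
        # Success of a goal produces happiness; failure is assumed to produce sadness.
        emotion = "happiness" if succeeded else "sadness"
        self.levels[emotion] = self.levels.get(emotion, 0.0) + 1.0

    def tick(self):
        # Emotions fade with time.
        self.levels = {e: v * self.decay for e, v in self.levels.items()}

    def dominant(self):
        # The strongest current emotion takes priority.
        return max(self.levels, key=self.levels.get) if self.levels else None


cat = Emotions()
cat.goal_outcome("eat", succeeded=True)
cat.tick()
cat.tick()
cat.goal_outcome("play", succeeded=False)
```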
These components have been connected to form a broad agent architecture called Tok (Bates et al., 1992). Within Tok, a sample agent named Lyotard has been developed: a simulated house cat which, it is hoped, could pass for a real cat by possessing emotions (fear, happiness, love, hate and so on), behaviours (eating, playing with a mouse, hiding, biting, etc.) and features (curiosity, contentment, aggression and the like) within an Oz micro-world. For example, when Lyotard gets hungry, a goal is triggered to make him find food; he searches through his integrated sensory model to find all of the places in the past where he has seen food, probably his bowl. If it is empty when he gets there and he sees his owner, then he `meows' hoping to attract their attention. If the owner responds by filling the bowl, then Lyotard feels happiness because his eating goal is fulfilled and gratitude towards the user because they have helped him achieve his goal.
Distributed artificial intelligence agents are collective agents which together sit at the macro (social) level, rather than the micro (agent) level. Distributed artificial intelligence (Bond and Gasser, 1988) looks at how problems at the macro level can be broken down into agents at the micro level and how those agents can be made to co-operate, negotiate and co-ordinate their activities to ensure that the problems are solved efficiently.
Other goals of distributed artificial intelligence research include (Nwana et al., 1996):
Distributed artificial intelligence agent technology is being employed in many real-world situations, for example, air traffic control (Steeb et al., 1988), particle accelerator control (Jennings et al., 1993) and telecommunications network management (Weihmayer and Veltuijsen, 1994). However, a key problem with distributed artificial intelligence is ensuring that problem decomposition and subsequent communication and discussion between communities of agents can take place quickly enough to produce useful and achievable results.
This chapter has shown that there is no generally accepted single definition for an agent due to the fact that agents can have a wide range of properties and any one of several, possibly even disjoint, subsets of these properties is sufficient for the system to be regarded as agent-based. For this reason, three different views of agents and their technologies were presented based upon the existing research that has taken place under the umbrella of software agents.
Although attributes are useful for classifying, describing and predicting the nature of agents, most developers of agent technologies realise that the more attributes an agent possesses, the more complex the task of specifying, designing and implementing that agent becomes. This helps to explain why there has been a general trend over the past 10 years away from artificial intelligence dreams (such as HAL from the film 2001) towards more realistic areas of actual applicability.
Agents that are useful to the user in everyday activities seem to be the way that agent technology as a whole is moving. It is hoped that by starting in the small with relatively easily specified agents which have limited capabilities and limited intelligence and learning, then the experience gained will begin to show the way of progressing up to computing with agents in the large. This argument is borne out by the development of a general and relatively uncontentious definition of weak agents being put forward by Wooldridge and Jennings; it is a starting place for declaring definitions and standards about agents.
In terms of DIM, agents seem to present an ideal technology with which to integrate; their interoperability makes them useful in opening the wealth of data locked away inside legacy systems, their dynamic and modular nature helps them to deal with heterogeneous systems and protocols, and their levels of abstraction present a highly modular construction philosophy for the system designer and a higher-level problem representation for the user.
The next chapter looks at a particular aspect of agency, that of mobility, and investigates the suitability of such a characteristic for agents that are being developed in heterogeneous, large and information-rich environments.
One doesn't discover new lands without consenting to lose sight of the shore for a very long time
-- André Gide
The advent and rapid development of the Internet has shown that there is a great need for systems to communicate and distribute control and data amongst themselves in an open manner. Traditionally, this distribution has taken the form of message passing across networks, where the aim has been to try and make the network as transparent to the application as possible.
However, more recently, there has been a move towards a new type of communication model which treats the network as a resource to be exploited and is not limited by it. This seems to echo the distant claim by Sun Microsystems that:
The network is the computer18.
In this chapter the concept of a mobile agent is introduced; an entity that can move across machines and networks to access the resources that it needs to fulfil its goals. However, it is shown that movement of processes is not a new concept and that mobility in agents has grown out of a number of existing technologies. This development is traced to highlight some of the motivations and advantages behind using mobile agents in large-scale networks. Mobile agents are described by looking at the common characteristics that they possess and the systems that typify them and the chapter ends with a comparison and evaluation of mobile agent systems.
Migration. The migration of a process involves arresting its execution and creating a new process on another processor to continue that execution. Migration itself involves marshalling, transferring and restoring some or all of the program context of a process.
The concept of moving processes around, traditionally termed process migration (Smith, 1988), is not new and has existed since the advent of distributed computing. The key requirements for moving a process in such systems were not the desire of the process to move, but the desire of the distributed operating system to have it moved, mainly to improve load distribution, fault resilience, parallelism and so on. With mobility, it is the mobile agent that decides when and where to move.
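The steps of migration, together with the point that the agent itself chooses where to go, can be sketched as below. Only explicit data state is moved; the JSON encoding, the class `MobileAgent` and the `tour` loop are assumptions for illustration, and the network transfer is represented simply by handing the marshalled bytes to `restore`.

```python
import json


class MobileAgent:
    def __init__(self, itinerary, visited=None):
        self.itinerary = list(itinerary)    # hosts still to visit
        self.visited = list(visited or [])  # state carried between hosts

    def run_here(self, host):
        self.visited.append(host)

    def next_host(self):
        # The agent, not the operating system, decides when and where to move.
        return self.itinerary.pop(0) if self.itinerary else None

    def marshal(self):
        # Arrest execution and capture the program context as bytes.
        state = {"itinerary": self.itinerary, "visited": self.visited}
        return json.dumps(state).encode("utf-8")

    @classmethod
    def restore(cls, payload):
        # Create a new agent at the destination to continue the execution.
        state = json.loads(payload.decode("utf-8"))
        return cls(state["itinerary"], state["visited"])


def tour(agent, start):
    host = start
    while host is not None:
        agent.run_here(host)
        host = agent.next_host()
        if host is not None:
            agent = MobileAgent.restore(agent.marshal())  # stands in for transfer
    return agent


final = tour(MobileAgent(["hostB", "hostC"]), "hostA")
```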
Milojicic et al. (Milojicic et al., 1996) comment that although technologies such as process migration and object mobility have been present for some time, they have not become widespread for a number of reasons:
However, the introduction of the Internet culture has had a profound effect upon this in both a technological and sociological way. Its ubiquity and increasing pervasiveness are motivating people who were only would-be participants to adapt to it and become active parties. As Milojicic et al. go on to say:
Instead of a processor pool and workstation model, we have a Web environment with computers connected as interfaces to a "network-as-computer" model. [Agents] are relieved of UNIX functionality, with limited requirements for transparency (user-specific solutions are preferred over general highly-transparent ones); performance is dominated by network latency and therefore [agent] state transfer is not as visible as it is on a local area network; heterogeneity is enabled via languages like Java or TCL/TK.
Current mobile agent systems and applications are sparse due to the fact that it is very much a field in its infancy and that there are no general solutions for applying mobile agent technology to large-scale systems such as the Internet. Yet, a lot of interest is being shown by both researchers and industry in mobile agents and their technologies, as demonstrated by the number of systems and applications that are in development, particularly in the areas of electronic commerce, information retrieval and telecommunications (Green et al., 1997).
Mobile (or transportable) agents are considered to be a direct extension of client-server technology (Gray, 1995b; White, 1996). In the client-server communication model (Renaud, 1993), communicating entities have fixed and well-defined roles; a server offers services and a client makes use of those services (figure 4.1). This model implies a strict sense of dependency: clients are reliant upon servers to provide the services that they require. The communication mechanism that takes place between a client and a server is through a message passing protocol. However, message passing of this form has been criticised for being too low level, requiring programmers to determine network addresses and synchronisation points themselves.
Remote Procedure Call (RPC) (Birrell and Nelson, 1984) attempts to remove the burden of these network details from the programmer by allowing the client to request a service from a server in the same way that it would make a local function call. These services are represented by stubs: template function calls that pass through to the RPC subsystem. The location of the server, initiation of the service and transportation of the results are handled transparently to the client.
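The role of a stub can be illustrated with a minimal in-process sketch. The class names (`Server`, `ClientStub`), the JSON wire format and the two example services are assumptions made purely for illustration; they are not taken from Birrell and Nelson's design, and the "network" here is just a function call.

```python
import json


class Server:
    """Hypothetical server exposing services by name."""

    def __init__(self):
        self._services = {
            "add": lambda a, b: a + b,
            "upper": lambda s: s.upper(),
        }

    def handle(self, request_bytes):
        # Unmarshal the request, run the named service, marshal the reply.
        request = json.loads(request_bytes.decode("utf-8"))
        result = self._services[request["proc"]](*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Makes a remote service look like a local function call."""

    def __init__(self, send):
        self._send = send  # callable standing in for the network transport

    def __getattr__(self, proc):
        # Called only for names not found on the stub itself,
        # i.e. for the names of remote procedures.
        def call(*args):
            request = json.dumps({"proc": proc, "args": list(args)})
            reply = self._send(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]
        return call


server = Server()
stub = ClientStub(server.handle)
total = stub.add(2, 3)        # looks local, is marshalled and dispatched
shout = stub.upper("agent")
```

The client code never mentions addresses or message formats; those details are confined to the stub and the transport, which is the abstraction that RPC aims to provide.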
Still, a fundamental problem exists with client-server architectures when considering the management of distributed information. If the server does not provide the exact service that the client requires, for example because the server only provides low-level services, then the client must make a series of remote calls to obtain the required end service (Gray, 1995). This may increase overall latency and cause intermediate information to be transmitted across the network, which is wasteful and inefficient, especially for large amounts of data. Also, if servers attempt to address this problem by introducing more specialised services, then, as the number of clients grows, the number of services required per server becomes infeasible to support.
Other client-server systems include the Common Object Request Broker Architecture (CORBA) (Object Management Group, 1996) which attempts to make the client-server paradigm more accessible by adopting the object-oriented principles of object reuse, inheritance and encapsulation, and the Distributed Computing Environment RPC (Rosenberry et al., 1993) which offers security and authentication facilities and an interface of user-level threads instead of sockets to achieve a higher level of communication abstraction.
Subprogramming alleviates the problem of client-server architectures somewhat by allowing clients to launch subprograms at the node where the service is located (figure 4.2). In this way, any number of requests can be initiated locally and the subprogram can process the intermediate results before transmitting the actual results to the client once it has finished. Example subprogramming systems are Subprogram Parameters with RPC (SUPRA-RPC) (Stoyenko, 1994), Remote Evaluation (REV) (Stamos and Gifford, 1990) and the Network Command Language (NCL) (Falcone, 1987).
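The difference between repeated remote calls and a subprogram launched at the service node can be sketched as follows. This is an illustrative model only: `ResourceNode`, its methods and the byte-counting scheme are hypothetical, and the "network" is simulated by counting the size of whatever leaves the node.

```python
class ResourceNode:
    """Hypothetical node holding records and offering only low-level services."""

    def __init__(self, records):
        self._records = records
        self.bytes_sent = 0  # crude count of data leaving the node

    def _send(self, value):
        self.bytes_sent += len(repr(value))
        return value

    # Low-level services: one remote call per item.
    def count(self):
        return self._send(len(self._records))

    def get_record(self, index):
        return self._send(self._records[index])

    # Subprogramming: the client's code runs next to the data.
    def evaluate(self, subprogram):
        return self._send(subprogram(self._records))


def longest_client_server(node):
    # Every record crosses the network before the client can compare them.
    records = [node.get_record(i) for i in range(node.count())]
    return max(records, key=len)


def longest_subprogram(node):
    # Only the final answer crosses the network.
    return node.evaluate(lambda records: max(records, key=len))


data = ["alpha", "beta", "gamma-delta", "pi"]
node_a = ResourceNode(data)
node_b = ResourceNode(data)
answer_a = longest_client_server(node_a)
answer_b = longest_subprogram(node_b)
```

Both strategies return the same answer, but `node_b.bytes_sent` is smaller than `node_a.bytes_sent` because the intermediate records never leave the node.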
Although subprograms can migrate on their initial launch, they cannot subsequently move to other systems or resources. This restricts their ability to communicate with each other, since they may not execute for long periods of time, and also maintains the rigidity of the client-server relationship. Additionally, subprograms are generally written explicitly for a given client and so their reusability is limited.
Mobile agents are software processes that are released from the constraints imposed by both the client-server and subprogramming communication models, that is, they are free to move to the actual destination of the required service and they are free to move subsequently after their first migration (figure 4.3).
Chess et al. (Chess et al., 1995) recognise that there are two main driving forces behind the use of mobile agents. The first is in support of two forms of computing where intermittent connectivity and some form of mobility are in evidence:
Teleporting (Bennett et al., 1994) describes the situation where the user moves between different locations and machines and support is required to preserve session content and information access across the network in a transparent fashion.
Mobile computers (Forman and Zahorjan, 1994) describes the situation where the user and their machine move between different locations. Mobile agents are programmed with the tasks that they are to perform and then migrate from the user's mobile computer to the most appropriate locations on the network. Once completed, they wait for the user to reconnect and then transmit their results.
The second is for employing the properties of asynchronicity and persistence in performing transactions on information-rich and geographically large networks. Indeed, most of the current mobile agent systems being developed are for the deployment of mobile agents within the Internet, mainly to support activities such as information discovery and retrieval. Chess et al. go on to describe that one of the major future application areas of mobile agents will be electronic commerce, where each mobile agent is contending for a resource, searching and negotiating for the best offer of a service for its user, and also advertising its own services to compete with neighbouring service providers.
Harrison et al. (Harrison et al., 1995) identify a number of potentially desirable reasons for adopting a mobile agent stance:
Efficiency. In a system that is bounded by network speed, mobile agents can help to reduce intermediate data transfer by moving across the network to the location where the resource resides. Once there, the agent can issue queries to the resource and preprocess any intermediate data into a final result to be returned.
Persistence. Once a mobile agent is launched, it should not be reliant on the system that launched it and should not be affected if that node fails. The concept of an agent moving between network nodes gives it the ability to `survive' and to reach as many resources as possible. This is useful for mobile computer users due to the fact that they can log on, launch an agent, log off and check back later on its progress.
Peer-to-peer communication. A restriction of the strict client-server paradigm is the inability of a server to act as a client to another server. Mobile agents are considered to be peer entities and, as such, can adopt whichever communication stance is most appropriate to their current needs. For example, when a mobile agent is interrogating a resource it takes the role of a client and when another mobile agent wishes to query it then it becomes a server. This allows for great flexibility when dealing with network entities and distributed resources.
Fault tolerance. In a client-server relationship, the state of the transaction is generally spread over the client and the server. In the event of a network or server shutdown during a request, it is difficult for the client to reclaim the situation and re-synchronise with the server because the network connection will have been lost. However, since mobile agents do not need to maintain permanent connections and their state is centralised within themselves, problems are generally easier to deal with.
In their conclusions, Harrison et al. argue that, with the exception of remote real-time control where network latency prevents real-time interaction, there is nothing that can be done with mobile agents that cannot also be achieved with existing technologies. Yet, they summarise that the advantages of mobile agents and the motivations behind their use lie not in their individual merits (such as support for disconnected operation and potentially lower bandwidth requirements), but in the aggregate of these advantages.
Additionally, David Chess pointed out that because mobile agents are under the direct control of a remote site, they can be given access to material which might not be given out to a user over a network connection. For example, an information seller may allow a mobile agent to access the complete contents of documents that are for sale; the mobile agent can subsequently use this information to determine the set of documents which might be of interest to the user. Since the information seller does not want the user to directly peruse the documents before buying them (they might take a copy and not pay for them!), the agent is only allowed to return document identifiers and, say, a ranking indicating the interest to the user.
Counter to these perceived advantages, some would argue that there are severe problems with mobile agents that make them potentially a non-viable option to use in the large. As Nwana et al. (Nwana et al., 1996) note:
The other main arguments that are being put forward to discourage the use of mobile agents can be summarised as (Wayner, 1995):
Security. How can remote sites avoid being exploited by mobile agents and prevent their integrity being violated? Such breaches of security can include illegal instructions being executed, unauthorised information being accessed and modified, resource monopolisation and so on.
Secrecy. How is the internal representation of a mobile agent maintained and preserved while it is in transit and while it is executing on remote sites? In these two cases, a mobile agent is particularly vulnerable because its contents can possibly be inspected and copied. For data this can be a serious problem especially if it is of a private nature, and the inspection and analysis of an agent's code could possibly be used to exploit it in future transactions or negotiations.
Payment for services. How are mobile agents to pay for the services that they require? This includes ensuring that a mobile agent can actually pay, that payment is effected correctly and that the service paid for is satisfactory to the payee.
However, it should be noted that these issues are not exclusively the problem of mobile agent systems; these areas will also have to be addressed in delivering a commercially viable Internet marketplace if electronic commerce is ever to become trusted and accepted. It is for these reasons that researchers in mobile agent technology recognise the need for developing systems and architectures which can take these factors into account.
Although mobile agent systems may support differing sets of functionalities and can be implemented in different ways, they all possess a similar core philosophy; the provision of an agent being able to move to the locations where the resources it needs reside. The next sections will concentrate on characterising, discussing and evaluating mobile agent systems to illustrate the potential functionality of mobile agents. The linguistic aspects of mobile agent languages are not evaluated since they themselves do not present a framework in which mobile agents can execute and exist; they only provide the basic building blocks and primitives in which to develop mobile agent systems. For a review of mobile agent programming languages the reader is directed to (Nwana et al., 1996b; Green et al., 1997).
The characteristics which follow have been drawn from a study of the mobile agent systems given in section 4.4. They are the most prominent features that researchers are building into their systems to support mobility in agents across networks of machines and resources. They will be used to present a pragmatic view of mobile agents and also to draw evaluations and comparisons between the systems in subsection 4.4.8.
As already stated, mobility is the characteristic that allows agents to move between network nodes, but migration is the function which controls how this transfer is achieved. Although a mobile agent is essentially an executing process, the governing factor that distinguishes it from a normal process (or agent) is that not all of its instructions have to be executed on the same node or even within the same network locale.
Just as a mobile agent is fundamentally different to a process, so too is agent migration different from process migration; the difference lies in who decides where and when to move (Johansen et al., 1995b). In process migration, migration is normally forced upon a process by the distributed operating system, due to resource location, load balancing and other factors; it is generally a complex and intensive operation. With mobile agents, it is the agent that decides when to move and the underlying infrastructure must support and execute this request. Enforced migration can only be effected upon an agent in particular circumstances, for example if the agent is to be terminated for attempting to perform a malicious activity.
When considering mobile agents that are executing as a single process, their program context can comprise a code component (the program itself), a data component (the structures used by the program) and a state component (the state of execution of the process). There are two general approaches to moving agents between network nodes, which involve transferring different components of an agent's program context either automatically by the underlying subsystem or manually by the agent programmer (Peine, 1997):
Orthogonal migration transfers the entire program context of an agent to the remote destination and resumes execution from the received context. The code, data and state components representing the agent are marshalled, transferred and restored automatically.
Non-orthogonal migration transfers the program context as a subset of the available components. For example, some non-orthogonal migration systems only deal with the transfer of the code component of an agent automatically and the data and state components must be marshalled, transferred and restored by the agent programmer.
Both approaches have their relative advantages and disadvantages. Orthogonal migration is most desirable, since it makes the action of migration virtually transparent to the mobile agent. However, it requires that the entire program context of the agent can be captured and restarted at the remote site; a non-trivial exercise because all three components must be captured and converted to a format suitable for transport and reuse, with possible conversion for different platforms on despatch or arrival. This is simplest with homogeneous machines, that is, machines which have compatible processors or the same virtual machines, such as provided by Java (Gosling and McGinton, 1995). However, there have been efforts in operating systems to produce object code that is executable across a number of platforms, such as fat binaries, which include multiple object code formats in one executable file. In contrast, slim binaries (Franz and Kistler, 1996) consist of an architecture-neutral byte-code that is used to generate platform-specific object code upon demand.
Non-orthogonal migration does not automatically capture the state or data components of the agent; it is the responsibility of the mobile agent programmer to encode the execution state within the data component. The agent is restarted by executing the code component and using the state and data components to restore the agent to its condition before migration occurred. Since the state and data components do not need to be captured or transferred by the mobile agent system, the process of migration is simpler and is made easier between heterogeneous systems.
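A minimal sketch of the non-orthogonal style follows. It assumes that the code component (the hypothetical `SurveyAgent` class) is already available at every host, so that only the data component travels; the programmer encodes the execution state by hand as a `next_stop` index. Migration is simulated within one process, and `pickle` is used only for brevity (it is not safe for data received from untrusted parties).

```python
import pickle


class SurveyAgent:
    """Hypothetical agent that records its own progress (non-orthogonal migration)."""

    def __init__(self, itinerary):
        # Data component; 'next_stop' is the hand-encoded execution state.
        self.data = {"itinerary": list(itinerary), "next_stop": 0, "visited": []}

    def run(self, host_name):
        # Execution always restarts here; progress is recovered from the data.
        self.data["visited"].append(host_name)
        self.data["next_stop"] += 1
        if self.data["next_stop"] < len(self.data["itinerary"]):
            return self.data["itinerary"][self.data["next_stop"]]
        return None  # itinerary finished

    def pack(self):
        # The programmer, not the system, decides what is marshalled.
        return pickle.dumps(self.data)

    @classmethod
    def unpack(cls, payload):
        agent = cls([])
        agent.data = pickle.loads(payload)
        return agent


def simulate(itinerary):
    agent = SurveyAgent(itinerary)
    host = itinerary[0]
    while host is not None:
        next_host = agent.run(host)
        # "Migration": serialise, discard the old object, rebuild at the new host.
        agent = SurveyAgent.unpack(agent.pack())
        host = next_host
    return agent.data["visited"]


visited = simulate(["hostA", "hostB", "hostC"])
```

Under orthogonal migration the explicit `next_stop` bookkeeping and the `pack`/`unpack` pair would be unnecessary, because the underlying system would capture and restore the point of execution itself.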
Source code is the original programming language of the mobile agent which has not been translated into an intermediate form. Source code can be transferred between any system which has support for a compiler or interpreter of the language.
Object code is derived by compiling the original source code into a form that is suitable for direct execution. Object code can be executed only on the architecture and operating system that it was compiled for, but there are object code formats which are portable and can be executed (and emulated) across a variety of platforms.
Because mobile agents are typically loosely coupled processes, they need to communicate with other agents using explicit communication mechanisms provided by the underlying mobile agent system. The different types of mechanisms supported will determine the number and range of entities with which the agent can interact. An agent will be able to communicate with other agents implemented in the same mobile agent system by default, but may need additional communication models to correspond with agents written in other systems.
Communication between agents is centred around the concept of an association between two or more agents for the purpose of conveying information, called a connection. Two basic communication patterns can be adopted by corresponding entities (Sloman and Kramer, 1987):
The main difference between the two patterns is that the state of a synchronous communication dialogue is more difficult to salvage in the event of either side encountering a fault. Asynchronously communicating agents do not rely on any state being retained across a dialogue by the communication mechanism; instead, an agent must be aware of the current state of all of its communications and be able to determine to which dialogue an incoming message belongs. Asynchronous communication does have the advantage that communications from different agents can be interleaved and that a mobile agent can deal with messages in any order it chooses.
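The bookkeeping that asynchronous communication places on an agent can be sketched as follows. The `AsyncAgent` class, its dialogue identifiers and the trivial "reply in upper case" behaviour are all hypothetical; the point is only that each message carries an identifier so that the receiver can decide which dialogue it belongs to.

```python
from collections import deque


class AsyncAgent:
    """Hypothetical agent that tracks several interleaved dialogues."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.dialogues = {}  # dialogue id -> replies received so far
        self._next_id = 0

    def open_dialogue(self, other, body):
        # The sender does not block; it only remembers that a dialogue is open.
        dialogue_id = f"{self.name}-{self._next_id}"
        self._next_id += 1
        self.dialogues[dialogue_id] = []
        other.inbox.append((dialogue_id, self, body))
        return dialogue_id

    def process_inbox(self):
        # Messages could be handled in any order; arrival order is used here.
        while self.inbox:
            dialogue_id, sender, body = self.inbox.popleft()
            if dialogue_id in self.dialogues:
                # A reply to a dialogue that this agent started.
                self.dialogues[dialogue_id].append(body)
            else:
                # A new request: answer it, echoing the dialogue identifier.
                sender.inbox.append((dialogue_id, self, body.upper()))


buyer = AsyncAgent("buyer")
seller = AsyncAgent("seller")
d1 = buyer.open_dialogue(seller, "price?")
d2 = buyer.open_dialogue(seller, "stock?")
seller.process_inbox()
buyer.process_inbox()
```

After both inboxes have been processed, `buyer.dialogues` maps each identifier to its own reply, even though the two dialogues were in flight at the same time.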
The ways of expressing the behaviour of a mobile agent are determined by the source programming language in which an agent is coded; different languages have varying domains of applicability, functionalities and properties. Therefore, programming languages for writing mobile agents can be categorised according to their relationship with a mobile agent system:
Supported languages are those which are completely integrated with the mobile agent system; a mobile agent written in a supported language can take full advantage of the facilities offered by the mobile agent system (such as migration primitives) and can be completely controlled by it. Typically, a supported language is either explicitly written to complement a mobile agent system, or has been heavily augmented from an existing programming language.
Partially-supported languages are those which afford some or no integration with the mobile agent system; a mobile agent written in a partially-supported programming language can only take advantage of those facilities which have been integrated with the mobile agent system. Typically, a partially-supported language is an existing programming language that has been augmented to provide a limited subset of functionality (such as communication primitives but not migration primitives, for example).
A mobile agent system may support different combinations of supported and partially-supported programming languages. Multiple language support in a mobile agent system is termed language independence (Peine, 1997) since the mobile agent system is not reliant on a single programming language, but true language independence can only exist where just supported languages are allowed. Partially-supported languages can be difficult to integrate, especially with regard to mobility, if they can only be compiled into object code format or if they integrate closely with the operating system. However, their use can be important if the agent needs to take advantage of particular features of the programming language, such as execution speed or specific hardware support.
One of the major criticisms of mobile agents is that the execution environment provided by the mobile agent system can be exploited by a guest agent. For example, agents can monopolise resources, avoid paying for resources, swamp the local host and so on, unless the execution of a mobile agent is constrained in some way. This is a complementary but fundamentally different concept to security. Security is concerned with granting or denying access; constrained execution is concerned with controlling granted access. They are both mechanisms which attempt to maintain the integrity of a site.
The execution of an agent can be constrained by limiting its access to a particular resource or set of resources. A mobile agent that has been security checked can be allocated a set of resource constraints, called allowances, which limit certain aspects of its execution. A resource in this sense can be anything that is present within the host site that is quantifiable, for example, CPU cycles, number of files opened, amount of memory allocated and so on. The awareness of such limitations will certainly affect the manner in which an agent conducts itself, since the exhaustion of a constraint will typically result in the agent being terminated.
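As an illustration of allowances, the following sketch models a host that charges an agent for each quantifiable resource it uses and terminates the agent when a budget is exhausted. The `Allowance` class, the resource names and the representation of an agent as a list of `(resource, amount)` steps are assumptions made for the example, not features of any particular mobile agent system.

```python
class AllowanceExceeded(Exception):
    """Raised when an agent exhausts one of its allowances."""


class Allowance:
    """Hypothetical per-agent resource budget enforced by the host."""

    def __init__(self, **limits):
        self._remaining = dict(limits)

    def charge(self, resource, amount=1):
        if resource not in self._remaining:
            raise AllowanceExceeded(f"no allowance for {resource}")
        if amount > self._remaining[resource]:
            raise AllowanceExceeded(f"{resource} exhausted")
        self._remaining[resource] -= amount

    def remaining(self, resource):
        return self._remaining.get(resource, 0)


def run_agent(steps, allowance):
    """Run (resource, amount) steps; terminate the agent on exhaustion."""
    completed = 0
    for resource, amount in steps:
        try:
            allowance.charge(resource, amount)
        except AllowanceExceeded:
            return completed, "terminated"
        completed += 1
    return completed, "finished"


budget = Allowance(cpu_ticks=100, files_opened=2)
outcome = run_agent(
    [("cpu_ticks", 40), ("files_opened", 1), ("files_opened", 1),
     ("files_opened", 1), ("cpu_ticks", 10)],
    budget,
)
```

Here the agent is stopped at its third attempt to open a file, leaving the final step unexecuted; an agent aware of its allowance could instead query `remaining` and adapt its behaviour before reaching the limit.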
Constrained execution is critical to ensuring that confidence can be built into mobile agent systems and that mobile agents can be controlled on large-scale networks. Moreover, it can form the basis for an electronic commerce environment, where mobile agents are charged not only for the services they employ from other agents, but also for the resources that they use at the host site. The execution of mobile agents is also a matter of concern for the user, because constraints imposed by the user can help to prevent agents from incurring large bills.
The problem of security is endemic to all computer systems and relates to the concept of preserving the privacy and integrity of data on a system which supports multiple users, either locally or over a WAN. Current networked operating systems provide mechanisms for partitioning and keeping users separate (or allowing them to share); a user can only influence the parts of the system and data to which they have access. However, in the world of mobile agents, where a site can offer services not only to its local users but also to remote users, a number of serious security problems can develop.
Unless the execution of mobile agents is closely monitored and their access to local resources restricted, the security of a site can be compromised. Such contraventions include agents accessing and modifying data to which they should not have access and interfering with the execution of other agents and components of the system. To help ensure that an agent is suitable for execution, the host site can employ a number of security checks:
Authentication involves checking that the agent was sent from a trustworthy site. This can involve asking for the authentication details to be sent from the site where the mobile agent was launched, the site from which the agent last migrated or from an independent third-party. A mobile agent which fails authentication can be rejected from the site or can be allowed to execute as an anonymous agent within a very restricted environment.
Verification entails checking the code component of a mobile agent to ensure that it does not perform any prohibited actions. However, some code cannot be verified until it is executed, such as the contents of variables used as a parameter to a statement or function. For this reason, verification normally checks to ensure that the agent does not try to corrupt its execution environment, that is, the component of the mobile agent system in which the agent actually executes and which is responsible for managing the agent while it runs. If this remains intact, then the agent should not be able to perform any operations outside of its authorisation and resource constraints. A technique called proof-carrying code (Necula, 1997) allows a host site to determine whether code from an untrusted site is safe to execute. This works by attaching a safety proof to each piece of code to be transferred. The host site can quickly validate this proof and thus ensure that the code is safe to execute. Tampering with either the proof or the code will result in an invalid verification; in the few cases where the proof and the code are modified such that verification still succeeds, the new code is also safe.
Authorisation deals with checking the runtime actions of an agent. When a mobile agent is allowed to execute within a given site, it can be allocated both an allowance (see previous subsection) and a set of access permissions. The former determines how many times a resource can be accessed or how much of a resource can be used, and the latter indicates what type of access the agent can perform. For example, a highly trusted agent may be able to read, write and modify a given resource and have unlimited access to it. On the other hand, an untrusted mobile agent may only be able to read the resource and access it a limited number of times.
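The combination of an allowance and a set of access permissions described under authorisation can be sketched as a runtime check performed on every access. The names `Grant`, `GuardedResource` and `attempt`, and the choice of "read" and "write" as access types, are hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a runtime action falls outside an agent's authorisation."""


class Grant:
    """Hypothetical grant: permitted access types plus an access-count allowance."""

    def __init__(self, permissions, max_accesses=None):
        self.permissions = frozenset(permissions)
        self.max_accesses = max_accesses  # None means unlimited access
        self.used = 0


class GuardedResource:
    def __init__(self, value):
        self._value = value

    def access(self, grant, mode, new_value=None):
        # Authorisation is checked at runtime, on every action.
        if mode not in grant.permissions:
            raise AccessDenied(f"{mode} not permitted")
        if grant.max_accesses is not None and grant.used >= grant.max_accesses:
            raise AccessDenied("allowance exhausted")
        grant.used += 1
        if mode == "write":
            self._value = new_value
        return self._value


def attempt(resource, grant, mode, new_value=None):
    try:
        return resource.access(grant, mode, new_value)
    except AccessDenied as error:
        return f"denied: {error}"


document = GuardedResource("v1")
trusted = Grant({"read", "write"})            # unlimited, read and write
untrusted = Grant({"read"}, max_accesses=2)   # read-only, two accesses

results = [
    attempt(document, trusted, "write", "v2"),
    attempt(document, untrusted, "read"),
    attempt(document, untrusted, "read"),
    attempt(document, untrusted, "read"),
    attempt(document, untrusted, "write", "v3"),
]
```

The trusted agent may modify the resource freely, whereas the untrusted agent is refused both when it exceeds its two permitted reads and when it attempts a type of access that it was never granted.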
The problem of enforcing security is one of policy, that is, finding the balance between the restriction and empowerment of mobile agents. For some sites which contain military or confidential information, security will be of the utmost importance for both trusted and untrusted agents. Yet, for other sites which offer general public services, security may be necessary but less of an issue.
Host security is relatively easy to deal with by employing the three mechanisms of authentication, verification and authorisation to greater or lesser degrees. However, there is also agent security to consider, namely, how to protect a mobile agent which is in transit or is executing on a remote site. Again, there are a number of security techniques the agent or a mobile agent system can use:
Encryption algorithms such as Pretty Good Privacy (PGP) and others (Kaufman et al., 1995), can help to protect mobile agents from having their contents inspected during travel.
Digital signatures (Tanenbaum, 1996) are used to identify messages such that a receiver can verify the claimed identity of a user, the sender cannot later repudiate the contents of the message and the receiver cannot possibly have concocted the message themselves.
Ultimately, though, once a mobile agent is executing it is at the mercy of the host site because anything that is accessible to the agent is accessible to the host system too. An obvious example of this would be the host site analysing the bargaining strategy of an agent and using this against it.
Security is a very large issue that is not easy to solve in the general case and, contrary to what critics suggest, it is not purely the problem of mobile agent systems, since downloading Java applets can pose as much of a threat. As the Internet is still maturing, new techniques are being developed to provide added security mechanisms at both the network layer and the application layer. It is more than likely that these measures will be built into mobile agent systems (and other Internet software) as they become available.
With the interest that mobile agents are receiving as a potential tool to assist the user in controlling information overload, a number of mobile agent systems are being developed which, while not directly addressing the task of distributed information management, do help to provide a framework within which such a system could be developed.
The following subsections present a review of a cross-section of the most prevalent systems but not those systems which are still in alpha-stage development, such as the Aglets system (Lange and Mitsuru, 1997). For a recent review of mobile agent systems, languages and applications see (Cugola et al., 1996; Green et al., 1997).
Telescript (White, 1994) is a commercial product developed by General Magic Incorporated to support mobile agents for electronic marketplaces. It was arguably one of the first available mobile agent development environments and consists of three major components: the architecture, the language and the development environment.
The Telescript architecture consists of a number of communicating and cooperating elements (figure 4.4). At the basic level, mobile agents can move between logical areas called places which are managed by a Telescript engine; the outermost place of an engine is called the engine place. Engine places themselves run within a larger context known as regions, which are the connection points for the Telescript network. Thus, a region can consist of multiple engines running multiple places.
When an agent wishes to migrate to a remote destination, it issues a command with a ticket argument that indicates the required destination and the time by which the trip must be completed, amongst other things. If the migration cannot be completed (if the trip takes too long, for example), then the agent is restarted and must handle the exception.
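The ticket mechanism can be pictured with the following sketch, written in Python rather than the Telescript language. Everything in it is hypothetical: the `Ticket`, `Engine` and `TripError` names, the `migrate` method and the table of simulated trip times are illustrative assumptions and do not reproduce the Telescript API.

```python
class TripError(Exception):
    """Delivered to the agent when a trip cannot be completed."""


class Ticket:
    def __init__(self, destination, max_trip_seconds):
        self.destination = destination
        self.max_trip_seconds = max_trip_seconds


class Engine:
    """Hypothetical engine; 'routes' maps a destination to a simulated trip time."""

    def __init__(self, routes):
        self._routes = routes

    def migrate(self, ticket):
        trip_seconds = self._routes.get(ticket.destination)
        if trip_seconds is None or trip_seconds > ticket.max_trip_seconds:
            # The agent stays where it is and must handle the exception.
            raise TripError(ticket.destination)
        return ticket.destination


def travel(engine, tickets, home="home"):
    """Agent logic: try each ticket in turn, remaining at 'home' if all fail."""
    for ticket in tickets:
        try:
            return engine.migrate(ticket)
        except TripError:
            continue
    return home


engine = Engine({"market": 30, "library": 5})
first_choice = travel(engine, [Ticket("market", 10), Ticket("library", 10)])
no_route = travel(engine, [Ticket("museum", 10)])
```

The first trip fails because the simulated journey to the market exceeds the time allowed by the ticket, so the agent handles the exception and falls back to its second destination.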
When migrating to a place that is across a region, the engine managing the place must authenticate the mobile agent by determining its authority from the receiving engine place. According to the Telescript literature, places can neither withhold nor falsify their authority information but it is not clear how this is prevented. The level of authority required to enter a given place is set by the receiving place, which may involve the exchange of encrypted authority information.
If access is denied, then agents can be routed to a special place called purgatory, where they are allowed to live for a short time in a limited environment. This allows them to recover and select a new migration destination. On the other hand, if access is granted then the agent enters its chosen place and can interact with the resources and other agents it finds there; execution begins with the instruction immediately after the migration command. If the place is designated as a meeting place, then two agents can communicate and take advantage of each other's services. The engine managing the meeting place mediates a meeting protocol between two agents, giving each the chance to accept or reject the meeting. Mobile agents that are aware of each other's existence but are not in the same place, can communicate using the
Telescript imposes further security restrictions upon agents while they are executing within places. An agent is granted a permit when it travels between places or regions; the permit details the capabilities of the agent and the agent must agree to the restrictions imposed by the permit before it can enter a given place. When the agent leaves the restrictions are lifted but new ones are imposed by the next receiving place or region.
Not only do permits grant or deny an agent access to particular resources within a place, but they can also limit the amount of usage an agent can make of a resource through the allocation of an allowance. An allowance can be expressed in terms of time, size, computation and so on, and when an agent exceeds its allowance for a resource it is destroyed. Hence, it is the responsibility of the agent programmer to ensure that an agent does not try to access beyond its means. Yet, if an agent imposes temporary permits upon itself it can be notified rather than destroyed.
The Telescript language (White, 1995) is object-oriented in nature, with a process being the highest object within the class hierarchy. The language has classes for integrating with standard programming languages such as C and C++. In this way, Telescript is seen as a wrapping language; the interfaces to the stationary applications are written in C or C++ and the communication interface to the Telescript architecture is written in the Telescript language. The language itself is compiled into byte-code (a form of intermediate code for a virtual machine) for portability and is executed by engines which contain the interpreter and represent the runtime environment.
The Telescript development environment allows Telescript applications and agents to be built. It consists of the language, the engine and program development tools, such as class browsers and debuggers. More recently, Telescript has also introduced the Active Web Tool set which provides classes and tools for integrating Telescript technology into WWW servers.
Telescript is currently being used in the AT&T PersonalLink network for mobile communications and is seen as a potential mobile computer mechanism for Personal Digital Assistants (PDAs). It is also being used by France Telecom to further its Minitel electronic market place.
Agent TCL (Gray, 1995; Gray, 1995b) is a system for supporting transportable (mobile) agents that is being developed at Dartmouth College, USA. The architecture of Agent TCL (figure 4.5) is based upon the server model advocated by Telescript (White, 1994) and the high-level scripting language implementation is centred around an augmented form of the Tool Command Language (TCL) (Ousterhout, 1994).
All services that are available within the system are provided by agents, whether mobile or stationary. At each Agent TCL site, a server resides and handles the management of local agents and incoming mobile agents. The server also provides mechanisms for enforcing security, providing a hierarchical name space in which agents can be referenced and allowing agents to address each other locally.
Agents move between sites by issuing the mobility imperative
. This command packages the program context of the agent and transfers it to a destination site where the server restarts it at the instruction after the
command. The method in which the agent is transported is determined by the transport system advocated by the local site server, for example, TCP/IP, email and so on.
The execution of agents is handled by an interpreter that is appropriate to the source language of the mobile agent. Agents can be written in a language that supports interpretation, but the authors indicate that compiled agents might be possible in a limited capacity. For example, an agent written and compiled in C might be able to execute but not migrate due to its platform dependence. In Agent TCL, the interpreter of TCL was extended to support three extra modules:
Agent TCL uses PGP to authenticate servers and to protect agents and data whilst in transit. However, since there is no automated procedure for distributing PGP public keys, each server must possess in advance the keys of all servers from which it might receive agents. Based upon this authentication, a resource manager assigns an agent an appropriate set of access permissions in order to protect resources; builtin resources (such as the system clock and the CPU) are directly accessible through the language primitives, but indirect resources are managed and protected through a third-party agent.
Safe TCL (Levy and Ousterhout, 1995) is used to protect builtin resources and provides two interpreters for the language:
However, to prevent agents from being too restricted, Agent TCL uses a modified version of Safe TCL in which dangerous commands are removed and replaced with a link to a secure version. The secure version of each command either uses an access control list to determine whether or not the agent can execute the command (matched against the agent's access permissions), or severely restricts the operation of the original command.
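The replacement of a dangerous command by a guarded version can be illustrated with a minimal Python sketch. It is not Agent TCL's implementation: the names `make_secure`, `ACL` and `COMMANDS`, and the agent identifiers, are assumptions introduced purely for illustration.

```python
class AccessDenied(Exception):
    """Raised when an agent lacks permission for a guarded command."""


def make_secure(command_name, original, acl):
    """Wrap `original` so that it only runs for agents listed in the ACL."""
    def secure(agent_id, *args):
        if agent_id not in acl.get(command_name, set()):
            raise AccessDenied(f"{agent_id} may not call {command_name}")
        return original(*args)
    return secure


def read_clock():
    # Stand-in for a builtin resource such as the system clock.
    return 42


# Hypothetical access control list: command name -> permitted agent ids.
ACL = {"clock": {"trusted-agent"}}

# The command table links the dangerous name to its secure version, so
# agents can never reach the unguarded original directly.
COMMANDS = {"clock": make_secure("clock", read_clock, ACL)}
```

An agent listed in the access control list obtains the result of the original command, whereas any other agent receives an `AccessDenied` error.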
Agents use the
command to initiate a communication with another agent, and
to complete and synchronise the two agents. Additionally, agents can communicate in an asynchronous fashion using the
The Agent TCL architecture has been used in three information retrieval applications. The first is in the domain of technical reports, the second in text-based medical records and the third in three-dimensional drawings of mechanical parts. More recently, the team at Dartmouth College have been developing a number of sample applications to illustrate Agent TCL's suitability for the Internet, such as cooperative information gathering (Gray et al., 1996).
Tromsø and Cornell Moving Agents (TACOMA) (Johansen et al., 1995; Johansen et al., 1995b) is a joint project that is being developed by the University of Tromsø in Norway and Cornell University in the USA, and is primarily concerned with providing operating system support for agents. TACOMA has been through many stages of revision, but the latest prototype uses a version of the TCL scripting language that incorporates the Horus toolkit (van Renesse et al., 1994) to provide group communication and fault tolerance.
The TACOMA architecture (figure 4.6) consists of sites, places and agents; a site represents a collection of computers and resources where agents can interact, and a place represents a potentially restricted part of a site where guest agents can be executed. Agents are considered to be the computational unit of the system and have three potential storage mechanisms:
Filing cabinets are permanent data repositories that can contain folders. Agents can leave information at various sites by placing the data in a folder and then depositing this folder within a filing cabinet. Folders are also an abstraction mechanism, since each TACOMA operation takes a folder (which contains input data) and returns a folder (which contains output data).
Briefcases are containers that agents carry with them and can hold system folders and information folders. The essential difference between filing cabinets and briefcases is that filing cabinets are stationary.
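The relationship between folders, briefcases and filing cabinets can be sketched as a small set of Python data structures. This is an illustrative model only; the class names, the `RESULTS` folder name and the site name are assumptions and are not taken from TACOMA.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """A named container of data items."""
    name: str
    items: list = field(default_factory=list)


@dataclass
class Briefcase:
    """Travels with an agent and holds folders by name."""
    folders: dict = field(default_factory=dict)

    def folder(self, name):
        # Create the folder on first use, then return it.
        return self.folders.setdefault(name, Folder(name))


@dataclass
class FilingCabinet:
    """Stays at one site; agents deposit folders here for later visitors."""
    site: str
    folders: dict = field(default_factory=dict)

    def deposit(self, folder):
        self.folders[folder.name] = folder

    def retrieve(self, name):
        return self.folders.get(name)


# An agent collects a result in its briefcase and leaves it at a site.
briefcase = Briefcase()
briefcase.folder("RESULTS").items.append("report-17")
cabinet = FilingCabinet(site="site-a")
cabinet.deposit(briefcase.folder("RESULTS"))
```

The only structural difference between the two containers in this model is that the filing cabinet is bound to a site, whereas the briefcase has no location of its own.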
There is only one abstraction for both communicating with other agents and transporting agents between sites: the
operation. When a mobile agent wishes to move to a new site, it issues a
and its briefcase as arguments. An entry indicating where to move to is added to the
folder in the agent's briefcase which is then populated with two additional folders;
contains the current data and state components of the agent before migration and
holds the code component of the agent.
At the receiving site, the guest agent is handled by the
agent which is responsible for managing the local site and for providing the agent with a suitable execution environment. Since an agent can be transferred in source code format, it may need to be compiled before it can be executed. This potentially lengthy task is handled by the
agent which extracts the code component of the agent from the
folder and either executes it immediately, or compiles it first and then executes it. The agent is started in a place (which may or may not be restricted) with its briefcase and must look in the
folder for its data and state components.
To initiate computation
in an RPC-style manner the agent
s with another agent indicating RPC and a briefcase as arguments; the
agent on the remote site launches the target agent through a
agent. When the target agent has completed, the
agent returns the briefcase containing the results back to the invoking agent.
The power of the
operation is the abstraction and extensibility afforded through agents (the particular agent that is met with determines the semantics of the meeting) and briefcases (agents do not need to understand the entire contents of a briefcase to be able to process sections of it).
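A minimal Python sketch of this single abstraction is given below, under the assumption that a briefcase can be modelled as a dictionary of named folders. The function name `meet`, the agent `word_count_agent` and the folder names `TEXT`, `RESULT` and `OTHER` are hypothetical and serve only to show that the agent met with determines the semantics of the meeting.

```python
def meet(agents, target, briefcase):
    """Hand a briefcase to the named agent and return the briefcase it gives back."""
    return agents[target](briefcase)


def word_count_agent(briefcase):
    # Reads only the folder it understands and passes the rest through untouched.
    text = " ".join(briefcase.get("TEXT", []))
    return {**briefcase, "RESULT": [len(text.split())]}


# Registry of agents available at a site (hypothetical).
agents = {"word_count": word_count_agent}

reply = meet(
    agents,
    "word_count",
    {"TEXT": ["mobile agents move"], "OTHER": ["opaque data"]},
)
```

The `OTHER` folder is returned unchanged, which reflects the point that an agent need not understand the entire contents of a briefcase in order to process part of it.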
TACOMA allows agents to charge for the use of their services through the adoption of electronic cash (ECUs) (Chaum, 1992); this also provides a mechanism for controlling the execution of an agent since once an agent has spent all of its units of electronic cash, it cannot access any more resources. When two agents have agreed on the details of a service then ECUs are exchanged through a separate agent, called a validation agent. The validation agent ensures that the paying agent has enough cash to pay for the service and that the ECUs are transferred correctly.
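The role of the validation agent can be sketched in Python as follows. This is a deliberately simplified ledger and not a cryptographic electronic cash scheme of the kind described by Chaum; the class name, method name and agent names are assumptions made for illustration.

```python
class InsufficientFunds(Exception):
    """Raised when the paying agent cannot cover the agreed price."""


class ValidationAgent:
    """Mediates payments so that neither party moves the cash itself."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, payer, payee, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        # Check that the payer has enough cash before anything is moved.
        if self.balances.get(payer, 0) < amount:
            raise InsufficientFunds(payer)
        self.balances[payer] -= amount
        self.balances[payee] = self.balances.get(payee, 0) + amount


validator = ValidationAgent({"buyer-agent": 10, "service-agent": 0})
validator.transfer("buyer-agent", "service-agent", 4)
```

Once the balance of the paying agent reaches zero, every further transfer fails, which is the sense in which electronic cash also bounds the execution of an agent.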
The future of the TACOMA project lies in investigating mechanisms for fault tolerance, for example, by using rear guards. A rear guard is a special agent that is left behind when an agent migrates; it is responsible for launching a new agent should a failure cause the future mobile agent to expire and for terminating itself when it is no longer necessary. As with Telescript, the TACOMA development team have recently been investigating the use of TACOMA agents on the WWW through the use of a WWW browser front-end and an HTTP delivery mechanism
(Johansen et al., 1996). When a user launches an agent, it is transmitted to a TACOMA server where a
CGI-script meets with the local
agent to execute the agent. Once it has finished executing, the
agent returns the results (from the briefcase) in a suitable form to the user's WWW browser.
The Agents for Remote Actions (ARA) system (Peine, 1997; Peine and Stolpmann, 1997) is being developed at the University of Kaiserslautern, Germany. The basic premise of ARA is to add mobility to the existing and well-developed body of programming by integrating as middleware between the operating system and specific applications. The ARA system does not support the use of any single language inherently, but allows multiple programming languages to be integrated into the system core; current support is for TCL and C/C++, but Java is planned next.
The ARA architecture (figure 4.7) centres around the concept of machines, places and service points. A place is a domain of logically related services which are managed under a common security policy and execute on a given machine. A service point is a named instance of an agent offering a particular service to other agents. When a service point is created within a place, the announcing agent becomes the server for that service; other agents that join the service point become clients who can then submit requests to the server agent.
Access limitation to general resources (such as writing to a file, memory allocation and so on) is enforced through the ascription of allowances to agents. An ARA allowance is similar to a Telescript allowance (see section 4.4.1), but differs in that groups of agents may shift allowances between themselves or share a group allowance. There are two types of allowances:
As with Telescript, it is the responsibility of each agent to monitor its allowance usage, since an agent that exhausts its allowance is terminated. The use of service points and server agents can protect specific resources that are available at the place, although without an authentication scheme and access control mechanism (see below) it is not clear how this can be achieved effectively.
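The accounting behaviour of an allowance, including the shifting of an allowance between agents in a group, can be sketched in Python as follows. The class and the resource names `memory_kb` and `file_writes` are assumptions, and an exception stands in for the termination of the agent.

```python
class AllowanceExhausted(Exception):
    """Stands in for the termination of an agent that overruns its allowance."""


class Allowance:
    def __init__(self, **limits):
        # Remaining units per resource, e.g. memory_kb=512.
        self.remaining = dict(limits)

    def consume(self, resource, amount):
        if self.remaining.get(resource, 0) < amount:
            raise AllowanceExhausted(resource)
        self.remaining[resource] -= amount

    def shift_to(self, other, resource, amount):
        # Moving an allowance debits the donor before crediting the recipient.
        self.consume(resource, amount)
        other.remaining[resource] = other.remaining.get(resource, 0) + amount


searcher = Allowance(memory_kb=512, file_writes=2)
helper = Allowance(memory_kb=64)
searcher.shift_to(helper, "memory_kb", 128)
```

Because the shift is expressed through `consume`, an agent cannot give away more of a resource than it still holds.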
When an ARA agent wishes to migrate to a remote machine, it issues the command
. This transfers the entire program context of the agent to the remote place, where it is invoked with an interpreter appropriate to its language and a local allowance based upon the security mechanism imposed by the place. When the agent resumes, it can check its local allowance to see to what extent the destination place has honoured its global allowance. Authentication is not currently implemented within ARA, but the envisaged system is based upon digital signatures that use public key cryptography. Additionally, the encryption of agents can help to prevent data being eavesdropped and agents becoming corrupted during migration. Each local place also provides a service for check-pointing in case a mobile agent suffers a terminal failure sometime in the future.
ARA is a relatively new mobile agent system and certain key areas are still being developed, mainly regarding security. However, it is being used to implement a number of practical applications, such as a service for searching through and retrieving USENET news articles: a mobile agent visits each news server to locate interesting articles and adapts its search objective based upon the contents of the articles it has already found.
The Frankfurt Mobile Agent Infrastructure (ffMAIN) (Lingnau et al., 1995; Lingnau et al., 1996) is a project being developed at the Goethe-University in Germany. It is designed to provide a low-level infrastructure to support agent mobility and communication through the use of HTTP as a transportation mechanism.
The architecture (figure 4.8) consists primarily of an agent server, which is a process that executes on every host that can be accessed by agents. Its tasks include accepting mobile agents, creating an appropriate runtime environment for agents to execute within, supervising the execution of agents and terminating agents if required. In addition to this, the agent server must also organise the transfer of mobile agents to other hosts, manage communication between agents and their users, and perform authentication and access validation (although this is currently not implemented).
Mobile agents move between agent servers by issuing the
imperative and are launched from a home server which keeps track of their progress through the network. Agents are transferred as encapsulated MIME documents by
ing them to a special URL that is managed by the receiving agent server. Upon receipt of this agent, the agent server parses the code of the mobile agent and determines whether it is acceptable in terms of a suitable interpreter being available. If the agent can execute on the host, then the agent server launches it within an appropriate runtime environment and forwards a URL to the home server of the agent to inform the user of its new location.
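The packaging and acceptance steps can be sketched with the Python standard library `email` package. The header names `X-Agent-Language` and `X-Agent-Home`, the function names and the example URL are assumptions made for illustration and are not taken from ffMAIN; the sketch builds and parses the MIME document only and does not perform the HTTP transfer itself.

```python
from email import message_from_bytes
from email.message import EmailMessage


def package_agent(source, language, home_url):
    """Encapsulate agent source code as a MIME document ready for transfer."""
    msg = EmailMessage()
    msg["X-Agent-Language"] = language  # hypothetical header
    msg["X-Agent-Home"] = home_url      # hypothetical header
    msg.set_content(source)
    return msg.as_bytes()


def accept_agent(body, interpreters):
    """Return (language, source) if a suitable interpreter exists, else None."""
    msg = message_from_bytes(body)
    language = msg["X-Agent-Language"]
    if language not in interpreters:
        return None  # the receiving server refuses the agent
    source = msg.get_payload(decode=True).decode("utf-8")
    return language, source


body = package_agent("puts hello\n", "safe-tcl", "http://home.example/agents/1")
accepted = accept_agent(body, interpreters={"safe-tcl", "perl"})
rejected = accept_agent(body, interpreters={"perl"})
```

In a complete system the bytes returned by `package_agent` would form the body of the HTTP request sent to the receiving agent server.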
Communication between agents is also achieved through HTTP; each agent server maintains an information space which is accessible to agents via
ting. The agent server mediates access to entries within its information space by checking the requesting agent's credentials against the
header that was supplied with the entry when it was created.
No single language has been adopted for ffMAIN agents, as the aim of the infrastructure is to provide `mechanism, not policy'. However, a prototype has been developed which consists of a customised HTTP server written in Perl and a number of language-dependent modules to provide runtime support for agents written in Safe TCL and Perl.
At present, the framework is incomplete. For example, there is no provision for synchronous communication between agents and there is a serious lack of adequate security measures. However, security mechanisms for the WWW are an active area of research and the authors plan to implement solutions as they develop.
Project Mole (Straßer et al., 1996) is a prototype mobile agent system that is being developed by the Institute for Parallel and Distributed Computer Systems (IPVR) within the University of Stuttgart, Germany. Mobile agents are seen as offering a new perspective on distributed object-oriented systems by providing a distributed abstraction layer of security, mobility and communication. The system is programmed in and supports the Java language.
The Mole architecture is given in figure 4.9 and consists of locations and engines. A location manages the agents that are executing within its environment and offers services such as migration, communication (intra- and inter-location) and a form of yellow pages service which includes information about other agents in the location. An engine manages all of the locations executing on one system and is responsible for inter-location communication that spans engines and for providing a class server, a special type of agent that is responsible for obtaining Java classes which are required by any agent executing within the engine.
When a Mole agent decides to migrate between locations, it executes a migrate command (a stop method, in effect) that captures the program context of the agent. However, due to the problems of transferring the state component between heterogeneous platforms, orthogonal migration has been sacrificed in the present version of Mole. The Java Object Serialiser is used to package up the code and data components of the agent and send them across the network. At the destination location, the agent is recreated using the received components and a start method is invoked.
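This style of migration, in which the data component is transferred and the agent is restarted from a known entry point rather than resumed mid-instruction, can be sketched in Python using the standard `pickle` module in place of the Java Object Serialiser. The class `CounterAgent` and the function `migrate` are assumptions. Note that `pickle` carries only the data of the object together with a reference to its class, so the code is assumed to be available at the destination already, and that unpickling data from an untrusted source is unsafe.

```python
import pickle


class CounterAgent:
    """A toy agent whose only data is the list of locations it has visited."""

    def __init__(self):
        self.visited = []

    def stop(self):
        # Capture the data component; the execution state is not preserved.
        return pickle.dumps(self)

    def start(self, location):
        # Entry point invoked after the agent has been recreated.
        self.visited.append(location)
        return self


def migrate(agent, destination):
    """Serialise the agent, recreate it and restart it at the destination."""
    payload = agent.stop()           # bytes that would cross the network
    recreated = pickle.loads(payload)
    return recreated.start(destination)


agent = CounterAgent().start("location-a")
moved = migrate(agent, "location-b")
```

The recreated agent is a distinct object from the original, and it continues from `start` rather than from the point at which migration was requested.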
The current release of Project Mole is a prototype and key elements of the architecture are missing which will be addressed in future versions, such as security, versioning and accounting. A number of projects are being developed within the Mole architecture, such as a CORBA-enabled trader (Hohl et al., 1996), a multi-user dungeon and various tools to track and monitor the progress of agents.
Agent Process Interaction Language (APRIL) (McCabe and Clark, 1995; McCabe and Clark, 1996) is both a programming language and an agent support environment that grew out of the ESPRIT project IMAGINE and is now being developed jointly by the NetMedia Laboratory at Fujitsu in Japan and Imperial College London in England. Although not designed specifically for developing mobile agents, both the language and the architecture contain general mechanisms and primitives to support code migration and communication between mobile agents.
Figure 4.10 illustrates the architecture of the APRIL system. Each machine executes one or more communication servers to provide general communication facilities between agents executing within the server environment and between other communication servers. The servers also provide a yellow pages facility for detailing information about agents that are executing within the local server environment; when an agent starts, it registers its presence with the local server and when it terminates it unregisters.
Migration in APRIL is not reduced to a single instruction, but is provided for by the high-order nature of the language. APRIL possesses three abstraction mechanisms that are first-class citizens of the environment; functions, procedures and modules (the latter being collections of functions and procedures). Any combination of these abstractions can comprise an executing agent and the entire code component of an agent can be captured, serialised and sent across the network, and subsequently restarted. An agent's state component is not captured and the data component must be transferred with the migrating agent as an argument. However, an advantage of the APRIL system is that when an agent is compiled into portable byte-code, all names of functions, procedures and modules are removed. This means that agent code can be migrated without the fear of name clashing arising, and code can also be integrated dynamically into an executing process.
In APRIL, all agents are given a unique name that derives from a handle; a composite of the local agent name and the address of the communication server's environment. Two powerful communication mechanisms are provided that allow agents to communicate either within or between server environments:
Peer-to-peer (asynchronously) by sending a message to an agent's handle. This is analogous to emailing a message to an agent since there is no communication set up or tear down for the sending (or receiving) agent to worry about.
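Asynchronous messaging by handle can be sketched in Python as follows. The textual form `name@address` for a handle, and the names `CommServer` and `send`, are assumptions made for illustration and do not reflect APRIL's actual syntax.

```python
from collections import deque


class CommServer:
    """Holds one mailbox per agent registered in the server environment."""

    def __init__(self, address):
        self.address = address
        self.mailboxes = {}

    def register(self, name):
        # Registration yields the handle: local name plus server address.
        self.mailboxes[name] = deque()
        return f"{name}@{self.address}"

    def unregister(self, name):
        self.mailboxes.pop(name, None)


def send(servers, handle, message):
    """Deliver a message to the mailbox behind a handle without waiting for a reply."""
    name, address = handle.split("@", 1)
    servers[address].mailboxes[name].append(message)


servers = {"host-a": CommServer("host-a")}
diary_handle = servers["host-a"].register("diary")
send(servers, diary_handle, "meeting at 10am")
```

The sender returns as soon as the message is queued, so neither party has to manage the set up or tear down of a connection.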
Security is handled in three ways by the APRIL subsystem. Firstly, there is a code verifier which inspects all code before it is executed to ensure that it does not corrupt APRIL's runtime environment. Secondly, access to general resources (such as memory or the filing system) can be restricted by assigning resource tanks to an application; a resource tank is similar to an allowance in that they can be attributed to different resources and the exhaustion of any tank results in the agent being destroyed. Thirdly, messages and migrating agents can be encrypted to help prevent their contents being inspected or tampered with during transit.
Other components of the APRIL environment are also available to assist in the development of agents, such as AdB and DialoX. AdB (McCabe, 1996) is a distributed database in which agents can create, access, maintain and share information stored as records. DialoX (McCabe, 1997) is an agent user interface system that allows databases of interfaces to be registered and used through a DialoX server; an interface is specified as a record containing a set of fields (gadgets and data). The agent is responsible for creating these records and the DialoX server is responsible for rendering them. In this way, an agent can be abstracted from the individual details of how interfaces are rendered and user information is collected.
The APRIL language and environment are being used to develop a number of agent service layers that sit above the base language, through the use of APRIL's pattern-based macro capabilities. For example, an object-oriented layer has been developed called April++ (Clark et al., 1996) as well as a system for maintaining and querying records that are expressed in a hybrid of Prolog and SQL syntax, termed AgentQ (Clark and Skarmeas, 1996). Other applications of APRIL include a schedule management system named IntelliDiary (Wada et al., 1996) in which agents plan and negotiate meetings and appointments with other agents.
The systems considered in the previous subsections are representative of current mobile agent systems, and it can be seen that they offer similar kinds of functionality, which can be categorised into the characteristic groupings given in section 4.3: migration, communication, language support, constrained execution and security. Table 4.1 provides a comparison of the characteristics and individual features of each of the mobile agent systems described.
Although the first architecture, Telescript, seems to be the most complete mobile agent system in terms of the characteristics identified, the main problem is that it only supports the Telescript language and is not open to researchers to develop. On the other hand, the language was designed for writing mobile agents so migration is well supported and security provision is comprehensive. Yet in spite of this, General Magic has dropped its proprietary implementation of the Telescript language in favour of an implementation in Java. This is freely available and is called Odyssey, but it is not clear if the Telescript architecture will remain to enforce security and provide agent communication or if a more lenient WWW-based approach will be adopted.
TACOMA's data storage model is very flexible and the actual way in which information is transferred between agents and servers (the
operation) is beautifully simple. However, although synchronous communication may be possible through the language primitives, it is not explicitly provided for by the mobile agent system. Additionally, future development of TACOMA will have to address the issue of security (since it is virtually nonexistent), but the provision of electronic cash is a step in the right direction when trying to control mobile agents.
The ARA mobile agent system takes an interesting approach to language support, since C is made migratable through compilation to Mobile Agent Code Environment (MACE) (Stolpmann, 1995) byte-code. Yet, it needs to develop more security mechanisms, such as authentication and access permissions to enable allowances to be effective.
ffMAIN contains some potential problems due to the simplistic nature of the system. For example, by using HTTP as a communication model through a shared information space, it is not clear how mobile agents can pass real-time data between themselves in a consistent manner. The concept of a shared information space where agents can advertise their services is a useful metaphor, and is not too dissimilar to blackboards in allowing agents to meet and exchange data, but should not be the only supported mechanism of communication. Also, although it supports multiple languages, it is not clear how code verification is performed on other migratable languages (other than TCL), nor how constrained execution is to be achieved.
Both Project Mole and APRIL are interesting in that they support a programming language (rather than some of the previous mobile agent systems which support scripting languages), but the Mole system can be criticised for a lack of synchronous communication and execution constraints. APRIL has excellent communication and constrained execution support, but is limited to a non-orthogonal migration mechanism and also needs to develop more infrastructure support in the form of authentication and authorisation mechanisms.
One issue that all of these mobile agent architectures have so far failed to address is the definition of a domain of applicability; they all concentrate on the mobility of agents rather than the integration of agents with information resources. General Magic, for example, has been recently criticised by the mobile agent community for concentrating the Telescript language too closely on the migration aspect and not considering integration with the desktop and third-party applications. This puts current mobile agent systems at a lower level of applicability than distributed information management tools require; a layer needs to be developed to bridge this gap.
This chapter has studied the concept of a mobile agent and has shown that the movement of processes between nodes was originally a mechanism employed in distributed operating systems to enhance certain aspects of execution. However, the mobile agent paradigm uses mobility to disengage agents from the restrictions traditionally imposed on such systems and by existing communication models, such as client-server and subprogramming, for programming in a large, loosely coupled distributed system.
Mobile agents can migrate freely between sites on a network according to their requirements; they can move to the site where a resource resides to access it locally and to help reduce the network transfer of intermediate data. Thus, a mobile agent can access a resource efficiently, even if the network conditions are poor or the resource only supports a low-level interface. It is the combination of their sense of interoperability from general agency and mobility from mobile agency which makes mobile agents an attractive solution for mobile computers, electronic commerce and distributed information management in networked environments.
In another sense, mobile agents can be said to fulfil the criteria of weak agency identified in chapter 3. They need to be autonomous and possess a sense of longevity to be able to operate and survive independently from their user; this notion of autonomy is a combination of the interpretation of the agent's goals and the ability of the agent to move between sites. They need to have a strong social ability to be able to converse with service providers in different manners and using different communication protocols, and they also need to be aware of their environment and changes to it to be able to effect their goals, such as recognising services offered by other agents and the differences between them. Finally, they need to be pro-active and to exploit their sense of mobility to ensure that their goals are achieved within user-defined parameters. There are a number of factors that can affect the potential mobility of an agent, such as security, resource access constraints (migration can be considered as a resource!), language support, resource availability and so on; an agent may need to be able to develop counter-goals when it is prevented from achieving a particular objective.
Mobile agent systems are the environments which support agents, both in a stationary and a mobile capacity. It can be seen from the survey that the thrust of current mobile agent system research is not at the application level of mobile agents, but at the development and provision of mobility and communication within a secure and heterogeneous environment. Furthermore, although current mobile agent systems do not provide the solution to distributed information management inherently in their designs, they can provide a framework upon which such a solution could be developed.
The next chapter describes the requirements for a mobile agent architecture that can support distributed resource management at a higher level of agent service provision than afforded by current mobile agent systems. It is hoped that the future development of this architecture will provide the integration of distributed information resources and third-party applications through the use of an extensible and flexible mobile agent subsystem.
The first step is to decide what Internet services users need to access and limit their access to those services
-- Joseph Jesson
A key element in a DIM environment is the concept of an information resource, whether that be a filing system, a database, a user's email account or whatever. Agents can offer a level of abstraction on new or existing resources through rewriting, transduction or wrapping, as identified by Genesereth and Ketchpel (Genesereth and Ketchpel, 1994) and as discussed in chapter 3. Mobile agents can travel between sites taking advantage of the services that they encounter without being explicitly aware or concerned about the underlying protocols or access mechanisms of an individual resource. In effect, they can offer a level of abstraction for the user from the details of their distributed information resources. White (White, 1994b) describes mobile agents as an ideal technology for making networks open platforms for third-party developers since they have to contend with many heterogeneous systems.
The last chapter has demonstrated that current mobile agent systems do not provide a direct solution for addressing DIM since by their very nature they are application-independent; they provide general mechanisms for communication, migration and security without imposing a specific policy or direction on their use. This chapter proposes that direct support for DIM should be built and tailored on top of a mobile agent system and describes the development and implementation of such an architecture.
The chapter begins with a description of the requirements, functionality and interactions of each agent within the architecture, independent of any specific mobile agent system. It then details the specific services that have been used to implement these architectural agents, which have been built on top of a mobile agent system chosen from those reviewed in chapter 4.
Although implying emphasis through the use of the term `mobile', the architecture (Dale and DeRoure, 1997) contains both mobile and stationary agents, each according to their function and task. Stationary agents are anchored within a networked environment to provide support and services, but mobile agents can move between networks to take advantage of those services and possibly offer services of their own. The design of the architecture was driven primarily by the following considerations:
Distribution. The architecture should permit resources to reside over geographically wide areas in which stationary agents provide access to resources and mobile agents travel between machines to help afford resource management and location transparency.
Separation. The functionality of the architecture should be distinct from the applications sitting above it. As mentioned in chapter 2, this involves separating the hypermedia functionality from the communication infrastructure.
To help achieve these considerations, all functional components within the architecture are embodied and abstracted through agents. By breaking down the functionality of the architecture into clearly defined roles which are embodied in the tasks that agents can perform, the management of agents and resources can be expressed in terms of agent interaction. Moreover, it allows the architecture to be modular and extensible in its design and also facilitates the development of common communication mechanisms and data formats to help deliver interoperability.
Agents, resources, machines and users can be grouped into logical collectives that provide standard environments called domains (figure 5.1). A domain makes and enforces decisions on policy to which each agent must adhere, such as security, migration, communication and constrained execution. Policies may change across domains and it is the duty of a migrating agent to determine if the policies are compatible with its objectives. A gateway domain is a logical grouping of domains and other gateway domains into a hierarchical structure and name space. Gateway domains allow further policies to be imposed upon migrating agents, which may be more lenient or more strict than those determined by the constituent domains.
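The hierarchical grouping of domains and the checking of policies by a migrating agent can be sketched in Python as follows. The sketch assumes that a policy can be reduced to a set of permitted actions and that an agent must satisfy the policy of every enclosing level; neither assumption is prescribed by the architecture, and the class, method and action names are hypothetical.

```python
class Domain:
    """A domain, or a gateway domain when other domains name it as parent."""

    def __init__(self, name, permitted, parent=None):
        self.name = name
        self.permitted = set(permitted)  # actions the policy allows
        self.parent = parent             # enclosing gateway domain, if any

    def path(self):
        # Position within the hierarchical name space.
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"

    def admits(self, required):
        # Walk up through the gateway domains; every level must agree.
        level = self
        while level is not None:
            if not set(required) <= level.permitted:
                return False
            level = level.parent
        return True


campus = Domain("campus", {"migrate", "communicate"})
lab = Domain("lab", {"migrate", "communicate", "write_files"}, parent=campus)
```

Here the `lab` domain alone would permit file writing, but the stricter policy of the enclosing `campus` gateway domain prevents an agent that requires it from being admitted.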
The next subsections describe the design of the architecture in terms of the roles of each architectural agent, their place within the architecture, the functionality that they should possess and the relationships that they can have with other agents.
A domain agent (figure 5.2) is a stationary agent that supervises certain activities which can occur within a domain. As such, it has a number of duties and responsibilities to its resources, users and agents:
The domain agent is the key force for control and management within the domain and provides a number of core functions associated with a mobile agent system, such as security, migration and access permission allocation. It also provides an environment where agents can meet and communicate within the bounds and limitations imposed by the domain.
Resource agents (figure 5.3) are stationary agents that exist within a domain to provide a level of abstraction between a resource and mobile agents. The purpose of a resource agent is to mediate access to a particular resource for a mobile agent; the resource agent understands how to access the resource and also understands the permission structures associated with the resource. As such, a resource agent has a number of functions:
The question of system integration highlighted in chapter 2 can begin to be addressed through resource agents. The integration policy of rewriting, transduction or wrapping is reliant upon the functionality and extensibility of the legacy system in question. Yet, depending upon how a resource is integrated, its actual functionality can reside in a number of different places (figure 5.3):
(i) Within the legacy system where the resource agent provides a conversion service between the offered functionality of the resource and its data format. All interaction with the resource must occur through the resource process which manages the resource. It is difficult for the resource agent to augment the functionality of the legacy system, unless it can integrate directly as a new resource process.
(ii) Within the resource agent where the resource has been specifically written (or rewritten) to integrate with the communication protocols and data formats of the architecture. The resource agent has total control over access to and the functionality of the data within the resource.
(iii) Within the legacy system and the resource agent where they both share access to the data of the resource. The resource agent has little control over the functionality of the resource since it is limited by the need to be compliant with a specific data format and the existing functionality of the resource processes also managing the resource. However, this form of integration can be useful where the user wishes to maintain manual management of their information, but still requires that it be available to their mobile agents. An example is electronic mail where the user manages the filtering of mail through their favourite email application, but makes the data of their email available through a resource agent.
The interaction between resource agents and mobile agents forms the crux of the DIM aspect of the architecture. The flexibility that mobile agents are afforded in accessing these resources and the manner in which they interpret the results will determine their usefulness to the user. However, it is not envisaged that resources are just distributed information systems; they can be any system which presents an external interface through which they communicate or can be accessed. This form of legacy system integration allows resource agents to be developed for any type of resource: electronic mail, USENET news, databases, the WWW and so on.
Mobile agents (figure 5.4), as their name suggests, are the components within the architecture which can migrate between domains. They are the mechanism by which the user exercises control over their own distributed information resources and gains access to other shared resources, through the relevant resource agents.
Initially, a user (or user interface agent; see the next subsection) launches a mobile agent within a given domain, called the host domain. Mobile agents are equipped with a set of objectives specific to the user's task that describe the nature and limits of their functionality; for example, a resource discovery agent is a mobile agent with different objectives to a navigation assistant agent. In addition to the limits on functionality that users place on their mobile agents, it is probable that the mobile agents themselves will encounter other restrictions that exist within domains, such as authentication checks, access permissions and execution constraints. In some cases, these limits will compromise the objectives that have been given to the mobile agent, for example when it lacks the funds to pay for a resource.
The essential functions of mobile agents will be defined by the specific DIM tasks that they are allocated (discussed in chapter 6). However, a mobile agent can have the following interactions with the architecture:
Mobile agents are the ultimate effector of change within a DIM environment and as such need to be treated with caution. The possibility for them to make changes or gain access to private information must be closely monitored and approved by the domain agent and the individual resource agents. Comprehensive authentication, access permission and constrained execution mechanisms may be necessary to ensure that faulty or malicious agents can do no harm to a system and that legitimate mobile agents can fulfil their objectives.
The user interface agent (figure 5.5) is an agent that resides within a domain and provides a level of abstraction for the user away from the details of the mobile agent architecture. It is essentially an interface agent (see chapter 3) that is capable of the following tasks:
User interface agents provide the user with a window onto their agents, their status, their results and the mobile agent architecture. The creation of new DIM agents could either be the task of the user to write (or reuse from existing templates with different parameters), or the function of the user interface agent to generate automatically, based upon the interpreted requirements of the user. The latter approach is most desirable since the user interface agent can then take on more of the user's burden by automating certain tasks given by the user and by anticipating other tasks in advance.
A gateway agent (figure 5.6) is a stationary agent that provides entry to and exit from domains. It allows a number of domains which are perhaps related by geographic, commercial or other commonalities to be treated as a single domain, called a gateway domain. A gateway agent offers the following services to the domains within its locale:
Gateway agents are the ideal place at which corporations, authorities and countries can ensure that local ordinances are observed within their domains. Some organisations might need special security precautions to protect confidential information, such as patient records. They also provide a mechanism for ensuring that undesirable material does not enter the gateway domain and that restricted information does not leave the gateway domain.
Through the developed hierarchy of gateway domains, gateway agents add scalability to the system since messages that are for agents within a given domain only travel within that domain. The burden of dealing with messages to destinations unknown can be spread throughout the hierarchy, thus alleviating the pressure on individual gateway and domain agents.
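The scalability argument can be made concrete with a small Python sketch. The thesis does not prescribe a routing algorithm, so the sketch assumes that a message climbs only as far as the lowest gateway domain enclosing both sender and receiver. The `route` function and the tuple-based domain names are hypothetical.

```python
def route(src: tuple, dst: tuple) -> list:
    """Return the domains a message crosses on its way from `src` to `dst`.

    Each domain is named by its path from the root gateway domain,
    for example ("uk", "soton", "ecs"). The empty tuple stands for the root.
    """
    # Length of the shared prefix: the deepest gateway enclosing both ends.
    common = 0
    while common < min(len(src), len(dst)) and src[common] == dst[common]:
        common += 1

    # Climb out of the source domain, stopping below the shared gateway.
    up = [src[:i] for i in range(len(src), common, -1)]
    # The shared gateway (or the domain itself when src == dst).
    meet = [src[:common]]
    # Descend into the destination domain.
    down = [dst[:i] for i in range(common + 1, len(dst) + 1)]
    return up + meet + down
```

A message between two agents in the same domain never leaves it, and a message between sibling domains involves only their common gateway; gateways higher in the hierarchy are not burdened.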
The functional description of the architectural agents given previously has shown that comprehensive support is required to allow them to offer an environment in which agents for DIM activities can operate. For this reason, development of the mobile agent architecture has been split into a number of distinct layers (see figure 5.7) which provide differing aspects of functionality (Dale and DeRoure, 1997). The advantages of such an approach are motivated from two different perspectives:
For the architecture designer a layered model facilitates the development of each layer in parallel by clearly identifying the interface between adjacent layers. The code of each layer relies upon the services provided by the layers below but is independent of its actual implementation.
For the agent application developer a layered model offers abstraction and a wide set of primitives within which a varied range of activities can be expressed. Additionally, although not directly tackling the question of interoperability between agent systems, it is a starting point from which agents can communicate and share information with each other.
Extensible language support. Although the APRIL language is the only supported language, it offers all of the features found in a modern programming language and is extensible through its macro-processing facilities. Like Java, APRIL is based around a virtual machine which allows the details of the underlying operating system to be abstracted from the executing agents and also facilitates portability between different systems and platforms.
Agent naming and communication. Agents are assumed to execute within an Internet environment and, as such, are given unique names (called handles) by the APRIL subsystem. These handles allow all message deliveries and network communication to be handled independently and transparently from the agent by the APRIL communication servers. Agents may also adopt specific names that they can publish across the Internet (in a similar fashion to URLs) which are made unique by the location of their communication server. Communication between agents is achieved by sending a message asynchronously to the handle of another agent in a manner analogous to email.
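The email-like behaviour of handle-based communication can be sketched in Python as follows. This is an illustration only and does not reproduce the APRIL communication server or its handle syntax: the `CommServer` class and the `name@location` form of a handle are assumptions.

```python
import queue


class CommServer:
    """Toy communication server: one mailbox per registered handle."""

    def __init__(self, location: str):
        self.location = location
        self._mailboxes = {}

    def register(self, name: str) -> str:
        # A name is made unique by the location of its communication server.
        handle = f"{name}@{self.location}"
        if handle in self._mailboxes:
            raise ValueError(f"handle already in use: {handle}")
        self._mailboxes[handle] = queue.Queue()
        return handle

    def send(self, handle: str, message) -> None:
        # Returns immediately; the receiver reads the message when it is ready.
        self._mailboxes[handle].put(message)

    def receive(self, handle: str):
        # Returns the oldest pending message, or None if the mailbox is empty.
        try:
            return self._mailboxes[handle].get_nowait()
        except queue.Empty:
            return None
```

The sender never blocks on the receiver, and messages to one handle are read in the order in which they were posted.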
Provision for verification and constrained execution. APRIL allows byte-code to be transferred between agents, but before any byte-code is executed it is verified by the APRIL subsystem to ensure that it does not try to corrupt the runtime environment. Agents can also be constrained in the amount of system resources (memory and processor time) that they can use, the sections of a filing system they can access and the messages that they can send and receive.
Flexible code migration. Although APRIL only supports non-orthogonal migration, this results in great flexibility when allowing an agent or portions of an agent to migrate to remote destinations. Additionally, since APRIL strips all function and procedure names upon compilation, code can be included into an agent while it is executing. This feature allows agents to be programmed (and reprogrammed) dynamically across the Internet.
The use of APRIL as the base layer of the architecture provides the layers above with important and critical features that can be passed through to higher-level agents. Specifically, these include network communication abstraction, portability between platforms and an agent programming language. These advantages and those identified earlier form the basic set of services upon which the architectural agents have been built. These services are described in the next subsections.
To help maintain the abstraction afforded by agents, communication between agents is achieved in the general case through asynchronous message passing (in the specific case, agents can communicate through synchronous data exchange). Agents become aware of the presence of other agents either by explicitly being told about them (by other agents or the user) or by interrogating the registration database of a domain agent. All agents are addressed using APRIL handles and when an agent wishes to communicate it issues a send statement of the form:
defines the context of the message (see subsection 5.3.2),
contains the data of the message and
is the handle of the destination agent. The send primitive
causes the message to be handled by the APRIL communication subsystem in an asynchronous fashion. Yet, even though APRIL allows the body of a message to be any data type (even functions and procedures), we introduce a new abstract data type for a number of reasons:
The Microcosm open hypermedia system (reviewed in chapter 2) uses a flexible message format (Heath, 1992) for expressing the content of messages which allows free-form data to be encoded in a text string as a series of name/value tuples. The major advantage here is that values (called tag bodies) are indexed by their associated names (called tags) and can be retrieved individually from within a message. This means that a process manipulates only the parts of a message in which it is interested, instead of having to process the entire contents. Additionally, although there is a core set of defined tags, new tags can be created dynamically and incorporated into a message to allow communication between processes to be extensible. Microcosm messages are similar in nature to TACOMA briefcases (Johansen et al., 1995b).
However, there are a number of restrictions and problems associated with standard Microcosm messages when considering truly flexible message formats (highlighted by Beitner (1995)):
Support only for `flat' messages. All tags (and their associated tag bodies) exist at the same level within a message which means that a message cannot possess an internal structure and cannot contain duplicate tag names. To help alleviate this somewhat, messages are allowed to be incorporated inside other messages as a special case, but this then means that hierarchies of messages cannot be handled reflexively and that the mechanism of their storage and retrieval must be handled by the programmer.
No type encoding of values. Since the entire contents of a message is a string, the tag bodies which represent the data have to be converted into a string format and any associated typing information is lost. For scalar types where the type of a value is easily inferred from the format of the string this is not much of a problem. However, when considering compound or user-defined data types the question of reconstructing a data value from its string format can become difficult.
To help address these problems, but to retain the flexibility of the original format, hierarchical messages have been developed as part of this thesis. A hierarchical message is essentially a tree abstract data type where each node in the structure is itself a hierarchical message comprised of tag and tag body tuples. A sample hierarchical message structure is shown in figure 5.8, and the Extended Backus-Naur Form (EBNF) description of a hierarchical message can be defined as:
A set of library functions, called the hierarchical messages library (HMS), allows hierarchical messages to be created, manipulated, accessed and destroyed. Paths provide access to specific tag and tag body tuples within a message and are constructed from a series of tag names originating from the root of the hierarchical message. Since the path to a given tuple may not be known in advance, a searching mechanism has been developed that can search for either a specific tag name, a tag body or both a tag name and a tag body. The result will yield a set of paths (which are relative to the initial search path) leading to the matched tuples. These paths can then be used to retrieve individual tag or tag body values.
Since the context of a tuple can be inferred from its path, tags of the same name can be used throughout a hierarchical message. This means that an agent can interrogate the contents of a message by query; it can search for a given tag from anywhere within a hierarchical message and subsequently inspect the resultant paths to determine which are useful. It also affords an agent some abstraction from the actual structure of a given hierarchical message.
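Path-based access and searching can be sketched in Python as follows. This is not the hierarchical messages library itself: the `HMessage` class and its `add`, `get` and `search` methods are hypothetical, and paths are modelled simply as tuples of tag names.

```python
class HMessage:
    """A node holding an ordered list of (tag, body) tuples.

    A body is either a plain value (a leaf) or another HMessage (a branch).
    """

    def __init__(self):
        self.tuples = []

    def add(self, tag, body):
        self.tuples.append((tag, body))
        return self  # allow chained construction

    def get(self, path):
        """Follow `path` (tag names from this node) and return the body found.

        Where a tag is duplicated at one level, the first match is taken.
        """
        node = self
        for tag in path:
            if not isinstance(node, HMessage):
                raise KeyError(path)
            for t, body in node.tuples:
                if t == tag:
                    node = body
                    break
            else:
                raise KeyError(path)
        return node

    def search(self, tag=None, body=None, _prefix=()):
        """Return the paths, relative to this node, of every tuple matching
        the tag name, the (leaf) tag body, or both."""
        found = []
        for t, b in self.tuples:
            path = _prefix + (t,)
            leaf = not isinstance(b, HMessage)
            if (tag is None or t == tag) and (body is None or (leaf and b == body)):
                found.append(path)
            if not leaf:
                found.extend(b.search(tag, body, path))
        return found
```

For instance, a message with a `Resource` branch and a `Service` branch that each contain a `Name` tag yields two paths when searched for `Name`; the agent distinguishes them by position without knowing the layout in advance.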
The type representation of a hierarchical message and the data that it contains is handled and preserved by the APRIL subsystem. When a hierarchical message is sent to another agent, its structure is serialised and converted into an independent data representation format (detailed in (McCabe and Clark, 1996)) that is sent across the network. The APRIL subsystem of the receiving agent is responsible for reconstructing this format into the corresponding hierarchical message.
Hierarchical messages are more free-form and flexible than traditional Microcosm messages and TACOMA briefcases since they allow an agent to express information in arbitrarily organised structures. Also, the type representation of each data value is preserved across agents and hierarchical messages can be queried to retrieve information that is of a potentially similar nature. The application of hierarchical messages as a communication mechanism between agents and the internal representation of an agent will be described in the next two subsections.
We introduce the term typed message to indicate a form of communication message that can be exchanged between agents where the type of the message implies a specific context to the contents of a message and also carries an associated processing connotation. Each typed message belongs to a container class and table 5.1 represents the major classes and component typed messages that have been implemented in the mobile agent architecture. Typed messages are sent between agents using APRIL handles and the asynchronous communication mechanism (
), and comprise a message type and message body tuple. A message type is a keyword such as those detailed in table 5.1 and the message body is a hierarchical message (see the previous subsection). Each typed message also has a specified core input and output interface represented by tag and tag body tuples within the hierarchical message. For example, a valid
message which represents a simple WWW
operation would be composed of the following fields:
A diagrammatic version of this message is given in figure 5.9. The
branch contains the name of the resource under which the same agent had previously registered the WWW resource (by sending an
message to the domain agent). The two
fields are differentiated by their position in the hierarchy; if a search were made from the root node of the message for all
tags then two paths would be returned,
representing the resource name and the service name respectively. An agent processing a typed message can retrieve individual fields directly through a set of macro definitions which help to offer the agent a level of indirection should the structure of a particular message change. For example, a
is defined to be the path
and thus an
message is only interested in the fields
and so on. Extra fields can be added into a message to augment the functionality of a typed message without interfering with its operation, providing that they do not conflict with the existing interface.
Each typed message also has two associated messages to indicate whether processing of the message succeeded or failed. To enable agents to keep track of their asynchronous messages, each adds a timestamp to an outgoing message and when a receiving agent has processed the message, it returns an ok or failed message type. For example,
respectively can be returned when adding a service registration; the body of this message contains the timestamp of the original message and a
branch if it was processed successfully or a
branch if it failed. The
branch contains the specified output interface (it is empty if there are no return values) or the
branch contains a failure code and textual description. In either case, the timestamp of the original message can be used by the originating agent to update the status of its messages.
Typed messages are a flexible way of passing structured data between agents and their extensibility and dynamism are directly inherited from hierarchical messages. The set of typed messages that an agent supports will form the functionality that it can express within the mobile agent architecture and subsection 5.3.4 goes on to describe how the processing of messages can also be handled in an extensible, flexible and dynamic way.
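The pairing of a typed message with its ok or failed reply can be sketched in Python as follows. The message type names, the `_ok` and `_failed` suffixes, the field names and the counter used as a timestamp source are all assumptions for illustration, and nested dictionaries stand in for hierarchical messages.

```python
import itertools

_clock = itertools.count(1)  # stand-in for a real timestamp source


def make_message(msg_type, body):
    """Build a typed message: a type keyword, a timestamp and a body."""
    return {"type": msg_type, "timestamp": next(_clock), "body": body}


def make_reply(request, outputs=None, error=None):
    """Build the ok or failed reply for `request`, echoing its timestamp."""
    if error is None:
        return {"type": request["type"] + "_ok",
                "body": {"timestamp": request["timestamp"],
                         "ok": outputs or {}}}  # empty if no return values
    code, text = error
    return {"type": request["type"] + "_failed",
            "body": {"timestamp": request["timestamp"],
                     "failed": {"code": code, "description": text}}}


class Sender:
    """Tracks the status of asynchronous messages by timestamp."""

    def __init__(self):
        self.outstanding = {}  # timestamp -> "pending" | "ok" | "failed"

    def sent(self, message):
        self.outstanding[message["timestamp"]] = "pending"

    def on_reply(self, reply):
        stamp = reply["body"]["timestamp"]
        self.outstanding[stamp] = "ok" if "ok" in reply["body"] else "failed"
```

Because replies are matched on the timestamp alone, they may arrive in any order and the sender can still update the correct entry.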
As already mentioned, a set of macros allows agents to indirect the location of a field from its path within a message but there is also a mechanism that allows agents to abstract the required fields of a particular message type. A standard nomenclature has been used in defining the core message types and their contents across agents, specified in EBNF. An example definition of the
typed message is:
TypedMessage ::= <AddServiceMessage>
AddServiceInputs ::= <Resource> <Service>
AddServiceOutputs ::= <AddServiceOk> | <AddServiceFail>
<Resource> ::= <Name> <Aliases> [<AgentDetail> ...]
<Service> ::= <Name> <Aliases> <InputInterface>
In this way, when an agent needs to build a message of a particular type it consults the database of production rules (held locally on each machine within a domain) and follows the chain of terms, building a hierarchical message as each production rule is expanded. Base level non-terminals such as
comprise the terminal symbols of the message, that is, the tag body values, repetition (
) indicates multiple branches or leaves at the same level within a message, and choice and optionality (
) signify that whole branches can be omitted. Again, this permits an agent to be aware of what a message should contain and how it can be built without having to understand its exact structure.
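The expansion of production rules into a message can be sketched in Python as follows. The rule table is a simplified, hypothetical rendering of the example above that leaves out repetition and the agent detail term, and nested dictionaries stand in for hierarchical messages. The `build` function is an assumption, not the mechanism used by the architecture.

```python
# Hypothetical, simplified rule database: non-terminal -> ordered child symbols.
RULES = {
    "AddServiceInputs": ["Resource", "Service"],
    "Resource": ["Name", "Aliases"],
    "Service": ["Name", "Aliases", "InputInterface"],
}


def build(symbol, values, path=()):
    """Expand `symbol` into a nested dict by following the production rules.

    `values` maps full paths (tuples of symbol names) to tag bodies.
    Returns None when nothing was supplied beneath `symbol`, so that
    optional branches are omitted from the finished message.
    """
    path = path + (symbol,)
    if symbol not in RULES:
        # Base-level non-terminal: its value is a tag body.
        return values.get(path)
    branch = {}
    for child in RULES[symbol]:
        built = build(child, values, path)
        if built is not None:
            branch[child] = built
    return branch or None
```

The caller supplies only the values it knows; the rule database determines where each one belongs in the message.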
This EBNF database not only shows agents how to construct communication messages, but also helps in determining how an agent can structure its internal data, called its knowledge base. Akin to a typed message, the knowledge base of an agent is formed from a hierarchical message of tag and tag body tuples. The database starts from a root specification of
and describes a core set of branches organised by category (figure 5.10):
Information. Holds all of the identifying information about the agent (such as its name, owner and host domain), a list of message types to pre-load on migration (see the next subsection), a list of domains to visit, monetary details (such as its balance) and information pertinent to the current domain.
Contains information about any activities in which the agent is involved that are ongoing, such as registrations made with the local domain agent, messages that have been sent to other agents which are awaiting a reply, any clones that the agent has launched and any local tasks that the agent has invoked. Each message within the
branch contains a timestamp which indicates when the message was sent and a status field which indicates the current state of the message. For example, a message in the
branch can either be marked as
. The first two states indicate that the registration or unregistration message is in transit, the second two states confirm that the message was successfully registered and unregistered, and the final two states show that an action failed and that the message can be archived to the
branch respectively. By using the timestamps associated with messages, agents can match incoming messages with previously sent messages and alter their status field accordingly. This allows agents to handle asynchronous messages flexibly, since they can process the entire
branch to determine the state of all of their communications; overdue or failed messages can be resent and successfully completed messages can be moved to the
Registrations. Shows all of the registrations that an agent has made with other agents. At the moment only request and service registrations for resources are supported, but there is no reason why other types could not be incorporated, for example a notification registration where an agent registers to be alerted when a change occurs to a particular object. However, resource registrations are not allowed to be made or kept across domains since it is the purpose of the architecture to help reduce intermediate network traffic by moving mobile agents to the domain of individual resources.
History. This is an archive area and can be used to hold information relating to the past actions or experiences of an agent, such as visited domains, old outstanding messages (which are outstanding no more!), logging information and any completed transactions. The latter information can be useful in building an `experience' pool where all agents can submit the results of their transactions with other agents. Such a mechanism can be considered necessary to support a society of socially-aware agents as described by Rasmusson et al. (Rasmusson et al., 1997), where agents acquire a reputation depending upon their past transactions with other agents. In this way, reliance on the goodwill of the sender is minimised but security control is reactive; an agent on its first transaction will have a neutral social status which may enable it to deceive other agents and fraudulently obtain services.
Data. This is currently unspecified and left to the discretion of the agent. Principally, this branch should store gathered and processed information that can be returned as soon as possible to the user interface agent within the host domain. It may also store more long-term information that the agent may collect as it moves between domains, such as working data.
Objectives. This is currently unspecified and left to the discretion of the agent. It is used to store necessary information that the agent may need in determining what tasks it has to undertake and how to achieve them. The objectives of an agent and the method of specifying them will be detailed in the next chapter.
As with typed messages, the paths to specific values within an agent's knowledge base are indirected through a set of macros; these are built directly from the EBNF database. For instance, the identifying details of an agent, such as its name, host domain and owner information can be extracted from the location pointed to by the macro
Adopting a standard structure for an agent's knowledge base facilitates easier exchange of general information. For example, agents can trade identification information by transferring the
branch of their knowledge base rather than having to process this into an independent transfer format; the receiving agent knows how to extract information from this since it uses the same structuring conventions itself. The extensible nature of the EBNF database allows agent programmers to extend this standard structure and even to change it (within certain limitations). Additions to the structure only require that the new database is made available to all agents, but modifications to the existing structure require that each agent make a manual alteration to its knowledge base.
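The indirection that macros provide over a knowledge base can be sketched in Python as follows. The macro names, the field names beneath the `Information` branch and the `lookup` and `relocate` helpers are hypothetical, and a nested dictionary stands in for the hierarchical message that holds the knowledge base.

```python
# Hypothetical macro table: symbolic name -> path into the knowledge base.
KNOWLEDGE_BASE_PATHS = {
    "AGENT_NAME": ("Information", "Name"),
    "AGENT_OWNER": ("Information", "Owner"),
    "HOST_DOMAIN": ("Information", "HostDomain"),
}


def lookup(kb, macro):
    """Resolve `macro` to its path and return the value stored there."""
    node = kb
    for tag in KNOWLEDGE_BASE_PATHS[macro]:
        node = node[tag]
    return node


def relocate(macro, new_path):
    """Point `macro` at a new path, e.g. after the rule database changes."""
    KNOWLEDGE_BASE_PATHS[macro] = tuple(new_path)
```

Code that calls `lookup(kb, "AGENT_NAME")` is unaffected when the field moves; only the macro table is regenerated.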
The type of a message not only adds context to its message body, but also determines how the message will be handled by the receiving agent. Normally, when a message of a known type is received an internal function is called to process it and return the results (if any) to an appropriate agent, which may or may not be the original sending agent. However, if the message is of a type that is not recognised then it can be resolved either by:
The first solution is not acceptable if agents are to be able to cope with new message types and potentially new situations. The second approach can be useful if the arbitrator possesses some qualities which make it ideally suited to processing the message, such as speed, resource access and so on. However, if this is not the case then it can incur network latency, especially if the agent is required to ask the arbitrator to process many messages of the same type. The third solution allows the agent to handle the message itself dynamically if the architecture can provide for it; if not, then it may need to pass it on to an arbitrator or reject the message.
To allow agents to handle new message types in a flexible way, the architecture supports dynamically programmable agents (figure 5.11). Initially, an agent can only process a core set of message types and when it encounters a message type that it cannot deal with internally, it requests a handler function from the domain agent (i). To minimise the complexity of this operation to the agent, the original message data of the unknown message type is also sent to the domain agent. If other messages of the same unknown type arrive at the agent while the relevant handler function is being located, then these are also forwarded to the domain agent. These messages are queued to preserve their arrival order (ii).
The domain agent checks the received unknown message type against a database of all dynamic message types of which it is aware. If no match is found, then the architecture cannot help in dealing with that particular message type and a
message is returned to the originating agent. Conversely, if a match is found then the domain agent asks the
to invoke the appropriate module handler (iii), (iv). A module handler is an executable process that contains a handler function which embodies the necessary code to deal with a particular message type. When the module handler has started, the domain agent sends it a
message (v) which causes the module handler to create a closure of its internal handler function (vi). This function closure is transferred as serialised byte-code to the domain agent (vii).
All of the queued messages for the message type relating to the delivered handler function are packaged into an
message with the appropriate handler function code (viii), which is sent back to the originating agent (ix). The originating agent integrates this code into its event-handling mechanism and also extracts the outstanding messages which are subsequently processed by the newly installed handler function (x).
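The exchange described in steps (i) to (x) can be sketched in Python as follows. The sketch is synchronous and single-process, whereas the architecture exchanges asynchronous messages and transfers serialised byte-code. The class and method names are hypothetical, and a Python closure stands in for the handler function closure.

```python
class DomainAgent:
    def __init__(self, module_handlers):
        # message type -> zero-argument factory returning a handler closure
        self.module_handlers = module_handlers
        self.queued = {}  # (agent name, type) -> messages in arrival order

    def forward(self, agent, message):
        """Steps (i)-(ii): accept an unknown message and queue it."""
        key = (agent.name, message["type"])
        self.queued.setdefault(key, []).append(message)

    def deliver(self, agent, msg_type):
        """Steps (iii)-(ix): obtain the handler and return it to the agent
        together with every message queued for that type."""
        pending = self.queued.pop((agent.name, msg_type), [])
        factory = self.module_handlers.get(msg_type)
        if factory is None:
            agent.rejected(pending)  # no such dynamic message type is known
        else:
            agent.install(msg_type, factory(), pending)


class Agent:
    def __init__(self, name, domain_agent):
        self.name, self.domain_agent = name, domain_agent
        self.handlers, self.results, self.failed = {}, [], []

    def receive(self, message):
        handler = self.handlers.get(message["type"])
        if handler is None:
            self.domain_agent.forward(self, message)
        else:
            self.results.append(handler(message["body"]))

    def install(self, msg_type, handler, pending):
        """Step (x): integrate the handler, then process outstanding messages."""
        self.handlers[msg_type] = handler
        for message in pending:
            self.receive(message)

    def rejected(self, pending):
        self.failed.extend(pending)
```

Messages that arrive while the handler is being located are held by the domain agent and are processed in their original order once the handler is installed; later messages of the same type are handled locally.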
The abstraction offered by handler functions is preserved by the fact that they take and give a hierarchical message as an input and output argument; the actual data is represented as tag and tag body tuples within the hierarchical messages. The use of hierarchical messages allows the interface of a handler function to be fully or partially specified and the inclusion or omission of fields from the input hierarchical message can alter the effect of the handler function. Moreover, with a variably defined interface the functionality of handler functions can be dynamically extended without affecting their existing use. Agents which are aware of the new functionality can use the extended set of fields and agents which are unaware continue to use the existing set.
One of the main advantages of handler functions is their dynamism; so long as the core input and output interface remains consistent they can be updated and debugged in-line. To help program and reprogram agents, module handlers can be invoked in one of two ways:
Manually. The module handler executes but waits for a
message before generating and returning a closure of the handler function to the sender. The domain agent invokes the module handler in manual mode when dealing with an unknown message type on behalf of an agent (as detailed previously).
Automatically. The module handler executes and immediately creates a closure of the handler function, which it always forwards to the local domain agent. This is how handler functions which have been updated or modified are distributed to agents that have previously requested and subsequently installed them. Once a handler function has been altered and recompiled, the programmer executes the module handler in automatic mode which forwards the byte-code of the new handler function to the domain agent. The domain agent then sends a message to each agent that has previously requested and installed that particular handler function telling them that it has changed. If they wish to update it, they request the new copy from the domain agent which will overwrite their existing version.
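The bookkeeping that automatic distribution requires of the domain agent can be sketched in Python as follows. The `HandlerRegistry` class, the integer version numbers and the list used to record notifications are assumptions for illustration; in the architecture, notifications are messages sent to the agents concerned.

```python
class HandlerRegistry:
    """Domain agent bookkeeping for dynamic handler functions."""

    def __init__(self):
        self.current = {}       # message type -> (version, handler)
        self.installed_by = {}  # message type -> names of agents holding it
        self.notices = []       # (agent, message type, version) notifications

    def publish(self, msg_type, handler):
        """Called when a module handler, run automatically, forwards new code."""
        version = self.current.get(msg_type, (0, None))[0] + 1
        self.current[msg_type] = (version, handler)
        # Tell every agent that previously installed this handler.
        for agent in sorted(self.installed_by.get(msg_type, ())):
            self.notices.append((agent, msg_type, version))
        return version

    def request(self, agent, msg_type):
        """An agent asks for the current handler; remember that it holds one."""
        self.installed_by.setdefault(msg_type, set()).add(agent)
        return self.current[msg_type]
```

An agent is notified rather than updated directly, so it remains free to decide whether to request the new version.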
Dynamic programming has an even greater advantage for mobile agents since handler functions can be both added and removed. One of the potential problems facing mobile agents is that an increase in functionality often results in an increase in overall size; if mobile agents are equipped with too much static functionality they may become too large to move between domains. By taking advantage of dynamic programming, when a mobile agent is ready to move to a new destination it can remove all of its unnecessary handler functions to ensure that it is as small as possible. Once executing within the remote domain, it can re-install them as and when they are needed. This allows libraries of handler functions to be developed and pre-distributed across domains, so that a mobile agent can remove them without fear of not being able to install them at the remote domain. In the very worst case, if the handler functions required by a newly migrated mobile agent are not present within the domain, then they can be retrieved from the mobile agent's previous domain.
Of course, there are situations where a mobile agent will not want to allow certain handler functions to be removed or overwritten. For example, an agent that supports dynamic programming
functions installed before they can add or remove other handler functions! In this way, the handler functions of an agent are split into those intrinsically compiled into the agent at development time (static) and those that can be added while the agent is executing (dynamic). Static handler functions cannot be removed from the agent or overwritten by dynamic handler functions which allows agent programmers to equip an agent with a core functionality that is available at all times.
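The split between static and dynamic handler functions can be sketched in Python as follows. The `HandlerTable` class, its method names and the `install_handler` message type used in the test are hypothetical; the sketch assumes that an attempt to overwrite or remove a static handler is treated as an error.

```python
class HandlerTable:
    """Handler functions of an agent, split into static and dynamic sets."""

    def __init__(self, static):
        self._static = dict(static)  # compiled in; always available
        self._dynamic = {}           # added while the agent executes

    def install(self, msg_type, handler):
        if msg_type in self._static:
            raise PermissionError(f"{msg_type!r} is a static handler")
        self._dynamic[msg_type] = handler  # overwrites an older dynamic copy

    def remove(self, msg_type):
        if msg_type in self._static:
            raise PermissionError(f"{msg_type!r} is a static handler")
        self._dynamic.pop(msg_type, None)

    def lookup(self, msg_type):
        return self._static.get(msg_type) or self._dynamic.get(msg_type)

    def strip_for_migration(self, keep=()):
        """Drop dynamic handlers before moving; return the types removed so
        that they can be requested again at the destination if needed."""
        removed = sorted(t for t in self._dynamic if t not in keep)
        for msg_type in removed:
            del self._dynamic[msg_type]
        return removed
```

After `strip_for_migration` the agent carries only its static core, which includes whatever it needs in order to request further handler functions on arrival.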
Agents become dynamically extensible. The functionality of an agent can be extended dynamically because the agent does not have to be stopped and recompiled in order for code to be added. Also, the exact functionality of an agent does not have to be determined at compile-time.
The size of an agent is minimised. Mobile agents can be kept as small as possible by removing all unnecessary handler functions when they migrate and removing unused ones periodically when they execute within a domain.
Standard libraries of functions can be developed. The core functions of an agent can be standardised and made available across domains. This means that a mobile agent does not have to worry about removing handler functions when it migrates.
The maintenance of functions is easier. Since handler functions are retrieved from the domain agent, they can be debugged and updated dynamically. Also, there is one set of code for all agents, which means that duplicate functionality does not have to be supported across multiple agents. The automatic distribution of handler functions by the domain agent allows agents to be automatically informed when new versions become available.
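The split between static and dynamic handler functions, and the removal of dynamic handlers before migration, can be illustrated with a minimal sketch. It is written in Python rather than APRIL, and the class `HandlerTable`, its method names and the message types used in the usage note are all hypothetical; it is not the architecture's actual implementation.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], object]


class HandlerTable:
    """Message-type to handler map with a static/dynamic split (illustrative only)."""

    def __init__(self, static: Dict[str, Handler]):
        # Static handlers are fixed when the agent is built.
        self._static = dict(static)
        # Dynamic handlers are installed while the agent is executing.
        self._dynamic: Dict[str, Handler] = {}

    def install(self, msg_type: str, handler: Handler) -> bool:
        # A dynamic handler may never overwrite a static one.
        if msg_type in self._static:
            return False
        self._dynamic[msg_type] = handler
        return True

    def remove(self, msg_type: str) -> bool:
        # Only dynamic handlers can be removed.
        return self._dynamic.pop(msg_type, None) is not None

    def strip_for_migration(self) -> List[str]:
        # Drop every dynamic handler and return their names, so that they
        # can be requested again once the agent has arrived.
        names = sorted(self._dynamic)
        self._dynamic.clear()
        return names

    def dispatch(self, msg_type: str, message: dict):
        handler = self._dynamic.get(msg_type) or self._static.get(msg_type)
        if handler is None:
            raise KeyError(f"no handler installed for {msg_type!r}")
        return handler(message)
```

In this sketch, an agent built with a static `"install"` handler refuses any attempt to install a dynamic handler under the same name, and `strip_for_migration()` leaves only the static core in place while reporting which handlers need to be re-installed at the destination.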
The need for the dynamic programmability of agents originally stemmed from the fact that each architectural agent contained the same sets of handler functions; if a handler function changed in one agent, then it had to be changed in all of the other agents as well! Moreover, as the functionality of an agent grew, so did its code size. Dynamic handler functions allowed the code of the architectural agents to be both managed more easily and kept smaller, since they are only installed as they are needed. However, an agent can specify a list of dynamic handler functions that should be loaded automatically when it executes, detailed in the
branch of its knowledge base.
When combined with typed messages, handler functions act in a similar way to active messages (von Eicken et al., 1992). The fundamental difference between the two is that active messages were designed to increase the efficiency of message handling on multi-processor architectures. The speed of active messages derives from the fact that the network is perceived as a pipeline in which messages cannot be buffered; buffer management introduces latency costs and so the sender must block until a handler can be invoked on a suitable processor node. The handlers for active messages are not inherently dynamic, as typed message handler functions are, since the operation of locating, retrieving and installing a new active message handler would introduce more latency. This thesis has taken the stance that the extensibility, dynamism and asynchronicity in handling messages that are afforded by typed message handler functions are more useful to mobile agents than the speed with which such handlers are executed.
Within the context of the mobile agent architecture, a resource is an information repository that is protected and abstracted through a resource agent. Typically, a resource agent will offer services which mobile agents use to manipulate the respective resource. By the same token, a mobile agent can be thought of as an agent which has a set of requests that can be fulfilled by compatible services. However, a resource agent is not limited to just offering services, nor is a mobile agent limited to just having requests fulfilled. Mobile agents themselves may also offer services to other agents, and resource agents may have requests that they need fulfilled.
The architecture provides support which allows agents to find services that match their requests and to find requests that match their services. This is achieved by allowing each agent to register its interests (that is, services or requests) with the domain agent to form a registration database. A number of strategies can then be developed for interrogating this database:
Active request matching, passive service matching. The request provider consults the domain agent to determine which services in the domain it can take advantage of; it initiates contact with the agents it considers suitable. This is analogous to a yellow-pages system where service providers (the advertisers) register their service with the domain agent and matching services attract enquiring requests (the interested public).
Active service matching, passive request matching. The service provider consults the domain agent to determine which requests in the domain it can fulfil; it initiates contact with matching agents. This is somewhat similar to a targeted mail-shot, where the service provider (the advertiser) tenders an offer to fulfil a request with parties (the interested public).
Active request and service matching. Both service and request providers constantly consult the domain agent to find their own service or request matches. This resembles a notice board, where service and request providers advertise and at the same time check other advertisements.
Brokering. A third party, such as the domain agent, called a broker (Genesereth and Ketchpel, 1994), attempts to match registered service providers with registered request providers and vice-versa. This is analogous to a recruitment agency where the request providers (the job-seekers) submit to the agency and the agency seeks to match them with the service providers (the employers).
All registrations are stored within the knowledge base of the domain agent and are organised firstly by resource type, then by individual agent and finally by service or request (see figure 5.12). Each resource registration contains the name of the resource, a list of aliases by which it is also known and a number of agent detail registrations. An agent detail registration represents the individual registrations of an agent for a particular resource and contains the name of the agent, its contact address and a set of service and request entries. Agents can register multiple services and requests which each consist of the name of the service or request, a list of aliases by which it is also known, the input and output fields and any associated cost or remuneration. Agents may also add extra fields within a service or request entry that contain more information which may be of use to enquiring agents.
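The three-level organisation described above (resource, then agent, then service or request) can be sketched as nested records. This is a minimal Python illustration only: the class and field names are assumptions chosen to mirror the prose, and they do not reproduce the hierarchical message layout that the domain agent actually holds in its knowledge base.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """One service or request registered by an agent."""
    kind: str                                           # "service" or "request"
    name: str
    aliases: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    cost: float = 0.0                                   # cost or remuneration
    extra: Dict[str, str] = field(default_factory=dict)  # optional extra fields


@dataclass
class AgentDetail:
    """All registrations of one agent for one resource."""
    name: str
    address: str
    entries: List[Entry] = field(default_factory=list)


@dataclass
class ResourceRegistration:
    name: str
    aliases: List[str] = field(default_factory=list)
    agents: Dict[str, AgentDetail] = field(default_factory=dict)


class RegistrationDatabase:
    """Organised by resource type, then by agent, then by service or request."""

    def __init__(self) -> None:
        self.resources: Dict[str, ResourceRegistration] = {}

    def register(self, resource: str, agent: str, address: str, entry: Entry) -> AgentDetail:
        res = self.resources.setdefault(resource, ResourceRegistration(resource))
        detail = res.agents.setdefault(agent, AgentDetail(agent, address))
        detail.entries.append(entry)
        return detail
```

Registering two entries for the same agent and resource, for example a `"retrieve"` service and an `"index"` request (both names hypothetical), yields a single agent detail registration holding both entries.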
The domain agent allows the registration database to be queried by other agents in a flexible manner through the
handler function--this process is called matchmaking (Decker et al., 1996). A query request is a message of the type
which contains search term tuples at the root level. For example, a query of the form:
Would return a list of all agents which had registered a
service for the WWW resource at a consideration of 100 or less. Queries can be made more specific by adding additional fields or more general by omitting certain fields. The following example returns a list of all agents that have registered services
requests for the FTP resource:
The power of the registration mechanism derives from the flexible way in which registrations can be extracted from the domain agent and matches subsequently pursued. This allows for a rich set of interactions between agents depending upon the interrogation mechanism adopted. Moreover, an agent may adopt an active or passive stance according to the nature of the agents around it. For example, high competition between service providers will force them to become active in seeking out other agents with matching requests. The reverse holds true, too, where an agent that has a monopoly on a required service can afford to be passive while ever demand holds.
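The flavour of such flexible interrogation can be conveyed by a small Python sketch in which a query is a set of field constraints and any omitted field matches everything. The flattened record layout, the field names and the use of a predicate for "100 or less" are assumptions made for illustration; they are not the query syntax accepted by the domain agent.

```python
from typing import Callable, Dict, List, Union

# A search term is either an exact value or a predicate over the stored value.
Term = Union[object, Callable[[object], bool]]


def matches(record: Dict[str, object], query: Dict[str, Term]) -> bool:
    """True if the record satisfies every term; omitted fields are unconstrained."""
    for key, wanted in query.items():
        if key not in record:
            return False
        value = record[key]
        if callable(wanted):
            if not wanted(value):
                return False
        elif value != wanted:
            return False
    return True


def matchmake(records: List[Dict[str, object]], query: Dict[str, Term]) -> List[str]:
    """Return the names of matching agents, in order and without duplicates."""
    agents: List[str] = []
    for record in records:
        if matches(record, query) and record["agent"] not in agents:
            agents.append(record["agent"])
    return agents


# Hypothetical registrations, flattened to one record per service or request.
REGISTRATIONS = [
    {"resource": "WWW", "agent": "a1", "kind": "service", "name": "retrieve", "cost": 80},
    {"resource": "WWW", "agent": "a2", "kind": "service", "name": "retrieve", "cost": 150},
    {"resource": "FTP", "agent": "a3", "kind": "request", "name": "mirror", "cost": 0},
    {"resource": "FTP", "agent": "a1", "kind": "service", "name": "list", "cost": 10},
]

cheap_www = matchmake(
    REGISTRATIONS,
    {"resource": "WWW", "kind": "service", "cost": lambda c: c <= 100},
)
all_ftp = matchmake(REGISTRATIONS, {"resource": "FTP"})
```

Adding a term narrows the result and dropping one widens it, which is the behaviour the text attributes to queries; here `cheap_www` holds only `a1`, while `all_ftp` holds `a3` and `a1`.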
The third application for resource advertising is in helping to attract mobile agents into the domain. This is achieved by allowing mobile agents to interrogate domains before they migrate, to help them find one that offers services and requests for resources that match their objectives. However, once a mobile agent has decided which domain to move to, it must migrate there before it can access any of the resources. In this way, agents are forced to move closer to the information to help reduce intermediate network communication.
The policy of migration within a domain is determined and enforced by the domain agent. As stated in section 5.2.2, it is the duty of the domain agent to allow mobile agents to migrate into and migrate out of the domain, where security restrictions permit. It is also the task of the domain agent to ensure that the migration of a mobile agent is completed successfully and to help it recover in the event of a failure.
field indicates the APRIL handle of the domain agent where the mobile agent wishes to migrate.
is the byte-code which represents a closure of the highest level function of the agent. In APRIL terms, this is a lambda function abstraction which serialises the function (and all of its sub-functions, procedures and modules) into an independent representation for transfer across the network. Since a closure does not contain any dynamic data, the knowledge base of the agent is included as the
field, thus transferring all of the agent's data with its code--this is an inherent advantage of using the knowledge base structure to hold all of the permanent and transient data of an agent.
The process of migration is given in figure 5.13. Once the mobile agent has sent its migration message to the local domain agent (i), it suspends its execution (ii). This is to ensure that the agent does not terminate before the local domain agent has received its behaviour and state and successfully archived them locally (iii). The domain agent then asks the manager agent to terminate the migrating agent (iv), which it does by sending it a
message (v), (vi). The local domain agent now has complete control and responsibility for the migrating agent; if an error had occurred before the agent was told to quit, the domain agent could have sent a
message to the mobile agent which would reawaken it to deal with the error.
The local domain agent then sends the core details of the mobile agent to the remote domain (vii), which can now perform authentication and verification checks. If the mobile agent fails these checks and the remote domain agent does not support anonymous execution, then the remote domain agent sends a
message back to the local domain agent which contains details of why the agent was rejected. The local domain agent must now restart the agent from its archived behaviour and state.
If the migrating agent passes its security checks, then the remote domain agent archives its behaviour and state locally (viii) and informs the local domain agent that transfer was successful (ix). Then, the remote domain agent asks its local manager agent to launch the received agent by executing its behaviour (x), (xi); no state is passed to the manager agent, since the execution of the agent may still fail. Once the mobile agent has started, it recognises that it has just migrated and been restarted, and requests its state from the remote domain agent (xii). The remote domain agent transfers this to the migrated agent (xiii) and it uses this to restore its state and thus continue executing (xiv).
Because there are so many stages where migration can fail, the respective domain agents must keep track of the status of the mobile agent and ensure that each segment of the transfer is completed successfully. Since some of these stages happen asynchronously, the task of managing migration is made slightly more complex because of race hazards. For example, at stage (iv), the local domain agent is transferring the behaviour and state of the agent to the remote domain agent at the same time that the manager agent is terminating the original migrating agent (iv and v). If the transfer fails, then the local domain agent must ensure that the original migrating agent has been terminated before it attempts to restart the archived version. However, this added complexity allows domain agents to continue operating asynchronously and consequently to interleave many different types of operation, not just a single migration.
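The hand-over of responsibility during migration can be summarised in a simplified, purely sequential Python sketch. It deliberately ignores the asynchrony and race hazards discussed above, and the `Domain` class, the `migrate` function and the log strings are hypothetical stand-ins for the domain and manager agents rather than their APRIL implementation.

```python
from typing import Callable, Dict, List, Tuple


class Domain:
    """Stand-in for a domain agent together with its manager agent."""

    def __init__(self, name: str, verify: Callable[[str], bool] = lambda agent: True):
        self.name = name
        self.verify = verify                              # security check hook
        self.archive: Dict[str, Tuple[str, dict]] = {}    # agent -> (behaviour, state)
        self.running: Dict[str, dict] = {}                # agent -> live record

    def launch(self, agent: str) -> None:
        behaviour, state = self.archive[agent]
        # The behaviour is started without any state ...
        self.running[agent] = {"behaviour": behaviour, "state": None}
        # ... and the restarted agent then requests its archived state.
        self.running[agent]["state"] = dict(state)


def migrate(agent: str, behaviour: str, state: dict,
            local: Domain, remote: Domain, log: List[str]) -> bool:
    # (iii) The local domain archives behaviour and state.
    local.archive[agent] = (behaviour, dict(state))
    log.append("archived locally")
    # (iv)-(vi) The original agent is terminated; the local domain is now responsible.
    local.running.pop(agent, None)
    log.append("original terminated")
    # (vii) The remote domain authenticates and verifies the agent.
    if not remote.verify(agent):
        log.append("rejected")
        # The original is already terminated, so restarting from the archive is safe.
        local.launch(agent)
        log.append("restarted locally")
        return False
    # (viii)-(ix) The remote domain archives the agent and confirms the transfer.
    remote.archive[agent] = local.archive.pop(agent)
    log.append("archived remotely")
    # (x)-(xiv) The agent is launched remotely and restores its state.
    remote.launch(agent)
    log.append("running remotely")
    return True
```

The point of the sketch is that at every step exactly one domain holds an archived copy from which the agent can be restarted, so a rejection by the remote domain leaves the agent running again where it started.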
Providing a secure environment in which agents can take advantage of services and fulfil requests is a difficult task to achieve because of the varying functionality requirements and the heterogeneous nature of the underlying networked operating systems. As such, there are a number of layers at which security mechanisms can be placed:
Networked operating systems such as UNIX and Windows NT, which cater for user, process and file security. However, the heterogeneity of security policies and mechanisms across networked operating systems makes them difficult to translate between.
Mobile agent systems, which tend to provide security features that are both specific to mobile agents and independent of any particular networked operating system. This means that while many mobile agent systems support code verification, authentication and even constrained execution, their user security model is poor. Telescript (White, 1995; White, 1996) is an exception, but secure operation is only provided on UNIX hosts and through a proprietary security model.
Mobile agent architectures, where decisions regarding policy are easier to make, since most mobile agent systems are generic in that they aim to appeal to a broad spectrum of potential application areas. These policies tend to be flexible and customisable according to the needs of the architecture, but are directly reliant upon intrinsic support from a mobile agent system and possibly the networked operating system.
The Mobile Agent Facility draft proposal (Object Management Group, 1997) tries to make provision for security and other services independently of a networked operating system by defining a Common Conceptual Model (CCM). The realisation of the security aspect of the CCM is through the adoption of CORBA security services (Object Management Group, 1996), such as mutual authentication between agent systems, client authentication for the creation of new remote agents and specific security policies. However, it is made clear in the draft proposal that CORBA security does not meet all of the requirements for mobile agent security; for example, there is no procedure for validating code before it is executed remotely. Moreover, it is not even clear that the CCM provides a comprehensive security solution for mobile agents!
Since the provision of security within and across mobile agent systems is still a matter of much debate, the mobile agent architecture aims to provide hooks for security mechanisms without adopting specific implementations:
Domain access mediation. When an agent migrates from a remote domain, it carries an identification signature composed of an owner and a host domain. If this signature is known by the receiving domain agent then the agent is accorded the appropriate access permission. If it is not, then the agent is either accorded an anonymous status which grants a minimal access permission or it is immediately rejected from the domain.
Resource access mediation. When an agent attempts to access a resource, the resource agent checks the agent's stated access permissions with the local domain agent. If these are valid, then the resource agent accords the indicated level of access for that user; if not, then access is denied and the domain agent is informed.
Domain restrictions. When a mobile agent migrates into a domain its initial veracity is determined by the APRIL code verifier; if the agent fails then it is rejected. Additionally, a migrating agent can also be rejected if there are too many agents executing within the domain or when an agent accesses a resource without the required access permission.
Execution restrictions. Once a migrating agent has been verified and an access permission allocated, the domain agent can also impose an execution restriction by allocating an APRIL resource tank for memory or CPU usage. When the agent exhausts any of its tanks it is terminated.
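Two of these hooks, domain access mediation and execution restrictions, can be sketched in a few lines of Python. The function `mediate_access`, the class `ResourceTank` and the permission labels are hypothetical; in particular, the tank shown here is not the APRIL resource tank interface, and code verification is omitted entirely.

```python
from typing import Dict, Optional, Tuple

Signature = Tuple[str, str]   # (owner, host domain)


def mediate_access(signature: Signature,
                   known: Dict[Signature, str],
                   allow_anonymous: bool,
                   running: int,
                   max_agents: int) -> Optional[str]:
    """Return the permission granted to an arriving agent, or None if it is rejected."""
    # Domain restriction: too many agents are already executing.
    if running >= max_agents:
        return None
    # Known signature: accord the registered access permission.
    if signature in known:
        return known[signature]
    # Unknown signature: minimal anonymous access, or rejection.
    return "anonymous" if allow_anonymous else None


class ResourceTank:
    """A finite allowance of some resource, such as memory or CPU usage."""

    def __init__(self, capacity: int):
        self.remaining = capacity

    def consume(self, amount: int) -> bool:
        """Return False once the tank cannot cover the request (the agent is then terminated)."""
        if amount > self.remaining:
            self.remaining = 0
            return False
        self.remaining -= amount
        return True
```

Keeping the mediation decision in one small function reflects the stated aim of providing hooks: the lookup table or the anonymous policy can be replaced without touching the agents that call it.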
Current domain and resource access mediation is based upon UNIX user names and access is granted (through the resource agent) based upon a user's access to the underlying resource, whether that is a file system or an application. However, this has serious implications for agents interacting with heterogeneous systems that employ disparate security mechanisms and policies. Chapter 7 describes a potential solution (Hayton, 1996) to applying a generalised security policy within the architecture that is both independent of the underlying network operating system and the mobile agent system, and can be used as a basis for secure interworking between agents and resources.
This chapter has stated that a primary component of a DIM environment is a resource, which can represent any collection of information that a user may wish to access or manage. The essential requirement of the mobile agent architecture presented is to provide an approach for supporting DIM functionality as described in chapter 2 that is both flexible and extensible.
In keeping with a major tenet of the agent paradigm, each component within the architecture abstracts a particular set of tasks or functions. Within a DIM environment, the essential components are resource agents and mobile agents; resource agents provide a set of services which represent the functionality of the resource that they are protecting and mobile agents embody the objectives of their user. This simplistic view may give the illusion that interaction between agents within the architecture is based around a client-server communication model. In reality, a peer-to-peer communication model is evident since mobile agents may also offer services to other agents and resource agents may take advantage of them to fulfil their own requirements. Other agents within the architecture provide agent management, specific agent services, environmental structure and interaction with the user.
The realisation of the architecture is based around a layered model where the lowest layer embodies the functionality provided by APRIL. Above this, a set of agent services has been implemented, based around the core concept of a hierarchical message; hierarchical messages are the essential abstract data type for representing flexible communication between agents, dynamic and extensible message handling within agents and also internal information structuring that facilitates easier general information exchange between agents. Each of the architectural agents described has been built around this core set of services. The advantage of using a layered model and an open architecture design is that new services can be built within the model which can then be expressed by new types of agents within the architecture.
The next chapter will show how the architectural agents have been implemented and interact to form an environment within which DIM activities can be undertaken. It also looks at a standard model for representing information resources and DIM tasks that is a direct extension of the message class and type model presented within this chapter.
Sometimes when I consider what tremendous consequences come from little things... I am tempted to think there are no little things
-- Bruce Barton
The previous chapter described the design of a modular architecture composed from static and mobile agents and the implementation of various services to support their operation. This chapter goes on to explain how these architectural agents and support services have been used to implement a DIM environment: the composition of domains, agents, users, resources and DIM tasks.
As mentioned in chapter 2, DIM is the process by which users can create, disseminate, discover and generally manage their distributed information resources, such as their files which are spread across numerous filing systems, FTP sites, WWW sites and other such places; with current technology, a user must perform the management of these files directly and manually. The DIM environment presented in this chapter shows how mobile agents can interact with heterogeneous and distributed information resources in a uniform manner to achieve the DIM tasks that they have been given. It also highlights the fact that an environment of this type must support more than just the management of documents; it should also be extensible with regard to integrating and managing other environment objects such as hypermedia links and user interface presentations.
The anatomy of a mobile agent to undertake DIM tasks (called a DIM agent) is explored with regard to the way in which it is specified, launched and subsequently monitored by the environment. Finally, a number of sample DIM agents that are based around the DIM tasks identified in chapter 2 are presented and it is shown how they interact with other agents within the environment to fulfil the requirements of their behavioural programming.
The open hypermedia systems presented and discussed in chapter 2 have shown the value of hypermedia links when creating associations between documents of different media types and handling those associations in a flexible and extensible manner. One of the major advantages of open hypermedia systems such as Microcosm TNG (Goose et al., 1996) is that they attempt to provide an environment which integrates many different types of information resources and also provide a hypermedia linking service that spans heterogeneous systems, communication protocols and data formats. Typically, an open hypermedia system will attempt to provide support for the management of three segments of its functionality:
In early monolithic hypermedia systems these three management aspects were an integral part of the entire hypermedia application; contemporary open hypermedia systems have recognised that they should be handled by separate but communicating services (Goose, 1997). This distinction echoes the original definition of open hypermedia by Davis (Davis, 1995) which stated that new objects should be allowed to be imported into the system without restriction. Whereas the original intention of this statement was with regard to protocols, data formats, desktop applications and local and networked resources, it can also be applied generally to other objects within an open hypermedia system. That is, future open hypermedia systems will have to look at also integrating third-party management services, since a user may wish to use a particular service to manage their presentations, documents or links for the same reasons that they use a particular desktop application to manage the contents of their documents.
Hypermedia reference models and standards have begun to address this problem, but so far they have only tackled interoperability of data formats and interchange protocols; distributed information management refers to not only the form and content of information, but also to the process of its management. Therefore, a level of abstraction is needed between the individual mechanics of each service and the functionality that it is required to embody. Within the DIM environment, this has been achieved by defining sets of DIM primitives which refer to the management of presentations, documents and links; these primitives encapsulate a notion of data, protocol and process.
We introduce a new term for each collection of DIM primitives, called a facet of manipulation; together, the facets express the functionality of the DIM environment. Figure 6.1 shows that four basic facets have been defined: the architectural facet, the presentation facet, the document facet and the hypermedia link facet. The architectural facet represents all of the message classes and types identified in chapter 5, namely the
classes, which encompass the functionality of the mobile agent architecture. These primitives are implemented to a greater or lesser extent by all agents within the architecture.
The latter three facets exemplify the presentation, document and hypermedia link management services of an open hypermedia system. The primitives that they represent are implemented within user interface agents, resource agents and mobile agents as dynamically programmable typed messages. This means that a given DIM agent can subscribe to all or a subset of the available primitives within a facet; the agent makes the set of primitives it supports known by advertising them in the local domain agent. The following subsections describe the individual primitives that are associated with each facet and illustrate how they have been implemented within the prototype DIM environment.
Within a DIM environment, a document is considered to be a single-media object (such as a piece of text or a video clip) that is referenced through a document identifier. A document identifier is a hierarchical message structure which contains information that can uniquely identify a document within a given resource. Each of the primitives given in table 6.1 works through document identifiers to effect changes upon the documents themselves. For example, if a resource agent contains documents as a resource, then it will support some or all of the primitives specified by the document facet. A DIM agent that wishes to manipulate a given document will have to obtain a document identifier representing the document and then send the appropriate primitive and identifier to the resource agent. The resource agent will then perform the action associated with the primitive upon the document specified by the identifier on behalf of the DIM agent (after suitable security checks have been made).
<Document_Identifier> ::= <Doc_Name> <Doc_Id>... <Doc_Details>
<Doc_Id> ::= <Id_Address> <Id_Aliases>... <Id_Type>
<Doc_Details> ::= <Doc_Owner> <Doc_Size> <Doc_Type>
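The grammar above can be mirrored as a small set of Python records, which may help when reading the discussion of addresses, aliases and types that follows. This is a sketch only: the class names and the `primary()` helper are assumptions, the type labels "primary" and "secondary" are taken from that later discussion, and the address strings in the usage note are placeholders rather than the address syntax used by the architecture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocId:
    address: str                                        # <Id_Address>
    aliases: List[str] = field(default_factory=list)    # <Id_Aliases>...
    id_type: str = "primary"                            # <Id_Type>


@dataclass
class DocDetails:
    owner: str                                          # <Doc_Owner>
    size: int                                           # <Doc_Size>
    doc_type: str                                       # <Doc_Type>


@dataclass
class DocumentIdentifier:
    name: str                                           # <Doc_Name>
    ids: List[DocId]                                    # <Doc_Id>...
    details: DocDetails                                 # <Doc_Details>

    def primary(self) -> Optional[DocId]:
        """Return the first identifier marked as primary, if there is one."""
        for doc_id in self.ids:
            if doc_id.id_type == "primary":
                return doc_id
        return None
```

A document held in two resources would carry two `DocId` entries, for example one with the placeholder address `"addr-a"` marked primary and one with `"addr-b"` marked secondary; an agent that defaults to the primary entry would then call `primary()`.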
Each resource agent that protects a particular document resource type, for example a file system or an FTP site, understands how to construct a document identifier (in response to the
primitives) that is appropriate to represent a document within its resource. The
branch uses the
fields to specify the actual location of a document within a resource in multiple ways. The
field is based upon the standard URL form of:
indicates the type of the resource,
is an APRIL handle which represents the relevant resource agent and
details how the document can be retrieved. The
field allows alternate expressions of a document's location. For example, a document within a WWW resource can be specified externally to the WWW resource agent (through its URL) or internally (through a file specification). Such a document could be defined as possessing an
indicating its external address (such as
) and an
field representing its internal address (such as
). Thus, the
field can be used to provide extra information to help different parties in locating a document.
The purpose of supporting multiple
branches within a document identifier is to allow a document to be shown to exist within several resources. A
branch is present for each of these resources and the
field is used to signify the importance of each
; a primary
can be considered to be of higher importance than a secondary
The way in which the multiple
s are interpreted is left to the discretion of the agent processing the document identifier, for example, a resource agent will always default to using the primary
. However, it is important to note that where the document identifier represents a reference to a document, it does not represent the actual document itself and consequently the document can only be accessed through the primitives given in table 6.1. This means that document identifiers can have different forms:
A basic document identifier
is a document identifier which has been constructed by a resource agent. A DIM agent may only alter the contents of a basic document identifier with the purpose of changing the original document in some way (through the
primitives). The lifetime of a basic document identifier is the same as the lifetime of the document to which it refers and its scope extends across the agents which possess a copy of the identifier, including the resource agent that contains the actual document.
A modified document identifier
is a basic document identifier that has been augmented in some way by a DIM agent. For example, a DIM agent may construct a modified document identifier which represents a particular document that is replicated across multiple resources; the DIM agent may be responsible for ensuring that an update to the primary
is propagated across all specified secondary
s. A modified document identifier only has lifetime so long as a DIM agent retains it and only has scope amongst those DIM agents which possess a copy of it.
A meta-document identifier
refers to a virtual document that is specified by multiple
s. The DIM agent possessing the meta-document identifier uses this information to construct a new document dynamically from its constituent parts; the
branch of the identifier contains information about the DIM agent itself rather than of a particular resource agent. A meta-document identifier has the same lifetime and scope as a modified document identifier.
These three varying forms of a document identifier show that DIM agents are free to modify and interpret document identifiers in any way that suits their needs or purpose. However, this does mean that only a basic document identifier can be guaranteed to be accepted and processed by a resource agent. This is because after modifications, a document identifier may contain illegal or conflicting information that the resource agent cannot allow or resolve. Therefore, basic document identifiers are designed to work with the document facet primitives, and modified and meta-document identifiers are designed to be used and exchanged between DIM agents. This has the consequence that DIM agents can create and maintain documents within themselves and so disseminate them either:
This illustrates the essential mechanisms by which DIM agents can share documents; a document shared by value is essentially copied and a document shared by reference needs to have its contents resolved by the sharing agent (whether that be a resource agent or a DIM agent). This can have serious implications in terms of copyright, as identified by Nelson (Nelson, 1987), since an agent might not legally be able to copy the contents of a document and may be forced to work through references only. Nelson introduced the term transclusion to denote copy by reference and stated that the resolution of objects within a shared or public environment should be small enough to allow new and larger objects to be constructed from them (possibly down to the sentence or word level, to use text as an example). Since copyright only deals with copy by value (effectively granting or prohibiting it in certain situations), Nelson stated that access to objects by reference should be dealt with by transcopyright. This proposed law states that an object can be reused in unpredictable contexts so long as no bytes are copied (that is, a reference is used) and that the original publisher is notified of such use.
As well as representing physical documents within a resource agent or across a group of resource agents, document identifiers can also represent a document that is generated as the result of some computation, for example a database query. In this case, the primary
refers to the APRIL handle of an agent which can perform the processing; the
field may contain information that can influence the processing or generation of the results. This is particularly useful for documents that represent dynamically updated material, such as stock prices or weather reports, since a document can be expressed in terms of the agent that collates this information.
To retrieve the contents of a document, a DIM agent sends a
primitive with a basic document identifier to the appropriate resource agent and this returns a message that contains the original document identifier with the
field holding the actual contents of the document. Similarly, when a DIM agent wishes to change the contents of a document, it populates the
field of a basic document identifier and sends this to the resource agent as a
The document facet, then, is not primarily concerned with the contents of a document (which is left to the discretion of the processing agent) nor with the communication protocol that is required to access the documents (which is encapsulated and abstracted through the document facet primitives), although it does have to take both of these considerations into account. Instead, it seeks to allow a DIM agent to express a notion of process upon a document through the association of primitives to particular documents or document types; complex patterns of processing can be built into a DIM agent which are constructed from the basic primitives provided. The prototype implementation of the document facet primitives for various forms of resources is detailed in subsection 6.4.1.
The primitives of the hypermedia link facet are given in table 6.2 and perform analogous functions to their document facet counterparts. A link is defined to be an association between two or more documents which may be specified to varying degrees of detail. Within the DIM environment, a link is referenced through a link identifier which contains information that can uniquely identify a link within a resource. The EBNF for a link identifier is given as:
<Link_Identifier> ::= <Link_Name> <Link_Id>... <Link_Details>
<Link_Id> ::= <Id_Address> <Id_Aliases>... <Id_Type>
<Link_Details> ::= <Link_Owner> <Link_StartAnchor>
<Link_StartAnchor> ::= <Link_Selection> <Document_Identifier>
<Link_EndAnchor> ::= <Link_Selection> <Document_Identifier>
Start and end anchors which specify the start and end points of a link. Anchors can be composed of a selection which contains a document identifier and information about where the anchor begins and ends within the document. Selections are made as an index into the document and can either be derived from absolute values (for example a coordinate specification or a sequence of bytes to be pattern-matched) or from a computation that derives a sensible index.
Specific. Essentially a point-to-point link which is unidirectional from a source selection and document to an end selection and document. This is also the only link form supported inherently by the WWW, with the exception that the destination selection is set as the top of the document if it is omitted from the end anchor.
New forms of links can be constructed by including or excluding fields from the existing attribute set, for example, all of the Microcosm links could be made bidirectional or could be allowed to have multiple associated destination documents (rather than just the default one). The introduction of new fields could allow associations between documents to be made through concept matching rather than selections, as proposed by Beitner (Beitner, 1995). However, since hypermedia systems possess differing levels of hypermedia functionality, extensions to the basic model could result in information loss if links are passed between disparate hypermedia systems.
Since a resource agent is responsible for managing links within a specific resource, it has to make decisions regarding how to deal with differing link types. In the simplest case, the resource agent will advertise the link facet primitives it supports within the local domain agent and will also include the attributes that it supports as an input interface. For example, since the WWW only supports point-to-point links, a service registration entry for a
primitive could comprise:
(Aliases, ["Make link"]),
(InputInterface, ["Link_StartAnchor", "Link_EndAnchor",
The entries in the
field show other DIM agents the attributes that are supported and also any associated restrictions; in this case, only unidirectional, point-to-point links can be processed by the resource agent. This level of expression can allow an agent to judge how a link type will be treated before it submits it to the resource agent.
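The check that a DIM agent might make before submitting a link can be sketched as follows. This is a minimal Python illustration rather than the APRIL code of the prototype; the dictionary layout, the function name unsupported_attributes and the extra attribute Link_Concept are assumptions introduced only for the example.

```python
# Hypothetical registration entry, loosely following the fragment above.
registration = {
    "Aliases": ["Make link"],
    "InputInterface": ["Link_StartAnchor", "Link_EndAnchor"],
}


def unsupported_attributes(link_attributes, entry):
    """Return the link attributes that the advertised input interface omits."""
    supported = set(entry["InputInterface"])
    return sorted(name for name in link_attributes if name not in supported)


# A point-to-point link uses only the advertised attributes.
point_to_point = ["Link_StartAnchor", "Link_EndAnchor"]
# "Link_Concept" is an invented extra attribute, used only for illustration.
concept_link = ["Link_StartAnchor", "Link_EndAnchor", "Link_Concept"]

print(unsupported_attributes(point_to_point, registration))
print(unsupported_attributes(concept_link, registration))
```

An empty result suggests that the link can be submitted unchanged; a non-empty result names the attributes that the resource agent would probably ignore or reject.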
Link identifiers work in the same way as document identifiers described in the previous subsection and are bound by the same rules of sharing; resource agents generate basic link identifiers which can be modified to update a link (a modified link identifier) or to create clusters of links (a meta-link identifier). A virtual link can be defined within a meta-link identifier through the use of multiple
s; when the virtual link is followed, each composite
is also followed, thus generating multiple link resolutions. It is the task of the DIM agent possessing the virtual link to decide how to deal with the logistics of retrieving the individual documents pertaining to the links. In some ways, this process is similar to the Available Links filter within Microcosm; when the user performs an action which causes multiple link resolutions to be generated, an interface dialog is presented to allow the user to select from the choices.
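Following a virtual link can be pictured as resolving every composite link in turn and collecting the results. The Python sketch below is only an illustration under stated assumptions: the link_ids key, the follow_virtual_link function and the lookup table standing in for a resource agent are all hypothetical.

```python
def follow_virtual_link(meta_link, resolve):
    """Follow each composite link and return one resolution per link."""
    return [(link_id, resolve(link_id)) for link_id in meta_link["link_ids"]]


# A lookup table stands in for the resource agent that resolves each link.
destinations = {"link-1": "intro.html", "link-2": "glossary.html"}

# Hypothetical meta-link identifier that groups two composite links.
virtual = {"name": "related material", "link_ids": ["link-1", "link-2"]}

resolutions = follow_virtual_link(virtual, destinations.get)
for link_id, document in resolutions:
    print(link_id, "->", document)
```

How the resulting documents are then retrieved or offered to the user is left to the DIM agent that holds the virtual link, as described above.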
The hypermedia link facet primitives provide the necessary constructs to allow DIM agents to perform link manipulation. One of the advantages of handling links in this way is that they become first-class citizens of the DIM environment; they can be handled and treated in similar ways to documents and can be shared between agents. This helps to promote interchangeable and interoperable link services; the user chooses the link services that they wish to use, DIM agents handle the transport and processing, and the resource agents handle the storage and implicit conversions. The prototype implementation of the hypermedia link facet primitives for various forms of resources is detailed in subsection 6.4.1.
A presentation is essentially a user interface component that conveys information to and accepts input from the user. The user interface agent is primarily responsible for implementing the presentation facet, since they are both concerned with managing the presentation of information to the user. A summary of the range of activities that can be expressed by the presentation facet is given in table 6.3. The management of presentations within the DIM environment is very simplistic; when the user interface agent receives results data from a DIM agent, it creates a new presentation for that data. The nature and form of the presentation is dictated by an assignment previously made by the user (with the
primitive). This is a simple association between a MIME type and an executable desktop application or agent which can handle the data.
This form of user interface management allows the user to employ their favourite desktop applications to view and manipulate their data. However, one of the biggest drawbacks to this approach is that there is no standard mechanism for either adding hypermedia linked information to a document30 that is to be viewed through a third-party application, or for capturing user input data that is entered inside of an application. The Universal Viewer (Davis et al., 1994) has shown how this form of integration can occur with desktop applications by employing a variety of GUI mechanisms (described in chapter 2).
As with both documents and hypermedia links, presentations are objects within the DIM environment that are referenced through a presentation identifier and are managed by user interface agents. Presentation identifiers can be expressed in EBNF as:
<Presentation_Identifier> ::= <Pres_Name> <Pres_Id>...
<Pres_Id> ::= <Id_Address> <Id_Aliases>...
<Pres_Details> ::= <Pres_Owner> <Pres_Application>
<Pres_Contents> ::= <Document_Identifier> | <Doc_Id> |
<Link_Identifier> | <Link_Id>
User interface agents can generate basic presentation identifiers on the creation of new presentations (
) or on the interrogation of existing presentations (
). The contents of a presentation (effectively the document or link with which the associated application has been invoked) can be specified by value or by reference, the latter case implying that the user interface agent must contact a resource agent (or agents) to resolve the contents. Data (such as user input) can be extracted from a presentation with the
Virtual presentations can be specified through the use of meta-presentation identifiers and allow separate presentations to be logically grouped; the composite
s point to the appropriate user interface agents that are managing each component presentation. Additionally, the allocation of primary and secondary presentations within a modified presentation identifier permits presentations to be shared across user interface agents (and hence, users); the secondary presentations are automatically updated when the primary presentation is changed. However, the presentation facet does not provide mechanisms to manage sharing or to structure and coordinate logically grouped presentations31.
Presentations can also allow DIM agents to pass information back to the user interface agent for display to the user (through the
primitive). When a DIM agent is created, a basic presentation identifier is generated by the user interface agent. This is assigned and retained by the DIM agent to act as its contact point through which it can return messages. The user interface agent maintains a list of DIM agents that have been assigned to presentations and ensures that any communication is routed to the correct presentation. It may be that multiple DIM agents have been allocated the same presentation identifier, which means that the user interface agent must control how this information is combined and presented to the user. Additionally, a DIM agent may possess multiple presentation identifiers which are related to different sections of its functionality. For example, all DIM agents for a given user interface agent could possess the same presentation identifier for communicating error or log messages, but possess different identifiers for communicating execution results. In this way, presentation identifiers can be thought of as named communication channels.
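The idea of presentation identifiers acting as named communication channels can be sketched in Python as below. This is not the DialoX or APRIL implementation; the class, its method names and the identifier format are assumptions chosen for the illustration.

```python
from collections import defaultdict


class UserInterfaceAgent:
    """Toy router that maps presentation identifiers to assigned DIM agents."""

    def __init__(self):
        self._count = 0
        self.assignments = defaultdict(set)   # presentation id -> agent names
        self.displayed = defaultdict(list)    # presentation id -> messages shown

    def new_presentation(self):
        """Generate a fresh (hypothetical) basic presentation identifier."""
        self._count += 1
        return f"pres-{self._count}"

    def assign(self, pres_id, agent_name):
        self.assignments[pres_id].add(agent_name)

    def receive(self, pres_id, agent_name, message):
        """Route a message to its presentation if the agent was assigned to it."""
        if agent_name not in self.assignments.get(pres_id, ()):
            raise KeyError(f"{agent_name} is not assigned to {pres_id}")
        self.displayed[pres_id].append((agent_name, message))


ui = UserInterfaceAgent()
log_channel = ui.new_presentation()      # shared by every agent
results_a = ui.new_presentation()        # private to agent "a"
for name in ("a", "b"):
    ui.assign(log_channel, name)
ui.assign(results_a, "a")

ui.receive(log_channel, "a", "started")
ui.receive(log_channel, "b", "started")
ui.receive(results_a, "a", "3 documents found")
```

Here both agents share one identifier for log messages while agent "a" holds a second identifier for its results, which mirrors the example given in the paragraph above.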
The main body of user interface presentations that a user employs to communicate their desires and intentions to the DIM environment have been implemented in the DialoX distributed user interface system (McCabe, 1997). DialoX integrates closely with APRIL and allows details of the management of a user interface to be handled independently from the user interface agent. The form of this management chiefly consists of rendering user interface presentations, handling user interaction and returning input data. The following sections describe the user interface components that have been implemented with DialoX.
The mobile agent architecture described in the previous chapter illustrated that each architectural agent represented a level of abstraction on a part of the functionality that could be expressed by the architecture. Essentially, domain agents abstract control and security issues, resource agents abstract resource management issues, user interface agents abstract user representation issues and mobile agents abstract task allocation and completion issues. This separation allows their functionality to be clearly contrasted and well-defined.
Within the DIM environment, resource agents implement the document and hypermedia link facets, and user interface agents implement the presentation facet. However, it is the DIM agents (mobile agents plus DIM tasks) which make use of the DIM primitives offered by resource and user interface agents, and indeed other DIM agents, to complete their objectives. The following subsections describe how the behaviour of a DIM agent is specified in terms of the identified primitives and how such an agent can be constructed, launched and subsequently monitored and controlled.
Each of the DIM primitives specified by the individual facets has been implemented as a typed message which takes a hierarchical message as input data and returns a hierarchical message as output data. Additionally, each primitive is represented through a dynamically programmable handler function; this allows a resource or user interface agent to support all or a subset of the primitives offered by each facet. As mentioned earlier, a DIM agent can determine how much of a facet is supported by a given resource or user interface agent by examining the registration advertisements that have been made with the domain agent (see subsection 6.2.2).
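A handler-per-primitive arrangement of this kind can be sketched in Python as follows. The sketch is illustrative only: the primitive name get_document, the message keys and the STATUS branch are assumptions, and nested dictionaries stand in for hierarchical messages.

```python
class FacetAgent:
    """Toy agent that supports only the primitives with an installed handler."""

    def __init__(self):
        self.handlers = {}

    def install(self, primitive, handler):
        self.handlers[primitive] = handler

    def supported(self):
        """List the primitives this agent could advertise on registration."""
        return sorted(self.handlers)

    def handle(self, primitive, message):
        """Dispatch an input message and return the output message."""
        handler = self.handlers.get(primitive)
        if handler is None:
            return {"STATUS": "failed", "REASON": f"{primitive} is not supported"}
        return handler(message)


documents = {"doc-1": "Hello"}


def get_document(message):
    """Hypothetical handler: copy the document contents into the reply."""
    doc_id = message["Doc_Id"]
    if doc_id not in documents:
        return {"STATUS": "failed", "REASON": "unknown document"}
    return {"STATUS": "ok", "Doc_Id": doc_id, "Contents": documents[doc_id]}


# This agent implements only a subset of the facet: retrieval, not update.
read_only_agent = FacetAgent()
read_only_agent.install("get_document", get_document)
print(read_only_agent.handle("get_document", {"Doc_Id": "doc-1"}))
print(read_only_agent.handle("put_document", {"Doc_Id": "doc-1"}))
```

Keeping the handlers in a table makes it straightforward to add or remove a primitive at run time, which is the property the dynamically programmable handler functions rely upon.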
Since DIM agents embody the characteristics of weak agents (identified in chapter 3), they do not employ 'intelligence' or complex reasoning algorithms, but instead perform useful tasks that are centred around the manipulation of DIM objects, such as documents, hypermedia links and presentations. Therefore, the essential behaviour of an agent is expressed as a sequence of DIM primitives from the various facets combined with the APRIL programming language to provide basic decision-making capabilities. The main advantage of having such 'dim' agents stems from the user-perceived need for predictability. In their user trials of agents for an electronic marketplace, Chavez et al. (Chavez et al., 1997) conclude that for such agents to be widely accepted, it is crucial that the behaviour of an agent can be easily understood and controlled by the user. The behaviour of a DIM agent expressed in DIM primitives is simple to understand since each primitive has a well-defined meaning and purpose, and the cumulative effect of a given sequence of primitives can be easily inferred.
REASON branch from the output message and handle it.
The use of APRIL as the underlying programming language to support the DIM primitives allows a DIM agent to be as complex or as simplistic as the agent programmer desires. However, with the current prototype, DIM agents can only possess one objective that is represented by this behavioural specification of primitives and APRIL code. This again makes them more simplistic since a DIM agent cannot possess multiple objectives which may conflict and need to be resolved.
To help agent programmers rapidly design and implement DIM agents, a skeleton agent has been developed. This agent possesses much of the functionality required by a DIM agent to integrate with the mobile agent architecture and the DIM environment. This skeleton agent possesses the following core features:
Migration capabilities and a standard migration decision mechanism.
When a DIM agent wishes to migrate, it sends a
primitive to the local domain agent with its knowledge base as an input hierarchical message argument; the migration process then takes place automatically until the agent is restarted (as fully described in chapter 5). The decision as to where the DIM agent should migrate is based upon the contents of the
branch of the agent's knowledge base. Each entry within this branch consists of a list of domain names with an associated priority rating; higher rated domains are visited before lower rated domains. In this way, the agent programmer can decide the level of control that a DIM agent has over its migration movements. A DIM agent can be preprogrammed with a specific route that it must follow and cannot alter, except to skip domains that are not currently available; when the DIM agent has visited all domains, it returns to its host domain. Alternatively, a basic decision-making algorithm has been developed which allows an agent to exercise control over its movements. The DIM agent obtains a list of potential domains to visit from the local domain agent, then contacts the domain agent of each of these domains and asks for a list of the resource services and resource requests that are available. This list is matched against the DIM agent's own list of resource services and resource requests. The matching step ensures that the domain will be visited (by creating a new entry for it in
) and the number of successful matches determines its priority rating; a domain with no matches is visited last. A DIM agent can use this algorithm either to augment a list of preprogrammed domains or to continually derive new domain choices if it has not been limited in its journey.
Registration of resource, services and requests.
branch of its own knowledge base, a DIM agent stores information that is relevant to the resources in which it is interested, the services it can offer and the requests it wants fulfilled. When a DIM agent has successfully migrated to a new domain, these resource services and resource requests are automatically registered with the local domain agent; when the agent migrates out of the domain they are automatically unregistered. This information is also used in the algorithm described previously to help decide which domains are suitable for the agent to visit.
Management of new handler functions.
branch of a DIM agent's knowledge base determines which dynamic handler functions should be automatically installed into the agent when it is started (or restarted from having migrated). It also allows these handler functions to be removed when the DIM agent migrates to a new destination which helps to keep its code size as small as possible. The
branch indicates which dynamic handler functions cannot be loaded (ever), cannot be removed once they have been installed (even through migration) or cannot be overwritten once they have been installed for the first time (although they can be removed on migration). This allows an agent programmer to determine the functionality set that a DIM agent can possess throughout its lifetime.
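The three kinds of handler restriction described above can be sketched in Python as below. The class and its names are hypothetical, and one point is an explicit assumption: a handler that cannot be overwritten may be installed again after it has been removed on migration, which the text does not settle either way.

```python
class HandlerTable:
    """Toy table of dynamic handlers with three kinds of restriction."""

    def __init__(self, never_load=(), never_remove=(), never_overwrite=()):
        self.never_load = set(never_load)
        self.never_remove = set(never_remove)
        self.never_overwrite = set(never_overwrite)
        self.installed = {}

    def install(self, name, handler):
        """Install a handler; return False if a restriction forbids it."""
        if name in self.never_load:
            return False
        if name in self.installed and name in self.never_overwrite:
            return False
        self.installed[name] = handler
        return True

    def on_migrate(self, removable):
        """Drop the listed handlers before migration unless they are pinned."""
        for name in removable:
            if name not in self.never_remove:
                self.installed.pop(name, None)


def first(message):
    return "first"


def second(message):
    return "second"


table = HandlerTable(never_load={"shell"}, never_remove={"log"},
                     never_overwrite={"search"})
table.install("log", first)
table.install("search", first)
```

Removing handlers on migration keeps the transported agent small, while the pinned set preserves whatever the agent programmer considers essential throughout the lifetime of the agent.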
/Registration branch of its knowledge base, with the local domain agent.
The skeleton agent is terminated by either the user, or by the local domain agent for attempting to commit a prohibited action. An agent that dies in this manner does not have the chance of returning its results data or knowledge base back to the user.
The skeleton agent completes its objective and returns to its host domain. The completion of an objective can take a number of forms, for example, the agent exhausts its migration list or the behaviour of an agent computes that the agent has met its programmed objective.
The skeleton agent provides a transportation and execution cradle in which the behaviour of a DIM agent can reside and it also helps to abstract the agent programmer away from the explicit details of migration, registration and dynamic message handling.
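The migration decision mechanism that the skeleton agent hides from the programmer can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text: a match is counted when a request of the agent meets a service of the domain or a service of the agent meets a request of the domain, ties are broken alphabetically, and the dictionary layout and service names are invented for the example.

```python
def rate_domain(agent, domain):
    """Count the matches between an agent and the services of one domain."""
    wanted = set(agent["requests"]) & set(domain["services"])
    offered = set(agent["services"]) & set(domain["requests"])
    return len(wanted) + len(offered)


def plan_route(agent, domains):
    """Order domains by rating, highest first; unmatched domains come last."""
    rated = [(rate_domain(agent, info), name) for name, info in domains.items()]
    rated.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in rated]


agent = {"services": ["indexing"], "requests": ["text-search"]}
domains = {
    "penfold": {"services": [], "requests": []},
    "penelope": {"services": ["text-search"], "requests": []},
    "roobarb": {"services": ["text-search"], "requests": ["indexing"]},
}
print(plan_route(agent, domains))
```

A preprogrammed route could be supported by giving fixed ratings to the listed domains and appending the computed entries, which corresponds to augmenting a list of preprogrammed domains.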
From DIM primitives.
The agent programmer constructs the behavioural specification of the DIM agent within a text editor from DIM primitives that are expressed in terms of blocks of APRIL code. Figure 6.2 depicts a screen shot of this editor and shows the form of a
primitive from within the document facet. Each code segment associated with a DIM primitive follows the sequence of operations detailed in subsection 6.3.1; new primitives can be inserted by selecting the primitive name from the popup menus along the bottom of the editor (the document facet primitives are expanded in the screen shot). It is the task of the agent programmer to add the input field data and to handle the return message, and also to provide continuity between sequences of DIM primitives to form the behavioural functionality of the DIM agent. This is the most fundamental way of programming an agent and places most of the burden of specifying the behaviour with the agent programmer. Once completed, the DIM agent is compiled as part of the skeleton agent and is launched.
From modifying a DIM template. To help alleviate the burden of programming DIM agents from first principles, a number of DIM agents have been pre-specified as DIM templates. A DIM template represents a particular type of DIM agent that has been written in terms of DIM primitives and can be modified and customised through input data. However, its fundamental operation may make assumptions which cannot be altered. Figure 6.3 shows the configuration options of a basic resource discovery agent, such as search terms and lists of domains to visit. A number of these template agents have been developed and their operation and function are described further in section 6.4.
Besides specifying and creating DIM agents, part of the other duties of the user interface agent is to provide users with a window onto their distributed information resources, through the operation of their DIM agents. Once an agent has been created and launched, the user should be able to monitor and follow its progress wherever it moves to within the domain hierarchy. To this end, the user interface agent employs an agent notification system which receives communication from each of a user's DIM agents, classifies each message according to its type and displays this in the agent event queue of the appropriate agent. This notification system has been realised as a user interface presentation implemented in DialoX and an example screen shot of its operation is given in figure 6.4. Here, there are four DIM agents that belong to the user which are represented by the four rows of icons in the centre window; the left-most icon of each row yields the agent control panel of an agent (given in the bottom window) and the other icons in a row represent message events that have been posted by each individual agent. For example, the first agent, called
, is a resource discovery agent which was created in figure 6.3 from an existing DIM template. This DIM agent is currently within the domain
and its objective is to find any text documents that contain the search term
. The two icons in the agent notification window for this agent represent (from left to right) an information event and a found text document. The top-most window shows information about a document that the
DIM agent has discovered; it is a WWW HTML document which contains several embedded media formats. From here, the user can either look at the document in a viewer that has been assigned to the
MIME type, save it to the local filing system or forget about the existence of the document and remove the associated message from the event queue.
This form of notification system allows the results of all agents to be handled and presented to the user in a brief and consistent manner; each agent event queue is updated automatically when a new message is received from its agent. The agent control panel also provides the user with a mechanism to control the overall operation of an executing DIM agent. The icons across the bottom window in figure 6.4 represent (from left to right) resume agent, pause agent, stop agent, recall agent and terminate agent. Pausing an agent arrests the execution of a DIM agent in such a way that it can be resumed from the same point later, but stopping an agent will prevent it from completing its objective in the current domain and force it to migrate when it is next resumed (this is useful if the user wishes to skip processing within a particular domain).
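The difference between pausing and stopping an agent can be sketched as a small state machine. The Python below is an illustration only; the class, the state names and the returned action strings are assumptions, and the recall and terminate controls are omitted.

```python
class AgentControl:
    """Toy model of the pause, stop and resume controls of one DIM agent."""

    def __init__(self):
        self.state = "running"
        self.skip_domain = False

    def pause(self):
        """Arrest execution so that it can continue from the same point."""
        if self.state == "running":
            self.state = "paused"

    def stop(self):
        """Abandon the objective in the current domain."""
        if self.state in ("running", "paused"):
            self.state = "stopped"
            self.skip_domain = True

    def resume(self):
        """Return what the agent does next: 'continue' or 'migrate'."""
        action = "migrate" if self.skip_domain else "continue"
        self.state = "running"
        self.skip_domain = False
        return action


control = AgentControl()
control.pause()
print(control.resume())   # the agent carries on in the same domain
control.stop()
print(control.resume())   # the agent moves on to its next domain
```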
The user can also communicate with their DIM agents through query messages (represented by an icon containing a question mark); these represent situations where a DIM agent requires clarification of a matter or further information to complete its processing. For example, if an agent cannot migrate to a listed domain, then it will post a query asking for future instructions regarding this domain, that is, should it try again on the next migration, try again on the last migration or forget about the domain.
To illustrate the potential functionality and operation of the DIM environment, a number of sample agents to undertake the DIM tasks discussed in chapter 2 have been developed; the configuration of this prototype DIM environment is given in figure 6.5. Within this prototype, there are three domains ( penfold.ecs.soton.ac.uk , roobarb.ecs.soton.ac.uk and penelope.ecs.ac.uk 32) which each contain a number of different resource types. These resources are abstracted, protected and managed by resource agents that have been tailored to work with each individual resource.
The following subsections describe how these resource agents have been implemented and how sample DIM templates have been developed to allow DIM agents that can undertake activities upon the resources in the prototype environment to be created.
The integration aspect of the DIM environment is provided by resource agents that implement the individual primitives of the document and hypermedia facets. A number of prototype resource agents for integrating particular systems have been developed and the basis of choice is that they either represent a fundamental access mechanism or exhibit properties that allow them to demonstrate the flexibility with which the primitives can be implemented and subsequently used. Each resource type is described below and has been further classified according to the modes of legacy system integration identified by Genesereth et al. (see chapter 3):
Filing system documents.
This resource agent can support all of the document facet primitives and represents the most basic level of access to documents. Management of this resource is performed through APRIL file handling functions and is controlled by the underlying APRIL subsystem. The filing system resource agent has been implemented in a version of APRIL executing upon UNIX and therefore adopts and advocates the UNIX security model33 and access permission structure34. When the filing system resource agent is executed, it is restricted to a sub-tree of the filing system resource that it is managing and the resource agent must have permission to access these files according to the intended functionality of the resource agent. For example, if the resource agent only implements document primitives for accessing documents (such as
), then the resource agent need only have read access to the files. However, if it implements all of the document facet then it must have read and write access to the files; the resource agent mediates final access on behalf of a DIM agent based upon the user identifier and permission bits associated with the file and the user identification credentials of the DIM agent. The integration of the filing system has been undertaken as a wrapper.
FTP documents. This resource can also support all of the document facet primitives and the implementation of the FTP resource agent has been achieved through issuing commands to an FTP client package called NcFTP35. Each document facet primitive is represented by a script of FTP commands which is executed by NcFTP and helps to reduce the complexity of the FTP protocol. As with the filing system resource agent, security is maintained through implementation on top of a UNIX system. The integration of FTP has been undertaken as a transducer.
WWW documents and links.
The WWW can only support the
primitives inherently due to its simplicity and lack of a comprehensive security model. To this extent, the document facet primitives of the WWW resource agent have been implemented as per the filing system agent described earlier, and the hypermedia link facet primitives have been implemented to work with the WWW's embedded point-to-point linking model. This means that a WWW resource agent which supports the hypermedia link facet must also support some document facet primitives to allow links to be embedded within the relevant documents: in effect, each document also performs as a link database (links inherit the security and access mechanisms of the documents in which they reside). The integration of the WWW has been undertaken as a rewrite.
Microcosm TNG documents and links. Microcosm TNG can support all of the document and hypermedia link facet primitives inherently, since each facet is handled by a separate process (the docuverse process36 and the linkbase processes respectively). The realisation of the Microcosm TNG resource agent has been achieved by the DIM primitives being expressed in terms of communication primitives in the Microcosm TNG protocol, that is, the resource agent understands how to converse with the docuverse and linkbase processes directly. Links are supported to the extent of the hypermedia functionality provided by Microcosm TNG, namely, unidirectional specific, local or generic links. The integration of Microcosm TNG has been undertaken as a transducer.
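The two checks made by the filing system resource agent, confinement to a sub-tree and mediation by permission bits, can be sketched in Python as below. This is a simplified illustration and not the APRIL implementation: the function names and paths are hypothetical, group permissions are ignored, and symbolic links are not resolved.

```python
import posixpath
import stat


def within_subtree(root, path):
    """True if path, taken relative to root, stays inside the managed sub-tree."""
    root = posixpath.normpath(root)
    full = posixpath.normpath(posixpath.join(root, path))
    return posixpath.commonpath([root, full]) == root


def may_access(mode, owner_uid, agent_uid, want_write):
    """Check the owner or 'other' permission bits for the agent's user id."""
    if agent_uid == owner_uid:
        bit = stat.S_IWUSR if want_write else stat.S_IRUSR
    else:
        bit = stat.S_IWOTH if want_write else stat.S_IROTH
    return bool(mode & bit)


print(within_subtree("/srv/docs", "reports/a.txt"))
print(within_subtree("/srv/docs", "../etc/passwd"))
print(may_access(0o644, owner_uid=1000, agent_uid=2000, want_write=True))
```

A resource agent that implements only the retrieval primitives would never call may_access with want_write set, which is why read access to the files is sufficient in that case.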
It is the flexibility and extensibility of the DIM primitives and their associated identifiers that allow the primitives from within the same facet to be implemented across different data formats, communication protocols and systems; specific restrictions are made known and available to DIM agents through the registration procedure outlined in subsection 6.2.2. For example, the integration of the WWW could have taken place by using HTTP to talk to WWW servers. However, this would have restricted the number of DIM primitives that such a resource agent could implement to just
, since HTTP does not yet support link management facilities or a security model to allow basic document management. It was felt that such a restrictive realisation of the WWW integration would not allow users to manage either their WWW documents or links effectively.
An associated drawback of the WWW is that resource agents which only support the document facet (for example, the filing system and FTP resource agents) must also support a subset of the hypermedia link facet primitives. This is because start and end anchors must be embedded within the source and destination documents. This is not a problem if both of these documents reside on WWW resource agents since they will already support the hypermedia link facet. However, for documents that are not managed by WWW resource agents, a
primitive must be invoked with the involved document resource agents. Additionally, other hypermedia link facet primitives must also be supported by document resource agents to allow links to be subsequently manipulated (such as
). Due to this increased complexity, WWW links in the DIM prototype environment can only be created and modified between documents that are managed by WWW resource agents. For the Microcosm TNG resource agent, links can occur between any type of document (regardless of whether the associated document resource agent supports the hypermedia link facet or not), since link management is fully handled by the linkbase processes within the Microcosm TNG system; documents are accessed only on retrieval, not on link creation, modification or deletion.
The resource discovery DIM template allows the user to create a DIM agent that finds a textual search term within documents or links of various resources that are distributed across domains. As can be seen from the screen shot in figure 6.6, a term can be searched for within documents, link anchor selections or both across resources of a specific type or of
type. For document searches, any resource type means all those that support the document facet
primitive and, similarly for link searches, all those that support the hypermedia link facet primitive
. Resources can be targeted individually by specifying a list that consists of a resource name and facet pair, such as
which represents the WWW resource and its document facet.
The second half of the template determines the scope of the DIM agent, that is, the freedom it can express in visiting other domains to search for matches and how it should decide when it has completed its objective. In the example, this resource discovery DIM agent will search through all of the document facet-supporting resource agents within the domains of roobarb and penelope , and will consider its objective completed when it has visited and searched these two domains. any domains means that the resource discovery agent is free to choose the domains that it will visit according to the criteria specified in subsection 6.3.2. In this mode of operation, the resource discovery agent can be left executing until it is recalled, that is, until the user selects the eject button on the agent's control panel (see subsection 6.3.3). Other objective-completion mechanisms include:
When the resource discovery agent has been created and launched, its progress can be monitored through the agent notification system and influenced through the agent control panel. DIM agents are subsequently referenced by a name that the user assigns, which can reflect the type of action that the agent is undertaking.
Searching performed by the search primitives of all resource agents is based upon an indexer and search engine called Swish (footnote 37). For the document search primitive, Swish pre-indexes all of the documents that the resource agent manages; for the link search primitive, Swish either indexes a link database for Microcosm TNG resource agents or applies a restricted indexing algorithm for WWW resource agents (based upon identifying the start and end anchors within a document). Swish supports regular expressions and logic expressions, so search terms specified within the resource discovery DIM template can be made highly specialised or highly generalised.
Matching searches are ranked out of a thousand according to their perceived relevance to the original search term, which takes into account factors such as the number of times the search term appears in the document or link selection and how many words are in the document or link. To help prevent a resource agent from either exhausting all of its document or link searching limits in the first search or returning irrelevant results, a minimum rank threshold can be specified as an objective modifier (called min_rank).
Found documents and links are returned as message icons in the agent's event queue. However, it is not practical to post a message for each individual document or link found since this may generate many such messages. Instead, as the resource discovery agent performs searches on individual resources, a collective search message is posted in the agent's event queue which represents the results of that particular search. This allows a user to monitor the results of their DIM agent at each stage of its execution. Navigation assistance agents (see subsection 6.4.4) can be subsequently applied to these results to help a user process them into a manageable form and size.
The information integrity DIM template is concerned with allowing users to create DIM agents that can search for textual information across resources and insert, update or delete any such found information. The screen shot in figure 6.7 shows that, as with the resource discovery DIM template, information can be searched for across various resource types of documents, links or both.
When a match has been found, the user can specify that some information integrity action should be performed, such as inserting a new link, replacing a section of text or modifying the text within a link selection. In the example shown, a new link will be created on the selection Mountbatten in India in any resource type that supports link creation; the link destination points to a document previously found by the resource discovery agent (see figure 6.4). The creation of a new link uses the search term as the source selection and the found document as the source document of the start anchor. The other link creation parameters can be specified using the following keywords:
Sel indicates the end anchor selection. If this keyword is present, then this will cause the WWW resource agent to make an end anchor in the document specified by Doc. The Microcosm TNG resource agent only needs to make a new entry in its link database.
Dir indicates the direction of the link, either uni or bi. For bidirectional links, two links are created: one link from the source document to the destination document, and the other link from the destination document back to the source document. In the WWW resource agent, this has the effect of adding another start anchor in the destination document and a corresponding end anchor in the source document; the Microcosm TNG resource agent simply creates two link entries in its link database.
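The effect of the Dir keyword on a link database can be illustrated with a minimal sketch. It assumes a simplified link entry with four fields; the names LinkEntry and create_links are hypothetical and are not taken from the prototype, and the anchor-editing behaviour of the WWW resource agent is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LinkEntry:
    """Hypothetical link database entry: a start anchor and an end anchor."""
    source_doc: str
    source_sel: Optional[str]
    dest_doc: str
    dest_sel: Optional[str]  # None when no Sel keyword was given


def create_links(source_doc: str, source_sel: str, dest_doc: str,
                 dest_sel: Optional[str] = None,
                 direction: str = "uni") -> List[LinkEntry]:
    """Return the link entries implied by a uni or bi direction."""
    if direction not in ("uni", "bi"):
        raise ValueError("direction must be 'uni' or 'bi'")
    # The forward link always exists.
    links = [LinkEntry(source_doc, source_sel, dest_doc, dest_sel)]
    if direction == "bi":
        # A bidirectional link is stored as a second, reversed entry.
        links.append(LinkEntry(dest_doc, dest_sel, source_doc, source_sel))
    return links
```

In this sketch a bidirectional request yields two independent entries, which mirrors the description of the Microcosm TNG resource agent creating two link entries in its link database.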
The final section of the user interface defines the scope and any limitations of the information integrity agent, which is similar in specification to the resource discovery DIM template. The extent to which an information integrity agent can make its manipulations is directly linked to the resource specification. For example, the user may wish to create a link on every occurrence of their name across all documents within a single WWW resource, alter one phrase in a particular document, or insert a piece of text in every document that the information integrity agent ever encounters. The first two scenarios have fixed and finite objectives, but the last is much more open-ended since the DIM agent effects changes wherever it can (some actions may be prohibited due to security restrictions). Therefore, the power of the information integrity DIM template lies in the fact that its domain of applicability can be made highly specific or highly general, according to the intentions of the user.
Link structure. The DIM agent builds a structure of all of the links leading from a start document or link to a specified depth. This helps to give a user an overview of a particular hypermedia network.
Link assertion. The DIM agent asserts all of the links leading from a start document or link to a specified depth and determines which links can be followed and which links are potentially dangling. This helps to give a user an overview of the status of a particular portion of a hypermedia network.
Information overload management. The DIM agent takes the separate search results from a resource discovery DIM agent and combines these into an overall result which is organised according to a set of specific criteria. This helps to protect a user from the potential information overload from resource discovery agents.
The screen shot in figure 6.8 shows the user interface of the three navigation assistance DIM agents that can be customised and created. The top-most window represents the link structure and link assertion agents and the bottom-most window represents the information overload management agent. The initial start point of both the link structure and link assertion agents can be specified as a document or link, and the depth can either be a numeric value or a keyword. A numeric value indicates that links should be followed to the maximum depth specified; as the agent follows a link, an internal counter is incremented and when this counter equals the maximum depth, the agent backtracks to the next link. Alternatively, a keyword specification can be given, which restricts the agent to following only those links within the current resource agent or within resource agents of the current domain; links can be followed to any depth so long as they do not point outside the remit of the resource agent or the domain. When all links have been investigated to the required depth, the navigation assistance agent posts its results to its event queue in the agent notification system. The results are given as illustrated in figure 6.9; each structure component is represented by an icon denoting its type and is indented according to its depth in relation to the start document or link.
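The depth-limited traversal described above can be sketched as follows. This is an illustration only: the link network is assumed to be a dictionary mapping each document to the documents it links to, the name link_structure is hypothetical, and the visited set (which prevents the agent from looping on cyclic links) is an assumption that the text does not state.

```python
from typing import Callable, Dict, List, Optional, Tuple


def link_structure(links: Dict[str, List[str]], start: str,
                   max_depth: Optional[int] = None,
                   in_scope: Optional[Callable[[str], bool]] = None
                   ) -> List[Tuple[int, str]]:
    """Return (depth, node) pairs for everything reachable from start.

    max_depth models a numeric depth; in_scope models a keyword depth
    that confines the agent to the current resource or domain.
    """
    result: List[Tuple[int, str]] = []
    visited = set()  # assumption: each node is reported once

    def visit(node: str, depth: int) -> None:
        result.append((depth, node))
        visited.add(node)
        if max_depth is not None and depth == max_depth:
            return  # counter has reached the maximum depth: backtrack
        for target in links.get(node, []):
            if target in visited:
                continue
            if in_scope is not None and not in_scope(target):
                continue  # link points outside the permitted remit
            visit(target, depth + 1)

    visit(start, 0)
    return result
```

The depth value in each pair corresponds to the indentation used when the structure is displayed to the user.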
The information overload management DIM template allows the user to aggregate the results of an executing resource discovery agent according to some specified guidelines. In figure 6.8, the results of the Find Mountbatten resource discovery agent will be aggregated by taking the top 50 results which have a ranked relevance of 850 or over. To achieve this, the overload management agent takes all of the results that have been posted by the resource discovery agent and combines them into a list sorted in order of relevance. From this list, the overload management agent can perform a number of aggregation methods to allow the user to extract the results that they want. Both top and rel can be used to instruct the overload management agent to extract a slice of results, such as rel=>600 and <=850. Aggregation methods are performed in a strict left to right order, each generating an intermediate set of results upon which the next method is performed; this may cause some of the methods to be compromised. In the example given in figure 6.8, the top 50 out of the combined results will be extracted and then only those with a relevance of 850 or more will be selected, which may result in fewer than 50 final results. The results of the overload management agent are posted to its agent event queue as a link found message. The user can view the list of aggregated results and select the documents to be subsequently retrieved and displayed.
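The left to right application of aggregation methods can be shown with a short sketch. The keywords top and rel come from the template; the function aggregate, the (identifier, rank) result tuples and the inclusive bounds on rel are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Result = Tuple[str, int]  # (identifier, rank out of 1000)
Method = Callable[[List[Result]], List[Result]]


def top(n: int) -> Method:
    """Keep the first n results of the current intermediate list."""
    return lambda results: results[:n]


def rel(low: int = 0, high: int = 1000) -> Method:
    """Keep results whose rank lies in the inclusive range [low, high]."""
    return lambda results: [r for r in results if low <= r[1] <= high]


def aggregate(result_sets: Iterable[List[Result]],
              methods: List[Method]) -> List[Result]:
    """Combine separate search results, sort by rank, apply methods in order."""
    combined = sorted((r for rs in result_sets for r in rs),
                      key=lambda r: r[1], reverse=True)
    for method in methods:  # strict left to right order
        combined = method(combined)
    return combined
```

Because each method operates on the intermediate list left by its predecessor, top(50) followed by rel(low=850) can return fewer than 50 results, whereas rel(low=850) followed by top(50) returns up to 50 results that all meet the threshold.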
The overload management agent can be assigned to any existing resource discovery agent, whether or not it has completed its task; it simply operates on the results that have been returned so far. For a resource discovery agent that is to continue executing until it is recalled, the user can repeatedly apply an overload management agent to determine whether a suitable set of results has been generated. Once the user is satisfied, the resource discovery agent can be recalled and a final aggregation derived.
The following subsections describe the life of an example DIM agent, from specification and creation through to operation and eventual termination. The domain and resource configuration to which this example relates was detailed in section 6.4 and given diagrammatically in figure 6.5.
For the purposes of this example, an Information Integrity Preserver DIM agent will be created and its progress followed. This agent, called Update Mountbatten, is specified through the Information Integrity DIM template (see subsection 6.4.3), a screenshot of which is given in figure 6.10. The information that has been entered into this template can be read as follows:
"Visit any domain and search through all resource types which contain documents and replace every instance of the text Mountbatten with the text Earl Mountbatten." (footnote 38)
The specification of the Update Mountbatten agent is therefore quite liberal and could possibly reach a wide range of documents; the ability of the agent to effect changes upon those documents will ultimately be determined by the security restrictions enforced by the domain agents and resource agents of each domain. For the purposes of this example, the Update Mountbatten agent will be created in the domain of penfold.ecs.soton.ac.uk; the domain agent of this domain is implicitly aware of the existence of the other two domains, roobarb.ecs.soton.ac.uk and penelope.ecs.soton.ac.uk. With regard to the sample domain and resource configuration given in figure 6.5, we can predict that the Update Mountbatten agent should have interactions with all of the resource agents in the three domains, since they all support all of the primitives of the document facet (some also support the primitives of the hypermedia link facet, but these are not within the scope of this agent).
When the Update Mountbatten agent has been created, it is launched within the local domain of penfold in the manner described in chapter 5. Once it has been initialised and set up, the agent behaves according to the interactions given in figure 6.11. This diagram shows the major messages that are sent between the agent and the local domain agent, the local resource agents and the remote domain agents.
The behaviour of the agent centres around the use of three document facet primitives, which are used to search for, retrieve and store documents. To begin with, the Update Mountbatten agent registers these three primitives as requests with the local domain agent for any resource (i). The domain agent adds these entries to its registration database so that enquiring agents can determine what requests and services are available within the domain. Prior to this, the respective resource agents within the three domains will also have registered their supported primitives (namely, all of the document facet) as services with their local domain agents. This is the basic mechanism by which agents can find out about other agents.
Since the resource agents are essentially only reactive in this prototype example, the Update Mountbatten agent must query the local domain agent to find services to match its requests. It does this by sending a query message to the domain agent to ask it to interrogate its registration database (ii). The body of the message contains the following search criteria:
Notice that these three primitives are not being queried as to whether they belong to a specific resource or not. This means that when the domain agent searches through its registration database, every resource which has registered at least these three document facet primitives will match. These matches are subsequently returned to the Update Mountbatten agent as a list of agent locations (effectively, APRIL handles).
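The matching rule described above, in which a resource matches if it has registered at least the requested primitives, can be sketched as a subset test. The registration database is assumed here to be a dictionary from agent handle to the set of registered primitives; the function name match_services and the primitive names search, retrieve and store are hypothetical placeholders, not the identifiers used in the prototype.

```python
from typing import Dict, Iterable, List, Set


def match_services(registrations: Dict[str, Set[str]],
                   requested: Iterable[str]) -> List[str]:
    """Return the handles of agents offering every requested primitive."""
    wanted = set(requested)
    # A resource matches if the wanted primitives are a subset of
    # those it has registered; extra primitives do not matter.
    return [handle for handle, offered in registrations.items()
            if wanted <= offered]
```

A resource agent that also registers hypermedia link primitives still matches, since only the presence of the requested document facet primitives is checked.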
In the example given, the Update Mountbatten agent will initially find out about the resource agents within the domain of penfold. From here, it sends a search request to each of these resource agents, with the search term Mountbatten as the contents of the message (iii). Each resource agent invokes Swish, which has previously indexed all of the documents that it contains, to derive a list of basic document identifiers which represents the documents that have matched the search criteria. These are returned to the Update Mountbatten agent, which iterates over all of the found identifiers and issues a retrieval primitive to obtain the contents of each document (iv). For each of these requests, the resource agent populates the appropriate field of the basic document identifier with the actual document contents and returns the entire structure back to the Update Mountbatten agent.
Upon receipt of each populated basic document identifier, the Update Mountbatten agent searches through the contents of the document and alters every instance of Mountbatten to Earl Mountbatten. The new document is stored back in the appropriate resource agent with a storage primitive (v). Notice that the interaction between the agents happens completely asynchronously; this allows each agent to interleave many messages and thus not be left waiting on the results from particular resource agents.
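The search, retrieve, alter and store cycle can be sketched with Python's asyncio, which stands in here for the asynchronous message passing of the prototype. The ResourceAgent class, its in-memory document store and the method names search, retrieve and store are assumptions made for illustration; the prototype itself exchanges typed messages between APRIL processes and uses Swish for searching.

```python
import asyncio
from typing import Dict, List


class ResourceAgent:
    """Hypothetical in-memory stand-in for a document resource agent."""

    def __init__(self, documents: Dict[str, str]) -> None:
        self.documents = dict(documents)

    async def search(self, term: str) -> List[str]:
        # A plain substring test replaces the Swish index lookup.
        return [doc_id for doc_id, text in self.documents.items()
                if term in text]

    async def retrieve(self, doc_id: str) -> str:
        return self.documents[doc_id]

    async def store(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text


async def update_resource(agent: ResourceAgent, old: str, new: str) -> int:
    """Alter every matching document held by one resource agent."""
    changed = 0
    for doc_id in await agent.search(old):
        text = await agent.retrieve(doc_id)
        await agent.store(doc_id, text.replace(old, new))
        changed += 1
    return changed


async def update_all(agents: List[ResourceAgent], old: str, new: str) -> int:
    """Process all resource agents concurrently and count altered documents."""
    counts = await asyncio.gather(
        *(update_resource(agent, old, new) for agent in agents))
    return sum(counts)
```

Running the per-resource tasks through asyncio.gather means that a slow resource agent does not hold up work on the others, which is the property the asynchronous interaction is intended to provide.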
Once the Update Mountbatten agent has processed all of the results from the resource agents within the current domain, it queries the domain agent for a list of known domains (not shown in figure 6.11). The agent sends the same query as initially sent to the local domain agent to each of the domain agents in the known domains list (vi), excluding those that the agent already knows about or has previously visited. The number of matches returned by each domain agent determines the order in which the domains are visited. In the example, roobarb and penelope both have two resource agents that can process the relevant document facet primitives. In this case, the Update Mountbatten agent chooses the first in the known domains list (roobarb), unregisters its requests from the local domain agent (vii) and migrates there (viii).
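The choice of the next domain can be sketched as a stable sort. The function name order_domains is hypothetical, and the assumption that domains with more matches are visited first is an interpretation of the text; ties keep their position in the known domains list, as in the roobarb and penelope example.

```python
from typing import Dict, Iterable, List


def order_domains(known_domains: List[str],
                  match_counts: Dict[str, int],
                  visited: Iterable[str]) -> List[str]:
    """Order unvisited domains by number of matching resource agents."""
    seen = set(visited)
    candidates = [d for d in known_domains if d not in seen]
    # sorted() is stable, so equal counts keep known-list order.
    return sorted(candidates,
                  key=lambda d: match_counts.get(d, 0),
                  reverse=True)
```

With two matches each for roobarb and penelope, the sketch returns roobarb first because it appears first in the known domains list.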
The operation of the Update Mountbatten agent at roobarb (and later at penelope) is the same as described previously for penfold. When the agent has migrated to both of these domains and it finds that it has no more domains left to visit (footnote 39), a query for instruction is sent to the agent's ANS. This happens because the agent has been given general instructions that have become questionable; clarification is required before the agent can proceed. Figure 6.12 shows a screenshot of the ANS which has been receiving messages from the Update Mountbatten agent at each stage of its operation. The icons in the centre window represent (from left to right):
agent objective; registration with penfold; document alteration; document alteration; unregistration from penfold; migration to roobarb; registration with roobarb; unregistration from roobarb; migration to penelope; registration with penelope; document alteration (shown); instruction query (shown).
From this, it is clear that the Update Mountbatten agent made changes in three documents: two in the domain of penfold and one in the domain of penelope. In fact, twelve replacements were made in the Mountbatten Biography document at penelope (shown in the top-most window). The final icon, the instruction query, means that the Update Mountbatten agent has moved from an active status to a waiting status, that is, waiting for more domains to visit to be made available. From the Agent Query Panel (the bottom-most window), the user can either enter more domains and press the 'play' button to tell the agent to carry on executing, or they can recall the agent (the 'eject' button) or terminate it (the 'skull and crossbones' button).
The use of the facets of manipulation and their associated DIM primitives as the essential building blocks of the DIM environment has helped to facilitate uniform access to documents, hypermedia links and user interface presentations, effectively raising their profile to first-class value status. The advantage of such an approach to building functionality into the environment is in the way that a new object can be flexibly integrated by developing a specific facet and object-identifier structure to encapsulate its feature-set and characteristics. Once this has been implemented within the respective resource agents, DIM agents can exploit the features of this object through the primitive set to help further their objectives.
The DIM templates that have been developed help to show the potential functionality that can be expressed within a DIM environment that comprises distributed information resources. Users can create and launch new DIM tasks and then either leave them to operate in the background or influence their behaviour when certain types of events occur. This helps to promote an environment that is based around the indirect management of a user's resources, rather than a direct manipulation metaphor. As the user's sphere of influence grows to encompass larger groups of distributed information resources, so they will have to delegate the general management of this information to their DIM agents. However, as Chavez et al. (Chavez et al., 1997) note, the operation of these agents needs to be easily understood and managed by the user, otherwise they will not be used or trusted. The simplicity of the DIM primitives makes DIM agents easy to understand and their behaviour easy to predict.
In essence, the prototype DIM environment has shown that the integration of distributed information resources can be achieved in a way that was hinted at, but not previously realised, by open hypermedia systems: each individual aspect of the open hypermedia system is functionally separate and is used to manage its respective object. This advocates a move from open hypermedia systems inherently providing an integrating service between many different resource management systems to a separate communication infrastructure and execution environment that can handle objects, such as links and documents, in a uniform and extensible manner.
This thesis has advocated a different approach to building distributed information management systems from that traditionally proposed and embodied by open hypermedia systems. The technology used to realise the work in this thesis has been based around a layered model in which various services are presented based upon a mobile agent metaphor.
Agent information representation and communication. Through the use of hierarchical messages and the EBNF database, agents have a consistent way of organising their own internal information and the information that they can communicate to other agents. Typed messages allow agents to converse in a standard manner and are similar in nature to KQML performatives, except that they directly express a processing connotation (whereas performatives indicate the manner in which the contents should be interpreted) and have an associated syntactic structure. De facto standards such as KQML and KIF were not used within the implementation primarily due to the fact that they were designed to package and express knowledge between heterogeneous agent-based systems. Since all of the agents within the architecture and the DIM environment are essentially homogeneous, the virtues of KQML and KIF are not explicitly required. However, such models of information communication and interchange would be required to allow agents within the architecture to communicate with agents in other agent-based systems. In such a case, there is no reason why typed messages cannot be expressed in terms of KQML performatives with hierarchical messages as the message content.
Intermediate data size versus mobile agent size. One of the potential advantages of mobile agents resides in their ability to help reduce intermediate network traffic by moving closer to a resource and preprocessing intermediate data there. Counter to this is the problem of the actual size of the mobile agent (in terms of its code and data) being transferred across the network. It can be argued that statically programmed mobile agents reach a point of diminishing returns with regard to their overall efficiency, that is, it may be more efficient in certain cases to transmit intermediate results across the network than to send a mobile agent to perform the same task. The mobile agent architecture supports dynamically programmable agents and loadable handler functions to help prevent this problem; a mobile agent can unload all of its dynamic handler functions before migration and retrieve them as and when they are required in the new domain. An alternative solution is to pre-distribute libraries of handler functions to each domain agent that the mobile agent is to visit, which affords a domain more security control over the code that can be executed. However, this can present a number of problems for itinerant mobile agents. Firstly, a mobile agent may not be aware of all of the domains that it wishes to visit and thus pre-distribution of non-standard libraries is not possible. Secondly, if the domain does not support the set of handler functions required by a mobile agent, then it cannot migrate there. This thesis advocates an approach where there may be pre-distributed libraries available to the mobile agent within a remote domain, but the actual usage of these handler functions remains the ultimate decision of the mobile agent (and, hence, the mobile agent programmer).
Thus, a mobile agent can be specified in terms of required robustness; the more code that is statically included in the agent, the less reliant it is upon pre-distributed libraries, but the larger it potentially is than its dynamic counterpart.
Fault-tolerant agent migration. Agent migration at the mobile agent system level (see chapter 4) is fundamentally different from agent migration at the mobile agent architecture level (see chapter 5). The former is primarily concerned with suspending an agent on a given node, transferring various components of the program context to a remote site and then restarting execution on a suitable node. The latter additionally involves the negotiation of transfer between the two domains and the preservation of the mobile agent in the event of a failure in the migration process. It is functionally more complex but affords mobile agents greater reliability, since a mobile agent can revert to a previous instance of itself, and helps to make migration as transparent as possible.
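The idea of reverting to a previous instance after a failed migration can be sketched as a snapshot taken before transfer. This is a simplified illustration and not the negotiation protocol of chapter 5: the agent is assumed to be a plain dictionary of state, and MigrationError, migrate and the transfer callback are hypothetical names.

```python
import copy
from typing import Any, Callable, Dict, Tuple

AgentState = Dict[str, Any]


class MigrationError(Exception):
    """Raised by a transfer function when migration cannot complete."""


def migrate(state: AgentState, destination: str,
            transfer: Callable[[AgentState, str], None]
            ) -> Tuple[AgentState, bool]:
    """Attempt a migration; on failure return the preserved instance."""
    snapshot = copy.deepcopy(state)  # previous instance kept for recovery
    try:
        transfer(state, destination)
    except MigrationError:
        # Discard the possibly half-transferred state and revert.
        return snapshot, False
    state["domain"] = destination
    return state, True
```

The caller always receives a usable agent state together with a flag that records whether the move succeeded, so a failed transfer leaves the agent as it was before migration began.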
With regard to the design of the mobile agent architecture, the functional decomposition of each of the architectural agents operated successfully, with the possible exception of the domain agent. The manager agent (see chapter 5) was developed and introduced into the architecture as a surrogate for some of the functionality of the domain agent in order to help alleviate part of its burden. This division was made by splitting the control of agents between the manager agent and the domain agent: the domain agent was still responsible for making the decisions regarding agents, but the manager agent actually executed the decisions. A potentially cleaner solution would have been to move the registration database out into a separate process, since this is a fundamental and heavily used feature of the DIM environment. This would have meant that agent management and control remained with the domain agent instead of being split across two agents.
The development and implementation of the mobile agent architecture has been made with a view to genericity and reusability, and as such, is being considered as the basis of an agent infrastructure for a number of projects which involve agents (Dale et al., 1997). This work has concentrated on generalising the services at each layer of the layered model to ensure that they can support modularity and extensibility.
The second part of the work in this thesis has centred around the development of a prototype system in which DIM tasks can be carried out by mobile agents in a resource-based environment (see chapter 6). Traditional open hypermedia systems have mixed hypermedia and communication functionality; they have concentrated on providing hypermedia functionality in and across documents using user interface presentations. In the DIM environment that has been prototyped, the primary objects of an open hypermedia system have been kept distinct from the communication model of the mobile agent architecture and have been implemented as separate services in their own right; these services are represented by existing applications which can support their functionality.
However, although this level of abstraction allows DIM agents to operate uniformly on objects within the environment, it is not without some inherent drawbacks. The facets of manipulation are determined through an aggregation of the functionality of the resource management systems which they represent; for example, the document management facet contains the most common primitives for manipulating documents, some of which only have meaning specifically within a document context. This has the repercussion that the maximum level of functionality that a resource management system can express within the environment is dictated by the DIM primitives. Systems with less functionality simply do not implement some of the primitives specified by the facet; a DIM agent uses the resource registration and matching facilities to determine which primitives are actually provided. Currently, there is no way to encompass any extra functionality of a resource management system, other than introducing new primitives for the specific facet. This is not an ideal solution, since programming a DIM agent is based upon the primitives that are known at the time; DIM agents are not able to make any choices regarding which primitives to use since these are hard coded into their behaviour. Future versions of this research should look at providing semantic descriptions of each primitive, possibly through the use of an ontological description language (Gruber, 1991), so that the name of a primitive is divorced from its actual function. This would allow an agent to reason about sets of similar DIM primitives and decide which was most suitable in helping to achieve its objective.
The prototype DIM agents within the DIM environment are not sufficient in themselves to deliver the information management required by distributed information resources, since they only offer a limited functionality set. Moreover, from the research conducted in this thesis, it is not clear whether all information management functions can be expressed under the four categories identified in chapter 2. A real evaluation, involving user trials and feedback, would need to be conducted to determine the kind of DIM actions that users perform on a day-to-day basis and the kind of information resources that they regularly use. This information could be used not only to design new DIM templates, but also to develop new DIM primitives to fill the functional gaps in the resource agents.
These achievements serve to reinforce the original statements made in chapter 1, namely that this thesis offers a study of mobility within distributed information management which has been fulfilled through the amalgamation of three technologies: agency, mobility and open hypermedia.
The extensible and flexible nature of both the mobile agent architecture and the DIM prototype environment means that they are well suited to being developed as future research. This chapter describes some possibilities for taking forward the ideas presented in this thesis.
The resource agents within the DIM environment show how integration can be achieved with other heterogeneous information management systems and legacy systems. However, as with open hypermedia, the agent community is also looking at ways to make agent-based systems interoperable.
In one respect, other agent-based systems can be treated as legacy systems which can be abstracted and represented through appropriate resource agents. This has the advantage that data format and communication protocol conversions can happen within the resource agent, but it can also introduce difficulties in communication. This is essentially due to the fact that one agent-based system adopts a client stance and the other adopts a server stance; the model must also work in reverse to allow peer-to-peer communication. Additionally, such an approach can place a heavy conversion burden on the resource agent and cause a communication bottleneck.
The question of peer-to-peer interoperability, where agents are free to converse across agent-based system boundaries, is still an area of on-going research and debate. Standards such as the Mobile Agents Facility (MAF) specification are helping to define a common execution environment in which mobile agents can exist and be controlled, and the Knowledge Sharing Effort (KSE) is helping to define standards for sharing and reusing knowledge across heterogeneous agents.
In terms of the mobile agent architecture presented in this thesis, peer-to-peer interoperability can be expressed at the Agent Services layer (see chapter 5), where compatibility with both MAF and KSE data formats and protocols can be expressed as separate services. Subsequently, these facilities can be realised at the mobile agent architecture level as a new form of architectural agent. Alternatively, other agents at the DIM environment layer could provide these facilities as internal services. The integration of KQML as a basic transport protocol could help to achieve a slightly higher level of interoperability between agents, for example, by wrapping KQML protocol fields around hierarchical messages. However, the question of semantic conversion between general forms of content is still a problem (Ginsberg, 1991).
Another area where the mobile agent architecture could be extended is in its adoption of a common security model that is independent of any underlying security policies and access permissions. An example of such an independent system is the Open Architecture for Secure Interworking Services (OASIS) (Hayton, 1996), developed at the Computer Laboratory within the University of Cambridge. It has been common in the past to concentrate on enforcing access policies when considering internetworked security mechanisms, such as associating a particular request for a service with a given user identity and using this as the only parameter in deciding whether to grant or deny access. However, OASIS provides an approach that is much more flexible since it is based around client naming. Basically, a client obtains a name issued by a service provider which is based upon credentials held by the client or those which have been delegated from another client. A grammar has been developed that describes the conditions under which a name can be issued and the conditions under which it can be revoked, both of which are held by the server. This greater freedom allows complex security policies to be developed that can be derived from different forms of client credentials; the alternative would be to force all clients to adopt the same set and form of credentials (such as a user name).
The use of such a security model could be applied to the resource agents at the Agent Services layer; the resource agents become aware of how to specify and resolve policies according to the grammar specified by OASIS. This would free the resource agents from their current restriction of only basing access upon UNIX user names and access permissions.
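The idea of issuing a name on the basis of credentials, and revoking it when a supporting credential is withdrawn, can be sketched as follows. This is a simplified illustration and not the OASIS grammar or API: the class names, the rule table mapping a name to the credential kinds it requires, and the example credentials are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    kind: str    # hypothetical kinds, e.g. "unix-user" or "project-member"
    value: str


@dataclass
class IssuedName:
    name: str
    client: str
    based_on: frozenset   # the credentials that justified issuing the name
    valid: bool = True


class NamingService:
    """Toy service provider that holds both issue and revocation conditions."""

    def __init__(self, rules):
        # rules maps a name to the set of credential kinds required for it.
        self.rules = rules
        self.issued = []

    def issue(self, client, name, credentials):
        required = self.rules.get(name)
        if required is None:
            raise PermissionError(f"no rule allows the name {name!r}")
        held = {cred.kind for cred in credentials}
        if not required <= held:
            raise PermissionError(f"{client} lacks credentials for {name!r}")
        support = frozenset(c for c in credentials if c.kind in required)
        issued = IssuedName(name, client, support)
        self.issued.append(issued)
        return issued

    def revoke_credential(self, credential):
        # Revocation condition: a name lapses when any supporting credential does.
        for issued in self.issued:
            if credential in issued.based_on:
                issued.valid = False


if __name__ == "__main__":
    service = NamingService({"link-editor": {"unix-user", "project-member"}})
    user = Credential("unix-user", "alice")
    member = Credential("project-member", "dim-project")
    name = service.issue("alice", "link-editor", [user, member])
    service.revoke_credential(member)
    print(name.name, "valid:", name.valid)
```

A resource agent using this kind of scheme would check the validity of the presented name rather than the UNIX user name directly, so different clients could qualify for the same name through different credentials.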
The functionality of the DIM environment is expressed in terms of the facets of manipulation and the DIM primitives that they support. Chapter 6 described the implementation of three core facets that represent the basic objects and services that can be offered to DIM agents; the single-media document facet, the hypermedia link facet and the user interface presentation facet. However, since the design of the environment inherits the flexibility and extensibility of the mobile agent architecture, new facets can be added to permit new functionality.
Figure 7.1 shows how three complementary facets could be built on top of the existing core facets to supply some functionality normally attributed to information management systems. Example facets to extend the functionality of the DIM environment could include:
A composite facet would allow composite and structure objects to be created and managed. A form of composite object can already be created within the prototype DIM environment through the use of meta-identifiers (see chapter 6), but the management of these objects must be handled by the individual DIM agents. The presence of a composite facet would allow resource agents to offer composite DIM primitives which would manage the composite objects on behalf of DIM agents. A composite facet could work for any type of core facet, such as composite documents, composite hypermedia links and composite user interface presentations; each represented through a composite identifier. For example, a composite document would be composed of several different single-media documents organised according to some structure, such as spatial relationships. The DIM primitives would allow a composite document to be created and destroyed, and also allow documents to be added to, removed from and manipulated within a composite document. A composite hypermedia link could represent the result of a link query or a set of links associated by a particular context; a composite presentation could represent the individual user interface presentations of a user or agent.
A cooperative facet would allow core objects within the DIM environment to be shared between multiple users. The DIM primitives of the present core facets do not contain any mechanisms for explicitly sharing objects between multiple users or for controlling access to those shared objects. The presence of such a facet would allow resource agents to offer sharing, concurrency and transaction control, and notification of changes to documents, hypermedia links and user interface presentations. It would also provide the lowest layer of operation for a Computer Supported Cooperative Working (CSCW) environment (Melly, 1995).
A versioning facet would allow the history of an object to be maintained through each instance of change. Each version of an object would be held and managed by the resource agent containing the original object and each instance of the object would carry a version number in its identifier. Versioning is a useful tool in helping to ensure that identifiers either point to the original version or the latest version of the object, that an audit trail of alterations can be derived, and to provide a level of safety for the users; an existing version of an object cannot be changed, only a new version created. This may lead to the creation of version trees; instances of a new version of an object that are created and used in parallel to the main version. A system such as the Revision Control System (RCS) (Tichy, 1982) could be used to implement version management.
Since objects within the DIM environment are represented through identifiers, these three new facets can work across all of the original core facets in a uniform manner; the type of the identifier determines which core facet primitives are invoked. Further layered facets of manipulation can be built on top of these, thus allowing more complex management operations which work upon the core objects.
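To make the layering concrete, the following sketch shows a versioning facet that sits on top of stand-in core facets and uses the type of an identifier to decide which core facet is invoked. It is only an illustration under stated assumptions: the Identifier fields, the CoreFacet and VersioningFacet classes and their method names are invented for this example and are not the DIM primitives of chapter 6, and the history is kept linear (no version trees).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Identifier:
    kind: str          # assumed types: "document", "link" or "presentation"
    name: str
    version: int = 1   # the version number is carried in the identifier


class CoreFacet:
    """Stand-in for a core facet: stores object content keyed by identifier."""

    def __init__(self, kind):
        self.kind = kind
        self.store = {}

    def put(self, ident, content):
        self.store[ident] = content

    def get(self, ident):
        return self.store[ident]


class VersioningFacet:
    """Layered facet that works uniformly across all registered core facets."""

    def __init__(self, core_facets):
        self.facets = {facet.kind: facet for facet in core_facets}
        self.latest = {}   # (kind, name) -> highest version number so far

    def create(self, kind, name, content):
        ident = Identifier(kind, name, 1)
        self.facets[kind].put(ident, content)   # dispatch on identifier type
        self.latest[(kind, name)] = 1
        return ident

    def revise(self, ident, content):
        # An existing version is never changed; a new version is created.
        key = (ident.kind, ident.name)
        new_ident = replace(ident, version=self.latest[key] + 1)
        self.facets[ident.kind].put(new_ident, content)
        self.latest[key] = new_ident.version
        return new_ident

    def read(self, ident):
        return self.facets[ident.kind].get(ident)

    def latest_of(self, ident):
        return replace(ident, version=self.latest[(ident.kind, ident.name)])


if __name__ == "__main__":
    facet = VersioningFacet([CoreFacet("document"), CoreFacet("link")])
    v1 = facet.create("document", "report", "first draft")
    v2 = facet.revise(v1, "second draft")
    print(facet.read(v1), "|", facet.read(facet.latest_of(v1)))
```

Because the versioning logic never inspects the content it stores, the same code serves documents and links alike; only the identifier type selects the underlying facet.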
One of the weaknesses of the current DIM prototype is in the way that new DIM agents can be created. Behavioural specification from DIM primitives allows great freedom and flexibility in the DIM agent, but can require the agent programmer to work at a level that is not necessarily user task-oriented. The DIM templates help make the task of creating a new DIM agent easier, but are restricted in the types of tasks that they can perform. A desirable solution would maintain the flexibility and freedom provided by the primitives but also make creating a new DIM agent as easy as completing a DIM template--in effect, an agent scripting language for DIM tasks.
An original design goal of APRIL (McCabe and Clark, 1995) was to allow the core language to be flexibly extensible in order to accommodate different agent-based solutions. Operator-precedence grammar (McCabe and Clark, 1996) allows the syntactic form of new operators to be defined within the language40 and a built-in macro processor permits the semantics of the new operator to be introduced in terms of existing APRIL operators and macros41. Indeed, the APRIL language itself is represented as a series of language layers which are specified through macros; the APRIL++ object-oriented extension (Clark et al., 1996) is an example of extending the APRIL language into a new programming paradigm.
The design of such an agent scripting language should emphasise the distributed information management nature of the environment and should also offer access to the DIM primitives. One of the areas for improvement is the way in which document, hypermedia link and presentation identifiers are handled. At present it is a somewhat cumbersome operation to complete the minimum fields of a basic identifier and this could be made simpler by allowing document aliases which are retained by the user interface agent as they are discovered and used. Also, if the agent scripting language was conceptually simpler to use than specifying a DIM agent from first principles, then each of the DIM templates could be re-written and the behaviour made available to the agent programmer to augment beyond their initial input arguments.
User profiling was described in chapter 3 as the acquisition of information by an agent to determine how it can pro-actively assist the user. Some of these methods and technologies could be employed by DIM agents to anticipate and respond automatically to the needs of the information within the distributed environment of the user. Areas of application for such task anticipation include autonomously trying to find and repair dangling links, monitoring and responding to changes in information sources, and automatically propagating changes across multiple information resources. This would help to aid a user in the general management of their distributed information environment.
Sparck Jones42 introduced the term active content to describe the ascription of processes to documents; the processes are tailored to the specific details and requirements of the contents of the document and are aware of how to manage and maintain it. The processes could be represented through DIM agents which are programmed with tasks that are activated according to specific changes within the document or across other documents. These agents could also communicate between themselves to resolve information integrity and management conflicts, such as the movement of hypermedia link anchor information. For example, the agents could be responsible for informing other interested agents when a new version of a document becomes available and these agents could determine whether the new version affects their document. This form of interaction would afford DIM agents extra autonomy in their actions and would result in a higher level of indirect management for the user.
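The notification pattern described above can be sketched as a small publish and subscribe arrangement between agents. The class and method names below are hypothetical, and the integrity check (whether an anchor string still occurs in the new content) is a deliberately naive stand-in for real link anchor maintenance.

```python
class ActiveDocumentAgent:
    """Toy agent attached to one document; names and methods are illustrative."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []       # agents interested in this document
        self.watched_anchors = {}   # source document name -> anchor strings
        self.broken = []            # (source, anchor) pairs needing repair

    def watch(self, other, anchors):
        """Register interest in another agent's document and its anchors."""
        other.subscribers.append(self)
        self.watched_anchors[other.name] = list(anchors)

    def publish_new_version(self, content):
        """Inform every interested agent that a new version is available."""
        for agent in self.subscribers:
            agent.on_new_version(self.name, content)

    def on_new_version(self, source, content):
        """Decide whether the new version affects this agent's document."""
        for anchor in self.watched_anchors.get(source, []):
            if anchor not in content:
                self.broken.append((source, anchor))


if __name__ == "__main__":
    report = ActiveDocumentAgent("report")
    glossary = ActiveDocumentAgent("glossary")
    glossary.watch(report, ["mobile agent", "open hypermedia"])
    report.publish_new_version("A mobile agent migrates between hosts.")
    print(glossary.broken)
```

In a fuller design the affected agent would go on to repair or relocate the anchor itself, which is where the extra autonomy mentioned above would come from.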
The growth of internetworking that has recently taken place has demonstrated that electronic information systems are providing the user with more information than has ever been available. As more organisations and users start to take advantage of the services provided by large-scale networks such as the Internet, then not only will their needs and expectations increase but the burden on present and future systems will increase too.
It is proposed that current users are given few tools to help them create, navigate and manage their distributed information in a consistent and uniform manner; information that can be spread across wide areas and is heterogeneous in terms of its data formats and communication protocols. DIM is a term that is being commonly applied to systems which aim to unify the management of varying types of information, such as open hypermedia systems. It is suggested that if such tools as identified in chapter 2 were available, then the user would be better served by their distributed information resources.
Hypermedia systems have shown the value that is placed upon hypermedia linked information by users, specifically regarding the ability to associate documents and browse documents in an unstructured manner. Early hypermedia systems suffered from the fact that they provided such functionality in a very rigid and controlled manner; documents had to be converted into proprietary formats and hypermedia links were embedded inside documents. Additionally, they were typically monolithic in their composition which meant that their hypermedia functionality was not necessarily extensible.
Open hypermedia has been proposed as a move to help redress the problems raised by monolithic hypermedia systems. As described in chapter 2, these open hypermedia systems provide hypermedia functionality by storing links separately from documents (in link databases) and by integrating closely with desktop applications. Furthermore, open hypermedia systems such as Microcosm, have shown how hypermedia functionality can be made extensible through a collection of separate but communicating processes which can be added and removed dynamically from the system.
Since their application to distributed information resources, distributed open hypermedia systems have been considered a form of DIM, due to the fact that they manage both documents and hypermedia links. This form of management can be categorised in terms of the types of tasks that can be performed upon these objects: resource discovery, information integrity maintenance, navigation assistance and system integration. DIM tasks like these can help a user to create, disseminate and subsequently manage their information wherever it is stored, whatever data formats it uses and whichever communication protocols are required to access it. To this end, DIM relies on a high level of integration with local and networked resources and with desktop and legacy systems; it is not intended to provide this type of management activity within itself.
This thesis has shown that there is no definition of the word `agent' which is universally accepted; its meaning and behaviour are specified through the properties that it can possess and these properties can differ widely across agent-based systems. Instead, agent-based research has been presented from three different perspectives; the potential advantages and benefits that agents bring, the notions and characteristics of agency and finally, a taxonomy of agents and agent-based systems.
Agents and agent-based technologies are a product of their time--developed in a post-object-oriented period in which the large-scale network is just emerging; they inherit the principles of modularity and abstraction from the former, and interoperability and dynamism from the latter. Therefore, their perceived role in Internet and intranet environments is very much that of an assistant who helps the user manage their distributed information. Previously, computers were heavily user-driven and data was very much of a local nature; Maes contrasts this with the present situation (Maes, 1997):
The situation that a computer user faces today is completely different. Suddenly the computer is a window into a world of information, people, software... And this world is vast, unstructured, and completely dynamic. It's no longer the case that a person can be in control of this world and master it.
This has given rise to a whole new philosophy of human-computer interaction, that of indirect management (presented and discussed in chapter 3), and agents are ideally placed to help deal with the problems of information overload and management. However, environments like the Internet are inherently distributed across wide areas and require a distributed metaphor to help agents deal with them.
Mobility as a characteristic of agents, described in chapter 4, allows them to move between nodes within a network to perform server-side processing that is tailored to the original request of the user. Previous approaches, such as client-server, perform initial processing at the server and subsequent user-tailoring at the client; this can lead to intermediate information being transferred across the network unnecessarily. For this and other reasons (Chess et al., 1995; Harrison et al., 1995), mobility is a central concept to the agent architecture that has been developed and is presented in chapter 5.
The architectural agents and the services that they take advantage of provide a core environment in which further agent-based applications can be developed. The use of agents permits the architecture to exhibit flexibility, extensibility and dynamism; new services and new agents can be installed to augment the functionality of the architecture. The adoption of hierarchical messages and EBNF specifications as a basis for all data structuring and communication provides an important abstraction upon which all services within the architecture can operate--their extensible nature allows for great flexibility when specifying input and output interfaces, and their ubiquity means that agents have a common interchange format for data.
The use of APRIL has helped to bootstrap the architecture to a level where a core set of services, such as data conversion, communication between agents and constrained execution, represent the lowest level of functionality--this is usually the highest level of functionality provided by mobile agent systems. Furthermore, the architectural agents describe a habitat which is directly tailored to an Internet environment and offer a high level of abstraction on individual services and resources. This has the benefit that DIM agents have a rich environment in which to operate.
Open hypermedia systems have been proposed as an ideal integrating technology (Malcolm et al., 1991) because they are at the centre of managing a user's information. Typically, the functionality of an open hypermedia system can be split into three components: the document management service, the hypermedia link service and the presentation service. Systems such as Microcosm TNG have described the open hypermedia system as the heart of the functionality, but this thesis advocates a different view in which each of the component services that combine to deliver hypermedia functionality should be fully interchangeable. Essentially, if the distribution and communication layers are mixed with the hypermedia functionality layer (as is the case with Microcosm TNG), then it can be very difficult for the user to take advantage of another hypermedia system.
To this end, this thesis has proposed an alternate view where hypermedia functionality is handled by a separate service, in the same way that the document management and presentation functionalities are provided by separate services. It is the mobile agent architecture described in chapter 5 that provides the distribution and communication infrastructure to integrate each of the services into a cooperating whole. This relationship can be appreciated in figure 7.2, which shows the traditional approach to providing hypermedia functionality in open hypermedia systems (on the left) and the approach favoured by the DIM environment (on the right). This has the inherent advantage that it promotes interoperability not only between services of different orders, but also services of the same order; open hypermedia research has only just begun to address the area of interoperability between hypermedia systems43.
The DIM primitives described in chapter 6 provide a high level of abstraction on each of the management systems that fulfil the individual services of the hypermedia functionality. The principal advantages of such an approach are two-fold. Firstly, all objects, such as documents, hypermedia links and user interface presentations, are raised to first-class status within the environment. This means that they can be handled and manipulated in similar ways, and that they can be shared between agents as objects in their own right. Secondly, they allow a notion of process to be associated with objects within the environment; this notion is expressed through the DIM agents themselves. As Goose (Goose, 1997) summarised, one failing of existing hypermedia standards and solutions, such as the Dexter Hypertext Reference Model, is that they only address the data and protocol aspect of interoperability; the DIM primitives address data, protocol and process.
The prototype DIM environment that has been developed has shown how various information resources can be accessed in an abstracted and uniform manner to the user through the use of DIM agents. The original DIM tasks identified in chapter 2 have been realised through a number of static templates which can be customised and launched to create active agents; these agents interact with their local environment and can migrate to remote destinations to help achieve their objectives. The sample set of DIM templates demonstrates the functionality and potential of the environment, namely, that separate information resources which are distributed over large-scale networks can be integrated to provide document and hypermedia functionality, and that these DIM agents can perform management tasks (as well as resource discovery and information retrieval tasks) for their user.
The future of distributed information management lies in successfully fusing all of these individual aspects into a coherent whole that integrates as readily with desktop applications as with local and networked information resources. It is here that both stationary and mobile DIM agents can play their part; mobile agents interacting with networked resources and stationary agents conversing with desktop applications to provide a truly integrated distributed information environment.
I find that a great part of the information I have was acquired by looking up something and finding something else on the way
-- Franklin P. Adams
Armstrong, R., Freitag, D., Joachims, T. and Mitchell, T., WebWatcher: A Learning Apprentice for the World Wide Web. In: Working Notes of the AAAI Spring Symposium Series on Information Gathering from Distributed, Heterogeneous Environments, 1995.
Atkins, D. E., Birmingham, W. P., Durfee, E. H., Glover, E. J., Mullen, T., Rundensteiner, E. A., Soloway, E., Vidal, J. M., Wallace, R. and Wellman, M. P., Toward Inquiry-Based Education Through Interacting Software Agents. In: IEEE Computer, 29(5), pages 69--76, 1996.
Bates, J., Loyall, A. B. and Reilly, W. S., An Architecture for Action, Emotion, and Social Behaviour. In: Proceedings of the Fourth European Workshop on Modelling Autonomous Agents in a Multi-Agent World: Artificial Social Systems, Castelfranchi, C. and Werner, E., Eds., Lecture Notes in Artificial Intelligence, 830, pages 55--68, Springer-Verlag, July 1992.
Beitner, N. D., Microcosm++: The Development of a Loosely Coupled Object-Based Architecture for Open Hypermedia Systems. PhD Thesis, Department of Electronics and Computer Science, University of Southampton, UK, 1995.
Bertrand, F., Colaitis, F. and Léger, A., The MHEG Standard and its Relation with the Multimedia and Hypermedia Area. In: Proceedings of the IEEE Conference on Image Processing and its Application, April 1992.
Caglayan, A., Snorrason, M., Jacoby, J., Mazzu, J. and Jones, R., Lessons from Open Sesame!, a User Interface Learning Agent. In: Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 61--73, London, UK, April 1996.
Carr, L. A., DeRoure, D. C. and Hill, G. J., The Distributed Link Service: A Tool for Publishers, Authors and Readers. In: Proceedings of the Fourth International World Wide Web Conference, pages 647--656, Boston, USA, December 1995.
Castelfranchi, C., Guarantees for Autonomy in Cognitive Agent Architectures. In: Intelligent Agents: Theories, Architectures and Languages, Wooldridge, M. and Jennings, N. R., Eds., Lecture Notes in Artificial Intelligence, 890, pages 56--70, Springer-Verlag, 1995.
Chavez, A. and Maes, P., Kasbah: An Agent Marketplace for Buying and Selling Goods. In: Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 75--90, London, UK, April 1996.
Chavez, A., Dreilinger, D., Guttman, R. and Maes, P., A Real-Life Experiment in Creating an Agent Marketplace. In: Proceedings of the Second International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 159--178, London, UK, April 1997.
Clark, K. L., Skarmeas, N. and McCabe, F. G., Agents as Clonable Objects with Knowledge Base State. In: Proceedings of the Second International Conference on Multi-Agent Systems, Kyoto, Japan, December 1996.
Crowder, R. M. and Hall, W., The Use of Interactive Media as a Training and Operational Interface in the Advanced Factory. In: Proceedings of the Third International Conference on Factory 2000: Competitive Performance Through Advanced Technology, pages 106--110, York, UK, July 1992.
Cugola, G., Ghezzi, C., Picco, G. P. and Vigna, G., A Characterization of Mobility and State Distribution in Mobile Code Languages. In: Proceedings of the ECOOP Workshop on Mobile Objects, Linz, Austria, July 1996.
Dale, J. and DeRoure, D. C., Towards a Framework for Developing Mobile Agents for Managing Distributed Information Resources. Multimedia Research Laboratory Technical Report 97-1, University of Southampton, UK, 1997.
Davis, H. C., Hall, W., Heath, I., Hill, G. J. and Wilkins, R. J., Towards an Integrated Information Environment with Open Hypermedia Systems. In: Proceedings of the ACM European Conference on Hypertext, pages 181--190, Milan, Italy, November 1992.
Davis, H. C., Knight, S. J. and Hall, W., Light Hypermedia Services: A Study of Third-Party Application Integration. In: Proceedings of the ACM European Conference on Hypertext, pages 41--50, Edinburgh, Scotland, September 1994.
Dent, L., Boticario, J., McDermott, J., Mitchell, T. and Zabowski, D. A., A Personal Learning Apprentice. In: Proceedings of the Tenth National Conference on Artificial Intelligence, pages 96--103, San Jose, USA, July 1992.
DeRoure, D. C., Hall, W., Davis, H. C. and Dale, J., Agents for Distributed Multimedia Information Management. In: Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 91--102, London, UK, April 1996.
Falk, A. and Jonsson, I., PAWS: An Agent for WWW Retrieval and Filtering. In: Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 169--179, London, UK, April 1996.
Fisher, M., A Survey of Concurrent METATEM--the Language and its Applications. In: Proceedings of the First International Conference on Temporal Logic, Gabbay, D. M. and Ohlbach, H. J., Eds., Lecture Notes in Artificial Intelligence, 827, pages 480--505, Springer-Verlag, 1994.
Foner, L. N., A Multi-Agent Referral System for Matchmaking. In: Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 245--261, London, UK, April 1996.
Fountain, A., Hall, W., Heath, I. and Davis, H. C., Microcosm: An Open Model for Hypermedia with Dynamic Linking. In: Proceedings of the ACM European Conference on Hypertext, pages 298--311, Paris, France, November 1990.
Franklin, S. and Graesser, A., Is it an Agent, or Just a Program?: A Taxonomy for Autonomous Agents. In: Proceedings of the Third International Workshop on Agent Theories, Architectures and Languages, Budapest, Hungary, August 1996.
Geddis, D. F., Genesereth, M. R., Keller, A. M. and Singh, N. P., Infomaster: A Virtual Information System. In: Proceedings of the CIKM Workshop on Intelligent Information Agents, Baltimore, USA, 1995.
Goose, S., Dale, J., Hill, G. J., DeRoure, D. C. and Hall, W., An Open Framework for Integrating Widely Distributed Hypermedia Resources. In: Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems, pages 364--371, Hiroshima, Japan, June 1996.
Gray, R. S., Agent TCL: A Transportable Agent System. In: Proceedings of the CIKM Workshop on Intelligent Information Agents, Fourth International Conference on Information and Knowledge Management, Mayfield, J. and Finin, T., Eds., Baltimore, USA, December 1995.
Green, C. L., Bayer, D. and Edwards, P., Towards Practical Interface Agents which Manage Internet-Based Information. In: Proceedings of the BCS-SGES/DTI-ISIP Intelligent Agents Workshop, Oxford Brookes University, November 1995.
Grønbæk, K. and Trigg, R. H., Towards a Dexter-Based Model for Open Hypermedia: Unifying Embedded References and Link Objects. In: Proceedings of the ACM Conference on Hypertext, pages 149--160, Washington DC, USA, March 1996.
Gruber, T. R., The Role of Common Ontology in Achieving Shareable, Reusable Knowledge Bases. In: Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, pages 601--602, 1991b.
Hohl, F., Baumann, J. and Straßer, M., Beyond Java: Merging CORBA-Based Mobile Agents and WWW. In: Proceedings of the Joint W3C/OMG Workshop on Distributed Object and Mobile Code, Boston, USA, June 1996.
Hoyle, M. A. and Lueg, C., Open Sesame!: A Look at Personal Assistants. In: Proceedings of the Second International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 51--60, London, UK, April 1997.
Jennings, N. R., Specification and Implementation of a Belief Desire Joint-Intention Architecture for Collaborative Problem Solving. In: Journal of Intelligent and Cooperative Information Systems, 2(3), pages 289--318, 1993.
Jennings, N. R., Varga, L. Z., Aarnts, R. P., Fuchs, J. and Skarek, P., Transforming Standalone Expert Systems into a Community of Cooperating Agents. In: The International Journal of Engineering Applications of Artificial Intelligence, 6(4), pages 317--331, 1993.
Johansen, D., van Renesse, R. and Schneider, F. B., Operating System Support for Mobile Agents. In: Proceedings of the Fifth IEEE Workshop on Hot Topics in Operating Systems, pages 42--45, Orcas Island, USA, May 1995.
Kozierok, R. and Maes, P., A Learning Interface Agent for Scheduling Meetings. In: Proceedings of the ACM-SIGCHI International Workshop on Intelligent User Interfaces, pages 81--88, Florida, USA, 1993.
Laurel, B., Oren, T. and Don, A., Issues in Multimedia Interface Design: Media Integration and Interface Agents. In: Proceedings of the CHI Conference on Human Factors in Computer Systems, pages 113--139, Seattle, USA, April 1990.
Levy, A. Y., Sagiv, Y. and Srivastava, D., Towards Efficient Information Gathering Agents. In: Software Agents--Papers from the 1994 Spring Symposium Technical Report SS-94-03, Etzioni, O., Ed., pages 64--70, 1994.
Lewis, P. H., Davis, H. C., Griffiths, S. R., Hall, W. and Wilkins, R. J., Media-Based Navigation with Generic Links. In: Proceedings of the ACM Conference on Hypertext, pages 215--223, Washington DC, USA, March 1996.
Lingnau, A., Drobnik, O. and Dömel, P., A HTTP-Based Infrastructure for Mobile Agents. In: Proceedings of the Fourth International World Wide Web Conference, pages 461--471, Boston, USA, December 1995.
Lingnau, A. and Drobnik, O., Making Mobile Agents Communicate: A Flexible Approach. In: Proceedings of the First Annual Conference on Emerging Technologies and Applications in Communications, Portland, USA, May 1996.
Magedanz, T., Rothermel, K. and Krause, S., Intelligent Agents: An Emerging Technology for Next Generation Telecommunications? In: Proceedings of the INFOCOM Conference, San Francisco, USA, March 1996.
Malcolm, K. C., Poltrock, S. E. and Schuler, D., Industrial Strength Hypermedia: Requirements for a Large Engineering Enterprise. In: Proceedings of the ACM Conference on Hypertext, pages 13--25, Texas, USA, December 1991.
Malkoun, M. T. and Kendall, E. A., CLAIMS: Cooperative Layered Agents for Integrating Manufacturing Systems. In: Proceedings of the Second International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 237--253, London, UK, April 1997.
Mayfield, J., Labrou, Y. and Finin, T., Desiderata for Agent Communication Languages. In: Proceedings of the AAAI Symposium on Information Gathering from Heterogeneous, Distributed Environments Technical Report SS-95-08, Knoblock, C. and Levy, A., Eds., pages 64--70, 1994.
Mayfield, J., Labrou, Y. and Finin, T., Evaluation of KQML as an Agent Communication Language. In: Intelligent Agents, Wooldridge, M., Müller, J. P. and Tambe, M., Eds., Lecture Notes in Artificial Intelligence, 1037, pages 347--360, Springer-Verlag, 1996.
McCabe, F. G. and Clark, K. L., APRIL--Agent Process Interaction Language. In: Intelligent Agents: Theories, Architectures and Languages, Wooldridge, M. and Jennings, N. R., Eds., Lecture Notes in Artificial Intelligence, 890, pages 324--340, Springer-Verlag, 1995.
Milojicic, D. S., Condict, M., Reynolds, F., Bolinger, D. and Dale, P., Mobile Objects and Agents. In: Proceedings of the Second USENIX Conference on Object-Oriented Technologies, Toronto, Canada, June 1996.
Müller, J. P., Pischel, M. and Thiel, M., Modelling Reactive Behaviour in Vertically Layered Agent Architectures. In: Intelligent Agents: Theories, Architectures and Languages, Wooldridge, M. and Jennings, N. R., Eds., Lecture Notes in Artificial Intelligence, 890, pages 261--276, Springer-Verlag, 1995.
Peine, H. and Stolpmann, T., The Architecture of the ARA Platform for Mobile Agents. In: Proceedings of the First International Workshop on Mobile Agents, Rothermel, K. and Popescu-Zeletin, R., Eds., Lecture Notes in Computer Science, 1219, Springer-Verlag, Berlin, Germany, April 1997.
Rasmusson, L., Rasmusson, A. and Janson, S., Using Agents to Secure the Internet Marketplace: Reactive Security and Social Control. In: Proceedings of the Second International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 193--206, London, UK, April 1997.
Rose, A., Plaisant, C., Shneiderman, B. and Vanniamparampil, A. J., User Interface Reengineering: Low Effort, High Payoff Strategies. Department of Computer Science Technical Report CS-TR-3459, University of Maryland, USA, 1996.
Shardanand, U. and Maes, P., Social Information Filtering: Algorithms for Automating "Word of Mouth". In: Proceedings of the CHI Conference on Human Factors in Computer Systems, pages 210--217, 1995.
Steeb, R., Cammarata, S., Hayes-Roth, F. A., Thorndyke, P. W. and Wesson, R. B., Distributed Intelligence for Air Fleet Control. In: Readings in Distributed Artificial Intelligence, Bond, A. H. and Gasser, L., Eds., pages 90--101, Morgan-Kaufmann, 1988.
van Renesse, R., Hickey, T. M. and Birman, K. P., Design and Performance of Horus: A Lightweight Communications System. Department of Computer Science Technical Report TR94-1442, Cornell University, 1994.
von Eicken, T., Culler, D. E., Goldstein, S. C. and Schauser, K. E., Active Messages: A Mechanism for Integrated Communication and Computation. In: Proceedings of the Nineteenth ACM International Symposium on Computer Architecture, pages 256--267, Gold Coast, Australia, May 1992.
Wada, Y., Kawamura, A., McCabe, F. G., Shiouchi, M., Teramoto, Y. and Takada, Y., An Agent-Oriented Schedule Management System--IntelliDiary. In: Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Systems, pages 655--667, London, UK, April 1996.
Weihmayer, R. and Velthuijsen, H., Applications of Distributed Artificial Intelligence and Cooperative Problem Solving to Telecommunications. In: AI Approaches to Telecommunications and Network Management, Liebowitz, J. and Prerau, D., Eds., IOS Press, 1994.
Wilkins, R. J., The Advisor Agent: A Model for the Dynamic Integration of Navigation Information within an Open Hypermedia System. PhD Thesis, Department of Electronics and Computer Science, University of Southampton, UK, 1994.
5. It can be argued that nearly all hypermedia systems (whether open or not) can partake of data distribution in a local manner through network file sharing offered transparently by the operating system. However, distributed open hypermedia systems aim to move beyond just local distribution of data to global distribution of data and control.
8. A proxy is an intermediate server that can sit between a client and a normal WWW server; the proxy is free to intercept or modify the data on its outward or return trip. Proxies can also be used to control access through a firewall and to cache documents, in which case a client's request for a document is first checked against the proxy's local datastore. If the proxy holds the document, then it serves it immediately to the client; if it does not, then it passes the request on to the intended WWW server and stores the subsequently returned document in its local datastore for possible future requests.
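The caching behaviour described in this footnote can be illustrated with a minimal Python sketch. It is not taken from any particular proxy implementation: the class name `CachingProxy`, the counter `origin_requests` and the function `fake_origin` (a stand-in for the intended WWW server) are all hypothetical, and the datastore is simply an in-memory dictionary.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy caching proxy: serve from the local datastore, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Hypothetical stand-in for forwarding a request to the WWW server.
        self._fetch = fetch_from_origin
        # Local datastore, keyed by URL (assumed to be in memory here).
        self._datastore: Dict[str, bytes] = {}
        # Number of requests that had to be passed on to the origin server.
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        # If the proxy already holds the document, serve it immediately.
        if url in self._datastore:
            return self._datastore[url]
        # Otherwise pass the request on and store the returned document
        # for possible future requests.
        self.origin_requests += 1
        document = self._fetch(url)
        self._datastore[url] = document
        return document


def fake_origin(url: str) -> bytes:
    """Pretend WWW server that returns a small page naming the URL."""
    return ("<html>" + url + "</html>").encode("ascii")


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.org/a.html")   # passed on to the origin
second = proxy.get("http://example.org/a.html")  # served from the datastore
```

A real proxy would also need to consider expiry and validation of stored documents, which this sketch deliberately leaves out.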
9. Primarily by Randall Trigg, in the closing keynote address at the ACM Conference on Hypertext in 1996, and by Malcolm et al. (Malcolm et al., 1991) when describing industrial strength hypermedia.
11. Such as those being developed by the Agent Society, the Foundation for Intelligent Physical Agents (FIPA), the World Wide Web Consortium (W3C) and the Object Management Group (OMG) (Object Management Group, 1997).
13. Shoham introduces inheritance later into this equation as a group agent which constitutes an agent in itself; if its beliefs are defined to be common beliefs, then it follows that the mental attitudes of the group will be inherited by an individual agent.
20. In some mobile agent systems, as will be seen later in section 4.4, when an agent exhausts a constraint it can be notified by the mobile agent system rather than being destroyed.
25. The term knowledge base is used here to refer to the private information of an agent, which may or may not also contain the `knowledge' that an agent uses directly as part of its decision-making process. When an agent migrates, it can only take the contents of its knowledge base with it, which makes the process of migration simpler (see subsection 5.3.6).
27. Composite documents which can contain multiple and structured media types could be handled by a composite facet which is discussed in chapter 7.
29. The primitives do not take into account managing or preventing simultaneous access to documents between agents. Sharing and locking between documents, hypermedia links and user interface presentations could be handled by a cooperative facet (discussed in chapter 7).
31. Again, these could be handled by composite and cooperative facets (see chapter 7).
32. Within the present configuration, each domain contains only one host node which corresponds directly to the name of the domain. The representative domain agents are implicitly aware of the names of the other domains.
34. Access to files within a UNIX filing system is granted or denied according to a set of permission bits; read, write or execute permissions can be assigned to a file for the owner, a group of users or all other users.
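As a small illustration of these permission bits, the following Python sketch decodes a file mode using the constants from the standard `stat` module. The helper name `is_permitted` and the example mode `0o640` are chosen purely for illustration, and the sketch only inspects the bits; it does not reproduce the full access check that a UNIX kernel performs.

```python
import stat

# Map (user class, permission) to the corresponding permission bit.
PERMISSION_BITS = {
    ("owner", "read"): stat.S_IRUSR,
    ("owner", "write"): stat.S_IWUSR,
    ("owner", "execute"): stat.S_IXUSR,
    ("group", "read"): stat.S_IRGRP,
    ("group", "write"): stat.S_IWGRP,
    ("group", "execute"): stat.S_IXGRP,
    ("other", "read"): stat.S_IROTH,
    ("other", "write"): stat.S_IWOTH,
    ("other", "execute"): stat.S_IXOTH,
}


def is_permitted(mode: int, who: str, what: str) -> bool:
    """Return True if the bit for permission `what` is set for class `who`."""
    return bool(mode & PERMISSION_BITS[(who, what)])


# Example mode: owner may read and write, group may read, others nothing.
mode = 0o640
owner_can_write = is_permitted(mode, "owner", "write")
group_can_write = is_permitted(mode, "group", "write")
other_can_read = is_permitted(mode, "other", "read")

# Symbolic form of the same mode for a regular file, as shown by `ls -l`.
symbolic = stat.filemode(stat.S_IFREG | mode)
```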
36. The term docuverse was first introduced by Ted Nelson (Nelson, 1987) to describe a universe of documents.
38. This will, of course, also have the side-effect of changing all instances of Earl Mountbatten to Earl Earl Mountbatten! To prevent this kind of behaviour, the search criteria could be respecified as Mountbatten or "Earl Mountbatten", which would catch the case where Earl was already present in the text and the case where Earl was missing.
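The difference between the two search criteria can be sketched in Python. This is only an illustration using regular expressions from the standard `re` module, not the mechanism used by the agents in the thesis; the sample sentence and variable names are invented for the example.

```python
import re

text = "Mountbatten met Earl Mountbatten."

# Naive criterion: every "Mountbatten" is replaced, including the one that
# is already preceded by "Earl", which produces the doubled title.
naive = text.replace("Mountbatten", "Earl Mountbatten")

# Respecified criterion: match either "Earl Mountbatten" or "Mountbatten".
# The longer alternative is listed first so that an existing "Earl" is
# consumed by the match instead of being left in front of the replacement.
pattern = re.compile(r"Earl Mountbatten|Mountbatten")
fixed = pattern.sub("Earl Mountbatten", text)

print(naive)  # Earl Mountbatten met Earl Earl Mountbatten.
print(fixed)  # Earl Mountbatten met Earl Mountbatten.
```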
39. In this small, closed environment an agent would be expected to exhaust its domain to visit list very quickly. In larger situations, such as the Internet, the list would typically take much longer to exhaust, if it were exhausted at all--it is more likely that the user would recall or terminate an agent before it ever reached this particular query-state.
40. In terms of the type (such as prefix, infix or postfix), associativity (left, right or non-associative) and priority (a value between 0 and 1000 or 1001 and 2000--expression- and statement-level priorities, respectively) of the new operator.
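The three properties listed in this footnote can be pictured as a small record with validation. The Python sketch below is purely illustrative and does not reflect the syntax of the language being described; the names `OperatorSpec`, `fixity` and `level` are hypothetical, and the only facts taken from the footnote are the permitted types, the permitted associativities and the two priority ranges.

```python
from dataclasses import dataclass

FIXITIES = {"prefix", "infix", "postfix"}
ASSOCIATIVITIES = {"left", "right", "non-associative"}


@dataclass(frozen=True)
class OperatorSpec:
    """Hypothetical description of a newly declared operator."""

    symbol: str
    fixity: str          # the operator's type: prefix, infix or postfix
    associativity: str   # left, right or non-associative
    priority: int        # 0..1000 or 1001..2000

    def __post_init__(self) -> None:
        # Reject values outside the sets and ranges named in the footnote.
        if self.fixity not in FIXITIES:
            raise ValueError("unknown operator type: " + self.fixity)
        if self.associativity not in ASSOCIATIVITIES:
            raise ValueError("unknown associativity: " + self.associativity)
        if not 0 <= self.priority <= 2000:
            raise ValueError("priority must lie between 0 and 2000")

    @property
    def level(self) -> str:
        # 0..1000 are expression-level priorities, 1001..2000 statement-level.
        return "expression" if self.priority <= 1000 else "statement"


plus = OperatorSpec("+", "infix", "left", 500)
arrow = OperatorSpec("->", "infix", "right", 1500)
```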