ss2@ecs.soton.ac.uk
In April 2008, I completed my PhD thesis on Flexible Service Provisioning in Multi-Agent Systems [PhD 2008]. In collaboration with my supervisors Nick Jennings and Terry Payne, I combined techniques from the fields of artificial intelligence, multi-agent systems and decision theory to design efficient and effective algorithms for provisioning (or selecting) uncertain and heterogeneous services that are required for complex workflows. In particular, our approach deals with service uncertainty by flexibly provisioning multiple services redundantly for critical tasks, by re-provisioning failed services at run-time and by reserving services in advance where this is beneficial.
The following is an extended abstract of my thesis work. It is divided into five sections:
Service-oriented computing is an increasingly popular approach for providing applications, computational resources and business services over highly distributed and open systems (such as the Web, computational Grids and peer-to-peer systems). In this approach, service providers advertise their offerings by means of standardised computer-readable descriptions, which can then be used by software applications to discover and consume appropriate services in a fully automatic manner. This allows consumer applications to outsource key pieces of functionality and flexibly choose the best service providers at any given time. For example, the application of a computational scientist can seamlessly access remote data-processing and simulation services on the Grid without the need for an expensive local mainframe. Similarly, a supply-chain application can automatically discover and order from the cheapest supplier of stocks when inventory levels run low.
However, despite active research in service infrastructures, and in service discovery and composition mechanisms, little existing work has dealt effectively with some of the key features of large-scale service-oriented systems:
These issues are particularly critical when the consumer requires services to complete in a timely manner, when providers demand financial remuneration and when services are executed as part of larger workflows, where multiple services are composed to achieve some high-level goal. Such circumstances are common in many application scenarios of service-oriented technologies, from computational Grids to commercial business processes.
To address the issues described above, we have developed a number of algorithms that a service consumer can use to effectively and efficiently execute large service workflows in dynamic and uncertain environments.
More specifically, we have concentrated on the process of service provisioning (see Figure 1). Provisioning, i.e., the selection of particular service instances for specific tasks of a workflow, has received comparatively little attention in the research literature so far, but we believe that it is vital for controlling and mitigating nondeterministic service performance. Provisioning providers in an appropriate manner allows a service consumer to differentiate between providers that offer a service at differing levels of quality or reliability and to invest extra resources in tasks that are particularly critical. Furthermore, re-provisioning providers on-the-fly enables the service consumer to respond quickly to failures and recover partially complete workflows without starting from scratch.
To this end, we have developed a novel model for a distributed system and for simple workflows, which can be used in a wide range of application areas — from Grid to Web services and peer-to-peer systems. In the context of this model, we described a number of provisioning strategies, and, in doing so, advanced the state of the art in service provisioning as follows:
Automatic Service Redundancy: We devised the first algorithm that provisions multiple services redundantly for particularly critical workflow tasks in an automatic and principled manner, based on service performance information and the predicted benefit of doing so. Using such redundancy allows the consumer agent to decrease the probability of task failures and improve task execution times [ECAI 2006, TOIT 2009].
In the following, we briefly discuss a simple example of how we use redundancy, flexible re-provisioning and knowledge of heterogeneous service providers to address uncertainty.
Consider a single workflow task, which can be completed by a number of service providers. As shown in Figure 2, let us assume there are 17 such providers and each of them belongs to one of two classes: either cheap, unreliable and slow, or expensive, reliable and fast (this is for illustration only — in reality, there may be many more types of providers).
Figure 2: Available Service Providers
Now, a naïve approach may select only a single provider for the task. For example, if the task is of a high value and the consumer wants to maximise its probability of success, it may select a provider from the expensive population, as show below:
Figure 3: Single Reliable Provider
However, we could do better by introducing redundancy and re-provisioning. Figure 4 shows the result of invoking four of the cheaper providers in parallel, then waiting for an hour until re-assigning the task to four more providers if the task has not been successful (and repeating this another hour later). Surprisingly, using the far less reliable providers now results in a higher success probability, at a significantly lower cost. However, the duration and variance both rise in this case, due to the additional waiting time that is introduced.
Figure 4: Multiple Cheap Providers
Finally, it is possible for the consumer to invoke different types of services providers over time. Figure 5 shows a schedule where the consumer first invokes a number of cheap providers in parallel, followed some time later by more expensive ones. Here, the consumer may be lucky and have the task completed by one of the initial cheap providers. However, if this is not the case, it falls back onto the more reliable providers to ensure the task is completed within a reasonable time. In summary, this decision now results in almost guaranteed success and an expected cost and duration that are both lower than those of a single reliable provider.
Figure 5: Redundant Heterogeneous Providers
The example highlights how redundancy and re-provisioning can be used to proactively deal with uncertainty. A key benefit of our approach is that the schedules outlined above are found automatically, based on the performance of service providers, the structure of the workflow, time constraints and the overall workflow value. For example, this means that our algorithm might pick the cheaper allocation in Figure 4 when the user has a lower value and less urgent deadline for the workflow. However, when time is more critical and the workflow is more important, then the more expensive, reliable and faster allocation in Figure 5 might be chosen.
In addition to the above contributions, we have liaised with computational scientists to show that our algorithm can be used to ensure a long-running Grid workflow from the bioinformatics domain is completed within a certain deadline, even when services are highly unreliable (see Figure 6) [AHM 2008, TOIT 2009, PTRSA 2009].
Figure 6: Bioinformatics Workflow
Furthermore, we have developed a GUI to visualise both the provisioning decisions and workflow progress during execution (see Figure 7) [PTRSA 2009].
Figure 7: Service Provisioning GUI
[PTRSA 2009]: Stein, S., Payne, T. R. and Jennings, N. R. (2009) Flexible Selection of Heterogeneous and Unreliable Services in Large-Scale Grids. Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences. (In Press)
[TOIT 2009]: Stein, S., Payne, T. R. and Jennings, N. R. (2009) Flexible Provisioning of Web Service Workflows. ACM Transactions on Internet Technology. (In Press)
[AHM 2008]: Stein, S., Payne, T. R. and Jennings, N. R. (2008) Flexible QoS-Based Service Selection and Provisioning in Large-Scale Grids. In: UK e-Science 2008 All Hands Meeting (AHM), HPC Grids of Continental Scope, 8-11 September, Edinburgh.
[PhD 2008]: Stein, S. (2008) Flexible Service Provisioning in Multi-Agent Systems. PhD thesis, University of Southampton.
[AAMAS 2008]: Stein, S., Jennings, N. R. and Payne, T. (2008) Flexible Service Provisioning with Advance Agreements. In: Seventh International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-08), 12-16 May 2008, Estoril, Portugal, pp. 249-256.
[AAAI 2007]: Stein, S., Jennings, N. R. and Payne, T. R. (2007) Provisioning Heterogeneous and Unreliable Providers for Service Workflows. In: Twenty-Second AAAI Conference on Artificial Intelligence, 22-26 July 2007, Vancouver, Canada, pp. 1452-1458.
[SOCASE 2007]: Stein, S., Payne, T. R. and Jennings, N. R. (2007) An Effective Strategy for the Flexible Provisioning of Service Workflows. In: The Workshop on Service-Oriented Computing: Agents, Semantics, and Engineering (SOCASE 2007), May 14, 2007, Honolulu, Hawaii, USA, pp. 16-30.
[AAMAS 2007]: Stein, S., Jennings, N. R. and Payne, T. R. (2007) Provisioning Heterogeneous and Unreliable Providers for Service Workflows. In: 6th International Joint Conference on Autonomous Agents and Multi-agent Systems (AAMAS-07), May 14-18 2007, Hawaii, USA, pp. 523-525.
[ECAI 2006]: Stein, S., Jennings, N. R. and Payne, T. R. (2006) Flexible Provisioning of Service Workflows. In: 17th European Conference on Artificial Intelligence (ECAI-06), Aug 28th - Sept 1st, Riva del Garda, Italy, pp. 295-299.