The algorithmic principles of adaptation.

see Work In Progress

 

Hill-climbing is a simple form of optimisation:

Generate a modification of the current best solution;

If it provides an improvement then make this the current best solution;

Repeat.

It’s a simple, easy-to-understand algorithm, providing incremental improvement in the local neighbourhood of the current solution where available. Many natural processes can be understood as hill-climbing processes: e.g. physical systems ‘seeking’ higher entropy configurations; maybe even the advancement of a scientific discipline where modifications of existing methods and ideas are generated, and useful modifications are retained.

        It is not surprising then that the first plausible scientific explanation for the complexity of living things, Darwin’s theory of evolution by natural selection, utilised a simple hill-climbing model. In the subsequent century and a half, the hill-climbing model has become deeply ingrained in evolutionary biology as the discipline itself explored small variations in the local neighbourhood of the hill-climbing model.

        Recently, there has been increasing awareness of some previously under-researched biological phenomena (discussed below) – including mechanisms that have been instrumental in the major transitions in evolution. But, although these phenomena act within natural selection, they present a problem for the hill-climbing model of adaptation. One feature found in several of these phenomena is that genetic material that has been pre-adapted in different populations is subsequently brought together into a single population. The evolutionary significance of this is that it potentially allows innovation that neither ancestral population could achieve via random modifications within its own local neighbourhood of genotypes.

        Meanwhile, with the development of computer science, scientific thinking about the space of possible algorithms has become more sophisticated. We know that hill-climbing is but the simplest form of heuristic search, and that other methods, when applicable, can be vastly superior. Notably, divide and conquer algorithms are a large important class of algorithms where a problem is decomposed into more manageable parts, and a solution to the whole is assembled from sub-solutions. Historically, this type of problem solving has been associated only with human problem solving and design, using top-down knowledge of the problem domain, and not with non-teleological processes such as evolution. However, the biological phenomena mentioned above do indeed fit better within a divide and conquer framework, and cause us to move our understanding outside the hill-climbing paradigm.

        This observation provides an opportunity which is itself an example of the principle: opportunity arises for interdisciplinary exchange between computer science and biology that will allow innovation that neither discipline could achieve via small modifications within its own local neighbourhood of theories and methods.

 

Algorithmic Biology

 

This page describes my current research program and the significance of this work to interdisciplinary research between computing and biology. My past research has included publications on genetic algorithm theory, evolutionary robotics, artificial life, dynamical systems, and computational biology. Those explorations have focused my interests within algorithmic biology (see inset). My future research program will continue to be diverse and foster exchange between biology and computer science wherever opportunity arises.

 

Algorithmic biology and other interfaces
of computer science and biology

In the last 50 years, the tools and methods provided by computer science have become enormously important in nearly all disciplines of modern science. In biology, computational methods have been used to model and simulate biological systems (computational biology), and more recently, to manipulate and process the flood of data coming from genetics and genomics (bioinformatics). In turn, computer science has taken inspiration from biology to develop new algorithmic techniques (nature inspired computation) such as neural networks and evolutionary algorithms, and even new computational substrates (bio-computation) such as DNA computing. One aspect featured in various subfields of these different areas has a particular emphasis: not merely to take inspiration from biology for new computational methods, nor merely to apply existing computational methods to biological systems, but rather to use algorithmics and complexity theory to understand the underlying algorithmic principles of biological systems (algorithmic biology) – to get to the science that is common to both disciplines.

 

 

 

Impact

The above subjects describe a broad education in algorithmic biology. This program addresses the science of both biological adaptation and computational algorithms. As such it impacts the basic science of both computer science and biology disciplines (see goals). Developing a broader model of biological evolution has the potential to change the way we think about some fundamental issues in biology: for example, the kinds of systems that are evolvable or unevolvable, and the spontaneous formation of new levels of organization in natural systems. Developing new models for artificial evolution will also increase the suite of computational tools available for various applications – from the processing of genomic data, to design automation in various domains such as circuit design, controller design, robotics, and in any number of optimization domains from scheduling to routing problems. All these domains have partially decomposable problem structure that could be exploited by a method that is able to explicitly identify modules and manipulate their assembly hierarchically, as developed in preliminary work and discussed below.

          Building bridges between the disciplines capitalizes on the expertise the university already holds in each independently. By providing new models that expand evolutionary computation and evolutionary biology toward one another, and challenging deeply rooted assumptions in both disciplines, new routes of exchange can be forged which allow the wealth of existing results and expertise to flow from one discipline to the other. This research program provides the opportunity to move beyond the application of computational techniques to biological systems, and beyond the use of biology to inspire computational techniques, and to get beneath to the scientific principles that underlie both interesting biological mechanisms and interesting algorithms.

 

Research goals

In brief, the objective of this research is to identify and exploit the underlying algorithmic principles of adaptation in biological systems – in particular evolutionary adaptation. This involves two complementary subgoals:

·         To use algorithmics and complexity theory to expand our understanding of biological evolution;

·         To utilize these algorithmic principles in computational methods for design automation and optimization.

 

The remaining sections of this statement briefly describe more specific research objectives:

·         The motivations for this research (the shortcomings of existing algorithmic notions in evolutionary biology, and failure of artificial evolution techniques to scale),

·         Some main research questions for this program,

·         The progress thus far in this research (both in expanding evolutionary computation and in expanding models in theoretical population genetics).

 

Motivations

Motivations from Biology:                
The algorithmic principles of natural evolution

No algorithmic notion has had more impact on biology and the way we view ourselves and the world around us than the theory that complex life could result from random variation and selection. Darwin’s theory of evolution by natural selection provided the first workable scientific theory of how complexity could increase without teleological direction. In the century and a half since Darwin, the underlying algorithmic principle of this theory – the accumulation of “numerous, successive, slight variations” (Darwin 1859) – has remained intact. The theoretical framework of evolutionary biology, formalized in the 1930s, reinforced the notion that natural selection acts by accumulating individually beneficial mutations (because combinations of alleles are broken up by sexual recombination). The assumption of gradualism, at least at the genetic level, is now ubiquitous in our understanding of evolutionary processes. However, there are significant biological phenomena, from lateral gene transfer in bacteria to the major evolutionary transitions, that are not accommodated by this conception of evolutionary change (see inset). These phenomena are not mere curios – aberrations on an otherwise well-behaved gradual process – in fact, they undermine our preconceived notions of individuality on which a hill-climbing model rests, and the formation of new levels of organization in nature, as in the major evolutionary transitions, is critical to an understanding of biological evolution on the large scale.

 

Evolution that is not hill-climbing

Molecular sequence data shows that the pattern of evolution “is not as linear and treelike as Darwin imagined it” (Doolittle 2000) due to widespread lateral gene transfer. The eukaryote cell (ancestor of all plants and animals) has its origin not in the sequential accumulation of small mutational changes but in the symbiotic union of previously independent prokaryotes. The origin of eukaryotes is one of several ‘major evolutionary transitions’ (Michod 1999) where “units that were capable of independent replication before the transition can replicate only as part of a larger whole after the transition” (Maynard Smith & Szathmary 1995). The exchange of genetic material between lineages, the formation of a new evolutionary entity through the genetic integration of symbionts, and the formation of new replicating entities from collections of extant entities, are, though they result from natural selection, not accommodated by a hill-climbing model. They are excluded by the fact that the accumulation of adaptive genetic material does not occur sequentially in a single population but in parallel in different semi-independent populations and is subsequently brought together by some form of genetic integration – lateral transfer, symbiogenesis, and other forms of reproductive encapsulation. Sexual recombination is on the cusp: if we treat the population as a cloud around the population consensus genotype then recombination will not provide a significant deviation from the hill-climbing framework. But hybridization between populations that have evolved semi-independently is a different matter (ref. j).

 


Motivations from Computing:
The scalability (or not) of artificial evolution

Biological systems posses numerous properties that are desirable for engineered systems such as adaptability, robustness, self-repair, and decentralized control. All of these properties can be incorporated to some extent in engineered systems, but the sheer scale of the complexity in biological systems – the number of parts and the richness of their interaction – is far beyond that which is attainable in hand-designed systems. Computational methods for design automation and optimization of engineered systems could in principle take us beyond the limitations of hand-designed systems. Evolutionary algorithms, based loosely on natural evolution, have been employed for this purpose. However, although there has been some success in automating design with evolutionary algorithms there are problems too. Specifically, the improvement in a design seems to come rapidly at first and quickly plateau – approaching an apparent ceiling in complexity rather quickly. One possibility for this failure to scale–up in complexity is simply a lack of computing power (see inset). But we must also consider the possibility that the underlying algorithm is inadequate – that we are lacking something fundamental in our algorithmic abstraction of evolution by natural selection.

 

How long should it take?

The disparity between the complexity of biological systems and that of artificially evolved engineered systems has many possible causes. Perhaps most obvious is that biological evolution has simply had more time, and has been able to exploit the massive parallelism of physical matter in exploring alternatives. However, this excuse is not wholly convincing – we are not really asking artificial evolution to produce the kind of complexity that biology has produced, starting from individual molecules to a multi-celled organism – only to produce the kind of complexity that is plausible given the scope of the system and the primitives we provide. For example, we can evolve an electronic circuit given gates, a robot morphology given struts and actuators, or a neural network given neurons, with say 100 primitive parts very quickly. But even in cases where we know that superior solutions are available in the design space, and computable in the time allowed, we are lucky if two orders of magnitude more computing time will get us a system that uses a few percent more parts in a useful way or shows a few percent improvement in fitness. Faced with these observations, a good computer scientist should not be content to simply wait longer.

 

 

Summary

In summary, the proposed research program is motivated by significant gaps in our understanding of how biological complexity arose, and additionally, significant gaps in our ability to automatically design and optimise complex engineered systems, as outlined above. These observations are just an example of the kind of issues that might be tackled in this research program in algorithmic biology. However, these particular issues provide a particularly promising direction for future research: a model of natural evolution that moves beyond the hill-climbing paradigm radically alters our perception of natural evolution, of what is evolvable and unevolvable, and how evolution works; and a model of ‘natural evolution writ large’, including the major evolutionary transitions, the formation of coalitions (the creation of new levels of organization and new levels of selection) has not been attempted, but is desperately needed. Moreover, a computational method that is able to discover and exploit collections of extant entities to form new levels of organization spontaneously would provide a means to overcome the complexity ceilings witnessed in artificial optimization methods to date. The intuition that natural evolution utilizes algorithmic principles as yet not fully understood, and that such principles might provide new computational methods for design automation and optimization, is very appealing. This research program is poised to exploit this: to expand biological models to incorporate evolution over multiple scales; and to develop computational methods that exploit the algorithmic principles these processes use.


Appendix

Below I outline some of relevant research questions for evolutionary biology and the corresponding questions for computer science. Appropriately delimited, some of these could form the basis of thesis/project work for students.

Biological issues

Computational issues

Lateral gene transfer, symbiogenesis and the major evolutionary transitions are not accommodated by a hill-climbing perception of evolutionary change. Are these events rare aberrations on the hill-climbing model or are they fundamentally significant, providing the new units within which hill-climbing-change takes place?

Is a hill-climbing mechanism of optimization the best we can do for design automation and optimization? How can we accommodate the human designers ability to make and manipulate abstractions of functional subsystems in a stochastic unsupervised search process?

If the formation of new levels of organization, of coalitions of co-adapting entities, is involved in the major evolutionary transitions what does this mean for the assumption that competitive exclusion is the ‘prime mover’ in evolutionary change? How does biology mediate between the formation of coalitions and the exclusion of competitors?

When selecting among design alternatives some will be better than others as they stand, but some may offer different potentials for further modification. How do we identify alternatives with different potentials that might work well together?

When an organism acquires a gene or suite of genes from another species by lateral transfer, the new genetic material represents a large non-random change for the recipient species. Under what circumstances, if any, will this change be fitter than a random change of the same scale?

When will a large change produced by exchanging sub-solutions among competing candidates be more effective than a higher mutation rate?

What is the relationship between various mechanisms of genetic exchange (bacterial conjugation, lateral gene transfer, true sexual recombination, hybridization, symbiogenesis) in terms of their adaptive affordances?

Can we devise an optimization scheme that allows us to tune the optimization process from a hill-climber to divide-and-conquer approach?

In systems biology we seek to identify semi-independent functional subsystems. How might different genetic mechanisms, different genetic architectures, and different adaptive scenarios influence our ability to discover such biological modules?

Human-designed systems are nearly always modular in construction, often hierarchically modular. Is this an artifact of our inability to think about a complex system without approximating it as a number of semi-independent sub-systems, or are common engineering domains inherently modular in nature?

How does the presence of modularity in adaptive domains, the semi-independence of adaptive features, effect the utility of different genetic mechanisms? And how does the architecture of the genome (nucleotides, codons, exons, genes, chromosomes, genotypes) effect the efficacy of these mechanisms?

Exactly what features of a problem domain affect the suitability of a hill-climbing vs divide-and-conquer approach, and how would we identify them? How does the choice of representation interact with the efficacy of different approaches?