David R Newman

Individual Research Project (IRP)

The Use of Linkage Learning in Genetic Algorithms

A copy of my IRP report can be downloaded here. Also as part of my project I gave a presentation and produced an A1 poster.

Project Abstract

This report investigates the use of Linkage Learning in Genetic Algorithms (GAs) to try and determine why it is worthwhile using this technique to improve the power of GAs. This report first introduces GAs and how they go about converging on optimal solutions with comparison to other search algorithms. Then it discusses what linkage learning is and several of the techniques used to perform linkage learning. The GA from 'Learning Linkage' [1] is then analysed. A re-implementation of the analysed GA is then tested, modified and experimented on to gain greater understanding of how the GA learns linkage and what are the motivating factors behind the improvement in linkage.

From experimentation on the re-implementation it is concluded that performing crossover using recipient genomes that are the complement to the optimal building block and the truncated selection of only optimal building block donors are two of the factors that motivate improvement in linkage. Leading to the conclusion that Harik's linkage learning GA (LLGA) may not be able to learn both linkage and fitness at the same time. Further experimentation, which tests for both improvement in linkage and fitness is suggested, to support the conjecture that Harik's LLGA cannot learn both linkage and fitness simultaneously. Finally the conclusions from the experimentation are combined with the research on other GAs to answer the following questions:

  1. How can functionally dependent genes be aligned adjacently in LLGAs?
  2. What conditions are required for a LLGA to learn linkage?
  3. Why is important for linkage to increase?

Source Code

As part of this project I developed several pieces of source code to test and try and explain the results of the experiments from an earlier researcher. A ZIP file of the Matlab source code can be downloaded here. Below is a listing of the files included in the ZIP and the differences from one to the next.

  • Uses a fitness function based on the degree of linkage.
  • Re-implements Harik's "Linkage Skew" experiment.
  • Same as linkage2.m but initial population has random not optimal building blocks.
  • Same as linkage3.m but recepient genome has random rather than the complement of the optimal building blocks.
  • Same as linkage4.m but uses continuous rather than binary fitness function.
  • Same as linkage5.m except donor is taken from the previous generation rather than generated randomly.

The following commands can be used to run the Genetic Algorithms at the Matlab command line:

The definitions for the variables used above are as follows:

  • numgens:
  • numactive:
  • initpop:

  • numit:
  • thepower:
  • nummuts:

  • splicesize:
  • The number of genes in each genome.
  • The number of active genes in each genome
  • The size of the initial genome population and consequently the size of each generation's population.
  • The number of generations the algorithm should run for.
  • The power of which to raise the fitness function value by.
  • The number of mutations in each genome at each generation
  • The size of the splice to be taken from te donor (where 0 equals random, averaging half the genome size).

Page written by David R Newman (drn[at]ecs.soton.ac.uk). Last updated October 13 2009 18:12:11.