Saturday, April 20, 2013

76. Genetic Programming: Evolution of Computer Programs

'Further research into intelligence of machinery will probably be very greatly concerned with "searches" ...' (Turing 1948).

'We cannot expect to find a good child-machine at the first attempt.  One must experiment with teaching one such machine and see how well it learns.  One can then try another and see if it is better or worse.  There is an obvious connection between this process and evolution, by the identifications' (Turing 1950).

Evolutionary searches through a phase space can be made, not only for finding solutions using a suitably written computer program ('genetic algorithm'), but also for evolving the computer programs themselves. Genetic programming (GP) is about making computers evolve their own algorithms. Genetic programming now routinely delivers high-return 'human-competitive 'machine intelligence'.

But why should this be attempted at all?

As I stated in Part 74, the computations occurring in the human brain are of a massively parallel nature. And ambitions about mimicking the human brain must therefore involve writing programs with parallel computing on a very large scale. But this is not possible. I quote Tom Ray:
The complexity of programming a massively parallel machine is probably beyond us. I don't think we will ever be able to write software that fully uses the capacity of parallelism.
And it is not just the question of mimicking the human brain. For any highly complex system (and technological world abounds in them), it is very difficult, well nigh impossible, to write efficient, versatile, and robust computer programs. The problem becomes particularly acute when a large amount of parallel programming is essential. It appears that the only hope lies in letting computer codes evolve, just as the solution of a problem by the use of a fixed program evolves in a genetic algorithm (GA).

We have to breed software, which can be turned on itself, ad infinitum, so that it evolves to a desired end.

Such breeding of software becomes a necessity when we are trying to solve some extremely difficult problems in technology development. An example is the development of a multimillion-line program for automatic flying of aircraft. This is an example of wanting our machines to solve problems we do not know how to solve, but merely know how to state. It is impossible to make such huge programs bug-free. And just one bug may be enough to cause the plane to crash! We do not want that to happen.

Learning from ant colonies is one way to proceed. But a more general way is to attempt GP.

Here are some characteristics of GP for massively parallel computer codes:
  • There is an absence of central command. After all, no human is doing the coding. Only the problem is stated, and some 'boundary conditions' are specified.
  • The massiveness and distributedness of such codes ensures that even if there is a bug or a local crash, the system carries on, after some minor readjustment.
  • There is incremental expansion, followed by testing at each such 'modular' stage.
  • Parasites, viruses, adversaries are deliberately self-introduced at the testing stage, so that there is a built-in ability / experience of such eventualities. This is the equivalent of vaccination in living beings.
Naturally, such codes develop the all-important robustness feature. This is crucial when one is trying to develop, say, a code for flying a fully computer-controlled aircraft. We cannot afford a programming failure. [Think of the beehive. There is enough redundancy that even if a portion of the hive or population is decimated, the complex adaptive superorganism makes adjustments, and carries on regardless.]

Typically, one begins with a population of competing computer programs. Variations (including mutations) are introduced, as also the equivalent of crossover in biological reproduction (cf. Part 75). The variations are made heritable. Fitness criteria are defined, and the individual programs are made to compete for survival. The fitter programs are assigned a larger chance of surviving and reproducing in the next cycle of the evolutionary operation.

Each of the competing algorithms is normally a sequence of commands. What is being attempted, in effect, is that during the process of evolution the sequence of commands is shuffled and the commands are tempered with in other ways. This process is continued till the right combination of everything is achieved.

You might be wondering how can we tinker with a computer program and still hope that it would be functional. Even the slightest of errors in writing, say, a Fortran code can make it impossible to run.

Yes; conventional algorithms are fragile. Even a minor crossover or mutation of any command in an algorithm can really be a mutilation, resulting in a totally nonsensical or illegal statement, making the computer program to crash. In principle one could get over this problem by assigning zero fitness to all illegal algorithms, but then most of the members of the population would be declared unfit, and the evolution process would come to a halt because it would not have the all-important benefit of continual variety.

The way to make computer programs less fragile lies in giving them a tree structure, rather than a sequential structure. The tree structure can make the algorithm robust enough to withstand the depredations of crossover and mutation etc. with a certain degree of resilience. More details can be found in Sipper (2002), and Konar (2005).


The figure below is a program for the function f(X) = (2.2 – (X/11)) + (7 * cos(Y)).

And here is an example of how crossover between two parent GPs is carried out: 

Such evolutionary or 'bio-inspired' computing has also been used for evolving sophisticated analogue electrical circuits.

'Gene-expression programming' is an important variation on GP.