THE UNIVERSITY OF MICHIGAN
COLLEGE OF LITERATURE, SCIENCE, AND THE ARTS
Computer and Communication Sciences Department

GENETIC ALGORITHMS AS FUNCTION OPTIMIZERS

by Albert D. Bethke

April 1978

Logic of Computers Group
Computer and Communication Sciences Department
Technical Report No. 212

with assistance from:
National Aeronautics and Space Administration Grant No. NGG-1176
Langley Research Center, Hampton, Virginia
and
National Science Foundation Grant No. MCS76-04297

This is the text of a talk given at the ACM Computer Science Conference held in Detroit, February 21-23, 1978.

The results I am about to present come primarily from two research projects in which I have participated. These results form the foundation for my thesis research. Genetic algorithms are the brainchild of my thesis committee chairman, Professor John H. Holland.

I would like to present a new, rather different approach to an old problem. This problem is easy to state but difficult to solve. It is: find the point at which a real-valued function takes its maximal value. The function to be maximized is called the objective function, and its domain, D, is the search space. We will assume that the search space is made up of n-tuples of real numbers. That is, the objective function is a real-valued function of n real variables.

The usual approach to such a difficult problem is to make some simplifying assumptions about the problem and hope that the solution obtained will be easily extended to solve the original problem. In particular, if the objective function is assumed to be differentiable, or perhaps twice differentiable, it is possible to use calculus-based methods to find the optimum. One such algorithm is the conjugate gradient method. (Actually, there are many minor variations on the basic method, so this is a class of algorithms.) This method is essentially a modification of Newton's method. The conjugate gradient method implicitly assumes that the objective function is approximately quadratic and works best, of course, on quadratic objective functions. It also works well on other convex, unimodal objective functions. In fact, it works reasonably well on most differentiable, unimodal objective functions. But it fails miserably when

the objective function is not unimodal or not sufficiently smooth. Consider the following bi-modal objective function (see figure 1). If the conjugate gradient method is applied with a starting value larger than 7.5, it will converge to the optimum. But if the starting point is less than 7.5, it will converge to the false peak. There seems to be no easy way to "fix up" the conjugate gradient method to handle multi-modal and non-differentiable objective functions. So we need a different approach.

Instead of basing our search strategy on the geometry of quasi-quadratic functions, let's make the simplifying assumption that the search space contains regions of good objective function values and regions of poor values. Let's represent the points in the search space as binary strings, and let's look for good bit patterns. For the bi-modal objective function shown here, we may choose to represent points in the interval from zero to sixteen by 12-bit binary numbers with the binary point assumed to be between the 4th and 5th bits. If we need more accuracy or a wider range of values, we may use more bits and/or a different code, such as a "floating point" representation.

Naturally, we wish to concentrate our exploration efforts on the good regions of the search space. One class of simple algorithms which does this is genetic algorithms. A genetic algorithm maintains a collection (population) of several strings and explores the search space by generating successive populations which are distributed differently in the search space. Hopefully, the distribution changes so as to cluster about the optimum as the algorithm proceeds. The principal way of generating a new string is by crossover: two

strings are broken at a randomly chosen point, the final segments are switched, and we have two new strings (figure 2). The overall algorithm consists of choosing a random initial population, and then generating successive populations using the following basic cycle (see figure 3). Each string is replicated in proportion to its function value to generate an intermediate population. Pairs of strings are chosen at random from the intermediate population and combined using crossover to generate the new population. In addition, a small number of randomly chosen bits in the new population are mutated (changed). This serves as a source of variability and improves the search pattern of the algorithm.

To get some feeling for how a genetic algorithm explores the search space, let's look at the following typical run (figure 4). The objective function is the bi-modal objective function described earlier, and the strings are 12-bit binary numbers with the binary point assumed to be between the 4th and 5th bits. The initial population consists of random points spread throughout the interval from 0 to 16. After one time through the basic cycle, the points are beginning to cluster about the two peaks. After two more times through the basic cycle, nearly all the points are near the true optimum.

Genetic algorithms can optimize a large class of objective functions. My thesis research addresses the problem of describing that class of functions. For the purposes of this talk, it is sufficient to say that genetic algorithms work for a much larger class of functions than conjugate gradient methods.

In comparing two optimizers, the traditional cost measure is the number of times the objective function must be evaluated or sampled. Here are some comparisons between genetic algorithms and conjugate gradient

methods (figures 5 and 6). The red curve represents the performance of the conjugate gradient method; the green curve represents the performance of the genetic algorithm (actually the average performance, since the genetic algorithm is stochastic and produces slightly different results for different random number streams). For unimodal objective functions, the conjugate gradient method very rapidly converges to the optimum, while the genetic algorithm takes much longer. For a bi-modal objective, the conjugate gradient method converges to the wrong peak for about one-half the starting points, resulting in the average performance shown by the solid red curve. The genetic algorithm performs the same as on the unimodal functions. For a multi-modal objective function of many variables, the conjugate gradient method rarely finds the true optimum. The genetic algorithm, on the other hand, rarely gets trapped on a false peak.

If the objective function is contaminated by a small amount of random noise, so that sampling the same point several times gives slightly different function values, then the conjugate gradient method behaves like random search. Genetic algorithms are insensitive to small amounts of noise, and their performance is unchanged until the function values differ from the optimal value by less than the noise level. Performance curves for discontinuous objective functions are very similar to those for noisy objective functions. Also, note that genetic algorithms use less computational effort to generate each sample point, so comparisons based on CPU time would be more favorable to genetic algorithms.

In conclusion: if you know that the objective function has special properties which you can exploit, then use a special method like conjugate gradient or whatever is appropriate. If you don't know of any special

properties of the objective function, then you should use a method which is more general, like genetic algorithms.

Figure 1. A bi-modal objective function. The conjugate gradient method will fail if the starting point is to the left of 7.5.
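The failure mode in figure 1 can be reproduced with a simple stand-in. The function below is not the report's original F(x) (which is illegible in this copy); it is a hypothetical bi-modal function with a true peak near x = 11 and a lower false peak near x = 4, and plain gradient ascent stands in for the conjugate gradient method:

```python
import math

def f(x):
    # Hypothetical bi-modal objective: true peak near x = 11,
    # lower false peak near x = 4 (a stand-in for figure 1).
    return math.exp(-(x - 11.0) ** 2) + 0.5 * math.exp(-(x - 4.0) ** 2)

def gradient_ascent(x, step=0.1, iters=2000, h=1e-5):
    # Simple gradient ascent with a numerical (central-difference)
    # derivative: converges to whichever peak's basin contains x.
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x += step * grad
    return x
```

Starting to the right of the valley between the peaks, the iteration converges to the true optimum near 11; starting to the left, it climbs the false peak near 4, just as the figure illustrates for starting points left of 7.5.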


Figure 2. Crossover: two binary strings from the population exchange final segments to form two new strings in the new population.
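The basic cycle described in the text (fitness-proportional replication, crossover of random pairs, occasional bit mutation) can be sketched in a few lines. This is an illustrative sketch, not the report's original program: the bi-modal objective is a hypothetical stand-in with its true peak near x = 11, while the 12-bit fixed-point code follows the representation the talk describes (4 integer bits, 8 fractional bits, values in [0, 16)):

```python
import math
import random

def f(x):
    # Hypothetical bi-modal objective: true peak near x = 11,
    # lower false peak near x = 4.
    return math.exp(-(x - 11.0) ** 2) + 0.5 * math.exp(-(x - 4.0) ** 2)

def decode(bits):
    # 12-bit string -> point in [0, 16): binary point between the
    # 4th and 5th bits, i.e. 4 integer bits and 8 fractional bits.
    return int(bits, 2) / 256.0

def basic_cycle(pop, rng, p_mut=0.01):
    # Replicate each string in proportion to its function value.
    fits = [f(decode(s)) for s in pop]
    intermediate = rng.choices(pop, weights=fits, k=len(pop))
    # Pair strings at random and combine them using crossover.
    new_pop = []
    for i in range(0, len(intermediate) - 1, 2):
        a, b = intermediate[i], intermediate[i + 1]
        cut = rng.randrange(1, len(a))      # random crossover point
        new_pop += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
    # Mutate a small number of randomly chosen bits.
    flip = lambda c: c if rng.random() > p_mut else str(1 - int(c))
    return [''.join(flip(c) for c in s) for s in new_pop]

rng = random.Random(42)
pop = [''.join(rng.choice('01') for _ in range(12)) for _ in range(50)]
for _ in range(30):
    pop = basic_cycle(pop, rng)
best = max(pop, key=lambda s: f(decode(s)))
```

With a population of 50 and a few dozen cycles, the population typically clusters near the true peak, mirroring the behavior described for the run in figure 4.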


Figure 5. Performance vs. samples, unimodal objective.
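The cost measure behind figures 5 and 6, the number of times the objective is sampled, is easy to instrument. A minimal sketch (the quadratic objective, step size, and iteration count here are illustrative assumptions, not taken from the report):

```python
class CountedObjective:
    # Wrap an objective function so that every sample is counted,
    # matching the traditional cost measure used in the talk.
    def __init__(self, f):
        self.f = f
        self.calls = 0
    def __call__(self, x):
        self.calls += 1
        return self.f(x)

g = CountedObjective(lambda x: -(x - 3.0) ** 2)  # unimodal, peak at x = 3
# Gradient ascent with central differences costs 2 samples per step.
x, h = 0.0, 1e-5
for _ in range(100):
    x += 0.2 * (g(x + h) - g(x - h)) / (2 * h)
# g.calls now holds the sample count used to compare optimizers.
```

The same wrapper can be placed around the objective inside a genetic algorithm, so that both methods are charged on the identical scale of function evaluations.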

Figure 6. Performance (max of samples) vs. samples: noise-free objective and noisy objective.