forked from melisgl/mgl-gpr

MGL-GPR is a library for genetic programming: evolving typed expressions for a particular purpose from a set of operators and constants.

algunion/mgl-gpr

GPR Manual

[in package MGL-GPR]

1 mgl-gpr ASDF System Details

  • Version: 0.0.1
  • Description: MGL-GPR is a library for genetic programming: evolving typed expressions for a particular purpose from a set of operators and constants.
  • Licence: MIT, see COPYING.
  • Author: Gábor Melis
  • Mailto: mega@retes.hu
  • Homepage: http://quotenil.com

2 Background

What is Genetic Programming? This is what Wikipedia has to say:

In artificial intelligence, genetic programming (GP) is an
evolutionary algorithm-based methodology inspired by biological
evolution to find computer programs that perform a user-defined
task. Essentially GP is a set of instructions and a fitness
function to measure how well a computer has performed a task. It
is a specialization of genetic algorithms (GA) where each
individual is a computer program. It is a machine learning
technique used to optimize a population of computer programs
according to a fitness landscape determined by a program's ability
to perform a given computational task.

Lisp has a long history in genetic programming because GP involves manipulating expressions, which is particularly easy with sexps.

GP is quick to get up and running and can produce good results across a wide variety of domains, but it needs quite a bit of fiddling to perform well, and domain-specific approaches will almost always have better results. All in all, GP can be very useful to cut down on the tedium of human trial and error.

I originally wrote this library while working for Ravenpack, who agreed to release it under an MIT licence. Several years later I cleaned it up and documented it. Enjoy.

3 Tutorial

GPR works with typed expressions. Mutation and crossover never produce expressions that fail with a type error. Let's define a couple of operators that work with real numbers and also return a real:

(defparameter *operators* (list (operator (+ real real) real)
                                (operator (- real real) real)
                                (operator (* real real) real)
                                (operator (sin real) real)))

One cannot build an expression out of these operators alone because they all have at least one argument. Let's define some literal classes too. The first produces random numbers, the second always returns the symbol *X*:

(defparameter *literals* (list (literal (real)
                                 (- (random 32.0) 16.0))
                               (literal (real)
                                 '*x*)))

Armed with *OPERATORS* and *LITERALS*, one can already build random expressions with RANDOM-EXPRESSION, but we also need to define how good a given expression is. This measure is called fitness.

In this example, we are going to perform symbolic regression, that is, try to find an expression that approximates some target expression well:

(defparameter *target-expr* '(+ 7 (sin (expt (* *x* 2 pi) 2))))

Think of *TARGET-EXPR* as a function of *X*. The evaluator function will bind the special *X* to the input and simply EVAL the expression to be evaluated.

(defvar *x*)

The evaluator function calculates the average absolute difference between EXPR and TARGET-EXPR over a set of sample points, penalizes large expressions and returns the fitness of EXPR. Expressions with higher fitness have a higher chance of producing offspring.

(defun evaluate (gp expr target-expr)
  (declare (ignore gp))
  (/ 1
     (1+
      ;; Calculate average difference from target.
      (/ (loop for x from 0d0 to 10d0 by 0.5d0
               summing (let ((*x* x))
                         (abs (- (eval expr)
                                 (eval target-expr)))))
         21))
     ;; Penalize large expressions.
     (let ((min-penalized-size 40)
           (size (count-nodes expr)))
       (if (< size min-penalized-size)
           1
           (exp (min 120 (/ (- size min-penalized-size) 10d0)))))))

When an expression is to undergo mutation, a randomizer function is called. Here we change literal numbers slightly, or produce an entirely new random expression that will be substituted for EXPR:

(defun randomize (gp type expr)
  (if (and (numberp expr)
           (< (random 1.0) 0.5))
      (+ expr (random 1.0) -0.5)
      (random-gp-expression gp (lambda (level)
                                 (<= 3 level))
                            :type type)))

That's about it. Now we create a GP instance hooking everything up, set up the initial population and just call ADVANCE a couple of times to create new generations of expressions.

(defun run ()
  (let ((*print-length* nil)
        (*print-level* nil)
        (gp (make-instance
             'gp
             :toplevel-type 'real
             :operators *operators*
             :literals *literals*
             :population-size 1000
             :copy-chance 0.0
             :mutation-chance 0.5
             :evaluator (lambda (gp expr)
                          (evaluate gp expr *target-expr*))
             :randomizer 'randomize
             :selector (lambda (gp fitnesses)
                         (declare (ignore gp))
                         (hold-tournament fitnesses :n-contestants 2))
             :fittest-changed-fn
             (lambda (gp fittest fitness)
               (format t "Best fitness until generation ~S: ~S for~%  ~S~%"
                       (generation-counter gp) fitness fittest)))))
    (loop repeat (population-size gp) do
      (add-individual gp (random-gp-expression gp (lambda (level)
                                                    (<= 5 level)))))
    (loop repeat 1000 do
      (when (zerop (mod (generation-counter gp) 20))
        (format t "Generation ~S~%" (generation-counter gp)))
      (advance gp))
    (destructuring-bind (fittest . fitness) (fittest gp)
      (format t "Best fitness: ~S for~%  ~S~%" fitness fittest))))

Note that this example can be found in example/symbolic-regression.lisp.

4 Expressions

Genetic programming works with a population of individuals. The individuals are sexps that may be evaluated directly by EVAL or by other means. Viewing the sexp as a tree, its internal nodes represent the application of operators and its leaves represent literal objects. Note that currently there is no way to represent literal lists.

  • [class] EXPRESSION-CLASS

    An object of EXPRESSION-CLASS defines two things: how to build a random expression that belongs to that expression class and what lisp type those expressions evaluate to.

  • [reader] RESULT-TYPE EXPRESSION-CLASS

    Expressions belonging to this expression class must evaluate to a value of this lisp type.

  • [reader] WEIGHT EXPRESSION-CLASS

    The probability of an expression class to be selected from a set of candidates is proportional to its weight.

  • [class] OPERATOR EXPRESSION-CLASS

    Defines how the symbol NAME in the function position of a list can be combined with arguments: how many and of what types. The following defines + as an operator that adds two FLOATs:

      (make-instance 'operator
                     :name '+
                     :result-type 'float
                     :argument-types '(float float))
    

    See the macro OPERATOR for a shorthand for the above.

    Currently no lambda list keywords are supported and there is no way to define how an expression with a particular operator is to be built. See RANDOM-EXPRESSION.

  • [reader] NAME OPERATOR

    A symbol that's the name of the operator.

  • [reader] ARGUMENT-TYPES OPERATOR

    A list of lisp types. One for each argument of this operator.

  • [macro] OPERATOR (NAME &REST ARG-TYPES) RESULT-TYPE &KEY (WEIGHT 1)

    Syntactic sugar for instantiating operators. The example given for OPERATOR could be written as:

      (operator (+ float float) float)
    

    See WEIGHT for what WEIGHT means.

  • [class] LITERAL EXPRESSION-CLASS

    This is slightly misnamed. An object belonging to the LITERAL class is not a literal itself, it's a factory for literals via its BUILDER function. For example, the following literal builds bytes:

      (make-instance 'literal
                     :result-type '(unsigned-byte 8)
                     :builder (lambda () (random 256)))
    

    In practice, one rarely writes it out like that, because the LITERAL macro provides a more convenient shorthand.

  • [reader] BUILDER LITERAL

    A function of no arguments that returns a random literal that belongs to its literal class.

  • [macro] LITERAL (RESULT-TYPE &KEY (WEIGHT 1)) &BODY BODY

    Syntactic sugar for defining literal classes. The example given for LITERAL could be written as:

      (literal ((unsigned-byte 8))
        (random 256))
    

    See WEIGHT for what WEIGHT means.

  • [function] RANDOM-EXPRESSION OPERATORS LITERALS TYPE TERMINATE-FN

    Return an expression built from OPERATORS and LITERALS that evaluates to values of TYPE. TERMINATE-FN is a function of one argument: the level of the root of the subexpression to be generated in the context of the entire expression. If it returns T then a LITERAL will be inserted (by calling its BUILDER function), else an OPERATOR with all its necessary arguments.

    The algorithm recursively generates the expression starting from level 0, where only operators and literals with a RESULT-TYPE that's a subtype of TYPE are considered, and one is selected with the unnormalized probability given by its WEIGHT. On lower levels, the ARGUMENT-TYPES specification of operators is similarly satisfied, so the resulting expression should evaluate without a type error.

    The building of expressions cannot backtrack. If it finds itself in a situation where no literals or operators of the right type are available then it will fail with an error.
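
The recursive, type-directed generation described for RANDOM-EXPRESSION can be sketched in plain Common Lisp. This is not the library's code: the SKETCH- functions and the plist representation of operators and literals are hypothetical, weights are ignored (candidates are picked uniformly), and the sketch signals an error as soon as the required kind of expression class has no candidate of the right type, which may be stricter than what the library does.

```lisp
;;; Hypothetical sketch, not MGL-GPR's implementation.
;;; An operator is a plist (:name :result-type :argument-types),
;;; a literal is a plist (:result-type :builder).

(defun sketch-pick (candidates)
  "Return a uniformly chosen element of the list CANDIDATES."
  (nth (random (length candidates)) candidates))

(defun sketch-of-type (classes type)
  "Keep the expression classes whose result type is a subtype of TYPE."
  (remove-if-not (lambda (class)
                   (subtypep (getf class :result-type) type))
                 classes))

(defun sketch-random-expression (operators literals type terminate-fn
                                 &optional (level 0))
  "Build a random sexp that evaluates to TYPE, without backtracking."
  (let* ((terminate (funcall terminate-fn level))
         (candidates (sketch-of-type (if terminate literals operators)
                                     type)))
    (when (null candidates)
      (error "No ~:[operator~;literal~] of type ~S at level ~S."
             terminate type level))
    (let ((class (sketch-pick candidates)))
      (if terminate
          ;; Leaf: ask the literal's builder for a fresh value.
          (funcall (getf class :builder))
          ;; Internal node: operator name followed by one randomly
          ;; generated argument per declared argument type.
          (cons (getf class :name)
                (loop for argument-type in (getf class :argument-types)
                      collect (sketch-random-expression
                               operators literals argument-type
                               terminate-fn (1+ level))))))))
```

With a terminate function such as (lambda (level) (<= 2 level)), levels 0 and 1 hold operators and level 2 holds literals, so every generated expression has the same depth and only its shape and leaves vary.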

5 Basics

To start the evolutionary process one creates a GP object, adds to it the individuals that make up the initial population and calls ADVANCE in a loop to move on to the next generation.

  • [class] GP

    The GP class defines the search space and how mutation and recombination occur, and holds various parameters of the evolutionary process as well as the individuals themselves.

  • [function] ADD-INDIVIDUAL GP INDIVIDUAL

    Adds INDIVIDUAL to POPULATION of GP. Usually called to initialize the GP, but it is also allowed to add individuals (or change POPULATION in any way) in between calls to ADVANCE.

  • [function] RANDOM-GP-EXPRESSION GP TERMINATE-FN &KEY (TYPE (TOPLEVEL-TYPE GP))

    Creating the initial population by hand is tedious. This convenience function calls RANDOM-EXPRESSION to create a random individual that produces GP's TOPLEVEL-TYPE. By passing in another TYPE one can create expressions that fit somewhere else in a larger expression which is useful in a RANDOMIZER function.

  • [function] ADVANCE GP

    Create the next generation and place it in POPULATION.

6 Search Space

The search space of the GP is defined by the available operators, literals and the type of the final result produced. The evaluator function acts as the guiding light.

  • [reader] OPERATORS GP

    The set of OPERATORs from which (together with LITERALs) individuals are built.

  • [reader] LITERALS GP

    The set of LITERALs from which (together with OPERATORs) individuals are built.

  • [reader] TOPLEVEL-TYPE GP

    The type of the results produced by individuals. If the problem is to find the minimum of a 1d real function, then this may be the symbol REAL. If the problem is to find the shortest route, then this may be a vector. It all depends on the representation of the problem, the operators and the literals.

  • [reader] EVALUATOR GP

    A function of two arguments: the GP object and the individual. It must return the fitness of the individual. Often, the evaluator just calls EVAL, or COMPILE + FUNCALL, and compares the result to some gold standard. It is also typical to slightly penalize solutions with too many nodes to control complexity and evaluation cost (see COUNT-NODES). Alternatively, one can specify MASS-EVALUATOR instead.

  • [reader] MASS-EVALUATOR GP

    NIL or a function of three arguments: the GP object, the population vector and the fitness vector into which the fitnesses of the individuals in the population vector shall be written. By specifying MASS-EVALUATOR instead of an EVALUATOR, one can, for example, distribute costly evaluations over multiple threads. MASS-EVALUATOR has precedence over EVALUATOR.

  • [function] COUNT-NODES TREE &KEY INTERNAL

    Count the nodes in the sexp TREE. If INTERNAL then don't count the leaves.
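
To make the calling conventions of EVALUATOR and MASS-EVALUATOR concrete, here is a small self-contained sketch. All SKETCH- names are hypothetical and nothing here is taken from the library's source. In particular, the node counter assumes that each list counts as one internal node and each atom in argument position as one leaf; the library's COUNT-NODES may count differently. The target value 7 and the size threshold are arbitrary.

```lisp
;;; Hypothetical helpers; the SKETCH- names are not part of MGL-GPR.

(defun sketch-count-nodes (tree &key internal)
  "Count nodes in the sexp TREE. Assumes each list is one internal node
and each atom in argument position is one leaf."
  (if (atom tree)
      (if internal 0 1)
      (1+ (loop for argument in (rest tree)
                sum (sketch-count-nodes argument :internal internal)))))

(defun sketch-evaluate (gp expr)
  "Fitness in (0, 1]: closeness of EXPR's value to 7, divided by a size
penalty that grows once EXPR has more than 10 nodes."
  (declare (ignore gp))
  (let ((err (abs (- (eval expr) 7)))
        (size (sketch-count-nodes expr)))
    (/ 1 (1+ err) (max 1 (- size 9)))))

(defun sketch-mass-evaluate (gp population fitnesses)
  "Write the fitness of every individual in POPULATION into FITNESSES."
  (dotimes (i (length population))
    ;; A parallel version would hand these calls to worker threads.
    (setf (aref fitnesses i)
          (sketch-evaluate gp (aref population i)))))

(sketch-count-nodes '(+ 7 (sin *x*)))             ; => 4
(sketch-count-nodes '(+ 7 (sin *x*)) :internal t) ; => 2
```

A function shaped like SKETCH-EVALUATE would be passed as :EVALUATOR, and one shaped like SKETCH-MASS-EVALUATE as the mass evaluator; only one of the two is needed.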

7 Reproduction

The RANDOMIZER and SELECTOR functions define how mutation and recombination occur.

  • [reader] RANDOMIZER GP

    Used for mutations, this is a function of three arguments: the GP object, the type the expression must produce and the current expression, which is to be replaced with the returned value. It is called with subexpressions of individuals.

  • [reader] SELECTOR GP

    A function of two arguments: the GP object and a vector of fitnesses. It must return an index into the fitness vector. The individual whose fitness was thus selected will be selected for reproduction, be it copying, mutation or crossover. Typically, this defers to HOLD-TOURNAMENT.

  • [function] HOLD-TOURNAMENT FITNESSES &KEY SELECT-CONTESTANT-FN N-CONTESTANTS

    Select N-CONTESTANTS (all different) for the tournament randomly, represented by indices into FITNESSES and return the one with the highest fitness. If SELECT-CONTESTANT-FN is NIL then contestants are selected randomly with uniform probability. If SELECT-CONTESTANT-FN is a function, then it's called with FITNESSES to return an index (that may or may not be already selected for the tournament). Specifying SELECT-CONTESTANT-FN allows one to conduct 'local' tournaments biased towards a particular region of the index range.
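
Tournament selection with uniformly drawn, distinct contestants can be sketched as follows. SKETCH-HOLD-TOURNAMENT is a hypothetical stand-in, not the library's HOLD-TOURNAMENT: it omits SELECT-CONTESTANT-FN, and its default of two contestants is this sketch's own choice.

```lisp
;;; Hypothetical sketch of tournament selection; not MGL-GPR's code.

(defun sketch-hold-tournament (fitnesses &key (n-contestants 2))
  "Return the index of the fittest of N-CONTESTANTS distinct indices
drawn uniformly from the vector FITNESSES."
  (assert (<= 1 n-contestants (length fitnesses)))
  (let ((contestants '()))
    ;; Keep drawing until enough distinct indices are collected.
    (loop while (< (length contestants) n-contestants)
          do (pushnew (random (length fitnesses)) contestants))
    ;; The contestant with the highest fitness wins.
    (reduce (lambda (best index)
              (if (> (aref fitnesses index) (aref fitnesses best))
                  index
                  best))
            contestants)))
```

More contestants mean stronger selection pressure: with N-CONTESTANTS equal to the population size the fittest individual always wins, while with a single contestant the selection is uniformly random.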

8 Environment

The following are just various knobs to control the environment in which individuals live.

  • [reader] GENERATION-COUNTER GP

    A counter that starts from 0 and is incremented by ADVANCE. All accessors of GP are allowed to be specialized on a subclass of GP which allows them to be functions of GENERATION-COUNTER.

  • [accessor] POPULATION-SIZE GP

    The number of individuals in a generation.

The new generation is created by applying a reproduction operator until POPULATION-SIZE is reached in the new generation. At each step, a reproduction operator is randomly chosen.

  • [accessor] COPY-CHANCE GP

    The probability of the copying reproduction operator being chosen. Copying simply creates an exact copy of a single individual.

  • [accessor] MUTATION-CHANCE GP

    The probability of the mutation reproduction operator being chosen. Mutation creates a randomly altered copy of an individual. See RANDOMIZER.

If neither copying nor mutation were chosen, then a crossover will take place.

  • [accessor] KEEP-FITTEST-P GP

    If true, then the fittest individual is always copied without mutation to the next generation. Of course, it may also have other offspring.
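
The choice between copying, mutation and crossover, and the idea of making a chance depend on the generation by specializing an accessor on a subclass, can be illustrated with stand-in classes. Everything named SKETCH- is hypothetical and merely mimics the slots documented above. Using a single random roll to partition the interval [0, 1) is an assumption of this sketch, and the cooling schedule is arbitrary.

```lisp
;;; Stand-ins for illustration only; not the library's GP class.

(defclass sketch-gp ()
  ((generation-counter :initform 0 :initarg :generation-counter
                       :accessor sketch-generation-counter)
   (copy-chance :initform 0.0 :initarg :copy-chance
                :accessor sketch-copy-chance)
   (mutation-chance :initform 0.5 :initarg :mutation-chance
                    :accessor sketch-mutation-chance)))

(defclass sketch-cooling-gp (sketch-gp) ())

;; Specializing the reader on a subclass turns the mutation chance
;; into a function of the generation counter.
(defmethod sketch-mutation-chance ((gp sketch-cooling-gp))
  (max 0.1 (- 0.9 (* 0.01 (sketch-generation-counter gp)))))

(defun sketch-choose-operator (gp &optional (roll (random 1.0)))
  "Return :COPY, :MUTATE or :CROSSOVER for ROLL, a number in [0, 1)."
  (let ((copy (sketch-copy-chance gp))
        (mutation (sketch-mutation-chance gp)))
    (cond ((< roll copy) :copy)
          ((< roll (+ copy mutation)) :mutate)
          ;; Neither copying nor mutation was chosen.
          (t :crossover))))
```

With the cooling subclass, early generations mutate often and later generations rely mostly on crossover, without the loop that advances the generations having to change any parameter by hand.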

9 Individuals

  • [accessor] POPULATION GP

    An adjustable array with a fill-pointer that holds the individuals that make up the population.

  • [reader] FITTEST GP

    The fittest individual ever to be seen by this GP and its fitness as a cons cell.

  • [accessor] FITTEST-CHANGED-FN GP

    If non-NIL, a function that's called when FITTEST is updated with three arguments: the GP object, the fittest individual and its fitness. Useful for tracking progress.


[generated by MGL-PAX]
