GPSR#
An important concept for understanding Bingo is GPSR. GPSR stands for genetic programming with symbolic regression. Symbolic regression involves finding an equation that models data and genetic programming is a method to do symbolic regression by modeling evolution.
Equations#
GPSR uses the process of evolution to find equations that model data well.
To accomodate the genetic-like process of GPSR, we encode equations as
directed acyclic graphs: AGraphs
. AGraphs
have nodes and connections
between nodes. These nodes are either terminals, which load data, or operators,
which perform operations (e.g., addition, multiplication, etc.).
Consider the AGraph
above which represents the equation
\(C_0 X_0 + X_0 + X_1\). Notice, how there are terminal nodes at the bottom
of the AGraph
\(C_0\), \(X_0\), and \(X_1\). Node \(C_0\)
loads constant 0, which is a free-form numeric value that can change based on
data (i.e. \(C_0\) could be -1.2, 5.0, etc. depending on its setting).
Nodes \(X_0\) and \(X_1\) load the first and second variables of the
dataset respectively. There are also operators that turn the terminals
into a full-scale equation. For example, the multiplication node results
in \(C_0 * X_0 = C_0 X_0\) and the root node results in the entire
equation \(C_0 X_0 + X_0 + X_1\).
Evolution#
GPSR works by evolving a population of equations. It evolves equations in stages: variation, evaluation, and selection. It continues to do this until a good enough equation is found or another criteria is met (see the picture below).
Variation#
In order to get better individuals, we have to change those that are already in the population. This is usually done through two possible operators: mutation and crossover. Mutation takes an equation and slightly changes it.
Crossover takes two equations and mixes their parts together. Some of the individuals from our population undergo one, both, or neither of these operators to form a child population.
Evaluation#
In order to make decisions about which individuals to keep, we want to be able to gauge how good/fit each individual is. Evaluation is the process of assigning fitness values to individuals which mark how good/bad they are at the particular task that we care about.
Selection#
We now have a population of parents and children with scores associated with each individual, so we can decide which ones should move onto the next generation. Selection is the process of deciding which individuals will continue in the evolutionary process. This is usually done by looking at individual’s fitness and mixing in some randomness or other factors.
Finishing#
Once we have reached a termination criteria (e.g., we evolved for a certain number of generations or we found a good enough individual), we stop evolving and get the final population. We can then select some individual from that population (or the entire run) to use to do our task.