Using the Scikit-Learn Wrapper#

A no-fuss way of using Bingo is through its scikit-learn wrapper: SymbolicRegressor. Let’s set up a test case to show how it works.

Setting Up the Regressor#

There are many options that can be set in SymbolicRegressor. Here we set some basic ones, including population_size (the number of equations in a population), stack_size (the maximum number of nodes per equation), and use_simplification (whether to simplify equations, which speeds up evaluation and makes them easier to read). You can see all of SymbolicRegressor’s options here.

[1]:
from bingo.symbolic_regression.symbolic_regressor import SymbolicRegressor
regressor = SymbolicRegressor(population_size=100, stack_size=16,
                              use_simplification=True,
                              max_time=20)

Training Data#

Here we’re just creating some dummy training data from the equation \(5.0 X_0^2 + 3.5 X_0\). More on training data can be found in the data formatting guide.

[47]:
import numpy as np
np.random.seed(0)
X_0 = np.linspace(-10, 10, num=30).reshape((-1, 1))
X = np.array(X_0)  # input matrix with a single feature, X_0
y = 5.0 * X_0 ** 2 + 3.5 * X_0
[ ]:
import matplotlib.pyplot as plt
plt.scatter(X, y)
plt.xlabel("X_0")
plt.ylabel("y")
plt.title("Training Data")
plt.show()

Fitting the Regressor#

Fitting is as simple as calling the .fit() method.

[ ]:
regressor.fit(X, y)

Getting the Best Individual#

[ ]:
best_individual = regressor.get_best_individual()
print("best individual is:", best_individual)

Predicting Data with the Best Individual#

You can use either the regressor’s .predict(X) or best_individual’s .predict(X) to get predictions for X.

[51]:
pred_y = regressor.predict(X)        # predict via the regressor
pred_y = best_individual.predict(X)  # equivalently, via the best individual
[ ]:
plt.scatter(X, y)
plt.plot(X, pred_y, 'r')
plt.xlabel("X_0")
plt.ylabel("y")
plt.legend(["Actual", "Predicted"])
plt.show()
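The fitness values reported for equations on the Pareto front (next section) are mean squared errors. As a pure-NumPy illustration (not Bingo code), here is that metric computed for a hypothetical simpler candidate that drops the linear term from the true equation:

```python
import numpy as np

# Recreate the dummy training data from above.
X_0 = np.linspace(-10, 10, num=30).reshape((-1, 1))
y = 5.0 * X_0 ** 2 + 3.5 * X_0

# A hypothetical, simpler candidate equation that drops the 3.5 * X_0 term.
candidate = 5.0 * X_0 ** 2

# Mean squared error: the fitness plotted on the Pareto front below.
mse = np.mean((candidate - y) ** 2)
print(mse)
```

A candidate with fewer terms will generally have higher (worse) fitness but lower complexity, which is exactly the trade-off the Pareto front captures.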

Checking out the Pareto front#

The regressor has a get_pareto_front() function that can be used to investigate the trade-off between fitness and complexity.

[ ]:
pareto_front = regressor.get_pareto_front()
plt.step([i.complexity for i in pareto_front],
         [max(i.fitness, 1e-20) for i in pareto_front],
         'o-')
for equ in pareto_front:
    plt.text(equ.complexity,
             (max(equ.fitness, 1e-20))*3,
             str(equ))
plt.yscale("log")
plt.xlabel("Complexity")
plt.ylabel("Fitness (MSE)")
plt.show()
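One way to pick a final model from such a front is to take the simplest equation whose fitness is within some tolerance of the best. A pure-Python sketch, using hypothetical (complexity, fitness) pairs in place of real get_pareto_front() output:

```python
# Hypothetical (complexity, fitness) pairs standing in for a Pareto front;
# real values would come from regressor.get_pareto_front().
front = [(1, 4.2e3), (5, 8.9e1), (7, 1.3e-11), (11, 9.8e-12)]

# Best (lowest) fitness anywhere on the front.
best_fitness = min(fitness for _, fitness in front)

# Simplest equation whose fitness is within 10x of the best.
chosen_complexity = min(
    complexity for complexity, fitness in front
    if fitness <= 10 * best_fitness
)
print(chosen_complexity)  # -> 7
```

Here the complexity-11 equation only marginally improves on the complexity-7 one, so the simpler model is preferred. The tolerance factor is a judgment call that depends on how much accuracy you are willing to trade for readability.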