Using the Scikit-Learn Wrapper#
A no-fuss way of using Bingo is through its scikit-learn wrapper: SymbolicRegressor. Let's set up a test case to show how it works.
Setting Up the Regressor#
There are many options that can be set in SymbolicRegressor. Here we set some basic ones, including population_size (the number of equations in a population), stack_size (the maximum number of nodes per equation), and use_simplification (whether to simplify equations, which speeds up evaluation and makes them easier to read). You can see all of SymbolicRegressor's options here.
[1]:
from bingo.symbolic_regression.symbolic_regressor import SymbolicRegressor

regressor = SymbolicRegressor(population_size=100, stack_size=16,
                              use_simplification=True,
                              max_time=20)
Training Data#
Here we’re just creating some dummy training data from the equation \(5.0 X_0^2 + 3.5 X_0\). More on training data can be found in the data formatting guide.
[47]:
import numpy as np
np.random.seed(0)
X_0 = np.linspace(-10, 10, num=30).reshape((-1, 1))
X = np.array(X_0)
y = 5.0 * X_0 ** 2 + 3.5 * X_0
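As a quick sanity check, the target equation can be evaluated by hand at a boundary point; a minimal sketch (no Bingo required, just the same NumPy setup as above):

```python
import numpy as np

# Same dummy training data as above: y = 5.0*x^2 + 3.5*x
X_0 = np.linspace(-10, 10, num=30).reshape((-1, 1))
y = 5.0 * X_0 ** 2 + 3.5 * X_0

# At x = -10: 5.0*100 + 3.5*(-10) = 500 - 35 = 465
print(y.shape)         # (30, 1)
print(float(y[0, 0]))  # 465.0
```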
[ ]:
import matplotlib.pyplot as plt
plt.scatter(X, y)
plt.xlabel("X_0")
plt.ylabel("y")
plt.title("Training Data")
plt.show()
Fitting the Regressor#
Fitting is as simple as calling the .fit() method.
[ ]:
regressor.fit(X, y)
Getting the Best Individual#
[ ]:
best_individual = regressor.get_best_individual()
print("best individual is:", best_individual)
Predicting Data with the Best Individual#
You can use the regressor's .predict(X) or the best_individual's .predict(X) to get predictions for X.
[51]:
pred_y = regressor.predict(X)        # predict with the regressor ...
pred_y = best_individual.predict(X)  # ... or, equivalently, with the best individual
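A common next step is to score the predictions, e.g. with mean squared error (the same quantity SymbolicRegressor uses as fitness in the Pareto-front plot below). A minimal sketch, using a hand-written stand-in for `regressor.predict(X)` so it runs without a trained model; the 3.4 coefficient is a deliberately slightly-off hypothetical candidate:

```python
import numpy as np

# Training inputs/targets, mirroring the cells above
X_0 = np.linspace(-10, 10, num=30).reshape((-1, 1))
y = 5.0 * X_0 ** 2 + 3.5 * X_0

# Stand-in for regressor.predict(X): a slightly-off candidate equation
pred_y = 5.0 * X_0 ** 2 + 3.4 * X_0

# Mean squared error between predictions and targets
mse = float(np.mean((y - pred_y) ** 2))
print(mse)
```

A perfect recovery of the target equation would drive this to (numerically) zero.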
[ ]:
plt.scatter(X, y)
plt.plot(X, pred_y, 'r')
plt.xlabel("X_0")
plt.ylabel("y")
plt.legend(["Actual", "Predicted"])
plt.show()
Checking out the Pareto front#
The regressor has a get_pareto_front() function that can be used to investigate the tradeoff of fitness and complexity.
[ ]:
pareto_front = regressor.get_pareto_front()
plt.step([i.complexity for i in pareto_front],
         [max(i.fitness, 1e-20) for i in pareto_front],
         'o-')
for equ in pareto_front:
    plt.text(equ.complexity,
             max(equ.fitness, 1e-20) * 3,
             str(equ))
plt.yscale("log")
plt.xlabel("Complexity")
plt.ylabel("Fitness (MSE)")