Formatting Data
All Bingo equations expect data to be arranged in arrays whose shape is determined by the number of variables and datapoints in the dataset.
Input
Bingo expects that input data is formatted with each variable as a column and each datapoint as a row.
Layout of inputs:
\(i\) | \(X_0\) | \(X_1\) | \(\ldots\) | \(X_n\) |
---|---|---|---|---|
0 | 0.1 | 1.2 | \(\ldots\) | 1.2 |
1 | 0.1 | 2.3 | \(\ldots\) | 3.5 |
2 | 0.1 | 1.2 | \(\ldots\) | 6.0 |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
Note
Bingo starts counting at 0, so \(X_0\) is the first variable, \(X_1\) is the second, and so on.
So, if we had 2 variables and 10 samples, we would have an array with 10 rows and 2 columns:
import numpy as np
X_0 = np.linspace(1, 10, num=10).reshape((-1, 1))
X_1 = np.linspace(-10, 1, num=10).reshape((-1, 1))
X = np.hstack((X_0, X_1))
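As a quick sanity check, the shape of the stacked array can be inspected with plain NumPy. This is only a sketch of how one might confirm the row and column layout described above; the names `n_samples`, `n_variables`, and `first_variable` are illustrative and not part of Bingo.

```python
import numpy as np

# Rebuild the same two-variable, ten-sample input as above
X_0 = np.linspace(1, 10, num=10).reshape((-1, 1))
X_1 = np.linspace(-10, 1, num=10).reshape((-1, 1))
X = np.hstack((X_0, X_1))

# Rows are datapoints, columns are variables
n_samples, n_variables = X.shape
print(n_samples, n_variables)

# Column 0 holds every sample of the first variable, X_0
first_variable = X[:, 0]
```

Slicing a column with `X[:, 0]` returns a one-dimensional array of length 10, which is why the example above uses `reshape((-1, 1))` to keep each variable as a column before stacking.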
Output
Bingo expects output data to be formatted as an array with the same number of samples as the input.
Layout of output:
\(i\) | \(0\) | \(1\) | \(\ldots\) | \(n\) |
---|---|---|---|---|
\(y_i\) | 0.0 | -1.1 | \(\ldots\) | 5.0 |
Using the previous setup, let’s create output data by using the equation \(5.0 * X_0 + X_1\):
y = 5.0 * X_0 + X_1
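To confirm that the output lines up with the input, one can compare the number of rows in each array. The following is a minimal sketch using only NumPy; it repeats the setup so it runs on its own, and the check itself is an illustration rather than something Bingo requires.

```python
import numpy as np

# Same inputs as in the Input section
X_0 = np.linspace(1, 10, num=10).reshape((-1, 1))
X_1 = np.linspace(-10, 1, num=10).reshape((-1, 1))
X = np.hstack((X_0, X_1))

# Because X_0 and X_1 are both (10, 1) columns, y is also a (10, 1) column
y = 5.0 * X_0 + X_1

# One output value per input sample
same_sample_count = y.shape[0] == X.shape[0]
print(y.shape, same_sample_count)
```

Because `X_0` and `X_1` were kept as columns, the arithmetic is elementwise and `y` has one entry per row of `X`.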