Techniques#

The techniques used by UQPCE are discussed in this section, including the additional statistics, confidence intervals on model parameters, and experimental designs.

Statistics#

UQPCE provides several useful model statistics when the --stats flag is used. These statistics are discussed below.

R-Squared#

The \(R^2\) statistic is a measure of how well the results are modeled by the surrogate model.

To calculate the \(R^2\) statistic, the total sum of squares, \(SS_T\), and the error sum of squares, \(SS_E\), must first be calculated

\[ SS_T = y'y - \frac{\big(\sum_{i=1}^{n} y_i\big)^2}{n} \]
\[ SS_E = y'y - \hat{\beta}'X'y \]

From these two statistics, we can calculate \(R^2\)

\[ R^2 = 1 - \frac{SS_E}{SS_T} \]

[Mon13]
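The two sums of squares above translate directly into a short NumPy sketch (the function name and interface are illustrative, not UQPCE's API):

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares and return R^2 for the surrogate fit.

    X : (n, p) model matrix whose columns are the basis terms.
    y : (n,) vector of responses.
    """
    n = len(y)
    # Least-squares estimate beta_hat of the model coefficients
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Total sum of squares: SS_T = y'y - (sum y_i)^2 / n
    ss_t = y @ y - y.sum() ** 2 / n
    # Error sum of squares: SS_E = y'y - beta_hat' X' y
    ss_e = y @ y - beta_hat @ X.T @ y
    return 1.0 - ss_e / ss_t
```

For a response that the model reproduces exactly, \(SS_E\) vanishes and \(R^2 = 1\); noise in the response lowers the statistic toward zero.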

R-Squared Adjusted#

The \(R^2_{adj}\) statistic, much like \(R^2\), is a measure of how well the results are modeled by the surrogate model. However, this statistic also accounts for the number of terms in the model. If terms that only slightly improve the model are included, \(R^2_{adj}\) is penalized.

The \(R^2_{adj}\) is given by

\[ R^2_{adj} = 1\ -\ \frac{n - 1}{n - p}\bigg(1\ -\ R^2\bigg) \]

[Mon13]
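The adjustment is a one-line correction on top of \(R^2\) (a minimal sketch; the function name is illustrative):

```python
def r_squared_adj(r2, n, p):
    """Adjusted R^2 for n samples and a model with p terms (intercept included).

    Penalizes R^2 as marginally useful terms inflate p.
    """
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)
```

For example, with \(n = 20\) samples and \(p = 5\) terms, an \(R^2\) of 0.90 adjusts down to about 0.873.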

Stepwise Regression#

Stepwise regression is used to intelligently build the surrogate model. The equations used in stepwise regression are discussed below.

Stepwise regression uses an F-distributed variable to determine whether or not a term will be added to the model. The F-distribution critical values used by UQPCE are shown below

\[ F_{in} = F_{\alpha_{in}}(df_1 = 1, df_2 = n_{iter}-p_{iter}) \]
\[ F_{out} = F_{\alpha_{out}}(df_1 = 1, df_2 = n_{iter}-p_{iter}) \]

where \(\alpha_{in}\) and \(\alpha_{out}\) are the significance levels for adding and removing terms, respectively, and \(F_{in} \geq F_{out}\).
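These critical values are upper-tail quantiles of the F-distribution and can be evaluated with SciPy (a sketch; the helper name is illustrative). Note that \(\alpha_{in} \leq \alpha_{out}\) guarantees \(F_{in} \geq F_{out}\):

```python
from scipy.stats import f

def f_thresholds(alpha_in, alpha_out, n_iter, p_iter):
    """Entry/exit F critical values for stepwise regression.

    F_alpha(df1=1, df2=n_iter - p_iter) is the upper alpha-quantile of the
    F-distribution, i.e. scipy's ppf evaluated at 1 - alpha.
    """
    df2 = n_iter - p_iter
    f_in = f.ppf(1.0 - alpha_in, 1, df2)
    f_out = f.ppf(1.0 - alpha_out, 1, df2)
    return f_in, f_out
```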

The difference in the regression sum of squares, \(SS_R\), due to one model term is calculated between the model that includes the most terms and the model that omits term \(j\).

\[ SS_R(\beta_j|\beta_i,\beta_0) = SS_R(\beta_i,\beta_j|\beta_0) - SS_R(\beta_i|\beta_0) \]

The mean squared error, \(MS_E\), is calculated for the model that includes the most terms

\[ MS_E(x_j, x_i) = \frac{y'y - \hat{\beta}'X'y}{n_{iter}-p_{iter}} \]
\[ F_j = \frac{SS_R(\beta_j|\beta_i,\beta_0)}{MS_{E_{full}}} \]

where \(\beta_0\) is the intercept term, \(\beta_i\) is the most correlated term, and \(\beta_j\) is the \(j^{th}\) remaining term.

Stepwise regression consists of the following steps:

  1. Build a model consisting of the intercept and the most-correlated term.

  2. Calculate the partial F-statistic for the remaining terms.

  3. Add the term whose addition yields the largest partial F-statistic, provided \(F_j > F_{in}\).

  4. Calculate the partial F-statistic for removing each of the terms in the current model.

  5. Remove the term with the smallest partial F-statistic if \(F_j < F_{out}\).

  6. Repeat steps 2 through 5 until no terms are added or removed.

[Mon07]
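The steps above can be sketched with NumPy and SciPy. This is a minimal illustration of the procedure, not UQPCE's internal implementation; the function names and the dictionary interface are assumptions:

```python
import numpy as np
from scipy.stats import f as f_dist

def stepwise_select(candidates, y, alpha_in=0.05, alpha_out=0.10):
    """Sketch of stepwise term selection (illustrative, not UQPCE's API).

    candidates : dict mapping term name -> (n,) array of basis evaluations.
    Returns the selected term names; the intercept is always included.
    """
    n = len(y)

    def ss_e(terms):
        # Error sum of squares y'y - beta_hat' X' y for an OLS fit
        X = np.column_stack([np.ones(n)] + [candidates[t] for t in terms])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y @ y - beta_hat @ X.T @ y

    # Step 1: start with the intercept and the most-correlated term.
    corr = {k: abs(np.corrcoef(v, y)[0, 1]) for k, v in candidates.items()}
    model = [max(corr, key=corr.get)]

    for _ in range(2 * len(candidates)):   # guard against cycling
        changed = False
        # Steps 2-3: add the remaining term with the largest partial F.
        remaining = [k for k in candidates if k not in model]
        if remaining:
            best, best_f = None, -np.inf
            for k in remaining:
                p = len(model) + 2                      # intercept + terms + k
                ss_r = ss_e(model) - ss_e(model + [k])  # SS_R(beta_k | rest)
                ms_e = ss_e(model + [k]) / (n - p)      # MS_E of larger model
                if ss_r / ms_e > best_f:
                    best, best_f = k, ss_r / ms_e
            if best_f > f_dist.ppf(1 - alpha_in, 1, n - len(model) - 2):
                model.append(best)
                changed = True
        # Steps 4-5: remove the in-model term with the smallest partial F.
        if len(model) > 1:
            p = len(model) + 1
            ms_e_full = ss_e(model) / (n - p)
            f_stats = {k: (ss_e([t for t in model if t != k]) - ss_e(model))
                          / ms_e_full for k in model}
            worst = min(f_stats, key=f_stats.get)
            if f_stats[worst] < f_dist.ppf(1 - alpha_out, 1, n - p):
                model.remove(worst)
                changed = True
        if not changed:   # step 6: stop once nothing is added or removed
            break
    return model
```

Because \(F_{in} \geq F_{out}\) when \(\alpha_{in} \leq \alpha_{out}\), a freshly added term is not immediately removed in the same pass.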

Backward Elimination#

Backward elimination is the process of removing model terms that fall below some significance threshold, \(\alpha_{out}\), to reduce model overfitting. This method begins by building a full PCE model, then deletes the least significant term at each iteration until a stopping condition is satisfied. To do this, the \(F\)-distributed critical value shown in Equation (12) determines the significance of each term

(12)#\[ F_{out} = F_{\alpha_{out}}(df_1 = 1, df_2 = n_{iter}-p_{iter}) \]

The difference in the regression sum of squares, \(SS_R\), due to one model term is calculated between the model that includes the most terms and the model that omits term \(j\).

\[ SS_R(\beta_j|\beta_i,\beta_0) = SS_R(\beta_i,\beta_j|\beta_0) - SS_R(\beta_i|\beta_0) \]

The mean squared error, \(MS_E\), is calculated for the model that includes the most terms

\[ MS_E(x_j, x_i) = \frac{y'y - \hat{\beta}'X'y}{n_{iter}-p_{iter}} \]

Equation (13) shows the ratio of the \(SS_R\) due to term \(j\) and the \(MS_E\) of the model that includes term \(j\); this ratio determines the effect of the term on the model. In the equation, \(\beta_0\) is the intercept term and \(\beta_j\) is the \(j^{th}\) term.

(13)#\[ F_j = \frac{SS_R(\beta_j|\beta_{0...p})}{MS_{E_{full}}} \]

The following steps complete the backward elimination process:

  1. Build the full polynomial chaos expansion model.

  2. Calculate the partial F-statistic for all terms.

  3. Remove the term with the smallest partial F-statistic if \(F_j < F_{out}\).

  4. Repeat steps 2 and 3 until \(F_j \geq F_{out}\).

[Mon07]
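The procedure can be sketched in a few lines of NumPy and SciPy. As with the stepwise example, this is an illustration of the algorithm, not UQPCE's internal code; the function name and inputs are assumptions:

```python
import numpy as np
from scipy.stats import f as f_dist

def backward_eliminate(candidates, y, alpha_out=0.05):
    """Backward elimination sketch: start from the full model and drop the
    least significant term while its partial F falls below F_out.

    candidates : dict mapping term name -> (n,) array of basis evaluations.
    """
    n = len(y)
    model = list(candidates)   # step 1: the full model

    def ss_e(terms):
        # Error sum of squares y'y - beta_hat' X' y for an OLS fit
        X = np.column_stack([np.ones(n)] + [candidates[t] for t in terms])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y @ y - beta_hat @ X.T @ y

    while model:
        p = len(model) + 1                   # terms plus intercept
        ms_e_full = ss_e(model) / (n - p)    # MS_E of the current full model
        # Step 2: partial F for each term, SS_R(beta_j | all others) / MS_E
        f_stats = {}
        for k in model:
            reduced = [t for t in model if t != k]
            f_stats[k] = (ss_e(reduced) - ss_e(model)) / ms_e_full
        worst = min(f_stats, key=f_stats.get)
        f_out = f_dist.ppf(1 - alpha_out, 1, n - p)
        if f_stats[worst] >= f_out:          # step 4: stopping condition
            break
        model.remove(worst)                  # step 3: drop the weakest term
    return model
```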

Individual Variable Order#

Individual variable order allows one or more variables to have a different order than the other variables. Allowing a higher-order input for specified variables is intended for users who know which variables have a higher order in their model.

Example#

To use this option, set order for the desired variables as shown below, where Variable 4 has order: 2 while the model order is order: 1

Variable 0:
    distribution: normal
    mean: 1
    stdev: 0.5
    type: aleatory
Variable 1:
    distribution: uniform
    interval_low: 1.75
    interval_high: 2.25
    type: aleatory
Variable 2:
    distribution: exponential
    lambda: 3
    type: aleatory
Variable 3:
    distribution: beta
    alpha: 0.5
    beta: 2.0
    type: aleatory
Variable 4:
    distribution: gamma
    alpha: 1.0
    theta: 0.5
    type: aleatory
    order: 2
    
Settings:
    order: 1
    version: true
    verbose: true
    plot: true
    plot_stand: true
    model_conf_int: true
    stats: true
    verify: true

This results in a model that includes 1st-order terms for all variables and 2nd-order terms only for variable x4, as shown below:

intercept
x0
x1
x2
x3
x4
x0*x4
x1*x4
x2*x4
x3*x4
x4^2
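One selection rule consistent with the term list above (an inference from this example, not necessarily UQPCE's internal algorithm) is: each variable's degree stays within its own order, and a term's total degree may not exceed the highest order among the variables it actually uses. A small sketch:

```python
from itertools import product

def model_terms(orders):
    """Enumerate term multi-indices for per-variable orders.

    The inclusion rule is a guess inferred from the documented example:
    alpha_i <= orders[i] for every variable, and sum(alpha) may not exceed
    the largest order among the variables the term uses.
    """
    terms = []
    for alpha in product(*(range(o + 1) for o in orders)):
        total = sum(alpha)
        used = [orders[i] for i, a in enumerate(alpha) if a > 0]
        if total == 0 or total <= max(used):
            terms.append(alpha)
    return terms

def term_name(alpha):
    """Render a multi-index such as (1, 0, 0, 0, 2) as 'x0*x4^2'."""
    parts = [f"x{i}" + (f"^{a}" if a > 1 else "")
             for i, a in enumerate(alpha) if a > 0]
    return "*".join(parts) or "intercept"
```

Under this rule, orders of [1, 1, 1, 1, 2] reproduce exactly the eleven terms listed: no x0*x1-style cross terms appear, because both factors are order-1 variables while the product has total degree 2.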

Warning

This option requires the user to know their data and analytical tool well. Do not use this option if you’re unsure of which, if any, variables have a higher order term.