
Hazard Extraction and Analysis of Trends (HEAT): ICS-209-PLUS

This notebook provides a demonstration of the hazard extraction and analysis of trends (HEAT) framework applied to the ICS-209-PLUS dataset, available at https://data.nal.usda.gov/dataset/data-all-hazards-dataset-mined-us-national-incident-management-system-1999%E2%80%932014

This example uses the trend analysis module from MIKA’s knowledge discovery toolkit, as well as the Data and ICS utilities.

Before running the analysis in this notebook, hazards must be extracted with BERTopic, via MIKA's Topic Model Plus, by running the ICS_hazard_extraction_script.

Imports

[1]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.style
matplotlib.style.use("seaborn-v0_8")
import matplotlib.pyplot as plt
import matplotlib.cm as cm
plt.rcParams["font.family"] = "Times New Roman"
import seaborn as sn
sn.color_palette("hls", 17)
import scipy.stats as st
import math
import dill
from pingouin import rcorr

[2]:
import sys
import os
sys.path.append(os.path.join("..", "..", "..", ".."))

from mika.kd.trend_analysis import *
from mika.utils.ICS import *
from mika.utils import Data

For figure consistency, global configuration variables are defined here.

[3]:
figsize = (6, 4)
fontsize = 14
matrix_figsize = (8,9)
matrix_fontsize = 10

Data Import

The preprocessed ICS-209-PLUS dataset is loaded using the Data utility:

[4]:
document_id_col = "Unique IDs"
extra_cols = ["CY","DISCOVERY_DATE", "START_YEAR", "REPORT_DOY", "DISCOVERY_DOY",
              "TOTAL_PERSONNEL", "TOTAL_AERIAL", "PCT_CONTAINED_COMPLETED"]
list_of_attributes = ["Combined Text"]
file = os.path.join('topic_model_results', 'preprocessed_data_combined_text.csv')
ICS = Data()
ICS.load(file, preprocessed=True, id_col=document_id_col, text_columns=["Combined Text"], preprocessed_kwargs={'drop_short_docs':False, 'drop_duplicates':True})

Now the preprocessed dataframe that will be used is defined:

[5]:
preprocessed_df = ICS.data_df

The dataframe is then filtered to ensure it includes the correct years and incidents.

[6]:
incident_file = os.path.join(os.path.abspath(os.path.join(os.getcwd(), os.pardir, os.pardir, os.pardir, os.pardir)),'data','ICS','summary_reports_cleaned.csv')
incident_summary_df = pd.read_csv(incident_file,low_memory=False)
incident_summary_df = incident_summary_df.drop("Unnamed: 0", axis=1)
incident_summary_df = incident_summary_df.loc[incident_summary_df["START_YEAR"]>=2006].reset_index(drop=True)

fire_ids = incident_summary_df['INCIDENT_ID'].unique()
sitrep_ids = preprocessed_df['INCIDENT_ID'].unique()
incident_summary_df = incident_summary_df[incident_summary_df['INCIDENT_ID'].isin(sitrep_ids)].reset_index(drop=True)

Hazard Extraction

First, hazards are identified in documents using the BERTopic modeling results and a hazard interpretation spreadsheet created from the topics.

The hazard file and topic modeling results file are specified here:

[7]:
hazard_file =  os.path.join('topic_model_results','hazard_interpretation_v1.xlsx')
results_file = os.path.join('topic_model_results',"Combined Text Sentences_BERT_topics_modified.csv")
[8]:
hazard_interpretation_df = pd.read_excel(hazard_file, sheet_name='topic-focused')
categories = hazard_interpretation_df['Hazard Category'].tolist()
hazards = hazard_interpretation_df['Hazard name'].tolist()

The hazard and results files are passed into the identify_docs_per_hazard function, which returns the frequency and documents associated with each hazard, as well as the hazard words and topics for each document. Hazard metrics specific to the ICS are calculated using calc_ICS_metrics:

[9]:
# frequency, docs_per_hazard, hazard_words_per_doc, topics_per_doc, hazard_topics_per_doc = identify_docs_per_hazard(hazard_file, preprocessed_df, results_file, text_field='Combined Text', results_text_field='Combined Text Sentences_BERT_to', time_field="CY", id_field='Unique IDs', doc_topic_dist_field=None, topic_thresh=0.0)
# time_of_occurence_days, time_of_occurence_pct_contained, frequency, fires, frequency_fires = calc_ICS_metrics(docs_per_hazard, preprocessed_df, id_col="INCIDENT_ID", unique_ids_col='Unique IDs', rm_outliers=False)

Results can be saved for future use with dill, which extends Python's pickle module.

[10]:
# with open("OTTO_days_w_outliers.pkl", "wb") as f:
#     dill.dump(time_of_occurence_days, f)
# with open("OTTO_pct_w_outliers.pkl", "wb") as f:
#     dill.dump(time_of_occurence_pct_contained, f)
# with open("frequency_w_outliers.pkl", "wb") as f:
#     dill.dump(frequency, f)
# with open("fires_w_outliers.pkl", "wb") as f:
#     dill.dump(fires, f)
# with open("frequency_fires_w_outliers.pkl", "wb") as f:
#     dill.dump(frequency_fires, f)
# with open("docs_per_hazard_w_outliers.pkl", "wb") as f:
#     dill.dump(docs_per_hazard, f)
# with open("hazard_words_per_doc_w_outliers.pkl", "wb") as f:
#     dill.dump(hazard_words_per_doc, f)
# with open("topics_per_doc_w_outliers.pkl", "wb") as f:
#     dill.dump(topics_per_doc, f)
# with open("hazard_topics_per_doc_w_outliers.pkl", "wb") as f:
#     dill.dump(hazard_topics_per_doc, f)

Since the results have already been saved, we can load them with dill:

[11]:
with open("OTTO_days_w_outliers.pkl", "rb") as f:
    time_of_occurence_days = dill.load(f)
with open("OTTO_pct_w_outliers.pkl", "rb") as f:
    time_of_occurence_pct_contained = dill.load(f)
with open("frequency_w_outliers.pkl", "rb") as f:
    frequency = dill.load(f)
with open("fires_w_outliers.pkl", "rb") as f:
    fires = dill.load(f)
with open("frequency_fires_w_outliers.pkl", "rb") as f:
    frequency_fires = dill.load(f)
with open("docs_per_hazard_w_outliers.pkl", "rb") as f:
    docs_per_hazard = dill.load(f)
with open("hazard_words_per_doc_w_outliers.pkl", "rb") as f:
    hazard_words_per_doc = dill.load(f)
with open("topics_per_doc_w_outliers.pkl", "rb") as f:
    topics_per_doc = dill.load(f)
with open("hazard_topics_per_doc_w_outliers.pkl", "rb") as f:
    hazard_topics_per_doc = dill.load(f)
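The repeated open/load pairs above could be collapsed into a loop over result names. The following is a minimal sketch, not part of MIKA: save_results and load_results are hypothetical helpers, the toy dictionary is invented, and the standard-library pickle module stands in for dill so the example has no third-party dependency (dill offers dump and load functions with the same call pattern).

```python
import os
import pickle
import tempfile


def save_results(results, directory, suffix="_w_outliers.pkl"):
    """Write each named result to its own file (hypothetical helper)."""
    for name, obj in results.items():
        with open(os.path.join(directory, name + suffix), "wb") as f:
            pickle.dump(obj, f)


def load_results(names, directory, suffix="_w_outliers.pkl"):
    """Read the named results back into a dictionary (hypothetical helper)."""
    loaded = {}
    for name in names:
        with open(os.path.join(directory, name + suffix), "rb") as f:
            loaded[name] = pickle.load(f)
    return loaded


# Toy stand-ins for the notebook's result dictionaries
toy_results = {
    "frequency": {"Wind": {2006: 3, 2007: 5}},
    "fires": {"Wind": ["fire_a", "fire_b"]},
}

with tempfile.TemporaryDirectory() as tmp:
    save_results(toy_results, tmp)
    restored = load_results(toy_results.keys(), tmp)
```

Keeping the names in one place makes it harder for the save and load cells to drift apart.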

To evaluate the quality of the hazard extraction, we recommend randomly sampling 1000 documents to manually label as containing or not containing each hazard. This can be done using the sample_for_accuracy function. The 1000 document set is split into two 500 document sets for validation and testing.

[12]:
results_path=os.path.join('topic_model_results')
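The sampling step described above can be pictured as follows. This is a sketch under stated assumptions and not the implementation of sample_for_accuracy: the function name sample_validation_and_test, the seed, and the toy document ids are all hypothetical.

```python
import random


def sample_validation_and_test(doc_ids, n_total=1000, seed=0):
    """Draw n_total ids without replacement and split them into two halves."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(list(doc_ids), n_total)
    half = n_total // 2
    return sampled[:half], sampled[half:]


# Hypothetical ids standing in for the "Unique IDs" column
doc_ids = [f"doc_{i}" for i in range(5000)]
validation_ids, test_ids = sample_validation_and_test(doc_ids)
```

Sampling once and then splitting guarantees that no document appears in both the validation and the test set.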

First we calculate the classification metrics on the validation set:

[13]:
metrics, true, pred = calc_classification_metrics(os.path.join('topic_model_results', 'labeled_ICS.csv'), docs_per_hazard=docs_per_hazard, id_col='Unique IDs')

Next we calculate the classification metrics on the test set:

[14]:
test_metrics, _, _ = calc_classification_metrics(os.path.join('topic_model_results', 'labeled_ICS_test_set_full.csv'), docs_per_hazard=docs_per_hazard, id_col='Unique IDs')

Mismatches between HEAT and manual labels are detected and saved to an Excel file:

[15]:
_ = examine_hazard_extraction_mismatches(preprocessed_df, true, pred, hazards, hazard_words_per_doc=hazard_words_per_doc, topics_per_doc=topics_per_doc, hazard_topics_per_doc=hazard_topics_per_doc, results_path=results_path, id_col='Unique IDs', text_col='Combined Text')

To display the results tables consistently, we first create the primary results table, then sort the hazard extraction evaluation table according to the order of the hazards in the primary table:

[16]:
years = preprocessed_df["CY"].unique()
years.sort() #sort years
table_data = create_primary_results_table(time_of_occurence_days, time_of_occurence_pct_contained, frequency, frequency_fires, preprocessed_df, categories, hazards, years, interval=True) #primary results table with metrics
table = pd.DataFrame(table_data).sort_values('Hazard Category').reset_index(drop=True) #sort by category
hazards_sorted = table['Hazard Name'].tolist() #get sorted hazards for formatting evaluation table

Now we can display the hazard extraction evaluation table:

High precision means that the documents counted for a hazard really contain it, so the hazard is not over-counted.

Low recall means the hazard is under-counted: some documents that contain the hazard are missed.

[17]:
hazard_extraction = pd.concat([metrics,test_metrics],axis=1,keys=['Validation','Test']).reindex(hazards_sorted)
hazard_extraction
[17]:
                      Validation                                   Test
                      Recall  Precision  F1     Accuracy  Support  Recall  Precision  F1     Accuracy  Support
Hazardous Terrain     0.883   0.938      0.910  0.940     171      0.859   0.907      0.882  0.912     192
Ecological Resources  0.621   0.900      0.735  0.948     58       0.667   0.881      0.759  0.934     78
Thunderstorms         0.875   1.000      0.933  0.992     32       0.857   0.818      0.837  0.986     21
Wind                  0.840   0.988      0.908  0.968     94       0.795   0.912      0.849  0.956     78
Dry Weather           0.791   0.973      0.873  0.958     91       0.732   0.938      0.822  0.948     82
Rain                  0.872   0.872      0.872  0.980     39       0.800   0.800      0.800  0.976     30
Smoke                 0.943   0.847      0.893  0.976     53       0.929   0.796      0.857  0.974     42
Evacuations           0.885   0.958      0.920  0.984     52       0.909   0.930      0.920  0.986     44
Injury                0.786   1.000      0.880  0.994     14       0.700   1.000      0.824  0.994     10
Resource Shortage     0.776   0.826      0.800  0.962     49       0.778   0.651      0.709  0.954     36
Road Closures         0.792   0.704      0.745  0.948     48       0.760   0.613      0.679  0.928     50
Command Transition    0.902   0.965      0.932  0.984     61       0.758   0.833      0.794  0.948     66
Inaccurate Mapping    0.824   0.933      0.875  0.992     17       0.913   0.840      0.875  0.988     23
Aerial Grounding      1.000   0.765      0.867  0.992     13       0.625   0.833      0.714  0.992     8
Military Base         0.667   0.857      0.750  0.992     9        0.800   0.667      0.727  0.994     5
Cultural Resources    0.810   0.810      0.810  0.956     58       0.729   0.827      0.775  0.950     59
Law Violations        1.000   1.000      1.000  1.000     3        1.000   0.667      0.800  0.998     2
Infrastructure        0.714   0.921      0.805  0.966     49       0.725   0.829      0.773  0.966     40
Livestock             0.842   0.842      0.842  0.988     19       0.765   0.619      0.684  0.976     17
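For readers who want to check how the columns relate, the standard definitions of recall, precision, F1, and accuracy for one hazard can be sketched as below. This is an illustration and not the code of calc_classification_metrics; the function name binary_metrics and the toy labels are hypothetical, and Support is assumed here to mean the number of manually labeled positives.

```python
def binary_metrics(true, pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(true, pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(true)
    return {
        "Recall": recall,
        "Precision": precision,
        "F1": f1,
        "Accuracy": accuracy,
        "Support": tp + fn,  # assumed: count of labeled positives
    }


# Toy labels: 1 = document contains the hazard, 0 = it does not
true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_metrics = binary_metrics(true, pred)
```

Because most documents do not contain a given hazard, accuracy stays high even when recall is modest, which is why recall and precision are the more informative columns for rare hazards.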

To visualize what words define the hazards, we create word clouds based on word frequency using the trend analysis module.

[19]:
word_frequencies = get_word_frequencies(hazard_words_per_doc, hazards_sorted)
[20]:
build_word_clouds(word_frequencies, nrows=4, ncols=5, figsize=(16, 8), cmap=None, save=False, save_path=os.path.join('topic_model_results',''), fontsize=20)
../_images/nblinks_ics_heat_37_0.png

Primary Analysis:

The primary analysis involves two main outputs:

- hazard metrics
- risk matrix

Hazard Metrics

Hazard metrics, including frequency, rate, and severity, are displayed in a table.

First, we reorder the dictionary containing fires for each hazard according to the sorted hazard list:

[21]:
fires = {hazard: fires[hazard] for hazard in hazards_sorted}
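The reordering above relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7). A small sketch with invented data shows the effect; the names toy_fires and toy_order are hypothetical.

```python
# Toy data: hazards stored in arbitrary order
toy_fires = {
    "Smoke": ["fire_c"],
    "Wind": ["fire_a", "fire_b"],
    "Rain": ["fire_d"],
}
toy_order = ["Wind", "Rain", "Smoke"]

# Rebuild the dictionary so iteration follows the desired order.
# A hazard missing from toy_fires would raise KeyError here.
toy_fires = {hazard: toy_fires[hazard] for hazard in toy_order}
```

After this step, any table or plot built by iterating over the dictionary follows the same hazard order as the primary results table.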

Next, we calculate severity and sort it according to the sorted hazard list:

[22]:
severity_total, severity_table = calc_severity(fires, incident_summary_df ,rm_all_outliers=False, rm_severity_outliers=False)
[23]:
severity_table = severity_table.set_index('Hazard').reindex(hazards_sorted).reset_index()

For comparison, we also calculate severity values across the entire dataset. This allows us to compare the average severity for incidents with certain hazards to a baseline value.

[24]:
severity_accross_all_incidents = []; injuries_all = []; fatalities_all = []; str_dam_all = []; str_des_all = []
for i in range(len(incident_summary_df)):
    severity = int(incident_summary_df.iloc[i]["STR_DESTROYED_TOTAL"]) + int(incident_summary_df.iloc[i]["STR_DAMAGED_TOTAL"])+ int(incident_summary_df.iloc[i]["INJURIES_TOTAL"])+ int(incident_summary_df.iloc[i]["FATALITIES"])
    severity_accross_all_incidents.append(severity)
    injuries_all.append(int(incident_summary_df.iloc[i]["INJURIES_TOTAL"])); fatalities_all.append(int(incident_summary_df.iloc[i]["FATALITIES"]))
    str_dam_all.append(int(incident_summary_df.iloc[i]["STR_DAMAGED_TOTAL"])); str_des_all.append(int(incident_summary_df.iloc[i]["STR_DESTROYED_TOTAL"]))

Now we save the total dataset information as a single-row dataframe so we can add it to the existing hazard metric table.

[25]:
total_incidents_df = pd.DataFrame({"Hazard Category": ['Total Reports'],
                                  "Hazard Name": [''],
                                  "OTTO %":[''],
                                   "Total Fire Frequency":[str(len(incident_summary_df))],
                                  "Rate":[str(round(np.average(incident_summary_df['START_YEAR'].value_counts().values),1))+"+-"+str(round(np.std(incident_summary_df['START_YEAR'].value_counts().values),1))],#len(incident_summary_df)/len(years))],
                                  "Fatalities":[str(round(np.average(fatalities_all),1))+"+-"+str(round(np.std(fatalities_all),1))],
                                  "Injuries":[str(round(np.average(injuries_all),1))+"+-"+str(round(np.std(injuries_all),1))],
                                  "Structures Damaged":[str(round(np.average(str_dam_all),1))+"+-"+str(round(np.std(str_dam_all),1))],
                                  "Structures Destroyed":[str(round(np.average(str_des_all),1))+"+-"+str(round(np.std(str_des_all),1))],
                                  "Severity":[str(round(np.average(severity_accross_all_incidents),1))+"+-"+str(round(np.std(severity_accross_all_incidents),1))]},
                                 index =['Total Reports'])
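The repeated "mean+-std" string construction in the cell above could be factored into a helper. The sketch below is a hypothetical, dependency-free equivalent: format_mean_std is not a MIKA function, and statistics.pstdev is used because np.std computes the population standard deviation by default.

```python
import statistics


def format_mean_std(values, ndigits=1):
    """Format values as 'mean+-std' using the population standard deviation."""
    mean = round(statistics.fmean(values), ndigits)
    std = round(statistics.pstdev(values), ndigits)
    return f"{mean}+-{std}"


# Toy values standing in for a per-incident column such as injuries
toy_injuries = [0, 1, 2, 3]
toy_cell = format_mean_std(toy_injuries)
```

A single helper keeps the rounding and separator consistent across every column of the table.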

Here we format each severity component as a mean+-standard deviation string for readability:

[26]:
values = ['Fatalities', 'Injuries', 'Structures Damaged', 'Structures Destroyed']
for value in values:
    table[value] = severity_table['Average '+value].astype(str) + "+-" + severity_table['std dev '+value].astype(str)

Now we combine the dataframes to get the final primary results table:

[27]:
table['Severity'] = severity_table['formatted']
columns = ['Hazard Category', 'Hazard Name', 'OTTO %', 'Total Fire Frequency', 'Rate'] + values +['Severity']
table = table[columns]
table = table.set_index('Hazard Category')#.drop(['Hazard Category'], axis=1)
table = pd.concat([table, total_incidents_df.drop(['Hazard Category'], axis=1)]) # DataFrame.append was removed in pandas 2.0
display(table)
Hazard Category           Hazard Name           OTTO %      Total Fire Frequency  Rate          Fatalities  Injuries  Structures Damaged  Structures Destroyed  Severity
Environment               Hazardous Terrain     53.3+-36.5  2900                  322.2+-120.9  0.0+-0.5    1.4+-3.9  0.8+-10.1           4.5+-41.1             6.8+-46.4
Environment               Ecological Resources  46.8+-35.1  792                   88.0+-29.2    0.0+-0.4    2.5+-5.7  1.5+-18.4           6.4+-34.3             10.5+-50.0
Environment               Thunderstorms         55.1+-35.8  1127                  125.2+-51.2   0.0+-0.5    1.8+-4.8  1.1+-15.6           5.0+-33.6             8.0+-45.8
Environment               Wind                  51.8+-36.3  2950                  327.8+-120.2  0.0+-0.5    1.3+-3.9  1.1+-12.1           6.2+-53.2             8.6+-60.1
Environment               Dry Weather           55.4+-36.3  2171                  241.2+-92.6   0.0+-0.5    1.5+-4.2  1.4+-13.8           7.6+-61.6             10.6+-69.5
Environment               Rain                  64.4+-37.7  1696                  188.4+-42.7   0.0+-0.5    1.2+-3.4  1.2+-14.3           5.0+-49.5             7.4+-56.5
Environment               Smoke                 48.6+-37.5  1281                  142.3+-33.4   0.0+-0.5    1.8+-5.0  1.4+-15.3           7.6+-58.9             10.8+-69.1
Mission                   Evacuations           36.0+-31.0  1296                  144.0+-58.4   0.1+-0.7    2.5+-5.5  2.5+-18.0           14.6+-80.1            19.6+-90.0
Mission                   Injury                56.0+-34.3  783                   87.0+-31.7    0.1+-0.5    3.9+-6.2  1.9+-18.5           14.1+-95.0            20.0+-104.9
Mission                   Resource Shortage     40.2+-33.7  1229                  136.6+-58.3   0.0+-0.6    2.4+-5.4  1.5+-15.5           8.4+-60.0             12.4+-70.8
Mission                   Road Closures         44.5+-34.6  1726                  191.8+-67.6   0.0+-0.4    2.0+-4.8  1.9+-16.0           9.9+-69.5             13.9+-78.6
Mission                   Command Transition    60.8+-38.2  1868                  207.6+-55.0   0.0+-0.6    2.2+-4.9  1.7+-15.4           9.1+-66.7             13.1+-75.6
Mission                   Inaccurate Mapping    65.8+-34.3  1383                  153.7+-47.5   0.0+-0.2    1.8+-4.6  1.0+-9.6            5.7+-32.4             8.5+-39.0
Mission                   Aerial Grounding      32.1+-26.7  149                   16.6+-7.6     0.1+-0.8    6.3+-9.7  5.3+-40.2           10.4+-45.3            22.1+-86.3
Wildland Urban Interface  Military Base         58.0+-35.7  83                    9.2+-3.5      0.1+-0.5    2.9+-6.8  1.1+-3.9            17.6+-74.9            21.7+-82.8
Wildland Urban Interface  Cultural Resources    44.9+-35.0  865                   96.1+-29.7    0.0+-0.4    2.6+-5.4  1.5+-10.1           10.5+-70.9            14.6+-78.5
Wildland Urban Interface  Law Violations        78.2+-31.3  328                   36.4+-21.4    0.1+-0.6    1.0+-3.1  1.5+-11.1           7.9+-43.4             10.4+-51.0
Wildland Urban Interface  Infrastructure        50.2+-35.0  877                   97.4+-35.4    0.1+-0.8    2.4+-6.0  2.6+-20.2           15.5+-93.9            20.6+-104.7
Wildland Urban Interface  Livestock             40.2+-34.9  530                   58.9+-31.2    0.1+-0.5    2.6+-6.3  0.9+-5.0            11.8+-82.1            15.4+-88.6
Total Reports                                               8991                  999.0+-312.3  0.0+-0.3    0.6+-2.5  0.6+-8.3            2.6+-31.4             3.8+-36.3

We can also create a table with just the severity metrics:

[29]:
avg_injuries = round(np.average(injuries_all))
avg_fatalities = round(np.average(fatalities_all))
avg_des = round(np.average(str_des_all))
avg_dam = round(np.average(str_dam_all))
avg_df = pd.DataFrame({"Total Avg Injuries":[avg_injuries for hazard in hazards],
                     "Total Avg Fatalities":[avg_fatalities for hazard in hazards],
                     "Total Avg Str Dam":[avg_dam for hazard in hazards],
                     "Total Avg Str Des":[avg_des for hazard in hazards]})
[30]:
ICS_results = pd.concat([table.drop(['Total Reports']).reset_index(drop=True), severity_table, avg_df], axis=1)
#ICS_results.to_csv(os.path.join(os.path.dirname(os.getcwd()),'results','ICS_hazards.csv'))

Risk Matrix of Hazards (rate by severity)

The next part of the primary analysis is to create a risk matrix placing hazards in risk categories according to severity and rate. Risk matrices can be created according to either FAA or USFS specifications.
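Conceptually, a risk matrix assigns each hazard to a cell by binning its rate into a likelihood category and its severity into a severity category. The sketch below illustrates that idea only. The bin edges and category labels are hypothetical and are not the FAA or USFS thresholds used by MIKA, and categorize and risk_cell are invented names.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, for illustration only
LIKELIHOOD_EDGES = [10, 50, 100, 200]
LIKELIHOOD_LABELS = ["Improbable", "Remote", "Occasional", "Probable", "Frequent"]
SEVERITY_EDGES = [5, 10, 20]
SEVERITY_LABELS = ["Negligible", "Marginal", "Critical", "Catastrophic"]


def categorize(value, edges, labels):
    """Return the label of the bin that value falls into."""
    return labels[bisect_right(edges, value)]


def risk_cell(rate, severity):
    """Map a (rate, severity) pair to a (likelihood, severity) matrix cell."""
    return (
        categorize(rate, LIKELIHOOD_EDGES, LIKELIHOOD_LABELS),
        categorize(severity, SEVERITY_EDGES, SEVERITY_LABELS),
    )


# Toy hazard with 150 occurrences per year and an average severity of 15
toy_cell = risk_cell(150.0, 15.0)
```

Swapping in a different set of edges and labels changes the specification without touching the binning logic, which mirrors how the same data can be shown under two different risk matrix standards.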

For display purposes, configure matplotlib with the following parameters:

[31]:
matplotlib.style.use("default")
plt.rcParams["font.family"] = "Times New Roman"
[32]:
#ICS_results = pd.read_csv(os.path.join(os.path.dirname(os.getcwd()),'results','ICS_hazards.csv'))

Set the index of the results dataframe to the hazards:

[33]:
ICS_results.index = ICS_results['Hazard Name']

Calculate the severity category from the results:

[34]:
severities = get_ICS_severity_FAA(ICS_results, hazards)

Calculate the likelihood category from the rates:

[35]:
rates = {hazard:float(table[table['Hazard Name']==hazard]['Rate'].values[0].split("+-")[0]) for hazard in hazards}
rates_FAA = get_likelihood_ICS_FAA(rates)
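The rates dictionary above is built by splitting each formatted "mean+-std" string and keeping the mean. A small, hypothetical helper (parse_mean_std is not part of MIKA) makes that parsing step explicit:

```python
def parse_mean_std(cell):
    """Split a 'mean+-std' string into two floats."""
    mean_text, std_text = cell.split("+-")
    return float(mean_text), float(std_text)


# Example string in the same format as the 'Rate' column
toy_mean, toy_std = parse_mean_std("322.2+-120.9")
```

Keeping the numeric values in their own columns, and formatting them only for display, would avoid the round trip through strings altogether.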

Now you can plot a risk matrix:

[36]:
plot_risk_matrix(rates_FAA, severities, figsize=(9,8), results_path=os.path.join('risk_matrix'), save=False, max_chars=22, fontsize=10)
../_images/nblinks_ics_heat_65_0.png

To produce the USFS risk matrix, follow the same method and calculate the likelihood and severity categories:

[37]:
likelihoods = get_likelihood_ICS_USFS(rates)
severities = get_ICS_severity_USFS(ICS_results, hazards)
[38]:
severities
[38]:
{'Evacuations': 'Critical',
 'Hazardous Terrain': 'Marginal',
 'Inaccurate Mapping': 'Marginal',
 'Ecological Resources': 'Marginal',
 'Command Transition': 'Marginal',
 'Wind': 'Marginal',
 'Dry Weather': 'Marginal',
 'Rain': 'Marginal',
 'Law Violations': 'Marginal',
 'Road Closures': 'Marginal',
 'Smoke': 'Marginal',
 'Military Base': 'Critical',
 'Cultural Resources': 'Marginal',
 'Resource Shortage': 'Marginal',
 'Thunderstorms': 'Marginal',
 'Infrastructure': 'Critical',
 'Injury': 'Critical',
 'Livestock': 'Marginal',
 'Aerial Grounding': 'Critical'}

Now plot the risk matrix:

[39]:
plot_USFS_risk_matrix(likelihoods, severities, figsize=(9,8), results_path=os.path.join('risk_matrix'), save=False, max_chars=24,fontsize=12)#fontsize)
../_images/nblinks_ics_heat_70_0.png

Graphic Analysis:

In the graphic analysis, we produce time series graphs of hazard metrics and predictors over time:

- Time Series
  - Hazard Metrics: OTTO, Severity, Frequency
  - Predictors

Hazard Metrics Time Series

Here we graph time series for relevant hazard metrics. The metrics of interest in this example are:

- frequency
- Operational Time To Occurrence (OTTO) in percent containment
- severity

First we set the graphing parameters for consistency. For the time series we prefer seaborn style. Then, we format the data to be in the same order of the sorted hazards.

[40]:
matplotlib.style.use("seaborn-v0_8") # same style name as in the imports cell; plain "seaborn" was removed in newer matplotlib
plt.rcParams["font.family"] = "Times New Roman"
categories = table.index[:-1]#['Hazard Category'][:-1].index
metric_data = [time_of_occurence_days, time_of_occurence_pct_contained, frequency, frequency_fires]
time_of_occurence_days = {hazard: time_of_occurence_days[hazard] for hazard in hazards_sorted}
time_of_occurence_pct_contained = {hazard: time_of_occurence_pct_contained[hazard] for hazard in hazards_sorted}
frequency = {hazard: frequency[hazard] for hazard in hazards_sorted}
frequency_fires = {hazard: frequency_fires[hazard] for hazard in hazards_sorted}
[41]:
categories
[41]:
Index(['Environment', 'Environment', 'Environment', 'Environment',
       'Environment', 'Environment', 'Environment', 'Mission', 'Mission',
       'Mission', 'Mission', 'Mission', 'Mission', 'Mission',
       'Wildland Urban Interface', 'Wildland Urban Interface',
       'Wildland Urban Interface', 'Wildland Urban Interface',
       'Wildland Urban Interface'],
      dtype='object')

Now we can pass the metrics to the graph_ICS_time_series utility function, which uses graphing methods from the trend analysis module. This produces four graphs: the OTTO in days, the OTTO in percent containment, the total frequency (i.e., the number of reports with hazards, of which a single fire can have several), and the fire frequency (i.e., the number of fires with hazards).

[42]:
graph_ICS_time_series(time_of_occurence_days, time_of_occurence_pct_contained, frequency, frequency_fires, hazards, categories, save=False, std_dev=False, results_path=os.getcwd(), figsize=figsize, fontsize=fontsize, titles=False)
../_images/nblinks_ics_heat_77_0.png
../_images/nblinks_ics_heat_77_1.png
../_images/nblinks_ics_heat_77_2.png
../_images/nblinks_ics_heat_77_3.png

To prepare for the secondary analysis, we min-max scale the fire frequency:

[43]:
frequencies_fire = {hazard: [frequency_fires[hazard][year] for year in frequency_fires[hazard]] for hazard in frequency_fires}
fire_freqs_scaled = {hazard: minmax_scale(frequencies_fire[hazard]) for hazard in frequencies_fire}
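Min-max scaling maps each series onto the interval [0, 1] via (x - min) / (max - min), so hazards with very different frequencies become comparable. The sketch below is a plain-Python illustration; minmax_scale_list is a hypothetical name and not the minmax_scale function used in the notebook, and mapping a constant series to zeros is a choice made here.

```python
def minmax_scale_list(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = list(values)  # also accepts dict views such as d.values()
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map it to zeros by convention here
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy yearly fire frequencies for one hazard
toy_frequency = {2006: 2, 2007: 4, 2008: 6}
toy_scaled = minmax_scale_list(toy_frequency.values())
```

Note that scaling is done per hazard, so a value of 1.0 marks that hazard's own peak year rather than a shared maximum.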

Predictor Time Series

We are also interested in how potential predictors vary across time. The predictors of interest here are:

- fire characteristics
- operations
- intensity

Each of the predictors above is calculated by combining multiple lower-level predictors. Hence, we first define the combine_predictors function and graph each of the subpredictors.

[44]:
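def combine_predictors(predictors, scale=True):
    """Combine equally weighted subpredictors into one series with a value per time step."""
    # Each subpredictor contributes an equal share of the combined value.
    max_weight = 1/len(predictors)
    num_values = len(predictors[-1])
    if scale:
        # Min-max scale each subpredictor so that different units are comparable.
        variable_weights = [minmax_scale(p) for p in predictors]
    else:
        variable_weights = predictors
    combined_vars = [[max_weight*var_weight for var_weight in var_weight_list] for var_weight_list in variable_weights]
    # Sum the weighted subpredictors element-wise across time steps.
    combined_vars = [sum([combined_vars[var][i] for var in range(len(combined_vars))]) for i in range(num_values)]
    return combined_vars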
[45]:
combined_predictors = pd.DataFrame()
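To make the weighting concrete, here is a dependency-free sketch of the same idea with a worked toy example: each subpredictor is min-max scaled, multiplied by 1/n, and summed per time step. The names minmax and combine_equal_weight and the toy numbers are hypothetical.

```python
def minmax(values):
    """Scale values to [0, 1]; a constant series maps to zeros."""
    values = list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_equal_weight(predictors, scale=True):
    """Average the (optionally scaled) subpredictors per time step."""
    series = [list(p) for p in predictors]
    if not series:
        raise ValueError("at least one predictor is required")
    if scale:
        series = [minmax(s) for s in series]
    weight = 1 / len(series)
    # zip(*series) walks the time steps, yielding one value per subpredictor
    return [weight * sum(step) for step in zip(*series)]


# Two toy subpredictors over three years
toy_acres = [100, 300, 200]   # scales to [0.0, 1.0, 0.5]
toy_counts = [10, 20, 30]     # scales to [0.0, 0.5, 1.0]
toy_combined = combine_equal_weight([toy_acres, toy_counts])
```

With two subpredictors the weight is 0.5, so the combined values are 0.0, 0.75, and 0.75. The result is only meaningful if every subpredictor lists its values in the same year order.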

Fire Characteristics

We start with the fire characteristics combined predictor. It is formed from the fire frequency, acres burned, and the number of days a fire burns on average per year. Possible future additions include FSR (WF_MAX_FSR), number of complexes (COMPLEX), and evacuations (EVACUATION_REPORTED).

We define the columns of interest, then filter the incident summary reports by these columns:

[46]:
fire_trends_cols = ["FINAL_ACRES", "FOD_DISCOVERY_DOY", "FOD_CONTAIN_DOY", "START_YEAR"]
fire_trends_df = incident_summary_df[fire_trends_cols]

We get the fire frequency per year by counting the number of reports per year:

[47]:
counts = fire_trends_df["START_YEAR"].value_counts()
count = {int(year):counts[year] for year in counts.index.sort_values()}
[48]:
years = count.keys()
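The year-sorted dictionary matters because pandas value_counts orders its result by count (descending) by default, not by year. The sketch below reproduces the distinction with the standard-library Counter and invented start years.

```python
from collections import Counter

# Toy START_YEAR values, one per incident
toy_start_years = [2007, 2006, 2007, 2008, 2007, 2008]
toy_counts = Counter(toy_start_years)

# Ordered by count, most frequent first: the year order is lost
years_by_count = [year for year, _ in toy_counts.most_common()]

# Ordered by year: aligned with any other per-year series
count_by_year = {year: toy_counts[year] for year in sorted(toy_counts)}
```

Any per-year series that is later combined element-wise with other per-year series should use the year-ordered form.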

Next we calculate the total and average days burning and acreage per year:

[49]:
average_days_burning = {}
total_days_burning = {}
total_acres = {}
average_acres = {}
for year in years:
    temp_df = fire_trends_df.loc[fire_trends_df['START_YEAR']==year]
    valid_df = temp_df.dropna(subset=['FOD_DISCOVERY_DOY', "FOD_CONTAIN_DOY"]) # keep only fires with both dates recorded
    list_of_days_burning = (valid_df["FOD_CONTAIN_DOY"] - valid_df['FOD_DISCOVERY_DOY']).tolist()
    average_days_burning[year] = np.average(list_of_days_burning)
    total_days_burning[year] = np.sum(list_of_days_burning)
    list_of_acres = temp_df['FINAL_ACRES'].dropna().tolist()
    average_acres[year] = np.average(list_of_acres)
    total_acres[year] = np.sum(list_of_acres)
#print(total_days_burning)
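The per-year aggregation of days burning can be sketched without pandas as follows. This is an illustration with invented records, and days_burning_by_year is a hypothetical helper. Incidents missing either date are skipped, and a fire that burns across a year boundary would yield a negative difference under this simple day-of-year subtraction.

```python
def days_burning_by_year(records):
    """Return (totals, averages) of containment minus discovery day per year."""
    totals, counts = {}, {}
    for record in records:
        start = record["FOD_DISCOVERY_DOY"]
        end = record["FOD_CONTAIN_DOY"]
        if start is None or end is None:
            continue  # skip incidents with a missing date
        year = record["START_YEAR"]
        totals[year] = totals.get(year, 0) + (end - start)
        counts[year] = counts.get(year, 0) + 1
    averages = {year: totals[year] / counts[year] for year in totals}
    return totals, averages


# Toy incident records; None marks a missing value
toy_records = [
    {"START_YEAR": 2006, "FOD_DISCOVERY_DOY": 100, "FOD_CONTAIN_DOY": 110},
    {"START_YEAR": 2006, "FOD_DISCOVERY_DOY": 200, "FOD_CONTAIN_DOY": None},
    {"START_YEAR": 2006, "FOD_DISCOVERY_DOY": 150, "FOD_CONTAIN_DOY": 154},
    {"START_YEAR": 2007, "FOD_DISCOVERY_DOY": 90, "FOD_CONTAIN_DOY": 93},
]
toy_totals, toy_averages = days_burning_by_year(toy_records)
```

The average is taken over incidents with both dates present, so years with many missing dates rest on fewer observations.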

Now we can calculate the combined predictor variable, fire characteristics, using the function defined above.

[51]:
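# count is ordered by year like the other predictors; counts (from value_counts) is ordered by frequency
fire_predictors = [total_acres.values(), count.values(), total_days_burning.values()]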
combined_predictors['Fire Characteristics'] = combine_predictors(fire_predictors)
combined_predictors.index = years
Graphs

We graph the average and total values for the subpredictors comprising fire characteristics.

First we scale the predictors using minmax scaling:

[52]:
av_acres = average_acres.values()
av_days_burn = average_days_burning.values()
count = count.values()
freq_scaled = minmax_scale(count)
av_days_burn_scaled = minmax_scale(av_days_burn)
av_acres_scaled = minmax_scale(av_acres)

total_days_burn = total_days_burning.values()
total_acre = total_acres.values()
total_days_burn_scaled = minmax_scale(total_days_burn)
total_acres_scaled = minmax_scale(total_acre)

Now we graph the predictors:

[53]:
nrows = 1
ncols = 2
fig, axs = plt.subplots(nrows = nrows,
                            ncols = ncols,
                            figsize = (10,4))
fire_labels = ['Acres', 'Frequency', 'Days Burning']
fire_totals = [total_acres_scaled, freq_scaled, total_days_burn_scaled]
fire_avgs = [av_acres_scaled, freq_scaled, av_days_burn_scaled]
fig, axs[0] = plot_predictors(fire_totals, fire_labels, time=years, time_label='Year', title="Change in Fire Characteristics from 2006-2014",
                totals=True, averages=False, scaled=True, figsize=(12, 5), axs=axs[0], fig=fig, show=False, legend=False)
fig, axs[1] = plot_predictors(fire_avgs, fire_labels, time=years, time_label='Year', title="Change in Fire Characteristics from 2006-2014",
                totals=False, averages=True, scaled=True, figsize=(12, 5), axs=axs[1], fig=fig, show=False)
plt.show()
../_images/nblinks_ics_heat_97_0.png

Operations

Next we examine the operational trends predictor, which is defined by aerial assets (total and max in one day), personnel (total and max in one day), and projected cost. The number of situation reports (INC_MGMT_NUM_SITREPS) is another candidate subpredictor.

First we identify the columns of interest and filter the incident summary dataframe.

[54]:
operational_trends_cols = ["TOTAL_AERIAL_SUM", "TOTAL_PERSONNEL_SUM", "WF_PEAK_AERIAL", "WF_PEAK_PERSONNEL", "START_YEAR","PROJECTED_FINAL_IM_COST"]
operational_trends_df = incident_summary_df[operational_trends_cols]

Next we calculate the average and total values for the subpredictors:

[55]:
total_aerial = {}
average_aerial = {}
total_person = {}
average_person = {}
total_cost = {}
average_cost = {}
for year in years:
    list_of_person = []
    list_of_aerial = []
    temp_df = operational_trends_df.loc[operational_trends_df['START_YEAR']==year]
    list_of_person = temp_df['WF_PEAK_PERSONNEL'].fillna(value=0).tolist()
    list_of_aerial = temp_df["WF_PEAK_AERIAL"].fillna(value=0).tolist()
    list_of_cost = temp_df["PROJECTED_FINAL_IM_COST"].dropna().tolist()
    average_aerial[year] = np.average(list_of_aerial)
    total_aerial[year] = np.sum(list_of_aerial)
    average_person[year] = np.average(list_of_person)
    total_person[year] = np.sum(list_of_person)
    average_cost[year] = np.average(list_of_cost)
    total_cost[year] = np.sum(list_of_cost)

Now we calculate the combined operations predictor variable:

[56]:
ops_predictors = [total_cost.values(), total_aerial.values(), total_person.values()]
combined_predictors['Operations'] = combine_predictors(ops_predictors)

Now we grab the values and minmax scale them for graphing:

[57]:
av_aerial = average_aerial.values()
total_aerial = total_aerial.values()

av_person = average_person.values()
total_person = total_person.values()
av_cost = average_cost.values()
total_cost = total_cost.values()
[58]:
av_cost_scaled = minmax_scale(av_cost)
av_person_scaled = minmax_scale(av_person)
av_aerial_scaled = minmax_scale(av_aerial)

total_cost_scaled = minmax_scale(total_cost)
total_person_scaled = minmax_scale(total_person)
total_aerial_scaled = minmax_scale(total_aerial)
Graphs

We then graph each of the subpredictors together, both the averages and total values.

[59]:
nrows = 1
ncols = 2
fig, axs = plt.subplots(nrows = nrows,
                            ncols = ncols,
                            figsize = (10,4))
operations_labels = ['Cost', 'Personnel', 'Aerial Assets']
operations_totals = [total_cost_scaled, total_person_scaled, total_aerial_scaled]
operations_avgs = [av_cost_scaled, av_person_scaled, av_aerial_scaled]
fig, axs[0] = plot_predictors(operations_totals, operations_labels, time=years, time_label='Year', title="Change in Operations from 2006-2014",
                totals=True, averages=False, scaled=True, figsize=(12, 5), axs=axs[0], fig=fig, show=False, legend=False)
fig, axs[1] = plot_predictors(operations_avgs, operations_labels, time=years, time_label='Year', title="Change in Operations from 2006-2014",
                totals=False, averages=True, scaled=True, figsize=(12, 5), axs=axs[1], fig=fig, show=False)
plt.show()
../_images/nblinks_ics_heat_109_0.png

Intensity

The final predictor we consider is the intensity predictor, which is defined by the number of injuries, fatalities, structures damaged, and structures destroyed.

Again, first we identify the relevant columns and filter the incident dataframe to include them:

[60]:
intensity_cols = ["STR_DESTROYED_TOTAL","STR_DAMAGED_TOTAL","INJURIES_TOTAL","FATALITIES", "START_YEAR"]
intensity_df = incident_summary_df[intensity_cols]

Now we calculate the total and average values for each of the subpredictors:

[61]:
total_str_des = {}
average_str_des = {}
total_str_damage = {}
average_str_damage = {}
total_injuries = {}
average_injuries = {}
total_fatalities = {}
average_fatalities = {}

for year in years:
    temp_df =intensity_df.loc[intensity_df['START_YEAR']==year]
    list_of_dest = temp_df["STR_DESTROYED_TOTAL"].tolist()
    list_of_dam = temp_df["STR_DAMAGED_TOTAL"].tolist()
    list_of_injury = temp_df["INJURIES_TOTAL"].tolist()
    list_of_fatalities = temp_df["FATALITIES"].tolist()
    total_str_des[year] = np.sum(list_of_dest)
    average_str_des[year] = np.average(list_of_dest)
    total_str_damage[year] = np.sum(list_of_dam)
    average_str_damage[year] = np.average(list_of_dam)
    total_injuries[year] = np.sum(list_of_injury)
    average_injuries[year] = np.average(list_of_injury)
    total_fatalities[year] = np.sum(list_of_fatalities)
    average_fatalities[year] = np.average(list_of_fatalities)

The combined intensity predictor is then calculated:

[62]:
intensity_predictors = [total_fatalities.values(), total_str_damage.values(), total_injuries.values(), total_str_des.values()]
combined_predictors['Intensity'] = combine_predictors(intensity_predictors)

Once again, we grab the values and minmax scale them for graphing:

[63]:
av_des = average_str_des.values()
total_des = total_str_des.values()
av_damage = average_str_damage.values()
total_damage = total_str_damage.values()
av_injury = average_injuries.values()
total_injury = total_injuries.values()
av_fatality = average_fatalities.values()
total_fatality = total_fatalities.values()
[64]:
total_fatality_scaled = minmax_scale(total_fatality)
total_injury_scaled = minmax_scale(total_injury)
total_damage_scaled = minmax_scale(total_damage)
total_des_scaled = minmax_scale(total_des)

av_fatality_scaled = minmax_scale(av_fatality)
av_injury_scaled = minmax_scale(av_injury)
av_damage_scaled = minmax_scale(av_damage)
av_des_scaled = minmax_scale(av_des)
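For readers unfamiliar with min-max scaling, the idea is to map each series linearly onto the interval [0, 1] so that predictors with very different units can share one axis. The following is a minimal sketch of that idea; `minmax_scale_sketch` and `toy_totals` are hypothetical names introduced here, and this is not necessarily how the `minmax_scale` function used in this notebook is implemented.

```python
import numpy as np

def minmax_scale_sketch(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    arr = np.asarray(list(values), dtype=float)  # list() also accepts dict.values()
    span = arr.max() - arr.min()
    if span == 0:
        # A constant series has no spread; return zeros instead of dividing by zero.
        return np.zeros_like(arr)
    return (arr - arr.min()) / span

# Invented yearly totals, keyed by year like total_fatalities above.
toy_totals = {2006: 22, 2007: 21, 2008: 19, 2009: 3, 2010: 0}
scaled = minmax_scale_sketch(toy_totals.values())
print(scaled)
```

Because the scaling depends only on the minimum and maximum of each series, a single extreme year compresses all other years toward zero, which is worth remembering when reading the scaled graphs.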
Graphs

The subpredictors of intensity are graphed together as a time series:

[65]:
nrows = 1
ncols = 2
fig, axs = plt.subplots(nrows = nrows,
                            ncols = ncols,
                            figsize = (10,4))
intensity_labels = ['Fatalities', 'Injuries', 'Damaged Structures', 'Destroyed Structures']
intensity_totals = [total_fatality_scaled, total_injury_scaled, total_damage_scaled, total_des_scaled]
intensity_avgs = [av_fatality_scaled, av_injury_scaled, av_damage_scaled, av_des_scaled]
fig, axs[0] = plot_predictors(intensity_totals, intensity_labels, time=years, time_label='Year', title="Change in Intensity from 2006-2014",
                totals=True, averages=False, scaled=True, figsize=(12, 5), axs=axs[0], fig=fig, show=False, legend=False)
fig, axs[1] = plot_predictors(intensity_avgs, intensity_labels, time=years, time_label='Year', title="Change in Intensity from 2006-2014",
                totals=False, averages=True, scaled=True, figsize=(12, 5), axs=axs[1], fig=fig, show=False)
plt.show()
../_images/nblinks_ics_heat_120_0.png

Combined predictors time series

To see how the predictors compare together, we graph all three of the combined predictors and their subpredictors in one graph. We graph both the total and average values.

First we define dictionaries to contain the data and minmax scale the total data values:

[66]:
totals = {"Fire Frequency": count,
    "total Days Fires Burned": total_days_burn,
    "total Acres Fires Burned": total_acre,
    "total Aerial Assets": total_aerial,
    "total Personnel": total_person,
    "total Cost": total_cost,
    "total Structures Damaged": total_damage,
    "total Structures Destroyed": total_des,
    "total Injuries": total_injury,
    "total Fatalities": total_fatality}
totals_df = pd.DataFrame(totals)

averages = {
    "fire frequency": count,
    "average days fire burns": av_days_burn,
    "average acres fire burns": av_acres,
    "average aerial assets per fire": av_aerial,
    "average personnel per fire": av_person,
    "average cost per fire": av_cost,
    "average structures damaged per fire": av_damage,
    "average structures destroyed per fire": av_des,
    "average injuries per fire": av_injury,
    "average fatalities per fire": av_fatality}
avs_df = pd.DataFrame(averages)

totals_scaled = {feature:minmax_scale(totals[feature]) for feature in totals}

Next we scale the combined predictors as well:

[67]:
combined_predictors_scaled = combined_predictors.copy()
for col in combined_predictors_scaled:
    combined_predictors_scaled[col] = minmax_scale(combined_predictors_scaled[col])

Now we define the line style, markers, and colors for the predictor graphs:

[68]:
lines = {"Fire Frequency": '--',
    "total Days Fires Burned": '--',
    "total Acres Fires Burned": '--',
    "total Aerial Assets": '-',
    "total Personnel": '-',
    "total Cost": '-',
    "total Structures Damaged": ':',
    "total Structures Destroyed": ':',
    "total Injuries": ':',
    "total Fatalities": ':'}
colors = cm.tab10(np.linspace(0, 1, len(lines)))
# Map each feature to its own color, in the same order as `lines`.
colors_dict = dict(zip(lines, colors))
markers = {"Fire Frequency": '.',
    "total Days Fires Burned": 'v',
    "total Acres Fires Burned": '^',
    "total Aerial Assets": 's',
    "total Personnel": 'p',
    "total Cost": 'P',
    "total Structures Damaged": 'h',
    "total Structures Destroyed": 'X',
    "total Injuries": 'D',
    "total Fatalities": '*'}

Here we plot the total scaled value for each predictor:

[69]:
plt.figure(figsize=figsize)
plt.ylabel("Total Sum Scaled", fontsize=fontsize)
plt.xlabel("Year", fontsize=fontsize)
for feature in totals_scaled:
    plt.plot(years, totals_scaled[feature], label=feature.replace("total ",""), linestyle=lines[feature], marker=markers[feature], color=colors_dict[feature])
    plt.tick_params(labelsize=fontsize)
plt.plot(years,combined_predictors_scaled['Fire Characteristics'], label='Fire Characteristics', color='black', linestyle='--')
plt.plot(years,combined_predictors_scaled['Operations'], label = 'Operations', color='black', linestyle = '-')
plt.plot(years,combined_predictors_scaled['Intensity'], label = 'Intensity', color = 'black', linestyle = ':')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=fontsize)
#plt.savefig('predictors_scaled.pdf', bbox_inches="tight")
plt.show()
../_images/nblinks_ics_heat_128_0.png

Next we graph the unscaled values:

[70]:
combined_predictors_unscaled = pd.DataFrame()
fire_predictors = [total_acres.values(), counts, total_days_burning.values()]
combined_predictors_unscaled['Fire Characteristics'] = combine_predictors(fire_predictors, scale=False)
combined_predictors_unscaled.index = years
intensity_predictors = [total_fatalities.values(), total_str_damage.values(), total_injuries.values(), total_str_des.values()]
ops_predictors = [total_cost, total_aerial, total_person]
combined_predictors_unscaled['Operations'] = combine_predictors(ops_predictors, scale=False)
combined_predictors_unscaled['Intensity'] = combine_predictors(intensity_predictors, scale=False)
[71]:
plt.figure()
plt.ylabel("Total", fontsize=16)
plt.xlabel("Year", fontsize=16)
for feature in totals:
    plt.plot(years, totals[feature], label=feature.replace("total ",""), linestyle=lines[feature], marker=markers[feature], color=colors_dict[feature])
    plt.tick_params(labelsize=16)
plt.plot(years,combined_predictors_unscaled['Fire Characteristics'], label='Fire Characteristics', color='black', linestyle='--')
plt.plot(years,combined_predictors_unscaled['Operations'], label = 'Operations', color='black', linestyle = '-')
plt.plot(years,combined_predictors_unscaled['Intensity'], label = 'Intensity', color = 'black', linestyle = ':')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=16)
plt.yscale('log')
#plt.savefig('predictors.pdf', bbox_inches="tight")
plt.show()
../_images/nblinks_ics_heat_131_0.png

Secondary Analysis:

The secondary analysis includes inferential statistics methods, specifically:

- Correlation matrix
- Multiple regression

Combined Predictors

We use the combined predictors, rather than the raw predictors, for the secondary analysis.

Correlation Matrix

A correlation matrix is produced to examine the pairwise relationship between hazard frequency and each predictor.

This is done using the create_correlation_matrix function in the trend analysis module. The scaled predictors and hazard frequencies are passed into the function arguments:

[72]:
corrMatrix_fires, correlation_mat_total_fires, p_values = create_correlation_matrix(combined_predictors_scaled, fire_freqs_scaled, graph=False, figsize=(9,8), fontsize=12)
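To make concrete what a single entry of such a matrix measures, here is a small sketch of the Pearson correlation between one predictor and one hazard frequency series. The data are synthetic, `pearson_r` is a hypothetical helper, and this is not the implementation of create_correlation_matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nine synthetic "years" of data; names and values are illustrative only.
predictor = rng.random(9)
hazard_related = 0.8 * predictor + 0.2 * rng.random(9)  # built to track the predictor
hazard_unrelated = rng.random(9)                         # independent of the predictor

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

r_related = pearson_r(predictor, hazard_related)
r_unrelated = pearson_r(predictor, hazard_unrelated)
print(f"related: {r_related:.3f}, unrelated: {r_unrelated:.3f}")
```

With only nine points per series, sizeable correlations can appear by chance, which is one reason the p-values returned alongside the matrix matter when interpreting it.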

For display purposes, the correlation matrix can be reshaped to show hazards on one axis and predictors on the other:

[73]:
predictors = [p for p in combined_predictors_scaled]
hazards = [h for h in fire_freqs_scaled]
reshape_correlation_matrix(corrMatrix_fires, p_values, predictors, hazards)
../_images/nblinks_ics_heat_137_0.png

The full correlation matrix has correlations between all hazard and predictor pairs:

[74]:
corrMatrix_fires, correlation_mat_total_fires, p_values = create_correlation_matrix(combined_predictors_scaled, fire_freqs_scaled, graph=True, figsize=(8,6), fontsize=11, save=False, results_path=os.path.join('correlation_matrix'))
../_images/nblinks_ics_heat_139_0.png

Multiple regression

Typically, multiple regression is used as a prediction algorithm: given a set of continuous inputs X = (x1, x2, …, xn), the regression predicts the value of a continuous variable, y. Multiple regression uses a linear combination of X to produce y, and the error in y (or the goodness of fit) indicates how well the model and its predictors predict the target.

The importance of a predictor, xi, is evaluated by shuffling its input values, and seeing how the goodness of fit/error changes.
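The shuffling idea can be sketched in a few lines of NumPy. This is an illustration on synthetic data, with hypothetical helper names (`fit_ols`, `r2`); the output columns later in this notebook are labeled "removed score", so multiple_reg_feature_importance may well compute importance differently from this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 9 "years", 3 predictors; y depends strongly on column 0 only.
X = rng.random((9, 3))
y = 2.0 * X[:, 0] + 0.05 * rng.random(9)

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def r2(beta, X, y):
    """Coefficient of determination of an already fitted model on (X, y)."""
    pred = np.column_stack([np.ones(len(y)), X]) @ beta
    return float(1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))

beta = fit_ols(X, y)          # fit once on the unshuffled data
baseline = r2(beta, X, y)

importance = []
for j in range(X.shape[1]):
    drops = []
    for _ in range(200):      # average over many shuffles to reduce noise
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops.append(baseline - r2(beta, X_perm, y))
    importance.append(float(np.mean(drops)))

print(baseline, importance)
```

A large drop in R2 after shuffling a column means the fitted model relied heavily on that predictor; a drop near zero means the predictor contributed little.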

Our goal:

Use regression to determine which predictors are most important for the frequency of hazards. Since we have a limited number of data points (9), we will not be predicting on unseen data.

Inputs/Predictors:

All operations trends, fire characteristics, and intensity

Output/y:

annual frequency of hazards time series

Method:

For each hazard, use its frequency time series:

1. Fit a linear regression model to X, y.
2. Calculate the goodness of fit (R2).

For each xi:

3. Record the regression coefficient (beta).
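The method steps can be sketched with plain NumPy for a single hazard. The predictor names match the combined predictors used here, but the data are synthetic and the code is an assumption about what such a fit looks like, not the body of multiple_reg_feature_importance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: 9 yearly values of 3 scaled predictors and one hazard frequency.
predictor_names = ["Fire Characteristics", "Operations", "Intensity"]
X = rng.random((9, 3))
y = 0.2 * X[:, 0] - 0.1 * X[:, 1] + 0.9 * X[:, 2] + 0.05 * rng.random(9)

# Step 1: fit y ~ intercept + X by ordinary least squares.
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, betas = coef[0], coef[1:]

# Step 2: goodness of fit (R2) on the same nine points.
pred = A @ coef
r2 = float(1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))

# Step 3: record the regression coefficient (beta) of each predictor.
betas_by_predictor = {name: round(float(b), 3) for name, b in zip(predictor_names, betas)}
print(round(r2, 3), betas_by_predictor)
```

Repeating this fit once per hazard yields one R2 value per hazard and one beta per hazard and predictor pair.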

Future goal: use ML to determine whether or not a hazard will occur based on past incident reports

The output of the regression analysis is two graphs and three dataframes. The first graph shows the coefficient of determination (R2) of the full model, with all three predictors, for each hazard. The second graph shows the importance of each predictor for each hazard. The results dataframes store the numeric information from the graphs.

[75]:
predictors = [p for p in combined_predictors_scaled]
hazards = [h.replace("total ","") for h in fire_freqs_scaled]
results_df, delta_df, coefficient_df = multiple_reg_feature_importance(predictors, hazards, correlation_mat_total_fires,
                                                       save=False, results_path=os.path.join('multiple_regression'),
                                                      r2_figsize=figsize, r2_fontsize=fontsize-4, predictor_import_figsize = (9, 2), predictor_import_fontsize=fontsize-4)
display(results_df, delta_df, coefficient_df)
../_images/nblinks_ics_heat_142_0.png
../_images/nblinks_ics_heat_142_1.png

| | hazard | R2 for full model | MSE for full model | Fire Characteristics removed score | Operations removed score | Intensity removed score | Fire Characteristics removed MSE | Operations removed MSE | Intensity removed MSE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hazardous Terrain | 0.398 | 0.048 | 0.409 | 0.406 | -0.237 | 0.047 | 0.048 | 0.099 |
| 1 | Ecological Resources | 0.377 | 0.089 | 0.039 | 0.123 | -0.113 | 0.138 | 0.126 | 0.160 |
| 2 | Thunderstorms | 0.511 | 0.047 | 0.450 | 0.471 | -1.205 | 0.053 | 0.051 | 0.212 |
| 3 | Wind | 0.552 | 0.042 | 0.351 | -0.748 | -1.659 | 0.061 | 0.165 | 0.251 |
| 4 | Dry Weather | 0.426 | 0.062 | 0.367 | 0.361 | -0.381 | 0.068 | 0.069 | 0.149 |
| 5 | Rain | 0.639 | 0.036 | 0.145 | -0.455 | -1.419 | 0.086 | 0.146 | 0.242 |
| 6 | Smoke | 0.154 | 0.084 | 0.129 | 0.161 | -0.146 | 0.086 | 0.083 | 0.114 |
| 7 | Evacuations | 0.301 | 0.087 | 0.311 | 0.287 | -0.505 | 0.086 | 0.089 | 0.188 |
| 8 | Injury | 0.323 | 0.067 | 0.320 | -0.393 | 0.276 | 0.067 | 0.137 | 0.071 |
| 9 | Resource Shortage | 0.347 | 0.072 | 0.173 | 0.212 | -0.158 | 0.091 | 0.087 | 0.127 |
| 10 | Road Closures | 0.358 | 0.064 | 0.369 | 0.240 | 0.051 | 0.062 | 0.075 | 0.094 |
| 11 | Command Transition | 0.437 | 0.066 | 0.437 | 0.450 | -0.789 | 0.066 | 0.064 | 0.209 |
| 12 | Inaccurate Mapping | 0.412 | 0.059 | 0.392 | 0.417 | -0.271 | 0.061 | 0.059 | 0.128 |
| 13 | Aerial Grounding | 0.793 | 0.027 | -1.452 | 0.303 | -0.084 | 0.321 | 0.091 | 0.142 |
| 14 | Military Base | 0.600 | 0.048 | 0.014 | -0.170 | 0.230 | 0.118 | 0.140 | 0.092 |
| 15 | Cultural Resources | 0.328 | 0.055 | 0.246 | -2.172 | 0.216 | 0.061 | 0.258 | 0.064 |
| 16 | Law Violations | 0.320 | 0.069 | -0.475 | -0.532 | 0.213 | 0.150 | 0.156 | 0.080 |
| 17 | Infrastructure | 0.192 | 0.092 | 0.173 | 0.198 | -0.079 | 0.094 | 0.091 | 0.122 |
| 18 | Livestock | 0.336 | 0.059 | 0.354 | 0.322 | -0.529 | 0.057 | 0.060 | 0.135 |


| | hazard | R2 for full model | MSE for full model | Fire Characteristics removed score | Operations removed score | Intensity removed score | Fire Characteristics removed MSE | Operations removed MSE | Intensity removed MSE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hazardous Terrain | 0.398 | 0.048 | 0.010 | 0.008 | 0.636 | 0.001 | 0.001 | -0.051 |
| 1 | Ecological Resources | 0.377 | 0.089 | 0.338 | 0.254 | 0.490 | -0.049 | -0.037 | -0.070 |
| 2 | Thunderstorms | 0.511 | 0.047 | 0.061 | 0.040 | 1.716 | -0.006 | -0.004 | -0.165 |
| 3 | Wind | 0.552 | 0.042 | 0.201 | 1.301 | 2.211 | -0.019 | -0.123 | -0.209 |
| 4 | Dry Weather | 0.426 | 0.062 | 0.059 | 0.065 | 0.807 | -0.006 | -0.007 | -0.087 |
| 5 | Rain | 0.639 | 0.036 | 0.495 | 1.094 | 2.059 | -0.050 | -0.110 | -0.206 |
| 6 | Smoke | 0.154 | 0.084 | 0.025 | 0.007 | 0.300 | -0.002 | 0.001 | -0.030 |
| 7 | Evacuations | 0.301 | 0.087 | 0.010 | 0.013 | 0.805 | 0.001 | -0.002 | -0.101 |
| 8 | Injury | 0.323 | 0.067 | 0.003 | 0.716 | 0.048 | -0.000 | -0.070 | -0.005 |
| 9 | Resource Shortage | 0.347 | 0.072 | 0.174 | 0.134 | 0.505 | -0.019 | -0.015 | -0.055 |
| 10 | Road Closures | 0.358 | 0.064 | 0.011 | 0.118 | 0.307 | 0.001 | -0.012 | -0.030 |
| 11 | Command Transition | 0.437 | 0.066 | 0.000 | 0.013 | 1.226 | -0.000 | 0.002 | -0.143 |
| 12 | Inaccurate Mapping | 0.412 | 0.059 | 0.021 | 0.005 | 0.684 | -0.002 | 0.001 | -0.069 |
| 13 | Aerial Grounding | 0.793 | 0.027 | 2.245 | 0.489 | 0.877 | -0.294 | -0.064 | -0.115 |
| 14 | Military Base | 0.600 | 0.048 | 0.587 | 0.771 | 0.370 | -0.070 | -0.092 | -0.044 |
| 15 | Cultural Resources | 0.328 | 0.055 | 0.082 | 2.500 | 0.112 | -0.007 | -0.203 | -0.009 |
| 16 | Law Violations | 0.320 | 0.069 | 0.795 | 0.852 | 0.106 | -0.081 | -0.087 | -0.011 |
| 17 | Infrastructure | 0.192 | 0.092 | 0.019 | 0.006 | 0.272 | -0.002 | 0.001 | -0.031 |
| 18 | Livestock | 0.336 | 0.059 | 0.018 | 0.013 | 0.864 | 0.002 | -0.001 | -0.076 |


| | Hazardous Terrain | Ecological Resources | Thunderstorms | Wind | Dry Weather | Rain | Smoke | Evacuations | Injury | Resource Shortage | Road Closures | Command Transition | Inaccurate Mapping | Aerial Grounding | Military Base | Cultural Resources | Law Violations | Infrastructure | Livestock |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fire Characteristics | 0.095744 | -0.313514 | -0.105875 | 0.318352 | 0.241583 | 0.410931 | 0.132562 | 0.030633 | -0.008230 | -0.175735 | 0.067870 | -0.005561 | 0.114858 | -0.931697 | 0.657452 | -0.113661 | 0.645235 | 0.216450 | 0.075068 |
| Operations | -0.022295 | 0.309108 | -0.211922 | -0.694471 | -0.200577 | -0.642130 | -0.136309 | -0.146819 | 0.436775 | 0.177629 | 0.154156 | -0.102439 | -0.013382 | 0.463498 | 0.578210 | 0.774194 | -0.530265 | -0.083715 | -0.151685 |
| Intensity | 0.589151 | 0.659442 | 1.020362 | 1.178706 | 0.771584 | 1.156592 | 0.469682 | 0.801286 | 0.170047 | 0.603995 | 0.464643 | 0.895520 | 0.654770 | 0.843958 | -0.560887 | -0.258116 | 0.361587 | 0.431959 | 0.701494 |

Full predictors

As an experiment, we also perform the secondary analysis with the full set of subpredictors. This is not recommended for robust analysis due to the small sample of years in the hazard dataset. Specifically, there are more predictors than years in the dataset so the regression model will always have a perfect fit.
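The perfect fit mentioned above is a property of the problem shape rather than of the data, which the following sketch illustrates: with 9 observations and 10 predictors, ordinary least squares reproduces even a pure-noise target exactly. The data here are random and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_years, n_predictors = 9, 10   # same shape as the full-predictor experiment
X = rng.random((n_years, n_predictors))
y = rng.random(n_years)         # pure noise, unrelated to X by construction

# Ordinary least squares with an intercept; lstsq returns the minimum-norm
# solution when there are more unknowns than observations.
A = np.column_stack([np.ones(n_years), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ coef
r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
print(r2)
```

An R2 of 1.0 in this setting therefore says nothing about how well the predictors explain hazard frequency, which is why the combined predictors are preferred for the main analysis.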

[76]:
corrMatrix_fires, correlation_mat_total_fires, p_values = create_correlation_matrix(totals_scaled, fire_freqs_scaled, graph=False)
[77]:
predictors = [p for p in totals_scaled]
hazards = [h for h in fire_freqs_scaled]
reshape_correlation_matrix(corrMatrix_fires, p_values, predictors, hazards)
../_images/nblinks_ics_heat_145_0.png
[78]:
predictors = [p for p in totals_scaled]
hazards = [h.replace("total ","") for h in fire_freqs_scaled]
results_df, delta_df, coefficient_df = multiple_reg_feature_importance(predictors, hazards, correlation_mat_total_fires)
display(results_df, delta_df, coefficient_df)
../_images/nblinks_ics_heat_146_0.png
../_images/nblinks_ics_heat_146_1.png
hazard R2 for full model MSE for full model Fire Frequency removed score total Days Fires Burned removed score total Acres Fires Burned removed score total Aerial Assets removed score total Personnel removed score total Cost removed score total Structures Damaged removed score ... Fire Frequency removed MSE total Days Fires Burned removed MSE total Acres Fires Burned removed MSE total Aerial Assets removed MSE total Personnel removed MSE total Cost removed MSE total Structures Damaged removed MSE total Structures Destroyed removed MSE total Injuries removed MSE total Fatalities removed MSE
0 Hazardous Terrain 1.0 0.0 0.680 0.206 -0.237 0.553 0.450 0.787 0.523 ... 0.026 0.064 0.099 0.036 0.044 0.017 0.038 0.018 0.015 0.022
1 Ecological Resources 1.0 0.0 -0.082 0.696 -1.720 -1.004 0.230 -0.698 0.570 ... 0.155 0.044 0.391 0.288 0.110 0.244 0.062 0.014 0.010 0.050
2 Thunderstorms 1.0 0.0 0.450 0.353 0.784 1.000 0.881 0.977 0.810 ... 0.053 0.062 0.021 0.000 0.011 0.002 0.018 0.055 0.012 0.005
3 Wind 1.0 0.0 0.894 0.694 -1.538 -1.166 0.810 0.891 0.974 ... 0.010 0.029 0.240 0.205 0.018 0.010 0.002 0.031 0.024 0.014
4 Dry Weather 1.0 0.0 0.877 0.736 -2.479 -2.215 0.448 0.040 0.714 ... 0.013 0.028 0.375 0.347 0.060 0.103 0.031 0.014 0.016 0.035
5 Rain 1.0 0.0 0.893 0.942 -0.920 -2.795 1.000 0.466 1.000 ... 0.011 0.006 0.192 0.380 0.000 0.053 0.000 0.027 0.005 0.002
6 Smoke 1.0 0.0 0.991 0.516 0.105 -0.402 0.802 0.019 -0.092 ... 0.001 0.048 0.089 0.139 0.020 0.097 0.108 0.033 0.001 0.035
7 Evacuations 1.0 0.0 0.501 0.802 -2.623 -2.489 0.437 -0.450 0.583 ... 0.062 0.025 0.453 0.436 0.070 0.181 0.052 0.022 0.009 0.048
8 Injury 1.0 0.0 0.481 0.322 0.088 0.998 0.259 0.177 0.006 ... 0.051 0.067 0.090 0.000 0.073 0.081 0.098 0.020 0.002 0.103
9 Resource Shortage 1.0 0.0 0.198 0.441 -1.338 -0.769 0.205 -0.071 0.493 ... 0.088 0.061 0.257 0.194 0.087 0.118 0.056 0.011 0.015 0.031
10 Road Closures 1.0 0.0 0.885 0.444 -0.509 0.163 0.449 -0.038 0.294 ... 0.011 0.055 0.149 0.083 0.055 0.103 0.070 0.018 0.006 0.052
11 Command Transition 1.0 0.0 0.352 0.980 0.491 0.970 0.950 0.630 0.745 ... 0.076 0.002 0.059 0.003 0.006 0.043 0.030 0.077 0.000 0.016
12 Inaccurate Mapping 1.0 0.0 0.873 0.783 0.183 0.624 0.808 0.379 0.426 ... 0.013 0.022 0.082 0.038 0.019 0.062 0.058 0.043 0.001 0.024
13 Aerial Grounding 1.0 0.0 -0.176 0.723 0.801 0.943 0.842 -0.085 0.996 ... 0.154 0.036 0.026 0.007 0.021 0.142 0.001 0.013 0.004 0.003
14 Military Base 1.0 0.0 -0.645 0.989 -8.206 -3.829 -1.656 0.755 0.902 ... 0.197 0.001 1.100 0.577 0.317 0.029 0.012 0.066 0.048 0.037
15 Cultural Resources 1.0 0.0 0.530 0.607 0.553 0.999 0.227 -0.268 -0.694 ... 0.038 0.032 0.036 0.000 0.063 0.103 0.138 0.007 0.000 0.050
16 Law Violations 1.0 0.0 0.810 0.999 -9.595 -18.816 0.125 -0.861 0.908 ... 0.019 0.000 1.078 2.017 0.089 0.189 0.009 0.017 0.032 0.009
17 Infrastructure 1.0 0.0 0.398 0.906 -4.407 -4.104 0.114 -0.643 0.306 ... 0.068 0.011 0.613 0.579 0.101 0.186 0.079 0.008 0.006 0.068
18 Livestock 1.0 0.0 0.275 0.316 -0.843 0.304 0.416 0.781 0.599 ... 0.064 0.060 0.163 0.062 0.052 0.019 0.035 0.023 0.016 0.034

19 rows × 23 columns

hazard R2 for full model MSE for full model Fire Frequency removed score total Days Fires Burned removed score total Acres Fires Burned removed score total Aerial Assets removed score total Personnel removed score total Cost removed score total Structures Damaged removed score ... Fire Frequency removed MSE total Days Fires Burned removed MSE total Acres Fires Burned removed MSE total Aerial Assets removed MSE total Personnel removed MSE total Cost removed MSE total Structures Damaged removed MSE total Structures Destroyed removed MSE total Injuries removed MSE total Fatalities removed MSE
0 Hazardous Terrain 1.0 0.0 0.320 0.794 1.237 0.447 0.550 0.213 0.477 ... -0.026 -0.064 -0.099 -0.036 -0.044 -0.017 -0.038 -0.018 -0.015 -0.022
1 Ecological Resources 1.0 0.0 1.082 0.304 2.720 2.004 0.770 1.698 0.430 ... -0.155 -0.044 -0.391 -0.288 -0.110 -0.244 -0.062 -0.014 -0.010 -0.050
2 Thunderstorms 1.0 0.0 0.550 0.647 0.216 0.000 0.119 0.023 0.190 ... -0.053 -0.062 -0.021 -0.000 -0.011 -0.002 -0.018 -0.055 -0.012 -0.005
3 Wind 1.0 0.0 0.106 0.306 2.538 2.166 0.190 0.109 0.026 ... -0.010 -0.029 -0.240 -0.205 -0.018 -0.010 -0.002 -0.031 -0.024 -0.014
4 Dry Weather 1.0 0.0 0.123 0.264 3.479 3.215 0.552 0.960 0.286 ... -0.013 -0.028 -0.375 -0.347 -0.060 -0.103 -0.031 -0.014 -0.016 -0.035
5 Rain 1.0 0.0 0.107 0.058 1.920 3.795 0.000 0.534 0.000 ... -0.011 -0.006 -0.192 -0.380 -0.000 -0.053 -0.000 -0.027 -0.005 -0.002
6 Smoke 1.0 0.0 0.009 0.484 0.895 1.402 0.198 0.981 1.092 ... -0.001 -0.048 -0.089 -0.139 -0.020 -0.097 -0.108 -0.033 -0.001 -0.035
7 Evacuations 1.0 0.0 0.499 0.198 3.623 3.489 0.563 1.450 0.417 ... -0.062 -0.025 -0.453 -0.436 -0.070 -0.181 -0.052 -0.022 -0.009 -0.048
8 Injury 1.0 0.0 0.519 0.678 0.912 0.002 0.741 0.823 0.994 ... -0.051 -0.067 -0.090 -0.000 -0.073 -0.081 -0.098 -0.020 -0.002 -0.103
9 Resource Shortage 1.0 0.0 0.802 0.559 2.338 1.769 0.795 1.071 0.507 ... -0.088 -0.061 -0.257 -0.194 -0.087 -0.118 -0.056 -0.011 -0.015 -0.031
10 Road Closures 1.0 0.0 0.115 0.556 1.509 0.837 0.551 1.038 0.706 ... -0.011 -0.055 -0.149 -0.083 -0.055 -0.103 -0.070 -0.018 -0.006 -0.052
11 Command Transition 1.0 0.0 0.648 0.020 0.509 0.030 0.050 0.370 0.255 ... -0.076 -0.002 -0.059 -0.003 -0.006 -0.043 -0.030 -0.077 -0.000 -0.016
12 Inaccurate Mapping 1.0 0.0 0.127 0.217 0.817 0.376 0.192 0.621 0.574 ... -0.013 -0.022 -0.082 -0.038 -0.019 -0.062 -0.058 -0.043 -0.001 -0.024
13 Aerial Grounding 1.0 0.0 1.176 0.277 0.199 0.057 0.158 1.085 0.004 ... -0.154 -0.036 -0.026 -0.007 -0.021 -0.142 -0.001 -0.013 -0.004 -0.003
14 Military Base 1.0 0.0 1.645 0.011 9.206 4.829 2.656 0.245 0.098 ... -0.197 -0.001 -1.100 -0.577 -0.317 -0.029 -0.012 -0.066 -0.048 -0.037
15 Cultural Resources 1.0 0.0 0.470 0.393 0.447 0.001 0.773 1.268 1.694 ... -0.038 -0.032 -0.036 -0.000 -0.063 -0.103 -0.138 -0.007 -0.000 -0.050
16 Law Violations 1.0 0.0 0.190 0.001 10.595 19.816 0.875 1.861 0.092 ... -0.019 -0.000 -1.078 -2.017 -0.089 -0.189 -0.009 -0.017 -0.032 -0.009
17 Infrastructure 1.0 0.0 0.602 0.094 5.407 5.104 0.886 1.643 0.694 ... -0.068 -0.011 -0.613 -0.579 -0.101 -0.186 -0.079 -0.008 -0.006 -0.068
18 Livestock 1.0 0.0 0.725 0.684 1.843 0.696 0.584 0.219 0.401 ... -0.064 -0.060 -0.163 -0.062 -0.052 -0.019 -0.035 -0.023 -0.016 -0.034

19 rows × 23 columns

Hazardous Terrain Ecological Resources Thunderstorms Wind Dry Weather Rain Smoke Evacuations Injury Resource Shortage Road Closures Command Transition Inaccurate Mapping Aerial Grounding Military Base Cultural Resources Law Violations Infrastructure Livestock
Fire Frequency -0.313581 -0.771285 -0.449935 -0.196329 -0.225604 0.202982 0.059588 -0.488699 -0.442255 -0.581090 -0.208721 -0.538615 -0.220789 -0.768341 -0.867605 -0.382661 0.272065 -0.511384 -0.495606
Days Fires Burned -0.385132 -0.318617 -0.380660 -0.259672 -0.257430 0.115959 -0.334563 -0.240227 -0.394124 -0.378236 -0.357760 -0.074543 -0.225562 -0.290985 0.054977 -0.272717 -0.013717 -0.157257 -0.375137
Acres Fires Burned 0.503948 0.999900 0.230524 0.783731 0.979924 0.701707 0.476732 1.077046 0.479210 0.810874 0.618201 0.390072 0.458425 0.258290 1.678019 0.304997 1.661379 1.252659 0.645825
Aerial Assets -0.345966 -0.980415 -0.006346 -0.827021 -1.076034 -1.126873 -0.681862 -1.207323 -0.026165 -0.805801 -0.526007 -0.107781 -0.355459 -0.158250 -1.388392 -0.019445 -2.595552 -1.390232 -0.453423
Personnel 0.404772 0.640593 0.205707 0.258009 0.470289 -0.007586 0.270491 0.511258 0.520117 0.569720 0.449915 0.148036 0.267860 0.277383 1.085640 0.483215 0.575140 0.610948 0.437802
Cost 0.254719 0.962682 0.092205 0.198314 0.627064 0.450813 0.608486 0.830326 0.554710 0.668622 0.624789 0.405531 0.486784 0.735040 0.333345 0.625970 0.848553 0.841511 0.270968
Structures Damaged -0.456810 -0.580431 -0.316060 -0.116516 -0.410260 0.000946 -0.768907 -0.533447 -0.730414 -0.551122 -0.617073 -0.402928 -0.560947 -0.055859 -0.252184 -0.867102 -0.225945 -0.655046 -0.439747
Structures Destroyed 0.428869 0.381243 0.757483 0.565964 0.377243 0.532924 0.583970 0.477964 0.457771 0.344615 0.436865 0.893835 0.668565 0.364034 -0.828011 0.272584 -0.425535 0.296102 0.492681
Injuries 0.278519 0.228356 0.249203 0.353876 0.288878 0.152031 0.058770 0.215857 0.090744 0.280603 0.171000 -0.001509 0.085335 0.148282 0.494017 -0.031798 0.406409 0.176420 0.286667
Fatalities -0.288774 -0.437670 -0.135920 -0.230199 -0.368195 0.094284 -0.367375 -0.427765 -0.628802 -0.341738 -0.448117 -0.245164 -0.300802 -0.107585 -0.377401 -0.436124 -0.184218 -0.510116 -0.359751
[79]:
cols = [col for col in delta_df.columns if "MSE" in col]
delta_df.drop(cols, axis=1)
[79]:
hazard R2 for full model Fire Frequency removed score total Days Fires Burned removed score total Acres Fires Burned removed score total Aerial Assets removed score total Personnel removed score total Cost removed score total Structures Damaged removed score total Structures Destroyed removed score total Injuries removed score total Fatalities removed score
0 Hazardous Terrain 1.0 0.320 0.794 1.237 0.447 0.550 0.213 0.477 0.220 0.189 0.272
1 Ecological Resources 1.0 1.082 0.304 2.720 2.004 0.770 1.698 0.430 0.097 0.071 0.349
2 Thunderstorms 1.0 0.550 0.647 0.216 0.000 0.119 0.023 0.190 0.574 0.126 0.050
3 Wind 1.0 0.106 0.306 2.538 2.166 0.190 0.109 0.026 0.326 0.259 0.146
4 Dry Weather 1.0 0.123 0.264 3.479 3.215 0.552 0.960 0.286 0.127 0.151 0.329
5 Rain 1.0 0.107 0.058 1.920 3.795 0.000 0.534 0.000 0.272 0.045 0.023
6 Smoke 1.0 0.009 0.484 0.895 1.402 0.198 0.981 1.092 0.330 0.007 0.355
7 Evacuations 1.0 0.499 0.198 3.623 3.489 0.563 1.450 0.417 0.176 0.073 0.382
8 Injury 1.0 0.519 0.678 0.912 0.002 0.741 0.823 0.994 0.205 0.016 1.051
9 Resource Shortage 1.0 0.802 0.559 2.338 1.769 0.795 1.071 0.507 0.104 0.140 0.278
10 Road Closures 1.0 0.115 0.556 1.509 0.837 0.551 1.038 0.706 0.185 0.058 0.531
11 Command Transition 1.0 0.648 0.020 0.509 0.030 0.050 0.370 0.255 0.657 0.000 0.134
12 Inaccurate Mapping 1.0 0.127 0.217 0.817 0.376 0.192 0.621 0.574 0.427 0.014 0.235
13 Aerial Grounding 1.0 1.176 0.277 0.199 0.057 0.158 1.085 0.004 0.097 0.033 0.023
14 Military Base 1.0 1.645 0.011 9.206 4.829 2.656 0.245 0.098 0.551 0.399 0.311
15 Cultural Resources 1.0 0.470 0.393 0.447 0.001 0.773 1.268 1.694 0.088 0.002 0.611
16 Law Violations 1.0 0.190 0.001 10.595 19.816 0.875 1.861 0.092 0.171 0.317 0.087
17 Infrastructure 1.0 0.602 0.094 5.407 5.104 0.886 1.643 0.694 0.074 0.054 0.600
18 Livestock 1.0 0.725 0.684 1.843 0.696 0.584 0.219 0.401 0.264 0.182 0.383

Experimental

The following cells are experimental, supplementary analyses related to the secondary analysis. They explore alternative predictor sets and colinearity.

[80]:
totals = {key: list(totals[key]) for key in totals}
[81]:
totals_new = {predictor: totals_scaled[predictor] for predictor in totals_scaled if predictor not in ["total Structures Damaged", "total Structures Destroyed"]}
totals_new["total structure"] = minmax_scale([totals["total Structures Damaged"][i]+totals["total Structures Destroyed"][i] for i in range(len(totals["total Structures Destroyed"]))])
[82]:
corrMatrix_fires, correlation_mat_total_fires, p_values = create_correlation_matrix(totals_new, fire_freqs_scaled, graph=False)
[83]:
correlation_mat_total_fires
[83]:
Fire Frequency total Days Fires Burned total Acres Fires Burned total Aerial Assets total Personnel total Cost total Injuries total Fatalities total structure Hazardous Terrain ... Resource Shortage Road Closures Command Transition Inaccurate Mapping Aerial Grounding Military Base Cultural Resources Law Violations Infrastructure Livestock
0 1.000000 0.968467 0.971840 0.903676 1.000000 0.351300 0.837050 0.88 0.233468 0.290398 ... 0.130682 0.302326 0.000000 0.206667 0.000000 1.0 0.432692 0.582090 0.228571 0.142857
1 0.514501 0.417518 0.855911 0.641318 0.806630 0.484181 1.000000 0.84 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 0.904762 0.8 0.875000 0.656716 1.000000 1.000000
2 0.543502 0.000000 0.269501 0.395437 0.604627 1.000000 0.778731 0.76 0.542703 0.711944 ... 0.857955 0.888372 0.534161 0.740000 1.000000 0.4 0.894231 0.791045 0.790476 0.619048
3 0.015038 0.251387 0.316664 0.065272 0.253145 0.123329 0.195540 0.12 0.192790 0.494145 ... 0.602273 0.530233 0.360248 0.413333 0.523810 1.0 0.653846 0.686567 0.800000 0.533333
4 0.258861 0.124672 0.000000 0.000000 0.055537 0.000000 0.000000 0.00 0.000000 0.210773 ... 0.136364 0.255814 0.006211 0.166667 0.095238 0.2 0.490385 0.223881 0.257143 0.152381
5 0.854995 1.000000 0.930638 0.426489 0.443150 0.278302 0.346484 0.36 0.723218 0.484778 ... 0.431818 0.655814 0.627329 0.693333 0.142857 0.6 0.576923 1.000000 0.866667 0.457143
6 0.461869 0.924088 1.000000 1.000000 0.896474 0.817606 0.696398 0.64 0.729902 0.697892 ... 0.778409 0.837209 0.894410 0.806667 0.904762 1.0 1.000000 0.313433 0.876190 0.695238
7 0.000000 0.569343 0.264532 0.419518 0.524836 0.559381 0.423671 1.00 0.264700 0.372365 ... 0.488636 0.376744 0.453416 0.393333 0.761905 0.6 0.711538 0.104478 0.447619 0.333333
8 0.111708 0.266861 0.176067 0.245300 0.000000 0.547244 0.240137 0.16 0.576248 0.000000 ... 0.000000 0.000000 0.124224 0.000000 0.619048 0.0 0.000000 0.000000 0.000000 0.000000

9 rows × 28 columns

[84]:
predictors = [p for p in totals_new]
hazards = [h for h in fire_freqs_scaled]
reshape_correlation_matrix(corrMatrix_fires, p_values, predictors, hazards)
../_images/nblinks_ics_heat_153_0.png
[85]:
predictors = [p for p in totals_new]
hazards = [h.replace("total ","") for h in fire_freqs_scaled]
results_df, delta_df, coefficient_df = multiple_reg_feature_importance(predictors, hazards, correlation_mat_total_fires)
display(results_df, delta_df, coefficient_df)
../_images/nblinks_ics_heat_154_0.png
../_images/nblinks_ics_heat_154_1.png
hazard R2 for full model MSE for full model Fire Frequency removed score total Days Fires Burned removed score total Acres Fires Burned removed score total Aerial Assets removed score total Personnel removed score total Cost removed score total Injuries removed score ... total structure removed score Fire Frequency removed MSE total Days Fires Burned removed MSE total Acres Fires Burned removed MSE total Aerial Assets removed MSE total Personnel removed MSE total Cost removed MSE total Injuries removed MSE total Fatalities removed MSE total structure removed MSE
0 Hazardous Terrain 1.0 0.0 0.998 -3.219 0.768 -0.623 -18.478 0.942 -5.249 ... -0.482 0.000 0.338 0.019 0.130 1.562 0.005 0.501 0.003 0.119
1 Ecological Resources 1.0 0.0 0.656 -1.463 -0.201 -2.790 -15.777 0.526 -4.314 ... 0.093 0.049 0.354 0.172 0.544 2.409 0.068 0.763 0.000 0.130
2 Thunderstorms 1.0 0.0 0.919 -2.368 0.992 0.703 -11.873 0.782 -4.017 ... -1.165 0.008 0.324 0.001 0.029 1.238 0.021 0.482 0.017 0.208
3 Wind 1.0 0.0 0.994 -0.259 -0.664 -2.116 -3.930 1.000 0.152 ... 0.025 0.001 0.119 0.157 0.295 0.466 0.000 0.080 0.000 0.092
4 Dry Weather 1.0 0.0 0.998 -0.988 -0.903 -4.106 -11.762 0.807 -2.505 ... 0.132 0.000 0.214 0.205 0.551 1.376 0.021 0.378 0.000 0.094
5 Rain 1.0 0.0 0.775 0.991 -0.465 -3.489 -0.057 0.718 0.612 ... 0.369 0.023 0.001 0.147 0.450 0.106 0.028 0.039 0.013 0.063
6 Smoke 1.0 0.0 0.292 -4.610 0.999 -3.227 -31.827 1.000 -16.137 ... -1.675 0.070 0.557 0.000 0.420 3.259 0.000 1.701 0.019 0.265
7 Evacuations 1.0 0.0 0.948 -1.310 -0.745 -4.893 -16.169 0.691 -4.964 ... -0.228 0.006 0.289 0.218 0.737 2.148 0.039 0.746 0.000 0.154
8 Injury 1.0 0.0 0.999 -4.561 0.985 0.284 -31.617 1.000 -13.042 ... -1.001 0.000 0.547 0.001 0.070 3.208 0.000 1.381 0.001 0.197
9 Resource Shortage 1.0 0.0 0.834 -2.396 0.128 -2.615 -17.949 0.861 -4.717 ... -0.018 0.018 0.373 0.096 0.397 2.082 0.015 0.628 0.003 0.112
10 Road Closures 1.0 0.0 0.937 -3.285 0.741 -1.578 -23.280 0.952 -8.632 ... -0.605 0.006 0.424 0.026 0.255 2.402 0.005 0.953 0.001 0.159
11 Command Transition 1.0 0.0 0.913 -0.671 0.991 0.409 -13.779 0.993 -7.398 ... -1.566 0.010 0.195 0.001 0.069 1.727 0.001 0.982 0.014 0.300
12 Inaccurate Mapping 1.0 0.0 0.937 -2.354 0.973 -0.752 -21.516 0.999 -10.048 ... -1.394 0.006 0.337 0.003 0.176 2.262 0.000 1.110 0.013 0.241
13 Aerial Grounding 1.0 0.0 0.080 0.345 0.913 0.852 -0.647 0.239 0.721 ... 0.726 0.121 0.086 0.011 0.019 0.216 0.100 0.037 0.000 0.036
14 Military Base 1.0 0.0 -0.739 0.964 -8.432 -3.656 -0.897 0.705 0.295 ... 0.182 0.208 0.004 1.127 0.556 0.227 0.035 0.084 0.045 0.098
15 Cultural Resources 1.0 0.0 0.985 -4.960 0.897 0.029 -42.847 0.999 -20.483 ... -0.955 0.001 0.485 0.008 0.079 3.566 0.000 1.747 0.010 0.159
16 Law Violations 1.0 0.0 0.762 0.980 -9.159 -19.375 -0.706 -0.674 0.927 ... 0.832 0.024 0.002 1.034 2.074 0.174 0.170 0.007 0.005 0.017
17 Infrastructure 1.0 0.0 0.945 -1.311 -1.770 -7.370 -21.867 0.699 -7.042 ... -0.043 0.006 0.262 0.314 0.949 2.593 0.034 0.912 0.000 0.118
18 Livestock 1.0 0.0 0.875 -2.789 0.435 -1.008 -17.469 0.957 -4.752 ... -0.545 0.011 0.335 0.050 0.178 1.633 0.004 0.509 0.001 0.137

19 rows × 21 columns

 | hazard | R2 for full model | MSE for full model | Fire Frequency removed score | total Days Fires Burned removed score | total Acres Fires Burned removed score | total Aerial Assets removed score | total Personnel removed score | total Cost removed score | total Injuries removed score | ... | total structure removed score | Fire Frequency removed MSE | total Days Fires Burned removed MSE | total Acres Fires Burned removed MSE | total Aerial Assets removed MSE | total Personnel removed MSE | total Cost removed MSE | total Injuries removed MSE | total Fatalities removed MSE | total structure removed MSE
0 | Hazardous Terrain | 1.0 | 0.0 | 0.002 | 4.219 | 0.232 | 1.623 | 19.478 | 0.058 | 6.249 | ... | 1.482 | -0.000 | -0.338 | -0.019 | -0.130 | -1.562 | -0.005 | -0.501 | -0.003 | -0.119
1 | Ecological Resources | 1.0 | 0.0 | 0.344 | 2.463 | 1.201 | 3.790 | 16.777 | 0.474 | 5.314 | ... | 0.907 | -0.049 | -0.354 | -0.172 | -0.544 | -2.409 | -0.068 | -0.763 | -0.000 | -0.130
2 | Thunderstorms | 1.0 | 0.0 | 0.081 | 3.368 | 0.008 | 0.297 | 12.873 | 0.218 | 5.017 | ... | 2.165 | -0.008 | -0.324 | -0.001 | -0.029 | -1.238 | -0.021 | -0.482 | -0.017 | -0.208
3 | Wind | 1.0 | 0.0 | 0.006 | 1.259 | 1.664 | 3.116 | 4.930 | 0.000 | 0.848 | ... | 0.975 | -0.001 | -0.119 | -0.157 | -0.295 | -0.466 | -0.000 | -0.080 | -0.000 | -0.092
4 | Dry Weather | 1.0 | 0.0 | 0.002 | 1.988 | 1.903 | 5.106 | 12.762 | 0.193 | 3.505 | ... | 0.868 | -0.000 | -0.214 | -0.205 | -0.551 | -1.376 | -0.021 | -0.378 | -0.000 | -0.094
5 | Rain | 1.0 | 0.0 | 0.225 | 0.009 | 1.465 | 4.489 | 1.057 | 0.282 | 0.388 | ... | 0.631 | -0.023 | -0.001 | -0.147 | -0.450 | -0.106 | -0.028 | -0.039 | -0.013 | -0.063
6 | Smoke | 1.0 | 0.0 | 0.708 | 5.610 | 0.001 | 4.227 | 32.827 | 0.000 | 17.137 | ... | 2.675 | -0.070 | -0.557 | -0.000 | -0.420 | -3.259 | -0.000 | -1.701 | -0.019 | -0.265
7 | Evacuations | 1.0 | 0.0 | 0.052 | 2.310 | 1.745 | 5.893 | 17.169 | 0.309 | 5.964 | ... | 1.228 | -0.006 | -0.289 | -0.218 | -0.737 | -2.148 | -0.039 | -0.746 | -0.000 | -0.154
8 | Injury | 1.0 | 0.0 | 0.001 | 5.561 | 0.015 | 0.716 | 32.617 | 0.000 | 14.042 | ... | 2.001 | -0.000 | -0.547 | -0.001 | -0.070 | -3.208 | -0.000 | -1.381 | -0.001 | -0.197
9 | Resource Shortage | 1.0 | 0.0 | 0.166 | 3.396 | 0.872 | 3.615 | 18.949 | 0.139 | 5.717 | ... | 1.018 | -0.018 | -0.373 | -0.096 | -0.397 | -2.082 | -0.015 | -0.628 | -0.003 | -0.112
10 | Road Closures | 1.0 | 0.0 | 0.063 | 4.285 | 0.259 | 2.578 | 24.280 | 0.048 | 9.632 | ... | 1.605 | -0.006 | -0.424 | -0.026 | -0.255 | -2.402 | -0.005 | -0.953 | -0.001 | -0.159
11 | Command Transition | 1.0 | 0.0 | 0.087 | 1.671 | 0.009 | 0.591 | 14.779 | 0.007 | 8.398 | ... | 2.566 | -0.010 | -0.195 | -0.001 | -0.069 | -1.727 | -0.001 | -0.982 | -0.014 | -0.300
12 | Inaccurate Mapping | 1.0 | 0.0 | 0.063 | 3.354 | 0.027 | 1.752 | 22.516 | 0.001 | 11.048 | ... | 2.394 | -0.006 | -0.337 | -0.003 | -0.176 | -2.262 | -0.000 | -1.110 | -0.013 | -0.241
13 | Aerial Grounding | 1.0 | 0.0 | 0.920 | 0.655 | 0.087 | 0.148 | 1.647 | 0.761 | 0.279 | ... | 0.274 | -0.121 | -0.086 | -0.011 | -0.019 | -0.216 | -0.100 | -0.037 | -0.000 | -0.036
14 | Military Base | 1.0 | 0.0 | 1.739 | 0.036 | 9.432 | 4.656 | 1.897 | 0.295 | 0.705 | ... | 0.818 | -0.208 | -0.004 | -1.127 | -0.556 | -0.227 | -0.035 | -0.084 | -0.045 | -0.098
15 | Cultural Resources | 1.0 | 0.0 | 0.015 | 5.960 | 0.103 | 0.971 | 43.847 | 0.001 | 21.483 | ... | 1.955 | -0.001 | -0.485 | -0.008 | -0.079 | -3.566 | -0.000 | -1.747 | -0.010 | -0.159
16 | Law Violations | 1.0 | 0.0 | 0.238 | 0.020 | 10.159 | 20.375 | 1.706 | 1.674 | 0.073 | ... | 0.168 | -0.024 | -0.002 | -1.034 | -2.074 | -0.174 | -0.170 | -0.007 | -0.005 | -0.017
17 | Infrastructure | 1.0 | 0.0 | 0.055 | 2.311 | 2.770 | 8.370 | 22.867 | 0.301 | 8.042 | ... | 1.043 | -0.006 | -0.262 | -0.314 | -0.949 | -2.593 | -0.034 | -0.912 | -0.000 | -0.118
18 | Livestock | 1.0 | 0.0 | 0.125 | 3.789 | 0.565 | 2.008 | 18.469 | 0.043 | 5.752 | ... | 1.545 | -0.011 | -0.335 | -0.050 | -0.178 | -1.633 | -0.004 | -0.509 | -0.001 | -0.137

19 rows × 21 columns
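
The two tables above report model fit with one predictor removed at a time. In the second table, each score and MSE entry appears to equal the full-model value minus the corresponding entry in the first table. The model and evaluation split that produced these numbers are not shown in this section, so the following is only a minimal sketch of a drop-one-predictor comparison. It assumes ordinary least squares with an intercept, scored in-sample, on synthetic data; the function names `r2_ols` and `drop_one_r2` and the predictor names are hypothetical.

```python
import numpy as np

def r2_ols(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def drop_one_r2(X, y, names):
    """Return the full-model R^2 and, per predictor, the R^2 lost by removing it."""
    full = r2_ols(X, y)
    loss = {}
    for j, name in enumerate(names):
        reduced = r2_ols(np.delete(X, j, axis=1), y)
        loss[name] = full - reduced
    return full, loss

# Synthetic data (an assumption for illustration): y depends only on "signal".
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
noise = rng.normal(size=200)
y = 3.0 * signal + 0.1 * rng.normal(size=200)
X = np.column_stack([signal, noise])

full_r2, r2_loss = drop_one_r2(X, y, ["signal", "noise"])
print(full_r2, r2_loss)
```

A large loss marks a predictor the model depends on. With in-sample scoring the loss cannot be negative, so the negative removed scores in the first table suggest a different scoring setup than this sketch.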

 | Hazardous Terrain | Ecological Resources | Thunderstorms | Wind | Dry Weather | Rain | Smoke | Evacuations | Injury | Resource Shortage | Road Closures | Command Transition | Inaccurate Mapping | Aerial Grounding | Military Base | Cultural Resources | Law Violations | Infrastructure | Livestock
Fire Frequency | -0.026710 | -0.434907 | -0.172281 | -0.044706 | 0.030674 | 0.294113 | 0.518757 | -0.157628 | -0.022735 | -0.264684 | 0.154315 | -0.196994 | 0.155867 | -0.679722 | -0.892195 | 0.068827 | 0.304387 | -0.154814 | -0.205733
Days Fires Burned | -0.887679 | -0.907474 | -0.868228 | -0.526449 | -0.706361 | -0.045041 | -1.138592 | -0.820146 | -1.128467 | -0.932083 | -0.993370 | -0.674310 | -0.885755 | -0.447021 | 0.100759 | -1.062265 | -0.068719 | -0.781091 | -0.883145
Acres Fires Burned | 0.218366 | 0.664421 | -0.044154 | 0.634500 | 0.724769 | 0.612986 | 0.019111 | 0.747378 | 0.060722 | 0.495242 | 0.256264 | 0.051919 | 0.083995 | 0.171237 | 1.698509 | -0.146490 | 1.626812 | 0.896498 | 0.357554
Aerial Assets | -0.659419 | -1.348238 | -0.308950 | -0.991925 | -1.356072 | -1.225549 | -1.183809 | -1.569110 | -0.484941 | -1.151817 | -0.922921 | -0.480188 | -0.766777 | -0.254556 | -1.363317 | -0.513679 | -2.631943 | -1.780378 | -0.770022
Personnel | 2.408827 | 2.991060 | 2.143770 | 1.315660 | 2.260642 | 0.627199 | 3.478669 | 2.824167 | 3.451628 | 2.780700 | 2.986533 | 2.532783 | 2.898649 | 0.895393 | 0.917536 | 3.639132 | 0.803142 | 3.103003 | 2.462551
Cost | -0.132613 | 0.508451 | -0.282524 | -0.006252 | 0.281038 | 0.327950 | -0.011527 | 0.383307 | -0.011801 | 0.241353 | 0.134572 | -0.055549 | -0.021726 | 0.615493 | 0.366185 | 0.016190 | 0.804696 | 0.359965 | -0.120391
Injuries | -1.601518 | -1.975918 | -1.571002 | -0.640381 | -1.390647 | -0.445877 | -2.950252 | -1.953818 | -2.658327 | -1.792774 | -2.208000 | -2.241003 | -2.383291 | -0.432888 | 0.656517 | -2.989975 | 0.195391 | -2.159981 | -1.613143
Fatalities | 0.111248 | 0.031103 | 0.252046 | -0.017979 | -0.010848 | 0.222284 | 0.272664 | 0.033852 | -0.044206 | 0.099170 | 0.057863 | 0.232096 | 0.224665 | 0.016528 | -0.413534 | 0.192506 | -0.140251 | -0.013458 | 0.044595
structure | 1.055031 | 1.103966 | 1.395890 | 0.928990 | 0.936069 | 0.769336 | 1.576520 | 1.198997 | 1.357454 | 1.023104 | 1.219264 | 1.675715 | 1.500671 | 0.579354 | -0.956481 | 1.220021 | -0.399779 | 1.051994 | 1.131003

Colinearity

[86]:
from statsmodels.stats.outliers_influence import variance_inflation_factor
[87]:
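# Compute the variance inflation factor (VIF) of each raw predictor total.
vif_data = pd.DataFrame()
input_df = pd.DataFrame({predictor: totals_new[predictor] for predictor in totals_new})
vif_data["feature"] = input_df.columns
# One VIF per column: column i is regressed on all remaining columns.
vif_data["VIF"] = [variance_inflation_factor(input_df.values, i)
                   for i in range(len(input_df.columns))]
vif_data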
[87]:
feature VIF
0 Fire Frequency 10.545422
1 total Days Fires Burned 98.508631
2 total Acres Fires Burned 318.148035
3 total Aerial Assets 47.954849
4 total Personnel 215.555330
5 total Cost 21.487662
6 total Injuries 187.177715
7 total Fatalities 80.066638
8 total structure 54.806470
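
For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. To my understanding, `variance_inflation_factor` in statsmodels fits that regression on exactly the columns it is given and does not add an intercept, so uncentered inputs can give values far above the textbook (centered) VIF. The sketch below is a NumPy-only illustration of this effect on synthetic data. The helper `vif` and its `add_intercept` flag are hypothetical names, and it is not the notebook's code.

```python
import numpy as np

def vif(X, i, add_intercept=True):
    """VIF of column i: 1 / (1 - R^2) from regressing column i on the other columns."""
    X = np.asarray(X, dtype=float)
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    if add_intercept:
        others = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    # Centered total sum of squares with an intercept, uncentered without one.
    total = y - y.mean() if add_intercept else y
    r2 = 1.0 - (resid @ resid) / (total @ total)
    return 1.0 / (1.0 - r2)

# Two independent synthetic predictors with a large mean and a small spread.
rng = np.random.default_rng(0)
a = rng.normal(loc=50.0, scale=1.0, size=200)
b = rng.normal(loc=50.0, scale=1.0, size=200)
X = np.column_stack([a, b])

vif_with = [vif(X, i, add_intercept=True) for i in range(X.shape[1])]
vif_without = [vif(X, i, add_intercept=False) for i in range(X.shape[1])]
print(vif_with, vif_without)
```

In this sketch the two columns are unrelated, so the intercept version stays close to 1, while the no-intercept version is large purely because both columns share a nonzero mean. If the centered definition is the one intended, one option is to append a constant column (for example with `statsmodels.api.add_constant`) before calling `variance_inflation_factor` and to ignore the VIF reported for the constant.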
[88]:
# Repeat the VIF calculation for the combined, scaled predictor groups.
vif_data = pd.DataFrame()
input_df = pd.DataFrame({predictor: combined_predictors_scaled[predictor] for predictor in combined_predictors_scaled})
vif_data["feature"] = input_df.columns
vif_data["VIF"] = [variance_inflation_factor(input_df.values, i)
                   for i in range(len(input_df.columns))]
vif_data
[88]:
feature VIF
0 Fire Characteristics 3.984066
1 Operations 17.004590
2 Intensity 13.810379
[89]:
# Drop each predictor group in turn and total the VIFs of the remaining groups.
sums = []
for col in input_df:
    temp_input = input_df.drop(col, axis=1)
    vif_data = pd.DataFrame()
    vif_data["feature"] = temp_input.columns
    vif_data["VIF"] = [variance_inflation_factor(temp_input.values, i)
                       for i in range(len(temp_input.columns))]
    sum_vif = sum(vif_data["VIF"].tolist())
    sums.append(sum_vif)
    display(col, sum_vif, vif_data)
# Report the dropped group that gives the smallest total VIF. The name is looked up
# in input_df.columns, the sequence the loop iterated over, rather than in the
# unrelated `predictors` list.
best = sums.index(min(sums))
print(min(sums), best, input_df.columns[best])
'Fire Characteristics'
27.612735626886067
feature VIF
0 Operations 13.806368
1 Intensity 13.806368
'Operations'
6.469486432284111
feature VIF
0 Fire Characteristics 3.234743
1 Intensity 3.234743
'Intensity'
7.96581799128896
feature VIF
0 Fire Characteristics 3.982909
1 Operations 3.982909
6.469486432284111 1 Operations
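
The cell above tries each single removal and compares the total VIF of what remains. A commonly used alternative, not the method used in this notebook, is to repeatedly drop the predictor with the highest VIF until every remaining VIF falls below a threshold. The following minimal sketch assumes the centered VIF (taken from the diagonal of the inverse correlation matrix) and a rule-of-thumb threshold of 5. The names `vif_all`, `prune_by_vif`, and `x1` to `x3` are hypothetical.

```python
import numpy as np

def vif_all(X):
    """Centered VIF of every column: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

def prune_by_vif(X, names, threshold=5.0):
    """Drop the highest-VIF column until all remaining VIFs are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif_all(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Synthetic data (an assumption for illustration): x2 nearly duplicates x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

kept = prune_by_vif(X, ["x1", "x2", "x3"])
print(kept)
```

Here one of the two near-duplicate columns is removed and the independent column is kept. Thresholds of 5 or 10 are conventions rather than rules, and the right value depends on the analysis.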
[90]:
# to_drop = ["total Acres Fires Burned", "total Personnel", "total Injuries", "total Aerial Assets", "total Cost"]#,"total Days Fires Burned"]
# temp_input = input_df.drop(to_drop, axis=1)
# vif_data = pd.DataFrame()
# vif_data["feature"] = temp_input.columns
# vif_data["VIF"] = [variance_inflation_factor(temp_input.values, i)
#                       for i in range(len(temp_input.columns))]
# sum_vif = sum(vif_data["VIF"].tolist())
# display(sum_vif, vif_data)

© Copyright 2023 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. All Rights Reserved.