SafeAeroBERT

SafeAeroBERT: A Safety-Informed Aviation-Specific Language Model

Available at: https://huggingface.co/NASA-AIML/MIKA_SafeAeroBERT

SafeAeroBERT is a bert-base-uncased model further pre-trained on Aviation Safety Reporting System (ASRS) documents and National Transportation Safety Board (NTSB) accident reports, both through November 2022. A total of 2,283,435 narrative sections were split 90/10 for training and validation, with 1,052,207,104 tokens from over 350,000 NTSB and ASRS documents used for pre-training.
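A minimal usage sketch for loading the released checkpoint from the Hub for masked-token prediction; the sample sentence is illustrative only, and the masked-LM head is assumed to be available in the published weights:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the released checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("NASA-AIML/MIKA_SafeAeroBERT")
model = AutoModelForMaskedLM.from_pretrained("NASA-AIML/MIKA_SafeAeroBERT")

# Fill-mask inference on an aviation-style narrative (illustrative example)
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("The pilot reported a loss of [MASK] during the approach."))
```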

The model was trained for two epochs using AutoModelForMaskedLM.from_pretrained with learning_rate=1e-5 and a total batch size of 128, for just over 32,100 training steps.
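The pre-training corpus is not redistributed here, so the following is only a sketch of the reported configuration (masked-LM objective, learning rate 1e-5, effective batch size of 128, two epochs). The file paths, tokenization settings, and the batch-size/accumulation split are assumptions for illustration, not taken from the original run:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Start from the bert-base-uncased checkpoint, as described above
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical text files holding the ASRS/NTSB narrative sections (90/10 split)
dataset = load_dataset("text", data_files={"train": "narratives_train.txt",
                                           "validation": "narratives_val.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% random masking for the masked-LM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="safeaerobert-pretraining",
    num_train_epochs=2,
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,  # 32 x 4 = effective batch size of 128 (single GPU assumed)
)

trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=tokenized["train"], eval_dataset=tokenized["validation"])
trainer.train()
```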

An earlier version of the model was evaluated on a downstream binary document classification task by fine-tuning with AutoModelForSequenceClassification.from_pretrained (a fine-tuning sketch follows the table below). SafeAeroBERT was compared to SciBERT and base BERT on this task, with the following performance:

Classification Metrics

| Contributing Factor | Metric    | BERT  | SciBERT | SafeAeroBERT |
|---------------------|-----------|-------|---------|--------------|
| Aircraft            | Accuracy  | 0.747 | 0.726   | 0.740        |
|                     | Precision | 0.716 | 0.691   | 0.548        |
|                     | Recall    | 0.747 | 0.726   | 0.740        |
|                     | F-1       | 0.719 | 0.699   | 0.629        |
| Human Factors       | Accuracy  | 0.608 | 0.557   | 0.549        |
|                     | Precision | 0.618 | 0.586   | 0.527        |
|                     | Recall    | 0.608 | 0.557   | 0.549        |
|                     | F-1       | 0.572 | 0.426   | 0.400        |
| Procedure           | Accuracy  | 0.766 | 0.755   | 0.845        |
|                     | Precision | 0.766 | 0.762   | 0.742        |
|                     | Recall    | 0.766 | 0.755   | 0.845        |
|                     | F-1       | 0.766 | 0.758   | 0.784        |
| Weather             | Accuracy  | 0.807 | 0.808   | 0.871        |
|                     | Precision | 0.803 | 0.769   | 0.759        |
|                     | Recall    | 0.807 | 0.808   | 0.871        |
|                     | F-1       | 0.805 | 0.788   | 0.811        |
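A minimal sketch of a fine-tuning setup like the one used for the binary classification comparison, assuming labeled narratives in a CSV with `text` and `label` columns. The file name, column names, and hyperparameter values are illustrative assumptions, not details from the paper:

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical CSV of report narratives with a binary contributing-factor label
df = pd.read_csv("narratives_with_labels.csv")  # columns: text, label (0/1)
dataset = Dataset.from_pandas(df).train_test_split(test_size=0.2)

tokenizer = AutoTokenizer.from_pretrained("NASA-AIML/MIKA_SafeAeroBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "NASA-AIML/MIKA_SafeAeroBERT", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="safeaerobert-classifier",
    num_train_epochs=3,               # illustrative value
    per_device_train_batch_size=16,   # illustrative value
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], eval_dataset=tokenized["test"])
trainer.train()
```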

More information on training data, evaluation, and intended use can be found in the original publication.

Citation: Sequoia R. Andrade and Hannah S. Walsh. “SafeAeroBERT: Towards a Safety-Informed Aerospace-Specific Language Model,” AIAA 2023-3437. AIAA AVIATION 2023 Forum. June 2023.