OSTEOPOROSIS FRACTURE PREDICTION

A comprehensive data science project assessing risk factors and predicting bone fractures in women with osteoporosis using various statistical models on the glow_bonemed dataset.

Collaboration: Waleed Amer & Nolan Dulude | Data Science Masters Program

Project Objective

Assess risk factors and predict bone fractures in women with osteoporosis using various statistical models. Compare model performance to identify the best predictive approach for clinical decision-making.

Dataset Overview

Analysis of the glow_bonemed dataset, which contains clinical variables such as BMI, age, prior fractures, bone medication use, and demographic factors, to predict fracture risk in the first year of follow-up.

Methodology Workflow

1. DATA CLEANUP
Identify missing values and either impute them or remove the affected records
2. EDA
Explore the dataset through visualizations to understand variable relationships and interactions
3. SIMPLE MODEL
Build a simple logistic regression model based on the EDA findings
4. CROSS VALIDATION
Train and test a more complex model with interaction terms using cross-validation
5. QDA MODEL
Fit a Quadratic Discriminant Analysis (QDA) model to improve performance
6. RANDOM FOREST
Fit a Random Forest model to further improve performance
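The cleanup and cross-validation steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the record layout (one dict per patient, None for a missing value), the helper names `drop_incomplete`, `impute_mean`, and `kfold_indices`, and the choice of mean imputation are all assumptions made for the example.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: one dict per patient, None marks a missing value.
Row = Dict[str, Optional[float]]


def drop_incomplete(rows: Sequence[Row], columns: Sequence[str]) -> List[Row]:
    """Complete-case analysis: keep rows with no missing value in `columns`."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def impute_mean(rows: Sequence[Row], column: str) -> List[Row]:
    """Replace missing values in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    mean = sum(observed) / len(observed)
    return [{**r, column: mean if r.get(column) is None else r[column]} for r in rows]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices once, cut them into k folds, and return (train, test) pairs.

    Each index lands in exactly one test fold, so every record is scored once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from glow_bonemed.
    rows = [{"age": 70.0, "bmi": 27.0}, {"age": 62.0, "bmi": None}, {"age": None, "bmi": 31.0}]
    print(len(drop_incomplete(rows, ["age", "bmi"])))  # 1
    print(impute_mean(rows, "bmi")[1]["bmi"])  # 29.0
    for train, test in kfold_indices(10, 5):
        print(len(train), len(test))  # 8 2 on every line
```

For an imbalanced binary outcome such as fracture status, a stratified variant that shuffles and splits cases and non-cases separately keeps the class balance similar across folds.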

Model Performance Results

Random Forest AUROC: 0.98
QDA AUROC: 0.83
CV Interaction AUROC: 0.75
Simple Logistic AUROC: 0.57
Model                  Sensitivity  Specificity  PPV     NPV     AUROC
CV Interaction Model   0.8027       0.5120       0.8315  0.4638  0.7463
QDA                    0.9867       0.6800       0.9024  0.9444  0.8333
Random Forest          1.00         0.96         0.9868  1.00    0.9800
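The four threshold-based columns in the table (sensitivity, specificity, PPV, NPV) all derive from a single confusion matrix. A minimal sketch of the definitions follows; the `ConfusionCounts` class, the `classification_metrics` helper, and the counts in the example are hypothetical. Which class is treated as "positive" (fracture or no fracture) changes all four values, so that convention is worth fixing before comparing models.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of actual positives that are caught
        "specificity": c.tn / (c.tn + c.fp),  # share of actual negatives correctly cleared
        "ppv": c.tp / (c.tp + c.fp),  # chance a positive prediction is correct
        "npv": c.tn / (c.tn + c.fn),  # chance a negative prediction is correct
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    m = classification_metrics(ConfusionCounts(tp=8, fp=10, tn=30, fn=2))
    print(m["sensitivity"], m["specificity"], m["npv"])  # 0.8 0.75 0.9375
    print(round(m["ppv"], 4))  # 0.4444
```

Unlike AUROC, these four numbers are computed at one fixed classification threshold, so moving the threshold trades sensitivity against specificity.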

Key Insights & Risk Factors

BMI & Age Impact
Higher BMI and age increase fracture risk, with odds ratios of 1.5 for BMI and 2.0 for age over 65
Smoking & Prior Fractures
Smokers and those with prior fractures have higher risks, with odds ratios of 1.8 for smoking and 2.5 for prior fractures
Menopause & Family History
Premature menopause and maternal hip fractures increase risk, with odds ratios of 1.6 and 1.9, respectively
Bone Medication Adherence
Adherence to bone medications reduces fracture risk by 50%, as shown in our random forest model
Physical Function
Needing assistive devices to stand increases risk, with an odds ratio of 1.7. Strength and balance training is recommended
Model Performance
Random Forest outperformed all models due to its ability to capture non-linear relationships and feature interactions
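Odds ratios like those quoted above are typically obtained from a fitted logistic regression by exponentiating each coefficient, or computed directly from a 2x2 table for a single binary factor. The sketch below shows both; the function names, the standard error, and the table counts are hypothetical, and the Wald interval is one common choice among several.

```python
import math
from typing import Tuple


def odds_ratio_from_coef(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% interval for a logistic-regression coefficient.

    `beta` and `se` are on the log-odds scale; exponentiating maps them to odds.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for one binary risk factor.

    a: exposed with fracture      b: exposed without fracture
    c: unexposed with fracture    d: unexposed without fracture
    """
    return (a * d) / (b * c)


if __name__ == "__main__":
    # A coefficient of ln(2.5) corresponds to an odds ratio of 2.5;
    # the standard error of 0.2 is made up for illustration.
    or_, lo, hi = odds_ratio_from_coef(math.log(2.5), se=0.2)
    print(round(or_, 2), round(lo, 2), round(hi, 2))
    # Hypothetical table: 20 of 100 exposed vs 10 of 100 unexposed patients fracture.
    print(odds_ratio_2x2(20, 80, 10, 90))  # 2.25
```

In this hypothetical table the risk ratio is 0.20 / 0.10 = 2.0 while the odds ratio is 2.25: odds ratios scale the odds rather than the probability, and the two agree closely only when the outcome is rare. That distinction matters when translating an odds ratio into a statement such as a percentage change in risk.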

Conclusions & Clinical Implications

Model Performance

The Random Forest model achieved the highest AUROC (0.98), substantially outperforming the simpler approaches. The simple logistic regression showed weak predictive ability (AUROC 0.57, only slightly above the 0.5 expected by chance), while the cross-validated interaction model (0.75) and QDA (0.83) showed progressive improvement.
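AUROC has a direct probabilistic reading: it is the chance that a randomly chosen patient who fractured receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. That is why 0.5 marks a model no better than chance. A minimal pairwise sketch of this (Mann-Whitney) formulation follows, with hypothetical scores; the `auroc` helper is illustrative, and its quadratic loop suits datasets of a few hundred rows rather than large ones.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted risks; 1 = fracture, 0 = no fracture.
    print(auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0 (perfect ranking)
    print(auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5 (chance level)
    print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```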

Clinical Relevance

AUROC and sensitivity are crucial metrics in clinical contexts. AUROC summarizes how well a model ranks patients by risk across all classification thresholds, while high sensitivity minimizes false negatives, ensuring that high-risk patients are identified for preventive interventions.
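Sensitivity depends on the threshold applied to predicted risk, so one practical way to prioritize it is to lower the threshold until a target sensitivity is reached and then check how much specificity remains. The sketch below scans candidate thresholds from highest to lowest; the function names `sens_spec` and `threshold_for_sensitivity` and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= t."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target: float
) -> Tuple[float, float, float]:
    """Highest observed threshold reaching `target` sensitivity, with its sens/spec."""
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        sens, spec = sens_spec(scores, labels, t)
        if sens >= target:
            return t, sens, spec
    raise ValueError("target sensitivity not reachable")


if __name__ == "__main__":
    # Hypothetical predicted risks; 1 = fracture, 0 = no fracture.
    scores = [0.9, 0.8, 0.35, 0.7, 0.4, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(threshold_for_sensitivity(scores, labels, 0.6))  # t = 0.8, spec = 1.0
    print(threshold_for_sensitivity(scores, labels, 1.0))  # t = 0.35, spec = 1/3
```

In this toy example, raising the sensitivity target from 0.6 to 1.0 moves the threshold from 0.8 down to 0.35 and drops specificity from 1.0 to 1/3.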

Future Directions

With additional time, the project would benefit from advanced feature engineering, external validation on diverse populations, and exploration of deep learning approaches for enhanced predictive accuracy.


© 2024 Waleed Amer. All rights reserved.
