A Multi-Model Learning Approach for Early Identification of Obesity Risk from Lifestyle Factors

E Tharun; K Bhaskar

doi:10.64751/

Keywords:

Obesity, machine learning, stacking, explainable AI

Abstract

Obesity remains a major public health problem worldwide, and accurate risk classification is needed to support early intervention and prevent long-term complications. This study uses a dataset from the UCI Machine Learning Repository that records eating habits, physical activity, and anthropometric measurements, covering a wide range of lifestyle profiles. The proposed pipeline normalizes features with StandardScaler, selects features with RFECV (recursive feature elimination with cross-validation) using Logistic Regression as the estimator, and balances the classes with SMOTENC, which oversamples minority classes in data containing both numeric and categorical features. Performance is measured with stratified 5-fold cross-validation. The algorithms evaluated are AdaBoost, Perceptron, GaussianNB, SGD, SVM, KNN, MLP, Decision Tree, ExtraTrees, BaggingClassifier, RandomForest, GradientBoosting, LogisticRegressionCV, XGBoost, and LightGBM. A Stacking ensemble is also proposed, paired with LIME to provide local, model-agnostic explanations of individual predictions. To further improve robustness and interpretability, an extended Voting ensemble combining the Gradient Boosting, XGBoost, LightGBM, and CatBoost classifiers is included as an extension, with SHAP-based global explanations and deployment through the Flask framework. The combined pipeline achieves 99.3% accuracy, a perfect ROC-AUC, and high precision, recall, and F1-score.
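To make the SMOTENC balancing step more concrete, the following is a minimal pure-Python sketch of the SMOTE-NC idea for a minority class with mixed numeric and categorical features. It is not imbalanced-learn's `SMOTENC` implementation. The function name `smotenc_sketch`, the toy rows and their column meanings, and the random choice of a base row for each synthetic sample are hypothetical illustrations. Distances are Euclidean over the numeric columns, plus the median standard deviation of those columns (squared) for each categorical mismatch. Numeric values are interpolated between a row and one of its k nearest minority neighbours, and each categorical value is set to the most frequent value among those neighbours.

```python
import math
import random
import statistics
from collections import Counter


def smotenc_sketch(minority, cat_idx, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows in the spirit of SMOTE-NC.

    minority: list of rows (lists) from ONE minority class.
    cat_idx:  set of column indices holding categorical values; every other
              column is assumed numeric (at least one numeric column needed).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to find neighbours")
    rng = random.Random(seed)
    cont_idx = [j for j in range(len(minority[0])) if j not in cat_idx]

    # Median of the per-column standard deviations of the numeric features;
    # this value, squared, is added inside the distance per categorical mismatch.
    med = statistics.median(
        [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    )

    def dist(a, b):
        sq = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        sq += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(sq)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # hypothetical choice: random base row
        base = minority[i]
        others = [row for m, row in enumerate(minority) if m != i]
        neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)

        new = list(base)
        for j in cont_idx:
            # Numeric features: a random point on the segment base -> partner.
            new[j] = base[j] + gap * (partner[j] - base[j])
        for j in cat_idx:
            # Categorical features: most frequent value among the k neighbours.
            new[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic


# Toy minority-class rows (hypothetical columns): age, weight in kg,
# family-history flag, usual transport mode.
rows = [
    [21.0, 85.0, "yes", "Public_Transportation"],
    [23.0, 90.0, "yes", "Automobile"],
    [25.0, 95.0, "no", "Public_Transportation"],
    [22.0, 88.0, "yes", "Public_Transportation"],
    [30.0, 102.0, "yes", "Walking"],
]
new_rows = smotenc_sketch(rows, cat_idx={2, 3}, n_new=3, k=3, seed=42)
for row in new_rows:
    print(row)
```

Whatever implementation is used, oversampling should generally be applied only to the training folds inside the stratified 5-fold cross-validation (for example through imbalanced-learn's `Pipeline`), so that synthetic rows never leak into the validation fold and inflate the reported scores.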