ENHANCING BUG DETECTION IN SOFTWARE PROJECTS USING ML ENSEMBLE TECHNIQUES
DOI: https://doi.org/10.64751/ijdim.2025.v4.n3.pp67-75

Keywords: ML-based bug prediction, ensemble learning, bug classification, Eclipse-Mozilla dataset, TF-IDF, XGBoost, Random Forest

Abstract
Software maintenance in large open-source ecosystems hinges on rapid, accurate bug triage. The Eclipse project alone accumulates tens of thousands of issue reports annually, spanning multiple components and severity levels. Traditionally, human experts manually classify and route these reports, a process that is labour-intensive, inconsistent, and unscalable as project size grows. Prior attempts to automate classification using single machine-learning models—such as Support Vector Machines (SVM) or Logistic Regression—have yielded moderate accuracy (≈70–75%) but often require extensive feature engineering and fail to generalize across evolving bug corpora. This research addresses the need for a scalable, ensemble-driven framework capable of automating Eclipse bug classification with high fidelity. We propose an end-to-end system that ingests raw bug descriptions, applies rigorous data preprocessing (tokenization, stop-word removal, lemmatization), and converts textual data into numerical vectors via Term Frequency–Inverse Document Frequency (TF-IDF). Five classifiers—SVM, Random Forest Classifier (RFC), Logistic Regression Classifier (LRC), Extra-Trees Voting (EV) ensemble, and Extreme Gradient Boosting (XGBoost)—are trained and evaluated on a curated Eclipse–Mozilla dataset. The proposed system integrates a user-friendly GUI to guide non-expert users through data upload, preprocessing visualization, and model selection. Under a 70/30 train-test split, performance metrics reveal: SVM achieves 74.23% accuracy, 83.03% precision, 73.94% recall, and 75.24% F₁-score; RFC records 83.51% accuracy, 87.68% precision, 83.40% recall, and 84.09% F₁; LRC attains 70.10% accuracy, 75.06% precision, 70.19% recall, and 71.10% F₁; EV yields 89.69% accuracy, 91.10% precision, 90.29% recall, and 90.21% F₁; while XGBoost outperforms all with 92.27% accuracy, 92.91% precision, 92.65% recall, and 92.50% F₁-score.
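The pipeline the abstract describes—text preprocessing, TF-IDF vectorization, a 70/30 train-test split, and classifier training—can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation; the example bug reports and labels are invented stand-ins for the Eclipse–Mozilla dataset, and Random Forest is shown as one of the paper's five classifiers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy stand-in for the Eclipse-Mozilla bug corpus (texts and labels invented).
reports = [
    "NullPointerException when opening Java editor",
    "UI freezes on large project import",
    "Crash in JIT compiler during startup",
    "Button label misaligned in preferences dialog",
]
labels = ["crash", "ui", "crash", "ui"]

# TF-IDF converts each report into a sparse numeric vector;
# built-in lowercasing and English stop-word removal cover part
# of the preprocessing step described in the abstract.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(reports)

# 70/30 train-test split, matching the paper's evaluation setup.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42)

# Train one of the compared classifiers and score it on held-out data.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The same vectorized features can be fed to the other classifiers in the comparison (SVM, Logistic Regression, an Extra-Trees voting ensemble, or XGBoost) simply by swapping the estimator.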
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.