An Optimized Transformer Ensemble Model for Context-Aware Speech Act Classification

Authors

  • M. Ganesh
  • Mogulagani Ankitha
  • Komakula Sathwika
  • Kaniganti Chandu
  • Pogula Nagaraju

DOI:

https://doi.org/10.64751/ijdim.2026.v5.n2(1).700

Keywords:

SMOTE (Synthetic Minority Over-sampling Technique), Light Gradient Boosting Machine (LGBM), Categorical Boosting (CatBoost), Support Vector Machine (SVM), Boosting Fusion Model (BFM), Natural Language Processing (NLP).

Abstract

The rapid growth of digital communication has increased the need to understand user intent in textual data, driving the development of speech act classification systems. Traditional approaches relied on rule-based methods and basic machine learning techniques, which struggled to capture the complexity of natural language. The core problem is accurately classifying unstructured, context-dependent text into meaningful categories such as assertive, directive, expressive, and question. Conventional systems often depend on manual feature extraction techniques such as bag-of-words and TF-IDF, which fail to capture semantic relationships and contextual dependencies. These limitations result in lower accuracy, poor generalization, and difficulty in handling imbalanced datasets. Addressing these challenges requires a more robust and intelligent system that can effectively process textual data and improve classification performance. This research proposes a transformer-driven approach that uses XLNet for feature extraction, combined with multiple machine learning models including Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and a Boosting Fusion Model (BFM) integrating Light Gradient Boosting Machine (LGBM) and Categorical Boosting (CatBoost). The system incorporates Natural Language Processing (NLP) preprocessing techniques and Synthetic Minority Oversampling Technique (SMOTE) based data balancing to enhance learning efficiency. The proposed system demonstrates significant improvements in accuracy and reliability, with the BFM achieving an accuracy of 96.33%, outperforming LR, RF, and SVM.
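The pipeline described above (extract sentence features, balance the classes with SMOTE, then classify with a soft-voting boosting fusion) can be sketched as follows. This is a minimal illustration, not the authors' implementation: synthetic vectors from `make_classification` stand in for XLNet embeddings, a hand-rolled SMOTE-style interpolation stands in for the full SMOTE algorithm, and scikit-learn's `GradientBoostingClassifier` stands in for LGBM/CatBoost, whose exact fusion configuration the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data standing in for XLNet sentence embeddings.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def smote_like(X, y, k=5, seed=0):
    """SMOTE-style oversampling: for each needed synthetic sample,
    interpolate between a minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    maj = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        need = maj - n
        if need == 0:
            continue
        Xc = X[y == c]
        # Pairwise distances within the minority class (self excluded).
        d = np.linalg.norm(Xc[:, None] - Xc[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nn = np.argsort(d, axis=1)[:, :k]
        i = rng.integers(0, len(Xc), size=need)
        j = nn[i, rng.integers(0, nn.shape[1], size=need)]
        lam = rng.random((need, 1))
        Xs.append(Xc[i] + lam * (Xc[j] - Xc[i]))  # interpolated samples
        ys.append(np.full(need, c))
    return np.vstack(Xs), np.concatenate(ys)

# Soft-voting fusion of two boosted learners plus a linear baseline.
bfm = VotingClassifier(
    estimators=[("gb1", GradientBoostingClassifier(random_state=0)),
                ("gb2", GradientBoostingClassifier(learning_rate=0.05,
                                                   random_state=1)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft")

Xb, yb = smote_like(X_tr, y_tr)   # balance only the training split
bfm.fit(Xb, yb)
print(f"test accuracy: {accuracy_score(y_te, bfm.predict(X_te)):.3f}")
```

Note that oversampling is applied only to the training split; balancing before the split would leak synthetic copies of test-set neighbours into training and inflate the reported accuracy.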


Published

2026-04-10

How to Cite

M. Ganesh, Mogulagani Ankitha, Komakula Sathwika, Kaniganti Chandu, & Pogula Nagaraju. (2026). An Optimized Transformer Ensemble Model for Context-Aware Speech Act Classification. International Journal of Data Science and IoT Management System, 5(2(1)), 253-265. https://doi.org/10.64751/ijdim.2026.v5.n2(1).700
