Scalable Customer Support Ticket Analysis Using Hybrid NLP and Ensemble-Based Classification Models

Bulusu Rama; Ponthati Praveen Reddy; Bollapalli Sai Bharath; Varshith Kambhampati

doi:10.64751/ijdim.2026.v5.n2(1).811

Keywords:

Customer Support Analytics, Multi-Target Classification, Ticket Priority Prediction, Transformer Models, XLNet, Text Mining

Abstract

Customer support centres generate large volumes of tickets that record customer issues, priorities, and satisfaction levels. Predicting customer satisfaction, ticket priority, and resolution outcomes accurately and promptly from this unstructured text is crucial to improving service quality and operational efficiency. Traditionally, support ticket management has relied on manual assessment or on basic machine learning models built on shallow text features such as bag-of-words or Term Frequency–Inverse Document Frequency (TF-IDF), which capture little of a ticket's context and semantics. These approaches scale poorly, have limited accuracy, and cannot easily produce several predictions at once, which often leads to delayed or inconsistent customer service. Motivated by these limitations and by the increasing availability of large-scale support ticket data, this project proposes a hybrid machine learning framework that pairs transformer-based natural language processing with classical classifiers. The system preprocesses ticket text through lemmatization, stop-word removal, and part-of-speech tagging to clean and normalize the input. XLNet, a transformer-based autoregressive language model, then generates contextual embeddings that capture semantic nuances beyond traditional feature engineering. Several classical machine learning classifiers, namely Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), Histogram-based Gradient Boosting (HGB), a Stochastic Gradient Descent (SGD) classifier, and Nearest Centroid (NC), are trained on these embeddings to predict three targets: Ticket Priority, Customer Satisfaction Rating, and Resolution. By combining contextual text representations with a diverse set of machine learning models, the proposed multi-target classification system aims to improve prediction accuracy, interpretability, and robustness across varied ticket types.
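
The multi-target setup described in the abstract can be illustrated with a minimal sketch in which one classifier per target is trained on the same embedding vectors. This is not the authors' implementation. It assumes the XLNet embeddings have already been computed and replaces them with tiny hand-written 4-dimensional vectors. The class `NearestCentroid`, the helpers `fit_multi_target` and `predict_multi_target`, and all label values are hypothetical names and toy data chosen for illustration. Only the Nearest Centroid model is shown, written in plain Python so the example has no external dependencies.

```python
import math
from collections import defaultdict

# The three prediction targets named in the abstract.
TARGETS = ("priority", "satisfaction", "resolution")


class NearestCentroid:
    """Minimal nearest-centroid classifier using Euclidean distance."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for vec, label in zip(X, y):
            groups[label].append(vec)
        # Each class centroid is the per-dimension mean of its vectors.
        self.centroids_ = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()
        }
        return self

    def predict(self, X):
        # Assign each vector to the class whose centroid is closest.
        return [
            min(self.centroids_, key=lambda c: math.dist(vec, self.centroids_[c]))
            for vec in X
        ]


def fit_multi_target(X, labels_by_target):
    """Train one independent classifier per target on the same embeddings."""
    return {t: NearestCentroid().fit(X, labels_by_target[t]) for t in TARGETS}


def predict_multi_target(models, X):
    """Return a dict mapping each target to its list of predictions."""
    return {t: model.predict(X) for t, model in models.items()}


# Toy stand-ins for ticket embeddings (assumption: real inputs would be
# XLNet vectors with hundreds of dimensions, not hand-written values).
X_train = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
]
# Hypothetical label values; the source does not list the actual classes.
labels = {
    "priority": ["high", "high", "low", "low"],
    "satisfaction": ["low", "low", "high", "high"],
    "resolution": ["escalated", "escalated", "resolved", "resolved"],
}

models = fit_multi_target(X_train, labels)
preds = predict_multi_target(models, [[0.85, 0.15, 0.85, 0.15]])
print(preds)
```

Training a separate model per target keeps each prediction independent, so any of the other classifiers listed in the abstract could be substituted for a single target without touching the rest, provided it exposes the same `fit` and `predict` methods.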