Machine Learning-Based Real-Time SQL Injection Detection System Using TF-IDF and Naive Bayes

Authors

  • GUDE BALAMBIKA, V. SARALA

DOI:

https://doi.org/10.64751/

Abstract

SQL Injection (SQLi) remains one of the most critical and prevalent security
vulnerabilities in modern web applications. Attackers exploit improper input validation
mechanisms to manipulate backend databases, leading to unauthorized data access, data
leakage, and system compromise. Traditional rule-based detection systems often fail to
identify evolving attack patterns, making it necessary to adopt intelligent, adaptive
approaches. This paper presents a machine learning-based SQL Injection Detection
System that leverages Natural Language Processing (NLP) techniques and probabilistic
classification to identify malicious queries in real time. The proposed system utilizes
Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction,
transforming raw SQL queries into numerical vectors that capture the importance of
terms within the dataset. A Multinomial Naive Bayes classifier is then trained on labeled
datasets containing both legitimate and malicious SQL queries. The model learns
statistical patterns associated with SQL injection attempts, enabling it to classify unseen
queries effectively. A user-friendly graphical interface is developed using Tkinter,
allowing users to input SQL queries and receive instant feedback regarding their safety.
The system provides not only classification results (SAFE or ATTACK) but also
confidence scores, enhancing transparency and usability. Additionally, a query history
feature is implemented to track previous inputs, aiding in monitoring and analysis.
The proposed system demonstrates several advantages, including high efficiency, low
computational complexity, and ease of deployment. Unlike traditional signature-based
systems, it can generalize and detect previously unseen attack patterns. The use of
machine learning significantly improves detection accuracy and adaptability in dynamic
environments. Experimental results indicate that the model performs effectively on
real-world datasets, achieving reliable classification with minimal latency.
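
The pipeline described in the abstract (TF-IDF features, a Multinomial Naive Bayes classifier, and a SAFE/ATTACK label with a confidence score) can be illustrated with a minimal from-scratch sketch. It is not the authors' implementation: the abstract does not state which library, tokenizer, or smoothing settings were used, so the tokenizer, the smoothed IDF formula, the L2 normalization, the Laplace smoothing value, the class name TfidfNaiveBayes, and the eight toy training queries below are all assumptions made for illustration. The Tkinter interface and query history are omitted.

```python
import math
import re
from collections import Counter


def tokenize(query):
    # Assumed tokenizer: lowercase, then split into words, numbers,
    # and single punctuation characters (quotes, '=', '-', ';', ...).
    return re.findall(r"[a-z_]+|\d+|[^\sa-z_\d]", query.lower())


class TfidfNaiveBayes:
    """Illustrative TF-IDF + Multinomial Naive Bayes classifier."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, queries, labels):
        docs = [Counter(tokenize(q)) for q in queries]
        n = len(docs)
        # Document frequency: number of queries containing each term.
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF (one common variant): ln((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            members = [d for d, y in zip(docs, labels) if y == c]
            # Sum the TF-IDF weight of every term over the class's queries.
            weights = Counter()
            for doc in members:
                for term, w in self._tfidf(doc).items():
                    weights[term] += w
            total = sum(weights.values()) + self.alpha * len(self.idf)
            self.log_prior[c] = math.log(len(members) / n)
            self.log_likelihood[c] = {
                t: math.log((weights[t] + self.alpha) / total) for t in self.idf
            }
        return self

    def _tfidf(self, counts):
        # Terms never seen during training are dropped.
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def predict_proba(self, query):
        vec = self._tfidf(Counter(tokenize(query)))
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in vec.items())
            for c in self.classes
        }
        # Convert log scores to probabilities with a stable softmax.
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, query):
        proba = self.predict_proba(query)
        label = max(proba, key=proba.get)
        return label, proba[label]  # (label, confidence score)


# Hypothetical toy training set; a real system needs a much larger labeled corpus.
TRAIN = [
    ("SELECT name FROM users WHERE id = 5", "SAFE"),
    ("SELECT * FROM products WHERE price < 100", "SAFE"),
    ("UPDATE orders SET status = 'shipped' WHERE id = 12", "SAFE"),
    ("INSERT INTO logs (msg) VALUES ('login ok')", "SAFE"),
    ("SELECT * FROM users WHERE id = 1 OR 1=1 --", "ATTACK"),
    ("' OR '1'='1' --", "ATTACK"),
    ("1; DROP TABLE users --", "ATTACK"),
    ("admin' UNION SELECT password FROM users --", "ATTACK"),
]

model = TfidfNaiveBayes().fit([q for q, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    for q in ["' OR 1=1 --", "SELECT name FROM products WHERE id = 7"]:
        label, confidence = model.predict(q)
        print(f"{q!r}: {label} (confidence {confidence:.2f})")
```

The confidence score here is the normalized posterior probability of the predicted class, which is one plausible reading of the "confidence scores" mentioned in the abstract. With so few training queries the probabilities are only indicative, and a classifier of this kind complements parameterized queries and input validation rather than replacing them.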

Published

2026-04-05

How to Cite

GUDE BALAMBIKA, V. SARALA. (2026). Machine Learning-Based Real-Time SQL Injection Detection System Using TF-IDF and Naive Bayes. International Journal of Data Science and IoT Management System, 5(2), 766-778. https://doi.org/10.64751/