A Robust URL Classification Approach for Phishing Detection Using Machine Learning Techniques

LAKKAVARAPU RAJKUMAR, A.Durga Devi

doi:10.64751/

Keywords:

Phishing Detection, URL Classification, Machine Learning, Naive Bayes, Support Vector Machine, Logistic Regression, Cyber Security, Text Mining, Ensemble Learning, Django

Abstract

The rapid expansion of internet usage has significantly increased cyber threats, particularly phishing attacks that deceive users into revealing sensitive information. Detecting malicious URLs has therefore become a critical aspect of cybersecurity. This project presents a URL classification system that uses machine learning to label each URL as benign, phishing, defacement, or malware. The system is implemented with the Django web framework, which provides an interactive interface for both administrators and users.

The proposed system uses text-based feature extraction, specifically Count Vectorization, to transform URLs into numerical representations suitable for machine learning models. Several classification algorithms are trained and compared: Naive Bayes, Support Vector Machine (SVM), Logistic Regression, and a linear classifier trained with Stochastic Gradient Descent (SGD). In addition, an ensemble built with a Voting Classifier combines the individual models to improve prediction accuracy.

The dataset consists of labeled URLs in four classes: benign, phishing, defacement, and malware. The dataset is preprocessed and vectorized, and the models are trained and evaluated using a train-test split. Accuracy, the confusion matrix, and classification reports are used to evaluate the models.

The system also provides analytical features such as detection accuracy visualization, ratio analysis of URL types, and trending topics. These insights help administrators understand patterns and improve security strategies. Users can submit URLs and receive real-time predictions regarding their safety.

Experimental results demonstrate that ensemble learning improves classification performance compared to the individual models. The system achieves high accuracy and provides reliable predictions, making it suitable for real-world applications in cybersecurity.

In conclusion, the developed system effectively identifies malicious URLs and enhances online security. It can be extended by integrating deep learning techniques and real-time threat intelligence systems.
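
The abstract's core steps (count-based vectorization of URLs, a Naive Bayes classifier, and hard voting across models) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the tokenizer, the class and function names, the toy URLs on reserved example domains, and the votes passed to `majority_vote` are all assumptions made for illustration. In practice a library such as scikit-learn (`CountVectorizer`, `MultinomialNB`, `VotingClassifier`) would likely be used instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed analyzer)."""
    return re.findall(r"[a-z0-9]+", url.lower())


class CountNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, urls, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, url):
        total_docs = sum(self.class_counts.values())
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(url) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def majority_vote(predictions):
    """Hard voting: return the label chosen by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy training data (hypothetical URLs, not taken from the paper's dataset).
train = [
    ("http://example.com/docs/index.html", "benign"),
    ("http://example.org/news/article", "benign"),
    ("http://secure-login.example.net/verify/account", "phishing"),
    ("http://example.net/account/verify/password", "phishing"),
    ("http://example.com/index.php?option=com_content&id=12", "defacement"),
    ("http://example.org/index.php?option=com_user&view=1", "defacement"),
    ("http://example.net/download/setup.exe", "malware"),
    ("http://example.com/files/update.exe", "malware"),
]
urls, labels = zip(*train)
model = CountNaiveBayes().fit(urls, labels)

# Held-out toy URLs, standing in for the test half of a train-test split.
test_urls = [
    "http://example.org/verify/account/login",
    "http://example.com/files/setup.exe",
]
test_labels = ["phishing", "malware"]
predicted = [model.predict(u) for u in test_urls]
print(predicted, accuracy(test_labels, predicted))

# Combining one model's output with two hypothetical votes from other models.
print(majority_vote([predicted[0], "benign", "phishing"]))
```

With only eight training URLs the numbers here say nothing about real-world performance; the sketch only shows how token counts feed a classifier and how several models' labels can be combined by majority vote.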