A Comprehensive Study on Malicious URL Detection: Leveraging Large-Scale Web Data for Accurate and Scalable Threat Identification

Posina Anusha; L. Charitha

doi:10.62674/ijiee.2025.v3i01.004

Published: 2025-03-20

Assistant Professor, Department of AI & DS, Annamacharya Institute of Technology & Sciences, Tirupati.

anusha.ksrm@gmail.com

Keywords: Malicious URL Detection; Cybersecurity; Deep Learning; CNN; BiLSTM; XGBoost; Hybrid Model; Web Threat Detection

Abstract

The rapid growth of web-borne attacks such as phishing, malware distribution, and website compromise has made malicious URL detection increasingly challenging. Conventional techniques that rely on blacklists of known malicious patterns cope poorly with URLs that change dynamically. To address these limitations, this work proposes a hybrid approach to malicious URL detection that combines deep learning with ensemble learning: Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and XGBoost are used together to classify large-scale URL data. Experiments are carried out on a publicly available dataset of more than 650,000 URLs labelled as benign, phishing, defacement, or malware. The proposed model is compared against baseline techniques such as logistic regression, SVM, XGBoost, and CNN. Performance is evaluated using accuracy, precision, recall, F1 score, the ROC curve, and the confusion matrix. The experimental results show that the proposed model achieves an accuracy of 96%, higher than all of the baseline models, indicating that combining deep sequential features with gradient boosting yields better detection of malicious URLs.
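
As a rough illustration of the evaluation metrics named above, the following minimal sketch computes a confusion matrix, accuracy, and per-class precision, recall, and F1 score for the four URL classes. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, invented only to show how the metrics relate to one another, and the ROC curve is omitted because it requires predicted scores rather than hard labels.

```python
from collections import Counter

# The four classes named in the abstract.
CLASSES = ["benign", "phishing", "defacement", "malware"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, classes=CLASSES):
    """Return {class: (precision, recall, f1)} from a confusion matrix."""
    metrics = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[name] = (precision, recall, f1)
    return metrics


# Hypothetical toy labels, invented purely for illustration.
y_true = ["benign", "benign", "phishing", "phishing",
          "defacement", "malware", "malware", "benign"]
y_pred = ["benign", "phishing", "phishing", "phishing",
          "defacement", "malware", "benign", "benign"]

cm = confusion_matrix(y_true, y_pred)
scores = per_class_metrics(cm)
print("accuracy:", accuracy(cm))
for name, (p, r, f1) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting per-class values alongside overall accuracy matters for a multi-class problem like this one, since a single accuracy figure can hide weak recall on a minority class such as malware.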

How to Cite

Posina Anusha, & L. Charitha. (2025). A Comprehensive Study on Malicious URL Detection: Leveraging Large-Scale Web Data for Accurate and Scalable Threat Identification. International Journal of Interpreting Enigma Engineers (IJIEE), 3(1), 22–34. https://doi.org/10.62674/ijiee.2025.v3i01.004
