Abstract
The World Wide Web (Web) is a crucial part of the Internet, and Web attacks are becoming increasingly serious and complex. Malicious Web request detection aims to identify abnormal attacks on the network rapidly and accurately. Deep learning has been applied to malicious Web request detection with high detection performance. However, most deep learning-based methods are supervised and ignore special characters, which makes it hard for them to detect unknown malicious Web requests. In addition, labeled Web requests are scarce and Web request data is insufficient. We therefore propose an unsupervised malicious Web request detection method based on Transformer and contrastive learning (UTCDetector). UTCDetector uses preprocessing and 2-gram word segmentation to preserve special characters, extracts semantic features with a Transformer, and leverages a hypersphere loss function and contrastive learning to handle insufficient Web data without abnormal labels. Because the public Web request datasets (CSIC 2010, CSIC TORPEDA 2012, and ECML/PKDD 2007) were created in or before 2012, we collected Web requests from a university Web application server in 2023 to build a private dataset named School 2023, which contains more modern and complex attacks. Experimental results on the four datasets demonstrate that our method achieves a higher F1-score than existing methods and ablation variants.
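As a rough illustration of how 2-gram segmentation can keep special characters in the token stream, the sketch below splits a request string into overlapping character pairs. It is not the authors' implementation: the function name `bigram_tokens`, the URL-decoding and lowercasing steps, and the choice of character-level (rather than word-level) 2-grams are all assumptions made for this example.

```python
from urllib.parse import unquote_plus


def bigram_tokens(request: str) -> list[str]:
    """Split a request into overlapping character 2-grams.

    Hypothetical preprocessing (an assumption, not taken from the paper):
    URL-decode the request and lowercase it before segmentation.
    """
    text = unquote_plus(request).lower()
    if len(text) < 2:
        # Too short to form a pair; return the single character, if any.
        return [text] if text else []
    # Every adjacent pair of characters becomes one token, so quotes,
    # equals signs, dashes, and spaces all survive as part of some token.
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Example request carrying a SQL-injection style payload.
tokens = bigram_tokens("/search?q=%27+OR+1%3D1--")
```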
| Original language | English |
|---|---|
| Pages (from-to) | 3281-3294 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Network and Service Management |
| Volume | 22 |
| Issue number | 4 |
| Early online date | 21 Apr 2025 |
| Publication status | Published - Aug 2025 |
Funding
This work is supported in part by the National Natural Science Foundation of China under Grant 62272062, the Science and Technology Innovation Program of Hunan Province under Grant 2023RC3139, the Natural Science Foundation of Hunan Province under Grant 2025JJ50373, and the Scientific Research Fund of Hunan Provincial Transportation Department under Grant 202143.
| Funders | Funder number |
|---|---|
| National Natural Science Foundation of China | 62272062 |
| Science and Technology Innovation Program of Hunan Province | 2023RC3139 |
| Natural Science Foundation of Hunan Province | 2025JJ50373 |
| Scientific Research Fund of Hunan Provincial Transportation Department | 202143 |
Keywords
- malicious web request
- unsupervised
- transformer
- contrastive learning
- special characters