An Unsupervised Malicious Web Request Detection based on Transformer and Contrastive Learning

  • Shiming He
  • Ying Zhang* (Corresponding Author)
  • Diqing Liang
  • Pradip Sharma

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

The World Wide Web (Web) is a crucial part of the Internet, and Web attacks are becoming increasingly serious and complex. Malicious Web request detection aims to rapidly and accurately identify abnormal attacks on the network. Deep learning has been applied to malicious Web request detection with high detection performance. However, most deep learning-based methods are supervised and ignore special characters, which makes it hard for them to detect unknown malicious Web requests. Moreover, labeled Web requests are scarce and Web request data is insufficient. Therefore, we propose an unsupervised malicious Web request detection method based on Transformer and contrastive learning (UTCDetector). UTCDetector uses preprocessing and 2-gram word segmentation to preserve special characters, extracts semantic features with a Transformer, and leverages a hypersphere loss function and contrastive learning to handle insufficient Web data without abnormal labels. Since the public Web request datasets (CSIC 2010, CSIC TORPEDA 2012, and ECML/PKDD 2007) were created before 2012, we collected Web requests from a university Web application server in 2023 to build a private dataset named School 2023. This dataset contains more modern and complex attacks. The experimental results on the four datasets demonstrate that our method achieves a higher F1-score than other existing methods and ablation variants.
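
The 2-gram segmentation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the preprocessing or whether the 2-grams are formed over characters or words, so the URL decoding, the lowercasing, the character-level windows, and the function name `two_gram_tokens` are all assumptions made for illustration. The point it shows is that overlapping 2-grams keep special characters such as `<`, `>`, `=`, and `?` as part of the tokens instead of discarding them.

```python
from urllib.parse import unquote_plus


def two_gram_tokens(request: str) -> list[str]:
    """Split a Web request into overlapping character 2-grams.

    Hypothetical helper: the preprocessing (URL decoding, lowercasing)
    is an assumption, not the procedure described in the paper.
    """
    # Decode percent-encoded characters so that "%3C" becomes "<".
    text = unquote_plus(request).lower()
    if len(text) < 2:
        # Too short for a 2-gram: return the text itself, or nothing.
        return [text] if text else []
    # Slide a window of width 2 over the text; special characters are
    # kept because no character is filtered out.
    return [text[i:i + 2] for i in range(len(text) - 1)]


tokens = two_gram_tokens("/search?q=%3Cscript%3E")
print(tokens)
```

In this sketch, tokens such as `=<` and `t>` survive segmentation, which is the kind of signal a tokenizer that strips punctuation would lose.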
Original language: English
Pages (from-to): 3281-3294
Number of pages: 14
Journal: IEEE Transactions on Network and Service Management
Volume: 22
Issue number: 4
Early online date: 21 Apr 2025
Publication status: Published - Aug 2025

Funding

This work is supported in part by the National Natural Science Foundation of China under Grant 62272062, the Science and Technology Innovation Program of Hunan Province under Grant 2023RC3139, the Natural Science Foundation of Hunan Province under Grant 2025JJ50373, and the Scientific Research Fund of Hunan Provincial Transportation Department under Grant 202143.

Funder: Funder number
National Natural Science Foundation of China: 62272062
Science and Technology Innovation Program of Hunan Province: 2023RC3139
Natural Science Foundation of Hunan Province: 2025JJ50373
Scientific Research Fund of Hunan Provincial Transportation Department: 202143

    Keywords

    • malicious web request
    • unsupervised
    • transformer
    • contrastive learning
    • special characters
