TY - GEN
T1 - Neural Networks Remember More
T2 - 21st International Conference on Intelligent Computing, ICIC 2025
AU - Zeng, Biqing
AU - Li, Zehan
AU - Ayesh, Aladdin
PY - 2025/7/25
Y1 - 2025/7/25
N2 - Pre-trained language models (PLMs) suffer from catastrophic forgetting in continual learning, as sequential task training overwrites previously learned representations. The model's ability to retain performance on old tasks is referred to as stability, while its adaptability to new tasks is called plasticity; the key to addressing this challenge therefore lies in balancing plasticity with stability. In this paper, we propose a novel method to achieve such a balance between model stability and plasticity, thereby mitigating catastrophic forgetting. More specifically, our approach leverages a parameter isolation and subsequent combination strategy. First, in the training stage, the model adapts to each downstream task via a parameter isolation method to prevent potential interference among different tasks. We then combine all trained parameters, which contain the acquired knowledge, via a model merging method and finally apply them to the backbone model. Empirical evaluations on continual language learning benchmarks substantiate the effectiveness of our approach, revealing a marked improvement over existing state-of-the-art approaches.
AB - Pre-trained language models (PLMs) suffer from catastrophic forgetting in continual learning, as sequential task training overwrites previously learned representations. The model's ability to retain performance on old tasks is referred to as stability, while its adaptability to new tasks is called plasticity; the key to addressing this challenge therefore lies in balancing plasticity with stability. In this paper, we propose a novel method to achieve such a balance between model stability and plasticity, thereby mitigating catastrophic forgetting. More specifically, our approach leverages a parameter isolation and subsequent combination strategy. First, in the training stage, the model adapts to each downstream task via a parameter isolation method to prevent potential interference among different tasks. We then combine all trained parameters, which contain the acquired knowledge, via a model merging method and finally apply them to the backbone model. Empirical evaluations on continual language learning benchmarks substantiate the effectiveness of our approach, revealing a marked improvement over existing state-of-the-art approaches.
KW - Catastrophic Forgetting
KW - Continual Learning
KW - Model Merging
KW - Parameter-Efficient Fine-Tuning
UR - http://www.scopus.com/inward/record.url?scp=105013048407&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-9911-7_7
DO - 10.1007/978-981-96-9911-7_7
M3 - Published conference contribution
AN - SCOPUS:105013048407
SN - 9789819699100
T3 - Communications in Computer and Information Science
SP - 83
EP - 93
BT - Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
A2 - Huang, De-Shuang
A2 - Zhang, Chuanlei
A2 - Zhang, Qinhu
A2 - Pan, Yijie
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 July 2025 through 29 July 2025
ER -