On the Low-density Latent Regions of VAE-based Language Models

Ruizhe Li* (Corresponding Author), Xutan Peng* (Corresponding Author), Chenghua Lin, Wenge Rong, Zhigang Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Published conference contribution

Abstract

By representing semantics in latent spaces, variational autoencoders (VAEs) have proven powerful in modelling and generating signals such as images and text, even without supervision. However, previous studies suggest that a learned latent space contains some low-density regions (a.k.a. holes), which could harm overall system performance. While existing studies focus on empirically mitigating these latent holes, how they are distributed and how they affect different components of a VAE remain unexplored. In addition, the hole issue in VAEs for language processing is rarely addressed. In our work, by introducing a simple hole-detection algorithm based on the neighbour consistency between a VAE’s input, latent, and output semantic spaces, we propose to examine these topics in depth for the first time. Comprehensive experiments, including automatic and human evaluation, imply that large-scale low-density latent holes may not exist in the latent space. In addition, various sentence encoding strategies are explored, and the native word embedding is the most suitable strategy for VAEs in the language modelling task.
Original language: English
Title of host publication: Proceedings of Machine Learning Research
Subtitle of host publication: NeurIPS 2020 Preregistration Workshop
Pages: 343-357
Number of pages: 15
Volume: 148
Publication status: Published - 8 Jul 2021
Externally published: Yes
Event: NeurIPS 2020 Workshop on Pre-registration in Machine Learning - Virtual event
Duration: 11 Dec 2020 to 11 Dec 2020
https://preregister.science/neurips2020.html

Publication series

Name: Proceedings of Machine Learning Research
Publisher: MLResearchPress
ISSN (Electronic): 2640-3498

Workshop

Workshop: NeurIPS 2020 Workshop on Pre-registration in Machine Learning
Period: 11/12/20 to 11/12/20

Keywords

  • variational autoencoder
  • low-density regions
  • latent holes
