Jun 22 2021
Plug-and-play language models (PPLMs) enable topic-conditioned natural
language generation by pairing large pre-trained generators with attribute
models used to steer the predicted token distribution towards the selected
topic. Despite their computational efficiency, PPLMs require large amounts of
labeled texts to effectively balance generation fluency and proper
conditioning, making them unsuitable for low-resource settings. We present
ETC-NLG, an approach leveraging topic modeling annotations to enable
fully-unsupervised End-to-end Topic-Conditioned Natural Language Generation
over emergent topics in unlabeled document collections. We first test the
effectiveness of our approach in a low-resource setting for Italian, evaluating
the conditioning for both topic models and gold annotations. We then perform a
comparative evaluation of ETC-NLG for Italian and English using a parallel
corpus. Finally, we propose an automatic approach to estimate the effectiveness
of conditioning on the generated utterances.