May 17 2023
Zero pronouns (ZPs) are frequently omitted in pro-drop languages (e.g.
Chinese, Hungarian, and Hindi), but must be explicitly recovered in
non-pro-drop languages (e.g. English). This phenomenon has been studied
extensively in machine translation (MT), as it poses a significant challenge
for MT systems due to the difficulty of determining the correct antecedent for
the omitted pronoun.
This survey paper highlights the major works that have been undertaken in zero
pronoun translation (ZPT) after the neural revolution, so that researchers can
recognise the current state and future directions of this field. We organise
the literature along four dimensions: evolution, datasets, methods, and
evaluation. In addition, we compare and analyse competing models and
evaluation metrics on different benchmarks. We uncover a number of insightful
findings,
such as: 1) ZPT is in line with the development trend of large language
models; 2) data limitations cause learning biases across languages and
domains; 3) performance improvements are often reported on single benchmarks,
but advanced methods are still far from real-world use; 4) general-purpose
metrics are not reliable for the nuances and complexities of ZPT,
emphasising the necessity of targeted metrics; 5) beyond commonly-cited
errors, ZPs can introduce risks of gender bias.