Analysis and Prediction of Unalignable Words in Parallel Text

Published in EACL, 2014

Frances Yung, Kevin Duh, Yuji Matsumoto


Professional human translators usually do not employ the concept of word alignments, producing translations ‘sense-for-sense’ instead of ‘word-for-word’. This suggests that unalignable words may be prevalent in the parallel text used for machine translation (MT). We analyze this phenomenon in depth for Chinese-English translation. We further propose a simple and effective method to improve automatic word alignment by pre-removing unalignable words, and show improvements on hierarchical MT systems in both translation directions.
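
The idea of pre-removing unalignable words before alignment can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how unalignable words are predicted, so the sketch simply assumes a given set of candidate words. The function names `remove_unalignable` and `restore_links`, the word sets, and the toy sentence pair are all hypothetical and chosen only for illustration. The sketch filters each side of a sentence pair, records the original token positions, and maps alignment links computed on the filtered text back to the original indices.

```python
from typing import List, Set, Tuple


def remove_unalignable(tokens: List[str], unalignable: Set[str]) -> Tuple[List[str], List[int]]:
    """Drop tokens found in `unalignable` (an assumed, externally supplied set).

    Returns the kept tokens and, for each kept token, its index in the
    original sentence so that alignment links can be mapped back later.
    """
    kept: List[str] = []
    positions: List[int] = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in unalignable:
            kept.append(tok)
            positions.append(i)
    return kept, positions


def restore_links(
    links: List[Tuple[int, int]],
    src_positions: List[int],
    tgt_positions: List[int],
) -> List[Tuple[int, int]]:
    """Map (source, target) links over filtered tokens back to original indices."""
    return [(src_positions[s], tgt_positions[t]) for s, t in links]


# Toy sentence pair and word sets: illustrative only, not data from the paper.
zh = ["他", "去", "了", "商店"]
en = ["He", "went", "to", "the", "store"]
zh_kept, zh_pos = remove_unalignable(zh, {"了"})
en_kept, en_pos = remove_unalignable(en, {"to", "the"})

# Links an aligner might produce on the filtered pair (assumed here, not computed).
filtered_links = [(0, 0), (1, 1), (2, 2)]
original_links = restore_links(filtered_links, zh_pos, en_pos)
```

In this sketch the removed words simply end up with no alignment links in the original sentence pair; any word aligner could be run on the filtered text in place of the hard-coded links.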