Conference Proceedings

Accurate Evaluation of Segment-level Machine Translation Metrics

Y. Graham, N. Mathur, T. Baldwin

The Association for Computational Linguistics | Published: 2015

Abstract

Evaluation of segment-level machine translation metrics is currently hampered by: (1) low inter-annotator agreement levels in human assessments; (2) lack of an effective mechanism for evaluation of translations of equal quality; and (3) lack of methods of significance testing improvements over a baseline. In this paper, we provide solutions to each of these challenges and outline a new human evaluation methodology aimed specifically at assessment of segment-level metrics. We replicate the human evaluation component of WMT-13 and reveal that the current state-of-the-art performance of segment-level metrics is better than previously believed. Three segment-level metrics - METEOR, NLEPOR and SEN...
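To make the third challenge concrete, the sketch below illustrates one common way of significance-testing whether one segment-level metric correlates better with human assessment than a baseline metric: paired bootstrap resampling over segments. This is an illustrative example only, not the procedure published in the paper; the arrays `human`, `metric_a`, and `metric_b` are hypothetical per-segment scores.

```python
# Minimal sketch: bootstrap significance test of the difference in correlation
# with human judgments between two segment-level MT metrics.
# All inputs are hypothetical placeholders, not data from the paper.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)


def corr_difference(human, metric_a, metric_b):
    """Pearson correlation with human scores: metric_a minus metric_b."""
    return pearsonr(metric_a, human)[0] - pearsonr(metric_b, human)[0]


def bootstrap_p_value(human, metric_a, metric_b, n_resamples=10_000):
    """One-sided bootstrap test that metric_a correlates better than metric_b."""
    human, metric_a, metric_b = map(np.asarray, (human, metric_a, metric_b))
    n = len(human)
    observed = corr_difference(human, metric_a, metric_b)
    not_better = 0
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample segments with replacement
        diff = corr_difference(human[idx], metric_a[idx], metric_b[idx])
        if diff <= 0:  # resample does not favour metric_a
            not_better += 1
    return observed, not_better / n_resamples


# Hypothetical usage with toy per-segment scores:
# human    = [72, 55, 90, 40, 65]      # human adequacy assessments
# metric_a = [0.41, 0.30, 0.55, 0.22, 0.38]
# metric_b = [0.39, 0.33, 0.50, 0.25, 0.35]
# diff, p = bootstrap_p_value(human, metric_a, metric_b)
```

The same resampling scheme works with other correlation statistics (e.g. Kendall's tau) by swapping the function used inside `corr_difference`.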

