OpenMT Evaluation Demo
Linguistic Measures for Heterogeneous Automatic MT Evaluation


Translation quality is heterogeneous: it generally involves many different linguistic dimensions. However, most automatic evaluation methods in use today rely on partial quality assumptions, such as lexical similarity. This introduces a bias in the development cycle, which in some cases has been reported to carry very negative consequences. To tackle this methodological problem, we explore a novel path towards heterogeneous automatic MT evaluation. We have compiled a rich set of specialized similarity metrics operating at different linguistic levels (lexical, syntactic and semantic). We have also studied how the scores conferred by different metrics may be integrated into a single measure of quality, without having to adjust their relative importance.

This demo lets you obtain automatic evaluation scores for a selected set of representative metrics, together with the ULC combined score (i.e., the arithmetic mean) over a heuristically defined set of metrics.
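As a rough illustration of the combined score described above, the following Python sketch computes an unweighted arithmetic mean over a chosen set of metric scores. The function name `ulc`, the metric names, and the score values are all hypothetical and chosen only for illustration. The sketch also assumes that every metric score is already on a comparable scale. It is not the demo's actual implementation.

```python
from statistics import mean


def ulc(scores, metric_set):
    """Return the arithmetic mean of the scores for the metrics in metric_set.

    scores:     dict mapping a metric name to its score (hypothetical names)
    metric_set: names of the metrics to combine; all receive equal weight
    """
    selected = [scores[name] for name in metric_set]
    if not selected:
        raise ValueError("metric_set must not be empty")
    return mean(selected)


# Made-up scores for one candidate translation, one per linguistic level.
scores = {"lexical": 0.50, "syntactic": 0.25, "semantic": 0.75}

combined = ulc(scores, ["lexical", "syntactic", "semantic"])
print(combined)
```

Because every metric receives the same weight, no relative importance has to be tuned. The only choice left is which metrics belong to the set.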


Instructions:

Target Language          


Candidate Translation(s)
Output #1
Output #2
Output #3
Output #4
Output #5

Reference Translation(s)
Ref. #1
Ref. #2
Ref. #3
Ref. #4
Ref. #5
