New study into Automated Language Translation highlights need for shared, clean and normalized data. In cooperation with the Translation Automation User Society (TAUS), Asia Online conducted an experiment to determine the optimum approaches for building statistical machine translation engines. The findings indicate that significant improvements in machine translation quality can be achieved with smaller pools of shared, clean data.
In early 2009, Asia Online conducted an extensive experiment to determine the optimum way to build a statistical machine translation (SMT) engine. Development of SMT engines has been hindered by uncertainty about how different types and quantities of training data affect the translation quality of the final engine.
For the experiment, three TAUS member companies in the same industry domain, each a multinational software organization, provided sets of training data. Asia Online analyzed the data extensively and built a total of 29 separate SMT engines by combining the data from the three companies in various configurations. It then compared the output quality of all 29 engines using the BLEU and F-Measure metrics.
The full report can be downloaded by filling in the registration form below.