Deprecation notice for get_ende_bleu.sh #1827
This script is harmful because it propagates a non-standard way of computing BLEU that does not reflect the WMT 2014 task. Entirely too many papers are submitted with BLEU scores computed in undocumented ways. It's not even reasonable to let people run this script to compare against prior work, because most prior work does not document which script it used, and several such scripts are in circulation. https://www.aclweb.org/anthology/W18-6319/
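To make the point about non-standard BLEU concrete, here is a minimal sketch of plain corpus-level BLEU (single reference, uniform weights over 1- to 4-grams, brevity penalty, no smoothing). It is not SacreBLEU, multi-bleu.perl, or the logic of get_ende_bleu.sh; the two tokenizers below are hypothetical stand-ins for different preprocessing choices. Scoring the same output under both shows how far an undocumented tokenization step alone can move the number.

```python
import math
import re
from collections import Counter
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], tokenize: Tokenizer, max_n: int = 4) -> float:
    """Plain corpus-level BLEU: one reference per segment, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = tokenize(hyp), tokenize(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # log(0) is undefined, so unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


def whitespace_tokenize(text: str) -> List[str]:
    return text.split()


def hyphen_split_tokenize(text: str) -> List[str]:
    # Hypothetical extra preprocessing step: "state-of-the-art" -> "state - of - the - art".
    return re.sub(r"(?<=\S)-(?=\S)", " - ", text).split()


hyps = ["the new state-of-the-art model is very fast on long inputs"]
refs = ["the new state-of-the-art system is very fast on long inputs"]
print(f"{corpus_bleu(hyps, refs, whitespace_tokenize):.1f}")    # 65.8
print(f"{corpus_bleu(hyps, refs, hyphen_split_tokenize):.1f}")  # 81.5
```

The translation is identical in both runs; only the preprocessing differs. That is why a BLEU score is only comparable to another when the exact scoring setup is documented.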
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with …
What to do if you already signed the CLA: Individual signers; Corporate signers.
ℹ️ Googlers: Go here for more info.
Yes, I fully agree. It is even worse because using
Now, what is an even better practice in MT evaluation than using SacreBLEU (or rather, in addition to using SacreBLEU)? Publishing your translations of your dev and especially test sets (which should be standard sets for the given language pair, e.g. the WMT newstest sets). However,
OCELoT will accept arbitrary WMT submissions soon.