Deprecation notice for get_ende_bleu.sh by kpu · Pull Request #1827 · tensorflow/tensor2tensor

kpu · 2020-07-07T09:49:36Z

This script is harmful because it propagates a non-standard way to compute BLEU that is not reflective of the WMT 2014 task. Entirely too many papers are submitted with BLEU scores computed in undocumented ways. It's not even reasonable to allow people to run this script to compare against prior work, because most prior work does not document which script it used. And there are multiple such scripts floating around.

https://www.aclweb.org/anthology/W18-6319/
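To illustrate why the choice of tokenization matters, here is a minimal sketch of clipped n-gram precision, the building block of BLEU. It is not the code of get_ende_bleu.sh or SacreBLEU. The sentences and the `split_punct` tokenizer are hypothetical and exist only for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp = ngrams(hyp_tokens, n)
    ref = ngrams(ref_tokens, n)
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    total = sum(hyp.values())
    return matched / total if total else 0.0


def split_punct(text):
    """Hypothetical tokenizer (an assumption): split commas and periods off words."""
    return text.replace(",", " ,").replace(".", " .").split()


# Made-up example sentences, not taken from any WMT test set.
hyp = "Das ist gut, sehr gut."
ref = "Das ist gut, wirklich gut."

for n in (1, 2):
    p_space = clipped_precision(hyp.split(), ref.split(), n)
    p_punct = clipped_precision(split_punct(hyp), split_punct(ref), n)
    print(f"{n}-gram precision: whitespace={p_space:.3f} punct-split={p_punct:.3f}")
```

The same hypothesis and reference give a unigram precision of 4/5 with whitespace tokenization and 6/7 once punctuation is split off, so a score is only comparable to another if both used the same tokenization.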

googlebot · 2020-07-07T09:49:41Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

martinpopel · 2020-07-07T13:04:18Z

Yes, I fully agree.

It is even worse, because using get_ende_bleu.sh is not enough to replicate the "Attention Is All You Need" scores. I found this out in April 2018 with the great and much appreciated help of @lukaszkaiser (as documented on Gitter at https://gitter.im/tensor2tensor/Lobby?at=5acfe16c7c3a01610dd81b46 and in several previous posts of mine):

wmt13  wmt14  evaluation
26.25  27.52  sacrebleu
26.59  28.33  sacrebleu -tok intl
27.10  28.85  sacrebleu -tok intl -lc
?????  29.02  get_ende_bleu.sh (with google-colab reference newstest2014.tok.de)
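The gaps in the wmt14 column can be tallied with a few lines of arithmetic. This is only a sketch: the scores are copied from the rows above, and the dictionary keys are shorthand labels for those rows, not exact command lines.

```python
# wmt14 BLEU scores copied from the rows above; keys are shorthand labels.
wmt14 = {
    "sacrebleu": 27.52,
    "sacrebleu -tok intl": 28.33,
    "sacrebleu -tok intl -lc": 28.85,
    "get_ende_bleu.sh": 29.02,
}

baseline = wmt14["sacrebleu"]

# Gain of each evaluation variant over plain sacrebleu, rounded to 2 decimals.
gaps = {name: round(score - baseline, 2) for name, score in wmt14.items()}

for name, gap in gaps.items():
    print(f"{name:24s} +{gap:.2f}")
```

International tokenization, lowercasing and the get_ende_bleu.sh pipeline each add to the score of the same translations, up to 1.50 BLEU in total.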

If anyone is trying to reproduce the Attention Is All You Need paper BLEU scores, note that you must manually tweak the newstest2014.de reference file (in addition to all the hacks in get_ende_bleu.sh): convert all unicode quotes, including „, to ", which tokenizer.perl will then convert to &quot; (but make sure not to run tokenizer.perl twice, to prevent double escaping). No script in mosesdecoder does this tweak: replace-unicode-punctuation.perl ignores lower quotes, and normalize-punctuation.perl -l de changes the order of comma-quote to quote-comma, which is not what we want. As you can see above, the difference between the official BLEU (sacreBLEU) and the Google-tweaked BLEU can be as large as 1.5 BLEU (27.52 vs. 29.02 on wmt14).
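The reference tweak and the double-escaping pitfall can be sketched as follows. This is an illustration under assumptions, not the script used for the paper. The set of quotes in `QUOTE_MAP` is a guess (the comment names only „ explicitly), and `escape_like_tokenizer` is a hypothetical stand-in that mimics only the & and " escaping of tokenizer.perl, without any actual tokenization.

```python
# Assumption: which Unicode quotes to map is a guess; only „ is named above.
QUOTE_MAP = str.maketrans({"„": '"', "“": '"', "”": '"'})


def normalize_quotes(line):
    """Replace the selected Unicode double quotes with the ASCII quote."""
    return line.translate(QUOTE_MAP)


def escape_like_tokenizer(line):
    """Hypothetical stand-in for the entity escaping in tokenizer.perl.

    Only & and " are handled here, and no tokenization is performed.
    """
    return line.replace("&", "&amp;").replace('"', "&quot;")


# Made-up reference line for illustration.
ref = "Er sagte: „Guten Tag“."

once = escape_like_tokenizer(normalize_quotes(ref))
twice = escape_like_tokenizer(once)  # the mistake: escaping a second time

print(once)
print(twice)
```

Running the escaping step a second time turns each &quot; into &amp;quot;, which is the double escaping to avoid.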

Now, what is an even better practice in MT evaluation than using SacreBLEU (well, I mean in addition to using SacreBLEU)? To publish your translations of your dev and especially test sets (which should be some standard sets for a given language pair, e.g. WMT newstests). However,

http://matrix.statmt.org is now deprecated (and probably was not meant for accepting submissions several years after the deadline).
https://ocelot.mteval.org (by @cfedermann et al.) does not accept submissions for arbitrary WMT newstests yet.
https://paperswithcode.com/task/machine-translation seems to only copy the BLEU reported in the paper, without recomputing it or storing the translations.
So my recommendation is to upload the translations somewhere else (e.g. to the GitHub repo of your paper).
This last paragraph is not really related to this PR, but I consider it quite important.

cfedermann · 2020-07-07T21:19:38Z

OCELoT will accept arbitrary WMT submissions soon.

googlebot added the "cla: no" label (PR author has not signed CLA) on Jul 7, 2020

kpu wants to merge 1 commit into tensorflow:master from kpu:patch-1

4 participants
