browsermt/marian-dev regression-test-failures #17

Open
1 of 6 tasks
jerinphilip opened this issue Feb 3, 2021 · 14 comments

Comments

@jerinphilip

jerinphilip commented Feb 3, 2021

Status

  • tests/scorer/scores/test_scores_cpu.sh
  • tests/decoder/intgemm/test_intgemm_16bit.sh
  • tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
  • tests/decoder/intgemm/test_intgemm_8bit.sh
  • tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
  • tests/models/wnmt18/test_student_small_aan_intgemm16.sh
Logs

http://vali.inf.ed.ac.uk/jenkins/job/browsermt-marian-regression-tests/7/console

Failed:
  - tests/scorer/scores/test_scores_cpu.sh
  - tests/decoder/intgemm/test_intgemm_16bit.sh
  - tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
  - tests/decoder/intgemm/test_intgemm_8bit.sh
  - tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
  - tests/models/wnmt18/test_student_small_aan_intgemm16.sh
Logs:
  - /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/scorer/scores/test_scores_cpu.sh.log
  - /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit.sh.log
  - /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit_sse2.sh.log
  - /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit.sh.log
  - /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh.log
  - /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/test_student_small_aan_intgemm16.sh.log

I will update this issue as I figure out what exactly is failing.

Available Machines, vector instructions

ansible -m shell -a "grep -o -e 'avx[^ ]*' -e 'sse[^ ]*' -e ssse3 /proc/cpuinfo | sort | uniq | tr '\n' ' '" gpu --limit '!fulla'
dagr | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
elli | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
baldur | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
bil | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3 
buri | CHANGED | rc=0 >>
sse sse2 sse4_1 sse4_2 ssse3 
hodor | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
frigg | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
hretha | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
gna | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3 
lofn | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3 
mani | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3 
mimir | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
meili | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
rindr | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
sigyn | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3 
startiger | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
vor | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3 
snotra | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3 
thrud | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3 
zisa | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3 
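To see at a glance which hosts can produce references for which instruction set, the listing above can be grouped by the most capable flag each host reports. This is a minimal sketch, not part of the test suite: the function names and the `LEVELS` ordering are assumptions made for illustration, and `SAMPLE` copies only four hosts from the output above.

```python
import re
from collections import defaultdict

# Ordered from most to least capable; names match the /proc/cpuinfo flags above.
LEVELS = ["avx512f", "avx2", "avx", "ssse3", "sse2"]

def parse_ansible_output(text):
    """Map each host to the set of flags printed on the line after its header."""
    hosts = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(\S+) \| CHANGED \| rc=0 >>$", line.strip())
        if header:
            current = header.group(1)
            hosts[current] = set()
        elif current is not None and line.strip():
            hosts[current].update(line.split())
    return hosts

def group_by_best_level(hosts):
    """Group host names under the first entry of LEVELS that they support."""
    groups = defaultdict(list)
    for host, flags in sorted(hosts.items()):
        best = next((lvl for lvl in LEVELS if lvl in flags), "none")
        groups[best].append(host)
    return dict(groups)

SAMPLE = """\
bil | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3
buri | CHANGED | rc=0 >>
sse sse2 sse4_1 sse4_2 ssse3
lofn | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3
dagr | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
"""

if __name__ == "__main__":
    print(group_by_best_level(parse_ansible_output(SAMPLE)))
```

Feeding the full ansible output through `parse_ansible_output` should give one entry per host; hosts such as buri end up in the SSE-only group.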

@jerinphilip
Author

jerinphilip commented Feb 3, 2021


[2021-02-02 11:37:57] Error: Required option 'use-legacy-batching' has not been set
[2021-02-02 11:37:57] Error: Aborted from T marian::Options::get(const char*) const [with T = bool] in /var/lib/jenkins/workspace/browsermt-marian-dev-cuda-10.2/src/common/options.h:134

[CALL STACK]
[0x6ffd1e]          bool marian::Options::  get  <bool>(char const*) const + 0x26e
[0xa0c380]          marian::cpu::Backend::  configureDevice  (std::shared_ptr<marian::Options const>) + 0xa0
[0x7094f0]          marian::Rescore<marian::Rescorer>::  Rescore  (std::shared_ptr<marian::Options>) + 0x740
[0x70b0c9]          std::shared_ptr<marian::Rescore<marian::Rescorer>> marian::  New  <marian::Rescore<marian::Rescorer>,std::shared_ptr<marian::Options>&>(std::shared_ptr<marian::Options>&) + 0x59
[0x67c602]          main                                               + 0x52
[0x7fd40cbb1840]    __libc_start_main                                  + 0xf0
[0x6aea29]          _start                                             + 0x29

test_scores_cpu.sh: line 18: 17405 Aborted                 (core dumped) $MRT_MARIAN/marian-scorer -c $MRT_MODELS/wmt16_systems/marian.en-de.scorer.yml --cpu-threads 2 -t $(pwd)/scores_cpu.src.in $(pwd)/scores_cpu.trg.in > scores_cpu.out

  1. tests/scorer/scores/test_scores_cpu.sh.log

@XapaJIaMnu (on Slack): It used to be the case that there were two ways to do CBLAS_SGEMM with MKL for the attention layer: through a call to CBLAS_SGEMM_BATCHED, or through a for loop with multiple CBLAS_SGEMM calls. Now, since this project will use DNNL, the only available code path is the multiple CBLAS_SGEMM calls.
During one of the merges with master, this option got added and removed by upstream, so I assume that's where it got messed up.
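The abort in the log is what a strict option lookup does when code still reads a key that the option parser no longer registers. The following is a simplified Python model of that behaviour, not Marian's `Options` class: the class, the exception name, and the fallback variant of `get` are all assumptions made for illustration, and the fallback value used here is arbitrary.

```python
class MissingOptionError(KeyError):
    """Raised when a required option was never registered or set."""

class Options:
    """Toy option store (hypothetical; only mimics the error seen in the log)."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, key, *default):
        """Return the option value; fall back to `default` only if one is given."""
        if key in self._values:
            return self._values[key]
        if default:
            return default[0]
        raise MissingOptionError(f"Required option '{key}' has not been set")

if __name__ == "__main__":
    opts = Options({"cpu-threads": 2})
    # A caller that supplies a fallback keeps working when the key is missing.
    print(opts.get("use-legacy-batching", True))
    try:
        # A strict lookup fails, which is the situation in the log above.
        opts.get("use-legacy-batching")
    except MissingOptionError as err:
        print(err.args[0])
```

Under this model there are two ways out: register the option again, or stop reading it in the backend.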

@jerinphilip
Author

  1. /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit.sh.log
+ python3 /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/sacrebleu/sacrebleu.py newstest2018.ref
+ tee intgemm_16bit.out.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 30.5 66.3/38.6/24.4/15.8 (BP = 0.968 ratio = 0.968 hyp_len = 2748 ref_len = 2838)
+ cat intgemm_16bit.avx.expected.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 30.5 66.3/38.6/24.5/15.8 (BP = 0.967 ratio = 0.967 hyp_len = 2745 ref_len = 2838)
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh intgemm_16bit.out intgemm_16bit.avx.expected
Command: /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_16bit.out /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_16bit.avx.expected
14c14
< Ago Leis, head of the Central Criminal Police Service, said the arrests were preceded by a probe into a year-and-a-half year-and-a-half investigation.
---
> Ago Leis, head of the Central Criminal Police Service, said the arrests were preceded by a year-and-a-half probe.
28c28
< For example, the latest court rulings, eight defendants separated from the so-called Dikayev Criminal Association criminal case who were ordered to pay BGN 80,000 for the proceeds of criminal damage, or the judgment of nine individuals, in 2006 that Igor Aleynikov established a criminal association aimed at the illegal trade in cigarettes and the committing of crimes related to human trafficking in East Virginia and the South in Estonia.
---
> For example, the latest court rulings, eight defendants separated from the so-called Dikayev Criminal Association criminal case, who were ordered to pay BGN 80,000 for the proceeds of criminal damage, or the judgment of nine individuals, in 2006 that Igor Aleynikov established a criminal association aimed at the illegal trade in cigarettes and the committing of crimes related to human trafficking in East Virginia and the South in Estonia.

Why is this failing?

@jerinphilip
Author

  1. /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit_sse2.sh.log
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/marian-dev/build/marian-conv -f /var/lib/jenkins/workspace/browsermt-marian-regression-tests/models/student-eten/model.npz -t intgemm_16bit_sse2.avx.bin --gemm-type intgemm16sse2
[2021-02-02 11:54:06] Error: Unknown gemm-type: intgemm16sse2
[2021-02-02 11:54:06] Error: Aborted from int main(int, char**) in /var/lib/jenkins/workspace/browsermt-marian-dev-cuda-10.2/src/command/marian_conv.cpp:54

[CALL STACK]
[0x57b0b2]          main                                               + 0x1762
[0x7f8d8446a840]    __libc_start_main                                  + 0xf0
[0x59e8f9]          _start                                             + 0x29

test_intgemm_16bit_sse2.sh: line 37: 27191 Aborted                 (core dumped) $MRT_MARIAN/marian-conv -f $MRT_MODELS/student-eten/model.npz -t $prefix.$suffix.bin --gemm-type intgemm16sse2

This is a named-parameter failure: marian-conv does not recognise the gemm-type intgemm16sse2.

@jerinphilip
Author

jerinphilip commented Feb 3, 2021

  1. /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit.sh.log
+ python3 /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/sacrebleu/sacrebleu.py newstest2018.ref
+ tee intgemm_8bit.out.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 29.6 65.4/38.0/23.8/14.9 (BP = 0.966 ratio = 0.966 hyp_len = 2742 ref_len = 2838)
+ cat intgemm_8bit.avx.expected.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 29.8 65.5/38.1/24.1/15.0 (BP = 0.968 ratio = 0.969 hyp_len = 2749 ref_len = 2838)
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh intgemm_8bit.out intgemm_8bit.avx.expected
Command: /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_8bit.out /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_8bit.avx.expected

The outputs are very different: 98 lines differ. Probably some gemm switch/feature needs to be enabled as a fix?
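A count like the one above can be reproduced by comparing the output and the expected file position by position, since each line is the translation of the same source sentence. This is a minimal sketch with a hypothetical helper name; it is not what tools/diff.sh does internally.

```python
from itertools import zip_longest

def count_differing_lines(output_text, expected_text):
    """Count positions where the two texts have different (or missing) lines."""
    pairs = zip_longest(output_text.splitlines(), expected_text.splitlines())
    return sum(1 for got, want in pairs if got != want)

if __name__ == "__main__":
    out = "a year-and-a-half probe.\nsecond sentence\n"
    exp = "a year-and-a-half investigation.\nsecond sentence\n"
    print(count_differing_lines(out, exp))
```

A line missing from either side counts as a difference, so truncated output is not silently ignored.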

@jerinphilip
Author

  1. /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh.log
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/marian-dev/build/marian-conv -f /var/lib/jenkins/workspace/browsermt-marian-regression-tests/models/student-eten/model.npz -t intgemm_8bit_ssse3.avx.bin --gemm-type intgemm8ssse3
[2021-02-02 11:54:15] Error: Unknown gemm-type: intgemm8ssse3
[2021-02-02 11:54:15] Error: Aborted from int main(int, char**) in /var/lib/jenkins/workspace/browsermt-marian-dev-cuda-10.2/src/command/marian_conv.cpp:54

[CALL STACK]
[0x57b0b2]          main                                               + 0x1762
[0x7f8e1417c840]    __libc_start_main                                  + 0xf0
[0x59e8f9]          _start                                             + 0x29

test_intgemm_8bit_ssse3.sh: line 37: 27310 Aborted                 (core dumped) $MRT_MARIAN/marian-conv -f $MRT_MODELS/student-eten/model.npz -t $prefix.$suffix.bin --gemm-type intgemm8ssse3

Another parameter failure: intgemm8ssse3 is also an unknown gemm-type.

@jerinphilip
Author

  1. /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/test_student_small_aan_intgemm16.sh.log
+ cat optimize_aan_16.out
+ perl -pe 's/@@ //g'
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/moses-scripts/scripts/recaser/detruecase.perl
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/extract-bleu.sh
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/moses-scripts/scripts/generic/multi-bleu.perl newstest2014.ref
It is in-advisable to publish scores from multi-bleu.perl.  The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups.  Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization.  Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff-nums.py optimize_aan_16.bleu optimize_aan.bleu.expected -p 0.6 -o optimize_aan_16.bleu.diff
Command: /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff-nums.py /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/optimize_aan_16.bleu /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/optimize_aan.bleu.expected -p 0.6 -o /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/optimize_aan_16.bleu.diff
Line 1: 25.09 != 25.78
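The mismatch above is consistent with a tolerance check. The sketch below assumes that `-p 0.6` means an absolute tolerance of 0.6 on each number; that reading is an assumption, the log does not confirm it, and `numbers_match` is a hypothetical helper rather than the code in diff-nums.py.

```python
def numbers_match(actual, expected, precision):
    """Return True when |actual - expected| is within the allowed absolute tolerance."""
    return abs(actual - expected) <= precision

if __name__ == "__main__":
    diff = abs(25.09 - 25.78)
    print(f"difference = {diff:.2f}")        # difference = 0.69
    print(numbers_match(25.09, 25.78, 0.6))  # False
```

Under that assumption the BLEU score of 25.09 is 0.69 below the expected 25.78, which is just outside the allowed 0.6.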

@XapaJIaMnu
Collaborator

The regression tests will be incompatible with upstream: they use a toned-down feature level of intgemm (they don't pass the output layer through intgemm, we do). As such, you can't get the same numbers as the upstream tests, even if you match the architecture.

Some upstream gemm configurations are not available here. We use an architecture-agnostic binary format; upstream has both architecture-dependent and architecture-agnostic formats.

@jerinphilip
Author

jerinphilip commented Feb 3, 2021

@kpu told me to compile what's happening; that is being done in this issue. What is the recommended fix so we can get rid of the build failure on all browsermt/* updates while keeping them separate?

We can afford to keep separate regression tests if that's what it takes. I'm fairly certain I lack the context needed to get to the bottom of these test failures.

@XapaJIaMnu
Collaborator

Sooo basically, you need to rerun the test sets on the different machines (sse, avx2, avx512, avx512vnni), create gold-standard references for those, and then replace the old references with those.
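One way to keep a separate reference per instruction set is to derive the expected file name from the CPU flags of the machine running the test. This is a rough sketch under stated assumptions: the left-hand names are Linux /proc/cpuinfo flags, while the suffixes and the `prefix.suffix.expected` naming are hypothetical labels modelled on the `intgemm_16bit.avx.expected` file seen in the logs.

```python
# Most specific level first. The left-hand names are Linux /proc/cpuinfo flags;
# the right-hand suffixes are hypothetical labels for the reference files.
LEVEL_SUFFIXES = [
    ("avx512_vnni", "avx512vnni"),
    ("avx512f", "avx512"),
    ("avx2", "avx2"),
    ("ssse3", "sse"),
]

def reference_suffix(cpu_flags):
    """Pick the suffix of the most capable level present in cpu_flags."""
    for flag, suffix in LEVEL_SUFFIXES:
        if flag in cpu_flags:
            return suffix
    raise ValueError("no supported vector instruction set found")

def expected_file(prefix, cpu_flags):
    """Build the reference file name for a test prefix on this CPU."""
    return f"{prefix}.{reference_suffix(cpu_flags)}.expected"

if __name__ == "__main__":
    flags = {"avx", "avx2", "sse", "sse2", "sse4_1", "sse4_2", "ssse3"}
    print(expected_file("intgemm_8bit", flags))
```

With this mapping a host that reports avx but not avx2 falls back to the SSE reference, which may or may not match what the binary actually dispatches to on that machine.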

@jerinphilip
Author

> Sooo basically, you need to rerun the test sets on the different machines (sse, avx2, avx512, avx512vnni), create gold standard references for those and then replace the old reference with those

That sounds easy for the tests where the output differs from the expected reference; I can do that while setting up the bergamot-translator tests.

What of the remaining command/argument failures? (1, 3, and 5)

@XapaJIaMnu
Collaborator

Legacy batching needs to be merged and fixed. Can you try the branch that I have proposed?
The nonexistent intgemm options can be removed.

@jerinphilip
Author

@XapaJIaMnu I tested the change; it's working. I didn't have to change the tests, so is --use-legacy-batching on by default?

@XapaJIaMnu
Collaborator

XapaJIaMnu commented Feb 7, 2021 via email

@jerinphilip
Author

Current status on lofn:

Skipped:
  - tests/decoder/align-ensemble/test_align_ensemble.sh
  - tests/decoder/align-ensemble/test_align_ensemble_beam_1.sh
  - tests/decoder/intgemm/test_intgemm_16bit_avx2.sh
  - tests/decoder/intgemm/test_intgemm_8bit_avx2.sh
  - tests/decoder/shortlist/test_shortlist_server.sh
  - tests/examples/iris/test_iris.sh
  - tests/examples/mnist/test_mnist_ffnn.sh
  - tests/interface/input-tsv/test_tsv_server.sh
  - tests/interface/input-tsv/test_tsv_server_dual_source.sh
  - tests/models/wngt19/test_model_base_fbgemm_packed16.sh
  - tests/models/wngt19/test_model_base_fbgemm_packed8.sh
  - tests/server/test_ende.sh
  - tests/server/test_ende_align.sh
  - tests/server/test_ende_batch32.sh
  - tests/server/test_ende_cpu.sh
  - tests/server/test_ende_with_empty_lines.sh
  - tests/training/features/exp-smoothing/test_expsmooth_sync.sh
  - tests/training/multi-gpu/test_async_sgd_runs.sh
  - tests/training/multi-gpu/test_sync_sgd.sh
  - tests/training/restoring/exp-smoothing/test_expsmooth_sync.sh
  - tests/training/restoring/multi-gpu/test_adam_sync.sh
  - tests/training/restoring/multi-gpu/test_async.sh
  - tests/training/restoring/multi-gpu/test_sync.sh
  - tests/training/restoring/optimizer/test_adam_params_async.sh
  - tests/training/restoring/optimizer/test_adam_params_sync.sh
Failed:
  - tests/decoder/align/test_align.sh
  - tests/decoder/align/test_align_beam_1.sh
  - tests/decoder/align/test_align_beam_1_batched.sh
  - tests/decoder/align/test_align_cpu.sh
  - tests/decoder/align/test_align_nbest.sh
  - tests/decoder/align/test_align_threshold.sh
  - tests/decoder/align/test_soft_align.sh
  - tests/decoder/align/test_soft_align_nbest.sh
  - tests/decoder/intgemm/test_intgemm_16bit.sh
  - tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
  - tests/decoder/intgemm/test_intgemm_8bit.sh
  - tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
  - tests/decoder/wmt16/test_ende.sh
  - tests/decoder/wmt16/test_ende_cpu.sh
  - tests/decoder/wmt16/test_ende_logs.sh
  - tests/decoder/wmt16/test_nbest.sh
  - tests/decoder/word-scores/test_word_scores.sh
  - tests/decoder/word-scores/test_word_scores_batch.sh
  - tests/decoder/word-scores/test_word_scores_ensemble.sh
  - tests/decoder/word-scores/test_word_scores_nbest.sh
  - tests/decoder/word-scores/test_word_scores_nbest_with_align.sh
  - tests/decoder/word-scores/test_word_scores_normalized.sh
  - tests/examples/unit-tests/test_unit_tests.sh
  - tests/interface/config/test_dump_config_with_relative_paths.sh
  - tests/interface/config/test_relative_paths.sh
  - tests/interface/config/test_relative_paths_apply_only_to_config_files.sh
  - tests/interface/config/test_relative_paths_are_not_applied_to_cmd_options.sh
  - tests/interface/config/test_relative_paths_for_each_config_file.sh
  - tests/interface/config/test_relative_paths_for_input_in_config_file.sh
  - tests/interface/envvars/test_interpolate_envvars.sh
  - tests/interface/input/test_empty_file.sh
  - tests/interface/version/test_no_version_from_old_models.sh
  - tests/models/wmt16-ende/test_translation_b6n.sh
  - tests/models/wmt16-ende/test_translation_b6n_batch32.sh
  - tests/models/wmt16-ende/test_translation_b6n_batch64.sh
  - tests/models/wnmt18/test_student_small_aan_intgemm16.sh
  - tests/scorer/align/test_scorer_align.sh
  - tests/scorer/align/test_scorer_align_batch_1.sh
  - tests/scorer/align/test_scorer_align_nbest.sh
  - tests/scorer/align/test_scorer_soft_align.sh
  - tests/scorer/nbest/test_compare_parallel_and_nbest.sh
  - tests/scorer/nbest/test_custom_feature_name.sh
  - tests/scorer/nbest/test_score_nbest_list.sh
  - tests/scorer/scores/test_compare_with_decoder_scores.sh
  - tests/scorer/scores/test_scores.sh
  - tests/scorer/scores/test_scores_cpu.sh
  - tests/scorer/scores/test_scores_normalized.sh
  - tests/scorer/scores/test_summary.sh
  - tests/scorer/scores/test_summary_perplexity.sh
  - tests/scorer/scores/test_word_scores.sh
  - tests/scorer/scores/test_word_scores_mini_batch_1.sh
  - tests/scorer/scores/test_word_scores_nbest.sh
  - tests/scorer/scores/test_word_scores_normalized.sh
  - tests/training/features/guided-alignment/test_guided_alignment_rnn.sh
  - tests/training/features/guided-alignment/test_guided_alignment_transformer.sh
  - tests/training/features/guided-alignment/test_guided_alignment_transformer_sync.sh
  - tests/training/restarting/test_restarting_finished.sh
---------------------
Ran 82 tests in 00:01:0.497s, 0 passed, 25 skipped, 57 failed

Some of these failures appear to be due to changes in the model archives, where files have gone missing.
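With 57 failures it helps to see which test areas are affected before opening individual logs. The sketch below counts the entries of the "Failed:" section by their first two directory levels; it assumes the summary layout shown above, and the function name is hypothetical.

```python
from collections import Counter

def failures_by_area(summary):
    """Count entries in the 'Failed:' section, keyed by e.g. 'decoder/align'."""
    counts = Counter()
    in_failed = False
    for line in summary.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Section headers such as 'Skipped:' or 'Failed:' switch the mode.
            in_failed = stripped == "Failed:"
        elif in_failed and stripped.startswith("- tests/"):
            parts = stripped[2:].split("/")
            counts["/".join(parts[1:3])] += 1
    return counts

SAMPLE = """\
Skipped:
  - tests/server/test_ende.sh
Failed:
  - tests/decoder/align/test_align.sh
  - tests/decoder/align/test_align_cpu.sh
  - tests/scorer/scores/test_scores_cpu.sh
---------------------
"""

if __name__ == "__main__":
    for area, count in failures_by_area(SAMPLE).most_common():
        print(f"{count:3d}  {area}")
```

Areas where every test fails, rather than a few, are the likelier candidates for a shared cause such as a missing model file.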

@jerinphilip jerinphilip removed their assignment Apr 16, 2022