Anserini Regressions Log

The following change log details commits to regression tests that alter effectiveness, as well as the addition of new regression tests. This documentation is useful for figuring out why results may have changed over time.

April 12, 2020

commit 35f9f8 (04/12/2020)

Regression results for Core18 (Washington Post) changed due to refactoring to conform to clarified definitions of contents() and raw() in SourceDocument, per Issue #1048. Previously, both contents() and raw() returned the raw JSON, and the WashingtonPostGenerator extracted the article contents for indexing. Now, raw() returns the raw JSON and contents() returns the extracted article contents for indexing (i.e., the logic for parsing the JSON has been moved from WashingtonPostGenerator into the collection itself). This conforms to the principle that every collection should "know" how to parse its own contents.

Regression values went down slightly for Ax as a result of this refactoring. The difference is that, before, the "empty document check" was performed on the JSON, so it never triggered (since the JSON was never empty). With this new processing logic, the "empty document check" is performed on contents() (hence, the parsed article contents), and so the number of empty documents is now accurate (there are six based on the current parsing logic). From these changes and those below, it seems that Ax is very sensitive to tiny collection differences.
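The effect of moving the "empty document check" from the raw JSON to the parsed contents can be sketched as follows. This is a minimal illustration in Python, not Anserini's actual code: the `ArticleDocument` class, the `paragraphs` field, and the `is_empty` helper are all hypothetical names chosen for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class ArticleDocument:
    """Hypothetical stand-in for a collection document (not Anserini's class)."""
    raw_json: str

    def raw(self) -> str:
        # The unmodified JSON record, exactly as stored in the collection.
        return self.raw_json

    def contents(self) -> str:
        # The text to index, parsed by the collection itself.
        # The "paragraphs" field is an assumed layout for illustration only.
        record = json.loads(self.raw_json)
        return "\n".join(record.get("paragraphs", [])).strip()


def is_empty(text: str) -> bool:
    # Hypothetical "empty document check": true if there is no indexable text.
    return len(text.strip()) == 0


# An article whose JSON is well-formed but carries no body text.
doc = ArticleDocument('{"id": "doc1", "paragraphs": []}')

checked_on_raw = is_empty(doc.raw())            # the JSON string is never empty
checked_on_contents = is_empty(doc.contents())  # the parsed text can be empty
```

Under these assumptions, the check on `raw()` never fires, while the check on `contents()` flags the document as empty, which mirrors the behavior change described above.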

April 7, 2020

commit 9a28a0 (04/07/2020)

Regression results for Core17 (New York Times) changed as the result of a bug fix. Previously, Core17 used the NewYorkTimesCollection and was indexed with JsoupGenerator as the generator, which assumes that the input is HTML (or XML) and removes tags. However, this was unnecessary, because the collection implementation already removes tags internally. As a result, angle brackets in the text were interpreted as tags and removed. Fixing this bug increased the number of terms in the collection (and a document that was previously empty is no longer empty). However, effectiveness of bm25+ax and ql+ax decreased slightly; bm25/bm25+rm3 and ql/ql+rm3 remain unchanged.
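To see why a second tag-removal pass can drop terms, consider the sketch below. It assumes a naive regex-based stripper as a stand-in for an HTML-cleaning step; it does not use Jsoup or reproduce the NewYorkTimesCollection logic, and the sample text is invented for illustration.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    # Naive tag stripper, standing in for an HTML-cleaning step (assumption).
    return TAG_PATTERN.sub("", text)


# Invented sample: markup wrapping text that contains escaped angle brackets.
raw = "<p>The pair &lt;a, b&gt; is ordered.</p>"

# Pass 1 (the collection): remove markup, then decode entities.
once = html.unescape(strip_tags(raw))

# Pass 2 (a redundant generator step): literal angle brackets now look like a tag.
twice = strip_tags(once)
```

After the first pass the text still contains the literal `<a, b>`; the second pass treats it as a tag and deletes it, so the doubly processed text has fewer terms than the correctly processed one.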

March 6, 2020

commit 10ff01 (03/06/2020)

Added regressions for the background linking task from the TREC 2018 and 2019 News Tracks.

February 25, 2020

commit a62004 (02/25/2020)
commit 0d42d3 (02/25/2020)

Added regressions for the TREC 2019 Deep Learning Track, for both the document and passage ranking tasks.

November 27, 2019

commit 411618 (11/27/2019)
commit b9264d (11/27/2019)

Added regressions for TREC 2002 (Arabic), CLEF 2006 (French), and FIRE 2012 (English, Bengali and Hindi).

October 11, 2019

commit 445bb45 (10/11/2019)

Added regressions for NTCIR-8 ACLIA (IR4QA subtask, Monolingual Chinese).

September 5, 2019

commit e88b931 (9/5/2019)

As it turns out, we were incorrect in the entry below (commit 2f1b665). Regression numbers did change slightly after the BM25prf fix.

August 14, 2019

commit 2f1b665 (8/14/2019)

Resolves inconsistent tie-breaking for BM25prf that leads to non-deterministic results, per #774. Note that regression numbers did not change.
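As a general illustration of why tie-breaking matters for reproducibility, the sketch below ranks hits by score and breaks ties on the document id, so the output no longer depends on the order in which hits arrive. This is one common approach and an assumption for the example; the actual fix applied in #774 is not described here, and the `rank` function and sample hits are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (docid, score)


def rank(hits: List[Hit]) -> List[Hit]:
    # Sort by descending score, then by ascending docid for equal scores.
    # The secondary key makes the ordering total, hence deterministic.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Two arrival orders of the same hits; doc2 and doc3 have tied scores.
hits_a = [("doc3", 1.5), ("doc1", 2.0), ("doc2", 1.5)]
hits_b = list(reversed(hits_a))

ranking_a = rank(hits_a)
ranking_b = rank(hits_b)
```

Without the secondary key, a stable sort would preserve arrival order among tied documents, so `hits_a` and `hits_b` could yield different rankings.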

August 9, 2019

commit 1217d47 (8/9/2019)
commit 75dfaa6 (8/9/2019)

Added new Doc2query regression car17v2.0-doc2query to replicate Nogueira et al. (arXiv 2019) on the TREC 2017 Complex Answer Retrieval (CAR) section-level passage retrieval task (v2.0). Added +Ax and +PRF regressions with both tuned and default BM25 parameters for MS MARCO passage ranking task.

August 5, 2019

commit 80c5447 (8/5/2019)

Added +Ax and +PRF regressions with both tuned and default BM25 parameters for MS MARCO document ranking task.

June 20, 2019

commit 86be3d2 (6/20/2019)
commit b656da3 (6/20/2019)

Added new Doc2query regression msmarco-passage-doc2query to replicate Nogueira et al. (arXiv 2019) on the MS MARCO passage ranking task. Added tuned BM25 parameters to msmarco-doc regression. Associated documentation updated.

June 12, 2019

commit 75e36f9 (6/9/2019)

Upgrade to Lucene 8: minor changes to all regression experiments. JDIQ 2018 experiments are no longer maintained.

June 9, 2019

commit 93f8f3c (6/9/2019)
commit 781d9ed (6/8/2019)

Added regressions for MS MARCO passage and document ranking tasks.

June 3, 2019

commit 3545350 (6/3/2019)
commit a3ccdef (6/3/2019)

Fixed bug in topic reader for CAR. Better parsing of New York Times documents. Regression numbers in both cases improved slightly.

May 31, 2019

commit 27493ed (5/31/2019)

Per #658: fixed broken regression in Core18 introduced by commit c4ab6b (4/18/2019).

May 11, 2019

commit 3eef2fb (5/11/2019)
commit 2ba2b95 (5/11/2019)
commit d911bba (5/10/2019)

CAR regression refactoring: added v2.0 regression and renamed existing regression to v1.5. Both use benchmarkY1-test to support consistent comparisons.

January 2, 2019

commit 407f308 (1/2/2019)

Added fine-tuning results (i.e., SIGIR Forum article experiments) for axiomatic semantic term matching.

December 24, 2018

commit 1aa3970 (12/24/2018)

Changed RM3 defaults to match settings in Indri.

December 20, 2018

commit e71df7a (12/20/2018)

Added Axiomatic F2Exp and F2Log ranking models back into Anserini (previously, we were using the default Lucene implementation as part of the upgrade to version 7.6).

December 18, 2018

commit e71df7a (12/18/2018)

Upgrade to Lucene 7.6.

November 30, 2018

commit e5b87f0 (11/30/2018)

Added default regressions for TREC 2018 Common Core Track.

November 16, 2018

commit 2c8cd7a (11/16/2018)

This is the commit id referenced in the SIGIR Forum 2018 article. Note that commit 18c3211 (12/9/2018) contains minor fixes to the code.

October 22, 2018

commit 10255e0 (10/22/2018)

Fixed incorrect implementation of -rm3.fbTerms.

September 26, 2018

commit 7c882d3 (9/26/2018)

Fixed bug as part of #429: cw12 and mb13 regression tests changed slightly in effectiveness.

August 8, 2018

commit d4b3272 (8/8/2018)

Added regression tests for CAR17.

August 5, 2018

commit c0da510 (8/5/2018)

This commit adds effectiveness verification testing for the JDIQ 2018 paper.

July 22, 2018

commit 3a7beee (7/22/2018)
commit ec5fd3d (7/22/2018)
commit 5f8c26d3 (7/22/2018)

These three commits establish the new regression testing infrastructure with the following tests:

Experiments on Disks 1 & 2: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Disks 4 & 5 (Robust04): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on AQUAINT (Robust05): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on New York Times (Core17): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Wt10g: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Gov2: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on ClueWeb09 (Category B): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30, NDCG@20, ERR@20}
Experiments on ClueWeb12-B13: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30, NDCG@20, ERR@20}
Experiments on ClueWeb12: {BM25, QL} ⨯ {RM3} ⨯ {AP, P30, NDCG@20, ERR@20}
Experiments on Tweets2011 (MB11 & MB12): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Tweets2013 (MB13 & MB14): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
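The set notation in the list above can be read as a Cartesian product of ranking models, query expansion methods, and metrics. The sketch below enumerates the combinations for one collection under that reading; the label format is an assumption made for illustration, and whether unexpanded baselines are also run is not stated here.

```python
from itertools import product

# Components as written for, e.g., Disks 4 & 5 (Robust04).
models = ["BM25", "QL"]
expansions = ["RM3", "Ax"]
metrics = ["AP", "P30"]

# One entry per (model, expansion, metric) combination.
grid = [
    f"{model}+{expansion}: {metric}"
    for model, expansion, metric in product(models, expansions, metrics)
]

for entry in grid:
    print(entry)
```

With two options in each set, this yields eight model/metric pairs per collection; the ClueWeb collections add NDCG@20 and ERR@20 to the metric set.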
