Luc4IR

Luc4IR (pronounced Lucifer) is a Java implementation of sparse indexing and retrieval. The code is distributed in the hope that it will be useful for IR practitioners and students who want to get started with retrieving documents from a collection and measuring effectiveness with standard evaluation metrics.

To index TREC document disks 4/5

Due to file size restrictions, the index could not be made available in this repository. To recreate the index, download the TREC disks 4/5 collection from here.

After downloading and unzipping the collection, build the index by executing the following script:

./index_trecd45.sh <path to the collection>

Alternatively, you may download the index from this shared OneDrive folder.

For retrieval, simply run the script

./retrieve_trecd45.sh <INDEX-PATH> <QUERY FILE> <QRELS FILE>

which executes a series of queries from a TREC-formatted topic file (using the LM-Dir retrieval model) and reports mean average precision (MAP).
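To make the reported metric concrete, here is a minimal sketch of how MAP can be computed from ranked lists and relevance judgments. It is not taken from the Luc4IR source: the class name `MapSketch`, its method names, and the toy document ids are all hypothetical, and the sketch averages over every query in the qrels that has at least one relevant document, which may differ from how the script or trec_eval treats queries missing from the run.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the Luc4IR code base.
public class MapSketch {

    // Average precision for one query: the mean of precision@k taken at
    // each rank k where a relevant document appears, divided by the
    // total number of relevant documents for that query.
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    // MAP: the mean of the per-query average precision values.
    // Queries without any relevant document are skipped (an assumption).
    static double meanAveragePrecision(Map<String, List<String>> run,
                                       Map<String, Set<String>> qrels) {
        double total = 0.0;
        int numQueries = 0;
        for (Map.Entry<String, Set<String>> e : qrels.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue;
            }
            List<String> ranked = run.getOrDefault(e.getKey(), Collections.<String>emptyList());
            total += averagePrecision(ranked, e.getValue());
            numQueries++;
        }
        return numQueries == 0 ? 0.0 : total / numQueries;
    }

    public static void main(String[] args) {
        // Toy data: two queries with made-up document ids.
        Map<String, List<String>> run = new HashMap<>();
        run.put("q1", Arrays.asList("d1", "d2", "d3", "d4"));
        run.put("q2", Arrays.asList("d5", "d6"));

        Map<String, Set<String>> qrels = new HashMap<>();
        qrels.put("q1", new HashSet<>(Arrays.asList("d1", "d3")));
        qrels.put("q2", new HashSet<>(Arrays.asList("d6")));

        System.out.println("MAP = " + meanAveragePrecision(run, qrels));
    }
}
```

In the toy data, the first query has average precision (1/1 + 2/3) / 2 = 5/6 and the second has 1/2, so the MAP is 2/3.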

Another small test collection included in the repository is the ToucheV2 dataset. To run BM25, execute the following commands, which prepare the index and execute retrieval on 49 test queries. The result file, named touche.res, is saved in the project base folder and can then be evaluated with trec_eval.

./index_touche.sh 
./retrieve_touche.sh
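Since the result file is meant to be read by trec_eval, it presumably follows the standard six-column TREC run format: query id, the literal `Q0`, document id, rank, score, and run tag, separated by whitespace. The sketch below parses one such line. It assumes touche.res uses this layout; the class `RunLineSketch` and the sample line in `main` are hypothetical and not taken from the repository.

```java
// Hypothetical illustration; not part of the Luc4IR code base.
public class RunLineSketch {

    // One parsed line of a TREC-style run file.
    static final class RunEntry {
        final String queryId;
        final String docId;
        final int rank;
        final double score;
        final String runTag;

        RunEntry(String queryId, String docId, int rank, double score, String runTag) {
            this.queryId = queryId;
            this.docId = docId;
            this.rank = rank;
            this.score = score;
            this.runTag = runTag;
        }
    }

    // Splits on whitespace and expects exactly six fields:
    // <query id> Q0 <doc id> <rank> <score> <run tag>
    static RunEntry parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields, got " + fields.length);
        }
        return new RunEntry(
                fields[0],
                fields[2],
                Integer.parseInt(fields[3]),
                Double.parseDouble(fields[4]),
                fields[5]);
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        RunEntry e = parse("1 Q0 doc-42 1 12.5 bm25");
        System.out.println(e.queryId + " -> " + e.docId + " (rank " + e.rank + ", score " + e.score + ")");
    }
}
```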

