Add LeechCrawler to your Maven project
LeechCrawler is published in our own repository. Add the following entries to your pom.xml:
<repositories>
<repository>
<id>dfki-artifactory-libs-releases</id>
<url>http://www.dfki.uni-kl.de/artifactory/libs-releases</url>
</repository>
<repository>
<id>dfki-artifactory-libs-snapshots</id>
<url>http://www.dfki.uni-kl.de/artifactory/libs-snapshots</url>
</repository>
</repositories>
Then add the following to the <dependencies> section:
<dependency>
<groupId>de.dfki.sds</groupId>
<artifactId>leechcrawler</artifactId>
<version>2.5.0</version>
</dependency>
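For orientation, here is a minimal sketch of how the two fragments above might sit together in a complete pom.xml. The project coordinates (com.example, my-crawler, 0.1.0-SNAPSHOT) are hypothetical placeholders, not part of LeechCrawler; the repository and dependency entries are copied from the snippets above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates of your own project; replace with yours -->
  <groupId>com.example</groupId>
  <artifactId>my-crawler</artifactId>
  <version>0.1.0-SNAPSHOT</version>

  <!-- Repositories that host the LeechCrawler artifacts -->
  <repositories>
    <repository>
      <id>dfki-artifactory-libs-releases</id>
      <url>http://www.dfki.uni-kl.de/artifactory/libs-releases</url>
    </repository>
    <repository>
      <id>dfki-artifactory-libs-snapshots</id>
      <url>http://www.dfki.uni-kl.de/artifactory/libs-snapshots</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- LeechCrawler itself; the version follows the Tika release -->
    <dependency>
      <groupId>de.dfki.sds</groupId>
      <artifactId>leechcrawler</artifactId>
      <version>2.5.0</version>
    </dependency>
  </dependencies>
</project>
```

Keep in mind that Maven 3.8.1 and later block plain http repositories by default, so with a recent Maven you may need to adjust your settings before these repositories resolve.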
The LeechCrawler version number matches the Tika release it is built on. The following versions are currently available:
1.3, 1.4, 1.5, 1.6, 1.6.1 (groupId: de.dfki.km, artifactId: leech)
1.7, 1.8, 1.8.1, 1.10.0, 1.10.1, 1.11 (groupId: de.dfki.km, artifactId: leechcrawler)
Current releases (groupId: de.dfki.sds, artifactId: leechcrawler):
Tika 1: 1.25.0, 1.25.1, 1.26.0, 1.27.0
Tika 2: 2.0.0, 2.1.0, 2.4.1, 2.5.0
Note that Tika renamed its metadata attributes between Tika 1 and Tika 2; for example, 'title' is now 'dc:title'.
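Because the groupId and artifactId changed over time, the dependency entry depends on which release you pick. As a sketch based only on the coordinates listed above, a dependency on one of the earliest releases and on the latest listed Tika 1 release might look like this (the two entries are alternatives; use only one):

```xml
<!-- Alternative A: an early release under the original coordinates -->
<dependency>
  <groupId>de.dfki.km</groupId>
  <artifactId>leech</artifactId>
  <version>1.6.1</version>
</dependency>

<!-- Alternative B: the latest listed Tika 1 based release under the current coordinates -->
<dependency>
  <groupId>de.dfki.sds</groupId>
  <artifactId>leechcrawler</artifactId>
  <version>1.27.0</version>
</dependency>
```

Staying on a Tika 1 based release can be a reasonable choice if existing code still reads the old metadata attribute names such as 'title'.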
If you don't use Maven, you can also download all required libraries.
As a next step, try out our Code snippets / examples section.