How to start | Code snippets / Examples | Extending LeechCrawler | Mailing list | People/Legal Information | Supporters| Data Protection


How to start

Add LeechCrawler to your Maven project

LeechCrawler is available from our own Maven repository. Add the following entries to your pom.xml:

<repositories>
    <repository>
        <id>dfki-artifactory-libs-releases</id>
        <url>http://www.dfki.uni-kl.de/artifactory/libs-releases</url>
    </repository>
    <repository>
        <id>dfki-artifactory-libs-snapshots</id>
        <url>http://www.dfki.uni-kl.de/artifactory/libs-snapshots</url>
    </repository>
</repositories>

Then add the following to the <dependencies> section:

    <dependency>
        <groupId>de.dfki.sds</groupId>
        <artifactId>leechcrawler</artifactId>
        <version>2.5.0</version>
    </dependency>
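Putting the two snippets together, a minimal pom.xml might look like the following sketch. The project coordinates (com.example / my-leech-app / 1.0-SNAPSHOT) are hypothetical placeholders for your own project; the repository and dependency entries are the ones listed above.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Hypothetical coordinates: replace with your own project's -->
    <groupId>com.example</groupId>
    <artifactId>my-leech-app</artifactId>
    <version>1.0-SNAPSHOT</version>

    <!-- DFKI repositories that host the LeechCrawler artifacts -->
    <repositories>
        <repository>
            <id>dfki-artifactory-libs-releases</id>
            <url>http://www.dfki.uni-kl.de/artifactory/libs-releases</url>
        </repository>
        <repository>
            <id>dfki-artifactory-libs-snapshots</id>
            <url>http://www.dfki.uni-kl.de/artifactory/libs-snapshots</url>
        </repository>
    </repositories>

    <dependencies>
        <!-- LeechCrawler built on Tika 2.5.0 -->
        <dependency>
            <groupId>de.dfki.sds</groupId>
            <artifactId>leechcrawler</artifactId>
            <version>2.5.0</version>
        </dependency>
    </dependencies>
</project>
```

With this in place, running `mvn compile` should download LeechCrawler and its transitive dependencies. Note that Maven 3.8.1 and later block plain-HTTP repositories by default; if resolution fails with a "Blocked mirror for repositories" error, check whether the repository is also reachable via https.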

The version number corresponds to the Apache Tika release that LeechCrawler is built on. Currently, the following versions are available:

1.3, 1.4, 1.5, 1.6, 1.6.1 (groupId: de.dfki.km, artifactId: leech)

1.7, 1.8, 1.8.1, 1.10.0, 1.10.1, 1.11 (groupId: de.dfki.km, artifactId: leechcrawler)

Current releases (groupId: de.dfki.sds, artifactId: leechcrawler):

Tika 1: 1.25.0, 1.25.1, 1.26.0, 1.27.0

Tika 2: 2.0.0, 2.1.0, 2.4.1, 2.5.0

Note that Tika renamed the metadata attributes between Tika 1 and Tika 2; for example, 'title' is now 'dc:title'.
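To use a version other than 2.5.0, only the dependency entry changes; the coordinates below are taken from the version list above. This is a sketch: use exactly one of these entries in your <dependencies> section, not both, since they are different releases of the same library.

```xml
<!-- Option A: current coordinates, staying on the Tika 1 line
     (Tika 1 metadata names, e.g. 'title' rather than 'dc:title') -->
<dependency>
    <groupId>de.dfki.sds</groupId>
    <artifactId>leechcrawler</artifactId>
    <version>1.27.0</version>
</dependency>

<!-- Option B: legacy coordinates, used for releases 1.7 up to 1.11 -->
<dependency>
    <groupId>de.dfki.km</groupId>
    <artifactId>leechcrawler</artifactId>
    <version>1.11</version>
</dependency>
```

Staying on the Tika 1 line can be a reasonable choice if existing code reads metadata under the old attribute names and you are not ready to migrate it yet.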


If you don't use Maven, you can also download all required libraries directly.

As a next step, try out the examples in our Code snippets / Examples section.