GitHub - sklarman/matchmaker: Matchmaker - a tool for semi-supervised label matching

Matchmaker

A prototype of a tool for semi-automated (interactive) label matching.

The intended use-case scenario is focused on the task of matching labels extracted from text (noun phrases represented as bags of words) against the most relevant and semantically related labels of SKOS concepts from a given SKOS taxonomy, in order to annotate the text with SKOS concepts and/or extend the SKOS-based knowledge graph with new concepts originating in unstructured data.

Key components involved:

WordNet vocabulary (http://wordnet-rdf.princeton.edu/) and WordNet-based semantic similarity service (https://github.com/sklarman/wordnet-distance)
WEKA machine learning library (http://www.cs.waikato.ac.nz/ml/weka/)

Example data includes:

noun phrases extracted from a sample of DBLP data (http://dblp.l3s.de/dblp++.php) using NLP GATE library (https://gate.ac.uk/)
an ACM SKOS taxonomy (https://www.acm.org/publications/class-2012)

This work is HEAVILY UNDER CONSTRUCTION!

A sample workflow:

(supervised) training of a word matching classifier to account for inflectional variants of concepts and minor typos occurring in the source and target labels, e.g.:

logic - logics (true)

logic - logica (true)

logic - logically (true)

logic - login (false)

applying the classifier for generating mappings from the words in the source and target labels to WordNet vocabulary;

intelligence -intelligentsia|intelligently|intelligent|intelligence

generating bags of words out of noun phrases extracted from labels (here conference names and SKOS labels) e.g.:

logic programming artificial intelligence reasoning

Logic for Programming, Artificial Intelligence, and Reasoning - 19th International Conference, LPAR-19, Stellenbosch, South Africa, December 14-19, 2013. Proceedings, http://dblp.l3s.de/d2r/page/publications/conf/lpar/2013

automated reasoning

Automated reasoning (ACM:10003794)

generating similarity matrices between source and target bags of words, e.g.:

+------------+--------------------+--------------------+--------------------+
|(NULL)      |automated           |reasoning           |(NULL)              |
+------------+--------------------+--------------------+--------------------+
|logic       |0.013245033112582781|0.043010752688172046|0.043010752688172046|
+------------+--------------------+--------------------+--------------------+
|programming |0.19444444444444445 |0.04395604395604396 |0.19444444444444445 |
+------------+--------------------+--------------------+--------------------+
|artificial  |0.013513513513513514|0.015625            |0.015625            |
+------------+--------------------+--------------------+--------------------+
|intelligence|0.053763440860215055|0.5                 |0.5                 |
+------------+--------------------+--------------------+--------------------+
|reasoning   |0.013333333333333334|1.0                 |1.0                 |
+------------+--------------------+--------------------+--------------------+
|(NULL)      |0.19444444444444445 |1.0                 |null                |
+------------+--------------------+--------------------+--------------------+

Propagating the matching score information to the neighborhood SKOS concepts.
Training a label matching classifier using users accept-reject responses to subsequently proposed matches.
Generating mappings by means of the classifier

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
data		data
src/main/java		src/main/java
README.md		README.md
instancematcher.iml		instancematcher.iml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Matchmaker

A sample workflow:

About

Releases

Packages

Languages

sklarman/matchmaker

Folders and files

Latest commit

History

Repository files navigation

Matchmaker

A sample workflow:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages