Modern TypeScript implementation of pq-gram distance, an efficient approximation for tree-edit distance. Algorithm based on the academic paper 12. Implementation ported from LitterBox3. Node.js API inspired by jqgram4.
The package can be installed as @se2p/pq-distance
from npm:
npm install @se2p/pq-distance
The package exports a single function pqDistance
:
const {pqDistance} = require("@se2p/pq-distance");
TypeScript users can also import the PQTree
and PQOpts
types:
import {pqDistance, PQTree, PQOpts} from "@se2p/pq-distance";
Example trees:
a a
/|\ /|\
t1: a b c t2: a b c
/ \ / \
e b e x
Arbitrary tree-representations are supported. Provide two functions getLabel()
to extract the label as string from a
node, and getChildren()
that returns a node's children in an array.
For example:
const t1 = {
label: 'a',
children: [
{
label: 'a',
children: [
{
label: 'e',
children: []
},
{
label: 'b',
children: []
}
]
},
{
label: 'b',
children: []
},
{
label: 'c',
children: []
}
]
};
const t2 = {
node: 'A',
child: [
{
node: 'A',
child: [
{node: 'E'},
{node: 'B'}
]
},
{node: 'B'},
{node: 'X'}
]
};
To compute the pq-gram distance between t1
and t2
, pass them as PQTree
objects to pqDistance()
:
const tree1: PQTree = {
root: t1,
getLabel: ({label}) => label,
getChildren: ({children}) => children
};
const tree2: PQTree = {
root: t2,
getLabel: (n) => n.node.toLowerCase(),
getChildren: (n) => n.child ?? []
};
Finally:
const opts: PQOpts = {p: 2, q: 3}; // default values
pqDistance(tree1, tree2, opts); // 0.3076923076923077
The object opts
sets the p
and q
values for distance computation, and may be omitted to use the default values
p=2
and q=3
. Please refer to the academic paper how they affect the distance value.
pq-distance is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
pq-distance is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Footnotes
-
Nikolaus Augsten, Michael H. Böhlen, and Johann Gamper. 2005. Approximate Matching of Hierarchical Data Using pq-Grams. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 – September 2, 2005 ↩