v0.3 - Many bug fixes and important performance improvements
Pre-release
This release marks the point at which somewhat usable results, with reasonable processing times (< 20 s), have been achieved for datasets of around 0.5M triples.
See the commit history for more details. Some highlights:
- Don't add duplicate facts or categories
- Shorten titles to MediaWiki's max
- Fix silly code that allocated insane amounts of memory
- Better RDF parsing error checking
- Collapse multiple arguments to the same variable into a comma-separated list
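
To illustrate the first and last highlights, here is a minimal sketch in Go of dropping duplicate facts and collapsing several values for the same property into one comma-separated string. The `Fact` type, the `collapseFacts` function, the property names and the bare `,` separator are all hypothetical and chosen for illustration only. They are not taken from the rdf2smw source, which may do this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Fact is a hypothetical property/value pair attached to a wiki page.
// It is an illustration only, not a type from rdf2smw.
type Fact struct {
	Property string
	Value    string
}

// collapseFacts skips exact duplicate facts and joins the remaining values
// of each property into a single comma-separated string, keeping the order
// in which the values were first seen.
func collapseFacts(facts []Fact) map[string]string {
	seen := make(map[Fact]bool)
	values := make(map[string][]string)
	for _, f := range facts {
		if seen[f] {
			continue // duplicate fact: do not add it again
		}
		seen[f] = true
		values[f.Property] = append(values[f.Property], f.Value)
	}

	collapsed := make(map[string]string, len(values))
	for prop, vals := range values {
		collapsed[prop] = strings.Join(vals, ",")
	}
	return collapsed
}

func main() {
	facts := []Fact{
		{"Has author", "Alice"},
		{"Has author", "Bob"},
		{"Has author", "Alice"}, // duplicate, ignored
		{"Has title", "Example"},
	}
	collapsed := collapseFacts(facts)
	fmt.Println(collapsed["Has author"]) // Alice,Bob
	fmt.Println(collapsed["Has title"])  // Example
}
```

Using the whole `Fact` struct as a map key keeps the duplicate check to a single lookup per fact, so the pass stays linear in the number of facts.
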
The usage is also slightly updated, with a dedicated flag for the out-file:
./rdf2smw --in mydataset.nt --out mydataset.xml