The Termium Ruby gem parses data exported from TERMIUM Plus, the terminology database service of the Government of Canada.
The default TERMIUM Plus XML output is invalid when a definition begins with a domain label in angle brackets, because the brackets are not escaped. XML does not allow an unescaped "less than" sign in element text:
<textualSupport order="1" type="DEF">
<value><artificial intelligence> operation that allows the firing of a rule, or the
invocation of a program or a subprogram</value>
<sourceRef order="1" />
</textualSupport>
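A quick way to confirm that an export is affected is to run a well-formedness check on it before conversion. The following minimal sketch feeds the snippet above to REXML, the XML parser distributed with standard Ruby installations (an assumption about your environment, not a documented dependency of this gem). well_formed? is a hypothetical helper name used only for illustration.

```ruby
require "rexml/document"

# The invalid snippet from above, as a Ruby string.
snippet = <<~XML
  <textualSupport order="1" type="DEF">
  <value><artificial intelligence> operation that allows the firing of a rule, or the
  invocation of a program or a subprogram</value>
  <sourceRef order="1" />
  </textualSupport>
XML

# Hypothetical helper: true if the string parses as XML, false otherwise.
# REXML reports any parsing problem as a REXML::ParseException.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

puts well_formed?(snippet)
```

The parser reads "<artificial intelligence>" as the start of an element, so the document fails when it reaches "</value>" without that element having been closed.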
The remedy is to escape the angle brackets manually in the definition text, using find/replace or a regular expression:
# Apply to the definition text only: the pattern also matches ordinary XML tags
string.gsub(/<([^>]+)>/, '&lt;\1&gt;')
Results in:
<textualSupport order="1" type="DEF">
<value>&lt;artificial intelligence&gt; operation that allows the firing of a rule, or the
invocation of a program or a subprogram</value>
<sourceRef order="1" />
</textualSupport>
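Because the regular expression above would also escape real tags if it were run over a whole file, the sketch below limits the substitution to the text between <value> and </value>. escape_value_brackets is a hypothetical helper, not part of the termium gem. It assumes that <value> elements carry no attributes and contain no markup that should be kept as markup; REXML is used only to check the result.

```ruby
require "rexml/document"

# Hypothetical helper (not part of the termium gem): escape angle
# brackets only inside <value>...</value>, leaving every other tag
# in the export untouched.
def escape_value_brackets(xml)
  xml.gsub(%r{<value>(.*?)</value>}m) do
    inner = Regexp.last_match(1).gsub(/<([^<>]+)>/, '&lt;\1&gt;')
    "<value>#{inner}</value>"
  end
end

snippet = <<~XML
  <textualSupport order="1" type="DEF">
  <value><artificial intelligence> operation that allows the firing of a rule, or the
  invocation of a program or a subprogram</value>
  <sourceRef order="1" />
  </textualSupport>
XML

fixed = escape_value_brackets(snippet)
puts fixed

# The fixed string now parses; REXML unescapes &lt; and &gt; when reading text.
doc = REXML::Document.new(fixed)
puts doc.root.elements["value"].text
```

The non-greedy (.*?) match stops at the first closing </value> tag, so each definition is processed on its own, and the m flag lets a definition span several lines.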
termium convert
Convert a TERMIUM Plus export XML file to a Glossarist dataset.
Flag | Description |
---|---|
… | Source path to the TERMIUM Plus XML export file. The file needs to start with the … |
… | Destination path to the Glossarist dataset directory. If the directory doesn't exist, it will be created. If not provided, defaults to the basename of the input file, e.g. … |
This gem makes heavy use of the lutaml-model classes for XML serialization.
The following code converts the Termium extract into a Glossarist dataset.
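require "fileutils"
require "termium"

# Parse the export, map it to Glossarist concepts, then write the dataset
termium_extract = Termium::Extract.from_xml(File.read(termium_extract_file))
glossarist_col = termium_extract.to_concept
FileUtils.mkdir_p(glossarist_output_file)
glossarist_col.save_to_files(glossarist_output_file)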
This gem is developed, maintained and funded by Ribose Inc.
The gem is available as open source under the terms of the 2-Clause BSD License.