Ruby model for the TERMIUM Plus terminology data bank from the Government of Canada

glossarist/termium

Termium Ruby gem

Purpose

The Termium Ruby gem parses data exported from TERMIUM Plus, the terminology data bank of the Government of Canada.

WARNING - Termium XML output requires manual correction

The default Termium XML output needs correction before parsing: in term domains written with angle brackets, the "less than" sign is escaped as &lt; but the closing "greater than" sign is left unescaped:

<textualSupport order="1" type="DEF">
  <value>&lt;artificial intelligence> operation that allows the firing of a rule, or the
    invocation of a program or a subprogram</value>
  <sourceRef order="1" />
</textualSupport>

The remedy is to escape the "greater than" sign manually, using find/replace or a regular expression:

string.gsub(/&lt;([^>]+)>/, '&lt;\1&gt;')

Results in:

<textualSupport order="1" type="DEF">
  <value>&lt;artificial intelligence&gt; operation that allows the firing of a rule, or the
    invocation of a program or a subprogram</value>
  <sourceRef order="1" />
</textualSupport>
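
The one-line substitution above assumes the input has not been corrected yet. If it runs a second time, [^>]+ can match from an already escaped marker up to the ">" of the next tag. The following is a minimal sketch of a stricter variant; escape_domain_brackets is a hypothetical helper name and is not part of the termium gem. It assumes that domain names contain no "<", ">" or "&" characters, so a domain that itself contains an entity such as &amp; would be left uncorrected.

```ruby
# Hypothetical helper, not part of the termium gem.
# Escapes the bare ">" that closes a domain marker opened with "&lt;".
def escape_domain_brackets(xml)
  # [^<>&]+ stops at any tag, entity, or existing "&gt;", so markers
  # that are already fully escaped are left untouched.
  xml.gsub(/&lt;([^<>&]+)>/, '&lt;\1&gt;')
end

broken = "<value>&lt;artificial intelligence> operation</value>"
fixed = escape_domain_brackets(broken)
puts fixed
# Running the helper again on corrected text changes nothing.
puts escape_domain_brackets(fixed) == fixed
```

Because the character class excludes "&", the helper can be applied to a file that has already been partly corrected by hand.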

Commands

termium convert

Convert a TERMIUM Plus export XML file to a Glossarist dataset

Usage

$ termium convert -i INPUT_XML_FILE [-o OUTPUT_PATH]

Options

-i, --input-path
    Source path to the TERMIUM Plus XML export file. The file needs to start with the <extract> tag.

-o, --output-path
    Destination path to the Glossarist dataset directory. If the directory doesn’t exist, it will be created. If not provided, defaults to the input file path without its extension, e.g. foo/bar.xml will export to foo/bar/.
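
As an illustration of the default described for --output-path, the sketch below derives the output directory from the input path. default_output_path is a hypothetical name chosen for this example; the command's actual implementation may differ.

```ruby
# Hypothetical helper mirroring the documented default; not the gem's own code.
# Keeps the input file's directory and drops the file extension.
def default_output_path(input_path)
  File.join(File.dirname(input_path), File.basename(input_path, ".*"))
end

puts default_output_path("foo/bar.xml")
```

For the input foo/bar.xml this yields foo/bar, matching the example above.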

Library

Usage

This gem makes heavy use of the lutaml-model classes for XML serialization.

The following code converts the Termium extract into a Glossarist dataset.

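require "fileutils"
require "termium"

# termium_extract_file: path to the XML export; glossarist_output_file: output directory
termium_extract = Termium::Extract.from_xml(File.read(termium_extract_file))
glossarist_col = termium_extract.to_concept
FileUtils.mkdir_p(glossarist_output_file)
glossarist_col.save_to_files(glossarist_output_file)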

Credits

This gem is developed, maintained and funded by Ribose Inc.

License

The gem is available as open source under the terms of the 2-Clause BSD License.
