Experimental two-level rule compilation using Python HFST. For more information, see https://github.com/hfst/python
The Python program twol.py is a rule compiler and tester for rules of the simplified two-level model; see https://pytwolc.readthedocs.io/en/latest/formalism.html for more information on the rule formalism and the compiler. The HFST package can be installed using the command:
$ python3 -m pip install hfst
The program twol.py depends on the 'tatsu' Python parser generator by Juancarlo Añez; see http://tatsu.readthedocs.io/en/stable/index.html for detailed documentation. You can install TatSu using the command:
$ python3 -m pip install tatsu
The program is prepared to handle input in Unicode, including user-perceived graphemes that are composed of two or more Unicode characters, each with its own code point. In order to recognize such graphemes, an additional package has to be installed:
$ python3 -m pip install grapheme
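To illustrate why combined graphemes need special handling, here is a minimal sketch using only the Python standard library. The function simple_graphemes is a hypothetical helper written for this illustration; it only attaches combining marks to the preceding character and is a simplification, not the segmentation that the grapheme package or twol.py performs.

```python
import unicodedata

def simple_graphemes(text):
    """Group each character with the combining marks that follow it.

    A simplified stand-in for real grapheme segmentation: it handles
    combining marks only, not every kind of grapheme cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # A combining mark belongs to the previous cluster.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# 'a' followed by U+0308 COMBINING DIAERESIS is perceived as one letter
# but consists of two code points.
word = "ka\u0308si"
print(len(word))                    # number of code points
print(len(simple_graphemes(word)))  # number of perceived letters
```

Counting code points and counting perceived letters give different results for such a word, which is why a symbol-by-symbol alignment of examples has to work on graphemes rather than on raw code points.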
The compiler needs two files: (1) the examples as an FST and (2) a rule file. The human-readable examples must be converted into an FST using the twexamp.py program.
The compiler is normally executed as follows:
$ python3 twol.py examples.fst rules.twolc
One can get more information by using the --help parameter. More documentation on twol.py can be found at https://pytwolc.readthedocs.io/en/latest/compiletest.html
The module twexamp.py handles various tasks for the compiler during the compilation process. It is also needed for converting human-readable examples into an FST, so that it is not necessary to recompile them at every step of testing rules. A recompilation is only needed when the examples are changed. In order to convert examples from the pair string format into an FST, you can run e.g.:
$ python3 twexamp.py examples.pstr examples.fst
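Since the examples only need to be recompiled when they change, the check can be automated. The following is a minimal sketch, assuming the file names examples.pstr and examples.fst from the command above; the helpers needs_rebuild and conversion_command are hypothetical and not part of twexamp.py. The sketch only builds the command line and does not run it.

```python
import os

def needs_rebuild(source_path, target_path):
    """Return True if the target is missing or older than the source."""
    if not os.path.exists(target_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)

def conversion_command(source_path="examples.pstr", target_path="examples.fst"):
    """Return the twexamp.py command as an argument list, or None.

    None means the existing FST is up to date and can be reused.
    """
    if needs_rebuild(source_path, target_path):
        return ["python3", "twexamp.py", source_path, target_path]
    return None
```

The returned list can be passed to subprocess.run when a rebuild is needed. A Makefile rule with examples.pstr as a prerequisite of examples.fst would achieve the same effect.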
The sequence of programs parad2words.py, words2zerofilled.py, zerofilled2raw.py and raw2named.py is intended for determining the underlying or morphophonemic representations of word stems. It starts from a table of word forms or paradigms where morphs are separated from each other, e.g. by a period (.). See https://pytwolc.readthedocs.io/en/latest/morphophon.html for more information on their use. Each program is run from the command line, and one can get detailed information on the parameters by running the command with a --help argument, e.g.
$ python3 words2zerofilled.py --help
Some of the programs in this sequence need the package orderedset, which can be installed with:
$ python3 -m pip install orderedset
In particular, the zero-filling program needs the same package for handling combined graphemes that twol.py uses:
$ python3 -m pip install grapheme
There is a Makefile in the subdirectory parad, as well as examples, which may help in testing and using the programs.
This program builds tentative or raw rules out of a set of examples. The examples must be given one example per line as a space-separated list of symbol pairs. See https://pytwolc.readthedocs.io/en/latest/twdiscov.html for more information.
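As an illustration of the input format, here is a minimal sketch of reading one example line into a list of symbol pairs. It assumes that a pair is written as input:output and that a bare symbol stands for an identical pair; this notation, the function parse_example and the sample line are assumptions made for the sketch, so check the linked documentation for the exact format.

```python
def parse_example(line):
    """Split a space-separated example into (input, output) symbol pairs.

    Assumed notation: 'x:y' is the pair x/y, and a bare 'x' is the
    identity pair x/x.
    """
    pairs = []
    for token in line.split():
        insym, sep, outsym = token.partition(":")
        # Without a colon the symbol is paired with itself.
        pairs.append((insym, outsym if sep else insym))
    return pairs

# Hypothetical example line with one non-identity pair.
for insym, outsym in parse_example("k a t:d o"):
    print(insym, outsym)
```

A file of examples can then be processed by calling parse_example on each non-empty line.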