EOParser


EOParser parsing an article from Vikipedio

What is it?

EOParser is a python library that decomposes Esperanto words into it's particles. This is possible soly due to Esperanto's regular grammar! For example, the word "malsanulo" (sick person) will be decomposed into mal (opposite of), san (health), ul (suffix for a person) and o (marking it as a noun).

Really, what is it?

Techincally EOParser is a handwritten GLL parser and disambiguator with a big list of poiible prefixes, suffixes, roots, etc... It uses the LL parser to parse words into it's particles.

Limitations

EOParser is a word-level analyser. It does not care about the overall grammar and context of a sentence. Hence it may not disambiguate gwords correctly in some cases.

How to use

>>> import eoparser
>>> parser = eoparser.eoparser()
>>> parser.parse('geparto') # English: parent
[['ge', 'prefix'], ['part', 'root']]

The eoparser.parse method return a list of lists. Storing the particles and their in type order. In the example, geparto is decomposed into ge (the prefix for both sexes), part (the word root for parent) and the part-of-speech marker is omited. Meaning both father and mother.

Use the keep_ending_marker if you wish to keep the -o part of speech marker.

>>> parser.parse('geparto', keep_ending_marker=True)
[['ge', 'prefix'], ['part', 'root'], ['o', 'pos_marker']]

This is the list of particle types

Types
root
pos_marker
suffix
prefix
word (special single word)

Using a different dictionary

EOParser uses severial dictionaries to track which particles are which. By default it loads from wherever the library lives.

parser = eoparser.eoparser(numbers="my_list_of_numbers.txt")
# Or
my_list_of_numbers = load_numbers()
parser = eoparser.eoparser(numbers=list(my_list_of_numbers))

Parameter
numbers
prefixes
suffixes
roots
full_words
correlatives

Contribution

EOParser is by no means perfect! Feel free to submit an issue, open a PR or anything!

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
asset		asset
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
correlatives.txt		correlatives.txt
demo.py		demo.py
eoparser.py		eoparser.py
full_words.txt		full_words.txt
numbers.txt		numbers.txt
prefixes.txt		prefixes.txt
roots.txt		roots.txt
suffixes.txt		suffixes.txt
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EOParser

What is it?

Really, what is it?

Limitations

How to use

Using a different dictionary

Contribution

About

Releases

Packages

Languages

License

marty1885/eoparser

Folders and files

Latest commit

History

Repository files navigation

EOParser

What is it?

Really, what is it?

Limitations

How to use

Using a different dictionary

Contribution

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages