Releases: Halvani/Constituent-Treelib
v0.0.7
What's new?
The underlying language detector, fasttext, caused various errors, which among other things led to a dedicated Stack Overflow thread. As a result, it has been replaced with an alternative language detector. I decided to use langid for this, as it not only works more reliably but also ships with its language model built in, so there are no external dependencies.
v0.0.6
What's new?
- The structure of the constituent tree can be modified. By default, inner postag nodes and token leaves are present (Structure.Complete). Alternatively, postag nodes or token leaves can be removed. In the case of the latter, postag sequences result from the extracted phrases.
- Multiple spaces at the end of a sentence are now removed before parsing, since they caused a benepar exception.
- create_pipeline() now downloads the benepar model to the path "share\nltk_data\models", so that no data is left behind in the CTL directory when CTL is uninstalled.
- create_pipeline() now accepts a 'quiet' parameter to suppress pip installation output.
- Integrated optional expansion of contractions (e.g., I'm --> I am) within sentences. Note that this is only supported for English.
- Added comprehensive error handling (e.g., detecting a language mismatch between the given sentence and the benepar and spaCy models) and custom exceptions that simplify debugging.
- Extensive code refactoring (e.g., reduction of code repetitions, conversion of all string literals from ' to ", etc.)
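
The two sentence-level preprocessing steps mentioned above (removing surplus spaces and optionally expanding English contractions) can be illustrated with a minimal sketch. This is not CTL's actual implementation: the function names `normalize_whitespace`, `expand_contractions` and `preprocess`, as well as the tiny `CONTRACTIONS` table, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny contraction table (illustration only;
# a real implementation would need a far larger, English-specific list).
CONTRACTIONS = {
    "i'm": "I am",
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
}

# One alternation over all known contractions, matched case-insensitively.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def normalize_whitespace(sentence: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing whitespace."""
    return re.sub(r"\s+", " ", sentence).strip()


def expand_contractions(sentence: str) -> str:
    """Replace each known contraction with its expanded form."""

    def _replace(match: re.Match) -> str:
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        # Keep a leading capital letter (e.g. "Don't" becomes "Do not").
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _CONTRACTION_RE.sub(_replace, sentence)


def preprocess(sentence: str, expand: bool = False) -> str:
    """Clean a sentence before parsing; contraction expansion is opt-in."""
    cleaned = normalize_whitespace(sentence)
    return expand_contractions(cleaned) if expand else cleaned


if __name__ == "__main__":
    print(preprocess("I'm sure you don't mind.   ", expand=True))
```

Keeping the expansion opt-in mirrors the note that it is only supported for English. The sketch handles only the plain ASCII apostrophe, so typographic apostrophes would need an extra normalization step.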