Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length, and the approach also works for languages other than English.
STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)
Code for NAACL 2019 paper: "Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions"
Data for the DiMSUM shared task at SemEval 2016
A Python package for Exploratory Data Analysis (EDA) for text-based data.
Comparison between various noun compound embeddings
A set of tools for extracting multiword expressions from parallel corpora for the Moses statistical machine translation system
Foma-based multi-word tagger and morphological analyzer
Data and code for the paper "ID10M: Idiom Identification in 10 Languages" (NAACL 2022).
Repo for the paper "MWE as WSD: Solving Multi-Word Expression Identification with Word Sense Disambiguation"
A spaCy pipeline component for MWE identification
Data and code for the paper "NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection".
Adjacent code related to the paper prepared for the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD 2024), May 25, 2024.
Rigor-Mortis is an online game with a purpose (GWAP) in which players find multiword expressions in French sentences
Java implementation of substitution-driven measures of association that can be used to identify MWEs.
Python implementation of substitution-driven measures of association
Learning English expressions has never been so easy