# text-corpus

Here are 35 public repositories matching this topic...

miras-tech / MirasText

MirasText

nlp sentiment-analysis article corpus language-modeling dataset persian-nlp text-corpus word-embedding irony-detection

Updated Aug 12, 2020
Python

Ermlab / PoLitBert

Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The major assumption is that high-quality text will give a good model.

nlp polish text-corpus roberta

Updated May 25, 2021
Python

t-systems-on-site-services-gmbh / german-wikipedia-text-corpus

This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and split into sentences. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).

nlp machine-learning text-corpus

Updated Feb 22, 2022

WING-NUS / nus-sms-corpus

This is the distribution point for the NUS SMS Corpus as described and updated from This is a corpus of SMS (Short Message Service) messages collected for research at the Department of Computer Science at the National University of Singapore. This dataset consists of 67,093 SMS messages taken from the corpus on Mar 9, 2015. The messages largely …

social-media sms nus text-corpus short-message-service

Updated Jan 20, 2024

mrzjy / StarrailDialog

A project that extracts a text corpus from Honkai: Star Rail

game nlp character conversation npc rpg-game multilanguage text-corpus honkai mihoyo honkai-star-rail

Updated Jul 12, 2024
Python

AsoSoft / AsoSoft-Text-Corpus

AsoSoft Text Corpus is the first large-scale text corpus for the Kurdish language.

natural-language-processing corpus kurdish sorani text-corpus kurdish-language-processing central-kurdish

Updated Apr 1, 2022

nikitaeverywhere / edu-text-analysis-experiments

Statistical text analysis and semantic networks with Python

analysis text-analysis tf-idf gephi semantic-networks sigma text-corpus text-analyzer sigma-analysis

Updated Nov 30, 2017
Python

lucylow / Yeezy-Taught-Me

Yeezy Taught Me Text Generation. Trains an RNN/LSTM next-character prediction model on a user-supplied text corpus

time-series neural-network text-classification corpus recurrent-neural-networks lstm speech-recognition rnn text-processing character-generator text-corpus time-series-analysis indexdb lstm-cells time-series-classification time-series-prediction character-prediction tenorflow lstm-models

Updated Dec 4, 2022
JavaScript

jonsafari / habeas-corpus

Command-line corpus tools

vocabulary corpus corpora corpus-linguistics command-line-tools text-corpus

Updated May 15, 2017
Shell

appeler / search_names

Search a long list of names (patterns) in a large text corpus systematically and quickly

search names text-corpus

Updated Oct 7, 2022
Python

JuliusBahr / SimpleSimilarity

A framework for semantic text search

search nlp macos swift search-engine ios natural-language-processing help-wanted search-algorithm text-processing ios-framework text-corpus text-search corpus-creation textual-search

Updated Jun 27, 2022
Swift

jcrippen / tlingit-corpus

Text corpus of the Tlingit language for linguistic research.

indigenous-languages text-corpus linguistic-corpora native-american linguistics-databases

Updated Oct 30, 2024
Shell

luonglearnstocode / Seinfeld-text-corpus

text corpus 📃 scraped from the scripts 💬 of all Seinfeld episodes

regex requests web-scraping seinfeld text-corpus beautifulsoup4

Updated Jan 8, 2019
Jupyter Notebook

thecsw / katya-dev

Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!

corpus russian tagger corpus-linguistics corpus-generator corpus-builder text-corpus russian-literature corpus-processing corpus-analysis

Updated Mar 14, 2024
Go

cligs / conha19

Corpus de novelas hispanoamericanas del siglo XIX (conha19)

xml spanish mexico novels digital-humanities cuba argentina tei genre text-corpus 19th-century linguistic-annotation

Updated Sep 8, 2023
XSLT

capetocape / crawl-text-title-as-corpus

Crawling data from websites as a text corpus

python nlp crawling text-corpus

Updated Sep 8, 2018
Python

TextCorpusLabs / wikimedia

Walkthrough for converting Wikimedia into a text corpus

wikimedia python3 text-corpus

Updated Jan 26, 2023
Python

seanpm2001 / DroppedText_Corpus

A text corpus collection for the DroppedText language.

collection gplv3 markup corpus markup-language md txt text-corpus gpl3 droppedtext droppedtext-lang dropped-text dropped-text-lang drotex droppedtext-corpus markuplanguage

Updated Sep 14, 2022

kurpicz / tcc

Text Corpus Collection

downloader text-corpus

Updated Jul 24, 2019
C++

soumyadeepghoshGG / Twitter-Sentiment-Analysis-with-NLP

Using natural language processing techniques to determine the sentiment expressed in a tweet, classified as positive or negative.

natural-language-processing social-media twitter sentiment-analysis nltk text-corpus

Updated May 23, 2024
Jupyter Notebook
