ilham-bintang/indonesian-NLP-resources

indonesian-NLP-resources

Data resources for Indonesian-language NLP

Sentences Dataset

  1. leipzig indonesian sentence collection: news articles, web articles, and Wikipedia data from 2008-2016
  2. wn-msa.sourceforge.net: Wordnet Bahasa
  3. Quran: Indonesian Quran translations (id.muntakhab, id.jalalayn, id.indonesian)
  4. Kompas online collection. This corpus contains Kompas online news articles from 2001-2002. See here for more info and citations.
  5. Tempo online collection. This corpus contains Tempo online news articles from 2000-2002. See here for more info and citations.
  6. corpus-frog-storytelling: spoken-text storytelling
  7. TED-Multilingual-Parallel-Corpus Monolingual_data/Indonesian
  8. Opus Opus NLPL
  9. Sealang Sealang dataset

Word reference (kemdikbud) link

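  1. Base entries (Entri Dasar) : 48,748 (44.64%)
  2. Derived words (Kata Turunan) : 26,312 (24.09%)
  3. Compound words (Gabungan Kata) : 30,625 (28.04%)
  4. Proverbs (Peribahasa) : 2,040 (1.87%)
  5. Figurative expressions (Kiasan) : 268 (0.25%)
  6. Idioms (Ungkapan) : 1,129 (1.03%)
  7. Variants (Varian) : 91 (0.08%)
  8. Total entries (Entri Total) : 109,213 (100.00%)
  9. Total senses (Makna Total) : 127,775
  10. Total examples (Contoh Total) : 29,495
  11. Total categories (Kategori Total) : 255
  12. Senses per entry (Makna Per Entri) : 1.170
  13. Examples per sense (Contoh Per Makna) : 0.231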
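
The percentages and ratios in this list follow directly from the raw counts. As a quick sanity check, here is a minimal sketch that recomputes them. The counts are copied from the list above; the variable names are illustrative and not part of any kemdikbud tool or API.

```python
# Entry counts copied from the kemdikbud statistics above,
# keyed by the original Indonesian category labels.
entry_counts = {
    "Entri Dasar": 48_748,
    "Kata Turunan": 26_312,
    "Gabungan Kata": 30_625,
    "Peribahasa": 2_040,
    "Kiasan": 268,
    "Ungkapan": 1_129,
    "Varian": 91,
}
total_senses = 127_775    # Makna Total
total_examples = 29_495   # Contoh Total

# The entry total is the sum of the seven categories.
total_entries = sum(entry_counts.values())

# Share of each category, as a percentage of all entries.
shares = {
    name: round(100 * count / total_entries, 2)
    for name, count in entry_counts.items()
}

# Average number of senses per entry and examples per sense.
senses_per_entry = round(total_senses / total_entries, 3)
examples_per_sense = round(total_examples / total_senses, 3)

for name, pct in shares.items():
    print(f"{name}: {entry_counts[name]} ({pct:.2f} %)")
print(f"Entri Total: {total_entries}")
print(f"Makna Per Entri: {senses_per_entry:.3f}")
print(f"Contoh Per Makna: {examples_per_sense:.3f}")
```

The recomputed values match the published figures, so the seven categories appear to partition the full entry list.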

Words dataset

  1. Word Sastrawi
  2. Word spaCy : id
  3. Word name : random-name
  4. Word Indo name : genderprediction
  5. Word Indo place : Wilayah-Administratif-Indonesia
  6. Word Indo place : Indonesia-Postal-Code
  7. Word Wiktionary : word id
  8. Word sentiment : analisis-sentimen
  9. Word sentiment : ID-OpinionWords
  10. Word sentiment : Analisis-Sentimen-ID
  11. Word acronyms
  12. Word : serangkai

Tagged dataset

  1. NER : yohanesgultom/nlp-experiments 1700 sentences
  2. NER : yusufsyaifudin/indonesia-ner 1835 sentences
  3. POS-TAG : famrashel/idn-tagged-corpus
  4. POS-TAG : pebbie/pebahasa ~600 sentences
  5. POS-TAG Parser : UniversalDependencies/UD_Indonesian-GSD ~4477 sentences
  6. Sentiment : 1506 sentences
  7. panl10n Pan Localization

Parallel corpus Eng-Ind

  1. parallel-corpora-en-id
  2. Indonesian-English-Bilingual-Corpus
  3. TALPCo
  4. opus
  5. Multi-Wiki

Morph

  1. MALINDO_Morph
  2. morphind
  3. INDRA

Crawler Data

  1. Crawler Indonesian news portal
