From 46084d988f0a7a746ab8b3074ae1e8ef6b558ec8 Mon Sep 17 00:00:00 2001
From: StephanAkkerman
Date: Sun, 17 Nov 2024 12:50:52 +0100
Subject: [PATCH] Updated supported languages for the vocab guide

---
 supported-languages.md | 202 ++++++++++++++++++++---------------------
 1 file changed, 101 insertions(+), 101 deletions(-)

diff --git a/supported-languages.md b/supported-languages.md
index 93ccbae..fa16605 100644
--- a/supported-languages.md
+++ b/supported-languages.md
@@ -10,108 +10,108 @@ The codes are the ISO-639-3 codes with some modifications to distinguish local d

 The languages in the table below are currently supported for learning. The languages with a check mark in the `g2p` column are supported for grapheme-to-phoneme conversion. The languages with a check mark in the `translation` column are supported for translation. If the language does not have a checkmark for translation, it is supported but the performance may not be as good as the languages with a checkmark.

-TODO: update list and add supported column add the end
+The `vocab guide` column indicates whether a vocabulary guide is available for the language. The vocabulary guide is a list of the most common words in the language, which can be used to learn the language. You can find the original list [here](https://huggingface.co/datasets/StephanAkkerman/frequency-words-2018#supported-languages).

-| Language | Code | g2p | translation |
-| ------------------------- | --------- | --- | ----------- |
-| Adyghe | ady | ✅ | |
-| Afrikaans | afr | ✅ | ✅ |
-| Albanian | sqi | ✅ | ✅ |
-| Amharic | amh | ✅ | ✅ |
-| Arabic | ara | ✅ | ✅ |
-| Aragonese | arg | ✅ | |
-| Armenian (Eastern) | arm-e | ✅ | ✅ |
-| Armenian (Western) | arm-w | ✅ | ✅ |
-| Azerbaijani | aze | ✅ | ✅ |
-| Bashkir | bak | ✅ | ✅ |
-| Basque | eus | ✅ | ✅ |
-| Belarussian | bel | ✅ | ✅ |
-| Bengali | ben | ✅ | ✅ |
-| Bosnian | bos | ✅ | ✅ |
-| Bulgarian | bul | ✅ | ✅ |
-| Burmese | bur | ✅ | ✅ |
-| Catalan | cat | ✅ | ✅ |
-| Chinese (Cantonese) | yue | ✅ | ✅ |
-| Chinese (Traditional) | zho-t | ✅ | ✅ |
-| Chinese (Simplified) | zho-s | ✅ | ✅ |
-| Chinese (Min) | min | ✅ | |
-| Czech | cze | ✅ | ✅ |
-| Danish | dan | ✅ | ✅ |
-| Dutch | dut | ✅ | ✅ |
-| English (UK) | eng-uk | ✅ | ✅ |
-| English (US) | eng-us | ✅ | ✅ |
-| Esperanto | epo | ✅ | ✅ |
-| Estonian | est | ✅ | ✅ |
-| Finnish | fin | ✅ | ✅ |
-| French | fra | ✅ | ✅ |
-| French (Quebec) | fra-qu | ✅ | ✅ |
-| Gaelic | gla | ✅ | ✅ |
-| Georgian | geo | ✅ | ✅ |
-| German | ger | ✅ | ✅ |
-| Greek | gre | ✅ | ✅ |
-| Greek (Ancient) | grc | ✅ | |
-| Guarani | grn | ✅ | ✅ |
-| Gujarati | guj | ✅ | ✅ |
-| Hindi | hin | ✅ | ✅ |
-| Hungarian | hun | ✅ | ✅ |
-| Ido | ido | ✅ | |
-| Indonesian | ind | ✅ | ✅ |
-| Interlingua | ina | ✅ | |
-| Italian | ita | ✅ | ✅ |
-| Jamaican Creole | jam | ✅ | ✅ |
-| Japanese | jpn | ✅ | ✅ |
-| Kazakh | kaz | ✅ | ✅ |
-| Khmer | khm | ✅ | ✅ |
-| Korean | kor | ✅ | ✅ |
-| Kurdish | kur | ✅ | ✅ |
-| Latin (Classical) | lat-clas | ✅ | ✅ |
-| Latin (Ecclesiastical) | lat-eccl | ✅ | ✅ |
-| Lithuanian | lit | ✅ | ✅ |
-| Luxembourgish | ltz | ✅ | ✅ |
-| Macedonian | mac | ✅ | ✅ |
-| Maltese | mlt | ✅ | ✅ |
-| Northeastern Thai | tts | ✅ | |
-| Norwegian Bokmål | nob | ✅ | ✅ |
-| Oriya | ori | ✅ | ✅ |
-| Papiamento | pap | ✅ | ✅ |
-| Persian | fas | ✅ | ✅ |
-| Polish | pol | ✅ | ✅ |
-| Portuguese (Portugal) | por-po | ✅ | ✅ |
-| Portuguese (Brazil) | por-bz | ✅ | ✅ |
-| Romanian | ron | ✅ | ✅ |
-| Russian | rus | ✅ | ✅ |
-| Sanskrit | san | ✅ | ✅ |
-| Serbian | srp | ✅ | ✅ |
-| Serbo-Croatian (Latin) | hbs-latn | ✅ | ✅ |
-| Serbo-Croatian (Cyrillic) | hbs-cyrl | ✅ | ✅ |
-| Sindhi | snd | ✅ | ✅ |
-| Slovak | slo | ✅ | ✅ |
-| Slovenian | slv | ✅ | ✅ |
-| Spanish | spa | ✅ | ✅ |
-| Spanish (Latin America) | spa-latin | ✅ | ✅ |
-| Spanish (Mexico) | spa-me | ✅ | ✅ |
-| Swahili | swa | ✅ | ✅ |
-| Swedish | swe | ✅ | ✅ |
-| Tagalog | tgl | ✅ | ✅ |
-| Tamil | tam | ✅ | ✅ |
-| Tatar | tat | ✅ | ✅ |
-| Thai | tha | ✅ | ✅ |
-| Turkish | tur | ✅ | ✅ |
-| Turkmen | tuk | ✅ | ✅ |
-| Ukrainian | ukr | ✅ | ✅ |
-| Vietnamese (Northern) | vie-n | ✅ | ✅ |
-| Vietnamese (Central) | vie-c | ✅ | ✅ |
-| Vietnamese (Southern) | vie-s | ✅ | ✅ |
-| Welsh (North) | wel-nw | ✅ | ✅ |
-| Welsh (South) | wel-sw | ✅ | ✅ |
-| Icelandic | ice | ✅ | ✅ |
-| Old English | ang | ✅ | |
-| Irish | gle | ✅ | ✅ |
-| Middle English | enm | ✅ | |
-| Classical Syriac | syc | ✅ | |
-| Galician | glg | ✅ | ✅ |
-| Northern Sami | sme | ✅ | ✅ |
-| Egyptian | egy | ✅ | |
+| Language | Code | g2p | translation | vocab guide |
+| ------------------------- | --------- | --- | ----------- | ----------- |
+| Adyghe | ady | ✅ | | |
+| Afrikaans | afr | ✅ | ✅ | ✅ |
+| Albanian | sqi | ✅ | ✅ | ✅ |
+| Amharic | amh | ✅ | ✅ | |
+| Arabic | ara | ✅ | ✅ | ✅ |
+| Aragonese | arg | ✅ | | |
+| Armenian (Eastern) | arm-e | ✅ | ✅ | ✅ |
+| Armenian (Western) | arm-w | ✅ | ✅ | ✅ |
+| Azerbaijani | aze | ✅ | ✅ | |
+| Bashkir | bak | ✅ | ✅ | |
+| Basque | eus | ✅ | ✅ | ✅ |
+| Belarussian | bel | ✅ | ✅ | |
+| Bengali | ben | ✅ | ✅ | ✅ |
+| Bosnian | bos | ✅ | ✅ | ✅ |
+| Bulgarian | bul | ✅ | ✅ | ✅ |
+| Burmese | bur | ✅ | ✅ | ✅ |
+| Catalan | cat | ✅ | ✅ | |
+| Classical Syriac | syc | ✅ | | |
+| Chinese (Cantonese) | yue | ✅ | ✅ | |
+| Chinese (Traditional) | zho-t | ✅ | ✅ | ✅ |
+| Chinese (Simplified) | zho-s | ✅ | ✅ | ✅ |
+| Chinese (Min) | min | ✅ | | |
+| Czech | cze | ✅ | ✅ | ✅ |
+| Danish | dan | ✅ | ✅ | ✅ |
+| Dutch | dut | ✅ | ✅ | ✅ |
+| Egyptian | egy | ✅ | | |
+| English (UK) | eng-uk | ✅ | ✅ | ✅ |
+| English (US) | eng-us | ✅ | ✅ | ✅ |
+| Esperanto | epo | ✅ | ✅ | |
+| Estonian | est | ✅ | ✅ | ✅ |
+| Finnish | fin | ✅ | ✅ | ✅ |
+| French | fra | ✅ | ✅ | ✅ |
+| French (Quebec) | fra-qu | ✅ | ✅ | ✅ |
+| Gaelic | gla | ✅ | ✅ | |
+| Galician | glg | ✅ | ✅ | ✅ |
+| Georgian | geo | ✅ | ✅ | ✅ |
+| German | ger | ✅ | ✅ | ✅ |
+| Greek | gre | ✅ | ✅ | ✅ |
+| Greek (Ancient) | grc | ✅ | | |
+| Guarani | grn | ✅ | ✅ | |
+| Gujarati | guj | ✅ | ✅ | |
+| Hindi | hin | ✅ | ✅ | ✅ |
+| Hungarian | hun | ✅ | ✅ | ✅ |
+| Icelandic | ice | ✅ | ✅ | ✅ |
+| Ido | ido | ✅ | | |
+| Indonesian | ind | ✅ | ✅ | ✅ |
+| Interlingua | ina | ✅ | | ✅ |
+| Irish | gle | ✅ | ✅ | |
+| Italian | ita | ✅ | ✅ | |
+| Jamaican Creole | jam | ✅ | ✅ | |
+| Japanese | jpn | ✅ | ✅ | ✅ |
+| Kazakh | kaz | ✅ | ✅ | ✅ |
+| Khmer | khm | ✅ | ✅ | |
+| Korean | kor | ✅ | ✅ | ✅ |
+| Kurdish | kur | ✅ | ✅ | |
+| Latin (Classical) | lat-clas | ✅ | ✅ | |
+| Latin (Ecclesiastical) | lat-eccl | ✅ | ✅ | |
+| Lithuanian | lit | ✅ | ✅ | ✅ |
+| Luxembourgish | ltz | ✅ | ✅ | |
+| Macedonian | mac | ✅ | ✅ | ✅ |
+| Maltese | mlt | ✅ | ✅ | |
+| Middle English | enm | ✅ | | |
+| Northeastern Thai | tts | ✅ | | |
+| Northern Sami | sme | ✅ | ✅ | |
+| Norwegian Bokmål | nob | ✅ | ✅ | ✅ |
+| Oriya | ori | ✅ | ✅ | |
+| Old English | ang | ✅ | | |
+| Papiamento | pap | ✅ | ✅ | |
+| Persian | fas | ✅ | ✅ | ✅ |
+| Polish | pol | ✅ | ✅ | ✅ |
+| Portuguese (Portugal) | por-po | ✅ | ✅ | ✅ |
+| Portuguese (Brazil) | por-bz | ✅ | ✅ | ✅ |
+| Romanian | ron | ✅ | ✅ | ✅ |
+| Russian | rus | ✅ | ✅ | ✅ |
+| Sanskrit | san | ✅ | ✅ | |
+| Serbian | srp | ✅ | ✅ | ✅ |
+| Serbo-Croatian (Latin) | hbs-latn | ✅ | ✅ | ✅ |
+| Serbo-Croatian (Cyrillic) | hbs-cyrl | ✅ | ✅ | |
+| Sindhi | snd | ✅ | ✅ | |
+| Slovak | slo | ✅ | ✅ | ✅ |
+| Slovenian | slv | ✅ | ✅ | ✅ |
+| Spanish | spa | ✅ | ✅ | ✅ |
+| Spanish (Latin America) | spa-latin | ✅ | ✅ | |
+| Spanish (Mexico) | spa-me | ✅ | ✅ | |
+| Swahili | swa | ✅ | ✅ | |
+| Swedish | swe | ✅ | ✅ | ✅ |
+| Tagalog | tgl | ✅ | ✅ | ✅ |
+| Tamil | tam | ✅ | ✅ | ✅ |
+| Tatar | tat | ✅ | ✅ | |
+| Thai | tha | ✅ | ✅ | ✅ |
+| Turkish | tur | ✅ | ✅ | ✅ |
+| Turkmen | tuk | ✅ | ✅ | |
+| Ukrainian | ukr | ✅ | ✅ | ✅ |
+| Vietnamese (Northern) | vie-n | ✅ | ✅ | ✅ |
+| Vietnamese (Central) | vie-c | ✅ | ✅ | ✅ |
+| Vietnamese (Southern) | vie-s | ✅ | ✅ | ✅ |
+| Welsh (North) | wel-nw | ✅ | ✅ | |
+| Welsh (South) | wel-sw | ✅ | ✅ | |

 ## Native languages