Updated supported languages for the vocab guide
StephanAkkerman committed Nov 17, 2024
1 parent 427e5df commit 46084d9
Showing 1 changed file with 101 additions and 101 deletions.
202 changes: 101 additions & 101 deletions supported-languages.md
@@ -10,108 +10,108 @@ The codes are the ISO-639-3 codes with some modifications to distinguish local d

The languages in the table below are currently supported for learning. A check mark in the `g2p` column means the language is supported for grapheme-to-phoneme conversion, and a check mark in the `translation` column means it is supported for translation. A language without a check mark for translation can still be translated, but the quality may be lower than for languages with one.

TODO: update list and add supported column add the end
The `vocab guide` column indicates whether a vocabulary guide is available for the language. The vocabulary guide is a list of the most common words in the language, which can be used to learn the language. You can find the original list [here](https://huggingface.co/datasets/StephanAkkerman/frequency-words-2018#supported-languages).

| Language | Code | g2p | translation |
| ------------------------- | --------- | --- | ----------- |
| Adyghe | ady || |
| Afrikaans | afr |||
| Albanian | sqi |||
| Amharic | amh |||
| Arabic | ara |||
| Aragonese | arg || |
| Armenian (Eastern) | arm-e |||
| Armenian (Western) | arm-w |||
| Azerbaijani | aze |||
| Bashkir | bak |||
| Basque | eus |||
| Belarussian | bel |||
| Bengali | ben |||
| Bosnian | bos |||
| Bulgarian | bul |||
| Burmese | bur |||
| Catalan | cat |||
| Chinese (Cantonese) | yue || |
| Chinese (Traditional) | zho-t |||
| Chinese (Simplified) | zho-s |||
| Chinese (Min) | min || |
| Czech | cze || |
| Danish | dan |||
| Dutch | dut |||
| English (UK) | eng-uk |||
| English (US) | eng-us || |
| Esperanto | epo |||
| Estonian | est | ||
| Finnish | fin |||
| French | fra |||
| French (Quebec) | fra-qu |||
| Gaelic | gla |||
| Georgian | geo | ||
| German | ger |||
| Greek | gre |||
| Greek (Ancient) | grc || |
| Guarani | grn |||
| Gujarati | guj |||
| Hindi | hin || |
| Hungarian | hun |||
| Ido | ido | | |
| Indonesian | ind |||
| Interlingua | ina || |
| Italian | ita |||
| Jamaican Creole | jam || |
| Japanese | jpn |||
| Kazakh | kaz | ||
| Khmer | khm |||
| Korean | kor |||
| Kurdish | kur || |
| Latin (Classical) | lat-clas |||
| Latin (Ecclesiastical) | lat-eccl |||
| Lithuanian | lit |||
| Luxembourgish | ltz |||
| Macedonian | mac |||
| Maltese | mlt | | |
| Northeastern Thai | tts | | |
| Norwegian Bokmål | nob |||
| Oriya | ori || |
| Papiamento | pap |||
| Persian | fas |||
| Polish | pol || |
| Portuguese (Portugal) | por-po || |
| Portuguese (Brazil) | por-bz |||
| Romanian | ron | ||
| Russian | rus |||
| Sanskrit | san || |
| Serbian | srp || |
| Serbo-Croatian (Latin) | hbs-latn |||
| Serbo-Croatian (Cyrillic) | hbs-cyrl |||
| Sindhi | snd | ||
| Slovak | slo | ||
| Slovenian | slv |||
| Spanish | spa |||
| Spanish (Latin America) | spa-latin |||
| Spanish (Mexico) | spa-me |||
| Swahili | swa | ||
| Swedish | swe || |
| Tagalog | tgl |||
| Tamil | tam |||
| Tatar | tat |||
| Thai | tha |||
| Turkish | tur || |
| Turkmen | tuk | | |
| Ukrainian | ukr |||
| Vietnamese (Northern) | vie-n |||
| Vietnamese (Central) | vie-c |||
| Vietnamese (Southern) | vie-s |||
| Welsh (North) | wel-nw |||
| Welsh (South) | wel-sw |||
| Icelandic | ice |||
| Old English | ang || |
| Irish | gle |||
| Middle English | enm || |
| Classical Syriac | syc || |
| Galician | glg | ||
| Northern Sami | sme || |
| Egyptian | egy | | |
| Language | Code | g2p | translation | vocab guide |
| ------------------------- | --------- | --- | ----------- | ----------- |
| Adyghe | ady || | |
| Afrikaans | afr ||||
| Albanian | sqi ||||
| Amharic | amh ||| |
| Arabic | ara ||||
| Aragonese | arg || | |
| Armenian (Eastern) | arm-e ||||
| Armenian (Western) | arm-w ||||
| Azerbaijani | aze ||| |
| Bashkir | bak ||| |
| Basque | eus ||||
| Belarussian | bel ||| |
| Bengali | ben ||||
| Bosnian | bos ||||
| Bulgarian | bul ||||
| Burmese | bur ||||
| Catalan | cat ||| |
| Classical Syriac | syc || | |
| Chinese (Cantonese) | yue || | |
| Chinese (Traditional) | zho-t | | ||
| Chinese (Simplified) | zho-s || | |
| Chinese (Min) | min || | |
| Czech | cze | | ||
| Danish | dan | | ||
| Dutch | dut | | ||
| Egyptian | egy || | |
| English (UK) | eng-uk | | ||
| English (US) | eng-us || ||
| Esperanto | epo || | |
| Estonian | est || ||
| Finnish | fin | | ||
| French | fra | | ||
| French (Quebec) | fra-qu || ||
| Gaelic | gla || | |
| Galician | glg | | ||
| Georgian | geo || | |
| German | ger | | ||
| Greek | gre | | ||
| Greek (Ancient) | grc || | |
| Guarani | grn || | |
| Gujarati | guj || | |
| Hindi | hin | | ||
| Hungarian | hun || | |
| Icelandic | ice || ||
| Ido | ido || | |
| Indonesian | ind || ||
| Interlingua | ina || ||
| Irish | gle || | |
| Italian | ita || | |
| Jamaican Creole | jam || | |
| Japanese | jpn | | ||
| Kazakh | kaz | | ||
| Khmer | khm || | |
| Korean | kor | | ||
| Kurdish | kur || | |
| Latin (Classical) | lat-clas || | |
| Latin (Ecclesiastical) | lat-eccl || | |
| Lithuanian | lit | | ||
| Luxembourgish | ltz | | | |
| Macedonian | mac | | ||
| Maltese | mlt || | |
| Middle English | enm || | |
| Northeastern Thai | tts || | |
| Northern Sami | sme || | |
| Norwegian Bokmål | nob || ||
| Oriya | ori || | |
| Old English | ang || | |
| Papiamento | pap | | | |
| Persian | fas | | ||
| Polish | pol | | ||
| Portuguese (Portugal) | por-po || ||
| Portuguese (Brazil) | por-bz || ||
| Romanian | ron | | ||
| Russian | rus | | ||
| Sanskrit | san || | |
| Serbian | srp | | ||
| Serbo-Croatian (Latin) | hbs-latn || ||
| Serbo-Croatian (Cyrillic) | hbs-cyrl || | |
| Sindhi | snd || | |
| Slovak | slo | | ||
| Slovenian | slv || ||
| Spanish | spa | | ||
| Spanish (Latin America) | spa-latin || | |
| Spanish (Mexico) | spa-me | | | |
| Swahili | swa || | |
| Swedish | swe | | ||
| Tagalog | tgl | | ||
| Tamil | tam | | ||
| Tatar | tat || | |
| Thai | tha | | ||
| Turkish | tur | | ||
| Turkmen | tuk | | | |
| Ukrainian | ukr || ||
| Vietnamese (Northern) | vie-n || | |
| Vietnamese (Central) | vie-c || | |
| Vietnamese (Southern) | vie-s || ||
| Welsh (North) | wel-nw | | | |
| Welsh (South) | wel-sw || | |

## Native languages
