In this project, we built a deep neural network that serves as part of a machine translation system. It accepts English text as input and returns the Hindi translation. The goal is to achieve the highest translation accuracy possible.
Language is the core medium of communication, and translation is the core tool for understanding information in an unfamiliar language. Machine translation helps people understand such information without the help of a human translator. This project is a brief introduction to machine translation. Machine translation is a good choice when having the material translated by a human translator would be too large an investment.
A computer program can translate large volumes of content quickly. Machine translation can also improve data security: even the most trustworthy employees make mistakes, so as a rule, the more people who have access to a document, the higher the security risk. Reducing the number of people who must handle sensitive data lowers that risk.
In the past, the language barrier between English and Hindi caused real harm. Medical prescriptions were often written in English, which many Hindi speakers could not read, so they depended entirely on human translators. Because no human translator is perfectly accurate, the name of a medicine was sometimes translated as the name of a different one, and the patient received the wrong medication. Machine translation can help reduce problems of this kind, and this is the main motivation for the project.
Here we translate English to Hindi using attention models, since English and Hindi are among the most widely spoken languages in the world. We also translate human-readable dates such as "April 8th, 2000" into machine-readable dates such as 04/08/2000.
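The date task can be treated as a small sequence-to-sequence problem whose training examples are (human-readable, machine-readable) string pairs. The following is a minimal sketch of how such pairs might be produced. The helper names (ordinal, human_readable, machine_readable, make_pair) are hypothetical and not taken from the project, and the MM/DD/YYYY target format is inferred from the single example 04/08/2000.

```python
from datetime import date

# Month names are spelled out so the output does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def ordinal(day: int) -> str:
    """Return the day with its English ordinal suffix, e.g. 8 -> '8th'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def human_readable(d: date) -> str:
    """Format a date the way a person might write it (assumed source format)."""
    return f"{MONTHS[d.month - 1]} {ordinal(d.day)}, {d.year}"


def machine_readable(d: date) -> str:
    """Format a date as MM/DD/YYYY (assumed target format)."""
    return d.strftime("%m/%d/%Y")


def make_pair(d: date) -> tuple[str, str]:
    """Build one (source, target) training example for the date task."""
    return human_readable(d), machine_readable(d)


source, target = make_pair(date(2000, 4, 8))
print(source, "->", target)
```

Generating both strings from the same date object keeps every source and target pair consistent by construction, which is convenient when a large synthetic training set is needed.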
I have attached the Kaggle URL for the dataset.
The Hindi-English Truncated Corpus dataset contains 127,607 parallel sentences in Hindi and English, preprocessed and filtered for quality. It is a useful resource for natural language processing tasks such as machine translation and text classification, and it can serve as a benchmark dataset for evaluating the performance of different models and techniques in this field.
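As a rough illustration of how the parallel sentences might be read into (English, Hindi) pairs before training, here is a minimal sketch using only the Python standard library. The column names english_sentence and hindi_sentence are an assumption about the CSV layout and should be checked against the downloaded file. The function load_pairs is a hypothetical helper, and the two sample rows are invented purely for demonstration.

```python
import csv
import io


def load_pairs(fileobj, src_col="english_sentence", tgt_col="hindi_sentence"):
    """Read (English, Hindi) sentence pairs from a CSV file object.

    Column names are assumptions; rows with a missing or empty side are skipped.
    """
    pairs = []
    for row in csv.DictReader(fileobj):
        src = (row.get(src_col) or "").strip()
        tgt = (row.get(tgt_col) or "").strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


# Tiny in-memory sample standing in for the real file (invented rows).
sample = io.StringIO(
    "source,english_sentence,hindi_sentence\n"
    "demo,Thank you.,धन्यवाद।\n"
    "demo,,खाली\n"  # empty English side, so this row is dropped
)
pairs = load_pairs(sample)
print(len(pairs), "usable pair(s)")
```

For the real dataset, the same function could be called on a file opened with encoding="utf-8" and newline="", so that the Devanagari text and any quoted line breaks are read correctly.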