This project builds an end-to-end system that detects handwritten text in Devanagari script and converts it into digital text. Built on TrOCR (Transformer-based OCR), the system extracts text from scanned documents containing both printed and handwritten content, with a focus on the Nepali language. The model uses a Vision Transformer (ViT) encoder to process image features and NepBERT, a variant of RoBERTa, as the decoder that generates the output text.
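On the encoder side, a ViT does not read pixels one at a time. It cuts the image into fixed-size, non-overlapping patches, flattens each patch, and projects it to an embedding that serves as one "token" for the decoder to attend to. The stdlib-only sketch below covers only the patch-splitting arithmetic. It assumes a single-channel image stored as a list of rows, and the 384×384 input with 16×16 patches is a common ViT setting used here for illustration, not a value confirmed for this project. The learned linear projection and position embeddings are omitted, and `num_patches` and `patchify` are hypothetical helper names.

```python
# Sketch of the ViT patching step (assumption: single-channel image as a
# list of rows; real ViTs also handle RGB channels and learned projections).

def num_patches(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch x patch tiles covering the image."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def patchify(image: list[list[float]], patch: int) -> list[list[float]]:
    """Split a 2-D image into flattened patches in row-major (raster) order."""
    height, width = len(image), len(image[0])
    num_patches(height, width, patch)  # validates divisibility
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single vector.
            flat = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            patches.append(flat)
    return patches


if __name__ == "__main__":
    # A 384x384 input with 16x16 patches gives a 24x24 grid of patches.
    print(num_patches(384, 384, 16))  # 576
    # Tiny 4x4 example with 2x2 patches -> 4 patches of 4 pixels each.
    tiny = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(patchify(tiny, 2)[0])  # [0.0, 1.0, 4.0, 5.0]
```

The resulting sequence length (576 patch embeddings in the assumed setting) is what the decoder's cross-attention operates over at every generation step.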
- Develop a high-accuracy OCR model for handwritten Nepali text recognition.
- Automate the conversion of scanned handwritten documents into digital text.
- Aayush Puri
- Anil Paudel
- Yubraj Sigdel
- Current phase: Model Deployment
- Minor inaccuracies in detecting certain handwritten styles.
- Overfitting on specific types of Devanagari words; the model still lacks the robustness to generalize across Nepali handwritten text.
- Fine-tune the model to handle additional handwritten styles.
- Expand the system to support batch inference of documents.
This project requires Python 3.10. To ensure compatibility, we recommend creating a virtual environment:
conda create -n handwritten python=3.10
conda activate handwritten
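Clone the repository over either SSH or HTTPS (run only one of the two clone commands), then change into the project directory:
git clone git@github.com:fuseai-fellowship/hand-written-document-conversion.git
git clone https://github.com/fuseai-fellowship/hand-written-document-conversion.git
cd hand-written-document-conversion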
pip install -r requirements.txt
To sync and clean unused dependencies:
make deps-sync
The sample UI is as shown:
Follow the instructions below to run the system and test it on your own documents:
- Upload a scanned handwritten document.
- Run the system to extract the handwritten text.
- View the extracted digital text, displayed beside the input image.
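Running the system implies a detect-then-recognize pipeline: the YOLO text detector proposes regions, and each cropped region is passed to the TrOCR recognizer. One detail such glue code has to handle is putting the recognized crops back into reading order. The sketch below is an assumption about how that step could look, not the project's code: boxes are `(x1, y1, x2, y2)` pixel tuples, `reading_order` and `transcribe` are hypothetical names, and `recognize` stands in for whatever function runs the OCR model on one region.

```python
from typing import Callable

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed detector output).
Box = tuple[int, int, int, int]


def reading_order(boxes: list[Box], line_tol: int = 10) -> list[list[Box]]:
    """Group boxes into text lines (top to bottom), each sorted left to right.

    A box joins the current line if its vertical centre is within `line_tol`
    pixels of the centre of that line's first box; otherwise it starts a new line.
    """
    lines: list[list[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        centre = (box[1] + box[3]) / 2
        if lines:
            ref = lines[-1][0]
            if abs(centre - (ref[1] + ref[3]) / 2) <= line_tol:
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, order words by their left edge.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def transcribe(boxes: list[Box], recognize: Callable[[Box], str]) -> str:
    """Recognize each box in reading order; words joined by spaces, lines by newlines."""
    return "\n".join(
        " ".join(recognize(box) for box in line) for line in reading_order(boxes)
    )


if __name__ == "__main__":
    # Stub recognizer: a lookup table standing in for the OCR model.
    labels = {
        (10, 50, 100, 78): "नमस्ते",
        (120, 52, 200, 80): "संसार",
        (10, 100, 90, 130): "धन्यवाद",
    }
    boxes = [(120, 52, 200, 80), (10, 50, 100, 78), (10, 100, 90, 130)]
    print(transcribe(boxes, lambda box: labels[box]))
    # prints "नमस्ते संसार", then "धन्यवाद" on the next line
```

The fixed `line_tol` heuristic works for roughly horizontal lines; heavily skewed or curved handwriting would likely need deskewing or a smarter line-grouping step first.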
- The system uses a custom annotated dataset of Nepali text containing both printed and handwritten samples.
- Source data includes documents from various sectors such as education and government.
- /src: Contains the core processing scripts.
- /notebook: Contains the notebook used to fine-tune the TrOCR model.
- /models: Includes the pre-trained YOLO model for text detection.
- /data: Houses training and test datasets.
- Output files and extracted texts are stored in the /output directory.
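YOLO-style detectors and their training labels commonly describe each box as a normalized `(x_center, y_center, width, height)` tuple with values in [0, 1]. Before a detected region can be cropped and sent to the recognizer, it has to be converted to pixel corners and clamped to the image. A minimal sketch, assuming that normalized format (whether the model in /models exposes its boxes this way is not stated, and `yolo_to_pixels` is a hypothetical name):

```python
def yolo_to_pixels(
    cx: float, cy: float, w: float, h: float, img_w: int, img_h: int
) -> tuple[int, int, int, int]:
    """Convert a normalized YOLO box to clamped (x1, y1, x2, y2) pixel corners."""
    # Centre/size -> corners, scaled from [0, 1] to pixels.
    x1 = round((cx - w / 2) * img_w)
    y1 = round((cy - h / 2) * img_h)
    x2 = round((cx + w / 2) * img_w)
    y2 = round((cy + h / 2) * img_h)
    # Clamp so the crop never extends past the image border.
    x1, x2 = max(0, x1), min(img_w, x2)
    y1, y2 = max(0, y1), min(img_h, y2)
    return x1, y1, x2, y2


if __name__ == "__main__":
    print(yolo_to_pixels(0.5, 0.5, 0.5, 0.25, 800, 400))  # (200, 150, 600, 250)
```

With the image held as a NumPy array of shape (height, width[, channels]), the result slices directly as `image[y1:y2, x1:x2]`.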
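- Character Error Rate (CER): The number of character substitutions, deletions, and insertions needed to turn the model's output into the reference text, divided by the length of the reference (lower is better).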
- The system achieved a CER of 9.05% on the test set.
- These results suggest that the model generalizes reasonably well across the handwriting styles represented in the test set.
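CER can be written as (S + D + I) / N, where S, D, and I are the character substitutions, deletions, and insertions in the minimum edit between prediction and reference, and N is the reference length. The stdlib sketch below computes it with a Levenshtein distance. The function names are illustrative rather than the project's evaluation code, and it counts Unicode code points after NFC normalization, so a Devanagari vowel sign or virama counts as its own character. Evaluation libraries may normalize or segment text differently, so this sketch is not guaranteed to reproduce the reported 9.05% exactly.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str) -> float:
    """Character Error Rate: edit distance divided by the reference length."""
    reference = unicodedata.normalize("NFC", reference)
    prediction = unicodedata.normalize("NFC", prediction)
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, prediction) / len(reference)


if __name__ == "__main__":
    # One dropped vowel sign out of six code points.
    print(round(cer("नमस्ते", "नमस्त"), 4))  # 0.1667
```

For a test set, CER is usually aggregated as the total edit distance over the total reference length rather than as a mean of per-line scores, since the latter overweights short lines.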