Keywords: OCR, Tesseract-OCR, Google Translate, Shell Script, Linux
Immigrants often struggle to understand letters they receive by mail in a foreign language. OCR Translator aims to overcome this language barrier by using Tesseract-OCR and Google Translate.
Notice: the preferred input method is a flatbed scanner; camera-based functionality will be added in a future release.
- Install Tesseract OCR. At the time of writing, tesseract 4.0.0-beta.1 was used as the OCR engine.
- Install the dependencies (using a conda virtualenv):
# navigate to ./anaconda
conda env create --file environment.yml
# activate OCR_Translator_env
source activate OCR_Translator_env
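
Before running the tool, it may help to confirm that both prerequisites from the steps above are on the PATH. The following is a minimal sketch, not part of OCR Translator itself: `require_cmd` is a hypothetical helper name, and the check assumes the two command names are `tesseract` and `conda`.

```shell
#!/bin/sh
# Hypothetical helper (not part of OCR Translator):
# succeed only if the named command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing command: $1" >&2
    return 1
}

# Check both prerequisites and remember whether any is missing.
missing=0
for cmd in tesseract conda; do
    require_cmd "$cmd" || missing=1
done

# If everything is present, show the first line of Tesseract's version banner.
# Stderr is merged in because some Tesseract builds print the banner there.
if [ "$missing" -eq 0 ]; then
    tesseract --version 2>&1 | head -n 1
fi
```

If a command is missing, the sketch only reports it on stderr; it does not try to install anything.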
Notes:
- currently supported input file types: PDF, PNG
- single-page input only (multi-page PDFs won't work)
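
The file-type restriction in the notes above could be checked up front before a file is handed to the OCR step. This is a sketch under assumptions: `is_supported_input` is a hypothetical helper, it looks only at the file name extension, and treating the extension case-insensitively is an assumption rather than documented behaviour. It does not check the page count.

```shell
#!/bin/sh
# Hypothetical helper (not part of OCR Translator):
# accept only the file types listed in the notes (PDF, PNG), judged by extension.
is_supported_input() {
    # Lowercase the name so that .PDF and .pdf are treated alike (an assumption).
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.pdf|*.png) return 0 ;;
        *) return 1 ;;
    esac
}

# Example file names, used only for illustration.
for f in letter.pdf scan.PNG photo.jpg; do
    if is_supported_input "$f"; then
        echo "$f: supported"
    else
        echo "$f: not supported"
    fi
done
```

With the three example names, the loop reports the first two as supported and `photo.jpg` as not supported.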