In this repository we share our work on multilingual document reading (English, Bangla and Arabic). This is a work in progress; we will update the repo gradually, inshaa Allah.
LOCAL DEV ENVIRONMENT
OS : Ubuntu 20.04.3 LTS
Memory : 23.4 GiB
Processor : Intel® Core™ i5-8250U CPU @ 1.60GHz × 8
Graphics : Intel® UHD Graphics 620 (Kabylake GT2)
Gnome : 3.36.8
Python requirements
- dev - cpu - test - install
stable test environment
- Manual Setup
conda create -n mlreader python=3.8 -y
conda activate mlreader
conda install -n mlreader ipykernel --update-deps --force-reinstall -y
./install.sh
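After the manual setup, a quick sanity check of the environment can be useful. The sketch below is not part of the repo: the package names `paddleocr` and `easyocr` are assumptions inferred from the model list in this README, and the minimum Python version is assumed from the `python=3.8` used above. `install.sh` may install more than this.

```python
import importlib.util
import sys

# Assumed package names, inferred from the models listed in this README.
# install.sh may install additional or differently named packages.
ASSUMED_PACKAGES = ("paddleocr", "easyocr")


def check_environment(packages=ASSUMED_PACKAGES, min_python=(3, 8)):
    """Report whether Python is recent enough and which packages are importable."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `mlreader` environment; any line marked MISSING suggests the install step did not complete.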
- Line-based detector model : PaddleOCR en-dbnet
- Word-based detector model : PaddleOCR ml-dbnet
- English recognizer : PaddleOCR en (SVTR_LCNet)
- Arabic recognizer : PaddleOCR ar
- Bangla recognizer : EasyOCR bn
- Word classifier : Custom
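The model list above suggests a routing step: each detected word is assigned a language and then sent to the matching recognizer. The README does not describe how the classifier output is actually used, so the following is only a minimal sketch under that assumption. The names `RECOGNIZERS`, `pick_recognizer` and `group_by_recognizer` are hypothetical, and words are represented by plain string IDs instead of image crops.

```python
from typing import Callable, Dict, List, Tuple

# (backend, language code) per language, following the model list above.
RECOGNIZERS: Dict[str, Tuple[str, str]] = {
    "en": ("paddleocr", "en"),
    "ar": ("paddleocr", "ar"),
    "bn": ("easyocr", "bn"),
}


def pick_recognizer(lang: str) -> Tuple[str, str]:
    """Return the (backend, language code) pair for a language label."""
    if lang not in RECOGNIZERS:
        raise ValueError(f"no recognizer configured for language {lang!r}")
    return RECOGNIZERS[lang]


def group_by_recognizer(
    words: List[str], classify: Callable[[str], str]
) -> Dict[Tuple[str, str], List[str]]:
    """Group word IDs by the recognizer that should read them.

    `classify` stands in for the custom word classifier (hypothetical
    interface: word ID in, language label out).
    """
    groups: Dict[Tuple[str, str], List[str]] = {}
    for word in words:
        groups.setdefault(pick_recognizer(classify(word)), []).append(word)
    return groups


if __name__ == "__main__":
    # Toy classifier: a lookup table in place of a real model.
    labels = {"w1": "en", "w2": "bn", "w3": "en"}
    print(group_by_recognizer(["w1", "w2", "w3"], labels.__getitem__))
```

Grouping words per recognizer before recognition lets each backend run once per batch, which is one plausible reason to classify words first.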
- merging issue solved
- language models are downloaded automatically
- classifier added
- negative stride issue solved
docs/dev.md : dev branch doc
weights/weights.md : custom weights integration doc
- see : demo.ipynb