A simple project built with OpenCV and Tesseract on the C++ / Qt framework to read screenshots of book pages and extract their text content to be fed into a dataset. The aim is to automate manual note-taking and to provide an interface where the user can select book extracts to save.
The project uses the Python FastAPI web framework to provide an api, and a simple frontend client built with React and served by an Nginx web server. Qt provides a GUI to process images, convert them to text, and generate new dataset entries.
- Docker 26.0.1+
- Docker compose 2.26.1+
This repository is split into two main directories. The api folder contains all of our frontend and backend services, while the ocr folder gathers all files related to our gui:
.
├── api
│ ├── backend
│ ├── db
│ └── frontend
└── ocr
    └── gui
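Before building, it can help to confirm that a checkout matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it assumes that `gui` sits inside `ocr`, as the tree suggests.

```python
from pathlib import Path

# Directories named in the layout above, relative to the repository root.
# Assumption: gui is nested inside ocr.
EXPECTED_DIRS = ["api/backend", "api/db", "api/frontend", "ocr/gui"]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing but "ok" when all is in place.
    missing = missing_dirs(".")
    print("ok" if not missing else "missing: " + ", ".join(missing))
```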
First we need to build our backend and frontend services. To do so, a docker-compose.yml file is provided inside the api directory. To automatically build the images and run the necessary containers:
cd api && docker compose up --build
Once our containers are running, we can access both our frontend and backend services.
Frontend running at: http://localhost:3000
Backend (interactive api documentation) running at: http://localhost:8989/docs
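Once the containers are up, both addresses above can be checked from a script rather than a browser. This is a minimal sketch using only the Python standard library; the names `is_up` and `check_services` are hypothetical helpers, and the URLs assume the default ports listed above have not been changed.

```python
from urllib.error import URLError
from urllib.request import urlopen

# URLs taken from the section above; adjust them if the compose ports change.
SERVICES = {
    "frontend": "http://localhost:3000",
    "backend docs": "http://localhost:8989/docs",
}


def is_up(url, timeout=2.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def check_services(services=SERVICES):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: is_up(url) for name, url in services.items()}
```

Calling `check_services()` returns a dictionary keyed by service name, so a failed container shows up as a `False` entry instead of an exception.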
FastAPI uses Swagger UI to generate interactive documentation for exploring and calling the api and its underlying dataset.
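The same information that Swagger UI displays is available as a machine-readable OpenAPI schema. The sketch below lists the routes found in such a schema. It assumes the backend keeps FastAPI's default schema path, `/openapi.json`, which this project may have changed; `list_routes` and `fetch_schema` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: the backend keeps FastAPI's default schema path.
OPENAPI_URL = "http://localhost:8989/openapi.json"

# Keys under a path item that are HTTP operations (others, such as
# "parameters", are shared settings and not routes).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema):
    """Return (METHOD, path) pairs from a parsed OpenAPI schema, sorted by path."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


def fetch_schema(url=OPENAPI_URL, timeout=5.0):
    """Download and parse the OpenAPI schema served by the backend."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the containers running, `list_routes(fetch_schema())` gives a quick overview of the available endpoints without opening the browser.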
Note: By default, the project ships with a MySQL database schema and a minimal dataset, served by a MySQL Docker container.
Note: the gui part of the project was developed on an x86_64 cpu architecture running the Ubuntu 22.04 operating system. The following steps describe how to build the project on that specific architecture and setup only.
As our gui was built using OpenCV and Tesseract, we first need to install both dependencies by following their installation instructions.
The simplest way to build the project as configured is to use Qt Creator to generate the final executable. Installing Qt Creator automatically installs all the dependencies it requires.
- FastAPI : Python Web framework to build APIs
- SQLAlchemy : SQL toolkit and ORM (for database interactions)
- Pydantic : Data validation and settings management
- Uvicorn : ASGI web server
- React : Frontend JavaScript library
- React-Bootstrap : Bootstrap frontend components (wip)
- Nginx : HTTP web server
- Qt : Cross-Platform application development framework for desktop, embedded and mobile
- QML : Multi-Paradigm Language for creating highly dynamic applications in Qt
- QtQuick : Standard library for writing QML applications
- OpenCV : Open Source Computer Vision Library
- Tesseract : Open Source OCR Engine
- Docker : Open platform to build, ship, and run distributed applications
- Docker compose : Define and run multi-container applications with Docker
Work In Progress...