This project is a proof of concept that uses Qt/QML and Tesseract to extract text from images and feed it into a book dataset.

aperlini/OCR-book-dataset

OCR-book-dataset

A simple project built with OpenCV and Tesseract on the C++ / Qt framework that reads screenshots of book pages and extracts their text so it can be stored in a dataset. The aim is to automate manual note-taking and to provide an interface where the user can select book extracts to save.

The project uses the Python FastAPI web framework to provide an API, along with a simple frontend client built with React and the Nginx web server. The Qt GUI processes images, converts them to text, and generates new entries for the dataset.
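To make the flow from recognized text to dataset entry more concrete, here is a minimal sketch of how a client could package an extract as a JSON request for the API. The route `/extracts` and the field names `title` and `text` are hypothetical and are not taken from this project; the real routes are listed in the backend's interactive docs. The port is the one given in the installation notes.

```python
import json
import urllib.request

# Backend address as given in the installation notes.
API_BASE = "http://localhost:8989"
# Hypothetical route; the project's real route may differ.
EXTRACTS_PATH = "/extracts"


def build_extract_request(title: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one recognized extract."""
    # Hypothetical field names; OCR output often carries stray whitespace, so trim it.
    payload = json.dumps({"title": title, "text": text.strip()}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + EXTRACTS_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_extract_request("Some Book", "  recognized text  ")
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```

The request is only built here, not sent, so the sketch can be run without the containers being up.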

Content

Demo

1. Default Client

webservice

2. OCR GUI converting an image selection to text

ocr

3. OCR GUI saving a new book extract to the database

new-entry

4. Refreshing the client to see the change

refresh

Dependencies

API requirements

GUI requirements

Installation

This repository is split into two main directories. The api folder contains all of our frontend and backend services, while the ocr folder gathers all files related to our GUI:

.
├── api
│   ├── backend
│   ├── db
│   └── frontend
└── ocr
    └── gui

Build and run the API

First we need to build our backend and frontend services. A docker-compose.yml file inside the api directory handles this. To build the images and run the necessary containers:

cd api && docker compose up --build

Once our containers are running, we can access both our frontend and backend services.
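A quick way to confirm that both services respond is a small reachability check. This is a sketch that assumes the default addresses used in this README (port 3000 for the frontend, port 8989 for the backend docs); the helper name `is_up` is illustrative.

```python
import http.client
import urllib.error
import urllib.request

# Default addresses from this README; adjust them if the compose ports were changed.
SERVICES = {
    "frontend": "http://localhost:3000",
    "backend docs": "http://localhost:8989/docs",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        state = "up" if is_up(url) else "unreachable"
        print(f"{name:13} {url:32} {state}")
```

If a service shows as unreachable right after startup, the containers may still be initializing, so retrying after a few seconds is reasonable.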

Frontend (Default view)

Running at : http://localhost:3000

frontend-default

Backend (Interactive API docs)

Running at : http://localhost:8989/docs

backend-default

FastAPI uses Swagger UI to generate interactive documentation for exploring and calling the API and its underlying dataset.
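The same information that Swagger UI displays is available as JSON: by default FastAPI serves the OpenAPI schema at /openapi.json. The sketch below lists the operations found in such a schema. The function names are illustrative, and the sample schema in the demo is made up and does not reflect this project's actual routes.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path then method."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def fetch_schema(base_url: str = "http://localhost:8989") -> dict:
    """Download the OpenAPI schema from a running backend (FastAPI's default location)."""
    with urllib.request.urlopen(base_url + "/openapi.json", timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    # Made-up schema fragment for illustration; use fetch_schema() against the live API.
    sample = {"paths": {"/books": {"get": {}, "post": {}}, "/books/{id}": {"delete": {}}}}
    for method, path in list_operations(sample):
        print(f"{method:7} {path}")
```

With the containers running, `list_operations(fetch_schema())` should give the routes that the interactive docs page shows.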

Note: By default, the project ships with a MySQL database schema and a minimal dataset, running in a MySQL Docker container.

Build the OCR (GUI)

Note: the GUI part of the project was developed on an x86_64 CPU architecture running Ubuntu 22.04. The following steps describe how to build the project on that specific architecture and setup only.

OpenCV and Tesseract libraries installation

As our GUI was built using OpenCV and Tesseract, we first need to install these dependencies by following these instructions:

Qt Creator installation

The simplest way to build the project as configured is to use Qt Creator to generate the final executable. Installing Qt Creator will automatically install all the dependencies needed to manage it.

Technology Stack

Backend

  • FastAPI : Python Web framework to build APIs
  • SQLAlchemy : SQL toolkit and ORM (for database interactions)
  • Pydantic : Data validation and settings management
  • Uvicorn : ASGI web server

Frontend

GUI

  • Qt : Cross-Platform application development framework for desktop, embedded and mobile
  • QML : Multi-Paradigm Language for creating highly dynamic applications in Qt
  • QtQuick : Standard library for writing QML applications
  • OpenCV : Open Source Computer Vision Library
  • Tesseract : Open Source OCR Engine

Deployment

  • Docker : Open platform to build, ship, and run distributed applications
  • Docker Compose : Define and run multi-container applications with Docker

Releases

Work In Progress...