Textractor

Textractor is an OCR application for Sailfish OS. Main features:

OCR can be run on:

an image taken with the app
an image selected from the device
a PDF file (one or multiple pages)

The crop region can be any reasonable quadrilateral, and perspective correction is applied to the selection. The user has access to advanced image preprocessing settings.
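To illustrate what perspective correction of a quadrilateral selection involves, here is a minimal sketch in plain Python. It is not taken from the Textractor sources; the function names (`solve_linear`, `homography`, `warp_point`) and the sample coordinates are hypothetical. It assumes the four selected corners should map onto the corners of an upright rectangle and solves for the 8 parameters of the projective transform.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations touch both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return h0..h7 of the projective map sending each src point to dst.

    u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1)
    v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h, x, y):
    """Apply the projective map to a single point."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical selection corners (clockwise from top-left) and target rectangle.
quad = [(10, 20), (110, 10), (120, 90), (5, 100)]
rect = [(0, 0), (100, 0), (100, 80), (0, 80)]
h = homography(quad, rect)
```

To warp a whole image, one would typically compute the inverse map (rectangle to quadrilateral) and sample the source image for every output pixel; that step is omitted here.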

Found text can be edited or copied to the clipboard. Because Sailfish OS (SFOS) is a true multitasking OS, the whole OCR process can run in the background while the user uses the device for other purposes.

Documentation and Help

Textractor Documentation

Environment and building

To build this, follow this Gist to set up the environment correctly: https://gist.github.com/skvark/49a2f1904192b6db311a

In short:

Add my repositories containing Tesseract OCR and Leptonica to the build machine targets.

Preprocessing

Tesseract OCR is just a plain recognition engine, so Leptonica is used to preprocess the image.

Currently, the following steps are performed before the image is passed to the engine for recognition:

The image is first opened with QImage, its DPI is set to 300, it is rotated according to the device angle, and it is saved in JPG format.
The JPG image is loaded with Leptonica and converted from 32 bpp to 8 bpp grayscale.
An unsharp mask is applied.
Local background normalization is applied with Otsu's algorithm.
The skew angle is detected and the image is rotated (Leptonica decides whether the image needs to be rotated).

After those steps the image is passed to Tesseract.
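The grayscale, unsharp mask, and thresholding steps above can be sketched in plain Python to show the idea. This is only an illustration under stated assumptions, not the app's code: Textractor uses Leptonica for these steps, whereas the sketch below uses a 3x3 box blur for the unsharp mask and a single global Otsu threshold instead of Leptonica's local background normalization. All function names and the `amount` parameter are hypothetical, and skew detection is left out.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to 8-bit gray values (luma weights)."""
    return [[min(255, round(0.299 * r + 0.587 * g + 0.114 * b))
             for r, g, b in row] for row in rgb_rows]


def unsharp_mask(gray, amount=0.5):
    """Sharpen by adding back the difference from a 3x3 box blur."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Average the 3x3 neighborhood, clamping coordinates at borders.
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += gray[yy][xx]
            blurred = acc / 9.0
            sharp = gray[y][x] + amount * (gray[y][x] - blurred)
            row.append(min(255, max(0, round(sharp))))
        out.append(row)
    return out


def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class
    sum0 = 0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map pixels <= t to black (0) and the rest to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]


def preprocess(rgb_rows):
    """Run the three sketched steps in order and return a binary image."""
    gray = unsharp_mask(to_gray(rgb_rows))
    return binarize(gray, otsu_threshold(gray))
```

A global threshold like this works on evenly lit images; uneven lighting is the reason a local background normalization step is used before thresholding in practice.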

Test image and result

Original:

Preprocessed:

Extracted text:

This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.






 D R I N K  COFFEE
L Do Stupid Faster
 With More Energy
