The main task was to recognize receipts from photos.
Tesseract OCR was used as the primary tool. The library's advantages are its trained language models (more than 192), several recognition modes (treating the image as a single word, as a block of text, or as vertical text), and easy setup. Since Tesseract OCR is written in C++, a third-party wrapper from GitHub was used.
Tesseract versions differ in their trained models; version 4 is more accurate, so I used it.
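The recognition modes and the engine version above correspond to Tesseract's `--psm` (page segmentation mode) and `--oem` (OCR engine mode) options. The wrapper used in the original isn't named, so this minimal sketch calls the `tesseract` command-line tool instead, assuming it is installed and on `PATH`. The `PSM_BY_KIND` mapping and the helper names are hypothetical; the PSM numbers are Tesseract's own (8 = single word, 6 = single uniform block of text, 5 = single uniform block of vertically aligned text), and `--oem 1` selects the LSTM engine introduced in version 4.

```python
import subprocess
from typing import List

# Hypothetical mapping from the recognition kinds described above
# to Tesseract's --psm values.
PSM_BY_KIND = {
    "word": 8,      # treat the image as a single word
    "block": 6,     # assume a single uniform block of text
    "vertical": 5,  # single uniform block of vertically aligned text
}

# --oem 1 = LSTM neural-net engine only (available from Tesseract 4).
OEM_LSTM = 1


def build_tesseract_cmd(image_path: str, lang: str = "eng",
                        kind: str = "block") -> List[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "--oem", str(OEM_LSTM),
        "--psm", str(PSM_BY_KIND[kind]),
    ]


def recognize(image_path: str, lang: str = "eng", kind: str = "block") -> str:
    """Run tesseract and return the recognized text (raises on failure)."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang, kind),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For a receipt photo, `recognize("receipt.png", kind="block")` would be a reasonable starting point, since receipts are mostly uniform blocks of text.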
Text recognition requires a trained data file for each language (one file per language). Download here.
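Tesseract looks for these files as `<lang>.traineddata` in its tessdata directory (the location can be set with the `TESSDATA_PREFIX` environment variable or the `--tessdata-dir` option), and several languages are requested by joining their codes with `+`, e.g. `eng+rus`. The small sketch below checks which requested languages are still missing before running recognition; the helper names and the example directory are hypothetical.

```python
import os
from typing import Iterable, List


def traineddata_path(tessdata_dir: str, lang: str) -> str:
    """Path where Tesseract expects the trained data file for `lang`."""
    return os.path.join(tessdata_dir, f"{lang}.traineddata")


def missing_languages(tessdata_dir: str, langs: Iterable[str]) -> List[str]:
    """Return the languages whose .traineddata file is not downloaded yet."""
    return [lang for lang in langs
            if not os.path.isfile(traineddata_path(tessdata_dir, lang))]


def lang_arg(langs: Iterable[str]) -> str:
    """Value for tesseract's -l option when several languages are used."""
    return "+".join(langs)
```

For example, `missing_languages("/usr/share/tessdata", ["eng", "rus"])` (the path is only an assumption about where tessdata lives) tells you which files still need to be downloaded.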
The better the image quality (size, contrast, lighting), the better the recognition result.
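Two of these factors, size and contrast, can be partly improved in software before OCR. The pure-Python sketch below shows min-max contrast stretching and integer nearest-neighbour upscaling on a grayscale image stored as rows of 0..255 values; it is an illustration of the idea, not the processing used in the original, and it does nothing for uneven lighting.

```python
from typing import List

Gray = List[List[int]]  # grayscale image: rows of 0..255 pixel values


def stretch_contrast(img: Gray) -> Gray:
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def upscale(img: Gray, factor: int) -> Gray:
    """Enlarge by an integer factor, repeating each pixel (nearest neighbour)."""
    return [
        [p for p in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]
```

With OpenCV the same steps are usually done with `cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)` and `cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)`, where cubic interpolation gives smoother glyph edges than pixel repetition.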
Images were also preprocessed with the OpenCV library before recognition. OpenCV is written in C++ and there was no suitable wrapper for our solution, so I wrote my own wrapper exposing the image-processing functions we needed. The main difficulty was choosing the right filters for preprocessing. OpenCV can also find receipt/text outlines, but this has not been researched enough yet. With preprocessing, the result was 5-10% better.
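The text doesn't say which filters were finally chosen. One common preprocessing step for receipts is binarization, turning the photo into black text on a white background, and a standard way to pick the threshold automatically is Otsu's method. The pure-Python sketch below implements Otsu's method on a grayscale image stored as rows of 0..255 values; it is a swapped-in example technique, not necessarily the filter used in the original wrapper.

```python
from typing import List

Gray = List[List[int]]  # grayscale image: rows of 0..255 pixel values


def otsu_threshold(img: Gray) -> int:
    """Return the threshold t that maximizes the between-class variance."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty: variance undefined
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Gray, t: int) -> Gray:
    """Pixels brighter than t become white (255), the rest black (0)."""
    return [[255 if p > t else 0 for p in row] for row in img]
```

In OpenCV the equivalent call is `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, which returns the chosen threshold and the binarized image. A global threshold like this struggles with uneven lighting; in that case `cv2.adaptiveThreshold` is a common alternative.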