Viktor Kinko

Character recognition

One of the most important modern trends in software is computer vision. This technology lets programs analyze the information in images and video files, for example to read text or to locate certain objects.

To study this technology in practice, I was given the task of detecting a cup in a photo. I decided to implement it with Android and OpenCV (http://opencv.org/). OpenCV is an open source computer vision library available for C++, Python, Java and many other languages. It has many functions, but here we are interested in its ability to process images and search them for objects using the Viola-Jones cascade algorithm.

The Viola-Jones algorithm is a method for detecting objects in images based on Haar-like features. Its main advantages are high speed and a low false alarm rate. The algorithm was originally developed for face detection, but it can be trained to detect other objects. It splits an image into regions, evaluates the brightness in these regions, and discards the regions that clearly do not contain the object. OpenCV implements this algorithm as a separate function that takes two inputs: a classifier file, which defines the weights the algorithm works with, and the image to be searched.
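
To make "evaluating the brightness in regions" more concrete, here is a minimal sketch of an integral image and one two-rectangle Haar-like feature in plain Java. It is an illustration only, not the OpenCV implementation; the class and method names (HaarSketch, integralImage, rectSum, leftRightFeature) are my own assumptions.

```java
// Illustrative sketch: integral image and a two-rectangle Haar-like feature.
// All names here are hypothetical and are not part of OpenCV.
final class HaarSketch {

    // Builds a summed-area table with one extra row and column of zeros,
    // so ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    static long[][] integralImage(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        long[][] ii = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                rowSum += gray[y][x];
                ii[y + 1][x + 1] = ii[y][x + 1] + rowSum;
            }
        }
        return ii;
    }

    // Sum of the pixels in the rectangle [x, x+w) x [y, y+h) using four lookups.
    static long rectSum(long[][] ii, int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }

    // Two-rectangle feature: brightness of the left half minus the right half.
    static long leftRightFeature(long[][] ii, int x, int y, int w, int h) {
        int half = w / 2;
        return rectSum(ii, x, y, half, h) - rectSum(ii, x + half, y, half, h);
    }

    public static void main(String[] args) {
        int[][] gray = {
            {200, 200, 10, 10},
            {200, 200, 10, 10},
        };
        long[][] ii = integralImage(gray);
        // Bright left half (800) minus dark right half (40).
        System.out.println(leftRightFeature(ii, 0, 0, 4, 2)); // prints 760
    }
}
```

Because every rectangle sum costs only four table lookups regardless of its size, a detector can evaluate a large number of such features per window, which is one reason the method is fast.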

To train the classifier I used the Cascade-Trainer-GUI program (http://amin-ahmadi.com/cascade-trainer-gui/), which provides a graphical interface for two standard tools from the OpenCV toolset: opencv_createsamples and opencv_traincascade.
The interface of the Cascade-Trainer-GUI program
Training begins with preparing the input data. The inputs are images that contain the target object and images that do not. Images with the object are considered positive and go into the p folder; images without it are negative and go into the n folder. Note that a positive image should contain only the object to be recognized and nothing else. Otherwise you have to create a file with the coordinates of the object in the image.

opencv_createsamples is a program that generates the sample files for opencv_traincascade. Negative samples are produced by cropping: a random region is cut out of a negative picture. Positive samples are created by placing a positive image of the object onto a negative one, with minor changes in brightness, rotation, and perspective.
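
The cropping of negatives described above can be sketched in plain Java as follows. This is a simplified illustration using the standard BufferedImage class, not the code of the OpenCV tools; NegativeCropper and randomCrops are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: cut random fixed-size regions out of a negative picture.
// Names are hypothetical; the OpenCV tools do this internally.
final class NegativeCropper {

    static List<BufferedImage> randomCrops(BufferedImage src, int w, int h,
                                           int count, Random rng) {
        if (w > src.getWidth() || h > src.getHeight()) {
            throw new IllegalArgumentException("crop is larger than the source image");
        }
        List<BufferedImage> crops = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Pick a top-left corner so the crop stays inside the image.
            int x = rng.nextInt(src.getWidth() - w + 1);
            int y = rng.nextInt(src.getHeight() - h + 1);
            // getSubimage returns a view that shares pixel data with src.
            crops.add(src.getSubimage(x, y, w, h));
        }
        return crops;
    }

    public static void main(String[] args) {
        BufferedImage negative = new BufferedImage(320, 240, BufferedImage.TYPE_INT_RGB);
        List<BufferedImage> crops = randomCrops(negative, 30, 40, 5, new Random(1));
        System.out.println(crops.size()); // prints 5
    }
}
```

This is how a small set of negative photos can yield many more negative samples than there are source pictures.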

Once the samples are created, opencv_traincascade takes them and determines the weights and the pass thresholds for the cascade classifier. The weights should be chosen so that the classifier rejects all negative images and accepts all positive ones.
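
To show what the weights and thresholds are used for, here is a minimal sketch of how a trained cascade might evaluate one image window. It assumes the feature values are already computed and uses simple threshold-based weak classifiers; the names (CascadeSketch, Stump, Stage, accepts) are illustrative and do not correspond to the structure of an actual OpenCV classifier file.

```java
// Illustrative sketch of cascade evaluation; names and structure are assumptions.
final class CascadeSketch {

    // A weak classifier: thresholds one feature value and votes with one of two weights.
    static final class Stump {
        final int featureIndex;
        final double threshold, below, above;

        Stump(int featureIndex, double threshold, double below, double above) {
            this.featureIndex = featureIndex;
            this.threshold = threshold;
            this.below = below;
            this.above = above;
        }

        double vote(double[] features) {
            return features[featureIndex] < threshold ? below : above;
        }
    }

    // A stage sums the votes of its weak classifiers and compares them to its threshold.
    static final class Stage {
        final Stump[] stumps;
        final double stageThreshold;

        Stage(Stump[] stumps, double stageThreshold) {
            this.stumps = stumps;
            this.stageThreshold = stageThreshold;
        }

        boolean passes(double[] features) {
            double sum = 0;
            for (Stump s : stumps) {
                sum += s.vote(features);
            }
            return sum >= stageThreshold;
        }
    }

    // A window is accepted only if it passes every stage; the first failure rejects it.
    static boolean accepts(Stage[] cascade, double[] features) {
        for (Stage stage : cascade) {
            if (!stage.passes(features)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Stage[] cascade = {
            new Stage(new Stump[] {new Stump(0, 100.0, -1.0, 1.0)}, 0.0),
            new Stage(new Stump[] {new Stump(1, 50.0, 1.0, -1.0),
                                   new Stump(0, 200.0, -0.5, 0.5)}, 0.5),
        };
        System.out.println(accepts(cascade, new double[] {250.0, 10.0})); // prints true
        System.out.println(accepts(cascade, new double[] {50.0, 10.0}));  // prints false
    }
}
```

Most windows in a photo fail an early stage, so only a small fraction ever reaches the later, more expensive stages.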

But that is only the theory. In practice these steps are carried out by Cascade-Trainer-GUI, which needs only two folders of photographs to get started, one called p and the other called n, holding the positive and negative samples respectively. To train the classifier, I took 47 pictures of my object with different camera angles, lighting angles, tilts and brightness. I then doubled this number by mirroring each picture. There were only 42 negative photos, but slicing them produced 1000 negative samples. Some of the negative images were taken from the Internet for variety.
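
Doubling the positive set by mirroring can be sketched like this. The example assumes a horizontal flip (left and right swapped) and works on an in-memory BufferedImage; ImageMirror and mirror are hypothetical names, and reading and writing the actual files is left out.

```java
import java.awt.image.BufferedImage;

// Illustrative sketch: create a mirrored copy of a picture (horizontal flip assumed).
final class ImageMirror {

    static BufferedImage mirror(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // The pixel in column x moves to column w - 1 - x.
                out.setRGB(w - 1 - x, y, src.getRGB(x, y));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(3, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0xFF0000); // red pixel in the leftmost column
        BufferedImage flipped = mirror(src);
        // The red pixel is now in the rightmost column.
        System.out.println(Integer.toHexString(flipped.getRGB(2, 0) & 0xFFFFFF)); // prints ff0000
    }
}
```

Mirroring is only useful when the mirrored object is still a plausible view of the real one, which holds for a cup seen from the side.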
Positive samples from the folder p
Negative samples in the folder n
After preparing the data we configure the training tool. By and large, most settings do not need to change because they are already set to optimal values, but I had to change two options on the train-cascade tab: sample width and sample height. These options set the proportions of the samples and therefore of the region the classifier will later detect. If you leave them as they are (both equal 24 by default), the detected region will be a square. Because a cup does not fit well in a square, I set these parameters to 30 and 40 respectively, based on the proportions of my positive samples.
The settings of the Cascade-Trainer-GUI program
To obtain the result we start the training. The time it takes depends on the number of input samples. At first I used a small number of samples, which gave a poorly performing classifier, although training was quite fast (about five minutes). With 94 positive images, training the classifier took about three hours.
An example of a classifier file
Let’s write a small Android program that uses the classifier file. It takes an image from the camera, analyzes it with the classifier and highlights the areas that contain the desired object. Below I give the main points of the code relevant to our problem.

OpenCV initialization:
Note that OpenCV is initialized with the initDebug function, not initAsync. This is because in this case the OpenCV libraries are linked into the project directly.

Opening the classifier file:
Using the classifier to obtain a list of areas:
This is the function that recognizes the object in the image. It returns a matrix of Rect objects, i.e. the recognized areas. The line
detector.detectMultiScale(mat, items, 1.1, 3, 0, new Size(25, 25), new Size());
takes many parameters. The first is the source image in OpenCV matrix format. The second is the matrix of results. The other parameters are: scale factor (how much the image size changes on each pass of the algorithm; the default is 1.1), minimum neighbors (helps eliminate false positives and merges multiple detections of the same area; the default is 3), flags (an unused legacy parameter), and minimum and maximum size (the bounds for the detected area). The minimum size is set to 25 pixels because false alarms occur more often on smaller areas. The maximum size is left unlimited.
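
To illustrate how the scale factor and the minimum size interact, the sketch below lists the detection window sizes a multi-scale search would try. It is a simplified model of the idea, not the OpenCV source: it assumes the window starts at the trained sample size (30 by 40 in my case) and grows by the scale factor until it no longer fits in the image. ScaleSketch and windowSizes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of a multi-scale search; not the actual OpenCV implementation.
final class ScaleSketch {

    // Returns the {width, height} pairs of the windows that would be tried.
    static List<int[]> windowSizes(int baseW, int baseH, double scaleFactor,
                                   int minW, int minH, int imgW, int imgH) {
        if (scaleFactor <= 1.0) {
            throw new IllegalArgumentException("scaleFactor must be greater than 1");
        }
        List<int[]> sizes = new ArrayList<>();
        for (double f = 1.0; ; f *= scaleFactor) {
            int w = (int) Math.round(baseW * f);
            int h = (int) Math.round(baseH * f);
            if (w > imgW || h > imgH) {
                break;      // the window no longer fits in the image
            }
            if (w < minW || h < minH) {
                continue;   // below the minimum size: skip this scale
            }
            sizes.add(new int[] {w, h});
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 30x40 base window, scale factor 1.1, minimum 25x25, on a 640x480 image.
        for (int[] s : windowSizes(30, 40, 1.1, 25, 25, 640, 480)) {
            System.out.println(s[0] + "x" + s[1]);
        }
    }
}
```

A scale factor closer to 1 produces more intermediate sizes, which makes the search slower but less likely to miss an object whose size falls between two steps.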

The result is a phone app that highlights the cup in photos with a green rectangle. It cannot yet eliminate false positives, so the detector sometimes selects what looks like a completely random area. Nevertheless, the desired object is almost always highlighted in the photos.
An example of the correct work of the classifier
An example of the correct work of the classifier
An example of a false alarm from the classifier (a selection around a shadow)
An example of an image with no object present
In conclusion, object recognition is not a particularly difficult task. A few hours spent preparing the samples and training the classifier are enough. The only obstacle is the OpenCV library itself. It can be included in the project directly, but the library is large; in addition, the OpenCV developers describe this method as a debug connection. The proper way is for the app to prompt the user, during OpenCV initialization, to download the library separately onto the phone. The problem with this approach is that you have to explain to the user somehow that, in order to enable recognition, they need to download a separate app.

An alternative to OpenCV is a web service for image recognition, for example Clarifai (https://www.clarifai.com/). This method is slower and not suitable for recognizing objects in real time, but it greatly reduces the size of the final application. If the size of the final application (or having the user install a third-party OpenCV app) is not an issue, then the OpenCV approach is the right one.
Thanks for reading!