Object Recognition with Vision

ML (machine learning) lets us transform various input data into the output we need, for example turning an image into a description of its content. The main challenge of working with ML is training the model.

Vision is an Apple framework for performing various operations on images and video: face, text, and object recognition. In this article we'll look at object classification with a trained ML model.
For object recognition with Vision you need an ML model: you can take a ready-made one or train your own. Training can be performed right in a Playground: you give the model folders of images, one per object class it should recognize, and training produces a file with the .mlmodel extension (a sketch of this step follows below). The result is a model with defined input and output: the input is an image, and the output is usually text, most often a description of the image content or the position of the recognized object in the input.
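A minimal sketch of training in a macOS Playground with Create ML might look like this. It assumes a folder whose subfolders are named after each class; trainingFolderURL and outputURL are placeholder paths, not values from the original project:

import CreateML
import Foundation

// Sketch only: the training folder contains one subfolder per class,
// and each subfolder name becomes a class label.
let trainingFolderURL = URL(fileURLWithPath: "/path/to/TrainingImages")
let outputURL = URL(fileURLWithPath: "/path/to/YourModelName.mlmodel")

let trainingData = MLImageClassifier.DataSource.labeledDirectories(at: trainingFolderURL)

// Train the classifier and save the resulting .mlmodel file.
let classifier = try MLImageClassifier(trainingData: trainingData)
try classifier.write(to: outputURL)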
We'll use this file in our application. To simplify working with the model, we import the CoreML framework; Xcode generates a class named after the .mlmodel file, so we can initialize the model by that class name instead of compiling and loading it from a URL:

import CoreML

// Generic initialization from a compiled model URL:
// let model = try MLModel(contentsOf: urlToModel)

// Simpler: use the class Xcode generates from the .mlmodel file.
let model = YourModelName()
For recognition, the image has to be converted into a CVPixelBuffer. The input image usually also needs to be resized to the model's exact input dimensions (in our case 224x224).
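A minimal sketch of such a conversion is shown below; pixelBuffer(from:) is our own helper name, not a system API:

import UIKit
import CoreVideo

// Resizes a UIImage to the model's input size and renders it into a CVPixelBuffer.
func pixelBuffer(from image: UIImage, width: Int = 224, height: Int = 224) -> CVPixelBuffer? {
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32ARGB, attrs, &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                  width: width, height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else { return nil }

    // Flip the coordinate system so UIImage.draw renders upright, then draw scaled.
    UIGraphicsPushContext(context)
    context.translateBy(x: 0, y: CGFloat(height))
    context.scaleBy(x: 1, y: -1)
    image.draw(in: CGRect(x: 0, y: 0, width: width, height: height))
    UIGraphicsPopContext()

    return pixelBuffer
}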


After conversion we run recognition with the generated class's prediction method, for example model.prediction(image: pixelBuffer) (the parameter name matches the model's input feature), and read the model's output.
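As a rough sketch, assuming the model's input feature is named image and its output exposes a classLabel string (as Create ML image classifiers do), and with inputImage standing in for your source image:

if let buffer = pixelBuffer(from: inputImage) {
    do {
        // Run the model and read its strongest class label.
        let output = try model.prediction(image: buffer)
        print("Recognized: \(output.classLabel)")
    } catch {
        print("Prediction failed: \(error)")
    }
}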
For models whose output is an object's position in the image, we use Vision. We wrap the model in a VNCoreMLModel, run a VNCoreMLRequest over the input image, and then, for example, draw a frame around each detected object.
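A minimal sketch of that flow, assuming a detection-style model generated as YourModelName (VNCoreMLModel, VNCoreMLRequest, VNRecognizedObjectObservation and VNImageRequestHandler are the actual Vision types; the drawing step is left as a comment):

import Vision

func detectObjects(in cgImage: CGImage) {
    guard let visionModel = try? VNCoreMLModel(for: YourModelName().model) else { return }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Object-detection models return their results as VNRecognizedObjectObservation.
        guard let observations = request.results as? [VNRecognizedObjectObservation] else { return }
        for observation in observations {
            let label = observation.labels.first?.identifier ?? "unknown"
            // boundingBox is in normalized coordinates (0...1, origin at the bottom-left).
            print("\(label): \(observation.boundingBox)")
            // Draw a frame around the object here, e.g. with a CALayer overlay.
        }
    }
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}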
Thanks for reading!