[Model Optimization] Model Optimization with the OpenVINO toolkit


The main content will be covered in this blog post:

  • OpenVINO ?!
  • Basic inference workflow
  • Model Optimization
  • Inference mode
  • Benchmarks
  • OpenVINO with OpenCV
  • OpenVINO model server
  • Cons
  • Some other toolkits / platforms
  • Some common use-cases and conclusion

  • A series of other articles about Model Compression, Model Pruning, Multi-task Learning by Model Pruning, and Model Optimization, all written by members of Team AI – Sun* R&D Unit.
  • In the process of building a product with Machine Learning inside, in addition to algorithm modeling, the stages related to engineering and deployment also play a very important role in completing the product. Some of these tasks are model compression, model pruning, model quantization, … which trim and quantize the model to make it lighter and minimize inference time, while the accuracy of the model changes only negligibly. After that, you can apply a platform like TensorFlow Serving to optimize performance when handling requests to the model.


Reference: https://mobile.twitter.com/mlpowered/status/1194788560357842944

  • A bit of backstory: the reason I know about OpenVINO is that, while implementing some projects at the company, we needed to annotate quite specific data types such as images and videos. After searching through and trying several annotation tools, I found that CVAT was quite suitable for the needs of the project. One thing I find quite impressive about CVAT is its auto-annotation mode, which uses OpenVINO to significantly improve the inference speed of a pretrained model. As a result, for a video tens of minutes long, CVAT (+ OpenVINO) takes less than a minute to process it.
  • A quick plug for some good and cool features of CVAT, the annotation platform:
    • Free, open source, web-based annotation platform for computer vision tasks
    • One of the main repositories of the OpenCV organization
    • Simple interface, easy to use
    • Documented REST API, suitable for customizing CVAT itself to serve different data annotation purposes.
    • Many annotation modes for different problems: Annotation mode (image classification), Interpolation mode (auto-annotation mode) and Segmentation mode (auto-segmentation mode)
    • Supports various annotation types: bbox (object detection), polygon (segmentation), polyline, point, auto segment.
    • Supports exporting to many formats: CVAT format, Pascal VOC, YOLO, COCO JSON (object detection + segmentation), PNG mask (segmentation), TFRecord (TensorFlow Object Detection API)
    • Supports an auto-annotation mode for object detection using pretrained models from the TF Model Zoo and OpenVINO.
    • Supports semi-automatic segmentation, demonstrated here: https://www.youtube.com/watch?v=vnqXZ-Z-VTQ
    • Because it is open source, it is completely customizable for specific purposes and use-cases. The core backend of CVAT is written in Django (Python). A good example is Onepanel, which has customized CVAT and integrated it into their own system: https://www.onepanel.io/
  • So what is OpenVINO, and how does it help when bringing a model into a practical system? Let's find out.

OpenVINO ?!

  • The OpenVINO toolkit, built and developed by Intel, was created to optimize the performance of models on Intel's own processors and to improve inference time when deploying models on many platforms (CPU / GPU / VPU / FPGA).

OpenVINO provides developers with improved neural network performance on a variety of Intel® processors and helps them further unlock cost-effective, real-time vision applications

  • OpenVINO stands for Open Visual Inference and Neural network Optimization toolkit. As the name suggests, OpenVINO was developed with the purpose of improving a model's inference ability, especially for visual and computer vision problems such as Image Classification, Object Detection, Object Tracking, …
  • Some noteworthy points of OpenVINO toolkit:
    • Improves the performance and inference time of the model
    • Because it is developed by Intel, it supports multiple platforms, from CPUs / GPUs to embedded and edge devices such as VPUs (Vision Processing Units, e.g., the Intel Movidius Myriad family) and FPGAs.
    • The same inference API is used everywhere; to run IR (Intermediate Representation of OpenVINO) models on different platforms, you only need to change the target device.
    • Provides a large collection of already-optimized models, and converting your own models to OpenVINO's IR intermediate format is also quite easy to do.
    • Supports loading IR format files (OpenVINO) with popular image processing / computer vision libraries such as OpenCV and OpenVX
    • Consists of 2 main parts: the Model Optimizer, which converts and optimizes trained models into the IR format, and the Inference Engine, which executes IR models on the target device.

Basic Inference Workflow

  • We need to convert the pretrained model to the IR format, or Intermediate Representation, of OpenVINO. The IR format consists of the following files:
    • frozen-*.xml: the network topology, an XML file describing the model's layers, i.e. the network graph.
    • frozen-*.bin: the binary data of the weights and biases of the model; the weights can be stored in the formats FP32, FP16 or INT8.
  • OpenVINO also supports converting to IR format for most popular frameworks such as:
    • Caffe
    • Tensorflow
    • MXNet
    • Kaldi
    • ONNX
    • Keras / PyTorch (indirectly, see below)
  • In addition, some frameworks such as Keras and PyTorch do not support direct conversion from their pretrained models, but can go through other intermediate formats (for example, PyTorch → ONNX → IR); a minimal sketch of that first step is shown below:

  • The overall processing flow with OpenVINO: train the model in a framework of your choice, convert it to the IR format with the Model Optimizer, then deploy it with the Inference Engine.
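A minimal sketch of the PyTorch → ONNX step, assuming a torchvision classification model (the model, file name and input shape are illustrative); the resulting .onnx file can then be fed to OpenVINO's Model Optimizer:

```python
import torch
import torchvision

# Load a pretrained model and switch it to inference mode
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# A dummy input defines the input shape (NCHW) traced during export
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX; the .onnx file is later converted to IR by the Model Optimizer
torch.onnx.export(model, dummy_input, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])
```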

Model Optimization

  • OpenVINO supports converting to the IR format with the precisions FP32, FP16 and INT8, which is in effect a form of model quantization.

  • Converting to formats such as FP16 or INT8 reduces the size of the model, reduces memory usage when serving requests, helps handle more requests and increases inference speed, while the accuracy of the model changes only negligibly.
  • However, which formats are supported depends on the device (CPU / GPU / VPU / FPGA):

[Image: supported precision formats per device]

  • The sample conversion command is quite concise, as sketched below.
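A minimal sketch of converting a TensorFlow frozen graph to IR with the classic Model Optimizer entry point (the paths, input shape and output directory are illustrative):

```
python mo_tf.py \
    --input_model frozen_inference_graph.pb \
    --input_shape "[1,224,224,3]" \
    --data_type FP16 \
    --output_dir ./ir_model
```

The --data_type flag selects the precision of the generated .bin weights (FP32 / FP16).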

  • By default, OpenVINO provides a dedicated *.py entry point for each supported framework (for example mo_tf.py, mo_caffe.py, mo_mxnet.py, mo_onnx.py) in addition to the generic mo.py.


Inference mode

  • Sample code implementing inference from the IR format is sketched below.
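A minimal sketch using the classic Inference Engine Python API (IECore, available in OpenVINO releases up to 2021); the IR file names and the test image are illustrative:

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore

# Read the IR model: network topology (.xml) + weights (.bin)
ie = IECore()
net = ie.read_network(model="frozen_model.xml", weights="frozen_model.bin")

input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))

# Load the network onto the target device; swap "CPU" for "GPU" / "MYRIAD" as needed
exec_net = ie.load_network(network=net, device_name="CPU")

# Prepare an input image in NCHW layout matching the network's expected shape
n, c, h, w = net.input_info[input_blob].input_data.shape
image = cv2.imread("test.jpg")
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis, ...]

# Run a synchronous inference and fetch the output tensor
result = exec_net.infer(inputs={input_blob: blob})[output_blob]
print(result.shape)
```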

Benchmarks

  • Here are some benchmarks with different formats and devices:

  • Number of frames per second when performing inference tests with different formats of the InceptionV3 model: .h5 (Keras), frozen .pb (TensorFlow), .bin (IR-OpenVINO)

[Image: FPS benchmark of InceptionV3 across the .h5 / .pb / IR formats]

  • Processing speed of OpenVINO on different Intel chips

  • FPS when performing inference on some popular models and different devices: CPU, GPU, FPGA

OpenVINO with OpenCV

  • One noteworthy point of OpenVINO is that the optimized IR files are completely readable by OpenCV, a very popular library for image processing. OpenCV's dnn module provides the readNetFromModelOptimizer method shown below; the two params passed are the .xml and .bin files created by OpenVINO. From there, you can run prediction as usual.

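A minimal sketch of loading an IR model through OpenCV's dnn module (this assumes an OpenCV build with Inference Engine support; the file names and input size are illustrative):

```python
import cv2

# Load the OpenVINO IR model: topology (.xml) + weights (.bin)
net = cv2.dnn.readNetFromModelOptimizer("frozen_model.xml", "frozen_model.bin")

# Preprocess an image into a 4D blob and run a forward pass
image = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(image, size=(224, 224))
net.setInput(blob)
output = net.forward()
```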

  • In addition, OpenCV also supports reading a number of other formats from popular frameworks: Caffe, TensorFlow, Torch, ONNX, … For example, with TensorFlow you can pass params as follows:
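A minimal sketch for a TensorFlow frozen graph (the .pb graph and the optional .pbtxt text graph are illustrative file names):

```python
import cv2

# Load a TensorFlow frozen graph; the optional .pbtxt describes the graph structure
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")
```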

OpenVINO model server

  • Usually, after training and testing a model, I often use TensorFlow Serving to deploy and serve it as effectively as possible. A few outstanding advantages of TensorFlow Serving:
    • Part of TFX (TensorFlow Extended), which can be considered an end-to-end ecosystem for deploying ML pipelines.
    • Auto-reloads and updates to the latest version of the model.
    • Serves multiple models at once with only one configuration file.
    • Handles high request traffic.
    • Exposes both gRPC and RESTful API interfaces
    • Supports many different data formats: text, image, embedding, …
    • Easy to package and to customize independently of the clients sending requests to the model
  • OpenVINO also provides an open source model server for easily deploying and serving models in the IR format. The good thing about the OpenVINO Model Server is that it retains the outstanding advantages of TensorFlow Serving (serving multiple models with a single config file, gRPC + RESTful API support, …), while significantly improving the model's inference time, since the model has been converted to the IR format for better performance. A minimal launch sketch is shown after the benchmarks below.
  • I conducted some tests with some common model + backbone combinations, such as object detection models (SSD / Faster R-CNN) with popular feature extraction backbones (MobileNet / ResNet). In all cases the OpenVINO Model Server gave better results than TensorFlow Serving, with inference time about 1.3–1.6 times faster.
  • With SSD-ResNet50:

[Image: OpenVINO Model Server vs TensorFlow Serving benchmark, SSD-ResNet50]

  • With Faster-RCNN-ResNet50:

[Image: OpenVINO Model Server vs TensorFlow Serving benchmark, Faster-RCNN-ResNet50]
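A minimal sketch of launching the OpenVINO Model Server with Docker, assuming a model already converted to IR and laid out in versioned directories (./models/my_model/1/model.xml + model.bin); the image tag, model name and port are illustrative:

```
docker run -d --rm -p 9000:9000 \
    -v "$(pwd)/models:/models" \
    openvino/model_server:latest \
    --model_name my_model \
    --model_path /models/my_model \
    --port 9000
```

Clients can then send gRPC requests to port 9000 using the same prediction API as TensorFlow Serving.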

Cons

Some other toolkits / platforms

  • NVIDIA TensorRT – Programmable Inference Accelerator – https://developer.nvidia.com/tensorrt: NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It is also a toolkit for improving the performance and inference time of models, with very good support on GPUs.

  • ONNX – open format to represent deep learning models – https://onnx.ai/: actually, ONNX's purpose is quite different from that of OpenVINO and TensorRT. ONNX is used as a toolkit to convert models into an intermediate format called .onnx, from which they can be loaded and run with different frameworks; it supports most of today's deep learning frameworks. For example, you can train a model with PyTorch, save it as a .pth file, use ONNX to convert it to the .onnx format, then use another intermediate library such as onnx-tf to convert the .onnx file into a TensorFlow frozen model. From there, the model can be served with TensorFlow Serving as usual; a sketch of the conversion step follows.
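A minimal sketch of the .onnx → TensorFlow step, assuming the onnx and onnx-tf packages (the exact export behavior varies between onnx-tf versions; file names are illustrative):

```python
import onnx
from onnx_tf.backend import prepare

# Load the ONNX model and convert it to a TensorFlow representation
onnx_model = onnx.load("resnet18.onnx")
tf_rep = prepare(onnx_model)

# Export the converted graph for serving (SavedModel or frozen graph, version-dependent)
tf_rep.export_graph("resnet18_tf")
```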

Some common use-cases and conclusion

  • Above is an introduction to OpenVINO, a toolkit that helps improve the performance and inference time of models. Hopefully this blog gives you an overview of OpenVINO that you can apply to your current projects to improve the performance of your system. Model optimization is especially important for problems such as MOT (Multiple Object Tracking), Object Detection, Object Tracking, … For any suggestions and feedback, please comment below the article or send an email to the address: [email protected]. Thank you for reading and see you in upcoming blog posts!
