TorchServe, a deployment support tool for PyTorch models

Tram Ho

Introduction

Today I will briefly introduce a deployment tool built specifically for PyTorch models. The tool is called TorchServe; it was developed fairly recently, so its repo has fewer stars than TensorFlow Serving and a few more bugs. Git repo: https://github.com/pytorch/serve

TorchServe system diagram

As shown above, the TorchServe system is divided into three parts: the APIs, the core (frontend and backend), and the model store.

The API layer is divided into two parts: the Management API and the Inference API. The former manages the state of the models and the number of workers, while the latter is where user inference requests are received.

The core of TorchServe has two parts: the frontend and the backend. The frontend receives user requests, batches multiple requests together, and returns request status and logs. The batched requests pass through the inference endpoint to the backend, which splits them across worker processes; each worker manages an instance of the trained model.

So where do the models come from? From the model store, of course, which can hold many models for different tasks (classification, detection, segmentation, ...), each with multiple versions. TorchServe automatically loads models based on the user's configuration.

Install TorchServe and torch-model-archiver

First, you need to clone the repo:
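A minimal sketch of this step (the serve folder name simply comes from the repo URL):

    git clone https://github.com/pytorch/serve.git
    cd serve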

Based on the environment you need, the following installation options are available (the corresponding commands are sketched after this list):

  • With CPU, for Torch 1.7.1

  • With GPU and CUDA 10.2

  • With GPU and CUDA 10.1

  • With GPU and CUDA 9.2

=> Install the necessary dependencies for your chosen option.
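For reference, this is roughly how the dependency installation looks with the install_dependencies.py script from the repo; the exact --cuda values (cu92, cu101, cu102) are my assumption based on what the script accepted at the time, so check the repo if they have changed:

    # CPU only (Torch 1.7.1)
    python ./ts_scripts/install_dependencies.py

    # GPU with CUDA 10.2
    python ./ts_scripts/install_dependencies.py --cuda=cu102

    # GPU with CUDA 10.1
    python ./ts_scripts/install_dependencies.py --cuda=cu101

    # GPU with CUDA 9.2
    python ./ts_scripts/install_dependencies.py --cuda=cu92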

Next, install the two important packages, torchserve and torch-model-archiver; this can be done with either conda or pip (see the commands after this list):

  • With Conda

  • With Pip
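Roughly, the two options look like this (the pytorch conda channel is my assumption based on the official install instructions):

    # With Conda (packages from the pytorch channel)
    conda install torchserve torch-model-archiver -c pytorch

    # With Pip
    pip install torchserve torch-model-archiver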

Save models in the TorchServe format

Create a folder anywhere and name it model_store.

Download a sample model to deploy and predict with. Here I use densenet161.
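As a sketch, the two steps above might look like this; the densenet161 weight URL is the one used in the TorchServe examples, so verify it if the download fails:

    # create the folder that will hold the archived models
    mkdir model_store

    # download pretrained DenseNet-161 weights (URL taken from the TorchServe examples)
    wget https://download.pytorch.org/models/densenet161-8d451a50.pth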

Use the torch-model-archiver tool to save the model in a format that TorchServe supports.
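A possible archiving command, based on the densenet161 example in the TorchServe repo; the model-file and extra-files paths assume you are standing next to the cloned serve folder, so adjust them to your layout:

    torch-model-archiver --model-name densenet161 \
        --version 1.0 \
        --model-file serve/examples/image_classifier/densenet_161/model.py \
        --serialized-file densenet161-8d451a50.pth \
        --export-path model_store \
        --extra-files serve/examples/image_classifier/index_to_name.json \
        --handler image_classifier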

Explain the parameters in the above statement:

  • model-name: the model name
  • version: the model version
  • model-file: the Python file defining the model architecture; not needed if you saved the whole model with torch.save (optional)
  • serialized-file: required, the path to the trained model weights to be converted
  • export-path: where the resulting archive is written, here the model_store folder
  • extra-files: a JSON file containing the class labels (optional)
  • handler: required, the processing code (pre-processing, post-processing); you can inherit from the handlers available in the repo or customize your own.

Result: a densenet161.mar file appears in the model_store folder.

Run the TorchServe server

Once you have the above file, start the server with the command below. It opens the endpoints for user requests and starts the background processes that serve the model.
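A minimal sketch of the start command, following the quick-start example in the repo (only the flags needed for this demo are used):

    torchserve --start --ncs \
        --model-store model_store \
        --models densenet161.mar

The main options of the torchserve command: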

  • start: start a TorchServe session
  • stop: stop the TorchServe session
  • model-store: the folder that contains the models, namely the folder holding the .mar file created earlier
  • models: the model(s) to load, e.g. densenet161.mar
  • log-config: the config file for logging
  • ts-config: a TorchServe-specific config file, used for things like changing the ports
  • foreground: show the logs in the terminal while running; if disabled, TorchServe runs in the background
  • ncs: disable snapshots

Link Inference API: http://127.0.0.1:8080

Link Management API: http://127.0.0.1:8081

Link Metric API: http://127.0.0.1:8082

The last two links have not been covered yet; let's start with the first one, which is used to get predictions through the REST API.

First, download a sample image:
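For example, the kitten image from the TorchServe docs (the URL is my assumption based on the repo's quick-start; any JPEG will do):

    curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg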

Then use the command line to send the image with a POST request to the TorchServe inference endpoint.
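Something along these lines, assuming the model was registered under the name densenet161:

    # -T uploads the image file as the request body
    curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg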

Output: the JSON prediction returned by the server.

Predicting via gRPC

  • First, install the gRPC Python libraries (the commands for all of these steps are sketched after this list)

  • In the serve folder, use the proto files to generate the gRPC client stubs

  • Register the model

  • Run a prediction on a sample using the gRPC Python client

  • Unregister the model
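Based on the gRPC section of the TorchServe README, the five steps above look roughly like this; the proto paths and the torchserve_grpc_client.py script name come from the repo at the time of writing, so treat them as assumptions and check the current layout:

    # install the gRPC Python libraries
    pip install -U grpcio protobuf grpcio-tools

    # from inside the serve folder: generate the client stubs from the proto files
    python -m grpc_tools.protoc \
        --proto_path=frontend/server/src/main/resources/proto/ \
        --python_out=ts_scripts --grpc_python_out=ts_scripts \
        frontend/server/src/main/resources/proto/inference.proto \
        frontend/server/src/main/resources/proto/management.proto

    # register the model
    python ts_scripts/torchserve_grpc_client.py register densenet161

    # predict on a sample image with the gRPC client
    python ts_scripts/torchserve_grpc_client.py infer densenet161 examples/image_classifier/kitten.jpg

    # unregister the model
    python ts_scripts/torchserve_grpc_client.py unregister densenet161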

By default, TorchServe uses two ports for gRPC: 7070 for the gRPC Inference API and 7071 for the gRPC Management API.

As for the results, I can't show them to you here (I did try gRPC but ran into a bug; this prediction method was only recently added to the TorchServe repo, so errors are to be expected).

Management API

When you have multiple models, you need an efficient management tool, and of course TorchServe supports this through the Management API endpoint. Supported functions:

  1. Register a model
  2. Increase / decrease the number of workers for a specific model
  3. Describe a model's state
  4. Unregister a model
  5. List the registered models
  6. Set a specific model version as the default

Model registration

Use the POST method: POST /models

List of parameters:

  • url: the path to the .mar file, or a link to download the model from the Internet. Example: https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar
  • model_name: the model name
  • handler: the inference handler; make sure it is on the PYTHONPATH. Format: module_name:method_name
  • runtime: defaults to PYTHON
  • batch_size: defaults to 1
  • max_batch_delay: the batching timeout, defaults to 100 ms
  • initial_workers: the number of workers to create, defaults to 0; TorchServe will not serve requests until it has workers
  • synchronous: whether worker creation is synchronous or asynchronous, defaults to false
  • response_timeout: the inference timeout, defaults to 120 s

Registering a model and creating workers at the same time:
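A sketch with curl, reusing the example squeezenet1_1 URL above; the host and port assume the default Management API address:

    curl -X POST "http://127.0.0.1:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar&initial_workers=1&synchronous=true"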

Scale workers

Using the PUT method: PUT /models/{model_name}

List of parameters:

  • min_worker: (optional) the minimum number of workers, default 1
  • max_worker: (optional) the maximum number of workers, default 1; TorchServe will not create more workers than this number
  • number_gpu: (optional) the number of GPU workers to create, default 0; if the number of workers exceeds the number of GPUs on the machine, the remaining workers run on the CPU
  • synchronous: defaults to false
  • timeout: the time a worker has to finish its pending requests; if exceeded, the worker is stopped. A value of 0 stops the worker immediately, and -1 waits indefinitely. Default -1.

If the model has multiple versions: PUT /models/{model_name}/{version}
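For example, with curl (the model name and worker count are just illustrative):

    curl -X PUT "http://127.0.0.1:8081/models/densenet161?min_worker=2&synchronous=true"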

Model description

Use the GET method: GET /models/{model_name}

If the model has multiple versions: GET /models/{model_name}/all
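For example, again assuming the densenet161 model registered earlier:

    # describe the default version
    curl http://127.0.0.1:8081/models/densenet161

    # describe all registered versions
    curl http://127.0.0.1:8081/models/densenet161/all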

Unregister the model

Use the DELETE method: DELETE /models/{model_name}/{version}
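For example, unregistering version 1.0 of the densenet161 model registered earlier (the version string is an assumption):

    curl -X DELETE http://127.0.0.1:8081/models/densenet161/1.0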

List the registered models

Use the GET method: GET /models

Parameters:

  • limit: (optional) the number of items to return, default 100
  • next_page_token: (optional) the token of the page to fetch next
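For example (the limit value here is arbitrary):

    curl "http://127.0.0.1:8081/models?limit=5"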

Set default model

Use the PUT method: PUT /models/{model_name}/{version}/set-default
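For example, making version 1.0 of densenet161 the default (the model name and version are illustrative):

    curl -X PUT http://127.0.0.1:8081/models/densenet161/1.0/set-default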

Conclusion

That is all for today; anyone interested should head over to the TorchServe repo and experiment with it.

References

https://github.com/pytorch/serve


Source : Viblo