NVIDIA Triton Inference Server is a model serving framework optimised for GPU and CPU environments.
Editing pbtxt
PBTXT is the protobuf text format. There is a VS Code extension that provides syntax highlighting:
Name: Protobuf Text Format
Id: thesofakillers.vscode-pbtxt
Description: Protocol Buffer Text Format syntax highlighting for VS Code
Version: 0.0.4
Publisher: thesofakillers
VS Marketplace Link: https://marketplace.visualstudio.com/items?itemName=thesofakillers.vscode-pbtxt
Instance Counts and GPU/CPU
We can set the number of instances of a model that we want to run:
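For example, in the model's `config.pbtxt` (the counts and device IDs here are illustrative):

```
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# or, for CPU-only execution:
instance_group [ { count: 4, kind: KIND_CPU } ]
```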
Optimization
For CPU execution we can enable OpenVINO acceleration.
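With the ONNX Runtime backend this is configured in `config.pbtxt` via an execution accelerator (a sketch based on the backend's standard optimization settings):

```
optimization {
  execution_accelerators {
    cpu_execution_accelerator : [
      { name : "openvino" }
    ]
  }
}
```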
Metrics
By default Triton serves its Prometheus metrics endpoint on port 8002, so you can query http://hostname:8002/metrics.
Triton Ensemble
In Triton, an Ensemble doesn't mean the same thing as an ensemble model in the usual ML sense; it is more akin to a pipeline of models, such as a classification pipeline.
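As a sketch, an ensemble is declared in its own `config.pbtxt` with an `ensemble_scheduling` block that wires tensors between the composing models (all model and tensor names below are illustrative):

```
name: "ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "LOGITS", data_type: TYPE_FP32, dims: [ 2 ] }
]
ensemble_scheduling {
  step [
    {
      # first run the tokenizer model on the raw text
      model_name: "tokenizer"
      model_version: -1
      input_map { key: "TEXT" value: "TEXT" }
      output_map { key: "INPUT_IDS" value: "input_ids" }
    },
    {
      # then feed the token ids to the classifier
      model_name: "deberta"
      model_version: -1
      input_map { key: "input_ids" value: "input_ids" }
      output_map { key: "logits" value: "LOGITS" }
    }
  ]
}
```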
Building an Ensemble
- https://blog.ml6.eu/triton-ensemble-model-for-deploying-transformers-into-production-c0f727c012e3
- We need to install `transformers` into the Docker image
- Added a lightweight Docker image:
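A minimal sketch of such a Dockerfile (the base image tag is illustrative; use whichever Triton release you target):

```dockerfile
# Extend the official Triton image with the transformers library
FROM nvcr.io/nvidia/tritonserver:23.10-py3
RUN pip install transformers
```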
```bash
docker build -t custom_triton .
```
Client-Side
Calling a Triton Model with Text
Reference: https://stackoverflow.com/questions/72101578/using-string-parameter-for-nvidia-triton
The key seems to be to tell numpy that the data is an `np.object_`, and then to pass the type `BYTES` to the `InferInput` object for the dtype.
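A sketch with the HTTP client (the model name `deberta`, input name `TEXT`, and output name `LOGITS` are illustrative; use the names from your model's `config.pbtxt`):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Strings must live in an object array so Triton serialises them as BYTES
text = np.array([b"some input text"], dtype=np.object_)

infer_input = httpclient.InferInput("TEXT", text.shape, "BYTES")
infer_input.set_data_from_numpy(text)

response = client.infer(model_name="deberta", inputs=[infer_input])
print(response.as_numpy("LOGITS"))
```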
Making Requests Over HTTPS and SSL Issues
Started getting errors saying:

```
ssl EOF in violation of protocol
```
This is an issue with the underlying HTTP client, `geventhttpclient`, which the Triton HTTP client uses; upgrading it resolves the error:

```bash
pip install --upgrade geventhttpclient
```
Self-Signed Certificates
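A sketch of pointing the client at a self-signed certificate (paths are illustrative; `ssl_options` is forwarded to `geventhttpclient`):

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(
    url="hostname:443",
    ssl=True,
    # trust the self-signed certificate (or its CA) in PEM format
    ssl_options={"ca_certs": "/path/to/self-signed.pem"},
    # alternatively, insecure=True skips matching the hostname
    # against the certificate
)
```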
Adding Authentication to Triton Client
Use the `BasicAuth` plugin and pass in the credentials:
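A sketch using the client plugin API (the credentials here are placeholders):

```python
import tritonclient.http as httpclient
from tritonclient.http.auth import BasicAuth

client = httpclient.InferenceServerClient(url="localhost:8000")
# attach HTTP Basic Auth headers to every request the client makes
client.register_plugin(BasicAuth("username", "password"))
```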
Server-Side
Making Requests to Other Models Within a Triton Python Model
We can batch up data and make requests to other models:
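A sketch using the Python backend's BLS (Business Logic Scripting) API inside a model's `execute` method (the model, tensor, and output names are illustrative):

```python
import numpy as np
import triton_python_backend_utils as pb_utils

# build an input tensor from a numpy array
input_ids = pb_utils.Tensor("input_ids", np.array([[101, 2023, 102]], dtype=np.int64))

infer_request = pb_utils.InferenceRequest(
    model_name="deberta",
    requested_output_names=["logits"],
    inputs=[input_ids],
    # ask Triton to place the response tensors in CPU memory
    preferred_memory=pb_utils.PreferredMemory(pb_utils.TRITONSERVER_MEMORY_CPU, 0),
)
infer_response = infer_request.exec()
```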
The `pb_utils.PreferredMemory` argument is used to ensure that the resulting tensor is on the CPU and can be copied via `get_output_tensor_by_name`.
`pb_utils` and `InferenceResponse` objects
Getting Data out of Response:
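A sketch for unpacking the BLS response (the output name is illustrative):

```python
import triton_python_backend_utils as pb_utils

# surface any error from the downstream model
if infer_response.has_error():
    raise pb_utils.TritonModelException(infer_response.error().message())

# copy the named output tensor out of the response as a numpy array
logits = pb_utils.get_output_tensor_by_name(infer_response, "logits").as_numpy()
```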
Model Stats
`/v2/models/deberta/versions/1/stats` provides model call statistics, which are documented in Triton's statistics extension to the inference protocol.
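For example (port 8000 is Triton's default HTTP port):

```bash
curl http://hostname:8000/v2/models/deberta/versions/1/stats
```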