airpack.deploy.trt
This package provides functionality to optimize a neural network using NVIDIA's TensorRT framework and to perform inference using the optimized models. Optimized models are saved in the .plan format, an internal, platform-specific data format for TensorRT. Because TensorRT optimization works by running many variations of the network on the target hardware, it must be executed on the platform that will be used for inference, i.e., the AIR-T for final deployment.
The basic workflow is as follows:

1. Save your trained model to an ONNX file (a minimal export sketch follows this list).
2. Optimize the model using TensorRT with the onnx2plan() function.
3. Create a TrtInferFromPlan object for your optimized model and use it to perform inference.
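As an illustration of step 1, here is a minimal sketch assuming the model was trained in PyTorch; the model object, input length, and file name are placeholders, and any framework that can export ONNX works equally well. Steps 2 and 3 are illustrated under onnx2plan() and TrtInferFromPlan below.

    import torch

    input_len = 4096                     # real-valued input length used at training time
    dummy_input = torch.randn(1, input_len)

    # "trained_model" is a placeholder for your trained torch.nn.Module.
    torch.onnx.export(
        trained_model,
        dummy_input,
        "saved_model.onnx",
        input_names=["input"],           # name later passed as input_node_name
    )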
Module Contents
- airpack.deploy.trt.onnx2plan(onnx_file, input_node_name='input', input_port_name='', input_len=4096, fp16_mode=True, max_workspace_size=1073741824, max_batch_size=128, verbose=False)

Optimize the provided ONNX model using TensorRT and save the result. The optimized model will have a .plan extension and be saved in the same folder as the input ONNX model.

- Parameters
onnx_file (Union[str, os.PathLike]) – Filename of the ONNX model to optimize
input_node_name (str) – Name of the ONNX model’s input layer
input_port_name (str) –
input_len (int) – Length of the ONNX model’s input layer, determined when the model was created
fp16_mode (bool) – Try to use reduced precision (float16) layers if performance would improve
max_workspace_size (int) – Maximum scratch memory that the TensorRT optimizer may use, defaults to 1GB. The default value can be used in most situations and may only need to be reduced if using very low-end GPU hardware
max_batch_size (int) – The maximum batch size to optimize for. When running inference using the optimized model, the chosen batch size must be less than or equal to the maximum specified here
verbose (bool) – Print extra information about the optimized model
- Return type
pathlib.Path
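A hedged usage sketch of onnx2plan(); the ONNX file name is a placeholder and only the parameters documented above are used. Run this on the deployment platform (e.g., the AIR-T) so TensorRT can time candidate kernels on the actual inference hardware.

    from airpack.deploy.trt import onnx2plan

    plan_path = onnx2plan(
        "saved_model.onnx",          # placeholder ONNX file from the export step
        input_node_name="input",     # must match the ONNX model's input layer name
        input_len=4096,              # length of the input layer, set at training time
        fp16_mode=True,              # allow float16 layers where they are faster
        max_batch_size=128,          # inference batch size may not exceed this
    )
    print(plan_path)                 # pathlib.Path to the generated .plan file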
- airpack.deploy.trt.uff2plan(uff_file, input_node_name='input/IteratorGetNext', input_len=4096, fp16_mode=True, max_workspace_size=1073741824, max_batch_size=128, verbose=False)

Optimize the provided UFF (TensorFlow 1.x) model using TensorRT and save the result. The optimized model will have a .plan extension and be saved in the same folder as the input model.

- Parameters
uff_file (Union[str, os.PathLike]) – Filename of the UFF model to optimize
input_node_name (str) – Name of the UFF model’s input layer
input_len (int) – Length of the UFF model’s input layer, determined when the model was created
fp16_mode (bool) – Try to use reduced precision (float16) layers if performance would improve
max_workspace_size (int) – Maximum scratch memory that the TensorRT optimizer may use, defaults to 1GB. The default value can be used in most situations and may only need to be reduced if using very low-end GPU hardware
max_batch_size (int) – The maximum batch size to optimize for. When running inference using the optimized model, the chosen batch size must be less than or equal to the maximum specified here
verbose (bool) – Print extra information about the optimized model
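An analogous sketch for a TensorFlow 1.x model that has been exported to UFF; the file name is a placeholder.

    from airpack.deploy.trt import uff2plan

    uff2plan(
        "saved_model.uff",                        # placeholder UFF file
        input_node_name="input/IteratorGetNext",  # default input node name
        input_len=4096,
    )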
- airpack.deploy.trt.plan_bench(plan_file_name, cplx_samples, batch_size=128, num_inferences=100, input_dtype=np.float32)

Benchmarks a model that has been pre-optimized using the TensorRT framework. This function uses settings for the CUDA context and memory buffers that are optimized for NVIDIA Jetson modules and may not be optimal for desktops.

Note

When selecting a batch_size to benchmark, the selected size must be less than or equal to the max_batch_size value that was specified when creating the .plan file. Additionally, to maximize performance, power-of-two values for batch_size are recommended.

Note
To accurately benchmark the result of TensorRT optimization, this benchmark should be run on the same computer that generated the .plan file.
- Parameters
plan_file_name (Union[str, os.PathLike]) – TensorRT optimized model file (.plan format)
cplx_samples (int) – Input length of the neural network, in complex samples; this is half of the input_length of the neural network, which operates on real values
batch_size (int) – How many sets of cplx_samples inputs are batched together in a single inference call
num_inferences (Optional[int]) – Number of iterations to execute inference between measurements of inference throughput (if None, then run forever)
input_dtype (numpy.number) – Data type of a single value (a single I or Q value, not a complete complex (I, Q) sample): use one of numpy.int16 or numpy.float32 here
- Return type
None
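A hedged benchmarking sketch; the .plan path is a placeholder. Note that cplx_samples is given in complex samples, i.e., half of the model's real-valued input length (4096 real values correspond to 2048 complex samples).

    import numpy as np
    from airpack.deploy.trt import plan_bench

    # Run on the same machine that generated the .plan file.
    plan_bench(
        "saved_model.plan",       # placeholder path to the optimized model
        cplx_samples=2048,        # half of the 4096-sample real-valued input
        batch_size=128,           # must not exceed max_batch_size used at optimization
        num_inferences=100,       # measure throughput over 100 inference calls
        input_dtype=np.float32,
    )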
- class airpack.deploy.trt.TrtInferFromPlan(plan_file, batch_size, input_buffer, verbose=True)

Wrapper class for TensorRT inference using a pre-optimized .plan file. Since it is expensive to create these inference objects, they should be created once at the start of your program and then re-used for multiple inference calls.

The buffer containing data for inference is provided when creating this inference object and will be re-used for each inference. It is designed to be used by repeatedly copying data from the radio into this buffer and then calling the feed_forward() method to run inference. After calling feed_forward(), the inference results will be available as the MappedBuffer object output_buff.

Note
Only device-mapped memory buffers are supported.
- Parameters
plan_file (path_like) – TensorRT .plan file containing the optimized model
batch_size (int) – Batch size for a single inference execution
input_buffer (MappedBuffer) – Buffer containing data for inference of size input_length x batch_size, where input_length is the length of input to the model, determined when the neural network was created
verbose (bool) – Print verbose information about the loaded network, defaults to True
- Variables
input_buff (MappedBuffer) – Input buffer for use in inference
output_buff (MappedBuffer) – Output buffer containing inference results
- feed_forward(self)

Forward propagates the input buffer through the neural network to run inference. Call this method each time samples from the radio are read into input_buff. Results will be available afterwards in output_buff.
.- Return type
None
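A hedged end-to-end inference sketch. How the device-mapped MappedBuffer is allocated and filled is not documented here, so those steps are left as placeholders; allocate_mapped_buffer() and the radio receive step are assumptions for illustration only.

    from airpack.deploy.trt import TrtInferFromPlan

    batch_size = 1
    input_len = 4096                 # real-valued model input length

    # input_buffer must be a device-mapped MappedBuffer of size
    # input_len * batch_size; allocate_mapped_buffer() is a hypothetical
    # helper standing in for whatever allocation utility you use.
    input_buffer = allocate_mapped_buffer(input_len * batch_size)

    # Create the inference object once and reuse it for every call.
    infer = TrtInferFromPlan("saved_model.plan", batch_size, input_buffer)

    while True:
        # Copy a fresh batch of interleaved I/Q samples from the radio into
        # input_buffer here (reuse the buffer; do not reallocate it).
        infer.feed_forward()
        results = infer.output_buff  # MappedBuffer holding the inference outputs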