airpack.deploy.trt_utils
Utility code for performing inference using the TensorRT framework with settings optimized for AIR-T hardware.
Module Contents
airpack.deploy.trt_utils.make_cuda_context(gpu_index=0)

Initializes a CUDA context for the selected GPU and makes it active. The context is created with flags that enable device-mapped (pinned) memory, which supports zero-copy operations on the AIR-T.

Parameters:
    gpu_index (int) – Which GPU in the system to use; defaults to the first GPU (index 0)

Return type:
    pycuda.driver.Context
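A typical lifecycle is to create the context once, run inference while it is active, and pop it when done so other code can use the GPU. The sketch below illustrates that pattern; it assumes `pycuda` is installed and only touches the GPU when one is actually present (`run_inference_with_context` and `HAVE_CUDA` are illustrative names, not part of the module).

```python
# Sketch of the usual context lifecycle (assumes pycuda and an AIR-T / Jetson GPU).
try:
    import pycuda.driver as cuda
    cuda.init()
    HAVE_CUDA = cuda.Device.count() > 0
except Exception:
    HAVE_CUDA = False  # no pycuda or no usable GPU in this environment

def run_inference_with_context(gpu_index=0):
    """Create the context, do work, and always pop it afterwards."""
    from airpack.deploy.trt_utils import make_cuda_context

    ctx = make_cuda_context(gpu_index)
    try:
        pass  # TensorRT engine execution would go here
    finally:
        ctx.pop()  # deactivate the context so the GPU is free for other code

if HAVE_CUDA:
    run_inference_with_context(0)
```

Pairing the `pop()` with a `try`/`finally` ensures the context is released even if inference raises.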
class airpack.deploy.trt_utils.MappedBuffer(num_elems, dtype)

A device-mapped memory buffer for sharing data between CPU and GPU. Once created, the host field can be used to access the memory from the CPU as a numpy.ndarray, and the device field can be used to access the memory from the GPU.

Example usage:

    # Create a buffer of 16 single-precision floats
    buffer = MappedBuffer(num_elems=16, dtype=numpy.float32)

    # Zero the buffer by writing to it on the CPU
    buffer.host[:] = 0.0

    # Pass the device pointer to an API that works with GPU buffers
    func_that_uses_gpu_buffer(buffer.device)
Note
Device-mapped memory is meant for Jetson embedded GPUs like the one found on the AIR-T, where both the host and device pointers refer to the same physical memory. Using this type of memory buffer on desktop GPUs will be very slow.
Parameters:
    num_elems (int) – Number of elements in the created buffer
    dtype (numpy.dtype) – Data type of an element (e.g., numpy.float32 or numpy.int16)

Variables:
    host (numpy.ndarray) – Access to the buffer from the CPU
    device (CUdeviceptr) – Access to the buffer from the GPU
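The key property of a mapped buffer is that host and device refer to the same physical allocation, so a write through one view is immediately visible through the other. The following GPU-free stand-in (a hypothetical `MappedBufferSketch`, not the real class) models that aliasing with a plain NumPy array and its raw address:

```python
import numpy as np

class MappedBufferSketch:
    """Illustrative stand-in for MappedBuffer: the host array and the
    device pointer alias the same underlying bytes, so no copy is needed."""

    def __init__(self, num_elems, dtype):
        self.host = np.zeros(num_elems, dtype=dtype)  # CPU-side view
        # Raw address of the allocation; stands in for the CUdeviceptr that
        # the real class exposes on mapped (zero-copy) memory.
        self.device = self.host.ctypes.data

buf = MappedBufferSketch(16, np.float32)
buf.host[:] = 1.0
# Writes through the host view land at the device address, because both
# names refer to the same allocation.
assert buf.device == buf.host.ctypes.data
```

On the AIR-T's shared-memory GPU this aliasing is what makes zero-copy work; on a desktop GPU the two pointers would refer to memory on opposite sides of the PCIe bus, which is why the note above warns against using mapped buffers there.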