TensorRT Backend API
TensorRTBackend
aitune.torch.backend.tensorrt.TensorRTBackend
Bases: Backend, TensorRTRunner
TensorRT backend for model acceleration.
This class builds and runs TensorRT engines from PyTorch models: it exports a model to ONNX and then converts it to a TensorRT engine for optimized inference.
Initialize the TensorRT backend.
Parameters:
- config (TensorRTBackendConfig | None, default: None) – Configuration for the TensorRT backend.
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
device
property
Get the device of the backend.
Returns:
- device – The device the module is using.
activate
Activates backend.
After activating, the backend should be ready to do inference.
Source code in aitune/torch/backend/backend.py
build
Build the model with the given arguments.
Building a backend must be idempotent, i.e. it must not cause side effects. A model is not necessarily purely functional and can carry internal state (such as a KV cache for LLMs). For that reason, build may run a sample of inputs through the model at most once, so that subsequent calls see exactly the same state as the first call for the given sample.
After building, the backend should be activated.
Source code in aitune/torch/backend/backend.py
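The idempotency contract above can be sketched in plain Python. This is an illustrative stand-in, not aitune code; every name in it is hypothetical:

```python
# Sketch of the build-idempotency contract (not aitune code).
# A backend may run the sample through the model at most once, so that a
# second build() with the same sample leaves the model's internal state
# (e.g. a KV cache) exactly as a single call would.

class StatefulModel:
    """Toy model whose internal state changes on every call (like a KV cache)."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x * 2

class SketchBackend:
    """Hypothetical backend illustrating the contract; not the real Backend."""
    def __init__(self, model):
        self.model = model
        self._built = False

    def build(self, sample):
        if self._built:       # idempotent: later builds are no-ops
            return
        self.model(sample)    # the sample is run at most once
        self._built = True

model = StatefulModel()
backend = SketchBackend(model)
backend.build(3)
backend.build(3)              # no side effect the second time
print(model.calls)            # -> 1
```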
deactivate
Deactivates backend.
After deactivating, the backend cannot be used to do inference.
Source code in aitune/torch/backend/backend.py
deploy
Deploys the backend.
After deploying, the backend is ready to do inference and can no longer be deactivated.
Parameters:
- device (device | None) – The device to deploy the backend on.
Source code in aitune/torch/backend/backend.py
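The activate/deactivate/deploy contract described above can be sketched as follows. This is an illustrative stand-in, not the real Backend class; all names are hypothetical:

```python
# Sketch of the backend lifecycle contract (not aitune code):
# activate() enables inference, deactivate() disables it, and
# deploy() enables inference permanently - no deactivation afterwards.

class LifecycleSketch:
    def __init__(self):
        self.active = False
        self.deployed = False

    def activate(self):
        self.active = True

    def deactivate(self):
        if self.deployed:
            raise RuntimeError("a deployed backend cannot be deactivated")
        self.active = False

    def deploy(self, device=None):
        self.active = True
        self.deployed = True

b = LifecycleSketch()
b.activate()
b.deactivate()        # fine before deployment
b.deploy()
try:
    b.deactivate()
except RuntimeError as e:
    print(e)          # -> a deployed backend cannot be deactivated
```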
describe
from_dict
classmethod
Creates a backend from a state_dict.
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
get_profiles
Create profiles from samples or from graph_spec.
The behavior depends on self._config.profiles:
- a list: return the user-provided profiles as-is.
- ProfileMode.SINGLE: create a single profile from the graph spec.
- ProfileMode.SAMPLES_USED: create profiles from the shapes seen in the samples.
Parameters:
- graph_spec (GraphSpec) – Input graph spec
- data (list[Sample]) – List of samples
Returns: List of Polygraphy Profile objects
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
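The three-way dispatch that get_profiles describes can be sketched like this. It is a simplified stand-in: plain shape tuples play the role of GraphSpec, Sample, and Profile objects, and all names are hypothetical:

```python
# Sketch of the profile-selection dispatch (not aitune code).
from enum import Enum

class ProfileMode(Enum):
    SINGLE = "single"
    SAMPLES_USED = "samples_used"

def resolve_profiles(configured, graph_spec_shape, sample_shapes):
    """Dispatch on the configured profiles setting, as described above."""
    if isinstance(configured, list):
        # User-provided profiles win and are returned unchanged.
        return configured
    if configured is ProfileMode.SINGLE:
        # One profile derived from the graph spec.
        return [graph_spec_shape]
    if configured is ProfileMode.SAMPLES_USED:
        # One profile per distinct shape seen in the samples.
        unique = []
        for shape in sample_shapes:
            if shape not in unique:
                unique.append(shape)
        return unique
    raise ValueError(f"unsupported profiles setting: {configured!r}")

samples = [(1, 128), (4, 256), (1, 128)]
print(resolve_profiles(ProfileMode.SINGLE, (8, 512), samples))        # -> [(8, 512)]
print(resolve_profiles(ProfileMode.SAMPLES_USED, (8, 512), samples))  # -> [(1, 128), (4, 256)]
```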
infer
Run inference with the given arguments.
Parameters:
- args (Any, default: ()) – Variable-length argument list.
- kwargs (Any, default: {}) – Arbitrary keyword arguments.
Returns:
- Any (Any) – The result of the inference.
Source code in aitune/torch/backend/backend.py
key
to_dict
Returns the state_dict of the backend.
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
TensorRTBackendConfig
aitune.torch.backend.tensorrt.TensorRTBackendConfig
dataclass
TensorRTBackendConfig(use_dynamo=True, workspace_size=None, opset_version=None, optimization_level=None, compatibility_level=None, timing_cache=None, profiles=SINGLE, device='cuda', quantization_config=None, enable_tf32=True, use_cuda_graphs=False)
Bases: BackendConfig
Configuration for TensorRT backend.
Attributes:
- use_dynamo (bool) – Whether to use torch.dynamo for export.
- workspace_size (int | None) – The workspace size for the TensorRT engine.
- opset_version (int | None) – The ONNX opset version to use for export.
- optimization_level (int | None) – The optimization level for the TensorRT engine.
- compatibility_level (int | None) – The compatibility level for the TensorRT engine.
- timing_cache (Path | None) – The path to the timing cache for the TensorRT engine.
- profiles (ProfileMode | list[TensorRTProfile]) – How TensorRT optimization profiles are generated:
  - SINGLE: auto-generate a single profile from the graph spec (default).
  - SAMPLES_USED: auto-generate multiple profiles from the shapes of samples used for tuning.
  - list[TensorRTProfile]: use user-provided profiles directly.
- device (str) – The device to use for the TensorRT engine.
- quantization_config (ONNXAutoCastConfig | ONNXQuantizationConfig | TorchQuantizationConfig | None) – The quantization configuration for the TensorRT engine.
- enable_tf32 (bool) – Whether to enable TF32 hardware acceleration.
- use_cuda_graphs (bool) – Whether to use CUDA graphs for the TensorRT engine.
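A hedged construction example, assuming the import path and field names shown above; the field values are purely illustrative and this fragment is untested:

```python
from pathlib import Path

from aitune.torch.backend.tensorrt import (
    ProfileMode,
    TensorRTBackend,
    TensorRTBackendConfig,
)

# Only fields that differ from the defaults in the signature above are passed;
# the concrete values here are illustrative, not recommendations.
config = TensorRTBackendConfig(
    workspace_size=1 << 30,                 # 1 GiB
    timing_cache=Path("trt.timing.cache"),
    profiles=ProfileMode.SAMPLES_USED,
    enable_tf32=True,
)
backend = TensorRTBackend(config=config)
```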
describe
Describe the backend configuration, displaying only the fields that differ from their defaults.
from_dict
classmethod
Convert dict to TensorRTBackendConfig.
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
key
Returns the keys of the backend configuration.
profiles_from_dict
classmethod
Convert dict to list of TensorRTProfile.
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
to_dict
Convert TensorRTBackendConfig to dictionary.
Source code in aitune/torch/backend/tensorrt/tensorrt_backend.py
to_json
Saves the backend configuration to a file.
TensorRTProfile
aitune.torch.backend.tensorrt.TensorRTProfile
Class for representing a TensorRT optimization profile.
This class provides an interface for defining optimization profiles for TensorRT engines with dynamic shapes.
Initialize a TensorRT optimization profile.
Source code in aitune/torch/backend/tensorrt/tensorrt_profile.py
profile
property
Get the underlying Polygraphy Profile.
Returns:
- Profile – The Polygraphy Profile object
__eq__
__hash__
Hash the TensorRTProfile.
__repr__
Return the official string representation of the profile.
Returns:
- str – Official string representation
__str__
Return string representation of the profile.
Returns:
- str – String representation
add_input_shape
Add a shape binding to the profile.
Parameters:
- name (str) – The name of the input tensor
- min_shape (tuple[int, ...]) – The minimum shape the profile will support
- opt_shape (tuple[int, ...]) – The shape for which TensorRT will tune the engine
- max_shape (tuple[int, ...]) – The maximum shape the profile will support
Returns:
- TensorRTProfile – The profile object, for chaining
Source code in aitune/torch/backend/tensorrt/tensorrt_profile.py
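The chaining behavior (each call returns the profile itself) can be sketched with a minimal stand-in class. This is not the real TensorRTProfile; the class name and stored representation are hypothetical:

```python
# Sketch of the add_input_shape chaining pattern (not aitune code).

class ProfileSketch:
    """Minimal stand-in for TensorRTProfile, illustration only."""
    def __init__(self):
        self.bindings = {}

    def add_input_shape(self, name, min_shape, opt_shape, max_shape):
        # Record the (min, opt, max) shape triple for this input tensor.
        self.bindings[name] = (min_shape, opt_shape, max_shape)
        return self  # returning self is what enables chaining

profile = (
    ProfileSketch()
    .add_input_shape("input_ids", (1, 1), (4, 128), (8, 512))
    .add_input_shape("attention_mask", (1, 1), (4, 128), (8, 512))
)
print(sorted(profile.bindings))  # -> ['attention_mask', 'input_ids']
```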
from_dict
classmethod
Create TensorRTProfile from dictionary.
Source code in aitune/torch/backend/tensorrt/tensorrt_profile.py
profile_to_dict
classmethod
Convert Polygraphy Profile to dictionary.
ProfileMode
aitune.torch.backend.tensorrt.ProfileMode
Bases: Enum
Mode controlling how TensorRT optimization profiles are generated for the TensorRT engine.
Attributes:
- SINGLE – auto-generate a single profile from the graph spec (default mode).
- SAMPLES_USED – auto-generate multiple profiles from the shapes of samples used for tuning.