Torch-TensorRT AOT Backend API
TorchTensorRTAotBackend
aitune.torch.backend.TorchTensorRTAotBackend
Bases: Backend
Backend that compiles the model using TensorRT.
Initialize TensorRT backend.
Parameters:
- config (TorchTensorRTAotBackendConfig | None, default: None) – Configuration for TensorRT compilation.
Source code in aitune/torch/backend/torch_tensorrt_aot_backend.py
device
property
Get the device of the backend.
Returns:
- device – The device the module is using.
activate
Activates backend.
After activating, the backend should be ready to do inference.
Source code in aitune/torch/backend/backend.py
build
Build the model with the given arguments.
Building a backend should be idempotent, i.e. it must not cause side effects. A model is not necessarily purely functional and can have internal state (such as the KV cache of an LLM). For that reason, build may call the model on a sample of inputs at most once, so that subsequent calls start from exactly the same state as the first call for the given sample.
After building, the backend should be activated.
Source code in aitune/torch/backend/backend.py
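The build contract above can be sketched with a toy backend. The classes and names here are simplified stand-ins for aitune's API, not its actual implementation: a model with internal state (a KV-cache-like buffer) is called at most once during build, so rebuilding is a no-op and leaves the model state unchanged.

```python
from typing import Any


class ToyStatefulModel:
    """Model with internal state, like an LLM's KV cache (illustrative)."""

    def __init__(self) -> None:
        self.cache: list[Any] = []

    def __call__(self, x: Any) -> Any:
        self.cache.append(x)  # side effect: state grows on every call
        return x


class ToyBackend:
    """Sketch of the build contract: call the sample at most once."""

    def __init__(self, model: ToyStatefulModel) -> None:
        self.model = model
        self._built = False
        self.active = False

    def build(self, sample: Any) -> None:
        if not self._built:
            self.model(sample)  # the single allowed warm-up call
            self._built = True
        # subsequent build() calls are no-ops, so model state is unchanged
        self.active = True  # after building, the backend is activated


model = ToyStatefulModel()
backend = ToyBackend(model)
backend.build("sample")
backend.build("sample")  # idempotent: no second call reaches the model
print(len(model.cache))  # -> 1
print(backend.active)    # -> True
```

Guarding the warm-up call behind a flag is one simple way to satisfy the "at most once per sample" requirement.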
deactivate
Deactivates backend.
After deactivating, the backend cannot be used to do inference.
Source code in aitune/torch/backend/backend.py
deploy
Deploys the backend.
After deploying, the backend is ready to do inference and can no longer be deactivated.
Parameters:
- device (device | None) – The device to deploy the backend on.
Source code in aitune/torch/backend/backend.py
describe
from_dict
classmethod
Creates a backend from a state_dict.
Source code in aitune/torch/backend/torch_tensorrt_aot_backend.py
infer
Run inference with the given arguments.
Parameters:
- args (Any, default: ()) – Variable length argument list.
- kwargs (Any, default: {}) – Arbitrary keyword arguments.
Returns:
- Any (Any) – The result of the inference.
Source code in aitune/torch/backend/backend.py
key
to_dict
Returns the state_dict of the backend.
Source code in aitune/torch/backend/torch_tensorrt_aot_backend.py
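to_dict and from_dict form a serialization round trip: the state_dict produced by one is enough to reconstruct the backend with the other. A minimal sketch with a toy config (the class and field names are illustrative, not aitune's actual schema):

```python
from dataclasses import asdict, dataclass


@dataclass
class ToyBackendConfig:
    """Stand-in for TorchTensorRTAotBackendConfig (fields are illustrative)."""
    ir: str = "dynamo"
    pickle_protocol: int = 2


class ToyBackend:
    """Sketch of the to_dict/from_dict round trip described above."""

    def __init__(self, config: ToyBackendConfig) -> None:
        self.config = config

    def to_dict(self) -> dict:
        # the state_dict: enough information to rebuild the backend
        return {"config": asdict(self.config)}

    @classmethod
    def from_dict(cls, state: dict) -> "ToyBackend":
        return cls(ToyBackendConfig(**state["config"]))


original = ToyBackend(ToyBackendConfig(ir="dynamo", pickle_protocol=5))
restored = ToyBackend.from_dict(original.to_dict())
print(restored.config == original.config)  # -> True
```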
TorchTensorRTAotBackendConfig
aitune.torch.backend.TorchTensorRTAotBackendConfig
dataclass
TorchTensorRTAotBackendConfig(ir='dynamo', compile_config=(lambda: TorchTensorRTConfig(enabled_precisions={float16}))(), pickle_protocol=DEFAULT_PICKLE_PROTOCOL)
Bases: BackendConfig
Configuration for TorchTensorRTAotBackend.
For the fields accepted by compile_config (TorchTensorRTConfig), see CompilationSettings in torch_tensorrt/dynamo/_settings.py.
describe
Describe the backend configuration. Display only changed fields.
Source code in aitune/torch/backend/torch_tensorrt_aot_backend.py
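The "display only changed fields" behavior can be sketched by comparing each dataclass field against its declared default. The config class and describe helper below are illustrative stand-ins, not aitune's implementation (and this simple version assumes plain defaults rather than default factories):

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    """Stand-in for TorchTensorRTAotBackendConfig (fields are illustrative)."""
    ir: str = "dynamo"
    pickle_protocol: int = 2


def describe(config: ToyConfig) -> dict:
    """Report only the fields that differ from their defaults."""
    return {
        f.name: getattr(config, f.name)
        for f in fields(config)
        if getattr(config, f.name) != f.default
    }


print(describe(ToyConfig()))                   # -> {}
print(describe(ToyConfig(pickle_protocol=5)))  # -> {'pickle_protocol': 5}
```

Showing only deviations from the defaults keeps the description short even when the underlying config (e.g. TensorRT's CompilationSettings) has many fields.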
key
Returns the keys of the backend configuration.
to_dict
Returns the backend configuration as a dictionary.
to_json
Saves the backend configuration to a file.