Torch-TensorRT AOT Backend Guide
The Torch-TensorRT AOT (Ahead-Of-Time) backend compiles models with torch_tensorrt.compile() and saves the compiled artifact for later use. This approach is ideal for production deployments, where compilation happens once during tuning rather than at inference time.
Overview
- AOT Compilation: Compiles during tuning, not at runtime
- Model Persistence: Compiled model is saved and loaded
- Fast Startup: No compilation overhead at inference time
- Production Ready: Deterministic performance
- Multiple IR Support: dynamo, torchscript, or fx
Quick Start
```python
import torch

import aitune.torch as ait
from aitune.torch.backend import TorchTensorRTAotBackend, TorchTensorRTAotBackendConfig
from torch_tensorrt.dynamo import CompilationSettings

# Configure the backend
config = TorchTensorRTAotBackendConfig(
    ir="dynamo",
    compile_config=CompilationSettings(enabled_precisions={torch.float16}),
)
backend = TorchTensorRTAotBackend(config)

# Use the backend during tuning
from aitune.torch.tune_strategy import OneBackendStrategy

strategy = OneBackendStrategy(backend=backend)
model = ait.Module(model, "my-model", strategy=strategy)
ait.tune(model, input_data)

# Save the compiled model
ait.save(model, "model.ait")

# Later: load and use
ait.load(model, "model.ait")
```
Configuration Options
TorchTensorRTAotBackendConfig
```python
@dataclass
class TorchTensorRTAotBackendConfig(BackendConfig):
    ir: IRType = "dynamo"
    compile_config: TorchTensorRTConfig
    pickle_protocol: int = DEFAULT_PICKLE_PROTOCOL
```
ir
Intermediate representation to use:
```python
# Dynamo (recommended)
config = TorchTensorRTAotBackendConfig(ir="dynamo")

# TorchScript
config = TorchTensorRTAotBackendConfig(ir="ts")

# FX
config = TorchTensorRTAotBackendConfig(ir="fx")
```
Options:
- "dynamo" (default): modern, best compatibility
- "ts": TorchScript, for legacy models
- "fx": FX graph, experimental
compile_config
Compilation settings:
```python
import torch
from torch_tensorrt.dynamo import CompilationSettings

config = TorchTensorRTAotBackendConfig(
    compile_config=CompilationSettings(
        enabled_precisions={torch.float16},
        workspace_size=1 << 30,  # 1 GiB TensorRT workspace
    )
)
```
pickle_protocol
Pickle protocol used when serializing the compiled model; defaults to DEFAULT_PICKLE_PROTOCOL.
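Why the protocol matters: newer pickle protocols produce more compact payloads for binary data but cannot be read by older Python runtimes. A standard-library illustration (unrelated to aitune's internals):

```python
import pickle

payload = {"ir": "dynamo", "engine": b"\x00" * 1024}

# Protocol 2 is readable by very old Pythons; protocol 5 (Python 3.8+)
# adds out-of-band buffer support and encodes large bytes more compactly.
legacy = pickle.dumps(payload, protocol=2)
modern = pickle.dumps(payload, protocol=5)

# Both round-trip to the same object; only the wire format differs.
assert pickle.loads(legacy) == pickle.loads(modern) == payload
```

Pick the highest protocol that every consumer of your saved models can read.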
AOT vs JIT Comparison
For a detailed explanation of JIT vs AOT backends, see the JIT vs AOT Torch-TensorRT section.
Best Practices
- Use Dynamo IR: Most compatible with modern PyTorch
- Calibration Data: Use representative data during tuning
- Verify After Load: Test loaded model before deployment
- Version Control: Track both source code and .ait files
- GPU Compatibility: Compile on the same or a compatible GPU as deployment
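The "Verify After Load" practice boils down to checking that the reloaded model reproduces reference outputs within a tolerance. A library-agnostic sketch over flattened outputs (the helper name and tolerances are illustrative, not part of aitune):

```python
def outputs_close(reference, reloaded, rtol=1e-3, atol=1e-5):
    """allclose-style check: |a - b| <= atol + rtol * |b| for every element."""
    return all(
        abs(a - b) <= atol + rtol * abs(b)
        for a, b in zip(reference, reloaded)
    )

# Outputs from the original model vs. the reloaded .ait model
assert outputs_close([0.12, -3.4, 7.0], [0.1201, -3.4002, 6.9995])
assert not outputs_close([1.0], [1.1])
```

Loosen `rtol` when compiling with reduced precision such as FP16, since some drift from the FP32 reference is expected.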
Troubleshooting
Issue: Load fails on different GPU
Cause: The engine was compiled for a different GPU architecture.
Solution: Recompile on target GPU or use hardware compatibility level in TensorRT backend.
Next Steps
- Learn about TorchTensorRT JIT Backend
- Explore Deployment Guide
- Review TensorRT Backend for more options