Torch-TensorRT AOT Backend Guide

The Torch-TensorRT AOT (Ahead-Of-Time) backend compiles models using torch_tensorrt.compile() and saves the compiled model for later use. This makes it well suited to production deployments: compilation happens once, during tuning, rather than at every inference startup.

Overview

  • AOT Compilation: Compiles during tuning, not at runtime
  • Model Persistence: Compiled model is saved and loaded
  • Fast Startup: No compilation overhead at inference time
  • Production Ready: Deterministic performance
  • Multiple IR Support: dynamo, torchscript, or fx

Quick Start

import torch

import aitune.torch as ait
from aitune.torch.backend import TorchTensorRTAotBackend, TorchTensorRTAotBackendConfig
from torch_tensorrt.dynamo import CompilationSettings

# Configure backend
config = TorchTensorRTAotBackendConfig(
    ir="dynamo",
    compile_config=CompilationSettings(enabled_precisions={torch.float16}),
)
backend = TorchTensorRTAotBackend(config)

# Use in tuning
from aitune.torch.tune_strategy import OneBackendStrategy
strategy = OneBackendStrategy(backend=backend)

model = ait.Module(model, "my-model", strategy=strategy)
ait.tune(model, input_data)

# Save compiled model
ait.save(model, "model.ait")

# Later: Load and use
ait.load(model, "model.ait")

Configuration Options

TorchTensorRTAotBackendConfig

@dataclass
class TorchTensorRTAotBackendConfig(BackendConfig):
    # The required field must precede fields with defaults in a dataclass.
    compile_config: TorchTensorRTConfig
    ir: IRType = "dynamo"
    pickle_protocol: int = DEFAULT_PICKLE_PROTOCOL

ir

Intermediate representation to use:

# Dynamo (recommended)
config = TorchTensorRTAotBackendConfig(ir="dynamo")

# TorchScript
config = TorchTensorRTAotBackendConfig(ir="ts")

# FX
config = TorchTensorRTAotBackendConfig(ir="fx")

Options:

  • "dynamo" (default): Modern, best compatibility
  • "ts": TorchScript, legacy models
  • "fx": FX graph, experimental

compile_config

Compilation settings:

from torch_tensorrt.dynamo import CompilationSettings

config = TorchTensorRTAotBackendConfig(
    compile_config=CompilationSettings(
        enabled_precisions={torch.float16},
        workspace_size=1 << 30,
    )
)

pickle_protocol

Protocol for saving compiled model:

config = TorchTensorRTAotBackendConfig(
    pickle_protocol=4,  # Default
)

AOT vs JIT Comparison

For a detailed explanation of JIT vs AOT backends, see the JIT vs AOT Torch-TensorRT section.

Best Practices

  1. Use Dynamo IR: Most compatible with modern PyTorch
  2. Calibration Data: Use representative data during tuning
  3. Verify After Load: Test loaded model before deployment
  4. Version Control: Track both source code and .ait files
  5. GPU Compatibility: Compile on the same or a compatible GPU as deployment
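For practice 3 (verify after load), the check can be as simple as comparing outputs of the original and reloaded model on the same inputs. A framework-agnostic sketch of the tolerance comparison (in practice you would use torch.allclose on tensors; `outputs_close` here is a hypothetical helper):

```python
def outputs_close(expected, actual, rel_tol=1e-3, abs_tol=1e-5):
    """Compare two flat lists of floats element-wise within tolerance.

    TensorRT engines running in FP16 typically need looser tolerances
    than an FP32 eager-mode reference, hence the relatively large rel_tol.
    """
    if len(expected) != len(actual):
        return False
    return all(
        abs(e - a) <= max(rel_tol * max(abs(e), abs(a)), abs_tol)
        for e, a in zip(expected, actual)
    )

# FP16-level noise passes; a real numerical divergence does not.
reference = [0.5, -1.25, 3.0]
assert outputs_close(reference, [0.5001, -1.2501, 3.0002])
assert not outputs_close(reference, [0.5, -1.25, 4.0])
```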

Troubleshooting

Issue: Load fails on different GPU

Cause: Engine compiled for different GPU architecture.

Solution: Recompile on the target GPU, or build with a TensorRT hardware compatibility level so a single engine can run across GPU architectures.
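One way to fail fast instead of hitting an opaque load error is to record the GPU's compute capability alongside the saved artifact and compare it at load time. A minimal sketch, assuming capabilities are stored as (major, minor) tuples (with PyTorch you would obtain them via torch.cuda.get_device_capability(); `check_engine_compat` is a hypothetical helper, not part of aitune):

```python
def check_engine_compat(saved_capability, current_capability):
    """Return True if an engine built for `saved_capability` can be
    expected to run on `current_capability`.

    TensorRT engines are tied to the SM architecture they were built on;
    without a hardware compatibility level, the (major, minor) versions
    must match exactly, e.g. (8, 0) for an A100 or (9, 0) for an H100.
    """
    return saved_capability == current_capability

# Engine built on an A100 (8, 0), loaded back on an A100: OK.
assert check_engine_compat((8, 0), (8, 0))
# Same engine on an H100 (9, 0): refuse the load and recompile instead.
assert not check_engine_compat((8, 0), (9, 0))
```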

Next Steps