Torch Inductor Backend API
TorchInductorBackend
aitune.torch.backend.TorchInductorBackend
Bases: Backend
Backend that compiles the model with torch.compile using the Inductor backend.
Initializes the backend.
Parameters:
- config (TorchInductorBackendConfig | None, default: None) – Configuration for torch.compile with the Inductor backend.
Source code in aitune/torch/backend/torch_inductor_backend.py
device
property
Get the device of the backend.
Returns:
- device – The device the module is using.
activate
Activates the backend.
After activating, the backend should be ready to do inference.
Source code in aitune/torch/backend/backend.py
build
Build the model with the given arguments.
Building a backend should be idempotent, i.e. it should not cause side effects. A model is not necessarily purely functional and can have internal state (like a KV cache for LLMs). For that reason, build may run a sample of inputs through the model at most once, so that subsequent calls on the given sample start from exactly the same state as the first call.
After building, the backend should be activated.
Source code in aitune/torch/backend/backend.py
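The idempotency contract above can be sketched with a stand-in backend (illustrative names, not the real aitune classes): build runs the model on a sample at most once, so repeated builds leave the model's internal state unchanged.

```python
class StatefulModel:
    """Toy model with internal state, standing in for e.g. a KV cache."""
    def __init__(self):
        self.calls = 0  # mutable state changed by every invocation

    def __call__(self, x):
        self.calls += 1
        return x * 2

class SketchBackend:
    """Illustrative stand-in, not the real aitune Backend."""
    def __init__(self, model):
        self.model = model
        self._built = False

    def build(self, sample):
        # Idempotent: the model sees the sample at most once, so repeated
        # build() calls leave the model's state exactly as after the first.
        if not self._built:
            self.model(sample)
            self._built = True

model = StatefulModel()
backend = SketchBackend(model)
backend.build(3)
backend.build(3)  # no-op: state unchanged after the first build
```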
deactivate
Deactivates the backend.
After deactivating, the backend cannot be used to do inference.
Source code in aitune/torch/backend/backend.py
deploy
Deploys the backend.
After deploying, the backend is ready to do inference and can no longer be deactivated.
Parameters:
- device (device | None) – The device to deploy the backend on.
Source code in aitune/torch/backend/backend.py
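The activate/deactivate/deploy contract can be sketched with a minimal stand-in (illustrative, not the real aitune Backend): deploying activates the backend, and a deployed backend refuses to be deactivated.

```python
class LifecycleSketch:
    """Illustrative lifecycle stand-in; not the real aitune Backend."""
    def __init__(self):
        self.active = False
        self.deployed = False

    def activate(self):
        self.active = True  # after activating, ready for inference

    def deactivate(self):
        if self.deployed:
            # Per the contract: a deployed backend can no longer be deactivated.
            raise RuntimeError("cannot deactivate a deployed backend")
        self.active = False

    def deploy(self, device=None):
        self.activate()  # deploying leaves the backend ready to infer
        self.deployed = True

b = LifecycleSketch()
b.deploy()
```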
describe
from_dict
classmethod
Creates a backend from a state_dict.
Source code in aitune/torch/backend/torch_inductor_backend.py
infer
Run inference with the given arguments.
Parameters:
- args (Any, default: ()) – Variable length argument list.
- kwargs (Any, default: {}) – Arbitrary keyword arguments.
Returns:
- Any (Any) – The result of the inference.
Source code in aitune/torch/backend/backend.py
key
to_dict
Returns the state_dict of the backend.
Source code in aitune/torch/backend/torch_inductor_backend.py
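The from_dict/to_dict pair suggests a serialization round trip; a stand-in sketch of that contract (illustrative names and fields, not the real aitune implementation):

```python
class SketchSerializable:
    """Stand-in illustrating the to_dict/from_dict round trip."""
    def __init__(self, mode=None, fullgraph=False):
        self.mode = mode
        self.fullgraph = fullgraph

    def to_dict(self):
        # Return the state_dict describing this object.
        return {"mode": self.mode, "fullgraph": self.fullgraph}

    @classmethod
    def from_dict(cls, state):
        # Reconstruct an equivalent object from a state_dict.
        return cls(**state)

original = SketchSerializable(mode="max-autotune", fullgraph=True)
restored = SketchSerializable.from_dict(original.to_dict())
```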
TorchInductorBackendConfig
aitune.torch.backend.TorchInductorBackendConfig
dataclass
TorchInductorBackendConfig(fullgraph=False, dynamic=None, mode=None, options=None, autocast_enabled=False, autocast_dtype=None)
Bases: BackendConfig
Configuration for torch.compile with inductor backend.
Parameters:
- fullgraph (bool, default: False) – If False (default), torch.compile attempts to discover compileable regions in the function that it will tune. If True, the entire function must be captured into a single graph; if that is not possible (that is, if there are graph breaks), an error is raised.
- dynamic (bool or None, default: None) – Use dynamic shape tracing. When True, torch.compile attempts up front to generate a kernel that is as dynamic as possible, to avoid recompilations when sizes change. This may not always work, as some operations/optimizations force specialization; use TORCH_LOGS=dynamic to debug overspecialization. When False, dynamic kernels are never generated and sizes are always specialized. By default (None), dynamism is detected automatically and a more dynamic kernel is compiled upon recompile.
- mode (str, default: None) – One of "default", "reduce-overhead", "max-autotune", or "max-autotune-no-cudagraphs":
  - "default" is the default mode, a good balance between performance and overhead.
  - "reduce-overhead" reduces the overhead of Python with CUDA graphs, useful for small batches. The reduction can come at the cost of more memory usage, since the workspace memory required for an invocation is cached so it does not have to be reallocated on subsequent runs. Overhead reduction is not guaranteed; today it only applies to CUDA-only graphs that do not mutate inputs. There are other circumstances where CUDA graphs are not applicable; use TORCH_LOGS=perf_hints to debug.
  - "max-autotune" leverages Triton- or template-based matrix multiplications on supported devices and Triton-based convolutions on GPU. It enables CUDA graphs by default on GPU.
  - "max-autotune-no-cudagraphs" is similar to "max-autotune" but without CUDA graphs.
  To see the exact configs each mode sets, call torch._inductor.list_mode_options().
- options (dict, default: None) – A dictionary of options to pass to the backend. To see the full list of supported configs, call torch._inductor.list_options().
- autocast_enabled (bool, default: False) – If True, enable autocast.
- autocast_dtype (dtype, default: None) – The dtype to use for autocast.
__post_init__
Post init.
Source code in aitune/torch/backend/torch_inductor_backend.py
describe
Describe the backend configuration. Display only changed fields.
from_dict
classmethod
key
Returns the keys of the backend configuration.
to_dict
to_json
Saves the backend configuration to a file.