Changelog

0.3.0

  • feat: JIT tuning requires only a single sample and tunes on the first model call
  • feat: JIT tuning falls back to the Torch Inductor backend by default
  • feat: allow overriding the tuning device in JIT tuning; when set to none, the module device is used
  • feat: add support for multi-profile engines with auto-generated and user-provided profiles in the TensorRT backend
  • fix: handling of dynamic shapes in the TorchTensorRT AoT backend
  • fix: creation of calibration data for ModelOpt ONNX PTQ in the TensorRT backend
  • misc: add documentation
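
The "tune on first model call" behaviour with a default fallback can be pictured with a small, library-independent sketch. This is not the AITune API: `TuneOnFirstCall`, `tune`, and `fallback` are hypothetical names used only for illustration, and the actual implementation may differ.

```python
from typing import Any, Callable, Optional


class TuneOnFirstCall:
    """Hypothetical wrapper (not the library's API): tune once, using the first call's inputs."""

    def __init__(
        self,
        fn: Callable[..., Any],
        tune: Callable[[Callable[..., Any], tuple, dict], Callable[..., Any]],
        fallback: Callable[[Callable[..., Any]], Callable[..., Any]],
    ) -> None:
        self._fn = fn
        self._tune = tune          # assumed: returns an optimized callable
        self._fallback = fallback  # assumed: default backend used when tuning fails
        self._tuned: Optional[Callable[..., Any]] = None
        self.tune_calls = 0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._tuned is None:
            # The first call supplies the single sample used for tuning.
            self.tune_calls += 1
            try:
                self._tuned = self._tune(self._fn, args, kwargs)
            except Exception:
                # Tuning failed, so use the fallback path instead.
                self._tuned = self._fallback(self._fn)
        return self._tuned(*args, **kwargs)
```

Later calls reuse the callable chosen on the first call, so tuning happens at most once per wrapped function.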

0.2.0

  • feat: introduce Just-in-Time (JIT) tuning: no-code model tuning controlled through an import or an environment flag
  • feat: introduce Just-in-Time (JIT) inspect: no-code model analysis controlled through an import or an environment flag
  • feat: module inspect considers lists and dicts for Torch module containers
  • feat: add support for forward hooks in AOT and JIT tuning
  • feat: add support for CUDA graphs in the TensorRT backend
  • feat: change the default ONNX export path to dynamo (torch.onnx.export(dynamo=True))
  • feat: add ONNX AutoCast in the TensorRT backend for mixed precision through TensorRT ModelOpt
  • feat: extend collection of profiling metrics through NVTX annotations
  • feat: suppress console output during tuning and save logs to a file; controlled through the verbose flag
  • feat: add support for dataclasses and user-defined custom objects in module.forward arguments
  • feat: add support for KV cache for LLMs
  • feat: add support for Static/Dynamic HuggingFace for TorchInductor backend
  • feat: optimize handling of input/output metadata
  • feat: reduce CPU/GPU memory usage during tuning by offloading modules to the meta device
  • fix: prevent cache dir override when there are two similar modules in JIT tuning
  • fix: dynamic shapes configuration for the ONNX Dynamo export path in the TensorRT backend
  • fix: bfloat16 support in the TensorRT backend
  • fix: profiling of models that do not support batching
  • misc: extend examples and improve dependencies

0.1.0

  • feat: add AITune features scoped for the first release
  • feat: introduce Ahead-of-Time tuning for low-code model inspection and tuning