Changelog
0.3.0
- feat: JIT tuning requires only a single sample - tuning runs on the first model call
- feat: JIT tuning falls back to the Torch Inductor backend by default
- feat: allow overriding the tuning device in JIT tuning; when set to None, the module device is used
- feat: add support for multi-profile engines with auto-generated and user-provided profiles in the TensorRT backend
- fix: handling of dynamic shapes in the Torch-TensorRT AOT backend
- fix: creation of calibration data for ModelOpt ONNX PTQ in the TensorRT backend
- misc: add documentation
0.2.0
- feat: introduce Just-in-Time (JIT) tuning: no-code model tuning controlled through an import or an environment flag
- feat: introduce Just-in-Time (JIT) inspect: no-code model analysis controlled through an import or an environment flag
- feat: module inspection handles lists and dicts as Torch module containers
- feat: add support for forward hooks for AOT and JIT tuning
- feat: add support for CUDA graphs for TensorRT backend
- feat: change the default ONNX export path to dynamo (torch.onnx.export(dynamo=True))
- feat: add ONNX AutoCast in the TensorRT backend for mixed precision through TensorRT ModelOpt
- feat: extend profiling metrics collection with NVTX annotations
- feat: suppress console output during tuning and save logs to a file, controlled through the verbose flag
- feat: add support for dataclasses and custom user objects in module.forward arguments
- feat: add support for KV cache for LLMs
- feat: add support for Static/Dynamic HuggingFace for the TorchInductor backend
- feat: optimize handling of input/output metadata
- feat: reduce CPU/GPU memory usage during tuning by offloading modules to the meta device
- fix: prevent cache directory overrides when two similar modules are present in JIT tuning
- fix: dynamic shapes configuration for the ONNX dynamo export path in the TensorRT backend
- fix: bfloat16 support in the TensorRT backend
- fix: profiling of models without batching support
- misc: extend examples and improve dependencies
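The JIT tuning and inspect entries in this release are activated through an import or an environment flag rather than code changes. A minimal sketch of the environment-flag half of that pattern, assuming a hypothetical flag name `AITUNE_JIT` (the changelog does not name the actual flag):

```python
import os


def jit_tuning_enabled() -> bool:
    """Return True when JIT tuning is requested via an environment flag.

    AITUNE_JIT is a hypothetical flag name used only for illustration;
    the changelog states only that JIT tuning is controlled through an
    import or an environment flag.
    """
    return os.environ.get("AITUNE_JIT", "0") == "1"


# Opt in for the current process, then check the gate.
os.environ["AITUNE_JIT"] = "1"
print(jit_tuning_enabled())  # → True
```

In the import-based variant, simply importing a dedicated module would flip the same gate as a side effect, so existing model code runs untouched in both cases.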
0.1.0
- feat: add AITune features scoped for the first release
- feat: introduce Ahead-of-Time (AOT) tuning for low-code model inspection and tuning