Lightning Talk: Accelerated Inference in PyTorch 2.X with Torch-TensorRT - George Stefanakis & Dheeraj Peri

  • Published: Sep 15, 2024
  • Lightning Talk: Accelerated Inference in PyTorch 2.X with Torch-TensorRT - George Stefanakis & Dheeraj Peri, NVIDIA
    Torch-TensorRT accelerates the inference of deep learning models in PyTorch targeting NVIDIA GPUs. Torch-TensorRT now leverages Dynamo, the graph capture technology introduced in PyTorch 2.0, to offer a new and more Pythonic user experience as well as to upgrade the existing compilation workflow.

    The new user experience includes Just-In-Time compilation and support for arbitrary Python code (like dynamic control flow, complex I/O, and external libraries) used within your model, while still accelerating performance. A single line of code provides easy and robust acceleration of your model with full flexibility to configure the compilation process without ever leaving PyTorch: torch.compile(model, backend="tensorrt")

    The existing API has also been revamped to use Dynamo export under the hood, providing you with the same Ahead-of-Time whole-graph acceleration with fallback for custom operators and dynamic shape support as in previous versions: torch_tensorrt.compile(model, inputs=example_inputs)

    We will present descriptions of both paths as well as features coming soon. All of our work is open source and available at github.com/pyt....
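
    A rough sketch of the two paths described above (the model and input shapes below are placeholders, not from the talk):

        import torch
        import torch_tensorrt  # registers the "tensorrt" backend for torch.compile

        model = MyModel().eval().cuda()  # hypothetical model standing in for your own
        example_inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

        # Just-In-Time path: Dynamo captures the graph on the first call,
        # and unsupported Python code falls back to eager PyTorch.
        jit_model = torch.compile(model, backend="tensorrt")
        jit_model(*example_inputs)  # compilation is triggered here

        # Ahead-of-Time path: whole-graph export and compilation up front,
        # with fallback for custom operators and dynamic shape support.
        aot_model = torch_tensorrt.compile(model, inputs=example_inputs)
        aot_model(*example_inputs)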

Comments • 3

  • @gandoreme • 10 months ago +2

    We typically do PyTorch --> ONNX --> TensorRT. Is there an advantage over this workflow (apart from doing one conversion instead of two)?

    • @patboy24 • 2 months ago

      There is a possibility that some trained PyTorch models are not fully compatible with direct TensorRT conversion. Using ONNX as an intermediary before converting to TensorRT reduces the chance of an incompatible conversion.
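
      A minimal sketch of that two-step workflow, assuming a placeholder model and illustrative file names:

          import torch

          model = MyModel().eval()  # placeholder model
          dummy_input = torch.randn(1, 3, 224, 224)

          # Step 1: export the PyTorch model to ONNX
          torch.onnx.export(model, dummy_input, "model.onnx", opset_version=17)

          # Step 2: build a TensorRT engine from the ONNX file, for example
          # with the trtexec command-line tool:
          #   trtexec --onnx=model.onnx --saveEngine=model.engine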

  • @Gh0st_0723 • 10 months ago

    The problem is version compatibility between CUDA/cuDNN and ONNX.