A dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios.
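The core idea behind dynamic batching is to hold incoming inference requests briefly and group them, dispatching a batch either when it is full or when a small wait budget expires. A minimal sketch (function and parameter names are illustrative, not any specific library's API):

```python
import queue
import time

def dynamic_batcher(request_queue, max_batch_size=8, max_wait_s=0.01):
    """Collect requests into one batch: block for the first request, then
    keep pulling until the batch is full or the wait budget runs out."""
    batch = [request_queue.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget exhausted; ship a partial batch
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return batch
```

The trade-off is latency versus throughput: a longer `max_wait_s` yields fuller batches and better GPU utilization, at the cost of added tail latency for the first request in each batch.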
A high-performance deep learning model inference server based on TensorRT, supporting fast inference for Embedding, Reranker, and NLI models.
A practical guide to NLP model optimization and serving with NVIDIA Triton.
A PyTorch/Hugging Face batching utility that sorts variable-length text by difficulty and dynamically increases the batch size on easier samples, using a pre-trained VRAM predictor to improve GPU utilization and throughput while reducing OOM risk through fallback handling.
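The scheme described above can be sketched without the actual repository's code: sort samples by length (a proxy for difficulty), then greedily grow the batch size while a VRAM cost estimate stays under budget. Here `cost_fn` is a hypothetical stand-in for the pre-trained VRAM predictor, and all names are illustrative assumptions:

```python
def length_sorted_batches(texts, base_batch_size=8, max_batch_size=64,
                          vram_budget=1.0, cost_fn=None):
    """Sort texts by length, then pack each batch greedily: batches of
    shorter (easier) samples are doubled in size while the predicted
    VRAM use stays within budget."""
    if cost_fn is None:
        # toy predictor: cost grows with the longest sequence in the batch
        cost_fn = lambda length, bs: length * bs / 4096
    ordered = sorted(texts, key=len)
    batches, i = [], 0
    while i < len(ordered):
        bs = base_batch_size
        # grow the batch while doubling it keeps predicted VRAM under budget
        while bs < max_batch_size and i + bs < len(ordered):
            longest = len(ordered[i + bs])
            if cost_fn(longest, bs * 2) > vram_budget:
                break
            bs *= 2
        batches.append(ordered[i:i + bs])
        i += bs
    return batches
```

Because the sort groups similar lengths together, padding waste per batch drops as well; the OOM fallback in the real utility would additionally retry a failed batch at a smaller size, which is omitted here.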
LM Multi-Bin Dynamic Scheduler Simulator: an implementation combining multi-bin batching with SLA-constrained dynamic batching.
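Multi-bin batching with an SLA constraint, as described above, can be sketched as two pieces: route each request to a length bin, and dispatch a bin either when it is full or when its oldest request would otherwise miss its deadline. This is a simplified illustration under assumed bin edges and data shapes, not the simulator's actual interface:

```python
def assign_bin(length, bin_edges=(128, 512, 2048)):
    """Map a sequence length to a bin index (shortest lengths in bin 0)."""
    for i, edge in enumerate(bin_edges):
        if length <= edge:
            return i
    return len(bin_edges)  # overflow bin for the longest sequences

def ready_batches(pending, now, max_batch_size=16, sla_s=0.5):
    """Dispatch a bin when it is full, or when its oldest request is about
    to exceed the SLA wait budget. `pending` maps bin -> [(arrival, req)]."""
    batches = []
    for b, items in pending.items():
        if len(items) >= max_batch_size or (items and now - items[0][0] >= sla_s):
            batches.append((b, [req for _, req in items[:max_batch_size]]))
            del items[:max_batch_size]  # remove the dispatched requests
    return batches
```

Binning by length keeps padding within each batch small, while the SLA check guarantees a lightly loaded bin still flushes in bounded time instead of starving.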