Module 22 of 22 · Production Inference · Advanced

Resources & References

Duration: 2 min

Curated resources to deepen your understanding and continue learning beyond this course.

vLLM — High-throughput LLM serving
TensorRT — NVIDIA inference optimization
Triton Inference Server — Model serving
ONNX Runtime — Cross-platform inference

Related Courses

⚙️ MLOps & Model Deployment
Advanced · 2 hr 10 min

🏗️ Local LLM Architecture
Advanced · 2 hr 10 min

🔧 Quantization Engineering
Advanced · 2 hr 10 min

🎛️ LLM Fine-Tuning — LoRA, QLoRA, PEFT, Instruction Tuning, RLHF, DPO, Evaluation
Advanced · 1 hr 55 min