Awesome LLM Inference Acceleration

Contents

  • Quantization
  • Pruning
  • Distillation
  • Speculative Inference
    • [Arxiv 2023] SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
  • Inference System
  • References

Quantization

Pruning

Distillation

Speculative Inference

[Arxiv 2023] SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

  • Miao, Xupeng, et al. “SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.” arXiv preprint arXiv:2305.09781 (2023).
  • code: https://github.com/flexflow/FlexFlow/tree/inference
  • blog: [Arxiv 2023] SpecInfer: Accelerating LLM Serving with Speculative Inference + Token Tree Verification
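SpecInfer builds on speculative decoding: a small, fast draft model proposes several tokens, and the large target model verifies them in one pass, so the expensive model runs once per batch of proposals instead of once per token (SpecInfer extends this by verifying a whole token tree of candidates). The following is a minimal toy sketch of the basic single-sequence, greedy variant only, with hypothetical stand-in functions (`draft_model`, `target_model`) instead of real LLMs; it is not the SpecInfer implementation.

```python
VOCAB_SIZE = 10

def draft_model(prefix):
    """Toy stand-in for a small, fast draft model: next-token guess."""
    return (sum(prefix) + 1) % VOCAB_SIZE

def target_model(prefix):
    """Toy stand-in for the large target model: the token it would emit."""
    return (sum(prefix) + prefix[-1]) % VOCAB_SIZE

def speculative_decode(prefix, num_draft=4, max_len=16):
    """Greedy speculative decoding: the draft model proposes num_draft
    tokens; the target model verifies them (in a real system, in a single
    parallel forward pass); matching tokens are accepted, then one
    corrected token from the target model is appended on mismatch."""
    out = list(prefix)
    while len(out) < max_len:
        # 1) Draft phase: propose a short continuation cheaply.
        proposal, ctx = [], list(out)
        for _ in range(num_draft):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify phase: accept the longest prefix of the proposal
        #    that agrees with what the target model would have emitted.
        accepted = 0
        for i, t in enumerate(proposal):
            if target_model(out + proposal[:i]) == t:
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        # 3) Emit one token from the target model itself (the correction
        #    on mismatch, or a bonus token on full acceptance).
        if len(out) < max_len:
            out.append(target_model(out))
    return out[:max_len]
```

With greedy acceptance as above, the output is identical to decoding with the target model alone; the draft model only changes how many target-model calls are needed, never the result.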

Inference System

References
