2 docs tagged with "paged-attention"

vLLM Model Serving

Covers vLLM's PagedAttention mechanism, parallelization strategies, Multi-LoRA serving, and supported hardware architectures.