One doc tagged with "llama4"

Llama 4 FM Serving Benchmark: GPU vs AWS Custom Silicon

A benchmark comparing the performance and cost efficiency of GPU instances (p5, p4d, g6e) and AWS custom silicon (Trainium2, Inferentia2) for serving Llama 4 models with vLLM.