Denial of Service Vulnerability in vLLM Inference Engine by vLLM
CVE-2026-34756
6.5 MEDIUM
What is CVE-2026-34756?
The vLLM inference engine, used for serving large language models, contains a denial-of-service vulnerability exploitable by unauthenticated attackers. A single HTTP request with a disproportionately large `n` parameter in a ChatCompletionRequest or CompletionRequest triggers excessive memory allocation that overwhelms the Python asyncio event loop, causing Out-Of-Memory errors and crashing the service. The issue is fixed in version 0.19.0, and all affected users should upgrade immediately.
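The root cause is that the `n` parameter (the number of completions generated per prompt) was accepted without an upper bound, so per-completion state could be allocated an attacker-controlled number of times. A minimal sketch of the defensive pattern, using a simplified stand-in class rather than vLLM's actual request models, and a hypothetical cap `MAX_N`:

```python
from dataclasses import dataclass

MAX_N = 128  # hypothetical cap; real deployments would choose their own limit


@dataclass
class CompletionRequest:
    """Simplified stand-in for a completion request (illustrative only)."""
    prompt: str
    n: int = 1  # number of completions to generate for the prompt

    def __post_init__(self) -> None:
        # Reject oversized values before any per-completion state is
        # allocated; an unbounded 'n' is what enables the OOM described above.
        if not (1 <= self.n <= MAX_N):
            raise ValueError(f"'n' must be between 1 and {MAX_N}, got {self.n}")
```

With this check in place, a request carrying an enormous `n` is rejected at parse time with a client error instead of exhausting server memory.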
Affected Version(s)
vllm >= 0.1.0, < 0.19.0
