Runtime Error in vLLM's Inference Engine Affects Large Language Models
CVE-2026-44223

6.5 MEDIUM

Key Information:

  • Status:
  • Vendor:
  • CVE Published: 12 May 2026

What is CVE-2026-44223?

vLLM is an inference and serving engine for large language models. In versions prior to 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the initial decode step. This shape mismatch triggers a RuntimeError that crashes the EngineCore process whenever any request in a batch uses sampling penalty parameters such as repetition_penalty, frequency_penalty, or presence_penalty. A single request with a sampling penalty (e.g., "repetition_penalty": 1.1) is enough to bring down the server, making this a remotely triggerable denial of service. The flaw is fixed in version 0.20.0.
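To illustrate the trigger condition, the sketch below builds the kind of OpenAI-compatible completion payload the advisory describes. The model name is a placeholder, and the helper that flags penalty parameters is hypothetical (written here to mirror the vulnerable condition), not part of vLLM's API.

```python
# Hypothetical helper mirroring the vulnerable condition: the crash fires
# when any request in a batch carries a non-default sampling penalty.
PENALTY_DEFAULTS = {
    "repetition_penalty": 1.0,  # neutral value: no repetition penalty
    "frequency_penalty": 0.0,   # neutral value: no frequency penalty
    "presence_penalty": 0.0,    # neutral value: no presence penalty
}

def uses_sampling_penalty(params: dict) -> bool:
    """Return True if the request sets any penalty parameter to a
    non-default value."""
    return any(
        params.get(name, default) != default
        for name, default in PENALTY_DEFAULTS.items()
    )

# A minimal completion payload; per the advisory, a single request like
# this was enough to crash an affected server.
payload = {
    "model": "example-model",   # placeholder model name
    "prompt": "Hello",
    "max_tokens": 16,
    "repetition_penalty": 1.1,  # the penalty value cited in the advisory
}

print(uses_sampling_penalty(payload))  # -> True
```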

Affected Version(s)

vllm >= 0.18.0, < 0.20.0
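The affected range above can be checked programmatically. A minimal sketch, assuming plain dotted version strings (no pre-release or dev tags) and not using vLLM's own tooling:

```python
def parse_version(version: str) -> tuple:
    """Parse a simple dotted version string like '0.19.2' into an int tuple."""
    return tuple(int(part) for part in version.split("."))

def is_affected(version: str) -> bool:
    """True if the version falls in the advisory's range: >= 0.18.0, < 0.20.0."""
    v = parse_version(version)
    return (0, 18, 0) <= v < (0, 20, 0)

print(is_affected("0.19.1"))  # -> True: inside the affected range
print(is_affected("0.20.0"))  # -> False: the fixed release
```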

CVSS V3.1

Score: 6.5
Severity: MEDIUM
Confidentiality: None
Integrity: None
Availability: High
Attack Vector: Network
Attack Complexity: Low
Privileges Required: Low
User Interaction: None
Scope: Unchanged

Timeline

  • Vulnerability published

  • Vulnerability reserved