May 18, 2026

What Four CVEs in Fourteen Days Tell Us About AI Infrastructure Risk

Between April 28 and May 12, four production-critical AI infrastructure CVEs landed in close succession. The pattern across all four says something specific about where AI infrastructure is now exposed.

The four CVEs

LiteLLM (CVE-2026-42208) carried a pre-authentication SQL injection in the open-source gateway that fronts most major model providers. The headline is “SQLi in a gateway,” but the operational impact is “SQLi in the thing that sits in front of every model call.” If your environment centralizes routing, auth, logging, rate limits, cost controls, or key management in the gateway, then one bug does not affect just “one service”; it affects the choke point for the entire LLM layer.
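To make the vulnerability class concrete, here is a minimal sketch of string-built SQL versus parameter binding, using Python's standard sqlite3 module. It is not LiteLLM's code and does not reproduce the CVE; the api_keys table, the tokens, and both lookup functions are hypothetical names chosen for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical key table, standing in for whatever a gateway stores.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE api_keys (token TEXT PRIMARY KEY, team TEXT)")
    conn.execute("INSERT INTO api_keys VALUES ('sk-alpha', 'search')")
    conn.execute("INSERT INTO api_keys VALUES ('sk-beta', 'billing')")
    return conn


def lookup_team_unsafe(conn: sqlite3.Connection, token: str) -> list:
    # Vulnerable pattern: caller-supplied text is spliced into the SQL string.
    query = f"SELECT team FROM api_keys WHERE token = '{token}'"
    return [row[0] for row in conn.execute(query)]


def lookup_team_safe(conn: sqlite3.Connection, token: str) -> list:
    # Parameter binding: the driver passes token strictly as data.
    query = "SELECT team FROM api_keys WHERE token = ?"
    return [row[0] for row in conn.execute(query, (token,))]


conn = build_demo_db()
payload = "x' OR '1'='1"  # classic injection string, sent before any auth check
unsafe_rows = lookup_team_unsafe(conn, payload)  # matches every row
safe_rows = lookup_team_safe(conn, payload)      # matches nothing
```

In the unsafe version the payload rewrites the WHERE clause, so an unauthenticated caller reads every team. In a gateway, the same mistake exposes whatever the gateway centralizes.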

SGLang (CVE-2026-5760) showed how a serialized model file becomes an arbitrary Python program at load time. That is the old lesson of insecure deserialization, but the modern blast radius is bigger: model artifacts move through registries, object storage, CI build steps, and autoscaling pipelines. “Untrusted file” is not a rare edge case in ML ops. It is routine.
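The mechanics can be illustrated with Python's pickle module, where deserializing a file can invoke arbitrary callables. This sketch assumes a pickle-style artifact format, which the advisory summary above does not specify, and it is not SGLang's loader. RestrictedUnpickler, ALLOWED_GLOBALS, safe_loads, and MaliciousArtifact are hypothetical names; the allowlist approach follows the "restricting globals" pattern in the pickle documentation.

```python
import io
import pickle

# Assumption: the only non-builtin type a trusted artifact needs.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class MaliciousArtifact:
    def __reduce__(self):
        # On a plain pickle.loads this would call print(...) at load time;
        # a real payload would call something far less harmless.
        return (print, ("code ran at load time",))


benign_blob = pickle.dumps({"weights": [0.1, 0.2]})
evil_blob = pickle.dumps(MaliciousArtifact())

loaded = safe_loads(benign_blob)  # plain containers need no globals
try:
    safe_loads(evil_blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allowlist reduces the attack surface but does not make pickle safe for hostile input. Where the framework supports it, a data-only weight format plus provenance checks on the artifact is the stronger control.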

An AI agent framework (CVE-2026-25592) demonstrated that prompt injection can chain into host-level code execution. The key shift is that the “input” is now instructions. If the agent is allowed to call tools that can touch the filesystem, credentials, internal APIs, or shell commands, the prompt becomes a policy bypass unless a control plane enforces which actions are allowed and records evidence of each enforcement decision.
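A minimal sketch of that kind of control, assuming a simple allowlist model: every tool call passes through one gate that decides and logs before anything executes. ToolPolicy, run_tool_call, and the two tools are hypothetical and do not correspond to any specific agent framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    allowed_tools: frozenset
    decisions: list = field(default_factory=list)

    def authorize(self, tool: str, args: dict) -> bool:
        allowed = tool in self.allowed_tools
        # Record every decision, allowed or denied, as later evidence.
        self.decisions.append({"tool": tool, "args": args, "allowed": allowed})
        return allowed


def run_tool_call(policy: ToolPolicy, registry: dict, tool: str, args: dict):
    # The model's output only requests a call; the policy decides.
    if not policy.authorize(tool, args):
        raise PermissionError(f"tool not permitted: {tool}")
    return registry[tool](**args)


# Hypothetical tools; run_shell is a stub and never executes anything.
registry = {
    "search_docs": lambda query: f"results for {query}",
    "run_shell": lambda command: "would execute on host",
}
policy = ToolPolicy(allowed_tools=frozenset({"search_docs"}))

ok_result = run_tool_call(policy, registry, "search_docs", {"query": "refund policy"})
try:
    run_tool_call(policy, registry, "run_shell", {"command": "cat /etc/passwd"})
    denied = False
except PermissionError:
    denied = True
```

The design point is that the check lives outside the prompt. An injected instruction can ask for run_shell, but it cannot change the allowlist, and the denied attempt is still in the decision log.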

The Lovable BOLA (broken object-level authorization) writeup, published the same week, traced how five API calls and 48 days of vibe-coded ownership checks compound into broken authorization. BOLA is not a single missing if statement. It is a graph problem across endpoints, resources, and identities. With AI-assisted coding, the rate of API surface growth goes up, while consistency of authorization logic often goes down.
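One common way to keep ownership checks consistent as endpoints multiply is to route every handler through a single loader that enforces ownership. The sketch below is illustrative only and is not taken from the Lovable writeup; PROJECTS, load_owned_project, and the handlers are hypothetical.

```python
class Forbidden(Exception):
    pass


# Hypothetical in-memory store standing in for a database.
PROJECTS = {
    "p1": {"owner": "alice", "name": "storefront", "secrets": ["key-a"]},
    "p2": {"owner": "bob", "name": "dashboard", "secrets": ["key-b"]},
}


def load_owned_project(user_id: str, project_id: str) -> dict:
    # The single place where object-level authorization is decided.
    project = PROJECTS.get(project_id)
    if project is None or project["owner"] != user_id:
        # Same error for "missing" and "not yours" avoids leaking existence.
        raise Forbidden(project_id)
    return project


def get_project_name(user_id: str, project_id: str) -> str:
    return load_owned_project(user_id, project_id)["name"]


def get_project_secrets(user_id: str, project_id: str) -> list:
    # A handler added later inherits the check instead of re-implementing it.
    return load_owned_project(user_id, project_id)["secrets"]


own_name = get_project_name("alice", "p1")
try:
    get_project_secrets("alice", "p2")  # alice asks for bob's project
    leaked = True
except Forbidden:
    leaked = False
```

When each handler writes its own check, the fifth endpoint is the one that forgets it. A shared loader turns the authorization graph into one edge to review.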

What they have in common

Four different shapes, one pattern: the AI infrastructure layer inherited classic web vulnerability classes (SQL injection, deserialization, BOLA) and added a new one (prompt-injection-as-RCE). The common failure mode is not “AI is insecure.” It is that the AI stack stitches together many trust boundaries (gateway, artifact pipeline, agent tools, internal APIs), and most teams still operate them as separate systems with separate owners, logs, and remediation loops.

What fragmented stacks miss

These CVEs expose a gap in how security teams work: each tool sees one link, while the risk sits in the chain.

Fragmented tooling consistently misses chains:

AppSec scanners that only look at the route handler miss the multi-call sequence in BOLA cases.
Runtime tools that only look at network exposure miss code-level deserialization in model load paths.
Compliance workflows that treat each finding as a row miss the dependency relationship that ties LiteLLM into every downstream model call.
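The dependency point in the last item can be sketched as a reachability query: given which services call which, find everything downstream of a vulnerable component. The service names and the DEPENDS_ON map are invented for illustration; a real inventory would come from deployment or tracing data.

```python
from collections import deque

# Hypothetical map: service -> components it calls directly.
DEPENDS_ON = {
    "chat-ui": ["llm-gateway"],
    "support-bot": ["chat-ui"],
    "batch-summarizer": ["llm-gateway"],
    "billing-api": ["postgres"],
}


def affected_by(component: str, depends_on: dict) -> set:
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first search over direct and transitive dependents.
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return seen


blast_radius = affected_by("llm-gateway", DEPENDS_ON)
```

A findings table with one row per CVE has no place to store this set. support-bot never calls the gateway directly, yet it belongs in the remediation scope.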

The practical outcome is slow response and weak evidence. You might patch one component, but still not know:

Which workloads actually hit the vulnerable code path.
Which identities could have exercised it.
Which model artifacts were loaded and where they came from.
Whether the “prompt injection” risk is reduced, or just moved.

What four lenses on one ingest change

When you collapse these events into a single operating record, four lenses become additive instead of siloed:

AppSec: Understand the vulnerability class and the code-level fix, including the path and the prerequisite conditions.
SecOps: Correlate runtime exposure, including which deployments, environments, and identities can reach the vulnerable path.
Compliance: Map each CVE to the relevant control requirement and attach evidence that the fix is deployed and verified.
AI runtime control: Enforce tool-use policy so “prompt injection” cannot become “host command execution,” and log the policy decisions.

That is what an operating model does. It turns four reports into one narrative of what happened, who decided, what changed, and what evidence remains for an auditor (or an incident review) on Monday.
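As a rough picture of a single operating record, the four lenses can be modeled as fields on one case object, so that a missing lens is visible as a gap rather than hidden in another system. This is a hypothetical data structure for illustration, not Cantina's schema; CaseRecord and its field names are assumptions.

```python
from dataclasses import dataclass, field

LENSES = ("appsec", "secops", "compliance", "ai_runtime")


@dataclass
class CaseRecord:
    cve_id: str
    appsec: dict = field(default_factory=dict)      # vulnerability class, fix
    secops: dict = field(default_factory=dict)      # deployments, identities
    compliance: dict = field(default_factory=dict)  # control mapping, evidence
    ai_runtime: dict = field(default_factory=dict)  # tool policy decisions

    def open_lenses(self) -> list:
        # A lens with no data yet is an unanswered question on the case.
        return [name for name in LENSES if not getattr(self, name)]


case = CaseRecord(
    cve_id="CVE-2026-42208",
    appsec={"class": "SQL injection", "precondition": "pre-authentication"},
)
remaining = case.open_lenses()
```

Here the AppSec view is filled in from the advisory, and the record itself shows that runtime exposure, control mapping, and runtime policy are still unanswered.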

How Cantina handles it

Cantina is built around that operating model: one ingest, four views.

In practice, that means:

Pulling the CVE, code context, and remediation into one case.
Connecting it to runtime reality (where it is deployed, who can reach it, what is actually exposed).
Producing an auditable evidence trail without making engineers re-enter the same information in three systems.
Adding AI runtime controls for the places where “input becomes instruction,” and policy needs to be enforced continuously.

See Cantina on your stack

Book a 30-minute demo. We will pick one real case in your stack and walk through it end to end: code, runtime exposure, remediation, and the evidence trail that proves the fix landed.