
Where AI-Generated Code Quietly Fails: A Field Note on False Negatives

The interesting AI-code findings are the ones the scanner says are clean. Static analysis catches syntactic defects (missing input validation, hardcoded credentials, dangerous functions). It does not catch flow-level defects, where the code is locally correct but the workflow is broken. The Lovable BOLA (publicly disclosed April 20, 2026, after sitting open for 48 days following a March 3 HackerOne report) is the canonical example: ownership checks existed on the primary path and were missing on the alternate paths, so any free account could read any user's source code, database credentials, and AI chat histories. Five API calls, eight million users, a $6.6 billion valuation, and every SAST scan returned clean. Cantina's AppSec lens traces the call graph end to end and surfaces the class of bug that pattern-matchers cannot see.

Key definitions

False-negative class: Vulnerabilities that pass static analysis because the code is locally correct (syntax, types, patterns) but the application's logical flow is broken across multiple functions.

BOLA (Broken Object Level Authorization): OWASP API Security Top 10 #1. An API that returns user-owned resources without verifying the requester owns the resource. The classic version: GET /api/projects/{id} returns the project regardless of who's asking.
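
A minimal sketch of that classic shape, written in Python purely for illustration. The in-memory PROJECTS store and both handler names are hypothetical stand-ins for a real route and table; the only difference between the two handlers is the object-level ownership check.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Project:
    id: int
    owner_id: int
    source: str


# Hypothetical in-memory table standing in for a real projects database.
PROJECTS: Dict[int, Project] = {
    1: Project(id=1, owner_id=100, source="alice's app"),
    2: Project(id=2, owner_id=200, source="bob's app"),
}


def get_project_vulnerable(requester_id: int, project_id: int) -> Tuple[int, Optional[Project]]:
    """BOLA: any caller who knows (or guesses) an id gets the object back."""
    project = PROJECTS.get(project_id)
    if project is None:
        return 404, None
    return 200, project


def get_project_checked(requester_id: int, project_id: int) -> Tuple[int, Optional[Project]]:
    """Same lookup, plus an object-level check that the requester owns the row."""
    project = PROJECTS.get(project_id)
    # Answer "not yours" the same way as "does not exist" so ids cannot be probed.
    if project is None or project.owner_id != requester_id:
        return 404, None
    return 200, project


# User 200 asks for user 100's project.
print(get_project_vulnerable(200, 1)[0])  # 200: leaked
print(get_project_checked(200, 1)[0])     # 404: denied
```

Returning 404 rather than 403 for someone else's object is one common choice: it avoids confirming that the id exists, at the cost of a less explicit error.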

Call-graph tracing: Walking every execution path from an entry point (route handler, API endpoint) through every function it touches, to verify properties like "does an ownership check fire on this path?" that no single-function scan can answer.
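
A minimal sketch of the idea, under loud simplifying assumptions: the call graph is a plain adjacency dict that has already been extracted (building it from real source is the hard part and is not shown), and `checks` is a hypothetical set of functions assumed to assert ownership in their own body before calling further down. A path counts as guarded if any function on it is in that set. The walk enumerates every acyclic path from the entry point to the data sink and returns the ones with no check.

```python
from typing import Dict, List, Set

CallGraph = Dict[str, List[str]]


def unguarded_paths(graph: CallGraph, entry: str, sink: str, checks: Set[str]) -> List[List[str]]:
    """Every acyclic path from entry to sink on which no function asserts ownership."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str], guarded: bool) -> None:
        path = path + [node]
        guarded = guarded or node in checks
        if node == sink:
            if not guarded:
                found.append(path)
            return
        for callee in graph.get(node, []):
            if callee not in path:  # do not loop on recursive calls
                walk(callee, path, guarded)

    walk(entry, [], False)
    return found


# Hypothetical extracted call graph: one route, two internal paths to the table.
GRAPH: CallGraph = {
    "route:/projects/{id}": ["show_project", "export_project"],
    "show_project": ["load_project"],
    "export_project": ["render_zip"],
    "render_zip": ["load_project"],
    "load_project": ["db:projects"],
}
CHECKS = {"show_project"}  # assumed: only show_project checks ownership

for path in unguarded_paths(GRAPH, "route:/projects/{id}", "db:projects", CHECKS):
    print(" -> ".join(path))
# route:/projects/{id} -> export_project -> render_zip -> load_project -> db:projects
```

Enumerating paths can blow up exponentially on large graphs; it is used here because returning the offending path makes the finding easy to read.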

The opening claim

The claim is simple, and it has always held: the interesting AI-code bugs are the ones the scanner says are clean.

The scanner is a good tool for what it does. Pattern-match for known-bad syntax. Flag the SQL string concatenation, the hardcoded API key, the unbounded loop. That layer has been solved for a decade and AI-generated code is mostly fine at it, because the LLM saw the same lint rules in training.

The class of defect that AI code is bad at is the layer above syntax: the relationships between functions. The ownership check that exists in one path and is missing in another. The permission boundary that holds for known callers and breaks for new ones. The authentication step that runs on the primary endpoint and gets skipped on the bulk variant.

Static analysis cannot tell you whether a user can access another user's data by changing an ID parameter. That limitation comes straight from the SAST literature, and it describes exactly the class of bug that just exposed Lovable's users.

What does a false-negative AI bug actually look like?

Bugs in this class pass static analysis because the syntax is fine, the type signatures match, and the surface-level patterns look correct. The defect lives in the flow between functions, not inside any single function.

Three patterns show up over and over in this class:

  • Asymmetric ownership checks. The check exists on the primary route handler and is missing on the bulk endpoint, the export endpoint, the share endpoint, or the v2 variant.
  • Permission boundaries that hold for known callers and break for new ones. Permissions were specified per-caller during initial development, and a later route handler reused the data access layer without re-asserting the permission.
  • Authentication that runs on the front door and gets skipped on the side door. The middleware wraps the main router but a secondary router (often a generated one) bypasses it entirely.

Each of these is invisible to a function-by-function scan because each individual function passes review on its own.
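
A toy model of the third pattern, with every name hypothetical: the middleware is applied to each handler as the main router mounts it, while a second router is mounted beside it without the wrapper. Each handler is correct in isolation; the hole only appears in how requests reach them. The `routes_open_to_anonymous` helper probes every mounted route with an unauthenticated request, which is one cheap way to find the side door.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]
Response = Tuple[int, str]
Handler = Callable[[Request], Response]


def require_auth(handler: Handler) -> Handler:
    """Middleware: reject any request without an authenticated user."""
    def wrapped(req: Request) -> Response:
        if req.get("user") is None:
            return 401, "unauthenticated"
        return handler(req)
    return wrapped


def list_projects(req: Request) -> Response:
    return 200, "project list"


def bulk_export(req: Request) -> Response:
    return 200, "every project, zipped"


# Front door: the main router wraps each handler as it is mounted.
main_router: Dict[str, Handler] = {"/projects": require_auth(list_projects)}
# Side door: a secondary (for example, generated) router mounted without the wrapper.
bulk_router: Dict[str, Handler] = {"/bulk/export": bulk_export}
ROUTES: Dict[str, Handler] = {**main_router, **bulk_router}


def dispatch(path: str, req: Request) -> Response:
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "not found"
    return handler(req)


def routes_open_to_anonymous(routes: Dict[str, Handler]) -> List[str]:
    """Probe every mounted route with an empty (unauthenticated) request."""
    return sorted(path for path, handler in routes.items() if handler({})[0] != 401)


print(routes_open_to_anonymous(ROUTES))  # ['/bulk/export']
```

Applying the middleware once in `dispatch`, rather than per router, would close the side door by construction, because no route could be reached without passing through it.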

The Lovable BOLA: the canonical example

The Lovable BOLA is the cleanest published field example of this class in 2026. What makes it brutal is how much effort surrounded the bug without catching it.

Lovable raised $330 million in a Series B. It crossed a $6.6 billion valuation, reached eight million users, and ran a HackerOne bug bounty program. It still left this bug open for 48 days after it was first reported.

The technical vulnerability: a Broken Object Level Authorization (BOLA) flaw in Lovable's API, the top risk in the OWASP API Security Top 10. Any free-account user could reach another user's profile, public projects, source code, database credentials, and full AI chat histories in as few as five API calls. The chat-history exposure compounded the damage, because developers paste error logs, business logic, and credentials into AI-coding chats, and that history persisted alongside the code.

The bug lived in the relationships between route handlers that shared the same data access layer. Ownership checks existed on the primary path but were missing on the alternate paths. Every SAST scan returned clean, and the bug surfaced only when a security researcher walked the call graph by hand.
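
A hypothetical sketch of that shape (not Lovable's code): two handlers read through the same repository, only one re-asserts ownership, and each function looks fine on its own. The `ScopedProjectRepo` variant shows one common mitigation, moving the ownership predicate into the data access layer so an alternate path cannot forget it; it is offered as a design option, not as the fix Lovable shipped.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical projects table: project 1 belongs to user 100.
ROWS: Dict[int, Row] = {1: {"owner": 100, "chats": ["pasted stack trace", "pasted .env file"]}}


class ProjectRepo:
    """Shared data access layer: fetches by id, knows nothing about the caller."""

    def __init__(self, rows: Dict[int, Row]) -> None:
        self._rows = rows

    def get(self, project_id: int) -> Optional[Row]:
        return self._rows.get(project_id)


def show_project(repo: ProjectRepo, user_id: int, project_id: int) -> Optional[Row]:
    row = repo.get(project_id)
    if row is None or row["owner"] != user_id:  # primary path: check present
        return None
    return row


def chat_history(repo: ProjectRepo, user_id: int, project_id: int) -> Optional[List[str]]:
    row = repo.get(project_id)  # alternate path: same layer, check never re-asserted
    return row["chats"] if row is not None else None


class ScopedProjectRepo:
    """Mitigation: every read requires a user id, and ownership is enforced here."""

    def __init__(self, rows: Dict[int, Row]) -> None:
        self._rows = rows

    def get_for(self, user_id: int, project_id: int) -> Optional[Row]:
        row = self._rows.get(project_id)
        return row if row is not None and row["owner"] == user_id else None


def chat_history_scoped(repo: ScopedProjectRepo, user_id: int, project_id: int) -> Optional[List[str]]:
    row = repo.get_for(user_id, project_id)
    return row["chats"] if row is not None else None


repo = ProjectRepo(ROWS)
print(show_project(repo, 200, 1))  # None: the primary path denies user 200
print(chat_history(repo, 200, 1))  # user 100's chats: the alternate path leaks
```

The trade-off is that the scoped repository makes "fetch by id without a user" impossible to express, which also blocks legitimate admin or background jobs unless they get their own explicit entry point.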

How did the Lovable disclosure timeline actually play out?

On March 3, 2026, a researcher filed the bug to Lovable's HackerOne bounty program under the title "Broken Object Level Authorization on Lovable API leads to unauthorized access to user data and project source code." Lovable marked it a duplicate of an existing report, flagged it "Informative," and closed it without escalation.

The vulnerability stayed open for 48 days. A public disclosure went up on April 20, 2026, after multiple researchers independently confirmed the issue. Lovable's public response on X shifted several times: it first claimed the company had "not suffered a data breach," then called the exposure "intentional behaviour," then said the documentation around what "public" projects implied was unclear, and finally said the HackerOne reports had been closed because the bug bounty partner thought viewing other users' chats was intended behavior.

The fix went out for new projects. Older projects (every project built before November 2025) stayed exposed for the full 48 days, with no notification to affected developers.

In other words, the patch was a config flag on the new-project path, not a fix to the routing layer that produced the bug in the first place.

For scale: employees at Nvidia, Microsoft, Uber, and Spotify reportedly had Lovable accounts tied to affected projects, so this was not a theoretical exposure on toy code.

Why does this class slip past every modern SAST?

Because the question the bug requires is not answerable from looking at any single function.

Static analysis looks for syntactic markers and inter-procedural patterns up to a few functions of depth. The Lovable bug was a graph property: "for every route handler that hits the projects table, does an ownership check fire?" That question requires walking the call graph end to end on every path that touches the resource, not pattern-matching the body of any single handler.
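
The quoted question can be phrased as plain graph reachability. Under the same simplifying assumptions as any toy model (a pre-extracted call graph as an adjacency dict, and a hypothetical set of functions assumed to assert ownership in their body), an unguarded path from a route to the table exists exactly when the table is still reachable after every ownership-checking function is removed from the graph. One breadth-first search per route answers it, and running it over every route that touches the table yields a coverage map.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

CallGraph = Dict[str, List[str]]


def reachable_avoiding(graph: CallGraph, entry: str, sink: str, blocked: Set[str]) -> bool:
    """True if sink is reachable from entry without passing through a blocked node."""
    if entry in blocked:
        return False
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for callee in graph.get(node, []):
            if callee not in seen and callee not in blocked:
                seen.add(callee)
                queue.append(callee)
    return False


def coverage_map(graph: CallGraph, routes: Iterable[str], sink: str, checks: Set[str]) -> Dict[str, bool]:
    """For each route that reaches the sink at all: True if every path to it is guarded."""
    result: Dict[str, bool] = {}
    for route in routes:
        if reachable_avoiding(graph, route, sink, set()):
            result[route] = not reachable_avoiding(graph, route, sink, checks)
    return result


# Hypothetical service: three routes touch the projects table, one does not.
GRAPH: CallGraph = {
    "GET /projects/{id}": ["show_project"],
    "GET /projects/{id}/export": ["export_project"],
    "GET /projects/{id}/chats": ["load_chats"],
    "GET /health": [],
    "show_project": ["load_project"],
    "export_project": ["load_project"],
    "load_project": ["db:projects"],
    "load_chats": ["db:projects"],
}
CHECKS = {"show_project"}  # assumed: only show_project asserts ownership
ROUTES = [name for name in GRAPH if name.startswith("GET ")]

print(coverage_map(GRAPH, ROUTES, "db:projects", CHECKS))
# {'GET /projects/{id}': True, 'GET /projects/{id}/export': False, 'GET /projects/{id}/chats': False}
```

This runs in time linear in the graph size per route but only answers yes or no; a path walk is still needed to show which route and call chain is unguarded.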

The empirical numbers back this up. Studies put static analysis at missing somewhere between 47% and 80% of vulnerabilities under test conditions, and authorization gaps are the primary class of miss: the logic looks correct (the code runs, the types match, the linter is happy) but the workflow is wrong. Inter-procedural vulnerabilities span three functions in a chain on average, and most SAST tools degrade sharply past two hops.
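
A toy illustration of why hop limits matter, not a model of how any particular SAST product works (real tools use function summaries and other techniques with their own trade-offs). A reachability search capped at two call edges never sees the data layer at the end of a three-function chain, so a bounded analysis has nothing to report about that flow. The function and graph names are hypothetical.

```python
from typing import Dict, List

CallGraph = Dict[str, List[str]]


def reaches_within(graph: CallGraph, entry: str, sink: str, max_hops: int) -> bool:
    """True if sink is reachable from entry using at most max_hops call edges."""
    if entry == sink:
        return True
    seen = {entry}
    frontier = [entry]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for callee in graph.get(node, []):
                if callee == sink:
                    return True
                if callee not in seen:
                    seen.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return False


# handler -> service -> repository -> table: three functions, three call edges.
CHAIN: CallGraph = {
    "route_handler": ["project_service"],
    "project_service": ["project_repo"],
    "project_repo": ["db:projects"],
}

print(reaches_within(CHAIN, "route_handler", "db:projects", 2))  # False
print(reaches_within(CHAIN, "route_handler", "db:projects", 3))  # True
```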

How widespread is the false-negative class in AI-generated code?

Across recent studies, the figures fall in a wide band: between 40% and 62% of AI-generated code samples contain at least one vulnerability. The Cloud Security Alliance's research came in at 62%. Earlier work on Copilot from NYU landed at roughly 40%. Other 2026 studies cluster in the 45% to 48% range. One 2026 industry report puts the figure at 92% for "at least one critical vulnerability," which is an outlier and worth flagging as such.

The number that matters is not the average across the studies. It is that this bug class is structurally invisible to the tools most teams run as their security backstop, while the class is being generated at machine speed by tools that ship 22% of the merged code in the average organization.

What changes when you trace the call graph instead of the line?

Walk every path from a route handler to the data access layer. Verify that ownership checks fire on each path. The Lovable BOLA pattern becomes visible the moment you do.

This is not a new idea. It is the standard for serious AppSec review. What is new is that AI code generation has made it the only review that catches the dominant new class of bug.

Cantina (AppSec plus OpSec plus everything in between) runs this lens by default. The AppSec lens traces the graph, not the line. It validates that ownership checks run on every path to the data layer, that auth steps cannot be bypassed by an alternate route, and that permission boundaries hold across the full surface of the service. The OpSec lens pairs each finding with a runtime correlation (is the path actually reachable in production?), and the compliance lens captures the audit trail of who could read what, without anyone copying screenshots into a spreadsheet.

The finding closes properly. It does not become a permanent open ticket.

What should I do about this in my own codebase this quarter?

If you are shipping AI-generated code at any volume, the move is not to tune your existing SAST rules harder. It is to ask, for each of your route handlers, whether the security review answers a graph-level question:

  • Does every path from this handler to a data table assert ownership?
  • Does the alternate path (the bulk endpoint, the export endpoint, the v2 variant) carry the same checks as the primary?
  • Does the middleware that runs on the front door also run on the side doors that route around it?
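
Those graph questions have a cheap dynamic complement: for every endpoint that takes a resource id, call it once as the owner and once as a different user, and flag any endpoint where both succeed. The sketch below models endpoints as plain functions; in a real codebase the same loop would send HTTP requests with two test accounts. Every endpoint and fixture name here is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Optional[object]]
Endpoint = Callable[[int, int], Response]  # (requester_id, project_id) -> (status, body)

# Hypothetical fixture: project 1 belongs to user 100.
OWNERS: Dict[int, int] = {1: 100}


def show(requester: int, project_id: int) -> Response:
    if OWNERS.get(project_id) != requester:
        return 404, None
    return 200, {"id": project_id}


def export(requester: int, project_id: int) -> Response:
    # Alternate path: checks existence, never ownership.
    if project_id not in OWNERS:
        return 404, None
    return 200, b"zip bytes"


def share(requester: int, project_id: int) -> Response:
    if OWNERS.get(project_id) != requester:
        return 403, None
    return 200, {"link": f"/s/{project_id}"}


ENDPOINTS: Dict[str, Endpoint] = {"show": show, "export": export, "share": share}


def cross_tenant_leaks(endpoints: Dict[str, Endpoint], owner: int, intruder: int, project_id: int) -> List[str]:
    """Names of endpoints that serve the owner's object to a different user."""
    leaks = []
    for name, endpoint in endpoints.items():
        owner_status, _ = endpoint(owner, project_id)
        intruder_status, _ = endpoint(intruder, project_id)
        # Flag only when the owner succeeds (so the probe is meaningful) and the intruder does too.
        if owner_status == 200 and intruder_status == 200:
            leaks.append(name)
    return leaks


print(cross_tenant_leaks(ENDPOINTS, owner=100, intruder=200, project_id=1))  # ['export']
```

A probe like this only covers the endpoints someone remembered to list, which is why it complements tracing the call graph rather than replacing it.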

If your security review is "the SAST returned clean," the answer to those questions is unknown, and the bug class that exposed Lovable's users is sitting in your codebase right now, waiting for the same kind of hand-walk that surfaced it there.

See the call graph on your stack

Book a 30-minute platform tour at cantina.security. The tour walks the call graph with your engineering lead on a real handler from your repo, including the ownership-check coverage map across every path, and surfaces live every flow-level finding that your current SAST passed as clean.

FAQ

What is a false negative in static analysis?

A false negative is a vulnerability that the scanner did not flag, even though the bug is real and exploitable; it is the opposite of a false positive. Empirical studies put SAST tools at missing between 47% and 80% of vulnerabilities under test conditions, with authorization and business-logic flaws as the dominant miss category.

Why did the Lovable BOLA pass every SAST scan?

The bug was a property of the call graph, not any single function. Ownership checks existed on some paths and were missing on others, and most SAST tools scan function-by-function or with limited inter-procedural depth, so the asymmetry across paths was invisible at the scanner level.

How long was the Lovable vulnerability open?

Forty-eight days. A researcher reported it to Lovable's HackerOne program on March 3, 2026. The report was marked as a duplicate, flagged "Informative," and closed without escalation. Public disclosure went up on April 20, 2026.

What was actually exposed in the Lovable BOLA?

Source code, database credentials, and AI chat histories tied to every project created before November 2025. Any free-account user could reach this data in as few as five API calls.

What percentage of AI-generated code contains vulnerabilities?

Across major studies, the figure lands between 40% and 62%. The Cloud Security Alliance's research found 62%. NYU's earlier Copilot work found ~40%. Other 2026 studies cluster in the 45-48% range. One industry report puts the figure at 92% for "at least one critical vulnerability," which is the outlier.

How does Cantina catch flow-level defects that SAST misses?

By tracing the full call graph from a route handler to the data layer and verifying graph-level properties (ownership checks, auth steps, permission boundaries) on every path. The finding is then paired with a runtime exposure check and a compliance audit trail in the same flow.