Apex Found a 13-Year-Old Bug in WebKit. Apple Patched It Yesterday.

Identified by Cantina's autonomous AppSec Agent, Apex.

Thirteen years.

That is how long a memory safety vulnerability sat inside WebKit, in code reviewed continuously by the best security researchers on the planet, before Apex surfaced it. Apple shipped the fix yesterday in iOS 26.5 and iPadOS 26.5, alongside two more Apex findings in the same advisory.

Key facts

  • What happened: Apex, Cantina's autonomous AppSec agent, identified a vulnerability in WebKit that had been resident in the codebase for thirteen years. Apple patched it in iOS 26.5 and iPadOS 26.5 on May 12, 2026.
  • Why it matters: WebKit is one of the most-audited codebases on the planet. A bug that survived thirteen years of scrutiny by Project Zero, dedicated browser-security teams, academic research, and Apple's own internal review, and that was then surfaced by an autonomous agent, is what AI-driven application security looks like when it stops promising and starts shipping.
  • The CVE numbers: CVE-2026-43660 (the thirteen-year-old memory safety issue), CVE-2026-28907 (a Content Security Policy bypass), and CVE-2026-28958 (a second, independent CSP bypass). All three are credited to Cantina in the same Apple advisory.
  • What to do: Update every iPhone, iPad, and managed iOS device in your fleet to iOS 26.5 / iPadOS 26.5. Audit any application your team ships that embeds a WebKit-based webview. Assume your own production codebase contains older, deeper bugs than your humans have surfaced.
  • What Apex is: Cantina’s autonomous AppSec agent that reads code the way a senior security engineer reads code, on a budget no security engineer has. It generates security hypotheses about a codebase, investigates them across the architecture, falsifies the ones that fail, and reports the ones that survive.

What is WebKit?

WebKit is the browser engine that renders web content on every iPhone, iPad, Apple Watch, Apple TV, and Mac running Safari. By Apple's platform policy, every browser shipped on iOS, including Chrome and Firefox, has historically run on WebKit. The consumer install base is measured in the billions.

The codebase is mostly C++, with hand-written assembly in the JavaScriptCore JIT tiers. Its history traces back through KHTML, the KDE rendering engine Apple forked in 2001 to build Safari. Blink, the engine that powers Chromium, forked from WebKit in 2013. A browser engine's job is to turn untrusted input from the network into pixels and behavior. Hostile input by default.

That is the surface Apex was working on.

Why a thirteen-year dwell time matters

A 13-year dwell time means the vulnerable WebKit code likely landed around 2013 and survived heavy, continuous scrutiny.

That an autonomous agent surfaced it in 2026 is the signal.

How to read Apple's advisory

Apple's iOS 26.5 release notes credit Cantina for three WebKit findings in the same advisory. A single credit line for a single CVE is the standard pattern. Three CVE credits to the same source in a single advisory is unusual.

The three entries credited to Cantina in this release are, in descending order of severity and age:

  1. The long-resident memory safety issue, CVE-2026-43660.
  2. A Content Security Policy bypass, CVE-2026-28907.
  3. A second, independent Content Security Policy bypass, CVE-2026-28958.

The two CSP findings sit in the policy enforcement layer. The long-resident finding sits deeper, in the rendering and content-handling layer. The pair-and-singleton shape is itself a story worth walking through.

What is CVE-2026-43660?

CVE-2026-43660 is the thirteen-year-old WebKit vulnerability that Apex identified and Apple patched in iOS 26.5 and iPadOS 26.5 on May 12, 2026. The bug is a memory safety issue at the seam between two WebKit subsystems, each of which had been assuming the other was enforcing an invariant that, after a decade of refactors, no longer held. The specific interaction between those two subsystems had not been exercised by any prior auditor or harness in a way that surfaced it.

The line of code that contains the root cause is not exotic. The kind of code that produces these failures is exactly the kind of code that gets reviewed when a security engineer says "let me audit the parsing path." What survives audit is the interaction. A type assumption made in one component, fifteen function calls upstream from where the value is finally used, passes through enough refactors and feature additions that the upstream guarantee no longer holds. The downstream code still trusts the guarantee. The trust gap is invisible from either end. You can only see it if you reason about the entire path at once.

Apex reasons about the entire path at once.

For this writeup, the bug class is best described at the level of "the kind of memory safety failure that sits at the seam between two subsystems that each assume the other is enforcing an invariant." This is the most common shape of long-resident bug in mature C++ codebases. It is also the most expensive shape of bug to find manually, because the function that contains the bug looks safe in isolation, and the function that violates the precondition does not look like it is talking to security-sensitive code. Every modern static analyzer that finds bugs of this shape relies on inter-procedural analysis that crosses architectural boundaries. The pre-LLM generation of those analyzers does it with hand-tuned heuristics. The current generation does it with reasoning over an LLM's representation of the call graph and the type invariants.

The fix in the advisory is what you would expect: the precondition is now enforced at the upstream entry point as well as the downstream consumer, and a sanity check has been added at the seam. Defense-in-depth, applied in the right places. The patch is small. The path to finding it was not.
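To make the bug class concrete, here is a minimal Python sketch of a seam invariant and the defense-in-depth shape of fix described above. It is a generic illustration, not WebKit code and not the actual vulnerability. `Record`, `parse_record`, and `read_record` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A range inside a buffer, handed from one subsystem to another."""
    offset: int
    length: int


def parse_record(offset: int, length: int, buffer_size: int) -> Record:
    """Upstream entry point: enforce the precondition before handing the range on."""
    if offset < 0 or length < 0 or offset + length > buffer_size:
        raise ValueError("record range falls outside the buffer")
    return Record(offset, length)


def read_record(buffer: bytes, record: Record) -> bytes:
    """Downstream consumer: re-check at the seam instead of trusting the caller."""
    end = record.offset + record.length
    if record.offset < 0 or record.length < 0 or end > len(buffer):
        raise ValueError("invariant violated at the seam")
    return buffer[record.offset:end]


buffer = b"hello world"
print(read_record(buffer, parse_record(6, 5, len(buffer))))

# A later refactor might build a Record without going through parse_record.
# The downstream check still catches the bad range.
try:
    read_record(buffer, Record(6, 50))
except ValueError as error:
    print("rejected:", error)
```

In Python an unchecked slice would merely truncate. In C++ the equivalent unchecked read is where memory corruption comes from, which is why the check belongs on both sides of the seam.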

For users running iOS 26.4 or earlier, the practical question is exposure. Triggering this class of bug requires content delivered through any surface that reaches the affected WebKit path, which means web content, embedded webviews inside apps, certain types of preview rendering, and message rendering surfaces that rely on WebKit. The patched release ships the fix across all of those entry points.

Update.

The population of unpatched devices is large, the affected paths are reachable from web content, and the bug class is one where a single artifact, even a high-level one, materially reduces the work an attacker would need to do. Responsible disclosure does not stop at the patch ship date. It stops when the upgrade curve flattens.

What are CVE-2026-28907 and CVE-2026-28958?

CVE-2026-28907 and CVE-2026-28958 are two independent Content Security Policy bypass vulnerabilities in WebKit, both identified by Apex and patched in iOS 26.5 and iPadOS 26.5 on May 12, 2026. They sit in a different layer of WebKit than the memory safety issue above. They are policy enforcement bugs, not memory bugs. The Content Security Policy header that a website sends down with its response is meant to be a contract about what the page is allowed to do. When the browser's enforcement of that contract has a hole, the page becomes more attackable than the developer believed it was.

The two findings live in different code paths that handle CSP evaluation for different content situations, and they are not duplicates of the same underlying gap. The reason they appear in the same advisory is that Apex worked through the CSP enforcement layer methodically as part of the same engagement, and surfaced both before reporting.

CSP bypasses tend to land in one of a few buckets:

  1. Resource-loading bypasses, where a directive that should block a class of network requests fails to do so because the URL or content type is matched against the policy too permissively.
  2. Inline-execution bypasses, where the rules around unsafe-inline, nonces, and hashes are not applied consistently to every place that can produce inline behavior in the page.
  3. Navigation bypasses, where a directive controlling where the page can navigate, or what frame ancestors it can be embedded under, fails on edge-case URL forms.
  4. Reporting-only bypasses, where the policy is nominally enforced but a parallel code path produces the side effects the policy was meant to prevent.
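As a generic illustration of the first bucket, the sketch below contrasts a permissive string-prefix match with a comparison of parsed URL components. It is not the logic behind either CVE and not how WebKit matches sources. `naive_allows` and `origin_allows` are hypothetical helpers, and real CSP source matching also handles wildcards, paths, and default ports, which this sketch ignores.

```python
from urllib.parse import urlsplit


def naive_allows(url: str, allowed_source: str) -> bool:
    """Flawed check: treats the allowed source as a plain string prefix."""
    return url.startswith(allowed_source)


def origin_allows(url: str, allowed_source: str) -> bool:
    """Stricter check: compare scheme, host, and port as parsed components."""
    target, allowed = urlsplit(url), urlsplit(allowed_source)
    return (target.scheme, target.hostname, target.port) == (
        allowed.scheme,
        allowed.hostname,
        allowed.port,
    )


allowed = "https://cdn.example.com"
lookalike = "https://cdn.example.com.evil.test/x.js"

print(naive_allows(lookalike, allowed))   # the prefix match lets the lookalike host through
print(origin_allows(lookalike, allowed))  # the component match rejects it
```

The point is the shape of the failure: the check looks reasonable in isolation and only fails on an input form the author did not have in mind.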

For application owners who rely on CSP as part of their defense-in-depth, the takeaway is the same one that always applies to CSP. It is a powerful tool that depends on the controls underneath it. The page that ships CSP is still responsible for input sanitization, output encoding, and a sane trust boundary around third-party scripts. CSP catches things that would otherwise hit the user. It does not eliminate the need for the layers below.

For end users, again, update. The patched browser engine enforces the policy correctly across the paths that the two CVEs cover. The pre-patch versions did not.

What is Content Security Policy?

Content Security Policy (CSP) is an HTTP response header that lets a website tell the browser which resources the page is allowed to load, which origins it can connect to, and under what conditions inline scripts can run. The browser is responsible for enforcing the policy. When CSP enforcement is correct, the page behaves inside the envelope the developer declared. When the enforcement has a bug, the page is more attackable than the developer believes.

CSP defines several directives. script-src controls where JavaScript can load from. style-src controls stylesheets. img-src controls images. connect-src controls fetch and XHR endpoints. frame-src controls embedded iframes. default-src provides a fallback for any fetch directive not otherwise specified.
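A small Python sketch shows how a policy string breaks down into directives and how the default-src fallback resolves. The header value and the helper names `parse_csp` and `effective_sources` are made up for illustration. The sketch simplifies the specification: for example, it skips the child-src step that frame-src falls back through in CSP Level 3.

```python
from typing import Dict, List, Optional

# Directives named in the text that fall back to default-src (simplified).
FETCH_DIRECTIVES = {"script-src", "style-src", "img-src", "connect-src", "frame-src"}


def parse_csp(header_value: str) -> Dict[str, List[str]]:
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    policy: Dict[str, List[str]] = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # If a directive is repeated, the first occurrence wins.
        policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def effective_sources(policy: Dict[str, List[str]], directive: str) -> Optional[List[str]]:
    """Return the source list that governs a directive, or None if unrestricted."""
    if directive in policy:
        return policy[directive]
    if directive in FETCH_DIRECTIVES:
        return policy.get("default-src")
    return None


policy = parse_csp("default-src 'self'; script-src 'self' https://cdn.example.com; img-src *")
print(effective_sources(policy, "script-src"))   # explicit directive
print(effective_sources(policy, "connect-src"))  # falls back to default-src
```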

A site that ships a strict CSP is making an upfront declaration: "I know what I want my page to be able to do, and I want the browser to refuse to do anything outside that envelope." The motivation, historically, was cross-site scripting. XSS is the bug class where a page accidentally renders attacker-controlled content as executable script. CSP raises the cost of XSS dramatically by refusing to execute scripts the policy did not authorize.

The model worth carrying is that CSP is the browser keeping a promise on behalf of the page author. The page author writes the promise into the header. The browser is responsible for enforcing it. A CSP bypass is a moment where the browser breaks the promise without telling anyone. The page author believes the policy is being enforced. The actual behavior is permissive.

WebKit was breaking the promise. The patch keeps the promise again.

There is a secondary reason CSP enforcement bugs are interesting for an autonomous agent to work on. The CSP enforcement code in WebKit, and in every browser engine, is a layer that sees almost every navigation, every script tag, every resource load, every dynamic element insertion. The amount of state and the number of code paths that funnel through CSP evaluation are enormous. The code is also, by necessity, full of special cases for legacy compatibility with policy directives that have been added over the years. CSP is one of those layers where the surface area outruns the test coverage almost by definition. Anywhere the surface area outruns the coverage, an agent that can reason about the whole policy specification while reading the implementation has structural leverage.

Both findings in this release are examples of that leverage being applied.

How did Apex find a thirteen-year-old bug in WebKit?

Apex found the thirteen-year-old WebKit bug by running a four-phase autonomous loop against the codebase. The phases are scoping (build a map of the attacker-reachable surface), hypothesis generation (propose candidate vulnerabilities of the form "if input X reaches code path Y, security property Z fails"), investigation (trace each surviving hypothesis through the code and try to falsify it), and confirmation (reproduce, minimize, and characterize impact). The loop carries state between phases, which is what makes long-resident bugs reachable.

Apex is Cantina's autonomous AppSec agent. The framing that does most of the explaining: Apex reads code the way a security engineer reads code, on a budget that no security engineer has.

The operating loop of an Apex engagement against a codebase like WebKit has four broad phases. The phases are not novel as concepts. The novelty is in how they compose, how cheap each iteration becomes, and what kinds of hypotheses the agent can carry across phases without losing the thread.

Phase 1: scoping

The agent takes the target codebase and produces a representation of the surface that an attacker can reach. For WebKit, that means identifying every input parser (HTML, CSS, JavaScript, image decoders, font shapers, audio frame readers, WebAssembly validators), every cross-process boundary, every security-sensitive layer (CSP enforcement, same-origin checks, sandbox enforcement, privilege downgrades), and the call graph that connects them. The output of this phase is not a list of files. It is an annotated map of the surface, with the trust boundaries marked.
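One way to picture an annotated surface map is a call graph with the attacker-reachable nodes computed from an entry point and the security-sensitive ones flagged. The sketch below is an assumption-heavy toy, not Apex's representation. Every node name in it is invented.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical call graph: caller -> callees.
CALL_GRAPH: Dict[str, List[str]] = {
    "network_input": ["html_parser", "image_decoder"],
    "html_parser": ["csp_check", "dom_builder"],
    "image_decoder": ["pixel_buffer"],
    "csp_check": ["script_loader"],
    "dom_builder": [],
    "pixel_buffer": [],
    "script_loader": [],
    "settings_ui": ["pixel_buffer"],  # not reachable from network input
}
SECURITY_SENSITIVE: Set[str] = {"csp_check", "pixel_buffer"}


def reachable_from(graph: Dict[str, List[str]], entry: str) -> Set[str]:
    """Breadth-first search over the call graph from an attacker-controlled entry."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def annotated_surface(graph: Dict[str, List[str]], entry: str, sensitive: Set[str]) -> Dict[str, bool]:
    """Map each reachable node to whether it is marked security-sensitive."""
    return {node: node in sensitive for node in sorted(reachable_from(graph, entry))}


for node, is_sensitive in annotated_surface(CALL_GRAPH, "network_input", SECURITY_SENSITIVE).items():
    print(node, "(sensitive)" if is_sensitive else "")
```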

Phase 2: hypothesis generation

Against the map, the agent generates candidate vulnerabilities of the form "if input shape X reaches code path Y, then security property Z fails." A useful hypothesis is one that is concrete enough to test and broad enough to cover real attack surface. Most hypotheses are not useful in this sense, and a good agent generates many more than it pursues. The hypothesis generation step is where reasoning over the architecture of the codebase pays off. An agent that has internalized the policy of CSP can generate "what if a directive applied to a navigation context behaves differently when the navigation is to a blob URL" without being told to look there.

Phase 3: investigation

For each hypothesis the agent decides is worth pursuing, it traces the code path, evaluates the type invariants and runtime conditions, and tries to falsify the hypothesis. The falsification is the important step. Bad bug hunting tools confirm hypotheses. Good ones falsify them, and only stop when a hypothesis survives falsification. The output of this phase, for the hypotheses that survive, is a candidate finding with a concrete trigger condition.

Phase 4: confirmation and triage

Candidate findings get reproduced, the conditions get minimized, and the impact gets characterized. The agent files what it has confirmed and discards what it could not.

Most hypotheses die in phase three. The phase-three death rate is the metric an honest agent should be evaluated on, because high false-positive output is the failure mode that killed the prior generation of automated security tooling and still kills most of what is being sold as "AI for security" today.
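The investigate-and-confirm steps, and the phase-three death rate, can be sketched in a few lines of Python. This is an illustration of the loop's shape under stated assumptions, not Apex's implementation. The `Hypothesis` fields, the `investigate` function, and the stubbed falsification and reproduction callbacks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    input_shape: str       # X: what the attacker sends
    code_path: str         # Y: where that input ends up
    property_at_risk: str  # Z: the security property that would fail


def investigate(
    hypotheses: List[Hypothesis],
    is_falsified: Callable[[Hypothesis], bool],
    reproduces: Callable[[Hypothesis], bool],
) -> Tuple[List[Hypothesis], float]:
    """Drop falsified hypotheses, confirm the survivors, and report the death rate."""
    survivors = [h for h in hypotheses if not is_falsified(h)]
    confirmed = [h for h in survivors if reproduces(h)]
    death_rate = 1 - len(survivors) / len(hypotheses) if hypotheses else 0.0
    return confirmed, death_rate


candidates = [
    Hypothesis("blob: URL navigation", "csp_navigation_check", "navigation directive holds"),
    Hypothesis("oversized length field", "record_reader", "reads stay in bounds"),
    Hypothesis("nested iframe", "frame_ancestors_check", "embedding restriction holds"),
]
# Stub: pretend investigation showed these paths enforce their property.
guarded_paths = {"csp_navigation_check", "frame_ancestors_check"}

confirmed, death_rate = investigate(
    candidates,
    is_falsified=lambda h: h.code_path in guarded_paths,
    reproduces=lambda h: True,  # stub: every survivor reproduces
)
print(confirmed, death_rate)
```

In a real system the two callbacks are where all the work lives. The sketch only shows that falsification runs first and that only survivors reach confirmation.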

What an autonomous loop adds, compared to a static analyzer or a fuzzer, is the ability to hold a multi-step hypothesis across the architecture and reason about it. Static analyzers are excellent at single-procedure analysis and at certain classes of inter-procedural analysis with hand-tuned rules. Fuzzers are excellent at producing input variations and observing crashes. Neither is good at "this property holds because of an assumption that was true when this code was written in 2013, and the assumption was implicitly broken by a refactor in 2018 that the original author did not notice." The long-resident WebKit bug lives in exactly that category. The hypothesis-and-investigate loop is the right shape for finding bugs of that category.

What is an autonomous AppSec agent?

An autonomous AppSec agent is a software system that generates security hypotheses about a codebase, investigates those hypotheses across the architecture, falsifies the ones that fail, and reports the ones that survive. It carries state between investigation steps, builds and updates its own internal representations of the code (call graphs, type tables, invariant lists), and operates on the codebase the way a senior security engineer would, but at a per-iteration cost that no engineer can match.

The four categories worth keeping separate are static analyzers, fuzzers, chat assistants, and autonomous agents.

The honest version of where each category sits in 2026 is that static analyzers have gotten better, fuzzers have gotten better, chat assistants have made triage faster, and agents are now producing findings against codebases that the prior categories did not produce findings against. The WebKit result is the most legible example of that shift so far, sitting on top of earlier Apex findings in widely-used code that did not break out of the security press.

Apex is in the fourth category by design.

What this finding signals about AI-driven AppSec in 2026

The case for AI-driven application security has been argued, in conferences and product pages and venture decks, for several years. Every argument has a version that runs "machines can read code at scale and reason about it the way humans do, only faster." Every argument has had a counter that runs "every prior generation of automated tooling promised this and produced false-positive farms."

The way these arguments get resolved is by results, not by rhetoric. A finding in a well-audited codebase that the prior generation of tooling did not produce is the kind of result that resolves an argument.

What the WebKit result signals, specifically:

First, the floor moved. Autonomous agents can now reach into the most-audited code in commercial computing and produce findings that survived everything that came before. The floor for what an agent can reach is now higher than the floor for human-plus-static-analysis.

Second, architecture matters. An agent that runs a single chat-style prompt over a codebase does not produce this result. A structured hypothesis-and-investigate loop with persistent state and architecture-level reasoning does. Buyers in this category should ask to see the loop.

Third, this separates signal from noise. Every security vendor is launching an AI feature and every AI vendor is launching a security demo. Three Apex findings in one Apple advisory, with credit lines anyone can verify, is what production output looks like.

The fourth signal is what this means for everyone else's codebase. That deserves its own section.

If a bug like this is hiding in WebKit, it's hiding in your code

Every mature codebase has seams between subsystems where each side assumes the other is enforcing an invariant, and those seams hide bugs that only show up when something reasons about the full path at once.

Auditing those seams manually is expensive, which is why most codebases have not done it.

The WebKit result is the public proof, on the most-read code in commercial computing, that an agent can find the bugs human reviewers cannot.

The agent that surfaced a thirteen-year-old WebKit bug is the same kind of agent that can find the bug nobody on your team has had the architectural reasoning budget to reach.

The teams that run that audit first will be running it against bugs that have been latent for years.

Recommended action

The practical actions are split into immediate, near-term, and structural.

Immediate: update every device under management to iOS 26.5 and iPadOS 26.5. The patches shipped on May 12, 2026. Standard MDM rollouts apply.
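For the immediate step, a fleet check can be as simple as comparing reported OS versions against the patched release. The Python sketch below assumes a hypothetical inventory export with `name` and `os_version` fields, and it assumes 26.5 is the only patched release, as stated above. Real MDM tooling will have its own query interface.

```python
from typing import Dict, List, Tuple

PATCHED_VERSION: Tuple[int, ...] = (26, 5)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '26.4.1' into (26, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def needs_update(os_version: str, patched: Tuple[int, ...] = PATCHED_VERSION) -> bool:
    """True if the version is older than the patched release."""
    version = parse_version(os_version)
    # Pad with zeros so that '26.5' and '26.5.0' compare as equal.
    width = max(len(version), len(patched))
    padded_version = version + (0,) * (width - len(version))
    padded_patched = patched + (0,) * (width - len(patched))
    return padded_version < padded_patched


# Hypothetical inventory rows.
fleet: List[Dict[str, str]] = [
    {"name": "ipad-frontdesk", "os_version": "26.4.1"},
    {"name": "iphone-oncall", "os_version": "26.5"},
]
for device in fleet:
    if needs_update(device["os_version"]):
        print("update required:", device["name"])
```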

Near-term: audit the embedded WebKit surfaces in applications your organization owns or operates. iOS apps that ship a webview as part of their authentication flow, their help center, their preview rendering, their email rendering, or their in-app browser are running WebKit. The patch reaches those surfaces when the device updates. Verify that the apps in your fleet are not relying on a custom WebKit fork or an older system version that lags the patch ship.

Structural: take the implication seriously. A codebase audited at any fraction of WebKit's intensity almost certainly contains older, deeper bugs than its current toolchain has surfaced. Decide, with the budget and timeline that decision deserves, whether you are going to find those bugs before someone else does. The cost to find them has dropped.

Acknowledgements

Apple Product Security and the WebKit team handled the coordination on these findings the way they always handle coordination: fast, careful, and professional. The fixes are clean, the credit is accurate, and the advisory communicates what users need to know without giving attackers what they need to know. The bar for browser security coordination is high, and Apple sets it.

The Cantina team that worked on these findings, both the engineers building Apex and the security researchers reviewing its output, did the part that makes any autonomous tool real, which is the careful triage, reproduction, and disclosure work that never gets a credit line in an advisory. The credit line is theirs too.

Frequently asked questions

What is Apex?

Apex is Cantina's autonomous AppSec agent. It reviews codebases for security vulnerabilities by generating hypotheses about how security properties might fail, investigating those hypotheses across the architecture, falsifying the ones that fail, and reporting the ones that survive. The thirteen-year-old WebKit finding is one in an ongoing series of Apex-discovered vulnerabilities in widely used software.

Who is Cantina?

Cantina (cantina.security) is the security company that builds and operates Apex. Cantina is part of the Spearbit ecosystem of security research and assessment. Cantina's full platform covers application security (the Apex agent), security operations, compliance, and runtime controls for AI agents.

How old is the WebKit vulnerability Apex found?

Thirteen years. The vulnerable code path was introduced into WebKit around 2013 and remained in the codebase, in code that was reviewed and refactored continuously, until Apex surfaced it in 2026. Apple patched it in iOS 26.5 and iPadOS 26.5 on May 12, 2026.

Which CVEs were credited to Cantina in Apple's iOS 26.5 advisory?

Three CVEs, all in WebKit, all in the same Apple advisory: CVE-2026-43660 (the thirteen-year-old memory safety issue), CVE-2026-28907 (a Content Security Policy bypass), and CVE-2026-28958 (a second, independent CSP bypass).

What should I do as an end user?

Update every iPhone and iPad you own or manage to iOS 26.5 or iPadOS 26.5. The patches are available now through the standard software update path.

What should I do as a developer or security engineer?

If your applications embed a WebKit-based webview (for in-app browsing, OAuth flows, help center rendering, preview rendering, or email rendering), verify those surfaces are not pinned to an older system version that lags the patch ship. If you operate a large codebase, the implication of this finding is that older, deeper bugs than your current toolchain has surfaced almost certainly exist in your code, and the cost of finding them with an autonomous agent has dropped.

Is exploit code or a proof of concept published?

No. The population of devices on the previous iOS release is still in the hundreds of millions. This writeup stays deliberately generic about the inputs that trigger the long-resident bug. Responsible disclosure does not stop at the patch ship date. It stops when the upgrade curve flattens.

What is Content Security Policy and why do the two CSP bypasses matter?

Content Security Policy (CSP) is an HTTP response header that tells the browser what a web page is allowed to do. A CSP bypass is a moment where the browser breaks the policy without telling anyone, so the page becomes more attackable than the developer believed it was. Two independent CSP bypasses fixed in the same advisory is unusual, and it reinforces the value of treating CSP as one layer in a stack, not as a substitute for the controls underneath it.

Thirteen years was the warmup

Thirteen years means thirteen years of every other tool in the security industry looking at this code and not finding this bug.

The work that produced the find did not need a bigger team, a longer engagement, or a more skilled human reviewer. It needed an agent that could carry the architecture in its head, generate hypotheses that crossed component boundaries, and falsify the wrong ones cheaply enough to keep going until the right one survived.

That is what shipped on May 12, in three credit lines on an Apple advisory page.

The same loop is already running against other widely-used code. The next advisories will tell us how fast.

Apex is Cantina's autonomous AppSec agent. The three CVEs above are part of an ongoing series of Apex findings in widely-used codebases.