Cantina Threat Discovery: Race Condition in Ruby Core

Ruby 4.0.5 shipped on 2026-05-20 with a single-purpose security fix. The bug is CVE-2026-46727, a use-after-free race in Ruby's pthread-based DNS resolver. It crashes the Ruby process when an attacker can stall DNS responses for a hostname that the process is resolving with a timeout. Apex (Cantina's AppSec agent) found it in code that Ruby's own committers had already reviewed and publicly patched once. The patch, written by Ruby committer shioimm, went out the same day.
Versions affected: Ruby 4.0.0 through 4.0.4. Ruby 3.4 series and earlier are not affected.
The impact is bounded. This matters if your Ruby app resolves hostnames that an attacker can influence (webhooks, partner APIs, user-supplied URL fetchers). Housekeeping otherwise.
Ruby Core Is Already Heavily Scrutinized
Ruby's ext/socket/ extension sits on the path for a large share of Ruby software. It has been reviewed repeatedly over the years by core contributors and by engineers running Ruby at large scale.
In late 2023, the Ruby team added a feature to make DNS name resolution interruptible (Feature #19965). Before that, a hung DNS server could freeze a Ruby process indefinitely, because the underlying C call (getaddrinfo) was not cancellable. The fix was to spawn a worker pthread to do the DNS lookup, while the calling Ruby thread waited with a timeout and could give up if the worker took too long.
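As a rough Ruby-level analogy of that design, the sketch below runs a lookup in a worker thread and lets the caller give up after a deadline. The real implementation is C code in ext/socket/ that uses pthreads directly, so this only models the shape of the pattern. The method name `lookup_with_deadline` and the `:gave_up` marker are illustrative and not part of Ruby's API.

```ruby
# Analogy only: models "a worker does the lookup, the caller waits with a timeout".
def lookup_with_deadline(timeout_seconds, &lookup)
  worker = Thread.new(&lookup)
  # Thread#join(limit) returns nil if the worker is still running after
  # `limit` seconds, and the thread itself once it has finished.
  if worker.join(timeout_seconds)
    worker.value # the lookup finished in time; return its result
  else
    :gave_up     # the caller walks away; the worker is still running
  end
end
```

The `:gave_up` branch is the one that matters for this CVE: the worker keeps running after the caller has moved on, so anything the two share has to stay valid until the worker is done with it.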
Within weeks of the feature shipping, byroot at Shopify saw the same code path crash production CI under load and reported it (note #4 on the original feature). The Ruby team patched that crash a few hours later (commit d0066211f2). The code went into Ruby 3.3 in December 2023 and has been in every Ruby release since.
This is the code Apex analyzed next. Reviewed once, patched once, run at scale by some of the largest Ruby deployments on the internet. Apex found a second bug in it.
What Apex Was Looking For
When Apex analyzes a language runtime, it focuses on its recently added concurrency. New concurrency in old C code is the highest-yield source of memory-safety regressions, because the surrounding code's conventions were written for the previous concurrency model, and the new code has to invent its own. Apex tracks every object that crosses a thread boundary, follows who allocates it, who writes to it, who frees it, and asks what happens at every exit path the calling thread can take.
The pthread DNS resolver was flagged. It is C code in the standard library; it owns a shared object that crosses a thread boundary, and the calling thread has an explicit "give up" path. Byroot's 2023 segfault on the same code was already a public signal that the concurrency contract was not airtight. That history is the kind of input Apex weighs when ranking where to spend audit time.
The question Apex pursued was narrow: when the calling thread gives up on the worker and walks away, who still has a pointer to the shared state, and what stops them from writing into it after it has been freed?
What's Actually Wrong
Socket.tcp on Ruby 4.0.x resolves hostnames using Happy Eyeballs v2 (RFC 8305). Two child pthreads run in parallel, one resolving the IPv4 record and one resolving IPv6, and whichever returns first wins. Both threads write into a struct that the main Ruby thread also reads.
The shared struct holds an array of entries, one per resolution attempt:
struct fast_fallback_getaddrinfo_shared {
    // state shared by main thread + two resolver workers
    struct fast_fallback_getaddrinfo_entry *entries;
    int entry_count;
};
struct fast_fallback_getaddrinfo_entry {
    int family;            // AF_INET or AF_INET6
    struct addrinfo *res;  // resolved address chain
    // and shared state the entry's free function reads from
};
// Pre-fix cleanup ordering on timeout:
free(shared);              // (1) shared freed first
free_entry(entries[i]);    // (2) reads from shared -> use-after-free

When the main thread times out, it triggers cleanup. The shared struct and the individual entries were freed separately, and the entries' free function read state from the shared struct to know what to clean up. If cleanup freed the shared struct first and then tried to free an entry, the entry's deallocation dereferenced memory the allocator had already reclaimed.
The race window opens the moment cleanup begins and closes the moment both worker pthreads exit. An attacker who can hold a DNS response just past the configured timeout, then let it complete, can drive cleanup into the unsafe ordering.
The Impact Is Bounded
This is a denial-of-service primitive. The crash kills a Ruby process. There is no public route from this bug to code execution or information disclosure.
The CVE matters where Ruby applications resolve hostnames an attacker can influence:
- Webhook callers, where an attacker registers a webhook URL with a hostname they control
- Federated identity callbacks (OIDC, OAuth) that resolve provider hostnames from request context
- SSRF-protection probes that resolve user-submitted URLs
- RSS readers, OpenGraph fetchers, image proxies, anywhere user input flows into a hostname
For applications that only resolve their own static internal hostnames (a known database, a fixed set of upstreams), CVE-2026-46727 is a low-priority upgrade. For applications that resolve attacker-influenced hostnames, as in the cases listed above, the same primitive can crash the process on any request that reaches the resolver. On a worker pool processing attacker-influenced hostnames, that is availability loss until the upgrade lands.
The Fix
Ruby 4.0.5 shipped on 2026-05-20 with shioimm's patch (PR #13231). The fix decouples the deallocation of the entry's addrinfo memory from the lifecycle of the entry struct itself. The entry no longer needs to reach into the shared struct at cleanup time, so the two can be freed in any order without a use-after-free.
// After the patch: entries own enough state to free themselves
struct fast_fallback_getaddrinfo_entry {
    int family;
    struct addrinfo *res;
    // addrinfo memory is now released via its own path,
    // independent of the entry's lifecycle and access
};
// Cleanup runs safely in either order:
free_entry_addrinfo(entries[i]);  // entry's own data, self-contained
free(shared);                     // shared no longer touched by entry frees

This is the standard fix for use-after-free bugs caused by shared lifetime across cooperating threads. Every object that crosses a thread boundary needs to be able to deallocate itself without reading from another object that might already be freed.
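The ordering dependency can be illustrated with a toy model in Ruby. Ruby is garbage-collected, so "freeing" is simulated here with a flag, and the class names (`SharedState`, `CoupledEntry`, `SelfContainedEntry`) are hypothetical stand-ins rather than anything in Ruby's source. The sketch only shows why an entry that reads from the shared object during release is order-sensitive, while an entry that owns its data is not.

```ruby
# Toy model: "free" is simulated with a flag because Ruby manages memory itself.
class SharedState
  def initialize
    @freed = false
  end

  def free!
    @freed = true
  end

  def freed?
    @freed
  end
end

# Pre-fix shape: releasing the entry consults the shared object.
class CoupledEntry
  def initialize(shared)
    @shared = shared
  end

  def release
    # In C this read would touch reclaimed memory; here it raises instead.
    raise "use-after-free: shared state already released" if @shared.freed?
    :released
  end
end

# Post-fix shape: the entry carries everything it needs to release itself.
class SelfContainedEntry
  def initialize(addresses)
    @addresses = addresses
  end

  def release
    @addresses = nil # drop the entry's own data; no other object is read
    :released
  end
end
```

With the coupled design, calling `free!` on the shared object before `release` on the entry fails, which mirrors the unsafe ordering. With the self-contained design, either order works.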
Ruby 4.0.0 through 4.0.4 are affected. Ruby 4.1.0-dev was affected before the fix landed in master. Ruby 3.4 series and earlier are not affected per the official advisory.
If you cannot upgrade immediately, the advisory's workaround is to stop passing timeout: to Addrinfo.getaddrinfo and resolv_timeout: to Socket.tcp. Without those arguments, the cancellation path the bug lives in is never taken. The cost is the original pain point the feature was added to solve: a hung DNS server will block the Ruby thread. Use it only as a bridge.
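One way to apply that workaround without editing every call site is to strip the resolver timeout keywords only when the interpreter is in the affected range. The following is a minimal sketch under stated assumptions: the helper names `resolver_timeout_unsafe?` and `without_resolver_timeouts` are hypothetical, and the version range is taken from the text above rather than from any API.

```ruby
require "rubygems" # usually loaded already; provides Gem::Version

# Affected range as stated above: Ruby 4.0.0 through 4.0.4.
AFFECTED_RUBY = Gem::Requirement.new(">= 4.0.0", "< 4.0.5")

# Keyword names taken from the workaround text: `timeout:` for
# Addrinfo.getaddrinfo and `resolv_timeout:` for Socket.tcp.
RESOLVER_TIMEOUT_KEYS = %i[timeout resolv_timeout].freeze

# Hypothetical helper: is this interpreter version in the affected range?
def resolver_timeout_unsafe?(ruby_version = RUBY_VERSION)
  AFFECTED_RUBY.satisfied_by?(Gem::Version.new(ruby_version))
end

# Hypothetical helper: returns a copy of `options` without the resolver
# timeout keywords on affected versions, and `options` unchanged otherwise.
def without_resolver_timeouts(options, ruby_version = RUBY_VERSION)
  return options unless resolver_timeout_unsafe?(ruby_version)
  options.reject { |key, _value| RESOLVER_TIMEOUT_KEYS.include?(key) }
end

# Usage (not executed here, since it needs the network):
#   require "socket"
#   opts = without_resolver_timeouts({ resolv_timeout: 2, connect_timeout: 5 })
#   Socket.tcp("example.com", 443, **opts) { |sock| sock.close }
```

Other options such as `connect_timeout:` pass through untouched. A version check like this cannot tell a patched development build from an unpatched one, so it is only meaningful for released 4.0.x interpreters.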
The Limits of “Heavily Audited”
The interesting part of this finding is where the bug lived.
The pthread DNS resolver was a careful piece of work by the Ruby team. The committers reviewed it, byroot reported the first crash within weeks, the Ruby team patched it within hours, and the code has been in every Ruby release for two and a half years. The Ruby team is unusually attentive to this code.
And the same bug class still produced CVE-2026-46727.
Audit depth costs human time, and this CVE shows what that cost looks like. Catching this bug required reading the cancellation path of a feature you reviewed eighteen months ago with the same level of care you brought the first time, and the second time, and the eighth time. Nobody does that, because nobody can. The unsafe path is reached only through a specific exception (the calling thread's timeout) that test suites rarely exercise under attacker-controlled timing. Humans skim. Humans assume the part they fixed last year is still fixed. Humans get tired of reading the same file.
Apex does not.
How Cantina Handles Vulnerabilities Like This
Cantina turns vulnerability research into verified remediation.
Apex found this bug by auditing the hardest part of modern runtimes: new concurrency inside old C code. When it flags an issue, it does not stop at a finding. It traces real ownership, drives the fix through your existing workflow, and verifies the loop is closed.
Apex runs continuously as dependencies change, so the same class of cancellation-path bugs gets caught before it becomes an incident.