Imagine a tiny tweak in how a system organizes data causing a massive outage for millions of users. That’s exactly what happened to Cloudflare’s popular 1.1.1.1 DNS service, and the culprit? A decades-old ambiguity in how DNS records are ordered. But here’s where it gets controversial: Was this a failure of the system’s design, or did Cloudflare overlook a critical detail? Let’s dive in.
In a recent blog post titled "What came first: the CNAME or the A record?" (https://blog.cloudflare.com/cname-a-record-order-dns-standards/), Cloudflare explains how an ambiguity in the RFC (Request for Comments) standards that define DNS led to a significant disruption. The issue? A routine update, whose global rollout reached most servers on January 8, 2026, changed where CNAME records appear in DNS responses, causing some clients to fail when resolving names. While most modern DNS software ignores the order of records, Cloudflare discovered that certain implementations strictly expect CNAME records to appear first.
And this is the part most people miss: The change wasn’t a careless bug but a side effect of a subtle optimization. Sebastiaan Neuteboom, a systems engineer at Cloudflare, explained that the update aimed to reduce memory usage in their cache implementation. The change was introduced on December 2, 2025, tested on December 10, and deployed globally starting January 7, 2026. Once CNAME records moved to a different position in the response, resolution broke for clients that depend on that order, leading to a widespread outage of the 1.1.1.1 service.
Here’s how it works: When a DNS resolver encounters a CNAME record, it follows a chain of aliases to reach the final address. Each step in this chain is cached with its own expiration time. If part of the chain expires, the resolver re-resolves only the expired portion and combines the fresh records with the parts that are still valid. Cloudflare’s update changed how that combined response is assembled: CNAME records were appended to the end of the answer instead of being placed at the beginning. For example:
;; QUESTION SECTION:
;; www.example.com. IN A
;; ANSWER SECTION:
cdn.example.com. 300 IN A 198.51.100.1
www.example.com. 3600 IN CNAME cdn.example.com.
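To make the reordering concrete, here is a minimal Python sketch of the merge step described above. It is not Cloudflare's code: the `Record` type, the `build_answer` function, and the `cnames_first` flag are hypothetical names invented for illustration, and a real resolver cache is far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str   # owner name of the record
    ttl: int    # remaining time to live, in seconds
    rtype: str  # "CNAME" or "A"
    data: str   # alias target (CNAME) or IPv4 address (A)


def build_answer(cached_cnames, fresh_records, cnames_first=True):
    """Combine still-valid cached CNAMEs with freshly resolved records.

    Hypothetical helper: cnames_first=True models the old ordering,
    cnames_first=False models the ordering after the update.
    """
    if cnames_first:
        return list(cached_cnames) + list(fresh_records)
    return list(fresh_records) + list(cached_cnames)


# The alias is still cached; the address record expired and was re-fetched.
cached = [Record("www.example.com.", 3600, "CNAME", "cdn.example.com.")]
fresh = [Record("cdn.example.com.", 300, "A", "198.51.100.1")]

old_order = build_answer(cached, fresh, cnames_first=True)
new_order = build_answer(cached, fresh, cnames_first=False)

for label, answer in (("before", old_order), ("after", new_order)):
    print(label, [(r.name, r.rtype) for r in answer])
```

Both answers contain exactly the same records. Only the position of the CNAME differs, which is why clients that treat the answer section as an unordered set were unaffected.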
While many DNS clients, like systemd-resolved (https://codelucky.com/systemd-resolved-linux-guide/), handle this gracefully, others, such as the getaddrinfo function in glibc, rely on CNAME records appearing first. This dependency sparked debates on platforms like Reddit and Hacker News. One user commented, ‘I respect their engineering standards, but it feels like they lack proper global impact testing.’ Another user invoked Hyrum’s Law, ‘With enough users, every observable behavior becomes a dependency,’ and questioned whether Cloudflare had violated Postel’s Law: ‘Be conservative in what you send, liberal in what you accept.’
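Why would record order matter to a client at all? The Python sketch below contrasts two ways of reading an answer section. It is an illustrative simplification, not glibc's or systemd-resolved's actual code: `resolve_sequential` assumes a client that scans the records once while tracking the name it currently expects, and `resolve_any_order` assumes a client that indexes the aliases before looking for addresses. Both function names are made up for this example.

```python
def resolve_sequential(qname, answer):
    """Single pass over the answer; assumes each CNAME precedes its target's records."""
    expected = qname
    addresses = []
    for name, rtype, data in answer:
        if name != expected:
            continue  # record belongs to a name we are not looking for right now
        if rtype == "CNAME":
            expected = data  # from here on, look for the alias target
        elif rtype == "A":
            addresses.append(data)
    return addresses


def resolve_any_order(qname, answer):
    """Index the CNAMEs first, follow the chain, then collect matching A records."""
    aliases = {name: data for name, rtype, data in answer if rtype == "CNAME"}
    target, seen = qname, set()
    while target in aliases and target not in seen:  # 'seen' guards against alias loops
        seen.add(target)
        target = aliases[target]
    return [data for name, rtype, data in answer if rtype == "A" and name == target]


cname_first = [
    ("www.example.com.", "CNAME", "cdn.example.com."),
    ("cdn.example.com.", "A", "198.51.100.1"),
]
cname_last = list(reversed(cname_first))

print(resolve_sequential("www.example.com.", cname_first))  # ['198.51.100.1']
print(resolve_sequential("www.example.com.", cname_last))   # []
print(resolve_any_order("www.example.com.", cname_last))    # ['198.51.100.1']
```

In the single-pass version, the A record arrives while the client is still looking for www.example.com., so it is skipped, and by the time the CNAME is read there is nothing left to match. The second version costs an extra pass but does not care where the CNAME sits.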
To address this, Cloudflare has proposed an Internet-Draft (https://datatracker.ietf.org/doc/draft-jabley-dnsop-ordered-answer-section/) for discussion at the IETF, aiming to clarify how CNAME records should be ordered in DNS responses. According to Cloudflare’s timeline, the global rollout began on January 7 and reached 90% of servers by January 8 at 17:40 UTC. The outage was declared shortly after, and the rollback started at 18:27 UTC, completing by 19:55 UTC.
Now, here’s the question for you: Is the ambiguity in the RFC standards to blame, or should Cloudflare have anticipated this dependency? Let us know in the comments—this is a debate worth having!