The landscape of digital discovery is undergoing a seismic shift as generative artificial intelligence (AI) increasingly replaces traditional search engines as the primary gateway to information. New data released by Muck Rack in May 2026 provides a comprehensive look at the underlying sources fueling these AI systems, revealing that journalism and earned media remain the indispensable foundation of the Large Language Models (LLMs) that power tools like ChatGPT, Claude, and Gemini. According to the report, titled "What Is AI Reading?", an overwhelming 99% of the links cited by AI platforms originate from non-paid media sources, underscoring a pivot back toward high-quality, authoritative content in an era of digital saturation.
The findings come at a critical juncture for the communications and media industries. As the practice of Generative Engine Optimization (GEO) begins to supersede traditional Search Engine Optimization (SEO), the competition for visibility within AI-generated responses has intensified. The Muck Rack data suggests that while the medium of delivery has changed, the value of traditional reporting and verified corporate storytelling has never been higher.
The Hierarchy of AI Citations
The architecture of AI search is built upon a diverse array of content, but the Muck Rack report highlights a clear hierarchy in what these models prioritize. Journalism stands at the pinnacle, accounting for 27% of all citations on AI platforms. This category includes news articles, investigative reports, and feature stories from established media outlets. Within the journalism segment, recency is a dominant factor: 57% of journalism citations with known publication dates were published within the last 12 months. This indicates that AI models are increasingly optimized to provide real-time or near-real-time information, moving away from the static datasets of the past.
The second-largest contributor to the AI information ecosystem is corporate blogs and owned content, which together represent 24% of citations. This figure highlights a significant opportunity for brands to influence AI responses by maintaining robust, informative, and authoritative digital newsrooms. Following corporate content, "aggregators and encyclopedias"—a category that includes sites like Wikipedia and specialized knowledge hubs—account for 17.4% of the measured citations.
Conversely, the data reveals a stark reality for traditional promotional tactics. Paid content and advertorials represent a negligible 0.3% of all citations found on AI platforms. Perhaps most surprisingly for the public relations industry, press releases account for a mere 1.1% of citations. This suggests that while press releases remain a vital tool for communicating with journalists, they hold almost no direct value in the eyes of the algorithms that power AI search.
Platform-Specific "Diets": ChatGPT, Claude, and Gemini
One of the most significant revelations in the updated data is the distinct "diet" preferred by different AI chatbots. Rather than pulling from a homogenous pool of data, each major LLM appears to have developed a specific reliance on certain types of sources, likely due to the training priorities and partnership agreements of their parent companies.
ChatGPT, developed by OpenAI, continues to rely heavily on Wikipedia as a primary source for general knowledge and historical context. This reliance ensures a level of neutrality and breadth, though it also inherits the limitations of the crowdsourced encyclopedia. OpenAI has also recently signed high-profile licensing deals with major publishers like News Corp and Axel Springer to bolster its access to premium journalistic content.
Claude, the AI developed by Anthropic, shows a marked preference for academic and specialized data, frequently pulling information from PubMed Central. This orientation makes Claude a preferred tool for users seeking scientific, medical, or highly technical information, as the model prioritizes peer-reviewed and scholarly sources over general web content.
Gemini, Google’s entry into the generative AI space, has taken a different route by leaning heavily on Reddit. This is largely a result of a reported $60 million-per-year deal between Google and the social media platform, allowing Google’s models real-time access to Reddit’s vast repository of human conversation, opinions, and niche community knowledge. While this gives Gemini a "human" edge in understanding slang and current trends, it also presents challenges regarding the accuracy and civility of the information retrieved.
The Evolution of Search: From SEO to GEO
The transition from traditional search to AI-driven discovery has given rise to the concept of Generative Engine Optimization (GEO). For decades, PR professionals and digital marketers focused on SEO—optimizing websites to rank on the first page of Google through keywords, backlinks, and technical site health. However, in a world where an AI chatbot provides a single, synthesized answer rather than a list of links, the goal has changed.
GEO focuses on ensuring a brand or a story is included in the synthesized response of an LLM. The Muck Rack data indicates that the most effective way to achieve this is through earned media. Because AI models prioritize journalism and authoritative corporate content, a single mention in a reputable news outlet can now be worth more than dozens of low-quality backlinks.
"GEO is making earned media hot again," the report notes, echoing a sentiment felt across the communications industry. However, this shift also presents a challenge. Because everyone is now looking at the same data, PR professionals are increasingly pitching the same high-authority outlets, leading to a crowded and highly competitive media landscape. The "spray and pray" method of distribution—sending generic pitches to hundreds of journalists—is becoming obsolete. Success in the age of AI requires a more targeted, strategic, and thoughtful approach.
Chronology of AI Data Integration
The current state of AI search is the result of a rapid evolution over the last three years:
- Late 2022: The launch of ChatGPT marks the beginning of the public generative AI era, though the model is initially limited by a "knowledge cutoff" that prevents it from accessing real-time news.
- 2023: Search engines begin integrating LLMs into their interfaces, with Microsoft launching an AI-powered Bing and Google piloting the Search Generative Experience (SGE). This sparks a debate over "zero-click" searches, where users get their answers without ever visiting the source website.
- Early 2024: Major publishers begin suing AI companies (e.g., The New York Times vs. OpenAI) over copyright infringement, claiming their articles were used to train models without compensation.
- Late 2024 – 2025: A wave of licensing agreements is signed. Platforms like OpenAI and Google pay hundreds of millions of dollars to secure legal access to the archives of The Wall Street Journal, The Guardian, and The Associated Press.
- May 2026: Muck Rack’s data confirms that these partnerships and the prioritization of journalism have fundamentally shaped how AI models cite information, with journalism becoming the leading source of "truth" for LLMs.
Regulatory and Ethical Context
The shift toward AI-driven information retrieval is not happening in a vacuum. Regulatory bodies are beginning to take note of how information is disseminated and prioritized. The Securities and Exchange Commission (SEC) recently unveiled new rules regarding digital disclosures, which may further change the PR game. These rules emphasize the need for transparency and accuracy in how corporate information is shared online, directly impacting the "corporate content" category that makes up 24% of AI citations.
Furthermore, the ethics of AI training remain a point of contention. Katharine Viner, the editor-in-chief of The Guardian, has frequently revisited the importance of original reporting in an era where AI can "hallucinate" or misinterpret data. The Guardian’s stance—and that of many other legacy outlets—is that while AI can be a tool for discovery, it must not be allowed to erode the financial viability of the newsrooms that provide its "fuel."
Implications for the Future of Communications
The Muck Rack report serves as a wake-up call for the PR and media industries. The fact that press releases account for only 1.1% of AI citations suggests that the traditional "announcement" is losing its efficacy in the digital age. To influence the AI models of the future, brands must focus on storytelling that resonates with journalists and provides genuine value to audiences.
The practitioners who succeed in this environment will be those who act as true partners to journalists. By providing compelling stories, expert commentary, and data-driven insights, PR professionals can secure the earned media placements that LLMs crave. This, in turn, ensures that their clients and organizations remain visible in the AI-generated answers that are becoming the new standard for search.
As the data shows, the "core foundation" of AI is still the human-led endeavor of journalism. While the technology is revolutionary, the most valuable currency in the information economy remains trust, authority, and original reporting. For those navigating the complexities of AI search, the path forward is clear: quality content is no longer just a goal; it is a requirement for survival in the generative age.