CAISI announces pre-deployment evaluations with Google DeepMind, Microsoft, xAI
CAISI announced new agreements with Google DeepMind, Microsoft and xAI to conduct pre-deployment evaluations and targeted research to assess frontier AI capabilities and advance AI security.
Objective Facts
On May 5, 2026, CAISI announced new agreements with Google DeepMind, Microsoft and xAI to conduct pre-deployment evaluations and targeted research to assess frontier AI capabilities and advance the state of AI security. CAISI's agreements with frontier AI developers enable government evaluation of AI models before they are publicly available, as well as post-deployment assessment and other research. These agreements build on previously announced partnerships that were renegotiated to reflect CAISI's directives from Commerce Secretary Howard Lutnick and America's AI Action Plan. Under Secretary Lutnick's direction, CAISI has been designated to serve as industry's primary point of contact within the U.S. government to facilitate testing, collaborative research and best-practice development related to commercial AI systems.

The announcement marks a reversal for the Trump Administration, which is now considering oversight for advanced AI models. The shift is driven by concerns about the national security implications of Anthropic's new 'Mythos' AI model, particularly its ability to identify and exploit cybersecurity vulnerabilities, as well as broader fears around cyber capabilities and dangerous misuse.
Left-Leaning Perspective
Left-leaning outlets and civil society experts have raised sharp concerns about the political dimensions of the CAISI agreements. Rumman Chowdhury, CEO of Humane Intelligence, characterized this as "a 180 for the Trump administration, that has very explicitly been anti-any sort of regulation and also has explicitly tried to block states from enacting any kind of regulation". Fortune, covering the shift, reported that Chowdhury said the administration's efforts may sound good, but "the devil is in the details": "Evaluations are a policy tool, they are not actually data-driven. My concern is that this is another political tool that the administration wants to own and wield." She also questioned whether CAISI has the funding and authority needed to fulfill its mission.

Progressive critics argue that the voluntary framework lacks transparency and independence. Without published criteria and clear threat modeling, they warn, evaluations could become politicized rather than genuinely risk-reducing, with CAISI conducting assessments whose standards, threat models, and procedures the public cannot see. An analysis titled "Experts Warn Trump's AI Safety Tests Could Fail" argued that opaque evaluation processes could slow innovation or suppress outputs for reasons unrelated to safety, and that the lack of openness might deter legitimate industry cooperation if developers fear unclear or biased assessments. Critics also point to significant gaps in the funding and expertise needed for ongoing evaluation of advanced AI systems.

Left-leaning coverage emphasizes institutional instability and conflict with Anthropic. The originally appointed director, Collin Burns, a former Anthropic and OpenAI researcher, was pushed out just four days into the job after White House officials raised concerns about his ties to Anthropic. An oversight body with an unstable leadership history, no permanent legal standing, and a complicated relationship with at least one of the companies it evaluates is not a picture of institutional strength. Progressive outlets have also highlighted that despite the White House's explicit ban on cooperating with Anthropic, multiple federal agencies are quietly circumventing the prohibition to test the company's newly released Mythos model, reflecting an increasingly sharp tension within the Trump administration between restricting a major American AI firm and protecting national cybersecurity.
Right-Leaning Perspective
Conservative and industry-aligned commentary has focused on national security imperatives and the practical benefits of government-industry collaboration. White House National Economic Council Director Kevin Hassett said the administration is "studying possibly an executive order" to ensure new AI models are secure before release, comparing the approach to FDA drug evaluation: frontier AI models that could create vulnerabilities should "be released in the wild after they've been proven safe, just like an FDA drug". This framing emphasizes responsible governance rather than overregulation.

Cybersecurity experts sympathetic to the initiative emphasize the practical value of pre-deployment testing. Fritz Jean-Louis, principal cybersecurity advisor at Info-Tech Research Group, said the CAISI agreements signal a shift toward proactive security for agentic AI by enabling government-led testing before and after deployment, which should "help strengthen visibility into autonomous behaviors while accelerating the development of standards to mitigate risks. By combining early access, continuous evaluation, and cross-sector collaboration, the initiative pushes the industry toward security-by-design", though he noted potential hurdles such as intellectual property protection.

The Business Software Alliance also supported the framework. Aaron Cooper, the group's Senior Vice President of Global Policy, said CAISI brings necessary expertise to evaluate frontier models for safety and national security risks: "Today's announcement reinforces CAISI's role as the right institutional home within government for advancing evaluation and measurement science and convening AI companies and stakeholders on a voluntary basis around responsible practices. A strong role for CAISI can also help further global collaboration and alignment on safety and security".

Right-leaning analysts also see this as a necessary evolution from Trump's initial deregulatory stance. Trump officials frame the shift as a response to escalating cybersecurity and national-security risks rather than a broader embrace of EU-style AI regulation, pointing to Anthropic's Mythos and its potential use by hackers. They emphasize wanting to avoid "onerous" controls on everyday AI applications; frontier models that could supercharge cyberwarfare, bio-threats, or other strategic dangers are another matter.
Deep Dive
The CAISI expansion represents a genuine pivot in Trump administration AI policy, driven not by ideological consistency but by a single triggering event: Anthropic's powerful new Mythos AI model pushed concerns about AI's impact on cybersecurity to a tipping point last month, helping prompt the White House to weigh a formal review process for AI. This creates a credibility problem. The same administration that in October 2025 saw David Sacks, then the White House's AI and crypto czar, publicly accuse Anthropic of "running a sophisticated regulatory capture strategy based on fear-mongering" is now building a testing framework in which all five major U.S. frontier labs participate, even as Anthropic remains in a separate legal dispute over a supply chain risk designation.

Both perspectives get something right and something crucial wrong. The left correctly identifies that the framework lacks transparency, permanent legal standing, and sufficient funding. The Trump-aligned America First Policy Institute called CAISI "chronically underfunded": it has approximately 30 total staff and has received $30 million since 2024, less than comparable AI centers in Canada and Singapore receive; the think tank argued Congress should give CAISI $50-100 million annually. The right correctly notes that pre-deployment evaluation, conducted by government scientists with access to unreleased models, is a genuine advance over pure industry self-governance. The U.S. government has quietly secured something the AI industry has resisted for years: a seat at the table before models ship. Combined with existing and recently renegotiated agreements with Anthropic and OpenAI, every major U.S. frontier AI lab now participates in voluntary pre-release government evaluations.

What remains unresolved is whether this voluntary testing arrangement can harden into something closer to enforceable national policy without Congress writing a new AI law. Cornell assistant professor Gregory Falco noted that "The federal government does not currently have the in-house technical expertise, infrastructure, or day-to-day insight needed to directly evaluate these systems on its own", which explains why the arrangement depends on industry cooperation. The critical question for oversight is whether evaluators will operate as neutral scientists or as extensions of whoever controls CAISI's directorship. The speed of institutional turnover (director Collin Burns lasted four days) suggests the latter risk is real.