- Cambridge-led index reviews safety disclosures of 30 AI agents.
- Only four agents publish formal, agent-specific safety reports.
- Browser-based agents show highest safety disclosure gaps.
- Researchers warn transparency lags behind AI deployment pace.
A new academic analysis of leading artificial intelligence agents has found that most do not publish basic safety and evaluation reports, raising transparency concerns as AI tools become increasingly embedded in everyday activities.
The analysis is part of the AI Agent Index, a platform built by researchers at the University of Cambridge, which examined 30 well-known AI agents spanning chat, browser, and enterprise workflow tools. The team reviewed publicly available information and contacted developers directly to assess disclosure practices.

The researchers found that only four of the 30 agents publish formal, agent-specific system cards detailing safety measures, levels of autonomy, and risk assessments. Twenty-five agents disclosed no information about internal safety testing or third-party evaluations.
The updated index, compiled from data available through the end of 2025, contains verified information across 1,350 reporting fields for the selected AI systems.
Lack of Disclosure in a Fast-Growing Industry
The study was led by Leon Staufer, a researcher at the Leverhulme Centre for the Future of Intelligence at Cambridge, with collaborators from institutions including MIT, Stanford, and the Hebrew University of Jerusalem.
Staufer points out that many developers treat AI safety as a box to tick, citing safety work on the underlying large language model while offering little or no information about the safety of the agents built on top of it.
Safety-critical behaviours are a product of the agent's own planning, tools, memory, and policies, not simply of the underlying model, yet very few developers share assessments at that level.
AI agents are not just standalone language models: they can act autonomously, browsing the web, completing forms, booking services, creating workplace documents, or operating business processes. The researchers identified 13 agents as operating at frontier levels of autonomy; of these, just four reported any safety assessment of the agent itself.
Developers publish general, high-level safety and ethics principles that offer some reassurance, but they release little of the empirical evidence needed to genuinely understand the risks, Staufer said.
Developers are considerably more open about what their AI agents can do than about how safe they are, an asymmetry in transparency that suggests a form of safety washing.
Of the five Chinese AI agents included in the review, only one published any kind of safety framework or compliance standard.
Browser Agents Show the Weakest Safety Disclosures
The least transparent group of agents identified in the Index was AI-powered web browser agents: software that acts on a user's behalf online by clicking, scrolling, and filling in forms.
The report found that browser agents left 64 percent of safety-related disclosure fields unreported, the highest share of any category examined. Enterprise-oriented agents were close behind, omitting 63 percent of safety fields, while chat-based agents left 43 percent of such disclosures missing.
The researchers also recorded poor transparency around security vulnerabilities. Only five agents published known security incidents or concerns, and just two reported prompt-injection vulnerabilities, in which malicious instructions are used to bypass safety protections.

At least six agents were observed using code and IP configurations designed to mimic human browsing behaviour, which may evade anti-bot detection systems. Only three agents support watermarking to identify AI-generated media.
Website operators can no longer tell the difference between a human user, a legitimate agent, and a scraping bot, Staufer said. The implications for online shopping, form-filling, bookings, and content scraping are far-reaching.
The report also highlighted dependency concentration in the AI ecosystem. The majority of non-Chinese agents rely on a small number of foundation models, such as GPT, Claude, and Gemini, which the researchers say creates potential choke points in the system.
Staufer said this shared dependency creates potential single points of failure: a single price change, service disruption, or safety regression in one model would cascade down to hundreds of AI agents. At the same time, it creates opportunities for safety assessment and oversight.
Case Study Highlights Operational Concerns
The update includes a case study of Perplexity Comet, described as one of the most autonomous browser-based agents reviewed and among the least transparent in its safety reporting.
Comet positions itself as operating like a human assistant while navigating the web. According to the report, Amazon has threatened legal action against the agent for failing to identify itself as an artificial intelligence system when using its services.
The report observes that weaknesses in browser agents can have direct, real-world consequences, because these systems may be used to execute actions, submit forms, or access linked accounts.
Staufer pointed to earlier findings by security researchers showing that malicious web content can prompt browser agents into executing unwanted commands. Other reported attacks were able to steal personal information from linked services. Without meaningful safety disclosures, such vulnerabilities may only come to light once they are exploited.
According to the latest AI Agent Index, the gap between the pace of deployment and the pace of safety evaluation is widening. Most developers disclose very little about safety, evaluations, and societal impacts.
According to Becker, AI agents are becoming increasingly autonomous and capable of acting in the real world, yet the transparency and governance frameworks meant to manage this shift are falling dangerously behind.
The researchers said the Index is intended to provide standardized, comparable data on the AI agent ecosystem to support oversight, policymaking, and risk evaluation as the technology continues to grow.
Recommended FAQs
What did the AI Agent Index study reveal about safety transparency?
The study found that most leading AI agents do not publish detailed safety or risk assessment reports. Only four of 30 agents reviewed released formal system cards outlining safety measures and autonomy levels.
Which types of AI agents had the weakest safety disclosures?
AI-powered browser agents showed the largest gaps in safety reporting, leaving 64% of safety-related fields undisclosed. Enterprise agents followed closely, while chat-based agents disclosed comparatively more information.
Why are researchers concerned about browser-based AI agents?
Browser agents can autonomously browse websites, fill forms and perform transactions. Researchers warned that limited disclosure and potential vulnerabilities could expose users and online platforms to security risks.
What risks arise from shared foundation models like GPT or Claude?
Many AI agents rely on a small number of underlying models such as GPT, Claude and Gemini. Researchers said this concentration could create single points of failure if pricing, service or safety standards change.
How could limited safety reporting affect AI oversight?
Researchers said weak transparency may delay the detection of vulnerabilities until after exploitation. The Index aims to provide standardized data to support policy, governance and risk evaluation as AI agents grow more autonomous.