RESEARCH BRIEF

Trust & Safety in an Agentic World

Lessons from the fraud trenches for the age of AI agents

Derrick Ongchin, Alex David, Jimmy Steinauer · March 28, 2025 · 12 min read

The Nightmare Scenario

Imagine you're running the fraud and risk team for one of the fastest-growing companies in the world. You pull up your city map, a live view of every trip in progress, and something isn't quite right. Instead of the usual organic scatter of real rides, you see clusters of cars moving in formation, circling the city in patterns no human passenger would ever create.

That's what Derrick Ongchin's team saw when Uber expanded into China. And by the time those clusters appeared, the damage was already done.

Derrick, whose career spans Trust and Safety leadership at Google, Uber, and Afterpay, recently joined G2's Product R&D team for a session on what those fraud wars taught him. The lessons were urgent then. In a world of AI agents, they're critical now.

Figure: Fraudulent ride clusters detected across Uber's platform in China.

The Proto-Agent Wars: A Parable From the Trenches

A decade before "Agentic AI" was a term anyone used, Derrick's team at Uber was already fighting what he calls the proto-agent wars. Traditional T&S assumes a human is the actor. A person posts, clicks, submits. You moderate the content or the account. Proto-agents break that model in three ways:

Speed

They act faster than any human review loop was designed for. By the time a flag triggers, the action is done.

Opacity

The harm doesn't come from a single output. It emerges from a chain of actions — most of which were never reviewed.

Diffuse Accountability

Who's responsible? The user who prompted it? The developer who built it? The platform that hosted it? Proto-agents dissolve the clean lines T&S depends on.

Trust & Safety was built for humans behaving badly. Proto-agents aren't behaving badly — they're just executing. And that's exactly the problem.

Organized fraud syndicates, some VC-backed and operating entirely in the open, had built a sophisticated, automated machine for exploiting the platform. They assembled SIM racks loaded with real carrier cards to pass phone verification at scale. They wrote GPS spoofing software that faked accelerometer readings, device fingerprints, and route data so convincingly that a fraudulent trip looked identical to a real one. They built OTA update channels to patch their tools within hours whenever Uber closed a loophole. They even notified their customers of "service outages" while they worked, then brought the system back online.

A small team of three operators was pulling in $500K to $1 million a month. On stolen financials. Remotely. And they were only one operation among many.

"It was the craziest cat and mouse game I've ever had to deal with," Derrick said. "We would patch things in our apps, send updates — and within 8 hours, they would patch all their customers' apps and get back up and running."

The end state was Derrick's worst-case scenario made real: a city where it was nearly impossible to distinguish a real trip from a fabricated one. Fraudsters had so thoroughly polluted Beijing's supply-and-demand economy that legitimate users couldn't get a ride. ETAs were broken. Surge pricing was constant. Real drivers defected to competitors. Uber had to go back to the drawing board — rebuilding driver onboarding, overhauling verification, and rethinking incentive structures from the ground up to make fraud economically unviable. The cost of competing in that environment ultimately contributed to Uber's exit from China.

But here's what keeps the lesson relevant today: the overhead that made those fraud operations expensive to run — the human labor, the physical SIM racks, the coordinated rings — is being replaced by AI. The cost of attack is collapsing. What once required an organization now requires a prompt.

The critical detail: none of this required AI. It was humans, scripts, and coordination. The lesson Derrick carried forward was simple and terrifying: if this is what humans with scripts could do, what happens when you hand those same playbooks to agents that think, adapt, and optimize on their own?

Attack Evolution

Scripts: early-stage automation

Bots: automated interaction

Coordinated Fraud Rings: human-directed networks

AI Agents: the new frontier, autonomous and adaptive agents that bypass traditional security in real time

Figure: Three attack vectors converging into autonomous AI agents.

When Trust Becomes a Target

Derrick put it plainly:

"How do you know you're not living in some matrix, where there are just tons of agents — agencies using agents, that are using agents — spinning up accounts, writing reviews, and the whole thing is fabricated? That's a question you should be constantly asking yourself as technology evolves."

Derrick wasn't speaking hypothetically. Any platform where reputation, rankings, or reviews carry real economic value — think marketplaces, review platforms, financial services, e-commerce — faces this exact threat. The higher the stakes attached to a number, a rating, or a ranking, the stronger the incentive to manipulate it.

At G2, being ranked #1 in a software category drives pipeline, influences enterprise buying decisions, and can be worth millions in revenue to a vendor. That kind of value is a magnet for manipulation. And the tools to manipulate at scale have never been more accessible.

Today, the most common attacks are familiar: fake accounts chasing gift card incentives, coordinated review poisoning to sabotage competitors. But the threat is evolving fast. As AI agents become capable of simulating hundreds of hours of software usage, generating contextually accurate reviews, and maintaining convincing behavioral profiles, the signal that used to separate real from fake begins to disappear.

The "clusters on the map" moment for a review platform won't look like cars driving in formation. It'll look like entirely plausible, well-written, verified-looking reviews — and by the time the pattern is visible, trust in the platform has already eroded.

The Identity Problem Is Getting Harder, Not Easier

Traditional Trust & Safety is built on a foundation of identity: verify who someone is, and you can assess their behavior. But that foundation is cracking.

LinkedIn logins, business email domains, purchase proof uploads: each of these verification layers was designed for a world where faking them required meaningful effort. That's no longer true. Business email domains can be spun up programmatically. Deepfakes can bypass selfie verification. Stolen national IDs, combined with AI-generated video overlays, are already being used to defeat biometric checks, and the cost of doing so keeps falling.

"With a $20-a-month LLM subscription, bypassing biometrics will become increasingly easy," Derrick noted. "That's something to watch out for."

The deeper shift isn't just that individual verification methods are becoming weaker. It's that the entire paradigm of identity-based trust is giving way to something new: provenance-based trust.

The question is no longer just "who are you?" It's "where did this data come from, and how did it get here?"

In an agentic world, where an AI agent may be submitting a review on behalf of a human user, the chain of custody matters as much as the identity at the end of it. As Derrick put it: "Really being able to assess the context and the data origination in that interaction is what you have to get good at."
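One way to make "chain of custody" concrete is a tamper-evident log in which every hop (the human, the agent acting for them, the platform) appends a record that commits to the one before it. The sketch below is illustrative only: the record schema (`actor`, `action`, `prev`, `hash`), the actor labels, and the function names are assumptions made for this example, not a description of any platform's actual system.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in a chain


def record_hash(prev_hash: str, actor: str, action: str) -> str:
    """Hash one hop together with the hash of the hop before it."""
    payload = json.dumps(
        {"prev": prev_hash, "actor": actor, "action": action}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_hop(chain: list[dict], actor: str, action: str) -> list[dict]:
    """Return a new chain with one more hop appended (hypothetical schema)."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    hop = {"actor": actor, "action": action, "prev": prev_hash}
    hop["hash"] = record_hash(prev_hash, actor, action)
    return chain + [hop]


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered hop breaks the chain."""
    prev_hash = GENESIS
    for hop in chain:
        if hop["prev"] != prev_hash:
            return False
        if hop["hash"] != record_hash(prev_hash, hop["actor"], hop["action"]):
            return False
        prev_hash = hop["hash"]
    return True


chain: list[dict] = []
chain = append_hop(chain, "user:alice", "authorized agent to draft review")
chain = append_hop(chain, "agent:assistant-1", "submitted review draft")
chain = append_hop(chain, "platform:intake", "received submission")
```

A plain hash chain only shows that recorded hops were not altered after the fact. It does not prove who wrote them, so a production design would likely add per-actor signatures on top.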

The tools market is responding in real time to this shift. A new generation of platforms has emerged that treats provenance as a first-class problem — not an afterthought. Hive deploys proprietary AI models for content classification at massive scale. Checkstep combines automated detection with governance workflows built for a world where regulators want to know not just what was flagged, but why and how. ActiveFence takes a proactive posture, tracking harmful networks and off-platform threat signals before they ever surface on your system. These aren't moderation tools with AI layered on top — they're trust infrastructure built for the agentic era from day one.

The market is taking notice. The content moderation space is attracting serious capital, strategic acquisitions are accelerating, and buyers are increasingly looking for tools that can handle the full provenance chain — not just catch bad content after the fact.

The Future of Content Moderation

Market growth, regulatory drivers, and technological shifts shaping the industry.

14.42% CAGR (2026–2031)

$26.09B market by 2031

The content moderation market is expected to grow from USD 11.63 billion in 2025 to USD 26.09 billion by 2031. This expansion reflects the steep rise in user-generated content, more demanding regulatory frameworks, and advertisers' insistence on brand-safe environments.

Global Content Moderation Market Forecast (USD Billion)

Projected market size highlighting steady growth up to 2031.

2025: 11.63

2026: 13.31

2031: 26.09
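As a quick sanity check on the forecast, the growth rate implied by the quoted market sizes can be recomputed with the standard compound annual growth rate formula. This is a reader-side arithmetic sketch, not part of the cited research; the function name `cagr` is an assumption for illustration.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the forecast, in USD billions.
size_2025, size_2026, size_2031 = 11.63, 13.31, 26.09

rate_2026_2031 = cagr(size_2026, size_2031, years=5)
rate_2025_2026 = cagr(size_2025, size_2026, years=1)

print(f"2026-2031 CAGR: {rate_2026_2031:.2%}")
print(f"2025-2026 growth: {rate_2025_2026:.2%}")
```

Both rates come out at roughly 14.4%, in line with the quoted 14.42%. Any small difference is plausibly down to rounding in the published figures.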

Key Drivers & Technological Shifts

🛡️

Regulatory Mandates

The EU Digital Services Act and UK Online Safety Act force platforms to shift from reactive takedowns to continuous risk-assessment regimes.

⚡

Real-Time AI Scaling

Short-form video, live-stream, and voice chat add billions of assets daily, intensifying the need for AI that scales without sacrificing context sensitivity.

🤝

Vendor Consolidation

Consolidation creates platform-agnostic suites unifying moderation. Brands seek partners to manage rising costs and psychological risks to human moderators.

Source: mordorintelligence.com

The New Playbook: Fighting Intent, Not Scripts

The good news is that the underlying defensive principles haven't changed. What's changed is the speed, scale, and adaptability of what you're defending against.

Derrick's framework from Uber still holds: a layered strategy of upstream friction (account creation), inline signals (real-time behavioral data), and offline analysis (post-transaction, with the full signal set). The key insight is that the richest analysis happens after the transaction, when you have the most data and the least time pressure.
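The three layers can be sketched as a short pipeline in which each stage sees more data than the one before it. This is a minimal illustration under stated assumptions: the `Event` fields, the signal names (`shared_device`, `payment_disputed`, `route_anomaly`), and every threshold are hypothetical and are not taken from Uber's or G2's systems.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One account's activity, with signals gathered at each stage (hypothetical)."""
    account_id: str
    phone_verified: bool
    actions_per_minute: float
    post_signals: dict = field(default_factory=dict)


def upstream_friction(event: Event) -> bool:
    """Layer 1: stop the account at creation if basic verification fails."""
    return event.phone_verified


def inline_score(event: Event, max_rate: float = 30.0) -> float:
    """Layer 2: cheap real-time score in [0, 1] from behavioral rate alone."""
    return min(event.actions_per_minute / max_rate, 1.0)


def offline_score(event: Event) -> float:
    """Layer 3: post-transaction review using the fuller signal set."""
    flags = [
        event.post_signals.get("shared_device", False),
        event.post_signals.get("payment_disputed", False),
        event.post_signals.get("route_anomaly", False),
    ]
    return sum(flags) / len(flags)


def assess(event: Event) -> str:
    """Run the layers in order and report where, if anywhere, the event stops."""
    if not upstream_friction(event):
        return "blocked_at_signup"
    if inline_score(event) >= 0.9:
        return "held_inline"
    if offline_score(event) >= 0.5:
        return "flagged_offline"
    return "cleared"
```

The ordering reflects the trade-off in the framework: the earliest checks are the cheapest and the fastest, while the offline layer can afford the richest analysis because it runs without time pressure.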

"Your trust systems will only be as strong as the raw data you're collecting," Derrick said. "You should be collecting as much raw signal as you feel is necessary — without creeping into data privacy concerns — in order to assess risk and safety on your platform."

But there's a new layer that didn't exist a decade ago: agent-to-agent verification. As AI agents begin acting on behalf of users (submitting reviews, researching vendors, interacting with platforms through MCP servers), the trust infrastructure has to operate at that same layer.

Alex David, G2's GM of AI Solutions, puts it in sharp relief. And he offers something that reframes not just how we build trust infrastructure, but how we think about trust itself:

If an AI is evaluating information, it's much more cut and dry. It's looking for key components where it has an algorithmic weighting setup to understand how much emphasis to put on different pieces, and then it evaluates trust based on that. It's also oftentimes thinking about this with a sense of minimizing regrettability: how big is the impact if it gets this wrong? So it's going to optimize differently based on those weights and that regrettability impact. It's going to need much more concrete evidence to extend trust. And that means we need to look at, or really re-evaluate, the core ways of making an argument or providing information. Think ethos, pathos, logos. How do we communicate those in a way that an AI will actually understand? What is the source of the information? Is there a logical connection that makes sense? Is there a point of validation data that supports the argument being made?

Alex David, GM of AI Solutions, G2
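The idea of weighted signals combined with "minimizing regrettability" can be sketched as a score compared against a bar that rises with the cost of being wrong. Everything here is an assumption made for illustration: the three signal names echo the quote, but the weights, the linear threshold, and the function names are hypothetical and do not describe how any particular agent works.

```python
# Hypothetical signal weights; a real system would tune or learn these.
WEIGHTS = {
    "source_credibility": 0.5,
    "logical_consistency": 0.3,
    "supporting_data": 0.2,
}


def trust_score(signals: dict[str, float]) -> float:
    """Weighted sum of signals, each expected in [0, 1]; missing counts as 0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def required_trust(regret: float, base: float = 0.5, ceiling: float = 0.95) -> float:
    """Raise the bar linearly as the cost of being wrong (0 to 1) grows."""
    regret = min(max(regret, 0.0), 1.0)
    return base + (ceiling - base) * regret


def extend_trust(signals: dict[str, float], regret: float) -> bool:
    """Trust the claim only if its score clears the regret-adjusted bar."""
    return trust_score(signals) >= required_trust(regret)


claim = {"source_credibility": 0.9, "logical_consistency": 0.8, "supporting_data": 0.4}
low_stakes = extend_trust(claim, regret=0.1)   # e.g. shortlisting a vendor
high_stakes = extend_trust(claim, regret=0.9)  # e.g. committing to a purchase
```

With these illustrative numbers the same claim is accepted for the low-stakes decision and rejected for the high-stakes one, which is the behavior the quote describes: more concrete evidence is needed as the impact of an error grows.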

That leads directly to his sharper conclusion, and to what it means for G2:

Human trust is everything through gut reactions and personal bias, and something can be completely true but still feel wrong to us. AI agents don't have that luxury or that flaw. They evaluate trust through concrete signals like source credibility, logical consistency, and supporting data. For G2, that means our trust infrastructure has to communicate trustworthiness in ways that hold up to algorithmic scrutiny, not just human intuition. Because increasingly, the one researching a vendor or reading a review is going to be an agent acting on someone's behalf.

Alex David, GM of AI Solutions, G2

That reframes the entire challenge. It's not just about catching bad actors anymore — it's about building a trust system legible to both humans and the agents increasingly acting on their behalf. In practice, that means investing in:

AI-Powered T&S at Scale

You cannot review every interaction manually. Agents trained on your platform's workflows and risk patterns need to handle the first pass, escalating only the genuinely ambiguous cases to humans. "If you don't have that today, you better get on it," Derrick said.

Red Teaming as a Core Workflow

Not a one-off exercise. Derrick's team in China was constantly buying fraud software on dark web marketplaces, decompiling it, and pressure-testing their own systems against it. The best bad actors are invisible until you go looking for them. "The most sophisticated spoofers — you cannot tell," he warned. "And that's scary."

Behavioral Turing Tests

When an agent interacts with your platform, can you design interactions that surface behavioral signatures a script or agent can't convincingly fake? This is the frontier.

Verification Agents as Product

One of Derrick's most interesting suggestions: building your own specialized AI agent to independently verify vendor software claims — cross-checking reviews against actual product behavior, flagging inconsistencies, and providing an objective second layer of trust. That's not just a defensive tool. That could be a product.
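The first of these investments, an automated first pass that escalates only ambiguous cases to humans, reduces to a three-way routing decision. The sketch below assumes a risk score in [0, 1] from some model; `toy_score`, the thresholds, and the queue names are hypothetical stand-ins, not a recommended configuration.

```python
from typing import Callable


def triage(
    items: list[dict],
    score_fn: Callable[[dict], float],
    allow_below: float = 0.2,
    block_above: float = 0.8,
) -> dict[str, list[dict]]:
    """Route each item by risk score; only the ambiguous middle goes to humans."""
    queues: dict[str, list[dict]] = {"allow": [], "escalate": [], "block": []}
    for item in items:
        score = score_fn(item)
        if score < allow_below:
            queues["allow"].append(item)
        elif score > block_above:
            queues["block"].append(item)
        else:
            queues["escalate"].append(item)
    return queues


def toy_score(item: dict) -> float:
    """Stand-in for a trained model: here the score is simply precomputed."""
    return item["risk"]


reviews = [
    {"id": 1, "risk": 0.05},
    {"id": 2, "risk": 0.55},
    {"id": 3, "risk": 0.93},
]
queues = triage(reviews, toy_score)
```

Widening or narrowing the band between the two thresholds is the main lever: a wider band sends more work to human reviewers, a narrower one trusts the model with more of the decisions.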

How G2 Is Thinking About It

G2 is not a passive observer in this shift. The recent launch of G2's MCP server — enabling AI agents to interact with G2's data and category intelligence — is a deliberate move toward building infrastructure for an agentic world. And the foundation for that infrastructure already exists. G2 has long employed rigorous moderation practices to ensure reviews reflect honest, real-world experiences — from how reviews are collected, to how they are moderated and published. As agents begin acting on behalf of buyers and vendors alike, that foundation becomes more important than ever.

But with that openness comes responsibility. When agents research software on behalf of buyers, and vendors increasingly use AI to manage their G2 presence, every downstream decision inherits the integrity of the underlying data.

We sat down with members of G2's Trust & Safety team to get their perspective.

Jimmy Steinauer, T&S Lead Analyst, noted:

There's a principle that's always been true in Trust & Safety: the more anonymity you allow on your platform, the more safety incidents you'll have. Agentic AI doesn't change that rule — it just accelerates it. Automated systems can exploit anonymity at a scale and speed no human fraud ring can match. The teams that stay ahead are the ones building trust layers that assume automation on both ends of the table, and that treat user verification not as a friction point, but as a foundational requirement.

Jimmy Steinauer, T&S Lead Analyst, G2

The stakes are clear: the value of G2 as a platform rests entirely on the trustworthiness of the data within it. A G2 ranking that can be gamed is a G2 ranking that means nothing — to buyers, to vendors, and to the market G2 is built to serve.

Stay Vigilant

Derrick closed the session with a line that stuck:

"Nothing's really new, in a way. In the old days, it was just scripts. But now we're fighting with intent. Agents are thinking and optimized toward a goal — they're very adaptable, whereas a script follows a set of rules."

That's the shift. The attack surface isn't just bigger — it's smarter. And the only way to stay ahead of it is to build T&S infrastructure that's smarter too.

For platforms like G2, and for every SaaS product whose reputation lives or dies by what users say about it, that work starts now.

Interested in how software categories are evolving? Explore G2's Content Moderation Tools to see how the market is responding in real time.
