Essay
I Watched AI Chatbots Watch Me
I built a local monitor for my own browser traffic and used ChatGPT, Claude, Gemini, and Grok normally for 10 days. What fell out was a more concrete picture of how much of the experience is conversation, and how much is telemetry.
2026-04-12
I wanted to answer a simple question:
When you use AI chatbots in the browser, how much of the network traffic is actually the conversation?
So I built a local monitor for my own machine. It hooks into Chrome DevTools, captures AI-related traffic into a local SQLite database, and lets me inspect the requests later instead of scrolling through the Network tab by hand.
Then I used the products normally for 10 days.
What I got was not a scandal. It was something more useful: a concrete picture of how much browser-based AI is wrapped in telemetry, analytics, suggestions, ads, and background infrastructure.
The setup
This was intentionally simple.
- one local capture daemon
- my own browser traffic only
- normal day-to-day usage
- no decompilation, no privileged access, no remote interception
At the end of the run I had almost 20,000 captured requests across ChatGPT, Claude, Gemini, and Grok.
That is the first thing that changes your intuition. These products are not a tidy "send prompt, get answer" loop. They are full browser apps with lots of supporting traffic around the main interaction.
The top-line numbers
Over 10 days of my own usage:
| Platform | Total requests | Sessions | Notes |
|---|---|---|---|
| ChatGPT | 13,304 | 48 | biggest footprint in this capture |
| Grok | 4,741 | 21 | heavy telemetry and suggestion traffic |
| Claude | 1,264 | 10 | smallest request volume of the group |
| Gemini | 242 | few | least-used in this dataset |
That works out to roughly:
- 277 requests per ChatGPT session
- 226 requests per Grok session
- 126 requests per Claude session
Those numbers are usage-dependent, so they are not universal. But they are enough to make the point: the visible conversation is only one layer of what is happening.
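The per-session figures are nothing more than the totals from the table divided by session counts and rounded:

```python
# (total requests, sessions) from the 10-day capture.
captures = {
    "ChatGPT": (13_304, 48),
    "Grok": (4_741, 21),
    "Claude": (1_264, 10),
}

per_session = {
    name: round(total / sessions)
    for name, (total, sessions) in captures.items()
}
print(per_session)  # {'ChatGPT': 277, 'Grok': 226, 'Claude': 126}
```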
ChatGPT: the busiest surface
ChatGPT generated by far the most traffic in this capture, which partly reflects that I used it the most. But the shape of the traffic was interesting too.
In the sampled period:
- about 29% of ChatGPT traffic was telemetry or tracking rather than core conversation traffic
- I captured 1,634 analytics events
- I saw repeated A/B-test and experiment traffic
- I saw autocomplete-related requests before message submission
The event stream was especially revealing because it makes attention patterns visible. Things like link clicks, focus changes, pastes, and UI interactions show up as data instead of just interface behavior.
That is not shocking in the abstract. Most large web products do analytics. What changed for me was seeing the volume and granularity in one place.
Grok: suggestion traffic with more context than I expected
Grok was the most surprising system in the capture.
Two things stood out:
- It sent a lot of suggestion-related traffic while typing.
- Some of those requests carried more conversation context than I expected that early in the interaction.
In this run I captured:
- 271 keystroke-related transmissions
- 40 unique leaked conversation titles in the local database
That second point is the one that sticks. Before the user experience feels like "I sent a message," the product may already be sending surrounding context to support suggestions and continuity.
Again, this is still your own browser making requests you can inspect in DevTools. The difference is that almost nobody actually watches it long enough to build a mental model.
Claude: smaller footprint, still real telemetry
Claude had the lightest request volume in my capture, but that does not mean it was telemetry-free.
What I saw:
- 1,264 requests over 10 sessions
- roughly 36% telemetry in this capture
- analytics events tied to product behavior and feature usage
- no keystroke capture in the way I saw on other platforms
The absolute volume was much lower than ChatGPT, which matters in practice. But it was still clearly a modern product telemetry surface, not a bare prompt-response channel.
Gemini: the browser product sits on top of Google's larger stack
Gemini had the fewest sessions in my dataset because I used it the least, so I want to be careful not to overstate the comparison.
Still, even limited captures made one thing obvious: a large share of the traffic sits on top of Google's broader web infrastructure rather than a clean, isolated chat surface.
That makes sense if you know the company. It just looks different when you see the requests directly instead of thinking about "Gemini" as a self-contained app.
What actually surprised me
The biggest surprise was not that these products do telemetry. Of course they do.
The surprises were:
- how many requests it takes to support what feels like one simple interaction
- how early suggestion and assistance features begin sending data
- how much product experimentation and feature-flagging traffic surrounds the main flow
- how different the implementation styles are across providers even when the UX looks similar
This is why I think local capture is useful. Privacy policies talk in categories. Network traces show behavior.
The important caveat
This is not a universal ranking of "which chatbot is best for privacy."
It is a measured snapshot of:
- my accounts
- my usage
- my browser
- my capture window
- the specific product behavior during that period
Some platforms may look heavier simply because I used them more. Some telemetry can be product-quality instrumentation rather than something more sinister. Some features only activate for certain users or experiments.
So the right way to read this is not as a final scoreboard. It is as an existence proof that these products are much easier to inspect than people assume, and that the surrounding telemetry layer is substantial.
Why I built the tool
The browser already exposes all of this. The hard part is not access. The hard part is making the data survivable.
After a few sessions, the Network tab is noise. A local monitor that stores the traffic, classifies it, and lets you ask questions later is much more useful.
Questions like:
- how much of this traffic is actually telemetry
- are keystrokes sent before submit
- what identifiers does this platform assign
- what protocol does it use to stream responses
- what changed between two capture sessions
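With the captures in SQLite, questions like these collapse into short queries. A sketch, assuming a `requests` table with illustrative columns and a few made-up rows so the queries return something; the column names and content types are assumptions, not the real tool's schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE requests (
    session      TEXT,
    url          TEXT,
    category     TEXT,   -- e.g. 'conversation', 'telemetry', 'other'
    content_type TEXT
)
""")

# Made-up rows purely so the queries below have data to chew on.
db.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [
        ("s1", "https://chat.example.com/api/conversation", "conversation", "text/event-stream"),
        ("s1", "https://chat.example.com/v1/events", "telemetry", "application/json"),
        ("s1", "https://chat.example.com/v1/events", "telemetry", "application/json"),
    ],
)

# How much of this traffic is actually telemetry?
telemetry_share = db.execute(
    "SELECT 100.0 * SUM(category = 'telemetry') / COUNT(*) FROM requests"
).fetchone()[0]

# What protocol streams responses? Server-sent events show up
# as a distinctive content type on the conversation endpoint.
streaming_urls = [
    row[0]
    for row in db.execute(
        "SELECT DISTINCT url FROM requests WHERE content_type = 'text/event-stream'"
    )
]
```

Diffing two capture sessions, or listing assigned identifiers, is the same pattern: one more column, one more query.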
Those are practical questions, not abstract privacy debates.
What I took away
Browser-based AI is not just "AI." It is AI plus product analytics, growth infrastructure, suggestions, caching, experiments, and whatever else the company has layered around the chat box.
That does not automatically make it bad.
But it does mean the default mental model most people have is too simple.
If you care about privacy, product design, protocol behavior, or just how these systems are really built, watching the traffic is worth it. Not because it reveals hidden magic, but because it removes the abstraction.
Once you do that, the products feel less like mysterious chat interfaces and more like what they really are: large web applications with an LLM in the middle.