Essay

Inside the Tools We Hand the Keys To

We give coding agents and on-device LLMs the run of our machines — so I took four of them apart to find out where the agent loop actually runs and who actually holds the keys.

2026-05-20

There is a quiet shift happening on every developer's laptop. A year ago the AI on your machine answered questions. Now it edits files, runs bash, deletes resources, drives the cursor. We hand it the keys — and we mostly trust the marketing page's word about what it does with them.

That trust is a security claim. And a security claim is a thing you can test. So I took four of these systems apart — two agentic coding tools, the on-device LLM stack inside iOS, and the kernel underneath all of it — with one question each time: where is the trust boundary, and who is standing on which side of it?

Amp: your laptop is the sandbox, not the brain

I started with Amp, Sourcegraph's coding-agent CLI. It ships as a 70 MB Bun single-file executable[^bun], so the first job was carving the embedded __BUN,__bun segment back out — 9 MB of minified JS recovered with dd — and reading the bundle.
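The carving step is easy to reproduce without dd. The following is a minimal sketch, assuming you have already located the segment's file offset and size (for example from the Mach-O load commands); the `carve` helper and the file names are hypothetical, and the numbers in the commented call are the ones reported in the footnote.

```python
from pathlib import Path


def carve(binary: Path, offset: int, size: int, out: Path) -> int:
    """Copy `size` bytes starting at `offset` from `binary` into `out`."""
    with binary.open("rb") as src:
        src.seek(offset)
        data = src.read(size)
    # A short read means the offset/size pair does not fit this file.
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    out.write_bytes(data)
    return len(data)


# Hypothetical invocation using the offset and size from the footnote:
# carve(Path("amp"), 60_243_968, 9_002_608, Path("bun_segment.bin"))
```

The length check is the one design choice worth keeping: a wrong offset otherwise produces a silently truncated bundle that fails much later, in the JS parser.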

The finding reframes the whole product. Every other agent CLI I'd looked at — Claude Code, Codex, Droid, Junie — runs the agent loop locally: the infer → tool_use → tool_result cycle happens on your machine. Amp doesn't. Amp's transport is a persistent WebSocket to ampcode.com, and the server runs the loop. The local CLI is a capability proxy: bash, filesystem, edit, LSP and MCP all execute on your machine, but they are invoked by a remote loop. The client-to-server messages tell the story plainly — client_resume, client_retry, client_cancel, client_append_manual_bash_invocation. The laptop is not the brain. It's the hands.
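To make the inversion concrete, here is a toy capability proxy. It is a sketch under stated assumptions, not Amp's code: the message shape (`{"tool": ..., "args": ...}`), the `LOCAL_TOOLS` table and both function names are hypothetical, and the WebSocket is left out entirely. What it shows is only the control flow: the local side never decides what to run, it just executes whatever request arrives.

```python
import subprocess
from typing import Callable, Dict


def run_bash(args: dict) -> dict:
    """Execute a shell command locally and return its result."""
    proc = subprocess.run(
        args["cmd"], shell=True, capture_output=True, text=True, timeout=30
    )
    return {"exit": proc.returncode, "stdout": proc.stdout}


# Hypothetical tool table; a real client would also register file edits, LSP, MCP.
LOCAL_TOOLS: Dict[str, Callable[[dict], dict]] = {"bash": run_bash}


def handle_remote_request(msg: dict) -> dict:
    """Dispatch a tool request that was chosen by a loop running elsewhere."""
    tool = LOCAL_TOOLS.get(msg["tool"])
    if tool is None:
        return {"error": f"unknown tool {msg['tool']!r}"}
    # No local planning happens here: the sender decides, this machine executes.
    return tool(msg["args"])
```

In this shape, whoever can put a message on the channel gets `run_bash`, so the channel itself carries the same privilege as a shell.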

That is the single most security-relevant fact about Amp, and it inverts the usual threat model. The Amp WebSocket is, functionally, a privileged remote-shell channel. A server compromise — or a malicious thread — equals local code execution. Thread state lives server-side too (https://ampcode.com/threads/T-<uuid>, rejoinable from any device), and the bundle contains an environment flag named, with deliberate ugliness, AMP_RESUME_OTHER_USER_THREADS_INSECURE. A flag named that way is an authz boundary asking to be probed.

There is a genuine upside, and it's worth being fair about it: you never supply a provider key. Every model call proxies through ampcode.com (/api/provider/anthropic, /openai/v1, /google, /xai/v1, and more), so Anthropic and OpenAI credentials never touch your client. Amp fronts the billing. The one credential you do hold (the sgamp_us... CLI token) sits in the macOS Keychain. The trust boundary is just drawn in an unfamiliar place: not "your machine vs. the model" but "your machine vs. Amp's server, which holds the model and the loop."

Warp: the terminal that drives other agents

Warp, the Rust-native AI terminal, is open enough to read from source — a 698 MB clone, ~94k lines of Rust across 65 crates. Its trust boundary sits in a different place again.

Warp models its agent as one large enum, AIAgentActionType, and reading the variants is reading the capability surface directly: RequestCommandOutput, ReadFiles, RequestFileEdits, CallMCPTool, UseComputer, RunAgents. That UseComputer is not a metaphor — the computer_use crate exposes keyboard, mouse and screenshot actors with real per-OS implementations. A Warp agent can drive the host UI.

Two things make this a broad surface. First, Warp can spawn other agent CLIs as subprocesses — there are first-class harnesses for claude_code, codex and gemini. An agent that orchestrates other agents multiplies whatever each one can do. Second, the signature "blocks" feature — every command and its output as a discrete addressable unit — is implemented as an in-band protocol over custom ANSI escape sequences[^osc], with block IDs minted by the shell's precmd hook. Anything that can write to the PTY can emit those sequences. That is a classic terminal-escape-injection surface, sitting underneath the agent.
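The injection surface is easier to see with a concrete payload. This is an illustrative sketch, not Warp's parser: it builds a string carrying an OSC sequence with the 9278 prefix quoted in the footnote, then shows a generic filter that drops OSC and DCS control strings from untrusted text. The assumption that the sequence ends in BEL or ST is mine, and the function name is hypothetical.

```python
import re

# OSC: ESC ] ... terminated by BEL or by ST (ESC \).
OSC = re.compile(r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")
# DCS: ESC P ... terminated by ST (ESC \).
DCS = re.compile(r"\x1bP.*?\x1b\\", re.DOTALL)


def strip_control_strings(text: str) -> str:
    """Remove complete OSC and DCS control strings; other bytes pass through."""
    return DCS.sub("", OSC.sub("", text))


# Untrusted program output that smuggles an in-band marker (payload is made up).
hostile = "build ok\x1b]9278;forged-block-metadata\x07\n"
clean = strip_control_strings(hostile)
```

The filter only removes sequences that are properly terminated. An unterminated sequence is left untouched here, which is exactly the kind of edge case where a real terminal parser and a sanitizer can disagree.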

Where Warp draws its line carefully is the cloud. Remote agents land in a detected sandbox — Docker, a Warp-managed Docker sandbox, Kubernetes or a namespace — and call back using a workload token. That token is the trust anchor for remote execution. As with Amp, all inference — even bring-your-own-key — funnels through a server gateway (warp_multi_agent_api); no LLM client code exists in the repo at all.

Apple Intelligence: the agent loop done with a seatbelt

The two coding tools answer "where does the loop run" with server. Apple Intelligence, which I pulled apart from ~80 real arm64 frameworks in the iOS 26.5 simulator runtime[^sim], answers it with on-device — and is the most interesting of the four because it shows what a contained agent loop can actually look like.

The on-device LLM is roughly a 3B model (its server-side counterpart, under Apple's codename "Ajax", is afm-text-30b: ten times larger). But the security-relevant part is the planner, IntelligenceFlow. It does not let the model emit free text and hope. It builds a BNF grammar dynamically from the available tools (the App Intents on your device) and then forces the LLM to generate only syntactically valid plans against that grammar, using a boolean mask over the vocabulary at every token (generateNextTokenIDMask). The model literally cannot emit a malformed tool call.
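Grammar-masked decoding is simple to demonstrate at toy scale. The sketch below is not Apple's implementation: the six-token vocabulary, the one-rule grammar (`call ( TOOL ) <eos>`) and every function name are assumptions made for illustration. It shows the mechanism the paragraph describes: a boolean mask over the vocabulary is computed at each step, and the decoder may only pick among tokens the mask allows.

```python
from typing import List, Set

VOCAB = ["call", "(", ")", "send_message", "delete_file", "<eos>"]


def next_token_mask(prefix: List[str], tools: Set[str]) -> List[bool]:
    """Boolean mask over VOCAB for the toy grammar: call ( TOOL ) <eos>."""
    expected = {
        0: {"call"},
        1: {"("},
        2: tools,        # only tools that are actually available
        3: {")"},
        4: {"<eos>"},
    }.get(len(prefix), set())
    return [tok in expected for tok in VOCAB]


def constrained_decode(logits_per_step: List[List[float]], tools: Set[str]) -> List[str]:
    """Greedy decoding where disallowed tokens can never be selected."""
    prefix: List[str] = []
    for logits in logits_per_step:
        mask = next_token_mask(prefix, tools)
        allowed = [(score, i) for i, (score, ok) in enumerate(zip(logits, mask)) if ok]
        if not allowed:
            raise ValueError("grammar permits no token at this position")
        _, best = max(allowed)
        prefix.append(VOCAB[best])
        if VOCAB[best] == "<eos>":
            break
    return prefix


# A "model" that always prefers delete_file still cannot emit it.
biased = [[0.0, 0.0, 0.0, 0.0, 9.0, 0.0]] * 5
plan = constrained_decode(biased, tools={"send_message"})
```

The point of the design is that the model's preferences only rank the allowed tokens. A tool that is not in the grammar has no path into the output, however strongly the logits favor it.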

The plan it produces is an AST in a small DSL, executed by an actual Interpreter. Each action carries ActionRequirement preconditions — Authentication, CarPlay, CarBluetooth — checked before the action runs, and a RiskResolver assigns per-tool risk with an ActionConfirmation gate for anything that needs a human yes.
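The precondition and confirmation gates can be sketched in a few lines. This is an assumption-heavy illustration rather than a reconstruction of the Interpreter: the `Action` dataclass, the `execute` function and the tool names are hypothetical, and only the requirement names Authentication and CarPlay come from the frameworks described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Action:
    tool: str
    requires: Set[str] = field(default_factory=set)  # e.g. {"Authentication"}
    high_risk: bool = False                          # needs a human yes


def execute(
    plan: List[Action],
    device_state: Set[str],
    confirm: Callable[[Action], bool],
) -> List[str]:
    """Run a plan, checking preconditions and confirmation before each action."""
    log: List[str] = []
    for action in plan:
        missing = action.requires - device_state
        if missing:
            log.append(f"blocked {action.tool}: missing {sorted(missing)}")
            continue
        if action.high_risk and not confirm(action):
            log.append(f"declined {action.tool}")
            continue
        log.append(f"ran {action.tool}")  # a real interpreter would invoke the tool
    return log


plan = [
    Action("send_message", {"Authentication"}),
    Action("start_navigation", {"CarPlay"}),
    Action("delete_note", high_risk=True),
]
log = execute(plan, device_state={"Authentication"}, confirm=lambda a: False)
```

Both checks run in ordinary code before the tool is touched, so nothing the model generates can talk its way past them.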

Hold that against the lesson from Post 1 of this series[^post1], where a 350M model called unauthorized tools in over half of all tests because its system prompt was, functionally, a suggestion. Apple's stack is the opposite design: the authorization does not live in the weights and it does not live in a prompt. It lives in a grammar, a set of precondition evaluators, and an interpreter, all outside the model. That is the safety boundary in the right place.

The privacy story rhymes. PrivateFederatedLearning.framework does real on-device training (it ships EspressoIRTrainer, so gradients compute on the Neural Engine), but no raw data leaves the device. Only differentially-private aggregated gradients do, gated by a system called Dedisco whose error type has 16 distinct privacy-violation variants. The whole "private" claim rests on the correctness of one auditable layer: Dedisco.
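For readers unfamiliar with what "differentially-private gradients" usually means, here is the textbook Gaussian mechanism in miniature: clip the gradient to a fixed L2 norm, then add noise. Nothing here is taken from Apple's frameworks. The function name, parameters and values are assumptions, and a real deployment also has to account for aggregation and a privacy budget.

```python
import math
import random
from typing import List


def privatize(
    gradient: List[float], clip_norm: float, noise_std: float, rng: random.Random
) -> List[float]:
    """Clip a gradient to L2 norm `clip_norm`, then add Gaussian noise per coordinate."""
    norm = math.sqrt(sum(g * g for g in gradient))
    # Scale down only when the gradient exceeds the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, noise_std) for g in gradient]


# With the noise switched off, only the clipping step is visible.
clipped = privatize([3.0, 4.0], clip_norm=1.0, noise_std=0.0, rng=random.Random(0))
```

Clipping bounds how much any single device can shift the aggregate, and the noise is calibrated to that bound. That is why both steps have to live in a layer that can be audited.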

mac-internals: the floor everything stands on

Every claim above — Amp's bash proxy, Warp's computer-use, Apple's interpreter — assumes the operating system underneath it actually contains a process. So the last teardown was the kernel itself: macOS 26 / XNU on an M4 Pro.

What I found is the mitigation inventory that any escape has to defeat. kalloc_type is fully decoded — 1,479 typed-allocator call sites, so a type-confusion bug is constrained to objects of identical memory layout. zone_require adds a runtime "this object must live in exactly this zone" check. PAC is present in its strongest form (FEAT_FPAC — fault-on-auth-fail, not just a corrupted pointer). And SPTM is observable as a real __DATA_SPTM segment — even DMA from the IOMMU is mediated by SPTM's frame-retyping authority.

But the honest finding is a gap. Every FEAT_MTE bit on the M4 Pro returns 0. There is no memory-tagging hardware. Memory Integrity Enforcement, the M5/A19 spatial-safety feature, is a die-level capability that simply does not exist on M4-generation silicon. The floor under all those agents is PAC plus type isolation. It is good. It is not memory tagging. That distinction is the real containment story, and no agent's marketing page mentions it.
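One way to repeat this check on your own machine is to read the CPU feature flags that macOS exposes through sysctl. This is a sketch with two assumptions: that the flags appear under `hw.optional.arm` as `key: value` lines, and that the memory-tagging keys contain the string FEAT_MTE. The helper names are hypothetical, and the sample text in the usage lines is illustrative, not captured output.

```python
import subprocess
from typing import Dict


def parse_sysctl(text: str) -> Dict[str, int]:
    """Parse `key: value` lines, keeping only integer-valued entries."""
    flags: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            flags[key.strip()] = int(value.strip())
    return flags


def mte_flags(flags: Dict[str, int]) -> Dict[str, int]:
    """Select entries whose key mentions FEAT_MTE (assumed naming)."""
    return {k: v for k, v in flags.items() if "FEAT_MTE" in k}


def read_arm_features() -> Dict[str, int]:
    """Query the live system; only meaningful on Apple silicon Macs."""
    out = subprocess.run(
        ["sysctl", "hw.optional.arm"], capture_output=True, text=True, check=True
    ).stdout
    return parse_sysctl(out)


# Illustrative input only, shaped like sysctl output.
sample = "hw.optional.arm.FEAT_FPAC: 1\nhw.optional.arm.FEAT_MTE: 0\n"
found = mte_flags(parse_sysctl(sample))
```

On a real machine, `mte_flags(read_arm_features())` would be the call to make. An empty result would mean the keys are named differently, not that the hardware is present.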

Where the boundary actually is

Four systems, one question, four different answers — and that is the answer. Amp's trust boundary is the WebSocket to its server. Warp's is the workload token on a cloud sandbox. Apple Intelligence's is a grammar and a precondition evaluator sitting outside the model. The kernel's is PAC and kalloc_type — with no MTE behind them yet.

Every agentic tool has a trust boundary. None of them advertise where it is. Finding it — naming the exact channel, token, or check that everything else depends on — is the entire security question. The model is a capability; the boundary is somebody's responsibility, and you should know whose.

There is a thread running through all of this that I keep pulling on. Each of these teardowns followed the same loop: carve the binary, map the agent's action surface, find the channel that holds the keys. That loop is mechanical enough to automate. The natural next step in this series is to stop doing reverse-engineering by hand and turn it into an agent of its own — one that takes a tool apart while I watch. That's where I'm headed next.

[^bun]: Bun compiles a JS project into a single native executable by appending the bundle as a Mach-O segment. Recovering it is a matter of locating __BUN,__bun (here at offset 60,243,968, 9,002,608 bytes) and carving it back out. The embedded bundle is a minified VFS of about 9 MB (8.6 MiB).

[^osc]: Warp's zsh bootstrap defines them directly: DCS_START='\eP$', OSC_START='\e]9278;', OSC_RESET_GRID='\e]9279\a'. The grid/ANSI parser itself is a fork of Alacritty.

[^sim]: Apple's first-party apps and frameworks ship unencrypted (cryptid 0) inside the iOS Simulator runtime that Xcode installs: no IPSW, no device, no jailbreak. It is a significant lowering of the bar for iOS reverse-engineering.

[^post1]: "1,299 Security Tests Against a 350M Language Model", the first post in this series. The tool-use authorization finding there was 55.2% vulnerable, with confused-deputy and hidden-call attacks at 100%.